A snake in the cookie jar

By Eirikur Eiriksson

Published 2020-06-08

Data and database systems have come a long way since organised data collections first appeared early in the history of digital computing. At first, the advances were few and far between, predominantly driven by the evolution of computer hardware, most notably in data storage technologies. Fast forward to the current era of rapid invention, now driven less by hardware evolution and increasingly by discoveries in data utilisation.

Good examples are the applications of Python and other programming languages to data-related tasks. It is not so much the programming languages themselves as the plethora of available libraries with pre-packaged functionality and logic. Toolboxes are now virtually full of exciting new toys that produce stunning visualisations and interactive dashboards, and perform complex analysis and operations with greatly reduced requirements for theoretical, technical and subject knowledge. The growing number of these toys goes hand in hand with ever-increasing volumes of data, produced at rates that are difficult even to fathom.

As good as this sounds, it does introduce risks and uncertainties. Like a child chasing a balloon, the focus on the surrounding environment can be lost. Core fundamentals become blurred and potentially zoomed out of scope.

Seasoned data professionals agree that the correctness of the data is the holy grail. It is religiously enforced through referential integrity, constraints and scrutiny of the data sources, and database professionals know that they must also be data inspectors. Every bit of data should be checked, rechecked, and then checked again. The consensus is that a minute spent on avoiding “Garbage In, Garbage Out” (GIGO) at the source is worth many hours of knee-jerk reactions and “corrections”. The sanity of data simply cannot be delegated to pre-packaged logic or be overshadowed by data availability. Even a single missing attribute or the slightest ambiguity can potentially cause erroneous results:

A large corporation was looking into optimising travel expenses through trip classifications and cost-based comparison by origin and destination, only to find out that it could not tell the difference between Paris (France) and Paris (Texas), or even London (UK) and London (Ontario). Trips were potentially classified either as trans-Atlantic travel or as awfully expensive domestic travel.
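The ambiguity in this anecdote is the kind of thing a small check at the source can surface. The following is a minimal sketch, not the corporation's actual system: the reference list, the country codes, the Dallas origin and the function names are all assumptions made for illustration. It flags city names that map to more than one country, and shows that a trip can only be classified reliably once the country travels with the city name.

```python
from collections import defaultdict

# Hypothetical reference data: (city name, country code) pairs.
KNOWN_CITIES = [
    ("Paris", "FR"),
    ("Paris", "US"),
    ("London", "GB"),
    ("London", "CA"),
    ("Dallas", "US"),
]


def find_ambiguous_cities(cities):
    """Return the city names that appear in more than one country."""
    countries_by_name = defaultdict(set)
    for name, country in cities:
        countries_by_name[name].add(country)
    return {
        name: sorted(countries)
        for name, countries in countries_by_name.items()
        if len(countries) > 1
    }


def classify_trip(origin, destination):
    """Classify a trip from two (city, country) pairs.

    The city name alone is not enough; the country code is what
    separates a domestic trip from an international one.
    """
    return "domestic" if origin[1] == destination[1] else "international"


if __name__ == "__main__":
    print(find_ambiguous_cities(KNOWN_CITIES))
    print(classify_trip(("Dallas", "US"), ("Paris", "US")))
    print(classify_trip(("Dallas", "US"), ("Paris", "FR")))
```

Running a check like `find_ambiguous_cities` when the data is loaded, rather than after the analysis, is the "minute spent at the source" in practice: the missing attribute is caught before any trip is classified.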

Just as with readymade meals, sticking the meal in a microwave or an oven and following the package instructions produces results. A digital equivalent would be loading a software library package into the computer memory and following the library package instructions.

Equally, the results are alike: something to munch on, most of the time turning out to be close to the real deal. But without full control of the ingredients, their origin, the proportions, the methods, or the logic, those are still just readymade meals.

When it comes to data, minor complacency can turn the delight of a shiny new toy into a snake in the cookie jar.

High Skills Partners’ articles are provided for your interest and do not constitute advice, legal or otherwise. We aim to be informative, thought-provoking and to reflect our experience, whilst striving for accuracy and correctness.
© 2020 High Skills Partners Limited