When Your Data Speak, Can You Understand?

“Data are becoming the new raw material of business.”

— Craig Mundie

If your company is like most, these days you have more data than you know what to do with, and you’re collecting it faster than you can imagine.  IBM states that every day we create 2.5 quintillion bytes of data, and several studies claim that 99+% of the world’s data was created in just the last two years.

And YOU, whether you’re an individual contributor or the CEO, are being asked to do more with it: to make “data-driven decisions” by incorporating data more formally into your decision-making processes.  As McAfee and Brynjolfsson report, “companies in the top third of their industry in the use of data-driven decision making were, on average, 5% more productive and 6% more profitable than their competitors.”  Businesses have recognized that “data is the new oil” (a pronouncement credited to Sheffield mathematician Clive Humby, who helped establish Tesco’s Clubcard in 1994), and those that aren’t wringing the most value from their data are going to be left in the dust.

But what does it mean to make “data-driven decisions” for those who aren’t data scientists?  Sure, we all get reports, data warehouse extracts, and dashboards ad nauseam, but how can we make the most effective use of them?

A great starting point is developing “Data Literacy,” the ability to consume, produce, and think critically about data.  Data-literate individuals are best positioned to extract the most value from the data available to them and to take the appropriate steps to transform data into information, and thence information into action.  Note that there’s a distinction between being “literate” and being “fluent.”  A data-literate person wouldn’t be expected to replace a Data Scientist, but should be able to have productive discussions about the Data Scientist’s work and bridge the worlds of business and mathematics.

A data-literate individual demonstrates the following skills and behaviors:

  • Understanding where data came from and issues related to sourcing, context, sampling strategies, size, bias and other data quality matters;
  • Having sufficient knowledge of probability and statistics to understand the underlying meaning of the reported measures and to recognize when there is cause for additional scrutiny, i.e., when the numbers “don’t add up”;
  • Reading graphical sources of data (plots, charts, heat maps, etc.), recognizing the strengths and weaknesses of each format, and understanding what the graphic doesn’t say, including the common ways data is obscured in presentation;
  • Performing “Exploratory Data Analysis,” a way to methodically interrogate data, determine key statistics about the data and identify the relationships among variables;
  • Producing compelling visualizations through the effective use of graphical forms, color, text, etc., and clearly communicating the meaning behind the numbers;
  • Telling a “Data Story” by creating a narrative that explains data findings in an engaging and accessible manner;
  • (Optional, but recommended) Having a working knowledge of common machine-learning models.  Not necessarily how to build them (well, maybe some simple ones), but understanding commonly used approaches and their applicability to different types of problems, in order to communicate better with your Data Scientists.

Don’t be afraid!  The list can seem daunting at first, but the topics are readily mastered and, once acquired, will give you ongoing confidence as you apply them and make ever-better “data-driven decisions.”  In future posts I’ll revisit these skills individually and walk you through strategies to develop them.

Which skills do you find most useful?  Most challenging?  Please leave your thoughts in the comments below.