In data science, theory and practice do not always line up. When working with data, it's common to run into several complex problems. Fortunately, you are not alone: there are blogs, Slack channels, and other useful resources to come to the rescue. Plenty of problems…
An amazing book about data visualization that I can’t recommend enough is The Truthful Art by Alberto Cairo.
In The Truthful Art, Cairo explains the principles of good data visualization. He describes five qualities that should be your foundation when you work with data visualization: truthful, functional, beautiful, insightful, and enlightening. Cairo also gives some great examples of biased and dishonest visualizations.
Before I dive into the “Five Qualities of Great Visualizations,” there’s another related concept that I want to cover: data-ink ratio, introduced by Edward Tufte in The Visual Display of Quantitative Information.
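Tufte defines the data-ink ratio as the share of a graphic's ink devoted to the non-redundant display of data-information. Written out as a formula, a sketch of his definition:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \text{proportion of the graphic that can be erased without loss of data-information}
```

A ratio close to 1 means almost every mark on the chart carries data; gridlines, heavy borders, and decorative fills push it down.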
Explaining conceptually what it really means, and why it matters.
This article outlines a mental framework for organizing our work around data quality. Referencing the well-known DIKW (Data, Information, Knowledge, Wisdom) pyramid, data quality is the enabler that lets us take raw data and turn it into information.
In this piece, we’ll go over a few common scenarios, review some theory, and finally outline some advice for anyone facing this increasingly common issue.
The amount of data being generated every second is almost impossible to comprehend. Current estimates say that 294 billion emails and 65 billion WhatsApp messages are sent every single day, and all of it leaves a data trail. The World Economic Forum estimates that the digital universe will reach 44 zettabytes by 2020. To give you an idea of what that means, look at the byte prefixes and remember that each step up multiplies the previous one by 1000: kilo, mega, giga, tera, peta, exa, zetta.
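To make the prefix arithmetic concrete, here is a minimal Python sketch. It assumes decimal (SI) prefixes, where each step is a factor of 1000; binary prefixes such as kibi use 1024 instead and are not covered here. The helper name `bytes_per_unit` is hypothetical, chosen only for illustration.

```python
# Decimal (SI) byte prefixes, in order: each step multiplies by 1000.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def bytes_per_unit(prefix: str) -> int:
    """Return how many bytes one <prefix>byte holds, using powers of 1000."""
    exponent = PREFIXES.index(prefix) + 1  # kilo -> 1000**1, mega -> 1000**2, ...
    return 1000 ** exponent


# Print each prefix as a power of ten (1000**n == 10**(3n)).
for position, name in enumerate(PREFIXES, start=1):
    print(f"1 {name}byte = 10^{3 * position} bytes")

# The 44-zettabyte estimate quoted above, expressed in plain bytes.
digital_universe = 44 * bytes_per_unit("zetta")
print(f"44 zettabytes = {digital_universe:.3e} bytes")

# The same amount measured in one-terabyte drives.
drives = digital_universe // bytes_per_unit("tera")
print(f"That is {drives:,} one-terabyte drives")
```

Running it shows that 44 zettabytes is 4.4 × 10^22 bytes, or about 44 billion one-terabyte drives.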
Understanding the big picture first will set the stage for success in this journey.
Data is one of the biggest new trends in both tech and business in general. Data “experts” are quickly becoming some of the best-paid individuals in the industry, and every single company wants to surf the wave of data capabilities.
It is becoming a fundamental way of understanding the world around us. We can think of data science as a form of epistemology, a way of knowing. We can also think of it as a way to approach problems and solve them.
But as with any new trend, we have to ask ourselves: what do all these buzzwords actually mean?
What is a data scientist? In short, a person who is better at statistics than any software engineer and better at software engineering than any statistician.