Those of us who work with data are all too familiar with how difficult it can be to fully understand what we are presented with. However simple or complex, data can be hard to digest and make sense of. Data visualization is an effective technique that can
Tag: Big Data
In data science, theory does not always match reality. When working with data, it’s not uncommon to run into several complex problems at once. Fortunately, you are not alone: there are blogs, Slack channels, and other useful resources to come to the rescue. Plenty of problems
The invention of smartphones and tablets has revolutionized how people go about their day-to-day lives. A recent study by the Pew Research Center found that over 5 billion people worldwide own mobile devices. That number is expected to continue to rise sharply over the coming years. With so
Why should we care about Data Science? Nowadays more and more data is being generated by smartphones, social media, health services, banks, stores, online services, governments, sensors, etc. Every piece of information is saved ‘just in case’. As a result, the available data cannot be processed by human brains alone; we need algorithms and
An amazing book about data visualization that I can’t recommend enough is The Truthful Art by Alberto Cairo.
In The Truthful Art, Cairo explains the principles of good data visualization. He describes five qualities that should be your foundation when you work with data visualization: truthful, functional, beautiful, insightful, and enlightening. Cairo also gives some great examples of biased and dishonest visualization.
Before I dive into the “Five Qualities of Great Visualizations,” there’s another related concept that I want to cover: data-ink ratio, introduced by Edward Tufte in The Visual Display of Quantitative Information.
Explaining conceptually what it really means, and why it matters.
This article outlines a mental framework to organize our work around Data Quality. In terms of the well-known DIKW Pyramid, data quality is the enabler that allows us to turn raw data into information.
In this piece, we’ll go over a few common scenarios, review some theory, and finally outline some advice for anyone facing this increasingly common issue.
The amount of data being generated every second is almost impossible to comprehend. Current estimates say that 294 billion emails and 65 billion WhatsApp messages are sent every single day, and all of it leaves a data trail. The World Economic Forum estimates that the digital universe will reach 44 zettabytes by 2020. To give you an idea of what that means, take a look at the byte prefixes and remember that each one is 1,000 times the one before it: kilo, mega, giga, tera, peta, exa, zetta.
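To make the scale of the prefixes concrete, here is a minimal Python sketch of the arithmetic. The names `PREFIXES` and `bytes_per_unit` are made up for this illustration, and the sketch assumes the decimal (power-of-1000) prefixes described above rather than the binary (power-of-1024) ones.

```python
# Decimal byte prefixes: each step is 1000 times the previous one.
# PREFIXES and bytes_per_unit are hypothetical names used only for this sketch.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta"]


def bytes_per_unit(prefix: str) -> int:
    """Return the number of bytes in one unit of the given decimal prefix."""
    power = PREFIXES.index(prefix) + 1  # kilo -> 1, mega -> 2, ..., zetta -> 7
    return 1000 ** power


zettabyte = bytes_per_unit("zetta")      # 1000 ** 7 bytes
digital_universe = 44 * zettabyte        # the 44 zettabyte estimate, in bytes
terabytes = digital_universe // bytes_per_unit("tera")

print(f"{digital_universe:,} bytes")
print(f"{terabytes:,} terabytes")
```

Under these assumptions, 44 zettabytes works out to 44 billion terabytes, which is one way to picture the number in terms of ordinary one-terabyte drives.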
Understanding the big picture first will set the stage for success in this journey.
Data is one of the biggest new trends in both tech and business in general. Data “experts” are quickly becoming some of the best-paid individuals in the industry, and every single company wants to surf the wave of data capabilities.
It is becoming a fundamental way of understanding the world around us. We can think of data science as an epistemology, a way of knowing. We can also think of it as a way of approaching problems and solving them.
But as with any new trend, we have to ask ourselves: what do all these buzzwords actually mean?
What is a data scientist? In short, a person who is better at statistics than any software engineer and better at software engineering than any statistician.