Data Visualization

Data visualization is the practice of placing data in a graphic format to help convey the data’s significance. We also use the term data visualization to refer to the graphic itself, so it’s both a practice and the outcome of that practice. Data visualization, or dataviz as some call it, is important because some patterns that might go unnoticed in tabular, text, or statistical form are more easily communicated and understood visually. Dataviz is a crucial skill for those working with large datasets who need the ability to communicate the importance of that data…and that’s pretty much everyone in business these days. Let’s look at a few important reasons why dataviz is so crucial.

90% of information transmitted to the brain is visual in nature. As mammals, we are designed to take in information through our eyes. If we can format data in a way that makes it easier to communicate and easier for the recipient to understand, why wouldn’t we?

Our brains also process images in as little as 13 milliseconds, and visuals are processed 60,000 times faster than text. This means that the time it takes to comprehend complex data can be significantly decreased if we put that data in a format optimized to communicate visually. That’s dataviz!

Now that’s interesting and all, but how does that help us as marketers?

Presentations using visual aids are 43% more persuasive than those using text alone. And persuasion is the business we marketers are in.

Blog posts and articles with visuals perform 91% better than those without. It’d be silly to use blogging and content marketing and not include a visual component to ensure that your content is memorable.

Data Visualization Types

Let’s briefly go over the different categories of data visualizations from our reading, A Tour Through the Visualization Zoo.

Statistical

First are statistical visualizations. These visualizations take large datasets and use a variety of techniques to help make sense of the data, hopefully exposing patterns that we can use in some way. Scatter plots, violin plots, histograms (frequency plots), and box plots are typical examples.
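To make one of these concrete: a box plot is essentially a drawing of five numbers, the minimum, the first quartile, the median, the third quartile, and the maximum. Here is a minimal Python sketch of that summary. The data is invented for illustration, the helper name five_number_summary is hypothetical, and the "inclusive" quartile method is an assumption, since plotting tools differ in how they compute quartiles.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a box plot draws."""
    ordered = sorted(values)
    # n=4 splits the data into quarters and returns the three cut points.
    q1, median, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical daily website visits, invented for illustration.
daily_visits = [9, 1, 5, 3, 7, 2, 8, 4, 6]
print(five_number_summary(daily_visits))
```

The box spans Q1 to Q3, the line inside the box marks the median, and the whiskers reach toward the minimum and maximum.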

Time-series

We also have time-series visualizations. We typically think of these as timelines. These visualizations organize items or events temporally. In Western cultures, this would typically mean reading left to right as we move from the past to the present. The ubiquitous stock value returns graph is another example, but notice this is also a line graph. If a line graph is showing time-based information, we would consider it a time-series data visualization and not a statistical data visualization.

Maps

If the data is geographic in nature, then you can plot the information using the map type of data visualization. For example, here’s a map showing the most well-known brands from each state. And another showing doctors’ pay in America. Or a map showing the origins of articles in Wikipedia.

Networks 

Network maps show the interconnectedness between data points, such as this visualization showing the financial connections between CEOs at the top Silicon Valley tech firms.

And while you might think this famous London Underground visualization is a map, it’s better described as a network data visualization because the actual line and station information is divorced from actual geographic placement. Does having a perfectly accurate representation of the distances between stations really matter to the passenger? No. This map shows only the information that’s really needed for the passenger and eliminates everything else extraneous. But this visualization does show the interconnectedness between the lines and stations quite effectively. 

In fact, this is what the London Underground map would look like if it were actually mapped geographically. Doesn’t exactly enhance communication and understanding, does it?

Hierarchies

Finally, hierarchical data visualizations are similar to network visualizations in that they show the interconnectedness between the data, but these data visualizations also show how portions of the data fit within or emerge from each other, such as this family tree diagram of the Kennedy family. Or how pints, quarts, gallons, and other units of measurement fit within each other.

Oh, and it’s quite possible, and rather common, to combine data visualization types, such as this statistical line graph that shows a time series. Or this front page visualization from The New York Times.

Lying with Data Visualizations

Now that you’re familiar with the different types of data visualizations, there’s one more bit of information that’s important for you to know: how easy it is to lie with data visualizations. Perhaps you’ve heard the saying, “There are three kinds of lies: lies, damn lies, and statistics”? Don’t think that because data is in a form intended for easier communication and understanding, it can’t be manipulated. Data visualizations are actually very easy to manipulate. Here are a few examples and the methods used to mislead, so you can know what not to do and how to avoid falling victim to it.

First is ignoring conventions. We all know what a pie chart is, right? Its slices are supposed to add up to 100%. So what does a pie chart mean when it totals more than that? It doesn’t make much sense. This example is obviously wrong because the percentages are far too large. But what if the numbers didn’t seem wrong because the percentages were only subtly too large? That can be a big problem and very confusing.
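A quick way to catch the subtle cases is simply to add up the slices before trusting the chart. The Python sketch below does that. The function name pie_total_problem, the half-point tolerance for rounding, and all of the numbers are assumptions made up for illustration.

```python
def pie_total_problem(shares, tolerance=0.5):
    """Return a warning if the slices do not sum to 100%, otherwise None."""
    total = sum(shares.values())
    # Allow a small tolerance because published percentages are often rounded.
    if abs(total - 100) > tolerance:
        return f"slices sum to {total:g}%, not 100%"
    return None

# Hypothetical charts, invented for illustration.
obviously_wrong = {"Candidate A": 70, "Candidate B": 63, "Candidate C": 60}
subtly_wrong = {"Email": 41, "Social": 35, "Search": 27}
valid = {"Email": 40, "Social": 35, "Search": 25}

for name, chart in [("obvious", obviously_wrong),
                    ("subtle", subtly_wrong),
                    ("valid", valid)]:
    print(name, "->", pie_total_problem(chart))
```

The second chart is the dangerous one: each slice looks plausible on its own, and only the total gives it away.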

You can also hide negative trends by obscuring them in cumulative totals. Notice how the graph on the left looks so good. Everything’s great. Our revenue is growing like gangbusters. The graph on the right tells the true story. Yes, revenues are still positive, but they are declining. Cumulative charts like the one on the left can hide all sorts of bad news. Don’t get suckered into believing them at first glance.
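The arithmetic behind this trick is easy to check for yourself. In the Python sketch below, the quarterly revenue figures are invented for illustration and the helper names are hypothetical. The same declining numbers produce a cumulative series that rises every single period.

```python
from itertools import accumulate

# Hypothetical quarterly revenue in thousands of dollars: falling every quarter.
quarterly_revenue = [120, 100, 80, 60, 40]

# Running total: each point adds the current quarter to everything before it.
cumulative_revenue = list(accumulate(quarterly_revenue))

def is_rising(series):
    """True if every value is larger than the one before it."""
    return all(later > earlier for earlier, later in zip(series, series[1:]))

def is_falling(series):
    """True if every value is smaller than the one before it."""
    return all(later < earlier for earlier, later in zip(series, series[1:]))

print("per quarter:", quarterly_revenue, "falling?", is_falling(quarterly_revenue))
print("cumulative: ", cumulative_revenue, "rising?", is_rising(cumulative_revenue))
```

As long as each period’s revenue is above zero, a cumulative chart can only go up, so an upward slope on its own says nothing about whether the business is growing.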

Omitting data points is another way to hide what’s really going on. For example, the graph on the left makes it look like steep, uninterrupted growth followed by consistent results. The reality is more complex and not as rosy. When you see a graph missing crucial data points like this, especially if it’s yearly results, question why. It could be that someone is trying to fool you.

A very common approach to manipulating the visual display of results is truncating the y-axis. Really it could be either axis, but the y-axis is more common. The graph on the left makes the revenue results look a lot more dramatic. But when scaled correctly and the entire y-axis displayed, the truth is less impressive. Often this sort of misleading data visualization is not done intentionally. Typically, someone wants to show how some data series has changed, but because the amounts of change are so small, they end up visually amplifying those changes by truncating the y-axis. 
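You can put a number on how much a truncated axis exaggerates a change. In a bar chart, each bar is drawn from the bottom of the axis up to its value, so raising the bottom of the axis shrinks every bar by the same amount and inflates the ratio between them. The Python sketch below uses invented revenue figures and a hypothetical helper, apparent_ratio, to compare the two cases.

```python
def apparent_ratio(first, last, axis_min=0.0):
    """How many times taller the last bar is drawn than the first bar."""
    # Drawn height is the value minus wherever the y-axis starts.
    return (last - axis_min) / (first - axis_min)

# Hypothetical revenue in millions of dollars: a 4% increase.
revenue_start, revenue_end = 100, 104

honest = apparent_ratio(revenue_start, revenue_end)                 # axis starts at 0
truncated = apparent_ratio(revenue_start, revenue_end, axis_min=98) # axis starts at 98

print(f"full axis: last bar looks {honest:.2f}x the first")
print(f"truncated: last bar looks {truncated:.2f}x the first")
```

With the full axis, the last bar is drawn 1.04 times as tall as the first, which matches the real 4% change. With the axis starting at 98, the same data is drawn as a bar three times as tall.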

Finally, this is the most egregious example of lying through data visualization that I’ve ever seen. There’s virtually zero chance that the person who put the graph together didn’t know exactly what they were doing. Have you figured it out yet? It looks like gun deaths in Florida went down after the “Stand Your Ground” law was passed in 2005, right? It may seem that way until you realize that the y-axis was flipped vertically. Now zero is at the top and 1000 is at the bottom. This is another example of defying conventions, but one so bad it deserves its own category. The lesson to be learned here is that graphs that accompany politically charged stories should probably receive more scrutiny.

To recap,

Data visualization is crucial for business managers as it can help you see patterns and relationships that the raw data may not easily show.

It is both a process and an outcome.

There are five types of data visualizations: statistical, time-series, maps, networks, and hierarchies.

Data visualization types can be used in combination.

Data visualizations can be used to mislead. Make sure you give extra scrutiny to visualizations on politically charged topics.