How to Represent Data

Which chart is best for the data you have?


What is Data Visualisation?

We may have some abstract information in our minds, but we need data visualisations to communicate such information to other people, or as Stephen Few put it:

[Data Visualisation is] Graphical display of abstract information

Young children need to count on their fingers because numbers such as 1, 2 and 3 are just abstract information to them. Using their fingers to represent those numbers makes the numbers more comprehensible and enables children to carry out basic addition and subtraction.

Here are three main chart types we use in our daily lives to represent data. You need to understand when to use each of them.

Bar Charts

Bar charts are good at comparing multiple univariate values. As stated earlier, the human eye is good at comparing lengths.

Vertical bar charts are sometimes called column charts. Horizontal bar charts are more suitable when we have many items to compare.

It is advisable to sort your bars by length, unless the items you are comparing have a natural order that you do not want to change.
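As a rough illustration of the sorting advice, here is a minimal sketch that draws a horizontal bar chart as plain text, longest bar first. The function name `render_bar_chart` and the fruit sales figures are made up for this example; a real chart would normally come from a charting tool.

```python
def render_bar_chart(values, width=40, symbol="#"):
    """Return a horizontal text bar chart with the longest bar first."""
    if not values:
        return ""
    largest = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    # Sort by value, descending, so neighbouring bars are easy to compare.
    for label, value in sorted(values.items(), key=lambda kv: kv[1], reverse=True):
        # Bar length is proportional to the value; the largest value fills `width`.
        length = round(value * width / largest) if largest > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {symbol * length} {value}")
    return "\n".join(lines)


# Hypothetical data: units sold per fruit.
sales = {"Apples": 30, "Bananas": 45, "Cherries": 15}
chart = render_bar_chart(sales, width=30)
print(chart)
```

If the items had a natural order, such as months of the year, you would skip the sorting step and keep that order instead.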

In the case of bivariate data, you may need to use stacked bar charts or place adjacent bars in different colours. In both cases, you have to make sure that your two variables are measured in the same unit. Use stacked bar charts when the two variables complement each other, i.e. when their sum also makes sense.
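To make the stacking condition concrete, here is a small sketch with invented data: the number of boys and girls in each class. Both variables are counts of students, so they share a unit and their sum (the class size) is meaningful. The function name `render_stacked_bars` is an assumption for this example only.

```python
def render_stacked_bars(rows, symbols=("#", "+")):
    """Render each row as one bar made of two segments placed end to end."""
    label_width = max(len(label) for label in rows)
    lines = []
    for label, (first, second) in rows.items():
        # One character per unit, so the total bar length equals the sum.
        bar = symbols[0] * first + symbols[1] * second
        lines.append(f"{label.ljust(label_width)} | {bar} {first + second}")
    return "\n".join(lines)


# Hypothetical data: (boys, girls) per class; both are counts of students.
classes = {"Class A": (12, 14), "Class B": (15, 10)}
stacked = render_stacked_bars(classes)
print(stacked)
```

Stacking height in centimetres on top of weight in kilograms, by contrast, would produce a bar whose total length means nothing.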

Scatter Plot

When you want to compare two variables and show the correlation between them, scatter plots can be your friend. Scatter plots place each item in your dataset as a point, where its horizontal position on the x-axis is decided by one variable, while its vertical position on the y-axis is decided by the second variable.

When we have a third variable, we may decide to use bubble charts instead. Bubble charts are just like scatter plots, except that the size of the dots (bubbles in this case) varies according to the third variable. You may also use colours to show additional variables in both your scatter plots and bubble charts.

The scatter plots above show how the variables on the x and y axes are correlated. A positive correlation means that an increase in one variable is accompanied by an increase in the other variable, while a negative correlation means that an increase in one is accompanied by a decrease in the other. When the dots are just scattered in a random order, as in the third chart, it means that there is no correlation between the two variables.
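What the eye reads off a scatter plot can also be summarised as a number. The sketch below computes the Pearson correlation coefficient, one common measure of linear correlation; it is not something the charts themselves require, and the study-hours data is invented for illustration. Values near +1 suggest a positive correlation, values near -1 a negative one, and values near 0 little or no linear relationship.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length, with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: hours studied, test scores, and mistakes made.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 65, 70, 78]   # rises with hours
mistakes = [9, 7, 6, 4, 2]      # falls as hours rise

print(round(pearson_r(hours, scores), 2))
print(round(pearson_r(hours, mistakes), 2))
```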

As mentioned, a bubble chart is a variation of the scatter plot that can represent three variables instead of two: one on the x-axis, one on the y-axis, and a third through the size of the bubbles. You may visit this article to learn how to create bubble charts using RAW.

Histograms

Given the records of 10 students and their heights, we can easily plot them in a bar chart to compare their heights. But let's say we want to plot the heights of all 5,000 students in the school. We would need a chart with 5,000 bars, with the name of each student written beside their bar. What about plotting the heights of 1,000,000 students nationwide? A bar chart is not feasible with such high numbers, and at that scale we are no longer interested in the height of each individual student. Hence, histograms.

A histogram shows distributions rather than individuals. In the case of 1,000,000 school students, a histogram can show us the percentage of students whose heights are between 100 and 110 cm, the percentage between 110 and 120 cm, and so forth.
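The binning behind a histogram can be sketched in a few lines. The example below groups heights into 10 cm bins and reports the percentage of students in each; the ten heights are invented, and the convention that a bin includes its lower edge but not its upper edge is an assumption of this sketch.

```python
from collections import Counter


def height_histogram(heights_cm, bin_width=10):
    """Return {(low, high): percentage} where each bin covers low <= h < high."""
    if not heights_cm:
        return {}
    # Map every height to the lower edge of its bin, then count per bin.
    counts = Counter(int(h // bin_width) * bin_width for h in heights_cm)
    total = len(heights_cm)
    return {
        (low, low + bin_width): 100 * counts[low] / total
        for low in sorted(counts)
    }


# Hypothetical sample of student heights in centimetres.
heights = [104, 108, 112, 115, 117, 119, 121, 126, 133, 138]
histogram = height_histogram(heights)
for (low, high), pct in histogram.items():
    # One '#' per 5 percentage points, just to give the bars a shape.
    print(f"{low}-{high} cm | {'#' * round(pct / 5)} {pct:.0f}%")
```

The same function works unchanged for a million heights, because the output has one entry per bin rather than one per student. Bins that contain no students are simply absent from the result.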






