Designing dashboards and what data scientists look at

Designing dashboards is a blend of art, science and creativity: communicating the most effective message in the most limited visual space.

This article begins by considering the position of the data scientist and some background on what works and doesn’t work from a science point of view.

  • The play’s the thing…
  • A scientific approach
  • Working an example
  • Creating the best solution


The play’s the thing…

You’re in a boardroom.  Two presenters come in with two competing ideas.

The first presenter dresses well.  They are enigmatic and passionate in their belief.  Presenting a visually attractive concept, they believe!  We will call this person our "dreamer".

The second is far more sedate.  Maybe their social skills could use a touch-up.  They are dispassionate, yet their presentation is based on immutable facts.  For our purposes, this person is our "scientist".

Which do you go with?  The one who was best at designing dashboards?

 

The best answer is to consider both presentations and weigh up which one helps the business most, based on your experience and on the goals and ambitions of the company.

This "play" occurs frequently in boardrooms throughout the world.  Wouldn't it be better if the two presenters could work together to provide a passionate, fact-based approach?

 

For this article I start by looking into the scientist and their approach to designing dashboards.

 

A scientific approach

Where do we get our data?

Discrete and continuous data both arise from measurement.  These are often called metrics.

 

Discrete data comes from counts.  The data contains distinct values, e.g. the number of workers in a factory.

It is not possible to have 20.4 workers in a factory, therefore this data is discrete.

 

Continuous data arises where values come from a range of measurement, e.g. mass or velocity.

For any given velocity, the measurement can be refined further and further, making visualisation of every distinct value impractical.  Therefore velocity is continuous data.

 

When representing large amounts of discrete data, e.g. the number of children in a classroom for every class in the country, it may become more practical to group the values and display frequencies rather than each distinct count.

In this way, discrete data can be treated as continuous for the purpose of more effective visualisation.
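As a minimal sketch of this grouping step, the Python below (the class-size figures are invented purely for illustration) turns raw discrete counts into grouped frequencies ready for plotting:

    from collections import Counter

    # Hypothetical class sizes: discrete counts of children per classroom
    class_sizes = [18, 22, 25, 25, 27, 28, 28, 30, 31, 31, 31, 34]

    # Group the distinct values into classes of width 5 (15-19, 20-24, ...)
    bin_width = 5
    grouped = Counter((size // bin_width) * bin_width for size in class_sizes)

    # Frequencies per class, ready to plot as a histogram-style chart
    for lower in sorted(grouped):
        print(f"{lower}-{lower + bin_width - 1}: {grouped[lower]} classes")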

 

Types of charts

Now we get to the visuals of designing dashboards.

 

Image from http://bit.ly/2ADSswj

Bar charts best display discrete data with a limited set of distinct values, where the values may not be contiguous.  An example would be the number of workers in each factory in Dublin 24.

 

Pie charts also represent discrete values, but often best visualise percentages.  Viewers often struggle to extract values from the visualisation without clear labelling of the diagram.

 

Pie charts and bar charts commonly present data that relates to separate categories.

An example would be the number of seats each political party holds in the Dáil.

 

Histograms and grouped frequency tables of continuous data go well together.  The data is grouped into a limited number of distinct groups or classes.

An example would be the age of students in Junior Infants across Ireland.

 

Scatterplots visualise the relationship between two variables, again most commonly for continuous data.

 

An example would be accident rate as a function of mean time since last service.

 

The best tool to use is the one that most clearly and effectively visualises the data to the reader.
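As a rough illustration (not taken from the article's own charts) of how these chart types map onto code, here is a sketch using Python's matplotlib with invented data:

    import random
    import matplotlib.pyplot as plt

    fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(12, 4))

    # Bar chart: discrete counts per category (e.g. workers per factory)
    factories = ["A", "B", "C", "D"]
    workers = [42, 15, 63, 28]
    ax1.bar(factories, workers)
    ax1.set_title("Workers per factory")

    # Histogram: continuous data grouped into classes (e.g. pupil ages)
    ages = [random.gauss(4.7, 0.3) for _ in range(200)]
    ax2.hist(ages, bins=10)
    ax2.set_title("Ages of Junior Infants")

    # Scatterplot: relationship between two continuous variables
    months_since_service = [random.uniform(0, 24) for _ in range(50)]
    accident_rate = [0.5 + 0.1 * m + random.gauss(0, 0.4) for m in months_since_service]
    ax3.scatter(months_since_service, accident_rate)
    ax3.set_title("Accidents vs months since service")

    plt.tight_layout()
    plt.show()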

 

Taking care in how we present data

Continuous data needs grouping, also called classification.

You define the groups or classifications yourself, and that choice can lead to the same data being visualised in very different ways.

As a result, the presenter chooses the classifications through "interpretation", or best guess.  So even when designing dashboards from hard numbers, there is still interpretation.

 

If a classification is too broad, too much data falls into each class and meaning is lost from the visualisation.

Let's use the example of the age of students in Junior Infants across Ireland.  A classification of 4 to 5 and 5 to 6 is quite broad; much of the detail about the spread of ages within each year is lost.

 

If a classification is too narrow, it becomes increasingly difficult for the viewer to derive useful information from the visualisation.

Taking the Junior Infants example again, classes only 0.00001 of a year wide would create far too many bars on a histogram to derive any meaningful information.
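The effect of class width is easy to demonstrate in code.  The sketch below uses simulated ages (not real Junior Infants data) and builds the same histogram with two very different class widths:

    import random
    import matplotlib.pyplot as plt

    # Simulated ages of Junior Infants pupils, roughly 4 to 6 years old
    ages = [random.uniform(4.0, 6.0) for _ in range(1000)]

    fig, (broad, narrow) = plt.subplots(1, 2, figsize=(10, 4))

    # Too broad: two classes (4-5 and 5-6) hide most of the detail
    broad.hist(ages, bins=[4.0, 5.0, 6.0])
    broad.set_title("2 classes: detail lost")

    # A narrower width: one-month classes reveal the shape of the data
    narrow.hist(ages, bins=24)
    narrow.set_title("24 classes: shape visible")

    plt.tight_layout()
    plt.show()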

 

Working in averages

Some definitions

Mean, median and mode are all loosely called the "average", so a better description is "the number that best represents the data set".

The mean (most commonly called the "average") is the sum of the data values divided by the count of the data values.

You add all the numbers up and divide by how many numbers you have.

 

To find the median, start by sorting all the numbers.  The median is the central value of the sorted set.

The mode is the most frequently occurring number in the data set.

 

Averages in action

Forgetting designing dashboards for a moment, we need to do some concrete maths.  Consider the following data set:

1,1,1,2,2,2,3,3,3,4,100

 

Firstly, the mean of these numbers is 11.1.  When you add all the numbers you get 122.  Dividing by the 11 values gives 11.090909 recurring, which rounds to 11.1.

Next, the median of these numbers is 2.  It is the 6th value in the ordered set of 11 numbers.

The mode of these numbers is a challenge: the values 1, 2 and 3 each occur three times, so the set is multimodal.  You might report 2 as the middle of the three candidates, but strictly any of them, or all three, could be given as the mode.
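These calculations are easy to verify with Python's built-in statistics module, which also makes the multimodal nature of the set explicit:

    import statistics

    data = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 100]

    print(statistics.mean(data))       # 11.0909... which rounds to 11.1
    print(statistics.median(data))     # 2, the 6th value in the sorted set
    print(statistics.multimode(data))  # [1, 2, 3]: three equally common values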

 

Secondly, notice that the number 100 is skewing the mean.  An outlier is a data value far beyond the main grouping of the data.

This outlier drags the mean far away from the majority of the numbers.

If you took the 100 out of the set, the mean would become 2.2 rather than roughly 11.

You could just ignore the 100, but then you are no longer working with the full and accurate data set.

 

Another possibility is that the 100 is simply wrong or inaccurate.  Clean data is vital, and if you can confirm that the value is an error you should exclude it.

It is also possible that the value is accurate and therefore should not be summarily discounted.  The validity of the value should be researched.

 

In this example, the median offers a far more representative value for the data set.
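One common (though by no means the only) way to flag a value like the 100 is an interquartile-range check.  The sketch below applies that rule purely as an illustration, and shows how little the median moves compared with the mean:

    import statistics

    data = [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 100]

    # Quartiles via the statistics module; values beyond Q3 + 1.5*IQR are flagged
    q1, _, q3 = statistics.quantiles(data, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)

    outliers = [x for x in data if x > upper_fence]
    cleaned = [x for x in data if x <= upper_fence]

    print(outliers)                            # [100]
    print(round(statistics.mean(data), 1))     # 11.1, skewed by the outlier
    print(round(statistics.mean(cleaned), 1))  # 2.2 without the outlier
    print(statistics.median(data))             # 2, barely affected either way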

 

Designing dashboards

Take a look at the following chart.

The example contains historical quarterly mean apartment prices for London from 1990 to 2005, known as time-series data.

From looking at this chart, what can you determine?

This is not a bad line chart, and it is a staple when designing dashboards.  Yet could you easily derive the following?

 

  • After the year 2000, the mean price of apartments is increasing, apart from sporadic decreases.
  • After 2000, the largest price increases appear from Q4 into Q1 and, generally, from Q1 into Q2.
  • After 2000, Q3 and Q4 generally show slower growth or decreasing value.
  • After 2000, prices have increased in almost a straight line.
  • From 1995 to 2000, prices appear generally consistent.
  • Apartment prices have roughly doubled every five years: circa £5,000 in 2000, circa £10,000 in 2005 and circa £20,000 in 2010.

This visualisation clearly shows the escalating price of London apartments and implies that prices look set to continue increasing.
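I don't have the underlying price series behind the chart, so the sketch below generates a stand-in quarterly series purely to show how observations like "the largest increases appear from Q4 into Q1" can be checked: compute the quarter-over-quarter change and average it by quarter of the year.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Stand-in quarterly mean price series (invented numbers, not the real London data)
    index = pd.period_range("1990Q1", "2005Q4", freq="Q")
    trend = [5000 + 150 * i for i in range(len(index))]
    seasonal = {1: 250, 2: 150, 3: 0, 4: -50}   # bigger rises into Q1 and Q2
    prices = pd.Series([t + seasonal[p.quarter] for t, p in zip(trend, index)], index=index)

    # Quarter-over-quarter change, averaged by quarter of the year
    qoq = prices.diff()
    print(qoq.groupby(qoq.index.quarter).mean())

    # The same data as a plain line chart, the staple of dashboards
    ax = prices.plot(title="Mean apartment price by quarter (stand-in data)")
    ax.set_ylabel("Price (GBP)")
    plt.show()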

 

Three major challenges

The first major challenge, as highlighted by the example, is choosing how to visualise the data when designing dashboards.

There is a lot of information available and many possible visualisations, and the reader has to be able to interpret the resulting graphic.

The example does not scream its conclusions at a normal reader, or at someone unfamiliar with the data.

 

Next up, as in the "average" example above, data has to be clean and complete.  When a data scientist is handed data, cleaning and double-checking it takes time.

This is often summarised in colloquial business terms as “put crap in, get crap out”.
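A tiny sketch of the kind of check meant here, using pandas (the column names and rules are invented for illustration):

    import pandas as pd

    # Hypothetical raw extract with the usual problems: a missing and an impossible value
    raw = pd.DataFrame({
        "factory": ["A", "B", "C", "D"],
        "workers": [42, None, -3, 63],
    })

    clean = (
        raw.dropna(subset=["workers"])     # drop incomplete rows
           .query("workers >= 0")          # drop impossible counts
           .astype({"workers": "int64"})   # worker counts should be whole numbers
    )
    print(clean)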

 

The third challenge is prediction, which is often what businesses really want.  If you could reliably predict the lotto or the stock market through data science, everyone would be doing exactly that.

If you have a brand-new product or idea, it is impossible to analyse it, as there is simply no data available upon which to draw conclusions.

Things like market research can help here, generating data and facts which can then be visualised.

 

 

Creating the best solution

So as we go back to our boardroom to let things play out, our "scientist" has presented facts.  Yet their interpretation when designing the dashboards has an impact, so personal bias and their creative, human side still had an influence.

There may be more work to do.

  1. Firstly, is the data correct?  Where did it come from and is it complete?
  2. Next, was this the best way to present the data?
  3. Also, can we compare the mean, median and mode of those values and data sets?
  4. Furthermore, can you provide insight into the visualised data in words and statements rather than pictures?

Whilst it may appear harsh to question the scientist's work, as a manager your role in the play is to fully understand what's being presented.

A scientist is dispassionate and should be able to not just present but also stand over the facts they present.

Next we call to the stage our “dreamer”…

As soon as the follow up article is written I’ll link it back here.

If there’s anything in this article you’d like to chat to me about you can contact me here or on social media.

