Plotly for effective data visualization

Author: Ruthger Righart

Blogs: https://rrighart.github.io

Web: https://www.rrighart.com

Plotly is well known for its aesthetically pleasing visualizations, and it allows for user interaction.

This Plotly guide is unlike other guides. Excellent introductory guides are already available, particularly on the Plotly site itself, so we are not reinventing the wheel here.

This guide focuses on two key aspects that are rarely discussed in data visualization guides: 1) writing more concise and readable code, and 2) automating the code so that it easily accommodates new data.

This post sheds some light on how to do this in Plotly. It first gives examples in which the code has to be tuned by hand, and then a better version that is more concise and automated. For this purpose we use a simple example of weather data in Switzerland.

The code can be run with Python 2 or 3.

In [1]:
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib import cm
import plotly
import plotly.plotly as py
import plotly.graph_objs as go
from plotly import tools
plotly.tools.set_credentials_file(username='rrighart', api_key='9999999')

We read in the following data that we created with the help of Wikipedia:

In [2]:
data = {'Month': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'],
        'Geneva': [1.5, 2.5, 6.5, 9.7, 14.2, 17.7, 20.2, 19.5, 15.4, 11.1, 5.5, 2.6],
        'Zurich': [-1, 0.4, 3.9, 7.8, 12.2, 15.5, 17.6, 16.8, 13.8, 8.9, 3.5, 0.2],
        'Lugano': [3.3, 4.5, 8.3, 11.4, 15.7, 19.6, 22.1, 21.5, 17.5, 13.0, 7.9, 4.3]
       }
df1 = pd.DataFrame(data, columns = ['Month', 'Geneva', 'Zurich', 'Lugano'])
In [3]:
df1
Out[3]:
Month Geneva Zurich Lugano
0 January 1.5 -1.0 3.3
1 February 2.5 0.4 4.5
2 March 6.5 3.9 8.3
3 April 9.7 7.8 11.4
4 May 14.2 12.2 15.7
5 June 17.7 15.5 19.6
6 July 20.2 17.6 22.1
7 August 19.5 16.8 21.5
8 September 15.4 13.8 17.5
9 October 11.1 8.9 13.0
10 November 5.5 3.5 7.9
11 December 2.6 0.2 4.3

Before we start with Plotly, let's see what this looks like in Matplotlib, using a line plot that displays the monthly temperature for each of the three Swiss cities:

In [4]:
len(df1)
Out[4]:
12
In [5]:
fig = plt.figure(figsize = (10, 12))
ax1 = fig.add_subplot(111)
l1 = plt.plot(df1.Geneva, marker='o', color='blue')
l2 = plt.plot(df1.Zurich, marker='o', color='red')
l3 = plt.plot(df1.Lugano, marker='o', color='green')
plt.xlabel('Time-course (months)', fontsize=14)
plt.ylabel('Temperature (in Celsius)', fontsize=14)
plt.xticks(fontsize=14)
plt.yticks(fontsize=14)
plt.setp(ax1.get_xticklabels(), visible=True)
plt.xticks(range(len(df1)), df1.Month, size='small')

plt.legend(['Geneva', 'Zurich', 'Lugano'], bbox_to_anchor=(0., 1.02, 1., .102), loc=3, mode="expand", borderaxespad=0)
plt.show()

Looks nice! For the sake of clarity, the code was written out in full. Let us now visualize the same data in Plotly, again first writing out all the code:

In [6]:
trace0 =  go.Scatter(
          x=df1['Month'],
          y=df1['Geneva'],
          name='Geneva'
)

trace1 =  go.Scatter(
          x=df1['Month'],
          y=df1['Zurich'],
          name='Zurich'
)

trace2 =  go.Scatter(
          x=df1['Month'],
          y=df1['Lugano'],
          name='Lugano'
)

data = [trace0, trace1, trace2]

layout = {
    'title' : 'Mean temperature, split for cities',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'temperature (Celsius)'},
    'showlegend': True}

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Temperature', sharing='public')
Out[6]:

Now let's add some shaded rectangles that highlight alternating month-to-month intervals:

In [7]:
trace0 =  go.Scatter(
          x=df1['Month'],
          y=df1['Geneva'],
          name='Geneva'
)

trace1 =  go.Scatter(
          x=df1['Month'],
          y=df1['Zurich'],
          name='Zurich'
)

trace2 =  go.Scatter(
          x=df1['Month'],
          y=df1['Lugano'],
          name='Lugano'
)

data = [trace0, trace1, trace2]

layout = {
    'title' : 'Mean temperature, split for cities',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'temperature (Celsius)'},
    'showlegend': True,
    'shapes': [{
        'type': 'rect', 'xref': 'x', 'yref': 'paper', 'x0' : 'February', 'y0' : 0, 'x1' : 'March', 'y1' : 1, 'fillcolor': '#d3d3d3', 'opacity': 0.2, 'line': {'width': 0,}
   },
        {
        'type': 'rect', 'xref': 'x', 'yref': 'paper', 'x0' : 'April', 'y0' : 0, 'x1' : 'May', 'y1' : 1, 'fillcolor': '#d3d3d3', 'opacity': 0.2, 'line': {'width': 0,}
   },
        {
        'type': 'rect', 'xref': 'x', 'yref': 'paper', 'x0' : 'June', 'y0' : 0, 'x1' : 'July', 'y1' : 1, 'fillcolor': '#d3d3d3', 'opacity': 0.2, 'line': {'width': 0,}
   },
        {
        'type': 'rect', 'xref': 'x', 'yref': 'paper', 'x0' : 'August', 'y0' : 0, 'x1' : 'September', 'y1' : 1, 'fillcolor': '#d3d3d3', 'opacity': 0.2, 'line': {'width': 0,}
   },
{
        'type': 'rect', 'xref': 'x', 'yref': 'paper', 'x0' : 'October', 'y0' : 0, 'x1' : 'November', 'y1' : 1, 'fillcolor': '#d3d3d3', 'opacity': 0.2, 'line': {'width': 0,}
   }
    ]
}

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Temperature', sharing='public')
Out[7]:

As we can see, this works well and produces beautiful figures. To explain a bit about the code: each trace defines the line for one city. In the layout, the shapes entries produce the shaded rectangles that cover every other month-to-month interval. The rest speaks for itself.

But the code is quite long and in places repetitive. And if the dataset expands, quite a few changes to the code will probably be needed.

To make it more concise, we could:

  • create the shapes input in advance, by looping and collecting so-called dictionaries in a list,
  • put the traces in a loop that reads in the city names.

Let us first take a side step: what are dictionaries? A dictionary maps keys to values and is stored in a variable. For example, to store in a variable called shapes that the type is 'rect' (meaning rectangle) and that the value for x0 is 'February', you could write the following:

In [8]:
shapes = dict(type='rect', x0='February')
In [9]:
type(shapes)
Out[9]:
dict

Another way to do this is the following notation:

In [10]:
shapes2 = {'type': 'rect', 'x0': 'February'}
In [11]:
type(shapes2)
Out[11]:
dict

This kind of data structure, the dictionary, is commonly used in Plotly. My impression is that the two notations are used interchangeably. However, for the sake of consistency, I will mainly use the second one.

The following creates a list of shape dictionaries, shps, that we will use in the Plotly code. It indicates where the shaded rectangles should be placed.
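
To make the equivalence concrete, here is a small sketch (the variable names shape_a and shape_b are only illustrative). It suggests that both notations build the same object, and that dictionaries can be nested, which is how Plotly describes for example the outline of a rectangle:

```python
# Two notations, one result: both build an ordinary Python dict.
shape_a = dict(type='rect', x0='February', line=dict(width=0))
shape_b = {'type': 'rect', 'x0': 'February', 'line': {'width': 0}}

# Equality compares keys and values, not the notation that was used.
same = (shape_a == shape_b)

# Nested values are looked up one key at a time.
line_width = shape_b['line']['width']

# Note: the dict(...) notation only works when every key is a valid
# Python identifier; the brace notation accepts any hashable key.
```

One practical difference: the brace notation can be copied almost literally from JSON-style examples, which may be a reason to prefer it.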

In [12]:
shps = []

# One shaded rectangle per pair of months: February-March, April-May, ..., October-November
for i in range(1, len(df1['Month'])-1, 2):
    shps.append({'type':'rect', 'xref':'x', 'yref':'paper', 'x0':df1['Month'][i], 'y0':0, 'x1':df1['Month'][i+1], 'y1':1, 'fillcolor': '#d3d3d3', 'opacity':0.2, 'line':{'width':0}})

Let us zoom in on the first element, which describes where the first rectangle is positioned:

In [13]:
shps[0]
Out[13]:
{'fillcolor': '#d3d3d3',
 'line': {'width': 0},
 'opacity': 0.2,
 'type': 'rect',
 'x0': 'February',
 'x1': 'March',
 'xref': 'x',
 'y0': 0,
 'y1': 1,
 'yref': 'paper'}

This is the dictionary data type in Python:

In [14]:
type(shps[1])
Out[14]:
dict
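
The same idea can be wrapped in a small function, so that the shading adapts to whatever categories the data contain. This is only a sketch under my own assumptions: the name shaded_bands and its parameters are hypothetical, and the categories are passed as a plain list (for example list(df1['Month'])).

```python
def shaded_bands(categories, start=1, color='#d3d3d3', opacity=0.2):
    """Return one rectangle dict per pair (categories[i], categories[i+1]),
    for i = start, start + 2, start + 4, ..."""
    bands = []
    for i in range(start, len(categories) - 1, 2):
        bands.append({
            'type': 'rect', 'xref': 'x', 'yref': 'paper',
            'x0': categories[i], 'y0': 0,       # left edge, bottom of the plot area
            'x1': categories[i + 1], 'y1': 1,   # right edge, top of the plot area
            'fillcolor': color, 'opacity': opacity,
            'line': {'width': 0},               # no outline
        })
    return bands


months = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
bands = shaded_bands(months)
```

With the twelve months this gives the same five rectangles as the hand-written layout, and a longer or shorter category list needs no further changes.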

In the next code, we loop through the city names to create the temperature trace for each of the Swiss cities, and simply pass in the shps list:

In [15]:
data = []
tracex = []
citynames = ['Geneva', 'Zurich', 'Lugano']


for i in range(0, len(citynames)):
    tracex =  go.Scatter(
          x=df1['Month'],
          y=df1[citynames[i]],
          name=citynames[i])
    data.append(tracex)

        
layout = {
    'title' : 'Mean temperature, split for cities',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'temperature (Celsius)'},
    'showlegend': True,
    'shapes': shps
}

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Temperature2', sharing='public')
Out[15]:

The code is much shorter now. The loop runs once for each entry in citynames, so adding a city only requires extending that list.
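
One step further would be to avoid typing the city names at all and read them from the data. The sketch below is an assumption on my side, not part of the original notebook: the function line_traces is hypothetical, the table is a plain dict shortened to three months, and the traces are written as plain dictionaries with a 'type' key instead of go.Scatter objects (Plotly accepts both forms).

```python
# First three months of the temperature table, shortened for the example.
weather = {
    'Month': ['January', 'February', 'March'],
    'Geneva': [1.5, 2.5, 6.5],
    'Zurich': [-1, 0.4, 3.9],
    'Lugano': [3.3, 4.5, 8.3],
}


def line_traces(table, x_column='Month'):
    """Build one scatter trace (as a plain dict) for every column
    except the one that holds the x values."""
    return [
        {'type': 'scatter', 'x': table[x_column], 'y': table[name], 'name': name}
        for name in table          # iterating a dict yields its keys
        if name != x_column
    ]


traces = line_traces(weather)
names = [trace['name'] for trace in traces]
```

Since iterating over a pandas DataFrame also yields its column names, line_traces(df1) should work in the same way, and a newly added city column would then show up in the plot without touching the plotting code.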

How do we add text annotations? For example, it may be useful to display the season for every month. In Switzerland, spring runs from March 20 to June 20, summer from June 21 to September 22, autumn from September 23 to December 20, and winter from December 21 to March 19. In the following code we make a simple annotation for the seasons:

In [16]:
season = ['wi', 'wi', 'wi / sp', 'sp', 'sp', 'sp / su', 'su', 'su', 'su / au', 'au', 'au', 'au / wi']

anns = []

# One annotation per month, placed above the data at y = 25
for i in range(len(season)):
    anns.append({'showarrow': False, 'text':season[i], 'textangle':-90, 'x':df1['Month'][i], 'y':25})

Some of the parameters explained: showarrow set to False hides the arrow that would otherwise point at the label, textangle rotates the text by -90 degrees so that it reads vertically, and the text is placed at the y-value of 25:

In [17]:
anns[0:2]
Out[17]:
[{'showarrow': False, 'text': 'wi', 'textangle': -90, 'x': 'January', 'y': 25},
 {'showarrow': False,
  'text': 'wi',
  'textangle': -90,
  'x': 'February',
  'y': 25}]
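
The season list was typed by hand. As a hedged sketch, the same labels could be derived from the month number, assuming that the months are ordered January to December and that a season change falls in March, June, September and December. The names SEASONS, season_label and season_annotations are my own and do not come from Plotly.

```python
SEASONS = ['wi', 'sp', 'su', 'au']


def season_label(month_index):
    """Label for month_index 0 (January) up to 11 (December)."""
    start = month_index // 3              # season in force when the month begins
    if month_index % 3 == 2:              # March, June, September, December
        return SEASONS[start] + ' / ' + SEASONS[(start + 1) % 4]
    return SEASONS[start]


def season_annotations(months, y):
    """One vertical text label per month, placed at height y."""
    return [
        {'showarrow': False, 'text': season_label(i), 'textangle': -90,
         'x': month, 'y': y}
        for i, month in enumerate(months)
    ]


labels = [season_label(i) for i in range(12)]
first_two = season_annotations(['January', 'February'], y=25)
```

Passing y as an argument also makes it possible to compute the label height from the data, for instance a little above the highest temperature, instead of fixing it at 25.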

We now add the annotations to the previous code, and the result is the following:

In [18]:
data = []
tracex = []
citynames = ['Geneva', 'Zurich', 'Lugano']


for i in range(0, len(citynames)):
    tracex =  go.Scatter(
          x=df1['Month'],
          y=df1[citynames[i]],
          name=citynames[i])
    data.append(tracex)

        
layout = {
    'title' : 'Mean temperature, split for cities',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'temperature (Celsius)'},
    'showlegend': True,
    'shapes': shps,
    'annotations': anns
}

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Temperature3', sharing='public')
Out[18]:

Let's see how we could add other data to the existing plot. For example, to better judge which Swiss city is the best one to stay in (regarding weather conditions), we would like to know the rainfall in addition to the temperatures:

In [19]:
data = {'Month': ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December'],
        'Geneva': [76, 68, 70, 72, 84, 92, 79, 82, 100, 105, 88, 90],
        'Zurich': [67, 68, 68, 78, 96, 115, 106, 121, 83, 70, 84, 74],
        'Lugano': [66, 52, 80, 156, 196, 164, 153, 158, 185, 142, 127, 80]
       }
df2 = pd.DataFrame(data, columns = ['Month', 'Geneva', 'Zurich', 'Lugano'])

The DataFrame looks as follows:

In [20]:
df2
Out[20]:
Month Geneva Zurich Lugano
0 January 76 67 66
1 February 68 68 52
2 March 70 68 80
3 April 72 78 156
4 May 84 96 196
5 June 92 115 164
6 July 79 106 153
7 August 82 121 158
8 September 100 83 185
9 October 105 70 142
10 November 88 84 127
11 December 90 74 80

We now produce a bar plot, first with the "lengthy" code:

In [21]:
trace0 =  go.Bar(
          x=df2['Month'],
          y=df2['Geneva'],
          name='Geneva',
          marker = {'color':'rgb(0,0,255)', 'line':{'color': 'rgb(0,0,255)', 'width':1.4}},
          opacity=1
)

trace1 =  go.Bar(
          x=df2['Month'],
          y=df2['Zurich'],
          name='Zurich',
          marker =  {'color': 'rgb(204,0,0)', 'line': {'color': 'rgb(204,0,0)', 'width': 1.4}},
          opacity=1
)

trace2 =  go.Bar(
          x=df2['Month'],
          y=df2['Lugano'],
          name='Lugano',
          marker = {'color': 'rgb(0,102,0)', 'line': {'color': 'rgb(0,102,0)', 'width': 1.4}},
          opacity=1
)

data = [trace0, trace1, trace2]

layout = {
    'title' : 'Precipitation as a function of month',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'Rainfall (mm)'},
    'showlegend': True}
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='Rainfallx', sharing='public')
Out[21]:

We will now shorten the code by looping through the different elements of the figures. The loop is going through the cities and colors:

In [22]:
data = []
tracex = []
citynames = ['Geneva', 'Zurich', 'Lugano']
cols=['rgb(0,0,255)', 'rgb(204,0,0)', 'rgb(0,102,0)']

for i in range(0, len(citynames)):
    tracex =  go.Bar(
          x=df2['Month'],
          y=df2[citynames[i]],
          name=citynames[i],
          marker={'color': cols[i], 'line': {'color': cols[i], 'width': 1.4}}
    )
    data.append(tracex)

        
layout = {
    'title' : 'Precipitation as a function of month',
    'xaxis' : {'title' : 'time-course (months)'},
    'yaxis' : {'title' : 'Rainfall (mm)'},
    'showlegend': True
}

fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Rainfall', sharing='public')
Out[22]:
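
A possible refinement, given here as a sketch under assumptions rather than as the recommended way: zip pairs each city with its colour without index arithmetic, and itertools.cycle lets a short palette repeat when there are more cities than colours. The function name bar_traces and the list-of-tuples layout are hypothetical, the traces are plain dictionaries, and the data are the first three months of the rainfall table.

```python
from itertools import cycle

months = ['January', 'February', 'March']       # shortened for the example
rainfall = [                                    # (city, mm per month)
    ('Geneva', [76, 68, 70]),
    ('Zurich', [67, 68, 68]),
    ('Lugano', [66, 52, 80]),
]
palette = ['rgb(0,0,255)', 'rgb(204,0,0)']      # deliberately shorter than the city list


def bar_traces(months, rainfall, palette):
    """Pair each city with a colour; the palette repeats if it runs out."""
    traces = []
    for (city, values), color in zip(rainfall, cycle(palette)):
        traces.append({
            'type': 'bar', 'x': months, 'y': values, 'name': city,
            'marker': {'color': color, 'line': {'color': color, 'width': 1.4}},
        })
    return traces


traces = bar_traces(months, rainfall, palette)
```

Here the third city reuses the first colour. In practice the palette would normally be at least as long as the city list, but the loop no longer fails with an index error when it is not.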

Next, it would be nice to get the temperature and rainfall data displayed together. It is possible to get the lines (temperature) and bars (rainfall) in a single plot, provided that different axes are used. The code below does this for a single city (more cities would clutter the image).

In [23]:
data = []
trace1 = []
trace2 = []
citynames = 'Geneva'
cols = 'rgb(0,0,255)'

trace1 =  go.Scatter(
    x=df1['Month'],
    y=df1[citynames],
    name='Temperature',
    marker={'color':cols, 'line':{'color':cols, 'width':1.4}})

trace2 =  go.Bar(
    x=df2['Month'],
    y=df2[citynames],
    name='Rainfall',
    opacity=0.5,
    marker={'color':cols, 'line':{'color':cols, 'width':1.4}},
    yaxis='y2')        

layout = go.Layout(
    title = citynames,
    xaxis = {'title':'time-course (months)'},
    yaxis = {'title':'temperature (Celsius)'},
    yaxis2 = {'title':'rainfall (mm)', 'overlaying':'y', 'side':'right'},
    showlegend = True)

data = [trace1, trace2]
fig = go.Figure(data=data, layout=layout)

py.iplot(fig, filename='Multichart', sharing='public')
Out[23]:
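
Because only the city name changes between such figures, the figure description can be produced by a function. The following is a minimal sketch under my own assumptions: dual_axis_figure is a hypothetical helper, the figure is returned as a plain dictionary with 'data' and 'layout' keys, and the data are shortened to three months for two of the cities.

```python
def dual_axis_figure(city, months, temperature, rainfall, color='rgb(0,0,255)'):
    """Describe a figure with a temperature line on the left y-axis
    and rainfall bars on a second y-axis at the right."""
    line = {'type': 'scatter', 'x': months, 'y': temperature,
            'name': 'Temperature', 'marker': {'color': color}}
    bars = {'type': 'bar', 'x': months, 'y': rainfall,
            'name': 'Rainfall', 'opacity': 0.5, 'marker': {'color': color},
            'yaxis': 'y2'}                      # bars use the second y-axis
    layout = {
        'title': city,
        'xaxis': {'title': 'time-course (months)'},
        'yaxis': {'title': 'temperature (Celsius)'},
        'yaxis2': {'title': 'rainfall (mm)', 'overlaying': 'y', 'side': 'right'},
        'showlegend': True,
    }
    return {'data': [line, bars], 'layout': layout}


months = ['January', 'February', 'March']
temperature = {'Geneva': [1.5, 2.5, 6.5], 'Zurich': [-1, 0.4, 3.9]}
rain = {'Geneva': [76, 68, 70], 'Zurich': [67, 68, 68]}

# One figure description per city, without repeating the layout code.
figures = {city: dual_axis_figure(city, months, temperature[city], rain[city])
           for city in temperature}
```

Each entry could then be wrapped with go.Figure(figures['Geneva']) and plotted as before.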

Would it be possible to show the data for the other cities as well, using shared axes?

In [24]:
data = []
data1 = []
data2 = []
trace1 = []
trace2 = []
citynames = ['Geneva', 'Zurich', 'Lugano']
cols=['rgb(0,0,255)', 'rgb(204,0,0)', 'rgb(0,102,0)']

for i in range(0, len(citynames)):
    tracex =  go.Scatter(
          x=df1['Month'],
          y=df1[citynames[i]],
          marker=dict(color=cols[i], line=dict(color=cols[i], width=1.4)))
    data1.append(tracex)
    
for i in range(0, len(citynames)):
    tracey =  go.Bar(
          x=df2['Month'],
          y=df2[citynames[i]],
          opacity=0.5,
          marker=dict(color=cols[i], line=dict(color=cols[i], width=1.4)))
    data2.append(tracey)

fig = tools.make_subplots(rows=1, cols=3, shared_yaxes=True, subplot_titles=('Geneva', 'Zurich', 'Lugano'))


for i in range(0,3):
    fig.append_trace(data1[i], 1, i+1)
    fig.append_trace(data2[i], 1, i+1)
    
fig['data'][1].update(yaxis='y3')
fig['data'][2].update(yaxis='y2')
fig['data'][3].update(yaxis='y4')
fig['data'][4].update(yaxis='y2')
fig['data'][5].update(yaxis='y6')

fig['layout'].update(height=600, width=1000, title='Temperature and Rainfall', showlegend=False)
fig['layout']['xaxis1'].update(title='Months', type='category',)
fig['layout']['yaxis1'].update(range=[0, 25], title='Temperature',)
fig['layout']['yaxis3']={'range':[0,200], 'title':'', 'overlaying':'y1', 'anchor':'x3', 'side':'right',}
fig['layout']['xaxis2'].update(title='Months', type='category',)
fig['layout']['yaxis2'].update(range=[0, 25], title='',)
fig['layout']['yaxis4']={'range':[0,200], 'title':'', 'overlaying':'y2', 'anchor':'x3', 'side':'right'}
fig['layout']['xaxis3'].update(title='Months', type='category',)
fig['layout']['yaxis6']={'range':[0,200], 'title':'Rainfall (mm)', 'overlaying':'y2', 'anchor':'x3', 'side':'right'}

py.iplot(fig, filename='Multichart3', sharing='public')
This is the format of your plot grid:
[ (1,1) x1,y1 ]  [ (1,2) x2,y1 ]  [ (1,3) x3,y1 ]

Out[24]:
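
The axis bookkeeping above was written by hand for three panels. As a rough sketch of how it might be generated for any number of panels, the function below builds the layout entries as plain dictionaries. Everything here is an assumption of mine: the name panel_axes, the numbering scheme (odd y-axes on the left, even ones on the right) and the explicit domains are not what make_subplots produces, and the ranges are parameters so that values such as the -1 degrees for Zurich in January can stay inside the axis.

```python
def panel_axes(n_panels, temp_range, rain_range, gap=0.08):
    """Layout entries for n_panels side-by-side panels, each with a left
    (temperature) and an overlaid right (rainfall) y-axis."""
    width = (1.0 - gap * (n_panels - 1)) / n_panels
    layout = {}
    refs = []
    for k in range(n_panels):
        x_id = '' if k == 0 else str(k + 1)         # x, x2, x3, ...
        left_id = '' if k == 0 else str(2 * k + 1)  # y, y3, y5, ...
        right_id = str(2 * k + 2)                   # y2, y4, y6, ...
        x0 = k * (width + gap)
        layout['xaxis' + x_id] = {'domain': [x0, x0 + width],
                                  'anchor': 'y' + left_id, 'type': 'category'}
        layout['yaxis' + left_id] = {'anchor': 'x' + x_id,
                                     'range': list(temp_range)}
        layout['yaxis' + right_id] = {'anchor': 'x' + x_id,
                                      'overlaying': 'y' + left_id,
                                      'side': 'right',
                                      'range': list(rain_range)}
        refs.append({'xaxis': 'x' + x_id,
                     'left': 'y' + left_id, 'right': 'y' + right_id})
    return layout, refs


layout, refs = panel_axes(3, temp_range=(-5, 25), rain_range=(0, 200))

# Traces for the second panel would then refer to its axes like this:
zurich_line = {'type': 'scatter', 'xaxis': refs[1]['xaxis'], 'yaxis': refs[1]['left']}
zurich_bars = {'type': 'bar', 'xaxis': refs[1]['xaxis'], 'yaxis': refs[1]['right']}
```

Because every panel gets the same two ranges, the temperature and rainfall scales are comparable across cities, which is the purpose that the shared axes serve in the code above.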

Closing words

That should give you a good first impression of Plotly. If you want to share some ideas, do not hesitate to contact me: rrighart@googlemail.com. Happy coding!
