04-02 Various types of plots

In the previous module we saw one of the basic types of plots: line plots. Apart from line plots, matplotlib supports many other types of plots.

This module takes a look at some of them.

Instead of plotting random data, we will load some publicly available datasets and plot those.

To load these files, we will use the read_csv function from the pandas module, something we will dig deeper into in the last module.
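
As a quick preview, here is a minimal sketch of what read_csv does. The column names Year and Total mirror the ones used later in this module, but the inline data is made up for illustration and stands in for a real file on disk.

```python
import io
import pandas as pd

# Hypothetical stand-in for a CSV file on disk; the values are made up.
csv_text = """Year,Total
2000,10
2001,12
2002,15
"""

# read_csv accepts a file path or any file-like object and returns a DataFrame.
sample = pd.read_csv(io.StringIO(csv_text))

print(sample.columns.tolist())
print(sample['Total'].sum())
```

Each column of the resulting DataFrame can then be passed straight to a matplotlib plotting function.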

In [237]:
# Our boilerplate imports
%matplotlib inline
import matplotlib.pyplot as plt
# Note: matplotlib 3.6+ names this style 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
import numpy as np
import pandas as pd
In [261]:
global_co2_emission = pd.read_csv('sample_datasets/global_co2_emission.csv')

04 - 02.01 Scatter plots

Scatter plots are similar to line plots, except that instead of the points being joined by line segments, each point is drawn as a shape such as a circle, triangle or dot. These shapes are known as markers.

Let's take a look at a simple sine plot.

In [3]:
fig = plt.figure()
ax = fig.add_subplot(111)
theta = np.linspace(-np.pi, np.pi, 50)
# Draw on the Axes we just created
ax.scatter(theta, np.sin(theta))
Out[3]:
<matplotlib.collections.PathCollection at 0x107cb1eb8>

Markers

Matplotlib supports various types of markers. They are listed below.

marker  description    marker  description      marker  description     marker  description
"."     point          "+"     plus             ","     pixel           "x"     cross
"o"     circle         "D"     diamond          "d"     thin_diamond
"8"     octagon        "s"     square           "p"     pentagon        "*"     star
"|"     vertical line  "_"     horizontal line  "h"     hexagon1        "H"     hexagon2
0       tickleft       4       caretleft        "<"     triangle_left   "3"     tri_left
1       tickright      5       caretright       ">"     triangle_right  "4"     tri_right
2       tickup         6       caretup          "^"     triangle_up     "2"     tri_up
3       tickdown       7       caretdown        "v"     triangle_down   "1"     tri_down
"None"  nothing        None    nothing          " "     nothing         ""      nothing

The code to generate this plot is in the XX - Miscellaneous plots.ipynb notebook.

In [4]:
fig = plt.figure()
ax = fig.add_subplot(111)
theta = np.linspace(-np.pi, np.pi, 50)
# Same plot as before, this time with star-shaped markers
ax.scatter(theta, np.sin(theta), marker="*")
Out[4]:
<matplotlib.collections.PathCollection at 0x109bdb630>

The main difference between plt.scatter and plt.plot is that plt.scatter can control the properties of each individual point (such as size, face color, edge color, etc.). Let's take a look at such an example.

In [37]:
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii
fig = plt.figure()
plt.scatter(x, y, s=area, c=colors, alpha=0.5, cmap='gist_earth')
plt.colorbar()
Out[37]:
<matplotlib.colorbar.Colorbar at 0x10be42b00>

The color argument is automatically mapped to a color scale, which is shown using plt.colorbar(). The size argument s is the marker area in points squared (a point is 1/72 inch), not in pixels.

Colorbars are much like legends because they help describe the data being displayed. While legends label the individual plot elements created by calls such as plot(), scatter(), hist() or stem(), colorbars show how data values map to colors, for images and for color-mapped scatter plots like the one above.

In this way, the color and size of the points can carry extra information, which makes scatter plots useful for visualizing multidimensional data.
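
To make the contrast between the two functions concrete, here is a small sketch that draws the same five points with each. The data and the variable names (sizes, line, points) are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(5)
y = np.ones(5)
sizes = np.array([20, 60, 120, 200, 300])   # marker areas in points squared

fig, ax = plt.subplots()
# plot(): one marker style shared by every point
line, = ax.plot(x, y, 'o', markersize=10)
# scatter(): each point gets its own size and its own color value
points = ax.scatter(x, y + 1, s=sizes, c=x, cmap='gist_earth')
fig.colorbar(points, ax=ax)

print(line.get_markersize())
print(points.get_sizes())
```

The lower row of markers is uniform, while the upper row grows in size and changes color from point to point.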

04 - 02.02 Bar Plots

I don't think anyone needs an explanation of bar plots. Let's dive directly into a simple bar plot.

In [72]:
plt.bar(np.arange(10), np.random.randint(1, 100, 10), yerr=np.random.randint(2, 10, 10))
Out[72]:
<Container object of 10 artists>

For any scientific measurement, accurate accounting for errors is as important as, if not more important than, accurate reporting of the number itself. In visualizations of data and results, showing these errors effectively makes a plot convey much more complete information. The yerr argument in the above example supplies the data for the error bars.

Just as with scatter plots, we can play around with the properties of a bar plot: the width of the bars, their colors, the alignment of individual bars (either centered or at the edge), error bars, and a log scale, to name a few. You can also use barh to plot horizontal bars.
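
A few of these options can be sketched as follows. The values are made up and chosen to span several orders of magnitude so that the log scale has something to show.

```python
import numpy as np
import matplotlib.pyplot as plt

positions = np.arange(5)
heights = np.array([3, 30, 300, 3000, 30000])   # made-up values

fig, (ax_v, ax_h) = plt.subplots(ncols=2, figsize=(10, 4))

# Vertical bars: narrower, aligned to their left edge, on a log-scaled y-axis
vbars = ax_v.bar(positions, heights, width=0.4, align='edge', color='c')
ax_v.set_yscale('log')

# Horizontal bars: barh takes positions and lengths, with xerr for the error bars
hbars = ax_h.barh(positions, heights, xerr=heights * 0.1, color='m')
```

Note that for horizontal bars the error bars are passed as xerr rather than yerr, because the bar length runs along the x-axis.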

Let's plot the global $CO_2$ emissions.

In [127]:
fig, ax = plt.subplots(figsize=(15,7))
ax.bar(global_co2_emission['Year'], global_co2_emission['Total'], 
        width=0.5, align='center', color='#34495e')
ax.set_xticks(global_co2_emission['Year'][::10]);
ax.set_xlabel('Year')
ax.set_ylabel('Million Tons of $CO_2$')
ax.set_title('Global $CO_2$ emissions')
Out[127]:
<matplotlib.text.Text at 0x1177d13c8>

In the above plot we changed the granularity of the ticks shown on the x-axis. To show the years, we used the set_xticks method, passing it every tenth entry of the Year column (the slice [::10]), so that only one year in every ten gets a tick.

color parameter

The color parameter can be passed as a string. For very basic colors, you can even get away with just a single letter:

  • b: blue
  • g: green
  • r: red
  • c: cyan
  • m: magenta
  • y: yellow
  • k: black
  • w: white

Hex values

Colors can also be specified by supplying a hex string, such as '#0000FF' for blue. You can look up the hex codes for colors on this website: http://htmlcolorcodes.com/

256 Shades of Gray

A gray level can be given instead of a color by passing a string representation of a number between 0 and 1, inclusive. '0.0' is black, while '1.0' is white. '0.75' would be a lighter shade of gray.

RGB[A] tuples

You may come upon instances where the previous ways of specifying colors do not work. This can sometimes happen in some of the deeper, stranger levels of the code. When all else fails, the universal language of colors for matplotlib is the RGB[A] tuple. This is the "Red", "Green", "Blue", and sometimes "Alpha" tuple of floats in the range of [0, 1]. One means full saturation of that channel, so a red RGBA tuple would be (1.0, 0.0, 0.0, 1.0), whereas a partly transparent green RGBA tuple would be (0.0, 1.0, 0.0, 0.75).

Note that there is often a separate "alpha" argument wherever you can specify a color. The value for "alpha" will usually take precedence over the alpha value in the RGBA tuple. There is no easy way around this problem.
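
One way to see how these formats relate is to convert each of them to an RGBA tuple with matplotlib.colors.to_rgba. This is only a small sketch; the particular colors are arbitrary.

```python
from matplotlib.colors import to_rgba

print(to_rgba('r'))                                # single-letter name
print(to_rgba('#0000FF'))                          # hex string
print(to_rgba('0.75'))                             # gray level given as a string
print(to_rgba((0.0, 1.0, 0.0, 0.75)))              # RGBA tuple, returned unchanged
# A separate alpha argument overrides the alpha stored in the tuple
print(to_rgba((0.0, 1.0, 0.0, 0.75), alpha=0.3))
```

The last call illustrates the precedence rule described above: the result has an alpha of 0.3, not 0.75.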

To get all the colors and hex values, run the following code:

import matplotlib.colors
# cnames maps each named color to its hex string
for name, hex_code in matplotlib.colors.cnames.items():
    print(name, hex_code)

Hatches

A bar can have a hatching defined for it.

  • / - diagonal hatching
  • \ - back diagonal
  • | - vertical
  • - - horizontal
  • + - crossed
  • x - crossed diagonal
  • o - small circle
  • O - large circle (upper-case 'o')
  • . - dots
  • * - stars

The above symbols can be combined, in which case all the specified hatchings are drawn. Repeating the same symbol increases the density of that pattern.

In [107]:
bars = plt.bar([1, 2, 3, 4], [10, 12, 15, 17])
# Add hatches to the first bar
plt.setp(bars[0], hatch='x-')
plt.show()

plt.setp is a function for setting properties on one or more artist objects. We won't dig deep into the concept of Artists, but think of them simply as the objects drawn on the figure, such as the bars, lines and text that a plot is made of.

So, using plt.setp, you can set further properties on an already created object (if that type of artist supports them).
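
As a rough sketch of how this can be used, plt.setp also accepts a sequence of artists, and each property has a matching setter method on the artist itself. The bar values are the same made-up ones as above.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar([1, 2, 3, 4], [10, 12, 15, 17], color='w', edgecolor='k')

# setp accepts a single artist or a sequence of artists
plt.setp(bars[0], hatch='x-')                       # one bar, two patterns combined
plt.setp(bars[1:3], hatch='//', facecolor='0.75')   # two bars, denser diagonal lines

# The equivalent setter method on a single artist
bars[3].set_hatch('o')

print([bar.get_hatch() for bar in bars])
```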

04 - 02.03 Pie charts

I guess this type of chart doesn't need any explanation either. Let's take a look at an example plot.

In [109]:
labels       = ['Samsung', 'Apple', 'Huawei', 'Oppo', 'Vivo', 'Others']
market_share = [21, 12.5, 9.3, 7.1, 5.9, 44.2]
explode      = [0.2, 0, 0, 0, 0, 0]
fig, ax      = plt.subplots()
ax.pie(market_share, explode=explode, labels=labels, autopct='%1.1f%%',
        shadow=True, startangle=90)
# Equal aspect ratio so that we get a circle
ax.axis('equal');

04 - 02.04 Histograms

A histogram is a graphical representation of the distribution of a set of data: it shows how often values fall into each of a series of intervals (bins). This plot reveals characteristics of the dataset such as outliers and skewness.

Let's take a look at an example.

In [255]:
#Random numbers from normal distribution
data = np.random.randn(1000)
print("Max: ", data.max())
print("Min: ", data.min())
fig, ax = plt.subplots()
ax.hist(data)
Max:  3.05114422539
Min:  -3.14609849337
Out[255]:
(array([   5.,   18.,   75.,  167.,  225.,  230.,  175.,   71.,   29.,    5.]),
 array([-3.14609849, -2.52637422, -1.90664995, -1.28692568, -0.66720141,
        -0.04747713,  0.57224714,  1.19197141,  1.81169568,  2.43141995,
         3.05114423]),
 <a list of 10 Patch objects>)

The x-axis covers the range of values in the data array, split into bins (10 by default), and the y-axis shows how many values fall into each bin.
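
The binning itself can be reproduced without drawing anything by using np.histogram. This is a minimal sketch; the seeded generator and the name samples are assumptions made so that the run is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)          # seeded so the run is repeatable
samples = rng.standard_normal(1000)

# 10 equal-width bins between samples.min() and samples.max()
counts, edges = np.histogram(samples, bins=10)

print(len(edges))       # one more edge than there are bins
print(counts.sum())     # every value falls into exactly one bin
```

The counts and bin edges correspond to the two arrays that hist() returns in the output above.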

The hist() function has many parameters that can be fine-tuned for visualizing the distribution of the data.

In [145]:
fig, ax = plt.subplots()
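# density=True replaces the normed=True argument, which newer matplotlib versions no longer accept
ax.hist(data, bins=30, density=True, alpha=0.8,
         histtype='barstacked', color='lightcoral',
         edgecolor='none');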
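
To illustrate what normalizing a histogram means, the sketch below checks that the bar areas add up to 1. It assumes a matplotlib version in which hist() accepts density=True, and it uses its own seeded sample rather than the data array above.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
samples = rng.standard_normal(1000)

fig, ax = plt.subplots()
heights, edges, patches = ax.hist(samples, bins=30, density=True, histtype='step')

# With density=True, the bar areas (height x width) add up to 1
area = np.sum(heights * np.diff(edges))
print(np.isclose(area, 1.0))
```

This is why the y-axis of a normalized histogram shows a density rather than a count.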

You can also plot multiple histograms in a single figure, for example when you want to look at an image and plot the histograms of its red, green and blue pixel values.

For this example, we will import the scipy module just to load an RGB image of a raccoon.

In [238]:
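# scipy.misc.face() is deprecated in newer SciPy releases (1.10+),
# which provide the same image as scipy.datasets.face()
import scipy.misc
racoon  = scipy.misc.face()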
fig, ax = plt.subplots()
ax.imshow(racoon)
# disable the grid lines and axis
ax.grid(False)
ax.axis('off')
Out[238]:
(-0.5, 1023.5, 767.5, -0.5)

Now let's look at the distributions of the red, green and blue pixel values.

In [239]:
# The image is a 3-D numpy array; the third axis holds the Red, Green and Blue channels
print('Type: ', type(racoon))
# An RGB image should have a shape of rows x cols x 3
print("Shape: ", racoon.shape)
# Let's check the dtype
print("Dtype: ", racoon.dtype)
# A dtype of uint8 means a range of 0 - 255
Type:  <class 'numpy.ndarray'>
Shape:  (768, 1024, 3)
Dtype:  uint8
In [240]:
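# Great, now let's plot the red, green and blue color channels
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 15))
colors = ['red', 'green', 'dodgerblue']
for i in range(3):
    # Left column: the single channel as an image; right column: its histogram
    axes[i][0].imshow(racoon[:,:,i])
    # uint8 has 256 possible values, so use one bin per value
    axes[i][1].hist(racoon[:,:,i].ravel(), bins=256, range=(0, 256),
                    alpha=0.3, color=colors[i])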
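
The three histograms can also be overlaid on a single Axes, which makes the channels easier to compare. The sketch below uses a made-up random image as a stand-in so that it runs without the raccoon picture; the name image is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up stand-in for an RGB image: 64 x 64 pixels of uint8 values
rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

fig, ax = plt.subplots()
for i, color in enumerate(['red', 'green', 'dodgerblue']):
    # One bin per possible intensity value, all drawn on the same Axes
    ax.hist(image[:, :, i].ravel(), bins=256, range=(0, 256),
            alpha=0.3, color=color, label=color)
ax.legend()
```

The alpha value keeps the overlapping histograms visible through one another.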