04 - 02 Various types of Plots
In the previous module we saw one of the basic types of plots: line plots. Apart from line plots, matplotlib supports many other types of plots.
In this module we shall take a look at some of them.
In this module, instead of plotting random data, we will load some publicly available datasets and plot them.
To load these files, we will use the pandas module, something that we will dig deeper into in the last module.
# Our boilerplate imports
%matplotlib inline
import matplotlib.pyplot as plt
# On matplotlib 3.6 and later this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
import numpy as np
import pandas as pd
global_co2_emission = pd.read_csv('sample_datasets/global_co2_emission.csv')
04 - 02.01 Scatter plots
Scatter plots are similar to line plots, except that instead of the points being joined by line segments, each point is drawn as a shape such as a circle, triangle, or dot. These shapes are known as markers.
Let's take a look at a simple sine plot.
fig = plt.figure()
ax = fig.add_subplot(111)
theta = np.linspace(-np.pi, np.pi, 50)
ax.scatter(theta, np.sin(theta))
There are various types of markers supported by matplotlib. Below is a visual representation of them.
|"|"|vertical line|
|"_"|horizontal line|
|"h"|hexagon1|
|"H"|hexagon2|
The code to generate this plot is in the XX - Miscellaneous plots.ipynb notebook.
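This is not the code from that notebook, just a small sketch of how the available markers can be inspected. It assumes the MarkerStyle.markers dictionary from matplotlib.markers, which maps each marker code to a short name; the handful of codes drawn here is an arbitrary selection.

```python
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle

# MarkerStyle.markers maps each marker code to a short description.
marker_names = MarkerStyle.markers
for code in ['|', '_', 'h', 'H']:
    print(repr(code), '->', marker_names[code])

# Draw one point per marker so the shapes can be compared side by side.
codes = ['o', '^', '*', 'h', 'H', '|', '_']
fig, ax = plt.subplots(figsize=(8, 2))
for position, code in enumerate(codes):
    ax.scatter(position, 0, marker=code, s=200)
ax.set_xticks(range(len(codes)))
ax.set_xticklabels(codes)
ax.set_yticks([])
```

The short names in the dictionary (for example 'vline') are abbreviations of the descriptions in the table above.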
fig = plt.figure()
ax = fig.add_subplot(111)
theta = np.linspace(-np.pi, np.pi, 50)
ax.scatter(theta, np.sin(theta), marker="*")
The main difference between plt.plot and plt.scatter is that plt.scatter can be used to control the properties of each individual point (such as size, face color, edge color, etc.). Let's take a look at such an example.
N = 50
x = np.random.rand(N)
y = np.random.rand(N)
colors = np.random.rand(N)
area = np.pi * (15 * np.random.rand(N))**2  # 0 to 15 point radii
fig = plt.figure()
plt.scatter(x, y, s=area, c=colors, alpha=0.5, cmap='gist_earth')
plt.colorbar()
The color argument is automatically mapped to a color scale, which is shown using plt.colorbar(), and the size argument (the area of the circles in our example) is given in points squared, where one point is 1/72 of an inch.
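To make the meaning of the size argument concrete, here is a minimal sketch. The diameters and the values passed to c are made up for illustration, and the rule of thumb that s = diameter**2 yields a circle roughly that many points across is an approximation, not an exact statement about rendering.

```python
import numpy as np
import matplotlib.pyplot as plt

# Desired marker diameters in points (1 point = 1/72 inch).
diameters = np.array([5.0, 10.0, 20.0, 40.0])
# scatter's s argument is an area in points**2, so squaring a diameter
# gives a circle marker roughly that many points across.
sizes = diameters ** 2
values = np.array([0.1, 0.4, 0.7, 1.0])  # illustrative values for the color scale

fig, ax = plt.subplots()
points = ax.scatter(np.arange(4), np.zeros(4), s=sizes, c=values,
                    cmap='gist_earth', alpha=0.5)
# The colorbar shows how the values passed to c map onto the colormap.
fig.colorbar(points, ax=ax, label='value passed to c')
```

Doubling a diameter quadruples s, which is worth keeping in mind when a data variable is encoded as marker size.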
Colorbars are much like legends because they help to describe the data being displayed. While legends describe plot elements, e.g., those drawn by stem(), colorbars describe how data values map to colors, as in images.
In this way, the color and size of points can be used to convey information in the visualization, in order to visualize multidimensional data.
04 - 02.02 Bar Plots
I don't think anyone needs an explanation of bar plots. Let's dive directly into a simple bar plot.
plt.bar(np.arange(10), np.random.randint(1, 100, 10), yerr=np.random.randint(2, 10, 10))
For any scientific measurement, accurate accounting for errors is nearly as important as, if not more important than, accurate reporting of the number itself. In visualization of data and results, showing these errors effectively can make a plot convey much more complete information. The yerr argument in the above example supplies the lengths of the error bars.
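The yerr argument does not have to be symmetric. As a small sketch, a (2, N) array gives separate lower and upper error lengths for each bar. The heights and uncertainties below are hypothetical numbers chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

heights = np.array([12.0, 30.0, 45.0, 27.0, 18.0])
positions = np.arange(len(heights))
# Hypothetical uncertainties, chosen only for illustration.
lower = np.array([1.0, 2.0, 4.0, 2.0, 1.5])
upper = np.array([2.0, 3.0, 6.0, 2.5, 3.0])

fig, ax = plt.subplots()
# A (2, N) yerr gives separate lower and upper error lengths per bar;
# capsize draws small caps at the ends of each error bar.
bars = ax.bar(positions, heights, yerr=np.vstack([lower, upper]), capsize=4)
```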
Just like with scatter plots, we can play around with the properties of a bar plot: the width of the bars, their colors, the alignment of individual bars (either centered or at the edge), error bars, and a log scale, to name a few. You can also use barh to plot horizontal bars.
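A short sketch of a few of these options follows. The category names and values are placeholders; the keyword arguments used (width, align, color, log, height) are standard bar and barh parameters.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['a', 'b', 'c', 'd']           # placeholder category names
values = np.array([3, 40, 500, 6000])   # made-up values spanning several decades
positions = np.arange(len(values))

fig, (ax_left, ax_right) = plt.subplots(ncols=2, figsize=(10, 4))

# Vertical bars aligned on their left edge, drawn on a log-scaled y-axis.
ax_left.bar(positions, values, width=0.6, align='edge', color='c', log=True)
ax_left.set_xticks(positions)
ax_left.set_xticklabels(labels)

# The same data as horizontal bars; barh takes height instead of width.
h_bars = ax_right.barh(positions, values, height=0.6, color='#34495e')
ax_right.set_yticks(positions)
ax_right.set_yticklabels(labels)
```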
Let's plot the global $CO_2$ emissions.
fig, ax = plt.subplots(figsize=(15,7))
ax.bar(global_co2_emission['Year'], global_co2_emission['Total'], width=0.5, align='center', color='#34495e')
ax.set_xticks(global_co2_emission['Year'][::10]);
ax.set_xlabel('Year')
ax.set_ylabel('Million Tons of $CO_2$')
ax.set_title('Global $CO_2$ emissions')
In the above plot we changed the granularity of the ticks shown on the x-axis. To show the years on the plot, we used the set_xticks method, passing it the Year column sliced with [::10], which keeps every tenth entry (one tick per decade if the data has one row per year).
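The effect of the [::10] slice can be checked on a stand-in column. This is only a sketch: the year range and the totals below are placeholders, not the contents of the CSV file used above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Stand-in for the Year column: the range below is an assumption.
years = pd.Series(np.arange(1900, 1951), name='Year')
totals = np.linspace(1.0, 5.0, len(years))  # placeholder values, not real data

fig, ax = plt.subplots()
ax.bar(years, totals, width=0.5)
# [::10] keeps every tenth entry, so only those years get a tick.
ax.set_xticks(years[::10])
print(list(years[::10]))
```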
The color parameter can be passed as a string. For very basic colors, you can even get away with just a single letter:
- b: blue
- g: green
- r: red
- c: cyan
- m: magenta
- y: yellow
- k: black
- w: white
Colors can also be specified by supplying a hex string, such as '#0000FF' for blue. You can look up hex codes for colors at http://htmlcolorcodes.com/
256 Shades of Gray
A gray level can be given instead of a color by passing a string representation of a number between 0 and 1, inclusive. '0.0' is black, while '1.0' is white. '0.75' would be a lighter shade of gray.
You may come upon instances where the previous ways of specifying colors do not work. This can sometimes happen in some of the deeper, stranger levels of the code. When all else fails, the universal language of colors for matplotlib is the RGB[A] tuple. This is the "Red", "Green", "Blue", and sometimes "Alpha" tuple of floats in the range of [0, 1]. One means full saturation of that channel, so a red RGBA tuple would be
(1.0, 0.0, 0.0, 1.0), whereas a partly transparent green RGBA tuple would be
(0.0, 1.0, 0.0, 0.75).
Note that there is often a separate argument for "alpha" wherever you can specify a color. The value for "alpha" will usually take precedence over the alpha value in the RGBA tuple. There is no easy way around this problem.
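The different ways of specifying a color can be compared with matplotlib.colors.to_rgba, which converts any of them to an RGBA tuple. The sketch below is only an illustration of the rules described above, including how an explicit alpha overrides the alpha stored in a tuple.

```python
from matplotlib.colors import to_rgba

# Single-letter, hex, and tuple forms of blue all resolve to one RGBA tuple.
print(to_rgba('b'))
print(to_rgba('#0000FF'))
print(to_rgba((0.0, 0.0, 1.0)))

# A gray level is a string holding a number between 0 and 1.
print(to_rgba('0.75'))

# An explicit alpha argument overrides the alpha stored in the RGBA tuple.
translucent_green = (0.0, 1.0, 0.0, 0.75)
print(to_rgba(translucent_green))             # alpha stays 0.75
print(to_rgba(translucent_green, alpha=0.2))  # alpha forced to 0.2
```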
To get all the named colors and their hex values, run the following code:
import matplotlib.colors
for name, hex_value in matplotlib.colors.cnames.items():
    print(name, hex_value)
A bar can have a hatching defined for it.
- / - diagonal hatching
- \ - back diagonal
- | - vertical
- - - horizontal
- + - crossed
- x - crossed diagonal
- o - small circle
- O - large circle (upper-case 'o')
- . - dots
- * - stars
The above characters can be combined, in which case all the specified hatchings are drawn. Repeating a character increases the density of that hatch pattern.
bars = plt.bar([1, 2, 3, 4], [10, 12, 15, 17])
# add hatches to all the bars
plt.setp(bars, hatch='x-')
plt.show()
plt.setp is a function for setting a property on any artist object(s). We won't be digging deep into the concept of Artists, but think of them simply as the plot elements that you create on the figure. With plt.setp you can set more properties on an already created object (if that plot type supports them).
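Since plt.setp applies the same value to every artist it is given, per-bar hatching needs a loop over the individual bars. The following is a minimal sketch with arbitrarily chosen patterns; set_hatch is the per-artist counterpart of the hatch property.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
bars = ax.bar([1, 2, 3, 4], [10, 12, 15, 17], color='w', edgecolor='k')

# One pattern per bar: repeating a character ('//') makes the hatch denser,
# and mixing characters ('x-') overlays both patterns.
patterns = ['/', '//', 'x-', 'o']
for bar, pattern in zip(bars, patterns):
    bar.set_hatch(pattern)

# plt.setp applies one property value to every artist in the container.
plt.setp(bars, linewidth=1.5)
```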
04 - 02.03 Pie charts
I guess these types of charts don't need any explanation either. Let's take a look at an example plot.
labels = ['Samsung', 'Apple', 'Huawei', 'Oppo', 'Vivo', 'Others']
market_share = [21, 12.5, 9.3, 7.1, 5.9, 44.2]
explode = [0.2, 0, 0, 0, 0, 0]
fig, ax = plt.subplots()
ax.pie(market_share, explode=explode, labels=labels, autopct='%1.1f%%', shadow=True, startangle=90)
# Equal aspect ratio so that we get a circle
ax.axis('equal');
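The autopct argument in the example above controls the percentage labels. As a rough sketch of what it does, the snippet below reuses the same numbers and reads back the text that pie generates for each wedge; each percentage is the wedge's value divided by the sum of all values.

```python
import matplotlib.pyplot as plt

labels = ['Samsung', 'Apple', 'Huawei', 'Oppo', 'Vivo', 'Others']
market_share = [21, 12.5, 9.3, 7.1, 5.9, 44.2]

fig, ax = plt.subplots()
# With autopct set, pie returns the wedges, the label texts, and the
# percentage texts; each percentage is 100 * value / sum(values).
wedges, texts, autotexts = ax.pie(market_share, labels=labels,
                                  autopct='%1.1f%%', startangle=90)
for label, autotext in zip(labels, autotexts):
    print(label, autotext.get_text())
ax.axis('equal')
```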
04 - 02.04 Histograms
A histogram is a graphical representation of the distribution of a dataset: it shows how frequently values fall into each of a set of ranges (bins). This plot reveals features of the dataset such as outliers and skewness.
Let's take a look at an example.
# Random numbers from a normal distribution
data = np.random.randn(1000)
print("Max: ", data.max())
print("Min: ", data.min())
fig, ax = plt.subplots()
ax.hist(data)
Max: 3.05114422539
Min: -3.14609849337
(array([ 5., 18., 75., 167., 225., 230., 175., 71., 29., 5.]), array([-3.14609849, -2.52637422, -1.90664995, -1.28692568, -0.66720141, -0.04747713, 0.57224714, 1.19197141, 1.81169568, 2.43141995, 3.05114423]), <a list of 10 Patch objects>)
The x-axis covers the range of the values in the data array, split into bins (10 by default), and the y-axis shows how many values fall into each bin.
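The relationship between bins, edges, and counts can be verified without drawing anything. This sketch uses np.histogram, which bins the data the same way as a default hist call; the seed is arbitrary and only makes the run repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)       # fixed seed, chosen arbitrarily
data = rng.standard_normal(1000)

# np.histogram does the binning without drawing anything.
counts, edges = np.histogram(data, bins=10)

print(len(counts), len(edges))   # 10 bins are delimited by 11 edges
print(counts.sum())              # every sample lands in exactly one bin
```

The first and last edges coincide with the minimum and maximum of the data, which is why the histogram spans exactly the printed Min and Max values.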
The hist() function has many parameters that can be fine-tuned for visualizing the distribution of the data.
fig, ax = plt.subplots()
# density=True replaces the older normed=True, which was removed in matplotlib 3.1
ax.hist(data, bins=30, density=True, alpha=0.8, histtype='barstacked', color='lightcoral', edgecolor='none');
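Recent matplotlib releases name the normalization parameter density. As a small check of what it does, the sketch below (with an arbitrary seed) confirms that the bar areas of a normalized histogram add up to 1.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)       # arbitrary seed for repeatability
data = rng.standard_normal(1000)

fig, ax = plt.subplots()
heights, edges, patches = ax.hist(data, bins=30, density=True,
                                  alpha=0.8, color='lightcoral')

# With density=True each bar height is count / (N * bin_width),
# so the bar areas add up to 1.
total_area = np.sum(heights * np.diff(edges))
print(total_area)
```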
You can also plot multiple histograms in a single figure, for example when you want to look at an image and plot the histograms of its red, green, and blue pixel values.
For this example, we will import the scipy module just to load an RGB image of a raccoon.
import scipy.misc
# Newer SciPy releases provide this image as scipy.datasets.face() instead
racoon = scipy.misc.face()
fig, ax = plt.subplots()
ax.imshow(racoon)
# disable the grid lines and axis
ax.grid(False)
ax.axis('off')
Now let's look at the distribution of the red, green, and blue pixel values.
# The image is a 3-D numpy array; the third axis holds the red, green and blue channels
print('Type: ', type(racoon))
# An RGB image should have a shape of rows x cols x 3
print("Shape: ", racoon.shape)
# Let's check the dtype
print("Dtype: ", racoon.dtype)
# dtype of uint8 means a range of 0 - 255
Type: <class 'numpy.ndarray'>
Shape: (768, 1024, 3)
Dtype: uint8
# Great, now let's plot the red, green and blue color channels
fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(15, 15))
colors = ['red', 'green', 'dodgerblue']
for i in range(3):
    # axes is a 3 x 2 array: left column shows the channel, right column its histogram
    axes[i, 0].imshow(racoon[:, :, i])
    axes[i, 1].hist(racoon[:, :, i].ravel(), bins=255, alpha=0.3, color=colors[i])
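The same per-channel idea can be tried without SciPy by overlaying the three histograms on a single axes. This is only a sketch: the image below is a synthetic array of random pixels standing in for a real photo, and the choice of 256 bins over [0, 256) is one reasonable way to get one bin per uint8 value.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-in for an RGB photo: random uint8 pixels, rows x cols x 3.
rng = np.random.default_rng(2)
image = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)

colors = ['red', 'green', 'dodgerblue']
fig, ax = plt.subplots()
channel_counts = []
for i, color in enumerate(colors):
    # ravel() flattens the 2-D channel into the 1-D array hist expects;
    # 256 bins over [0, 256) give one bin per possible uint8 value.
    counts, _, _ = ax.hist(image[:, :, i].ravel(), bins=256, range=(0, 256),
                           alpha=0.3, color=color)
    channel_counts.append(counts)
ax.set_xlabel('pixel value')
ax.set_ylabel('number of pixels')
```

Using a low alpha keeps all three distributions visible where they overlap.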