03-03 Broadcasting and computation

03 - 03 Broadcasting and more Computation

03 - 03.01 Broadcasting

Broadcasting is a set of rules for applying binary ufuncs (e.g., addition, subtraction, multiplication) to arrays of different sizes. It is an important feature for leveraging the power of NumPy.

As you may remember from the previous module, ufunc operations are performed element by element. Let's take a look at adding a scalar (we did this in the Arithmetic subsection of the previous module).

In [1]:
from __future__ import print_function
import numpy as np
arr1 = np.random.randint(1, 40, 5)
num  = 5
print("Arr1: \n{}".format(arr1), end="\n\n")
print("num : \n{}".format(num), end="\n\n")
print("Sum : \n{}".format(arr1+num), end="\n\n")
Arr1: 
[33 12 39 11 31]

num : 
5

Sum : 
[38 17 44 16 36]

Broadcasting allows these types of binary operations to be performed on arrays of different sizes, just as we added a scalar (think of a scalar as a zero-dimensional array) to the array.

We can think of this as an operation that stretches or duplicates the value 5 into the array [5, 5, 5, 5, 5] and then adds it to arr1. This duplication does not actually take place in memory during broadcasting, but it is a useful mental model to keep in mind.

Just like adding a scalar, we can broadcast multi-dimensional arrays as well. However, there are rules that must be followed for broadcasting to work.
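
To make the "no real duplication" point concrete, here is a minimal sketch. It uses np.broadcast_to, which is not part of the original walkthrough, to build the stretched view explicitly; the array values are simply copied from the sample output above.

```python
import numpy as np

arr = np.array([33, 12, 39, 11, 31])

# broadcast_to returns a read-only view that looks like [5, 5, 5, 5, 5]
stretched = np.broadcast_to(5, arr.shape)
print(stretched)          # [5 5 5 5 5]

# A stride of 0 means every element points at the same memory location,
# so the value 5 is stored once rather than copied five times
print(stretched.strides)  # (0,)

# Adding the scalar gives the same result as adding the stretched view
same = np.array_equal(arr + 5, arr + stretched)
print(same)               # True
```

The zero stride is why broadcasting a scalar costs no extra memory, even for very large arrays.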

  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
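
The three rules can be sketched as a small function that works on shape tuples only. The helper name broadcast_shape is hypothetical and written just for illustration; it is not a NumPy function, and NumPy's own implementation may differ in detail.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Apply the three broadcasting rules to two shape tuples (illustrative helper)."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            # Sizes agree, or b is stretched to match a (Rule 2)
            result.append(dim_a)
        elif dim_a == 1:
            # Rule 2: a is stretched to match b
            result.append(dim_b)
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError("cannot broadcast {} with {}".format(shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 4), (4,)))    # (3, 4)
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)

# Compare with the shape NumPy itself computes for real arrays
print(np.broadcast(np.ones((3, 4)), np.arange(4)).shape)  # (3, 4)
```

Calling broadcast_shape((3, 4), (4, 3)) raises a ValueError, which mirrors Rule 3.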

Let's add two arrays of different sizes.

In [2]:
arr1 = np.ones((3, 4))
arr2 = np.arange(4)

Let's match these arrays to our set of rules.

Rule 1: The number of dimensions differs, so the shape of arr2 is padded with a one on its left side.

  • arr1 is of shape m1 x n1 = 3 x 4
  • arr2 is of shape (4,), which is padded to m2 x n2 = 1 x 4

Rule 2: m2 is 1, so the first dimension of arr2 is stretched to match m1, the first dimension of arr1. So now,

  • arr1 is of shape m1 x n1 = 3 x 4
  • arr2 is of shape m2 x n2 = 3 x 4

Rule 3: Doesn't apply, since both shapes are now 3 x 4.

In [3]:
arr1 + arr2
Out[3]:
array([[ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.],
       [ 1.,  2.,  3.,  4.]])
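
The padding and stretching steps can also be written out by hand, which may help when checking the intermediate shapes. This is only a sketch: np.newaxis and np.broadcast_to are used here for illustration and are not something NumPy requires you to call before adding the arrays.

```python
import numpy as np

arr1 = np.ones((3, 4))
arr2 = np.arange(4)

# Rule 1: pad the shape on the left, (4,) -> (1, 4)
padded = arr2[np.newaxis, :]

# Rule 2: stretch the size-1 dimension, (1, 4) -> (3, 4)
stretched = np.broadcast_to(padded, arr1.shape)

print(padded.shape)     # (1, 4)
print(stretched.shape)  # (3, 4)

# The explicit version gives the same result as plain broadcasting
matches = np.array_equal(arr1 + arr2, arr1 + stretched)
print(matches)          # True
```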

Now let's look at an example where we add arr1 to its own transpose. Let's first print out the transpose, and then apply the rules as we did for the previous example.

In [4]:
print("Arr1: \n{}".format(arr1))
print("arr1.shape: {}".format(arr1.shape), end="\n\n")
print("Arr1 Transpose: \n{}".format(arr1.T))
print("arr1.T.shape: {}".format(arr1.T.shape))
Arr1: 
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
arr1.shape: (3, 4)

Arr1 Transpose: 
[[ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]
arr1.T.shape: (4, 3)

Let's apply our rules:

Rule 1: Doesn't apply. Both arrays already have two dimensions, so no padding is needed.

  • arr1 is of shape m1 x n1 = 3 x 4
  • arr1.T is of shape m2 x n2 = 4 x 3

Rule 2: Doesn't apply. The shapes disagree in both dimensions, but no dimension has size 1, so nothing can be stretched.

Rule 3: m1 = 3 and m2 = 4 disagree, n1 = 4 and n2 = 3 disagree, and none of them is equal to 1. This will raise a ValueError.

In [5]:
arr1 + arr1.T
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-d3b06db3ad5f> in <module>()
----> 1 arr1 + arr1.T

ValueError: operands could not be broadcast together with shapes (3,4) (4,3) 

So what's important is that, after padding, each pair of dimensions must either match or one of the two must be 1. Only a dimension of size 1 can be stretched to match the other array. That's how broadcasting works!

Take a look at the example from the previous module, where we got a ValueError when we tried to add two arrays of different shapes:
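
Broadcasting is decided one dimension at a time: a size of 1 can stretch, and anything else has to match. As a small sketch of this, the column vector below (a hypothetical example, not from the original notebook) broadcasts against arr1, while a flat array of length 3 does not.

```python
import numpy as np

arr1 = np.ones((3, 4))

# Shape (3, 1): the second dimension is 1, so it can be stretched to 4
col = np.arange(3).reshape(3, 1)
total = arr1 + col
print(total.shape)  # (3, 4)
print(total)

# Shape (3,) is padded to (1, 3); the trailing sizes 4 and 3 then disagree
flat = np.arange(3)
try:
    arr1 + flat
    failed = False
except ValueError as err:
    failed = True
    print("ValueError:", err)
```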

arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=3)
arr1 + arr2

Can you solve it now?
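
If you want to check your answer, here is one possible sketch. It assumes two different readings of "solve": either reshape arr1 into a column so the shapes become compatible, or change num so both arrays have four elements. The original exercise does not say which one is intended.

```python
import numpy as np

arr1 = np.array([1., 2., 3., 4.])   # shape (4,)
arr2 = np.linspace(4, 16, num=3)    # shape (3,), values [4, 10, 16]

# Option 1: turn arr1 into a (4, 1) column; (4, 1) and (3,) broadcast to (4, 3)
grid = arr1[:, np.newaxis] + arr2
print(grid.shape)  # (4, 3)
print(grid)

# Option 2: make the lengths match so the sum is element by element
arr2_fixed = np.linspace(4, 16, num=4)  # values [4, 8, 12, 16]
elementwise = arr1 + arr2_fixed
print(elementwise)  # [ 5. 10. 15. 20.]
```

Option 1 produces every pairwise sum, while Option 2 keeps a one-dimensional result, so the right choice depends on what you wanted to compute.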

03 - 03.02 Aggregation

When analyzing any dataset, the first thing you will usually do is compute its summary statistics. Values such as the maximum, minimum, mean, and variance are typically the first things you look at (for the relevant columns). NumPy provides fast aggregation functions for this. Let's take a look at some of them.

.. 03.02.01 sum

As the name suggests, this function returns the sum of all the values in an array.

In [35]:
arr = np.random.randint(1, 700, 10000)
print("Sum of 1D: {}".format(np.sum(arr)))
Sum of 1D: 3499093
In [36]:
arr2d = np.random.uniform(1, 700, (3, 4))
print("Sum of 2D: {}".format(np.sum(arr2d)))
Sum of 2D: 3956.78803416312
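
np.sum also accepts an axis argument, in the same way as the other aggregation functions in this section. The sketch below uses a small fixed array (chosen here for illustration, not taken from the notebook) so the results are easy to verify by hand.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = np.sum(grid)             # sum of every element
col_sums = np.sum(grid, axis=0)  # one sum per column
row_sums = np.sum(grid, axis=1)  # one sum per row

print(total)     # 66
print(col_sums)  # [12 15 18 21]
print(row_sums)  # [ 6 22 38]
```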

.. 03.02.02 max and min

This will find the maximum and minimum value in an array.

In [37]:
print("Max of 1D arr: {}".format(np.max(arr)))
print("Min of 1D arr: {}".format(np.min(arr)))
print("Max of 2D arr: {}".format(np.max(arr2d)))
print("Min of 2D arr: {}".format(np.min(arr2d)))
Max of 1D arr: 699
Min of 1D arr: 1
Max of 2D arr: 585.196247095094
Min of 2D arr: 25.090820435319227

We can also get the minimum and maximum along a particular axis:

  • axis 0 = down the rows, giving one result per column
  • axis 1 = across the columns, giving one result per row
In [40]:
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Max of 2D arr along axis 0: {}".format(np.amax(arr2d, axis=0)))
print("Min of 2D arr along axis 0: {}".format(np.amin(arr2d, axis=0)))
2D Array: 
[[ 284.98861255  247.58527166  223.10549946  218.04079432]
 [ 585.1962471    25.09082044  432.97492665  372.13297385]
 [ 507.73110919  320.88604643  170.8015855   568.25414701]]

Max of 2D arr along axis 0: [ 585.1962471   320.88604643  432.97492665  568.25414701]
Min of 2D arr along axis 0: [ 284.98861255   25.09082044  170.8015855   218.04079432]
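
Aggregation along an axis combines naturally with broadcasting. As a hedged illustration, the sketch below rescales each column of a small made-up array to the range 0 to 1; the per-column minimum and maximum have shape (2,), which broadcasts against the (3, 2) array.

```python
import numpy as np

data = np.array([[1., 10.],
                 [2., 30.],
                 [5., 20.]])

col_min = np.min(data, axis=0)  # [ 1. 10.]
col_max = np.max(data, axis=0)  # [ 5. 30.]

# (3, 2) minus (2,) broadcasts row by row, so each column is scaled separately
scaled = (data - col_min) / (col_max - col_min)
print(scaled)
# [[0.   0.  ]
#  [0.25 1.  ]
#  [1.   0.5 ]]
```

This assumes every column has at least two distinct values; a constant column would lead to division by zero.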

.. 03.02.03 std

Computes the standard deviation along a particular axis.

In [41]:
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Std along axis 0: \n{}".format(np.std(arr2d, axis=0)))
print("Std along axis 1: \n{}".format(np.std(arr2d, axis=1)))
2D Array: 
[[ 284.98861255  247.58527166  223.10549946  218.04079432]
 [ 585.1962471    25.09082044  432.97492665  372.13297385]
 [ 507.73110919  320.88604643  170.8015855   568.25414701]]

Std along axis 0: 
[ 127.25289398  125.77387122  113.29202026  143.31678461]
Std along axis 1: 
[  26.46734614  205.06017735  156.87982677]
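
By default np.std computes the population standard deviation, that is, the square root of the mean squared deviation from the mean. The sketch below checks this by hand on a small made-up sample and shows the ddof=1 option for the sample standard deviation.

```python
import numpy as np

values = np.array([2., 4., 4., 4., 5., 5., 7., 9.])

# Population standard deviation computed step by step
mean = values.mean()                               # 5.0
manual = np.sqrt(np.mean((values - mean) ** 2))    # sqrt(32 / 8) = 2.0

print(manual)          # 2.0
print(np.std(values))  # 2.0

# ddof=1 divides by (n - 1) instead of n
sample_std = np.std(values, ddof=1)
print(sample_std)      # about 2.138
```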

We will look at more of these functions in the module on matplotlib, so that we can plot the results alongside the printed output for a better understanding. If you are interested in learning about more such functions, see the official documentation: Numpy Statistics Routines
