## 03 - 03 Broadcasting and more Computation

### 03 - 03.01 Broadcasting

Broadcasting is a set of rules for applying `ufuncs` (e.g., addition, subtraction, multiplication) to arrays of different sizes. It is an important feature for leveraging the power of NumPy.

If you remember from the previous module, `ufunc` operations are performed element by element. Let's take a look at adding a scalar (we did this in the Arithmetic subsection of the previous module):

```
from __future__ import print_function
import numpy as np
arr1 = np.random.randint(1, 40, 5)
num = 5
print("Arr1: \n{}".format(arr1), end="\n\n")
print("num : \n{}".format(num), end="\n\n")
print("Sum : \n{}".format(arr1+num), end="\n\n")
```

Broadcasting allows these types of binary operations to be performed on arrays of different sizes just as we added a scalar (think of a scalar as a zero-dimensional array) to the array.

We can think of this as an operation that *stretches* or *duplicates* the value `5` into the array `[5, 5, 5, 5, 5]`, and adds it to the array. This duplication does not actually take place during broadcasting, but it is a useful mental model to keep in mind when reasoning about broadcasting.
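As a small sketch of the point that no real duplication happens, `np.broadcast_to` can be used to build the stretched view explicitly and inspect its strides. The variable name `stretched` is only illustrative.

```python
import numpy as np

num = 5

# broadcast_to returns a read-only view that behaves like [5, 5, 5, 5, 5]
stretched = np.broadcast_to(num, (5,))
print(stretched)

# A stride of 0 means every element is read from the same memory location,
# so the value 5 is stored once and never copied five times.
print(stretched.strides)
```

The zero stride is how the "stretched" array can exist without any extra memory being allocated for the repeated values.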

Just as with adding a scalar, we can perform broadcasting on multi-dimensional arrays as well. However, there are rules that must be followed for broadcasting to work.

- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is *padded* with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
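The three rules can be sketched as a small helper that works on shapes only. The function name `broadcast_shape` is hypothetical and written here purely for illustration; it is not part of NumPy. The result is compared against the shape that NumPy's own `np.broadcast` object reports.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Illustrative helper: apply the three broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on its left side.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)      # sizes agree, or b is stretched (Rule 2)
        elif dim_a == 1:
            result.append(dim_b)      # a is stretched (Rule 2)
        else:
            # Rule 3: sizes disagree and neither is 1.
            raise ValueError("shapes {} and {} cannot be broadcast".format(shape_a, shape_b))
    return tuple(result)

print(broadcast_shape((3, 4), (4,)))
# Cross-check against NumPy itself
print(np.broadcast(np.empty((3, 4)), np.empty((4,))).shape)
```

Calling `broadcast_shape((3, 4), (4, 3))` raises a `ValueError`, which mirrors Rule 3.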

Let's add two arrays of different sizes:

```
arr1 = np.ones((3, 4))
arr2 = np.arange(4)
```

Let's match these arrays against our set of rules.

Rule 1: The number of dimensions differs, so the shape of arr2 is padded with a one on its left side.

- arr1 is of shape `m1 x n1 = 3 x 4`
- arr2 goes from shape `(4,)` to `m2 x n2 = 1 x 4`

Rule 2: **Stretch** `m2`, the first dimension of arr2, to match `m1`, the first dimension of arr1. So now,

- arr1 is of shape `m1 x n1 = 3 x 4`
- arr2 is of shape `m2 x n2 = 3 x 4`

Rule 3: Doesn't apply, since the shapes now agree in every dimension.

```
arr1 + arr2
```
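To make the padding and stretching steps concrete, the sketch below performs them by hand with `np.newaxis` and `np.repeat` and checks that the result equals the broadcast sum. The names `padded`, `stretched`, `explicit` and `implicit` are illustrative only, and the explicit copy is for demonstration: broadcasting itself does not allocate it.

```python
import numpy as np

arr1 = np.ones((3, 4))
arr2 = np.arange(4)

# Rule 1 by hand: pad the shape (4,) on the left to get (1, 4)
padded = arr2[np.newaxis, :]

# Rule 2 by hand: repeat the single row three times to get (3, 4)
stretched = np.repeat(padded, 3, axis=0)

explicit = arr1 + stretched   # same-shape addition
implicit = arr1 + arr2        # NumPy applies the rules for us

print(padded.shape, stretched.shape)
print(np.array_equal(explicit, implicit))
```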

Now let's look at an example where we add arr1 to its own transpose. Let's first print the transpose and then apply the rules as we did in the previous example.

```
print("Arr1: \n{}".format(arr1))
print("arr1.shape: {}".format(arr1.shape), end="\n\n")
print("Arr1 Transpose: \n{}".format(arr1.T))
print("arr1.T.shape: {}".format(arr1.T.shape))
```

Let's apply our rules:

Rule 1: Doesn't apply, since both arrays already have two dimensions.

- arr1 is of shape `m1 x n1 = 3 x 4`
- arr1.T is of shape `m2 x n2 = 4 x 3`

Rule 2: Doesn't apply either. The shapes disagree in both dimensions, but neither array has size `1` in any dimension, so nothing can be stretched.

Rule 3: `m1` and `m2` (3 vs. 4) disagree and neither is `1`; the same holds for `n1` and `n2` (4 vs. 3). This will raise a `ValueError`:

```
arr1 + arr1.T
```

So what's important is that, in every dimension, the sizes either match or one of them is `1`. Only a dimension of size 1 (including one added by padding on the left) can be stretched to match the other array. That's how broadcasting works!
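As a rough sketch of this behaviour, the snippet below catches the `ValueError` from `arr1 + arr1.T` and then shows that the addition succeeds once one operand has a size-1 dimension. Slicing out the first column as `column` is just one illustrative way to obtain such an operand; the names `raised`, `column`, `same_rows` and `outer` are not from the original.

```python
import numpy as np

arr1 = np.ones((3, 4))

# (3, 4) + (4, 3): sizes disagree and none is 1, so NumPy raises ValueError
try:
    arr1 + arr1.T
    raised = False
except ValueError as err:
    raised = True
    print("Broadcasting failed: {}".format(err))

# A size-1 dimension can be stretched
column = arr1[:, :1]            # shape (3, 1)
same_rows = column + arr1       # (3, 1) + (3, 4) -> (3, 4)
outer = column + column.T       # (3, 1) + (1, 3) -> (3, 3)

print(column.shape, same_rows.shape, outer.shape)
```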

Take a look at the example from the previous module, where we got a `ValueError` when we tried broadcasting two arrays of different shapes:

```
arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=3)
arr1 + arr2
```

Can you solve it now?

### 03 - 03.02 Aggregation

When analysing any dataset, the first thing you will usually do is compute its summary statistics. Quantities such as the maximum, minimum, mean and variance are typically the first things you look at (for the relevant columns). NumPy provides fast aggregation functions for this. Let's take a look at some of them.

#### .. 03.02.01 sum

As the name suggests, this function will return the sum of all the values of an array

```
arr = np.random.randint(1, 700, 10000)
print("Sum of 1D: {}".format(np.sum(arr)))
```

```
arr2d = np.random.uniform(1, 700, (3, 4))
print("Sum of 2D: {}".format(np.sum(arr2d)))
```
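`np.sum` also accepts an `axis` argument. The sketch below uses a small fixed array (named `grid` here purely for illustration) instead of random values, so the results are easy to check by hand.

```python
import numpy as np

# Fixed values so the sums can be verified by hand
grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

total = np.sum(grid)              # every element: 66
col_sums = np.sum(grid, axis=0)   # one sum per column: [12 15 18 21]
row_sums = np.sum(grid, axis=1)   # one sum per row: [ 6 22 38]

print("Total: {}".format(total))
print("Sum along axis 0: {}".format(col_sums))
print("Sum along axis 1: {}".format(row_sums))
```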

#### .. 03.02.02 max and min

This will find the maximum and minimum value in an array.

```
print("Max of 1D arr: {}".format(np.max(arr)))
print("Min of 1D arr: {}".format(np.min(arr)))
print("Max of 2D arr: {}".format(np.max(arr2d)))
print("Min of 2D arr: {}".format(np.min(arr2d)))
```

We can also get the minimum and maximum along a particular axis:

- axis 0 = collapse the rows, giving one result per column
- axis 1 = collapse the columns, giving one result per row

```
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Max of 2D arr along axis 0: {}".format(np.amax(arr2d, axis=0)))
print("Min of 2D arr along axis 0: {}".format(np.amin(arr2d, axis=0)))
```
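The example above only uses `axis=0`. As a complementary sketch, the snippet below uses a small hand-written array (called `scores` here purely for illustration) to compare both axes on values that can be checked by eye.

```python
import numpy as np

scores = np.array([[3, 7, 1, 9],
                   [8, 2, 6, 4],
                   [5, 0, 11, 10]])

# axis=0 collapses the rows: one value per column
print("Max along axis 0: {}".format(np.max(scores, axis=0)))  # [ 8  7 11 10]
print("Min along axis 0: {}".format(np.min(scores, axis=0)))  # [3 0 1 4]

# axis=1 collapses the columns: one value per row
print("Max along axis 1: {}".format(np.max(scores, axis=1)))  # [ 9  8 11]
print("Min along axis 1: {}".format(np.min(scores, axis=1)))  # [1 2 0]
```

A quick check on the output shape helps here: a `(3, 4)` array reduced along axis 0 yields 4 values, and along axis 1 yields 3 values.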

#### .. 03.02.03 std

Compute standard deviation along a particular axis

```
print("2D Array: \n{}".format(arr2d), end="\n\n")
print("Std along axis 0: \n{}".format(np.std(arr2d, axis=0)))
print("Std along axis 1: \n{}".format(np.std(arr2d, axis=1)))
```
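To connect this back to broadcasting, the sketch below recomputes the standard deviation along axis 0 by hand. It relies on `np.std` using the population formula by default (`ddof=0`); the array `data` and the intermediate names are illustrative only.

```python
import numpy as np

data = np.array([[1., 2., 3., 4.],
                 [5., 6., 7., 8.],
                 [9., 10., 11., 12.]])

col_mean = data.mean(axis=0)     # shape (4,)
centered = data - col_mean       # (3, 4) - (4,): broadcast over the rows
manual_std = np.sqrt((centered ** 2).mean(axis=0))

print("Manual std  : {}".format(manual_std))
print("np.std      : {}".format(np.std(data, axis=0)))
print("Same result : {}".format(np.allclose(manual_std, np.std(data, axis=0))))

# ddof=1 switches to the sample standard deviation (divide by n - 1)
sample_std = np.std(data, axis=0, ddof=1)
print("Sample std  : {}".format(sample_std))
```

Each column here holds values that are 4 apart, so the sample standard deviation comes out as 4.0 for every column, while the default population value is slightly smaller.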

We will look at more of these in the module on matplotlib, so that we can plot the results alongside the printed output for a better understanding. If you are interested in more such functions, see the official documentation: `Numpy Statistics Routines`