03-01 Numpy Array Basics

Numpy's main object is the homogeneous multidimensional array. Numpy's array class is called ndarray. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In numpy, dimensions are called axes. The number of axes is known as the rank.

Numpy arrays are similar to Python lists, with a few differences:

  • All the elements in a numpy array must be of the same datatype.
  • You can't change the size of a numpy array (at least not without making a full copy; we'll see this a little later).
  • Numpy arrays are easy to construct and to manipulate.
  • Numpy arrays support “vectorized” operations like elementwise addition and multiplication without having to run a for loop explicitly in python.
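
To make the last point concrete, here is a small sketch comparing an explicit Python loop with the equivalent vectorized expression. The variable names are just for illustration.

```python
import numpy as np

values = [1, 2, 3, 4]

# Plain Python: we loop over the elements ourselves
doubled_list = [v * 2 for v in values]

# Numpy: the multiplication is applied to every element at once
arr = np.array(values)
doubled_arr = arr * 2

# Elementwise addition of two arrays of the same shape
summed = arr + doubled_arr

# Careful: `*` on a plain list repeats the list instead of multiplying
repeated = values * 2

print(doubled_list)
print(doubled_arr)
print(summed)
print(repeated)
```

Both approaches give the same doubled values, but the numpy version does the looping in compiled code instead of in the Python interpreter.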

We'll cover basic array manipulations here:

  • Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays
  • Creating arrays: Different ways of creating the Arrays
  • Indexing of arrays: Getting and setting the value of individual array elements
  • Slicing of arrays: Getting and setting smaller subarrays within a larger array
  • Reshaping of arrays: Changing the shape of a given array
  • Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

03 - 01.01 Attributes of Arrays

In [1]:
from __future__ import print_function
import numpy as np
# Single dimensional Array from a list
arr = np.array([1, 2, 3, 4], dtype=float)
print('Type: ',type(arr))
print('Shape: ',arr.shape)
print('Dimension: ',arr.ndim)
print('Itemsize: ',arr.itemsize)
print('Size: ',arr.size)
Type:  <class 'numpy.ndarray'>
Shape:  (4,)
Dimension:  1
Itemsize:  8
Size:  4

The above is one of the many ways in which a numpy array can be created. Here, np.array() takes two arguments: the list to be converted to a numpy array and the datatype (dtype) that every element of the array should have.

There are many different attributes of the ndarray class, and by now you should be able to understand how to access those attributes and get help for them (Hint: <TAB> completion).

Let's look at some of the attributes that we printed above.

ndarray.ndim

It is the number of axes or dimensions of the array.

ndarray.shape

It gives the dimensions of the array. This is a tuple of integers indicating the size of the array along each axis. For a matrix with n rows and m columns, the shape will be (n, m). The length of the shape tuple is therefore the number of axes, ndim. For single dimensional arrays, the tuple has just one element, as in our case: (4,).
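
To see how shape and ndim fit together, here is a short sketch. The array names are made up for this illustration.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4])
two_d = np.zeros((2, 3))  # 2 rows, 3 columns

# A 1D array has a shape tuple with a single element
print(one_d.shape)

# For a 2D array, the number of rows comes first
n_rows, n_cols = two_d.shape
print(n_rows, n_cols)

# The length of the shape tuple is the number of axes
print(len(two_d.shape) == two_d.ndim)
```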

ndarray.dtype

It is an object describing the type of the elements in the array. Remember that all the elements need to be of same datatype in a numpy array. Additionally numpy provides its own int16, int32, float64 and so on.

ndarray.itemsize

The size in bytes of each element of the array. For example, an array of elements of type float64 has an itemsize of $\frac{64}{8} = 8$ and one of type int32 has an itemsize of $\frac{32}{8} = 4$.
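
A quick way to check these numbers is to build small arrays of a few dtypes and print their itemsize. The sketch below also prints nbytes, the total memory used by the elements (size times itemsize). The `sizes` dictionary is only a helper for this example.

```python
import numpy as np

# Map each dtype name to (bytes per element, bytes for the whole array)
sizes = {}
for dtype in (np.int16, np.int32, np.float64, np.complex128):
    arr = np.zeros(3, dtype=dtype)
    sizes[arr.dtype.name] = (arr.itemsize, arr.nbytes)
    print(arr.dtype.name, arr.itemsize, arr.nbytes)
```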

ndarray.data

This is the buffer containing the actual elements of the array. Normally we don't need to use this attribute, because we access the elements of an array through numpy's indexing facilities.

Let's take a look at another example:

In [2]:
# Elements have to be of same datatype
arr = np.array([1, 2.0, "ucsl"])
print("Datatype: ", arr.dtype)
Datatype:  <U32

Since we did not pass the dtype parameter, Numpy saw that the types were mixed and converted all the elements to a common type: Unicode strings of at most 32 characters, shown as <U32 (or the byte-string type S32 if you are using Python 2).

To see the datatype names that Numpy knows about, you can type

In [2]: np.sctypeDict

and check the output (older Numpy releases also exposed this dictionary as np.typeDict).

If we had passed the dtype as float, or any other numeric type, we would have received a ValueError, because "ucsl" cannot be converted to a number. (Try it!)
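
If you want to see what happens when a numeric dtype is forced onto this list, the sketch below catches the error so that it can be printed. The names `mixed` and `error_raised` are just for illustration.

```python
import numpy as np

mixed = [1, 2.0, "ucsl"]

# "ucsl" cannot be parsed as a number, so forcing dtype=float fails
try:
    np.array(mixed, dtype=float)
    error_raised = False
except ValueError as exc:
    error_raised = True
    print("ValueError:", exc)

# Strings that look like numbers are converted without complaint
converted = np.array([1, 2.0, "3.5"], dtype=float)
print(converted, converted.dtype)
```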

03 - 01.02 Creating Arrays

There are many different ways in which a numpy array can be created. We saw one in the above example. Let's look at some other ways of creating arrays.

In [3]:
arr1 = np.arange(5, dtype=float)
print('arange() with float dtype: \n',arr1)
# `num` evenly spaced values from start to stop (both included)
arr2 = np.linspace(0, 8, num=5)
print('\n linspace(): \n', arr2)
arr3 = np.ones((2, 3), dtype=float)
print ('\n ones(): \n',arr3)
arr4 = np.zeros((2,3), dtype=float)
print ('\n zeros(): \n',arr4)
arr5 = np.empty((2, 4))
print('\n Empty: \n',arr5)  # Your output may be different..
arr6 = np.ones_like(arr1)
print('\n Ones_like(): \n',arr6)
arr7 = np.diag(arr1)
print('\n Diagonal array: \n',arr7)
arange() with float dtype: 
 [ 0.  1.  2.  3.  4.]

 linspace(): 
 [ 0.  2.  4.  6.  8.]

 ones(): 
 [[ 1.  1.  1.]
 [ 1.  1.  1.]]

 zeros(): 
 [[ 0.  0.  0.]
 [ 0.  0.  0.]]

 Empty: 
 [[  1.28822975e-231  -3.11107787e+231   1.97626258e-323   0.00000000e+000]
 [  0.00000000e+000   0.00000000e+000   1.28822975e-231   5.60597947e-309]]

 Ones_like(): 
 [ 1.  1.  1.  1.  1.]

 Diagonal array: 
 [[ 0.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  2.  0.  0.]
 [ 0.  0.  0.  3.  0.]
 [ 0.  0.  0.  0.  4.]]

np.arange()

works like the range function that we used previously, but it returns a numpy array instead.

np.zeros() and np.ones()

As the names suggest, these generate new arrays of the specified dimensions filled with zeros or ones. They are among the most commonly used functions for creating new arrays.

np.empty()

This function creates an array without initializing its contents, so the initial values are arbitrary and depend on the state of the memory. If not specified, the data type of the created array is float64.

np.ones_like() , np.zeros_like() and np.empty_like()

These functions create a new array with the same dimensions and type as an existing one, filled with ones, zeros or uninitialized values respectively.

np.diag()

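As the name suggests, this constructs a diagonal array: given a one dimensional array, it returns a two dimensional array with those values on the diagonal and zeros everywhere else.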
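
Here is a small sketch of the "_like" functions and of np.diag(). The `template` array is made up for this example. Note that np.diag() also works in the other direction: given a two dimensional array, it extracts the diagonal.

```python
import numpy as np

template = np.arange(6, dtype=np.int32).reshape(2, 3)

# Same shape and dtype as `template`, but filled with zeros / ones
zeros = np.zeros_like(template)
ones = np.ones_like(template)
print(zeros.shape, zeros.dtype)
print(ones)

# 1D input: build a 2D array with the values on the diagonal
square = np.diag([1, 2, 3])
print(square)

# 2D input: extract the diagonal as a 1D array
diagonal = np.diag(square)
print(diagonal)
```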

Let's take a look at an example of creating multi-dimensional arrays.

In [4]:
arr2d = np.arange(27).reshape(3, 9)
print("2D array: \n{}\n".format(arr2d))
arr3d = np.arange(27).reshape(3,3,3)
print("3D array: \n{}\n".format(arr3d))
2D array: 
[[ 0  1  2  3  4  5  6  7  8]
 [ 9 10 11 12 13 14 15 16 17]
 [18 19 20 21 22 23 24 25 26]]

3D array: 
[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]

Numpy displays the arrays in a similar way to nested lists but with the following layout:

  • the last axis is printed from left to right
  • the second to last axis is printed from top to bottom.
  • the rest are also printed from top to bottom, with each slice separated from the next by an empty line.

Simply put, single dimensional arrays are printed as rows, two dimensional arrays as matrices, and higher dimensional arrays as lists of matrices.

We will look at reshaping of arrays later in this module.

03 - 01.03 Array Indexing

Numpy arrays are indexed in the same way as lists, so accessing the elements of a single dimensional array works just like accessing the elements of a list.

In [5]:
arr = np.arange(3, 10)
print(arr[4])
7

You can also use negative indexing like we did for lists

In [6]:
print(arr[-3])
7

Items in a multi-dimensional array can be accessed using a comma-separated tuple of indexes.

In [7]:
arr3d = np.arange(27).reshape(3,3,3)
dim, row, col = 2, 1, 0
print("3D array: \n", arr3d, end="\n\n")
print("Element at {dim}, {row}, {col} is: {val}".format(dim=dim, 
                                                        row=row, 
                                                        col=col, 
                                                        val=arr3d[dim, row, col]))
3D array: 
 [[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]]

 [[ 9 10 11]
  [12 13 14]
  [15 16 17]]

 [[18 19 20]
  [21 22 23]
  [24 25 26]]]

Element at 2, 1, 0 is: 21
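
You don't have to supply an index for every axis. The sketch below (with illustrative variable names) shows that giving fewer indexes returns a smaller array rather than a single element.

```python
import numpy as np

arr3d = np.arange(27).reshape(3, 3, 3)

# One index: a 2D array (the last 3 x 3 block)
plane = arr3d[2]

# Two indexes: a 1D array (one row of that block)
row = arr3d[2, 1]

# Three indexes: a single element
element = arr3d[2, 1, 0]

# Chained list-style indexing reaches the same element
chained = arr3d[2][1][0]

print(plane.shape, row, element, chained)
```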

03 - 01.04 Array Slicing

Slicing extracts a portion of a sequence by specifying a lower and an upper bound. The lower-bound element is included, but the upper-bound element is not. Just like lists, there is a third parameter, step, which is the stride to take between elements. The three parameters are separated by colons (:). If any of them is unspecified, it defaults to start=0, stop=size of the dimension, step=1. For a negative step, the defaults for start and stop are swapped, so the slice runs from the last element back to the first.

In [8]:
arr = np.linspace(0, 8, num=5)
print("Original Array: \n", arr, end="\n\n")
# let the slicings begin
print("arr[:3]: ", arr[:3])
print("arr[-5:5:2]: ", arr[-5:5:2])
print("arr[::2]: ", arr[::2])
# Reverse the elements
print("arr[::-1]: ", arr[::-1])
# Every other element, going backwards from index 2
print("arr[2::-2]: ", arr[2::-2])
Original Array: 
 [ 0.  2.  4.  6.  8.]

arr[:3]:  [ 0.  2.  4.]
arr[-5:5:2]:  [ 0.  4.  8.]
arr[::2]:  [ 0.  4.  8.]
arr[::-1]:  [ 8.  6.  4.  2.  0.]
arr[2::-2]:  [ 4.  0.]
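
Two details of slicing are worth checking for yourself: writing out the default bounds gives the same result as leaving them off, and slice bounds that run past the end of the array are quietly clipped, whereas a single out-of-range index raises an error. This is only a sketch, and the variable names are illustrative.

```python
import numpy as np

arr = np.arange(10)

# Spelling out the defaults is the same as omitting them
explicit = arr[0:arr.size:1]
implicit = arr[:]
same = np.array_equal(explicit, implicit)
print(same)

# A stop value past the end is clipped to the array size
clipped = arr[7:100]
print(clipped)

# A single index past the end is an error
try:
    arr[100]
    index_error = False
except IndexError as exc:
    index_error = True
    print("IndexError:", exc)
```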

For multi-dimensional arrays, we specify one slice per axis, separated by commas, in rows, columns order.

In [9]:
# Array of random integers between low and high of fixed size(mxn)
arr = np.random.randint(low=0, high=100, size=(3,4))
print("2D array: \n", arr, end="\n\n")
# first row, first three columns
print("first row, first three columns: \n", arr[:1, :3], end="\n\n")
# all rows, fourth column (index 3)
print("all rows, fourth column: \n", arr[:, 3], end="\n\n")
# reversing the order of both rows and columns
print("reversing rows and columns together: \n",
     arr[::-1, ::-1], end="\n\n")
2D array: 
 [[74 19 29 29]
 [66  9 69 21]
 [25 76 27 49]]

first row, first three columns: 
 [[74 19 29]]

all rows, fourth column: 
 [29 21 49]

reversing rows and columns together: 
 [[49 27 76 25]
 [21 69  9 66]
 [29 29 19 74]]

Slices are views of the original array, not copies: they refer to the same memory. Changing the values in a slice also changes the original array.

In [10]:
arr1 = np.arange(5)
# slice arr1
arr2 = arr1[3:5]
print("arr1: \n", arr1, end="\n\n")
print("Sliced array: \n", arr2)
print('\nBefore changing, arr2[0]: \n',arr2[0])
# change value for 0th element of the slice
arr2[0] = 99
print('\nAfter changing arr2[0], arr1: \n',arr1)
arr1: 
 [0 1 2 3 4]

Sliced array: 
 [3 4]

Before changing, arr2[0]: 
 3

After changing arr2[0], arr1: 
 [ 0  1  2 99  4]
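
If you need a slice that you can modify without touching the original array, one option is to call .copy() on it. The sketch below uses np.shares_memory() to confirm which arrays share data. The names `view` and `independent` are just for illustration.

```python
import numpy as np

original = np.arange(5)

view = original[3:5]                # shares memory with `original`
independent = original[3:5].copy()  # has its own memory

print(np.shares_memory(original, view))
print(np.shares_memory(original, independent))

# Changing the copy leaves the original untouched
independent[0] = 99
print(original)
print(independent)
```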

03 - 01.05 Reshaping Arrays

We have been using the reshape method to view a one dimensional array as a multi-dimensional array. This nifty method only works if the new shape has the same total number of elements as the original array, i.e. size = m x n for a new shape of (m, n).
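
The size rule can be checked with a short sketch. It also shows that one dimension may be given as -1, in which case numpy works it out from the size of the array. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(12)

# 12 elements fit into 3 x 4; -1 asks numpy to infer the 4
auto = arr.reshape(3, -1)
print(auto.shape)

# 5 x 3 needs 15 elements, so this reshape is rejected
try:
    arr.reshape(5, 3)
    mismatch = False
except ValueError as exc:
    mismatch = True
    print("ValueError:", exc)
```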

One can also create row and column vectors from a one dimensional array by inserting a new axis with np.newaxis.

In [11]:
arr = np.random.randint(low=0, high=100, size=12)
print("Original Array: \n", arr, end="\n\n")
print("Reshaped to 3 x 4: \n", arr.reshape(3,4), end="\n\n")
print("Row vector : \n", arr[np.newaxis, :], end="\n\n")
print("Column vector : \n", arr[:, np.newaxis], end="\n\n")
Original Array: 
 [34 79 54  9 39 86 99  2 61  4 93 94]

Reshaped to 3 x 4: 
 [[34 79 54  9]
 [39 86 99  2]
 [61  4 93 94]]

Row vector : 
 [[34 79 54  9 39 86 99  2 61  4 93 94]]

Column vector : 
 [[34]
 [79]
 [54]
 [ 9]
 [39]
 [86]
 [99]
 [ 2]
 [61]
 [ 4]
 [93]
 [94]]

03 - 01.06 Concatenating Arrays

Just like Python lists, two arrays can be concatenated, using Numpy's concatenate(), hstack() and vstack() functions.

However, you must remember that, just like with lists, when you combine Numpy arrays an actual copy of both arrays is made. If you created the two arrays separately, they may sit anywhere in memory, and there is no way to represent them as a single Numpy array without copying. It is always advisable to know beforehand the size of the array that you will need, so that you can start with one big array and have each of the small arrays be a view into it (you can leverage the power of slicing!)

In [12]:
# Creating two 1D arrays separately
arr1 = np.arange(10)
arr2 = np.arange(10, 20)
arr3 = np.concatenate((arr1, arr2))
print("Arr1: \n{}".format(arr1), end="\n\n")
print("Arr2: \n{}".format(arr2), end="\n\n")
print("Concatenated Array: \n{}".format(arr3))
Arr1: 
[0 1 2 3 4 5 6 7 8 9]

Arr2: 
[10 11 12 13 14 15 16 17 18 19]

Concatenated Array: 
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]
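
The advice about starting with one big array can be sketched as follows. This is only an illustration of the idea, with made-up names: we allocate the full array once and fill it through two slices, so no concatenation (and no extra copy of the result) is needed.

```python
import numpy as np

# Allocate the full array once (contents are uninitialized for now)
big = np.empty(20, dtype=int)

# The two halves are views into `big`, not separate arrays
first = big[:10]
second = big[10:]

# Writing through the views fills the big array in place
first[:] = np.arange(10)
second[:] = np.arange(10, 20)

print(big)
print(np.shares_memory(big, first), np.shares_memory(big, second))
```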

For concatenating two multi-dimensional arrays, it is often more readable to use hstack() and vstack(), which stack the arrays horizontally (column-wise) and vertically (row-wise) respectively.

In [13]:
arr1 = np.random.randint(1, 10, 8).reshape(2, 4)
arr2 = np.random.randint(90, 100, 8).reshape(2, 4)
# stacking horizontally
hs_arr = np.hstack((arr1, arr2))
# stacking vertically
vs_arr = np.vstack((arr1, arr2))
print("Arr1: \n{}".format(arr1), end="\n\n")
print("Arr2: \n{}".format(arr2), end="\n\n")
print("Horizontally Stacked Array: \n{}".format(hs_arr), end="\n\n")
print("Vertically Stacked Array: \n{}".format(vs_arr), end="\n\n")
Arr1: 
[[7 9 2 2]
 [8 5 3 3]]

Arr2: 
[[91 98 98 98]
 [99 93 94 97]]

Horizontally Stacked Array: 
[[ 7  9  2  2 91 98 98 98]
 [ 8  5  3  3 99 93 94 97]]

Vertically Stacked Array: 
[[ 7  9  2  2]
 [ 8  5  3  3]
 [91 98 98 98]
 [99 93 94 97]]
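
For two dimensional arrays, hstack() and vstack() give the same results as concatenate() with an explicit axis argument. The sketch below checks this on two small arrays made up for the example.

```python
import numpy as np

a = np.arange(8).reshape(2, 4)
b = np.arange(8, 16).reshape(2, 4)

# axis=1 joins along the columns, like hstack()
same_h = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))

# axis=0 joins along the rows, like vstack()
same_v = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))

print(same_h, same_v)
print(np.concatenate((a, b), axis=1).shape)
print(np.concatenate((a, b), axis=0).shape)
```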

03 - 01.07 Splitting Arrays

Just as multiple arrays can be concatenated into one, Numpy's split(), hsplit() and vsplit() allow one array to be split into multiple smaller ones.

In [14]:
arr1 = np.arange(20)
np.split(arr1, (2, 8, 10, 14))
Out[14]:
[array([0, 1]),
 array([2, 3, 4, 5, 6, 7]),
 array([8, 9]),
 array([10, 11, 12, 13]),
 array([14, 15, 16, 17, 18, 19])]

np.split() takes the array that we want to split as its first argument. As its second argument, we passed a tuple (a list works too) of the indexes at which we want to split the array. N split-points lead to N + 1 subarrays.
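
The second argument can also be a single integer, in which case the array is divided into that many equal parts. The sketch below shows this, along with np.array_split(), which tolerates parts of unequal size. The variable names are illustrative.

```python
import numpy as np

arr = np.arange(12)

# An integer asks for that many equal-sized pieces
parts = np.split(arr, 3)
print(parts)

# 12 elements cannot be divided into 5 equal pieces
try:
    np.split(arr, 5)
    unequal_error = False
except ValueError as exc:
    unequal_error = True
    print("ValueError:", exc)

# array_split() allows the pieces to differ in length
loose_parts = np.array_split(arr, 5)
print([len(p) for p in loose_parts])
```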

Similarly for multi-dimensional arrays, we can use hsplit() and vsplit()

In [15]:
arr2d = np.random.randint(0, 9, (3,3))
print("Original Array: \n{}".format(arr2d), end="\n\n")
# split along horizontal axis
arr1, arr2 = np.hsplit(arr2d, [2])
print("First Split: \n{}".format(arr1), end="\n\n")
print("Remaining Split: \n{}".format(arr2), end="\n\n")
Original Array: 
[[8 5 4]
 [8 5 2]
 [4 8 3]]

First Split: 
[[8 5]
 [8 5]
 [4 8]]

Remaining Split: 
[[4]
 [2]
 [3]]
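
For completeness, here is a similar sketch for vsplit(), which splits along the rows instead of the columns. We use a fixed array here (not the random one above) so that the result is predictable.

```python
import numpy as np

arr2d = np.arange(12).reshape(4, 3)

# Split after the first row: one row on top, three rows below
top, bottom = np.vsplit(arr2d, [1])
print(top)
print(bottom)

# vsplit() behaves like split() along axis 0
top_alt, bottom_alt = np.split(arr2d, [1], axis=0)
print(np.array_equal(top, top_alt), np.array_equal(bottom, bottom_alt))
```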
