03-02 Ufuncs

Python's default implementation (CPython) performs some operations slowly. This is partly due to the dynamic, interpreted nature of the language: the flexibility of its types means the type has to be checked at every operation, so a sequence of operations cannot be compiled down to efficient machine code as in languages like C. Let's take a look at a native Python loop that computes the sine of every element of an array:

In [1]:
from __future__ import print_function
import numpy as np
def get_sin(arr):
    # Create an empty output array of same size as input
    output = np.empty_like(arr)
    for i in range(len(output)):
        output[i] = np.sin(arr[i])
    return output
In [2]:
input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time get_sin(input_arr)
CPU times: user 10.2 s, sys: 332 ms, total: 10.5 s
Wall time: 11.1 s
Out[2]:
array([-0.99637969, -0.48288803, -0.71654059, ..., -0.62246327,
       -0.96061903, -0.86626307])
  • IPython adds commands that further enhance its interactivity. These commands begin with % and are known as magic commands.
  • %time reports the time taken to execute a Python statement.
  • There are many built-in magic commands. As always, you can type a magic command in a code cell and append ? (or press Shift + <TAB> after it) to get its docstring.
  • Remember that magic commands are specific to IPython (and Jupyter notebooks). They cannot be used in plain Python code.

Even though the above implementation is correct and might look optimized to people who are familiar with languages like C and Java, the loop takes a significant amount of time (check the total CPU time) and is very inefficient, for the reasons mentioned above.

This is where NumPy's ufuncs (universal functions) come to save the day. NumPy provides a convenient interface to statically typed, compiled routines of this kind. Using them is known as a vectorized operation: you simply perform an operation on the array, and it is applied to each element.

The vectorized approach is designed to push the loop part of the code into the compiled layer that underlies NumPy, leading to much faster execution.

Let's take a look at a NumPy ufunc-based solution for the same example:

In [3]:
input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time np.sin(input_arr)
CPU times: user 165 ms, sys: 10.7 ms, total: 175 ms
Wall time: 194 ms
Out[3]:
array([ 0.84128428,  0.99238247,  0.80994785, ...,  0.21779299,
       -0.18411827,  0.25756921])

That's much faster, right?
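
Because %time only works inside IPython, the same comparison can be sketched in plain Python with the standard timeit module. The function name get_sin_loop and the smaller array size below are assumptions made for illustration, and the measured times will vary from machine to machine:

```python
from __future__ import print_function
import timeit
import numpy as np

def get_sin_loop(arr):
    # Element-by-element Python loop, as in get_sin above
    output = np.empty_like(arr)
    for i in range(len(output)):
        output[i] = np.sin(arr[i])
    return output

# A smaller array than in the cells above, so the loop finishes quickly
small_arr = np.random.uniform(-np.pi, np.pi, 100000)

# Time a single run of each approach
loop_seconds = timeit.timeit(lambda: get_sin_loop(small_arr), number=1)
ufunc_seconds = timeit.timeit(lambda: np.sin(small_arr), number=1)

print("loop : {:.4f} s".format(loop_seconds))
print("ufunc: {:.4f} s".format(ufunc_seconds))

# Both approaches should produce the same values
same_result = np.allclose(get_sin_loop(small_arr), np.sin(small_arr))
print("same result:", same_result)
```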

You can also use these ufuncs on multi-dimensional arrays.

In [4]:
arr = np.random.randint(1, 100, (3, 4))
# take reciprocal
print("Original Array: \n{}".format(arr), end="\n\n")
print("Reciprocal: \n{}".format(1/arr), end="\n\n")
Original Array: 
[[84 70 92 11]
 [64 93 16 41]
 [57 16  6 59]]

Reciprocal: 
[[0 0 0 0]
 [0 0 0 0]
 [0 0 0 0]]
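
The all-zero result above is most likely integer division at work: the notebook appears to run under Python 2, where / between integers discards the fractional part. A minimal sketch of one way around it, assuming floating-point reciprocals are what you want, is to make one operand a float. The array values below are copied from the output above:

```python
from __future__ import print_function
import numpy as np

arr = np.array([[84, 70, 92, 11],
                [64, 93, 16, 41],
                [57, 16,  6, 59]])

# A float numerator forces floating-point division on both Python 2 and 3
reciprocal = 1.0 / arr
print("Reciprocal: \n{}".format(reciprocal))
```

Adding from __future__ import division at the top of a Python 2 notebook is another option, since it makes / behave as true division everywhere.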

03 - 02.01 Array Mathematics

.. 02.01.01 Arithmetic Operations

Python's native operators can be used directly as convenient wrappers for NumPy's ufuncs, which apply the operation to every element of the array.

In [5]:
x = np.arange(-5, 5)
print("x      =", x)
print("x + 10  =", x + 10) # wrapper for np.add
print("x - 10  =", x - 10) # wrapper for np.subtract
print("x * 4  =", x * 4)  # wrapper for np.multiply
print("x / 4  =", x / 4)  # wrapper for np.divide
print("x % 4  =", x % 4)  # wrapper for np.mod
print("x // 4 =", x // 4) # wrapper for np.floor_divide
print("x ** 2 =", x ** 2) # wrapper for np.power
print("abs(x) =", abs(x)) # wrapper for np.abs
x      = [-5 -4 -3 -2 -1  0  1  2  3  4]
x + 10  = [ 5  6  7  8  9 10 11 12 13 14]
x - 10  = [-15 -14 -13 -12 -11 -10  -9  -8  -7  -6]
x * 4  = [-20 -16 -12  -8  -4   0   4   8  12  16]
x / 4  = [-2 -1 -1 -1 -1  0  0  0  0  1]
x % 4  = [3 0 1 2 3 0 1 2 3 0]
x // 4 = [-2 -1 -1 -1 -1  0  0  0  0  1]
x ** 2 = [25 16  9  4  1  0  1  4  9 16]
abs(x) = [5 4 3 2 1 0 1 2 3 4]
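
The correspondence between operators and ufuncs can be checked directly. The sketch below compares each operator with the ufunc named in the comments above; the dictionary name checks is only an illustrative choice:

```python
from __future__ import print_function
import numpy as np

x = np.arange(-5, 5)

# Each operator should give the same result as calling the ufunc explicitly
checks = {
    "add":          np.array_equal(x + 10, np.add(x, 10)),
    "subtract":     np.array_equal(x - 10, np.subtract(x, 10)),
    "multiply":     np.array_equal(x * 4, np.multiply(x, 4)),
    "divide":       np.array_equal(x / 4, np.divide(x, 4)),
    "mod":          np.array_equal(x % 4, np.mod(x, 4)),
    "floor_divide": np.array_equal(x // 4, np.floor_divide(x, 4)),
    "power":        np.array_equal(x ** 2, np.power(x, 2)),
    "abs":          np.array_equal(abs(x), np.abs(x)),
}

for name in sorted(checks):
    print("{:<12} matches operator: {}".format(name, checks[name]))
```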

The operations above were performed on an array of a single datatype, so the result has the same datatype as the array being operated on. However, when an operation produces values of a different datatype, or combines arrays of different datatypes, the resulting array takes the more general (more precise) type. This is known as upcasting.

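In the above example, check the output of division (/). Can you find the type of that array? Note that the integer result shown here comes from Python 2, where / on integer arrays floors the result; under Python 3, / performs true division and returns a float array.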
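
Upcasting is easiest to see by inspecting the dtype attribute of the result. The following is a small sketch with made-up arrays; the default integer dtype depends on the platform, so only the explicitly typed results are named in the comments:

```python
from __future__ import print_function
import numpy as np

int_arr = np.arange(5)                  # default integer dtype (platform dependent)
float_arr = np.linspace(0.0, 1.0, 5)    # float64

# Mixing integers and floats upcasts the result to float64
mixed = int_arr + float_arr
print(int_arr.dtype, "+", float_arr.dtype, "->", mixed.dtype)

# Mixing two integer widths upcasts to the wider one
small = np.array([1, 2, 3], dtype=np.int8)
large = np.array([1, 2, 3], dtype=np.int32)
widened = small + large
print(small.dtype, "+", large.dtype, "->", widened.dtype)

# A ufunc whose result is not an integer also changes the dtype
roots = np.sqrt(int_arr)
print("sqrt of", int_arr.dtype, "->", roots.dtype)
```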

When standard mathematical operators are used with NumPy arrays, they are applied element by element, and a new array is created and filled with the result. This means the arrays should be the same size when a mathematical operation is performed on them.

In [6]:
arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=4)
print("Array1: \n{}".format(arr1), end="\n\n")
print("Array2: \n{}".format(arr2), end="\n\n")
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")
Array1: 
[ 1.  2.  3.  4.]

Array2: 
[  4.   8.  12.  16.]


 Array2 - Array1: 
 [  3.   6.   9.  12.]

However, if there is a size mismatch, we receive a ValueError:

In [7]:
arr2 = np.linspace(4, 16, num=3)
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-a27f81dd39cf> in <module>()
      1 arr2 = np.linspace(4, 16, num=3)
----> 2 print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")

ValueError: operands could not be broadcast together with shapes (3,) (4,) 

You might wonder why we did not get a broadcast error when we added a single number to an array. We shall look at this in the module on Broadcasting.
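
The two cases can be put side by side in a short sketch: the mismatched subtraction is caught with try/except, while adding a single number succeeds. The variable names diff and shifted are illustrative only:

```python
from __future__ import print_function
import numpy as np

arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=3)

print("shapes:", arr1.shape, arr2.shape)

# Arrays of sizes 3 and 4 cannot be combined element by element
try:
    diff = arr2 - arr1
except ValueError as err:
    diff = None
    print("could not subtract:", err)

# A single number, however, combines with an array of any size
shifted = arr1 + 10
print("arr1 + 10 =", shifted)
```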

.. 02.01.02 Trigonometric Functions

Just like the arithmetic operations, NumPy provides a number of trigonometric ufuncs. Let's take a look at some:

In [8]:
input_arr = np.random.uniform(-1, 1, 5)
print("Input Array: \n{}".format(input_arr), end="\n\n")
print("sin: \n{}".format(np.sin(input_arr)), end="\n\n")
print("cos: \n{}".format(np.cos(input_arr)), end="\n\n")
print("tan: \n{}".format(np.tan(input_arr)), end="\n\n")
print("arcsin: \n{}".format(np.arcsin(input_arr)), end="\n\n")
print("arccos: \n{}".format(np.arccos(input_arr)), end="\n\n")
print("arctan: \n{}".format(np.arctan(input_arr)), end="\n\n")
Input Array: 
[-0.60242999 -0.10144717  0.47132173 -0.68604652 -0.76303577]

sin: 
[-0.56664636 -0.10127325  0.4540643  -0.63348311 -0.6911187 ]

cos: 
[ 0.8239611   0.99485865  0.89096892  0.77375652  0.72274127]

tan: 
[-0.68771008 -0.10179662  0.50962979 -0.81871118 -0.95624635]

arcsin: 
[-0.64654207 -0.10162198  0.4907888  -0.75604113 -0.86799692]

arccos: 
[ 2.21733839  1.67241831  1.08000752  2.32683745  2.43879325]

arctan: 
[-0.54220434 -0.10110128  0.44044292 -0.6012997  -0.65179193]
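
Two details are worth keeping in mind when using these ufuncs: the inputs are interpreted as radians, and the inverse functions undo the forward ones only within a limited range. A brief sketch, with example angles chosen purely for illustration:

```python
from __future__ import print_function
import numpy as np

# Angles given in degrees must be converted to radians first
degrees = np.array([0.0, 30.0, 90.0])
radians = np.deg2rad(degrees)
sines = np.sin(radians)
print("sin of 0, 30, 90 degrees:", sines)

# For values in [-1, 1] (which lie within [-pi/2, pi/2]),
# arcsin reverses sin up to floating-point error
x = np.random.uniform(-1, 1, 5)
round_trip = np.arcsin(np.sin(x))
print("round trip matches:", np.allclose(round_trip, x))
```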

.. 02.01.03 Logarithmic Functions

NumPy provides logarithmic ufuncs for different bases:

In [9]:
input_arr = np.random.randint(1, 7, 5)
print("x        =", input_arr)
print("ln(x)    =", np.log(input_arr))
print("log2(x)  =", np.log2(input_arr))
print("log10(x) =", np.log10(input_arr))
x        = [4 3 6 4 2]
ln(x)    = [ 1.38629436  1.09861229  1.79175947  1.38629436  0.69314718]
log2(x)  = [ 2.         1.5849625  2.5849625  2.         1.       ]
log10(x) = [ 0.60205999  0.47712125  0.77815125  0.60205999  0.30103   ]
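
For a base other than e, 2 or 10, one common approach is the change-of-base identity log_b(x) = ln(x) / ln(b). The helper log_base below is hypothetical, written for this example rather than taken from NumPy:

```python
from __future__ import print_function
import numpy as np

def log_base(x, base):
    """Hypothetical helper: logarithm of x in an arbitrary base."""
    return np.log(x) / np.log(base)

x = np.array([1, 3, 9, 27])
log3 = log_base(x, 3)
print("x       =", x)
print("log3(x) =", log3)

# The helper should agree with the built-in ufunc for base 2
agrees_with_log2 = np.allclose(log_base(x, 2), np.log2(x))
print("agrees with np.log2:", agrees_with_log2)
```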

As the counterpart of logarithms, we also have exponential ufuncs:

In [10]:
input_arr = np.random.randint(1, 7, 5)
print("x     =", input_arr)
print("e^x   =", np.exp(input_arr))
print("2^x   =", np.exp2(input_arr))
print("10^x   =", np.power(10, input_arr))
x     = [5 3 3 6 1]
e^x   = [ 148.4131591    20.08553692   20.08553692  403.42879349    2.71828183]
2^x   = [ 32.   8.   8.  64.   2.]
10^x   = [ 100000    1000    1000 1000000      10]
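
Since exponentials and logarithms are inverses of each other, applying one after the other should return the original values up to floating-point error. A quick sketch using a small example array; a float base is passed to np.power here so that the result is a float array:

```python
from __future__ import print_function
import numpy as np

x = np.array([1, 2, 3, 4])

# Each logarithm reverses the exponential with the same base
back_from_e = np.log(np.exp(x))
back_from_2 = np.log2(np.exp2(x))
back_from_10 = np.log10(np.power(10.0, x))

print("ln(e^x)     =", back_from_e)
print("log2(2^x)   =", back_from_2)
print("log10(10^x) =", back_from_10)
```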
