
Ufuncs

Python’s default implementation performs some operations slowly, in part because the language is dynamic and interpreted. This is what allows types to be flexible, but since the type has to be checked at every operation, a sequence of operations cannot be compiled down to efficient machine code as in languages like C. Let’s take a look at a plain Python loop that computes the sine of every element of an array:

from __future__ import print_function
import numpy as np
def get_sin(arr):
    # Create an empty output array of same size as input
    output = np.empty_like(arr)
    for i in range(len(output)):
        output[i] = np.sin(arr[i])
    return output
input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time get_sin(input_arr)
CPU times: user 13 s, sys: 53.9 ms, total: 13.1 s
Wall time: 13.1 s

array([-0.64745651, -0.99996545,  0.99756044, ..., -0.96197339,
        0.85711429,  0.99999453])

Even though the above implementation is correct and might look optimized to people who are familiar with languages like C and Java, the loop takes a significant amount of time (check the total CPU time: about 13 s for 10 million elements) and is very inefficient for the reasons mentioned above.

This is where NumPy’s ufuncs (universal functions) come to save the day. NumPy provides a convenient interface to statically typed, compiled routines of this kind. Using them is known as a vectorized operation: you simply perform the operation on the array, and it is applied to each element.

The vectorized approach is designed to push the loop part of the code into the compiled layer that underlies NumPy, leading to much faster execution.

Let’s take a look at the NumPy ufunc-based solution for the same example:

input_arr = np.random.uniform(-np.pi, np.pi, 10000000)
%time np.sin(input_arr)
CPU times: user 133 ms, sys: 450 µs, total: 133 ms
Wall time: 134 ms

array([-0.9657678 ,  0.79841143,  0.28326751, ..., -0.88320409,
       -0.79466346,  0.70549672])

That’s much faster, right? The wall time drops from 13.1 s to 134 ms, roughly a 100x speedup.
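If you are not working in a notebook, the `%time` magic is unavailable. The following is a minimal sketch of the same comparison using the standard `timeit` module. It assumes a smaller array (100,000 elements) so that the loop finishes quickly, and the name `get_sin_loop` is only illustrative. The exact timings will differ from machine to machine.

```python
import timeit

import numpy as np


def get_sin_loop(arr):
    """Element-by-element sine using an explicit Python loop."""
    output = np.empty_like(arr)
    for i in range(len(arr)):
        output[i] = np.sin(arr[i])
    return output


# Smaller than the array in the text, so the slow loop stays bearable.
np.random.seed(0)
small_arr = np.random.uniform(-np.pi, np.pi, 100000)

# Run each version once and record the elapsed wall time in seconds.
loop_time = timeit.timeit(lambda: get_sin_loop(small_arr), number=1)
ufunc_time = timeit.timeit(lambda: np.sin(small_arr), number=1)

# Both approaches should produce the same values; only the speed differs.
same_values = np.allclose(get_sin_loop(small_arr), np.sin(small_arr))

print("loop  : {:.4f} s".format(loop_time))
print("ufunc : {:.4f} s".format(ufunc_time))
print("same values:", same_values)
```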

You can also use these ufuncs on multi-dimensional arrays.

arr = np.random.randint(1, 100, (3, 4))
# take reciprocal
print("Original Array: \n{}".format(arr), end="\n\n")
print("Reciprocal: \n{}".format(1/arr), end="\n\n")
Original Array: 
[[96 45 96 63]
 [18 65 61 42]
 [31 21 65 19]]

Reciprocal: 
[[0.01041667 0.02222222 0.01041667 0.01587302]
 [0.05555556 0.01538462 0.01639344 0.02380952]
 [0.03225806 0.04761905 0.01538462 0.05263158]]
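As a further sketch of the same idea, a named ufunc such as `np.sqrt` also works on a 2-D array and keeps its shape. The array below is made up for illustration. One caveat worth knowing: `np.reciprocal` is not designed for integer input, so the example casts to float first.

```python
import numpy as np

# A 3x4 integer array holding the values 1..12.
arr = np.arange(1, 13).reshape(3, 4)

# The ufunc is applied to every element; the output keeps the input shape.
roots = np.sqrt(arr)

# np.reciprocal is not meant for integer arrays, so cast to float first.
recip = np.reciprocal(arr.astype(float))

print("input shape :", arr.shape)
print("output shape:", roots.shape)
print("np.reciprocal matches 1/arr:", np.allclose(recip, 1 / arr))
```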

Array Mathematics

Arithmetic Operations

Python’s native arithmetic operators can be used directly on arrays as convenient wrappers for NumPy’s ufuncs, which apply the operation to every element of the array.

x = np.arange(-5, 5)
print("x      =", x)
print("x + 10  =", x + 10) # wrapper for np.add
print("x - 10  =", x - 10) # wrapper for np.subtract
print("x * 4  =", x * 4)  # wrapper for np.multiply
print("x / 4  =", x / 4)  # wrapper for np.divide
print("x % 4  =", x % 4)  # wrapper for np.mod
print("x // 4 =", x // 4) # wrapper for np.floor_divide
print("x ** 2 =", x ** 2) # wrapper for np.power
print("abs(x) =", abs(x)) # wrapper for np.abs
x      = [-5 -4 -3 -2 -1  0  1  2  3  4]
x + 10  = [ 5  6  7  8  9 10 11 12 13 14]
x - 10  = [-15 -14 -13 -12 -11 -10  -9  -8  -7  -6]
x * 4  = [-20 -16 -12  -8  -4   0   4   8  12  16]
x / 4  = [-1.25 -1.   -0.75 -0.5  -0.25  0.    0.25  0.5   0.75  1.  ]
x % 4  = [3 0 1 2 3 0 1 2 3 0]
x // 4 = [-2 -1 -1 -1 -1  0  0  0  0  1]
x ** 2 = [25 16  9  4  1  0  1  4  9 16]
abs(x) = [5 4 3 2 1 0 1 2 3 4]
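To confirm the operator-to-ufunc correspondence, the sketch below compares each operator with the ufunc it wraps. It assumes Python 3, where `/` is true division. The dictionary name `checks` is only illustrative.

```python
import numpy as np

x = np.arange(-5, 5)

# Each entry compares an operator expression with the matching ufunc call.
checks = {
    "x + 10": np.array_equal(x + 10, np.add(x, 10)),
    "x - 10": np.array_equal(x - 10, np.subtract(x, 10)),
    "x * 4": np.array_equal(x * 4, np.multiply(x, 4)),
    "x / 4": np.array_equal(x / 4, np.divide(x, 4)),
    "x % 4": np.array_equal(x % 4, np.mod(x, 4)),
    "x // 4": np.array_equal(x // 4, np.floor_divide(x, 4)),
    "x ** 2": np.array_equal(x ** 2, np.power(x, 2)),
    "abs(x)": np.array_equal(abs(x), np.absolute(x)),
}

for expr, matches in checks.items():
    print("{:7s} matches its ufunc: {}".format(expr, matches))
```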

The above operations were performed on an integer array, and in most cases the result has the same datatype as the array being operated on. However, when an operation produces a result of a different datatype, or combines arrays of different datatypes, the resulting array takes the more general (more precise) of the types. This is known as upcasting.

In the above example, check the output of division (/). Can you find the type of that array?
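One way to answer this is to inspect the `dtype` attribute of each result. The sketch below does that for a few of the operations. The default integer type depends on the platform, so the integer results are compared against `x.dtype` rather than a fixed type.

```python
import numpy as np

x = np.arange(-5, 5)          # integer array (exact integer type is platform dependent)

added = x + 10                # int op int: stays an integer array
floored = x // 4              # floor division of integers: stays an integer array
divided = x / 4               # true division: upcast to float64
mixed = x + 0.5               # integer array plus a float: upcast to float64

print("x.dtype        :", x.dtype)
print("(x + 10).dtype :", added.dtype)
print("(x // 4).dtype :", floored.dtype)
print("(x / 4).dtype  :", divided.dtype)
print("(x + 0.5).dtype:", mixed.dtype)

# np.result_type reports the type NumPy would pick when combining two dtypes.
combined = np.result_type(np.int32, np.float64)
print("int32 with float64 ->", combined)
```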

When standard mathematical operations are used with NumPy arrays, they are applied on an element-by-element basis, and a new array is created and filled with the result. This means that the arrays should have the same shape when a mathematical operation is performed on them.

arr1 = np.array([1., 2., 3., 4.])
arr2 = np.linspace(4, 16, num=4)
print("Array1: \n{}".format(arr1), end="\n\n")
print("Array2: \n{}".format(arr2), end="\n\n")
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")
Array1: 
[1. 2. 3. 4.]

Array2: 
[ 4.  8. 12. 16.]


 Array2 - Array1: 
 [ 3.  6.  9. 12.]

However, if there is a size mismatch, we receive a ValueError:

arr2 = np.linspace(4, 16, num=3)
print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-7-6db414b41f4e> in <module>
      1 arr2 = np.linspace(4, 16, num=3)
----> 2 print("\n Array2 - Array1: \n {}".format(arr2-arr1), end="\n\n")


ValueError: operands could not be broadcast together with shapes (3,) (4,) 

You might wonder why we did not get a broadcast error when we added a single number to an array. We shall look at this in the module on Broadcasting.
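Until then, here is a small sketch of how the three cases behave side by side: matching shapes, mismatched shapes, and an array combined with a single number. The helper `try_subtract` is hypothetical and exists only to catch the error for the demonstration.

```python
import numpy as np

arr1 = np.array([1., 2., 3., 4.])


def try_subtract(a, b):
    """Return a - b, or None if the shapes cannot be broadcast together."""
    try:
        return a - b
    except ValueError as err:
        print("shapes {} and {} failed: {}".format(np.shape(a), np.shape(b), err))
        return None


same_shape = try_subtract(np.linspace(4, 16, num=4), arr1)   # (4,) and (4,)
mismatch = try_subtract(np.linspace(4, 16, num=3), arr1)     # (3,) and (4,)
with_scalar = try_subtract(arr1, 10)                         # (4,) and a scalar

print("same shape :", same_shape)
print("mismatch   :", mismatch)
print("with scalar:", with_scalar)
```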

Trigonometric Functions

Just as with arithmetic operations, NumPy provides a number of trigonometric ufuncs. Let’s take a look at some. Note that they work in radians.

input_arr = np.random.uniform(-1, 1, 5)
print("Input Array: \n{}".format(input_arr), end="\n\n")
print("sin: \n{}".format(np.sin(input_arr)), end="\n\n")
print("cos: \n{}".format(np.cos(input_arr)), end="\n\n")
print("tan: \n{}".format(np.tan(input_arr)), end="\n\n")
print("arcsin: \n{}".format(np.arcsin(input_arr)), end="\n\n")
print("arccos: \n{}".format(np.arccos(input_arr)), end="\n\n")
print("arctan: \n{}".format(np.arctan(input_arr)), end="\n\n")
Input Array: 
[-0.95633706 -0.49033535 -0.48827774  0.5121991   0.16100066]

sin: 
[-0.81708531 -0.47092175 -0.46910558  0.49009532  0.16030601]

cos: 
[0.57651678 0.88217498 0.88314209 0.87166885 0.98706737]

tan: 
[-1.4172793  -0.53381898 -0.53117792  0.56224943  0.16240635]

arcsin: 
[-1.27420085 -0.51247449 -0.51011515  0.53774331  0.16170446]

arccos: 
[2.84499718 2.08327082 2.08091147 1.03305302 1.40909187]

arctan: 
[-0.76308315 -0.45588604 -0.4542259   0.47335919  0.15963079]
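Since these ufuncs work in radians, angles given in degrees need converting first. The sketch below uses `np.deg2rad` and `np.rad2deg` for that, with a few hand-picked angles, and checks two well-known properties: the identity sin² + cos² = 1, and that `np.arcsin` undoes `np.sin` for angles between 0 and 90 degrees.

```python
import numpy as np

angles_deg = np.array([0., 30., 45., 60., 90.])
angles_rad = np.deg2rad(angles_deg)      # convert degrees to radians

s = np.sin(angles_rad)
c = np.cos(angles_rad)

# The Pythagorean identity should hold for every element (up to rounding).
identity_holds = np.allclose(s ** 2 + c ** 2, 1.0)

# arcsin inverts sin on this range, so we should get the original angles back.
round_trip_deg = np.rad2deg(np.arcsin(s))

print("sin^2 + cos^2 == 1:", identity_holds)
print("round trip (deg)  :", np.round(round_trip_deg, 6))
```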

Logarithmic Functions

NumPy provides logarithmic ufuncs for different bases:

input_arr = np.random.randint(1, 7, 5)
print("x        =", input_arr)
print("ln(x)    =", np.log(input_arr))
print("log2(x)  =", np.log2(input_arr))
print("log10(x) =", np.log10(input_arr))
x        = [5 6 4 6 2]
ln(x)    = [1.60943791 1.79175947 1.38629436 1.79175947 0.69314718]
log2(x)  = [2.32192809 2.5849625  2.         2.5849625  1.        ]
log10(x) = [0.69897    0.77815125 0.60205999 0.77815125 0.30103   ]
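For a base other than e, 2 or 10, one option is the change-of-base formula log_b(x) = ln(x) / ln(b). The sketch below wraps it in a helper called `log_base`, which is a hypothetical name and not part of NumPy. It also shows `np.log1p`, which computes ln(1 + x) and is intended to stay accurate when x is very small.

```python
import numpy as np


def log_base(values, base):
    """Logarithm of `values` in an arbitrary base via the change-of-base formula."""
    return np.log(values) / np.log(base)


x = np.array([1., 2., 4., 8., 100.])
powers_of_three = np.array([1., 3., 9., 27.])

log3 = log_base(powers_of_three, 3)

print("log3(x)                  =", log3)
print("matches np.log2 (base 2) :", np.allclose(log_base(x, 2), np.log2(x)))
print("matches np.log10 (base 10):", np.allclose(log_base(x, 10), np.log10(x)))

# np.log1p(x) computes ln(1 + x) for very small x.
tiny = 1e-10
small_log = np.log1p(tiny)
print("log1p(1e-10)             =", small_log)
```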

As the counterpart of the logarithms, NumPy also has exponential ufuncs:

input_arr = np.random.randint(1, 7, 5)
print("x     =", input_arr)
print("e^x   =", np.exp(input_arr))
print("2^x   =", np.exp2(input_arr))
print("10^x   =", np.power(10, input_arr))
x     = [5 1 4 4 2]
e^x   = [148.4131591    2.71828183  54.59815003  54.59815003   7.3890561 ]
2^x   = [32.  2. 16. 16.  4.]
10^x   = [100000     10  10000  10000    100]
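Notice that the 10^x output above is an integer array, while e^x and 2^x are floats. The sketch below, using made-up input values, shows how the base passed to `np.power` decides the result type, and checks that `np.log` undoes `np.exp`. A float base is the safer choice when the powers could grow beyond the integer range.

```python
import numpy as np

x = np.array([1, 2, 4, 5])

int_powers = np.power(10, x)       # integer base and exponents: integer result
float_powers = np.power(10.0, x)   # float base: float64 result

# exp and log are inverses, so this should recover x (up to rounding).
round_trip = np.log(np.exp(x))

# np.exp2 should agree with raising 2.0 to the same powers.
exp2_matches = np.allclose(np.exp2(x), np.power(2.0, x))

print("np.power(10, x)   =", int_powers, int_powers.dtype)
print("np.power(10.0, x) =", float_powers, float_powers.dtype)
print("log(exp(x))       =", round_trip)
print("exp2 matches 2.0**x:", exp2_matches)
```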
Mohit Sharma
Senior Software Development Engineer, DevOps

DevOps engineer with a strong Linux background and over a decade of experience designing, automating and managing mission critical infrastructure deployments by leveraging SRE principles and other DevOps processes. Expert in scripting using python with an emphasis on real-time, high speed data pipelines and distributed computing across networks.