Arrays through Numpy
Numpy is a Python library. But if you are new to all this, you might be asking, what is a library?
To put it really simply (maybe a little too simply), libraries are just collections of scripts that you can use for specific purposes. You could technically code everything from scratch using just basic Python, but why re-invent the wheel when someone else has already done it before?
What else, other than Numpy?
For the analysis of data, the most common libraries that we will use, alongside what they are good for, are as follows:
Numpy — Arrays, linear algebra operations, Fourier transformations, and random number capabilities
Pandas — Data structures and functions for analyzing data: labeled array data structures (Series and DataFrame; the older Panel structure has since been removed from pandas), aggregation functions, datetime functions, I/O operations
Matplotlib — line plots, contour plots, scatter plots, and Basemap plots
Scikit-learn — supports various machine learning models, such as classification, regression, and clustering algorithms
Don’t worry too much about the technical terms above. We will get into them in subsequent articles. For now, let’s focus on Numpy.
If you are familiar with Matlab, then this is going to feel familiar. Numpy lets you work with the same kinds of matrices and matrix operations. If not, just google for an introduction to matrix algebra (Khan Academy is one of many good sources out there).
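As a quick preview of what "matrix operations" means in practice, here is a minimal sketch. The matrices `A` and `B` are made-up values chosen only for illustration, and the Matlab equivalents in the comments are there for orientation.

```python
import numpy as np

# Two small 2x2 matrices, made up purely for illustration
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B   # element-by-element product (A .* B in Matlab)
product = A @ B       # true matrix product (A * B in Matlab)
transposed = A.T      # transpose (A' in Matlab)

print(elementwise)    # [[ 5 12] [21 32]]
print(product)        # [[19 22] [43 50]]
print(transposed)     # [[1 3] [2 4]]
```

Note that `*` is element-wise in Numpy, which is the opposite of Matlab's default, so `@` is the operator to reach for when you want a matrix product.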
Let’s first import Numpy and create our first array.
import numpy as np
new_array = np.array([[1, 2], [3, 4], [5, 6]])  # creating a new 3x2 array - using a list of lists
new_array
Output
array([[1, 2],
       [3, 4],
       [5, 6]])
We can check on the dimensions, shape and data type of the array easily.
new_array.ndim  # number of dimensions will be 2
new_array.shape  # shape of array will be (3, 2), i.e. 3 x 2
len(new_array)  # length of the array's first dimension will be 3
new_array.dtype  # dtype will be int64 on most platforms
There is also a whole set of helper functions that can be used to create arrays.
# creating an empty array - beware that its contents are uninitialized, not necessarily zero
np.empty([2, 2], dtype=np.float64)
# creating an identity matrix (np.int was removed in NumPy 1.24, so use the built-in int)
np.eye(3, dtype=int)
# create array of ones
np.ones([3, 3], dtype=int)
# create array of zeros
np.zeros([4, 3])
# array based on a range (3 up to, but not including, 10)
np.arange(3, 10)
# fill array with specified number
np.full((3, 3), 4, dtype=np.float32)
Accessing one or more elements within an array is simple.
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # a 3x3 array to index into
a[0, 2]  # row 0, col 2
a[:, 2]  # all rows, col 2
a[2, :]  # row 2, all cols
There is a whole range of other functions that can easily be applied to Numpy arrays, which we outline in the Jupyter notebook here.
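To make the indexing patterns above a bit more concrete, here is a small self-contained sketch that also slices out a sub-block. The array `grid` and the variable names are hypothetical, picked only so the results are easy to read off.

```python
import numpy as np

# Hypothetical 3x4 array holding 0..11, used only for illustration
grid = np.arange(12).reshape(3, 4)

first_row = grid[0, :]    # row 0, all columns -> 1-D array of length 4
last_col = grid[:, -1]    # all rows, last column -> 1-D array of length 3
block = grid[0:2, 1:3]    # rows 0-1, cols 1-2 -> 2x2 sub-array

print(first_row, first_row.shape)  # [0 1 2 3] (4,)
print(last_col, last_col.shape)    # [ 3  7 11] (3,)
print(block.shape)                 # (2, 2)
```

Selecting a single row or column drops that dimension, which is why `first_row` and `last_col` come back as 1-D arrays, not as 1x4 or 3x1 matrices.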
Another area we cover in the Jupyter notebook relates to the speed that Numpy offers.
Often we will loop through a list to, say, compute the mean of all the numbers in it. Most of the time, that is fast enough. But if the list is huge, using a for loop to do the computation can be really slow. Numpy is fantastic when it comes to speeding up such calculations.
# Let's generate a random sequence of numbers, which could be treated as, say, returns
import numpy.random as npr
randlist = []
npr.seed(1000)
for i in range(0, 10000000):
    randlist.append(npr.standard_normal())
With a for loop
%%time
rand_sum = 0
for num in randlist:
    rand_sum = rand_sum + num
print("Mean is {0:.5f}".format(rand_sum/len(randlist)))
Output
Mean is 0.00021
Wall time: 1.46 s
With Numpy array
%%time
np_avg = np.mean(randlist)
print("Mean is %.5f" % np_avg)
Output
Mean is 0.00021
Wall time: 364 ms
The Jupyter notebook with the code, and some other useful tips, is here
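One caveat on the timings above: `np.mean` was handed a plain Python list, so part of its measured time goes into converting that list to an array first. The sketch below is one way to time the two approaches outside Jupyter, with the data already stored as a Numpy array. The sample size, seed and variable names are assumptions made for this example (the size is smaller than the article's 10 million so it runs quickly), and the timings will differ from machine to machine.

```python
import time
import numpy as np

# Assumed setup: 1 million standard-normal draws, seed chosen arbitrarily
rng = np.random.default_rng(1000)
values = rng.standard_normal(1_000_000)   # data held as a Numpy array
values_list = values.tolist()             # same data as a plain Python list

# Mean with a plain Python for loop over the list
start = time.perf_counter()
total = 0.0
for num in values_list:
    total += num
loop_mean = total / len(values_list)
loop_seconds = time.perf_counter() - start

# Mean computed by Numpy directly on the array (no list conversion needed)
start = time.perf_counter()
array_mean = values.mean()
numpy_seconds = time.perf_counter() - start

print(f"Loop:  mean {loop_mean:.5f} in {loop_seconds:.4f} s")
print(f"Numpy: mean {array_mean:.5f} in {numpy_seconds:.4f} s")
```

Both approaches should agree on the mean up to floating-point rounding; only the elapsed time should differ.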


