Introduction
I am going to go for a Raymond Hettinger style presentation, https://www.cs.odu.edu/~tkennedy/cs330/s21/Public/languageResources/#python-programming-videos.
These materials are web-centric (i.e., do not need to be printed and are available at https://www.cs.odu.edu/~tkennedy/numpy-workshop).
Who am I?
I have taught various courses, including:
- CS 300T - Computers in Society
- CS 333 - Programming and Problem Solving
- CS 330 - Object Oriented Programming and Design
- CS 350 - Introduction to Software Engineering
- CS 410 - Professional Workforce Development I
- CS 411W - Professional Workforce Development II
- CS 417 - Computational Methods & Software
Most of my free time is spent writing Python 3 and Rust code, tweaking my Vim configuration, or learning a new (programming) language. My current language of interests are Rust (at the time of writing) and Python (specifically the NumPy library).
Referenced Courses & Materials
I may reference materials (e.g., lecture notes) and topics from various courses, including:
- CS 330 - Object Oriented Programming & Design
- CS 350 - Introduction to Software Engineering
- CS 417 - Computational Methods & Software
I may also reference a couple examples from the previous:
Overview
What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.
Retrieved from https://numpy.org/doc/stable/user/whatisnumpy.html
We can only scratch the surface during a one (1) hour workshop.
A Few Short Examples
My original intention was to focus on a few short (10 to 20 line) examples that focused on arrays and statistics... with a little linear algebra at the end.
If you are interested in a collection of problems to develop your NumPy understanding... Coffee Break NumPy by Christian Mayer is an excellent read. In fact... I read the book during the 2020 Winter Break.
Let us start with a few short NumPy basics (e.g., arrays and broadcasting) then build a Matrix Solver!.
Creating Arrays
The code snippets from this section are extracted from array_creation.py.
NumPy arrays can be...
-
Initialized to all zeroes
array_size = 8 zeroes_array = np.zeros(array_size) print(zeroes_array) print()
-
Initialized to all ones
array_size = 12 ones_array = np.ones(array_size) print(ones_array) print()
-
Allocated and left uninitialized
# Contents are "whatever happens to be in memory" array_size = 16 unitialized_array = np.empty(array_size) print(unitialized_array) print()
-
Created from a Python List of
int
-spython_list = [2, 4, 8, 16, 32, 64] np_array = np.array(python_list) print(np_array) print()
-
Created from a Python List of
float
-spython_list = [2., 4., 8., 16., 32., 64.] np_array = np.array(python_list) print(np_array) print()
Broadcasting
The next couple snippets are extracted from broadcasting.py.
In Python... each element of a list must be updated one at a time. If a list of prices needed to be reduced by 10%, each one would need to be multiplied by 0.9 within a loop...
prices = [1.00, 2.95, 8.40, 3.50, 3.30, 16.91]
for idx in range(len(prices)):
price[idx] *= 0.9
or using a list comprehension...
prices = [1.00, 2.95, 8.40, 3.50, 3.30, 16.91]
prices = [0.9 * price for price in prices]
print(prices)
NumPy's broadcasting mechanic allows us to write a simple prices *= 0.9
.
prices = np.array([1.00, 2.95, 8.40, 3.50, 3.30, 16.91])
prices *= 0.9
print(prices)
print()
print("*" * 80)
print()
The obvious benefit is less typing. The more important one is optimization. NumPy's core is implemented in C. The official NumPy Documentation provides a succinct overview in its Why is NumPy Fast? section.
How much faster is NumPy? Let us run a quick using benchmark_broadcasting.py.
num_values = 1000000
num_runs = 100
def op_wrapper_py():
prices = range(1, num_values, 1)
prices = [0.9 * price for price in prices]
py_list = timeit.timeit(op_wrapper_py, number=num_runs)
def op_wrapper_np():
prices = np.arange(0, num_values, 1, dtype=np.float64)
prices[:] *= 0.9
np_array = timeit.timeit(op_wrapper_np, number=num_runs)
print(f"Python Time: {py_list:.4f}")
print(f"NumPy Time : {np_array:.4f}")
On a Core i7-6700k... For 1 million numbers, run 100 times... The NumPy code is a little over 10 times faster than the pure Python code.
Time | |
---|---|
Python | 5.1248 |
NumPy | 0.3168 |
Remaining Topics
There are a few more topics to introduce:
- Indexing - will be covered as part of the Matrix Solver example
- I/O
- Index Arrays
- Boolean (Mask) Index Arrays
Implementing a Matrix Solver
Linear Algebra
In CS 417/517 Computational Methods... I require students to implement a Matrix Solver... from scratch. NumPy provides implementations of matrix operations (e.g., multiplication).
Let us implement a quick NumPy based Matrix solver, for this Discrete Case Least Squares Approximation Problem.
import numpy as np
def print_matrices(matrix_XTX, matrix_XTY):
"""
Print the XTX and XTY matrices
"""
print("{:*^40}".format("XTX"))
print(matrix_XTX)
print()
print("{:*^40}".format("XTY"))
print(matrix_XTY)
def _backsolve(matrix_XTX, matrix_XTY):
num_rows, _ = matrix_XTX.shape
for i in reversed(range(1, num_rows)):
for j in reversed(range(0, i)):
s = matrix_XTX[j, i]
matrix_XTX[j, i] -= (s * matrix_XTX[i, i])
matrix_XTY[j] -= (s * matrix_XTY[i])
def solve_matrix(matrix_XTX, matrix_XTY):
"""
Solve a matrix and return the resulting solution vector
"""
# Get the dimensions (shape) of the XTX matrix
num_rows, num_columns = matrix_XTX.shape
for i in range(0, num_rows):
# Find column with largest entry
largest_idx = i
current_col = i
for j in range(i + 1, num_rows):
if matrix_XTX[largest_idx, i] < matrix_XTX[j, current_col]:
largest_idx = j
# Swap
if largest_idx != current_col:
matrix_XTX[[i, largest_idx], :] = matrix_XTX[[largest_idx, i], :]
matrix_XTY[[i, largest_idx]] = matrix_XTY[[largest_idx, i]]
# Scale
scaling_factor = matrix_XTX[i, i]
matrix_XTX[i, :] /= scaling_factor
matrix_XTY[i] /= scaling_factor
# Eliminate
for row_i in range(i + 1, num_rows):
s = matrix_XTX[row_i][i]
matrix_XTX[row_i] = matrix_XTX[row_i] - s * matrix_XTX[i]
matrix_XTY[row_i] = matrix_XTY[row_i] - s * matrix_XTY[i]
# print("{:-^80}".format(f"Iteration #{i:}"))
# print_matrices(matrix_XTX, matrix_XTY)
_backsolve(matrix_XTX, matrix_XTY)
return matrix_XTY
def main():
# Set up input data points, X, Y, and XT
points = [(0., 0.), (1., 1.), (2., 4.)]
matrix_X = np.array([[1., 0., 0.],
[1., 1., 1.],
[1., 2., 4.]])
matrix_Y = np.array([0,
1,
4])
matrix_XT = matrix_X.transpose()
# Compute XTX and XTY
matrix_XTX = np.matmul(matrix_XT, matrix_X)
matrix_XTY = np.matmul(matrix_XT, matrix_Y)
print_matrices(matrix_XTX, matrix_XTY)
print()
print("{:-^40}".format("Solution"))
solution = solve_matrix(matrix_XTX, matrix_XTY)
print(solution)
if __name__ == "__main__":
main()