
Python Workshop 3: NumPy

Introduction

I am going for a Raymond Hettinger-style presentation (see https://www.cs.odu.edu/~tkennedy/cs330/s21/Public/languageResources/#python-programming-videos).

These materials are web-centric (i.e., they do not need to be printed) and are available at https://www.cs.odu.edu/~tkennedy/numpy-workshop.

Who am I?

I have taught various courses, including:

CS 300T - Computers in Society
CS 333 - Programming and Problem Solving
CS 330 - Object Oriented Programming and Design
CS 350 - Introduction to Software Engineering
CS 410 - Professional Workforce Development I
CS 411W - Professional Workforce Development II
CS 417 - Computational Methods & Software

Most of my free time is spent writing Python 3 and Rust code, tweaking my Vim configuration, or learning a new (programming) language. My current languages of interest are Rust (at the time of writing) and Python (specifically, the NumPy library).

Referenced Courses & Materials

I may reference materials (e.g., lecture notes) and topics from various courses, including:

CS 330 - Object Oriented Programming & Design
CS 350 - Introduction to Software Engineering
CS 417 - Computational Methods & Software

I may also reference a couple examples from the previous:

Overview

What is NumPy?

NumPy is the fundamental package for scientific computing in Python. It is a Python library that provides a multidimensional array object, various derived objects (such as masked arrays and matrices), and an assortment of routines for fast operations on arrays, including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete Fourier transforms, basic linear algebra, basic statistical operations, random simulation and much more.

Retrieved from https://numpy.org/doc/stable/user/whatisnumpy.html

We can only scratch the surface during a one (1) hour workshop.

A Few Short Examples

My original intention was to focus on a few short (10 to 20 line) examples covering arrays and statistics... with a little linear algebra at the end.

If you are interested in a collection of problems to develop your NumPy understanding... Coffee Break NumPy by Christian Mayer is an excellent read. In fact... I read the book during the 2020 Winter Break.

Let us start with a few NumPy basics (e.g., arrays and broadcasting), then build a Matrix Solver!

Creating Arrays

The code snippets from this section are extracted from array_creation.py.

NumPy arrays can be...

Initialized to all zeroes

    import numpy as np

    array_size = 8
    zeroes_array = np.zeros(array_size)
    print(zeroes_array)
    print()

Initialized to all ones

    array_size = 12
    ones_array = np.ones(array_size)
    print(ones_array)
    print()

Allocated and left uninitialized

    # Contents are "whatever happens to be in memory"
    array_size = 16
    uninitialized_array = np.empty(array_size)
    print(uninitialized_array)
    print()
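Since np.empty skips initialization, it only makes sense when every element will be overwritten before it is read. A minimal sketch of that pattern, with np.full shown as the option when every element should start at the same value (the size and values here are arbitrary):

```python
import numpy as np

array_size = 16

# np.empty only allocates memory; every slot must be written before it is read
squares = np.empty(array_size)
squares[:] = np.arange(array_size) ** 2

# np.full allocates and initializes every element to one value in a single call
sevens = np.full(array_size, 7.0)

print(squares)
print(sevens)
```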

Created from a Python List of int-s

    python_list = [2, 4, 8, 16, 32, 64]
    np_array = np.array(python_list)
    print(np_array)
    print()

Created from a Python List of float-s

    python_list = [2., 4., 8., 16., 32., 64.]
    np_array = np.array(python_list)
    print(np_array)
    print()
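The visible difference between the last two snippets is the element type (dtype), which NumPy infers once for the whole array from the list contents. A short sketch for inspecting it; note that the exact integer width (e.g., int64 vs. int32) is platform-dependent, so the check below only asks for "some integer type":

```python
import numpy as np

int_array = np.array([2, 4, 8, 16, 32, 64])
float_array = np.array([2., 4., 8., 16., 32., 64.])

# A list of ints yields an integer dtype; a list of floats yields float64
print(int_array.dtype)
print(float_array.dtype)

# Mixing ints and floats upcasts every element to float64
mixed_array = np.array([2, 4.5, 8])
print(mixed_array.dtype)

# A nested list (or a shape tuple passed to np.zeros/np.ones) gives a 2-D array
grid = np.array([[1, 2, 3], [4, 5, 6]])
print(grid.shape)
print(np.zeros((2, 3)).shape)
```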

Broadcasting

The next couple snippets are extracted from broadcasting.py.

In Python... each element of a list must be updated one at a time. If a list of prices needed to be reduced by 10%, each one would need to be multiplied by 0.9 within a loop...

    prices = [1.00, 2.95, 8.40, 3.50, 3.30, 16.91]

    for idx in range(len(prices)):
        prices[idx] *= 0.9

or using a list comprehension...

    prices = [1.00, 2.95, 8.40, 3.50, 3.30, 16.91]
    prices = [0.9 * price for price in prices]

    print(prices)

NumPy's broadcasting mechanism allows us to simply write prices *= 0.9, which multiplies every element of the array by 0.9.

    import numpy as np

    prices = np.array([1.00, 2.95, 8.40, 3.50, 3.30, 16.91])
    prices *= 0.9

    print(prices)

    print()
    print("*" * 80)
    print()
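Broadcasting is not limited to a scalar and an array. When two arrays have compatible shapes, NumPy conceptually stretches the smaller one across the larger one without actually making copies of it. A minimal sketch, assuming a hypothetical table of prices for three stores (rows) and two items (columns), where each item gets its own discount. The last few lines show a related pitfall: an in-place update keeps the array's dtype, so an integer array cannot absorb a float result.

```python
import numpy as np

# Rows are stores, columns are items (hypothetical data)
prices = np.array([[1.00, 2.95],
                   [8.40, 3.50],
                   [3.30, 16.91]])

# One multiplier per column: 10% off item 0, 25% off item 1
discounts = np.array([0.9, 0.75])

# Shapes (3, 2) and (2,) are compatible, so discounts is applied to every row
discounted = prices * discounts
print(discounted)

# In-place multiplication cannot cast the float64 result back into an int array
int_prices = np.array([1, 2, 3])
try:
    int_prices *= 0.9
except TypeError as err:
    print("Cannot scale an int array in place:", err)
```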

The obvious benefit is less typing. The more important one is speed: NumPy's core is implemented in C, so the loop over the elements runs as pre-compiled C code instead of interpreted Python. The official NumPy Documentation provides a succinct overview in its Why is NumPy Fast? section.

How much faster is NumPy? Let us run a quick benchmark using benchmark_broadcasting.py.

    import timeit

    import numpy as np

    num_values = 1000000
    num_runs = 100

    # Both wrappers build num_values numbers (0 to 999,999), then scale each by 0.9
    def op_wrapper_py():
        prices = range(0, num_values, 1)
        prices = [0.9 * price for price in prices]

    py_list = timeit.timeit(op_wrapper_py, number=num_runs)

    def op_wrapper_np():
        prices = np.arange(0, num_values, 1, dtype=np.float64)
        prices[:] *= 0.9

    np_array = timeit.timeit(op_wrapper_np, number=num_runs)

    # timeit reports the total time, in seconds, for all num_runs runs
    print(f"Python Time: {py_list:.4f}")
    print(f"NumPy Time : {np_array:.4f}")

On a Core i7-6700K... for 1 million numbers, run 100 times... the NumPy code is roughly 16 times faster than the pure Python code.

Implementation    Total Time (s, 100 runs)
Python            5.1248
NumPy             0.3168
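The speedup follows directly from the two totals in the table above. A quick check of the arithmetic; since each total covers 100 runs, dividing by the number of runs gives the cost of a single run:

```python
python_total = 5.1248  # seconds for 100 runs (from the table)
numpy_total = 0.3168   # seconds for 100 runs (from the table)
num_runs = 100

speedup = python_total / numpy_total
print(f"Speedup: {speedup:.1f}x")
print(f"Python per run: {python_total / num_runs * 1000:.2f} ms")
print(f"NumPy per run : {numpy_total / num_runs * 1000:.2f} ms")
```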

Remaining Topics

There are a few more topics to introduce:

Indexing - will be covered as part of the Matrix Solver example
I/O
Index Arrays
Boolean (Mask) Index Arrays
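A compact sketch of the last three topics, using made-up data (the CSV text is hypothetical): an index array selects elements by position, a boolean mask selects elements by condition, and np.loadtxt reads numbers from a text source (io.StringIO stands in for a real file here). The final row swap is the same index-array idiom the Matrix Solver below uses for pivoting.

```python
from io import StringIO

import numpy as np

prices = np.array([1.00, 2.95, 8.40, 3.50, 3.30, 16.91])

# Index array: select elements by position (the result is a new array)
picked = prices[np.array([0, 2, 5])]
print(picked)

# Boolean (mask) index array: select the elements where the condition is True
mask = prices > 3.0
print(mask)
print(prices[mask])

# Masks also work on the left-hand side: cap every price above 10.00
capped = prices.copy()
capped[capped > 10.0] = 10.0
print(capped)

# I/O: np.loadtxt accepts any file-like object
csv_text = StringIO("1.0,2.0\n3.0,4.0\n")
table = np.loadtxt(csv_text, delimiter=",")
print(table)

# Row swap with index arrays: the right-hand side is a copy,
# so the assignment never reads a row it has already overwritten
matrix = np.array([[1., 2.], [3., 4.], [5., 6.]])
matrix[[0, 2], :] = matrix[[2, 0], :]
print(matrix)
```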

Implementing a Matrix Solver

Linear Algebra

In CS 417/517 Computational Methods... I require students to implement a Matrix Solver... from scratch. NumPy provides implementations of matrix operations (e.g., multiplication).

Let us implement a quick NumPy-based Matrix Solver for this Discrete Case Least Squares Approximation problem.
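For reference, the code below forms the least squares normal equations and solves them. Assuming a quadratic model y = c_0 + c_1 x + c_2 x^2 (one column of matrix_X per term, evaluated at x = 0, 1, 2), the system works out as follows; the solution corresponds to y = x^2, which passes exactly through all three points.

```latex
X = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \end{bmatrix}, \qquad
Y = \begin{bmatrix} 0 \\ 1 \\ 4 \end{bmatrix}, \qquad
X^{T} X \, c = X^{T} Y

\begin{bmatrix} 3 & 3 & 5 \\ 3 & 5 & 9 \\ 5 & 9 & 17 \end{bmatrix}
\begin{bmatrix} c_0 \\ c_1 \\ c_2 \end{bmatrix}
= \begin{bmatrix} 5 \\ 9 \\ 17 \end{bmatrix}
\quad\Longrightarrow\quad
c = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}
```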

    import numpy as np


    def print_matrices(matrix_XTX, matrix_XTY):
        """
        Print the XTX and XTY matrices
        """

        print("{:*^40}".format("XTX"))
        print(matrix_XTX)

        print()
        print("{:*^40}".format("XTY"))
        print(matrix_XTY)


    def _backsolve(matrix_XTX, matrix_XTY):
        """
        Back-substitute on an upper triangular XTX with a unit diagonal,
        leaving the solution in XTY (both are modified in place)
        """

        num_rows, _ = matrix_XTX.shape

        for i in reversed(range(1, num_rows)):
            for j in reversed(range(0, i)):
                s = matrix_XTX[j, i]

                matrix_XTX[j, i] -= (s * matrix_XTX[i, i])
                matrix_XTY[j] -= (s * matrix_XTY[i])


    def solve_matrix(matrix_XTX, matrix_XTY):
        """
        Solve a matrix and return the resulting solution vector
        (XTX and XTY are modified in place)
        """

        # Get the dimensions (shape) of the XTX matrix
        num_rows, num_columns = matrix_XTX.shape

        for i in range(0, num_rows):
            # Find the row with the largest (absolute) entry in column i
            largest_idx = i
            current_col = i
            for j in range(i + 1, num_rows):

                if abs(matrix_XTX[largest_idx, current_col]) < abs(matrix_XTX[j, current_col]):
                    largest_idx = j

            # Swap rows i and largest_idx (index arrays on the right produce a copy)
            if largest_idx != i:
                matrix_XTX[[i, largest_idx], :] = matrix_XTX[[largest_idx, i], :]
                matrix_XTY[[i, largest_idx]] = matrix_XTY[[largest_idx, i]]

            # Scale so the pivot becomes 1
            scaling_factor = matrix_XTX[i, i]
            matrix_XTX[i, :] /= scaling_factor
            matrix_XTY[i] /= scaling_factor

            # Eliminate the entries below the pivot
            for row_i in range(i + 1, num_rows):
                s = matrix_XTX[row_i, i]

                matrix_XTX[row_i] = matrix_XTX[row_i] - s * matrix_XTX[i]
                matrix_XTY[row_i] = matrix_XTY[row_i] - s * matrix_XTY[i]

            #  print("{:-^80}".format(f"Iteration #{i:}"))
            #  print_matrices(matrix_XTX, matrix_XTY)

        _backsolve(matrix_XTX, matrix_XTY)

        return matrix_XTY


    def main():

        # Set up input data points, X, Y, and XT
        points = [(0., 0.), (1., 1.), (2., 4.)]

        matrix_X = np.array([[1., 0., 0.],
                             [1., 1., 1.],
                             [1., 2., 4.]])

        matrix_Y = np.array([0,
                             1,
                             4])

        matrix_XT = matrix_X.transpose()

        # Compute XTX and XTY
        matrix_XTX = np.matmul(matrix_XT, matrix_X)
        matrix_XTY = np.matmul(matrix_XT, matrix_Y)

        print_matrices(matrix_XTX, matrix_XTY)

        print()
        print("{:-^40}".format("Solution"))
        solution = solve_matrix(matrix_XTX, matrix_XTY)
        print(solution)


    if __name__ == "__main__":
        main()
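One way to sanity-check a hand-written solver is to rebuild the same system and compare against NumPy's own routines. A minimal sketch, assuming the same three points as main(): np.linalg.solve solves the normal equations directly, while np.linalg.lstsq solves the least squares problem from X and Y without forming XTX at all. Both should produce [0, 0, 1] (i.e., y = x^2) up to floating-point rounding.

```python
import numpy as np

# Same design matrix and observations as main()
matrix_X = np.array([[1., 0., 0.],
                     [1., 1., 1.],
                     [1., 2., 4.]])
matrix_Y = np.array([0., 1., 4.])

# Normal equations (X^T X) c = X^T Y, solved by NumPy's LAPACK-backed solver
matrix_XTX = matrix_X.T @ matrix_X
matrix_XTY = matrix_X.T @ matrix_Y
normal_solution = np.linalg.solve(matrix_XTX, matrix_XTY)

# Least squares directly on X and Y (no explicit X^T X)
lstsq_solution, residuals, rank, singular_values = np.linalg.lstsq(
    matrix_X, matrix_Y, rcond=None
)

print(normal_solution)
print(lstsq_solution)
```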
