Python Workshop

I am going to go for a Raymond Hettinger style presentation, https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/languageResources/#python-programming-videos.

These materials are web-centric (i.e., do not need to be printed and are available at https://www.cs.odu.edu/~tkennedy/python-workshop).

Who am I?

I have taught various courses, including:

CS 300T - Computers in Society
CS 333 - Programming and Problem Solving
CS 330 - Object Oriented Programming and Design
CS 350 - Introduction to Software Engineering
CS 410 - Professional Workforce Development I
CS 411W - Professional Workforce Development II
CS 417 - Computational Methods & Software

Most of my free time is spent writing Python 3 and Rust code, tweaking my Vim configuration, or learning a new (programming) language. My current language of interest is Rust (at the time of writing).

Referenced Courses & Materials

I am going to pull from CS 330, CS 350, CS 411W, and CS 417 lecture notes.

CS 330 - Object Oriented Programming & Design
- S.O.L.I.D
- Iterators
CS 350 - Introduction to Software Engineering
CS 411W - Professional Workforce Development II
- Git Review
CS 417 - Computational Methods & Software
- Python non-linear solver discussion

I will also pull a couple of examples from my previous Git workshop, https://www.cs.odu.edu/~tkennedy/git-workshop.

The Broad Strokes

T.B.W

Topics

~~A quick overview of procedural, object-oriented and functional style programming (very briefly).~~
~~PEP 8, https://www.python.org/dev/peps/pep-0008/ and PEP 20, https://www.python.org/dev/peps/pep-0020/~~
~~Loops, context managers, and list/generator comprehensions, https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/switchingToPython/index.html.~~
~~The basic Python data structures (List, Dictionary, and Set)~~
Generator Expressions
A few Python modules, including zipfile, json, and argparse, https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/switchingToPython/index.html#python-includes-batteries
Writing Pythonic code (e.g., using enumerate), https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/whichLanguageIsIt/index.html#a-little-python.
Time permitting... a little unit testing.

Procedural, Object-Oriented & Functional Programming

There are generally three styles of code found in Python.

Procedural


from math import sqrt

point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)

points = [point1, point2, point3]

# Each point is a plain (x, y) tuple, so unpack it in the loop header.
for x, y in points:
    print(sqrt(x ** 2 + y ** 2))

Object Oriented

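from math import sqrt


class Point:

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        return hash((self.x, self.y))

    def __str__(self):
        return f"({self.x}, {self.y})"

    def magnitude(self):
        return sqrt(self.x ** 2 + self.y ** 2)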

Functional

from math import sqrt

point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)

points = [point1, point2, point3]

# The points are (x, y) tuples, so each generator expression unpacks them.
shortest_distance = min(sqrt(x ** 2 + y ** 2) for x, y in points)
largest_distance = max(sqrt(x ** 2 + y ** 2) for x, y in points)
average_distance = sum(sqrt(x ** 2 + y ** 2) for x, y in points) / len(points)

Of course... we should clean it up...

from math import sqrt

points = [(0, 5), (8, 3), (1, 7)]

# Compute each distance once and reuse the list for all three statistics.
distances = [sqrt(x ** 2 + y ** 2) for x, y in points]

shortest_distance = min(distances)
largest_distance = max(distances)
average_distance = sum(distances) / len(distances)

None of these examples are particularly well written...

No __main__
No documentation
Everything is in "main"
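One way to address all three complaints is sketched below, assuming the points stay plain (x, y) tuples. The function names magnitude and main are illustrative choices, not part of the original examples.

```python
from math import hypot


def magnitude(point):
    """Return the distance from the origin to an (x, y) point."""
    x, y = point
    return hypot(x, y)


def main():
    """Print the shortest, largest, and average distance from the origin."""
    points = [(0, 5), (8, 3), (1, 7)]
    distances = [magnitude(point) for point in points]

    print(f"Shortest: {min(distances):.4f}")
    print(f"Largest:  {max(distances):.4f}")
    print(f"Average:  {sum(distances) / len(distances):.4f}")


if __name__ == "__main__":
    main()
```

The computation now lives in a documented function, and nothing runs when the file is imported as a module.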

Pythonic Code & "Good" Code

There are quite a few "rules" when writing Pythonic code... starting with...

Style - PEP 8
Zen of Python - PEP 20

Other rules come from the community...

Always have an if __name__ == "__main__".
Use f-strings over format where possible.
Use with statements (context managers).
Write pydoc style documentation.
Use functions and modules.
No global variables.
Do not always use object-oriented design.
Do not forget about the Python GIL
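The f-string rule is easiest to see side by side. A small sketch follows; the variable names and values are made up for illustration.

```python
name = "Thomas"
score = 93.456

# str.format: the values are listed separately from the placeholders.
old_style = "{} scored {:.1f}".format(name, score)

# f-string: each expression sits exactly where it will be printed.
new_style = f"{name} scored {score:.1f}"

print(new_style)
```

Both lines build the same string; the f-string simply keeps each value next to its format specification.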

Other rules come from general software engineering practices:

Follow Test Driven Development (TDD)
Use top down design
Do not write monolithic functions
Use a code linter or style checker (e.g., pylint)
Use self-documenting names
Remember S.O.L.I.D.
Iterators are magic (CS 330).

Data Structures

When I work in Python, I generally focus on three core (fundamental) data structures.

Lists: prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19]
Dictionaries: favourite_colors = {"Thomas": "Blue", "Jessica": "Purple"}
- collections.defaultdict
- collections.Counter
Sets: some_colors = {"Blue", "Red", "Green", "Cyan", "Teal"}
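The two collections helpers listed under dictionaries are worth a quick look. A minimal sketch, using a made-up word list:

```python
from collections import Counter, defaultdict

words = ["for", "while", "for", "if", "while", "for"]

# Counter tallies hashable items without any explicit bookkeeping.
word_counts = Counter(words)

# defaultdict creates the missing value (here, an empty list) on first access.
words_by_length = defaultdict(list)
for word in words:
    words_by_length[len(word)].append(word)

print(word_counts.most_common(1))
print(dict(words_by_length))
```

With a plain dict, both loops would need an "is the key already present?" check before every update.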

If we want to map these to (modern) C++, Java, and Rust... we end up with...

Python	C++	Java	Rust
`list`	`std::vector`	`java.util.ArrayList`	`std::vec::Vec`
`dict`	`std::unordered_map`	`java.util.HashMap`	`std::collections::HashMap`
`set`	`std::unordered_set`	`java.util.HashSet`	`std::collections::HashSet`

I am not listing tuple as a true data structure.

Loops and List/Generator Comprehensions

This section is based on notes from CS 330 Object Oriented Programming & Design.

Lists & List Comprehensions

The next few discussions will include list comprehensions, dictionary comprehensions and set comprehensions.

Suppose we have a list of programming terms and want to create a second list containing the length of each term. We might take the usual C, C++, or Java approach:

Word Count - Boring C++ Loop

#include <cstddef>
#include <string>
#include <vector>

using std::string;
using std::vector;


int main(int argc, char** argv)
{
    vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"};
    vector<int> term_lengths(some_terms.size(), 0);

    // Use an unsigned index to match the type returned by size().
    for (std::size_t i = 0; i < term_lengths.size(); i++) {
        term_lengths[i] = static_cast<int>(some_terms[i].size());
    }

    return 0;
}

and translate it into Python:

Word Count - Boring Python Loop

def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = []

    for term in some_terms:
        term_lengths.append(len(term))


if __name__ == "__main__":
    main()

The Python version can (and should) use a list comprehension.

Word Count - Fun Python Loop

def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = [len(term) for term in some_terms]


if __name__ == "__main__":
    main()
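The same syntax extends to dictionaries and sets. A short sketch with the same term list (the names length_by_term and distinct_lengths are illustrative):

```python
some_terms = ["Hello", "world", "with", "for", "while", "int"]

# Dictionary comprehension: map each term to its length.
length_by_term = {term: len(term) for term in some_terms}

# Set comprehension: keep only the distinct lengths.
distinct_lengths = {len(term) for term in some_terms}

print(length_by_term)
print(distinct_lengths)
```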

Depending on how many terms we have... a generator expression might be more appropriate:

Word Count - Really Fun Python Loop

def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = (len(term) for term in some_terms)


if __name__ == "__main__":
    main()
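The trade-off is that a generator expression is lazy and can be consumed only once. A small sketch of that behavior, using the same term list:

```python
some_terms = ["Hello", "world", "with", "for", "while", "int"]

term_lengths = (len(term) for term in some_terms)

# Nothing has been computed yet; values are produced as they are consumed.
first_total = sum(term_lengths)

# The generator is now exhausted, so a second pass sees no values.
second_total = sum(term_lengths)

print(first_total, second_total)
```

If the lengths are needed more than once, the list comprehension is the safer choice.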

Modern C++ and std::transform

C++ provides the std::transform algorithm (in the <algorithm> header). Combined with lambda functions, which are available in C++11 and newer, we can take the original C++ code... and rewrite it as

Word Count - C++ std::transform

#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using std::string;
using std::vector;


int main(int argc, char** argv)
{
    vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"};
    vector<int> term_lengths;

    std::transform(some_terms.begin(), some_terms.end(), std::back_inserter(term_lengths),
                   [](const string& t) -> int {
                       return t.size();
                   });

    return 0;
}

Java has the java.util.stream package, which provides similar functionality to Python comprehensions and C++ std::transform. However, in Java, we would end up dealing with the Integer wrapper class if we wanted to use a non-array data structure.

Word Count - Java Streams

import java.util.Arrays;
import java.util.List;

public class IntStreamDemo
{
    public static void main(String... args)
    {
        List<String> some_terms = Arrays.asList("Hello", "world", "with",
                                                "for", "while", "int");

        int[] term_lengths = some_terms.stream()
                           .mapToInt(s -> s.length())
                           .toArray();
    }
}

The Python implementation is the most succinct, approachable, and readable.

Context Managers

Python provides the with statement (construct). This allows the setup and teardown involved in using resources (e.g., files, sockets, and database connections) to be handled elsewhere.

This has two main benefits:

There is less boilerplate code.
It is much harder to forget to close/deallocate a resource; the cleanup runs when the block exits, even if an exception is raised.

To write to a file, one might write:

Python File IO - Basic

text_file = open("some_file.txt", "w")

for number in range(1, 100):
    text_file.write(f"{number}\n")

Did you notice the missing text_file.close()? With one small with statement, the file close operation will be handled automatically.

Python File IO - Using with

with open("some_file.txt", "w") as text_file:
    for number in range(1, 100):
        text_file.write(f"{number}\n")

This also works for other types of files--including compressed files.

Python File IO - Using with and gzip

import gzip

with gzip.open("some_file.txt.gz", "wt") as text_file:
    for number in range(1, 100):
        text_file.write(f"{number}\n")
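The setup and teardown behind with can also be written by hand. The sketch below uses contextlib.contextmanager from the standard library; the announced function and its log messages are hypothetical and exist only to make the order of events visible.

```python
from contextlib import contextmanager


@contextmanager
def announced(label, log):
    """Record setup and teardown messages around a block (illustrative only)."""
    log.append(f"open {label}")       # setup: runs when the with block starts
    try:
        yield label
    finally:
        log.append(f"close {label}")  # teardown: runs even after an exception


events = []
with announced("resource", events) as handle:
    events.append(f"use {handle}")

print(events)
```

Everything before the yield plays the role of open, and the finally clause plays the role of close.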

Python Includes Batteries

For many languages external libraries are usually required for common operations. Python includes batteries.

Operation	Built-in Python Module
Zip Files	`import zipfile`
GZipped Files	`import gzip`
Reading, writing, or generating JSON	`import json`
Converting objects to JSON	`import json`
Serializing objects and data structures	`import pickle`
Working with time	`import time`
Working with dates and time	`import datetime`
Working with SQLite	`import sqlite3`
Building a calendar	`import calendar`
Generating log files	`import logging`
Advanced command line arguments	`import argparse`
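As a quick taste of the batteries, here is a minimal sketch that round-trips a dictionary through the json module (the dictionary is the favourite_colors example from the data structures discussion):

```python
import json

favourite_colors = {"Thomas": "Blue", "Jessica": "Purple"}

# Serialize a dictionary to a JSON string...
as_text = json.dumps(favourite_colors, indent=2, sort_keys=True)

# ...and parse it back into an equivalent dictionary.
round_tripped = json.loads(as_text)

print(as_text)
```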

Libraries & pip

When external libraries are required, the Python pip utility and a requirements.txt can be used for all dependency and configuration management.

In C/C++ we hope for a Linux environment (or Docker). In Java... Gradle is a popular build and configuration management tool.

Examples & Case Studies

Monte Carlo Integration

This program is from my offering of CS 417/517 Computational Methods, https://www.cs.odu.edu/~tkennedy/cs417/f20/Directory/outline/index.html.

Let us start with the top:

#! /usr/bin/env python3

import random
import sys

from typing import (Callable, Tuple)

Point = Tuple[float, float]

This program uses three Python modules:

random for random number generation
sys for command line arguments (i.e., sys.argv)
typing for type hints

The last line (i.e., Point = Tuple[float, float]) is a type alias.

I am a stickler for type hints and function/method documentation. Anytime code is written... it must be documented at the API level. While Python type hints do not necessarily gain us a performance benefit, type hints increase readability. Type hints are an important part of documentation.

Point Generation

Let us tackle the point generation function (generate_random_points).

from typing import Iterator  # used by the return annotation below


def generate_random_points(f: Callable,
                           lower_limit: float,
                           upper_limit: float,
                           n: int) -> Iterator[Point]:
    """
    Generate a sequence of random x values and plug them into f(x).

    Args:
        f: mathematical function
        lower_limit: 'a' the lower bound
        upper_limit: 'b' the upper bound
        n: number of points to generate

    Yields:
        A sequence of points in the form (x, f(x))
    """

    for _ in range(0, n):
        x = random.uniform(lower_limit, upper_limit)
        y = f(x)

        yield (x, y)

This function has full pydoc documentation, complete with:

complete description
explanation of arguments
explanation of yielded values

...and type hints!

Take particular note of for _ in range(0, n). The underscore _ can be used any time a variable is required syntactically, but the value will be ignored.
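To see what the generator produces, it can be consumed a few points at a time. The sketch below repeats a trimmed copy of generate_random_points so that it runs on its own; the seed value and the use of itertools.islice are illustrative assumptions, not part of the original program.

```python
import random
from itertools import islice


def generate_random_points(f, lower_limit, upper_limit, n):
    """Trimmed copy of the function above: yield n points (x, f(x))."""
    for _ in range(0, n):
        x = random.uniform(lower_limit, upper_limit)
        yield (x, f(x))


random.seed(417)  # arbitrary seed so that repeated runs match

points = generate_random_points(lambda x: x ** 2, 0.0, 2.0, 1000)

# islice pulls only the first three points; the other 997 are never generated.
first_three = list(islice(points, 3))

for x, y in first_three:
    print(f"({x:.4f}, {y:.4f})")
```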

The Main Function

Always wrap your main/driver code in a main function. This will prevent variables from ending up in the global/module namespace... which can (will) lead to frustrating bugs later.

Let us start with a naive main function, one that has quite a bit of room for improvement.

def naive_main():
    """
    This is a "naive" main function used to demonstrate the basic premise
    behind Monte Carlo integration.
    """

    num_points = int(sys.argv[1])
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])

    math_f = lambda x: x**2
    #  math_f = lambda x: cos(x)

    print("{:-^80}".format("Points"), file=sys.stderr)

    temp_sum = 0
    for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
        print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)

        temp_sum += point[1]

    integral_result = (limit_b - limit_a) / float(num_points) * temp_sum

    print(f"{integral_result:16.8f}")

The first three (3) lines

    num_points = int(sys.argv[1])
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])

grab command line arguments and parse them into int or float values.

Next... I defined a lambda function. This is the mathematical function f(x) that will be integrated.

    math_f = lambda x: x**2

Note that any line that includes file=sys.stderr is debugging output. By convention (in C, C++, Java, Python, and Rust) production output is written to standard out and debugging output is written to standard error.

The rest of the function is not very Pythonic...

    temp_sum = 0
    for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
        print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)

        temp_sum += point[1]

    integral_result = (limit_b - limit_a) / float(num_points) * temp_sum

    print(f"{integral_result:16.8f}")

There is:

a temporary sum variable temp_sum
a line over 80 characters in length
an increment operation (temp_sum += point[1])

The next version of main (i.e., not_so_naive_main) corrects a few style and design issues.

def not_so_naive_main():
    """
    This main function demonstrates the more "Pythonic" approach
    """

    num_points = int(sys.argv[1])
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])

    math_f = lambda x: x**2
    #  math_f = lambda x: cos(x)

    point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)
    f_of_x_values = (y for x, y in point_sequence)

    integral_result = ((limit_b - limit_a) /
                       float(num_points) *
                       sum(f_of_x_values))

    print(f"{integral_result:16.8f}")

Instead of looping over all the points

    for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):

the generator is assigned to a variable:

    point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)

Since we only need the y values from each point... an inline generator expression can be used

    f_of_x_values = (y for x, y in point_sequence)

This leads to a far more concise and readable computation.

    integral_result = ((limit_b - limit_a) /
                       float(num_points) *
                       sum(f_of_x_values))
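Written as a formula, with a = limit_a, b = limit_b, n = num_points, and x_i the randomly drawn x values, both versions of main compute the usual Monte Carlo estimate of the integral:

```latex
\int_a^b f(x)\,dx \approx \frac{b - a}{n} \sum_{i=1}^{n} f(x_i)
```

The sum over f(x_i) is exactly what sum(f_of_x_values) evaluates.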

def main_without_a_table_flip():
    """
    This main demonstrates the impact of the number of points on Monte Carlo
    integration
    """

    num_points = int(sys.argv[1])  # Unused in this version of main
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])
    max_magnitude = int(sys.argv[4])

    math_f = lambda x: x**2

    print("| {:^16} | {:^20} |".format("# Points", "Est. f(x)"))

    max_num_points = 2 ** max_magnitude
    point_sequence = list(generate_random_points(math_f, limit_a, limit_b, max_num_points))

    for magnitude in range(0, max_magnitude + 1):
        num_points = 2 ** magnitude

        f_of_x_values = (y for x, y in point_sequence[:num_points])

        integral_result = ((limit_b - limit_a) /
                           float(num_points) *
                           sum(f_of_x_values))

        print(f"| {num_points:>16} | {integral_result:^20.8f} |")


if __name__ == "__main__":
    #  naive_main()
    #  not_so_naive_main()
    main_without_a_table_flip()
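All three main functions index into sys.argv directly. As a sketch of how the argparse module mentioned earlier could take over that job, consider the following. The parse_arguments function, the argument names, and the default for max_magnitude are hypothetical choices for illustration, not part of the original program.

```python
import argparse


def parse_arguments(argv=None):
    """Parse the integration settings; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Monte Carlo integration")

    parser.add_argument("num_points", type=int, help="number of points to generate")
    parser.add_argument("limit_a", type=float, help="lower limit of integration")
    parser.add_argument("limit_b", type=float, help="upper limit of integration")
    parser.add_argument("max_magnitude", type=int, nargs="?", default=10,
                        help="largest power of two to try (hypothetical default)")

    return parser.parse_args(argv)


# Passing a list makes the parser easy to try without a real command line.
args = parse_arguments(["1000", "0", "2", "8"])
print(args.num_points, args.limit_a, args.limit_b, args.max_magnitude)
```

In exchange for a few more lines, the program gains type checking, a usage message, and a --help flag.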
