I am going to go for a Raymond Hettinger style presentation, https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/languageResources/#python-programming-videos.
These materials are web-centric (i.e., do not need to be printed and are available at https://www.cs.odu.edu/~tkennedy/python-workshop).
Who am I?
I have taught various courses, including:
- CS 300T - Computers in Society
- CS 333 - Programming and Problem Solving
- CS 330 - Object Oriented Programming and Design
- CS 350 - Introduction to Software Engineering
- CS 410 - Professional Workforce Development I
- CS 411W - Professional Workforce Development II
- CS 417 - Computational Methods & Software
Most of my free time is spent writing Python 3 and Rust code, tweaking my Vim configuration, or learning a new (programming) language. My current language of interest is Rust (at the time of writing).
Referenced Courses & Materials
I am going to pull from CS 330, CS 350, CS 411W, and CS 417 lecture notes.
- CS 330 - Object Oriented Programming & Design
- CS 350 - Introduction to Software Engineering
- CS 417 - Computational Methods & Software
I will also pull a couple examples from the previous (Git) workshop, https://www.cs.odu.edu/~tkennedy/git-workshop.
The Broad Strokes
This workshop is intended as discussion on how to write more concise, idiomatic (Pythonic), readable, and reusable Python code.
Procedural, Object-Oriented & Functional Programming
There are generally three styles of code found in Python.
Procedural
point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)
points = [point1, point2, point3]
for point in points:
print(sqrt(point[0] ** 2 + point[1] ** 2))
Procedural style code tends to be how most quick Python programs are written.
Object Oriented
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self):
pass
def __hash__(self):
pass
def __str__(self):
pass
def magnitude(self)
return sqrt(self.x ** 2 + self.y ** 2)
A proper discussion of object oriented Python would require explanation of the rules of a class checklist.
C++ | Java | Python 3 | Rust |
---|---|---|---|
Default Constructor | Default Constructor | __init__ |
new() or Default trait |
Copy Constructor | Clone and/or Copy Constructor | __deepcopy__ |
Clone trait |
Destructor | |||
finalize (deprecated/discouraged) | __del__ |
Drop trait |
|
Assignment Operator (=) | |||
Accessors (Getters) | Accessors (Getters) | Accessors (@property ) |
Accessors (Getters) |
Mutators (Setters) | Mutators (Setters) | Setter (@attribute.setter ) |
Mutators (setters) |
Swap | |||
Logical Equivalence Operator (==) | equals | __eq__ |
std::cmp::PartialEq trait |
Less-Than / Comes-Before Operator (<) | hashCode | __hash__ |
std::cmp::PartialOrd trait |
std::hash (actual hashing)
|
hashCode | __hash__ |
std::hash::Hash trait |
Stream Insertion Operator (<<) | toString | __str__ |
std::fmt::Display trait |
__repr__ |
std::fmt::Debug trait |
||
begin() and end()
|
iterator |
__iter__ |
iter() and iter_mut()
|
Functional
point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)
points = [point1, point2, point3]
shortest_distance = min((sqrt(point[0] ** 2 + point[1] ** 2)) for point in points)
largest_distance = max((sqrt(point[0] ** 2 + point[1] ** 2)) for point in points)
average_distance = sum((sqrt(point[0] ** 2 + point[1] ** 2)) for point in points) / len(points)
Of course... we should clean it up...
points = [(0, 5), (8, 3), (1, 7)]
distances = [sqrt(point[0] ** 2 + point[1] ** 2) for point in points]
shortest_distance = min(distances)
largest_distance = max(distances)
average_distance = sum(distances) / len(points)
None of these examples are particularly well written...
- No
__main__
- No documentation
- Everything is in
"main"
Pythonic Code & "Good" Code
There are quite a few "rules" when writing Pythonic code... starting with...
Other rules come from the community...
- Always have a
if __name__ == "__main__"
. - Use f-strings over format where possible.
- Use
with
closures. - Write pydoc style documentation.
- Use functions and modules.
- Avoid global variables.
- Do not always use object-oriented design.
- Do not forget about the Python GIL.
Other rules come from general software engineering practices:
- Follow Test Driven Development (TDD).
- Use top down design.
- Do not write monolithic functions.
- Use a code linter or style checker (e.g., pylint).
- Use self-documenting names.
- Remember S.O.L.I.D.
- Iterators are magic (CS 330).
Data Structures
When I work in Python, I generally focus on three core (fundamental) data
structures (or specialized variations from the collections
module).
- Lists:
prime_numbers = [1, 2, 3, 5, 7, 11, 13, 17, 19]
- Dictionaries:
favourite_colors = {"Thomas": "Blue", "Jessica": "Purple"}
collections.defaultdict
collections.Counter
- Sets:
some_colors = {"Blue", "Red", "Green", "Cyan", "Teal"}
If we want to map these to (modern) C++, Java, and Rust... we end up with...
Python | C++ | Java | Rust |
---|---|---|---|
list |
std::list |
java.util.List |
std::collections::LinkedList |
dict |
std::unordered_map |
java.util.HashMap |
std::collections::HashMap |
set |
std::unordered_set |
java.util.HashSet |
std::collections::HashSet |
I am not listing tuple as a true data structure.
Loops and List/Generator Comprehensions
This section is based on notes from CS 330 Object Oriented Programming & Design.
Lists & List Comprehensions
The next few discussions will include list comprehensions, dictionary comprehensions and set comprehensions.
Suppose we have a list of programming terms and want to create a second list containing the length of each term. We might take the usual C, C++, or Java approach:
Word Count - Boring C++ Loop
using std::string; using std::vector; int main(int argc, char** argv) { vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"}; vector<int> term_lengths(some_terms.size(), 0); for (int i = 0; i < term_lengths.size(); i++) { term_lengths[i] = some_terms[i].size(); } return 0; }
and translate it into Python:
Word Count - Boring Python Loop
def main(): some_terms = ["Hello", "world", "with", "for", "while", "int"] term_lengths = [] for term in some_terms: term_lengths.append(len(term)) if __name__ == "__main__": main()
The Python version can (and should) use a list comprehension.
Word Count - Fun Python Loop
def main(): some_terms = ["Hello", "world", "with", "for", "while", "int"] term_lengths = [len(term) for term in some_terms] if __name__ == "__main__": main()
Depending on how many terms we have... a generator expression might be more appropriate:
Word Count - Really Fun Python Loop
def main(): some_terms = ["Hello", "world", "with", "for", "while", "int"] term_lengths = (len(term) for term in some_terms) if __name__ == "__main__": main()
Modern C++ and std::transform
Modern C++11 and newer provide the std::transform
method. Combined with
lambda functions
we can take the original C++ code... and rewrite it as
Word Count - C++
std::transform
using std::string; using std::vector; int main(int argc, char** argv) { vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"}; vector<int> term_lengths; std::transform(some_terms.begin(), some_terms.end(), std::back_inserter(term_lengths), [](const string& t) -> int { return t.size(); }); return 0; }
Java has the java.util.stream
package, which provides similar functionality
to Python comprehensions and C++ std::transform
. However, in Java, we would
end up dealing with the Integer
wrapper class if we wanted to use a non-array
data structure.
Word Count - Java Streams
import java.util.Arrays; import java.util.List; public class IntStreamDemo { public static void main(String... args) { List<String> some_terms = Arrays.asList("Hello", "world", "with", "for", "while", "int"); int[] term_lengths = some_terms.stream() .mapToInt(s -> s.length()) .toArray(); } }
The Python implementation is the most succinct, approachable, and readable.
Context Managers
Python provides the with
statement (construct). This allows the setup and
teardown involved in using resources (e.g., files, sockets, and database
connections) to be handled elsewhere.
This has two main benefits:
- There is less boilerplate code.
- It is impossible to forget to close/deallocate a resource.
To write to a file, one might write:
Python File IO - Basic
text_file = open("some_file.txt", "w") for number in range(1, 100): text_file.write(f"{number}\n")
Did you notice the missing fclose(text_file)
? With one small with
the file
close operation will be handled automatically.
Python File IO - Using
with
with open("some_file.txt", "w") as text_file: for number in range(1, 100): text_file.write(f"{number}\n")
This also works for other types of files--including compressed files.
Python File IO - Using
with
andgzip
import gzip with gzip.open("some_file.txt.gz", "wt") as text_file: for number in range(1, 100): text_file.write(f"{number}\n")
Python Includes Batteries
For many languages external libraries are usually required for common operations. Python includes batteries.
Operation | Built-in Python Module |
---|---|
Working with zip files | import zipfile |
Working with gzipped files | import gzip |
Reading, writing, or generating JSON | import json |
Converting objects to JSON | import json |
Serializing objects and data structures | import pickle |
Working with time | import time |
Working with dates and time | import datetime |
Working with SQLite | import sqlite3 |
Building a calendar | import calendar |
Generating log files | import logfile |
Using advanced command line arguments | import argparse |
Third-Party (External) Libraries & pip
When external libraries are required, the Python pip
utility and a
requirements.txt
can be used for all dependency and configuration management.
In C/C++ we hope for a Linux environment (or Docker). In Java... Gradle is a popular build and configuration management tool. In Rust... Cargo handles dependency and configuration management.
Examples & Case Studies
Monte Carlo Integration
This program is from my offering of CS 417/517 Computational Methods.
Let us start with the top:
#! /usr/bin/env python3
import random
import sys
from typing import (Callable, Tuple)
Point = Tuple[float, float]
This program uses three Python modules:
-
random
for random number generation -
sys
for command line arguments (i.e.,sys.argv
) -
typing
for type hints
The last line (i.e., Point = Tuple[float, float]
is a type alias.
I am a stickler for type hints and function/method documentation. Anytime code is written... it must be documented at the API level. While Python type hints do not necessarily gain us a performance benefit, type hints increase readability. Type hints are an important part of documentation.
Point Generation
Let us tackle the point generation function (generate_random_points
).
def generate_random_points(f: Callable,
lower_limit: float,
upper_limit: float,
n: int) -> Point:
"""
Generate a sequence of random x values and plug them into f(x).
Args:
f: mathematical function
lower_limit: 'a' the lower bound
upper_bound: 'b' the upper bound
n: number of points to generate
Yields:
A sequence of points in the form (x, f(x))
"""
for _ in range(0, n):
x = random.uniform(lower_limit, upper_limit)
y = f(x)
yield (x, y)
This function has full pydoc documentation, complete with:
- complete description
- explanation of arguments
- explnation of
yield
-ed values
...and type hints!
Take particular note of for _ in range(0, n)
. The underscore _
can be used
any time a variable is required syntactically, but the value will be ignored.
The Main Function
Always wrap your main/driver code in a main function. This will prevent variables from ending up in the global/module namespace... which can (will) lead to frustrating bugs later.
Let us start with a naive main function, one that has quite a bit of room for improvement.
def naive_main():
"""
This is a "naive" main function used to demonstrate the basic premise
behind Monte Carlo integration.
"""
num_points = int(sys.argv[1])
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
math_f = lambda x: x**2
# math_f = lambda x: cos(x)
print("{:-^80}".format("Points"), file=sys.stderr)
temp_sum = 0
for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)
temp_sum += point[1]
integral_result = (limit_b - limit_a) / float(num_points) * temp_sum
print(f"{integral_result:16.8f}")
The first three (3) lines
num_points = int(sys.argv[1])
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
grab command line arguments and parse them into int
or float values
.
Next... I defined a lambda function. This is the mathematical function f(x)
that will be integrated.
math_f = lambda x: x**2
Note that any line that includes file=sys.stderr
is debugging output. By
convention (in C, C++, Java, Python, and Rust) production output is written to
standard out and debugging output is written to standard error.
The rest of the function is not very Pythonic...
temp_sum = 0
for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)
temp_sum += point[1]
integral_result = (limit_b - limit_a) / float(num_points) * temp_sum
print(f"{integral_result:16.8f}")
There is:
- a temporary sum variable
temp_sum
- a line over 80 characters in length
- an increment operation (
temp_sum += point[1]
)
The next version of main (i.e., not_so_naive_main
) corrects a few style and
design issues.
def not_so_naive_main():
"""
This main function demonstrates the more "Pythonic" approach
"""
num_points = int(sys.argv[1])
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
math_f = lambda x: x**2
# math_f = lambda x: cos(x)
point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)
f_of_x_values = (y for x, y in point_sequence)
integral_result = ((limit_b - limit_a) /
float(num_points) *
sum(f_of_x_values))
print(f"{integral_result:16.8f}")
Instead of looping over all the points...
for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
the generator is assigned to a variable...
point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)
Since we only need the y
values from each point... an inline generator
expression can be used...
f_of_x_values = (y for x, y in point_sequence)
This leads to a far more concise and readable computation.
integral_result = ((limit_b - limit_a) /
float(num_points) *
sum(f_of_x_values))
The Final Main Function
Now we can tackle the final (and true) main function...
def main_without_a_table_flip():
"""
This main demonstrates the impact of the number of points on Monte Carlo
integration
"""
num_points = int(sys.argv[1]) # Unused in this version of main
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
max_magnitude = int(sys.argv[4])
math_f = lambda x: x**2
print("| {:^16} | {:^20} |".format("# Points", "Est. f(x)"))
max_num_points = 2 ** max_magnitude
point_sequence = list(generate_random_points(math_f, limit_a, limit_b, max_num_points))
for magnitude in range(0, max_magnitude + 1):
num_points = 2 ** magnitude
f_of_x_values = (y for x, y in point_sequence[:num_points])
integral_result = ((limit_b - limit_a) /
float(num_points) *
sum(f_of_x_values))
print(f"| {num_points:>16} | {integral_result:^20.8f} |")
In CS 417/517 (Computational Methods) I use this example to demonstrate a few things... including how the number of points used to approximate the integral can increase the accuracy of the estimate.
First... I add a fourth command line argument...
num_points = int(sys.argv[1]) # Unused in this version of main
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
max_magnitude = int(sys.argv[4])
Now... things get a little interesting...
max_num_points = 2 ** max_magnitude
point_sequence = list(generate_random_points(math_f, limit_a, limit_b, max_num_points))
Even though generate_random_points
is a generator expression... I can turn
it into a list. However, this only works because generate_random_points
is a
finite sequence (i.e., it stops generating points).
Instead of generating points repeatedly each time the loop executes...
for magnitude in range(0, max_magnitude + 1):
num_points = 2 ** magnitude
f_of_x_values = (y for x, y in point_sequence[:num_points])
integral_result = ((limit_b - limit_a) /
float(num_points) *
sum(f_of_x_values))
print(f"| {num_points:>16} | {integral_result:^20.8f} |")
I determine the maximum number of points needed in the final loop iteration and sample them (albeit naively) using:
point_sequence[:num_points])
This is an example of Python's list slicing syntax. The [:num_points]
takes
all the points from the beginning of the list (i.e., 0
) up to (but not
including) the index specified by num_points
.
__main__
If The last chunk is what tells the Python interpreter what to run.
if __name__ == "__main__":
# naive_main()
# not_so_naive_main()
main_without_a_table_flip()
This allows us to run our script directly. If we import the Python script into
a larger program... the __name__ == "__main__"
will evaluate to False
.
This will allow us to call our functions from a larger program without having
to rewrite any code.
If Time Permits
If time permits... we will discuss a few additional examples:
- Implementing a Non-Linear Solver
- A few Python Examples from CS 330 (Object Oriented Programming & Design)
- Test Driven Development