I am going to go for a Raymond Hettinger style presentation, https://www.cs.odu.edu/~tkennedy/cs330/f20/Public/languageResources/#python-programming-videos.
These materials are web-centric (i.e., do not need to be printed and are available at https://www.cs.odu.edu/~tkennedy/python-workshop).
Who am I?
I have taught various courses, including:
- CS 300T - Computers in Society
- CS 333 - Programming and Problem Solving
- CS 330 - Object Oriented Programming and Design
- CS 350 - Introduction to Software Engineering
- CS 410 - Professional Workforce Development I
- CS 411W - Professional Workforce Development II
- CS 417 - Computational Methods & Software
Most of my free time is spent writing Python 3 and Rust code, tweaking my Vim configuration, or learning a new (programming) language. My current language of interest is Rust (at the time of writing).
Referenced Courses & Materials
I am going to pull from CS 330, CS 350, CS 411W, and CS 417 lecture notes.
- CS 330 - Object Oriented Programming & Design
- CS 350 - Introduction to Software Engineering
- CS 417 - Computational Methods & Software
I will also pull a couple examples from the previous (Git) workshop, https://www.cs.odu.edu/~tkennedy/git-workshop.
The Broad Strokes
This workshop is intended as a discussion on how to write more concise, idiomatic (Pythonic), readable, and reusable Python code.
Procedural, Object-Oriented & Functional Programming
There are generally three styles of code found in Python.
Procedural
from math import sqrt

point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)
points = [point1, point2, point3]
for point in points:
    # Plain tuples have no .x or .y attributes; index 0 is x and index 1 is y
    print(sqrt(point[0] ** 2 + point[1] ** 2))
Procedural style code tends to be how most quick Python programs are written.
Object Oriented
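from math import sqrt


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Two points are equal when both coordinates match
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash the same values that __eq__ compares
        return hash((self.x, self.y))

    def __str__(self):
        return f"({self.x}, {self.y})"

    def magnitude(self):
        return sqrt(self.x ** 2 + self.y ** 2)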
A proper discussion of object oriented Python would require an explanation of the rules of a class checklist.
C++ | Java | Python 3 | Rust |
---|---|---|---|
Default Constructor | Default Constructor | __init__ | new() or Default trait |
Copy Constructor | Clone and/or Copy Constructor | __deepcopy__ | Clone trait |
Destructor | finalize (deprecated/discouraged) | __del__ | Drop trait |
Assignment Operator (=) | | | |
Accessors (Getters) | Accessors (Getters) | Accessors (@property) | Accessors (Getters) |
Mutators (Setters) | Mutators (Setters) | Setter (@attribute.setter) | Mutators (setters) |
Swap | | | |
Logical Equivalence Operator (==) | equals | __eq__ | std::cmp::PartialEq trait |
Less-Than / Comes-Before Operator (<) | hashCode | __hash__ | std::cmp::PartialOrd trait |
Stream Insertion Operator (<<) | toString | __str__ | std::fmt::Display trait |
| | | __repr__ | std::fmt::Debug trait |
begin() and end() | iterator | __iter__ | iter() and iter_mut() |
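To make the Python 3 column of the checklist concrete, here is a minimal sketch of a Point class that fills in most of the listed hooks. It is an illustration under my own assumptions (the private attribute names _x and _y and the choice of hashing the coordinate pair are not from the lecture notes), not a definitive implementation.

```python
from math import sqrt


class Point:
    """A 2-D point that fills in the Python 3 column of the class checklist."""

    def __init__(self, x: float = 0.0, y: float = 0.0):
        # Default arguments let Point() act like a default constructor
        self._x = x
        self._y = y

    @property
    def x(self) -> float:
        """Accessor for the x coordinate."""
        return self._x

    @x.setter
    def x(self, value: float) -> None:
        """Mutator for the x coordinate."""
        self._x = value

    @property
    def y(self) -> float:
        """Accessor for the y coordinate."""
        return self._y

    @y.setter
    def y(self, value: float) -> None:
        """Mutator for the y coordinate."""
        self._y = value

    def __eq__(self, other) -> bool:
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Hash the same values that __eq__ compares. Mutating a Point that
        # is already stored in a set or dict would break lookups.
        return hash((self.x, self.y))

    def __str__(self) -> str:
        return f"({self.x}, {self.y})"

    def __repr__(self) -> str:
        return f"Point({self.x!r}, {self.y!r})"

    def __iter__(self):
        # Allows unpacking: x, y = some_point
        return iter((self.x, self.y))

    def magnitude(self) -> float:
        return sqrt(self.x ** 2 + self.y ** 2)
```

With __iter__ in place, a Point can be unpacked (x, y = Point(3, 4)), and with __eq__ and __hash__ defined together, equal points collapse to a single entry in a set.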
Functional
from math import sqrt

point1 = (0, 5)
point2 = (8, 3)
point3 = (1, 7)
points = [point1, point2, point3]
# Unpack each tuple into x and y; plain tuples have no .x or .y attributes
shortest_distance = min(sqrt(x ** 2 + y ** 2) for x, y in points)
largest_distance = max(sqrt(x ** 2 + y ** 2) for x, y in points)
average_distance = sum(sqrt(x ** 2 + y ** 2) for x, y in points) / len(points)
Of course... we should clean it up...
points = [(0, 5), (8, 3), (1, 7)]
distances = [sqrt(x ** 2 + y ** 2) for x, y in points]
shortest_distance = min(distances)
largest_distance = max(distances)
average_distance = sum(distances) / len(points)
None of these examples are particularly well written...
- No __main__
- No documentation
- Everything is in "main"
Pythonic Code & "Good" Code
There are quite a few "rules" when writing Pythonic code... starting with...
Other rules come from the community...
- Always have an if __name__ == "__main__" guard.
- Use f-strings over format where possible.
- Use with statements (context managers).
- Write pydoc style documentation.
- Use functions and modules.
- No global variables.
- Do not always use object-oriented design.
- Do not forget about the Python GIL
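Several of the community rules above can be seen together in one short module. The following is a minimal sketch that reworks the earlier distance example; compute_distances is a hypothetical helper name introduced here for illustration, and the output format is my own choice.

```python
from math import sqrt
from typing import List, Tuple

Point = Tuple[float, float]


def compute_distances(points: List[Point]) -> List[float]:
    """
    Compute the distance from the origin to each point.

    Args:
        points: a list of (x, y) pairs

    Returns:
        A list with one distance per point, in the same order
    """
    return [sqrt(x ** 2 + y ** 2) for x, y in points]


def main() -> None:
    """
    Driver code lives in a function so that no variables leak into the
    module (global) namespace.
    """
    points = [(0, 5), (8, 3), (1, 7)]
    distances = compute_distances(points)

    # f-strings instead of str.format
    print(f"Shortest: {min(distances):.4f}")
    print(f"Largest:  {max(distances):.4f}")
    print(f"Average:  {sum(distances) / len(distances):.4f}")


if __name__ == "__main__":
    main()
```

Because the work is done in a documented function rather than at module level, compute_distances can be imported and tested without running main.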
Other rules come from general software engineering practices:
- Follow Test Driven Development (TDD)
- Use top down design
- Do not write monolithic functions
- Use a code linter or style checker (e.g., pylint)
- Use self-documenting names
- Remember S.O.L.I.D.
- Iterators are magic (CS 330).
Data Structures
When I work in Python, I generally focus on three core (fundamental) data structures.
- Lists:
prime_numbers = [2, 3, 5, 7, 11, 13, 17, 19]
- Dictionaries:
favourite_colors = {"Thomas": "Blue", "Jessica": "Purple"}
collections.defaultdict
collections.Counter
- Sets:
some_colors = {"Blue", "Red", "Green", "Cyan", "Teal"}
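As a quick illustration of how these three structures (and the two collections helpers) work together, here is a small sketch. The extra entry "Alex" is hypothetical data added only so that one color appears twice.

```python
from collections import Counter, defaultdict

# "Alex" is a hypothetical entry added for illustration
favourite_colors = {"Thomas": "Blue", "Jessica": "Purple", "Alex": "Blue"}
some_colors = {"Blue", "Red", "Green", "Cyan", "Teal"}

# defaultdict creates the empty list the first time a key is seen
people_by_color = defaultdict(list)
for person, color in favourite_colors.items():
    people_by_color[color].append(person)

# Counter tallies how often each value occurs
color_counts = Counter(favourite_colors.values())

# Set intersection: which favourite colors are also in some_colors?
known_favourites = some_colors & set(favourite_colors.values())
```

Here people_by_color["Blue"] holds both names, color_counts["Blue"] is 2, and known_favourites contains only "Blue" because "Purple" is not in some_colors.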
If we want to map these to (modern) C++, Java, and Rust... we end up with...
Python | C++ | Java | Rust |
---|---|---|---|
list | std::list | java.util.List | std::collections::LinkedList |
dict | std::unordered_map | java.util.HashMap | std::collections::HashMap |
set | std::unordered_set | java.util.HashSet | std::collections::HashSet |
I am not listing tuple as a true data structure.
Loops and List/Generator Comprehensions
This section is based on notes from CS 330 Object Oriented Programming & Design.
Lists & List Comprehensions
The next few discussions will include list comprehensions, dictionary comprehensions and set comprehensions.
Suppose we have a list of programming terms and want to create a second list containing the length of each term. We might take the usual C, C++, or Java approach:
Word Count - Boring C++ Loop
#include <string>
#include <vector>

using std::string;
using std::vector;

int main(int argc, char** argv)
{
    vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"};
    vector<int> term_lengths(some_terms.size(), 0);

    // Index-based loop: copy the length of each term into term_lengths
    for (int i = 0; i < term_lengths.size(); i++) {
        term_lengths[i] = some_terms[i].size();
    }

    return 0;
}
and translate it into Python:
Word Count - Boring Python Loop
def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = []
    for term in some_terms:
        term_lengths.append(len(term))


if __name__ == "__main__":
    main()
The Python version can (and should) use a list comprehension.
Word Count - Fun Python Loop
def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = [len(term) for term in some_terms]


if __name__ == "__main__":
    main()
Depending on how many terms we have... a generator expression might be more appropriate:
Word Count - Really Fun Python Loop
def main():
    some_terms = ["Hello", "world", "with", "for", "while", "int"]

    term_lengths = (len(term) for term in some_terms)


if __name__ == "__main__":
    main()
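The practical difference between the list comprehension and the generator expression is that the generator computes values lazily and can only be consumed once. A minimal sketch of that behavior, reusing the same terms:

```python
some_terms = ["Hello", "world", "with", "for", "while", "int"]

# Nothing is computed yet; the generator only remembers how to produce values
term_lengths = (len(term) for term in some_terms)

# sum() pulls the values one at a time, so no intermediate list is built
total_length = sum(term_lengths)

# The generator is now exhausted; a second pass yields nothing
leftover = list(term_lengths)
```

After this runs, total_length is 25 and leftover is an empty list. If the lengths are needed more than once, a list comprehension (or list(term_lengths) up front) is the safer choice.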
Modern C++ and std::transform
C++ provides the std::transform algorithm. Combined with lambda functions (available in C++11 and newer), we can take the original C++ code... and rewrite it as
Word Count - C++ std::transform
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

using std::string;
using std::vector;

int main(int argc, char** argv)
{
    vector<string> some_terms {"Hello", "world", "with", "for", "while", "int"};
    vector<int> term_lengths;

    // back_inserter appends each transformed value to term_lengths
    std::transform(some_terms.begin(), some_terms.end(),
                   std::back_inserter(term_lengths),
                   [](const string& t) -> int {
                       return t.size();
                   });

    return 0;
}
Java has the java.util.stream package, which provides similar functionality to Python comprehensions and C++ std::transform. However, in Java, we would end up dealing with the Integer wrapper class if we wanted to use a non-array data structure.
Word Count - Java Streams
import java.util.Arrays;
import java.util.List;

public class IntStreamDemo {
    public static void main(String... args)
    {
        List<String> some_terms = Arrays.asList("Hello", "world", "with",
                                                "for", "while", "int");

        int[] term_lengths = some_terms.stream()
                           .mapToInt(s -> s.length())
                           .toArray();
    }
}
The Python implementation is the most succinct, approachable, and readable.
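The same comprehension syntax extends to dictionaries and sets. As a brief sketch using the same list of terms (the variable names length_by_term and unique_lengths are my own):

```python
some_terms = ["Hello", "world", "with", "for", "while", "int"]

# Dictionary comprehension: map each term to its length
length_by_term = {term: len(term) for term in some_terms}

# Set comprehension: collect the distinct lengths (duplicates are dropped)
unique_lengths = {len(term) for term in some_terms}
```

Six terms produce six dictionary entries, but only three distinct lengths (3, 4, and 5) survive in the set.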
Context Managers
Python provides the with statement (construct). This allows the setup and teardown involved in using resources (e.g., files, sockets, and database connections) to be handled elsewhere.
This has two main benefits:
- There is less boilerplate code.
- It is impossible to forget to close/deallocate a resource.
To write to a file, one might write:
Python File IO - Basic
text_file = open("some_file.txt", "w")

for number in range(1, 100):
    text_file.write(f"{number}\n")
Did you notice the missing text_file.close()? With one small with statement, the file close operation will be handled automatically.
Python File IO - Using with
with open("some_file.txt", "w") as text_file:
    for number in range(1, 100):
        text_file.write(f"{number}\n")
This also works for other types of files--including compressed files.
Python File IO - Using with and gzip
import gzip

with gzip.open("some_file.txt.gz", "wt") as text_file:
    for number in range(1, 100):
        text_file.write(f"{number}\n")
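The with statement is not limited to files. To show what "setup and teardown handled elsewhere" looks like from the other side, here is a minimal sketch of a custom context manager built with contextlib.contextmanager from the standard library. The announce function and the events list are hypothetical names used only for this illustration.

```python
from contextlib import contextmanager


@contextmanager
def announce(label, log):
    """Record when a block is entered and when it is left."""
    log.append(f"enter {label}")    # setup
    try:
        yield log                   # the body of the with block runs here
    finally:
        log.append(f"exit {label}")  # teardown runs even after an exception


events = []

try:
    with announce("demo", events):
        events.append("working")
        raise ValueError("simulated failure")
except ValueError:
    pass
```

Even though the block raises, events ends up as ["enter demo", "working", "exit demo"]. This is the same guarantee that closes a file when a with open(...) block exits early.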
Python Includes Batteries
For many languages, external libraries are usually required for common operations. Python includes batteries.
Operation | Built-in Python Module |
---|---|
Zip Files | import zipfile |
GZipped Files | import gzip |
Reading, writing, or generating JSON | import json |
Converting objects to JSON | import json |
Serializing objects and data structures | import pickle |
Working with time | import time |
Working with dates and time | import datetime |
Working with SQLite | import sqlite3 |
Building a calendar | import calendar |
Generating log files | import logging |
Advanced command line arguments | import argparse |
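As a small taste of two of these modules, the following sketch round-trips a dictionary through JSON and does a little date arithmetic. The dictionary contents and the two dates are made-up sample data.

```python
import json
from datetime import date

# Sample data (hypothetical) to serialize
workshop = {"title": "Python Workshop",
            "topics": ["comprehensions", "context managers"]}

# Convert to a JSON string and back again
encoded = json.dumps(workshop, indent=2)
decoded = json.loads(encoded)

# Subtracting two dates yields a timedelta
days_between = (date(2020, 12, 25) - date(2020, 12, 1)).days
```

No third-party packages are involved: decoded compares equal to the original dictionary, and days_between is 24.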
Libraries & pip
When external libraries are required, the Python pip utility and a requirements.txt can be used for all dependency and configuration management.
In C/C++ we hope for a Linux environment (or Docker). In Java... Gradle is a popular build and configuration management tool.
Examples & Case Studies
Monte Carlo Integration
This program is from my offering of CS 417/517 Computational Methods, https://www.cs.odu.edu/~tkennedy/cs417/f20/Directory/outline/index.html.
Let us start with the top:
#! /usr/bin/env python3
import random
import sys
from typing import (Callable, Tuple)
Point = Tuple[float, float]
This program uses three Python modules:
- random for random number generation
- sys for command line arguments (i.e., sys.argv)
- typing for type hints
The last line (i.e., Point = Tuple[float, float]) is a type alias.
I am a stickler for type hints and function/method documentation. Anytime code is written... it must be documented at the API level. While Python type hints do not necessarily gain us a performance benefit, type hints increase readability. Type hints are an important part of documentation.
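To show how a type alias pays off in a signature, here is a short sketch. The centroid function is a hypothetical example that is not part of the Monte Carlo program; it exists only to show the Point alias in use.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """
    Compute the average x and average y of a collection of points.

    Args:
        points: a non-empty list of (x, y) pairs

    Returns:
        A single point in the form (mean x, mean y)
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]

    return (sum(xs) / len(points), sum(ys) / len(points))
```

The signature List[Point] -> Point reads more clearly than List[Tuple[float, float]] -> Tuple[float, float]. Note that Python does not enforce these hints at run time; they are checked by external tools (e.g., mypy) and read by humans.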
Point Generation
Let us tackle the point generation function (generate_random_points).
from typing import Iterator  # return type hint for a generator function


def generate_random_points(f: Callable,
                           lower_limit: float,
                           upper_limit: float,
                           n: int) -> Iterator[Point]:
    """
    Generate a sequence of random x values and plug them into f(x).

    Args:
        f: mathematical function
        lower_limit: 'a' the lower bound
        upper_limit: 'b' the upper bound
        n: number of points to generate

    Yields:
        A sequence of points in the form (x, f(x))
    """

    for _ in range(0, n):
        x = random.uniform(lower_limit, upper_limit)
        y = f(x)
        yield (x, y)
This function has full pydoc documentation, complete with:
- complete description
- explanation of arguments
- explanation of yield-ed values
...and type hints!
Take particular note of for _ in range(0, n). The underscore _ can be used any time a variable is required syntactically, but the value will be ignored.
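To see the generator and the underscore convention in action, here is a small usage sketch. The function is restated in condensed form so the block runs on its own; the seed value 42 and the sample size of 5 are arbitrary choices for illustration.

```python
import random
from typing import Callable, Iterator, Tuple

Point = Tuple[float, float]


def generate_random_points(f: Callable,
                           lower_limit: float,
                           upper_limit: float,
                           n: int) -> Iterator[Point]:
    """Condensed restatement of the point generation function."""
    for _ in range(0, n):
        x = random.uniform(lower_limit, upper_limit)
        yield (x, f(x))


random.seed(42)  # arbitrary seed so repeated runs produce the same points

# Calling the function only creates the generator; no points exist yet
point_sequence = generate_random_points(lambda x: x ** 2, 0.0, 1.0, 5)

# list() drives the generator to completion
points = list(point_sequence)

# The underscore also works when unpacking: keep y, ignore x
y_values = [y for _, y in points]
```

Every x lands between the two limits, and each y is f(x) for its own x. Nothing is generated until list() (or a for loop) asks for values.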
The Main Function
Always wrap your main/driver code in a main function. This will prevent variables from ending up in the global/module namespace... which can (will) lead to frustrating bugs later.
Let us start with a naive main function, one that has quite a bit of room for improvement.
def naive_main():
    """
    This is a "naive" main function used to demonstrate the basic premise
    behind Monte Carlo integration.
    """

    num_points = int(sys.argv[1])
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])

    math_f = lambda x: x**2
    # math_f = lambda x: cos(x)

    print("{:-^80}".format("Points"), file=sys.stderr)

    temp_sum = 0
    for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
        print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)
        temp_sum += point[1]

    integral_result = (limit_b - limit_a) / float(num_points) * temp_sum
    print(f"{integral_result:16.8f}")
The first three (3) lines
num_points = int(sys.argv[1])
limit_a = float(sys.argv[2])
limit_b = float(sys.argv[3])
grab command line arguments and parse them into int or float values.
Next... I defined a lambda function. This is the mathematical function f(x) that will be integrated.
math_f = lambda x: x**2
Note that any line that includes file=sys.stderr is debugging output. By convention (in C, C++, Java, Python, and Rust) production output is written to standard out and debugging output is written to standard error.
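The stdout/stderr split can be shown in isolation with a tiny sketch. The report function is a hypothetical helper introduced here; it is not part of the original program.

```python
import sys


def report(result: float, debug_message: str) -> None:
    """
    Write the result to standard out and the debug message to standard error.

    Args:
        result: the "production" value to output
        debug_message: diagnostic text that should not pollute the result
    """
    print(debug_message, file=sys.stderr)  # debugging output
    print(f"{result:16.8f}")               # production output
```

If a script that calls report is run from a shell with output redirected (e.g., > result.txt), only the formatted result lands in the file while the debug message still appears in the terminal.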
The rest of the function is not very Pythonic...
temp_sum = 0
for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
    print(f"{i:5d} - ({point[0]:>12.8f}, {point[1]:>12.8f})", file=sys.stderr)
    temp_sum += point[1]

integral_result = (limit_b - limit_a) / float(num_points) * temp_sum
print(f"{integral_result:16.8f}")
There is:
- a temporary sum variable (temp_sum)
- a line over 80 characters in length
- an increment operation (temp_sum += point[1])
The next version of main (i.e., not_so_naive_main) corrects a few style and design issues.
def not_so_naive_main():
    """
    This main function demonstrates the more "Pythonic" approach
    """

    num_points = int(sys.argv[1])
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])

    math_f = lambda x: x**2
    # math_f = lambda x: cos(x)

    point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)
    f_of_x_values = (y for x, y in point_sequence)

    integral_result = ((limit_b - limit_a) /
                       float(num_points) *
                       sum(f_of_x_values))

    print(f"{integral_result:16.8f}")
Instead of looping over all the points...
for i, point in enumerate(generate_random_points(math_f, limit_a, limit_b, num_points)):
the generator is assigned to a variable...
point_sequence = generate_random_points(math_f, limit_a, limit_b, num_points)
Since we only need the y values from each point... an inline generator expression can be used...
f_of_x_values = (y for x, y in point_sequence)
This leads to a far more concise and readable computation.
integral_result = ((limit_b - limit_a) /
                   float(num_points) *
                   sum(f_of_x_values))
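The computation above is the Monte Carlo estimate (b - a) / n multiplied by the sum of f(x) over n random x values. As a sanity check, the sketch below packages that computation into a function and compares it against a case with a known answer: the integral of x**2 from 0 to 1 is exactly 1/3. The function name monte_carlo_integrate, the seed, and the sample size are my own choices for illustration.

```python
import random
from typing import Callable


def monte_carlo_integrate(f: Callable[[float], float],
                          limit_a: float,
                          limit_b: float,
                          num_points: int) -> float:
    """
    Estimate the integral of f over [limit_a, limit_b].

    Args:
        f: mathematical function to integrate
        limit_a: lower bound
        limit_b: upper bound
        num_points: number of random samples to draw

    Returns:
        (limit_b - limit_a) / num_points * sum of f(x) over the samples
    """
    f_of_x_values = (f(random.uniform(limit_a, limit_b))
                     for _ in range(num_points))

    return (limit_b - limit_a) / num_points * sum(f_of_x_values)


random.seed(0)  # arbitrary seed for repeatable runs
estimate = monte_carlo_integrate(lambda x: x ** 2, 0.0, 1.0, 100_000)
error = abs(estimate - 1 / 3)
```

With 100,000 samples the estimate should land close to 1/3, but it will not match exactly; the error shrinks slowly (roughly with the square root of the number of points), which is what the next version of main explores.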
The final version of main (i.e., main_without_a_table_flip) demonstrates the impact of the number of points on Monte Carlo integration.
def main_without_a_table_flip():
    """
    This main demonstrates the impact of the number of points on Monte Carlo
    integration
    """

    num_points = int(sys.argv[1])  # Unused in this version of main
    limit_a = float(sys.argv[2])
    limit_b = float(sys.argv[3])
    max_magnitude = int(sys.argv[4])

    math_f = lambda x: x**2

    print("| {:^16} | {:^20} |".format("# Points", "Est. f(x)"))

    # Generate the largest set of points once, then reuse slices of it
    max_num_points = 2 ** max_magnitude
    point_sequence = list(generate_random_points(math_f, limit_a, limit_b,
                                                 max_num_points))

    for magnitude in range(0, max_magnitude + 1):
        num_points = 2 ** magnitude

        f_of_x_values = (y for x, y in point_sequence[:num_points])

        integral_result = ((limit_b - limit_a) /
                           float(num_points) *
                           sum(f_of_x_values))

        print(f"| {num_points:>16} | {integral_result:^20.8f} |")


if __name__ == "__main__":
    # naive_main()
    # not_so_naive_main()
    main_without_a_table_flip()
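All three versions of main index directly into sys.argv, which fails with an IndexError when an argument is missing. One possible refinement is the built-in argparse module. The sketch below is not part of the original program; the parse_args name, the help text, and the default of 10 for max_magnitude are assumptions made for illustration.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """
    Parse the command line arguments for the Monte Carlo program.

    Args:
        argv: list of argument strings (defaults to sys.argv[1:] when None)

    Returns:
        A namespace with num_points, limit_a, limit_b, and max_magnitude
    """
    parser = argparse.ArgumentParser(
        description="Monte Carlo integration demo")

    parser.add_argument("num_points", type=int,
                        help="number of random points to generate")
    parser.add_argument("limit_a", type=float,
                        help="lower limit of integration")
    parser.add_argument("limit_b", type=float,
                        help="upper limit of integration")
    # Optional positional argument; the default of 10 is a hypothetical choice
    parser.add_argument("max_magnitude", type=int, nargs="?", default=10,
                        help="largest power of 2 to use for the point count")

    return parser.parse_args(argv)
```

A main function could then start with args = parse_args() and read args.limit_a instead of float(sys.argv[2]). Type conversion, error messages for missing or malformed arguments, and a --help screen come for free.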
If Time Permits
If time permits... we will discuss a few additional examples:
- Implementing a Non-Linear Solver
- A few Python Examples from CS 330 (Object Oriented Programming & Design)
- Test Driven Development