Presenter: Mike Mueller (Python Academy)
Sample code and exercises available here. Slides have not been posted yet; I'll update when available.
Premature optimization is the root of all evil – C.A.R. Hoare
Knowing when optimization is premature defines the difference between the master engineer and the apprentice – via Twitter
Guidelines:
make sure your program is actually slow
don't optimize as you go
only realistic use cases should be considered
architecture can be essential to performance
check for bugs that slow the program down
find bottlenecks using profiling
always check the results of optimization with unit testing
The greatest speedup is usually achieved by optimization of algorithms
External causes
network
database
system calls
Measurement
Measuring in stones
test.pystone
a single core of my C2D laptop gets ~24k stones/second
Profiling CPU usage:
cProfile
pstats
Profiling memory usage:
Guppy_pe
Algorithms and Anti-patterns
String concatenation
use lists and ''.join(list) to create large strings
list generators
l = [x * x for x in range(10)]
sum(x * x for x in range(10)
g = (x * x for x in xrange(10))
g.next()
list(g)
data structures
sets vs lists
use sets instead of lists if performing membership tests
if order matters, create a set from your list
this has a cost, so weigh set_creation_cost vs list_search_cost * number_of_searches
use set.intersection to for intersection tests
deque vs lists
use deque (doubly-linked list) instead of lists for faster inserts
deque has a higher index access cost
dict vs defaultdict
defaultdict is faster for larger data sets
Python big-O
O(1)
len(list)
len(dict)
list[i]
del dict[i]
x in dict
x in set
list.append(i)
O(n)
loops on lists, strings, sets
string methods
x in list
O(n log n)
list.sort()
O(n^2)
nested loops
O(n^3)
nested nested loops
Caching
Reuse before you recalculate
deterministic caching
eg numerical results that are always the same for the given inputs
non-deterministic caching
expiring cache
eg caching of database queries
JIT compilation
Numeric libraires
Parallel
Python 2.6 – import multiprocessing
Python <>
Using both cores, my laptop gets ~45k stones/second


