Tuesday, July 20, 2010

Faster Python Programs through Optimization

Presenter: Mike Mueller (Python Academy)

Sample code and exercises available here. Slides have not been posted yet; I'll update when available.

Premature optimization is the root of all evil – C.A.R. Hoare

Knowing when optimization is premature defines the difference between the master engineer and the apprentice – via Twitter

Guidelines:

    • make sure your program is actually slow

    • don't optimize as you go

    • only realistic use cases should be considered

    • architecture can be essential to performance

    • check for bugs that slow the program down

    • find bottlenecks using profiling

    • always check the results of optimization with unit testing

The greatest speedup is usually achieved by optimization of algorithms

External causes

    • network

    • database

    • system calls

Measurement

    • Measuring in stones

      • test.pystone

      • a single core of my C2D laptop gets ~24k stones/second

    • Profiling CPU usage:

      • cProfile

      • pstats

    • Profiling memory usage:

      • Guppy_pe

Algorithms and Anti-patterns

  • String concatenation

    • use lists and ''.join(list) to create large strings

  • list generators

    • l = [x * x for x in range(10)]

    • sum(x * x for x in range(10)

    • g = (x * x for x in xrange(10))

      • g.next()

      • list(g)

  • data structures

    • sets vs lists

      • use sets instead of lists if performing membership tests

      • if order matters, create a set from your list

        • this has a cost, so weigh set_creation_cost vs list_search_cost * number_of_searches

      • use set.intersection to for intersection tests

    • deque vs lists

      • use deque (doubly-linked list) instead of lists for faster inserts

      • deque has a higher index access cost

    • dict vs defaultdict

      • defaultdict is faster for larger data sets

  • Python big-O

    • O(1)

      • len(list)

      • len(dict)

      • list[i]

      • del dict[i]

      • x in dict

      • x in set

      • list.append(i)

    • O(n)

      • loops on lists, strings, sets

      • string methods

      • x in list

    • O(n log n)

      • list.sort()

    • O(n^2)

      • nested loops

    • O(n^3)

      • nested nested loops

  • Caching

    • Reuse before you recalculate

    • memoization

    • deterministic caching

      • eg numerical results that are always the same for the given inputs

    • non-deterministic caching

      • expiring cache

      • eg caching of database queries

    • Memcached

  • JIT compilation

  • Numeric libraires

  • Parallel

    • Python 2.6 – import multiprocessing

    • Python <>

      • Using both cores, my laptop gets ~45k stones/second

Design Patterns in Python

Alex Martelli (Google)












The Facade DP – provide a controlled subset of functionality to a client. Most Python libraries are facades. Layers on top of bigger packages.

An explanation of basic DPs.

There are other patterns...











This talk is about Pythonic Patterns

Singletons in Python … use a module instead of a class. They are inherently singletons, you can use it one time.

The Canceled Perl Session

The Mastering Perl session was canceled, but the presenter has uploaded his slides. I'm disappointed about missing this one, but the DTrace session turned out to be pretty interesting.

Python AppEngine

Google's Python AppEngine
Ikai Lan (Google)


Hands on will need:
Python 2.5.x

In attendance also from Google:
Shawn Lynch (sp?)
Alex Martell (sp?)
John Modell (sp?)

DiveIntoPython.org <-- Online book; 30 min proficiency in Python

App Engine – Google's tool for writing quick webapps that run on their systems at Appspot.com.

Write App Engine code (see my public git repo at http://github.com/jcalahan/appenginetest)

Run it with:
~/google_appengine_1.3.5/dev_appserver.py

View it with:

http://localhost:8080/_ah/admin/ <-- Local admin console

Morning Dew

Chris found his morning mountain dew!