Monday, July 19, 2010

DTrace Hands-On

Presenter: Angelo Rajadurai (Oracle/Sun)

The full tutorial is available here, or you can grab the OSCON slides here. There is also a Google group set up here.


Why DTrace

Some performance issues only show up in production, and DTrace lets you instrument a live production system. DTrace doesn't require a debug binary or access to source code. Instrumentation is enabled on demand, and you can probe almost any location in the kernel or in a user process. Enabled probes add little system overhead, and disabled probes add none.

Simple Example

% dtrace -n 'syscall:::entry /execname == "java"/ { @[probefunc] = count(); }'

This tells DTrace to enable the entry probe of every system call in the syscall provider and, for processes named 'java', count the calls by system call name.

D-Scripts

More complex setups are better served by scripts. DTrace uses its own scripting language, called D, and programs written in it are usually called D scripts. The simple example above is actually a one-line D script. More complex D scripts let you introduce variables, stack multiple probes, and format output.
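As a rough sketch of what that looks like, here is the one-liner above expanded into a script file. The aggregation name `calls`, the variable `start`, and the column layout are my own choices for illustration, not something from the tutorial.

```dtrace
#!/usr/sbin/dtrace -s

/* Suppress the default per-probe output; we print our own report. */
#pragma D option quiet

dtrace:::BEGIN
{
    /* Global variable: remember when tracing started. */
    start = timestamp;
    printf("Tracing java system calls... press Ctrl-C to stop.\n");
}

syscall:::entry
/execname == "java"/
{
    /* Named aggregation keyed by system call name. */
    @calls[probefunc] = count();
}

dtrace:::END
{
    printf("\nTraced for %d ms\n", (timestamp - start) / 1000000);
    printf("%-24s %8s\n", "SYSCALL", "COUNT");
    printa("%-24s %@8d\n", @calls);
}
```

Saved as, say, `javacalls.d` (a hypothetical file name), it would be run as root with `dtrace -s javacalls.d`.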

DTrace Patterns

Event Trace: Basically printf() debugging. Good for rare events, but it gets very expensive for frequent ones. One thing I discovered today is that printf() performs ~2400 system calls to print a string.
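A minimal sketch of the Event Trace pattern, assuming the event of interest is "a new program was executed", which is usually rare enough to print one line per occurrence. The choice of event and the output format are mine, not the presenter's.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Fires each time a process successfully exec()s a new program. */
proc:::exec-success
{
    /* One output line per event: wall-clock time, pid, command line. */
    printf("%Y  pid %-6d  %s\n", walltimestamp, pid, curpsinfo->pr_psargs);
}
```

Pointing the same kind of clause at a probe that fires thousands of times per second is where this pattern becomes expensive.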

Count: A collection or summary of events. The simple example above uses the Count pattern.

What's In Between: Traces everything that happens between two events (e.g., function entry and exit).
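One common way to write this pattern is a thread-local flag that is set at the first event and cleared at the second. The sketch below assumes the two events are the entry and return of a read() system call made by a 'java' process; both the system call and the variable name `tracing` are illustrative choices. Note that `fbt:::` enables every kernel function probe, which is heavy, so this is something to try on a test machine first.

```dtrace
#!/usr/sbin/dtrace -s

/* Indent output to show function call nesting. */
#pragma D option flowindent

syscall::read:entry
/execname == "java"/
{
    /* First event: start following this thread. */
    self->tracing = 1;
}

/* Every kernel function entry/return, but only on the flagged thread. */
fbt:::
/self->tracing/
{
}

syscall::read:return
/self->tracing/
{
    /* Second event: stop following and quit after one pass. */
    self->tracing = 0;
    exit(0);
}
```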

Time Spent: Performance profiling. Find the amount of time spent in a function.
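A sketch of the Time Spent pattern, again assuming read() calls from a 'java' process as the thing being measured. It records a timestamp on entry, subtracts it on return, and summarizes the differences as a distribution. The variable name `ts` and the aggregation label are hypothetical.

```dtrace
#!/usr/sbin/dtrace -s

syscall::read:entry
/execname == "java"/
{
    /* Thread-local start time in nanoseconds. */
    self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
    /* Power-of-two histogram of elapsed time per call. */
    @latency["read time (ns)"] = quantize(timestamp - self->ts);
    self->ts = 0;
}
```

The same entry/return pairing works for user-level functions through the pid provider, given a process and a function name to attach to.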

Profile: A polling probe that takes a sample over time to create an overview.
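A sketch of the Profile pattern, assuming a sampling rate of 997 times per second on each CPU and a ten-second run; both numbers and the aggregation name `oncpu` are arbitrary choices for illustration.

```dtrace
#!/usr/sbin/dtrace -s

#pragma D option quiet

/* Sample every CPU 997 times per second and note what is running. */
profile-997
{
    @oncpu[execname] = count();
}

/* After ten seconds, keep the ten busiest names, print them, and stop. */
tick-10s
{
    trunc(@oncpu, 10);
    printa("%-24s %@8d\n", @oncpu);
    exit(0);
}
```

Replacing the key `execname` with `ustack()` would sample user-level stack traces instead, which gives a picture of where in the code the time goes.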

Gotchas

Probes do have a performance impact. This is usually small, but it is possible to unintentionally cripple a production system. DTrace contains a “killswitch” that should prevent you from completely killing the system.

DTrace only provides probe results. It can't tell you what to probe, or interpret the probe results. It's just an instrumentation tool.

General Approach

  1. Define problem

    1. program foo is running much slower on Wednesdays

  2. Verify “known” conditions

    1. “It can't be the disks, we just replaced them”

  3. Attempt to reproduce the problem.

    1. If the problem can't be reproduced, it can't be instrumented

  4. Isolate the problem

    1. CPU, I/O, network, etc.

  5. Determine if other tools are more applicable

    1. Vendor supplied instrumentation packages
