Sampling tools like oprofile or dtrace's profile provider don't really show what multithreaded programs are blocking on - only where they spend CPU time. There are more advanced techniques (such as systemtap and dtrace call-level probes), but building on those is overkill. Poor man doesn't have time. Poor man needs food.
For a poor developer to understand what a program is doing, he needs to see stacks. Once upon a time (back in the Linux 2.4 days) there was a 'pstack' tool for that; Solaris has one too.
Modern Linux systems, though, do not have such facilities, and one needs to improvise - say, by using a debugger, which can walk all threads and print their stacks.
Getting stacks:
gdb -ex "set pagination 0" -ex "thread apply all bt" \ --batch -p $(pidof mysqld)Or for version-impaired (gdb 6.3 and older):
(echo "set pagination 0"; echo "thread apply all bt"; echo "quit"; cat /dev/zero ) | gdb -p $(pidof mysqld)Collapsing traces (awk!):
Collapsing traces (awk!):

BEGIN { s = ""; }
/Thread/ { print s; s = ""; }
/^\#/ { if ($3 != "in") { $4 = $2 } }
/^\#/ { if (s != "") { s = s "," $4 } else { s = $4 } }
END { print s }
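If the awk script above is saved to a file (hypothetically named collapse.awk here), a single snapshot can be collapsed and ranked in one pipeline:

# assumes the awk above was saved as collapse.awk
gdb -ex "set pagination 0" -ex "thread apply all bt" --batch -p $(pidof mysqld) \
  | awk -f collapse.awk | sort | uniq -c | sort -rn

The full script below just wraps the same pipeline in a sampling loop.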
Full technology demonstration:

#!/bin/bash
nsamples=1
sleeptime=0
pid=$(pidof mysqld)

for x in $(seq 1 $nsamples)
  do
    gdb -ex "set pagination 0" -ex "thread apply all bt" -batch -p $pid
    sleep $sleeptime
  done | \
awk '
  BEGIN { s = ""; }
  /^Thread/ { print s; s = ""; }
  /^\#/ { if (s != "" ) { s = s "," $4} else { s = $4 } }
  END { print s }' | \
sort | uniq -c | sort -r -n -k 1,1
Sample output from a run against a live mysqld server:

    291 pthread_cond_wait@@GLIBC_2.3.2,one_thread_per_connection_end,handle_one_connection
     57 read,my_real_read,my_net_read,do_command,handle_one_connection,start_thread
     26 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,os_aio_simulated_handle,fil_aio_wait,io_handler_thread,start_thread
      3 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,srv_purge_worker_thread
      1 select,os_thread_sleep,srv_purge_thread
      1 select,os_thread_sleep,srv_master_thread
      1 select,os_thread_sleep,srv_lock_timeout_and_monitor_thread
      1 select,os_thread_sleep,srv_error_monitor_thread
      1 select,handle_connections_sockets,main,select
      1 read,vio_read_buff,my_real_read,my_net_read,cli_safe_read,handle_slave_io
      1 pthread_cond_wait@@GLIBC_2.3.2,os_event_wait_low,sync_array_wait_event,rw_lock_s_lock_spin,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innodb::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection
      1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_index_entry,row_upd_step,row_update_for_mysql,ha_innodb::delete_row,handler::ha_delete_row,mysql_delete,mysql_execute_command,mysql_parse,Query_log_event::do_apply_event,apply_event_and_update_pos,handle_slave_sql
      1 pread64,os_file_pread,os_file_read,fil_io,buf_read_page_low,buf_read_page,buf_page_get_gen,btr_cur_search_to_nth_level,row_search_for_mysql,ha_innodb::index_read,handler::index_read_idx_map,join_read_const,join_read_const_table,make_join_statistics,JOIN::optimize,mysql_select,handle_select,execute_sqlcom_select,mysql_execute_command,mysql_parse,dispatch_command,do_command,handle_one_connection
      1 do_sigwait,sigwait,signal_hand
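The demonstration above takes a single snapshot (nsamples=1, sleeptime=0). For actual profiling you would raise both variables at the top of the script; the values below are only a suggestion:

nsamples=500   # number of stack snapshots to aggregate
sleeptime=1    # seconds to pause between snapshots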
We hear this technology has been used by performance engineers at Google, Facebook, Wikipedia, Intel, Sun Microsystems and other places.
Originally this technology was released as a collaborative effort by Mark Callaghan and Domas Mituzas in this blog post. You can still find us at mysql@facebook.
Website manufactured by Domas during the festive 2009 season - a shameless plug for his other gdb tricks.