Archive

Posts Tagged ‘Macro’

Measuring Code Performance in C

August 11th, 2009 No comments

When optimizing code for speed, it is useful to have a very precise measurement of the time a certain routine or loop needs to complete. When measuring time only, the QueryPerformanceFrequency() and QueryPerformanceCounter() APIs can be employed. If we need to be even more accurate and measure the number of CPU clock cycles, the x86 asm instruction rdtsc, or the Visual C++ compiler intrinsic __rdtsc() function can be used. The return value of this function is a unsigned 64 bit value that wraps around every 194 years or so on a 3 Ghz CPU. Originally, the RDTSC instruction loads the current count of the time-stamp counter into the EDX:EAX registers for further processing. The instrinsic function is merely a wrapper that returns a 64 bit value of the combined registers.

The following two macros come in handy for wrapping a section to be examined and outputting the result to stdout or a function of your choice.

#define TIMER_START( description ) { \
	PCTSTR szDescription = description; \
	_tprintf( _T("\n-->Starting %s...\n"), description ); \
	LARGE_INTEGER start; \
	QueryPerformanceCounter( &start ); \
	DWORD64 ticks = __rdtsc();
 
#define TIMER_END( iterations ) \
	ticks = __rdtsc() - ticks; \
	LARGE_INTEGER frequency, end; \
	QueryPerformanceCounter( &end ); \
	QueryPerformanceFrequency( &frequency ); \
	DWORD64 it = iterations; \
	DWORD64 diff = end.QuadPart - start.QuadPart; \
	_tprintf( _T("-->Finished %s in %I64d ms %I64d ticks\n"), szDescription, diff / ( frequency.QuadPart / 1000 ), ticks ); \
	_tprintf( _T("-->%I64d iterations - %I64d iterations/sec - %I64d ticks/iteration\n\n"), it, ( it * frequency.QuadPart ) / diff, ticks / it ); }

Technically you would have to subtract the duration of two __rdtsc() calls from the final result, but most likely the time taken is not measurable by QueryPerformanceCounter().
You could also use the above as a separate function:

typedef struct TIMER_DATA{ PCTSTR szDescription; DWORD64 iterations; LARGE_INTEGER start_time; DWORD64 start_ticks; } *PTIMER_DATA;
 
inline void timer_start( PTIMER_DATA data )
{
	_tprintf( _T("\n-->Starting %s...\n"), ( data->szDescription == NULL ) ? _T("") : data->szDescription );
	QueryPerformanceCounter( &(data->start_time) ); \
	data->start_ticks = __rdtsc();
}
 
inline void timer_end( PTIMER_DATA data )
{
	DWORD64 ticks = __rdtsc() - data->start_ticks;
	LARGE_INTEGER frequency, end;
	QueryPerformanceCounter( &end );
	QueryPerformanceFrequency( &frequency );
	DWORD64 diff = end.QuadPart - data->start_time.QuadPart;
	_tprintf( _T("-->Finished %s in %I64d ms %I64d ticks\n"), ( data->szDescription == NULL ) ? _T("") : data->szDescription, diff / ( frequency.QuadPart / 1000 ), ticks );
	_tprintf( _T("-->%I64d iterations - %I64d iterations/sec - %I64d ticks/iteration\n\n"), data->iterations, ( data->iterations * frequency.QuadPart ) / diff, ticks / data->iterations );
}

Usage:

TIMER_START( _T("bla") )
 
/* Your code here */
 
TIMER_END( 200000 )
 
TIMER_DATA data;
data.szDescription = _T("bla");
data.iterations = 200000;
timer_start( &data );
 
/* Your code here */
 
timer_end( &data );

Especially when optimizing tight loops it is recommended to measure the execution time of the complete loop and then divide the result by the number of iterations instead of measuring each iteration individually.