git.openfabrics.org - ~shefty/librdmacm.git/commit

author	Sean Hefty <sean.hefty@intel.com>
	Tue, 8 May 2012 00:16:47 +0000 (17:16 -0700)
committer	Sean Hefty <sean.hefty@intel.com>
	Tue, 8 May 2012 00:16:47 +0000 (17:16 -0700)
commit	ec6a8efe211b0dc98548443c2e0d67e2c355351f
tree	3fcdf48082614bb29c44242534cfeecb47e07114
parent	5658ff385e0449a78a325d430163e524b7a97ec4

rsockets: Optimize synchronization to improve performance

Hotspot performance analysis using VTune showed pthread_mutex_unlock()
as the most significant hotspot when transferring small messages using
rstream.  To reduce the impact of using pthread mutexes, replace them
with a custom lock built using an atomic variable and a semaphore.
When there's no contention for the lock (which is the expected case
for nonblocking sockets), the synchronization is reduced to
incrementing and decrementing an atomic variable.
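
The locking scheme described above can be sketched roughly as follows.
This is a minimal illustration written with C11 atomics and a POSIX
unnamed semaphore, not the code from this patch; the sketch_lock_*
names are hypothetical, and a Linux-like platform where sem_init() is
supported is assumed.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical type for illustration; not taken from the patch. */
typedef struct {
	sem_t sem;       /* waiters sleep here under contention */
	atomic_int cnt;  /* threads currently holding or waiting for the lock */
} sketch_lock_t;

static inline void sketch_lock_init(sketch_lock_t *lock)
{
	/* Semaphore starts at 0: nobody has been handed the lock yet. */
	int rc = sem_init(&lock->sem, 0, 0);
	assert(rc == 0);
	(void)rc;
	atomic_init(&lock->cnt, 0);
}

static inline void sketch_lock_destroy(sketch_lock_t *lock)
{
	sem_destroy(&lock->sem);
}

static inline void sketch_lock_acquire(sketch_lock_t *lock)
{
	/* fetch_add returns the previous value: 0 means the lock was free. */
	if (atomic_fetch_add(&lock->cnt, 1) > 0) {
		/* Contended: sleep until the holder posts; retry on signals. */
		while (sem_wait(&lock->sem) == -1 && errno == EINTR)
			;
	}
}

static inline void sketch_lock_release(sketch_lock_t *lock)
{
	/* A previous value above 1 means at least one thread is waiting. */
	if (atomic_fetch_sub(&lock->cnt, 1) > 1)
		sem_post(&lock->sem);
}
```

In the uncontended case neither sem_wait() nor sem_post() is reached,
so an acquire/release pair costs one atomic increment and one atomic
decrement.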

A test that acquired and released a lock 2 billion times reported that
the custom lock was roughly 20% faster than using the mutex:
26.6 seconds versus 33.0 seconds.
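
A direct lock test of this kind could look something like the sketch
below.  It is not the test program used for the numbers above; the
function names are hypothetical, and it only times an uncontended
pthread mutex against a bare atomic increment/decrement pair (the fast
path of the custom lock) from a single thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Monotonic time in seconds. */
static double now_sec(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
	assert(rc == 0);
	(void)rc;
	return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `iters` uncontended lock/unlock pairs on a pthread mutex. */
static double time_mutex(unsigned long iters)
{
	pthread_mutex_t mut = PTHREAD_MUTEX_INITIALIZER;
	double start = now_sec();

	for (unsigned long i = 0; i < iters; i++) {
		pthread_mutex_lock(&mut);
		pthread_mutex_unlock(&mut);
	}
	double elapsed = now_sec() - start;
	pthread_mutex_destroy(&mut);
	return elapsed;
}

/* Time `iters` atomic increment/decrement pairs on one counter. */
static double time_atomic_pair(unsigned long iters)
{
	atomic_int cnt = 0;
	double start = now_sec();

	for (unsigned long i = 0; i < iters; i++) {
		atomic_fetch_add(&cnt, 1);
		atomic_fetch_sub(&cnt, 1);
	}
	return now_sec() - start;
}
```

Calling each function with the same iteration count and comparing the
returned times gives a rough per-operation cost.  For the figures
quoted above, (33.0 - 26.6) / 33.0 is about 19%, consistent with the
"roughly 20%" estimate.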

Unfortunately, further analysis showed that using the custom lock
provided a minimal performance gain on rstream itself, and simply
moved the hotspot to the custom unlock call.  The hotspot is likely
a result of some other interaction, rather than caused by slowness
in releasing a lock.  However, we keep the custom lock based on
the results of the direct lock tests that were done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

configure.in
src/cma.h
src/rsocket.c