Version: 1
-Previous: d694872ccf13424dbdbafdb6089b0cc1109595fd
-Head: 5658ff385e0449a78a325d430163e524b7a97ec4
+Previous: 4e7c4df1d759a22b94dceda9738bf1713ad74b21
+Head: d463576a25bf301625544eba23337232e36d3115
Applied:
+ rs-locking: d463576a25bf301625544eba23337232e36d3115
Unapplied:
preload: 5dfe7abc07064485c5100e04e5412279244c2bc3
Hidden:
--- /dev/null
+Bottom: de666c51520c9988ea3a07e332fa0402fdef6010
+Top: de666c51520c9988ea3a07e332fa0402fdef6010
+Author: Sean Hefty <sean.hefty@intel.com>
+Date: 2012-05-07 17:16:47 -0700
+
+rsockets: Optimize synchronization to improve performance
+
+Performance analysis using VTune showed that pthread_mutex_unlock()
+is the single biggest contributor to latency for 64-byte transfers.
+The unlock call was followed by get_sw_cqe(), then
+__pthread_mutex_lock(). Replace the use of mutexes with an atomic
+counter and a semaphore. When there is no contention for the lock
+(which is usually the case when using nonblocking sockets), the
+code simply increments and decrements the atomic variable. The
+semaphore is only used when contention occurs.
+
+Signed-off-by: Sean Hefty <sean.hefty@intel.com>
+
+
+---
+
+
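+A minimal sketch of how such an atomic-plus-semaphore lock can be
+structured. The names (fastlock_t, fastlock_acquire(), etc.) are
+illustrative and not necessarily the identifiers used by this patch;
+this only demonstrates the scheme the message describes:
+
+    #include <semaphore.h>
+
+    typedef struct {
+        sem_t sem;          /* waiters block here only on contention */
+        volatile int cnt;   /* atomic count of outstanding lock requests */
+    } fastlock_t;
+
+    static void fastlock_init(fastlock_t *lock)
+    {
+        sem_init(&lock->sem, 0, 0);
+        lock->cnt = 0;
+    }
+
+    static void fastlock_acquire(fastlock_t *lock)
+    {
+        /* Uncontended path: counter goes 0 -> 1, no system call. */
+        if (__sync_add_and_fetch(&lock->cnt, 1) > 1)
+            sem_wait(&lock->sem);   /* contended: block until released */
+    }
+
+    static void fastlock_release(fastlock_t *lock)
+    {
+        /* If another thread bumped the counter while we held the
+         * lock, wake exactly one waiter; otherwise just decrement. */
+        if (__sync_sub_and_fetch(&lock->cnt, 1) > 0)
+            sem_post(&lock->sem);
+    }
+
+On the fast path both acquire and release touch only the atomic
+counter, so the semaphore (and its system calls) is reached only when
+two threads actually overlap inside the critical section.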