Version: 1
-Previous: 4e36919008dc64ad55216a38e5e4c45ae31f4f6e
-Head: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+Previous: 6698a30f94c23cb77d250d18345deb02eefef507
+Head: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
Applied:
- rs-locking: c4f4253dedaacc9fa27726bda6f1167b280576ad
- refresh-temp: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+ rs-locking: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
Unapplied:
comp_locks: b89aab130b4619806557e11e6b9c10964f00743f
preload: 5dfe7abc07064485c5100e04e5412279244c2bc3
rsockets: Optimize synchronization to improve performance
-Performance analysis using VTune showed that pthread_mutex_unlock()
-is the single biggest contributor to increasing latency for 64-byte
-transfers. Unlock was followed by get_sw_cqe(), then
-__pthread_mutex_lock(). Replace the use of mutexes with an atomic
-and a semaphore. When there's no contention for the lock (which
-would usually be the case when using nonblocking sockets), the
-code simply increments and decrements an atomic variable. Semaphores
-are only used when contention occurs.
+Hotspot performance analysis using VTune showed pthread_mutex_unlock()
+as the most significant hotspot when transferring small messages using
+rstream. To reduce the impact of using pthread mutexes, replace them
+with a custom lock built using an atomic variable and a semaphore.
+When there's no contention for the lock (which is the expected case
+for nonblocking sockets), the synchronization is reduced to
+incrementing and decrementing an atomic variable.
+
+A test that acquired and released a lock 2 billion times reported that
+the custom lock was roughly 20% faster than using the mutex:
+26.6 seconds versus 33.0 seconds.
+
+Unfortunately, further analysis showed that using the custom lock
+provided a minimal performance gain on rstream itself, and simply
+moved the hotspot to the custom unlock call. The hotspot is likely
+a result of some other interaction, rather than caused by slowness
+in releasing a lock. However, we keep the custom lock based on
+the results of the direct lock tests that were done.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
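
For illustration only, the lock described in the commit message could look roughly like the sketch below. It is not taken from the patch: the sketch_lock_* and demo_* names are hypothetical, and it assumes C11 atomics and POSIX unnamed semaphores (as on Linux). The counter tracks the holder plus any waiters, so the semaphore is only touched when a second thread arrives while the lock is held.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical names; a sketch of an atomic-plus-semaphore lock. */
typedef struct {
	sem_t sem;      /* waiters sleep here, only under contention */
	atomic_int cnt; /* number of threads holding or waiting */
} sketch_lock_t;

static void sketch_lock_init(sketch_lock_t *l)
{
	sem_init(&l->sem, 0, 0);
	atomic_init(&l->cnt, 0);
}

static void sketch_lock_acquire(sketch_lock_t *l)
{
	/* A previous value of 0 means the lock was free: no semaphore call. */
	if (atomic_fetch_add(&l->cnt, 1) > 0) {
		/* Contended: sleep until the holder posts; retry on EINTR. */
		while (sem_wait(&l->sem) == -1 && errno == EINTR)
			;
	}
}

static void sketch_lock_release(sketch_lock_t *l)
{
	/* A previous value above 1 means at least one thread is waiting. */
	if (atomic_fetch_sub(&l->cnt, 1) > 1)
		sem_post(&l->sem);
}

/* Small demo: several threads increment a shared counter under the lock. */
#define SKETCH_THREADS 4
#define SKETCH_ITERS 100000

static sketch_lock_t demo_lock;
static long demo_counter;

static void *demo_worker(void *arg)
{
	(void)arg;
	for (int i = 0; i < SKETCH_ITERS; i++) {
		sketch_lock_acquire(&demo_lock);
		demo_counter++;
		sketch_lock_release(&demo_lock);
	}
	return NULL;
}

static long sketch_lock_demo(void)
{
	pthread_t threads[SKETCH_THREADS];
	int i, ret;

	demo_counter = 0;
	sketch_lock_init(&demo_lock);
	for (i = 0; i < SKETCH_THREADS; i++) {
		ret = pthread_create(&threads[i], NULL, demo_worker, NULL);
		assert(ret == 0);
	}
	for (i = 0; i < SKETCH_THREADS; i++)
		pthread_join(threads[i], NULL);
	sem_destroy(&demo_lock.sem);
	return demo_counter;
}
```

Calling sketch_lock_demo() from a program built with -pthread should return SKETCH_THREADS * SKETCH_ITERS if the lock provides mutual exclusion. In the uncontended case each acquire/release pair costs one atomic increment and one atomic decrement, which matches the behavior the commit message describes for nonblocking sockets.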