From 77c2254db97075b531fea67b761866431c288f19 Mon Sep 17 00:00:00 2001
From: Sean Hefty <sean.hefty@intel.com>
Date: Wed, 9 May 2012 15:50:18 -0700
Subject: [PATCH] refresh

---
 meta                 |  7 +++----
 patches/refresh-temp | 10 ----------
 patches/rs-locking   | 26 ++++++++++++++++++--------
 3 files changed, 21 insertions(+), 22 deletions(-)
 delete mode 100644 patches/refresh-temp

diff --git a/meta b/meta
index f83a6c57..1f8175dd 100644
--- a/meta
+++ b/meta
@@ -1,9 +1,8 @@
 Version: 1
-Previous: 4e36919008dc64ad55216a38e5e4c45ae31f4f6e
-Head: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+Previous: 6698a30f94c23cb77d250d18345deb02eefef507
+Head: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
 Applied:
-  rs-locking: c4f4253dedaacc9fa27726bda6f1167b280576ad
-  refresh-temp: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+  rs-locking: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
 Unapplied:
   comp_locks: b89aab130b4619806557e11e6b9c10964f00743f
   preload: 5dfe7abc07064485c5100e04e5412279244c2bc3
diff --git a/patches/refresh-temp b/patches/refresh-temp
deleted file mode 100644
index 34f46ba2..00000000
--- a/patches/refresh-temp
+++ /dev/null
@@ -1,10 +0,0 @@
-Bottom: b2e4cd7d670a626fa22b05cff6a1dccc973e9f0d
-Top:    b2e4cd7d670a626fa22b05cff6a1dccc973e9f0d
-Author: Sean Hefty <sean.hefty@intel.com>
-Date:   2012-05-09 15:25:37 -0700
-
-Refresh of rs-locking
-
----
-
-
diff --git a/patches/rs-locking b/patches/rs-locking
index d78bafc6..b037eeae 100644
--- a/patches/rs-locking
+++ b/patches/rs-locking
@@ -5,14 +5,24 @@ Date:   2012-05-07 17:16:47 -0700
 
 rsockets: Optimize synchronization to improve performance
 
-Performance analysis using VTune showed that pthread_mutex_unlock()
-is the single biggest contributor to increasing latency for 64-byte
-transfers.  Unlocked was followed by get_sw_cqe(), then
-__pthread_mutex_lock().  Replace the use of mutexes with an atomic
-and a semaphore.  When there's no contention for the lock (which
-would usually be the case when using nonblocking sockets), the
-code simply increments and decrements an atomic varible.  Semaphores
-are only used when contention occurs.
+Performance analysis using VTune showed pthread_mutex_unlock() as
+the most significant hotspot when transferring small messages using
+rstream.  To reduce the overhead of the pthread mutexes, replace them
+with a custom lock built using an atomic variable and a semaphore.
+When there's no contention for the lock (which is the expected case
+for nonblocking sockets), the synchronization is reduced to
+incrementing and decrementing an atomic variable.
+
+A test that acquired and released a lock 2 billion times reported that
+the custom lock was roughly 20% faster than the mutex: 26.6 seconds
+versus 33.0 seconds.
+
+Unfortunately, further analysis showed that using the custom lock
+provided a minimal performance gain on rstream itself, and simply
+moved the hotspot to the custom unlock call.  The hotspot is likely
+a result of some other interaction, rather than caused by slowness
+in releasing a lock.  However, we keep the custom lock based on
+the results of the direct lock tests that were done.
 
 Signed-off-by: Sean Hefty <sean.hefty@intel.com>
 
-- 
2.41.0