From 77c2254db97075b531fea67b761866431c288f19 Mon Sep 17 00:00:00 2001
From: Sean Hefty
Date: Wed, 9 May 2012 15:50:18 -0700
Subject: [PATCH] refresh

---
 meta                 |  7 +++----
 patches/refresh-temp | 10 ----------
 patches/rs-locking   | 26 ++++++++++++++++++--------
 3 files changed, 21 insertions(+), 22 deletions(-)
 delete mode 100644 patches/refresh-temp

diff --git a/meta b/meta
index f83a6c57..1f8175dd 100644
--- a/meta
+++ b/meta
@@ -1,9 +1,8 @@
 Version: 1
-Previous: 4e36919008dc64ad55216a38e5e4c45ae31f4f6e
-Head: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+Previous: 6698a30f94c23cb77d250d18345deb02eefef507
+Head: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
 Applied:
-  rs-locking: c4f4253dedaacc9fa27726bda6f1167b280576ad
-  refresh-temp: 605927186f8fa864e9554e1bc398ed8b3c44c4e0
+  rs-locking: e69294e4ca7fb71f62ca32fc9f0ab385813cfe58
 Unapplied:
   comp_locks: b89aab130b4619806557e11e6b9c10964f00743f
   preload: 5dfe7abc07064485c5100e04e5412279244c2bc3
diff --git a/patches/refresh-temp b/patches/refresh-temp
deleted file mode 100644
index 34f46ba2..00000000
--- a/patches/refresh-temp
+++ /dev/null
@@ -1,10 +0,0 @@
-Bottom: b2e4cd7d670a626fa22b05cff6a1dccc973e9f0d
-Top: b2e4cd7d670a626fa22b05cff6a1dccc973e9f0d
-Author: Sean Hefty
-Date: 2012-05-09 15:25:37 -0700
-
-Refresh of rs-locking
-
----
-
-
diff --git a/patches/rs-locking b/patches/rs-locking
index d78bafc6..b037eeae 100644
--- a/patches/rs-locking
+++ b/patches/rs-locking
@@ -5,14 +5,24 @@ Date: 2012-05-07 17:16:47 -0700
 
 rsockets: Optimize synchronization to improve performance
 
-Performance analysis using VTune showed that pthread_mutex_unlock()
-is the single biggest contributor to increasing latency for 64-byte
-transfers. Unlocked was followed by get_sw_cqe(), then
-__pthread_mutex_lock(). Replace the use of mutexes with an atomic
-and a semaphore. When there's no contention for the lock (which
-would usually be the case when using nonblocking sockets), the
-code simply increments and decrements an atomic varible. Semaphores
-are only used when contention occurs.
+Hotspot performance analysis using VTune showed pthread_mutex_unlock()
+as the most significant hotspot when transferring small messages using
+rstream. To reduce the impact of using pthread mutexes, replace it
+with a custom lock built using an atomic variable and a semaphore.
+When there's no contention for the lock (which is the expected case
+for nonblocking sockets), the synchronization is reduced to
+incrementing and decrementing an atomic variable.
+
+A test that acquired and released a lock 2 billion times reported that
+the custom lock was roughly 20% faster than using the mutex.
+26.6 seconds versus 33.0 seconds.
+
+Unfortunately, further analysis showed that using the custom lock
+provided a minimal performance gain on rstream itself, and simply
+moved the hotspot to the custom unlock call. The hotspot is likely
+a result of some other interaction, rather than caused by slowness
+in releasing a lock. However, we keep the custom lock based on
+the results of the direct lock tests that were done.
 
 Signed-off-by: Sean Hefty
 
-- 
2.41.0
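
Note: the custom lock described in the rs-locking commit message (an atomic
variable plus a semaphore, with the semaphore touched only under contention)
could look roughly like the sketch below. This is not the code from the
patch. All names here (sketch_lock_t, sketch_lock_acquire, run_workers, and
so on) are hypothetical, and the use of C11 <stdatomic.h> with POSIX
unnamed semaphores is an assumption made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

/* Hypothetical type: not taken from the patch. */
typedef struct {
	sem_t sem;       /* contended acquirers sleep here */
	atomic_int cnt;  /* current holder plus waiters */
} sketch_lock_t;

static int sketch_lock_init(sketch_lock_t *l)
{
	atomic_init(&l->cnt, 0);
	return sem_init(&l->sem, 0, 0);
}

static void sketch_lock_acquire(sketch_lock_t *l)
{
	/* fetch_add returns the old value; nonzero means the lock is held. */
	if (atomic_fetch_add(&l->cnt, 1) > 0) {
		while (sem_wait(&l->sem) == -1 && errno == EINTR)
			; /* retry if interrupted by a signal */
	}
}

static void sketch_lock_release(sketch_lock_t *l)
{
	/* An old value above 1 means at least one waiter; wake exactly one. */
	if (atomic_fetch_sub(&l->cnt, 1) > 1)
		sem_post(&l->sem);
}

/* Small harness (also hypothetical) that exercises the lock. */
#define MAX_WORKERS 8

static sketch_lock_t g_lock;
static long g_counter;

static void *worker(void *arg)
{
	long iters = *(long *)arg;

	for (long i = 0; i < iters; i++) {
		sketch_lock_acquire(&g_lock);
		g_counter++;  /* plain variable, protected by the lock */
		sketch_lock_release(&g_lock);
	}
	return NULL;
}

/* Runs nthreads workers; returns the final counter, or -1 on setup failure. */
static long run_workers(int nthreads, long iters)
{
	pthread_t tid[MAX_WORKERS];
	int started = 0;

	assert(nthreads > 0 && nthreads <= MAX_WORKERS);
	g_counter = 0;
	if (sketch_lock_init(&g_lock))
		return -1;
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[started], NULL, worker, &iters) == 0)
			started++;
	}
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	sem_destroy(&g_lock.sem);
	return started == nthreads ? g_counter : -1;
}
```

With a single thread, acquire and release never call into the semaphore, so
the cost is one atomic add and one atomic subtract, which matches the
uncontended case the commit message describes. Absolute timings from a
harness like this will not necessarily match the figures quoted in the
commit message.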