From 60afd9682aed92cc7f95464b9460ca7272ca1b97 Mon Sep 17 00:00:00 2001
From: Amir Vadai
Date: Sun, 19 Dec 2010 14:36:05 +0200
Subject: [PATCH] sdp: Technical reporter fixes

- Fixed typo and style
- Added Module Parameters chapter
- Added ZCopy chapter

Signed-off-by: Amir Vadai
---
 sdp_release_notes.txt | 213 +++++++++++++++++++++++++++++-------------
 1 file changed, 150 insertions(+), 63 deletions(-)

diff --git a/sdp_release_notes.txt b/sdp_release_notes.txt
index 5c5ab25..903599d 100644
--- a/sdp_release_notes.txt
+++ b/sdp_release_notes.txt
@@ -1,7 +1,7 @@
              Open Fabrics Enterprise Distribution (OFED)
-                   SDP in OFED 1.5.2 Release Notes
+                   SDP in MLNX_OFED 1.5.2 Release Notes
 
-                           August 2010
+                           December 2010
 
 
 
@@ -10,25 +10,28 @@ Table of Contents
 ===============================================================================
 1. Overview
 2. Bug Fixes and Enhancements since OFED 1.5.2
-3. Known Issues
-4. Verification Applications/Flows/Tests
+3. ZCopy
+4. Known Issues
+5. Verification Applications/Flows/Tests
+6. Module Parameters
 
 ===============================================================================
 1. Overview
 ===============================================================================
+Sockets Direct Protocol (SDP) is an InfiniBand byte-stream transport protocol
+that provides TCP stream semantics. Capable of utilizing InfiniBand's advanced
+protocol offload capabilities, SDP can provide lower latency, higher bandwidth,
+and lower CPU utilization than IPoIB or
+Ethernet running some sockets-based applications.
+
 SDP in OFED is at GA level for MLNX OFED 1.5.2
-Main changes are:
-- Inline + blueflame support
-- Stability issues
-- Bug fixes
 
-Missing features:
-- AIO support
-- ZCopy pipeline mode
-- BUG2160 - Use TCP port space - will enable libsdp bind both TCP and SDP
-  sockets in an atomic operation.
-- BUG2147 - Support ZCopy when accessing socket in multithreaded environment
-- Use fast reg mr's instead of fmr's
+===============================================================================
+2. Main Features and Changes
+===============================================================================
+- Added support for Inline and blueflame
+- Improved stability issues
+- Bug fixes
 
 ===============================================================================
 2. Bug Fixes and Enhancements since OFED 1.5.2
@@ -47,24 +50,47 @@ Missing features:
 - sdpprf was moved from /proc to debugfs/sdp
 - debugfs/ - Socket history
 
+
+===============================================================================
+3. ZCopy
+===============================================================================
+- ZCopy is enabled by default for blocks larger than 64K. ZCopy can be disabled
+  by setting the module paramter sdp_zcopy_thresh to zero or to any other value
+  by setting it to another non zero value.
+- ZCOPY mode gives good performance for large blocks with very small cpu
+  utilization. When in use, all messages longer than 'sdp_zcopy_thresh' bytes
+  in length will cause the user space buffer to be pinned and the data sent
+  directly from the original buffer. This results in less CPU usage and on many
+  systems in enhanced bandwidth.
+  ZCOPY is most efficient with multi stream jobs and it performs better as the
+  message size increases.
+  The default 64K value for 'sdp_zcopy_thresh' is sometimes too low for some
+  systems. You must experiment with your hardware to select the best value.
+
+- ZCOPY vs BCOPY:
+  ZCOPY performance is more efficient in weak cpu and multi streams, whereas
+  BCOPY is more efficient in single stream.
+
 ===============================================================================
-3. Known Issues
+4. Known Issues
 ===============================================================================
-- Sometimes socket bind is failed with EINVAL, because TCP socket was binded
-  successfully but SDP was occupied. See Bugzilla 2159 and Bugzilla 2160
+- SDP is at beta level on Infinihost HCA family
 
-- when SO_REUSEADDR is set, can't bind more than one socket to IP_ANY and a
-  specific port. TCP does allow doing that unless one of the sockets is
-  listening.
+- Occasionally, socket bind fails when using EINVAL. Although TCP socket is binded
+  successfully, SDP is occupied, thus causing the socket bind failure.
+  See Bugzilla 2159 and Bugzilla 2160
 
-- BUG 1331 - TCP allows connecting to IP_ANY - 0.0.0.0 (as a destination address!).
-  SDP does not allow connecting to IP_ANY and will reject the connection.
+- When SO_REUSEADDR is set, only a single socket can be bind to the IP_ANY and a
+  specific port. TCP limitation, unless one of the sockets is listening.
 
-- BUG 1444 - The setsockopt(SO_RCVBUF) is not working in sdp socket. To limit top
-  system wide sdp memory usage for recv, use the module parameter top_mem_usage.
+- BUG 1331 - Although TCP allows connecting to IP_ANY - 0.0.0.0
+  (as a destination address!), SDP does not allow connecting to the IP_ANY
+  and rejects the connection.
 
-- SDP is at beta level on Infinihost HCA family
+- BUG 1444 - The setsockopt(SO_RCVBUF) is not functional in sdp socket.
+  To limit top system wide sdp memory usage for recv,
+  use the module parameter top_mem_usage.
 
 - Each SDP socket currently consumes up to 2 MBytes of memory. If this value
   is high for your installation, it is possible to trade off performance
@@ -72,21 +98,22 @@ Missing features:
   "rcvbuf_scale" module parameter (default: 16).
 
   Note: The minimum legal value for the "rcvbuf_scale" module is 1.
-  At this parameter value, each socket will consume approximately 128 KBytes.
+  At this parameter value, each socket will consume approximately 128 KBytes.
 
 - Small message size performance is low when messages are sent by client
   at a rate lower than the rate at which they are consumed by server, and
   when TCP_CORK is not set. This is observed, for example, with iperf
-  benchmark. As a workaround, set the TCP_CORK socket option
+  benchmark.
+  Workaround: Set the TCP_CORK socket option
   to ensure data is sent in at least 32K byte chunks.
 
 - Performance is low on 32-bit kernels, as SDP utilizes high memory
-  to ease memory pressure. Moving to a 64-bit kernel solves this
-  problem even if the application remains a 32-bit one.
+  to ease memory pressure.
+  Workaround: Move to a 64-bit kernel if the application remains a 32-bit one.
 
 - By default, SDP utilizes a 2 Kbyte MTU size. This may cause PCI-X cards
   using Mellanox Technologies "Infinihost" HCAs to experience low bandwidth.
-  Workaround: reset the MTU size to 1K in this situation, using either of
+  Workaround: Reset the MTU size to 1K in this situation, using either of
   the two methods below:
 
   1. Activate the "tavor quirk" workaround in opensm:
@@ -101,46 +128,21 @@ Missing features:
      set the tavor_quirk module parameter of the rdma_cm module to value 1
      (default: 0).
 
-- When waiting for RX, driver first poll and then arm interrupt and goes to
-  sleep. polling duration could be set by recv_poll module parameter. The
-  higher this value is, the CPU utilization is higher, and number of
+- When waiting for RX, the driver first polls, arms interrupt and then goes to
+  sleep. Polling duration could be set by recv_poll module parameter. The
+  higher this value is, the higher the CPU utilization is, and the number of
   interrupts is lower. This should be fine tuned according to the specific
   environment and application latency.
 
-- ZCopy is enabled by default for blocks larger than 64K. ZCopy can be disabled
-  by setting the module paramter sdp_zcopy_thresh to zero or to any other value
-  by setting it to another non zero value.
+- When using SDP over RoCE, and the peer has a card that does not support RoCE
+  a delay in the connection establishment may occur.
 
-- ZCOPY mode gives good performance for large blocks with very small cpu
-  utilization. When in use, all messages longer than 'sdp_zcopy_thresh' bytes
-  in length will cause the user space buffer to be pinned and the data sent
-  directly from the original buffer. This results in less CPU usage and on many
-  systems in enhanced bandwidth.
-  ZCOPY is most efficient with multi stream jobs and it performs better as the
-  message size increases.
-  The default 64K value for 'sdp_zcopy_thresh' is sometimes too low for some
-  systems. You must experiment with your hardware to select the best value.
-
-- ZCOPY vs BCOPY:
-  ZCOPY performance is more efficient in weak cpu and multi streams, whereas
-  BCOPY is more efficient in single stream.
-
-- To disable using SDP over RoCE, set 'sdp_link_layer_ib_only' module parameter
-  to 1.
-
-- to enable debugging of data path, compile driver with CONFIG_SDP_DEBUG_DATA.
-  traces are stored in a cyclic buffer in debufs/sdpprf.
-  To dump trace to dmesg, use sdp_debug_level:
-    bit 0: trace packets
-    bit 1: trace SDP driver internals
-
-- BUG2185 - kernel panic at sdp thresholds_test when accessing sdpstats
-  It is reported that sometimes accessing /proc/net/sdpstats causes kernel
+- BUG2185 - Occasionally, accessing /proc/net/sdpstats, causes kernel
   panic.
 
 ===============================================================================
-4. Verification Applications/Flows/Tests
+5. Verification Applications/Flows/Tests
 ===============================================================================
 - ssh/sshd
 - wget/netscape/firefox/apache
@@ -155,4 +157,89 @@ Missing features:
 - Various Java client server applications (SUN:jre, BEA:jrockit/WebLogic,
   GNU:gij/gcj)
 - Many UNIX utilities to verify that pre-load did not harm the applications
+===============================================================================
+6. Module Parameters
+===============================================================================
+
+General
+-------
+sdp_link_layer_ib_only:
+  Supports only link layer of type InfiniBand.
+  It is useful when not using SDP over RoCE.
+
+sdp_debug_level:
+  Enables connection establishment and teardown debug tracing.
+
+sdp_data_debug_level:
+  Enables datapath debug tracing. If set to 1, it shows only packets >1.
+  To enable debugging of data path, compile driver with CONFIG_SDP_DEBUG_DATA.
+
+
+recv_poll:
+  Enables poll receiving before arming the interrupt. Set a higher value
+  to decrease the number of RX interrupts. Consequently, the CPU
+  utilization will be higher.
+
+sdp_keepalive_time:
+  Default idle time in seconds before keepalive probe sent.
+
+Resources
+---------
+rcvbuf_initial_size:
+  Receives buffer initial size in bytes.
+
+rcvbuf_scale:
+  Not in use
+
+top_mem_usage:
+  Top system wide sdp memory usage for recv (in MB).
+
+max_large_sockets:
+  Not in use
+
+sdp_fmr_pool_size:
+  Number of FMRs to allocate for pool
+
+sdp_fmr_dirty_wm:
+  Watermark to flush fmr pool
+
+Thresholds
+----------
+sdp_inline_thresh:
+  Inline copy threshold. effective to new sockets only; 0=Off.
+
+sdp_zcopy_thresh:
+  Zero copy using RDMA threshold; 0=Off.
+  If smaller than page size, set to page size.
+
+Interrupt hardware moderation:
+------------------------------
+sdp_rx_coal_target:
+  Target number of bytes to coalesce with interrupt moderation.
+
+sdp_rx_coal_time:
+  rx coal time (jiffies).
+
+sdp_rx_rate_low:
+  rx_rate low (packets/sec).
+
+sdp_rx_coal_time_low:
+  low moderation usec.
+
+sdp_rx_rate_high:
+  rx_rate high (packets/sec).
+
+sdp_rx_coal_time_high:
+  high moderation usec.
+
+sdp_rx_rate_thresh:
+  rx rate thresh ().
+
+sdp_sample_interval:
+  sample interval (jiffies).
+
+hw_int_mod_count:
+  Forced hw int moderation val. -1 for auto (packets). 0 to disable.
+hw_int_mod_usec:
+  Forced hw int moderation val. -1 for auto (usec). 0 to disable.
 
-- 
2.41.0
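
Note on the TCP_CORK workaround listed under Known Issues: the release notes
say to set the TCP_CORK socket option but do not show how. The following is a
minimal C sketch, assuming a Linux system where TCP_CORK is declared in
<netinet/tcp.h>. The helper names sdp_cork_set and sdp_cork_get are
hypothetical and are not part of SDP or libsdp. The sketch only shows the
standard setsockopt()/getsockopt() calls; how the option behaves on an SDP
socket depends on the installed stack.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn TCP_CORK on (1) or off (0) for a stream socket.
 * Returns 0 on success, -1 on failure with errno set by setsockopt(). */
int sdp_cork_set(int fd, int on)
{
    return setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof(on));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if the socket is corked, 0 if it is not, -1 on failure. */
int sdp_cork_get(int fd)
{
    int on = 0;
    socklen_t len = sizeof(on);

    if (getsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, &len) < 0)
        return -1;
    return on != 0;
}
```

A sender would typically call sdp_cork_set(fd, 1) before a burst of small
writes and sdp_cork_set(fd, 0) afterwards, so that any data still held back
is flushed.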