From 76c7726407c207add3ce569d6100724482d92f85 Mon Sep 17 00:00:00 2001 From: Tziporet Koren Date: Sun, 10 May 2009 16:06:00 +0300 Subject: [PATCH] updates for 1.4.1 Signed-off-by: Tziporet Koren --- OFED_release_notes.txt | 13 +-- ipoib_release_notes.txt | 74 ++++++++++++---- mlx4_release_notes.txt | 34 ++++++-- mstflint_release_notes.txt | 7 ++ mvapich_release_notes.txt | 6 +- opensm_release_notes.txt | 13 ++- sdp_release_notes.txt | 169 ++++++++----------------------------- 7 files changed, 150 insertions(+), 166 deletions(-) diff --git a/OFED_release_notes.txt b/OFED_release_notes.txt index 0f401d6..87be635 100644 --- a/OFED_release_notes.txt +++ b/OFED_release_notes.txt @@ -1,7 +1,7 @@ Open Fabrics Enterprise Distribution (OFED) - Version 1.4.1-rc4 + Version 1.4.1-rc5 Release Notes - April 2009 + May 2009 =============================================================================== @@ -31,7 +31,7 @@ Note: If you plan to upgrade the OFED package on your cluster, please upgrade all of its nodes to this new version. -1.1 OFED 1.4 Contents +1.1 OFED 1.4.1 Contents ----------------------- The OFED package contains the following components: - OpenFabrics core and ULPs: @@ -136,6 +136,7 @@ companies: - Qlogic - Flextronics - Sun + - Mellanox 1.5 Third Party Packages ------------------------ @@ -241,9 +242,9 @@ for each package in the docs directory. 
- NFS/RDMA: In beta qaulity with backports for RHEL 5.2, 5.3 and SLES 10 SP2 - Updated MPI packages: mvapich-1.1.0-3143 - Open MPI 1.3.1 -- Updated bonding package: ib-bonding-0.9.0-38 -- Updated DAPL: compat-dapl-1.2.13 and dapl-2.0.16 + Open MPI 1.3.2 +- Updated bonding package: ib-bonding-0.9.0-40 +- Updated DAPL: compat-dapl-1.2.14 and dapl-2.0.19 - Updated opensm version to include critical bug fixes - Fixed RDS iWARP support - Low level drivers updated: ehca, mlx4, cxgb3, nes, ipath diff --git a/ipoib_release_notes.txt b/ipoib_release_notes.txt index 82e9e1e..295be48 100644 --- a/ipoib_release_notes.txt +++ b/ipoib_release_notes.txt @@ -1,7 +1,7 @@ Open Fabrics Enterprise Distribution (OFED) - IPoIB in OFED 1.4 Release Notes + IPoIB in OFED 1.4.1 Release Notes - December 2008 + May 2009 =============================================================================== @@ -14,7 +14,8 @@ Table of Contents 5. The ib-bonding driver 6. Bug Fixes and Enhancements Since OFED 1.3 7. Bug Fixes and Enhancements Since OFED 1.3.1 -8. Performance tuning +8. Bug Fixes and Enhancements Since OFED 1.4 +9. Performance tuning =============================================================================== 1. Overview @@ -153,6 +154,16 @@ Usage and configuration: 11. The IPoIB module uses a Linux implementation for Large Receive Offload (LRO) in kernel 2.6.24 and later. These kernels require installing the "inet_lro" module. + +12. ConnectX only: If you have a port configured as ETH, and are running IPoIB + in connected mode -- and then change the port type to IB, the IPoIB mode + changes to datagram mode. + +13. When working with ISCSI, you must disable LRO (even if you are working in + connected mode). This is because there is a bug in older kernels which causes + a kernel panic. + + =============================================================================== 4. DHCP Support of IPoIB @@ -210,6 +221,7 @@ utility, read the documentation for the ib-bonding package. 
Notes: * Using /etc/infiniband/openib.conf to create a persistent configuration is no longer supported +* On RHEL4_U7, a slave interface cannot be set as primary. =============================================================================== @@ -243,21 +255,47 @@ Notes: - Bonding: Set default number of grat. ARP after failover to three (was one) =============================================================================== -8. Performance tuning +8. Bug Fixes and Enhancements Since OFED 1.4 +=============================================================================== +- Performance tuning is enabled by default for IPoIB CM. +- Clear IPOIB_FLAG_ADMIN_UP if ipoib_open fails +- Disable NAPI while the CQ is being drained (bugzilla #1587) +- rdma_cm: Use rate from ipoib broadcast when joining ipoib multicast + When joining an IPoIB multicast group, use the same rate as in the broadcast + group. Otherwise, if rdma_cm creates this group before IPoIB does, it might get + a different rate. This will cause IPoIB to fail to join the same group later + on, because IPoIB has a strict rate selection. +- Fix unprotected use of priv->broadcast in ipoib_mcast_join_task. +- Do not join broadcast group if interface is brought down + + +=============================================================================== +9.
Performance tuning =============================================================================== -- In IPoIB connected mode, the throughput of medium and large messages can be - increased by setting the following TCP parameters as follows: - - /sbin/sysctl -w net.ipv4.tcp_timestamps=0 - /sbin/sysctl -w net.ipv4.tcp_sack=0 - /sbin/sysctl -w net.core.netdev_max_backlog=250000 - /sbin/sysctl -w net.core.rmem_max=16777216 - /sbin/sysctl -w net.core.wmem_max=16777216 - /sbin/sysctl -w net.core.rmem_default=16777216 - /sbin/sysctl -w net.core.wmem_default=16777216 - /sbin/sysctl -w net.core.optmem_max=16777216 - /sbin/sysctl -w net.ipv4.tcp_mem="16777216 16777216 16777216" - /sbin/sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216" - /sbin/sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216" +When IPoIB is configured to run in connected mode, TCP parameter tuning is +performed at driver startup to improve the throughput of medium and large +messages. +The driver startup scripts set the following TCP parameters: + + net.ipv4.tcp_timestamps=0 + net.ipv4.tcp_sack=0 + net.core.netdev_max_backlog=250000 + net.core.rmem_max=16777216 + net.core.wmem_max=16777216 + net.core.rmem_default=16777216 + net.core.wmem_default=16777216 + net.core.optmem_max=16777216 + net.ipv4.tcp_mem="16777216 16777216 16777216" + net.ipv4.tcp_rmem="4096 87380 16777216" + net.ipv4.tcp_wmem="4096 65536 16777216" + +This tuning is effective only for connected mode. If you run in datagram mode, +it actually reduces performance. + +If you change the IPoIB run mode to "datagram" while the driver is running, +the tuned parameters do not get reset to their default values. We therefore +recommend that you change the IPoIB mode only while the driver is down +(by changing the line "SET_IPOIB_CM=yes" to "SET_IPOIB_CM=no" in the file +/etc/infiniband/openib.conf, and then restarting the driver).
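The tuned values listed above can be checked against a running system. The following is a minimal Python sketch, not part of OFED: it assumes the standard Linux mapping of a sysctl key such as net.ipv4.tcp_sack to the file /proc/sys/net/ipv4/tcp_sack, and the helper names (proc_path, normalize, still_tuned) are hypothetical. It only reports which parameters still hold their connected-mode values, for example after switching to datagram mode; it does not restore defaults, since those vary by kernel and are not given in these notes.

```python
from pathlib import Path

# Values copied from the connected-mode tuning list in the notes above.
TUNED = {
    "net.ipv4.tcp_timestamps": "0",
    "net.ipv4.tcp_sack": "0",
    "net.core.netdev_max_backlog": "250000",
    "net.core.rmem_max": "16777216",
    "net.core.wmem_max": "16777216",
    "net.core.rmem_default": "16777216",
    "net.core.wmem_default": "16777216",
    "net.core.optmem_max": "16777216",
    "net.ipv4.tcp_mem": "16777216 16777216 16777216",
    "net.ipv4.tcp_rmem": "4096 87380 16777216",
    "net.ipv4.tcp_wmem": "4096 65536 16777216",
}


def proc_path(key, root="/proc/sys"):
    """Map a sysctl key (assumed dot-separated) to its /proc/sys file."""
    return Path(root, *key.split("."))


def normalize(value):
    """Collapse tabs, newlines and repeated spaces so multi-field values compare equal."""
    return " ".join(value.split())


def still_tuned(read=lambda path: path.read_text()):
    """Return the keys whose current value equals the connected-mode value."""
    hits = []
    for key, tuned in TUNED.items():
        try:
            current = normalize(read(proc_path(key)))
        except OSError:
            continue  # key missing or unreadable on this system
        if current == normalize(tuned):
            hits.append(key)
    return hits


if __name__ == "__main__":
    for key in still_tuned():
        print(f"{key} is still at its connected-mode value")
```

Run it after changing SET_IPOIB_CM and restarting the driver; any key it prints is still at the connected-mode value and may need to be reset by hand.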
diff --git a/mlx4_release_notes.txt b/mlx4_release_notes.txt index 709037c..9527c7c 100644 --- a/mlx4_release_notes.txt +++ b/mlx4_release_notes.txt @@ -1,7 +1,7 @@ Open Fabrics Enterprise Distribution (OFED) ConnectX driver (mlx4) in OFED 1.4 Release Notes - December 2008 + May 2009 =============================================================================== @@ -10,8 +10,10 @@ Table of Contents 1. Overview 2. Supported Firmware Versions 3. VPI (Virtual Process Interconnect) -4. Infiniband new features and bug fixes -5. Known Issues +4. Infiniband new features and bug fixes since OFED 1.3.1 +5. Infiniband (mlx4_ib) new features and bug fixes since OFED 1.4 +6. Eth (mlx4_en) new features and bug fixes since OFED 1.4 +7. Known Issues =============================================================================== 1. Overview @@ -39,7 +41,7 @@ configurations, the driver is split into three modules: =============================================================================== - This release was tested with FW 2.6.000. - The minimal version to use is 2.3.000. -- To use both IB and Ethernet use FW version 2.6.0 +- To use both IB and Ethernet (VPI) use FW version 2.6.0 =============================================================================== 3. VPI (Virtual Protocol Interconnect) @@ -105,7 +107,7 @@ o Port type management: =============================================================================== -4. Infiniband new features and bug fixes +4. Infiniband new features and bug fixes since OFED 1.3.1 =============================================================================== Features that are enabled with FW 2.5.0 only: - Send with invalidate and Local invalidate send queue work requests. @@ -129,7 +131,26 @@ Non FW dependent features: =============================================================================== -5. Known Issues +5. 
Infiniband new features and bug fixes since OFED 1.4 +=============================================================================== +- Enable setting 4K MTU for ConnectX ports. +- Support optimized registration of huge-page-backed memory. + With this optimization, the number of MTT entries used is significantly + lower than for regular memory, so the HCA will access registered memory with + fewer cache misses and improved performance. + For more information on this topic, please refer to the Linux documentation file: + Documentation/vm/hugetlbpage.txt +- Do not enable blueflame sends if write combining is not available +- Add write combining support for PPC64, and thus enable blueflame sends. +- Unregister IB device before executing CLOSE_PORT. + +=============================================================================== +6. Eth (mlx4_en) new features and bug fixes since OFED 1.4 +=============================================================================== +- Yevgeni - ... + +=============================================================================== +7. Known Issues =============================================================================== - mlx4_en driver is not supported on PPC64 and IA64 - The mlx4_en module uses a Linux implementation for Large Receive Offload @@ -158,6 +179,7 @@ In order to set mlx4 parameters, add the following line(s) to /etc/modpobe.conf: options mlx4_en parameter= mlx4_core parameters: + set_4k_mtu: attempt to set 4K MTU to all ConnectX ports (default 0) msi_x: attempt to use MSI-X if nonzero (default 1) enable_qos: Enable Quality of Service support in the HCA if > 0, (default 0) block_loopback Block multicast loopback packets if > 0 (default: 1) diff --git a/mstflint_release_notes.txt b/mstflint_release_notes.txt index 8f11333..e200d95 100644 --- a/mstflint_release_notes.txt +++ b/mstflint_release_notes.txt @@ -57,4 +57,11 @@ Table of Contents 4.
Known Issues =============================================================================== +* In the very unlikely event that you get the following error message when + running mstflint: + Warning: memory access to device 0a:00.0 failed: Input/output error. + Warning: Fallback on IO: much slower, and unsafe if device in use. + *** buffer overflow detected ***: mstflint terminated + + simply run "mst start" and then re-run mstflint. diff --git a/mvapich_release_notes.txt b/mvapich_release_notes.txt index 09aef45..1526dda 100644 --- a/mvapich_release_notes.txt +++ b/mvapich_release_notes.txt @@ -1,7 +1,7 @@ Open Fabrics Enterprise Distribution (OFED) - OSU MPI MVAPICH-1.1.0, in OFED 1.4.0 Release Notes + OSU MPI MVAPICH-1.1.0, in OFED 1.4.1 Release Notes - December 2008 + May 2009 =============================================================================== @@ -59,6 +59,8 @@ MVAPICH-1.1.0 has the following additional features: =============================================================================== 5. Known Issues =============================================================================== +- Shared memory broadcast optimization is disabled by default. + - MVAPICH MPI compiled on AMD x86_64 does not work with MVAPICH MPI compiled on Intel X86_64 (EM64t). Workaround: diff --git a/opensm_release_notes.txt b/opensm_release_notes.txt index 11223de..a7352d0 100644 --- a/opensm_release_notes.txt +++ b/opensm_release_notes.txt @@ -3,7 +3,7 @@ Version: OpenSM 3.2.x Repo: git://git.openfabrics.org/~sashak/management.git -Date: Dec 2008 +Date: May 2009 1 Overview ---------- @@ -239,6 +239,8 @@ are listed in Table 3. OpenSM prints list of "Invalid Cached Option" error messages. This does not affect OpenSM functionality. +* SMs do not hand over when running on ConnectX in a switch-based topology.
+ 3 Unsupported IB Compliance Statements -------------------------------------- The following section lists all the IB compliance statements which @@ -320,6 +322,8 @@ information regarding each compliance statement. * Don't startup automatically on SuSE based systems +* Discovery bug, where some ports were left unlinked (without remote side). + 4.2 Other Bug Fixes * opensm/osm_console.c: fix seg fault when running "portstatus ca" in @@ -402,6 +406,13 @@ information regarding each compliance statement. * Other less critical or visible bugs were also fixed. +* opensm: update LFTs when entering master + +* opensm: invalidate routing cache when entering master state + +* opensm/osm_port_info_rcv.c: don't clear sw->need_update if port 0 is active + + 5 Main Verification Flows ------------------------- diff --git a/sdp_release_notes.txt b/sdp_release_notes.txt index 2f3fd09..533be6e 100644 --- a/sdp_release_notes.txt +++ b/sdp_release_notes.txt @@ -1,7 +1,7 @@ Open Fabrics Enterprise Distribution (OFED) - SDP in OFED 1.4 Release Notes + SDP in OFED 1.4.1 Release Notes - December 2008 + May 2009 @@ -9,18 +9,19 @@ Table of Contents =============================================================================== 1. Overview -2. Bug Fixes and Enhancements -3. Known Issues -4. Verification Applications/Flows/Tests +2. Bug Fixes and Enhancements since OFED 1.3 +3. Bug Fixes and Enhancements since OFED 1.4 +4. Known Issues +5. Verification Applications/Flows/Tests =============================================================================== 1. Overview =============================================================================== -SDP in OFED is at GA level for OFED 1.4. +SDP in OFED is at GA level for OFED 1.4.1. =============================================================================== -2. Bug Fixes and Enhancements +2.
Bug Fixes and Enhancements since OFED 1.3 =============================================================================== * Cleanup - Compilation warnings @@ -37,6 +38,16 @@ SDP in OFED is at GA level for OFED 1.4. - Having now full windows interoperability. +=============================================================================== +3. Bug Fixes and Enhancements since OFED 1.4 +=============================================================================== +SDP: +- BUG1311 - Netpipe fails with an IB_WC_LOC_LEN_ERR. +- BUG1472 - clean socket timeouts and refcount when device is removed +- BUG1502 - scheduling while atomic +- BUG1309 - SDP close is slow + fix recv buffer initial size setting +- BUG1087 - fixed recovery from failing rdma_create_qp() + =============================================================================== 3. Known Issues =============================================================================== @@ -49,12 +60,6 @@ SDP in OFED is at GA level for OFED 1.4. - TCP allows connecting to IP_ANY - 0.0.0.0 (as a destination address!). SDP does not allow - and will reject the connection. -- BUG1309 - sometimes SDP close connection takes longer than TCP close. - -- BUG1256 - libsdp does not support epoll - -- BUG1087 - sometimes libsdp does not recover well when host is running out of QPs. - - Each SDP socket currently consumes up to 2 MBytes of memory. If this value is high for your installation, it is possible to trade off performance for lower memory utilization per socket by reducing the value of the @@ -116,124 +121,12 @@ SDP in OFED is at GA level for OFED 1.4.
- Various Java client server applications (SUN:jre, BEA:jrockit/WebLogic, GNU:gij/gcj) - Many UNIX utilities to verify that pre-load did not harm the applications -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Open Fabrics Enterprise Distribution (OFED) - SDP in OFED 1.4 Release Notes - - December 2008 - - - -=============================================================================== -Table of Contents -=============================================================================== -1. Overview -2. Bug Fixes and Enhancements -3. Known Issues -4. Verification Applications/Flows/Tests - -=============================================================================== -1. Overview -=============================================================================== -SDP in OFED is at GA level for OFED 1.3. - - -=============================================================================== -2. Bug Fixes and Enhancements -=============================================================================== -* Fixes for SDP specification compliance - - OOB data not marked as solicited (bug 596) - - DisConn, ChRcvBuf, ChRcvBufAck marked solicited (bug 644) - - Do not send DisConn if only 1 credit (bug 646) - - Validate ChRcvBuf range (bug 647) - -* Cleanup - - Compilation warnings - - New kernel support - -* New function - - SIOCOUTQ ioctl support - - Add keepalive support - - New /sys options: sdp_keepalive_probes_sent, sdp_keepalive_time - - New options: SOCK_KEEPALIVE, TCP_KEEPIDLE - - Add Zero copy bcopy support (bzcopy) - - New /sys option: sdp_zcopy_thresh - -* Bugs fixed - - Resize buffers if out of credits (bug 556) - - Resize using skb_put (bug 620) - - Move to accept queue on RTU drop and DREQ (bug 645) - - Modify memory allocation to support in kernel users - - Fix reference count but that prevents driver unload - - connect() now allows AF_INET_SDP and AF_INET (bug 294) - - poll() always returns POLLOUT on non-blocking socket 
(bug 829) - - Executing netperf with TCP_CORK never ends (bug 837) - - -=============================================================================== -3. Known Issues -=============================================================================== -- Each SDP socket currently consumes up to 2 MBytes of memory. If this value - is high for your installation, it is possible to trade off performance - for lower memory utilization per socket by reducing the value of the - "rcvbuf_scale" module parameter (default: 16). - - Note: the minimum legal value for this parameter is 1. - At this parameter value, each socket will consume approximately 128 KBytes. - -- Small message size performance is low when messages are sent by client - at a rate lower than the rate at which they are consumed by server, - and when TCP_CORK is not set. This is observed, for example, with iperf - benchmark. As a workaround, set the TCP_CORK socket option - to ensure data is sent in at least 32K byte chunks. - -- Performance is low on 32-bit kernels, as SDP utilizes high memory - to ease memory pressure. Moving to a 64-bit kernel solves this - problem even if the application remains a 32-bit one. - -- By default, SDP utilizes a 2 Kbyte MTU size. This may cause PCI-X cards - using Mellanox Technologies "Infinihost" HCAs to experience low bandwidth. - Workaround: reset the MTU size to 1K in this situation, using either of - the two methods below: - - 1. Activate the "tavor quirk" workaround in opensm: - a. Create an opensm options cache file (/var/cache/osm/opensm.opts): - > opensm --cache-options -o - b. Add the following line to /var/cache/osm/opensm.opts: - enable_quirks TRUE - c. Rerun opensm using your usual command line options to activate - the opensm quirk option. - - 2. Activate the "tavor quirk" workaround in cma: - set the tavor_quirk module parameter of the rdma_cm module to value 1 - (default: 0). - -- The new BZCOPY mode is only effective for large block transfers. 
- By setting the /sys parameter 'sdp_zcopy_thresh' to a non-zero value, a - non-standard SDP speedup is enabled. All messages longer than - 'sdp_zcopy_thresh' bytes in length will cause the user space buffer to - be pinned and the data sent directly from the original buffer. This - results in less CPU use and, on many systems, much better bandwidth. - The default 64K value for 'sdp_zcopy_thresh' is sometimes too low for - some systems. You must experiment with your hardware to select the - best value. - -- Windows interoperability - The Windows version of SDP does not support resizing buffers using the - standard protocol messages. There will sometimes be inter-operability - problems for this reason. - -=============================================================================== -4. Verification Applications/Flows/Tests -=============================================================================== -See the corresponding section in the SDP release notes above. - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Open Fabrics Enterprise Distribution (OFED) - libsdp v. 9382 in OFED 1.4 Release Notes + libsdp v. 9382 in OFED 1.4.1 Release Notes - December 2008 + May 2009 =============================================================================== @@ -242,8 +135,9 @@ Table of Contents 1. Overview 2. New Features 3. Bug Fixes -4. Known Issues -5. Verification Applications/Flows/Tests +4. Bug Fixes and Enhancements since OFED 1.4 +5. Known Issues +6. Verification Applications/Flows/Tests =============================================================================== 1. Overview @@ -266,9 +160,8 @@ is 1.3. * Add libsdp-devel sub-package - =============================================================================== -3 Bug Fixes +3. Bug Fixes =============================================================================== The following list of bugs were fixed. Note that other less critical or visible bugs were also fixed. 
@@ -286,7 +179,17 @@ or visible bugs were also fixed. returning -1. =============================================================================== -4. Known Issues +4. Bug Fixes and Enhancements since OFED 1.4 +=============================================================================== +libsdp: +* Enable building libsdp on Solaris +* BUG1256 - Add epoll support + +sdpnetstat: +* BUG1513 - sdpnetstat is not showing all the listening processes on IPv6 sockets. + +=============================================================================== +5. Known Issues =============================================================================== * libsdp cannot provide its socket switch functionality for executables statically linked with libc. @@ -296,7 +199,7 @@ or visible bugs were also fixed. =============================================================================== -5. Verification Applications/Flows/Tests +6. Verification Applications/Flows/Tests =============================================================================== See the corresponding section in the SDP release notes above. -- 2.46.0