- OpenSM Release Notes 3.0.11
- =============================
-
-Version: OpenFabrics Enterprise Distribution (OFED) 1.2
-Repo: git://git.openfabrics.org/~ofed_1_2/management.git (release)
- git://git.openfabrics.org/~halr/management.git (development)
-Date: April 2007
-
-1 Overview
-----------
-This document describes the contents of the OpenSM OFED 1.2 release.
-OpenSM is an InfiniBand compliant Subnet Manager and Administration,
-and runs on top of OpenIB. The OpenSM version for this release
-is openib-3.0.11
-
-This document includes the following sections:
-1 This Overview section (describing new features and software
- dependencies)
-2 Known Issues And Limitations
-3 Unsupported IB compliance statements
-4 Major Bug Fixes
-5 Main Verification Flows
-6 Qualified software stacks and devices
-
-1.1 Major New Features
-
-* Routing improvements
- Two additional routing algorithms have been added in addition to
- performance improvements to the existing routing algorithms. The
- two new routing algorithms are FAT tree and LASH. LASH is
- experimental. See the opensm man page for additional details.
-
-* SA Optional Record support now "virtually" complete
- Includes SA InformInfo improvements and InformInfoRecord support in
- addition to support for the remaining SA optional records
- (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord
- support was improved to include all SMs found.
-
-* SA database dump/restore
- OpenSM now includes the ability to dump and restore the SA database.
- This allows for all SA registrations (multicast, services, and events)
- to be saved and restored later.
-
- In verbose mode, OpenSM will dump SA DB (existing multicast groups,
- services and InformInfo) into dump file which named "opensm-sa.dump"
- and located under standard OpenSM dump directory (/var/log by default).
-
- If option -S is specified and SA DB dump file name is provided, OpenSM
- will try to restore SA database from this file. And if restore is
- successful, OpenSM won't ask for client reregistration at subnet bring-up.
-
-* Modular routing for multicast
- In conjunction was SA database dump/restore, there is the ability to
- dump and load switch lid matrices (min hops tables) which are used
- for multicast route calculation.
-
-* IB router enablement
- OpenSM now supports router ports properly (in terms of PortInfo handling).
- There is also some experiment support for IB routers which is enabled via
- the ROUTER_EXP compile flag. This support includes SA PathRecord and
- MCMemberRecord support for off subnet GIDs.
-
-* Socket support added to console
- OpenSM console now supports remote in addition to local access.
- Remote access is currently via telnet.
-
-1.2 Minor New Features:
-
-* Change output format of DR path from hex to decimal port numbers
-
-* Log rotation
- The OpenSM log can now be rotated while OpenSM is running (without
- stopping and restarting OpenSM). This is accomplished via SIGUSR1.
-
-* Support scope for IPoIB multicast groups in partition config
-
-* Dump filename changed from subnet.lst to osm-subnet.lst
- Default temp directory for non Windows platforms was previously changed
- from /tmp to /var/log.
-
-* Add option for force SDR link speed
- Add option to opensm.opts to force link speed. Currently, only forcing
- to SDR link speed is supported.
-
-1.3 Library API Changes
-
- None
-
-1.4 Software Dependencies
-
-OpenSM depends on the installation of either OFED 1.2, OFED 1.1,
-OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
-distribution), or Mellanox VAPI stacks. The qualified driver versions
-are provided in Table 2, "Qualified IB Stacks".
-
-1.5 Supported Devices Firmware
-
-The main task of OpenSM is to initialize InfiniBand devices. The
-qualified devices and their corresponding firmware versions
-are listed in Table 3.
-
-2 Known Issues And Limitations
-------------------------------
-
-* No Service / Key associations:
- There is no way to manage Service access by Keys.
-
-* No SM to SM SMDB synchronization:
- Puts the burden of re-registering services, multicast groups, and
- inform-info on the client application (or IB access layer core).
-
-* No "port down" event handling:
- Changing the switch port through which OpenSM connects to the IB
- fabric may cause incorrect operation. Please restart OpenSM whenever
- such a connectivity change is made.
-
-* Changing connections during SM operation:
- Under some conditions the SM can get confused by a change in
- cabling (moving a cable from one switch port to the other) and
- momentarily see this as having the same GUID appear connected
- to two different IB ports. Under some conditions, when the SM fails to
- get the corresponding change event it might mistakenly report this case
- as a "duplicated GUID" case and abort. It is advisable to double-check
- the syslog after each such change in connectivity and restart
- OpenSM if it has exited. The same error ("duplicated GUID") will
- also appear with a loopback plug.
-
-3 Unsupported IB Compliance Statements
---------------------------------------
-The following section lists all the IB compliance statements which
-OpenSM does not support. Please refer to the IB specification for detailed
-information regarding each compliance statement.
-
-* C14-22 (Authentication):
- M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
- SubnSet method. As a work-around, an OpenSM option is provided for
- defining the protect bits.
-
-* C14-67 (Authentication):
- On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
- the SM shall generate a SubnGetResp if the M_Key matches, or
- silently drop the packet if M_Key does not match.
-
-* C15-0.1.23.4 (Authentication):
- InformInfoRecords shall always be provided with the QPN set to 0,
- except for the case of a trusted request, in which case the actual
- subscriber QPN shall be returned.
-
-* o13-17.1.2 (Event-FWD):
- If no permission to forward, the subscription should be removed and
- no further forwarding should occur.
-
-* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
- GUIDInfo - SM should enable assigning Port GUIDInfo.
-
-* C14-44 (Initialization):
- If the SM discovers that it is missing an M_Key to update CA/RT/SW,
- it should notify the higher level.
-
-* C14-62.1.1.12 (Initialization):
- PortInfo:M_Key - Set the M_Key to a node based random value.
-
-* C14-62.1.1.13 (Initialization):
- PortInfo:P_KeyProtectBits - set according to an optional policy.
-
-* C14-62.1.1.24 (Initialization):
- SwitchInfo:DefaultPort - should be configured for random FDB.
-
-* C14-62.1.1.32 (Initialization):
- RandomForwardingTable should be configured.
-
-* o15-0.1.12 (Multicast):
- If the JoinState is SendOnlyNonMember = 1 (only), then the endport
- should join as sender only.
-
-* o15-0.1.8 (Multicast):
- If a request for creating an MCG with fields that cannot be met,
- return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).
-
-* C15-0.1.8.6 (SA-Query):
- Respond to SubnAdmGetTraceTable - this is an optional attribute.
-
-* C15-0.1.13 Services:
- Reject ServiceRecord create, modify or delete if the given
- ServiceP_Key does not match the one included in the ServiceGID port
- and the port that sent the request.
-
-* C15-0.1.14 (Services):
- Provide means to associate service name and ServiceKeys.
-
-4 Major Bug Fixes
------------------
-
-The following is a list of bugs that were fixed. Note that other less critical
-or visible bugs were also fixed.
-
-* osm_port_info_rcv.c: In __osm_pi_rcv_process_endport,
- isSMdisabled also indicates that an SM is present so poll SMInfo
-
-* osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req,
- handle master GUID port not found properly
-
-* osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return
- IB_NOT_FOUND rather than IB_ERROR when can't route to LID from switch
-
-* osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND
- rather than IB_ERROR when can't route to LID from switch
-
-* osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to
- -1 on open error
-
-* osm_vendor_ibumad: Termination crash fix
- When OpenSM is terminated umad_receiver thread still running even after
- the structures are destroyed and freed, this causes to random (but easily
- reproducible) crashes. The reason is that osm_vendor_delete() does not
- care about thread termination. This patch adds the receiver thread
- cancellation (by using pthread_cancel() and pthread_join()) and cares to
- keep have all mutexes unlocked upon termination. There is also minor
- termination code consolidation - osm_vendor_port_close() function.
-
-* osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port
-
-* osm_matrix.h: Fix segfault with up/down and root nodes file
-
-* osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit
-
-* osm_vendor_ibumad.c: Close umad port in osm_vendor_delete
-
-* osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues
- with using MTU/rate/PktLife explicitly ignoring selectors
-
- OpenSM just uses the resulting path MTU/rate/pkt-life and fail the
- query even though the selector might be allowing for selecting an
- appropriate value.
-
- After this fix, the following results are obtained for a case of
- path allowing maximal 2K MTU.
-
-In standard mode:
-------------------------------------------------------------
-MTU greater than ... 256 (0x01) -> equal to ....... 2K
-MTU less than ...... 256 (0x41) -> NO PATHS
-MTU equal to ....... 256 (0x81) -> equal to ....... 256
-MTU largest possible 256 (0xc1) -> equal to ....... 2K
-MTU greater than ... 512 (0x02) -> equal to ....... 2K
-MTU less than ...... 512 (0x42) -> equal to ....... 256
-MTU equal to ....... 512 (0x82) -> equal to ....... 512
-MTU largest possible 512 (0xc2) -> equal to ....... 2K
-MTU greater than ... 1K (0x03) -> equal to ....... 2K
-MTU less than ...... 1K (0x43) -> equal to ....... 512
-MTU equal to ....... 1K (0x83) -> equal to ....... 1K
-MTU largest possible 1K (0xc3) -> equal to ....... 2K
-MTU greater than ... 2K (0x04) -> NO PATHS
-MTU less than ...... 2K (0x44) -> equal to ....... 1K
-MTU equal to ....... 2K (0x84) -> equal to ....... 2K
-MTU largest possible 2K (0xc4) -> equal to ....... 2K
-MTU greater than ... 4K (0x05) -> NO PATHS
-MTU less than ...... 4K (0x45) -> equal to ....... 2K
-MTU equal to ....... 4K (0x85) -> NO PATHS
-MTU largest possible 4K (0xc5) -> equal to ....... 2K
-============================================================
-
-With enable_quirks (when one of the ends is a Tavor device):
-------------------------------------------------------------
-MTU greater than ... 256 (0x01) -> equal to ....... 1K
-MTU less than ...... 256 (0x41) -> NO PATHS
-MTU equal to ....... 256 (0x81) -> equal to ....... 256
-MTU largest possible 256 (0xc1) -> equal to ....... 2K
-MTU greater than ... 512 (0x02) -> equal to ....... 1K
-MTU less than ...... 512 (0x42) -> equal to ....... 256
-MTU equal to ....... 512 (0x82) -> equal to ....... 512
-MTU largest possible 512 (0xc2) -> equal to ....... 2K
-MTU greater than ... 1K (0x03) -> NO PATHS
-MTU less than ...... 1K (0x43) -> equal to ....... 512
-MTU equal to ....... 1K (0x83) -> equal to ....... 1K
-MTU largest possible 1K (0xc3) -> equal to ....... 2K
-MTU greater than ... 2K (0x04) -> NO PATHS
-MTU less than ...... 2K (0x44) -> equal to ....... 1K
-MTU equal to ....... 2K (0x84) -> equal to ....... 2K
-MTU largest possible 2K (0xc4) -> equal to ....... 2K
-MTU greater than ... 4K (0x05) -> NO PATHS
-MTU less than ...... 4K (0x45) -> equal to ....... 1K
-MTU equal to ....... 4K (0x85) -> NO PATHS
-MTU largest possible 4K (0xc5) -> equal to ....... 2K
-============================================================
-
-* osm_pkey_rcv.c: rwlock double release fix
- When the port is removed from subnet, but previously requested pkey
- table block is received after this - the lock will be released twice.
- This leads to deadlocks later when other MAD processor will try to
- acquire the same lock.
-
-* osm_sa_informinfo.c: Fix InformInfoRecord searches
-
-* Better SA MCMemberRecord leave locking
- Hold locked multicast group leave request (MCMember Record) processing.
- This prevents kind of race with multicast group join request where
- those requests can be reordered during processing.
-
-* osm_sa_informinfo.c: Conformance changes for subscribe component
-
-* osm_sa_path_record.c: Handle LID 0 as error
-
-* Fix comparing InformInfo records
- 1. The received InformInfo struct was modified before dumping it.
- 2. The function that compares InformInfo structures was just
- comparing the whole memory allocated for it, including reserved
- fields. Fixed to compare more selectively.
-
- As for QPN, from the IB spec, table 119 InformInfo:
- QPN : Ignored except when subscribe=0 (an unsubscribe
- request). Queue pair to which Report()s were sent as
- a result of a corresponding subscription. If no
- subscription for this Report() with this QPN exists,
- the request to unsubscribe performs no action and
- produces GetResp() with status indicating an invalid
- field value.
-
-* osm_trap_rcv.c: Reduce repeated trap messages so log doesn't fill
- so quickly
-
-* osm_helper.c: Fix stack smashing detected problem in osm_dump_service_record
-
-* Fix permission on db files directory
- When creating directory for db files (guid2lid) storing create it with
- reasonable permissions (current 777 decimal = octal 01411) and don't do
- it world writable.
-
-* Fix node_desc.description as string usages
-
-5 Main Verification Flows
--------------------------
-
-OpenSM verification is run using the following activities:
-* osmtest - a stand-alone program
-* ibmgtsim (IB management simulator) based - a set of flows that
- simulate clusters, inject errors and verify OpenSM capability to
- respond and bring up the network correctly.
-* small cluster regression testing - where the SM is used on back to
- back or single switch configurations. The regression includes
- multiple OpenSM dedicated tests.
-* cluster testing - when we run OpenSM to setup a large cluster, perform
- hand-off, reboots and reconnects, verify routing correctness and SA
- responsiveness at the ULP level (IPoIB and SDP).
-
-5.1 osmtest
-
-osmtest is an automated verification tool used for OpenSM
-testing. Its verification flows are described by list below.
-
-* Inventory File: Obtain and verify all port info, node info, link and path
- records parameters.
-
-* Service Record:
- - Register new service
- - Register another service (with a lease period)
- - Register another service (with service p_key set to zero)
- - Get all services by name
- - Delete the first service
- - Delete the third service
- - Added bad flows of get/delete non valid service
- - Add / Get same service with different data
- - Add / Get / Delete by different component mask values (services
- by Name & Key / Name & Data / Name & Id / Id only )
-
-* Multicast Member Record:
- - Query of existing Groups (IPoIB)
- - BAD Join with insufficient comp mask (o15.0.1.3)
- - Create given MGID=0 (o15.0.1.4)
- - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
- - Create BAD MGID=0xFA. (o15.0.1.6)
- - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
- - New MGID with invalid join state (o15.0.1.9)
- - Retry of existing MGID - See JoinState update (o15.0.1.11)
- - BAD RATE when connecting to existing MGID (o15.0.1.13)
- - Partial JoinState delete request - removing FullMember (o15.0.1.14)
- - Full Delete of a group (o15.0.1.14)
- - Verify Delete by trying to Join deleted group (o15.0.1.14)
- - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
-
-* GUIDInfo Record:
- - All GUIDInfoRecords in subnet are obtained
-
-* MultiPathRecord:
- - Perform some compliant and noncompliant MultiPathRecord requests
- - Validation is via status in responses and IB analyzer
-
-* PKeyTableRecord:
- - Perform some compliant and noncompliant PKeyTableRecord queries
- - Validation is via status in responses and IB analyzer
-
-* LinearForwardingTableRecord:
- - Perform some compliant and noncompliant LinearForwardingTableRecord queries
- - Validation is via status in responses and IB analyzer
-
-* Event Forwarding: Register for trap forwarding using reports
- - Send a trap and wait for report
- - Unregister non-existing
-
-* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
- disconnecting/connecting ports) and wait for report, then unregister.
-
-* Stress Test: send PortInfoRecord queries, both single and RMPP and
- check for the rate of responses as well as their validity.
-
-
-5.2 IB Management Simulator OpenSM Test Flows:
-
-The simulator provides ability to simulate the SM handling of virtual
-topologies that are not limited to actual lab equipment availability.
-OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
-regressions use smaller (16 and 128 nodes clusters).
-
-The following test flows are run on the IB management simulator:
-
-* Stability:
- Up to 12 links from the fabric are randomly selected to drop packets
- at drop rates up to 90%. The SM is required to succeed in bringing the
- fabric up. The resulting routing is verified to be correct as well.
-
-* LID Manager:
- Using LMC = 2 the fabric is initialized with LIDs. Faults such as
- zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
- randomly assigned to various nodes and other errors are randomly
- output to the guid2lid cache file. The SM sweep is run 5 times and
- after each iteration a complete verification is made to ensure that all
- LIDs that could possibly be maintained are kept, as well as that all nodes
- were assigned a legal LID range.
-
-* Multicast Routing:
- Nodes randomly join the 0xc000 group and eventually the
- resulting routing is verified for completeness and adherence to
- Up/Down routing rules.
-
-* osmtest:
- The complete osmtest flow as described in the previous table is run on
- the simulated fabrics.
-
-* Stress Test:
- This flow merges fabric, LID and stability issues with continuous
- PathRecord, ServiceRecord and Multicast Join/Leave activity to
- stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
- were added to the test such both existing and non existing nodes
- perform them in random order.
-
-5.3 OpenSM Regression
-
-Using a back-to-back or single switch connection, the following set of
-tests is run nightly on the stacks described in table 2. The included
-tests are:
-
-* Stress Testing: Flood the SA with queries from multiple channel
- adapters to check the robustness of the entire stack up to the SA.
-
-* Dynamic Changes: Dynamic Topology changes, through randomly
- dropping SMP packets, used to test OpenSM adaptation to an unstable
- network & verify DB correctness.
-
-* Trap Injection: This flow injects traps to the SM and verifies that it
- handles them gracefully.
-
-* SA Query Test: This test exhaustively checks the SA responses to all
- possible single component mask. To do that the test examines the
- entire set of records the SA can provide, classifies them by their
- field values and then selects every field (using component mask and a
- value) and verifies that the response matches the expected set of records.
- A random selection using multiple component mask bits is also performed.
-
-5.4 Cluster testing:
-
-Cluster testing is usually run before a distribution release. It
-involves real hardware setups of 16 to 32 nodes (or more if a beta site
-is available). Each test is validated by running all-to-all ping through the IB
-interface. The test procedure includes:
-
-* Cluster bringup
-
-* Hand-off between 2 or 3 SM's while performing:
- - Node reboots
- - Switch power cycles (disconnecting the SM's)
-
-* Unresponsive port detection and recovery
-
-* osmtest from multiple nodes
-
-* Trap injection and recovery
-
-
-6 Qualification
-----------------
-
-Table 2 - Qualified IB Stacks
-=============================
-
-Stack | Version
------------------------------------------|--------------------------
-OFED | 1.2
-OFED | 1.1
-OFED | 1.0
-OpenIB Gen2 (IBG2 distribution) | 1.0
-OpenIB Gen1 (IBGD distribution) | 1.8.0
-VAPI (Mellanox InfiniBand HCA Driver) | 3.2 and later
-
-Table 3 - Qualified Devices and Corresponding Firmware
-======================================================
-
-Mellanox
-Device | FW versions
---------|-----------------------------------------------------------
-MT43132 | InfiniScale - fw-43132 5.2.0 (and later)
-MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)
-MT23108 | InfiniHost - fw-23108 3.3.2 (and later)
-MT25204 | InfiniHost III Lx - fw-25204 1.0.1i (and later)
-MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)
-MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)
-
-QLogic/PathScale
-Device | Note
---------|-----------------------------------------------------------
-iPath | QHT6040 (PathScale InfiniPath HT-460)
-iPath | QHT6140 (PathScale InfiniPath HT-465)
-iPath | QLE6140 (PathScale InfiniPath PE-880)
-
-Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
-QP0 and QP1. However, it does support it as a device on the subnet.
-
+ OpenSM Release Notes 3.0.11\r
+ =============================\r
+\r
+Version: OpenFabrics Enterprise Distribution (OFED) 1.2\r
+Repo: git://git.openfabrics.org/~ofed_1_2/management.git (release)\r
+ git://git.openfabrics.org/~halr/management.git (development)\r
+Date: May 2007\r
+\r
+1 Overview\r
+----------\r
+This document describes the contents of the OpenSM OFED 1.2 release. \r
+OpenSM is an InfiniBand compliant Subnet Manager and Administration,\r
+and runs on top of OpenIB. The OpenSM version for this release \r
+is openib-3.0.11\r
+\r
+This document includes the following sections:\r
+1 This Overview section (describing new features and software\r
+ dependencies)\r
+2 Known Issues And Limitations\r
+3 Unsupported IB compliance statements\r
+4 Major Bug Fixes\r
+5 Main Verification Flows\r
+6 Qualified software stacks and devices\r
+\r
+1.1 Major New Features\r
+\r
+* Routing improvements\r
+ Two additional routing algorithms have been added in addition to\r
+ performance improvements to the existing routing algorithms. The\r
+ two new routing algorithms are FAT tree and LASH. See the \r
+ opensm man page for additional details.\r
+\r
+* SA Optional Record support now "virtually" complete\r
+ Includes SA InformInfo improvements and InformInfoRecord support in\r
+ addition to support for the remaining SA optional records \r
+ (MulticastForwardingTableRecord, SwitchInfoRecord). Also, SMInfoRecord \r
+ support was improved to include all SMs found.\r
+\r
+* SA database dump/restore\r
+ OpenSM now includes the ability to dump and restore the SA database.\r
+ This allows for all SA registrations (multicast, services, and events)\r
+ to be saved and restored later.\r
+\r
+ In verbose mode, OpenSM will dump SA DB (existing multicast groups,\r
+ services and InformInfo) into dump file which named "opensm-sa.dump"\r
+ and located under standard OpenSM dump directory (/var/log by default).\r
+\r
+ If option -S is specified and SA DB dump file name is provided, OpenSM\r
+ will try to restore SA database from this file. And if restore is\r
+ successful, OpenSM won't ask for client reregistration at subnet bring-up.\r
+\r
+* Modular routing for multicast\r
+ In conjunction was SA database dump/restore, there is the ability to\r
+ dump and load switch lid matrices (min hops tables) which are used\r
+ for multicast route calculation.\r
+\r
+* IB router enablement\r
+ OpenSM now supports router ports properly (in terms of PortInfo handling).\r
+ There is also some experimental support for IB routers which is enabled\r
+ via the ROUTER_EXP compile flag. This support includes SA PathRecord and \r
+ MCMemberRecord support for off subnet GIDs.\r
+\r
+* Socket support added to console\r
+ OpenSM console now supports remote in addition to local access.\r
+ Remote access is currently via telnet.\r
+\r
+1.2 Minor New Features:\r
+\r
+* Change output format of DR path from hex to decimal port numbers\r
+\r
+* Log rotation\r
+ The OpenSM log can now be rotated while OpenSM is running (without\r
+ stopping and restarting OpenSM). This is accomplished via SIGUSR1.\r
+\r
+* Support scope for IPoIB multicast groups in partition config\r
+\r
+* Dump filename changed from subnet.lst to osm-subnet.lst\r
+ Default temp directory for non Windows platforms was previously changed\r
+ from /tmp to /var/log.\r
+\r
+* Add option for force SDR link speed\r
+ Add option to opensm.opts to force link speed. Currently, only forcing\r
+ to SDR link speed is supported. This option is not supported as a\r
+ command line option.\r
+\r
+1.3 Library API Changes\r
+\r
+ None\r
+\r
+1.4 Software Dependencies\r
+\r
+OpenSM depends on the installation of either OFED 1.2, OFED 1.1,\r
+OFED 1.0, OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD\r
+distribution), or Mellanox VAPI stacks. The qualified driver versions\r
+are provided in Table 2, "Qualified IB Stacks".\r
+\r
+1.5 Supported Devices Firmware\r
+\r
+The main task of OpenSM is to initialize InfiniBand devices. The\r
+qualified devices and their corresponding firmware versions \r
+are listed in Table 3.\r
+\r
+2 Known Issues And Limitations\r
+------------------------------\r
+\r
+* No Service / Key associations:\r
+ There is no way to manage Service access by Keys. \r
+\r
+* No SM to SM SMDB synchronization: \r
+ Puts the burden of re-registering services, multicast groups, and\r
+ inform-info on the client application (or IB access layer core).\r
+\r
+* No "port down" event handling:\r
+ Changing the switch port through which OpenSM connects to the IB\r
+ fabric may cause incorrect operation. Please restart OpenSM whenever\r
+ such a connectivity change is made.\r
+\r
+* Changing connections during SM operation:\r
+ Under some conditions the SM can get confused by a change in \r
+ cabling (moving a cable from one switch port to the other) and \r
+ momentarily see this as having the same GUID appear connected \r
+ to two different IB ports. Under some conditions, when the SM fails to \r
+ get the corresponding change event it might mistakenly report this case\r
+ as a "duplicated GUID" case and abort. It is advisable to double-check\r
+ the syslog after each such change in connectivity and restart\r
+ OpenSM if it has exited. The same error ("duplicated GUID") will \r
+ also appear with a loopback plug.\r
+\r
+3 Unsupported IB Compliance Statements\r
+--------------------------------------\r
+The following section lists all the IB compliance statements which\r
+OpenSM does not support. Please refer to the IB specification for detailed\r
+information regarding each compliance statement. \r
+\r
+* C14-22 (Authentication):\r
+ M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one\r
+ SubnSet method. As a work-around, an OpenSM option is provided for\r
+ defining the protect bits.\r
+\r
+* C14-67 (Authentication):\r
+ On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then\r
+ the SM shall generate a SubnGetResp if the M_Key matches, or\r
+ silently drop the packet if M_Key does not match.\r
+\r
+* C15-0.1.23.4 (Authentication):\r
+ InformInfoRecords shall always be provided with the QPN set to 0,\r
+ except for the case of a trusted request, in which case the actual\r
+ subscriber QPN shall be returned. \r
+\r
+* o13-17.1.2 (Event-FWD):\r
+ If no permission to forward, the subscription should be removed and\r
+ no further forwarding should occur.\r
+\r
+* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):\r
+ GUIDInfo - SM should enable assigning Port GUIDInfo.\r
+\r
+* C14-44 (Initialization):\r
+ If the SM discovers that it is missing an M_Key to update CA/RT/SW,\r
+ it should notify the higher level.\r
+\r
+* C14-62.1.1.12 (Initialization):\r
+ PortInfo:M_Key - Set the M_Key to a node based random value.\r
+\r
+* C14-62.1.1.13 (Initialization):\r
+ PortInfo:P_KeyProtectBits - set according to an optional policy.\r
+\r
+* C14-62.1.1.24 (Initialization):\r
+ SwitchInfo:DefaultPort - should be configured for random FDB.\r
+\r
+* C14-62.1.1.32 (Initialization):\r
+ RandomForwardingTable should be configured.\r
+\r
+* o15-0.1.12 (Multicast):\r
+ If the JoinState is SendOnlyNonMember = 1 (only), then the endport\r
+ should join as sender only.\r
+\r
+* o15-0.1.8 (Multicast):\r
+ If a request for creating an MCG with fields that cannot be met,\r
+ return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).\r
+\r
+* C15-0.1.8.6 (SA-Query):\r
+ Respond to SubnAdmGetTraceTable - this is an optional attribute.\r
+\r
+* C15-0.1.13 Services:\r
+ Reject ServiceRecord create, modify or delete if the given\r
+ ServiceP_Key does not match the one included in the ServiceGID port\r
+ and the port that sent the request.\r
+\r
+* C15-0.1.14 (Services):\r
+ Provide means to associate service name and ServiceKeys.\r
+\r
+4 Major Bug Fixes\r
+-----------------\r
+\r
+The following is a list of bugs that were fixed. Note that other less critical\r
+or visible bugs were also fixed.\r
+\r
+* osm_port_info_rcv.c: In __osm_pi_rcv_process_endport,\r
+ isSMdisabled also indicates that an SM is present so poll SMInfo\r
+\r
+* osm_sm_state_mgr.c: In __osm_sm_state_mgr_send_master_sm_info_req, \r
+ handle master GUID port not found properly\r
+\r
+* osm_sa_multipath_record.c: In __osm_mpr_rcv_get_path_parms, return \r
+ IB_NOT_FOUND rather than IB_ERROR when can't route to LID from switch\r
+\r
+* osm_sa_path_record.c: In __osm_pr_rcv_get_path_parms, return IB_NOT_FOUND\r
+ rather than IB_ERROR when can't route to LID from switch\r
+\r
+* osm_vendor_ibumad.c: In osm_vendor_set_sm, set issmfd to\r
+ -1 on open error\r
+\r
+* osm_vendor_ibumad: Termination crash fix\r
+ When OpenSM is terminated umad_receiver thread still running even after\r
+ the structures are destroyed and freed, this causes to random (but easily\r
+ reproducible) crashes. The reason is that osm_vendor_delete() does not\r
+ care about thread termination. This patch adds the receiver thread\r
+ cancellation (by using pthread_cancel() and pthread_join()) and cares to\r
+ keep have all mutexes unlocked upon termination. There is also minor\r
+ termination code consolidation - osm_vendor_port_close() function.\r
+\r
+* osm_port_profile.h: Fix reinsertion issue in osm_port_prof_set_ignored_port\r
+\r
+* osm_matrix.h: Fix segfault with up/down and root nodes file\r
+\r
+* osm_sa_path_record.c: In osm_pr_rcv_process, fix endian of hop_limit\r
+\r
+* osm_vendor_ibumad.c: Close umad port in osm_vendor_delete\r
+\r
+* osm_sa_(multipath path)_record.c: Fix MultiPathRecord/PathRecord issues\r
+ with using MTU/rate/PktLife explicitly ignoring selectors\r
+\r
+ OpenSM just uses the resulting path MTU/rate/pkt-life and fail the\r
+ query even though the selector might be allowing for selecting an\r
+ appropriate value.\r
+\r
+ After this fix, the following results are obtained for a case of\r
+ path allowing maximal 2K MTU.\r
+\r
+In standard mode:\r
+------------------------------------------------------------\r
+MTU greater than ... 256 (0x01) -> equal to ....... 2K\r
+MTU less than ...... 256 (0x41) -> NO PATHS\r
+MTU equal to ....... 256 (0x81) -> equal to ....... 256\r
+MTU largest possible 256 (0xc1) -> equal to ....... 2K\r
+MTU greater than ... 512 (0x02) -> equal to ....... 2K\r
+MTU less than ...... 512 (0x42) -> equal to ....... 256\r
+MTU equal to ....... 512 (0x82) -> equal to ....... 512\r
+MTU largest possible 512 (0xc2) -> equal to ....... 2K\r
+MTU greater than ... 1K (0x03) -> equal to ....... 2K\r
+MTU less than ...... 1K (0x43) -> equal to ....... 512\r
+MTU equal to ....... 1K (0x83) -> equal to ....... 1K\r
+MTU largest possible 1K (0xc3) -> equal to ....... 2K\r
+MTU greater than ... 2K (0x04) -> NO PATHS\r
+MTU less than ...... 2K (0x44) -> equal to ....... 1K\r
+MTU equal to ....... 2K (0x84) -> equal to ....... 2K\r
+MTU largest possible 2K (0xc4) -> equal to ....... 2K\r
+MTU greater than ... 4K (0x05) -> NO PATHS\r
+MTU less than ...... 4K (0x45) -> equal to ....... 2K\r
+MTU equal to ....... 4K (0x85) -> NO PATHS\r
+MTU largest possible 4K (0xc5) -> equal to ....... 2K\r
+============================================================\r
+\r
+With enable_quirks (when one of the ends is a Tavor device):\r
+------------------------------------------------------------\r
+MTU greater than ... 256 (0x01) -> equal to ....... 1K\r
+MTU less than ...... 256 (0x41) -> NO PATHS\r
+MTU equal to ....... 256 (0x81) -> equal to ....... 256\r
+MTU largest possible 256 (0xc1) -> equal to ....... 2K\r
+MTU greater than ... 512 (0x02) -> equal to ....... 1K\r
+MTU less than ...... 512 (0x42) -> equal to ....... 256\r
+MTU equal to ....... 512 (0x82) -> equal to ....... 512\r
+MTU largest possible 512 (0xc2) -> equal to ....... 2K\r
+MTU greater than ... 1K (0x03) -> NO PATHS\r
+MTU less than ...... 1K (0x43) -> equal to ....... 512\r
+MTU equal to ....... 1K (0x83) -> equal to ....... 1K\r
+MTU largest possible 1K (0xc3) -> equal to ....... 2K\r
+MTU greater than ... 2K (0x04) -> NO PATHS\r
+MTU less than ...... 2K (0x44) -> equal to ....... 1K\r
+MTU equal to ....... 2K (0x84) -> equal to ....... 2K\r
+MTU largest possible 2K (0xc4) -> equal to ....... 2K\r
+MTU greater than ... 4K (0x05) -> NO PATHS\r
+MTU less than ...... 4K (0x45) -> equal to ....... 1K\r
+MTU equal to ....... 4K (0x85) -> NO PATHS\r
+MTU largest possible 4K (0xc5) -> equal to ....... 2K\r
+============================================================\r
+\r
+* osm_pkey_rcv.c: rwlock double release fix\r
+ When the port is removed from subnet, but previously requested pkey\r
+ table block is received after this - the lock will be released twice.\r
+ This leads to deadlocks later when other MAD processor will try to\r
+ acquire the same lock.\r
+\r
+* osm_sa_informinfo.c: Fix InformInfoRecord searches\r
+\r
+* Better SA MCMemberRecord leave locking\r
+ Hold locked multicast group leave request (MCMember Record) processing.\r
+ This prevents kind of race with multicast group join request where\r
+ those requests can be reordered during processing.\r
+\r
+* osm_sa_informinfo.c: Conformance changes for subscribe component\r
+\r
+* osm_sa_path_record.c: Handle LID 0 as error\r
+\r
+* Fix comparing InformInfo records\r
+ 1. The received InformInfo struct was modified before dumping it.\r
+ 2. The function that compares InformInfo structures was just\r
+ comparing the whole memory allocated for it, including reserved\r
+ fields. Fixed to compare more selectively.\r
+\r
+ As for QPN, from the IB spec, table 119 InformInfo:\r
+ QPN : Ignored except when subscribe=0 (an unsubscribe\r
+ request). Queue pair to which Report()s were sent as\r
+ a result of a corresponding subscription. If no\r
+ subscription for this Report() with this QPN exists,\r
+ the request to unsubscribe performs no action and\r
+ produces GetResp() with status indicating an invalid\r
+ field value.\r
+\r
+* osm_trap_rcv.c: Reduce repeated trap messages so log doesn't fill\r
+ so quickly\r
+\r
+* osm_helper.c: Fix stack smashing detected problem in osm_dump_service_record\r
+\r
+* Fix permission on db files directory\r
+ When creating directory for db files (guid2lid) storing create it with\r
+ reasonable permissions (current 777 decimal = octal 01411) and don't do\r
+ it world writable.\r
+\r
+* Fix node_desc.description as string usages\r
+\r
+5 Main Verification Flows\r
+-------------------------\r
+\r
+OpenSM verification is run using the following activities:\r
+* osmtest - a stand-alone program\r
+* ibmgtsim (IB management simulator) based - a set of flows that\r
+ simulate clusters, inject errors and verify OpenSM capability to\r
+ respond and bring up the network correctly.\r
+* small cluster regression testing - where the SM is used on back to\r
+ back or single switch configurations. The regression includes\r
+ multiple OpenSM dedicated tests.\r
+* cluster testing - when we run OpenSM to setup a large cluster, perform\r
+ hand-off, reboots and reconnects, verify routing correctness and SA\r
+ responsiveness at the ULP level (IPoIB and SDP).\r
+\r
+5.1 osmtest\r
+\r
+osmtest is an automated verification tool used for OpenSM\r
+testing. Its verification flows are described by list below. \r
+\r
+* Inventory File: Obtain and verify all port info, node info, link and path\r
+ records parameters.\r
+\r
+* Service Record:\r
+ - Register new service\r
+ - Register another service (with a lease period)\r
+ - Register another service (with service p_key set to zero)\r
+ - Get all services by name\r
+ - Delete the first service\r
+ - Delete the third service\r
+ - Added bad flows of get/delete non valid service\r
+ - Add / Get same service with different data \r
+ - Add / Get / Delete by different component mask values (services\r
+ by Name & Key / Name & Data / Name & Id / Id only )\r
+\r
+* Multicast Member Record: \r
+ - Query of existing Groups (IPoIB)\r
+ - BAD Join with insufficient comp mask (o15.0.1.3)\r
+ - Create given MGID=0 (o15.0.1.4)\r
+ - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)\r
+ - Create BAD MGID=0xFA. (o15.0.1.6)\r
+ - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)\r
+ - New MGID with invalid join state (o15.0.1.9)\r
+ - Retry of existing MGID - See JoinState update (o15.0.1.11)\r
+ - BAD RATE when connecting to existing MGID (o15.0.1.13)\r
+ - Partial JoinState delete request - removing FullMember (o15.0.1.14)\r
+ - Full Delete of a group (o15.0.1.14)\r
+ - Verify Delete by trying to Join deleted group (o15.0.1.14)\r
+ - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)\r
+\r
+* GUIDInfo Record:\r
+ - All GUIDInfoRecords in subnet are obtained\r
+\r
+* MultiPathRecord:\r
+ - Perform some compliant and noncompliant MultiPathRecord requests\r
+ - Validation is via status in responses and IB analyzer\r
+\r
+* PKeyTableRecord:\r
+ - Perform some compliant and noncompliant PKeyTableRecord queries\r
+ - Validation is via status in responses and IB analyzer\r
+\r
+* LinearForwardingTableRecord:\r
+ - Perform some compliant and noncompliant LinearForwardingTableRecord queries\r
+ - Validation is via status in responses and IB analyzer\r
+\r
+* Event Forwarding: Register for trap forwarding using reports\r
+ - Send a trap and wait for report\r
+ - Unregister non-existing\r
+\r
+* Trap 64/65 Flow: Register to Trap 64-65, create traps (by\r
+ disconnecting/connecting ports) and wait for report, then unregister.\r
+\r
+* Stress Test: send PortInfoRecord queries, both single and RMPP and\r
+ check for the rate of responses as well as their validity.\r
+\r
+\r
+5.2 IB Management Simulator OpenSM Test Flows:\r
+\r
+The simulator provides ability to simulate the SM handling of virtual\r
+topologies that are not limited to actual lab equipment availability.\r
+OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily\r
+regressions use smaller (16 and 128 nodes clusters).\r
+\r
+The following test flows are run on the IB management simulator:\r
+\r
+* Stability:\r
+ Up to 12 links from the fabric are randomly selected to drop packets\r
+ at drop rates up to 90%. The SM is required to succeed in bringing the\r
+ fabric up. The resulting routing is verified to be correct as well.\r
+\r
+* LID Manager:\r
+ Using LMC = 2 the fabric is initialized with LIDs. Faults such as\r
+ zero LID, Duplicated LID, non-aligned (to LMC) LIDs are \r
+ randomly assigned to various nodes and other errors are randomly\r
+ output to the guid2lid cache file. The SM sweep is run 5 times and\r
+ after each iteration a complete verification is made to ensure that all\r
+ LIDs that could possibly be maintained are kept, as well as that all nodes\r
+ were assigned a legal LID range.\r
+\r
+* Multicast Routing:\r
+ Nodes randomly join the 0xc000 group and eventually the\r
+ resulting routing is verified for completeness and adherence to\r
+ Up/Down routing rules. \r
+\r
+* osmtest:\r
+ The complete osmtest flow as described in the previous table is run on\r
+ the simulated fabrics.\r
+\r
+* Stress Test:\r
+ This flow merges fabric, LID and stability issues with continuous \r
+ PathRecord, ServiceRecord and Multicast Join/Leave activity to\r
+ stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get\r
+ were added to the test such both existing and non existing nodes\r
+ perform them in random order.\r
+\r
+5.3 OpenSM Regression\r
+\r
+Using a back-to-back or single switch connection, the following set of\r
+tests is run nightly on the stacks described in table 2. The included\r
+tests are:\r
+\r
+* Stress Testing: Flood the SA with queries from multiple channel\r
+ adapters to check the robustness of the entire stack up to the SA.\r
+\r
+* Dynamic Changes: Dynamic Topology changes, through randomly\r
+ dropping SMP packets, used to test OpenSM adaptation to an unstable\r
+ network & verify DB correctness.\r
+\r
+* Trap Injection: This flow injects traps to the SM and verifies that it\r
+ handles them gracefully. \r
+\r
+* SA Query Test: This test exhaustively checks the SA responses to all\r
+ possible single component mask. To do that the test examines the\r
+ entire set of records the SA can provide, classifies them by their\r
+ field values and then selects every field (using component mask and a\r
+ value) and verifies that the response matches the expected set of records.\r
+ A random selection using multiple component mask bits is also performed.\r
+\r
+5.4 Cluster testing:\r
+\r
+Cluster testing is usually run before a distribution release. It\r
+involves real hardware setups of 16 to 32 nodes (or more if a beta site\r
+is available). Each test is validated by running all-to-all ping through the IB\r
+interface. The test procedure includes:\r
+\r
+* Cluster bringup \r
+\r
+* Hand-off between 2 or 3 SM's while performing:\r
+ - Node reboots\r
+ - Switch power cycles (disconnecting the SM's)\r
+\r
+* Unresponsive port detection and recovery\r
+\r
+* osmtest from multiple nodes\r
+\r
+* Trap injection and recovery\r
+\r
+\r
+6 Qualification \r
+----------------\r
+\r
+Table 2 - Qualified IB Stacks\r
+=============================\r
+\r
+Stack | Version\r
+-----------------------------------------|--------------------------\r
+OFED | 1.2\r
+OFED | 1.1\r
+OFED | 1.0\r
+OpenIB Gen2 (IBG2 distribution) | 1.0\r
+OpenIB Gen1 (IBGD distribution) | 1.8.0\r
+VAPI (Mellanox InfiniBand HCA Driver) | 3.2 and later\r
+\r
+Table 3 - Qualified Devices and Corresponding Firmware \r
+======================================================\r
+\r
+Mellanox\r
+Device | FW versions \r
+--------|-----------------------------------------------------------\r
+MT43132 | InfiniScale - fw-43132 5.2.0 (and later)\r
+MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)\r
+MT23108 | InfiniHost - fw-23108 3.3.2 (and later)\r
+MT25204 | InfiniHost III Lx - fw-25204 1.0.1i (and later)\r
+MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)\r
+MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)\r
+\r
+QLogic/PathScale\r
+Device | Note\r
+--------|-----------------------------------------------------------\r
+iPath | QHT6040 (PathScale InfiniPath HT-460)\r
+iPath | QHT6140 (PathScale InfiniPath HT-465)\r
+iPath | QLE6140 (PathScale InfiniPath PE-880)\r
+\r
+Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose \r
+QP0 and QP1. However, it does support it as a device on the subnet.\r
+\r