From f8d85d4a97e39476f552ba8e642f5ecbce2897ca Mon Sep 17 00:00:00 2001
From: Tziporet Koren
Date: Wed, 27 Feb 2008 12:29:13 +0200
Subject: [PATCH] update for 1.3

Signed-off-by: Oren Kladnitsky
---
 ibutils_release_notes.txt | 203 +++++++++++++++++++++-----------------
 1 file changed, 111 insertions(+), 92 deletions(-)

diff --git a/ibutils_release_notes.txt b/ibutils_release_notes.txt
index e993c77..89005b9 100644
--- a/ibutils_release_notes.txt
+++ b/ibutils_release_notes.txt
@@ -1,120 +1,139 @@
-            Open Fabrics Enterprise Distribution (OFED)
-                 IBUTILS in OFED 1.3 Release Notes
-
-                           February 2008
+            Open Fabrics InfiniBand Diagnostic Utilities
+            --------------------------------------------
+*******************************************************************************
+RELEASE: OFED 1.3
+DATE: Feb 2008

 ===============================================================================
 Table of Contents
 ===============================================================================
 1. Overview
-2. Requirements
-3. Reports
+2. New features
+3. Major Bugs Fixed
 4. Known Issues

-
 ===============================================================================
 1. Overview
 ===============================================================================
-The IBUTILS package provides diagnostic tools and procedures for InfiniBand
-fabrics. Target users of these utilities are network and data-center managers
-with basic knowledge of the InfiniBand specification.
-
-The following tools are provided:
-o ibdiagnet - performs a diagnostic check of the entire InfiniBand subnet.
-  This utility should be used upon suspicion of fabric misbehavior. The default
-  invocation can be enhanced to perform more advanced checks and to produce
-  additional reports. The following is a partial set of checks it performs:
-  - Checks for a single master Subnet Manager (SM)
-  - Checks that all routes between hosts are set correctly (including multicast
-    groups)
-  - Checks for fabric links health
-
-o ibdiagpath - traces a path between two nodes specified by LIDs or a directed
-  path. This utility should be used when the connectivity between two specific
-  hosts is broken.
-
-o ibdiagui - a graphic user interface on top of ibdiagnet
-  ibdiagui is mostly suitable for medium sized fabrics (<100 nodes) and for
-  users who wish to explore an unknown InfiniBand subnet. The main features it
-  provides are an automatically generated connectivity graph, an object-
-  properties browser, and hyperlinks of the ibdiagnet log to these widgets.
-
-Note: man pages are provided for each tool.
-
-The package tools perform the following diagnostic procedures:
-* Discover the InfiniBand subnet connectivity
-* Determine whether or not an SM is running
-* Identify links which drop packets and/or incur errors by sending MAD
-  packets multiple times, across all the links, reporting port monitor counters
-* Identify fabric level mismatches or inconsistencies such as
-  - Duplicate port GUIDs - Two or more different ports with the same GUID
-  - Duplicate node GUIDs - Two or more different nodes with the same node GUID
-  - Duplicate LIDs - Two or more devices that have the same assigned LID
-  - Zero valued LIDs - A device with LID=0 indicates that the SM did not
-    assign a LID to this device
-  - Zero valued system GUIDs - A device with system GUID=0 indicates that
-    the vendor did not assign it a GUID
-  - An InfiniBand link is in the INIT state which prevents data transfer
-  - Unexpected link width (when using the -lw flag)
-  - Unexpected link speed (when using the -ls flag)
-  - Partitions and SL2VL settings preventing communication between specific
-    nodes (ibdiagpath)
-
+The ibdiag package was enhanced to check more aspects of the network setup,
+including partitions, IPoIB and QoS. An additional major feature is its ability
+to write a topology file of the discovered network. A summary table is provided
+with a list of the executed checks and their results.

 ===============================================================================
-2. Requirements
+2. New Features
 ===============================================================================
-Software Dependencies:
-
-1. ibis and ibdm must be installed (are part of this OFED package).
-
-2. ibdiagui also depends on the installation of the following packages:
-   - Tk8.4 is standard on all Linux distributions. If it is missing on your
-     machine, download it from http://www.tcl.tk/software/tcltk/
-   - Graphviz - an automatic graph layout utility. It can be downloaded from
-     http://www.graphviz.org/
-
+The following new checks were added to the tools:
+
+ibdiagnet new features:
+-----------------------
++ Partitions Check:
+  - Validate that all leaf switch ports (connected to a host) which enforce
+    partitions are not blocking partitions set on the host ports they
+    are connected to.
+  - Report for each partition the member hosts and their membership status.
+    Full membership allows hosts to communicate with any other member.
+    Partial membership allows communication with full members only.
+    The new report file is named ibdiagnet.pkey.
+
++ IPoIB Subnets Check:
+  - The IPoIB subnets and their properties are reported.
+  - For each group, all the host ports that are part of the partition are
+    checked to have a high enough communication rate to be part of the group
+    (warn if not).
+  - If all the group members can use a communication rate higher than the group
+    rate, a warning is produced because the subnet uses a suboptimal rate.
+
+Other changes:
++ The multicast groups report was enhanced to provide the details of each
+  group and the members list is provided in a new report file: ibdiagnet.mcgs.
+
++ A new flag, -wt , was added. ibdiagnet, with the new option,
+  writes out the discovered topology to a file with the provided file-name and
+  the required new IBNL files into an output directory named ibdiag_ibnl.
+  This new feature allows you to capture the current state of the fabric
+  and later compare against it. This makes the features provided by the
+  "Topology Matching" check available. These features include recognizing
+  changes in connections, speed and width.
+
++ Load subnet database from file:
+  Ibdiagnet dumps its internal database, which contains the subnet structure,
+  to a file (/tmp/ibdiagnet.db by default). This file can be loaded in later
+  ibdiagnet runs (using the -load_db option). When this option is set,
+  ibdiagnet loads the subnet data from the file and skips the discovery stage.
+  Using this option can save the subnet discovery time for large clusters.
+  Note: Some ibdiagnet checks are not performed when the -load_db
+  option is set. These checks are:
+  - Duplicated GUIDs.
+  - Zero GUIDs.
+  - Links in INIT state.
+  - SMs status.
+
++ A new flag, -skip , was added. When this flag is specified,
+  ibdiagnet skips the given check. One or more space-separated values can be
+  specified.
+  Available skip options: dup_guids, zero_guids, pm, logical_state, part, ipoib.
+  The -skip flag can be used in order to run only specific checks, or to reduce
+  ibdiagnet run time.
+
+ibdiagpath new features:
+------------------------
++ Partitions Check:
+  - The list of partitions of source and destination ports is reported.
+  - The set of partitions common to the source, the destination and every
+    port on the path (if enforcing partitions) is calculated and
+    reported. A warning is provided if a source partition is blocked by
+    a port on the path.
+    An error is provided if there are no common partitions for the path.
+
++ IPoIB Subnets Check:
+  - The IPoIB subnets available for the path are reported.
+  - If the source or destination ports are members of partitions which have
+    an IPoIB group and for some reason cannot join the group, a warning is
+    provided.
+
++ QoS Check:
+  With the introduction of QoS, the following new issues might arise from
+  improper setup of the fabric:
+  - VL Arbitration Tables might use VLs which are higher than the currently
+    supported maximal VL on the port. A warning is provided for such cases.
+  - VL Arbitration Tables might "block" a VL by setting its weight to zero.
+    A warning is provided for these cases.
+  - SLs (service levels) might be mapped to VLs which are blocked by the
+    two above rules. In such a case, these SLs cannot be used by the path.
+    A report including the set of "valid" SLs for the path is provided.
+  - If there are no "valid" SLs, an error is provided since the source and
+    destination ports cannot communicate.
+
+Common changes to all tools:
+----------------------------
+A summary table of all the checks performed and their total number of errors and
+warnings was added to the tools' standard output.

 ===============================================================================
-3. Reports
+3. Major Bugs Fixed
 ===============================================================================
-The default directory for all generated report files is /tmp.
-
-The ibadiagnet utility collect summary information regarding all the fabric SMs
-during the run, and then output that information at the end of the run in the
-file /tmp/ibdiagnet.sm.
-
-Each report message includes the following items:
-  - Device Type
-  - Device portGUID
-  - The direct path to the device
-  - If a topology file is provided for matching with the discovered InfiniBand
-    fabric, the node name is also provided in the report message. Otherwise,
-    hostnames are included only in HCA-related report messages.
++ Fabrics Qualities report is now available in the main log file (and not only
+  in the standard output).

 ===============================================================================
 4. Known Issues
 ===============================================================================
- ibdiagpath issues:
-  - If no subnet manager is initialized in the subnet, FDB tables may be
-    incorrectly set. Consequently, PortCounter MADs cannot be sent.
-  - A link along a LID-routed path in INIT state causes ibdiagpath performance
-    queries to fail since the queries cannot proceed via non-ACTIVE links.
+- Ibdiagnet tries to query port counters for ports in INIT state. In this
+  case, the run time is longer and an error message for each port is
+  printed to the screen.
+  Workaround:
+  * Use the "-skip pm" option if links in INIT state are found.
+  * Run opensm to activate the links.

-  - ibdiagpath cannot validate the provided topology file against the existing
-    fabric topology. If the topology file includes a device/link that does not
-    exist, or the device/link information is incorrect, then ibdiagpath may
-    -- in name-based routing -- extract a non-existing path based on the
-    incorrect topology file.
+- A failure in the IPoIB check may cause ibdiagnet to exit without printing the
+  summary report.

-  - If the hostname provided for the -s flag is not the actual local hostname,
-    then all the extracted names from the topology file will be incorrect.
-    Nevertheless, all the other information provided will be correct.
+- The ibdiagnet "-wt" option may generate a bad topology file when running on a
+  cluster that contains complex switch systems.

-  - If running on RHEL5, then executing "ibis exit" to terminate ibis yields
-    an unclean exit.
--
2.41.0