--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ How To Build OFED 1.1
+
+ October 2006
+
+
+==============================================================================
+Table of contents
+==============================================================================
+1. Overview
+2. Usage
+3. Requirements
+
+==============================================================================
+1. Overview
+==============================================================================
+The script "build_ofed.sh" is used to build the OFED package based on the
+OpenFabrics project and InfiniBand git tree. The package is built under the
+current working directory.
+
+The OFED package includes InfiniBand kernel modules, userspace libraries,
+diagnostic tools, performance benchmarks, firmware burning tools, Open MPI and
+OSU MPI.
+
+See OFED_release_notes.txt for more details.
+
+==============================================================================
+2. Usage
+==============================================================================
+
+The build script for the OFED package can be downloaded from:
+ https://openib.org/svn/gen2/branches/1.1/ofed/build
+
+Name: build_ofed.sh
+
+
+Usage: build_ofed.sh --ver|-v <OFED version> --git|-g <path to git tree>
+ [--svnrev|-r <SVN revision to use for userspace>]
+ [--tmpdir <tmpdir to use as a work area>]
+ [--without-makedist]
+ [--userspace|-u <path to userspace directory>]
+ [--ofed-scripts <path to ofed scripts directory>]
+ [--ofed-docs <path to ofed docs directory>]
+ [--mpidir|-m <path to mpi directory>]
+ [--extrasdir|-e <path to extras directory>]
+
+ Required:
+ --ver Determines the name of the OFED version that is built
+ --git Path to a local GIT tree (directory). The tree must
+ have previously been created by one of the methods
+ provided in the "Requirements" section below.
+
+ Optional:
+ --svnrev The svn revision for extraction of the userspace
+ component (default: most recent)
+
+ --tmpdir Directory to use as a work area (default: /tmp )
+
+ --without-makedist Do not execute "make dist" for the userspace
+ component (default: do "make dist")
+
+ --userspace If you have already checked out the userspace
+ component, you may use this option to request
+ that the userspace component be taken from the
+ given directory. Otherwise, the userspace_URL
+ (see below) will be used.
+
+ --ofed_scripts If you have already checked out the scripts
+ component, you may use this option to request that
+ the scripts component be taken from the given
+ directory. Otherwise, the ofed_scripts_URL (see
+ below) will be used.
+
+ --ofed_docs If you have already checked out the docs component,
+ you may use this option to request that the docs
+ component be taken from the given directory.
+ Otherwise, the ofed_docs_URL (see below) will be
+ used.
+
+ --mpidir If you have already checked out the mpi component,
+ you may use this option to request that the mpi
+ component be taken from the given directory.
+ Otherwise, the mpi_URL (see below) will be used.
+
+ --extrasdir If you have already checked out the extras
+ component, you may use this option to request that
+ the extras component be taken from the given
+ directory. Otherwise, the extras_URL (see below)
+ will be used.
+
+
+Sources are extracted by default from the following locations:
+ userspace_URL:
+ https://openib.org/svn/gen2/branches/1.1/src/userspace
+ openib_scripts_URL:
+ https://openib.org/svn/gen2/branches/1.1/ofed/openib/scripts
+ ofed_scripts_URL:
+ https://openib.org/svn/gen2/branches/1.1/ofed/scripts
+ ofed_docs_URL:
+ https://openib.org/svn/gen2/branches/1.1/ofed/docs
+ mpi_URL:
+ https://openib.org/svn/gen2/branches/1.1/ofed/mpi
+ extras_URL:
+ https://openib.org/svn/gen2/branches/1.1/ofed/extras
+
+Example:
+
+ ./build_ofed.sh --ver 1.1-rc6 --git /local/git/ofed_1_1/
+
+ This command will create a package (i.e., subtree) called OFED-1.1-rc6
+ in the current working directory. The git tree "/local/git/ofed_1_1/"
+ in this example is a local InfiniBand git tree which was created using
+ one of the methods in the "Requirements" section below.
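+
+ The optional flags listed above can be combined with the required ones.
+ The following dry-run sketch only prints the resulting command line
+ instead of running the build. The helper name build_cmd and the paths
+ /var/tmp/ofed-build and /local/svn/userspace are hypothetical
+ placeholders, not part of OFED.

```shell
#!/bin/sh
# Dry-run helper: prints a build_ofed.sh command line instead of running it.
# build_cmd is a hypothetical name; all paths below are placeholders.
build_cmd() {
    ver=$1
    git_tree=$2
    shift 2
    cmd="./build_ofed.sh --ver $ver --git $git_tree"
    # Any remaining arguments are appended as optional flags.
    for opt in "$@"; do
        cmd="$cmd $opt"
    done
    echo "$cmd"
}

# Reuse an already checked-out userspace tree and a custom work area.
build_cmd 1.1-rc6 /local/git/ofed_1_1/ \
    --tmpdir /var/tmp/ofed-build \
    --userspace /local/svn/userspace
```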
+
+==============================================================================
+3. Requirements
+==============================================================================
+
+1. Git:
+ Can be downloaded from:
+ http://www.kernel.org/pub/software/scm/git/git-1.4.2.tar.gz
+
+2. Subversion:
+ Can be downloaded from:
+ http://subversion.tigris.org
+
+3. InfiniBand Git tree:
+ There are two ways to get the InfiniBand git tree:
+ - The faster way:
+ mkdir gitdir
+ cd gitdir
+ git clone --bare \
+ git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git \
+ .git
+ git fetch git://www.mellanox.co.il/~git/infiniband ofed_1_1 \
+ ofed_addons cma_branch ehca_branch mst_sdp
+
+ - The slower way:
+ mkdir gitdir
+ cd gitdir
+ git clone -s --bare git://www.mellanox.co.il/~git/infiniband .git
+ git checkout ofed_1_1 `git-ls-tree -r --name-only ofed_1_1 \
+ include/rdma include/scsi/srp.h drivers/infiniband \
+ Documentation/infiniband ofed_scripts kernel_patches`
+ echo 'ref: refs/heads/ofed_1_1' > .git/HEAD
+
+4. Autotools:
+
+ libtool-1.5.20 or higher
+ autoconf-2.59 or higher
+ automake-1.9.6 or higher
+ m4-1.4.4 or higher
+
+ The above tools can be downloaded from the following URLs:
+
+ libtool - "http://ftp.gnu.org/gnu/libtool/libtool-1.5.20.tar.gz"
+ autoconf - "http://ftp.gnu.org/gnu/autoconf/autoconf-2.59.tar.gz"
+ automake - "http://ftp.gnu.org/gnu/automake/automake-1.9.6.tar.gz"
+ m4 - "http://ftp.gnu.org/gnu/m4/m4-1.4.4.tar.gz"
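+
+ The minimum versions above can be checked with a short script. This is
+ a sketch only: the function names are hypothetical, and it assumes that
+ the first line printed by "<tool> --version" contains the version as a
+ separate field (for example 1.5.20).

```shell
#!/bin/sh
# Compare installed autotools versions against the minimums listed above.

version_ge() {
    # Succeeds when dotted version $1 is greater than or equal to $2.
    # Only the first three numeric fields are compared.
    lowest=$(printf '%s\n%s\n' "$1" "$2" |
        sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

installed_version() {
    # Assumption: the first line of "<tool> --version" holds the version
    # as a whitespace-separated field such as 1.5.20.
    "$1" --version 2>/dev/null | awk 'NR == 1 {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^[0-9]+(\.[0-9]+)+$/) { print $i; exit }
    }'
}

check_tool() {
    # $1: tool name, $2: minimum version
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "$1: not found (need $2 or higher)"
    elif version_ge "$(installed_version "$1")" "$2"; then
        echo "$1: ok"
    else
        echo "$1: older than $2, or version not recognized"
    fi
}

check_tool libtool 1.5.20
check_tool autoconf 2.59
check_tool automake 1.9.6
check_tool m4 1.4.4
```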
+
--- /dev/null
+OpenIB.org BSD license:
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions
+are met:
+
+* Redistributions of source code must retain the above copyright
+notice, this list of conditions and the following disclaimer.
+
+* Redistributions in binary form must reproduce the above
+copyright notice, this list of conditions and the following
+disclaimer in the documentation and/or other materials provided
+with the distribution.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
+FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
+COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
+INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
+BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
+LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN
+ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGE.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ MPI in OFED 1.1 README
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. General
+2. OSU MVAPICH MPI
+3. Open MPI
+
+
+===============================================================================
+1. General
+===============================================================================
+Two MPI stacks are included in this release of OFED:
+
+- Ohio State University (OSU) MVAPICH 0.9.7 (Modified by Mellanox
+ Technologies)
+- Open MPI 1.1.1-1
+
+Setup, compilation, and run instructions for OSU MVAPICH and Open MPI are
+provided below in sections 2 and 3, respectively.
+
+1.1 Installation Note
+---------------------
+In Step 2 of the main menu of install.sh, options 2, 3 and 4 can install
+one or more MPI stacks. Please refer to docs/OFED_Installation_Guide.txt
+to learn about the different options.
+
+The installation script allows each MPI to be compiled using one or
+more compilers. For each installed MPI stack, users need to set PATH
+and/or LD_LIBRARY_PATH so that the desired compiled MPI stack is used.
+
+1.2 MPI Tests
+-------------
+OFED includes four basic tests that can be run against each MPI stack:
+bandwidth (bw), latency (lt), Intel MPI Benchmark, and Presta. The tests
+are located under: <prefix>/mpi/<compiler>/<mpi stack>/tests/,
+where <prefix> is /usr/local/ofed by default.
+
+===============================================================================
+2. OSU MVAPICH MPI
+===============================================================================
+
+This package is a modified version of the Ohio State University (OSU)
+MVAPICH Rev 0.9.7 MPI software package, and is the officially supported
+MPI stack for this release of OFED. Modifications to the original version
+include: additional features, bug fixes, and RPM packaging.
+See http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ for more details.
+
+
+2.1 Setting up for OSU MVAPICH MPI
+----------------------------------
+To launch OSU MPI jobs, the OSU MPI installation directory must be
+included in PATH and LD_LIBRARY_PATH. To set them, execute one of the
+following commands:
+ source <prefix>/mpi/<compiler>/<mpi stack>/etc/mvapich.sh
+ -- when using sh for launching MPI jobs
+ or
+ source <prefix>/mpi/<compiler>/<mpi stack>/etc/mvapich.csh
+ -- when using csh for launching MPI jobs
+
+
+2.2 Compiling OSU MVAPICH MPI Applications:
+-------------------------------------------
+***Important note***:
+A valid Fortran compiler must be present in order to build the MVAPICH MPI
+stack and tests.
+
+The default gcc-g77 Fortran compiler is provided with all RedHat Linux
+releases. SuSE distributions earlier than SuSE Linux 9.0 do not provide
+this compiler as part of the default installation.
+
+The following compilers are supported by OFED's OSU MPI package: gcc,
+intel and pathscale. The install script prompts the user to choose
+the compiler with which to build the OSU MVAPICH MPI RPM. Note that more
+than one compiler can be selected simultaneously, if desired.
+
+For details see:
+ http://nowlab.cse.ohio-state.edu/projects/mpi-iba/mvapich_user_guide.html
+
+To review the default configuration of the installation, check the default
+configuration file: <prefix>/mpi/<compiler>/<mpi stack>/etc/mvapich.conf
+
+2.3 Running OSU MVAPICH MPI Applications:
+-----------------------------------------
+Requirements:
+o At least two nodes. Example: mtlm01, mtlm02
+o Machine file: Includes the list of machines. Example: /root/cluster
+o Bidirectional rsh or ssh without a password
+
+Note for OSU: ssh is used by default. To use rsh instead, add the -rsh
+parameter to the mpirun_rsh command.
+
+*** Running OSU tests ***
+
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_bw
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_latency
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_bibw
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/osu_benchmarks-2.2/osu_bcast
+
+*** Running Intel MPI Benchmark test (Full test) ***
+
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/IMB-2.3/IMB-MPI1
+
+*** Running Presta test ***
+
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/presta-1.4.0/com -o 100
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/presta-1.4.0/glob -o 100
+/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/bin/mpirun_rsh -np 2 -hostfile /root/cluster /usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0/tests/presta-1.4.0/globalop
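+
+The long command lines above can be shortened with two variables. The
+following dry-run sketch prints the commands rather than executing them.
+The helper name osu_cmd and the USE_RSH switch are hypothetical, and
+placing -rsh directly after mpirun_rsh is an assumption.

```shell
#!/bin/sh
# Dry run: print the mpirun_rsh command lines instead of executing them.
MPI_HOME=/usr/local/ofed/mpi/gcc/mvapich-0.9.7-mlx2.2.0
MACHINE_FILE=${MACHINE_FILE:-/root/cluster}
# Set USE_RSH=1 in the environment to add -rsh (ssh is the default).
RSH_FLAG=${USE_RSH:+ -rsh}

osu_cmd() {
    # $1: test path relative to $MPI_HOME/tests; further arguments are
    # passed on to the test itself.
    echo "$MPI_HOME/bin/mpirun_rsh$RSH_FLAG -np 2 -hostfile $MACHINE_FILE $MPI_HOME/tests/$*"
}

for t in osu_bw osu_latency osu_bibw osu_bcast; do
    osu_cmd "osu_benchmarks-2.2/$t"
done
osu_cmd IMB-2.3/IMB-MPI1
osu_cmd presta-1.4.0/com -o 100
```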
+
+
+===============================================================================
+3. Open MPI
+===============================================================================
+
+Open MPI is a next-generation MPI implementation from the Open MPI
+Project (http://www.open-mpi.org/). Version 1.1.1-1 of Open MPI, which
+is also available directly from the main Open MPI web site, is included
+in this release. This MPI stack is being offered in
+OFED as a "technology preview," meaning that it is not officially
+supported yet. It is expected that future releases of OFED will have
+fully supported versions of Open MPI.
+
+A working Fortran compiler is not required to build Open MPI, but some
+of the included MPI tests are written in Fortran. These tests will not
+compile/run if Open MPI is built without Fortran support.
+
+The following compilers are supported by OFED's Open MPI package: GNU,
+Pathscale, Intel, or Portland. The install script prompts the user
+for the compiler with which to build the Open MPI RPM. Note that more
+than one compiler can be selected simultaneously, if desired.
+
+Users should check the main Open MPI web site for additional
+documentation and support. (Note: The FAQ covers InfiniBand tuning,
+among other topics.)
+
+3.1 Setting up for Open MPI:
+----------------------------
+The Open MPI team strongly advises users to put the Open MPI installation
+directory in their PATH and LD_LIBRARY_PATH. This can be done at the
+system level if all users are going to use Open MPI. Specifically:
+- add <prefix>/bin to PATH
+- add <prefix>/lib to LD_LIBRARY_PATH
+
+<prefix> is the directory where the desired Open MPI instance was installed.
+("instance" refers to the compiler used for Open MPI compilation at install
+time.)
+
+If using rsh or ssh to launch MPI jobs, you *must* set the variables described
+above in your shell startup files (e.g., .bashrc, .cshrc, etc.).
+
+If you are using a job scheduler to launch MPI jobs (e.g., SLURM, Torque),
+setting PATH and LD_LIBRARY_PATH is still required, but they do
+not need to be set in your shell startup files. Procedures for adding
+these values to PATH and LD_LIBRARY_PATH are described in detail at:
+ http://www.open-mpi.org/faq/?category=running
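+
+As a minimal sketch for sh-compatible shells, lines such as the
+following could be added to a startup file (e.g., .bashrc). The prefix
+shown is an assumption based on the default gcc build location,
+/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1; adjust it to your installation.

```shell
# Assumed prefix of the gcc-built Open MPI instance; adjust as needed.
OMPI_PREFIX=/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1

# Put the Open MPI binaries first in PATH.
export PATH="$OMPI_PREFIX/bin:$PATH"

# Prepend the library directory, adding a ":" separator only when
# LD_LIBRARY_PATH already has a value.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```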
+
+3.2 Compiling Open MPI Applications:
+------------------------------------
+(copied from http://www.open-mpi.org/faq/?category=mpi-apps -- see
+this web page for more details)
+
+The Open MPI team strongly recommends that you simply use Open MPI's
+"wrapper" compilers to compile your MPI applications. That is, instead
+of using (for example) gcc to compile your program, use mpicc. Open
+MPI provides a wrapper compiler for four languages:
+
+ Language Wrapper compiler name
+ ------------- --------------------------------
+ C mpicc
+ C++ mpiCC, mpicxx, or mpic++
+ (note that mpiCC will not exist
+ on case-insensitive file-systems)
+ Fortran 77 mpif77
+ Fortran 90 mpif90
+ ------------- --------------------------------
+
+Note that if no Fortran 77 or Fortran 90 compilers were found when
+Open MPI was built, Fortran 77 and 90 support will automatically be
+disabled (respectively).
+
+If you expect to compile your program as:
+
+ > gcc my_mpi_application.c -lmpi -o my_mpi_application
+
+Simply use the following instead:
+
+ > mpicc my_mpi_application.c -o my_mpi_application
+
+Specifically: simply adding "-lmpi" to your normal compile/link
+command line *will not work*. See
+http://www.open-mpi.org/faq/?category=mpi-apps if you cannot use the
+Open MPI wrapper compilers.
+
+Note that Open MPI's wrapper compilers do not do any actual compiling
+or linking; all they do is manipulate the command line and add in all
+the relevant compiler / linker flags and then invoke the underlying
+compiler / linker (hence, the name "wrapper" compiler). More
+specifically, if you run into a compiler or linker error, check your
+source code and/or back-end compiler -- it is usually not the fault of
+the Open MPI wrapper compiler.
+
+3.3 Running Open MPI Applications:
+----------------------------------
+Open MPI uses either the "mpirun" or "mpiexec" commands to launch
+applications. If your cluster uses a resource manager (such as SLURM
+or Torque), providing a hostfile is not necessary:
+
+ > mpirun -np 4 my_mpi_application
+
+If you use rsh/ssh to launch applications, they must be set up to NOT
+prompt for a password (see http://www.open-mpi.org/faq/?category=rsh
+for more details on this topic). Moreover, you need to provide a hostfile
+containing a list of hosts to run on.
+
+Example:
+
+ > cat hostfile
+ node1.example.com
+ node2.example.com
+ node3.example.com
+ node4.example.com
+
+ > mpirun -np 4 -hostfile hostfile my_mpi_application
+ (application runs on all 4 nodes)
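+
+The process count can also be derived from the hostfile. The following
+dry-run sketch prints mpirun command lines that start one process per
+listed host. The function names are hypothetical, TESTS_DIR assumes the
+default gcc build location, and the temporary hostfile is created only
+for the demonstration.

```shell
#!/bin/sh
# Dry run: print mpirun command lines built from a hostfile.
# TESTS_DIR assumes the default gcc build location.
TESTS_DIR=/usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/tests

count_hosts() {
    # Count the non-empty lines of the hostfile given as $1.
    grep -c . "$1"
}

mpirun_cmd() {
    # $1: hostfile; remaining arguments: test path relative to TESTS_DIR
    # followed by any arguments for the test.
    hostfile=$1
    shift
    echo "mpirun -np $(count_hosts "$hostfile") -hostfile $hostfile $TESTS_DIR/$*"
}

# Demonstration with a throwaway two-node hostfile.
demo_hostfile=$(mktemp)
printf '%s\n' node1.example.com node2.example.com > "$demo_hostfile"
mpirun_cmd "$demo_hostfile" osu_benchmarks-2.2/osu_bw
mpirun_cmd "$demo_hostfile" presta-1.4.0/com -o 100
rm -f "$demo_hostfile"
```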
+
+In the following examples, replace <N> with the number of nodes to run on,
+and <HOSTFILE> with the filename of a valid hostfile listing the nodes
+to run on.
+
+Example 1: Running the OSU bandwidth test:
+
+ > cd /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/tests/osu_benchmarks-2.2
+ > mpirun -np <N> -hostfile <HOSTFILE> osu_bw
+
+Example 2: Running the Intel MPI Benchmark:
+
+ > cd /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/tests/IMB-2.3
+ > mpirun -np <N> -hostfile <HOSTFILE> IMB-MPI1
+
+Example 3: Running the Presta benchmarks:
+
+ > cd /usr/local/ofed/mpi/gcc/openmpi-1.1.1-1/tests/presta-1.4.0
+ > mpirun -np <N> -hostfile <HOSTFILE> com -o 100
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Version 1.1
+ Installation Guide
+
+ October 2006
+
+==============================================================================
+Table of contents
+==============================================================================
+
+ 1. Overview
+ 2. OFED Package Contents
+ 3. HW and SW Requirements
+ 4. How to download and extract the OFED Distribution
+ 5. Installing OFED Software
+ 6. Building OFED RPMs
+ 7. IP-over-IB (IPoIB) Configuration
+ 8. Uninstalling OFED
+ 9. Configuration
+ 10. Related Documentation
+
+
+==============================================================================
+1. Overview
+==============================================================================
+
+This is the OpenFabrics Enterprise Distribution (OFED) version 1.1
+software package supporting InfiniBand fabrics. It is composed of
+several software modules intended for use on a computer cluster
+constructed as an InfiniBand subnet.
+
+This document describes how to install the various modules and test them in
+a Linux environment.
+
+General Notes:
+ 1) The install script removes all previously installed OFED packages
+ and re-installs from scratch. (Note: Configuration files will not
+ be removed). You will be prompted to acknowledge the deletion of
+ the old packages.
+
+ 2) When installing OFED on an entire [homogeneous] cluster, a common
+ strategy is to build the software only once (perhaps on a shared
+ file system such as NFS). The resulting RPMs can then be installed
+ on all nodes in the cluster using any cluster-aware tools (such as
+ pdsh).
+
+==============================================================================
+2. OFED Package Contents
+==============================================================================
+
+The OFED Distribution package generates RPMs for installing the following:
+
+ o OpenFabrics core and ULPs:
+ - HCA drivers (mthca, ipath, ehca)
+ - core
+ - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Initiator
+ and uDAPL
+ o OpenFabrics utilities:
+ - OpenSM: InfiniBand Subnet Manager
+ - Diagnostic tools
+ - Performance tests
+ o MPI:
+ - OSU MPI stack supporting the InfiniBand interface
+ - Open MPI stack supporting the InfiniBand interface
+ - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta)
+ o Sources of all software modules (under conditions mentioned in the
+ modules' LICENSE files)
+ o Documentation
+
+==============================================================================
+3. HW and SW Requirements
+==============================================================================
+
+1) Server platform with InfiniBand HCA (see OFED Distribution
+ Release Notes for details)
+
+2) Linux OS (see OFED Distribution Release Notes for details)
+
+3) Administrator privileges on your machine(s)
+
+4) Disk Space: - For Build & Installation: 300MB
+ - For Installation only: 200MB
+
+5) For the OFED Distribution to compile on your machine, some software
+ packages of your OS distribution are required. These are listed here.
+
+OS Distribution Required Packages
+--------------- ----------------------------------
+General:
+o Common to all gcc, glib, glib-devel, glibc, glibc-devel,
+ automake, autoconf, libtool.
+o RedHat kernel-devel, sysfsutils, sysfsutils-devel, rpm-build
+o SLES 9.0 kernel-source, udev, rpm
+o SLES 10.0 kernel-source, sysfsutils, sysfsutils-devel, rpm
+
+Specific Component Requirements:
+o OSU MPI a Fortran Compiler (such as gcc-g77)
+o ibutils tcl-8.4, tcl-devel-8.4
+o oiscsi-iser-support open-iscsi, db-devel
+o tvflash pciutils-devel
+
+ The installer will warn you if you attempt to compile any of the
+ above packages and do not have the prerequisites installed.
+
+==============================================================================
+4. How to download and extract the OFED Distribution
+==============================================================================
+
+1) Download the OFED-X.X.X.tgz file to your target Linux host.
+
+ If this package is to be installed on a cluster, it is recommended to
+ download it to an NFS shared directory.
+
+2) Extract the package using:
+
+ tar xzvf OFED-X.X.X.tgz
+
+==============================================================================
+5. Installing OFED Software
+==============================================================================
+
+1) Go to the directory into which the package was extracted:
+
+ cd /..../OFED-X.X.X
+
+2) Installing the OFED package must be done as root. For a
+ menu-driven, first build and installation, run the installer
+ script:
+
+ ./install.sh
+
+ Interactive menus will direct you through the install process.
+
+ Note: After the installer completes, information about the OFED
+ installation such as the prefix, kernel version, and
+ installation parameters can be found by running
+ /etc/infiniband/info.
+
+ During the interactive installation of OFED, two files are
+ generated: ofed.conf and ofed_net.conf. ofed.conf holds the
+ installed software modules and configuration settings chosen by the
+ user. ofed_net.conf holds the IPoIB settings chosen by the user.
+
+ If the package is installed on a cluster-shared directory, these
+ files can then be used to perform an automatic, unattended
+ installation of OFED on other machines in the cluster. The
+ unattended installation will use the same choices as were selected
+ in the interactive installation.
+
+ For an automatic installation on any host, run the following:
+
+ ./OFED-X.X.X/install.sh -c <path>/ofed.conf -net <path>/ofed_net.conf
+
+ Note: It is possible to rename and/or edit the ofed.conf and ofed_net.conf
+ files. Thus it is possible to change user choices (observing the
+ original format). See examples of ofed.conf and ofed_net.conf under
+ OFED-X.X.X/docs.
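+
+ For a cluster, the unattended installation can be repeated on every
+ node. The following dry-run sketch only prints one command per node.
+ The shared directory /nfs/ofed/OFED-X.X.X, the node names, the use of
+ ssh as root, and the presence of ofed.conf and ofed_net.conf in the
+ shared directory are all assumptions made for this example.

```shell
#!/bin/sh
# Dry run: print the command that would be run on each node.
# SHARED_DIR and the node names are hypothetical; replace them.
SHARED_DIR=${SHARED_DIR:-/nfs/ofed/OFED-X.X.X}

install_cmd() {
    # $1: directory that holds install.sh, ofed.conf and ofed_net.conf
    echo "cd $1 && ./install.sh -c $1/ofed.conf -net $1/ofed_net.conf"
}

for node in node1 node2 node3; do
    echo "ssh root@$node '$(install_cmd "$SHARED_DIR")'"
done
```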
+
+Install Process Results:
+------------------------
+
+o The OFED package is installed under <prefix> directory.
+o Kernel modules are copied to:
+ /lib/modules/`uname -r`/kernel/drivers/infiniband/
+o The package kernel include files are placed under <prefix>/src/openib.
+ These includes should be used when building kernel modules which use
+ the InfiniBand stack. (Note that these includes, if needed, have
+ been "backported" to your kernel).
+o The package raw (unbackported) source files are placed under
+ <prefix>/src/openib-1.1.
+o The script "openibd" is installed under /etc/init.d/. This script can
+ be used to load and unload the software stack.
+o The directory /etc/infiniband is created with the files "info" and
+ "openib.conf". The "info" script can be used to retrieve OFED
+ installation information. The "openib.conf" file contains the list of
+ modules that will be loaded when the "openibd" script is used.
+o The file "90-ib.rules" is installed to /etc/udev/rules.d/
+o If libibverbs-utils is installed, then ofed.sh and ofed.csh are
+ installed under /etc/profile.d/. These automatically update the PATH
+ environment variable with <prefix>/bin. In addition, ofed.conf is
+ installed under /etc/ld.so.conf.d/ to update the dynamic linker's
+ run-time search path to find the InfiniBand shared libraries.
+o The file /etc/modprobe.conf is updated to include the following:
+ - "alias ib<n> ib_ipoib" for each ib<n> interface.
+ - "alias net-pf-27 ib_sdp" for sdp.
+o If opensm is installed, the daemon opensmd is installed under /etc/init.d/
+ and opensm.conf is installed under /etc.
+o If IPoIB configuration files were included, ifcfg-ib<n> files will be
+ installed at:
+ - RedHat: /etc/sysconfig/network-scripts/
+ - SuSE: /etc/sysconfig/network/
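+
+A quick way to confirm some of the results listed above is to check that
+the fixed-path files exist. This is a sketch only: the function name
+check_ofed_files and its optional root argument are hypothetical, and
+the check covers only paths that do not depend on <prefix>.

```shell
#!/bin/sh
# Report which of the fixed-path files named in the list above exist.
# $1 is an optional root prefix (hypothetical; useful for checking a
# chroot or a test tree). Leave it empty to check the running system.
check_ofed_files() {
    root=$1
    missing=0
    for f in /etc/init.d/openibd \
             /etc/infiniband/info \
             /etc/infiniband/openib.conf \
             /etc/udev/rules.d/90-ib.rules; do
        if [ -e "$root$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every file was found.
    [ "$missing" -eq 0 ]
}

check_ofed_files "" || echo "Some OFED files are missing."
```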
+
+
+==============================================================================
+6. Building OFED RPMs
+==============================================================================
+
+1) Go to the directory into which the package was extracted:
+
+ cd /..../OFED-X.X.X
+
+ Building RPMs can be done as a non-root user.
+
+2) For an interactive build, run the build.sh script:
+
+ ./build.sh
+
+ Interactive menus will direct you through the build process.
+
+ During the manual building of OFED RPMs, ofed.conf is generated.
+ ofed.conf holds the selected software modules and configuration
+ settings chosen by the user.
+
+3) For an automated build, run the following:
+
+ ./OFED-X.X.X/build.sh -c <path>/ofed.conf
+
+ Note: It is possible to rename and/or edit the ofed.conf file. Thus
+ it is possible to change user choices (observing the original format).
+ See an example of ofed.conf under OFED-X.X.X/docs.
+
+Build Process Results
+---------------------
+
+The OFED build.sh script builds OFED binary RPMs under
+OFED-X.X.X/RPMS; the sources are placed in OFED-X.X.X/SRPMS/.
+Running this script does not change any currently installed
+components, and the script does not change the current kernel build.
+
+Once the build process has completed, the user may run ./install.sh to
+install the new RPMs. This time, however, any previously installed
+OFED components will be uninstalled and the newly built package will
+be installed.
+
+Note: Depending on your hardware, the build procedure may take 30-45
+ minutes. Installation, however, is a relatively short process
+ (~5 minutes). A common strategy for OFED installation on large
+ homogeneous clusters is to extract the tarball on a network
+ file system (such as NFS), do the build on NFS, and then run
+ the installer on each node with the RPMs that were previously built.
+
+*** Important Note for Open MPI users ONLY: The Open MPI software
+ requires that the InfiniBand drivers be installed before it is
+ built. Hence, Open MPI will only be built if you select the
+ "install" option. Open MPI will *not* be built if you only select
+ the "build" option.
+
+==============================================================================
+7. IP-over-IB (IPoIB) Configuration
+==============================================================================
+
+Configuring IPoIB is an optional step during the installation. During
+interactive installation, the user may choose to insert the ifcfg-ib<n>
+files. If this option is chosen, the ifcfg-ib<n> files will be
+installed at:
+
+- RedHat: /etc/sysconfig/network-scripts/
+- SuSE: /etc/sysconfig/network/
+
+Setting IPoIB Configuration:
+----------------------------
+
+The default IPoIB interface configuration is based on DHCP. Note that
+a special patch for DHCP servers is required for supporting IPoIB
+clients. A patch for dhcp v3.0.4 is available under
+OFED-X.X.X/docs/dhcp.
+
+If you are not using DHCP to obtain IP addresses for clients using
+IPoIB, you must manually specify the full IP configuration during the
+interactive installation: IP address, network address, netmask, and
+broadcast address.
+
+For unattended installations, a configuration file can be provided
+with this information. The configuration file must specify one of
+the following for each IPoIB interface:
+- Fixed values for all settings.
+- Settings derived from an Ethernet interface configuration (may be
+ useful for cluster configuration).
+
+Here are some examples of ofed_net.conf:
+
+# Static settings; all values provided by this file
+IPADDR_ib0=172.16.0.4
+NETMASK_ib0=255.255.0.0
+NETWORK_ib0=172.16.0.0
+BROADCAST_ib0=172.16.255.255
+ONBOOT_ib0=1
+
+# Based on eth0; each '*' will be replaced with corresponding octet
+# from eth0.
+LAN_INTERFACE_ib0=eth0
+IPADDR_ib0=172.16.'*'.'*'
+NETMASK_ib0=255.255.0.0
+NETWORK_ib0=172.16.0.0
+BROADCAST_ib0=172.16.255.255
+ONBOOT_ib0=1
+
+# Based on the first eth<n> interface that is found (for n=0,1,...);
+# each '*' will be replaced with corresponding octet from eth<n>.
+LAN_INTERFACE_ib0=
+IPADDR_ib0=172.16.'*'.'*'
+NETMASK_ib0=255.255.0.0
+NETWORK_ib0=172.16.0.0
+BROADCAST_ib0=172.16.255.255
+ONBOOT_ib0=1
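+
+The '*' substitution described in the comments above can be illustrated
+with a small shell function. This is a sketch of the rule only, not the
+installer's actual code; the function name expand_ipoib_addr and the
+sample Ethernet address 10.0.3.7 are assumptions made for this example.

```shell
#!/bin/sh
# Replace each '*' octet in an IPoIB address template with the octet at
# the same position in an Ethernet address.
expand_ipoib_addr() {
    # $1: template such as 172.16.'*'.'*'
    # $2: Ethernet IPv4 address such as 10.0.3.7
    tpl=$(printf '%s' "$1" | tr -d "'")   # drop the quotes around each *
    printf '%s %s\n' "$tpl" "$2" | awk '{
        n = split($1, t, ".")
        split($2, e, ".")
        out = ""
        for (i = 1; i <= n; i++) {
            octet = (t[i] == "*") ? e[i] : t[i]
            out = out (i > 1 ? "." : "") octet
        }
        print out
    }'
}

expand_ipoib_addr "172.16.'*'.'*'" 10.0.3.7   # prints 172.16.3.7
```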
+
+==============================================================================
+8. Uninstalling OFED
+==============================================================================
+
+There are two ways to uninstall OFED:
+1) Via the installation menu.
+2) Using the script uninstall.sh. The script resides under OFED-X.X.X/
+ and under the installation <prefix> directory.
+
+
+==============================================================================
+9. Configuration
+==============================================================================
+
+Most of the OFED components can be configured or reconfigured after
+the installation by modifying the relevant configuration files. The
+list of the modules that will be loaded automatically upon boot can be
+found in the /etc/infiniband/openib.conf file. Other configuration
+files include:
+- SDP configuration file: <prefix>/etc/libsdp.conf
+- OpenSM configuration file: /etc/opensm.conf
+
+See each package's release notes for more details.
+
+Note: After the installer completes, information about the OFED
+ installation such as the prefix, kernel version, and
+ installation parameters can be found by running
+ /etc/infiniband/info.
+
+==============================================================================
+10. Related Documentation
+==============================================================================
+
+OFED documentation is located in the ofed-docs RPM. After
+installation the documents are located under the directory:
+<prefix>/docs (the default prefix is /usr/local/ofed).
+
+Document list:
+
+ o README.txt
+ o OFED_Installation_Guide.txt
+ o MPI_README.txt
+ o Examples of configuration files
+ o OFED_tips.txt
+ o HOWTO.build_ofed
+ o All release notes
+
+For more information, please visit the OpenFabrics web site:
+
+ http://www.openfabrics.org/
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Version 1.1
+ Release Notes
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview, which includes:
+ - OFED Distribution Rev 1.1 Contents
+ - Supported Platforms and Operating Systems
+ - Supported HCA Adapter Cards and Firmware Versions
+ - Tested Switch Platforms
+ - Third party Test Packages
+ - OFED sources
+2. Main Changes from OFED 1.0
+3. Fixed Bugs
+4. Known Issues
+
+
+===============================================================================
+1. Overview
+===============================================================================
+These are the release notes of Open Fabrics Enterprise Distribution (OFED)
+release 1.1. The OFED software package is composed of several software modules,
+and is intended for use on a computer cluster constructed as an InfiniBand
+network.
+
+Note: If you plan to upgrade the OFED package on your cluster, please upgrade
+all of its nodes to this new version.
+
+
+1.1 OFED 1.1 Contents
+---------------------
+The OFED package contains the following components:
+ o OpenFabrics core and ULPs:
+ - HCA drivers (mthca, ipath, ehca)
+ - core
+ - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Host and uDAPL
+ o OpenFabrics utilities:
+ - OpenSM (OSM): InfiniBand Subnet Manager
+ - Diagnostic tools
+ - Performance tests
+ o MPI:
+ - OSU MPI stack supporting the InfiniBand interface
+ - Open MPI stack supporting the InfiniBand interface
+ - MPI benchmark tests (OSU benchmarks, Intel MPI benchmarks, Presta)
+ o Sources of all software modules (under conditions mentioned in the modules'
+ LICENSE files)
+ o Documentation
+
+Notes:
+1. SDP is in beta quality.
+2. ehca driver is in technology preview state.
+3. All other OFED components are of production quality.
+4. See release notes for each package in the docs directory.
+5. Any Topspin copyright belongs to Cisco Systems, Inc.
+
+1.2 Supported Platforms and Operating Systems
+---------------------------------------------
+ o CPU architectures:
+ - x86_64
+ - x86
+ - ia64
+ - ppc64
+
+ o Linux Operating Systems:
+ - RedHat EL4 up3: 2.6.9-34.ELsmp
+ - RedHat EL4 up4: 2.6.9-42.ELsmp
+ - SLES9 SP3: 2.6.5-7.244-smp
+ - SLES10: 2.6.16.21-0.8-smp
+ - kernel.org: 2.6.17.x and 2.6.18.x
+
+1.3 HCAs Supported
+------------------
+This release supports HCAs by Mellanox Technologies, QLogic and IBM.
+
+ o Mellanox Technologies HCAs:
+ - InfiniHost (fw-23108 Rev 3.5.000)
+ - InfiniHost III Ex (MemFree: fw-25218 Rev 5.1.400
+ with memory: fw-25208 Rev 4.7.600)
+ - InfiniHost III Lx (fw-25204 Rev 1.1.000)
+ The SDR and DDR modes of the InfiniHost III family are supported.
+
+ For official firmware versions please see:
+ http://www.mellanox.com/support/firmware_table.php
+
+ o QLogic HCAs:
+ - QHT6040 (PathScale InfiniPath HT-460)
+ - QHT6140 (PathScale InfiniPath HT-465)
+ - QLE6140 (PathScale InfiniPath PE-880)
+
+ o IBM HCAs:
+ - GX Dual-port 4x IB HCA
+ - GX Dual-port 12x IB HCA
+
+
+1.4 Switches Supported
+----------------------
+This release was tested with switches and gateways provided by the following
+companies:
+ - Cisco
+ - Voltaire
+ - SilverStorm
+ - Flextronics
+
+1.5 Third Party Packages
+------------------------
+The following third party packages have been tested with OFED 1.1:
+1. Intel MPI, Version 2.0.1 - refresh, and Version 3.0
+
+1.6 OFED Sources:
+-----------------
+Source repositories:
+Kernel: git://www.mellanox.co.il/~git/infiniband ofed_1_1
+User: https://openib.org/svn/gen2/branches/1.1/src/userspace
+
+The kernel sources are based on the Linux 2.6.18-rc6 mainline kernel. The
+accompanying patches are included in the OFED sources directory.
+For details see HOWTO.build_ofed.
+
+
+===============================================================================
+2. Main Changes from OFED 1.0
+===============================================================================
+Note: For details regarding the various changes, please see the release notes
+for each package in the docs directory.
+
+ 2.1 General changes:
+ o Kernel code based on 2.6.18
+ o HCA fatal - kernel flow support
+ o High Availability in IPoIB and SRP
+ o RDS was removed from the OFED package
+ o IBM low level driver (ehca) was added
+
+ 2.2 IPoIB:
+ o High Availability support using a user-level daemon (beta quality)
+
+ 2.3 SDP:
+ o Beta quality (higher stability)
+ o Improved latency
+ o Implemented the Nagle algorithm
+ o Supports sending/receiving out of band data
+ o Interoperability with previous SDP implementation
+
+ 2.4 SRP:
+ o GA quality
+ o DM (Device Mapper) - for high availability (beta quality).
+ o New srp_daemon was added
+
+ 2.5 iSER:
+ o Testing more platforms (e.g., ppc64 and ia64)
+
+ 2.6 uDAPL:
+ o Scalability features needed for Intel MPI
+
+ 2.7 MPI:
+ a. OSU MVAPICH:
+ o Version was changed to 0.9.7-mlx2.2.0
+ o Message coalescing
+ b. Open MPI:
+ o Version was updated to v1.1.1
+ o Bug fixes and general enhancements over v1.1
+ o See http://www.open-mpi.org/svn/new.php for details
+ c. MPI tests:
+ o Updated the tests to latest versions from LLNL, Intel, OSU
+
+ 2.8 OSM:
+ o Partition Manager (Pkey)
+ o Pre-computed routing load from file
+ o Primitive QoS - as technology preview
+
+ 2.9 Management:
+ o Added Madeye utility
+ o Added saquery tool
+ o Enhanced ibnetdiscover tool with grouping function
+ o New ibutils package:
+ - Port error counter check
+ - Port performance counters dump
+ - Link width and Link speed check by flag
+
+ 2.10 Install:
+ o Create both 32-bit and 64-bit user-level libraries on x86_64 and
+ ppc64 platforms
+ o OSM RPM was separated into several RPMs to enable installing
+ diagnostic tools without the opensm executable.
+ o The package kernel include files are placed under <prefix>/src/openib.
+ These includes should be used when building kernel modules which use
+ the Infiniband stack. (Note that these includes, if needed, have
+ been "backported" to your kernel).
+ o The package raw (unbackported) source files are placed under
+ <prefix>/src/openib-1.1.
+
+===============================================================================
+3. Fixed Bugs
+===============================================================================
+1. OFED installation now supports installing lib32 on 64-bit systems.
+2. Registration of huge page memory buffers is now supported.
+3. Diagnostic tools no longer require an opensm executable to be installed.
+4. Hotplug removal does not hang the system when the device is used by
+ the uverbs interface.
+5. MVAPICH now works on ppc64.
+
+Bugs fixed in each package are reported in that package's release notes.
+
+===============================================================================
+4. Known Issues
+===============================================================================
+The following is a list of major limitations and known issues of the various
+components of the OFED 1.1 release.
+
+1. Memory registration by user is limited according to the administrator
+ setting. See "Pinning (Locking) User Memory Pages" in OFED_tips.txt for
+ system configuration.
+2. Fork support is available from kernel 2.6.12 onward, provided that
+ applications do not use threads. fork() is supported as long as the
+ parent process does not run before the child exits or calls exec().
+ The former can be achieved by calling wait(childpid); the latter can be
+ achieved by application-specific means. The POSIX system() call is
+ supported.
+3. On RedHat EL4 up2 and Fedora Core 4 the driver may not load properly if
+ SELINUX is enforced.
+ Workaround: Change the value of the parameter SELINUX in
+ /etc/sysconfig/selinux from "enforcing" to "permissive" or "disabled".
+4. libibcm is not thread safe: if several threads use libibcm, the function
+ ib_cm_get_device will give the same device to all of the threads, which
+ can cause thread X to get events that were sent to thread Y.
+5. ehca driver is supported only on kernel 2.6.18.
+6. ipath driver is supported only on 64-bit platforms.
+
+Note: See the release notes of each component for additional issues.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Tips for Working with OFED 1.1
+
+ October 2006
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. OFED Utilities
+2. Debug HOWTOs
+3. Pinning (Locking) User Memory Pages
+
+
+===============================================================================
+1. OFED Utilities
+===============================================================================
+
+The OFED package includes utilities under <prefix>/bin, where <prefix> stands
+for the OFED installation path. To retrieve this path, run the script
+"/etc/infiniband/info" as explained in 2.2 below.
+
+Notes:
+------
+1. This document includes descriptions for a subset of the existing utilities.
+ To learn about other utilities, use their --help flag.
+
+2. The utility sources are not part of the RPM installation. However,
+ all sources are available in the openib-1.1.tgz tarball.
+
+
+1.1 Device Information
+----------------------
+Device information can be obtained using several utilities:
+
+a. ibv_devinfo
+
+ ibv_devinfo prints the CA (channel adapter) attributes.
+
+ usage:
+ ibv_devinfo
+
+ Options:
+ -d, --ib-dev=<dev> use IB device <dev> (default: first device found)
+ -i, --ib-port=<port> use port <port> of IB device (default: all ports)
+ -l, --list print only the IB devices names
+ -v, --verbose print all the attributes of the IB device(s)
+
+b. ibstat
+
+ usage:
+ ibstat [OPTIONS] <ca_name> [portnum]
+
+ Options:
+ -d debug
+ -l list all IB devices
+ -s print short device summary
+ -p print port GUIDs
+ -V print ibstat version information and exit
+ -h print usage
+
+ Examples:
+ ibstat -l # list all IB devices
+ ibstat mthca0 2 # stat port 2 of mthca0
+
+c. Using sysfs file system
+ The driver supports the sysfs file system under: /sys/class/infiniband
+
+ Examples:
+
+ > ls /sys/class/infiniband/mthca0/
+ board_id device fw_ver hca_type hw_rev node_desc node_guid node_type
+ ports sys_image_guid
+
+ > cat /sys/class/infiniband/mthca0/board_id
+ MT_0200000001
+
+ > ls /sys/class/infiniband/mthca0/ports/1/
+ cap_mask counters gids lid lid_mask_count phys_state pkeys rate sm_lid
+ sm_sl state
+
+ > cat /sys/class/infiniband/mthca0/ports/1/state
+ 4: ACTIVE
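+
+ The same state files can be read in a loop. The following is a minimal
+ sketch, not part of OFED: the function name ib_port_states is hypothetical,
+ and the demo builds a throwaway directory that mimics the sysfs layout shown
+ above so that it can be tried on a machine without an HCA.
+

```shell
#!/bin/sh
# Hypothetical helper: print "<device> port <n>: <state>" for every port
# found under a sysfs-style tree. The root defaults to /sys/class/infiniband
# but can be overridden, which allows a dry run on a machine without an HCA.
ib_port_states() {
    root=${1:-/sys/class/infiniband}
    for state_file in "$root"/*/ports/*/state; do
        [ -r "$state_file" ] || continue      # glob matched nothing
        port_dir=${state_file%/state}         # .../<dev>/ports/<n>
        port=${port_dir##*/}                  # <n>
        dev_dir=${port_dir%/ports/*}          # .../<dev>
        dev=${dev_dir##*/}                    # <dev>
        echo "$dev port $port: $(cat "$state_file")"
    done
}

# Dry run against a throwaway tree that mimics the layout shown above.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/mthca0/ports/1" "$demo_root/mthca0/ports/2"
echo "4: ACTIVE" > "$demo_root/mthca0/ports/1/state"
echo "1: DOWN"   > "$demo_root/mthca0/ports/2/state"
port_report=$(ib_port_states "$demo_root")
echo "$port_report"
rm -rf "$demo_root"
```

+
+ On a node with the driver loaded, calling ib_port_states with no argument
+ reads /sys/class/infiniband directly.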
+
+1.2 Performance Tests
+---------------------
+ The following performance tests are provided with the OFED release:
+
+ 1. Latency tests:
+ - ib_read_lat: RDMA read
+ - ib_write_lat: RDMA write
+ - ib_send_lat: UD, UC and RC (default) send
+
+ 2. Bandwidth tests:
+ - ib_read_bw: RDMA read
+ - ib_write_bw: RDMA write
+ - ib_send_bw: UD, UC and RC (default) send
+
+ Usage:
+ Server: <test name> <options>
+ Client: <test name> <options> <server IP address>
+ <server IP address> is an Ethernet or IPoIB address.
+ --help lists the available <options>. The same options must be
+ passed to both server and client.
+
+ Note: See PERF_TEST_README.txt for more information on the performance
+ tests.
+
+ Example: ib_send_bw
+ Usage:
+ ib_send_bw start a server and wait for connection
+ ib_send_bw <host> connect to server at <host>
+
+ options:
+ -p, --port=<port> listen on/connect to port <port>
+ (default: 18515)
+ -d, --ib-dev=<dev> use IB device <dev>
+ (default: first device found)
+ -i, --ib-port=<port> use port <port> of IB device
+ (default: 1)
+ -c, --connection=<RC/UC/UD> connection type RC/UC/UD (default: RC)
+ -m, --mtu=<mtu> mtu size (default: 1024)
+ -s, --size=<size> size of message to exchange
+ (default: 65536)
+ -a, --all run sizes from 2 up to 2^23
+ -t, --tx-depth=<dep> size of tx queue (default: 300)
+ -n, --iters=<iters> number of exchanges
+ (at least 2, default: 1000)
+ -b, --bidirectional measure bidirectional bandwidth
+ (default: unidirectional)
+ -V, --version display version number
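+
+ Because the same options must reach both sides, it can help to generate the
+ two command lines from one place. This is only a sketch: perftest_pair is a
+ hypothetical helper, the address 192.168.0.1 is a placeholder, and nothing
+ is executed, so no HCA is required to try it.
+

```shell
#!/bin/sh
# Hypothetical helper: print the server and client command lines for one
# perftest run so that both sides receive identical options.
# Usage: perftest_pair <test name> <server address> [options...]
perftest_pair() {
    test_name=$1
    server=$2
    shift 2
    echo "server: $test_name $*"
    echo "client: $test_name $* $server"
}

# Dry run only: the lines are printed, not executed. The address is a
# placeholder for an Ethernet or IPoIB address of the server.
perftest_pair ib_send_bw 192.168.0.1 -c UC -s 4096 -n 2000
```

+
+ The first printed line is meant for the server node and the second for the
+ client node.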
+
+1.3 Ping-pong Example Tests
+---------------------------
+ The ping-pong example tests provide basic connectivity tests. Each test
+ has a help message (-h).
+ - ibv_ud_pingpong
+ - ibv_rc_pingpong
+ - ibv_srq_pingpong
+ - ibv_uc_pingpong
+
+ Example: ibv_ud_pingpong -h
+ Usage:
+ ibv_ud_pingpong start a server and wait for connection
+ ibv_ud_pingpong <host> connect to server at <host>
+
+ options:
+ -p, --port=<port> listen on/connect to port <port>
+ (default: 18515)
+ -d, --ib-dev=<dev> use IB device <dev>
+ (default: first device found)
+ -i, --ib-port=<port> use port <port> of IB device (default: 1)
+ -s, --size=<size> size of message to exchange (default: 2048)
+ -r, --rx-depth=<dep> number of receives to post at a time
+ (default: 500)
+ -n, --iters=<iters> number of exchanges (default: 1000)
+ -e, --events sleep on CQ events (default: poll)
+
+
+===============================================================================
+2. Debug HOWTOs
+===============================================================================
+
+2.1 OFED Components and version information
+-------------------------------------------
+The text file <prefix>/BUILD_ID provides data on all OFED components (whether
+installed or not).
+
+For example:
+
+ > cat /usr/local/ofed/BUILD_ID
+ OFED-1.1-rc4
+
+ openib-1.1 (REV=9304)
+ # User space
+ https://openib.org/svn/gen2/branches/1.1/src/userspace
+ Git:
+ ref: refs/heads/ofed_1_1
+ commit d39c60f1406d29eb8e336529610574800a81d81e
+
+ # MPI
+ mpi_osu-0.9.7-mlx2.2.0.tgz
+ openmpi-1.1.1-1.src.rpm
+ mpitests-2.0-0.src.rpm
+
+2.2 Installed OFED Components
+-------------------------------
+The script /etc/infiniband/info provides data on the specific OFED installation
+on this machine.
+
+For example:
+
+ > /etc/infiniband/info
+ prefix=/usr/local/ofed
+ Kernel=2.6.9-22.ELsmp
+
+ MODULES: CONFIG_INFINIBAND=m CONFIG_INFINIBAND_USER_MAD=m
+ CONFIG_INFINIBAND_USER_ACCESS=m CONFIG_INFINIBAND_ADDR_TRANS=y
+ CONFIG_INFINIBAND_MTHCA=m CONFIG_IPATH_CORE=m CONFIG_INFINIBAND_IPATH=m
+ CONFIG_INFINIBAND_IPOIB=m
+
+ User level: --kernel-version 2.6.9-22.ELsmp --kernel-sources
+ /lib/modules/2.6.9-22.ELsmp/build --with-libibcm --with-libibverbs
+ --with-libipathverbs --with-libmthca --with-mstflint --with-perftest
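+
+As a minimal sketch, the installation prefix can be extracted from this
+output with standard tools. The function name ofed_prefix_from_info is
+hypothetical, and the sample text simply repeats the example above; on an
+installed system the input would come from running /etc/infiniband/info.
+

```shell
#!/bin/sh
# Hypothetical helper: read "info"-style text on stdin and print the value
# of the first "prefix=" line (the OFED installation path).
ofed_prefix_from_info() {
    sed -n 's/^[[:space:]]*prefix=//p' | head -n 1
}

# Sample text copied from the example above (not live output).
sample_info='prefix=/usr/local/ofed
Kernel=2.6.9-22.ELsmp'

prefix=$(printf '%s\n' "$sample_info" | ofed_prefix_from_info)
echo "OFED utilities are expected under: $prefix/bin"

# On an installed system (assumption: the script is present and executable):
#   prefix=$(/etc/infiniband/info | ofed_prefix_from_info)
```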
+
+2.3 Building/Installing IB Modules with debug information
+---------------------------------------------------------
+To compile/build/install the IB modules so that they will contain debug
+information, set OPENIB_KERNEL_EXTRA_CFLAGS="-g" in your environment
+before running OFED install.sh/build.sh.
+
+
+===============================================================================
+3. Pinning (Locking) User Memory Pages
+===============================================================================
+
+Memory locking is managed by the kernel on a per user basis. Regular users (as
+opposed to root) have a limited number of pages which they may pin, where
+the limit is pre-set by the administrator. Registering memory for IB verbs
+requires pinning memory, thus an application cannot register more memory than
+it is allowed to pin.
+
+The system-wide per-process memory lock limit can be changed by adding
+the following two lines to the file /etc/security/limits.conf:
+
+ * soft memlock <number>
+ * hard memlock <number>
+
+ where <number> denotes the number of KBytes that may be locked by a
+ user process.
+
+The above change to /etc/security/limits.conf will allow any user process in the
+system to lock up to <number> KBytes of memory.
+
+On some systems, it may be possible to use "unlimited" for the size to disable
+these limits entirely.
+
+Note: The file /etc/security/limits.conf contains further documentation.
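+
+The effect of the limit can be checked from a shell before running an
+application. The sketch below is illustrative only: fits_memlock is a
+hypothetical helper, the 64 MByte and 32 KByte figures are made up, and
+"ulimit -l" is available in common shells but is not required by POSIX.
+

```shell
#!/bin/sh
# Hypothetical helper: succeed if a buffer of <need_kb> KBytes fits within
# a memlock limit given as a number of KBytes or the word "unlimited".
fits_memlock() {
    need_kb=$1
    limit=$2
    [ "$limit" = "unlimited" ] && return 0
    [ "$need_kb" -le "$limit" ]
}

# "ulimit -l" reports the locked-memory limit of the current shell in
# KBytes in shells that support the option.
current=$(ulimit -l 2>/dev/null || echo unknown)
echo "current memlock limit: $current"

# Example: would a 64 MByte registration fit under a 32 KByte limit?
if fits_memlock 65536 32; then
    echo "64 MB fits"
else
    echo "64 MB does not fit; raise memlock in /etc/security/limits.conf"
fi
```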
+
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Performance Tests README for OFED 1.1
+
+ October 2006
+
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. Notes on Testing Methodology
+3. Test Descriptions
+4. Running Tests
+
+===============================================================================
+1. Overview
+===============================================================================
+This is a collection of tests written over uverbs intended for use as a
+performance micro-benchmark. As an example, the tests can be used for
+HW or SW tuning and/or functional testing.
+
+Please post results/observations to the openib-general mailing list.
+See "Contact Us" at http://openib.org/mailman/listinfo/openib-general and
+http://www.openib.org.
+
+
+===============================================================================
+2. Notes on Testing Methodology
+===============================================================================
+- The benchmark uses the CPU cycle counter to get time stamps without a context
+ switch. Some CPU architectures (e.g., Intel's 80486 or older PPC) do NOT
+ have such capability.
+
+- The benchmark measures round-trip time but reports half of that as one-way
+ latency. This means that it may not be sufficiently accurate for asymmetrical
+ configurations.
+
+- Min/Median/Max result is reported.
+ The median (vs average) is less sensitive to extreme scores.
+ Typically, the "Max" value is the first value measured.
+
+- Larger samples help only marginally. The default (1000) is pretty good.
+ Note that an array of cycles_t (typically unsigned long) is allocated
+ once to collect samples and again to store the difference between them.
+ Really big sample sizes (e.g., 1 million) might expose other problems
+ with the program.
+
+- The "-H" option will dump the histogram for additional statistical analysis.
+ See xgraph, ygraph, r-base (http://www.r-project.org/), pspp, or other
+ statistical math programs.
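+
+The halving and the min/median/max summary described above can be reproduced
+on any list of samples. This is a sketch under stated assumptions, not the
+benchmark's own code: one_way_summary is a hypothetical helper and the
+round-trip values are made up.
+

```shell
#!/bin/sh
# Hypothetical helper: read one round-trip sample per line on stdin and
# print "min median max" of the one-way estimates (round trip / 2).
one_way_summary() {
    sort -n | awk '
        { v[NR] = $1 / 2 }                      # halve each round trip
        END {
            if (NR == 0) exit 1
            if (NR % 2) med = v[(NR + 1) / 2]   # odd count: middle value
            else med = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print v[1], med, v[NR]
        }'
}

# Made-up round-trip samples in microseconds; the first one is the outlier,
# matching the note that "Max" is typically the first value measured.
printf '%s\n' 40 8 10 9 8 | one_way_summary
```

+
+For these samples the summary is "4 4.5 20", while the average of the halved
+values is 7.5, which shows how a single slow first sample skews the average
+but not the median.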
+
+Architectures tested: i686, x86_64, ia64
+
+
+
+===============================================================================
+3. Test Descriptions
+===============================================================================
+
+rdma_lat.c latency test with RDMA write transactions
+rdma_bw.c streaming BW test with RDMA write transactions
+
+
+The following tests are mainly useful for HW/SW benchmarking.
+They are not intended as actual usage examples.
+
+send_lat.c latency test with send transactions
+send_bw.c BW test with send transactions
+write_lat.c latency test with RDMA write transactions
+write_bw.c BW test with RDMA write transactions
+read_lat.c latency test with RDMA read transactions
+read_bw.c BW test with RDMA read transactions
+
+The executable name of each test starts with the general prefix "ib_",
+e.g., ib_write_lat.
+
+===============================================================================
+4. Running Tests
+===============================================================================
+
+Prerequisites:
+ kernel 2.6
+ ib_uverbs (kernel module) matches libibverbs
+ ("match" means binary compatible, but ideally of the same SVN rev)
+
+Server: ./<test name> <options>
+Client: ./<test name> <options> <server IP address>
+
+ o <server IP address> is an IPv4 or IPv6 address. You can use the IPoIB
+ address if IPoIB is configured.
+ o --help lists the available <options>
+
+ *** IMPORTANT NOTE: The SAME OPTIONS must be passed to both server and client.
+
+
+Common Options to all tests:
+ -p, --port=<port> listen on/connect to port <port> (default: 18515)
+ -m, --mtu=<mtu> mtu size (default: 1024)
+ -d, --ib-dev=<dev> use IB device <dev> (default: first device found)
+ -i, --ib-port=<port> use port <port> of IB device (default: 1)
+ -s, --size=<size> size of message to exchange (default: 1)
+ -a, --all run sizes from 2 till 2^23
+ -t, --tx-depth=<dep> size of tx queue (default: 50)
+ -n, --iters=<iters> number of exchanges (at least 100, default: 1000)
+ -C, --report-cycles report times in cpu cycle units
+ (default: microseconds)
+ -H, --report-histogram print out all results
+ (default: print summary only)
+ -U, --report-unsorted (implies -H) print out unsorted results
+ (default: sorted)
+ -V, --version display version number
+
+ *** IMPORTANT NOTE: You need to be running a Subnet Manager on the switch or
+ on one of the nodes in your fabric.
+
+Example:
+Run "ib_rdma_lat -C" on the server side.
+Then run "ib_rdma_lat -C <server IP address>" on the client.
+
+ib_rdma_lat will exit on both server and client after printing results.
+
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Version 1.1
+ README
+
+ October 2006
+
+
+This is the OpenFabrics Enterprise Distribution (OFED) version 1.1 software
+package supporting InfiniBand fabrics. It is composed of several software
+modules intended for use on a computer cluster constructed as an InfiniBand
+network.
+
+*** Note: If you plan to upgrade OFED on your cluster, please upgrade all
+ its nodes to this new version.
+
+This document includes the following sections:
+
+1. HW and SW Requirements
+2. OFED Package Contents
+3. A Note on the Installation Process
+4. Building OFED Software RPMs
+5. Installing OFED Software
+6. Starting and Verifying the IB Fabric
+7. MPI (Message Passing Interface)
+8. Related Documentation
+
+
+OpenFabrics Home Page: http://www.openfabrics.org
+The OFED rev 1.1 software is available for download at
+www.openib.org/downloads.html
+
+Please email bugs and error reports to your InfiniBand vendor, or use bugzilla
+http://openib.org/bugzilla/
+
+
+
+1. HW and SW Requirements:
+==========================
+1) Server platform with InfiniBand HCA (see OFED Distribution
+ Release Notes for details)
+
+2) Linux OS (see OFED Distribution Release Notes for details)
+
+3) Administrator privileges on your machine(s)
+
+4) Disk Space: - For Build & Installation: 300MB
+ - For Installation only: 200MB
+
+5) For the OFED Distribution to compile on your machine, some software packages
+ of your OS distribution are required. These are listed here.
+
+OS Distribution Required Packages
+--------------- ----------------------------------
+General:
+o Common to all gcc, glib, glib-devel, glibc, glibc-devel,
+ automake, autoconf, libtool.
+o RedHat kernel-devel, sysfsutils, sysfsutils-devel, rpm-build
+o SLES 9.0 kernel-source, udev, rpm
+o SLES 10.0 kernel-source, sysfsutils, sysfsutils-devel, rpm
+
+Specific Component Requirements:
+o OSU MPI requires: Fortran compiler (default: gcc-g77)
+o ibutils: tcl-8.4, tcl-devel-8.4
+o oiscsi-iser-support: open-iscsi, db-devel
+o tvflash: pciutils-devel
+
+
+2. OFED Package Contents
+========================
+
+The OFED Distribution package generates RPMs for installing the following:
+
+ o OpenFabrics core and ULPs:
+ - HCA drivers (mthca, ipath, ehca)
+ - core
+ - Upper Layer Protocols: IPoIB, SDP, SRP Initiator, iSER Initiator,
+ and uDAPL
+ o OpenFabrics utilities:
+ - OpenSM: InfiniBand Subnet Manager
+ - Diagnostic tools
+ - Performance tests
+ o MPI:
+ - OSU MPI stack supporting the InfiniBand interface
+ - Open MPI stack supporting the InfiniBand interface
+ - MPI benchmark tests (OSU BW/LAT, Intel MPI Benchmark, Presta)
+ o Sources of all software modules (under conditions mentioned in the
+ modules' LICENSE files)
+ o Documentation
+
+
+3. A Note on the Installation Process
+=====================================
+
+The OFED build process can take up to 40 minutes. If you are planning to
+install the OFED package on a multi-node cluster, it is recommended to build
+OFED RPMs once into a shared directory, and use the created RPMs in order to
+install the package on the rest of the cluster machines.
+
+Use the script build.sh to build the OFED RPMs. This script can be used as a
+non-root user.
+
+To install the package, use the install.sh script. When installing from scratch,
+install.sh will first build the RPMs, then install them onto the local machine.
+If the RPMs already exist, the install.sh script will simply install them onto
+the local machine without re-building them.
+
+*** Important Note for Open MPI users ONLY:
+ You must install OFED (run install.sh). Building the OFED RPMs is
+ not sufficient.
+
+4. Building OFED Software RPMs
+==============================
+
+Building OFED SW RPM packages can be a separate process or part of the
+installer. In the latter case you may skip this section and move to the next
+one: "Installing OFED Software".
+
+Some users may wish to build OFED RPM files separate from the main
+installation flow. To do this, please run the ./build.sh script. (See note in
+Section 3 above.)
+
+The build process will temporarily use the following default directory:
+/var/tmp/OFED. The build.sh script will prompt the user to enter a different
+temporary directory if desired.
+
+build.sh will also prompt the user for the installation directory. By default it
+is /usr/local/ofed.
+The RPMs will be placed under ./RPMS directory.
+
+For further details, see "Building OFED RPMs" and "Advanced Usage of OFED" in
+OFED_Installation_Guide.txt under OFED-1.1/docs.
+
+
+5. Installing OFED Software
+============================
+
+The default installation directory is: /usr/local/ofed
+
+Install Quick Guide:
+1) Download and extract: tar xzvf OFED-1.1.tgz
+2) Change into directory: cd OFED-1.1
+3) Run as root: ./install.sh
+4) Follow the directions to install required components. For details, please see
+ OFED_Installation_Guide.txt under OFED-1.1/docs.
+
+
+Note: The install script removes previously installed IB packages and
+ re-installs from scratch. You will be prompted to acknowledge the deletion
+ of the old packages. However, configuration files (.conf) will be
+ preserved and saved with a ".rpmsave" extension.
+
+
+6. Starting and Verifying the IB Fabric
+=======================================
+
+1) If you rebooted your machine after the installation process completed,
+ IB interfaces should be up. If you did not reboot your machine, please
+ enter the following command: /etc/init.d/openibd start
+
+2) Check that the IB driver is running on all nodes: ibv_devinfo should print
+ "hca_id: <linux device name>" on the first line.
+
+3) Make sure that a Subnet Manager is running by invoking the sminfo utility.
+ If an SM is not running, sminfo prints:
+ sminfo: iberror: query failed
+ If an SM is running, sminfo prints the LID and other SM node information.
+ Example:
+ sminfo: sm lid 0x1 sm guid 0x2c9010b7c2ae1, activity count 20 priority 1
+
+ To check if OpenSM is running on the management node, enter: /etc/init.d/opensmd status
+ To start OpenSM, enter: /etc/init.d/opensmd start
+
+ Note: OpenSM parameters can be set via the file /etc/opensm.conf.
+ Note: OpenSM can be configured to run upon boot by setting 'ONBOOT=yes'
+ in /etc/opensm.conf.
+
+4) Verify the status of ports by using ibv_devinfo: all connected ports should
+ report a "PORT_ACTIVE" state.
+
+5) Check the network connectivity status: run ibchecknet to see if the subnet
+ is "clean" and ready for ULP/application use. The following tools display
+ more information in addition to IB info: ibnetdiscover, ibhosts, and
+ ibswitches.
+
+6) Alternatively, instead of running steps 3 to 5 you can use the ibdiagnet
+ utility to perform a set of tests on your network. Upon finding an error,
+ ibdiagnet will print a message starting with a "-E-". For a more complete
+ report of the network features you should run ibdiagnet -r. If you have a
+ topology file describing your network you can feed this file to ibdiagnet
+ (using the option: -t <file>) and all reports will use the names they
+ appear in the file (instead of LIDs, GUIDs and directed routes).
+
+7) To run an application over SDP set the following variables:
+ env LD_PRELOAD='stack_prefix'/lib/libsdp.so
+ LIBSDP_CONFIG_FILE='stack_prefix'/etc/libsdp.conf <application name>
+ (or LD_PRELOAD='stack_prefix'/lib64/libsdp.so on 64-bit machines)
+ The default 'stack_prefix' is /usr/local/ofed.
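+
+ The checks in steps 2 and 4 above can be scripted. The following is a
+ sketch only: check_ports_active is a hypothetical helper, and the sample
+ input is an assumption modeled on the "hca_id:" and "PORT_ACTIVE" strings
+ mentioned in those steps, so the parsing may need adjusting to the real
+ ibv_devinfo output on your system.
+

```shell
#!/bin/sh
# Hypothetical helper: read ibv_devinfo-style text on stdin and print one
# line per port, "<hca> <port> <state>". Exit status is 0 only if at least
# one port was seen and every port reported PORT_ACTIVE.
check_ports_active() {
    awk '
        $1 == "hca_id:" { hca = $2 }
        $1 == "port:"   { port = $2 }
        $1 == "state:"  {
            print hca, port, $2
            seen++
            if ($2 != "PORT_ACTIVE") bad++
        }
        END {
            if (seen > 0 && bad == 0) exit 0
            exit 1
        }'
}

# Illustrative input only; the field layout is an assumption.
sample='hca_id: mthca0
    port: 1
        state: PORT_ACTIVE (4)
    port: 2
        state: PORT_DOWN (1)'

if printf '%s\n' "$sample" | check_ports_active; then
    echo "all ports active"
else
    echo "at least one port is not active"
fi

# On a node with the driver loaded (assumption: ibv_devinfo is in PATH):
#   ibv_devinfo | check_ports_active
```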
+
+
+7. MPI (Message Passing Interface)
+==================================
+
+In Step 2 of the main menu of install.sh, options 2, 3 and 4 can
+install one or more MPI stacks. Multiple MPI stacks can be installed
+simultaneously -- they will not conflict with each other.
+
+There are two MPI stacks included in this release of OFED:
+
+- Ohio State University's MVAPICH 0.9.7 (specifically updated and
+ modified by Mellanox Technologies and Cisco for this release of OFED)
+- Open MPI 1.1.1
+
+OFED also includes 4 basic tests that can be run against each MPI
+stack: bandwidth (bw), latency (lt), Intel MPI Benchmark and Presta. The tests
+are located under: <prefix>/mpi/<compiler>/<mpi stack>/tests/.
+
+Please see MPI_README.txt for more details on each MPI package and how to run
+the tests.
+
+
+8. Related Documentation
+========================
+1) Release Notes for OFED Distribution components are to be found under
+ OFED-1.1/docs and, after the package installation, under <prefix>/docs.
+2) For a detailed installation guide, see OFED_Installation_Guide.txt.
+3) MPI_README.txt under <prefix>/docs.
+4) OFED_tips.txt under <prefix>/docs.
+5) PERF_TEST_README.txt under <prefix>/docs.
+6) For more information, please visit the OFED web page:
+ http://www.openfabrics.org
+
+
+For more information contact your InfiniBand vendor.
+
--- /dev/null
+Index: dhcp-3.0.4/includes/site.h
+===================================================================
+--- dhcp-3.0.4.orig/includes/site.h 2002-03-12 20:33:39.000000000 +0200
++++ dhcp-3.0.4/includes/site.h 2006-05-23 11:34:38.000000000 +0300
+@@ -135,7 +135,7 @@
+ the aforementioned problems do not matter to you, or if no other
+ API is supported for your system, you may want to go with it. */
+
+-/* #define USE_SOCKETS */
++#define USE_SOCKETS
+
+ /* Define this to use the Sun Streams NIT API.
+
+Index: dhcp-3.0.4/common/discover.c
+===================================================================
+--- dhcp-3.0.4.orig/common/discover.c 2006-02-23 00:43:27.000000000 +0200
++++ dhcp-3.0.4/common/discover.c 2006-05-23 11:45:16.000000000 +0300
+@@ -532,6 +532,12 @@ void discover_interfaces (state)
+ break;
+ #endif
+
++ case ARPHRD_INFINIBAND:
++ tmp -> hw_address.hlen = 1;
++ tmp -> hw_address.hbuf [0] = ARPHRD_INFINIBAND;
++ memcpy (&tmp -> hw_address.hbuf [1], sa.sa_data, 20);
++ break;
++
+ default:
+ log_error ("%s: unknown hardware address type %d",
+ ifr.ifr_name, sa.sa_family);
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Diagnostic Tools in OFED 1.1 Release Notes
+
+ October 2006
+
+
+Repo: https://openib.org/svn/gen2/branches/1.1/src/userspace/management/diags
+Version: 9535
+
+
+General
+-------
+Model of operation: All utilities use direct MAD access to perform their
+operations. Operations that require only QP0 MADs may use directed-route
+MADs, and therefore may work even in unconfigured subnets. Almost all
+utilities can operate without accessing the SM, unless GUID-to-LID translation
+is required. The only exception to this is saquery, which requires the SM.
+
+
+Dependencies
+------------
+Most utilities depend on libibmad and libibumad.
+All utilities depend on the ib_umad kernel module.
+
+
+Multiple port/Multiple CA support
+---------------------------------
+When no IB device or port is specified (see the "local umad parameters" below),
+the libibumad library selects the port to use by the following criteria:
+1. the first port that is ACTIVE.
+2. if not found, the first port that is UP (physical link up).
+
+If a port and/or CA name is specified, the libibumad library attempts to
+satisfy the user request, and will fail if it cannot do so.
+
+For example:
+ ibaddr # use the 'best port'
+ ibaddr -C mthca1 # pick the best port from mthca1 only.
+ ibaddr -P 2 # use the second (active/up) port from the
+ first available IB device.
+ ibaddr -C mthca0 -P 2 # use the specified port only.
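+
+The default selection rule described above (first ACTIVE port, otherwise
+first port with the physical link up) can be modeled in a few lines. This is
+a simplified illustration, not the libibumad source: pick_port is a
+hypothetical helper, and the input format and the ACTIVE/LinkUp tokens are
+assumptions chosen for the example.
+

```shell
#!/bin/sh
# Simplified model of the selection rule (not the libibumad source).
# Input: one port per line, "<ca> <port> <state> <phys_state>".
# Output: "<ca> <port>" of the first ACTIVE port, otherwise of the first
# port whose physical link is up; exit status 1 if neither exists.
pick_port() {
    awk '
        $3 == "ACTIVE" && active == "" { active = $1 " " $2 }
        $4 == "LinkUp" && up == ""     { up = $1 " " $2 }
        END {
            if (active != "") { print active; exit 0 }
            if (up != "")     { print up; exit 0 }
            exit 1
        }'
}

# Made-up port table: mthca0 port 1 is down, port 2 is up but not yet
# active, and mthca1 port 1 is active.
printf '%s\n' \
    'mthca0 1 DOWN Polling' \
    'mthca0 2 INIT LinkUp' \
    'mthca1 1 ACTIVE LinkUp' | pick_port
```

+
+With this table the sketch prints "mthca1 1"; if the mthca1 line is removed,
+it falls back to "mthca0 2".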
+
+
+Common options & flags
+----------------------
+Most diagnostics take the following flags. The exact list of supported
+flags per utility can be found in the usage message and can be displayed
+using util_name -h syntax.
+
+# Debugging flags
+ -d raise the IB debugging level. May be used
+ several times (-ddd or -d -d -d).
+ -e show umad send/receive errors (timeouts and others)
+ -h display the usage message
+ -v increase the application verbosity level.
+ May be used several times (-vv or -v -v -v)
+ -V display the internal version info.
+
+# Addressing flags
+ -D use directed path address arguments. The path
+ is a comma separated list of out ports.
+ Examples:
+ "0" # self port
+ "0,1,2,1,4" # out via port 1, then 2, ...
+ -G use GUID address arguments. In most cases, it is the Port GUID.
+ Examples:
+ "0x08f1040023"
+ -s <smlid> use 'smlid' as the target lid for SA queries.
+
+# Local umad parameters:
+ -C <ca_name> use the specified ca_name.
+ -P <ca_port> use the specified ca_port.
+ -t <timeout_ms> override the default timeout for the solicited mads.
+
+
+CLI notation
+------------
+All utilities use the POSIX style notation, meaning that all options (flags)
+must precede all arguments (parameters).
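+
+For example, using flags from the list above (the LID value 4 is an arbitrary
+illustration, not taken from a real fabric):
+
+    ibaddr -C mthca0 -P 2 4     # flags first, then the address argument
+
+Placing a flag after the address argument (e.g., "ibaddr 4 -P 2") does not
+follow this notation and should be avoided.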
+
+
+Bugs Fixed
+----------
+man pages are now supplied.
+
+
+Utilities descriptions
+----------------------
+See man pages
+
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ ehca in OFED 1.1 Release Notes
+
+ October 2006
+
+
+Overview
+--------
+ehca is the low level driver implementation for all IBM GX-based HCAs.
+
+ehca Available Parameters
+--------------------------
+In order to set ehca parameters, add the following line(s) to /etc/modprobe.conf:
+
+ options ib_ehca <parameter>=<value>
+
+whereby <parameter> is one of the following items:
+- debug_level debug level (0: no debug traces (default), 1: with debug traces)
+- nr_ports number of connected ports (default: 2)
+- port_act_time time to wait for port activation (default: 30 sec)
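+
+For example, a minimal sketch of an /etc/modprobe.conf entry that sets several
+parameters at once (the values are illustrative assumptions, not recommended
+settings):
+
+    options ib_ehca nr_ports=1 port_act_time=60 debug_level=1
+
+With this entry the driver would be loaded for one port, wait up to 60 seconds
+for port activation, and emit debug traces.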
+
+Known Issues
+------------
+
+1. The device driver normally uses both ports. For using just one port connect
+the ports as shown in Figure 1 and load the device driver by running
+`modprobe ib_ehca nr_ports=1`.
+
+ --------- IB Card in p570
+ |
+ \ /
+ +---+
+ | # |
+ | # |
+ | # |
+ | # | <--- Port 2: NOT CONNECTED
+ | # |
+ |---|
+ | # |
+ | # |
+ | # |
+ | # | <--- Port 1: CONNECTED TO THE INFINIBAND SWITCH
+ | # |
+ +---+
+
+*Figure 1:* Connections if only one port is used.
+
+NOTE: In OpenPower 720 and p550 port 1 is at the top, port 2 is at the bottom.
+
+2. The port(s) must be connected to an active switch port while the ehca
+device driver is being loaded.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ IBUTILS in OFED 1.1 Release Notes
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. Requirements
+3. Reports
+4. Known Issues
+
+
+===============================================================================
+1. Overview
+===============================================================================
+
+The IBUTILS package provides means for debugging the connectivity and
+status of InfiniBand (IB) devices in a fabric.
+
+The package tools are intended to provide the following services:
+* Discover the InfiniBand fabric connectivity
+* Determine whether or not a Subnet Manager (SM) is running
+* Identify links which drop packets and/or incur errors by sending MAD
+ packets multiple times, across all the links, reporting port monitor counters
+* Identify fabric level mismatches or inconsistencies such as:
+ - Duplicate port GUIDs - Two or more different ports with the same GUID
+ - Duplicate node GUIDs - Two or more different nodes with the same node GUID
+ - Duplicate LIDs - Two or more devices that have the same assigned LID
+ - Zero valued LIDs - A device with LID=0 indicates that the SM did not
+ assign a LID to this device.
+ - Zero valued system GUIDs - A device with system GUID=0 indicates that
+ the vendor did not assign it a GUID.
+ - An InfiniBand link is in the INIT state, which prevents data transfer
+ - Unexpected link width (when using the -lw flag)
+ - Unexpected link speed (when using the -ls flag)
+
+The IBUTILS package includes the following stand-alone tools:
+
+ o ibdiagnet
+ Discovers the network, providing a listing of the following:
+ - All the nodes, ports and links in the fabric
+ - Link Forwarding Tables (LFT) dump file
+ - Multicast Forwarding Tables (MFT) dump file
+ - Fabric Subnet Managers (SMs) information file and a list of all
+ the masked GUIDs found
+
+ o ibdiagpath
+ - Traces a path between two nodes specified by LIDs or a directed path
+ between the source and destination nodes.
+ - Provides information regarding the nodes and ports traversed.
+ - Utilizes device-specific health queries for the different devices
+ along the path between the source and destination nodes.
+
+Note: There are man pages for both tools.
+
+
+===============================================================================
+2. Requirements
+===============================================================================
+ 1. ibis must be installed.
+
+ 2. The path environment variable must include the path to ibis. To define the
+ path to ibis use one of the following commands (depending on your shell):
+ export PATH=<path to ibis>:$PATH
+ or
+ setenv PATH <path to ibis>:$PATH
+
+ (the default path to ibis is: /usr/local/ofed/bin/)
+
+===============================================================================
+3. Reports
+===============================================================================
+The default directory for all generated report files is /tmp .
+
+Both utilities collect summary information regarding all the fabric SMs
+during the run, and then output that information at the end of the run to the
+file /tmp/ibdiagnet.sm.
+
+Each report message includes:
+ - Device Type
+ - Device portGUID
+ - The direct path to the device
+ - If a topology file is provided to be matched with the discovered fabric,
+ the node name is also provided in the report message. Otherwise, host
+ names are included only in HCA-related report messages.
+
+===============================================================================
+4. Known Issues
+===============================================================================
+ ibdiagpath issues:
+ - If no subnet manager is initialized in the subnet, FDB tables may be
+ incorrectly set. Consequently, PortCounter MADs cannot be sent.
+
+ - A link along a LID-routed path in INIT state causes ibdiagpath performance
+ queries to fail. The performance queries fail since they cannot proceed via
+ non-ACTIVE links.
+
+ - ibdiagpath cannot validate the provided topology file against the existing
+ fabric topology. If the topology file includes a device/link that does not
+ exist, or the device/link information is incorrect, then ibdiagpath may
+ -- in name-based routing -- extract a non-existing path based on the
+ incorrect topology file.
+
+ - If the hostname provided for the -s flag is not the actual local hostname,
+ then all the extracted names from the topology file will be incorrect.
+ However, all the other information provided will be correct.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ ipath in OFED 1.1 Release Notes
+
+ October 2006
+
+
+Overview
+--------
+ipath is the low level driver implementation for all QLogic HCAs.
+
+
+1. MSI (Message Signalled Interrupt) support required with QLogic HCAs
+----------------------------------------------------------------------
+The QLogic adapters require MSI (Message Signalled Interrupts) to be
+enabled in the kernel. In addition, the kernels provided with some Linux
+distributions must be rebuilt after installation of a kernel patch to
+fix the pci_msi_quirk bug introduced in the 2.6.12 kernel.
+
+1.1. If the InfiniPath driver is being compiled on a machine without
+CONFIG_PCI_MSI=y configured, a warning similar to this will appear in
+dmesg at boot:
+
+[root@sqa-00 ~]# dmesg | grep ipath
+ipath_core 0000:01:00.0: infinipath0: pci_enable_msi failed: -1, interrupts may not work
+ipath_core 0000:01:00.0: infinipath0: irq is 0, BIOS error? Interrupts won't work
+
+OpenFabrics on a QLogic adapter will not work correctly unless the kernel
+is configured with CONFIG_PCI_MSI=y.
+
+1.2. Systems that have QLogic adapters and contain the AMD8131 PCI
+bridge may require installation of the pci_msi_quirk kernel patch. If
+the following messages are displayed on the console during boot, or are
+in /var/log/messages, you will need to install the patch.
+
+PCI: MSI quirk detected. pci_msi_quirk set.
+ipath_core 0000:03:00.0: pci_enable_msi failed: -22, interrupts
+may not work
+
+NOTE: This problem has been fixed in the 2.6.17 kernel.org kernel.
+
+To install pci_msi_quirk patch and configure MSI for use with QLogic adapter
+----------------------------------------------------------------------------
+To remedy both of these problems simultaneously, build the
+kernel RPMs yourself with stock kernel SRPMs available from your
+distribution source, and install the kernel patches provided at
+http://www.pathscale.com/infinipath_support/downloads-1.3.html.
+
+See http://www.pathscale.com/docs/infinipath/1.3/msi_patch_notes.txt
+for details.
+
+Once these instructions are completed, rebuild the OFED kernel
+modules with the OFED installer.
+
+2. Note:
+--------
+When running Fedora Core 4 with the QLogic adapters, it is recommended
+to use the 2.6.16 kernel.
+
+3. Known Issues:
+----------------
+The ipath driver only supports the x86_64 architecture in this release.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ IPoIB in OFED 1.1 Release Notes
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. New Features
+3. Known Issues
+4. DHCP Support of IPoIB
+5. High Availability (HA) Service
+
+===============================================================================
+1. Overview
+===============================================================================
+IPoIB is a network driver implementation that enables transmission of IP and
+ARP protocol packets over an InfiniBand UD channel. The implementation conforms
+to the relevant IETF working group's RFCs (http://www.ietf.org).
+
+
+===============================================================================
+2. New Features
+===============================================================================
+IPoIB supports increasing the verbosity of debug messages through a module
+parameter. The parameter value can be controlled at load time or runtime.
+At load time this can be done by inserting the following line in
+/etc/modprobe.conf:
+ options ib_ipoib debug_level=<integer number>
+
+At runtime the value can be controlled by writing the following to sysfs:
+ echo <integer number> > /sys/module/ib_ipoib/parameters/debug_level
+
+The value can also be inspected by running:
+ cat /sys/module/ib_ipoib/parameters/debug_level
+
+Note that on some older kernels (like the one supplied with Redhat AS 4.0),
+the path for inspecting the debug level is different and is as follows:
+ /sys/module/ib_ipoib/debug_level
+
+===============================================================================
+3. Known Issues
+===============================================================================
+1. If a host has multiple interfaces, each belonging to a different IP subnet,
+ yet they use the same InfiniBand switch, the host may build an incorrect ARP
+ table. This may lead to problems which seem like violations of the IP rule
+ requiring different broadcast domains -- a rule not observed in this
+ implementation of IPoIB.
+
+2. On Fedora Core 4, SuSE 10 and SLES 10:
+ a. There are IPoIB alias lines in modprobe.conf which prevent stopping/
+ unloading the stack (i.e., '/etc/init.d/openibd stop' will fail).
+ These alias lines cause the drivers to be loaded again by udev scripts.
+
+ Workaround: Change modprobe.conf to set
+ OPENIB_PARAMS="--without-modprobe" before running install.sh,
+ or remove the alias lines from modprobe.conf.
+
+ b. The ib1 interface uses the configuration script of ib0.
+
+ Workaround: Invoke ifup/ifdown using both the interface name and the
+ configuration script name (example: ifup ib1 ib1).
+
+3. On RedHat EL 4 up2 the driver may not load properly if SELINUX is enforced.
+
+ Workaround: Change the value of the parameter SELINUX in
+ /etc/sysconfig/selinux from "enforcing" to "permissive" or "disabled".
+
+4. Since the IPoIB configuration files (ifcfg-ib<n>) are installed at the
+ standard networking scripts location (RedHat:/etc/sysconfig/network-scripts/
+ and SuSE: /etc/sysconfig/network/), the option IPOIB_LOAD=no in openib.conf
+ does not prevent the loading of IPoIB on boot.
+
+5. On RedHat EL 4 up4, ipoib multicast group membership does not work
+ due to missing kernel code that was available in up3 and removed
+ in up4.
+
+
+===============================================================================
+4. DHCP Support of IPoIB
+===============================================================================
+IPoIB is configured by default to use information obtained dynamically from a
+DHCP server, at driver startup time, to configure its interfaces.
+
+Note: To use DHCP the user must apply a special patch (see "DHCP Notes" below).
+
+DHCP Supported Operating Systems
+--------------------------------
+1. SLES 10
+2. SuSE 10
+3. Any kernel from 2.6.14 (tested with kernel 2.6.16.18)
+
+DHCP Unsupported Operating Systems
+----------------------------------
+No RedHat EL distributions are supported.
+
+
+DHCP Notes
+----------
+1. It may be required to run over different UDP ports than the well known ports
+ (67 and 68). Free port numbers greater than 0x8000 must be chosen. To
+ specify a server or client port number, use the option -p <port number>.
+ The client's port number must be the chosen server's port number plus one.
+
+2. For IPoIB to use DHCP, it is required to patch ISC's DHCP. The patch file can
+ be found under OFED-1.1/docs/dhcp after extracting the distribution file
+ (after installation it can also be found under <prefix>/docs/dhcp). The patch
+ should be applied for the server and for each client. Tests were run on
+ version 3.0.4 of the DHCP package.
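+
+ As an illustration of the port rule in note 1 (the port numbers and the
+ interface name are assumptions chosen for this example): if the server is
+ given port 32769 (0x8001), each client must be given port 32770 (0x8002).
+ With the patched ISC DHCP programs this might look like:
+
+    dhcpd -p 32769              # on the server
+    dhclient -p 32770 ib0       # on each client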
+
+
+===============================================================================
+5. High Availability (HA) Service
+===============================================================================
+High Availability (HA) service for IPoIB interfaces is provided via the
+ipoibtools package. Ipoibtools currently includes a perl script, ipoib_ha.pl,
+and two executables, arpingib and mcasthandle.
+
+The HA service operates as follows: A user-level daemon runs in background to
+detect failure of the primary IPoIB interface. If such a failure is detected
+(e.g., port down), the daemon configures the secondary IPoIB interface with the
+configuration parameters of the primary IPoIB interface (so that the secondary
+interface assumes the IP identity of the primary interface).
+
+Enabling the HA Service
+-----------------------
+To enable HA service automatically (upon bootup of the driver),
+perform the following steps:
+
+1. Edit file '/etc/infiniband/openib.conf' as follows:
+
+ IPOIBHA_ENABLE=yes
+ PRIMARY_IPOIB_DEV=ib0
+ SECONDARY_IPOIB_DEV=ib1
+
+2. Run '/etc/init.d/openibd restart' to restart the driver.
+
+The HA service may also be activated manually, via the following command:
+
+ ipoib_ha.pl -p <primary IPoIB interface> -s <secondary IPoIB interface> \
+ --with-arping --with-multicast [-v]
+
+ -p primary IPoIB interface (default: ib0)
+ -s secondary IPoIB interface (default: ib1)
+ --with-arping use modified arping utility to send an unsolicited
+ ARP REPLY
+ --with-multicast support applications that are using multicast
+ -v verbose output
+
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ iSER initiator in OFED 1.1 Release Notes
+
+ October 2006
+
+
+* Background
+
+ iSER allows iSCSI to be layered over RDMA transports (including InfiniBand
+ and iWARP (RNIC)).
+
+ The OpenIB iSER initiator implementation is interoperable with open-iscsi
+ (http://www.open-iscsi.org/). It provides an alternative transport to
+ iscsi_tcp in the open iscsi framework. The iSER transport exposes a
+ transport API to scsi_transport_iscsi, and a SCSI LLD API to the Linux
+ SCSI mid-layer (scsi_mod).
+
+* supported platforms
+
+ SLES10 (RC1 and later)
+
+ the release has been tested against Voltaire iSCSI/iSER target running
+ in Voltaire's IB/Fibre-Channel router
+
+* known issues
+
+ SCSI command aborts may cause some instabilities
+
+* iSER links
+
+ WIKI pages
+
+ information on building/configuring/running the open iscsi initiator w. iSER
+ https://openib.org/tiki/tiki-index.php?page=iSER
+
+ IETF pages
+
+ iSCSI and iSER specifications come out of the IETF IP storage (IPS) work group
+ http://www.ietf.org/html.charters/ips-charter.html.
+
+ "ABOUT" page
+
+ general and detailed information on iSCSI and iSER
+ http://www.voltaire.com/iser.htm
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ mthca in OFED 1.1 Release Notes
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. New Features
+3. Fixed Bugs
+4. Known Issues
+
+===============================================================================
+1. Overview
+===============================================================================
+mthca is the low level driver implementation for all Mellanox Technologies HCAs.
+
+mthca Available Parameters
+--------------------------
+In order to set mthca parameters, add the following line to /etc/modprobe.conf:
+
+ options ib_mthca <parameter>=<value>
+
+mthca parameters:
+
+ - tune_pci increase PCI burst from the default set by BIOS if nonzero
+ - msi attempt to use MSI if nonzero
+ - msi_x attempt to use MSI-X if nonzero
+ - fw_cmd_doorbell post FW commands through doorbell page if nonzero (and
+ supported by FW)
+ - catas_reset_disable disable device reset on a catastrophic event if nonzero
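+
+For example, a sketch of an /etc/modprobe.conf entry that asks the driver to
+attempt MSI-X and to skip the device reset on a catastrophic event (the values
+are illustrative assumptions, not recommended settings):
+
+    options ib_mthca msi_x=1 catas_reset_disable=1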
+
+
+===============================================================================
+2. New Features
+===============================================================================
+1. Catastrophic event reset: catastrophic event handling has been expanded
+ to include resetting the device. After generating the IB_EVENT_DEVICE_FATAL
+ async event, mthca now resets the device (assuming that the
+ catas_reset_disable module parameter described above is zero).
+
+ Note that the reset entails removing then adding the device. For the device
+ to complete the reset, all user-level applications using device resources
+ directly via the user verbs layer must release those resources. Thus, such
+ applications should register to receive async events, should detect the
+ IBV_EVENT_DEVICE_FATAL event, and should release all resources for that
+ device upon receiving such an event.
+
+
+===============================================================================
+3. Fixed Bugs
+===============================================================================
+1. mthca no longer misses restoring the following PCI-X/PCI Express
+ registers after reset:
+ o PCI-X device: PCI-X command register
+ o PCI-X bridge: upstream and downstream split transaction registers
+ o PCI Express: PCI Express device control and link control registers
+2. Fence bit is now supported properly.
+3. Fixed modify_qp, modify_srq and resize_cq methods to be fully reentrant.
+
+===============================================================================
+4. Known Issues
+===============================================================================
+1. UAR size other than 8MB prevents mthca driver loading. The default UAR
+ size is 8MB. If it is changed, the following error message will be logged to
+ /var/log/messages upon attempt to load the mthca driver:
+ ib_mthca 0000:04:00.0: Missing UAR, aborting.
+
+2. If a user level application that uses multicast receives a control signal
+ in the process of detaching from a multicast group, its QP may remain a
+ member of the multicast group (in HCA).
+ Workaround: Destroy the multicast group after detaching the QP from it.
+
+3. On MemFree devices, RC QPs can be created with a maximum of (max_sge - 3)
+ entries only.
+
+4. Performance degradation due to wrong BIOS configuration:
+ The PCI Express spec. requires BIOS to set the MaxReadReq register
+ for each card for maximum performance and stability.
+
+ If you are seeing bandwidth performance degradation, you can try forcing
+ the card to behave out of PCI Express spec. by setting the tune_pci=1 module
+ parameter. This tune_pci=1 option was the default setting in OFED
+ 1.0, which might have masked performance degradation on some systems.
+
+ If tune_pci=1 improves bandwidth, please report the issue to your
+ BIOS vendor. Please note that Mellanox Technologies does not recommend using
+ tune_pci=1 in production systems: working with tune_pci=1 option set is
+ untested and is known to trigger stability issues on some platforms.
+
--- /dev/null
+#
+# Copyright (c) 2006 Mellanox Technologies. All rights reserved.
+#
+# This Software is licensed under one of the following licenses:
+#
+# 1) under the terms of the "Common Public License 1.0" a copy of which is
+# available from the Open Source Initiative, see
+# http://www.opensource.org/licenses/cpl.php.
+#
+# 2) under the terms of the "The BSD License" a copy of which is
+# available from the Open Source Initiative, see
+# http://www.opensource.org/licenses/bsd-license.php.
+#
+# 3) under the terms of the "GNU General Public License (GPL) Version 2" a
+# copy of which is available from the Open Source Initiative, see
+# http://www.opensource.org/licenses/gpl-license.php.
+#
+# Licensee has the right to choose one of the above licenses.
+#
+# Redistributions of source code must retain the above copyright
+# notice and one of the license notices.
+#
+# Redistributions in binary form must reproduce both the above copyright
+# notice, one of the license notices in the documentation
+# and/or other materials provided with the distribution.
+#
+#
+# $Id: ofed-docs.spec 7948 2006-06-13 12:42:34Z vlad $
+#
+
+Summary: OFED docs
+Name: ofed-docs
+Version: @VERSION@
+Release: 0
+License: GPL/BSD
+Url: https://openib.org/svn/gen2/branches/1.1/ofed/docs
+Group: Documentation/Man
+Source: %{name}-%{version}.tar.gz
+BuildRoot: %{?build_root:%{build_root}}%{!?build_root:/var/tmp/%{name}-%{version}-root}
+Vendor: OpenFabrics
+%description
+OpenFabrics documentation
+
+%prep
+%setup -q -n %{name}-%{version}
+
+%install
+mkdir -p $RPM_BUILD_ROOT%{_prefix}
+cp -a * $RPM_BUILD_ROOT%{_prefix}
+
+%clean
+rm -rf $RPM_BUILD_ROOT
+
+%files
+%defattr(-,root,root)
+%{_prefix}/docs
+%{_prefix}/README.txt
+%{_prefix}/LICENSE
+%{_prefix}/BUILD_ID
+
+%changelog
+* Thu Jul 27 2006 Vladimir Sokolovsky <vlad@mellanox.co.il>
+- Changed version to 1.1
+* Tue Jun 6 2006 Vladimir Sokolovsky <vlad@mellanox.co.il>
+- Initial packaging
--- /dev/null
+STACK_PREFIX=/usr/local/ofed
+BUILD_ROOT=/var/tmp/OFED
+kernel_ib=y
+kernel_ib_devel=y
+libibverbs=y
+libibverbs_devel=y
+libibverbs_utils=y
+libibcm=y
+libibcm_devel=y
+libmthca=y
+libehca=y
+libmthca_devel=y
+perftest=y
+mstflint=y
+libipathverbs=y
+libipathverbs_devel=y
+oiscsi_iser=y
+ofed_docs=y
+ofed_scripts=y
+libsdp=y
+srptools=y
+ipoibtools=y
+tvflash=y
+libibcommon=y
+libibcommon_devel=y
+libibmad=y
+libibmad_devel=y
+libibumad=y
+libibumad_devel=y
+libopensm=y
+libopensm_devel=y
+opensm=y
+libosmcomp=y
+libosmcomp_devel=y
+libosmvendor=y
+libosmvendor_devel=y
+openib_diags=y
+librdmacm=y
+librdmacm_devel=y
+librdmacm_utils=y
+dapl=y
+dapl_devel=y
+mpi_osu=y
+openmpi=y
+mpitests=y
+ibutils=y
+ib_verbs=y
+ib_mthca=y
+ib_ehca=y
+ib_ipoib=y
+ib_ipath=y
+ib_sdp=y
+ib_srp=y
+ib_iser=y
+MPI_COMPILER_mpi_osu=" gcc intel pathscale"
+MPI_COMPILER_openmpi=" gcc intel pathscale"
+# OPENIB_PARAMS="--with-memtrack --without-modprobe --with-madeye-mod --without-ipoibconf"
--- /dev/null
+LAN_INTERFACE_ib0=eth0
+IPADDR_ib0=192.168.0.'*'
+NETMASK_ib0=255.255.255.0
+NETWORK_ib0=192.168.0.0
+BROADCAST_ib0=192.168.0.255
+ONBOOT_ib0=1
+
+LAN_INTERFACE_ib1=eth0
+IPADDR_ib1=172.16.'*'.'*'
+NETMASK_ib1=255.255.0.0
+NETWORK_ib1=172.16.0.0
+BROADCAST_ib1=172.16.255.255
+ONBOOT_ib1=1
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ Open MPI in OFED 1.1 Copyrights, License, and Release Notes
+
+ October 2006
+
+
+Open MPI Copyrights
+-------------------
+Copyright (c) 2004-2006 The Trustees of Indiana University and Indiana
+ University Research and Technology
+ Corporation. All rights reserved.
+Copyright (c) 2004-2006 The University of Tennessee and The University
+ of Tennessee Research Foundation. All rights
+ reserved.
+Copyright (c) 2004-2006 High Performance Computing Center Stuttgart,
+ University of Stuttgart. All rights reserved.
+Copyright (c) 2004-2006 The Regents of the University of California.
+ All rights reserved.
+Copyright (c) 2006 Cisco Systems, Inc. All rights reserved.
+Copyright (c) 2006 Voltaire, Inc. All rights reserved.
+
+Additional copyrights may follow
+
+Open MPI License
+----------------
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are
+met:
+
+- Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+
+- Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer listed
+ in this license in the documentation and/or other materials
+ provided with the distribution.
+
+- Neither the name of the copyright holders nor the names of its
+ contributors may be used to endorse or promote products derived from
+ this software without specific prior written permission.
+
+The copyright holders provide no reassurances that the source code
+provided does not infringe any patent, copyright, or any other
+intellectual property rights of third parties. The copyright holders
+disclaim any liability to any recipient for claims brought against
+recipient by any third party for infringement of that parties
+intellectual property rights.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+===========================================================================
+
+The best way to report bugs, send comments, or ask questions is to
+sign up on the user's and/or developer's mailing list (for user-level
+and developer-level questions; when in doubt, send to the user's
+list):
+
+ users@open-mpi.org
+ devel@open-mpi.org
+
+Because of spam, only subscribers are allowed to post to these lists
+(ensure that you subscribe with and post from exactly the same e-mail
+address -- joe@example.com is considered different than
+joe@mycomputer.example.com!). Visit these pages to subscribe to the
+lists:
+
+ http://www.open-mpi.org/mailman/listinfo.cgi/users
+ http://www.open-mpi.org/mailman/listinfo.cgi/devel
+
+Thanks for your time.
+
+===========================================================================
+
+OFED-Specific Release Notes
+---------------------------
+
+SLES 10 with non-gcc compiler support:
+
+The Open MPI v1.1.1-1 SRPM included in OFED v1.1 will not build with
+non-gcc compilers on SLES 10 because SLES's rpmbuild command inserts
+"-D_FORTIFY_SOURCE=2" into the build flags, which -- for lack of a
+longer, boring explanation -- doesn't yet work with non-gcc compilers.
+
+Open MPI can be built from source to work around this problem. The
+source code for Open MPI can be extracted from the SRPM shipped with
+OFED or downloaded from the main Open MPI web site:
+http://www.open-mpi.org/.
+
+To compile Open MPI from source with OFED support, fully install
+the rest of OFED and then configure Open MPI with the
+"--with-openib=/usr/local/ofed" command line option. See the rest of
+the documentation below for other configure command line options and
+installation instructions.
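+
+For example, assuming OFED was installed under its default prefix (the
+--prefix value below is a placeholder to adapt, and "..." stands for any
+further options you need):
+
+    shell$ ./configure --with-openib=/usr/local/ofed \
+               --prefix=<Open MPI install dir> ...
+    shell$ make all install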
+
+Intel compiler support:
+
+Some versions of the Intel 9.1 C++ compiler suite series produce
+incorrect code when used with the Open MPI C++ bindings. Symptoms of
+this problem include crashing applications (e.g., segmentation
+violations) and Open MPI producing errors about incorrect parameters.
+
+===========================================================================
+
+General Release Notes
+---------------------
+
+The following abbreviated list of release notes applies to this code
+base as of this writing (17 Jun 2006):
+
+- Open MPI includes support for a wide variety of supplemental
+ hardware and software packages. When configuring Open MPI, you may
+ need to supply additional flags to the "configure" script in order
+ to tell Open MPI where the header files, libraries, and any other
+ required files are located. As such, running "configure" by itself
+ may not include support for all the devices (etc.) that you expect,
+ especially if their support headers / libraries are installed in
+ non-standard locations. Network interconnects are an easy example
+ to discuss -- Myrinet and Infiniband, for example, both have
+ supplemental headers and libraries that must be found before Open
+ MPI can build support for them. You must specify where these files
+ are with the appropriate options to configure. See the listing of
+ configure command-line switches, below, for more details.
+
+- The Open MPI installation must be in your PATH on all nodes (and
+ potentially LD_LIBRARY_PATH, if libmpi is a shared library).
+
+- LAM/MPI-like mpirun notation of "C" and "N" is not yet supported.
+
+- Striping MPI messages across multiple networks is supported (and
+ happens automatically when multiple networks are available), but
+ needs performance tuning.
+
+- The run-time systems that are currently supported are:
+ - rsh / ssh
+ - Recent versions of BProc (e.g., Clustermatic)
+ - PBS Pro, Open PBS, Torque (i.e., anything that supports the TM
+ interface)
+ - SLURM
+ - XGrid
+ - Cray XT-3 / Red Storm
+
+- The majority of Open MPI's documentation is here in this file and on
+ the web site FAQ (http://www.open-mpi.org/). This will eventually
+ be supplemented with cohesive installation and user documentation
+ files.
+
+- Systems that have been tested are:
+ - Linux, 32 bit, with gcc
+ - Linux, 64 bit (x86), with gcc
+ - OS X (10.3), 32 bit, with gcc
+ - OS X (10.4), 32 bit, with gcc
+
+- Other systems have been lightly (but not fully) tested:
+ - Other compilers on Linux, 32 and 64 bit
+ - Other 64 bit platforms (Linux and AIX on PPC64, SPARC)
+
+- Some MCA parameters can be set in a way that renders Open MPI
+ inoperable (see notes about MCA parameters later in this file). In
+ particular, some parameters have required options that must be
+ included.
+ - If specified, the "btl" parameter must include the "self"
+ component, or Open MPI will not be able to deliver messages to the
+ same rank as the sender. For example: "mpirun --mca btl tcp,self
+ ..."
+ - If specified, the "btl_tcp_if_exclude" parameter must include the
+ loopback device ("lo" on many Linux platforms), or Open MPI will
+ not be able to route MPI messages using the TCP BTL. For example:
+ "mpirun --mca btl_tcp_if_exclude lo,eth1 ..."
+
+- Building shared libraries on AIX with the xlc compilers is only
+ supported if you supply the following command line option to
+ configure: LDFLAGS=-Wl,-brtl.
+
+- Open MPI does not support the Sparc v8 CPU target, which is the
+ default on Sun Solaris. The v8plus (32 bit) or v9 (64 bit)
+ targets must be used to build Open MPI on Solaris. This can be
+ done by including a flag in CFLAGS, CXXFLAGS, FFLAGS, and FCFLAGS,
+ -xarch=v8plus for the Sun compilers, -mv8plus for GCC.
+
+- At least some versions of the Intel 8.1 compiler seg fault while
+ compiling certain Open MPI source code files. As such, it is not
+ supported.
+
+- The Intel 9.0 v20051201 compiler on IA64 platforms seems to have a
+ problem with optimizing the ptmalloc2 memory manager component (the
+ generated code will segv). As such, the ptmalloc2 component will
+ automatically disable itself if it detects that it is on this
+ platform/compiler combination. The only effect that this should
+ have is that the MCA parameter mpi_leave_pinned will be inoperative.
+
+- Early versions of the Portland Group 6.0 compiler have problems
+ creating the C++ MPI bindings as a shared library (e.g., v6.0-1).
+ Tests with later versions show that this has been fixed (e.g.,
+ v6.0-5).
+
+- The Portland Group compilers require the "-Msignextend" compiler
+ flag to extend the sign bit when converting from a shorter to longer
+ integer. This is different than other compilers (such as GNU).
+ When compiling Open MPI with the Portland compiler suite, the
+ following flags should be passed to Open MPI's configure script:
+
+ shell$ ./configure CFLAGS=-Msignextend CXXFLAGS=-Msignextend \
+ --with-wrapper-cflags=-Msignextend \
+ --with-wrapper-cxxflags=-Msignextend ...
+
+ This will both compile Open MPI with the proper compile flags and
+ also automatically add "-Msignextend" when the C and C++ MPI wrapper
+ compilers are used to compile user MPI applications.
+
+- Open MPI will build bindings suitable for all common forms of
+ Fortran 77 compiler symbol mangling on platforms that support it
+ (e.g., Linux). On platforms that do not support weak symbols (e.g.,
+ OS X), Open MPI will build Fortran 77 bindings just for the compiler
+ that Open MPI was configured with.
+
+ Hence, on platforms that support it, if you configure Open MPI with
+ a Fortran 77 compiler that uses one symbol mangling scheme, you can
+ successfully compile and link MPI Fortran 77 applications with a
+ Fortran 77 compiler that uses a different symbol mangling scheme.
+
+ NOTE: For platforms that support the multi-Fortran-compiler bindings
+ (i.e., weak symbols are supported), due to limitations in the MPI
+ standard and in Fortran compilers, it is not possible to hide these
+ differences in all cases. Specifically, the following two cases may
+ not be portable between different Fortran compilers:
+
+ 1. The C constants MPI_F_STATUS_IGNORE and MPI_F_STATUSES_IGNORE
+ will only compare properly to Fortran applications that were
+ created with Fortran compilers that use the same
+ name-mangling scheme as the Fortran compiler that Open MPI was
+ configured with.
+
+ 2. Fortran compilers may have different values for the logical
+ .TRUE. constant. As such, any MPI function that uses the Fortran
+ LOGICAL type may only get .TRUE. values back that correspond to
+ the .TRUE. value of the Fortran compiler that Open MPI was
+ configured with. Note that some Fortran compilers allow forcing
+ .TRUE. to be 1 and .FALSE. to be 0. For example, the Portland
+ Group compilers provide the "-Munixlogical" option, and Intel
+ compilers (version >= 8.) provide the "-fpscomp logicals" option.
+
+ You can use the ompi_info command to see the Fortran compiler that
+ Open MPI was configured with.
+
+- The MPI and run-time layers do not free all used memory properly
+ during MPI_FINALIZE.
+
+- Running on nodes with different endian and/or different datatype
+ sizes within a single parallel job is supported starting with Open
+ MPI v1.1. However, Open MPI does not resize data when datatypes
+ differ in size (for example, sending a 4 byte MPI_LONG and receiving
+ an 8 byte MPI_LONG will fail).
+
+- MPI_THREAD_MULTIPLE support is included, but is only lightly tested.
+
+- Asynchronous message passing progress using threads can be turned on
+ with the --enable-progress-threads option to configure.
+ Asynchronous message passing progress is only supported for TCP,
+ shared memory, and Myrinet/GM. Myrinet/GM has only been lightly
+ tested.
+
+- Due to limitations in the Libtool 1.5 series, Fortran 90 MPI
+ bindings support can only be built as a static library. It is
+ expected that Libtool 2.0 (and therefore future releases of Open
+ MPI) will be able to support shared libraries for the Fortran 90
+ bindings.
+
+- The XGrid support is experimental - see the Open MPI FAQ and this
+ post on the Open MPI user's mailing list for more information:
+
+ http://www.open-mpi.org/community/lists/users/2006/01/0539.php
+
+- The MX library limits the maximum message fragment size for both
+ on-node and off-node messages. As of MX v1.0.3, the inter-node
+ maximum fragment size is 32k, and the intra-node maximum fragment
+ size is 16k -- fragments sent larger than these sizes will fail.
+ Open MPI automatically fragments large messages; it currently limits
+ its first fragment size on MX networks to the lower of these two
+ values -- 16k. As such, increasing the value of the MCA parameter
+ named "btl_mx_first_frag_size" larger than 16k may cause failures in
+ some cases (i.e., when using MX to send large messages to processes
+ on the same node); it will cause failures in all cases if it is set
+ above 32k. Note that this only affects the *first* fragment of
+ messages; later fragments do not have this size restriction. The
+ MCA parameter btl_mx_max_send_size can be used to vary the maximum
+ size of subsequent fragments.
+
+- The current version of the Open MPI point-to-point engine does not
+ yet support hardware-level MPI message matching. As such, MPI
+ message matching must be performed in software, artificially
+ increasing latency for short messages on certain networks (such as
+ MX and hardware-supported Portals). Future versions of Open MPI
+ will support hardware matching on networks that provide it, and will
+ eliminate the extra overhead of software MPI message matching where
+ possible.
+
+- The Fortran 90 MPI bindings can now be built in one of three sizes
+ using --with-mpi-f90-size=SIZE (see description below). These sizes
+ reflect the number of MPI functions included in the "mpi" Fortran 90
+ module and therefore which functions will be subject to strict type
+ checking. All functions not included in the Fortran 90 module can
+ still be invoked from F90 applications, but will fall back to
+ Fortran-77 style checking (i.e., little/none).
+
+ - trivial: Only includes F90-specific functions from MPI-2. This
+ means overloaded versions of MPI_SIZEOF for all the MPI-supported
+ F90 intrinsic types.
+
+ - small (default): All the functions in "trivial" plus all MPI
+ functions that take no choice buffers (meaning buffers that are
+ specified by the user and are of type (void*) in the C bindings --
+ generally buffers specified for message passing). Hence,
+ functions like MPI_COMM_RANK are included, but functions like
+ MPI_SEND are not.
+
+ - medium: All the functions in "small" plus all MPI functions that
+ take one choice buffer (e.g., MPI_SEND, MPI_RECV, ...). All
+ one-choice-buffer functions have overloaded variants for each of
+ the MPI-supported Fortran intrinsic types up to the number of
+ dimensions specified by --with-f90-max-array-dim (default value is
+ 4).
+
+ Increasing the size of the F90 module (in order: trivial, small,
+ medium) will generally increase the length of time required to
+ compile user MPI applications. Specifically, "trivial"- and
+ "small"-sized F90 modules generally allow user MPI applications to
+ be compiled fairly quickly but lose type safety for all MPI
+ functions with choice buffers. "medium"-sized F90 modules generally
+ take longer to compile user applications but provide greater type
+ safety for MPI functions.
+
+ Note that MPI functions with two choice buffers (e.g., MPI_GATHER)
+ are not currently included in Open MPI's F90 interface. Calls to
+ these functions will automatically fall through to Open MPI's F77
+ interface. A "large" size that includes the two choice buffer MPI
+ functions is possible in future versions of Open MPI.
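+
+The MX first-fragment limits described in the notes above can be
+sketched as a small shell check. This is only an illustration: the
+helper name check_first_frag is hypothetical, and the byte values
+assume that "16k" and "32k" mean 16384 and 32768 bytes.

```shell
# Limits quoted in the notes for MX v1.0.3, assuming "k" means 1024 bytes.
MX_INTRA_NODE_MAX=16384   # 16k, on-node (intra-node) fragments
MX_INTER_NODE_MAX=32768   # 32k, off-node (inter-node) fragments

# Hypothetical helper: classify a candidate btl_mx_first_frag_size value.
check_first_frag() {
    if [ "$1" -le "$MX_INTRA_NODE_MAX" ]; then
        echo "ok"
    elif [ "$1" -le "$MX_INTER_NODE_MAX" ]; then
        echo "may fail for on-node messages"
    else
        echo "will fail"
    fi
}

check_first_frag 16384
check_first_frag 24576
check_first_frag 65536
```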
+
+===========================================================================
+
+Building Open MPI
+-----------------
+
+Open MPI uses a traditional configure script paired with "make" to
+build. Typical installs can be of the pattern:
+
+---------------------------------------------------------------------------
+shell$ ./configure [...options...]
+shell$ make all install
+---------------------------------------------------------------------------
+
+There are many available configure options (see "./configure --help"
+for a full list); a summary of the more commonly used ones follows:
+
+--prefix=<directory>
+ Install Open MPI into the base directory named <directory>. Hence,
+ Open MPI will place its executables in <directory>/bin, its header
+ files in <directory>/include, its libraries in <directory>/lib, etc.
+
+--with-gm=<directory>
+ Specify the directory where the GM libraries and header files are
+ located. This enables GM support in Open MPI.
+
+--with-gm-libdir=<directory>
+ Look in directory for the GM libraries. By default, Open MPI will
+ look in <gm directory>/lib and <gm directory>/lib64, which covers
+ most cases. This option is only needed for special configurations.
+
+--with-mx=<directory>
+ Specify the directory where the MX libraries and header files are
+ located. This enables MX support in Open MPI.
+
+--with-mx-libdir=<directory>
+ Look in directory for the MX libraries. By default, Open MPI will
+ look in <mx directory>/lib and <mx directory>/lib64, which covers
+ most cases. This option is only needed for special configurations.
+
+--with-mvapi=<directory>
+ Specify the directory where the mVAPI libraries and header files are
+ located. This enables mVAPI support in Open MPI.
+
+--with-mvapi-libdir=<directory>
+ Look in directory for the MVAPI libraries. By default, Open MPI will
+ look in <mvapi directory>/lib and <mvapi directory>/lib64, which covers
+ most cases. This option is only needed for special configurations.
+
+--with-openib=<directory>
+ Specify the directory where the Open Fabrics (previously known as
+ OpenIB) libraries and header files are located. This enables Open
+ Fabrics support in Open MPI.
+
+--with-openib-libdir=<directory>
+ Look in directory for the OPENIB libraries. By default, Open MPI will
+ look in <openib directory>/lib and <openib directory>/lib64, which covers
+ most cases. This option is only needed for special configurations.
+
+--with-tm=<directory>
+ Specify the directory where the TM libraries and header files are
+ located. This enables PBS / Torque support in Open MPI.
+
+--with-mpi-param-check(=value)
+ "value" can be one of: always, never, runtime. If
+ --with-mpi-param-check is not specified, "runtime" is the default.
+ If --with-mpi-param-check is specified with no value, "always" is
+ used. Using --without-mpi-param-check is equivalent to "never".
+
+ - always: the parameters of MPI functions are always checked for
+ errors
+ - never: the parameters of MPI functions are never checked for
+ errors
+ - runtime: whether the parameters of MPI functions are checked
+ depends on the value of the MCA parameter mpi_param_check
+ (default: yes).
+
+--with-threads=value
+ Since thread support (both support for MPI_THREAD_MULTIPLE and
+ asynchronous progress) is only partially tested, it is disabled by
+ default. To enable threading, use "--with-threads=posix". This is
+ most useful when combined with --enable-mpi-threads and/or
+ --enable-progress-threads.
+
+--enable-mpi-threads
+ Allows the MPI thread level MPI_THREAD_MULTIPLE. See
+ --with-threads; this is currently disabled by default.
+
+--enable-progress-threads
+ Allows asynchronous progress in some transports. See
+ --with-threads; this is currently disabled by default.
+
+--disable-mpi-cxx
+ Disable building the C++ MPI bindings. Note that this does *not*
+ disable the C++ checks during configure; some of Open MPI's tools
+ are written in C++ and therefore require a C++ compiler to be built.
+
+--disable-mpi-f77
+ Disable building the Fortran 77 MPI bindings.
+
+--disable-mpi-f90
+ Disable building the Fortran 90 MPI bindings. Also related to the
+ --with-f90-max-array-dim and --with-mpi-f90-size options.
+
+--with-mpi-f90-size=<SIZE>
+ Three sizes of the MPI F90 module can be built: trivial (only a
+ handful of MPI-2 F90-specific functions are included in the F90
+ module), small (trivial + all MPI functions that take no choice
+ buffers), and medium (small + all MPI functions that take 1 choice
+ buffer). This parameter is only used if the F90 bindings are
+ enabled.
+
+--with-f90-max-array-dim=<DIM>
+ The F90 MPI bindings are strictly typed, even including the number of
+ dimensions for arrays for MPI choice buffer parameters. Open MPI
+ generates these bindings at compile time with a maximum number of
+ dimensions as specified by this parameter. The default value is 4.
+
+--disable-shared
+ By default, libmpi is built as a shared library, and all components
+ are built as dynamic shared objects (DSOs). This switch disables
+ this default; it is really only useful when used with
+ --enable-static. Specifically, this option does *not* imply
+ --enable-static; enabling static libraries and disabling shared
+ libraries are two independent options.
+
+--enable-static
+ Build libmpi as a static library, and statically link in all
+ components. Note that this option does *not* imply
+ --disable-shared; enabling static libraries and disabling shared
+ libraries are two independent options.
+
+There are several other options available -- see "./configure --help".
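+
+As a rough sketch, several of the options above might be combined as
+follows. The two directories are hypothetical placeholders, and the
+snippet only prints the resulting command so that it can be reviewed
+before it is run by hand.

```shell
# Hypothetical locations; adjust both for the local system.
prefix=/opt/openmpi
openib_dir=/usr/local/ofed

# Assemble a configure command from options described above.
cmd="./configure --prefix=$prefix --with-openib=$openib_dir"
cmd="$cmd --with-threads=posix --enable-mpi-threads"

# Print the command for review instead of executing it.
echo "$cmd"
```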
+
+Changing the compilers that Open MPI uses to build itself uses the
+standard Autoconf mechanism of setting special environment variables
+either before invoking configure or on the configure command line.
+The following environment variables are recognized by configure:
+
+CC - C compiler to use
+CFLAGS - Compile flags to pass to the C compiler
+CPPFLAGS - Preprocessor flags to pass to the C compiler
+
+CXX - C++ compiler to use
+CXXFLAGS - Compile flags to pass to the C++ compiler
+CXXCPPFLAGS - Preprocessor flags to pass to the C++ compiler
+
+F77 - Fortran 77 compiler to use
+FFLAGS - Compile flags to pass to the Fortran 77 compiler
+
+FC - Fortran 90 compiler to use
+FCFLAGS - Compile flags to pass to the Fortran 90 compiler
+
+LDFLAGS - Linker flags to pass to all compilers
+LIBS - Libraries to pass to all compilers (users rarely need to
+ specify additional LIBS)
+
+For example:
+
+shell$ ./configure CC=mycc CXX=myc++ F77=myf77 FC=myf90 ...
+
+It is required that the compilers specified be compile and link
+compatible, meaning that object files created by one compiler must be
+able to be linked with object files from the other compilers and
+produce correctly functioning executables.
+
+Open MPI supports all the "make" targets that are provided by GNU
+Automake, such as:
+
+all - build the entire Open MPI package
+install - install Open MPI
+uninstall - remove all traces of Open MPI from the $prefix
+clean - clean out the build tree
+
+Once Open MPI has been built and installed, it is safe to run "make
+clean" and/or remove the entire build tree.
+
+VPATH builds are fully supported.
+
+Generally speaking, the only thing that users need to do to use Open
+MPI is ensure that <prefix>/bin is in their PATH and <prefix>/lib is
+in their LD_LIBRARY_PATH. Users may need to set PATH and
+LD_LIBRARY_PATH in their shell setup files (e.g., .bashrc, .cshrc)
+so that rsh/ssh-based logins will be able to find the Open MPI
+executables.
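+
+For Bourne-style shells, the setup described above might look like
+the following lines in a shell setup file. The prefix /opt/openmpi is
+only an assumed example; use the value that was passed to --prefix.

```shell
# Assumed install prefix (hypothetical); match your --prefix value.
prefix=/opt/openmpi

# Put the Open MPI executables and libraries first in the search paths.
PATH="$prefix/bin:$PATH"
# Append the old LD_LIBRARY_PATH only if it was already set and non-empty.
LD_LIBRARY_PATH="$prefix/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
export PATH LD_LIBRARY_PATH
```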
+
+===========================================================================
+
+Checking Your Open MPI Installation
+-----------------------------------
+
+The "ompi_info" command can be used to check the status of your Open
+MPI installation (located in <prefix>/bin/ompi_info). Running it with
+no arguments provides a summary of information about your Open MPI
+installation.
+
+Note that the ompi_info command is extremely helpful in determining
+which components are installed as well as listing all the run-time
+settable parameters that are available in each component (as well as
+their default values).
+
+The following options may be helpful:
+
+--all Show a *lot* of information about your Open MPI
+ installation.
+--parsable Display all the information in an easily
+ grep/cut/awk/sed-able format.
+--param <framework> <component>
+ A <framework> of "all" and a <component> of "all" will
+ show all parameters to all components. Otherwise, the
+ parameters of all the components in a specific framework,
+ or just the parameters of a specific component can be
+ displayed by using an appropriate <framework> and/or
+ <component> name.
+
+Changing the values of these parameters is explained in the "The
+Modular Component Architecture (MCA)" section, below.
+
+===========================================================================
+
+Compiling Open MPI Applications
+-------------------------------
+
+Open MPI provides "wrapper" compilers that should be used for
+compiling MPI applications:
+
+C: mpicc
+C++: mpiCC (or mpic++ if your filesystem is case-insensitive)
+Fortran 77: mpif77
+Fortran 90: mpif90
+
+For example:
+
+shell$ mpicc hello_world_mpi.c -o hello_world_mpi -g
+shell$
+
+All the wrapper compilers do is add a variety of compiler and linker
+flags to the command line and then invoke a back-end compiler. To be
+specific: the wrapper compilers do not parse source code at all; they
+are solely command-line manipulators, and have nothing to do with the
+actual compilation or linking of programs. The end result is an MPI
+executable that is properly linked to all the relevant libraries.
+
+===========================================================================
+
+Running Open MPI Applications
+-----------------------------
+
+Open MPI supports both mpirun and mpiexec (they are exactly
+equivalent). For example:
+
+shell$ mpirun -np 2 hello_world_mpi
+
+or
+
+shell$ mpiexec -np 1 hello_world_mpi : -np 1 hello_world_mpi
+
+are equivalent. Some of mpiexec's switches (such as -host and -arch)
+are not yet functional, although they will not error if you try to use
+them.
+
+The rsh starter accepts a -hostfile parameter (the option
+"-machinefile" is equivalent) that names a standard mpirun-style
+hostfile (one hostname per line):
+
+shell$ mpirun -hostfile my_hostfile -np 2 hello_world_mpi
+
+If you intend to run more than one process on a node, the hostfile can
+use the "slots" attribute. If "slots" is not specified, a count of 1
+is assumed. For example, using the following hostfile:
+
+---------------------------------------------------------------------------
+node1.example.com
+node2.example.com
+node3.example.com slots=2
+node4.example.com slots=4
+---------------------------------------------------------------------------
+
+shell$ mpirun -hostfile my_hostfile -np 8 hello_world_mpi
+
+will launch MPI_COMM_WORLD rank 0 on node1, rank 1 on node2, ranks 2
+and 3 on node3, and ranks 4 through 7 on node4.
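+
+The placement described above can be modeled with a few lines of
+shell and awk. This is only a sketch of the slot-filling rule as it
+is stated here, not Open MPI's actual mapping code; the function name
+show_placement is hypothetical.

```shell
# Recreate the example hostfile from the text.
cat > my_hostfile <<'EOF'
node1.example.com
node2.example.com
node3.example.com slots=2
node4.example.com slots=4
EOF

# Hypothetical helper: fill each host's slots in order, one rank per slot.
show_placement() {
    awk '{
        slots = 1                       # default when "slots" is absent
        for (i = 2; i <= NF; i++)
            if ($i ~ /^slots=/) { split($i, kv, "="); slots = kv[2] + 0 }
        for (s = 0; s < slots; s++)
            printf "rank %d -> %s\n", rank++, $1
    }' "$1"
}

show_placement my_hostfile
```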
+
+Other starters, such as the batch scheduling environments, do not
+require hostfiles (and will ignore the hostfile if it is supplied).
+
+Note that the values of component parameters can be changed on the
+mpirun / mpiexec command line. This is explained in the section
+below, "The Modular Component Architecture (MCA)".
+
+===========================================================================
+
+The Modular Component Architecture (MCA)
+----------------------------------------
+
+The MCA is the backbone of Open MPI -- most services and functionality
+are implemented through MCA components. Here is a list of all the
+component frameworks in Open MPI:
+
+---------------------------------------------------------------------------
+MPI component frameworks:
+-------------------------
+
+allocator - Memory allocator
+bml - BTL management layer
+btl - MPI point-to-point byte transfer layer
+coll - MPI collective algorithms
+io - MPI-2 I/O
+mpool - Memory pooling
+pml - MPI point-to-point management layer
+ptl - (Outdated / deprecated) MPI point-to-point transport layer
+rcache - Memory registration cache
+topo - MPI topology routines
+
+Back-end run-time environment component frameworks:
+---------------------------------------------------
+
+errmgr - RTE error manager
+gpr - General purpose registry
+iof - I/O forwarding
+ns - Name server
+oob - Out of band messaging
+pls - Process launch system
+ras - Resource allocation system
+rds - Resource discovery system
+rmaps - Resource mapping system
+rmgr - Resource manager
+rml - RTE message layer
+schema - Name schemas
+sds - Startup / discovery service
+soh - State of health monitor
+
+Miscellaneous frameworks:
+-------------------------
+
+maffinity - Memory affinity
+memory - Memory subsystem hooks
+paffinity - Processor affinity
+timer - High-resolution timers
+
+---------------------------------------------------------------------------
+
+Each framework typically has one or more components that are used at
+run-time. For example, the btl framework is used by MPI to send bytes
+across underlying networks. The tcp btl, for example, sends messages
+across TCP-based networks; the gm btl sends messages across GM
+Myrinet-based networks.
+
+Each component typically has some tunable parameters that can be
+changed at run-time. Use the ompi_info command to check a component
+to see what its tunable parameters are. For example:
+
+shell$ ompi_info --param btl tcp
+
+shows all the parameters (and default values) for the tcp btl
+component.
+
+These values can be overridden at run-time in several ways. At
+run-time, the following locations are examined (in order) for new
+values of parameters:
+
+1. <prefix>/etc/openmpi-mca-params.conf
+
+ This file is intended to set any system-wide default MCA parameter
+ values -- it will apply, by default, to all users who use this Open
+ MPI installation. The default file that is installed contains many
+ comments explaining its format.
+
+2. $HOME/.openmpi/mca-params.conf
+
+ If this file exists, it should be in the same format as
+ <prefix>/etc/openmpi-mca-params.conf. It is intended to provide
+ per-user default parameter values.
+
+3. environment variables of the form OMPI_MCA_<name> set equal to a
+ <value>
+
+ Where <name> is the name of the parameter. For example, set the
+ variable named OMPI_MCA_btl_tcp_frag_size to the value 65536
+ (Bourne-style shells):
+
+ shell$ OMPI_MCA_btl_tcp_frag_size=65536
+ shell$ export OMPI_MCA_btl_tcp_frag_size
+
+4. the mpirun command line: --mca <name> <value>
+
+ Where <name> is the name of the parameter. For example:
+
+ shell$ mpirun --mca btl_tcp_frag_size 65536 -np 2 hello_world_mpi
+
+These locations are checked in order, and a value found in a later
+location overrides one found in an earlier location. For example, a
+parameter value passed on the mpirun command line will override an
+environment variable; an environment variable will override the
+system-wide defaults.
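+
+A minimal sketch of the environment-variable method, combined with
+the earlier note that a user-specified "btl" list must include
+"self". The helper ensure_self is hypothetical and is not part of
+Open MPI.

```shell
# Hypothetical helper: append "self" to a comma-separated BTL list
# unless it is already present.
ensure_self() {
    case ",$1," in
        *,self,*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$1,self" ;;
    esac
}

# Set the "btl" MCA parameter through the environment.
OMPI_MCA_btl=$(ensure_self "tcp")
export OMPI_MCA_btl

# List the MCA overrides that child processes will inherit.
env | grep '^OMPI_MCA_'
```

+A value given with --mca on the mpirun command line would still take
+precedence over this environment setting.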
+
+===========================================================================
+
+Common Questions
+----------------
+
+Many common questions about building and using Open MPI are answered
+on the FAQ:
+
+ http://www.open-mpi.org/faq/
+
+===========================================================================
+
+Got more questions?
+-------------------
+
+Found a bug? Got a question? Want to make a suggestion? Want to
+contribute to Open MPI? Please let us know!
+
+User-level questions and comments should generally be sent to the
+user's mailing list (users@open-mpi.org). Because of spam, only
+subscribers are allowed to post to this list (ensure that you
+subscribe with and post from *exactly* the same e-mail address --
+joe@example.com is considered different than
+joe@mycomputer.example.com!). Visit this page to subscribe to the
+user's list:
+
+ http://www.open-mpi.org/mailman/listinfo.cgi/users
+
+Developer-level bug reports, questions, and comments should generally
+be sent to the developer's mailing list (devel@open-mpi.org). Please
+do not post the same question to both lists. As with the user's list,
+only subscribers are allowed to post to the developer's list. Visit
+the following web page to subscribe:
+
+ http://www.open-mpi.org/mailman/listinfo.cgi/devel
+
+When submitting bug reports to either list, be sure to include the
+following information in your mail (please compress!):
+
+- the stdout and stderr from Open MPI's configure
+- the top-level config.log file
+- the stdout and stderr from building Open MPI
+- the output from "ompi_info --all" (if possible)
+
+For Bourne-type shells, here's one way to capture this information:
+
+shell$ ./configure ... 2>&1 | tee config.out
+[...lots of configure output...]
+shell$ make 2>&1 | tee make.out
+[...lots of make output...]
+shell$ mkdir ompi-output
+shell$ cp config.out config.log make.out ompi-output
+shell$ ompi_info --all 2>&1 | tee ompi-output/ompi-info.out
+shell$ tar cvf ompi-output.tar ompi-output
+[...output from tar...]
+shell$ gzip ompi-output.tar
+
+For C shell-type shells, the procedure is only slightly different:
+
+shell% ./configure ... |& tee config.out
+[...lots of configure output...]
+shell% make |& tee make.out
+[...lots of make output...]
+shell% mkdir ompi-output
+shell% cp config.out config.log make.out ompi-output
+shell% ompi_info --all |& tee ompi-output/ompi-info.out
+shell% tar cvf ompi-output.tar ompi-output
+[...output from tar...]
+shell% gzip ompi-output.tar
+
+In either case, attach the resulting ompi-output.tar.gz file to your
+mail. This provides the Open MPI developers with a lot of information
+about your installation and can greatly assist us in helping with your
+problem.
+
+Be sure to also include any other useful files (in the
+ompi-output.tar.gz tarball), such as output showing specific errors.
--- /dev/null
+ OpenSM Release Notes 2.0.5
+ ============================
+
+Version: OpenFabrics Enterprise Distribution (OFED) 1.1
+Repo: https://openib.org/svn/gen2/branches/1.1/src/userspace/management/osm
+Version: 9535 (openib-2.0.5)
+Date: October 2006
+
+1 Overview
+----------
+This document describes the contents of the OpenSM OFED 1.1 release.
+OpenSM is an InfiniBand compliant Subnet Manager and Administration,
+and runs on top of OpenIB. The OpenSM version for this release
+is openib-2.0.5.
+
+This document includes the following sections:
+1 This Overview section (describing new features and software
+ dependencies)
+2 Known Issues And Limitations
+3 Unsupported IB compliance statements
+4 Major Bug Fixes
+5 Main Verification Flows
+6 Qualified software stacks and devices
+
+1.1 Major New Features
+
+* Partition manager:
+ The partition manager provides a means to set up multiple partitions
+ by providing a partition policy file. For details please read the
+ doc/partition-config.txt or the opensm man page.
+
+* Basic QoS Manager:
+ Provides a uniform configuration of the entire fabric with values defined
+ in the OpenSM options file. The options support different settings for
+ CAs, Switches, and Routers. Note that this is disabled by default and
+ using -Q enables QoS fabric setup.
+
+* Loading pre-routes from a file:
+ A new routing module enables loading pre-routes from a file.
+ To use this option you should use the command line options:
+ "-R file -U <your routing file>" or
+ "--routing_engine file --ucast_file <your routing file>"
+ For more information refer to the file doc/modular-routing.txt
+ or the opensm man page.
+
+* SA MultiPathRecord support:
+ The SA can now handle requests for multiple PathRecords in one query.
+ This includes methods SA GetMulti/GetMultiResp and dual sided RMPP.
+
+* PPC64 is now QAed and supported
+
+* Support LMC > 0 for Switch Enhanced Port 0:
+ Allows enhanced switch port 0 (ESP0) to have a non zero
+ LMC. Use the configured subnet wide LMC for this. Modifications were
+ necessary to the LID assignment and routing to support this.
+ Also, added an option to the configuration to use LMC configured for
+ subnet for enhanced switch port 0 or set it to 0 even if a non zero
+ LMC is configured for the subnet. The default is currently the
+ latter option. The new configuration option is: lmc_esp0
+
+1.2 Minor New Features
+
+* IPoIB broadcast group configuration:
+ It is now possible to control the IPoIB broadcast group parameters
+ (MTU, rate, SL) through the partitions configuration file.
+
+* Limiting OpenSM log file size:
+ By providing the command line option: "-L <size in MB>" or
+ "--log_limit <size in MB>" the user can limit the generated log
+ file size. When specified, the log file will be truncated upon reaching
+ this limit.
+
+* Favor 1K MTU for Tavor (MT23108) HCA
+ In cases where a PathRecord or MultiPathRecord is queried and the
+ requestor either does not specify the MTU or specifies it in a way
+ that allows the MTU to be 1K, and one of the path ends is a Tavor,
+ the MTU is limited to 1K max.
+
+* Man pages:
+ Added opensm.8 and osmtest.8
+
+* Leaf VL stall count control:
+ A new parameter (leaf_vl_stall_count) for controlling the number of
+ sequential packets dropped on a switch port driving a HCA/TCA/Router
+ that cause the port to enter the VLStalled state was added to the
+ options file.
+
+* SM Polling/Handover defaults changed
+ The default SMInfo polling retries was decreased from 18 to 4
+ which reduces the default handover time from 3 min to 40 seconds.
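+
+The handover figures above work out as follows. The 10 second polling
+interval is an assumption inferred from those figures (180 seconds
+over 18 retries); it is not stated in these notes.

```shell
# Assumed SMInfo polling interval in seconds, inferred from the
# figures in the text (180 s / 18 retries = 10 s).
poll_interval=10

old_retries=18
new_retries=4

echo "old handover time: $((old_retries * poll_interval)) s"
echo "new handover time: $((new_retries * poll_interval)) s"
```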
+
+1.3 Library API Changes
+
+* cl_mem* APIs deprecated in complib:
+ These functions are now considered deprecated and should be
+ replaced by direct calls to malloc, free, memset, etc.
+
+* osm_log_init_v2 API added in libopensm:
+ Supports providing the new option for log file truncation.
+
+1.4 Software Dependencies
+
+OpenSM depends on the installation of either OFED 1.1, OFED 1.0,
+OpenIB gen2 (e.g. IBG2 distribution), OpenIB gen1 (e.g. IBGD
+distribution), or Mellanox VAPI stacks. The qualified driver versions
+are provided in Table 2, "Qualified IB Stacks".
+
+1.5 Supported Devices Firmware
+
+The main task of OpenSM is to initialize InfiniBand devices. The
+qualified devices and their corresponding firmware versions
+are listed in Table 3.
+
+2 Known Issues And Limitations
+------------------------------
+
+* No Service / Key associations:
+ There is no way to manage Service access by Keys.
+
+* No SM to SM SMDB synchronization:
+ Puts the burden of re-registering services, multicast groups, and
+ inform-info on the client application (or IB access layer core).
+
+* No "port down" event handling:
+ Changing the switch port through which OpenSM connects to the IB
+ fabric may cause incorrect operation. Please restart OpenSM whenever
+ such a connectivity change is made.
+
+* Changing connections during SM operation:
+ Under some conditions the SM can get confused by a change in
+ cabling (moving a cable from one switch port to the other) and
+ momentarily see this as having the same GUID appear connected
+ to two different IB ports. Under some conditions, when the SM fails to
+ get the corresponding change event it might mistakenly report this case
+ as a "duplicated GUID" case and abort. It is advisable to double-check
+ the syslog after each such change in connectivity and restart
+ OpenSM if it has exited.
+
+3 Unsupported IB Compliance Statements
+--------------------------------------
+The following section lists all the IB compliance statements which
+OpenSM does not support. Please refer to the IB specification for detailed
+information regarding each compliance statement.
+
+* C14-22 (Authentication):
+ M_Key M_KeyProtectBits and M_KeyLeasePeriod shall be set in one
+ SubnSet method. As a work-around, an OpenSM option is provided for
+ defining the protect bits.
+
+* C14-67 (Authentication):
+ On SubnGet(SMInfo) and SubnSet(SMInfo) - if M_Key is not zero then
+ the SM shall generate a SubnGetResp if the M_Key matches, or
+ silently drop the packet if M_Key does not match.
+
+* C15-0.1.23.4 (Authentication):
+ InformInfoRecords shall always be provided with the QPN set to 0,
+ except for the case of a trusted request, in which case the actual
+ subscriber QPN shall be returned.
+
+* o13-17.1.2 (Event-FWD):
+ If no permission to forward, the subscription should be removed and
+ no further forwarding should occur.
+
+* C14-24.1.1.5 and C14-62.1.1.22 (Initialization):
+ GUIDInfo - SM should enable assigning Port GUIDInfo.
+
+* C14-44 (Initialization):
+ If the SM discovers that it is missing an M_Key to update CA/RT/SW,
+ it should notify the higher level.
+
+* C14-62.1.1.12 (Initialization):
+ PortInfo:M_Key - Set the M_Key to a node based random value.
+
+* C14-62.1.1.13 (Initialization):
+ PortInfo:P_KeyProtectBits - set according to an optional policy.
+
+* C14-62.1.1.24 (Initialization):
+ SwitchInfo:DefaultPort - should be configured for random FDB.
+
+* C14-62.1.1.32 (Initialization):
+ RandomForwardingTable should be configured.
+
+* o15-0.1.12 (Multicast):
+ If the JoinState is SendOnlyNonMember = 1 (only), then the endport
+ should join as sender only.
+
+* o15-0.1.8 (Multicast):
+ If a request to create an MCG has fields that cannot be met,
+ return ERR_REQ_INVALID (currently ignores SL and FlowLabelTClass).
+
+* C15-0.1.8.6 (SA-Query):
+ Respond to SubnAdmGetTraceTable - this is an optional attribute.
+
+* C15-0.1.13 (Services):
+ Reject ServiceRecord create, modify or delete if the given
+ ServiceP_Key does not match the one included in the ServiceGID port
+ and the port that sent the request.
+
+* C15-0.1.14 (Services):
+ Provide means to associate service name and ServiceKeys.
+
+4 Major Bug Fixes
+-----------------
+
+The following is a list of bugs that were fixed. Note that other less critical
+or visible bugs were also fixed.
+
+* "Broken" fabric (duplicated port GUIDs) handling improved
+ Replace assert with a real check to handle invalid physical port
+ in osm_node_info_rcv.c which could occur on a broken fabric
+
+* SA client synchronous request failed but status returned was IB_SUCCESS
+ even if there was no response.
+ There was a missing setting of the status in the synchronous case.
+
+* Memory leak fixes:
+ 1. In libvendor/osm_vendor_ibumad.c:osm_vendor_get_all_port_attr
+ 2. In libvendor/osm_vendor_ibumad_sa.c:__osmv_sa_mad_rcv_cb
+ 3. On receiving SMInfo SA request from a node that does not share a
+ partition, the response mad was allocated but never free'd
+ as it was never sent.
+
+* Set(InformInfo) OpenSM Deadlock:
+ Occurred when receiving a request with an unknown LID.
+
+* PathRecord to inconsistent multicast destination:
+ Fix the return error when multicast destination is not consistently
+ indicated.
+
+* Remove double calculation of reversible path
+ In osm_sa_path_record.c:__osm_pr_rcv_get_lid_pair_path a PathRecord
+ query used to double check if the path is reversible
+
+* Some PathRecord log messages use "net order":
+ Fix GUID net to host conversion in some osm_log messages
+
+* DR/LID routed SMPs direction bit handling:
+ osm_resp.c:osm_resp_make_resp_smp, set direction bit only if direct
+ routed class. This bug caused two issues:
+ 1. Get/Set responses always had direction bit set.
+ 2. Trap represses never had direction bit set.
+ The direction bit needs setting in direct routed responses and it
+ doesn't exist in LID routed responses.
+ osm_sm_mad_ctrl.c: did not detect the "direction bit" correctly.
+
+* OpenSM crash due to transaction lookup (interop with Cisco stack)
+ When a wire TID that maps to internal TID of zero (after applying
+ mask) was received the lookup of the transaction was successful.
+ The stale transaction pointed to "free'd" memory.
+
+* Better handling for Path/MultiPath requests for raw traffic
+
+* Wrong ProducerType provided in Notice Reports:
+ When formatting an SM generated report, the ProducerType was using
+ CL_NTOH32 which cannot be used to format a 24-bit network order number.
+
+* OpenSM break on PPC64
+ complib: Fixed memory corruption in cl_pool.c:cl_qcpool_init. This
+ affected big endian 64-bit architectures only.
+
+* Illegal Set(InformInfo) was wrongly successful in updating the SMDB
+ osm_sa_informinfo.c: In osm_infr_rcv_process_set_method, if sending
+ error, don't call osm_infr_rcv_process_set_method
+
+* RMPP queries of InformInfoRecord fail
+ ib_types.h: Pad ib_inform_info_record_t to be modulo 8 in size so
+ that attribute offset is calculated properly
+
+* Returning "invalid request" rather than "unsupported method/attribute"
+ In these cases, a noncompliant response was being provided.
+
+* Noncompliant response for SubnAdmGet(PortInfoRecord) with no match
+ osm_pir_rcv_process, now returns "SA no records error" for SubnAdmGet
+ with 0 records found
+
+* Noncompliant non base LID returned by some queries:
+ The following attributes used to return the request LID rather than
+ its base LID in responses: PKeyTableRecord, GUIDInfoRecord,
+ SLtoVLMappingTableRecord, VLArbitrationTableRecord, LinkRecord
+
+* Noncompliant SubnAdmGet and SubnAdmGetTable:
+ Mixing of error codes in case of no records or multiple records
+ fixed for the attributes:
+ LinearForwardingTableRecord, GUIDInfoRecord,
+ VLArbitrationTableRecord, LinkRecord, PathRecord
+
+* segfault in InformInfo flows
+ Occurred under stress with concurrent Set/Delete/Get flows. Fixed by
+ adding a missing lock.
+
+* SA queries containing LID out of range did not return ERR_REQ_INVALID
+
+5 Main Verification Flows
+-------------------------
+
+OpenSM verification is run using the following activities:
+* osmtest - a stand-alone program
+* ibmgtsim (IB management simulator) based - a set of flows that
+ simulate clusters, inject errors and verify OpenSM capability to
+ respond and bring up the network correctly.
+* small cluster regression testing - where the SM is used on back to
+ back or single switch configurations. The regression includes
+ multiple OpenSM dedicated tests.
+* cluster testing - where OpenSM is run to set up a large cluster, perform
+ hand-off, reboots and reconnects, verify routing correctness and SA
+ responsiveness at the ULP level (IPoIB and SDP).
+
+5.1 osmtest
+
+osmtest is an automated verification tool used for OpenSM
+testing. Its verification flows are described in the list below.
+
+* Inventory File: Obtain and verify all port info, node info, link and path
+ records parameters.
+
+* Service Record:
+ - Register new service
+ - Register another service (with a lease period)
+ - Register another service (with service p_key set to zero)
+ - Get all services by name
+ - Delete the first service
+ - Delete the third service
+ - Bad flows: get/delete of a non-valid service
+ - Add / Get same service with different data
+ - Add / Get / Delete by different component mask values (services
+ by Name & Key / Name & Data / Name & Id / Id only )
+
+* Multicast Member Record:
+ - Query of existing Groups (IPoIB)
+ - BAD Join with insufficient comp mask (o15.0.1.3)
+ - Create given MGID=0 (o15.0.1.4)
+ - Create given MGID=0xFF12A01C,FE800000,00000000,12345678 (o15.0.1.4)
+ - Create BAD MGID=0xFA. (o15.0.1.6)
+ - Create BAD MGID=0xFF12A01B w/ link-local not set (o15.0.1.6)
+ - New MGID with invalid join state (o15.0.1.9)
+ - Retry of existing MGID - See JoinState update (o15.0.1.11)
+ - BAD RATE when connecting to existing MGID (o15.0.1.13)
+ - Partial JoinState delete request - removing FullMember (o15.0.1.14)
+ - Full Delete of a group (o15.0.1.14)
+ - Verify Delete by trying to Join deleted group (o15.0.1.14)
+ - BAD Delete of IPoIB membership (no prev join) (o15.0.1.15)
+
+* GUIDInfo Record:
+ - All GUIDInfoRecords in subnet are obtained
+
+* MultiPathRecord:
+ - Perform some compliant and noncompliant MultiPathRecord requests
+ - Validation is via status in responses and IB analyzer
+
+* PKeyTableRecord:
+ - Perform some compliant and noncompliant PKeyTableRecord queries
+ - Validation is via status in responses and IB analyzer
+
+* LinearForwardingTableRecord:
+ - Perform some compliant and noncompliant LinearForwardingTableRecord queries
+ - Validation is via status in responses and IB analyzer
+
+* Event Forwarding: Register for trap forwarding using reports
+ - Send a trap and wait for report
+ - Unregister non-existing
+
+* Trap 64/65 Flow: Register to Trap 64-65, create traps (by
+ disconnecting/connecting ports) and wait for report, then unregister.
+
+* Stress Test: send PortInfoRecord queries, both single and RMPP and
+ check for the rate of responses as well as their validity.
+
+
+5.2 IB Management Simulator OpenSM Test Flows:
+
+The simulator provides the ability to simulate the SM handling of virtual
+topologies that are not limited to actual lab equipment availability.
+OpenSM was simulated to bring up clusters of up to 10,000 nodes. Daily
+regressions use smaller clusters (16 and 128 nodes).
+
+The following test flows are run on the IB management simulator:
+
+* Stability:
+ Up to 12 links from the fabric are randomly selected to drop packets
+ at drop rates up to 90%. The SM is required to succeed in bringing the
+ fabric up. The resulting routing is verified to be correct as well.
+
+* LID Manager:
+ Using LMC = 2 the fabric is initialized with LIDs. Faults such as
+ zero LID, Duplicated LID, non-aligned (to LMC) LIDs are
+ randomly assigned to various nodes and other errors are randomly
+ output to the guid2lid cache file. The SM sweep is run 5 times and
+ after each iteration a complete verification is made to ensure that all
+ LIDs that could possibly be maintained are kept, as well as that all nodes
+ were assigned a legal LID range.
+
+* Multicast Routing:
+ Nodes randomly join the 0xc000 group and eventually the
+ resulting routing is verified for completeness and adherence to
+ Up/Down routing rules.
+
+* osmtest:
+ The complete osmtest flow as described in Section 5.1 is run on
+ the simulated fabrics.
+
+* Stress Test:
+ This flow merges fabric, LID and stability issues with continuous
+ PathRecord, ServiceRecord and Multicast Join/Leave activity to
+ stress the SM/SA during continuous sweeps. InformInfo Set/Delete/Get
+ were added to the test such that both existing and non-existing nodes
+ perform them in random order.
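+
+The LMC-related faults in the LID Manager flow above come down to simple
+arithmetic: with LMC = 2 each port owns 2^2 = 4 consecutive LIDs, and a
+legal base LID is non-zero with its two low-order bits clear. A minimal
+sketch follows; the helper names are hypothetical and are not part of
+OpenSM or ibmgtsim:
+
```shell
#!/bin/sh
# Hypothetical helpers (not part of OpenSM) illustrating LMC alignment.

# Number of LIDs owned by a port for the LMC given as $1: 2^LMC.
lids_per_port() {
    echo $(( 1 << $1 ))
}

# Succeeds (exit status 0) if base LID $1 is legal for LMC $2:
# it must be non-zero and a multiple of 2^LMC.
lid_is_legal() {
    [ "$1" -ne 0 ] && [ $(( $1 & ((1 << $2) - 1) )) -eq 0 ]
}

# Print the LID range covered by base LID $1 with LMC $2.
lid_range() {
    echo "$1-$(( $1 + (1 << $2) - 1 ))"
}
```
+
+For example, "lid_is_legal 6 2" fails because 6 is not a multiple of 4,
+while base LID 8 is legal and "lid_range 8 2" prints 8-11.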
+
+5.3 OpenSM Regression
+
+Using a back-to-back or single switch connection, the following set of
+tests is run nightly on the stacks described in Table 2. The included
+tests are:
+
+* Stress Testing: Flood the SA with queries from multiple channel
+ adapters to check the robustness of the entire stack up to the SA.
+
+* Dynamic Changes: Dynamic topology changes, induced by randomly
+ dropping SMP packets, are used to test OpenSM adaptation to an
+ unstable network and to verify DB correctness.
+
+* Trap Injection: This flow injects traps to the SM and verifies that it
+ handles them gracefully.
+
+* SA Query Test: This test exhaustively checks the SA responses to every
+ possible single-bit component mask. To do that, the test examines the
+ entire set of records the SA can provide, classifies them by their
+ field values and then selects every field (using component mask and a
+ value) and verifies that the response matches the expected set of records.
+ A random selection using multiple component mask bits is also performed.
+
+5.4 Cluster testing:
+
+Cluster testing is usually run before a distribution release. It
+involves real hardware setups of 16 to 32 nodes (or more if a beta site
+is available). Each test is validated by running all-to-all ping through the IB
+interface. The test procedure includes:
+
+* Cluster bringup
+
+* Hand-off between 2 or 3 SMs while performing:
+ - Node reboots
+ - Switch power cycles (disconnecting the SMs)
+
+* Unresponsive port detection and recovery
+
+* osmtest from multiple nodes
+
+* Trap injection and recovery
+
+
+6 Qualification
+---------------
+
+Table 2 - Qualified IB Stacks
+=============================
+
+Stack                                    | Version
+-----------------------------------------|--------------------------
+OFED                                     | 1.1
+OFED                                     | 1.0
+OpenIB Gen2 (IBG2 distribution)          | 1.0
+OpenIB Gen1 (IBGD distribution)          | 1.8.0
+VAPI (Mellanox InfiniBand HCA Driver)    | 3.2 and later
+
+Table 3 - Qualified Devices and Corresponding Firmware
+======================================================
+
+Mellanox
+Device  | FW versions
+--------|-----------------------------------------------------------
+MT43132 | InfiniScale - fw-43132 5.2.0 (and later)
+MT47396 | InfiniScale III - fw-47396 0.5.0 (and later)
+MT23108 | InfiniHost - fw-23108 3.3.2
+MT25204 | InfiniHost III Lx - fw-25204 1.0.1
+MT25208 | InfiniHost III Ex (InfiniHost Mode) - fw-25208 4.6.2 (and later)
+MT25208 | InfiniHost III Ex (MemFree Mode) - fw-25218 5.0.1 (and later)
+
+QLogic/PathScale
+Device  | Note
+--------|-----------------------------------------------------------
+iPath   | QHT6040 (PathScale InfiniPath HT-460)
+iPath   | QHT6140 (PathScale InfiniPath HT-465)
+iPath   | QLE6140 (PathScale InfiniPath PE-880)
+
+Note: OpenSM does not run on an IBM Galaxy (eHCA) as it does not expose
+QP0 and QP1. However, it does support it as a device on the subnet.
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ OSU MPI MVAPICH-0.9.7, Rev 0.9.7-mlx2.2.0 in OFED 1.1 Release Notes
+
+ October 2006
+
+
+Overview
+--------
+These are the release notes for OSU MPI MVAPICH-0.9.7, Rev 0.9.7-mlx2.2.0.
+This is OFED's edition of the OSU MPI MVAPICH-0.9.7 release. OSU MPI is an MPI
+channel implementation over InfiniBand from Ohio State University (OSU)
+(http://nowlab.cse.ohio-state.edu/projects/mpi-iba/).
+
+Software Dependencies
+---------------------
+OSU MPI depends on the installation of the OFED Distribution stack with OpenSM
+running. The MPI module also requires an established network interface (either
+InfiniBand IPoIB or Ethernet).
+
+New Features
+------------
+The mlx2.2.0 module is based on the MVAPICH-0.9.7 (MPI-1 over OpenIB/Gen2) module at
+openib.org gen2. This version for OFED has the following additional features:
+- Message coalescing
+- SRQ flow optimization
+
+Bug Fixes
+---------
+- Affinity support is now enabled by default
+- Multiple fixes in rsh/ssh launcher
+- LD_LIBRARY_PATH fix (MLNX #37387)
+- TotalView scalability fix
+- FastPath is enabled on start for small clusters
+- Fix for correct f90 support (openib bugzilla 191)
+- Fix for comment support in mpirun_rsh (openib bugzilla 143)
+
+Known Issues
+------------
+- A process running MPI cannot fork after MPI_Init. Using fork might cause a
+ segmentation fault.
+- Using mpirun with ssh has a signal collection problem. Killing the run
+ (using CTRL-C) might leave some of the processes running on some of the
+ nodes. This can also happen if one of the processes exits with an error.
+ Note: This problem does not exist with rsh.
+- The MPD job launcher feature of OSU MPI module has not been tested by Mellanox
+ Technologies. See http://nowlab.cse.ohio-state.edu/projects/mpi-iba/ for more
+ details.
+- For users of Mellanox Technologies firmware fw-23108 or fw-25208 only:
+ OSU MPI might fail in its default configuration if your HCA is burnt with an
+ fw-23108 version that is earlier than 3.4.000, or with an fw-25208 version
+ 4.7.400 or earlier.
+ Workaround:
+ Option 1 - Update the firmware
+ Option 2 - In mvapich.conf, set VIADEV_SRQ_ENABLE=0
+- MVAPICH does not run on RHEL4 U3 ppc64
+- MVAPICH may fail to run on some SLES 10 machines due to problems in resolving
+ the host name.
+ Workaround: Edit /etc/hosts and comment-out/remove the line that maps
+ IP address 127.0.0.2 to the system's fully qualified hostname.
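+
+The SLES 10 workaround above can be scripted. The following is a minimal
+sketch and not an official OFED tool; the function name
+comment_out_loopback2 is hypothetical, and it edits whichever file is
+passed to it so that it can be tried on a copy of /etc/hosts first:
+
```shell
#!/bin/sh
# Hypothetical helper: comment out every line that maps 127.0.0.2
# in the hosts file given as $1. All other lines are left untouched.
comment_out_loopback2() {
    hosts_file=$1
    tmp_file=$(mktemp) || return 1
    # Prefix matching lines with '#'. The address must be followed by
    # whitespace so that, for example, 127.0.0.20 is not affected.
    sed 's/^\([[:space:]]*127\.0\.0\.2[[:space:]]\)/#\1/' "$hosts_file" > "$tmp_file" &&
        cat "$tmp_file" > "$hosts_file"
    status=$?
    rm -f "$tmp_file"
    return $status
}

# Example (try it on a copy first):
#   cp /etc/hosts /tmp/hosts.test && comment_out_loopback2 /tmp/hosts.test
```
+
+Writing the result back with "cat" rather than "mv" keeps the ownership
+and permissions of the original file.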
+
+Main Verification Flows
+-----------------------
+In order to verify the correctness of OSU MPI, the following tests and
+parameters were run.
+
+Test                 Description
+===================================================================
+Intel's test suite   1400 Intel tests
+BW/LT                OSU's test for bandwidth and latency
+IMB                  Intel's MPI Benchmark test
+mpitest              b_eff test
+Presta               Presta multicast test
+Linpack              Linpack benchmark
+NAS2.3               NAS NPB2.3 tests
+SuperLU              SuperLU benchmark (NERSC edition)
+NAMD                 NAMD application
+CAM                  CAM application
--- /dev/null
+ Open Fabrics Enterprise Distribution (OFED)
+ SDP in OFED 1.1 Release Notes
+
+ October 2006
+
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. Bug Fixes
+3. Known Issues
+4. Verification Applications/Flows/Tests
+
+===============================================================================
+1. Overview
+===============================================================================
+SDP in OFED is at beta level for OFED 1.1.
+
+
+===============================================================================
+2. Bug Fixes
+===============================================================================
+* SDP now disables timewait on close if the socket has been disconnected
+
+* SDP now reports EPIPE if a packet gets queued after disconnect
+
+* Improved urgent data latency
+
+* Fixed data corruption upon changing the TCP_NODELAY socket option
+
+* Fixed a crash that occurs when a child is disconnected while its parent is
+ being destroyed.
+
+* SDP now recovers from RTU packet loss.
+
+
+===============================================================================
+3. Known Issues
+===============================================================================
+- Each SDP socket currently consumes up to 2 MBytes of memory. If this is
+ too high for your installation, it is possible to trade off performance
+ for lower memory utilization per socket by reducing the value of the
+ "rcvbuf_scale" module parameter (default: 16).
+
+ Note: the minimum legal value for this parameter is 1.
+ At this parameter value, each socket will consume approximately 128 KBytes.
+
+- Small message size performance is low when messages are sent by the client
+ at a rate lower than the rate at which they are consumed by the server,
+ and when TCP_CORK is not set. This is observed, for example, with the iperf
+ benchmark. As a workaround, set the TCP_CORK socket option
+ to ensure data is sent in chunks of at least 32 KBytes.
+
+- Performance is low on 32-bit kernels, as SDP utilizes high memory
+ to ease memory pressure. Moving to a 64-bit kernel solves this
+ problem even if the application remains a 32-bit one.
+
+- By default, SDP utilizes a 2 KByte MTU size. This may cause PCI-X cards
+ using Mellanox Technologies "InfiniHost" HCAs to experience low bandwidth.
+ Workaround: reset the MTU size to 1K in this situation, using either of
+ the two methods below:
+
+ 1. Activate the "tavor quirk" workaround in opensm:
+ a. Create an opensm options cache file (/var/cache/osm/opensm.opts):
+ > opensm --cache-options -o
+ b. Add the following line to /var/cache/osm/opensm.opts:
+ enable_quirks TRUE
+ c. Rerun opensm using your usual command line options to activate
+ the opensm quirk option.
+
+ 2. Activate the "tavor quirk" workaround in cma:
+ set the tavor_quirk module parameter of the rdma_cm module to value 1
+ (default: 0).
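+
+Step 1.b of the opensm method above can be scripted. This is a minimal
+sketch under the assumption that the options file holds one
+"name value" pair per line, as the enable_quirks line suggests; the
+function name enable_opensm_quirks is hypothetical:
+
```shell
#!/bin/sh
# Hypothetical helper: make sure the options file given as $1 contains
# "enable_quirks TRUE". An existing enable_quirks line is rewritten,
# otherwise the line is appended.
enable_opensm_quirks() {
    opts_file=$1
    if grep -q '^enable_quirks[[:space:]]' "$opts_file"; then
        tmp_file=$(mktemp) || return 1
        sed 's/^enable_quirks[[:space:]].*/enable_quirks TRUE/' "$opts_file" > "$tmp_file" &&
            cat "$tmp_file" > "$opts_file"
        status=$?
        rm -f "$tmp_file"
        return $status
    else
        echo 'enable_quirks TRUE' >> "$opts_file"
    fi
}

# Example, using the path from step 1.a (run as root):
#   enable_opensm_quirks /var/cache/osm/opensm.opts
# Method 2 would typically be a module load option instead:
#   modprobe rdma_cm tavor_quirk=1
```
+
+The helper is idempotent, so running it again after opensm regenerates
+the options file does not add a second enable_quirks line.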
+
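+The memory figures quoted for rcvbuf_scale in the first item above are
+consistent with roughly 128 KBytes per unit of the parameter
+(16 x 128 KBytes = 2 MBytes). Assuming that the scaling is linear between
+the two documented points, which these notes do not state explicitly, the
+per-socket upper bound can be estimated as follows (sdp_socket_kbytes is a
+hypothetical helper name):
+
```shell
#!/bin/sh
# Hypothetical helper: estimate the upper bound of memory used per SDP
# socket, in KBytes, for the rcvbuf_scale value given as $1.
# Assumes linear scaling of 128 KBytes per unit of rcvbuf_scale.
sdp_socket_kbytes() {
    if [ "$1" -lt 1 ]; then
        echo "rcvbuf_scale must be at least 1" >&2
        return 1
    fi
    echo $(( $1 * 128 ))
}

sdp_socket_kbytes 16   # default value: prints 2048
sdp_socket_kbytes 1    # minimum legal value: prints 128
```
+
+Under the same assumption, 1000 sockets at rcvbuf_scale = 4 would be
+estimated at about 500 MBytes in total.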
+
+===============================================================================
+4. Verification Applications/Flows/Tests
+===============================================================================
+- ssh/sshd
+- wget/netscape/firefox/apache
+- netpipe
+- netperf
+- LTP socket tests
+- iperf-2.0.2
+- ttcp
+- Threaded and forking echo client server examples
+- Various Java client server applications (SUN:jre, BEA:jrockit/WebLogic, GNU:gij/gcj)
+- Many UNIX utilities to verify that pre-load did not harm the applications
+
+
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ Open Fabrics Enterprise Distribution (OFED)
+ libsdp v. 9382 in OFED 1.1 Release Notes
+
+ October 2006
+
+
+===============================================================================
+Table of Contents
+===============================================================================
+1. Overview
+2. New Features
+3. Bug Fixes
+4. Known Issues
+5. Verification Applications/Flows/Tests
+
+===============================================================================
+1. Overview
+===============================================================================
+This document describes the contents of the libsdp OFED 1.1 release.
+libsdp is an LD_PRELOAD-able library that can be used to migrate existing
+applications to use InfiniBand Sockets Direct Protocol (SDP) instead of
+TCP sockets, transparently and without recompilation. To set up libsdp,
+please follow the instructions below. The libsdp version for this release
+is 1.1.
+
+
+===============================================================================
+2. New Features
+===============================================================================
+* New verbosity level "7" used for reporting connect/accept calls
+ and the address family used. This results in a reasonably short
+ log file that shows which connections used SDP and which ones used TCP.
+
+
+===============================================================================
+3. Bug Fixes
+===============================================================================
+The following bugs were fixed. Note that other less critical
+or visible bugs were also fixed.
+
+* Some applications provide IPv6 address in partial struct (missing the newer
+ scope_id field). The fix avoids libsdp memory corruption in this case.
+
+* Fixed an address conversion bug affecting the loopback address and all
+ IPv4->IPv6 conversions (the extra 0xffff in the IPv6 address was missing).
+
+* listen - had a missing flow for handling implicit bind. This caused the
+ "use both" case to provide two different/unrelated ports for the SDP and
+ TCP ports. Eventually, this caused the SDP port to be unusable (since the
+ client was only obtaining the TCP port number). Fixed by adding a flow similar
+ to the one used by bind with ANY_PORT.
+
+* Several bugs in using getsockname()/getpeername() which prevented the correct
+ address length from being returned when SDP-provided IPv4 addresses had to be
+ converted back to IPv6.
+
+* Fixed memory corruption caused by using struct sockaddr to store IPv6 address.
+ sockaddr_storage is used instead.
+
+* Accept now handles a null output address pointer.
+
+* errno was corrupted by call to is_invalid_addr, reporting false errno to
+ applications.
+
+* Fixed socket leak in the flow for bind(ANY_PORT)
+
+* libsdp no longer logs an error when connect() in async mode returns -1
+ with errno == EINPROGRESS
+
+===============================================================================
+4. Known Issues
+===============================================================================
+* libsdp cannot provide its socket switch functionality for executables
+ statically linked with libc.
+
+* When a server listens on both SDP and TCP, the number of sockets is
+ doubled.
+
+* A rare race still exists when performing bind/listen on ANY_PORT. The race
+ is between applications and has been greatly minimized. A test to reproduce it
+ has not been found yet. The race is between libsdp running the sequence
+ close(fd1) and bind(fd2, port), and another application/thread explicitly
+ trying to bind(fd3, port) to the same port.
+
+ To resolve this race a change in SDP/CMA behavior is required (provide
+ different port number in successive calls to bind (ANY_PORT) and SDP support
+ for "unbind").
+
+
+===============================================================================
+5. Verification Applications/Flows/Tests
+===============================================================================
+See the corresponding section in the SDP release notes above.
+
--- /dev/null
+
+ Open Fabrics Enterprise Distribution (OFED)
+ SRP in OFED 1.1 Release Notes
+
+ October 2006
+
+
+==============================================================================
+Table of contents
+==============================================================================
+
+ 1. Overview
+ 2. Software Dependencies
+ 3. Major Features
+ 4. Loading SRP Initiator
+ 5. Manually Establishing an SRP Connection
+ 6. SRP Tools - ibsrpdm and srp_daemon
+ 7. Automatic Discovery and Connecting to Targets
+ 8. Multiple Connections from Initiator IB Port to the Target
+ 9. High Availability
+ 10. Shutting Down SRP
+ 11. Known Issues
+ 12. Vendor Specific Notes
+
+
+==============================================================================
+1. Overview
+==============================================================================
+
+The SRP standard describes the message format and protocol definitions required
+for transferring commands and data between a SCSI initiator port and a SCSI
+target port using RDMA communication service.
+
+
+==============================================================================
+2. Software Dependencies
+==============================================================================
+
+The SRP Initiator depends on the installation of the OFED Distribution stack
+with OpenSM running.
+
+==============================================================================
+3. Major Features
+==============================================================================
+
+This SRP Initiator is based on source taken from openib.org gen2 implementing
+the SCSI RDMA Protocol-2 (SRP-2), Doc. no. T10/1524-D. See:
+www.t10.org/ftp/t10/drafts/srp2/srp2r00a.pdf
+
+The SRP Initiator supports:
+- Basic SCSI Primary Commands -3 (SPC-3)
+ (www.t10.org/ftp/t10/drafts/spc3/spc3r21b.pdf)
+- Basic SCSI Block Commands -2 (SBC-2)
+ (www.t10.org/ftp/t10/drafts/sbc2/sbc2r16.pdf)
+- Basic functionalities, task management and limited error handling
+
+==============================================================================
+4. Loading SRP Initiator
+==============================================================================
+
+To load the SRP module, either execute the "modprobe ib_srp" command after the
+OFED driver is up, or change the value of SRP_LOAD in
+/etc/infiniband/openib.conf to "yes" (causing the srp module to be loaded
+at driver boot).
+
+NOTE: When loading the ib_srp module, it is possible to set the module
+ parameter srp_sg_tablesize. This is the maximum number of
+ gather/scatter entries per I/O (default: 12).
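+
+Changing SRP_LOAD can be scripted. The sketch below assumes that
+openib.conf consists of shell-style KEY=value lines; the helper name
+set_conf_value is hypothetical, and the srp_sg_tablesize value in the
+closing comment is chosen purely for illustration:
+
```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a shell-style config file.
# $1 = file, $2 = key, $3 = value. An existing key is rewritten,
# otherwise the setting is appended.
set_conf_value() {
    conf_file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$conf_file"; then
        tmp_file=$(mktemp) || return 1
        sed "s/^${key}=.*/${key}=${value}/" "$conf_file" > "$tmp_file" &&
            cat "$tmp_file" > "$conf_file"
        status=$?
        rm -f "$tmp_file"
        return $status
    else
        echo "${key}=${value}" >> "$conf_file"
    fi
}

# Example (run as root; try it on a copy of the file first):
#   set_conf_value /etc/infiniband/openib.conf SRP_LOAD yes
# Loading by hand with a non-default scatter/gather table size:
#   modprobe ib_srp srp_sg_tablesize=32
```
+
+The helper only handles simple values; a value containing "/" or "&"
+would need escaping before being passed to sed.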
+
+
+==============================================================================
+5. Manually Establishing an SRP Connection
+==============================================================================
+
+The following steps describe how to manually establish an SRP connection between
+the Initiator and an SRP Target. Section 7 explains how to do this
+automatically.
+
+- Make sure that the ib_srp module is loaded, the SRP Initiator is reachable
+ by the SRP Target, and that an SM is running.
+
+- To establish a connection with an SRP Target and create SRP (SCSI) device(s)
+ for that target under /dev, use the following command:
+
+ echo id_ext=[GUID value],ioc_guid=[GUID value],dgid=[port GID value],\
+ pkey=ffff,service_id=[service[0] value] > \
+ /sys/class/infiniband_srp/srp-mthca[hca number]-[port number]/add_target
+
+ Notes:
+ a. Execution of the above "echo" command may take some time
+ b. The SM must be running while the command executes
+ c. It is possible to include additional parameters in the echo command:
+ > max_cmd_per_lun - Default: 63
+ > max_sect (short for max_sectors) - sets the request size of a command
+ > io_class - Default: 0x100 as in rev 16A of the specification
+ Note: In rev 10 the default was 0xff00
+ > initiator_ext - Please refer to Section 8 (Multiple Connections...)
+ d. See SRP Tools below for instructions on how the parameters in the
+ echo command above may be obtained.
+
+- To list the new SCSI devices that have been added by the echo command, you
+ may use either of the following two methods:
+ a. Execute "fdisk -l". This command lists all devices; the new devices are
+ included in this listing.
+ b. Execute "dmesg" or look at /var/log/messages to find messages with the names
+ of the new devices.
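+
+The target description written to add_target is a single comma-separated
+string with no embedded spaces. A small helper can assemble it from the
+five mandatory fields, which avoids line-continuation mistakes when typing
+the command by hand. This is a sketch only; srp_target_string is a
+hypothetical name:
+
```shell
#!/bin/sh
# Hypothetical helper: assemble the target description expected by
# add_target from its five mandatory fields.
# $1 = id_ext, $2 = ioc_guid, $3 = dgid, $4 = pkey, $5 = service_id
srp_target_string() {
    if [ $# -ne 5 ]; then
        echo "usage: srp_target_string id_ext ioc_guid dgid pkey service_id" >&2
        return 1
    fi
    printf 'id_ext=%s,ioc_guid=%s,dgid=%s,pkey=%s,service_id=%s\n' "$@"
}

# Example with the sample values shown in Section 6 (run as root):
#   srp_target_string 200400A0B81146A1 0002c90200402bd4 \
#       fe800000000000000002c90200402bd5 ffff 200400a0b81146a1 \
#       > /sys/class/infiniband_srp/srp-mthca0-1/add_target
```
+
+Optional parameters such as max_cmd_per_lun could be appended to the
+string in the same "name=value" form.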
+
+
+==============================================================================
+6. SRP Tools - ibsrpdm and srp_daemon
+==============================================================================
+
+To assist in performing the steps in Section 5, the OFED 1.1 distribution
+provides two utilities which:
+- Detect targets on the fabric reachable by the Initiator (for step 1)
+- Output target attributes in a format suitable for use in the above
+ "echo" command (step 2)
+
+These utilities are: ibsrpdm and srp_daemon.
+
+The utilities can be found under /usr/local/ofed/sbin/ (or <prefix>/sbin/),
+and are part of the srptools RPM that may be installed using the
+OFED custom installation. Detailed information regarding the various
+options for these utilities is provided by their man pages.
+
+Below, several usage scenarios for these utilities are presented.
+
+ibsrpdm usage
+-------------
+1. Detecting reachable targets
+
+ a. To detect all targets reachable by the SRP initiator via the default
+ umad device (/dev/umad0), execute the following command:
+ > ibsrpdm
+
+ This command will output information on each SRP target detected, in
+ human-readable form.
+
+ Sample output:
+ IO Unit Info:
+ port LID: 0103
+ port GID: fe800000000000000002c90200402bd5
+ change ID: 0002
+ max controllers: 0x10
+
+ controller[ 1]
+ GUID: 0002c90200402bd4
+ vendor ID: 0002c9
+ device ID: 005a44
+ IO class : 0100
+ ID: LSI Storage Systems SRP Driver 200400a0b81146a1
+ service entries: 1
+ service[ 0]: 200400a0b81146a1 / SRP.T10:200400A0B81146A1
+
+ b. To detect all the SRP Targets reachable by the SRP Initiator via
+ another umad device, use the following command:
+
+ > ibsrpdm -d <umad device>
+
+2. Assistance in creating an SRP connection
+
+ a. To generate output suitable for utilization in the "echo" command of
+ section 5, add the "-c" option to ibsrpdm:
+
+ > ibsrpdm -c
+
+ Sample output:
+ id_ext=200400A0B81146A1,ioc_guid=0002c90200402bd4,
+ dgid=fe800000000000000002c90200402bd5,pkey=ffff,service_id=200400a0b81146a1
+
+ b. To establish a connection with an SRP Target (Section 5) using the output
+ from the "ibsrpdm -c" example above, execute the following command:
+
+ echo id_ext=200400A0B81146A1,ioc_guid=0002c90200402bd4,\
+ dgid=fe800000000000000002c90200402bd5,pkey=ffff,service_id=200400a0b81146a1 > \
+ /sys/class/infiniband_srp/srp-mthca0-1/add_target
+
+ The SRP connection should now be up; the newly created SCSI devices should appear
+ in the listing obtained from the "fdisk -l" command.
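+
+When several targets are reported, the two steps above can be combined in
+a loop. The sketch below assumes that the "-c" output contains one target
+description per line; srp_add_targets is a hypothetical helper name, not
+part of srptools:
+
```shell
#!/bin/sh
# Hypothetical helper: read target descriptions (one per line, in the
# format printed by "ibsrpdm -c") from stdin and write each one to the
# add_target file given as $1, using a separate write per target.
srp_add_targets() {
    add_target_file=$1
    while IFS= read -r target; do
        # Skip empty lines.
        [ -n "$target" ] || continue
        echo "$target" > "$add_target_file" || return 1
    done
}

# Example (run as root, with ib_srp loaded and an SM running):
#   ibsrpdm -c | srp_add_targets /sys/class/infiniband_srp/srp-mthca0-1/add_target
```
+
+The function stops at the first failed write, so a rejected target is
+not silently skipped.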
+
+srp_daemon
+----------
+The srp_daemon utility is based on ibsrpdm and extends its functionality.
+In addition to the ibsrpdm functionality described above, srp_daemon can also:
+- Establish an SRP connection by itself (without the need to issue the "echo"
+ command described in Section 5)
+- Continue running in background, detecting new targets and establishing SRP
+ connections with them (daemon mode)
+- Discover reachable SRP targets given an infiniband HCA name and port, rather
+ than just by /dev/umad<N> where <N> is a digit
+- Enable High Availability operation (together with Device-Mapper Multipath)
+
+a. srp_daemon commands equivalent to ibsrpdm:
+
+ "srp_daemon -a -o" is equivalent to "ibsrpdm"
+ "srp_daemon -c -a -o" is equivalent to "ibsrpdm -c"
+
+b. srp_daemon extensions to ibsrpdm
+
+ - To discover SRP Targets reachable from HCA device <infiniband HCA name>,
+ port <port num>, (and generate output suitable for 'echo') you may execute
+
+ srp_daemon -c -a -o -i <infiniband HCA name> -p <port number>
+
+ - To both discover the SRP Targets and establish connections with them, just
+ add the -e option to the above command.
+
+ - Executing srp_daemon without -a option will display only the reachable
+ Targets to which the initiator is not connected (via the port upon which
+ srp_daemon was activated).
+
+ - Continuous background (daemon) operation, providing automatic ongoing
+ detection and connection capability -- see the next section.
+
+==============================================================================
+7. Automatic Discovery and Connecting to Targets
+==============================================================================
+
+- Make sure that the ib_srp module is loaded, the SRP Initiator can reach an
+ SRP Target, and that an SM is running.
+
+- To connect to all the existing Targets in the fabric, execute
+ srp_daemon -e -o. This utility will scan the fabric once, connect to
+ every Target it detects, and then exit.
+
+- To connect to all the existing Targets in the fabric and to connect
+ to new targets that will join the fabric, execute srp_daemon -e. This utility
+ continues to execute until it is either killed by the user or encounters
+ connection errors (such as no SM in the fabric).
+
+- To execute SRP daemon as a daemon you may execute run_srp_daemon
+ (found under /usr/local/ofed/sbin/ or <prefix>/sbin/), providing it with
+ the same options used for running srp_daemon.
+
+ Note: Make sure only one instance of run_srp_daemon runs per port.
+
+- To execute SRP daemon as a daemon on all the ports, execute srp_daemon.sh
+ (found under /usr/local/ofed/sbin/ or <prefix>/sbin/).
+ srp_daemon.sh sends its log to /var/log/srp_daemon.log.
+
+- It is possible to configure this script to execute automatically when the
+ InfiniBand driver starts by changing the value of SRPHA_ENABLE in
+ /etc/infiniband/openib.conf to "yes". However, this option also enables
+ SRP High Availability, which adds further features (see Section 9,
+ "High Availability").
+
+==============================================================================
+8. Multiple Connections from Initiator IB Port to the Target
+==============================================================================
+
+Some system configurations may need multiple SRP connections from
+the SRP Initiator to the same SRP Target: to the same Target IB port,
+or to different IB ports on the same Target HCA.
+
+In case of a single Target IB port, i.e., SRP connections use the same path,
+the configuration is enabled using a different initiator_ext value for each
+SRP connection. The initiator_ext value is a 16-hexadecimal-digit value
+specified in the connection command.
+
+Likewise, when there are two physical connections (i.e., network paths) from a
+single initiator IB port to two different IB ports on the same Target HCA, each
+path needs a different initiator_ext value. The convention is to use the Target
+port GUID as the initiator_ext value for the relevant path.
+
+If you use srp_daemon with -n flag, it automatically assigns initiator_ext
+values according to this convention. For example:
+
+ id_ext=200500A0B81146A1,ioc_guid=0002c90200402bec,dgid=fe800000000000000002c90200402bed,\
+ pkey=ffff,service_id=200500a0b81146a1,initiator_ext=ed2b400002c90200
+
+ Notes:
+ a. It is recommended to use the -n flag for all srp_daemon invocations.
+ b. ibsrpdm does not have a corresponding option.
+ c. srp_daemon.sh always uses the -n option (whether invoked manually by
+ the user, or automatically at startup by setting SRPHA_ENABLE to yes).
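+
+In the example above, the initiator_ext value equals the low 64 bits of the
+dgid (the Target port GUID) with its bytes in reverse order. The sketch below
+reproduces that value from the dgid. The function name is hypothetical, and
+whether srp_daemon derives the value in exactly this way is an assumption
+based only on this one example.
+
```shell
#!/bin/sh
# Sketch only: initiator_ext_from_dgid is a hypothetical helper.
# Assumption (inferred from the single example in the text): the value is
# the last 16 hex digits of the dgid with the byte order reversed.
initiator_ext_from_dgid() {
    dgid=$1
    # Keep the last 16 hex digits (the port GUID part of the GID).
    guid=$(printf '%s' "$dgid" | sed 's/.*\(.\{16\}\)$/\1/')
    out=
    # Peel off one byte (two hex digits) at a time and prepend it.
    while [ -n "$guid" ]; do
        byte=$(printf '%s' "$guid" | cut -c1-2)
        guid=$(printf '%s' "$guid" | cut -c3-)
        out=$byte$out
    done
    printf '%s\n' "$out"
}

initiator_ext_from_dgid fe800000000000000002c90200402bed
# prints ed2b400002c90200
```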
+
+==============================================================================
+9. High Availability (HA)
+==============================================================================
+
+ Note: This is a Beta release of the High Availability feature for the
+ SCSI RDMA Protocol (SRP) Initiator.
+ It is intended for development use, not as a complete product.
+
+High Availability Overview
+--------------------------
+
+High Availability works using the Device-Mapper (DM) multipath and the
+SRP daemon.
+
+Each initiator is connected to the same target from several ports/HCAs.
+The DM multipath is responsible for joining together different paths to the
+same target and for fail-over between paths when one of them goes offline.
+Rules were added to udev that will execute multipath on newly joined SCSI
+devices.
+
+Each initiator should execute several instances of the SRP daemon, one for each
+port. At startup, each SRP daemon detects the SRP targets in the fabric and
+sends requests to the ib_srp module to connect to each of them. These
+SRP daemons also detect targets that subsequently join the fabric, and send the
+ib_srp module requests to connect to them as well.
+
+High Availability Operation
+---------------------------
+
+When a path (from port1) to a target fails, the ib_srp module starts an error
+recovery process. If this process gets to the reset_host stage and there is no
+path to the target from this port, ib_srp will remove this scsi_host. After
+the scsi_host is removed, multipath switches to another path to this target
+(from another port/HCA).
+
+When the failed path recovers, it will be detected by the SRP daemon. The SRP
+daemon will then request ib_srp to connect to this target. Once the connection
+is up, there will be a new scsi_host for this target. The udev rule will then
+execute multipath on the devices of this host, and we will return to the
+original state (before the path failed).
+
+High Availability Prerequisites
+-------------------------------
+
+Installation: (Execute once)
+- Verify that multipath is installed. If not, it is possible to download it
+ from http://christophe.varoqui.free.fr/multipath-tools/multipath-tools-0.4.7.tar.bz2
+ and then compile and install it.
+
+- Update udev: (Execute once - for manual activation of High Availability only)
+
+- Add a file to /etc/udev/rules.d/ (you can call it 91-srp.rules)
+ This file should have one line:
+ ACTION=="add", KERNEL=="sd*[!0-9]", RUN+="/sbin/multipath %M:%m"
+
+Note that when SRPHA_ENABLE is set to "yes" (see the "Automatic Activation of
+High Availability" subsection below), this file is created each time the
+driver starts and deleted when the driver is unloaded.
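+
+For manual activation, the rules file can be created with a few lines of
+shell. This is a minimal sketch: the function name write_srp_udev_rule is
+hypothetical, and the optional directory argument exists only so that the
+function can be tried outside /etc/udev/rules.d/.
+
```shell
#!/bin/sh
# Sketch only: write_srp_udev_rule is a hypothetical helper.
# It writes the one-line rule described above to 91-srp.rules in the
# given directory (default: /etc/udev/rules.d, which requires root).
write_srp_udev_rule() {
    rules_dir=${1:-/etc/udev/rules.d}
    # Quoted heredoc delimiter: the rule text is written verbatim.
    cat > "$rules_dir/91-srp.rules" <<'EOF'
ACTION=="add", KERNEL=="sd*[!0-9]", RUN+="/sbin/multipath %M:%m"
EOF
}

# Example (as root):
#   write_srp_udev_rule
```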
+
+Manual Activation of High Availability
+--------------------------------------
+
+Initialization: (Execute after each boot of the driver)
+ 1) Execute modprobe dm-multipath
+ 2) Execute modprobe ib-srp
+ 3) Make sure you have created file /etc/udev/rules.d/91-srp.rules
+ as described above
+ 4) Execute for each port and each HCA:
+ srp_daemon -c -e -R 300 -i <InfiniBand HCA name> -p <port number>
+ (You can use another value for -R. See the workaround for the rare race
+ condition in Section 11, "Known Issues".)
+
+ This step can be performed by executing srp_daemon.sh, which sends
+ its log to /var/log/srp_daemon.log.
+
+ Now it is possible to access the SRP LUNs on /dev/mapper/.
+
+ NOTE: It is possible that regular (not SRP) LUNs may also be present;
+ the SRP LUNs may be identified by their name.
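+
+ Step 4 can be sketched as a loop over the HCA ports. The sketch below is a
+ dry run that only prints the srp_daemon commands. The function name is
+ hypothetical, and the /sys/class/infiniband/<HCA>/ports/<N> directory
+ layout is an assumption about the kernel sysfs tree that this document
+ does not describe.
+
```shell
#!/bin/sh
# Sketch only: list_srp_daemon_cmds is a hypothetical helper.
# It prints one srp_daemon command per HCA port found under the given
# sysfs root. Nothing is executed (dry run).
# Assumption: ports appear as <root>/<HCA name>/ports/<port number>.
list_srp_daemon_cmds() {
    ib_root=${1:-/sys/class/infiniband}
    rescan=${2:-300}    # value passed to -R, in seconds
    for port_dir in "$ib_root"/*/ports/*; do
        # Skip the unexpanded pattern when no port directory exists.
        [ -d "$port_dir" ] || continue
        port=${port_dir##*/}
        hca_dir=${port_dir%/ports/*}
        hca=${hca_dir##*/}
        echo "srp_daemon -c -e -R $rescan -i $hca -p $port"
    done
}

# Example: review the commands, then run each one in the background.
#   list_srp_daemon_cmds
```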
+
+
+Automatic Activation of High Availability
+-----------------------------------------
+- Set the value of SRPHA_ENABLE in /etc/infiniband/openib.conf to "yes".
+
+- The next time the driver is loaded, the SRP LUNs will be accessible under
+ /dev/mapper/.
+ NOTE: It is possible that regular (not SRP) LUNs may also be present;
+ the SRP LUNs may be identified by their name.
+
+- It is possible to see the output of the SRP daemon in /var/log/srp_daemon.log
+
+
+==============================================================================
+10. Shutting Down SRP
+==============================================================================
+
+SRP can be shut down by using "rmmod ib_srp", or by stopping the OFED driver
+("/etc/init.d/openibd stop"), or as a by-product of a complete system shutdown.
+
+Prior to shutting down SRP, remove all references to it. The actions you need
+to take depend on the way SRP was loaded. There are three cases.
+
+a. Without High Availability
+------------------------------------
+When working without High Availability, you should unmount the SRP
+partitions that were mounted prior to shutting down SRP.
+
+
+b. After Manual Activation of High Availability
+-----------------------------------------------
+If you manually activated SRP High Availability, perform the following steps:
+1) Unmount all SRP partitions that were mounted
+2) Kill the SRP daemon instances
+3) Make sure there are no multipath instances running. If any are running,
+ wait for them to end or kill them.
+4) Execute multipath -F
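+
+The four steps can be sketched as a script. The sketch below defaults to a
+dry run that only prints the commands. The names run, srp_ha_shutdown, and
+DRY_RUN are hypothetical, the mount points are supplied by the caller, and
+using pkill and pgrep to find the processes is an assumption rather than an
+OFED recommendation.
+
```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of OFED.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: srp_ha_shutdown <mounted SRP partition>...
srp_ha_shutdown() {
    # 1) Unmount all SRP partitions passed as arguments.
    for mnt in "$@"; do
        run umount "$mnt" || return 1
    done
    # 2) Kill the SRP daemon instances (none running is not an error).
    run pkill srp_daemon || true
    # 3) Wait until no multipath instance is running.
    if [ "$DRY_RUN" != 1 ]; then
        while pgrep -x multipath > /dev/null; do sleep 1; done
    fi
    # 4) Flush the multipath device maps.
    run multipath -F || return 1
    # SRP can now be shut down.
    run rmmod ib_srp
}
```
+
+Calling "srp_ha_shutdown /mnt/srp0" with the default DRY_RUN=1 prints the
+umount, pkill, multipath -F, and rmmod commands in that order.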
+
+
+c. After Automatic Activation of High Availability
+--------------------------------------------------
+If SRP High Availability was automatically activated, SRP shutdown must be
+part of the driver shutdown ("/etc/init.d/openibd stop") which performs
+steps 2-4 of case b above. However, you still have to unmount all SRP
+partitions that were mounted before driver shutdown.
+
+
+HAL Issue
+---------
+The HAL (Hardware Abstraction Layer) system includes a daemon that examines
+all devices in the system. In this process, it frequently holds a reference
+to the ib_srp module. If you attempt to shut down SRP while this daemon is
+holding a reference to ib_srp, the shutdown will fail. Therefore, you
+should make sure this will not occur. One solution may be to stop "haldaemon"
+(/etc/init.d/haldaemon stop) prior to SRP shutdown.
+
+
+==============================================================================
+11. Known Issues
+==============================================================================
+
+- SRP is not supported on a 32-bit operating system running on a 64-bit
+ platform.
+
+- The SCSI device is sent offline when a link goes down for several seconds,
+ when the subnet manager goes down for a long time, or when a disk is removed
+ from a target during run-time.
+
+- There is a very rare race condition which can cause the SRP daemon to miss a
+ target that joins the fabric. The race can occur if a target that left the
+ fabric rejoins it after the ib_srp module has decided to remove this target,
+ but before the scsi_host has been removed. As a result, when the SRP daemon
+ checks if this target is already connected, it will receive a positive
+ response and will therefore not reconnect to this target.
+
+ Workaround: Execute the srp_daemon command with the -R <sec> option. This
+ option causes the SRP daemon to perform a full rescan of the fabric every
+ <sec> seconds.
+
+- It is recommended to use an SM that supports the enhanced capability mask
+ matching feature (errata MGTWG8372). With SMs which support this feature, the
+ SRP daemon generates significantly less communication traffic.
+
+- When booting OFED with SRP High Availability enabled, executing multipath for
+ all LUNs on all connections may take some time (several minutes). However, it
+ is possible to start working while this process is in progress.
+
+- If SRP High Availability is enabled, disconnections while OFED is booting, or
+ simultaneous disconnections and connections during normal operation, may lead
+ to what appears to be a deadlock between multipath instances.
+
+- High Availability uses multipath, which needs at least udev version 050. The
+ RHEL4 distribution uses udev 039; therefore, High Availability does not work
+ on the standard Red Hat distribution.
+
+- Stopping the driver while SRP High Availability is enabled kills all
+ multipath processes. Consider appropriate actions in case multipath is used
+ for other purposes.
+
+- As High Availability is based on Device Mapper multipath, it embodies
+ multipath limitations and also its configuration and tuning options.
+ See http://christophe.varoqui.free.fr/wiki/wakka.php?wiki=Home
+ for information on multipath.
+ To modify and tune multipath configuration, edit the file /etc/multipath.conf
+ according to instructions and tips listed in
+ /usr/share/doc/packages/multipath-tools/multipath.conf.*
+
+- In case your topology has two physical connections (i.e., network paths) from
+ a single initiator IB port to two different IB ports on the same Target HCA,
+ and you wish to have an SRP connection on the one path coexist with an SRP
+ connection on the second path, you must set a different initiator_ext value
+ on each path. See Section 8, "Multiple Connections from Initiator IB Port
+ to the Target" for details.
+
+==============================================================================
+12. Vendor Specific Notes
+==============================================================================
+
+Hosts connected to Silverstorm SRP Targets must perform one of the following
+steps after upgrading to OFED 1.1 to continue accessing their storage
+successfully:
+
+1. When issuing the "echo" command to add a new SRP Target, the host
+ must append the string ",initiator_ext=0000000000000001" to the original
+ echo string.
+ Example:
+ 'ibsrpdm -c' output is as follows:
+
+ id_ext=0000000000000001,ioc_guid=00066a0138000165,dgid=fe8000000000000
+ 000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00
+
+ id_ext=0000000000000001,ioc_guid=00066a0238000165,dgid=fe8000000000000
+ 000066a0260000165,pkey=ffff,service_id=0000494353535250,io_class=ff00
+
+ To connect to the first target, the echo command must be:
+
+ echo -n \
+ id_ext=0000000000000001,ioc_guid=00066a0138000165,\
+ dgid=fe8000000000000000066a0260000165,pkey=ffff,\
+ service_id=0000494353535250,io_class=ff00,\
+ initiator_ext=0000000000000001 > \
+ /sys/class/infiniband_srp/srp-mthca0-1/add_target
+
+
+2. Change the SRP map on the Silverstorm SRP Target to set the expected
+ initiator extension to 0. For details on how to change the SRP map on a
+ Silverstorm SRP Target, please refer to product documentation.
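+
+Step 1 can be scripted by appending the extension to every target line
+before it is written to add_target. This is a minimal sketch: the function
+name append_initiator_ext is hypothetical, and it assumes that each target
+description arrives on a single line (the sample output above is wrapped
+only for display).
+
```shell
#!/bin/sh
# Sketch only: append_initiator_ext is a hypothetical helper.
# It reads target description lines from stdin and appends
# ",initiator_ext=<value>" to each one; blank lines are dropped.
append_initiator_ext() {
    ext=${1:-0000000000000001}
    sed -e '/^$/d' -e "s/\$/,initiator_ext=$ext/"
}

# Example: print the strings to be echoed to add_target.
#   ibsrpdm -c | append_initiator_ext
```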
+
+
--- /dev/null
+
+ Release Notes for
+ Gamma 3.2 and OFED 1.1 DAPL Release
+ October 2006
+
+
+ DAPL GAMMA 3.2/OFED 1.1 RELEASE NOTES
+
+ This release of the DAPL reference implementation
+ is timed to coincide with OFED release 1.1 of the
+ Open Fabrics (www.openfabrics.org) software stack.
+
+ NEW SINCE Gamma 3.1 and OFED 1.0
+
+ * BUG FIXES
+
+ + Update obsolete CLK_TCK to CLOCKS_PER_SEC
+ + Fill out some uninitialized fields in the ia_attr structure returned by
+ dat_ia_query().
+ + Update dtest to support multiple segments on rdma write and change
+ makefile to use OpenIB-cma by default.
+ + Add support for dat_evd_set_unwaitable on a DTO evd in openib_cma
+ provider
+ + Added errno reporting (message and return codes) during open to help
+ diagnose create thread issues.
+ + Fix some suspicious inline assembly EIEIO_ON_SMP and ISYNC_ON_SMP
+ + Fix IA64 build problems
+ + Lower the reject debug message level so we don't see warnings when
+ consumers reject.
+ + Added support for active side TIMED_OUT event from a provider.
+ + Fix bug in dapls_ib_get_dat_event() call after adding new unreachable
+ event.
+ + Update for new rdma_create_id() function signature.
+ + Set max rdma read per EP attributes
+ + Report the proper error and timeout events.
+ + Socket CM fix to guard against using a loopback address as the local
+ device address.
+ + Use the uCM set_option feature to adjust connect request timeout
+ retry values.
+ + Fix to disallow any event after a disconnect event.
+
+ * OFED 1.1 uDAPL source build instructions:
+
+ cd /usr/local/ofed/src/openib-1.1/src/userspace/dapl
+
+ # NON_DEBUG build configuration
+
+ ./configure --disable-libcheck --prefix /usr/local/ofed \
+ --libdir /usr/local/ofed/lib64 LDFLAGS=-L/usr/local/ofed/lib64 \
+ CPPFLAGS="-I../libibverbs/include -I../librdmacm/include"
+
+ # build and install
+
+ make
+ make install
+
+ # DEBUG build configuration
+
+ ./configure --disable-libcheck --enable-debug --prefix /usr/local/ofed \
+ --libdir /usr/local/ofed/lib64 LDFLAGS=-L/usr/local/ofed/lib64 \
+ CPPFLAGS="-I../libibverbs/include -I../librdmacm/include"
+
+ # build and install
+
+ make
+ make install
+
+ # DEBUG messages: set environment variable DAPL_DBG_TYPE, default
+ mapping is 0x0003
+
+ DAPL_DBG_TYPE_ERR = 0x0001,
+ DAPL_DBG_TYPE_WARN = 0x0002,
+ DAPL_DBG_TYPE_EVD = 0x0004,
+ DAPL_DBG_TYPE_CM = 0x0008,
+ DAPL_DBG_TYPE_EP = 0x0010,
+ DAPL_DBG_TYPE_UTIL = 0x0020,
+ DAPL_DBG_TYPE_CALLBACK = 0x0040,
+ DAPL_DBG_TYPE_DTO_COMP_ERR= 0x0080,
+ DAPL_DBG_TYPE_API = 0x0100,
+ DAPL_DBG_TYPE_RTN = 0x0200,
+ DAPL_DBG_TYPE_EXCEPTION = 0x0400,
+ DAPL_DBG_TYPE_SRQ = 0x0800,
+ DAPL_DBG_TYPE_CNTR = 0x1000
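+
+ The flags are bit values, so several message types are selected by OR-ing
+ them together. The sketch below builds a mask for error, warning, CM, and
+ EP messages. That the library accepts a hexadecimal string with a 0x
+ prefix is an assumption suggested by the 0x0003 default above.
+
```shell
#!/bin/sh
# Sketch only: combine DAPL_DBG_TYPE_* flags from the list above.
# Assumption: the library parses DAPL_DBG_TYPE as a 0x-prefixed hex number.
DAPL_DBG_TYPE_ERR=0x0001
DAPL_DBG_TYPE_WARN=0x0002
DAPL_DBG_TYPE_CM=0x0008
DAPL_DBG_TYPE_EP=0x0010

# Bitwise OR of the selected flags.
mask=$(( $DAPL_DBG_TYPE_ERR | $DAPL_DBG_TYPE_WARN | $DAPL_DBG_TYPE_CM | $DAPL_DBG_TYPE_EP ))

DAPL_DBG_TYPE=$(printf '0x%04x' "$mask")
export DAPL_DBG_TYPE
echo "DAPL_DBG_TYPE=$DAPL_DBG_TYPE"
# prints DAPL_DBG_TYPE=0x001b
```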
+
+
+ Note: The udapl provider library libdaplscm.so is untested and
+ unsupported, thus customers should not use it.
+ It will be removed in the next OFED release.
+
+ DAPL GAMMA 3.1 RELEASE NOTES
+
+ This release of the DAPL reference implementation
+ is timed to coincide with the first release of the
+ Open Fabrics (www.openfabrics.org) software stack.
+ This release adds support for this new stack, which
+ is now the native Linux RDMA stack.
+
+ This release also adds a new licensing option. In
+ addition to the Common Public License and BSD License,
+ the code can now be licensed under the terms of the GNU
+ General Public License (GPL) version 2.
+
+ NEW SINCE Gamma 3.0
+
+ - GPL v2 added as a licensing option
+ - OpenFabrics (aka OpenIB) gen2 verbs support
+ - dapltest support for Solaris 10
+
+ * BUG FIXES
+
+ + Fixed a disconnect event processing race
+ + Fix to destroy all QPs on IA close
+ + Removed compiler warnings
+ + Removed unused variables
+ + And many more...
+
+ DAPL GAMMA 3.0 RELEASE NOTES
+
+ This is the first release based on version 1.2 of the spec. There
+ are some components, such as shared receive queues (SRQs), which
+ are not implemented yet.
+
+ Once again there were numerous bug fixes submitted by the
+ DAPL community.
+
+ NEW SINCE Beta 2.06
+
+ - DAT 1.2 headers
+ - DAT_IA_HANDLEs implemented as small integers
+ - Changed default device name to be "ia0a"
+ - Initial support for Linux 2.6.X kernels
+ - Updates to the OpenIB gen 1 provider
+
+ * BUG FIXES
+
+ + Updated Makefile for differentiation between OS releases.
+ + Updated atomic routines to use appropriate API
+ + Removed unnecessary assert from atomic_dec.
+ + Fixed bugs when freeing a PSP.
+ + Fixed error codes returned by the DAT static registry.
+ + Kernel updates for dat_strerror.
+ + Cleaned up the transport layer/adapter interface to use DAPL
+ types rather than transport types.
+ + Fixed ring buffer reallocation.
+ + Removed old test/udapl/dapltest directory.
+ + Fixed DAT_IA_HANDLE translation (from pointer to int and
+ vice versa) on 64-bit platforms.
+
+ DAPL BETA 2.06 RELEASE NOTES
+
+ We are not planning any further releases of the Beta series,
+ which are based on the 1.1 version of the spec. There may be
+ further releases for bug fixes, but we anticipate the DAPL
+ community to move to the new 1.2 version of the spec and the
+ changes mandated in the reference implementation.
+
+ The biggest item in this release is the first inclusion of the
+ OpenIB Gen 1 provider, an item generating a lot of interest in
+ the IB community. This implementation has graciously been
+ provided by the Mellanox team. The kdapl implementation is in
+ progress, and we imagine work will soon begin on Gen 2.
+
+ There are also a handful of bug fixes available, as well as a long
+ awaited update to the endpoint design document.
+
+ NEW SINCE Beta 2.05
+
+ - OpenIB gen 1 provider support has been added
+ - Added dapls_evd_post_generic_event(), routine to post generic
+ event types as requested by some providers. Also cleaned up
+ error reporting.
+ - Updated the endpoint design document in the doc/ directory.
+
+ * BUG FIXES
+
+ + Cleaned up memory leak on close by freeing the HCA structure;
+ + Removed bogus #defs for rdtsc calls on IA64.
+ + Changed dapltest thread types to use internal types for
+ portability & correctness
+ + Various 64 bit enhancements & updates
+ + Fixes to conformance test that were defining CONN_QUAL twice
+ and using it in different ways
+ + Cleaned up private data handling in ep_connect & provider
+ support: we now avoid extra copy in connect code; reduced
+ stack requirements by using private_data structure in the EP;
+ removed provider variable.
+ + Fixed problem in the dat conformance test where cno_wait would
+ attempt to dereference a timer value and SEGV.
+ + Removed old vestiges of deprecated POLLING_COMPLETIONS
+ conditionals.
+
+ DAPL BETA 2.05 RELEASE NOTES
+
+ This was to be a very minor release, the primary change was
+ going to be the new wording of the DAT license as contained in
+ the header for all source files. But the interest and
+ development occurring in DAPL provided some extra bug fixes, and
+ some new functionality that has been requested for a while.
+
+ First, you may notice that every single source file was
+ changed. If you read the release notes from DAPL BETA 2.04, you
+ were warned this would happen. There was a legal issue with the
+ wording in the header, the end result was that every source file
+ was required to change the word 'either of' to 'both'. We've
+ been putting this change off as long as possible, but we wanted
+ to do it in a clean drop before we start working on DAT 1.2
+ changes in the reference implementation, just to keep things
+ reasonably sane.
+
+ kdapltest has enabled three of the subtests supported by
+ dapltest. The Performance test in particular has been very
+ useful to dapltest in getting minima and maxima. The Limit test
+ pushes the limits by allocating the maximum number of specific
+ resources. And the FFT tests are also available.
+
+ Most vendors have supported shared memory regions for a while,
+ several of which have asked the reference implementation team to
+ provide a common implementation. Shared memory registration has
+ been tested on ibapi, and compiled into vapi. Both InfiniBand
+ providers have the restriction that a memory region must be
+ created before it can be shared; not all RDMA APIs are this way,
+ several allow you to declare a memory region shared when it is
+ registered. Hence, details of the implementation are hidden in
+ the provider layer, rather than forcing other APIs to do
+ something strange.
+
+ This release also contains some changes that will allow dapl to
+ work on Opteron processors, as well as some preliminary support
+ for Power PC architecture. These features are not well tested
+ and may be incomplete at this time.
+
+ Finally, we have been asked several times over the course of the
+ project for a canonical interface between the common and
+ provider layers. This release includes a dummy provider to meet
+ that need. Anyone should be able to download the release and do
+ a:
+ make VERBS=DUMMY
+
+ And have a cleanly compiled dapl library. This will be useful
+ both to those porting new transport providers, as well as those
+ going to new machines.
+
+ The DUMMY provider has been compiled on both Linux and Windows
+ machines.
+
+
+ NEW SINCE Beta 2.04
+ - kdapltest enhancements:
+ * Limit subtests now work
+ * Performance subtests now work.
+ * FFT tests now work.
+
+ - The VAPI headers have been refreshed by Mellanox
+
+ - Initial Opteron and PPC support.
+
+ - Atomic data types now have consistent treatment, allowing us to
+ use native data types other than integers. The Linux kdapl
+ uses atomic_t, allowing dapl to use the kernel macros and
+ eliminate the assembly code in dapl_osd.h
+
+ - The license language was updated per the direction of the
+ DAT Collaborative. This two word change affected the header
+ of every file in the tree.
+
+ - SHARED memory regions are now supported.
+
+ - Initial support for the TOPSPIN provider.
+
+ - Added a dummy provider, essentially the NULL provider. Its
+ purpose is to aid in porting and to clarify exactly what is
+ expected in a provider implementation.
+
+ - Removed memory allocation from the DTO path for VAPI
+
+ - cq_resize will now allow the CQ to be resized smaller. Not all
+ providers support this, but it's a provider problem, not a
+ limitation of the common code.
+
+ * BUG FIXES
+
+ + Removed spurious lock in dapl_evd_connection_callb.c that
+ would have caused a deadlock.
+ + The Async EVD was getting torn down too early, potentially
+ causing lost errors. Has been moved later in the teardown
+ process.
+ + kDAPL replaced mem_map_reserve() with newer SetPageReserved()
+ for better Linux integration.
+ + kdapltest no longer allocates large print buffers on the stack and
+ is more careful to ensure buffers don't overflow.
+ + Put dapl_os_dbg_print() under DAPL_DBG conditional, it is
+ supposed to go away in a production build.
+ + dapltest protocol version has been bumped to reflect the
+ change in the Service ID.
+ + Corrected several instances of routines that did not adhere
+ to the DAT 1.1 error code scheme.
+ + Cleaned up vapi ib_reject_connection to pass DAT types rather
+ than provider specific types. Also cleaned up naming interface
+ declarations and their use in vapi_cm.c; fixed incorrect
+ #ifdef for naming.
+ + Initialize missing uDAPL provider attr, pz_support.
+ + Changes for better layering: first, moved
+ dapl_lmr_convert_privileges to the provider layer as memory
+ permissions are clearly transport specific and are not always
+ defined in an integer bitfield; removed common routines for
+ lmr and rmr. Second, move init and release setup/teardown
+ routines into adapter_util.h, which defined the provider
+ interface.
+ + Cleaned up the HCA name cruft that allowed different types
+ of names such as strings or ints to be dealt with in common
+ code; but all names are presented by the dat_registry as
+ strings, so pushed conversions down to the provider
+ level. Greatly simplifies names.
+ + Changed deprecated true/false to DAT_TRUE/DAT_FALSE.
+ + Removed old IB_HCA_NAME type in favor of char *.
+ + Fixed race condition in kdapltest's use of dat_evd_dequeue.
+ + Changed cast for SERVER_PORT_NUMBER to DAT_CONN_QUAL as it
+ should be.
+ + Small code reorg to put the CNO into the EVD when it is
+ allocated, which simplifies things.
+ + Removed gratuitous ib_hca_port_t and ib_send_op_type_t types,
+ replaced with standard int.
+ + Pass a pointer to cqe debug routine, not a structure. Some
+ clean up of data types.
+ + kdapl threads now invoke reparent_to_init() on exit to allow
+ threads to get cleaned up.
+
+
+
+ DAPL BETA 2.04 RELEASE NOTES
+
+ The big changes for this release involve a more strict adherence
+ to the original dapl architecture. Originally, only InfiniBand
+ providers were available, so allowing various data types and
+ event codes to show through into common code wasn't a big deal.
+
+ But today, there are an increasing number of providers available
+ on a number of transports. Requiring an IP iWarp provider to
+ match up to InfiniBand events is silly, for example.
+
+ Restructuring the code allows more flexibility in providing an
+ implementation.
+
+ There are also a large number of bug fixes available in this
+ release, particularly in kdapl related code.
+
+ Be warned that the next release will change every file in the
+ tree as we move to the newly approved DAT license. This is a
+ small change, but all files are affected.
+
+ Future releases will also support the soon-to-be-ratified DAT
+ 1.2 specification.
+
+ This release has benefited from many bug reports and fixes from
+ a number of individuals and companies. On behalf of the DAPL
+ community, thank you!
+
+
+ NEW SINCE Beta 2.03
+
+ - Made several changes to be more rigorous on the layering
+ design of dapl. The intent is to make it easier for non
+ InfiniBand transports to use dapl. These changes include:
+
+ * Revamped the ib_hca_open/close code to use an hca_ptr
+ rather than an ib_handle, giving the transport layer more
+ flexibility in assigning transport handles and resources.
+
+ * Removed the CQD calls, they are specific to the IBM API;
+ folded this functionality into the provider open/close calls.
+
+ * Moved VAPI, IBAPI transport specific items into a transport
+ structure placed inside of the HCA structure. Also updated
+ routines using these fields to use the new location. Cleaned
+ up provider knobs that have been exposed for too long.
+
+ * Changed a number of provider routines to use DAPL structure
+ pointers rather than exposing provider handles & values. Moved
+ provider specific items out of common code, including provider
+ data types (e.g. ib_uint32_t).
+
+ * Pushed provider completion codes and type back into the
+ provider layer. We no longer use EVD or CM completion types at
+ the common layer, instead we obtain the appropriate DAT type
+ from the provider and process only DAT types.
+
+ * Change private_data handling such that we can now accommodate
+ variable length private data.
+
+ - Remove DAT 1.0 cruft from the DAT header files.
+
+ - Better spec compliance in headers and various routines.
+
+ - Major updates to the VAPI implementation from
+ Mellanox. Includes initial kdapl implementation
+
+ - Move kdapl platform specific support for hash routines into
+ OSD file.
+
+ - Cleanups to make the code more readable, including comments
+ and certain variable and structure names.
+
+ - Fixed CM_BUSTED code so that it works again: very useful for
+ new dapl ports where infrastructure is lacking. Also made
+ some fixes for IBHOSTS_NAMING conditional code.
+
+ - Added DAPL_MERGE_CM_DTO as a compile time switch to support
+ EVD stream merging of CM and DTO events. Default is off.
+
+ - 'Quit' test ported to kdapltest
+
+ - uDAPL now builds on Linux 2.6 platform (SuSE 9.1).
+
+ - kDAPL now builds for a larger range of Linux kernels, but
+ still lacks 2.6 support.
+
+ - Added shared memory ID to LMR structure. Shared memory is
+ still not fully supported in the reference implementation, but
+ the common code will appear soon.
+
+ * Bug fixes
+ - Various Makefiles fixed to use the correct dat registry
+ library in its new location (as of Beta 2.03)
+ - Simple reorg of dat headers files to be consistent with
+ the spec.
+ - fixed bug in vapi_dto.h recv macro where we could have an
+ uninitialized pointer.
+ - Simple fix in dat_dr.c to initialize a variable early in the
+ routine before errors occur.
+ - Removed private data pointers from a CONNECTED event, as
+ there should be no private data here.
+ - dat_strerror no longer returns an uninitialized pointer if
+ the error code is not recognized.
+ - dat_dup_connect() will reject 0 timeout values, per the
+ spec.
+ - Removed unused internal_hca_names parameter from
+ ib_enum_hcas() interface.
+ - Use a temporary DAT_EVENT for kdapl up-calls rather than
+ making assumptions about the current event queue.
+ - Relocated some platform dependent code to an OSD file.
+ - Eliminated several #ifdefs in .c files.
+ - Inserted a missing unlock() on an error path.
+ - Added bounds checking on size of private data to make sure
+ we don't overrun the buffer
+ - Fixed a kdapltest problem that caused a machine to panic if
+ the user hit ^C
+ - kdapltest now uses spin locks more appropriate for their
+ context, e.g. spin_lock_bh or spin_lock_irq. Under a
+ conditional.
+ - Fixed kdapltest loops that drain EVDs so they don't go into
+ endless loops.
+ - Fixed bug in dapl_llist_add_entry link list code.
+ - Better error reporting from provider code.
+ - Handle case of user trying to reap DTO completions on an
+ EP that has been freed.
+ - No longer hold lock when ep_free() calls into provider layer
+ - Fixed cr_accept() to not have an extra copy of
+ private_data.
+ - Verify private_data pointers before using them, avoid
+ panic.
+ - Fixed memory leak in kdapltest where print buffers were not
+ getting reclaimed.
+
+
+
+ DAPL BETA 2.03 RELEASE NOTES
+
+ There are some prominent features in this release:
+ 1) dapltest/kdapltest. The dapltest test program has been
+ rearchitected such that a kernel version is now available
+ to test with kdapl. The most obvious change is a new
+ directory structure that more closely matches other core
+ dapl software. But there are a large number of changes
+ throughout the source files to accommodate both the
+ differences in udapl/kdapl interfaces and more mundane
+ things such as printing.
+
+ The new dapltest is in the tree at ./test/dapltest, while the
+ old remains at ./test/udapl/dapltest. For this release, we
+ have maintained both versions. In a future release, perhaps
+ the next release, the old dapltest directory will be
+ removed. Ongoing development will only occur in the new tree.
+
+ 2) DAT 1.1 compliance. The DAT Collaborative has been busy
+ finalizing the 1.1 revision of the spec. The header files
+ have been reviewed and posted on the DAT Collaborative web
+ site; they are now in full compliance.
+
+ The reference implementation has been at a 1.1 level for a
+ while. The current implementation has some features that will
+ be part of the 1.2 DAT specification, but only in places
+ where full compatibility can be maintained.
+
+ 3) The DAT Registry has undergone some positive changes for
+ robustness and support of more platforms. It now has the
+ ability to support several identical provider names
+ simultaneously, which enables the same dat.conf file to
+ support multiple platforms. The registry will open each
+ library and return when successful. For example, a dat.conf
+ file may contain multiple provider names for ex0a, each
+ pointing to a different library that may represent different
+ platforms or vendors. This simplifies distribution into
+ different environments by enabling the use of common
+ dat.conf files.
+
+ In addition, there are a large number of bug fixes throughout
+ the code. Bug reports and fixes have come from a number of
+ companies.
+
+ Also note that the release notes have been cleaned up and no
+ longer contain the complete text of previous releases.
+
+ * EVDs no longer support DTO and CONNECTION event types on the
+ same EVD. NOTE: The problem is maintaining the event ordering
+ between two channels such that no DTO completes before a
+ connection is received; and no DTO completes after a
+ disconnect is received. For 90% of the cases this can be made
+ to work, but the remaining 10% will cause serious performance
+ degradation to get right.
+
+ NEW SINCE Beta 2.2
+
+ * DAT 1.1 spec compliance. This includes some new types, error
+ codes, and moving structures around in the header files,
+ among other things. Note the Class bits of dat_error.h have
+ returned to a #define (from an enum) to cover the broadest
+ range of platforms.
+
+ * Several additions for robustness, including handle and
+ pointer checking, better argument checking, state
+ verification, etc. Better recovery from error conditions,
+ and some assert()s have been replaced with 'if' statements to
+ handle the error.
+
+ * EVDs now maintain the actual queue length, rather than the
+ requested amount. Both the DAT spec and IB (and other
+ transports) allow the underlying implementation to provide
+ more CQ entries than requested.
+
+ Requests for the same number of entries contained by an EVD
+ return immediate success.
+
+ * kDAPL enhancements:
+ - module parameters & OS support calls updated to work with
+ more recent Linux kernels.
+ - kDAPL build options changed to match the Linux kernel, vastly
+ reducing the size and making it more robust.
+ - kDAPL unload now works properly
+ - kDAPL takes a reference on the provider driver when it
+ obtains a verbs vector, to prevent an accidental unload
+ - Cleaned out all of the uDAPL cruft from the linux/osd files.
+
+ * New dapltest (see above).
+
+ * Added a new I/O trace facility, enabling a developer to debug
+ all I/O operations that are in progress or recently completed.
+ Default is OFF in the build.
+
+ * 0 timeout connections now refused, per the spec.
+
+ * Moved the remaining uDAPL specific files from the common/
+ directory to udapl/. Also removed udapl files from the kdapl
+ build.
+
+ * Bug fixes
+ - Better error reporting from provider layer
+ - Fixed race condition on reference counts for posting DTO
+ ops.
+ - Use DAT_COMPLETION_SUPPRESS_FLAG to suppress successful
+ completion of dapl_rmr_bind (instead of
+ DAT_COMPLETION_UNSIGNALLED, which is for non-notification
+ completion).
+ - Verify psp_flags value per the spec
+ - Bug in psp_create_any() checking psp_flags fixed
+ - Fixed type of flags in ib_disconnect from
+ DAT_COMPLETION_FLAGS to DAT_CLOSE_FLAGS
+ - Removed hard coded check for ASYNC_EVD. Placed all EVD
+ prevention in evd_stream_merging_supported array, and
+ prevent ASYNC_EVD from being created by an app.
+ - ep_free() fixed to comply with the spec
+ - Replaced various printfs with dbg_log statements
+ - Fixed kDAPL interaction with the Linux kernel
+ - Corrected phy_register prototype
+ - Corrected kDAPL wait/wakeup synchronization
+ - Fixed kDAPL evd_kcreate() such that it no longer depends
+ on uDAPL only code.
+ - dapl_provider.h had wrong guard #def: changed DAT_PROVIDER_H
+ to DAPL_PROVIDER_H
+ - removed extra (and bogus) call to dapls_ib_completion_notify()
+ in evd_kcreate.c
+ - Inserted missing error code assignment in
+ dapls_rbuf_realloc()
+ - When a CONNECTED event arrives, make sure we are ready for
+ it, else something bad may have happened to the EP and we
+ just return; this replaces an explicit check for a single
+ error condition, replacing it with the general check for the
+ state capable of dealing with the request.
+ - Better context pointer verification. Removed locks around
+ call to ib_disconnect on an error path, which would result
+ in a deadlock. Added code for BROKEN events.
+ - Brought the vapi code more up to date: added conditional
+ compile switches, removed obsolete __ActivePort, deal
+ with 0 length DTO
+ - Several dapltest fixes to bring the code up to the 1.1
+ specification.
+ - Fixed mismatched dapl_os_dbg_print() #else dapl_Dbg_Print();
+ the latter was replaced with the former.
+ - ep_state_subtype() now includes UNCONNECTED.
+ - Added some missing ibapi error codes.
+
+
+
+ NEW SINCE Beta 2.1
+
+ * Changes for Errata and 1.1 Spec
+ - Removed DAT_NAME_NOT_FOUND, per DAT errata
+ - EVDs with DTO and CONNECTION flags set are no longer valid.
+ - Removed DAT_IS_SUCCESS macro
+ - Moved provider attribute structures from vendor files to udat.h
+ and kdat.h
+ - kdapl UPCALL_OBJECT now passed by reference
+
+ * Completed dat_strerror return strings
+
+ * Now support interrupted system calls
+
+ * dapltest now uses dat_strerror for error reporting.
+
+ * A large number of files were reformatted to meet the project
+ standard. The changes are purely cosmetic but improve
+ readability and maintainability. A number of comments were
+ also cleaned up during this effort.
+
+ * dat_registry and RPM file changes (contributed by Steffen Persvold):
+ - Renamed the RPM name of the registry to be dat-registry
+ (renamed the .spec file too, some cvs add/remove needed)
+ - Added the ability to create RPMs as normal user (using
+ temporary paths), works on SuSE, Fedora, and RedHat.
+ - 'make rpm' now works even if you didn't build first.
+ - Changed to using the GNU __attribute__((constructor)) and
+ __attribute__((destructor)) on the DAT init functions, dat_init
+ and dat_fini. The old -init and -fini options to LD make
+ applications crash on some platforms (Fedora for example).
+ - Added support for 64 bit platforms.
+ - Added code to allow multiple provider names in the registry,
+ primarily to support ia32 and ia64 libraries simultaneously.
+ Provider names are now kept in a list, the first successful
+ library open will be the provider.
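+
+ As an illustration of the constructor/destructor change in the
+ list above, here is a minimal sketch of the GNU attributes. It
+ assumes GCC or a compatible compiler; the demo_* names and the
+ state flag are hypothetical and do not reflect the actual
+ dat_init and dat_fini code.
+
```c
/* Hypothetical state flag standing in for registry initialization. */
static int demo_registry_ready = 0;

/*
 * Runs automatically when the shared library (or program) is
 * loaded, before main(), with no -init option passed to the linker.
 */
__attribute__((constructor))
static void demo_init(void)
{
    demo_registry_ready = 1;
}

/*
 * Runs automatically when the library is unloaded or the process
 * exits normally, replacing the -fini linker option.
 */
__attribute__((destructor))
static void demo_fini(void)
{
    demo_registry_ready = 0;
}

/* Lets callers observe whether the constructor has already run. */
int demo_is_ready(void)
{
    return demo_registry_ready;
}
```
+
+ Because the attributes are attached to the functions themselves,
+ the behavior does not depend on how the final link is invoked.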
+
+ * Added initial infrastructure for DAPL_DCNTR, a feature that
+ will aid in debug and tuning of a dapl implementation. Partial
+ implementation only at this point.
+
+ * Bug fixes
+ - Prevent debug messages from crashing dapl in EVD completions by
+ verifying the error code to ensure data is valid.
+ - Verify CNO before using it to clean up in evd_free()
+ - CNO timeouts now return correct error codes, per the spec.
+ - cr_accept now complies with the spec concerning connection
+ requests that go away before the accept is invoked.
+ - Verify valid EVD before posting connection events on active side
+ of a connection. EP locking also corrected.
+ - Clean up of dapltest Makefile, no longer need to declare
+ DAT_THREADSAFE
+ - Fixed check of EP states to see if we need to disconnect when
+ an IA is closed.
+ - ep_free() code reworked such that we can properly close a
+ connection pending EP.
+ - Changed disconnect processing to comply with the spec: user will
+ see a BROKEN event, not DISCONNECTED.
+ - If we get a DTO error, issue a disconnect to let the CM and
+ the user know the EP state changed to disconnect; checked IBA
+ spec to make sure we disconnect on correct error codes.
+ - ep_disconnect now properly deals with abrupt disconnects on the
+ active side of a connection.
+ - PSP now created in the correct state for psp_create_any(), making
+ it usable.
+ - dapl_evd_resize() now returns correct status, instead of always
+ DAT_NOT_IMPLEMENTED.
+ - dapl_evd_modify_cno() does better error checking before invoking
+ the provider layer, avoiding bugs.
+ - Simple change to allow dapl_evd_modify_cno() to set the CNO to
+ NULL, per the spec.
+ - Added required locking around call to dapl_sp_remove_cr.
+
+ - Fixed problems related to dapl_ep_free: the new
+ disconnect(abrupt) allows us to do a more immediate teardown of
+ connections, removing the need for the MAGIC_EP_EXIT magic
+ number/state, which has been removed. Much cleanup of paths,
+ and made more robust.
+ - Made changes to meet the spec, uDAPL 1.1 6.3.2.3: CNO is
+ triggered if there are waiters when the last EVD is removed
+ or when the IA is freed.
+ - Added code to deal with the provider synchronously telling us
+ a connection is unreachable, and generate the appropriate
+ event.
+ - Changed timer routine type from unsigned long to uintptr_t
+ to better fit with machine architectures.
+ - ep.param data now initialized in ep_create, not ep_alloc.
+ - Or Gerlitz provided updates to Mellanox files for evd_resize,
+ fw attributes, many others. Also implemented changes for correct
+ sizes on REP side of a connection request.
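+
+ The timer routine change from unsigned long to uintptr_t in the
+ list above can be sketched as follows. The callback type and
+ the demo_* names are hypothetical and are not taken from the
+ DAPL timer code. The sketch relies only on the C99 guarantee
+ that a valid object pointer survives a round trip through
+ uintptr_t, which unsigned long does not promise on platforms
+ where long is narrower than a pointer (for example 64-bit
+ Windows).
+
```c
#include <stdint.h>

/* Hypothetical timer callback type taking a pointer-sized argument. */
typedef void (*demo_timer_fn)(uintptr_t arg);

struct demo_counter {
    int ticks;
};

/* Callback: recover the object pointer from the integer argument. */
static void demo_on_timeout(uintptr_t arg)
{
    struct demo_counter *c = (struct demo_counter *)arg;
    c->ticks++;
}

/* Simulate one timer expiry and return the updated tick count. */
int demo_fire_timer(struct demo_counter *c)
{
    demo_timer_fn fn = demo_on_timeout;
    fn((uintptr_t)c);   /* pointer -> uintptr_t -> pointer is safe */
    return c->ticks;
}
```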
+
+
+
+ NEW SINCE Beta 2.0
+
+ * dat_echo now DAT 1.1 compliant. Various small enhancements.
+
+ * Revamped atomic_inc/dec to be void; the return value was never
+ used. This allows kdapl to use Linux kernel equivalents, and
+ is a small performance advantage.
+
+ * kDAPL: dapl_evd_modify_upcall implemented and tested.
+
+ * kDAPL: physical memory registration implemented and tested.
+
+ * uDAPL now builds cleanly for non-debug versions.
+
+ * Default RDMA credits increased to 8.
+
+ * Default ACK_TIMEOUT now a reasonable value (2 sec vs old 2
+ months).
+
+ * Cleaned up dat_error.h, now 1.1 compliant in comments.
+
+ * evd_resize initial implementation. Untested.
+
+ * Bug fixes
+ - __KDAPL__ is defined in kdat_config.h, so apps don't need
+ to define it.
+ - Changed include file ordering in kdat.h to put kdat_config.h
+ first.
+ - resolved connection/tear-down race on the client side.
+ - kDAPL timeouts now scaled properly; fixed 3 orders of
+ magnitude difference.
+ - kDAPL EVD callbacks now get invoked for all completions; old
+ code would drop them in heavy utilization.
+ - Fixed error path in kDAPL evd creation, so we no longer
+ leak CNOs.
+ - create_psp_any returns correct error code if it can't create
+ a connection qualifier.
+ - lock fix in ibapi disconnect code.
+ - kDAPL INFINITE waits now work properly (non connection
+ waits)
+ - kDAPL driver unload now works properly
+ - dapl_lmr_[k]create now returns 1.1 error codes
+ - ibapi routines now return DAT 1.1 error codes
+
+
+
+ NEW SINCE Beta 1.10
+
+ * kDAPL is now part of the DAPL distribution. See the release
+ notes above.
+
+ The kDAPL 1.1 spec is now contained in the doc/ subdirectory.
+
+ * Several files have been moved around as part of the kDAPL
+ checkin. Some files that were previously in udapl/ are now
+ in common/, some in common are now in udapl/. The goal was
+ to make sure files are properly located and make sense for
+ the build.
+
+ * Source code formatting changes for consistency.
+
+ * Bug fixes
+ - dapl_evd_create() was comparing the wrong bit combinations,
+ allowing bogus EVDs to be created.
+ - Removed code that swallowed zero length I/O requests, which
+ are allowed by the spec and are useful to applications.
+ - Locking in dapli_get_sp_ep was asymmetric; fixed it so the
+ routine will take and release the lock. Cosmetic change.
+ - dapl_get_consumer_context() will now verify the pointer
+ argument 'context' is not NULL.
+
+
+
+
+ OBTAIN THE CODE
+
+ To obtain the tree for your local machine you can check it
+ out of the source repository using CVS tools. CVS is common
+ on Unix systems and available as freeware on Windows machines.
+ The command to anonymously obtain the source code from
+ Source Forge (with no password) is:
+
+ cvs -d:pserver:anonymous@cvs.dapl.sourceforge.net:/cvsroot/dapl login
+ cvs -z3 -d:pserver:anonymous@cvs.dapl.sourceforge.net:/cvsroot/dapl co .
+
+ When prompted for a password, simply press the Enter key.
+
+ Source Forge also contains explicit directions on how to become
+ a developer, as well as how to use different CVS commands. You may
+ also browse the source code using the URL:
+
+ http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/dapl/
+
+
+ SYSTEM REQUIREMENTS
+
+ This project has been implemented on Red Hat Linux 7.3, SuSE
+ SLES 8, Windows 2000, RHEL 3.0, and a couple of other Linux
+ distributions. The structure of the code is designed to allow
+ other operating systems to easily be adapted.
+
+ The DAPL team has used Mellanox Tavor based InfiniBand HCAs for
+ development, and continues with this platform. Our HCAs use the
+ IB verbs API submitted by IBM. Mellanox has contributed an
+ adapter layer using their VAPI verbs API. Either platform is
+ available to any group considering DAPL work. The structure of
+ the uDAPL source allows other provider API sets to be easily
+ integrated.
+
+ The development team uses any one of three topologies: two HCAs
+ in a single machine; a single HCA in each of two machines; and
+ most commonly, a switch. Machines connected to a switch may have
+ more than one HCA.
+
+ The DAPL Plugfest revealed that switches and HCAs available from
+ most vendors will interoperate with little trouble, given the
+ most recent releases of software. The dapl reference team makes
+ no recommendation on HCA or switch vendors.
+
+ Explicit machine configurations are available upon request.
+
+
+ IN THE TREE
+
+ The DAPL tree contains source code for the uDAPL and kDAPL
+ implementations, and also includes tests and documentation.
+
+ Included documentation has the base level API of the
+ providers: the IBM Access API and the Mellanox Verbs API. Also
+ included are a growing number of DAPL design documents which
+ lead the reader through specific DAPL subsystems. More
+ design documents are in progress and will appear in the tree in
+ the near future.
+
+ A small number of test applications and a unit test framework
+ are also included. dapltest is the primary testing application
+ used by the DAPL team; it is capable of simulating a variety of
+ loads and exercises a large number of interfaces. Full
+ documentation is included for each of the tests.
+
+ Recently, the dapl conformance test has been added to the source
+ repository. The test provides coverage of the most common
+ interfaces, doing both positive and negative testing. Vendors
+ providing DAPL implementations are strongly encouraged to run
+ this set of tests.
+
+
+ MAKEFILE NOTES
+
+ There are a number of #ifdefs in the code that were necessary
+ during early development. They are disappearing as we
+ have time to take advantage of features and work available from
+ newer releases of provider software. You may notice an #ifdef
+ <something>_BUSTED, which indicates a particular feature was not
+ working at the time the code was written and the DAPL team
+ developed a work-around.
+
+ These #ifdefs are not documented as the intent is to remove
+ them as soon as possible.
+
+ Of particular relevance are the following #defines:
+
+ - CM_BUSTED
+
+ The DAPL team has been an early adopter of InfiniBand and has
+ had to improvise missing functionality while the vendors lag
+ our development. InfiniBand uses a Connection Manager (CM) to
+ establish a connection between nodes. This #define essentially
+ 'fakes' a connection by moving a QP into the appropriate
+ state. Most of the IB vendors have a working CM now and this
+ is no longer the default, but the code remains as some
+ development groups are working to catch up.
+
+ - NO_NAME_SERVICE
+
+ Naming is a thorny issue in InfiniBand; translating from a
+ hostname or an interface name to a GID that can be used to
+ establish a connection with a remote machine. The reference
+ implementation provides a simple name service under this
+ #define. The goal is to use IPoIB when it becomes
+ available. NO_NAME_SERVICE will probably remain in the code
+ long term in order to enable various implementations. A
+ description of how this works is found in the end_point_design
+ document in the doc/ directory.
+
+
+ CONTRIBUTIONS
+
+ As is common to Source Forge projects, there are a small number
+ of developers directly associated with the source tree and having
+ privileges to change the tree. Requested updates, changes, bug
+ fixes, enhancements, or contributions should be sent to Steve
+ Sears at sjs@netapp.com for review. We welcome your
+ contributions and expect the quality of the project will
+ improve thanks to your help.
+
+ The core DAPL team is:
+
+ Steve Sears
+ Philip Christopher
+
+ ... with contributions from a number of excellent engineers in
+ various companies contributing to the open source effort.
+
+
+ ONGOING WORK
+
+ Not all of the DAPL spec is implemented at this time.
+ Functionality such as shared memory will probably not be
+ implemented by the reference implementation (there is a write up
+ on this in the doc/ area), and there are still various cases where
+ work remains to be done. And of course, not all of the
+ implemented functionality has been tested yet. The DAPL team
+ continues to develop and test the tree with the intent of
+ completing the specification and delivering a robust and useful
+ implementation.
+
+
+The DAPL Team
+