shefty [Mon, 9 Nov 2009 20:09:35 +0000 (20:09 +0000)]
etc/docs: developer installation scripts
The following patch series implements a set of scripts that can be used
by developers to build and install the WinOF drivers across an HPC cluster.
The scripts are intended to allow quick building and replacement of specific
drivers and libraries. The process can be automated further by layering additional
scripts over those provided.
This patch documents the anticipated build and installation process. Follow-on
patches in the series implement the various scripts.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2549 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
shefty [Mon, 9 Nov 2009 20:07:12 +0000 (20:07 +0000)]
winmad/inf: create inf file under bin/kernel
Winmad currently creates its inf file under core/winmad/kernel/obj*.
Move the inf file to bin/kernel/obj*. This is the location where all
other inf files in the tree are created.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2548 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
stansmith [Tue, 3 Nov 2009 22:25:45 +0000 (22:25 +0000)]
[OPENSM_3] relocated OSM_MAX_LOG_NAME_SIZE to where it is used (winosm_common.c). Wired syslog() calls into OutputDebugStringA() so syslog() writes can be viewed with a debug monitor such as DebugView. Simplified osm_strdup() to handle multiple environment variables in a path/filename.
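As a rough, portable illustration of expanding several environment variables inside one path, the sketch below replaces every `%NAME%` token with the value of `getenv("NAME")`. This is not the OpenSM osm_strdup() code: the function name `expand_env_path()`, the 255-character name limit, and keeping tokens for unset variables verbatim are all assumptions made for the example.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not the OpenSM osm_strdup()): return a malloc'd copy
 * of 'path' with every %NAME% token replaced by getenv("NAME"). Tokens
 * naming unset variables are kept verbatim. Returns NULL if out of memory. */
char *expand_env_path(const char *path)
{
    size_t cap = strlen(path) + 1, len = 0;
    char *out = malloc(cap);

    if (!out)
        return NULL;

    while (*path) {
        const char *src = path;   /* bytes to copy for this step */
        size_t n = 1;             /* how many bytes to copy */
        size_t consumed = 1;      /* how far to advance in 'path' */
        const char *end = NULL;

        if (*path == '%')
            end = strchr(path + 1, '%');
        if (end && end > path + 1) {
            size_t nlen = (size_t)(end - path - 1);
            char name[256];
            const char *val = NULL;

            consumed = nlen + 2;              /* length of "%NAME%" */
            if (nlen < sizeof(name)) {
                memcpy(name, path + 1, nlen);
                name[nlen] = '\0';
                val = getenv(name);
            }
            if (val) {
                src = val;
                n = strlen(val);
            } else {
                n = consumed;                 /* keep "%NAME%" as-is */
            }
        }

        if (len + n + 1 > cap) {              /* grow the output buffer */
            char *tmp;
            while (len + n + 1 > cap)
                cap *= 2;
            tmp = realloc(out, cap);
            if (!tmp) {
                free(out);
                return NULL;
            }
            out = tmp;
        }
        memcpy(out + len, src, n);
        len += n;
        path += consumed;
    }
    out[len] = '\0';
    return out;
}
```

For example, with `OSM_TEST_DIR=C:\osm` and `OSM_TEST_FILE=osm.log` set, `expand_env_path("%OSM_TEST_DIR%\\logs\\%OSM_TEST_FILE%")` yields `C:\osm\logs\osm.log`; the caller frees the result.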
stansmith [Tue, 3 Nov 2009 22:21:48 +0000 (22:21 +0000)]
[OPENSM_3] Added OpenSM local service control handling: code 128 resets/zeroes the OSM log file and code 129 starts a heavy sweep. This emulates OFED OpenSM receiving SIGUSR1 and SIGHUP, respectively.
    sc control OpenSM 128    reset the OSM log file
    sc control OpenSM 129    start a heavy sweep
Added the above text to the usage() message under --help.
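Windows services receive user-defined control codes in the range 128 to 255, so the handling can be pictured as a small dispatch routine that only records the request, much as a signal handler only sets a flag. This is a hedged, portable sketch: the enum names, `struct osm_flags`, and `osm_service_control()` are hypothetical, the service-registration plumbing is omitted, and the code values follow the `sc control` lines above.

```c
#include <stdbool.h>

/* Hypothetical names; values follow the "sc control OpenSM <code>" usage text. */
enum {
    OSM_CTRL_LOG_RESET   = 128, /* analogous to SIGUSR1: reset/zero the log file */
    OSM_CTRL_HEAVY_SWEEP = 129  /* analogous to SIGHUP: request a heavy sweep */
};

struct osm_flags {
    bool log_reset_requested;
    bool heavy_sweep_requested;
};

/* Record the request; the main OpenSM loop would act on the flags later.
 * Returns false for codes this handler does not recognize. */
bool osm_service_control(struct osm_flags *f, unsigned long code)
{
    switch (code) {
    case OSM_CTRL_LOG_RESET:
        f->log_reset_requested = true;
        return true;
    case OSM_CTRL_HEAVY_SWEEP:
        f->heavy_sweep_requested = true;
        return true;
    default:
        return false;
    }
}
```

Deferring the real work to the main loop keeps the control handler short, which matches how the OFED signal handlers behave.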
shefty [Tue, 3 Nov 2009 16:45:38 +0000 (16:45 +0000)]
In order to support opensm running over winmad (via libibumad),
we need to set the IsSM PortInfo capability bit when opensm is present.
We do this in the winmad driver, based on the user registering for
unsolicited directed route SMPs. The bit is cleared when that user goes
away.
In order to set the capability bit, we need to add ib_modify_ca()
to the IB_AL interface. The interface GUID is updated as a result.
For opensm, a call to umad_register (directly or indirectly through
another library) should result in the IsSM capability bit being set
correctly. No additional work is required, such as calling
umad_get_issm_path and opening a separate file, as is done on Linux.
This will require platform-specific handling in the opensm code.
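The winmad behavior described above amounts to remembering which registration asked for unsolicited directed-route SMPs. A minimal sketch follows. The structure and function names are hypothetical, and `modify_ca_cap()` only stands in for the new ib_modify_ca() call without reproducing its real signature. The constants use the IBA values for the directed-route SM management class (0x81) and the IsSM CapabilityMask bit (bit 1).

```c
#include <stddef.h>
#include <stdint.h>

#define MGMT_CLASS_SUBN_DIRECTED_ROUTE 0x81u       /* directed-route SMP class */
#define PORT_CAP_IS_SM                 0x00000002u /* PortInfo CapabilityMask bit 1 */

struct port_state {
    uint32_t cap_mask;     /* stand-in for the port's CapabilityMask */
    const void *sm_owner;  /* registration that caused IsSM to be set */
};

/* Hypothetical stand-in for ib_modify_ca(): set and clear capability bits. */
static void modify_ca_cap(struct port_state *p, uint32_t set, uint32_t clr)
{
    p->cap_mask = (p->cap_mask | set) & ~clr;
}

/* Called when a user registers for a management class; 'unsolicited' is
 * nonzero when the user asked to receive unsolicited MADs of that class. */
void on_register(struct port_state *p, const void *reg,
                 uint8_t mgmt_class, int unsolicited)
{
    if (unsolicited && mgmt_class == MGMT_CLASS_SUBN_DIRECTED_ROUTE &&
        p->sm_owner == NULL) {
        p->sm_owner = reg;
        modify_ca_cap(p, PORT_CAP_IS_SM, 0);
    }
}

/* Called when a registration goes away (deregistration or process exit). */
void on_deregister(struct port_state *p, const void *reg)
{
    if (p->sm_owner == reg) {
        p->sm_owner = NULL;
        modify_ca_cap(p, 0, PORT_CAP_IS_SM);
    }
}
```

Tying the bit to the owning registration means that only the departure of the SM user clears IsSM; other winmad users coming and going do not affect it.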
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2536 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
stansmith [Sat, 31 Oct 2009 00:27:53 +0000 (00:27 +0000)]
[DAPL] Sync with latest 2.0.24 OFED release
Summary of changes since last release:
v2 - winof: Utilize WinOF version of inet_ntop() for Windows OSes which do not support inet_ntop().
v2 - winof: ucm windows build issue with new CQ completion channel
v2 - winof: add ucm provider to windows build
v2 - winof: add missing build files for ibal, scm
v2 - scm: connection peer resets under heavy load, incorrect event on error
v2 - ucm: increase default reply and rtu timeout values.
v2 - ucm: change some debug message levels and add check for valid UD REPLY during retries.
v2 - ucm: increase timers during subsequent retries
v2 - ucm, scm: address handles need to be destroyed when freeing Endpoints with UD QPs.
v2 - openib_common: ignore pd free errors, clear pd_handle and return.
v2 - ucm: with UD-type QPs, ucm reports the wrong reject event when the user rejects an AH resolution request.
v2 - ucm, scm, cma: Fix CNO support on DTO-type EVDs
v2 - ucm: fix lock init bug in ucm_cm_find
v2 - ucm: fix build problem with latest windows ucm changes
v2 - ucm: HCA should not be closed until all resources have been released.
v2 - ucm: build warning when compiling on 32-bit systems.
v2 - ucm: trying to deregister the same memory region twice
v2 - dat: reduce debug message level when parsing for location of dat.conf
v2 - ucm: update ucm provider for windows environment
v2 - ucm: add timer/retry CM logic to the ucm provider
stansmith [Thu, 29 Oct 2009 20:59:21 +0000 (20:59 +0000)]
[WINOF] enhance driver file copy error reporting by incorporating a for() loop to copy ipoib & qlgcvnic drivers. Include Winverbs ND provider in binary tree creation.
leonidk [Mon, 26 Oct 2009 10:30:10 +0000 (10:30 +0000)]
[MLX4] Allocate and map sufficient ICM memory for EQ context. [mlnx: 4946]
The current implementation allocates a single host page for EQ context
memory, which was OK when we only allocated a few EQs. However, since
we now allocate an EQ for each CPU core, this patch removes the
hard-coded limit (which we exceed with 4 KB pages and 128-byte EQ
context entries with 32 CPUs) and uses the same ICM table code as all
other context tables, which ends up simplifying the code quite a bit
while fixing the problem.
This problem was actually hit in practice on a dual-socket Nehalem box
with 16 real hardware threads and ACPI tables odd enough that it
shows on boot:
    SMP: Allowing 32 CPUs, 16 hotplug CPUs
so num_possible_cpus() ends up as 32, and mlx4 ends up creating 33 MSI-X
interrupts and 33 EQs. Because of this bug, mlx4 can't initialize at all
on this quite mainstream system.
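The numbers above can be checked directly: a 4 KB page holds 4096 / 128 = 32 EQ context entries, so the 33 EQs created on the 32-CPU system need a second page. The helpers below only illustrate that arithmetic. They are not the mlx4 ICM table code, the names are hypothetical, and `eqs_for_cpus()` assumes one EQ per possible CPU plus one extra, matching the 32-to-33 count quoted above.

```c
#include <stddef.h>

/* Number of EQ context entries that fit in one ICM page. */
size_t eqc_per_page(size_t page_size, size_t eqc_entry_size)
{
    return page_size / eqc_entry_size;
}

/* Pages of EQ context memory needed for 'num_eqs' EQs, rounded up. */
size_t eqc_pages_needed(size_t num_eqs, size_t page_size, size_t eqc_entry_size)
{
    size_t per_page = eqc_per_page(page_size, eqc_entry_size);
    return (num_eqs + per_page - 1) / per_page;
}

/* Hypothetical EQ count: one per possible CPU plus one extra. */
size_t eqs_for_cpus(size_t num_possible_cpus)
{
    return num_possible_cpus + 1;
}
```

With a single hard-coded page, any system needing more than 32 EQ contexts fails; sizing the table through the generic ICM code removes that fixed ceiling.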
leonidk [Mon, 26 Oct 2009 10:14:38 +0000 (10:14 +0000)]
[MLX4] limit the VPD read with a timeout, but continue to work on error. [mlnx: 4879]
This patch fixes a driver freeze when the FW doesn't provide VPD
(in effect, it is a workaround for a FW bug).
VPD is not currently used by the IB drivers.
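A bounded poll is the usual way to keep a hardware read from hanging forever. The sketch below is not the mlx4 code: it assumes a hypothetical device callback `vpd_ready()` and a fixed poll budget, and includes a simulated device so it can run anywhere. On timeout, `vpd_wait()` reports an error, and the caller logs it and carries on, since the VPD is not otherwise needed.

```c
#include <stdio.h>

#define VPD_OK       0
#define VPD_TIMEOUT -1

/* Hypothetical device abstraction: vpd_ready() returns nonzero once the
 * firmware has completed the requested VPD read. */
struct vpd_dev {
    int (*vpd_ready)(struct vpd_dev *dev);
    int polls_until_ready;   /* used only by the simulated device below */
};

/* Poll at most 'max_polls' times instead of spinning forever. */
int vpd_wait(struct vpd_dev *dev, int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (dev->vpd_ready(dev))
            return VPD_OK;
        /* a real driver would delay between polls here */
    }
    return VPD_TIMEOUT;
}

/* Caller: a VPD failure is reported but does not stop initialization. */
int hca_init_with_vpd(struct vpd_dev *dev)
{
    if (vpd_wait(dev, 100) != VPD_OK)
        fprintf(stderr, "VPD read timed out; continuing without VPD\n");
    return 0;   /* initialization proceeds either way */
}

/* Simulated firmware: ready after a fixed number of polls;
 * a negative count means it never responds. */
int sim_vpd_ready(struct vpd_dev *dev)
{
    if (dev->polls_until_ready < 0)
        return 0;
    return dev->polls_until_ready-- == 0;
}
```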
leonidk [Mon, 26 Oct 2009 10:05:59 +0000 (10:05 +0000)]
[CORE,HW] replace use of paged pool with nonpaged pool. [mlnx: 4836]
We occasionally see BSODs at shutdown under heavy traffic load.
They can be attributed to the fact that some of the structures used in PnP and Power Management are allocated from paged pool.
Since these structures are small, it is safer to always allocate them from NonPagedPool.
This makes the driver more robust.
leonidk [Mon, 26 Oct 2009 09:35:50 +0000 (09:35 +0000)]
[IBBUS,HW] add standby/hibernation support to IBBUS. [mlnx: 4750]
Mellanox HW supports neither standby nor hibernation.
To simulate such support, the low-level driver resets the HCA on power down and restarts it on power up.
IBBUS, which keeps working with the HCA across this reset, causes BSODs.
This patch deregisters the HCA from IBAL on power down and re-registers it on power up.
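The ordering this patch relies on can be summarized as a small power-transition handler: deregister from IBAL before the HCA is reset, and register again only after it has been restarted. Everything below is a hypothetical stand-in for the real IBBUS/IBAL calls. The D0 (working) and D3 (powered down) labels are the usual device power states.

```c
#include <stdbool.h>

enum power_state { POWER_D0 = 0, POWER_D3 = 3 };  /* working / powered down */

struct hca {
    bool registered_with_ibal;
    bool running;
};

/* Hypothetical stand-ins for IBAL registration and low-level HCA control. */
static void ibal_register(struct hca *h)   { h->registered_with_ibal = true; }
static void ibal_deregister(struct hca *h) { h->registered_with_ibal = false; }
static void hca_stop(struct hca *h)        { h->running = false; }
static void hca_start(struct hca *h)       { h->running = true; }

/* Returns true when the IBAL registration state matches the HCA state. */
bool ibbus_set_power(struct hca *h, enum power_state target)
{
    if (target == POWER_D3) {
        ibal_deregister(h);   /* stop IBAL users before the HCA goes away */
        hca_stop(h);          /* low-level driver resets/stops the HCA */
    } else {
        hca_start(h);         /* bring the HCA back up first */
        ibal_register(h);     /* then expose it to IBAL again */
    }
    return h->registered_with_ibal == h->running;
}
```

The point of the ordering is that IBAL never holds a registration for an HCA that is being reset.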
stansmith [Sat, 24 Oct 2009 00:32:28 +0000 (00:32 +0000)]
[INC] add defines and inline functions from OFED management ib_types.h in order to build OpenSM 3.3.2 using only trunk\inc\*. Tested by building and installing a WinOF release with both the current OpenSM and the newer OpenSM; no differences were observed.
stansmith [Fri, 23 Oct 2009 02:26:00 +0000 (02:26 +0000)]
[WIX] Added an explicit Windows volume (%SystemDrive%\) for the DAT config and SDK directories. This is required because older installers do not default TARGETDIR to the Windows volume.
[WinOF] Not all installers (svr2003/XP) default TARGETDIR to WindowsVolume; explicitly set WindowsVolume so DAT\ is installed where expected, '%SystemDrive%\DAT'.
[WinOF] Document the logic for loading the Windows driver store with the mlx4_hca driver first, then the mlx4_bus driver. The issue is a race between PnP and the Microsoft installer: if the mlx4_bus driver is installed first, it sets up for mlx4_hca, and PnP requests the mlx4_hca driver before mlx4_hca has been installed into the driver store. The net result is a failed mlx4_hca load/startup. Install into the driver store in this order: mlx4_hca first, then mlx4_bus.
dapl2
Move close socket calls to the connection thread, to prevent accessing a socket
after it has been closed.
Remove the setsockopt call to mark listen addresses as reusable. A reusable
address allows other libraries, such as MPI, to bind to the same address.
Create all sockets with IPPROTO_TCP, rather than leaving the protocol unspecified.
Listen on a specific address, rather than any address, to prevent two listens
from occurring on the same port with different IP addresses. This
prevents connections from going to the wrong process.
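The listen-side changes can be illustrated with ordinary BSD sockets. This sketch uses POSIX headers for brevity (the DAPL code runs over Winsock, where WSAStartup() and closesocket() would be needed), and the function name `dapl_listen_on()` is hypothetical. It creates the socket with IPPROTO_TCP, deliberately does not set SO_REUSEADDR, and binds to one specific address rather than INADDR_ANY.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: listen on a specific IPv4 address and port.
 * Returns the listening socket, or -1 on error. */
int dapl_listen_on(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); /* explicit protocol */

    if (fd < 0)
        return -1;

    /* No setsockopt(SO_REUSEADDR): another socket must not be able to
     * bind the same address and port while we are listening. */
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1 ||    /* specific address */
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 16) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A second call with the same address and port then fails at bind() instead of silently sharing the port.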
winverbs
Map selected Winsock errors to winverbs error codes. MPI relies on specific error
codes mapped through DAPL, in particular to determine whether an address is in use.
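Such a mapping is typically a small switch. In this sketch the Winsock values are the documented WSA* error codes. The winverbs-side enum (`WV_ERR_*`) and the function name are hypothetical placeholders, not the actual winverbs definitions.

```c
/* Documented Winsock error values (normally from <winsock2.h>). */
#ifndef WSAEADDRINUSE
#define WSAEADDRINUSE    10048
#define WSAEADDRNOTAVAIL 10049
#define WSAENETUNREACH   10051
#define WSAETIMEDOUT     10060
#define WSAECONNREFUSED  10061
#endif

/* Hypothetical winverbs-style result codes. */
enum wv_err {
    WV_ERR_SUCCESS = 0,
    WV_ERR_ADDRESS_IN_USE,
    WV_ERR_INVALID_ADDRESS,
    WV_ERR_UNREACHABLE,
    WV_ERR_TIMEOUT,
    WV_ERR_CONNECTION_REFUSED,
    WV_ERR_UNSUCCESSFUL
};

/* Map selected Winsock errors to specific codes so callers such as DAPL
 * (and MPI above it) can tell "address in use" apart from other failures. */
enum wv_err wv_map_wsa_error(int wsa_err)
{
    switch (wsa_err) {
    case 0:                return WV_ERR_SUCCESS;
    case WSAEADDRINUSE:    return WV_ERR_ADDRESS_IN_USE;
    case WSAEADDRNOTAVAIL: return WV_ERR_INVALID_ADDRESS;
    case WSAENETUNREACH:   return WV_ERR_UNREACHABLE;
    case WSAETIMEDOUT:     return WV_ERR_TIMEOUT;
    case WSAECONNREFUSED:  return WV_ERR_CONNECTION_REFUSED;
    default:               return WV_ERR_UNSUCCESSFUL;
    }
}
```

Anything not explicitly mapped collapses to a generic failure, so only the errors callers actually branch on need a dedicated code.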
Also adds devices to the end of the device list, rather than the beginning,
which helps maintain a more natural order of the devices when matched against
system device lists.
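Appending in constant time while preserving discovery order only needs a tail pointer. A minimal sketch with a hypothetical `dev_list` type, not the winverbs data structure:

```c
#include <stddef.h>

struct dev {
    int index;          /* e.g. order in which the system reported the device */
    struct dev *next;
};

struct dev_list {
    struct dev *head;
    struct dev *tail;
};

/* Add 'd' at the end so list order matches discovery order. */
void dev_list_append(struct dev_list *l, struct dev *d)
{
    d->next = NULL;
    if (l->tail)
        l->tail->next = d;
    else
        l->head = d;    /* first device: list was empty */
    l->tail = d;
}
```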
librdmacm
Have librdmacm release all libibverbs resources when the last librdmacm
structure goes away. This allows graceful cleanup when the libraries are
properly unloaded, which is useful for debugging applications that crash or do
not clean up all resources properly.
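One common way to get this behavior is a global reference count on librdmacm objects: the first object triggers verbs setup and the last release tears it down. The sketch below is not the librdmacm source. It assumes hypothetical `verbs_open()`/`verbs_close()` placeholders for the libibverbs setup and cleanup, and it omits the locking that a multithreaded library would need around the counter.

```c
#include <stdbool.h>

static int  rdma_ref;         /* live librdmacm objects (ids, channels, ...) */
static bool verbs_resources;  /* stand-in for opened libibverbs devices */

/* Hypothetical placeholders for libibverbs setup and teardown. */
static void verbs_open(void)  { verbs_resources = true; }
static void verbs_close(void) { verbs_resources = false; }

/* Take a reference when a librdmacm structure is created. */
void rdma_lib_get(void)
{
    if (rdma_ref++ == 0)
        verbs_open();
}

/* Drop a reference; the last one releases all verbs resources. */
void rdma_lib_put(void)
{
    if (--rdma_ref == 0)
        verbs_close();
}

bool rdma_lib_has_verbs(void)
{
    return verbs_resources;
}
```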
winverbs: fix crash accessing freed memory from async thread
If an application exits while asynchronous accept processing is queued,
it's possible for the async processing to access the IbCmId after it has
been freed. A similar problem involving access to the verbs QP handle
was fixed previously.
A simpler, more generic solution to this problem is to handle application
exit in the same manner as device removal, and lock the winverb provider
lookup lists with exclusive access. Asynchronous operations that are in
progress will run to completion, and future operations will be blocked until
the provider cleanup thread has completed. Once they run, they will fail
to acquire a reference on the desired object, which should result in a
graceful failure.
This avoids the more complicated locking otherwise needed to use handles
belonging to the lower-level code. If a reference on an object can be
acquired, the handle will be available for use until the reference is
released. To handle IB CM callbacks, additional state checking is required
to avoid processing CM events when we're trying to destroy the endpoint.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2462 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
winverbs: fix crash accessing freed memory from async thread
If an application exits while asynchronous accept processing is queued,
it's possible for the async processing to access the IbCmId after it has
been freed. A similar problem involving access to the verbs QP handle
was fixed previously.
A simpler, more generic solution to this problem is to handle application
exit in the same manner as device removal, and lock the winverb provider
lookup lists with exclusive access. Asynchronous operations that are in
progress will run to completion, and future operations will be blocked until
the provider cleanup thread has completed. Once they run, they will fail
to acquire a reference on the desired object, which should result in a
graceful failure.
This avoids the more complicated locking otherwise needed to use handles
belonging to the lower-level code. If a reference on an object can be
acquired, the handle will be available for use until the reference is
released. To handle IB CM callbacks, additional state checking is required
to avoid processing CM events when we're trying to destroy the endpoint.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2453 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86