shefty [Wed, 6 May 2009 06:11:14 +0000 (06:11 +0000)]
ND: fixup build.txt
Update the build.txt document for building ulp/nd.
Replace use of ND_INC and ND_INC_S variables with a single, user defined
ND_SDK_PATH environment variable. The change makes it consistent
with the existing PLATFORM_SDK_PATH variable.
The makefile checks for ND_SDK_PATH, rather than ND_INC when determining if ND
should be built. ND_INC indicates that the SDK has been installed, but is not
useful for building in the WDK.
The hard-coded paths in the ND sources file are removed, since those paths are
specific to individual installations. PLATFORM_SDK_PATH_S is replaced with
the existing PLATFORM_SDK_PATH variable.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2155 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
shefty [Wed, 6 May 2009 06:06:21 +0000 (06:06 +0000)]
We need to return all MADs to IBAL before calling close_al. To protect
against queuing MADs during deregistration, set the MAD service handle
to NULL when deregistering and check that it is still valid before queuing
any received MADs.
This fixes a hang when using Ctrl-C to kill a process running ibping.
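A minimal sketch of the guard described above, under stated assumptions: mad_agent_t, queue_recv_mad, dereg_mad_svc and return_mad are hypothetical names chosen for illustration, not the IBAL or driver definitions, and the locking the real code needs is only noted in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the IBAL definitions. */
typedef struct mad { struct mad *next; } mad_t;

typedef struct mad_agent {
    void  *h_mad_svc;   /* NULL once deregistration has started */
    mad_t *recv_queue;  /* received MADs waiting for the user */
    int    returned;    /* MADs handed back to the access layer */
} mad_agent_t;

/* Placeholder for returning a MAD to the access layer. */
static void return_mad(mad_agent_t *agent, mad_t *mad)
{
    assert(mad != NULL);
    mad->next = NULL;
    agent->returned++;
}

/* Receive path: queue only while the service handle is still valid.
 * Real code would hold the agent's lock around this check. */
static int queue_recv_mad(mad_agent_t *agent, mad_t *mad)
{
    if (agent->h_mad_svc == NULL) {
        return_mad(agent, mad);
        return 0;
    }
    mad->next = agent->recv_queue;
    agent->recv_queue = mad;
    return 1;
}

/* Deregistration: invalidate the handle first, then drain the queue,
 * so nothing is left outstanding when the access layer is closed. */
static void dereg_mad_svc(mad_agent_t *agent)
{
    agent->h_mad_svc = NULL;
    while (agent->recv_queue != NULL) {
        mad_t *mad = agent->recv_queue;
        agent->recv_queue = mad->next;
        return_mad(agent, mad);
    }
}
```

Clearing the handle before draining means a MAD that arrives during deregistration is returned immediately instead of being queued behind the drain.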
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2152 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
leonidk [Mon, 4 May 2009 12:42:20 +0000 (12:42 +0000)]
[IBAL] fix memory leak on power down/power up flow. [mlnx: 4289]
port_mgr_port_add() allocates a port_pnp_ctx_t context, which is saved by IBAL to be used later in port_mgr_port_remove().
But in the hibernation flow, port_mgr_port_remove() doesn't release this context, which causes a memory leak in IBBUS.
It was caught by Verifier during the WHQL Common Scenario Stress test.
leonidk [Sun, 3 May 2009 12:47:37 +0000 (12:47 +0000)]
[IBAL] crash when disabling IBBUS during MAD traffic. [mlnx: 4275]
__ioc_query_sa takes references on the IOC PnP service before sending the node and path_record requests.
But these references are released at the end of __node_rec_cb and __path_rec_cb, while the __process_sweep routine, which performs the IOU sweeping, has only been scheduled to run in an async thread.
If the test happens to unload the driver after __node_rec_cb and __path_rec_cb finish and before __process_sweep starts to run, the IOC PnP service is released and __process_sweep crashes.
The patch takes a reference on the IOC PnP service before scheduling a thread for __process_sweep and releases the reference at the end of __process_sweep.
(Note that __process_sweep schedules a thread for itself twice while moving through its FSM:
SWEEP_IOU_INFO --> SWEEP_IOC_PROFILE --> SWEEP_SVC_ENTRIES --> SWEEP_COMPLETE)
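The reference-before-schedule pattern can be sketched as follows. This is an illustration only: svc_t, work_item_t, schedule_sweep, process_sweep and run_work are hypothetical names, and the one-slot work item stands in for the real async thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the IOC PnP service object. */
typedef struct svc {
    int ref_cnt;     /* outstanding references */
    int destroyed;   /* set when the last reference is dropped */
} svc_t;

typedef void (*work_fn)(svc_t *);

/* A one-slot "async queue": the scheduled item runs later. */
typedef struct work_item { work_fn fn; svc_t *svc; } work_item_t;

static void svc_ref(svc_t *svc)
{
    assert(!svc->destroyed);
    svc->ref_cnt++;
}

static void svc_deref(svc_t *svc)
{
    assert(svc->ref_cnt > 0);
    if (--svc->ref_cnt == 0)
        svc->destroyed = 1;
}

/* Take the reference before queuing, so the object survives the gap
 * between scheduling and execution. */
static void schedule_sweep(work_item_t *item, svc_t *svc, work_fn fn)
{
    svc_ref(svc);
    item->fn = fn;
    item->svc = svc;
}

/* The sweep drops the reference it was scheduled with as its last step. */
static void process_sweep(svc_t *svc)
{
    assert(!svc->destroyed);   /* safe: a reference is still held */
    /* ... sweep work would go here ... */
    svc_deref(svc);
}

static void run_work(work_item_t *item)
{
    item->fn(item->svc);
}
```

With this arrangement, the callbacks can drop their own references as soon as they finish without the service disappearing before the scheduled sweep runs.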
winverbs: release CM interface only once per reference
The CM interface is not bound to a device, and is only acquired once by the winverbs driver. Release the CM interface only once after all devices have been removed, not once per hardware device.
This should fix issues enabling/disabling HCA drivers with multiple HCAs present in a single system.
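As a rough sketch of the intended behavior, assuming a hypothetical provider_t that counts devices (the real winverbs structures and acquisition point differ), the release happens only when the last device goes away:

```c
#include <assert.h>

/* Hypothetical driver-wide state; not the winverbs structures. */
typedef struct provider {
    int device_count;     /* hardware devices currently added */
    int cm_acquired;      /* 1 while the CM interface is held */
    int cm_release_calls; /* counts releases, for illustration only */
} provider_t;

static void add_device(provider_t *p)
{
    if (!p->cm_acquired)
        p->cm_acquired = 1;   /* assumption: acquired once, on first use */
    p->device_count++;
}

static void remove_device(provider_t *p)
{
    assert(p->device_count > 0);
    /* Release only after the last device is removed, not per device. */
    if (--p->device_count == 0 && p->cm_acquired) {
        p->cm_acquired = 0;
        p->cm_release_calls++;
    }
}
```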
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2141 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
[WinOF] dapl socket cm name change dapl2-scmd.dll --> dapl2-ofa-scmd.dll
Handle the absence of ND components: be verbose, don't fail.
Skip .cdf file copy.
Use the latest comp_channel changes to fix event reporting and avoid
hangs when destroying resources. We need to track when closing
devices to make sure that events are canceled, and avoid issuing
new wait calls.
Minor correction to the cmatose test app to avoid busy polling of the CQ,
which can prevent other threads from running. This leads to connection
failures when running more clients than there are CPUs in the system.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2130 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
Use the latest comp_channel changes to fix event reporting and avoid
hangs when destroying resources. We need to track when closing
devices to make sure that events are canceled, and avoid issuing
new wait calls.
Rename windows specific calls to include 'w' after the ibv prefix to
avoid any potential future conflicts and clearly indicate to a caller
that they're using a windows only call.
Use the common ntohll definition.
Device names are changed from ibv_device_<guid> to ibv_device_X, where
X is an index (0, 1, 2, etc.). This gives devices across the cluster
the same name, which is closer to IBAL and OFED device naming.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2129 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
winverbs: fixes to support OFED compatibility libraries and ND
Winverbs fixes based on testing the DAPL openib_scm provider, which uses the
libibverbs compatibility library.
Simplify endpoint connect locking and code structure so it's clear when the
user's request is completed.
Add const to TranslateAddress to avoid a compiler warning when building the
ND provider.
Renumber CQ notification types to align with underlying code.
Take the RemoteAddress in a send work request in host order, to align with
the UVP. (This will be revisited, but is required for RDMA over winverbs to
work for now.)
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2128 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
The previous version of the completion channel was racy and would
occasionally lose events, resulting in users blocking indefinitely
if no new events occurred. The surest fix for this is to add
a thread to the completion manager that reaps events from an IO
completion port and dispatches them to the correct completion
channel. This results in a 1-2% performance hit in libibverbs
bandwidth tests that wait on CQ, but actually works.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2127 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
ibal/cm: reference private data in received MAD not stored in CEP
To avoid any potential synchronization issues with changes to the private
data in the CEP, reference the private data in the received MAD when
formatting CM events.
Fix the size of the reject private data.
This only affects users of the newer IB CM interface, which is only winverbs
at this point.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2125 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
etc/dlist: add simple userspace doubly-linked list abstraction
Add a very simple implementation for managing a doubly-linked list.
This implementation uses only a 'list entry' structure for both the
list and items on the list, versus separate structures like complib.
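A minimal sketch of the single-structure approach, for illustration. The names below (dlist_entry_t, dlist_init, dlist_insert_after and so on) are assumptions and may not match the identifiers in etc/dlist.

```c
#include <assert.h>
#include <stddef.h>

/* One structure serves as both the list head and each item's link. */
typedef struct dlist_entry {
    struct dlist_entry *next;
    struct dlist_entry *prev;
} dlist_entry_t;

/* An empty list is a head that points to itself. */
static void dlist_init(dlist_entry_t *head)
{
    head->next = head;
    head->prev = head;
}

static int dlist_empty(const dlist_entry_t *head)
{
    return head->next == head;
}

/* Insert 'item' right after 'pos' (pass the head to push at the front). */
static void dlist_insert_after(dlist_entry_t *item, dlist_entry_t *pos)
{
    item->next = pos->next;
    item->prev = pos;
    pos->next->prev = item;
    pos->next = item;
}

/* Insert 'item' right before 'pos' (pass the head to append at the tail). */
static void dlist_insert_before(dlist_entry_t *item, dlist_entry_t *pos)
{
    dlist_insert_after(item, pos->prev);
}

/* Unlink 'item' and leave it pointing to itself. */
static void dlist_remove(dlist_entry_t *item)
{
    assert(item->next != item);   /* not linked, or an empty head */
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->next = item;
    item->prev = item;
}

/* Recover the containing object from its embedded entry. */
#define dlist_container(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

Because the head is just another entry, insertion and removal need no special case for an empty list or for the first and last items.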
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2121 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
[NetworkDirect] Enable Network Direct IB provider builds.
if HPC SDK installed (ND_INC defined) then build for x86 & x64 - no IA64 ND support until ND over Winverbs.
if !HPC SDK installed, skip ND provider build for all architectures.
[ND] reverting back to fake ND sources file so build out-of-svn works again.
Once MS supplied sources file actually builds something useful, will reinstate their sources file.
[HW] Pass IRP_MN_QUERY_INTERFACE IRP down the stack.
This patch changes the HCA driver to pass any IRP_MN_QUERY_INTERFACE IRP down the stack if the interface is supported, after setting the IRP status to STATUS_SUCCESS. The bottom-most driver will complete the IRP without changing the status.
Signed-off-by: Fab Tillier <ftillier@microsoft.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2093 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
[WinOF]
buildrelease.bat - for compf & allf - verify input path exists; protect against bad typists/self.
README.txt - update how2 doc with current paths.
Release_notes.htm - added MS HPC driver install link to HPC install discussion.
signDrivers.bat - be explicit as to which cert-store the cert comes from (self-doc).
[DAPL2] cleanup/simplify
dt-cli.bat standardize time output, skip 'finished test xxx' messages where start is still on the screen.
dt-svr.bat - exit with correct error code.
This patch fixes issues with connection establishment for NetworkDirect. The root cause of the issue is 'too many cooks' - CIDs exposed to user-mode should not be destroyed in the kernel code without explicit request from the user. Otherwise, the CID can get recycled in the kernel for the same process and improperly freed when the stale CID is released by the application (multiple connection objects in the app have the same CID.)
Unfortunately, the fix is not simple. The QP references the CEP, so QP destruction frees the CEP, even if there's a reference to that CEP left in the application. Removing the CEP reference from the QP solves this problem, but deadlocks the app if it destroys the QP before the CEP, since the QP is used to queue connection-related IRPs, and the CEP uses the QP as its context and so holds a reference on it.
This patch does the following:
- Remove CEP reference for ND related QP.
- Remove ND connection related IRP queue from QP.
- Remove ND IRP handling from CEP manager.
- Add a function to CEP manager to reference the context associated with a CEP if the context is non-NULL.
- Move ND connection related IRP management into al_ndi_cm.c, in nd_csq_t structure.
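The "reference the context if non-NULL" helper from the list above might look roughly like this. It is a sketch only: cep_t, context_t, cep_ref_context and context_deref are hypothetical names, and the CEP manager's locking is reduced to a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real CEP and context types differ. */
typedef struct context { int ref_cnt; } context_t;
typedef struct cep { context_t *context; } cep_t;

/* Return the CEP's context with an extra reference, or NULL if unset.
 * Real code would do the check and the increment under the CEP lock,
 * so the context cannot be cleared in between. */
static context_t *cep_ref_context(cep_t *cep)
{
    context_t *ctx = cep->context;
    if (ctx != NULL)
        ctx->ref_cnt++;
    return ctx;
}

/* Drop a reference obtained from cep_ref_context(). */
static void context_deref(context_t *ctx)
{
    assert(ctx->ref_cnt > 0);
    ctx->ref_cnt--;
}
```

A caller checks the return value and, when it is non-NULL, uses the context and then drops the reference.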
As part of testing, I needed to add NotifyDisconnect functionality, so this is also included in the patch.
Note that the patch depends on Sean's previous patch to change kal_cep_destroy to allow silently dropping a REQ. I did not remove Sean's previous changes from this patch, so they are duplicated here. This allows the patch to be applied and build.
-Fab
Signed-off-by: Fab Tillier <ftillier@microsoft.com>
git-svn-id: svn://openib.tc.cornell.edu/gen1@2079 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
[WinOF] Windows 7 support
buildrelease.bat - use 7068 WDK
build-oFA-dist.bat - Win7 support; collect OS variants prior to zip, facilitates win7 or not support.
signDriver.bat - delayed env var expansion bug: replace %CD%\arch with !CD! as pushd does update %CD%.
stansmith [Mon, 30 Mar 2009 22:49:25 +0000 (22:49 +0000)]
[WinOF] Using the WIX preprocessor, the OS*\arch*\wof.wxs files (all 10 of them) have been distilled down to be manageable. WIX common install sections are now included from WinOF\WIX\common\.
leonidk [Sun, 29 Mar 2009 16:06:32 +0000 (16:06 +0000)]
[MLX4_BUS] Bad order of operations in mlx4_ib teardown process. [mlnx: 4208]
The bug is that the driver performs the CLOSE_PORT command before closing all resources (such as QPs).
In some cases this causes loss of completions.
According to PRM:
18.2 ConnectX Driver Teardown and Re-initialization
The HCA can be shut down (and re-initialized/restarted later on) by software. This operation is performed while the system shuts down gracefully or when PCI bus re-enumeration and memory re-allocation is required. In this case, software should perform the following steps:
• Stop HCA operations (tear-down all QPs and flush WQEs if required).
• Take down the network links by executing the CLOSE_PORT command.
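The ordering in the two steps above can be illustrated with a small sketch. hca_t and the functions below are hypothetical and only model the ordering, not the mlx4 driver code or the actual firmware command interface.

```c
#include <assert.h>

/* Hypothetical model of the state that matters for teardown order. */
typedef struct hca {
    int open_qps;    /* QPs not yet torn down */
    int port_open;   /* 1 until CLOSE_PORT has been issued */
} hca_t;

/* Step 1: stop HCA operations by tearing down all QPs. */
static void destroy_all_qps(hca_t *hca)
{
    hca->open_qps = 0;
}

/* Step 2: take the link down. Doing this while QPs are still open
 * is the misordering that can lose completions. */
static void close_port(hca_t *hca)
{
    assert(hca->open_qps == 0);
    hca->port_open = 0;
}

static void hca_teardown(hca_t *hca)
{
    destroy_all_qps(hca);
    close_port(hca);
}
```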
leonidk [Sun, 29 Mar 2009 15:40:56 +0000 (15:40 +0000)]
[CORE,HW] remove PDO from the upper HCA interface. [mlnx: 4197]
This patch removes the p_hca_dev field from the upper CA interface (ci_interface_t), which contains the PDO of the HCA device.
IBBUS, which now sits above the HCA, gets this PDO in its add_device function and stores it (in this patch) in a new p_hca_dev field in the IBAL CA object.
All usages of the ci_interface_t.p_hca_dev field are replaced by usage of p_hca_dev in the IBAL CA object.
The p_hca_obj field, added to RDMA_INTERFACE_VERBS in the 2019 patch, is removed from there and takes the place of p_hca_dev in ci_interface_t.
Removing the PDO field from the interface required changing the ib_register_ca prototype (for technical reasons).
Because it is an interface function, the interface version number was increased (IB_CI_INTERFACE_VERSION=5).