git.openfabrics.org - ~ardavis/dapl.git/log
14 years agoibal: changes for EP to CM linking and synchronization.
Arlin Davis [Mon, 8 Mar 2010 20:53:45 +0000 (12:53 -0800)]
ibal: changes for EP to CM linking and synchronization.

Windows IBAL changes to allocate and manage CM objects
and to link them to the EP. This ensures that the CM
IBAL objects and cm_ids are not destroyed before the EP.
Remove the Windows-only ibal_cm_handle from the EP structure.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: add support for canceling conn request that times out.
Arlin Davis [Wed, 24 Feb 2010 20:00:07 +0000 (12:00 -0800)]
scm: add support for canceling conn request that times out.

print warning message during timeout.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm, cma, ucm: consolidate dat event/provider event translation
Arlin Davis [Wed, 24 Feb 2010 19:28:04 +0000 (11:28 -0800)]
scm, cma, ucm: consolidate dat event/provider event translation

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon: missed linking changes from atomic to acquire/release
Arlin Davis [Wed, 24 Feb 2010 19:26:25 +0000 (11:26 -0800)]
common: missed linking changes from atomic to acquire/release

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon: add CM-EP linking to support multiple CM's and proper protection during destru...
Arlin Davis [Wed, 24 Feb 2010 18:03:57 +0000 (10:03 -0800)]
common: add CM-EP linking to support multiple CM's and proper protection during destruction

Add linking for CM to EP, including reference counting, to ensure synchronization
during creation and destruction. A cm_list_head has been added to the EP object to
support multiple CM objects (UD) per EP. A CM object that is linked to an EP
cannot be destroyed.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
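
The linking rule in this commit message can be illustrated with a minimal sketch. The `ex_` types and functions below are hypothetical and are not the dapl structures; only the idea of a per-EP CM list plus a reference count that blocks destruction comes from the message. Locking is omitted for brevity, so this shows the bookkeeping only, not the synchronization.

```c
#include <assert.h>
#include <stddef.h>

struct ex_ep;

/* Hypothetical CM object: linked into at most one EP list. */
struct ex_cm {
    struct ex_cm *next;  /* next CM linked to the same EP */
    struct ex_ep *ep;    /* owning EP, NULL while unlinked */
    int ref_count;       /* one reference is held by the link */
};

/* Hypothetical EP object with a list head for its CM objects. */
struct ex_ep {
    struct ex_cm *cm_list_head;
};

/* Link a CM to an EP and take a reference on the CM. */
static void ex_cm_link(struct ex_ep *ep, struct ex_cm *cm)
{
    assert(ep != NULL && cm != NULL && cm->ep == NULL);
    cm->ep = ep;
    cm->next = ep->cm_list_head;
    ep->cm_list_head = cm;
    cm->ref_count++;
}

/* Remove a CM from its EP list and drop the link reference. */
static void ex_cm_unlink(struct ex_cm *cm)
{
    struct ex_cm **pp;

    assert(cm != NULL);
    if (cm->ep == NULL)
        return;
    for (pp = &cm->ep->cm_list_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == cm) {
            *pp = cm->next;
            break;
        }
    }
    cm->next = NULL;
    cm->ep = NULL;
    cm->ref_count--;
}

/* Returns 0 when the CM may be destroyed, -1 while it is still linked or referenced. */
static int ex_cm_can_destroy(const struct ex_cm *cm)
{
    return (cm->ep == NULL && cm->ref_count == 0) ? 0 : -1;
}
```

A caller would check `ex_cm_can_destroy()` before freeing a CM, and unlink it from the EP first if the check fails.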
14 years agoRelease 2.0.27-1 dapl-2.0.27-1
Arlin Davis [Wed, 24 Feb 2010 00:26:41 +0000 (16:26 -0800)]
Release 2.0.27-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowindows: add scm makefile
Arlin Davis [Mon, 22 Feb 2010 17:42:17 +0000 (09:42 -0800)]
windows: add scm makefile

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoWindows does not require rdma_cma_abi.h, move the include from common code and to...
Arlin Davis [Mon, 22 Feb 2010 17:41:13 +0000 (09:41 -0800)]
Windows does not require rdma_cma_abi.h, move the include from common code and to OSD file.

Signed-off-by: stan smith <stan.smith@intel.com>
14 years agoWindows patch to fix IB_INVALID_HANDLE name collision
Arlin Davis [Fri, 19 Feb 2010 22:52:01 +0000 (14:52 -0800)]
Windows patch to fix IB_INVALID_HANDLE name collision

Signed-off-by: stan smith <stan.smith@intel.com>

14 years agoscm: dat_ep_connect fails on 32bit servers
Arlin Davis [Mon, 8 Feb 2010 21:49:35 +0000 (13:49 -0800)]
scm: dat_ep_connect fails on 32bit servers

memcpy for remote IA address uses incorrect sizeof for a pointer type.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
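
The class of bug named in this message (taking `sizeof` of a pointer instead of the object it points to) can be sketched as follows. The names `ex_remote_addr` and `ex_copy_remote_addr` are hypothetical and not taken from the dapl sources. On a typical 32-bit build a pointer is 4 bytes, so the wrong form copies fewer bytes than it does on a 64-bit build.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical stand-in for the storage that holds a remote IA address. */
struct ex_remote_addr {
    struct sockaddr_in6 addr;
};

/* Copy the address object itself, clamped to the destination size. */
static void ex_copy_remote_addr(struct ex_remote_addr *dst,
                                const struct sockaddr *src, size_t src_len)
{
    size_t n;

    assert(dst != NULL && src != NULL);
    /* Wrong: memcpy(&dst->addr, src, sizeof(src)); copies only pointer-size bytes. */
    n = src_len < sizeof(dst->addr) ? src_len : sizeof(dst->addr);
    memset(&dst->addr, 0, sizeof(dst->addr));
    memcpy(&dst->addr, src, n);
}
```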
14 years agoundefined symbol: dapls_print_cm_list
Arlin Davis [Fri, 5 Feb 2010 19:51:16 +0000 (11:51 -0800)]
undefined symbol: dapls_print_cm_list

call prototype should be dependent on DAPL_COUNTERS.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoCleanup CM object lock before freeing CM object memory
Arlin Davis [Fri, 5 Feb 2010 19:39:21 +0000 (11:39 -0800)]
Cleanup CM object lock before freeing CM object memory

Ran the Windows Application Verifier for uDAPL validation
of all 3 providers. Clean up memory lock leaks found
by the verifier.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agodestroy verbs completion channels created via ia_open or ep_create.
Arlin Davis [Thu, 4 Feb 2010 00:21:30 +0000 (16:21 -0800)]
destroy verbs completion channels created via ia_open or ep_create.

Completion channels are created with ia_open for CNO events and
with ep_create in cases where DAT allows EP(qp) to be created with
no EVD(cq) and IB doesn't. These completion channels need to be
destroyed at close along with a CQ for the EP without CQ case.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoUpdate Copyright file and include the 3 license files in distribution
Arlin Davis [Wed, 3 Feb 2010 19:06:45 +0000 (11:06 -0800)]
Update Copyright file and include the 3 license files in distribution

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoWhen copying private_data out of rdma_cm events, use the
Arlin Davis [Tue, 2 Feb 2010 22:43:03 +0000 (14:43 -0800)]
When copying private_data out of rdma_cm events, use the
reported private_data_len for the size, and not IB maximums.
This fixes a bug running over the librdmacm on windows, where
DAPL accessed invalid memory.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
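
A minimal sketch of the copy rule described in this message: use the length reported with the event, clamped to the destination capacity, rather than a fixed maximum. `EX_PDATA_MAX` and `ex_copy_private_data` are hypothetical names, and the capacity value is an assumption, not the real IB or DAPL limit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity of the destination buffer; not a real IB constant. */
#define EX_PDATA_MAX 256

/* Copy the reported number of bytes, never more than the buffer holds.
 * Returns the number of bytes copied. */
static size_t ex_copy_private_data(unsigned char *dst, const void *src,
                                   size_t reported_len)
{
    size_t n;

    assert(dst != NULL);
    if (src == NULL || reported_len == 0)
        return 0;
    n = reported_len < EX_PDATA_MAX ? reported_len : EX_PDATA_MAX;
    memcpy(dst, src, n);
    return n;
}
```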
14 years agodapl/cma: fix referencing freed address
Sean Hefty [Thu, 28 Jan 2010 18:19:20 +0000 (10:19 -0800)]
dapl/cma: fix referencing freed address

DAPL uses a pointer to reference the local and remote addresses
of an endpoint.  It expects that those addresses are located
in memory that is always accessible.  Typically, for the local
address, the pointer references the address stored with the DAPL
HCA device.  However, for the cma provider, it changes this pointer
to reference the address stored with the rdma_cm_id.

This causes a problem when that endpoint is connected on the
passive side of a connection.  When connect requests are given
to DAPL, a new rdma_cm_id is associated with the request.  The
DAPL code replaces the current rdma_cm_id associated with a
user's endpoint with the new rdma_cm_id.  The old rdma_cm_id is
then deleted.  But the endpoint's local address pointer still
references the address stored with the old rdma_cm_id.  The
result is that any reference to the address will access freed
memory.

Fix this by keeping the local address pointer always pointing
to the address associated with the DAPL HCA device.  This is about
the best that can be done given the DAPL interface design.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agodapl: move close device after async thread is done
Sean Hefty [Tue, 26 Jan 2010 23:13:03 +0000 (15:13 -0800)]
dapl: move close device after async thread is done

using it

Before calling ibv_close_device, wait for the asynchronous
processing thread to finish using the device.  This prevents
a use after free error.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoRelease 2.0.26-1 dapl-2.0.26-1
Arlin Davis [Mon, 11 Jan 2010 17:03:10 +0000 (09:03 -0800)]
Release 2.0.26-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoopenib_common: add check for both gid and global routing in RTR
Arlin Davis [Tue, 22 Dec 2009 22:00:33 +0000 (14:00 -0800)]
openib_common: add check for both gid and global routing in RTR

check for valid gid pointer along with global route setting
during transition to RTR. Add more GID information to
debug print statement in qp modify call.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoopenib_common: remote memory read privilege set multi times
Arlin Davis [Fri, 4 Dec 2009 20:31:22 +0000 (12:31 -0800)]
openib_common: remote memory read privilege set multi times

duplicate setting of read privilege in dapls_convert_privileges

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm, scm: DAPL_GLOBAL_ROUTING enabled causes segv
Arlin Davis [Fri, 4 Dec 2009 20:25:30 +0000 (12:25 -0800)]
ucm, scm: DAPL_GLOBAL_ROUTING enabled causes segv

socket cm and ud cm providers support QP modify with is_global
set and GRH. New v2 providers didn't pass GID information
in modify_qp RTR call and incorrectly byte swapped the already
network order GID. Add debug print of GID during global modify.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.25-1 dapl-2.0.25-1
Arlin Davis [Wed, 25 Nov 2009 06:16:58 +0000 (22:16 -0800)]
Release 2.0.25-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof scm: initialize opt for NODELAY setsockopt
Arlin Davis [Wed, 25 Nov 2009 06:15:46 +0000 (22:15 -0800)]
winof scm: initialize opt for NODELAY setsockopt

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.25
Arlin Davis [Tue, 24 Nov 2009 19:29:46 +0000 (11:29 -0800)]
Release 2.0.25

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof cma: windows definition for EADDRNOTAVAIL missing
Arlin Davis [Tue, 24 Nov 2009 16:58:44 +0000 (08:58 -0800)]
winof cma: windows definition for EADDRNOTAVAIL missing

Signed-off-by: stan smith <stan.smith@intel.com>
14 years agoscm: client side setsockopt NODELAY fails if data arrives before setting
Arlin Davis [Tue, 24 Nov 2009 16:54:26 +0000 (08:54 -0800)]
scm: client side setsockopt NODELAY fails if data arrives before setting

Move setsockopt before connect to avoid race with data.
Seems to fail on windows. Not seen on linux.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
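
The ordering described in this message (set the option first, then connect) might look like the sketch below on a POSIX system. `ex_set_nodelay` is a hypothetical helper, not the scm provider code; the Windows build would use the Winsock equivalents.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP_NODELAY on a socket that has not been connected yet.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
static int ex_set_nodelay(int fd)
{
    int opt = 1; /* explicitly initialized before being passed to setsockopt */

    assert(fd >= 0);
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &opt, sizeof(opt));
}
```

A client would call `socket()`, then `ex_set_nodelay()`, and only then `connect()`, so no data can arrive before the option is applied.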
14 years agocma: setup_listener Cannot assign requested address
Arlin Davis [Wed, 18 Nov 2009 17:52:40 +0000 (09:52 -0800)]
cma: setup_listener Cannot assign requested address

Colliding with RDS port of 18634. rdma_cm can return
either EADDRINUSE or EADDRNOTAVAIL if the bind fails.
Add check for either and return proper DAT_CONN_QUAL_IN_USE.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
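
The check described in this message can be sketched as a small errno mapping. The `ex_status` enum is a hypothetical stand-in for the DAT return codes; only the rule that both `EADDRINUSE` and `EADDRNOTAVAIL` mean "connection qualifier in use" comes from the message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical status codes standing in for DAT return values. */
enum ex_status {
    EX_SUCCESS = 0,
    EX_CONN_QUAL_IN_USE,
    EX_INTERNAL_ERROR
};

/* Map the errno left by a failed bind to a status code. */
static enum ex_status ex_bind_errno_to_status(int err)
{
    assert(err >= 0); /* errno values are not negative */
    if (err == 0)
        return EX_SUCCESS;
    if (err == EADDRINUSE || err == EADDRNOTAVAIL)
        return EX_CONN_QUAL_IN_USE;
    return EX_INTERNAL_ERROR;
}
```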
14 years agocommon: seg fault in dapl_evd_wait with multi-thread application using CNO's.
Arlin Davis [Wed, 18 Nov 2009 17:43:38 +0000 (09:43 -0800)]
common: seg fault in dapl_evd_wait with multi-thread application using CNO's.

If we are dealing with event streams besides a CQ event stream,
be conservative and set producer side locking.  Otherwise, no.
Check for CNO is missing, CNO is not considered CQ event stream.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: inbound DREQ/DREP handshake should transition QP.
Arlin Davis [Wed, 18 Nov 2009 17:37:48 +0000 (09:37 -0800)]
ucm: inbound DREQ/DREP handshake should transition QP.

During release, when receiving a disconnect request from remote peer
instead of a disconnect call from the client, the QP didn't get properly
set in ERR state and didn't flush the queue during disconnect processing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: Remove duplicate include of comp_channel.cpp from cm.c as it is included in...
Arlin Davis [Mon, 2 Nov 2009 16:24:53 +0000 (08:24 -0800)]
winof: Remove duplicate include of comp_channel.cpp from cm.c as it is included in opensm_ucb/device.c.

Signed-off-by: stan smith <stan.smith@intel.com>
14 years agoRelease 2.0.24 dapl-2.0.24-1
Arlin Davis [Fri, 30 Oct 2009 21:19:21 +0000 (13:19 -0800)]
Release 2.0.24

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: Utilize WinOF version of inet_ntop() for Windows OSes which do not support...
Arlin Davis [Fri, 30 Oct 2009 20:57:22 +0000 (12:57 -0800)]
winof: Utilize WinOF version of inet_ntop() for Windows OSes which do not support inet_ntop().

Signed-off-by: stan smith <stan.smith@intel.com>
14 years agoucm: windows build issue with new CQ completion channel
Arlin Davis [Fri, 30 Oct 2009 15:17:26 +0000 (07:17 -0800)]
ucm: windows build issue with new CQ completion channel

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: add ucm provider to windows build
Arlin Davis [Fri, 30 Oct 2009 14:35:33 +0000 (06:35 -0800)]
winof: add ucm provider to windows build

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: add missing build files for ibal, scm
Arlin Davis [Fri, 30 Oct 2009 14:32:56 +0000 (06:32 -0800)]
winof: add missing build files for ibal, scm

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: connection peer resets under heavy load, incorrect event on error
Arlin Davis [Wed, 28 Oct 2009 17:52:50 +0000 (09:52 -0800)]
scm: connection peer resets under heavy load, incorrect event on error

Under heavy load, we get a peer reset from the remote stack. In this
case retry the socket connection for this QP setup.

Add debugging with PID's and socket ports to help isolate
these types of socket scaling issues.

Report correct UD event during error, check remote_ah creation.

Fix dapl_poll return codes for single event type only.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: increase default reply and rtu timeout values.
Arlin Davis [Wed, 28 Oct 2009 17:47:37 +0000 (09:47 -0800)]
ucm: increase default reply and rtu timeout values.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: change some debug message levels and add check for valid UD REPLY during retries.
Arlin Davis [Wed, 28 Oct 2009 15:48:20 +0000 (07:48 -0800)]
ucm: change some debug message levels and add check for valid UD REPLY during retries.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: increase timers during subsequent retries
Arlin Davis [Tue, 27 Oct 2009 18:37:45 +0000 (10:37 -0800)]
ucm: increase timers during subsequent retries

check/process create_ah errors during connect phase
cleanup some debug messaging.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm, scm: address handles need destroyed when freeing Endpoints with UD QP's.
Arlin Davis [Mon, 19 Oct 2009 17:38:36 +0000 (10:38 -0700)]
ucm, scm: address handles need destroyed when freeing Endpoints with UD QP's.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoopenib_common: ignore pd free errors, clear pd_handle and return.
Arlin Davis [Fri, 16 Oct 2009 21:42:00 +0000 (14:42 -0700)]
openib_common: ignore pd free errors, clear pd_handle and return.

some older adapters have some issues
with pd free so just clear handle and return

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: using UD type QP's, ucm reports wrong reject event when user rejects AH resoluti...
Arlin Davis [Fri, 16 Oct 2009 15:52:21 +0000 (08:52 -0700)]
ucm: using UD type QP's, ucm reports wrong reject event when user rejects AH resolution request.

During rejects, both usr and ucm internal, the qp_type does not get initialized
so the check for UD type QP messages fails on the active side and the wrong
event gets generated. Initialize saddr.ib information before sending reject
back to active side.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm, scm, cma: Fix CNO support on DTO type EVD's
Arlin Davis [Fri, 16 Oct 2009 14:57:25 +0000 (07:57 -0700)]
ucm, scm, cma: Fix CNO support on DTO type EVD's

EVD wait_object should be used for CNO processing
and not the direct CQ event channels. Add proper
checking for DTO type EVD's with CNO at wait
and wakeup.

UCM missing support for collective EVD's under a
CNO. Add support to create common channel for
collective EVD's during device open. Add support
in cm_thread to check this channel. Also,
during disconnect, move QP to error to properly
flush queue instead of moving to reset and init.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: fix lock init bug in ucm_cm_find
Arlin Davis [Thu, 15 Oct 2009 16:19:45 +0000 (09:19 -0700)]
ucm: fix lock init bug in ucm_cm_find

the lock should be setup as pointer to lock
not lock structure. Cleanup lock and list
in cm_find function and cm_print function.

Add debug aid by passing process id in
msg resv area. cleanup cr references
and change to cm for consistency.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: fix build problem with latest windows ucm changes
Arlin Davis [Wed, 14 Oct 2009 17:03:47 +0000 (10:03 -0700)]
ucm: fix build problem with latest windows ucm changes

define dapls_thread_signal as inline

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoSigned-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Wed, 14 Oct 2009 16:34:22 +0000 (09:34 -0700)]
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoThe HCA should not be closed until all resources have been released.
Sean Hefty [Wed, 14 Oct 2009 16:34:18 +0000 (09:34 -0700)]
The HCA should not be closed until all resources have been released.
This results in a hang on windows, since closing the device frees
the event processing thread.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoFix build warning when compiling on 32-bit systems.
Sean Hefty [Wed, 14 Oct 2009 16:34:13 +0000 (09:34 -0700)]
Fix build warning when compiling on 32-bit systems.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoTrying to deregister the same memory region twice leads to an
Sean Hefty [Wed, 14 Oct 2009 16:34:07 +0000 (09:34 -0700)]
Trying to deregister the same memory region twice leads to an
application crash on windows.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agodat: reduce debug message level when parsing for location of dat.conf
Arlin Davis [Wed, 14 Oct 2009 14:59:23 +0000 (07:59 -0700)]
dat: reduce debug message level when parsing for location of dat.conf

Don't output failover to default /etc/dat.conf from
sysconfdir at ERROR level. Reduce to DAT_OS_DBG_TYPE_SR.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: update ucm provider for windows environment
Arlin Davis [Thu, 8 Oct 2009 23:23:22 +0000 (16:23 -0700)]
ucm: update ucm provider for windows environment

add dapls_thread_signal abstraction and a new
cm_thread function specific for windows.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: add timer/retry CM logic to the ucm provider
Arlin Davis [Thu, 8 Oct 2009 23:02:52 +0000 (16:02 -0700)]
ucm: add timer/retry CM logic to the ucm provider

add reply, rtu and retry count options via
environment variables. Times in msecs.
DAPL_UCM_RETRY 10
DAPL_UCM_REP_TIME 400
DAPL_UCM_RTU_TIME 200

Add RTU_PENDING and DISC_RECV states

Add check timer code to the cm_thread
and the option to the select abstraction
to take timeout values in msecs.
DREQ, REQ, and REPLY will all be timed
and retried.

Split out reply code and disconnect_final
code to better facilitate retry timers.
Add checking for duplicate messages.

Added new UD extension events for errors.
DAT_IB_UD_CONNECTION_REJECT_EVENT
DAT_IB_UD_CONNECTION_ERROR_EVENT

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
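
Reading the three settings named in this message could be sketched as below. The variable names come from the message; treating the listed values (10, 400, 200) as defaults is an assumption, and the `ex_` helpers and struct are hypothetical rather than the ucm provider code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container for the CM timer settings. */
struct ex_ucm_timers {
    int retry_cnt;   /* number of retries */
    int rep_time_ms; /* reply timeout, msecs */
    int rtu_time_ms; /* RTU timeout, msecs */
};

/* Read a non-negative integer from the environment, else return def. */
static int ex_env_int(const char *name, int def)
{
    const char *val;
    char *end;
    long n;

    assert(name != NULL);
    val = getenv(name);
    if (val == NULL || *val == '\0')
        return def;
    n = strtol(val, &end, 10);
    if (*end != '\0' || n < 0 || n > 1000000)
        return def; /* reject garbage and out-of-range values */
    return (int)n;
}

/* Load all three settings, assuming the listed values are the defaults. */
static struct ex_ucm_timers ex_load_ucm_timers(void)
{
    struct ex_ucm_timers t;

    t.retry_cnt = ex_env_int("DAPL_UCM_RETRY", 10);
    t.rep_time_ms = ex_env_int("DAPL_UCM_REP_TIME", 400);
    t.rtu_time_ms = ex_env_int("DAPL_UCM_RTU_TIME", 200);
    return t;
}
```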
14 years agoRelease 2.0.23 dapl-2.0.23-1
Arlin Davis [Fri, 2 Oct 2009 21:49:52 +0000 (14:49 -0700)]
Release 2.0.23

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: cannot reuse the cm_id and qp for new connection, must reallocate a new one.
Arlin Davis [Fri, 2 Oct 2009 21:48:15 +0000 (14:48 -0700)]
cma: cannot reuse the cm_id and qp for new connection, must reallocate a new one.

When merging the common code base, dapls_ib_reinit_ep was mistakenly
changed to modify the QP to reset then init for all providers. This will
not work for rdma_cm (cma provider) since the cm_id cannot
be reused.  Add build check for _OPENIB_CMA_ to pull in correct
free and reallocate method for reinit_ep.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm, cma: update DAPL cm protocol revision with latest address/port changes
Arlin Davis [Fri, 2 Oct 2009 20:50:12 +0000 (13:50 -0700)]
scm, cma: update DAPL cm protocol revision with latest address/port changes

CM protocol changed, roll revision to 6.
The socket cm could be competing with address space if
application is using sockets above to exchange information
like dapltest, and MPI consumers. Adjust port on listen
and connect to reduce the chance of port collision with
application above.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: modify IB address format to align better with sockaddr_in6
Arlin Davis [Fri, 2 Oct 2009 19:47:37 +0000 (12:47 -0700)]
ucm: modify IB address format to align better with sockaddr_in6

Restructure the dcm_addr union to map the IB side
closer to sockaddr6 and initialize family to
AF_INET6 to ensure callee allocates enough memory
for ucm dat_ia_address type. Put qpn in flowinfo
and gid in sin6_addr. Change the test suites
to print address information based on AF_INET
or AF_INET6 instead of using specific IB address
union from the provider.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
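
The field mapping described in this message (qpn in flowinfo, gid in sin6_addr, family AF_INET6) can be sketched using the standard `struct sockaddr_in6` as a stand-in for the provider's dcm_addr union. The `ex_` helpers are hypothetical, and storing the qpn in network byte order is an assumption, not something the message states.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Pack an IB qpn and 16-byte gid into a sockaddr_in6-shaped address. */
static void ex_pack_ib_addr(struct sockaddr_in6 *sa, uint32_t qpn,
                            const uint8_t gid[16])
{
    assert(sa != NULL && gid != NULL);
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;          /* callers size buffers for IPv6 */
    sa->sin6_flowinfo = htonl(qpn);      /* assumption: network byte order */
    memcpy(sa->sin6_addr.s6_addr, gid, 16);
}

/* Recover the qpn in host byte order. */
static uint32_t ex_unpack_qpn(const struct sockaddr_in6 *sa)
{
    assert(sa != NULL);
    return ntohl(sa->sin6_flowinfo);
}
```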
14 years agoAdd definition for getpid similar to that used by the other dtest apps.
Sean Hefty [Wed, 30 Sep 2009 21:29:03 +0000 (14:29 -0700)]
Add definition for getpid similar to that used by the other dtest apps.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoWinOF provides a common implementation of gettimeofday that should
Sean Hefty [Wed, 30 Sep 2009 21:28:57 +0000 (14:28 -0700)]
WinOF provides a common implementation of gettimeofday that should
be used instead.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoThe completion manager was updated to provide an abstraction that
Sean Hefty [Wed, 30 Sep 2009 21:27:50 +0000 (14:27 -0700)]
The completion manager was updated to provide an abstraction that
better mimicked how fd's were used.  Update dapl to use this
abstraction, rather than the older completion manager api.

This helps minimize changes between linux and windows.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agodtestcm: remove IB verb definitions
Arlin Davis [Wed, 30 Sep 2009 21:26:47 +0000 (14:26 -0700)]
dtestcm: remove IB verb definitions

Remove gid and qp_type references from test app.
Print address information in sockaddr and
ucm provider format with qpn and lid.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agodtest, dtestx: remove IB verb definitions
Arlin Davis [Wed, 30 Sep 2009 17:44:14 +0000 (10:44 -0700)]
dtest, dtestx: remove IB verb definitions

remove gid and qp_type checking from test suite.
Print address information in sockaddr and
ucm provider format with qpn and lid.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: tighten up socket options to ensure similar behavior on Windows and Linux.
Arlin Davis [Mon, 28 Sep 2009 17:59:36 +0000 (10:59 -0700)]
scm: tighten up socket options to ensure similar behavior on Windows and Linux.

Add IPPROTO_TCP to create socket. Specify device IP address
when binding instead of INADDR_ANY and remove setsockopt
REUSEADDR on the listen socket to avoid any issues with
portability. Don't want duplicate port bindings.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: improve serialization of destroy and event processing
Arlin Davis [Mon, 28 Sep 2009 17:46:26 +0000 (10:46 -0700)]
cma: improve serialization of destroy and event processing

WinOF testing with slightly different scheduler and verbs
showed some issues with cleanup. Add better protection around
destroy and event processing thread.

Remove destroy flag and add ref counting to conn objects
to block destroy until all references are cleared. Add
locking around ref counting and passive and active
event processing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: improve serialization of destroy and state changes
Arlin Davis [Mon, 28 Sep 2009 17:42:52 +0000 (10:42 -0700)]
scm: improve serialization of destroy and state changes

WinOF testing with slightly different scheduler and verbs
showed some issues with cleanup. Add better protection around
destroy and move state change before socket send to ensure
correct state in multi-thread environment targeting the same
device on send and recv.

Change DCM_RTU_PENDING to DCM_REP_PENDING and
add static definition to local routines for better
readability.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon: no cleanup/release code for timer thread
Arlin Davis [Thu, 17 Sep 2009 15:56:06 +0000 (08:56 -0700)]
common: no cleanup/release code for timer thread

dapl_set_timer() creates a thread to process timers for dat_ep_connect
but provides no mechanism to destroy/exit during dapl library unload.
Timers are initialized in library init code and should be released
in the fini code. Add a dapl_timer_release call to the dapl_fini
function to check state of timer thread and destroy before exiting.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm, cma: dapli_thread doesn't always get terminated on library close.
Arlin Davis [Thu, 17 Sep 2009 15:53:29 +0000 (08:53 -0700)]
scm, cma: dapli_thread doesn't always get terminated on library close.

DAPL doesn't actually wait for the async processing thread to exit before
allowing the library to close.  It will wait up to 10 seconds, which under
heavy load isn't enough time.  Since the thread is created by an application
level thread, it will continue to run as long as the application runs.  But
if the application closes the library, then all library data and code is
invalid, which can result in the thread running something that's not
library code and accessing freed memory.

With this change, I was able to run mpi ping-pong, 16 ranks on a single
system (scm provider) without crashes 1300 times.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
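
One way to wait for a processing thread without a timeout is to signal it to stop and then join it, as in the sketch below. This is an illustration of the problem described in the message, not the change that was actually committed; the `ex_worker` type and functions are hypothetical and assume POSIX threads and C11 atomics.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handle for a library-owned processing thread. */
struct ex_worker {
    pthread_t tid;
    atomic_int stop;
};

/* Thread body: loop until asked to stop. */
static void *ex_worker_main(void *arg)
{
    struct ex_worker *w = arg;

    while (!atomic_load(&w->stop))
        sched_yield(); /* placeholder for real event processing */
    return NULL;
}

/* Start the thread. Returns 0 on success. */
static int ex_worker_start(struct ex_worker *w)
{
    assert(w != NULL);
    atomic_init(&w->stop, 0);
    return pthread_create(&w->tid, NULL, ex_worker_main, w);
}

/* Ask the thread to exit and block until it has really exited. */
static int ex_worker_stop(struct ex_worker *w)
{
    assert(w != NULL);
    atomic_store(&w->stop, 1);
    return pthread_join(w->tid, NULL); /* no timeout: returns only after exit */
}
```

Because `pthread_join` returns only after the thread function has returned, the library code and data are no longer in use by that thread when unload proceeds.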
14 years agoucm: tighten up locking with CM processing, state changes
Arlin Davis [Wed, 9 Sep 2009 20:10:35 +0000 (13:10 -0700)]
ucm: tighten up locking with CM processing, state changes

tighten up locking on CM processing and state changes
and reduce the send completion threshold to 50 from 100
to replenish the request message faster.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: For UD type QP's, return CR p_data with CONN_EST event on passive side.
Arlin Davis [Wed, 9 Sep 2009 16:44:03 +0000 (09:44 -0700)]
ucm: For UD type QP's, return CR p_data with CONN_EST event on passive side.

Intel MPI uses the p_data provided with CONN_EST as a reference to the
UD pair and remote rank. The ucm provider was overwriting the CR p_data
with the ACCEPT p_data. Change to save CR p_data but also provide
storage for user provided ACCEPT p_data in case the REPLY is lost
and needs retransmitted.

p_data size was provided to event processing in network order
instead of host order.

For new QP's create new address handles and do not use
existing AH's created for the CM. Different PD's are
associated with each.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
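
The byte-order issue mentioned in this message can be sketched as follows: a size field carried in network order must be converted before it is used as a length. The `ex_cm_msg` layout, the 16-bit field width, and `EX_PDATA_MAX` are all assumptions for illustration and do not reflect the real ucm message format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical buffer capacity; not the real provider limit. */
#define EX_PDATA_MAX 256

/* Hypothetical wire message: p_size is stored in network byte order. */
struct ex_cm_msg {
    uint16_t p_size;
    uint8_t p_data[EX_PDATA_MAX];
};

/* Return the private data length in host order, clamped to the buffer. */
static size_t ex_msg_pdata_len(const struct ex_cm_msg *msg)
{
    size_t n;

    assert(msg != NULL);
    n = ntohs(msg->p_size); /* convert before using as a length */
    return n > EX_PDATA_MAX ? EX_PDATA_MAX : n;
}
```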
14 years agoucm: cleanup extra cr/lf
Arlin Davis [Tue, 8 Sep 2009 16:14:46 +0000 (09:14 -0700)]
ucm: cleanup extra cr/lf

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoucm: fix issues with UD QP's.
Arlin Davis [Tue, 8 Sep 2009 16:11:37 +0000 (09:11 -0700)]
ucm: fix issues with UD QP's.

private data size not in host order when processing
connection events.

ud extensions event should include original ia_addr
and qpn used during connection and not the IB qpn.

ucm QP service resource cleanup in wrong order.

cleanup extra cr/lf device.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: Convert windows version of dapl and dat libraries to use private heaps.
Arlin Davis [Thu, 3 Sep 2009 17:45:56 +0000 (10:45 -0700)]
winof: Convert windows version of dapl and dat libraries to use private heaps.

This allows for better support of memory registration caching by upper
level libraries (MPI) that use SecureMemoryCacheCallback.

It also makes it easier to debug heap corruption issues.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agodtest, dtestx: modifications for UD QP testing with ucm provider.
Arlin Davis [Wed, 2 Sep 2009 21:01:51 +0000 (14:01 -0700)]
dtest, dtestx: modifications for UD QP testing with ucm provider.

remote_addr is wrong for IP remote address.

The dtestx requires the server connect back to the client
for the UD test. With the ucm provider you need to provide
the QPN and the LID which you cannot get until the dtest
client starts. So, for now, don't support UD testing
on UCM providers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm, ucm: UD QP support was broken when porting to common openib code base.
Arlin Davis [Wed, 2 Sep 2009 20:54:59 +0000 (13:54 -0700)]
scm, ucm: UD QP support was broken when porting to common openib code base.

create remote_ah was moved out of modify_qp_state function but not
included in the RTU and ACCEPT code for UD QP's. qp type check
should be on daddr not saddr in ucm cm code.

QP number must be converted to host order before supplying remote_ah,
and qp number to consumer.

Modify QP state to RTR for UD QP mask setting incorrect.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: cleanup warning with unused local variable, ret, in disconnect
Arlin Davis [Tue, 1 Sep 2009 20:02:24 +0000 (13:02 -0700)]
cma: cleanup warning with unused local variable, ret, in disconnect

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: remove debug message after rdma_disconnect failure
Arlin Davis [Tue, 1 Sep 2009 19:36:31 +0000 (12:36 -0700)]
cma: remove debug message after rdma_disconnect failure

DAPL automatically calls rdma_disconnect() when a disconnect request is
received.  If the user also calls disconnect, that calls rdma_disconnect() as
well, but the connection has already been disconnected by DAPL and is no longer
valid.  The result is that the user's call to rdma_disconnect() will fail.  Do
not display an error message if this occurs.

Locking could be added to prevent calling rdma_disconnect() multiple times, but
since the librdmacm provides synchronization to trap this, we might as well take
advantage of it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoscm: socket errno check needs O/S dependent wrapper
Arlin Davis [Tue, 1 Sep 2009 19:27:43 +0000 (12:27 -0700)]
scm: socket errno check needs O/S dependent wrapper

Intel MPI checks the uDAPL error code when calling dat_psp_create() to see if
the port number that it provides is in use or not.  Convert winsock error codes
to unix errno values.

This fixes the following error reported by Intel MPI:
'DAPL provider is not found and fallback device is not enabled'

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
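
The conversion described in this message could take the shape of a small lookup like the one below. The numeric Winsock codes are defined locally so the sketch builds on any platform; the `ex_` names and the choice of `EIO` as a fallback are assumptions, and this is not the wrapper used by the scm provider.

```c
#include <assert.h>
#include <errno.h>

/* Winsock error values, defined locally so this builds without winsock2.h. */
enum {
    EX_WSAEADDRINUSE = 10048,
    EX_WSAEADDRNOTAVAIL = 10049,
    EX_WSAETIMEDOUT = 10060,
    EX_WSAECONNREFUSED = 10061
};

/* Translate a Winsock error code to the matching unix errno value. */
static int ex_wsa_to_errno(int wsa_err)
{
    assert(wsa_err >= 0);
    switch (wsa_err) {
    case 0:
        return 0;
    case EX_WSAEADDRINUSE:
        return EADDRINUSE;
    case EX_WSAEADDRNOTAVAIL:
        return EADDRNOTAVAIL;
    case EX_WSAETIMEDOUT:
        return ETIMEDOUT;
    case EX_WSAECONNREFUSED:
        return ECONNREFUSED;
    default:
        return EIO; /* assumption: generic fallback for unmapped codes */
    }
}
```

With a mapping like this, a caller that compares the result against `EADDRINUSE` sees the same value on both platforms.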
14 years agodapltest: update script files for WinOF
Arlin Davis [Tue, 1 Sep 2009 19:13:16 +0000 (12:13 -0700)]
dapltest: update script files for WinOF

Cleanup 64-bit paths now that WinOF is always installed into '\Program Files\WinOF'.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: conditional check for new rdma_cm definition.
Arlin Davis [Tue, 1 Sep 2009 19:10:21 +0000 (12:10 -0700)]
cma: conditional check for new rdma_cm definition.

RDMA_CM_EVENT_TIMEWAIT_EXIT is new to OFED 1.4
add conditional check so dapl can build and run
against older OFED 1.3 stacks

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.22 dapl-2.0.22
Arlin Davis [Thu, 20 Aug 2009 16:13:43 +0000 (09:13 -0700)]
Release 2.0.22

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agodapltest: add mdep processor yield and use with dapltest
Arlin Davis [Thu, 20 Aug 2009 16:12:47 +0000 (09:12 -0700)]
dapltest: add mdep processor yield and use with dapltest

Be thread scheduler friendly and release the current thread thus allowing other threads to run.

Signed-off-by: Stan Smith <stan.smith@intel.com>

14 years agoucm: Add new provider using a DAPL based IB-UD cm mechanism for MPI implementations.
Arlin Davis [Tue, 18 Aug 2009 17:15:15 +0000 (10:15 -0700)]
ucm: Add new provider using a DAPL based IB-UD cm mechanism for MPI implementations.

New provider uses its own CM protocol on top of IB-UD queue pairs.
During device open, this provider creates a UD queue pair and
returns local address information via dat_ia_query. This 24 byte
opaque address must be exchanged out-of-band before connecting to a
server via dat_ep_connect. This provider is targeted for MPI
implementations that already exchange address information
during mpi_init phase.

Future release may provide some ARP mechanism via multicast.

dtest, dtestx, and dtestcm were modified to report the lid and qpn
information on the server side so you can provide appropriate
destination address information for the client test suite.

dapltest will not work with this provider.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.21 OFED-1.5-beta WinOF-2.1 dapl-2.0.21-1
Arlin Davis [Wed, 5 Aug 2009 03:54:12 +0000 (20:54 -0700)]
Release 2.0.21

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: Fix disconnect. QP's need to move to ERROR state in
Arlin Davis [Wed, 5 Aug 2009 03:49:09 +0000 (20:49 -0700)]
scm: Fix disconnect. QP's need to move to ERROR state in
order to flush work requests and notify consumer. Moving to
RESET removed all requests but did not notify consumer.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agomodify dtest.c to cleanup CNO wait code and consolidate into
Arlin Davis [Wed, 5 Aug 2009 03:48:03 +0000 (20:48 -0700)]
modify dtest.c to cleanup CNO wait code and consolidate into
collect_event() call. After waking up from CNO wait the
consumer must check all EVD's. The EVD's under the CNO
could be dropped if already triggered or could come in any order.
DT_RetToString changed to DT_RetToStr and DT_EventToSTr
changed to DT_EventToStr for consistency.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoCNO events, once triggered will not be returned during the cno wait.
Arlin Davis [Wed, 5 Aug 2009 03:47:17 +0000 (20:47 -0700)]
CNO events, once triggered will not be returned during the cno wait.
Check for triggered state before going to sleep in cno_wait. Reset
triggered EVD reference after reporting.
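
The check-before-sleep pattern described above can be sketched with a pthread mutex and condition variable. This is an illustration only: struct cno, cno_trigger, and cno_wait are hypothetical names, and the real DAPL CNO code uses its own OS abstraction layer.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a CNO: remembers which EVD triggered it. */
struct cno {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    void           *triggered_evd;   /* NULL when nothing is pending */
};

/* Called by the event source: record the EVD and wake any waiter. */
static void cno_trigger(struct cno *c, void *evd)
{
    pthread_mutex_lock(&c->lock);
    c->triggered_evd = evd;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Called by the consumer: test the triggered state BEFORE sleeping, so a
 * trigger that arrived earlier is not lost, then reset the reference
 * after reporting it. */
static void *cno_wait(struct cno *c)
{
    void *evd;

    pthread_mutex_lock(&c->lock);
    while (c->triggered_evd == NULL)
        pthread_cond_wait(&c->cond, &c->lock);
    evd = c->triggered_evd;
    c->triggered_evd = NULL;         /* reset after reporting */
    pthread_mutex_unlock(&c->lock);
    return evd;
}
```

Because the predicate is tested under the lock before the first wait, a trigger that fires before the consumer calls cno_wait is returned immediately instead of being missed.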

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoCNO support broken in both CMA and SCM providers.
Arlin Davis [Sun, 2 Aug 2009 21:21:09 +0000 (14:21 -0700)]
CNO support broken in both CMA and SCM providers.

CQ thread/callback mechanism was removed by mistake. Still
need indirect DTO callbacks when CNO is attached to EVD's.

Add CQ event channel to cma provider's thread and add
to select for rdma_cm and async channels.
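
As a rough sketch of waiting on several event channels in one thread, the helper below runs select() over an array of file descriptors. The name wait_any is hypothetical; in the cma provider the descriptors would come from the rdma_cm, async, and CQ event channels, which are not modeled here.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: block until one of fds[] is readable.
 * Returns the index of the first readable descriptor, or -1 on
 * timeout or error. */
static int wait_any(const int *fds, int nfds, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int i, maxfd = -1;

    FD_ZERO(&rset);
    for (i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rset);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    tv.tv_sec  = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    if (select(maxfd + 1, &rset, NULL, NULL, &tv) <= 0)
        return -1;

    for (i = 0; i < nfds; i++)
        if (FD_ISSET(fds[i], &rset))
            return i;
    return -1;
}
```

Adding a new channel to such a loop only means adding its descriptor to the array.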

For the scm provider there is no easy way to add this channel
to the select across sockets on windows. So, for portability
reasons a second thread is started to process the ASYNC and
CQ channels for events.

Must disable EVD (evd_enabled=FALSE) during destroy
to prevent EVD events firing for CNOs and re-arming CQ while
CQ is being destroyed.

Change dtest to check EVD after CNO times out.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon osd: include winsock2.h for IPv6 definitions.
Arlin Davis [Thu, 30 Jul 2009 15:02:30 +0000 (08:02 -0700)]
common osd: include winsock2.h for IPv6 definitions.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon osd: include ws2tcpip.h for sockaddr_in6 definitions.
Arlin Davis [Wed, 29 Jul 2009 15:02:15 +0000 (08:02 -0700)]
common osd: include ws2tcpip.h for sockaddr_in6 definitions.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoDAPL introduced the concept of directly waiting on the CQ for
Sean Hefty [Mon, 27 Jul 2009 22:07:33 +0000 (15:07 -0700)]
DAPL introduced the concept of directly waiting on the CQ for
events by adding a compile time flag and special handling in the common
code.  Rather than using the compile time flag and modifying the
common code, let the provider implement the best way to wait for
CQ events.

This simplifies the code and allows the common openib providers to
optimize for Linux and Windows platforms independently, rather than
assuming a specific implementation for signaling events.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agodapltest: Implement a malloc() threshold for the completion reaping.
Arlin Davis [Thu, 16 Jul 2009 19:41:22 +0000 (12:41 -0700)]
dapltest: Implement a malloc() threshold for the completion reaping.

change byte vector allocation to stack in functions:
  DT_handle_send_op, DT_handle_rdma_op & DT_handle_recv_op.

When allocation size is under the threshold, use a stack local
allocation instead of malloc/free.  Move redundant bzero() to
be called only in the case of using local stack allocation as
DT_Mdep_malloc() already does a bzero(). Consolidate error handling
return and free() check to a single point by using goto.
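
A minimal sketch of the threshold pattern, assuming a 64-byte limit and a made-up workload. STACK_THRESHOLD and reap_completions are hypothetical, and calloc() stands in for DT_Mdep_malloc(), which the message says already zeroes its memory.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_THRESHOLD 64   /* hypothetical threshold, in bytes */

/* Hypothetical example: verify that ids[0..n-1] are unique values in
 * [0, n), tracking them in a byte vector. Returns 0 on success, -1 on
 * a duplicate, an out-of-range id, or an allocation failure. */
static int reap_completions(const unsigned *ids, size_t n)
{
    unsigned char  stack_buf[STACK_THRESHOLD];
    unsigned char *seen;
    int            used_heap = 0;
    int            rc = -1;
    size_t         i;

    if (n <= STACK_THRESHOLD) {
        seen = stack_buf;
        memset(seen, 0, n);      /* stack memory is not zeroed for us */
    } else {
        seen = calloc(n, 1);     /* already zeroed, no extra clear needed */
        if (seen == NULL)
            goto done;
        used_heap = 1;
    }

    for (i = 0; i < n; i++) {
        if (ids[i] >= n || seen[ids[i]])
            goto done;           /* single exit point handles cleanup */
        seen[ids[i]] = 1;
    }
    rc = 0;

done:
    if (used_heap)
        free(seen);
    return rc;
}
```

Small requests never touch the allocator, and every error path leaves through the same label, so the free() check appears exactly once.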

Signed-off-by: Stan Smith <stan.smith@intel.com>
15 years agoscm: handle connected state when freeing CM objects
Arlin Davis [Thu, 16 Jul 2009 19:32:09 +0000 (12:32 -0700)]
scm: handle connected state when freeing CM objects

The QP could be freed before being disconnected,
so the provider needs to process the disconnect before freeing
the CM object. The disconnect cleanup will finish
the destroy process during the disc callback.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm, dtest: changes for winof gettimeofday and FD_SETSIZE settings.
Arlin Davis [Wed, 8 Jul 2009 19:49:43 +0000 (12:49 -0700)]
scm, dtest: changes for winof gettimeofday and FD_SETSIZE settings.

scm changes to set FD_SETSIZE with expected value and
prevent windows override.

dtest: remove gettimeofday implementation for windows
specific implementation etc\user\gtod.c

general EOL cleanup

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: set TCP_NODELAY sockopt on the server side for sends.
Arlin Davis [Mon, 6 Jul 2009 16:24:07 +0000 (09:24 -0700)]
scm: set TCP_NODELAY sockopt on the server side for sends.

scm provider sends small messages from both server and client
sides. Set NODELAY on both sides to avoid send delays either
way.
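
For reference, setting the option is a single setsockopt() call. The helpers below are a sketch using the POSIX socket API with hypothetical names (set_nodelay, get_nodelay); they are not the scm provider's own code.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: disable Nagle's algorithm so small messages are
 * sent immediately. Returns 0 on success, -1 on failure. */
static int set_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}

/* Hypothetical helper: read the flag back. Returns 1 if set, 0 if
 * clear, -1 on failure. */
static int get_nodelay(int fd)
{
    int       val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &val, &len) != 0)
        return -1;
    return val != 0;
}
```

The same call applies to both the connecting socket on the client and the accepted socket on the server.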

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows: remove obsolete files in dapl/udapl source tree
Arlin Davis [Thu, 2 Jul 2009 21:16:52 +0000 (14:16 -0700)]
windows: remove obsolete files in dapl/udapl source tree

SOURCES,makefile,udapl.r,udapl_exports.src,udapl_sources.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtestcm: add UD type QP option to test
Arlin Davis [Thu, 2 Jul 2009 21:11:20 +0000 (14:11 -0700)]
dtestcm: add UD type QP option to test

Add -u for UD type QP's during connection setup.
Will setup UD QPs and provide remote AH
in connect establishment event. Measures
setup/exchange rates.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: destroy QP called before disconnect
Arlin Davis [Thu, 2 Jul 2009 21:07:36 +0000 (14:07 -0700)]
scm: destroy QP called before disconnect

Handle the case where QP is destroyed before
disconnect processing. Windows supports
reinit_qp during a disconnect call by
destroying the QP and recreating the
QP instead of state change from reset
to init. Call disconnect in destroy
CM code to handle this unexpected state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agocma: add support for rdma_cm TIME_WAIT event.
Arlin Davis [Thu, 2 Jul 2009 21:03:12 +0000 (14:03 -0700)]
cma: add support for rdma_cm TIME_WAIT event.

Nothing to process, simply ack the event.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: remove old udapl_scm code replaced by openib_scm.
Arlin Davis [Wed, 1 Jul 2009 14:58:32 +0000 (07:58 -0700)]
scm: remove old udapl_scm code replaced by openib_scm.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowinof: fix build issues after consolidating cma, scm code base.
Arlin Davis [Wed, 1 Jul 2009 14:53:18 +0000 (07:53 -0700)]
winof: fix build issues after consolidating cma, scm code base.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agocma: lock held when exiting as a result of a rdma_create_event_channel failure.
Arlin Davis [Wed, 1 Jul 2009 14:51:59 +0000 (07:51 -0700)]
cma: lock held when exiting as a result of a rdma_create_event_channel failure.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>