Arlin Davis [Thu, 17 Jun 2010 19:40:21 +0000 (12:40 -0700)]
scm, ucm: add pkey, pkey_index, sl override for QP's
On a per-open basis, add environment variables
DAPL_IB_SL and DAPL_IB_PKEY and use them on
connection setup (QP modify) to override the default
values of 0 for SL and PKEY index. If a pkey is
provided, find its pkey index by calling
ibv_query_pkey across the dev_attr.max_pkeys entries.
Will be used for RC and UD type QP's.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
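A minimal sketch of the override lookup described in this commit, not the provider code itself. The helper names env_ulong and find_pkey_index are hypothetical, and the pkey table is a plain array standing in for the values the provider would read with ibv_query_pkey. Byte-order handling is left out.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: read an unsigned value from the environment.
 * Returns dflt when the variable is unset, empty, or malformed. */
static unsigned long env_ulong(const char *name, unsigned long dflt)
{
    const char *s = getenv(name);
    char *end;
    unsigned long v;

    if (s == NULL || *s == '\0')
        return dflt;
    v = strtoul(s, &end, 0);    /* base 0 accepts decimal and 0x-prefixed hex */
    return (*end == '\0') ? v : dflt;
}

/* Hypothetical helper: return the index of pkey in table[0..max_pkeys-1],
 * or 0 (the default index) when it is not present. In a real provider each
 * entry would be fetched with ibv_query_pkey(); here it is a plain array. */
static int find_pkey_index(const uint16_t *table, int max_pkeys, uint16_t pkey)
{
    int i;

    for (i = 0; i < max_pkeys; i++)
        if (table[i] == pkey)
            return i;
    return 0;
}
```

Falling back to index 0 when the pkey is not found is an assumption that matches the default index named in the commit message.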
Arlin Davis [Wed, 2 Jun 2010 17:05:03 +0000 (10:05 -0700)]
ucm: incorrectly freeing port on passive side after reject
cm_release was incorrectly freeing a client port
assuming it was the server listening port. Move
the listening port cleanup to remove_conn_listner
and only cleanup client ports in cm_release.
Arlin Davis [Wed, 19 May 2010 22:17:58 +0000 (15:17 -0700)]
dapltest: server info devicename is not large enough for dapl_name storage
Server info device name is an 80 char array but the dapl device name
that is copied is 256 bytes. Increase started_server.devicename definition.
Chalk one up for Windows SDK OACR (auto code review).
Arlin Davis [Mon, 17 May 2010 23:15:21 +0000 (16:15 -0700)]
scm, cma: fini code can be called multiple times and hang via fork
The providers should protect against forked child exits and
not clean up until the parent init actually exits. Otherwise,
the child will hang trying to clean up the dapl thread. Modify to
check process id for proper init to fini cleanup and limit
cleanup to parent only.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
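The process id check can be sketched roughly as follows. The names provider_init, provider_fini and init_pid are hypothetical, and the actual teardown is reduced to a comment.

```c
#include <sys/types.h>
#include <unistd.h>

/* Process id recorded by the process that ran init (hypothetical name). */
static pid_t init_pid;

static void provider_init(void)
{
    init_pid = getpid();
}

/* Returns 1 when cleanup ran, 0 when it was skipped because the caller
 * is a forked child rather than the process that ran init. */
static int provider_fini(void)
{
    if (getpid() != init_pid)
        return 0;               /* forked child: leave parent state alone */

    /* ... stop the worker thread and release resources here ... */
    return 1;
}
```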
Arlin Davis [Thu, 13 May 2010 17:31:17 +0000 (10:31 -0700)]
scm: SOCKOPT ERR Connection timed out on large clusters
With large-scale all-to-all connections on 1000+ cores,
the listen backlog is reached and SYN's are dropped,
which causes the connect to time out. Retry connect
on timeout errors.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
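A sketch of retrying only on timeout errors, assuming a bounded retry count (the commit message does not state a limit). connect_with_retry and flaky_connect are hypothetical; flaky_connect is a stand-in for the real connect call so the sketch runs without a network.

```c
#include <errno.h>

/* Connect attempt: returns 0 on success, -1 with errno set on failure. */
typedef int (*connect_fn)(void *ctx);

/* Retry do_connect while it fails with ETIMEDOUT, up to max_retries
 * additional attempts. Any other error is returned immediately. */
static int connect_with_retry(connect_fn do_connect, void *ctx, int max_retries)
{
    int attempt;

    for (attempt = 0; attempt <= max_retries; attempt++) {
        if (do_connect(ctx) == 0)
            return 0;
        if (errno != ETIMEDOUT)
            return -1;          /* not a timeout: do not retry */
    }
    return -1;                  /* still timing out, errno is ETIMEDOUT */
}

/* Stand-in for connect(): times out until *remaining reaches 0. */
static int flaky_connect(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        errno = ETIMEDOUT;
        return -1;
    }
    return 0;
}
```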
Arlin Davis [Mon, 10 May 2010 19:46:17 +0000 (12:46 -0700)]
ucm: UD mode, active side cm object released too soon, the RTU could be lost.
Will see following message with DAPL_DBG_TYPE set for Errors & Warnings (0x3):
ucm_recv: NO MATCH op REP 0x120 65487 i0x60005e c0x60005e < 0xd2 19824 0x60006a
The cm object was released on the active side after the connection
was established, RTU sent. This is a problem if the RTU is lost
and the remote side retries the REPLY. The RTU is never resent.
Keep the cm object until the EP is destroyed.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 10 May 2010 19:35:51 +0000 (12:35 -0700)]
cma, ucm: cleanup issues with dat_ep_free on a connected EP without disconnecting.
During EP free, disconnecting with ABRUPT close flag, the disconnect should wait
for the DISC event to fire to allow the CM to be properly destroyed upon return.
The cma must also release the lock when calling the blocking rdma_destroy_id given
the callback thread could attempt to acquire the lock for reference counting.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 27 Apr 2010 18:20:08 +0000 (11:20 -0700)]
scm: remove modify QP to ERR state during disconnect on UD type QP
The disconnect on a UD type QP should not modify QP to error
since this is a shared QP. The disconnect should be treated
as a NOP on the UD type QP and only be transitioned during
the QP destroy (dat_ep_free).
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
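The intended behavior can be illustrated with a small state sketch. All types and names here (struct sk_ep, sk_disconnect, sk_free) are hypothetical and only model the QP state, not the verbs calls.

```c
/* Hypothetical model of an endpoint's QP type and state. */
enum sk_qp_type  { SK_QP_RC, SK_QP_UD };
enum sk_qp_state { SK_QP_RTS, SK_QP_ERR };

struct sk_ep {
    enum sk_qp_type  type;
    enum sk_qp_state state;
};

/* Disconnect: a no-op for UD, since the QP is shared by all peers. */
static void sk_disconnect(struct sk_ep *ep)
{
    if (ep->type == SK_QP_UD)
        return;
    ep->state = SK_QP_ERR;      /* RC: move to error to flush the queue */
}

/* Free: the only place a UD QP is transitioned. */
static void sk_free(struct sk_ep *ep)
{
    ep->state = SK_QP_ERR;
}
```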
Arlin Davis [Thu, 8 Apr 2010 16:38:57 +0000 (09:38 -0700)]
common: EP links to EVD, PZ incorrectly released before provider CM objects freed.
Unlink/clear references after ALL CM objects linked to EP are freed.
Otherwise, event processing via CM objects could reference the handles
still linked to EP. After CM objects are freed (blocking) these handles
linked to EP are guaranteed not to be referenced by the underlying provider.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 7 Apr 2010 18:12:21 +0000 (11:12 -0700)]
common: remove unnecessary lmr lkey hashing and duplicate lkey checking
lmr lkey hashing is too restrictive given the returned lkey could be
the same value for different regions on some rdma devices. Actually,
this checking is really unnecessary and requires considerable overhead
for hashing so just remove hashing of lmr lkey's. Let verbs device
level do the checking and validation.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Mar 2010 22:47:58 +0000 (14:47 -0800)]
ucm: fix issues with new EP to CM linking changes
Add EP locking around QP modify
Remove release during disconnect event processing
Add check in cm_free to check state and schedule thread if necessary.
Add some additional debugging
Add processing in disconnect_clean for conn_req timeout
Remove extra CR's
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Mar 2010 17:44:44 +0000 (09:44 -0800)]
scm: new cm_ep linking broke UD mode over socket cm
Add EP locking around modify_qp for EP state.
Add new dapli_ep_check for debugging EP
Cleanup extra CR's
Change socket errno to dapl_socket_errno() abstraction
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Mar 2010 17:12:11 +0000 (09:12 -0800)]
common: dat_ep_connect should not set timer for UD endpoints
Connect for UD type is simply AH resolution and doesn't
need to be timed. The common code is not designed to handle
multiple timed events on connect requests so just ignore
timing UD AH requests.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 8 Mar 2010 20:53:45 +0000 (12:53 -0800)]
ibal: changes for EP to CM linking and synchronization.
Windows IBAL changes to allocate and manage CM objects
and to link them to the EP. This will ensure the CM
IBAL objects and cm_id's are not destroyed before the EP.
Remove windows only ibal_cm_handle in EP structure.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 24 Feb 2010 18:03:57 +0000 (10:03 -0800)]
common: add CM-EP linking to support multiple CM's and proper protection during destruction
Add linking for CM to EP, including reference counting, to ensure synchronization
during creation and destruction. A cm_list_head has been added to the EP object to
support multiple CM objects (UD) per EP. If the CM object is linked to an EP it
cannot be destroyed.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
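A single-threaded sketch of the linking rule, assuming a simple singly linked list. Only cm_list_head comes from the commit message; the struct layouts and the sk_ function names are hypothetical, and the locking a real provider needs is omitted.

```c
#include <stddef.h>

struct sk_ep;

/* Hypothetical CM object: counted reference plus a link to its EP. */
struct sk_cm {
    int           ref_count;
    struct sk_ep *ep;           /* non-NULL while linked to an EP */
    struct sk_cm *next;
};

/* Hypothetical EP object holding the list of linked CM objects. */
struct sk_ep {
    struct sk_cm *cm_list_head;
};

/* Link cm to ep and take a reference on behalf of the EP. */
static void sk_cm_link(struct sk_ep *ep, struct sk_cm *cm)
{
    cm->ep = ep;
    cm->ref_count++;
    cm->next = ep->cm_list_head;
    ep->cm_list_head = cm;
}

/* Remove cm from ep's list and drop the EP's reference. */
static void sk_cm_unlink(struct sk_ep *ep, struct sk_cm *cm)
{
    struct sk_cm **pp;

    for (pp = &ep->cm_list_head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == cm) {
            *pp = cm->next;
            cm->next = NULL;
            cm->ep = NULL;
            cm->ref_count--;
            return;
        }
    }
}

/* Returns 0 when cm may be destroyed, -1 while it is linked or referenced. */
static int sk_cm_destroy(struct sk_cm *cm)
{
    if (cm->ep != NULL || cm->ref_count > 0)
        return -1;
    return 0;
}
```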
Arlin Davis [Thu, 4 Feb 2010 00:21:30 +0000 (16:21 -0800)]
destroy verbs completion channels created via ia_open or ep_create.
Completion channels are created with ia_open for CNO events and
with ep_create in cases where DAT allows EP(qp) to be created with
no EVD(cq) and IB doesn't. These completion channels need to be
destroyed at close along with a CQ for the EP without CQ case.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 2 Feb 2010 22:43:03 +0000 (14:43 -0800)]
When copying private_data out of rdma_cm events, use the
reported private_data_len for the size, and not IB maximums.
This fixes a bug running over the librdmacm on windows, where
DAPL accessed invalid memory.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
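The fix amounts to bounding the copy by the reported length rather than a fixed maximum. A minimal sketch, with copy_private_data as a hypothetical helper name and the destination size passed in explicitly:

```c
#include <stddef.h>
#include <string.h>

/* Copy at most reported_len bytes of private data into dst, never more
 * than dst_size. Returns the number of bytes actually copied. */
static size_t copy_private_data(void *dst, size_t dst_size,
                                const void *src, size_t reported_len)
{
    size_t n = reported_len < dst_size ? reported_len : dst_size;

    if (src == NULL || n == 0)
        return 0;
    memcpy(dst, src, n);
    return n;
}
```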
Sean Hefty [Thu, 28 Jan 2010 18:19:20 +0000 (10:19 -0800)]
dapl/cma: fix referencing freed address
DAPL uses a pointer to reference the local and remote addresses
of an endpoint. It expects that those addresses are located
in memory that is always accessible. Typically, for the local
address, the pointer references the address stored with the DAPL
HCA device. However, for the cma provider, it changes this pointer
to reference the address stored with the rdma_cm_id.
This causes a problem when that endpoint is connected on the
passive side of a connection. When connect requests are given
to DAPL, a new rdma_cm_id is associated with the request. The
DAPL code replaces the current rdma_cm_id associated with a
user's endpoint with the new rdma_cm_id. The old rdma_cm_id is
then deleted. But the endpoint's local address pointer still
references the address stored with the old rdma_cm_id. The
result is that any reference to the address will access freed
memory.
Fix this by keeping the local address pointer always pointing
to the address associated with the DAPL HCA device. This is about
the best that can be done given the DAPL interface design.
Arlin Davis [Tue, 22 Dec 2009 22:00:33 +0000 (14:00 -0800)]
openib_common: add check for both gid and global routing in RTR
check for valid gid pointer along with global route setting
during transition to RTR. Add more GID information to
debug print statement in qp modify call.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 4 Dec 2009 20:25:30 +0000 (12:25 -0800)]
ucm, scm: DAPL_GLOBAL_ROUTING enabled causes segv
socket cm and ud cm providers support QP modify with is_global
set and GRH. New v2 providers didn't pass GID information
in modify_qp RTR call and incorrectly byte swapped the already
network order GID. Add debug print of GID during global modify.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Colliding with RDS port of 18634. rdma_cm can return
either EADDRINUSE or EADDRNOTAVAIL if the bind fails.
Add check for either and return proper DAT_CONN_QUAL_IN_USE.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
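A sketch of the errno check, assuming the bind error is available as an errno value. The enum below is a hypothetical stand-in for the DAT return codes; only the treatment of EADDRINUSE and EADDRNOTAVAIL comes from the commit message.

```c
#include <errno.h>

/* Hypothetical stand-ins for DAT return codes. */
enum sk_status {
    SK_SUCCESS,
    SK_CONN_QUAL_IN_USE,
    SK_OTHER_ERROR
};

/* Map a bind failure to a status: both address errors mean the
 * connection qualifier (port) is already taken. */
static enum sk_status bind_errno_to_status(int err)
{
    switch (err) {
    case 0:
        return SK_SUCCESS;
    case EADDRINUSE:
    case EADDRNOTAVAIL:
        return SK_CONN_QUAL_IN_USE;
    default:
        return SK_OTHER_ERROR;
    }
}
```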
Arlin Davis [Wed, 18 Nov 2009 17:43:38 +0000 (09:43 -0800)]
common: seg fault in dapl_evd_wait with multi-thread application using CNO's.
If we are dealing with event streams besides a CQ event stream,
be conservative and set producer side locking. Otherwise, no.
The check for CNO was missing; a CNO is not considered a CQ event stream.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 18 Nov 2009 17:37:48 +0000 (09:37 -0800)]
ucm: inbound DREQ/DREP handshake should transition QP.
During release, when receiving a disconnect request from remote peer
instead of a disconnect call from the client, the QP didn't get properly
set in ERR state and didn't flush the queue during disconnect processing.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 16 Oct 2009 15:52:21 +0000 (08:52 -0700)]
ucm: using UD type QP's, ucm reports wrong reject event when user rejects AH resolution request.
During rejects, both usr and ucm internal, the qp_type does not get initialized
so the check for UD type QP messages fails on active side and the wrong
event gets generated. Initialize saddr.ib information before sending reject
back to active side.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 16 Oct 2009 14:57:25 +0000 (07:57 -0700)]
ucm, scm, cma: Fix CNO support on DTO type EVD's
EVD wait_object should be used for CNO processing
and not the direct CQ event channels. Add proper
checking for DTO type EVD's with CNO at wait
and wakeup.
UCM is missing support for collective EVD's under a
CNO. Add support to create common channel for
collective EVD's during device open. Add support
in cm_thread to check this channel. Also,
during disconnect, move QP to error to properly
flush queue instead of moving to reset and init.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Sean Hefty [Wed, 14 Oct 2009 16:34:18 +0000 (09:34 -0700)]
The HCA should not be closed until all resources have been released.
This results in a hang on windows, since closing the device frees
the event processing thread.
Arlin Davis [Thu, 8 Oct 2009 23:02:52 +0000 (16:02 -0700)]
ucm: add timer/retry CM logic to the ucm provider
add reply, rtu and retry count options via
environment variables. Times in msecs.
DAPL_UCM_RETRY 10
DAPL_UCM_REP_TIME 400
DAPL_UCM_RTU_TIME 200
Add RTU_PENDING and DISC_RECV states
Add check timer code to the cm_thread
and the option to the select abstraction
to take timeout values in msecs.
DREQ, REQ, and REPLY will all be timed
and retried.
Split out reply code and disconnect_final
code to better facilitate retry timers.
Add checking for duplicate messages.
Added new UD extension events for errors.
DAT_IB_UD_CONNECTION_REJECT_EVENT
DAT_IB_UD_CONNECTION_ERROR_EVENT
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
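A sketch of loading the three options from the environment. It treats the values listed in this commit (10, 400, 200) as defaults, which is an assumption since the message does not say whether they are defaults or examples. struct ucm_timers, env_int and ucm_timers_load are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical container for the ucm timer options. */
struct ucm_timers {
    int retry_cnt;      /* DAPL_UCM_RETRY */
    int rep_time_ms;    /* DAPL_UCM_REP_TIME, msecs */
    int rtu_time_ms;    /* DAPL_UCM_RTU_TIME, msecs */
};

/* Read a non-negative decimal integer from the environment.
 * Returns dflt when unset, empty, malformed, or out of range. */
static int env_int(const char *name, int dflt)
{
    const char *s = getenv(name);
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return dflt;
    v = strtol(s, &end, 10);
    if (*end != '\0' || v < 0 || v > 1000000)
        return dflt;
    return (int)v;
}

static void ucm_timers_load(struct ucm_timers *t)
{
    t->retry_cnt   = env_int("DAPL_UCM_RETRY", 10);
    t->rep_time_ms = env_int("DAPL_UCM_REP_TIME", 400);
    t->rtu_time_ms = env_int("DAPL_UCM_RTU_TIME", 200);
}
```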
Arlin Davis [Fri, 2 Oct 2009 21:48:15 +0000 (14:48 -0700)]
cma: cannot reuse the cm_id and qp for new connection, must reallocate a new one.
When merging common code base the dapls_ib_reinit_ep mistakenly
modified QP to reset then init for all providers. Will
not work for rdma_cm (cma provider) since the cm_id cannot
be reused. Add build check for _OPENIB_CMA_ to pull in correct
free and reallocate method for reinit_ep.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 2 Oct 2009 20:50:12 +0000 (13:50 -0700)]
scm, cma: update DAPL cm protocol revision with latest address/port changes
CM protocol changed, roll revision to 6.
The socket cm could be competing for address space if the
application above is also using sockets to exchange information,
as dapltest and MPI consumers do. Adjust port on listen
and connect to reduce the chance of port collision with
the application above.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 2 Oct 2009 19:47:37 +0000 (12:47 -0700)]
ucm: modify IB address format to align better with sockaddr_in6
Restructure the dcm_addr union to map the IB side
closer to sockaddr_in6 and initialize family to
AF_INET6 to ensure callee allocates enough memory
for ucm dat_ia_address type. Put qpn in flowinfo
and gid in sin6_addr. Change the test suites
to print address information based on AF_INET
or AF_INET6 instead of using specific IB address
union from the provider.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
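A sketch of the mapping using the standard struct sockaddr_in6 rather than the provider's dcm_addr union. Storing the qpn in network byte order is an assumption, and ib_addr_to_sin6 and sin6_to_qpn are hypothetical helper names.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Pack an IB address into a sockaddr_in6: qpn goes in sin6_flowinfo,
 * the 16-byte gid goes in sin6_addr, family is AF_INET6. */
static void ib_addr_to_sin6(struct sockaddr_in6 *sa, uint32_t qpn,
                            const uint8_t gid[16])
{
    memset(sa, 0, sizeof(*sa));
    sa->sin6_family = AF_INET6;
    sa->sin6_flowinfo = htonl(qpn);     /* assumption: network byte order */
    memcpy(&sa->sin6_addr, gid, 16);
}

/* Recover the qpn stored by ib_addr_to_sin6(). */
static uint32_t sin6_to_qpn(const struct sockaddr_in6 *sa)
{
    return ntohl(sa->sin6_flowinfo);
}
```

Because the result is an ordinary AF_INET6 address, callers can size and print it with generic socket code instead of a provider-specific union.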
Sean Hefty [Wed, 30 Sep 2009 21:27:50 +0000 (14:27 -0700)]
The completion manager was updated to provide an abstraction that
better mimicked how fd's were used. Update dapl to use this
abstraction, rather than the older completion manager api.
This helps minimize changes between linux and windows.
Arlin Davis [Mon, 28 Sep 2009 17:59:36 +0000 (10:59 -0700)]
scm: tighten up socket options to ensure similar behavior on Windows and Linux.
Add IPPROTO_TCP to create socket. Specify device IP address
when binding instead of INADDR_ANY and remove setsockopt
REUSEADDR on the listen socket to avoid any issues with
portability. Don't want duplicate port bindings.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 28 Sep 2009 17:46:26 +0000 (10:46 -0700)]
cma: improve serialization of destroy and event processing
WinOF testing with slightly different scheduler and verbs
showed some issues with cleanup. Add better protection around
destroy and event processing thread.
Remove destroy flag and add refs counting to conn objects
to block destroy until all references are cleared. Add
locking around ref counting and passive and active
event processing.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 28 Sep 2009 17:42:52 +0000 (10:42 -0700)]
scm: improve serialization of destroy and state changes
WinOF testing with slightly different scheduler and verbs
showed some issues with cleanup. Add better protection around
destroy and move state change before socket send to ensure
correct state in multi-thread environment targeting the same
device on send and recv.
Change DCM_RTU_PENDING to DCM_REP_PENDING and
add static definition to local routines for better
readability.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 17 Sep 2009 15:56:06 +0000 (08:56 -0700)]
common: no cleanup/release code for timer thread
dapl_set_timer() creates a thread to process timers for dat_ep_connect
but provides no mechanism to destroy/exit during dapl library unload.
Timers are initialized in library init code and should be released
in the fini code. Add a dapl_timer_release call to the dapl_fini
function to check state of timer thread and destroy before exiting.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 17 Sep 2009 15:53:29 +0000 (08:53 -0700)]
scm, cma: dapli_thread doesn't always get terminated on library close.
DAPL doesn't actually wait for the async processing thread to exit before
allowing the library to close. It will wait up to 10 seconds, which under
heavy load isn't enough time. Since the thread is created by an application
level thread, it will continue to run as long as the application runs. But
if the application closes the library, then all library data and code is
invalid, which can result in the thread running something that's not
library code and accessing freed memory.
With this change, I was able to run mpi ping-pong, 16 ranks on a single
system (scm provider) without crashes 1300 times.