Arlin Davis [Fri, 26 Jun 2009 21:45:34 +0000 (14:45 -0700)]
scm: fixes and optimizations for connection scaling
Prioritize accepts on listen ports via FD_READ.
Process the accepts ahead of other work to avoid
socket half_connection (SYN_RECV) stalls.
Fix dapl_poll to return DAPL_FD_ERROR on
all event error types.
Add new state for socket released, but CR
not yet destroyed. This enables scm to release
the socket resources immediately after exchanging
all QP information. Also, add state to str call.
Only add the CR reference to the EP if it is
RC type. UD has multiple CRs per EP, so when
disconnect_clean was called on a UD EP from a
timeout, it destroyed the wrong CR.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Sat, 20 Jun 2009 03:52:51 +0000 (20:52 -0700)]
common,scm: add debug capabilities to print in-process CM lists
Add a new debug bit DAPL_DBG_TYPE_CM_LIST.
If set, the pending CM requests will be
dumped when dat_print_counters is called.
Only provided when built with -DDAPL_COUNTERS
Add new dapl_cm_state_str() call for state
to string conversion for debug prints.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
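A state-to-string helper of the kind described above might look like the following minimal sketch. The enum, its members, and the function name example_cm_state_str are hypothetical and chosen for illustration; they are not the actual dapl_cm_state_str() implementation or the provider's real state list.

```c
#include <stddef.h>

/* Hypothetical state set for illustration only. */
enum example_cm_state {
    EX_CM_INIT,
    EX_CM_CONNECTED,
    EX_CM_DESTROY,
    EX_CM_STATE_COUNT
};

/* Map a state value to a printable name for debug output. */
const char *example_cm_state_str(enum example_cm_state st)
{
    static const char *const names[EX_CM_STATE_COUNT] = {
        "EX_CM_INIT", "EX_CM_CONNECTED", "EX_CM_DESTROY"
    };

    /* Guard against values outside the table. */
    if ((int)st < 0 || (int)st >= EX_CM_STATE_COUNT)
        return "EX_CM_UNKNOWN";
    return names[st];
}
```

Keeping the names in a table indexed by the enum means a new state only needs one new table entry.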
Arlin Davis [Wed, 10 Jun 2009 16:09:56 +0000 (09:09 -0700)]
scm: cleanup orphaned UD CR's when destroying the EP
UD CR objects are kept active because of direct private data references
from CONN events. The cr->socket is closed and marked inactive but the
object remains allocated and queued on the CR resource list. There can
be multiple CR's associated with a given EP and there is no way to
determine when the consumer is finished with the event until dat_ep_free.
Schedule destruction for all CR's associated with this EP during
free call. cr_thread will complete cleanup with state of SCM_DESTROY.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 10 Jun 2009 17:06:59 +0000 (10:06 -0700)]
scm: update CM code to shutdown before closing socket
Data could be lost without calling shutdown on the socket
before closing. Update to shutdown and then close. Add
definition for SHUT_RW to SD_BOTH for windows.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
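The shutdown-then-close ordering can be sketched as below, assuming a POSIX system. The function name example_socket_release is hypothetical, and the sketch uses the POSIX constant SHUT_RDWR; it is not the provider's actual code.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Shut down both directions, then close the descriptor.
 * Returns the result of close(): 0 on success, -1 on failure. */
int example_socket_release(int fd)
{
    assert(fd >= 0);

    /* shutdown() can fail (for example ENOTCONN on an unconnected
     * socket); the descriptor still has to be closed in that case. */
    (void)shutdown(fd, SHUT_RDWR);
    return close(fd);
}
```

Ignoring the shutdown() result here is a deliberate choice in this sketch, since a failed shutdown must not leak the descriptor.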
---
Sean Hefty [Thu, 4 Jun 2009 15:19:12 +0000 (08:19 -0700)]
dapl/windows cma provider: add support for network devices based on index
The linux cma provider provides support for named network devices, such
as 'ib0' or 'eth0'. This allows the same dapl configuration file to
be used easily across a cluster.
To allow similar support on Windows, allow users to specify the device
name 'rdma_devN' in the dapl.conf file. The given index, N, is mapped to a
corresponding IP address that is associated with an RDMA device.
Arlin Davis [Mon, 18 May 2009 16:06:19 +0000 (09:06 -0700)]
windows: add build files for openib_scm, remove /Wp64 build option.
Add build files for windows socket cm and change build
option on windows providers. The new Win7 WDK issues a
deprecated compiler option warning for /Wp64
(Enable 64-bit porting warnings)
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 18 May 2009 15:50:35 +0000 (08:50 -0700)]
scm: multi-hca CM processing broken. Need cr thread wakeup mechanism per HCA.
Currently there is only one pipe across all
device opens. This results in some posted CR work
getting delayed or not processed at all. Provide a
pipe for each device open, and create and manage
the cr thread on a per-device level.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 15 May 2009 16:48:38 +0000 (09:48 -0700)]
linux_osd: use pthread_self instead of getpid for debug messages
getpid provides process ids which are not unique. Use unique thread
id's in debug messages to help isolate issues across many device
opens with multiple CM threads.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 Apr 2009 15:39:37 +0000 (08:39 -0700)]
dtest: add flush EVD call after data transfer errors
Flush and print entries on async, request, and receive
queues after any data transfer error. Will help
identify failing operation during operations
without completion events requested.
Fix -B0 so burst size of 0 works.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 16 Apr 2009 21:35:18 +0000 (14:35 -0700)]
dapltest: reset server listen ports to avoid collisions during long runs
If server is running continuously the port number increments
from base without resetting between tests. This will
eventually cause collisions in port space.
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Sean Hefty [Thu, 16 Apr 2009 17:21:51 +0000 (10:21 -0700)]
To avoid duplicating port numbers between different tests, the next port
number to use must increment based on the number of endpoints per thread *
the number of threads.
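The port arithmetic described above can be written as a one-line helper. This is a sketch only; the function and parameter names are hypothetical and do not come from dapltest.

```c
/* Base port for the next test, given that the previous test used
 * eps_per_thread endpoints on each of num_threads threads. */
unsigned int example_next_port(unsigned int base_port,
                               unsigned int eps_per_thread,
                               unsigned int num_threads)
{
    return base_port + eps_per_thread * num_threads;
}
```

For example, a test starting at port 5000 with 4 endpoints per thread and 8 threads consumes 32 ports, so the next test would start at 5032.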
Sean Hefty [Thu, 16 Apr 2009 17:21:45 +0000 (10:21 -0700)]
dapltest assumes that events across multiple endpoints occur in a specific
order. Since this is a false assumption, avoid this by directing events to
per endpoint EVDs, rather than using shared EVDs.
Sean Hefty [Thu, 16 Apr 2009 17:21:41 +0000 (10:21 -0700)]
Synchronization is missing between removing items from an EVD and queuing
them. Since the removal thread is the user's, but the queuing thread is
not, the synchronization must be provided by DAPL. Hold the evd lock
around any calls to dapls_rbuf_*.
Sean Hefty [Thu, 16 Apr 2009 17:21:26 +0000 (10:21 -0700)]
Communication to the CR thread is done using an internal socket. When a
new connection request is ready for processing, an object is placed on
the CR list, and data is written to the internal socket. The write causes
the CR thread to wake-up and process anything on its cr list.
If multiple objects are placed on the CR list around the same time, then
the CR thread will read in a single character, but process the entire list.
This results in additional data being left on the internal socket. When
the CR does a select(), it will find more data to read, read the data, but
not have any real work to do. The result is that the thread spins in a
loop checking for changes when none have occurred until all data on the
internal socket has been read.
Avoid this overhead by reading all data off the internal socket before
processing the CR list.
Sean Hefty [Thu, 16 Apr 2009 17:21:13 +0000 (10:21 -0700)]
The dapl connect call takes as input an address (sockaddr) and a port number
as separate input parameters. It modifies the sockaddr address to set the
port number before trying to connect. This leads to a situation in
dapltest with multiple threads that reference the same buffer for their
address, but specify different port numbers, where the different threads
end up trying to connect to the same remote port.
To solve this, do not modify the caller's address buffer and instead use
a local buffer. This fixes an issue seen running multithreaded tests with
dapltest.
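The local-buffer approach can be illustrated with the sketch below, assuming an IPv4 sockaddr_in. The function name example_addr_with_port is hypothetical and this is not the actual dapl connect code.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build the connect address in a local copy so the caller's
 * buffer, which other threads may share, is never modified. */
struct sockaddr_in example_addr_with_port(const struct sockaddr_in *caller_addr,
                                          unsigned short port)
{
    struct sockaddr_in local;

    assert(caller_addr != NULL);
    memcpy(&local, caller_addr, sizeof local);
    local.sin_port = htons(port);   /* only the copy gets the port */
    return local;
}
```

Two threads passing the same address buffer with different ports each get their own copy, so neither can overwrite the other's port.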
Sean Hefty [Fri, 10 Apr 2009 15:17:32 +0000 (08:17 -0700)]
The connection request thread adds sockets to a select list unless
the cr->socket is invalid and the cr request state is set to destroy. If the
cr->socket is invalid, but the cr->state is not destroy, then the cr->socket
is added to an FD set for select/poll. This results in select/poll
returning an error when select is called. As a result, the cr thread never
actually blocks during this state.
Fix this by only destroying a cr based on its state being set to destroy
and skip adding cr->sockets to the FD set when they are invalid.
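Skipping invalid sockets when building the FD set can be sketched as below, assuming POSIX select(). EXAMPLE_INVALID_SOCKET and example_build_read_set are hypothetical names standing in for the provider's own definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

#define EXAMPLE_INVALID_SOCKET (-1)   /* hypothetical sentinel */

/* Add only valid descriptors to the read set.
 * Returns the highest descriptor added, or -1 if none were added. */
int example_build_read_set(const int *socks, int count, fd_set *rset)
{
    int i, maxfd = -1;

    assert(socks != NULL && rset != NULL);
    FD_ZERO(rset);
    for (i = 0; i < count; i++) {
        /* Invalid or out-of-range sockets would make select() fail. */
        if (socks[i] < 0 || socks[i] >= FD_SETSIZE)
            continue;
        FD_SET(socks[i], rset);
        if (socks[i] > maxfd)
            maxfd = socks[i];
    }
    return maxfd;
}
```

With invalid entries filtered out, select() can block normally instead of returning an error on every call.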
Sean Hefty [Fri, 10 Apr 2009 15:08:03 +0000 (08:08 -0700)]
The IBAL library allocates a small number of threads for callbacks to the
user. If the user blocks all of the callback threads, no additional
callbacks can be invoked. The DAPL IBAL provider cancels listen requests
from within an IBAL callback, then waits for a second callback to confirm
that the listen has been canceled. If there is a single IBAL callback
thread, or multiple listens are canceled simultaneously, then the provider
can deadlock waiting for a cancel callback that never occurs.
This problem is seen when running dapltest with multiple threads.
Sean Hefty [Fri, 10 Apr 2009 15:07:57 +0000 (08:07 -0700)]
We need to check the return value from select for errors before checking
the FD sets. An item may be in an FD set but select could have returned
an error.
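The ordering of the checks can be sketched as follows, assuming POSIX select(). The function name example_wait_readable is hypothetical and the sketch watches a single descriptor for simplicity.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/* Returns 1 if fd is readable, 0 on timeout, -1 on select() error. */
int example_wait_readable(int fd, int timeout_ms)
{
    fd_set rset;
    struct timeval tv;
    int ret;

    assert(fd >= 0 && fd < FD_SETSIZE);
    do {
        /* select() may modify both arguments, so rebuild them each try. */
        FD_ZERO(&rset);
        FD_SET(fd, &rset);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ret = select(fd + 1, &rset, NULL, NULL, &tv);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;   /* on error the set contents must not be trusted */
    if (ret == 0)
        return 0;    /* timed out, nothing ready */
    return FD_ISSET(fd, &rset) ? 1 : 0;
}
```

The FD set is only inspected after the return value has been checked, which is the ordering the fix above calls for.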
Sean Hefty [Fri, 10 Apr 2009 15:07:49 +0000 (08:07 -0700)]
Enable building with CQ_WAIT_OBJECTS support to directly wait on CQ
completion channels in the Windows version of the openib_scm provider.
Also minor fixup to use DAPL_DBG_TYPE_UTIL for debug log messages
instead of DAPL_DBG_TYPE_CM.
Sean Hefty [Fri, 10 Apr 2009 15:07:44 +0000 (08:07 -0700)]
The IBAL-SCM provider will run into an infinite loop if the check for
cr->socket > SCM_MAX_CONN - 1 fails. The code continues back to the start
of the while loop without moving to the next connection request entry
in the list.
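The loop shape that avoids this hang can be sketched as below. The struct, the EXAMPLE_MAX_CONN limit, and the function name are hypothetical; the real provider walks its own CR list type.

```c
#include <stddef.h>

#define EXAMPLE_MAX_CONN 8   /* hypothetical limit */

struct example_cr {
    int socket;
    struct example_cr *next;
};

/* Count the entries whose socket value is within range. */
int example_count_usable(struct example_cr *head)
{
    struct example_cr *cr = head, *next_cr;
    int usable = 0;

    while (cr != NULL) {
        next_cr = cr->next;   /* capture the successor before any continue */
        if (cr->socket > EXAMPLE_MAX_CONN - 1) {
            cr = next_cr;     /* advance; omitting this repeats the same entry forever */
            continue;
        }
        usable++;
        cr = next_cr;
    }
    return usable;
}
```

Every path through the loop body assigns cr from next_cr, so an out-of-range entry is skipped rather than revisited.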
Sean Hefty [Fri, 10 Apr 2009 15:07:40 +0000 (08:07 -0700)]
next_cr is set just before and inside the check
if ((cr->socket == DAPL_INVALID_SOCKET && cr->state == SCM_DESTROY)
Remove setting it inside the if statement.
Sean Hefty [Fri, 10 Apr 2009 15:07:32 +0000 (08:07 -0700)]
The WinOF HCA driver cannot handle transitioning from RTS -> RESET ->
INIT -> ERROR. Simply delete the QP and re-create it to reinitialize
the endpoint until the bug is fixed.
Sean Hefty [Fri, 10 Apr 2009 15:06:53 +0000 (08:06 -0700)]
Move from using pipes to sockets for internal communication. This
avoids issues with windows only supporting select() on sockets.
Remove windows specific definition of dapl_dbg_log.
Update to latest windows libibverbs implementation using completion
channel abstraction to improve windows scalability and simplify
porting where FD's are accessed directly in Linux.
Arlin Davis [Fri, 13 Mar 2009 20:39:12 +0000 (12:39 -0800)]
uDAPL: scm provider, remove query gid/lid from connection setup phase
move lid/gid queries from the connection setup phase
and put them in the open call to avoid overhead
of more fd's during connections. No need
to query during connection setup since uDAPL
binds to specific hca/ports via dat_ia_open.
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Arlin Davis [Wed, 4 Mar 2009 18:04:13 +0000 (10:04 -0800)]
dapl scm: remove unnecessary thread when using direct objects
A thread is created for processing events on devices without
direct event object support. Since all openfabrics devices support
direct events there is no need to start a thread. Move this under
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Arlin Davis [Tue, 3 Mar 2009 17:25:26 +0000 (09:25 -0800)]
uDAPL common: add 64 bit counters for IA, EP, and EVD's.
Build with -DDAPL_COUNTERS to build in counters for cma and scm providers.
New extension calls in dat_ib_extensions.h for counters
dat_print_counters, dat_query_counters
Counters for operations, async errors, and data
Update dtestx (-p) with print and query counter examples
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Arlin Davis [Mon, 27 Oct 2008 16:48:53 +0000 (08:48 -0800)]
dapltest: transaction test moves to cleanup stage before rdma_read processing is complete
With multiple threads, the transaction server thread can move to cleanup
stage and unregister memory before the remote client process has
completed the rdma read. In lieu of a rewrite to add sync messages
at the end of transaction test phase, just add a delay before cleanup.
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Arlin Davis [Wed, 24 Sep 2008 15:33:32 +0000 (08:33 -0700)]
build: $(DESTDIR) prepend needed on install hooks for dat.conf
All install directives that automake creates automatically
have $(DESTDIR) prepended to them so that a make
DESTDIR=<some_path> install will work. The hand written
install hook for dat.conf was missing DESTDIR.
Arlin Davis [Wed, 24 Sep 2008 15:26:28 +0000 (08:26 -0700)]
dapl scm: UD shares EP's which requires serialization
add locking around the modify_qp state changes to avoid
unnecessary modify_qp calls during multiple resolve
remote AH connection events on a single EP.
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
Arlin Davis [Sat, 20 Sep 2008 22:58:59 +0000 (15:58 -0700)]
dapl: fixes for IB UD extensions in common code and socket cm provider.
- Manage EP states based on attribute service type.
- Allow multiple connections (remote_ah resolution)
and accepts on UD type endpoints.
- Supply private data on CR conn establishment
- Add UD extension conn event type - DAT_IB_UD_PASSIVE_REMOTE_AH
Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>