git.openfabrics.org - ~ardavis/dapl.git/log
~ardavis/dapl.git
14 years agoucm: fix issues with UD QP's.
Arlin Davis [Tue, 8 Sep 2009 16:11:37 +0000 (09:11 -0700)]
ucm: fix issues with UD QP's.

private data size not in host order when processing
connection events.

ud extensions event should include original ia_addr
and qpn used during connection and not the IB qpn.

ucm QP service resource cleanup in wrong order.

cleanup extra cr/lf device.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agowinof: Convert windows version of dapl and dat libraries to use private heaps.
Arlin Davis [Thu, 3 Sep 2009 17:45:56 +0000 (10:45 -0700)]
winof: Convert windows version of dapl and dat libraries to use private heaps.

This allows for better support of memory registration caching by upper
level libraries (MPI) that use SecureMemoryCacheCallback.

It also makes it easier to debug heap corruption issues.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agodtest, dtestx: modifications for UD QP testing with ucm provider.
Arlin Davis [Wed, 2 Sep 2009 21:01:51 +0000 (14:01 -0700)]
dtest, dtestx: modifications for UD QP testing with ucm provider.

remote_addr is wrong for IP remote address.

The dtestx requires the server connect back to the client
for the UD test. With the ucm provider you need to provide
the QPN and the LID which you cannot get until the dtest
client starts. So, for now, don't support UD testing
on UCM providers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm, ucm: UD QP support was broken when porting to common openib code base.
Arlin Davis [Wed, 2 Sep 2009 20:54:59 +0000 (13:54 -0700)]
scm, ucm: UD QP support was broken when porting to common openib code base.

create remote_ah was moved out of modify_qp_state function but not
included in the RTU and ACCEPT code for UD QP's. qp type check
should be on daddr not saddr in ucm cm code.

QP number must be converted to host order before supplying remote_ah,
and qp number to consumer.

Modify QP state to RTR for UD QP mask setting incorrect.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
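
The host-order conversion this commit describes can be sketched as follows. This is a minimal illustration, not the DAPL code: the `wire_msg` layout and both helper names are hypothetical, and only the standard `ntohl`/`ntohs` calls are real.

```c
#include <arpa/inet.h>  /* ntohl, ntohs, htonl, htons */
#include <stdint.h>

/* Hypothetical wire message: fields arrive in network byte order. */
struct wire_msg {
    uint32_t qpn;     /* queue pair number, network order */
    uint16_t p_size;  /* private data size, network order */
};

/* Convert the QP number to host order before handing it to the consumer. */
static uint32_t msg_qpn_host(const struct wire_msg *m)
{
    return ntohl(m->qpn);
}

/* Convert the private data size to host order before using it as a length. */
static uint16_t msg_psize_host(const struct wire_msg *m)
{
    return ntohs(m->p_size);
}
```

Using the raw field as a length or as a QP number would only work by accident on big-endian hosts, which is why the conversion belongs at the point where the message is consumed.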
14 years agocma: cleanup warning with unused local variable, ret, in disconnect
Arlin Davis [Tue, 1 Sep 2009 20:02:24 +0000 (13:02 -0700)]
cma: cleanup warning with unused local variable, ret, in disconnect

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: remove debug message after rdma_disconnect failure
Arlin Davis [Tue, 1 Sep 2009 19:36:31 +0000 (12:36 -0700)]
cma: remove debug message after rdma_disconnect failure

DAPL automatically calls rdma_disconnect() when a disconnect request is
received.  If the user also calls disconnect, that calls rdma_disconnect() as
well, but the connection has already been disconnected by DAPL and is no longer
valid.  The result is that the user's call to rdma_disconnect() will fail.  Do
not display an error message if this occurs.

Locking could be added to prevent calling rdma_disconnect() multiple times, but
since the librdmacm provides synchronization to trap this, we might as well take
advantage of it.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
14 years agoscm: socket errno check needs O/S dependent wrapper
Arlin Davis [Tue, 1 Sep 2009 19:27:43 +0000 (12:27 -0700)]
scm: socket errno check needs O/S dependent wrapper

Intel MPI checks the uDAPL error code when calling dat_psp_create() to see if
the port number that it provides is in use or not.  Convert winsock error codes
to unix errno values.

This fixes the following error reported by Intel MPI:
'DAPL provider is not found and fallback device is not enabled'

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
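
A rough sketch of the kind of wrapper this commit describes is shown below. The function name `wsa_to_errno` and the `X_` constants are hypothetical; the numeric values are the documented Winsock error codes, written out so the sketch also compiles on non-Windows hosts. The real provider may map a different set of codes.

```c
#include <errno.h>

/* Winsock error values, spelled out so the sketch also compiles off Windows. */
enum {
    X_WSAEWOULDBLOCK  = 10035,
    X_WSAEADDRINUSE   = 10048,
    X_WSAETIMEDOUT    = 10060,
    X_WSAECONNREFUSED = 10061
};

/* Hypothetical helper: translate a Winsock error code to a unix errno value. */
static int wsa_to_errno(int wsa_err)
{
    switch (wsa_err) {
    case X_WSAEWOULDBLOCK:  return EWOULDBLOCK;
    case X_WSAEADDRINUSE:   return EADDRINUSE;
    case X_WSAETIMEDOUT:    return ETIMEDOUT;
    case X_WSAECONNREFUSED: return ECONNREFUSED;
    default:                return wsa_err;  /* unknown: pass through unchanged */
    }
}
```

With a mapping like this, a caller that tests for EADDRINUSE after a failed listen sees the same value on both platforms.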
14 years agodapltest: update script files for WinOF
Arlin Davis [Tue, 1 Sep 2009 19:13:16 +0000 (12:13 -0700)]
dapltest: update script files for WinOF

Cleanup 64-bit paths now that WinOF is always installed into '\Program Files\WinOF'.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocma: conditional check for new rdma_cm definition.
Arlin Davis [Tue, 1 Sep 2009 19:10:21 +0000 (12:10 -0700)]
cma: conditional check for new rdma_cm definition.

RDMA_CM_EVENT_TIMEWAIT_EXIT is new to OFED 1.4
add conditional check so dapl can build and run
against older OFED 1.3 stacks

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.22 dapl-2.0.22
Arlin Davis [Thu, 20 Aug 2009 16:13:43 +0000 (09:13 -0700)]
Release 2.0.22

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agodapltest: add mdep processor yield and use with dapltest
Arlin Davis [Thu, 20 Aug 2009 16:12:47 +0000 (09:12 -0700)]
dapltest: add mdep processor yield and use with dapltest

Be thread scheduler friendly and release the current thread thus allowing other threads to run.

Signed-off-by: Stan Smith <stan.smith@intel.com>
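
A processor yield of this kind might look like the sketch below. The names `mdep_yield` and `wait_for_flag` are hypothetical and are not taken from dapltest; `sched_yield()` and `SwitchToThread()` are the standard POSIX and Win32 calls.

```c
#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>
#endif

/* Hypothetical wrapper: give up the rest of the current time slice. */
static void mdep_yield(void)
{
#ifdef _WIN32
    SwitchToThread();
#else
    sched_yield();
#endif
}

/* Poll a flag, yielding between checks rather than spinning hot.
 * Returns the final flag value (0 if max_spins ran out first). */
static int wait_for_flag(volatile int *flag, int max_spins)
{
    while (!*flag && max_spins-- > 0)
        mdep_yield();
    return *flag;
}
```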

14 years agoucm: Add new provider using a DAPL based IB-UD cm mechanism for MPI implementations.
Arlin Davis [Tue, 18 Aug 2009 17:15:15 +0000 (10:15 -0700)]
ucm: Add new provider using a DAPL based IB-UD cm mechanism for MPI implementations.

New provider uses its own CM protocol on top of IB-UD queue pairs.
During device open, this provider creates a UD queue pair and
returns local address information via dat_ia_query. This 24 byte
opaque address must be exchanged out-of-band before connecting to a
server via dat_ep_connect. This provider is targeted for MPI
implementations that already exchange address information
during mpi_init phase.

Future release may provide some ARP mechanism via multicast.

dtest, dtestx, and dtestcm were modified to report the lid and qpn
information on the server side so you can provide appropriate
destination address information for the client test suite.

dapltest will not work with this provider.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoRelease 2.0.21 OFED-1.5-beta WinOF-2.1 dapl-2.0.21-1
Arlin Davis [Wed, 5 Aug 2009 03:54:12 +0000 (20:54 -0700)]
Release 2.0.21

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoscm: Fix disconnect. QP's need to move to ERROR state in
Arlin Davis [Wed, 5 Aug 2009 03:49:09 +0000 (20:49 -0700)]
scm: Fix disconnect. QP's need to move to ERROR state in
order to flush work requests and notify consumer. Moving to
RESET removed all requests but did not notify consumer.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agomodify dtest.c to cleanup CNO wait code and consolidate into
Arlin Davis [Wed, 5 Aug 2009 03:48:03 +0000 (20:48 -0700)]
modify dtest.c to cleanup CNO wait code and consolidate into
collect_event() call. After waking up from CNO wait the
consumer must check all EVD's. The EVD's under the CNO
could be dropped if already triggered or could come in any order.
DT_RetToString changed to DT_RetToStr and DT_EventToSTr
changed to DT_EventToStr for consistency.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoCNO events, once triggered will not be returned during the cno wait.
Arlin Davis [Wed, 5 Aug 2009 03:47:17 +0000 (20:47 -0700)]
CNO events, once triggered will not be returned during the cno wait.
Check for triggered state before going to sleep in cno_wait. Reset
triggered EVD reference after reporting.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
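
The check-before-sleep pattern from this commit can be sketched with a mutex and condition variable. The `struct cno` below is a hypothetical stand-in, not the DAPL object; it only shows why the triggered state has to be tested before waiting and cleared after it is reported.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a CNO: a lock, a condition, and the EVD that fired. */
struct cno {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    void           *triggered_evd;  /* NULL when nothing is pending */
};

/* Called by the event side when an EVD fires. */
static void cno_trigger(struct cno *c, void *evd)
{
    pthread_mutex_lock(&c->lock);
    c->triggered_evd = evd;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
}

/* Return the triggered EVD; sleep only if nothing has triggered yet. */
static void *cno_wait(struct cno *c)
{
    void *evd;

    pthread_mutex_lock(&c->lock);
    while (c->triggered_evd == NULL)   /* check before sleeping */
        pthread_cond_wait(&c->cond, &c->lock);
    evd = c->triggered_evd;
    c->triggered_evd = NULL;           /* reset the reference after reporting */
    pthread_mutex_unlock(&c->lock);
    return evd;
}
```

If the waiter slept unconditionally, an event that fired just before the wait would never be returned, which matches the symptom in the commit title.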
14 years agoCNO support broken in both CMA and SCM providers.
Arlin Davis [Sun, 2 Aug 2009 21:21:09 +0000 (14:21 -0700)]
CNO support broken in both CMA and SCM providers.

CQ thread/callback mechanism was removed by mistake. Still
need indirect DTO callbacks when CNO is attached to EVD's.

Add CQ event channel to cma provider's thread and add
to select for rdma_cm and async channels.

For scm provider there is no easy way to add this channel
to the select across sockets on windows. So, for portability
reasons a second thread is started to process the ASYNC and
CQ channels for events.

Must disable EVD (evd_endabled=FALSE) during destroy
to prevent EVD events firing for CNOs and re-arming CQ while
CQ is being destroyed.

Change dtest to check EVD after CNO times out.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon osd: include winsock2.h for IPv6 definitions.
Arlin Davis [Thu, 30 Jul 2009 15:02:30 +0000 (08:02 -0700)]
common osd: include winsock2.h for IPv6 definitions.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agocommon osd: include ws2tcpip.h for sockaddr_in6 definitions.
Arlin Davis [Wed, 29 Jul 2009 15:02:15 +0000 (08:02 -0700)]
common osd: include ws2tcpip.h for sockaddr_in6 definitions.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
14 years agoDAPL introduced the concept of directly waiting on the CQ for
Sean Hefty [Mon, 27 Jul 2009 22:07:33 +0000 (15:07 -0700)]
DAPL introduced the concept of directly waiting on the CQ for
events by adding a compile time flag and special handling in the common
code.  Rather than using the compile time flag and modifying the
common code, let the provider implement the best way to wait for
CQ events.

This simplifies the code and allows the common openib providers to
optimize for Linux and Windows platforms independently, rather than
assuming a specific implementation for signaling events.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agodapltest: Implement a malloc() threshold for the completion reaping.
Arlin Davis [Thu, 16 Jul 2009 19:41:22 +0000 (12:41 -0700)]
dapltest: Implement a malloc() threshold for the completion reaping.

change byte vector allocation to stack in functions:
  DT_handle_send_op, DT_handle_rdma_op & DT_handle_recv_op.

When allocation size is under the threshold, use a stack local
allocation instead of malloc/free.  Move redundant bzero() to
be called only in the case of using local stack allocation as
DT_Mdep_malloc() already does a bzero(). Consolidate error handling
return and free() check to a single point by using goto.

Signed-off-by: Stan Smith <stan.smith@intel.com>
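
The threshold technique in this commit can be illustrated roughly as below. The function `reap_completions`, the threshold value of 64, and the use of `calloc` in place of DT_Mdep_malloc() are assumptions made for the sketch; the commit does not state the actual threshold.

```c
#include <stdlib.h>
#include <string.h>

#define STACK_THRESHOLD 64  /* assumed cutoff; the commit does not state a value */

/* Mark `count` completions in a byte vector, using the stack for small counts.
 * Returns the number marked, or -1 on allocation failure. */
static int reap_completions(size_t count)
{
    unsigned char  local[STACK_THRESHOLD];
    unsigned char *vec;
    int            heap = 0;
    int            rc = -1;
    size_t         i;

    if (count <= STACK_THRESHOLD) {
        vec = local;
        memset(vec, 0, sizeof(local));  /* stack memory is not pre-zeroed */
    } else {
        vec = calloc(count, 1);         /* heap path already returns zeroed memory */
        if (vec == NULL)
            goto out;
        heap = 1;
    }

    for (i = 0; i < count; i++)
        vec[i] = 1;                     /* mark each completion as seen */
    rc = (int)count;

out:
    if (heap)                           /* single exit point: free only heap memory */
        free(vec);
    return rc;
}
```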
15 years agoscm: handle connected state when freeing CM objects
Arlin Davis [Thu, 16 Jul 2009 19:32:09 +0000 (12:32 -0700)]
scm: handle connected state when freeing CM objects

The QP could be freed before being disconnected
so the provider needs to process the disconnect before freeing
the CM object. The disconnect clean will finish
the destroy process during the disc callback.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm, dtest: changes for winof gettimeofday and FD_SETSIZE settings.
Arlin Davis [Wed, 8 Jul 2009 19:49:43 +0000 (12:49 -0700)]
scm, dtest: changes for winof gettimeofday and FD_SETSIZE settings.

scm changes to set FD_SETSIZE with expected value and
prevent windows override.

dtest: remove gettimeofday implementation for windows
specific implementation etc\user\gtod.c

general EOL cleanup

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: set TCP_NODELAY sockopt on the server side for sends.
Arlin Davis [Mon, 6 Jul 2009 16:24:07 +0000 (09:24 -0700)]
scm: set TCP_NODELAY sockopt on the server side for sends.

scm provider sends small messages from both server and client
sides. Set NODELAY on both sides to avoid send delays either
way.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
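
Setting the option looks roughly like this on a POSIX socket. The helper name `set_nodelay` is hypothetical; the `setsockopt` call with `IPPROTO_TCP` and `TCP_NODELAY` is the standard interface.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>

/* Disable Nagle's algorithm so small messages are sent without delay.
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int set_nodelay(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on));
}
```

The same call would be applied to the accepted socket on the server and to the connecting socket on the client, since both sides send small messages.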
15 years agowindows: remove obsolete files in dapl/udapl source tree
Arlin Davis [Thu, 2 Jul 2009 21:16:52 +0000 (14:16 -0700)]
windows: remove obsolete files in dapl/udapl source tree

SOURCES,makefile,udapl.r,udapl_exports.src,udapl_sources.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtestcm: add UD type QP option to test
Arlin Davis [Thu, 2 Jul 2009 21:11:20 +0000 (14:11 -0700)]
dtestcm: add UD type QP option to test

Add -u for UD type QP's during connection setup.
Will setup UD QPs and provide remote AH
in connect establishment event. Measures
setup/exchange rates.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: destroy QP called before disconnect
Arlin Davis [Thu, 2 Jul 2009 21:07:36 +0000 (14:07 -0700)]
scm: destroy QP called before disconnect

Handle the case where QP is destroyed before
disconnect processing. Windows supports
reinit_qp during a disconnect call by
destroying the QP and recreating the
QP instead of state change from reset
to init. Call disconnect in destroy
CM code to handle this unexpected state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agocma: add support for rdma_cm TIME_WAIT event.
Arlin Davis [Thu, 2 Jul 2009 21:03:12 +0000 (14:03 -0700)]
cma: add support for rdma_cm TIME_WAIT event.

Nothing to process, simply ack the event.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: remove old udapl_scm code replaced by openib_scm.
Arlin Davis [Wed, 1 Jul 2009 14:58:32 +0000 (07:58 -0700)]
scm: remove old udapl_scm code replaced by openib_scm.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowinof: fix build issues after consolidating cma, scm code base.
Arlin Davis [Wed, 1 Jul 2009 14:53:18 +0000 (07:53 -0700)]
winof: fix build issues after consolidating cma, scm code base.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agocma: lock held when exiting as a result of a rdma_create_event_channel failure.
Arlin Davis [Wed, 1 Jul 2009 14:51:59 +0000 (07:51 -0700)]
cma: lock held when exiting as a result of a rdma_create_event_channel failure.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows: all dlist functions have been moved to the header file.
Sean Hefty [Mon, 29 Jun 2009 19:34:54 +0000 (12:34 -0700)]
windows: all dlist functions have been moved to the header file.
remove references to dlist.c

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agodtestcm windows: add build infrastructure for new dtestcm test suite
Arlin Davis [Mon, 29 Jun 2009 19:13:48 +0000 (12:13 -0700)]
dtestcm windows: add build infrastructure for new dtestcm test suite

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoopenib_common: reorganize provider code base to share common mem, cq, qp, dto functions
Arlin Davis [Mon, 29 Jun 2009 15:57:46 +0000 (08:57 -0700)]
openib_common: reorganize provider code base to share common mem, cq, qp, dto functions

add new openib_common directory with cq, qp, util, dto, mem function calls
and definitions. This basically leaves the unique CM and Device definitions
and functions to the individual providers directory of openib_scm and openib_cma.

modifications to dapl_cr_accept required. ep->cm_handle is allocated
and managed entirely in provider so dapl common code should not update
ep_handle->cm_handle from the cr->cm_handle automatically. The provider
should determine which cm_handle is required for the accept.

openib_cma defines _OPENIB_CMA_ and openib_scm defines _OPENIB_SCM_ for provider
specific build needs in common code.

15 years agoscm: fixes and optimizations for connection scaling
Arlin Davis [Fri, 26 Jun 2009 21:45:34 +0000 (14:45 -0700)]
scm: fixes and optimizations for connection scaling

Prioritize accepts on listen ports via FD_READ
process the accepts ahead of other work to avoid
socket half_connection (SYN_RECV) stalls.

Fix dapl_poll to return DAPL_FD_ERROR on
all event error types.

Add new state for socket released, but CR
not yet destroyed. This enables scm to release
the socket resources immediately after exchanging
all QP information. Also, add state to str call.

Only add the CR reference to the EP if it is
RC type. UD has multiple CR's per EP so when
a UD EP disconnect_clean was called, from a
timeout, it destroyed the wrong CR.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: double the default fd_set_size
Arlin Davis [Fri, 26 Jun 2009 21:31:19 +0000 (14:31 -0700)]
scm: double the default fd_set_size

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: EP reference in CR should be cleared during ep_destroy
Arlin Davis [Fri, 26 Jun 2009 21:28:30 +0000 (14:28 -0700)]
scm: EP reference in CR should be cleared during ep_destroy

The EP reference in the CR should be set to null
during the EP free call to ensure no further
reference back to a mem freed EP.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtestx: fix conn establishment event checking
Arlin Davis [Fri, 26 Jun 2009 21:23:35 +0000 (14:23 -0700)]
dtestx: fix conn establishment event checking

not catching error cases on client side
when checking for event number and UD type
&& should have been ||

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtestcm: new test to measure dapl connection rates.
Arlin Davis [Fri, 26 Jun 2009 21:18:37 +0000 (14:18 -0700)]
dtestcm: new test to measure dapl connection rates.

new test suite added to measure connection
rates of providers. Used to compare cma, scm,
and other providers under development.

dtestcm USAGE

s: server
c: connections (default = 1000)
b: burst rate of conn_reqs (default = 100)
m: multi-listens (set to burst setting )
v: verbose
w: wait on event (default, polling)
d: delay before accept
h: hostname/address of server, specified on client
P: provider name (default = OpenIB-v2-ib0)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoRelease 2.0.20 dapl-2.0.20-1
Arlin Davis [Sat, 20 Jun 2009 03:59:16 +0000 (20:59 -0700)]
Release 2.0.20

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agocommon,scm: add debug capabilities to print in-process CM lists
Arlin Davis [Sat, 20 Jun 2009 03:52:51 +0000 (20:52 -0700)]
common,scm: add debug capabilities to print in-process CM lists

Add a new debug bit DAPL_DBG_TYPE_CM_LIST.
If set, the pending CM requests will be
dumped when dat_print_counters is called.
Only provided when built with -DDAPL_COUNTERS

Add new dapl_cm_state_str() call for state
to string conversion for debug prints.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: disconnect EP before cleaning up orphaned CR's during dat_ep_free
Arlin Davis [Tue, 16 Jun 2009 16:22:31 +0000 (09:22 -0700)]
scm: disconnect EP before cleaning up orphaned CR's during dat_ep_free

There is the possibility of dat_ep_free being called
with RC CR's still in connected state. Call disconnect
on the CR before marking for destroy.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodapltest: windows scripts updated
Arlin Davis [Wed, 10 Jun 2009 19:05:17 +0000 (12:05 -0700)]
dapltest: windows scripts updated

Support added for provider specification and general simplification of internal workings.

Signed-off-by: Stan Smith <stan.smith@intel.com>
15 years agoscm: private data is not handled properly via CR rejects.
Arlin Davis [Wed, 10 Jun 2009 16:18:09 +0000 (09:18 -0700)]
scm: private data is not handled properly via CR rejects.

For both RC and UD connect requests, the private
data is not being received on socket and passed
back via the active side REJECT event.

UD requires new extended reject event type of
DAT_IB_UD_CONNECTION_REJECT_EVENT to distinguish
between RC and UD type rejects.

cr_thread exit/cleanup processing fixed to ensure
all items are off the list before exiting.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: cleanup orphaned UD CR's when destroying the EP
Arlin Davis [Wed, 10 Jun 2009 16:09:56 +0000 (09:09 -0700)]
scm: cleanup orphaned UD CR's when destroying the EP

UD CR objects are kept active because of direct private data references
from CONN events. The cr->socket is closed and marked inactive but the
object remains allocated and queued on the CR resource list. There can
be multiple CR's associated with a given EP and there is no way to
determine when consumer is finished with event until the dat_ep_free.
Schedule destruction for all CR's associated with this EP during
free call. cr_thread will complete cleanup with state of SCM_DESTROY.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: provider specific query for default UD MTU is wrong.
Arlin Davis [Wed, 10 Jun 2009 16:05:32 +0000 (09:05 -0700)]
scm: provider specific query for default UD MTU is wrong.

Change the provider specific query DAT_IB_TRANSPORT_MTU
to report 2048 for new default MTU size.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: update CM code to shutdown before closing socket
Arlin Davis [Wed, 10 Jun 2009 17:06:59 +0000 (10:06 -0700)]
scm: update CM code to shutdown before closing socket

Data could be lost without calling shutdown on the socket
before closing. Update to shutdown and then close. Add
definition for SHUT_RDWR to SD_BOTH for windows.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
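
The shutdown-then-close order can be sketched as follows for a POSIX socket. The helper name `close_socket_gracefully` is hypothetical; on Windows the equivalent calls would be shutdown() with SD_BOTH followed by closesocket().

```c
#include <sys/socket.h>
#include <unistd.h>

/* Shut down both directions, then release the descriptor.
 * Returns the result of close(). */
static int close_socket_gracefully(int fd)
{
    /* Stop both directions first so the peer sees an orderly end of stream. */
    (void)shutdown(fd, SHUT_RDWR);  /* may fail with ENOTCONN; close anyway */
    return close(fd);
}
```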
---

15 years agodapltest: windows script dt-cli.bat updated
Arlin Davis [Thu, 4 Jun 2009 20:48:18 +0000 (13:48 -0700)]
dapltest: windows script dt-cli.bat updated

scn should be scm

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodapl/windows cma provider: add support for network devices based on index
Sean Hefty [Thu, 4 Jun 2009 15:19:12 +0000 (08:19 -0700)]
dapl/windows cma provider: add support for network devices based on index

The linux cma provider provides support for named network devices, such
as 'ib0' or 'eth0'.  This allows the same dapl configuration file to
be used easily across a cluster.

To allow similar support on Windows, allow users to specify the device
name 'rdma_devN' in the dapl.conf file.  The given index, N, is mapped to a
corresponding IP address that is associated with an RDMA device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoopenib: remove 1st gen provider, replaced with openib_cma and openib_scm
Arlin Davis [Thu, 4 Jun 2009 15:00:29 +0000 (08:00 -0700)]
openib: remove 1st gen provider, replaced with openib_cma and openib_scm

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodapltest: update windows script files
Arlin Davis [Fri, 29 May 2009 15:21:10 +0000 (08:21 -0700)]
dapltest: update windows script files

Enhancement to take DAPL provider name as cmd-line argument.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodapltest: update windows batch files in scripts directory
Arlin Davis [Thu, 28 May 2009 22:30:05 +0000 (15:30 -0700)]
dapltest: update windows batch files in scripts directory

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows_osd/linux_osd: new dapl_os_gettid macro to return thread id
Arlin Davis [Mon, 18 May 2009 21:00:02 +0000 (14:00 -0700)]
windows_osd/linux_osd: new dapl_os_gettid macro to return thread id

Change dapl_os_getpid inline to macro on windows and add dapl_os_gettid
macros on linux and windows to return thread id.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows: missing build files for common and udapl sub-directories
Arlin Davis [Mon, 18 May 2009 20:53:59 +0000 (13:53 -0700)]
windows: missing build files for common and udapl sub-directories

Add dapl/dapl_common_src.c and dapl/dapl_udapl_src.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows: add build files for openib_scm, remove /Wp64 build option.
Arlin Davis [Mon, 18 May 2009 16:06:19 +0000 (09:06 -0700)]
windows: add build files for openib_scm, remove /Wp64 build option.

Add build files for windows socket cm and change build
option on windows providers. The new Win7 WDK issues a
deprecated compiler option warning for /Wp64
(Enable 64-bit porting warnings)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm: multi-hca CM processing broken. Need cr thread wakeup mechanism per HCA.
Arlin Davis [Mon, 18 May 2009 15:50:35 +0000 (08:50 -0700)]
scm: multi-hca CM processing broken. Need cr thread wakeup mechanism per HCA.

Currently there is only one pipe across all
device opens. This results in some posted CR work
getting delayed or not processed at all. Provide
a pipe for each device open, with the cr thread created
and managed at a per-device level.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtest: add connection timers on client side
Arlin Davis [Fri, 15 May 2009 18:06:19 +0000 (11:06 -0700)]
dtest: add connection timers on client side

Add timers for active connections and print
results. Allow polling or wait on conn event.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agolinux_osd: use pthread_self instead of getpid for debug messages
Arlin Davis [Fri, 15 May 2009 16:48:38 +0000 (09:48 -0700)]
linux_osd: use pthread_self instead of getpid for debug messages

getpid provides process ids which are not unique. Use unique thread
id's in debug messages to help isolate issues across many device
opens with multiple CM threads.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agowindows ibal-scm: dapl/dirs file needs updated to remove ibal-scm
Arlin Davis [Fri, 1 May 2009 17:18:05 +0000 (10:18 -0700)]
windows ibal-scm: dapl/dirs file needs updated to remove ibal-scm

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoRelease 2.0.19 dapl-2.0.19-1 ofed_1_4_1-v2
Arlin Davis [Thu, 30 Apr 2009 06:13:36 +0000 (23:13 -0700)]
Release 2.0.19

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoscm, cma: dat max_lmr_block_size is 32 bit, verbs max_mr_size is 64 bit
Arlin Davis [Wed, 29 Apr 2009 21:33:28 +0000 (14:33 -0700)]
scm, cma: dat max_lmr_block_size is 32 bit, verbs max_mr_size is 64 bit

mismatch of device attribute size restricts max_lmr_block_size to 32 bit
value. Add check: if larger, then limit to 4G-1 until DAT v2 spec changes.

Consumers should use max_lmr_virtual_address for actual max
registration block size until attribute interface changes.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
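
The clamp described here amounts to a saturating narrowing conversion. A minimal sketch, with the hypothetical helper name `clamp_lmr_block_size`:

```c
#include <stdint.h>

/* Clamp a 64-bit verbs max_mr_size into a 32-bit DAT attribute (4G-1 at most). */
static uint32_t clamp_lmr_block_size(uint64_t max_mr_size)
{
    return max_mr_size > UINT32_MAX ? UINT32_MAX : (uint32_t)max_mr_size;
}
```

A plain cast would instead keep only the low 32 bits, so a device reporting exactly 4G would appear to support a block size of 0.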
15 years agoscm: increase default MTU size from 1024 to 2048
Arlin Davis [Wed, 29 Apr 2009 17:51:03 +0000 (10:51 -0700)]
scm: increase default MTU size from 1024 to 2048

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agoopenib_scm, cma: use direct SGE mappings from dat_lmr_triplet to ibv_sge
Arlin Davis [Wed, 29 Apr 2009 17:49:09 +0000 (10:49 -0700)]
openib_scm, cma: use direct SGE mappings from dat_lmr_triplet to ibv_sge

no need to rebuild scatter gather list given that DAT v2.0
is now aligned with verbs ibv_sge. Fix ib_send_op_type_t typedef.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtest: add flush EVD call after data transfer errors
Arlin Davis [Wed, 29 Apr 2009 15:39:37 +0000 (08:39 -0700)]
dtest: add flush EVD call after data transfer errors

Flush and print entries on async, request, and receive
queues after any data transfer error. Will help
identify failing operation during operations
without completion events requested.
Fix -B0 so burst size of 0 works.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
15 years agodtest/dapltest: Cleanup code with Lindent
Arlin Davis [Wed, 22 Apr 2009 20:16:19 +0000 (13:16 -0700)]
dtest/dapltest: Cleanup code with Lindent

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoibal-scm: remove, obsolete
Arlin Davis [Tue, 21 Apr 2009 22:51:24 +0000 (15:51 -0700)]
ibal-scm: remove, obsolete

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoscm, cma provider: Cleanup code with Lindent
Arlin Davis [Tue, 21 Apr 2009 22:44:15 +0000 (15:44 -0700)]
scm, cma provider: Cleanup code with Lindent

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoudapl: Cleanup code with Lindent
Arlin Davis [Tue, 21 Apr 2009 22:39:01 +0000 (15:39 -0700)]
udapl: Cleanup code with Lindent

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodapl common: Cleanup code with Lindent
Arlin Davis [Tue, 21 Apr 2009 22:31:20 +0000 (15:31 -0700)]
dapl common: Cleanup code with Lindent

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodat: Cleanup code with Lindent
Arlin Davis [Tue, 21 Apr 2009 19:52:29 +0000 (12:52 -0700)]
dat: Cleanup code with Lindent

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoRelease 2.0.18 dapl-2.0.18-1
Arlin Davis [Mon, 20 Apr 2009 19:28:08 +0000 (12:28 -0700)]
Release 2.0.18

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodapltest: reset server listen ports to avoid collisions during long runs
Arlin Davis [Thu, 16 Apr 2009 21:35:18 +0000 (14:35 -0700)]
dapltest: reset server listen ports to avoid collisions during long runs

If server is running continuously the port number increments
from base without reseting between tests. This will
eventually cause collisions in port space.

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoTo avoid duplicating port numbers between different tests, the next port
Sean Hefty [Thu, 16 Apr 2009 17:21:51 +0000 (10:21 -0700)]
To avoid duplicating port numbers between different tests, the next port
number to use must increment based on the number of endpoints per thread *
the number of threads.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
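
The increment rule stated above is simple arithmetic. A sketch, with the hypothetical helper name `next_base_port`:

```c
/* Next base port after a test that used eps_per_thread endpoints
 * on each of num_threads threads. */
static unsigned next_base_port(unsigned base_port,
                               unsigned eps_per_thread,
                               unsigned num_threads)
{
    return base_port + eps_per_thread * num_threads;
}
```

For example, a test starting at port 5000 with 4 endpoints on each of 8 threads consumes 32 ports, so the next test would start at 5032.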
15 years agodapltest assumes that events across multiple endpoints occur in a specific
Sean Hefty [Thu, 16 Apr 2009 17:21:45 +0000 (10:21 -0700)]
dapltest assumes that events across multiple endpoints occur in a specific
order.  Since this is a false assumption, avoid this by directing events to
per endpoint EVDs, rather than using shared EVDs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoSynchronization is missing between removing items from an EVD and queuing
Sean Hefty [Thu, 16 Apr 2009 17:21:41 +0000 (10:21 -0700)]
Synchronization is missing between removing items from an EVD and queuing
them.  Since the removal thread is the user's, but the queuing thread is
not, the synchronization must be provided by DAPL.  Hold the evd lock
around any calls to dapls_rbuf_*.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoCommunication to the CR thread is done using an internal socket. When a
Sean Hefty [Thu, 16 Apr 2009 17:21:26 +0000 (10:21 -0700)]
Communication to the CR thread is done using an internal socket.  When a
new connection request is ready for processing, an object is placed on
the CR list, and data is written to the internal socket.  The write causes
the CR thread to wake-up and process anything on its cr list.

If multiple objects are placed on the CR list around the same time, then
the CR thread will read in a single character, but process the entire list.
This results in additional data being left on the internal socket.  When
the CR does a select(), it will find more data to read, read the data, but
not have any real work to do.  The result is that the thread spins in a
loop checking for changes when none have occurred until all data on the
internal socket has been read.

Avoid this overhead by reading all data off the internal socket before
processing the CR list.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
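
Draining the wakeup channel before processing the list might look like the sketch below. It assumes the descriptor has been set non-blocking, and the helper name `drain_wakeup_fd` is hypothetical; the actual provider uses its own internal socket or pipe.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read everything currently queued on a non-blocking wakeup descriptor.
 * Returns the number of bytes discarded. */
static long drain_wakeup_fd(int fd)
{
    char    buf[64];
    long    total = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;   /* interrupted: retry */
        break;          /* 0 (EOF) or EAGAIN: nothing left to read */
    }
    return total;
}
```

After one drain, the next select() only reports the descriptor readable when a new request has actually been queued, which avoids the spinning described above.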
15 years agoThe dapl connect call takes as input an address (sockaddr) and a port number
Sean Hefty [Thu, 16 Apr 2009 17:21:13 +0000 (10:21 -0700)]
The dapl connect call takes as input an address (sockaddr) and a port number
as separate input parameters.  It modifies the sockaddr address to set the
port number before trying to connect.  This leads to a situation in
dapltest with multiple threads that reference the same buffer for their
address, but specify different port numbers, where the different threads
end up trying to connect to the same remote port.

To solve this, do not modify the caller's address buffer and instead use
a local buffer.  This fixes an issue seen running multithreaded tests with
dapltest.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
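
The local-copy fix can be sketched as follows. The sketch assumes IPv4 for brevity, and the helper name `addr_with_port` is hypothetical; the point is only that the port is written into a copy, never into the caller's buffer.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Build the connect address in a local copy; the caller's buffer is untouched. */
static struct sockaddr_in addr_with_port(const struct sockaddr_in *remote,
                                         unsigned short port)
{
    struct sockaddr_in local;

    memcpy(&local, remote, sizeof(local));
    local.sin_port = htons(port);
    return local;
}
```

Two threads sharing one address buffer but passing different ports each get their own copy, so neither can overwrite the port the other is about to connect to.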
15 years agoWindows socket calls should check return values against SOCKET_ERROR to
Sean Hefty [Thu, 16 Apr 2009 17:21:03 +0000 (10:21 -0700)]
Windows socket calls should check return values against SOCKET_ERROR to
determine if an error occurred.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoBuild: add new file dapl/openib_cma/linux/openib_osd.h to EXTRA_DIST
Arlin Davis [Fri, 10 Apr 2009 15:33:41 +0000 (08:33 -0700)]
Build: add new file dapl/openib_cma/linux/openib_osd.h to EXTRA_DIST

Fix rpmbuild problem with new cma osd include file.

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodapl scm: reduce wait time for thread startup.
Arlin Davis [Fri, 10 Apr 2009 15:32:24 +0000 (08:32 -0700)]
dapl scm: reduce wait time for thread startup.

Thread startup wait reduced to 2ms to shorten open times.

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodapl-scm: getsockopt optlen needs to be initialized to size of optval
Arlin Davis [Fri, 10 Apr 2009 15:31:22 +0000 (08:31 -0700)]
dapl-scm: getsockopt optlen needs to be initialized to size of optval
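
For illustration only: getsockopt() treats optlen as a value-result
argument, so it has to hold the size of optval before the call.  The choice
of SO_ERROR and the names socket_pending_error() and demo_pending_error()
are assumptions of this sketch; the commit does not say which option is
queried.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the pending SO_ERROR value for fd,
 * or -1 if getsockopt() itself fails. */
int socket_pending_error(int fd)
{
    int optval = 0;
    socklen_t optlen = sizeof(optval);  /* must be set before the call */

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &optval, &optlen) != 0)
        return -1;
    return optval;
}

/* Hypothetical demo: a freshly created socket has no pending error. */
int demo_pending_error(void)
{
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    int err;

    if (fd < 0)
        return -1;
    err = socket_pending_error(fd);
    close(fd);
    return err;
}
```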

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoThe connection request thread adds sockets to a select list unless
Sean Hefty [Fri, 10 Apr 2009 15:17:32 +0000 (08:17 -0700)]
The connection request thread adds sockets to a select list unless
the cr->socket is invalid and the cr request state is set to destroy.  If the
cr->socket is invalid, but the cr->state is not destroy, then the cr->socket
is added to an FD set for select/poll.  This results in select/poll
returning an error when select is called.  As a result, the cr thread never
actually blocks during this state.

Fix this by only destroying a cr based on its state being set to destroy
and skip adding cr->sockets to the FD set when they are invalid.
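
A simplified sketch of the FD set construction described above.  The struct,
the state enum, and build_read_set() are hypothetical stand-ins, not the
provider's real types.

```c
#include <stddef.h>
#include <sys/select.h>

enum cr_state { CR_ACTIVE, CR_DESTROY };    /* hypothetical states */

struct cr_entry {                           /* hypothetical list node */
    int socket;                             /* -1 means invalid */
    enum cr_state state;
    struct cr_entry *next;
};

/* Add only valid sockets of live entries to the set.  Entries marked
 * CR_DESTROY are left for the caller to destroy based on state alone.
 * Returns the highest fd added, or -1 if the set is empty. */
int build_read_set(const struct cr_entry *list, fd_set *set)
{
    const struct cr_entry *cr;
    int maxfd = -1;

    FD_ZERO(set);
    for (cr = list; cr != NULL; cr = cr->next) {
        if (cr->state == CR_DESTROY)
            continue;
        if (cr->socket < 0 || cr->socket >= FD_SETSIZE)
            continue;                       /* never hand select() a bad fd */
        FD_SET(cr->socket, set);
        if (cr->socket > maxfd)
            maxfd = cr->socket;
    }
    return maxfd;
}

/* Hypothetical demo: one live entry, one entry marked destroy, and one
 * live entry with an invalid socket.  Returns 1 if only fd 3 is selected. */
int demo_build_read_set(void)
{
    struct cr_entry c = { -1, CR_ACTIVE, NULL };
    struct cr_entry b = { 5, CR_DESTROY, &c };
    struct cr_entry a = { 3, CR_ACTIVE, &b };
    fd_set set;
    int maxfd = build_read_set(&a, &set);

    return maxfd == 3 && FD_ISSET(3, &set) && !FD_ISSET(5, &set);
}
```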

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoMake sure all locks are initialized properly and don't zero their memory
Sean Hefty [Fri, 10 Apr 2009 15:08:16 +0000 (08:08 -0700)]
Make sure all locks are initialized properly and don't zero their memory
once they are.
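
The ordering can be sketched as follows, assuming pthread mutexes stand in
for the DAPL lock type.  struct conn and the function names are
hypothetical.

```c
#include <pthread.h>
#include <string.h>

struct conn {                       /* hypothetical object with a lock */
    pthread_mutex_t lock;
    int refs;
};

/* Zero the object first, then initialize the lock.  After this point the
 * structure must not be cleared with memset again, because that would
 * wipe the initialized lock state. */
int conn_init(struct conn *c)
{
    memset(c, 0, sizeof(*c));
    if (pthread_mutex_init(&c->lock, NULL) != 0)
        return -1;
    c->refs = 1;                    /* set fields individually afterwards */
    return 0;
}

/* Hypothetical demo: the lock is usable after conn_init().
 * Returns 1 on success. */
int demo_conn_lock(void)
{
    struct conn c;
    int ok;

    if (conn_init(&c) != 0)
        return -1;
    ok = pthread_mutex_lock(&c.lock) == 0;
    ok = ok && c.refs == 1;
    ok = ok && pthread_mutex_unlock(&c.lock) == 0;
    pthread_mutex_destroy(&c.lock);
    return ok;
}
```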

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoThe lock functions are defined just a few lines beneath the prototypes
Sean Hefty [Fri, 10 Apr 2009 15:08:13 +0000 (08:08 -0700)]
The lock functions are defined just a few lines beneath the prototypes
as inline.  Remove the duplicate prototypes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoMake sure all locks are initialized and don't zero out their memory once
Sean Hefty [Fri, 10 Apr 2009 15:08:07 +0000 (08:08 -0700)]
Make sure all locks are initialized and don't zero out their memory once
they are.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoThe IBAL library allocates a small number of threads for callbacks to the
Sean Hefty [Fri, 10 Apr 2009 15:08:03 +0000 (08:08 -0700)]
The IBAL library allocates a small number of threads for callbacks to the
user.  If the user blocks all of the callback threads, no additional
callbacks can be invoked.  The DAPL IBAL provider cancels listen requests
from within an IBAL callback, then waits for a second callback to confirm
that the listen has been canceled.  If there is a single IBAL callback
thread, or multiple listens are canceled simultaneously, then the provider
can deadlock waiting for a cancel callback that never occurs.

This problem is seen when running dapltest with multiple threads.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoWe need to check the return value from select for errors before checking
Sean Hefty [Fri, 10 Apr 2009 15:07:57 +0000 (08:07 -0700)]
We need to check the return value from select for errors before checking
the FD sets.  An item may be in an FD set but select could have returned
an error.
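
A small sketch of the ordering, using POSIX select() on a single
descriptor.  wait_readable() and demo_wait_readable() are hypothetical
names used only for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Returns 1 if fd is readable, 0 on timeout, -1 if select() failed. */
int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ret;

    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    do {
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    } while (ret < 0 && errno == EINTR);

    /* Check the return value first: on error the set still holds what we
     * put in, so FD_ISSET() would report stale membership. */
    if (ret < 0)
        return -1;
    if (ret == 0)
        return 0;
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Hypothetical demo: not readable before a byte is sent, readable after.
 * Returns 1 if both observations hold. */
int demo_wait_readable(void)
{
    int sv[2];
    int before, after = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    before = wait_readable(sv[0], 10);
    if (send(sv[1], "x", 1, 0) == 1)
        after = wait_readable(sv[0], 10);

    close(sv[0]);
    close(sv[1]);
    return before == 0 && after == 1;
}
```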

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoSigned-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Fri, 10 Apr 2009 15:07:53 +0000 (08:07 -0700)]
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoEnable building with CQ_WAIT_OBJECTS support to directly wait on CQ
Sean Hefty [Fri, 10 Apr 2009 15:07:49 +0000 (08:07 -0700)]
Enable building with CQ_WAIT_OBJECTS support to directly wait on CQ
completion channels in the Windows version of the openib_scm provider.
Also minor fixup to use DAPL_DBG_TYPE_UTIL for debug log messages
instead of DAPL_DBG_TYPE_CM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoThe IBAL-SCM provider will run into an infinite loop if the check for
Sean Hefty [Fri, 10 Apr 2009 15:07:44 +0000 (08:07 -0700)]
The IBAL-SCM provider will run into an infinite loop if the check for
cr->socket > SCM_MAX_CONN - 1 fails.  The code continues back to the start
of the while loop without moving to the next connection request entry
in the list.
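
The loop shape can be illustrated with a simplified list walk.  MAX_CONN,
struct cr_node, and count_usable() are hypothetical stand-ins; the point is
only that the cursor advances before every continue.

```c
#include <stddef.h>

#define MAX_CONN 4                  /* hypothetical stand-in for SCM_MAX_CONN */

struct cr_node {                    /* hypothetical list node */
    int socket;
    struct cr_node *next;
};

/* Count entries whose socket fits under the limit.  Entries above the
 * limit are skipped, and the cursor still moves to the next entry. */
int count_usable(const struct cr_node *list)
{
    const struct cr_node *cr = list;
    int usable = 0;

    while (cr != NULL) {
        const struct cr_node *next_cr = cr->next;   /* fetch successor first */

        if (cr->socket > MAX_CONN - 1) {
            cr = next_cr;           /* advance before continuing */
            continue;
        }
        usable++;
        cr = next_cr;
    }
    return usable;
}

/* Hypothetical demo: the middle entry exceeds the limit and is skipped. */
int demo_count_usable(void)
{
    struct cr_node c = { 2, NULL };
    struct cr_node b = { 9, &c };
    struct cr_node a = { 1, &b };

    return count_usable(&a);
}
```

Without the assignment inside the if block, the loop would re-test the same
entry forever.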

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agonext_cr is set just before and inside the check
Sean Hefty [Fri, 10 Apr 2009 15:07:40 +0000 (08:07 -0700)]
next_cr is set just before and inside the check
if ((cr->socket == DAPL_INVALID_SOCKET && cr->state == SCM_DESTROY)
Remove setting it inside the if statement.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoSome errors on Windows are more easily interpreted in hex than decimal.
Sean Hefty [Fri, 10 Apr 2009 15:07:35 +0000 (08:07 -0700)]
Some errors on Windows are more easily interpreted in hex than decimal.
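
As an illustration, the Windows status code STATUS_ACCESS_VIOLATION reads as
0xc0000005 in hex but as 3221225477 in decimal.  The helper below is a
hypothetical sketch of hex formatting and is not the logging call changed by
this commit.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format an error code as zero-padded hex. */
int format_error_hex(char *buf, size_t len, unsigned long err)
{
    return snprintf(buf, len, "0x%08lx", err);
}

/* Hypothetical demo: returns 1 if the code formats as expected. */
int demo_format_error_hex(void)
{
    char buf[32];

    format_error_hex(buf, sizeof(buf), 0xC0000005UL);
    return strcmp(buf, "0xc0000005") == 0;
}
```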

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoThe WinOF HCA driver cannot handle transitioning from RTS -> RESET ->
Sean Hefty [Fri, 10 Apr 2009 15:07:32 +0000 (08:07 -0700)]
The WinOF HCA driver cannot handle transitioning from RTS -> RESET ->
INIT -> ERROR.  Simply delete the QP and re-create it to reinitialize
the endpoint until the bug is fixed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoSigned-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Fri, 10 Apr 2009 15:07:23 +0000 (08:07 -0700)]
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoConvert the openib_cma provider to common code between Linux and Windows.
Sean Hefty [Fri, 10 Apr 2009 15:07:18 +0000 (08:07 -0700)]
Convert the openib_cma provider to common code between Linux and Windows.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoMove from using pipes to sockets for internal communication. This
Sean Hefty [Fri, 10 Apr 2009 15:06:53 +0000 (08:06 -0700)]
Move from using pipes to sockets for internal communication.  This
avoids issues with Windows only supporting select() on sockets.

Remove Windows-specific definition of dapl_dbg_log.

Update to latest Windows libibverbs implementation using completion
channel abstraction to improve Windows scalability and simplify
porting where FDs are accessed directly in Linux.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
15 years agoRelease 2.0.17
Arlin Davis [Tue, 31 Mar 2009 13:41:50 +0000 (05:41 -0800)]
Release 2.0.17

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agodapl: ia64 build problem on SuSE 11, atomic.h no longer exists.
Arlin Davis [Tue, 31 Mar 2009 13:22:11 +0000 (05:22 -0800)]
dapl: ia64 build problem on SuSE 11, atomic.h no longer exists.

Add autotools check for SuSE 11 and include intrinsics.h

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoRelease 2.0.16 dapl-2.0.16-1
Arlin Davis [Mon, 16 Mar 2009 21:23:50 +0000 (13:23 -0800)]
Release 2.0.16

Fix changelog year in spec file.

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>
15 years agoRelease 2.0.16
Arlin Davis [Mon, 16 Mar 2009 21:15:22 +0000 (13:15 -0800)]
Release 2.0.16

Signed-off-by: Arlin Davis <ardavis@ichips.intel.com>