git.openfabrics.org - ~ardavis/dapl.git/log
10 years agoopenib: add new provider specific attributes
Arlin Davis [Tue, 4 Mar 2014 18:30:02 +0000 (10:30 -0800)]
openib: add new provider specific attributes

DAT_IB_PROVIDER_NAME = UCM/CMA/SCM
DAT_IB_DEVICE_NAME = ibv_get_device_name
DAT_IB_CONNECTIVITY_MODE = DIRECT/PROXY
DAT_IB_RDMA_READ = TRUE/FALSE
DAT_IB_NODE_GUID = xxxx:xxxx:xxxx:xxxx
DAT_IB_PORT_STATE = ibv_port_state_str

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodapltest: update scripts for regression testing purposes
Arlin Davis [Mon, 3 Mar 2014 23:04:12 +0000 (15:04 -0800)]
dapltest: update scripts for regression testing purposes

Update cl.sh and srv.sh to provide better examples and
a method to quickly regression test any dapltest changes.

 usage: srv.sh devicename
   where devicename is provider (default = ofa-v2-mlx4_0-1)

 usage: cl.sh hostname testname devicename
   where testname
     stop - request DAPLtest server to exit.
     conn - simple connection with limited data transfer
     trans - single transaction test
     transm - transaction test: multiple transactions [RW SND, RDMA]
     transt - transaction test: multi-threaded, single transaction
     transme - transaction test: multi-endpoints per thread
     transmet - transaction test: multi: threads and endpoints per thread
     transmete - transaction test: multi threads == endpoints
     perf - Performance test
     threads - multi-threaded single transaction test.
     threadsm - multi: threads and endpoints, single transaction test.
     rdma-write - RDMA write
     rdma-read - RDMA read
     bw - bandwidth
     latb - latency tests, blocking for events
     latp - latency tests, polling for events
     lim - limit tests.
     regression - loop over a collection of all tests.
   where devicename is provider (default = ofa-v2-mlx4_0-1)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodapltest: Add final send/recv "sync" for transaction tests.
swise@opengridcomputing.com [Mon, 3 Mar 2014 22:35:43 +0000 (14:35 -0800)]
dapltest: Add final send/recv "sync" for transaction tests.

The transaction tests need both sides to send a sync message after running the test. This ensures that all remote operations are complete before dapltest deregisters memory and disconnects the endpoints.

Without this logic, we see intermittent async errors on iWARP devices because a read response or write arrives after the rmr has been destroyed.
I believe this is more likely to happen with iWARP than IB because iWARP completions only indicate that the local buffer can be reused. They don't imply that the message has even arrived at the peer, let alone been placed in the peer application's memory.

Changes from V1:

- allocate new send/recv buffers for the Final Sync message.

- post the Final Sync recv buffer at the beginning of the final iteration of a test.

- tests ok on cxgb4 and mlx4 devices.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
10 years agoRelease 2.0.40
Arlin Davis [Mon, 10 Feb 2014 21:07:00 +0000 (13:07 -0800)]
Release 2.0.40

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodist: ib collective extension include files missing
Arlin Davis [Mon, 10 Feb 2014 07:34:43 +0000 (23:34 -0800)]
dist: ib collective extension include files missing

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodapltest: the quit command is missing changes for -n option.
Arlin Davis [Mon, 10 Feb 2014 07:24:29 +0000 (23:24 -0800)]
dapltest: the quit command is missing changes for -n option.

Server-port was not being set properly during param init phase on the client side.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC
Arlin Davis [Mon, 10 Feb 2014 06:55:17 +0000 (22:55 -0800)]
dat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agoNULL undefined on Fedora, incorrectly using kernel stddef.h
Arlin Davis [Mon, 10 Feb 2014 21:01:47 +0000 (13:01 -0800)]
NULL undefined on Fedora, incorrectly using kernel stddef.h

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agoRelease 2.0.39
Arlin Davis [Thu, 3 Oct 2013 23:05:06 +0000 (16:05 -0700)]
Release 2.0.39

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agodapltest: fix endian swap issue with performance test
Arlin Davis [Thu, 3 Oct 2013 22:21:08 +0000 (15:21 -0700)]
dapltest: fix endian swap issue with performance test

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agoSCM: getifaddrs modifications for better out of the box experience
Arlin Davis [Tue, 1 Oct 2013 22:40:17 +0000 (15:40 -0700)]
SCM: getifaddrs modifications for better out of the box experience

Socket cm will now walk the list of interfaces and ignore loopback
and IB devices, unless the IB netdev is the only device.
Works better in a heterogeneous environment with a mix of net devices.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
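
The interface-selection policy described in the SCM commit above can be sketched as follows. This is a minimal illustration under stated assumptions, not the provider's code: `struct netdev` and `pick_netdev` are hypothetical names, and the sketch assumes the caller has already classified each interface (for example, while walking the list returned by `getifaddrs`) as loopback or IB.

```c
#include <stddef.h>

/* Hypothetical descriptor for one interface found during a getifaddrs() walk. */
struct netdev {
    const char *name;
    int is_loopback;  /* non-zero for the loopback device */
    int is_ib;        /* non-zero for an IB netdev */
};

/*
 * Return the first interface that is neither loopback nor IB.
 * If no such interface exists, fall back to the first IB netdev
 * (the "IB netdev is the only device" case).
 * Returns NULL when only loopback devices are present.
 */
const struct netdev *pick_netdev(const struct netdev *devs, size_t n)
{
    const struct netdev *ib_fallback = NULL;

    for (size_t i = 0; i < n; i++) {
        if (devs[i].is_loopback)
            continue;                 /* always skip loopback */
        if (devs[i].is_ib) {
            if (ib_fallback == NULL)
                ib_fallback = &devs[i];
            continue;                 /* prefer a non-IB device if one follows */
        }
        return &devs[i];
    }
    return ib_fallback;
}
```

In a setup like this, an explicit override such as DAPL_SCM_NETDEV would be checked before the policy runs, so the walk only decides the default.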
10 years agoucm, scm: UD mode triggers list_head assert with large scale alltoall test
Arlin Davis [Tue, 1 Oct 2013 21:03:51 +0000 (14:03 -0700)]
ucm, scm: UD mode triggers list_head assert with large scale alltoall test

1024+ ranks, IMB alltoall may hit assert when running Intel MPI in UD mode.

CR cleanup was implemented with EP to CR references still linked.
During cr_accept, the CR remote_ia_address is linked to the EP object
by mistake in UD mode. UD mode may have multiple CRs per EP, so
no direct mappings to CR memory can exist except in RC mode, which
always has a one-to-one EP to CR mapping.

In scm, ucm: for CM object free with CR references the search and
unlinking from SP must be under SP lock to serialize. Also,
cleanup thread wakeup logic to only trigger the thread if
reference count indicates the need for more processing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years agoRelease 2.0.38
Arlin Davis [Mon, 22 Jul 2013 19:37:21 +0000 (12:37 -0700)]
Release 2.0.38

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agodapltest: add -n parameter to override default server port number (45278)
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)

Modify all tests and commands to take a new -n parameter option for server
listen port. The default port, when running multiple EP's and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
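
As a rough illustration of the `-n` override described above, the sketch below validates a port argument and falls back to the default of 45278 quoted in the commit title. The function name `parse_server_port` and the macro `DEFAULT_SERVER_PORT` are hypothetical; dapltest's actual option parsing is not shown in this log and may differ.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define DEFAULT_SERVER_PORT 45278  /* default quoted in the commit title */

/*
 * Parse the argument of a "-n" style option.
 * Returns the port on success, DEFAULT_SERVER_PORT when arg is NULL
 * (option not given), or -1 when arg is not a valid 16-bit port number.
 */
int parse_server_port(const char *arg)
{
    char *end;
    long val;

    if (arg == NULL)
        return DEFAULT_SERVER_PORT;

    errno = 0;
    val = strtol(arg, &end, 10);
    /* Reject empty input, trailing garbage, overflow, and out-of-range values. */
    if (errno != 0 || end == arg || *end != '\0' || val < 1 || val > 65535)
        return -1;
    return (int)val;
}
```

Giving each concurrent server its own port this way is one means of avoiding the EADDRINUSE collisions the commit mentions.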
11 years agoucm,scm: UD mode creates many CR objects per EP that need to be cleaned up
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up

After connection is established and the AH is provided to consumer
on UD connect establishment there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.

Change ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP and CONN_EST state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agocma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM
Arlin Davis [Mon, 24 Jun 2013 21:19:22 +0000 (14:19 -0700)]
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM

Signed-off-by: Matthew Finlay <matt@mellanox.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agoRelease 2.0.37 dapl-2.0.37-1
Arlin Davis [Fri, 7 Jun 2013 01:22:52 +0000 (18:22 -0700)]
Release 2.0.37

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agocommon: add support for ia name during dat_ia_query
Arlin Davis [Wed, 29 May 2013 23:59:09 +0000 (16:59 -0700)]
common: add support for ia name during dat_ia_query

the device name was not being updated during a query. Copy
the hca name into ia_attr->adapter_name for consumers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agocommon: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines.
Arlin Davis [Wed, 29 May 2013 23:53:18 +0000 (16:53 -0700)]
common: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines.

Signed-off-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agodapltest: ppc64 endian issue with exchanged mem handle and address
Arlin Davis [Wed, 29 May 2013 23:45:20 +0000 (16:45 -0700)]
dapltest: ppc64 endian issue with exchanged mem handle and address

Signed-off-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
Signed-off-by: Aravinda Venkatramana <Aravinda.Venkatramana@emulex.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoRelease 2.0.36
Arlin Davis [Thu, 5 Jul 2012 17:00:28 +0000 (10:00 -0700)]
Release 2.0.36

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: increase ACK timeout to 20 for a default value to match other providers.
Arlin Davis [Thu, 5 Jul 2012 16:58:21 +0000 (09:58 -0700)]
scm: increase ACK timeout to 20 for a default value to match other providers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: allow qp modify in init state
Arlin Davis [Mon, 14 May 2012 21:51:38 +0000 (14:51 -0700)]
common: allow qp modify in init state

Allow consumer to modify attributes via dat_ep_modify
in init state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: check for valid states during ep posting
Arlin Davis [Thu, 10 May 2012 21:57:31 +0000 (14:57 -0700)]
common: check for valid states during ep posting

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodat.conf: keep list of providers in order for backward compatibility
Arlin Davis [Thu, 10 May 2012 20:35:55 +0000 (13:35 -0700)]
dat.conf: keep list of providers in order for backward compatibility

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: record and silently drop a duplicate reject CM message
Arlin Davis [Thu, 10 May 2012 17:49:09 +0000 (10:49 -0700)]
ucm: record and silently drop a duplicate reject CM message

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agowindows: new version of getlocalipaddr not portable
Arlin Davis [Wed, 25 Apr 2012 20:37:53 +0000 (13:37 -0700)]
windows: new version of getlocalipaddr not portable

revert to the original getaddrinfo method for windows

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodapltest: DFLT_QLEN is defined in multiple tests
Arlin Davis [Wed, 25 Apr 2012 20:36:52 +0000 (13:36 -0700)]
dapltest: DFLT_QLEN is defined in multiple tests

add #ifdef checking in transaction test.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoRelease 2.0.35 dapl-2.0.35-1
Arlin Davis [Wed, 25 Apr 2012 20:10:39 +0000 (13:10 -0700)]
Release 2.0.35

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoconfig/build: remove post/postun hacking used to modify dat.conf
Arlin Davis [Wed, 25 Apr 2012 20:07:10 +0000 (13:07 -0700)]
config/build: remove post/postun hacking used to modify dat.conf

Return to the tried and true method of managing configuration
files via %config directive and remove ugly sed editing methods.
The dat.conf includes both v1 and v2 device entries to ensure
backward compatibility. Add doc/dat.conf

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoconfig: clean up help option displays with ext-type options
Arlin Davis [Mon, 23 Apr 2012 17:35:24 +0000 (10:35 -0700)]
config: clean up help option displays with ext-type options

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agowindows: Provide auto-detect between RoCE and Infiniband for Windows.
stan smith [Mon, 23 Apr 2012 17:32:00 +0000 (10:32 -0700)]
windows: Provide auto-detect between RoCE and Infiniband for Windows.

For RoCE, enable transport global ID use.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: update UD cm provider to support new CM stat and error counters
Arlin Davis [Fri, 20 Apr 2012 00:40:45 +0000 (17:40 -0700)]
ucm: update UD cm provider to support new CM stat and error counters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: update socket cm provider to support new CM stat and error counters
Arlin Davis [Fri, 20 Apr 2012 00:40:03 +0000 (17:40 -0700)]
scm: update socket cm provider to support new CM stat and error counters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add cm, link, and diag event counters in IB extended builds
Arlin Davis [Fri, 20 Apr 2012 00:15:22 +0000 (17:15 -0700)]
common: add cm, link, and diag event counters in IB extended builds

Add additional event monitoring capabilities during runtime to help
isolate issues during scaling in lieu of logging/printing warning
messages. Counters have been added to provider CM services and counters
have been added and mapped to sysfs ib_cm, device port and device
diag counters. ibdev_path is used for device sysfs counters.

uDAPL CM events are tracked on a per IA instance via internal
provider counters. The ib_cm, link, and diag events are tracked on a
per platform basis via sysfs. For these running counters a start
and stop function is provided for sampling and mapping to DAPL
64 bit counters. All counters, along with new start and stop functions,
are provided via dat_ib_extensions.h. New IB extension version is 2.0.7

New DCNT_IA_xx counters include 40 cm, 9 link, and 9 diag types.

To enable new counters (default build is disabled):
./configure --enable-counters

New bitmappings have been added to DAPL_DBG_TYPE environment
variable to automatically start/stop counters and log
errors if counters are enabled. The following will control
CM, LINK, and DIAG respectively:

   DAPL_DBG_TYPE_CM_ERRS = 0x080000,
   DAPL_DBG_TYPE_LINK_ERRS = 0x100000,
   DAPL_DBG_TYPE_DIAG_ERRS = 0x400000,

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
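
To make the bit assignments above concrete, here is a small sketch of how a DAPL_DBG_TYPE value could be decoded. The three bit values are copied from the commit message; the helper names `parse_dbg_mask` and `dbg_class_enabled` are hypothetical, and how the library itself parses the variable is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Bit values as listed in the commit message. */
#define DAPL_DBG_TYPE_CM_ERRS   0x080000
#define DAPL_DBG_TYPE_LINK_ERRS 0x100000
#define DAPL_DBG_TYPE_DIAG_ERRS 0x400000

/* Parse a mask string such as "0x180000"; base 0 accepts hex, octal, or decimal. */
unsigned long parse_dbg_mask(const char *text)
{
    return text ? strtoul(text, NULL, 0) : 0UL;
}

/* Non-zero when the given counter class is selected in the mask. */
int dbg_class_enabled(unsigned long mask, unsigned long class_bit)
{
    return (mask & class_bit) != 0;
}
```

For example, a value of 0x180000 selects the CM and LINK classes but not DIAG; a caller might obtain the string with `getenv("DAPL_DBG_TYPE")`.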
12 years agoscm: use ioctl SIOCIFCONF to get complete list of configured netdev interfaces
Arlin Davis [Tue, 17 Apr 2012 22:24:22 +0000 (15:24 -0700)]
scm: use ioctl SIOCIFCONF to get complete list of configured netdev interfaces

Replace usage of getaddrinfo, since it doesn't actually return bound addresses
and can return the loopback address in some configurations. Some
systems may not have eth0 configured, so you cannot assume eth0 as a non-loopback
default netdev.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)
Arlin Davis [Fri, 17 Feb 2012 18:28:48 +0000 (10:28 -0800)]
ucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)

Full sendq should retry polling completions instead of failing.
When sendq is full and all requests are pending the get send message
code should retry polling for completions and not return error on first
empty CQ attempt. Give HCA a chance to complete some batched requests.
Also, clean up the send message error logging.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
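
The retry behavior described in the ucm commit above might look roughly like the sketch below. Everything here is hypothetical (`struct sendq`, `sendq_reserve`, and the simulated `fake_poll` completion source); it only illustrates the idea of polling again when the send queue is full rather than failing on the first empty poll.

```c
#include <stddef.h>

/* Hypothetical send queue: `pending` of `depth` slots are awaiting completion. */
struct sendq {
    int depth;
    int pending;
    /* Reaps finished sends; returns how many completed (0 if the CQ is empty). */
    int (*poll_completions)(void *ctx);
    void *ctx;
};

/*
 * Reserve one send slot. While the queue is full, poll for completions;
 * tolerate up to max_retries additional empty polls after the first one
 * before giving up. Returns 0 on success, -1 if the queue stayed full.
 */
int sendq_reserve(struct sendq *q, int max_retries)
{
    int empty_polls = 0;

    while (q->pending >= q->depth) {
        int done = q->poll_completions(q->ctx);
        if (done > 0) {
            q->pending -= done;   /* slots freed, re-check the queue */
            continue;
        }
        if (++empty_polls > max_retries)
            return -1;            /* still full after all retries */
    }
    q->pending++;
    return 0;
}

/* Simulated CQ for demonstration: reports one completion on every Nth poll. */
struct fake_cq {
    int polls;
    int every;
};

int fake_poll(void *ctx)
{
    struct fake_cq *cq = ctx;
    return (++cq->polls % cq->every == 0) ? 1 : 0;
}
```

A bounded retry count keeps the caller from spinning forever if the hardware never completes a request, while still giving batched requests a chance to finish.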
12 years agoscm: fix retry count on connection pending timeout
Arlin Davis [Mon, 6 Feb 2012 22:04:37 +0000 (14:04 -0800)]
scm: fix retry count on connection pending timeout

Retry count not being decremented on connection TIMEOUT.
Also, cleanup log messages on CONN and REP pending and
add local port to output.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: cleanup debug message, ntohl on p_size is incorrect
Arlin Davis [Mon, 6 Feb 2012 22:03:20 +0000 (14:03 -0800)]
ucm: cleanup debug message, ntohl on p_size is incorrect

private data size is a short, change to ntohs on log message

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
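
The fix above comes down to matching the byte-order conversion to the field width. The sketch below assumes a 16-bit private data size carried in network byte order; `struct cm_hdr` and `cm_p_size` are hypothetical names used only for illustration. On a little-endian host, passing such a field to `ntohl` would swap it across 32 bits and leave the value in the upper half of the result, while `ntohs` returns the intended 16-bit value.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical wire header fragment: the private data size is a
 * 16-bit field carried in network byte order. */
struct cm_hdr {
    uint16_t p_size;
};

/* Convert the 16-bit field to host order with ntohs(), matching its width. */
uint16_t cm_p_size(const struct cm_hdr *hdr)
{
    return ntohs(hdr->p_size);
}
```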
12 years agocma, scm, ucm: allow EP (QP) creation without EVD (CQ)
Arlin Davis [Mon, 30 Jan 2012 18:19:29 +0000 (10:19 -0800)]
cma, scm, ucm: allow EP (QP) creation without EVD (CQ)

Provide the ability to create an EP/QP with no EVD/CQ on either the
request or receive queue. The current implementation allows this on
the receive queue but not the request queue. Not all ofa devices support
a null CQ, so if necessary create a dummy CQ at the time of
QP creation. Also, if no CQ is specified, set the appropriate QP
max wr/sge attributes to zero.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add DAPL_DBG_TYPE_CM_STATS (0x40000) to debug log options
Arlin Davis [Mon, 30 Jan 2012 18:09:42 +0000 (10:09 -0800)]
common: add DAPL_DBG_TYPE_CM_STATS (0x40000) to debug log options

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: dapls_ep_flush_cq will segfault when no CQ is attached to EP
Arlin Davis [Wed, 25 Jan 2012 19:54:29 +0000 (11:54 -0800)]
common: dapls_ep_flush_cq will segfault when no CQ is attached to EP

add check for NULL request/receive EVD (cq) before flushing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: ep_create should allow max_request_iov attribute setting of zero
Arlin Davis [Wed, 25 Jan 2012 19:50:21 +0000 (11:50 -0800)]
common: ep_create should allow max_request_iov attribute setting of zero

When creating an EP without a request EVD (cq) the max_request_iov
and max_request_sge will be 0. Allow this combination when checking
attribute settings for ARG6.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add check for NULL handle on ext calls, SRQ free, and helper functions
Arlin Davis [Wed, 18 Jan 2012 23:47:12 +0000 (15:47 -0800)]
common: add check for NULL handle on ext calls, SRQ free, and helper functions

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add missing sub-types to dat_strerror()
Arlin Davis [Fri, 13 Jan 2012 20:01:26 +0000 (12:01 -0800)]
common: add missing sub-types to dat_strerror()

"unknown minor error" string returned with valid sub types.
Update function for sub-type error codes in dat_error.h.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: extended CR event processing missing rejects on errors
Arlin Davis [Thu, 12 Jan 2012 17:54:59 +0000 (09:54 -0800)]
common: extended CR event processing missing rejects on errors

When processing an inbound CR event callback a non-user reject should be
sent to client in the case of a non-listening SP, allocation error,
or EVD overrun. Changes made to dapls_evd_post_cr_event_ext callback.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: incorrectly sends user reject during CR callback errors
Arlin Davis [Thu, 12 Jan 2012 17:39:46 +0000 (09:39 -0800)]
ucm: incorrectly sends user reject during CR callback errors

Add reason checking on provider rejects and set appropriate op type
in reject message. Reject can be called from cr callback during
failures. User reject will be IB_CM_REJ_REASON_CONSUMER_REJ.
Add warning message on active side.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: change dbg level on CR callback if not listening on SP
Arlin Davis [Tue, 10 Jan 2012 23:42:24 +0000 (15:42 -0800)]
common: change dbg level on CR callback if not listening on SP

Change from CM to CM_WARN level and include in non-debug build.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: incorrectly sends user reject during CR callback errors
Arlin Davis [Mon, 9 Jan 2012 23:03:21 +0000 (15:03 -0800)]
scm: incorrectly sends user reject during CR callback errors

Add reason checking on provider rejects and set appropriate op type
in reject message. Reject can be called from cr callback during
failures. User reject will be IB_CM_REJ_REASON_CONSUMER_REJ.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodat: add check for NULL handle on IA calls
Arlin Davis [Mon, 9 Jan 2012 18:29:26 +0000 (10:29 -0800)]
dat: add check for NULL handle on IA calls

check added to dats_get_ia_handle()

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocma,scm,ucm: extra reference on EP, with RSP, causes dat_ep_free() to hang
Arlin Davis [Thu, 8 Dec 2011 00:39:55 +0000 (16:39 -0800)]
cma,scm,ucm: extra reference on EP, with RSP, causes dat_ep_free() to hang

Need to add check for RSP or PSP provider type service points during
passive side accepts before taking CR reference on the EP. In these
cases, the EP is already linked to inbound CR.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: RSP service points incorrectly freed during CR callback
Arlin Davis [Thu, 8 Dec 2011 00:34:17 +0000 (16:34 -0800)]
common: RSP service points incorrectly freed during CR callback

The RSP service point is being removed because of improper
state/flag checking during CR callback. Add state check
for DAPL_SP_STATE_RSP_LISTENING.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: clean up dat_rsp_create log message
Arlin Davis [Fri, 2 Dec 2011 23:45:30 +0000 (15:45 -0800)]
common: clean up dat_rsp_create log message

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: cleanup debug message on EVD overflows
Arlin Davis [Fri, 2 Dec 2011 23:44:04 +0000 (15:44 -0800)]
common: cleanup debug message on EVD overflows

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: return correct event error code when remote host refuses requests
Arlin Davis [Fri, 2 Dec 2011 23:31:09 +0000 (15:31 -0800)]
scm: return correct event error code when remote host refuses requests

changed from TIMEOUT to NON_PEER_REJECTED

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodapltest: server CR EVD is too small for multi-client configurations.
Arlin Davis [Fri, 2 Dec 2011 23:29:08 +0000 (15:29 -0800)]
dapltest: server CR EVD is too small for multi-client configurations.

Increase default size from 8 to 32.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoCommon: CR EVD overflow causes segfault.
Arlin Davis [Fri, 2 Dec 2011 23:28:31 +0000 (15:28 -0800)]
Common: CR EVD overflow causes segfault.

The CR is freed up incorrectly before unlinking with SP.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoRelease 2.0.34 dapl-2.0.34-1 ofed_1.5.4
Arlin Davis [Wed, 2 Nov 2011 23:36:22 +0000 (16:36 -0700)]
Release 2.0.34

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: change debug message level for listen/bind errors
Arlin Davis [Wed, 2 Nov 2011 17:40:02 +0000 (10:40 -0700)]
scm: change debug message level for listen/bind errors

reduce to CM_WARN instead of general WARN level.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: increase default IB ack timer from 16 to 20
Arlin Davis [Wed, 2 Nov 2011 17:29:34 +0000 (10:29 -0700)]
common: increase default IB ack timer from 16 to 20

For larger, more congested fabrics, a larger ACK timer is needed.
Consumers can still change default with environment variable
DAPL_ACK_TIMER if they need to increase or decrease.

This applies to SCM and UCM providers only. The CMA provider, which
uses rdma_cm, has no way to control ack timer with current API.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
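
For a sense of scale, the InfiniBand local ACK timeout is encoded as an exponent: the interval is 4.096 microseconds times 2 raised to the configured value. The sketch below works that out for the old and new defaults and shows one way an environment override could be read. The helper names are hypothetical, and the range check on DAPL_ACK_TIMER is an assumption of this sketch rather than documented provider behavior.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEFAULT_ACK_TIMER 20  /* new default from the commit; previously 16 */

/*
 * Read the timer exponent from DAPL_ACK_TIMER, falling back to the default.
 * Values outside 0..31 are ignored here (an assumption of this sketch).
 */
unsigned ack_timer_from_env(void)
{
    const char *s = getenv("DAPL_ACK_TIMER");
    char *end;
    long v;

    if (s == NULL)
        return DEFAULT_ACK_TIMER;
    v = strtol(s, &end, 0);
    if (end == s || *end != '\0' || v < 0 || v > 31)
        return DEFAULT_ACK_TIMER;
    return (unsigned)v;
}

/* Timeout in nanoseconds: 4.096 us * 2^exponent, i.e. 4096 ns shifted left. */
uint64_t ack_timeout_ns(unsigned exponent)
{
    return (uint64_t)4096 << exponent;
}
```

With these definitions, an exponent of 16 corresponds to about 268 ms per timeout interval and an exponent of 20 to about 4.29 s, which is why the larger value suits bigger, more congested fabrics.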
12 years agocommon: remote ia address null pointer creates seg fault
Arlin Davis [Tue, 1 Nov 2011 20:44:24 +0000 (13:44 -0700)]
common: remote ia address null pointer creates seg fault

add NULL ptr check and return DAT_INVALID_PARAMETER, DAT_INVALID_ARG2

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: posting events on full queue returns wrong error code
Arlin Davis [Tue, 1 Nov 2011 19:40:21 +0000 (12:40 -0700)]
common: posting events on full queue returns wrong error code

Return DAT_QUEUE_FULL instead of DAT_INSUFFICIENT_RESOURCES

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: dat_ep_modify seg faults with null ep_param ptr
Arlin Davis [Tue, 1 Nov 2011 18:43:55 +0000 (11:43 -0700)]
common: dat_ep_modify seg faults with null ep_param ptr

add additional NULL ptr check for arg3 ep_param

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: dat_evd_free seg faults with resized software EVD
Arlin Davis [Fri, 28 Oct 2011 19:27:16 +0000 (12:27 -0700)]
common: dat_evd_free seg faults with resized software EVD

dapl_evd_resize is attempting to resize a CQ but there is no
CQ attached to a software EVD. Add check for cq_handle
before resizing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: remove assert for incorrect events during cm_request
Arlin Davis [Fri, 28 Oct 2011 17:23:51 +0000 (10:23 -0700)]
common: remove assert for incorrect events during cm_request

Simply print a warning message. Connection callback doesn't
forward invalid events to consumer so no need to assert.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodat: dat_cno_query with NULL cno_handle causes segmentation fault
Arlin Davis [Wed, 26 Oct 2011 23:02:56 +0000 (16:02 -0700)]
dat: dat_cno_query with NULL cno_handle causes segmentation fault

add check for NULL handle in dat library

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: dat_psp_create returns wrong error code on bind/listen failure
Arlin Davis [Wed, 26 Oct 2011 20:03:50 +0000 (13:03 -0700)]
scm: dat_psp_create returns wrong error code on bind/listen failure

The SCM provider changed to return DAT_INVALID_PARAMETER instead of
incorrect DAT_CONN_QUAL_UNAVAILABLE error code on any bind or
listen failure.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: socket connect request count is reset improperly on retry
Arlin Davis [Wed, 26 Oct 2011 16:12:10 +0000 (09:12 -0700)]
scm: socket connect request count is reset improperly on retry

Include the current retry count with the new connect request call
and set it accordingly after creating the new cm object.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: when hostname has loopback addr assigned, default to eth0 instead of failing
Arlin Davis [Mon, 24 Oct 2011 20:59:55 +0000 (13:59 -0700)]
scm: when hostname has loopback addr assigned, default to eth0 instead of failing

There are some cases where the eth0 device is configured with an IP address
but the getaddrinfo() will only return loopback address because the
hostname is configured in the /etc/hosts file with 127.0.0.1. In this case,
the provider will now retry address on eth0 before failing the open.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: add port number to error log during hca_open failures
Arlin Davis [Mon, 24 Oct 2011 20:57:01 +0000 (13:57 -0700)]
scm: add port number to error log during hca_open failures

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: query calls return incorrect IA handle to consumer
Arlin Davis [Mon, 24 Oct 2011 20:40:45 +0000 (13:40 -0700)]
common: query calls return incorrect IA handle to consumer

The IA handle from the consumer perspective is an IA vector and
not the provider IA address handle. Need to convert IA handle
to IA vector for consumer calls.

Modify dats_ia_get_handle call to convert both ways depending
on handle type provided so a dapl provider can convert to vector
on query calls. This fix is backward compatible with older libdat2
libraries. Function is already exported and syntax is unchanged.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: srq create asserts with !dapl_llist_is_empty(head) failed
Arlin Davis [Thu, 22 Sep 2011 20:42:15 +0000 (13:42 -0700)]
common: srq create asserts with !dapl_llist_is_empty(head) failed

return DAT_NOT_IMPLEMENTED before allocating any resources
until there is a provider that supports SRQ's.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoRelease 2.0.33 dapl-2.0.33-1 list
Arlin Davis [Mon, 29 Aug 2011 19:31:38 +0000 (12:31 -0700)]
Release 2.0.33

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm,ucm: fix compatibility issues and set minimum protocol support
Arlin Davis [Mon, 29 Aug 2011 02:10:37 +0000 (19:10 -0700)]
scm,ucm: fix compatibility issues and set minimum protocol support

allow latest version to work with previous versions to allow
compatibility back to OFED 1.5, dapl-2.0.23. If rdma_atomic_in
is not exchanged default back to original settings set by
consumer.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agobuild: link librdmacm dependency to ib_acm usage for ucm and scm providers
Arlin Davis [Fri, 19 Aug 2011 20:19:38 +0000 (13:19 -0700)]
build: link librdmacm dependency to ib_acm usage for ucm and scm providers

Add -lrdmacm to XLIBS for ucm and scm providers. Only set with
conditional use of ib_acm as defined by DAPL_USE_IBACM.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agobuild: add selective enable/disable-xxx build switch for each provider
Arlin Davis [Fri, 19 Aug 2011 19:40:24 +0000 (12:40 -0700)]
build: add selective enable/disable-xxx build switch for each provider

The following switches have been added to configure:

--disable-cma (disables the rdma_cm dapl provider build)
--disable-scm (disables the socket cm provider build)
--disable-ucm (disables the IB UD cm provider build)

all providers are enabled by default.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agobuild: add extended header files to EXTRA_DIST and fix missing backslash
Arlin Davis [Fri, 19 Aug 2011 17:24:32 +0000 (10:24 -0700)]
build: add extended header files to EXTRA_DIST and fix missing backslash

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agobuild: set IB extended coll-type to none by default
Arlin Davis [Fri, 19 Aug 2011 17:23:42 +0000 (10:23 -0700)]
build: set IB extended coll-type to none by default

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: change errno mapping of EINVAL to DAT_INVALID_PARAMETER
Arlin Davis [Mon, 8 Aug 2011 06:02:54 +0000 (23:02 -0700)]
common: change errno mapping of EINVAL to DAT_INVALID_PARAMETER

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agobuild: add IB collective and FCA provider to dapl build package as an option
Arlin Davis [Mon, 8 Aug 2011 05:59:27 +0000 (22:59 -0700)]
build: add IB collective and FCA provider to dapl build package as an option

New collective support and the FCA provider will only be built with
configure option of "--enable-coll-type=fca". Dependencies include
fca devel package, fca library, and mverbs library. This will not
be included in default builds.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add new dapls_evd_post_event_ext call for extended events
Arlin Davis [Mon, 8 Aug 2011 05:54:36 +0000 (22:54 -0700)]
common: add new dapls_evd_post_event_ext call for extended events

Add prototype and code to post extended events on dispatcher and
include collective definitions to dat_event_str function.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: add support for IB collective providers
Arlin Davis [Mon, 8 Aug 2011 05:53:45 +0000 (22:53 -0700)]
ucm: add support for IB collective providers

Add collective member address and threading information
on a per transport basis. Call create/free service with
HCA open/close.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: add support for IB collective providers
Arlin Davis [Mon, 8 Aug 2011 05:53:09 +0000 (22:53 -0700)]
scm: add support for IB collective providers

Add collective member address and threading information
on a per transport basis. Call create/free service with
HCA open/close.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocma: add support for IB collective providers
Arlin Davis [Mon, 8 Aug 2011 05:47:21 +0000 (22:47 -0700)]
cma: add support for IB collective providers

Add collective member address and threading information
on a per transport basis. Call create/free service with
HCA open/close.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add supported collective types in named attributes for query
Arlin Davis [Mon, 8 Aug 2011 05:45:04 +0000 (22:45 -0700)]
common: add supported collective types in named attributes for query

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add collective call mappings via standard dapli_post_ext()
Arlin Davis [Mon, 8 Aug 2011 05:41:48 +0000 (22:41 -0700)]
common: add collective call mappings via standard dapli_post_ext()

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: new debug bitmask definition for extension logging
Arlin Davis [Mon, 8 Aug 2011 05:39:32 +0000 (22:39 -0700)]
common: new debug bitmask definition for extension logging

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: new IB collective provider for Mellanox Fabric Collective Agent
Arlin Davis [Wed, 17 Aug 2011 17:29:01 +0000 (10:29 -0700)]
common: new IB collective provider for Mellanox Fabric Collective Agent

Support for bcast, barrier, reduce, allreduce, allgather, allgatherv

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodat: add definitions for MPI offloaded collectives in IB transport extensions
Arlin Davis [Mon, 8 Aug 2011 05:06:09 +0000 (22:06 -0700)]
dat: add definitions for MPI offloaded collectives in IB transport extensions

The collective extensions are designed to support MPI and general
multicast operations over IB fabrics that support offloaded collectives.
Where feasible, they come as close to MPI semantics as possible.
Unless otherwise stated, all members participating in a data collective
operation must call the associated collective routine for the data
transfer operation to complete. Unless otherwise stated, the
root collective member of a data operation will receive its own portion
of the collective data. In most cases, the root member can prevent
sending/receiving data when such operations would be redundant. When root
data is already "in place" the root member may set the send and/or receive
buffer pointer argument to NULL.

Unlike standard DAPL movement operations that require registered
memory and LMR objects, collective data movement operations employ
pointers to user-virtual address space that do not require
pre-registration by the application. From a resource usage point
of view, the API user should consider that the provider implementation
may perform memory registration/deregistration on behalf of the
application to accomplish a data transfer.

Most collective calls are asynchronous. Upon completion, an event
will be posted to the EVD specified when the collective was created.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agocommon: cleanup debug messages when building with ibacm feature
sean.hefty@intel.com [Wed, 22 Jun 2011 17:40:17 +0000 (10:40 -0700)]
common: cleanup debug messages when building with ibacm feature

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agoRelease 2.0.32 dapl-2.0.32-1 ofed_1.5.3
Arlin Davis [Mon, 14 Feb 2011 05:14:33 +0000 (21:14 -0800)]
Release 2.0.32

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agocma: reduce output log level in disconnect from WARN to CM_WARN
Arlin Davis [Sat, 12 Feb 2011 20:03:19 +0000 (12:03 -0800)]
cma: reduce output log level in disconnect from WARN to CM_WARN

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agoucm: delay freeing of active side UD cm object in case RTU is dropped
Arlin Davis [Sat, 12 Feb 2011 19:46:08 +0000 (11:46 -0800)]
ucm: delay freeing of active side UD cm object in case RTU is dropped

The ucm was freeing the UD CM object too quickly, so a retried REPLY
was dropped and the passive side never received the AH info via RTU.
Keep active side UD cm objects on work queue until QP is destroyed
so RTU can be resent if necessary.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agoucm: cm object needs to be on work queue before req sent on wire
Arlin Davis [Sat, 12 Feb 2011 19:36:35 +0000 (11:36 -0800)]
ucm: cm object needs to be on work queue before req sent on wire

If the cm object is queued only after the request is sent, a reply can
arrive before the object is on the work queue and be dropped with a
NO MATCH. Start in INIT state and queue the object first, then move to
state REP_PENDING when sending on the wire to start the request timer.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agoucm, scm: remove use of usec_sleep delays and use events for disc and destroy
Arlin Davis [Fri, 4 Feb 2011 22:41:45 +0000 (14:41 -0800)]
ucm, scm: remove use of usec_sleep delays and use events for disc and destroy

Use a pthread mutex when processing and waiting for disconnect completions
and for CM object destruction. Add f_event, d_event to the cm object.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
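
A minimal sketch of the idea in the commit above: replace a sleep-and-poll
loop with an event built from a pthread mutex and condition variable. The
names cm_event, cm_event_init, cm_event_set and cm_event_wait are
hypothetical stand-ins for illustration, not the actual dapl wait-object API
behind f_event and d_event.

```c
#include <stddef.h>
#include <pthread.h>

/* Hypothetical stand-in for an event such as d_event on a cm object. */
struct cm_event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             signaled;
};

static int cm_event_init(struct cm_event *ev)
{
    ev->signaled = 0;
    if (pthread_mutex_init(&ev->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&ev->cond, NULL) != 0) {
        pthread_mutex_destroy(&ev->lock);
        return -1;
    }
    return 0;
}

/* Called from the completion path, e.g. when a disconnect has finished. */
static void cm_event_set(struct cm_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->signaled = 1;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Blocks until the event is set; the loop guards against spurious wakeups. */
static void cm_event_wait(struct cm_event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (!ev->signaled)
        pthread_cond_wait(&ev->cond, &ev->lock);
    ev->signaled = 0; /* auto-reset so the event can be reused */
    pthread_mutex_unlock(&ev->lock);
}
```

Unlike a fixed usec_sleep delay, the waiter wakes as soon as the completion
path calls cm_event_set, and it never returns before the completion has
actually happened.
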
13 years agocommon: reduce default max inline data size because of performance anomaly
Arlin Davis [Fri, 21 Jan 2011 02:31:30 +0000 (18:31 -0800)]
common: reduce default max inline data size because of performance anomaly

Increasing max inline causes small message rates to decrease from
4M/sec when set to 64 to about 1M/sec when set to 400. This has
been observed on latest mlx4 adapters. Set default to 64 until resolved.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agocommon: dapls_evd_dto_wait() dbg message should print status and not errno
Arlin Davis [Thu, 20 Jan 2011 19:05:02 +0000 (11:05 -0800)]
common: dapls_evd_dto_wait() dbg message should print status and not errno

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
13 years agoucm, scm: exchange max_qp_rd_atom and limit outstanding requests
Arlin Davis [Mon, 17 Jan 2011 22:43:58 +0000 (14:43 -0800)]
ucm, scm: exchange max_qp_rd_atom and limit outstanding requests

Exchange max_qp_rd_atom and add proper checking to limit outstanding
RDMA reads and atomics. Use one of the reserved bytes in the CM
message protocol to exchange limits, reset the EP attribute rdma_out,
and set the QP RTS attribute properly.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
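
The limit exchange described above can be sketched roughly as follows. The
message layout and the names cm_msg_sketch, cm_msg_set_limit and
apply_peer_limit are assumptions made for illustration; the real ucm/scm
message format is not shown in this log. The one idea taken from the commit
is that each side advertises its limit in a formerly reserved byte and the
receiver clamps its outstanding-request depth to it.

```c
#include <stdint.h>

/* Hypothetical CM message header; only the reused byte matters here. */
struct cm_msg_sketch {
    uint8_t op;
    uint8_t rd_in;    /* formerly reserved: sender's max incoming reads/atomics */
    uint8_t resv[2];
};

/* Sender side: advertise how many RDMA reads/atomics we can service. */
static void cm_msg_set_limit(struct cm_msg_sketch *msg, uint8_t local_max_rd_atom)
{
    msg->rd_in = local_max_rd_atom;
}

/* Receiver side: never issue more outstanding requests than the peer accepts. */
static uint8_t apply_peer_limit(uint8_t ep_rdma_out, const struct cm_msg_sketch *msg)
{
    return ep_rdma_out < msg->rd_in ? ep_rdma_out : msg->rd_in;
}
```

The clamped value would then be written back to the EP attribute and used
when moving the QP to RTS.
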
13 years agoscm: retry socket connect on ECONNREFUSED under heavy load
Arlin Davis [Wed, 5 Jan 2011 04:22:12 +0000 (20:22 -0800)]
scm: retry socket connect on ECONNREFUSED under heavy load

With large-scale workloads, a Linux server starts rejecting
socket connect requests. Add retry logic for connection-refused
errors.

Increasing net.ipv4.tcp_max_syn_backlog to 2048 will also reduce the
chance of these errors when scaling up.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
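
A minimal sketch of retry-on-ECONNREFUSED logic, assuming a bounded retry
count and a fixed delay between attempts; the scm provider's actual retry
policy is not shown in this log. The names connect_fn, connect_with_retry,
fake_peer and fake_connect are hypothetical. The connect attempt is passed in
as a callback so the loop can be exercised without a network; fake_connect is
a test double that refuses a set number of times.

```c
#define _POSIX_C_SOURCE 200809L /* for nanosleep */
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* One connect attempt: returns 0 on success, -1 with errno set on failure.
 * A real callback should open a fresh socket per attempt, since the state
 * of a socket after a failed connect() is unspecified. */
typedef int (*connect_fn)(void *ctx);

static int connect_with_retry(connect_fn try_connect, void *ctx,
                              int max_retries, long delay_ms)
{
    struct timespec delay;
    int attempt;

    delay.tv_sec = delay_ms / 1000;
    delay.tv_nsec = (delay_ms % 1000) * 1000000L;

    for (attempt = 0; ; attempt++) {
        if (try_connect(ctx) == 0)
            return 0;
        /* Retry only on connection refused, and only a bounded number of times. */
        if (errno != ECONNREFUSED || attempt >= max_retries)
            return -1;
        nanosleep(&delay, NULL);
    }
}

/* Test double: refuses the first refusals_left attempts, then accepts. */
struct fake_peer {
    int refusals_left;
    int calls;
};

static int fake_connect(void *ctx)
{
    struct fake_peer *p = ctx;

    p->calls++;
    if (p->refusals_left > 0) {
        p->refusals_left--;
        errno = ECONNREFUSED;
        return -1;
    }
    return 0;
}
```

Any error other than ECONNREFUSED is returned immediately, so real failures
such as an unreachable host are not masked by the retry loop.
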
13 years agocommon: qp modify RTR using wrong ep attribute parameter for dest_rd_atomic
Arlin Davis [Wed, 5 Jan 2011 00:25:29 +0000 (16:25 -0800)]
common: qp modify RTR using wrong ep attribute parameter for dest_rd_atomic

max_rdma_read_in should be used instead of max_rdma_read_out

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
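
The fix above amounts to pairing each QP attribute with the matching EP
attribute. The sketch below uses stand-in structs so that it compiles without
libibverbs; the field names max_dest_rd_atomic and max_rd_atomic mirror
struct ibv_qp_attr, while ep_attr_sketch, qp_attr_sketch, fill_rtr and
fill_rts are hypothetical names for illustration.

```c
#include <stdint.h>

/* Stand-in for the endpoint attributes named in the commit. */
struct ep_attr_sketch {
    uint8_t max_rdma_read_in;   /* reads/atomics we accept as responder */
    uint8_t max_rdma_read_out;  /* reads/atomics we issue as initiator */
};

/* Stand-in for the two relevant fields of struct ibv_qp_attr. */
struct qp_attr_sketch {
    uint8_t max_dest_rd_atomic; /* responder resources, set at RTR */
    uint8_t max_rd_atomic;      /* initiator depth, set at RTS */
};

/* RTR: responder resources come from the incoming limit. */
static void fill_rtr(struct qp_attr_sketch *qp, const struct ep_attr_sketch *ep)
{
    qp->max_dest_rd_atomic = ep->max_rdma_read_in;
}

/* RTS: initiator depth comes from the outgoing limit. */
static void fill_rts(struct qp_attr_sketch *qp, const struct ep_attr_sketch *ep)
{
    qp->max_rd_atomic = ep->max_rdma_read_out;
}
```
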