Arlin Davis [Tue, 1 Oct 2013 22:40:17 +0000 (15:40 -0700)]
SCM: getifaddrs modifications for better out-of-the-box experience
Socket CM will now walk the list of interfaces and ignore loopback
and IB devices, unless the IB netdev is the only device.
This works better in a heterogeneous environment with a mix of net devices.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.
Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 1 Oct 2013 21:03:51 +0000 (14:03 -0700)]
ucm, scm: UD mode triggers list_head assert with large-scale alltoall test
With 1024+ ranks, the IMB alltoall test may hit the assert when running Intel MPI in UD mode.
CR cleanup was implemented with EP-to-CR references still linked.
During cr_accept, the CR remote_ia_address is linked to the EP object
by mistake in UD mode. UD mode may have multiple CRs per EP, so
no direct mappings to CR memory can exist, except in RC mode, which
always has a one-to-one EP-to-CR mapping.
In scm and ucm, when a CM object with CR references is freed, the search and
unlinking from the SP must be done under the SP lock to serialize access. Also,
change the cleanup thread wakeup logic to trigger the thread only when the
reference count indicates the need for more processing.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)
Modify all tests and commands to take a new -n parameter option for the server
listen port. The default port, when running multiple EPs and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up
After the connection is established and the AH is provided to the consumer
on UD connection establishment, there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.
Change the ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP in the CONN_EST state.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 25 Apr 2012 20:07:10 +0000 (13:07 -0700)]
config/build: remove post/postun hacking used to modify dat.conf
Return to the tried-and-true method of managing configuration
files via the %config directive and remove ugly sed editing methods.
The dat.conf includes both v1 and v2 device entries to ensure
backward compatibility. Add doc/dat.conf.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 20 Apr 2012 00:15:22 +0000 (17:15 -0700)]
common: add cm, link, and diag event counters in IB extended builds
Add additional event monitoring capabilities during runtime to help
isolate issues during scaling in lieu of logging/printing warning
messages. Counters have been added to the provider CM services, and further
counters have been added and mapped to the sysfs ib_cm, device port, and device
diag counters. ibdev_path is used for device sysfs counters.
uDAPL CM events are tracked per IA instance via internal
provider counters. The ib_cm, link, and diag events are tracked on a
per-platform basis via sysfs. For these running counters, start
and stop functions are provided for sampling and mapping to DAPL
64-bit counters. All counters, along with the new start and stop functions,
are provided via dat_ib_extensions.h. The new IB extension version is 2.0.7.
New DCNT_IA_xx counters include 40 cm, 9 link, and 9 diag types.
To enable the new counters (disabled in the default build):
./configure --enable-counters
New bit mappings have been added to the DAPL_DBG_TYPE environment
variable to automatically start/stop counters and log
errors if counters are enabled. The following will control
CM, LINK, and DIAG respectively:
Arlin Davis [Tue, 17 Apr 2012 22:24:22 +0000 (15:24 -0700)]
scm: use ioctl SIOCGIFCONF to get complete list of configured netdev interfaces
Replace usage of getaddrinfo, since it doesn't actually return bound addresses
and can return the loopback address in some configurations. Some
systems may not have eth0 configured, so you cannot assume eth0 is the non-loopback
default netdev.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 17 Feb 2012 18:28:48 +0000 (10:28 -0800)]
ucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)
Full sendq should retry polling completions instead of failing.
When the sendq is full and all requests are pending, the get send message
code should retry polling for completions and not return an error on the first
empty CQ attempt. This gives the HCA a chance to complete some batched requests.
Also, clean up the send message error logging.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 30 Jan 2012 18:19:29 +0000 (10:19 -0800)]
cma, scm, ucm: allow EP (QP) creation without EVD (CQ)
Provide the ability to create an EP/QP with no EVD/CQ on either the
request or receive queue. The current implementation allows this on the
receive queue but not the request queue. Not all OFA devices support
a null CQ, so if necessary create a dummy CQ at the time of
QP creation. Also, if no CQ is specified, set the appropriate QP
max wr/sge attributes to zero.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 25 Jan 2012 19:50:21 +0000 (11:50 -0800)]
common: ep_create should allow max_request_iov attribute setting of zero
When creating an EP without a request EVD (CQ), the max_request_iov
and max_request_sge will be 0. Allow this combination when checking
attribute settings for ARG6.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 12 Jan 2012 17:54:59 +0000 (09:54 -0800)]
common: extended CR event processing missing rejects on errors
When processing an inbound CR event callback, a non-user reject should be
sent to the client in the case of a non-listening SP, an allocation error,
or an EVD overrun. Changes are made to the dapls_evd_post_cr_event_ext callback.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 12 Jan 2012 17:39:46 +0000 (09:39 -0800)]
ucm: incorrectly sends user reject during CR callback errors
Add reason checking on provider rejects and set the appropriate op type
in the reject message. Reject can be called from the CR callback during
failures. A user reject will be IB_CM_REJ_REASON_CONSUMER_REJ.
Add a warning message on the active side.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 9 Jan 2012 23:03:21 +0000 (15:03 -0800)]
scm: incorrectly sends user reject during CR callback errors
Add reason checking on provider rejects and set the appropriate op type
in the reject message. Reject can be called from the CR callback during
failures. A user reject will be IB_CM_REJ_REASON_CONSUMER_REJ.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 8 Dec 2011 00:39:55 +0000 (16:39 -0800)]
cma,scm,ucm: extra reference on EP, with RSP, causes dat_ep_free() to hang
Add a check for RSP or PSP provider-type service points during
passive-side accepts before taking a CR reference on the EP. In these
cases, the EP is already linked to the inbound CR.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 2 Nov 2011 17:29:34 +0000 (10:29 -0700)]
common: increase default IB ack timer from 16 to 20
For larger, more congested fabrics, a larger ACK timer is needed.
Consumers can still change the default with the environment variable
DAPL_ACK_TIMER if they need to increase or decrease it.
This applies to the SCM and UCM providers only. The CMA provider, which
uses rdma_cm, has no way to control the ACK timer with the current API.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 24 Oct 2011 20:59:55 +0000 (13:59 -0700)]
scm: when hostname has loopback addr assigned, default to eth0 instead of failing
There are some cases where the eth0 device is configured with an IP address
but getaddrinfo() will only return the loopback address because the
hostname is configured in the /etc/hosts file with 127.0.0.1. In this case,
the provider will now retry with the address on eth0 before failing the open.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 24 Oct 2011 20:40:45 +0000 (13:40 -0700)]
common: query calls return incorrect IA handle to consumer
The IA handle from the consumer perspective is an IA vector and
not the provider IA address handle. The IA handle must be converted
to an IA vector for consumer calls.
Modify the dats_ia_get_handle call to convert both ways, depending
on the handle type provided, so a dapl provider can convert to a vector
on query calls. This fix is backward compatible with older libdat2
libraries. The function is already exported and its syntax is unchanged.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Aug 2011 02:10:37 +0000 (19:10 -0700)]
scm,ucm: fix compatibility issues and set minimum protocol support
Allow the latest version to work with previous versions, providing
compatibility back to OFED 1.5 (dapl-2.0.23). If rdma_atomic_in
is not exchanged, default back to the original settings set by the
consumer.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 19 Aug 2011 19:40:24 +0000 (12:40 -0700)]
build: add selective enable/disable-xxx build switch for each provider
The following switches have been added to configure:
--disable-cma (disables the rdma_cm dapl provider build)
--disable-scm (disables the socket cm provider build)
--disable-ucm (disables the IB UD cm provider build)
All providers are enabled by default.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 8 Aug 2011 05:59:27 +0000 (22:59 -0700)]
build: add IB collective and FCA provider to dapl build package as an option
The new collective support and the FCA provider will only be built with the
configure option "--enable-coll-type=fca". Dependencies include the
fca devel package, the fca library, and the mverbs library. These will not
be included in default builds.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 8 Aug 2011 05:06:09 +0000 (22:06 -0700)]
dat: add definitions for MPI offloaded collectives in IB transport extensions
The collective extensions are designed to support MPI and general
multicast operations over IB fabrics that support offloaded collectives.
Where feasible, they come as close to MPI semantics as possible.
Unless otherwise stated, all members participating in a data collective
operation must call the associated collective routine for the data
transfer operation to complete. Unless otherwise stated, the
root collective member of a data operation will receive its own portion
of the collective data. In most cases, the root member can prevent
sending/receiving data when such operations would be redundant. When root
data is already "in place" the root member may set the send and/or receive
buffer pointer argument to NULL.
Unlike standard DAPL data movement operations, which require registered
memory and LMR objects, collective data movement operations take
pointers into user virtual address space that do not require
pre-registration by the application. From a resource usage point
of view, the API user should consider that the provider implementation
may perform memory registrations/deregistrations on behalf of the
application to accomplish a data transfer.
Most collective calls are asynchronous. Upon completion, an event
will be posted to the EVD specified when the collective was created.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Sat, 12 Feb 2011 19:46:08 +0000 (11:46 -0800)]
ucm: delay freeing of active side UD cm object in case RTU is dropped
The ucm was freeing the UD CM object too quickly, so a retried REPLY
was dropped and the passive side never received the AH info via RTU.
Keep active-side UD CM objects on the work queue until the QP is destroyed
so the RTU can be resent if necessary.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Sat, 12 Feb 2011 19:36:35 +0000 (11:36 -0800)]
ucm: cm object needs to be on work queue before req sent on wire
With this delay in CM object queuing, replies may arrive before the
object is queued and be dropped with a NO MATCH. Start in the INIT
state and queue the object, then move to the REP_PENDING state when
sending it out on the wire to start the request timer.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 21 Jan 2011 02:31:30 +0000 (18:31 -0800)]
common: reduce default max inline data size because of performance anomaly
Increasing max inline causes small message rates to decrease from
4M/sec when set to 64 to about 1M/sec when set to 400. This has
been observed on the latest mlx4 adapters. Set the default to 64 until resolved.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 17 Jan 2011 22:43:58 +0000 (14:43 -0800)]
ucm, scm: exchange max_qp_rd_atom and limit outstanding requests
Exchange max_qp_rd_atom and add proper checking to limit outstanding
RDMA reads and atomics. Use one of the reserved bytes
in the CM message protocol to exchange limits, reset the
EP attribute rdma_out, and set the QP RTS attribute properly.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 7 Dec 2010 00:06:47 +0000 (16:06 -0800)]
libdat: static provider entries created for local SR database not freed
During load (dat_sr_init), the SR database is created with all dat.conf entries,
but the entries are never cleaned up during unload. Add new functions dat_sr_remove_all()
and dat_sr_remove() to clean up and deallocate SR database entries, and free the
database via dat_sr_fini().
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>