Arlin Davis [Thu, 12 Sep 2013 16:12:55 +0000 (09:12 -0700)]
mcm: reduce max qp depth and msg size in proxy mode, allow override
DAPL_MCM_WR_MAX is used to set max qp depth on mcm provider, default=500
DAPL_MCM_MSG_MAX is used to set max msg size on mcm provider, default=8388608
DAPL_WR_MAX is used to override max qp depth on all IB providers.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
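The override-with-default behavior described above can be sketched as follows. This is a minimal illustration, not the provider's code: parse_positive_int() and env_int_or_default() are hypothetical helper names, and only the variable names and default values come from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Defaults quoted in the commit message above. */
#define MCM_WR_MAX_DEFAULT  500
#define MCM_MSG_MAX_DEFAULT 8388608

/* Hypothetical helper: parse a positive decimal integer from s.
 * Returns dflt when s is NULL, empty, malformed, or out of int range. */
int parse_positive_int(const char *s, int dflt)
{
    char *end;
    long v;

    if (s == NULL || *s == '\0')
        return dflt;

    errno = 0;
    v = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || v <= 0 || v > INT_MAX)
        return dflt;
    return (int)v;
}

/* Hypothetical helper: read an override from the environment,
 * falling back to the built-in default when it is unset or invalid. */
int env_int_or_default(const char *name, int dflt)
{
    assert(name != NULL);
    return parse_positive_int(getenv(name), dflt);
}
```

A caller would use env_int_or_default("DAPL_MCM_WR_MAX", MCM_WR_MAX_DEFAULT) at provider open time. How the real provider treats malformed values, and how DAPL_WR_MAX interacts with DAPL_MCM_WR_MAX, is not stated in the log.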
Arlin Davis [Wed, 11 Sep 2013 22:04:37 +0000 (15:04 -0700)]
mpxyd: CM_REPLY: RETRIES (7) EXHAUSTED
The client's RTU is not processed by the mpxyd thread in corner cases.
The SCIF EP that handles the client cm thread (scif_ev_ep) operations
was not added to the select FD set, so the op_thread didn't wake up
when RTUs were sent on scif_ev_ep and no operations were being sent
on scif_op_ep.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
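The nature of this fix can be illustrated with plain POSIX select(). This is a sketch under assumptions: ordinary pipe descriptors stand in for the descriptors behind scif_op_ep and scif_ev_ep, and wait_two_fds() and demo_ev_only_wakeup() are hypothetical names, not mpxyd functions.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: wait on both descriptors at once.
 * Returns a bitmask (bit 0: op_fd readable, bit 1: ev_fd readable),
 * 0 on timeout, or -1 if select() fails. */
int wait_two_fds(int op_fd, int ev_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int maxfd, rc, ready = 0;

    assert(op_fd >= 0 && op_fd < FD_SETSIZE);
    assert(ev_fd >= 0 && ev_fd < FD_SETSIZE);

    FD_ZERO(&rfds);
    FD_SET(op_fd, &rfds);
    FD_SET(ev_fd, &rfds);   /* the descriptor that must not be left out */

    maxfd = op_fd > ev_fd ? op_fd : ev_fd;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -1;
    if (FD_ISSET(op_fd, &rfds))
        ready |= 1;
    if (FD_ISSET(ev_fd, &rfds))
        ready |= 2;
    return ready;
}

/* Demo: only the "event" pipe has data, as when an RTU arrives on the
 * event endpoint while the operation endpoint is idle. */
int demo_ev_only_wakeup(void)
{
    int op[2], ev[2], ready;

    if (pipe(op) != 0)
        return -1;
    if (pipe(ev) != 0) {
        close(op[0]);
        close(op[1]);
        return -1;
    }
    if (write(ev[1], "r", 1) != 1)
        ready = -1;
    else
        ready = wait_two_fds(op[0], ev[0], 1000);

    close(op[0]);
    close(op[1]);
    close(ev[0]);
    close(ev[1]);
    return ready;
}
```

If the FD_SET() call for ev_fd were omitted, the demo would wait for the full timeout even though data is pending, which matches the missed wakeup described in the commit.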
Arlin Davis [Tue, 10 Sep 2013 16:19:18 +0000 (09:19 -0700)]
common: cleanup async event processing and logging
Add formatted string print for ib verbs async events
Remove unnecessary logging and duplicate async callbacks
Modify all IB providers to use dapli_async_event_cb()
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 18 Jul 2013 17:17:11 +0000 (10:17 -0700)]
mcm: support incompatible verbs definitions inter-node within the platform
OFA verbs 3.5 and 1.5.4 are incompatible so there can be no direct
mappings to verbs within any MIC to Host communications. Remove
all direct verbs mappings in MIX and create inline construct functions
to convert verbs to new dat_mix_wr and dat_mix_wc types for both work requests
and work completions.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)
Modify all tests and commands to take a new -n parameter option for server
listen port. The default port, when running multiple EP's and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
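Validating the value passed with -n might look like the sketch below. Only the default port (45278) comes from the commit message. parse_port_arg() is a hypothetical helper, and its accepted range and error convention are assumptions, not dapltest's actual option handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Default server port quoted in the commit subject. */
#define DEFAULT_SERVER_PORT 45278

/* Hypothetical helper: convert the -n argument to a port number.
 * Returns dflt when arg is NULL (option not given), the parsed port
 * when arg is a decimal number in 1..65535, or -1 when it is invalid. */
int parse_port_arg(const char *arg, int dflt)
{
    char *end;
    long v;

    assert(dflt >= 1 && dflt <= 65535);
    if (arg == NULL)
        return dflt;

    errno = 0;
    v = strtol(arg, &end, 10);
    if (errno != 0 || end == arg || *end != '\0' || v < 1 || v > 65535)
        return -1;
    return (int)v;
}
```

Running each server instance with a distinct -n value is the way to avoid the EADDRINUSE collisions mentioned above.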
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up
After connection is established and the AH is provided to consumer
on UD connect establishment there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.
Change ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP and CONN_EST state.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 12 Jun 2013 16:45:45 +0000 (09:45 -0700)]
cma: long delays when opening cma provider with no IPoIB configured
The rdma_cm provider (ofa-v2-ib0) can take netdev, ip address, or hostname
for local address bindings. When trying to open a non-existent netdev (ib0)
the provider will fall through and use the getaddrinfo sys call assuming
dat.conf parameter is either an IP address or hostname and not a netdev.
When trying hostname option it will attempt to resolve the name via the
name services. On a KNC this can result in long timeouts depending on the
configuration. This changes the error handling when opening the cma provider
on a non-existent netdev and will only call getaddrinfo with AI_CANONNAME
hints after checking the dat.conf parameter for a valid hostname.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 23:00:32 +0000 (16:00 -0700)]
mpxyd: CM optimizations for MIC clients, improved checking on inbound CM messages
allow CM operations to be received on OP or EV channels from
MIC clients and provide each SMD channel with aligned message buffer
for scif_recv processing.
add checking for NO match at MD level after checking all SMD children
for inbound CM message match and add dump_cm_lists function for debug.
add check for inline message threshold, DAT_MIX_INLINE_MAX
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 22:36:29 +0000 (15:36 -0700)]
mcm: scif_recv err on mpxyd when scaling up on MPI IMB scatter benchmark
The inline send changes incorporated fragmented scif_send options which
de-serialized the stream operation on the scif endpoint. This can result
in a CM operation from the CM thread to interleave with the post_send
inline operation that sends a hdr and inline data separately.
Modify the post_send to use only one scif_send operation for inline.
Also optimize CM and Operations by moving all CM messages to the
scif_ev_ep. Clean up operation log messages to include op strings
for easier debug.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
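The single-send approach for inline data can be sketched as packing the header and payload into one contiguous buffer before sending. The header layout (struct msg_hdr) and pack_inline() are hypothetical; the real MIX wire format is not shown in this log.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header, for illustration only. */
struct msg_hdr {
    uint32_t op;    /* operation code */
    uint32_t len;   /* number of inline payload bytes that follow */
};

/* Hypothetical helper: copy header and payload into buf so the caller
 * can hand the whole message to one send call. With two separate sends,
 * a message from another thread could land between header and data.
 * Returns the total message size, or 0 if buf is too small. */
size_t pack_inline(void *buf, size_t buf_len, uint32_t op,
                   const void *data, uint32_t len)
{
    struct msg_hdr hdr;

    assert(buf != NULL);
    assert(data != NULL || len == 0);

    if (buf_len < sizeof(hdr) || len > buf_len - sizeof(hdr))
        return 0;

    hdr.op = op;
    hdr.len = len;
    memcpy(buf, &hdr, sizeof(hdr));
    if (len > 0)
        memcpy((char *)buf + sizeof(hdr), data, len);
    return sizeof(hdr) + len;
}
```

The returned size would then be passed, together with buf, to a single scif_send call on the endpoint, so the header and its data cannot be separated by a CM message.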
pmmccorm [Mon, 13 May 2013 21:03:04 +0000 (14:03 -0700)]
Enable ccl-proxy support if possible by default:
  yes, if nothing specified and scif.h is present
  no, if scif.h is not present and nothing specified
  no, if --enable-mcm=no is specified
  yes, if --enable-mcm=yes and scif.h is present
  error, if --enable-mcm=yes and scif.h missing
Make the corresponding changes to the spec file so that whatever
options are specified, the RPM will contain the right files (before
we were shipping the mpxyd service and conf regardless).
Arlin Davis [Fri, 17 May 2013 19:27:36 +0000 (12:27 -0700)]
SCM: getifaddrs modifications for better out of the box experience with MIC
socket cm will now walk list of interfaces and ignore loopback
and ignore IB devices, unless the IB netdev is the only device.
Works better in a heterogeneous environment with a mix of MICs.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.
Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 25 Mar 2013 17:18:05 +0000 (10:18 -0700)]
mpxyd: add support for full work request or memory pool
Current implementation will fail when WR or memory is full. Change to
throttle and retry mix post_send operations during full work queue.
New wr_pp (post pending) added to m_qp for tracking outstanding
IB work requests in flight.
Add counters for full wr and mem pool cases. Print mix-version on
startup.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Add eager completion, configurable, to signal writes or sends
after scif_readfrom is signaled and all data is local to proxy
instead of waiting for IB signal. User data on MIC is available
for reuse.
Combine sends and writes to mix_post_send command, provide
ordering guarantees between inline and dma data. Allows
direct posting from OP thread if head of queue.
Add new counters for inline and signaled IO.
Extend m_wr to include flags for controlling eager completions
and proxy buffer and work request management.
cq event FD is now non-blocking and processed via TX thread
instead of OP thread. Allows for polling > 1 event at a time.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 15 Mar 2013 19:23:16 +0000 (12:23 -0700)]
mcm: add support for mix inline data, improve mix_poll events
mpxyd can be configured for inline data for posted
writes and sends. This will use scif_send/recv instead
of scif_readfrom based on threshold set in mpxyd.conf
change the mix_poll command to NOT issue the request
on scif and simply wait for mpxyd to write completion
back to EVD. This removes unnecessary SCIF command
traffic.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 6 Feb 2013 19:49:29 +0000 (11:49 -0800)]
install: problem with rpm update when updating from 2.0.34 or older
The postun will remove entries on older packages that incorrectly
add and remove entries instead of updating the file. When updating
from these older versions we end up with an empty /etc/dat.conf.
To fix this we have to save the dat.conf and restore it during
the upgrade process with the triggerpostun.
Arlin Davis [Sat, 2 Feb 2013 01:33:17 +0000 (17:33 -0800)]
mpxyd: cm scaling bug fixes and profiling
New CM thread to help with CM scale out. Testing with dtestcm
with 1000's of connections. MPI testing up to 60ppn on KNC nodes.
Add new disc timers and disconnect logging for debug.
Add cleanup for IB device during service termination.
Add profiling of device and CM operations to help debug scaling issues
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 21 Jan 2013 20:51:42 +0000 (12:51 -0800)]
mpxyd: TX thread can miss pending requests with multiple clients
Pending data variable is overwritten with multiple SCIF clients
bound to one HCA, causing rdma_write to stall and not be posted
on the IB device. MPI running multiple ranks on a KNC can stall.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 16 Jan 2013 21:39:59 +0000 (13:39 -0800)]
mcm,mpxyd: add multi-endpoint, multi-threaded, and CPU affinity support for mpxyd and mcm clients
For performance reasons separate EP's and separate threads have been incorporated.
3 scif eps (operation, events, and transmit) are created for every device open
2 threads per MIC adapter, one for operations and one for RDMA operations
CPU affinity support has been added to assist in HCA to MIC locality
for optimum performance. This fixes some performance issues seen at scale
on HT systems.
Also added some performance profiling to help with future tuning on
various platforms.
The CPU affinity and profiling are set via new mpxyd.conf parameters.
defaults are affinity=1, affinity base cpu_id=0, profiling=0
mcm_affinity, mcm_affinity_base, mcm_profile
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>