Arlin Davis [Tue, 30 Sep 2014 21:07:52 +0000 (14:07 -0700)]
dtestx: update IB extension example test with new v2.0.9 features
Add support for new IB extensions for CM and AH resource cleanup.
Check for v2.0.9 and call dat_ib_ud_cm_free after connection
establishment and dat_ib_ud_ah_free after all data has been
transferred on UD endpoints.
Also add socket based address exchange to eliminate the need
to include lid and qpn parameters on the client side.
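A minimal sketch of such a socket-based exchange, assuming the server sends
its lid and qpn as a fixed 6-byte record in network byte order over an
already connected stream socket. The struct ud_addr_info layout and the
helper names send_addr_info/recv_addr_info are hypothetical and are not
taken from the dtestx source:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical record; the real dtestx wire layout may differ. */
struct ud_addr_info {
    uint16_t lid; /* IB local identifier of the server port */
    uint32_t qpn; /* queue pair number of the server UD endpoint */
};

#define UD_ADDR_WIRE_LEN 6 /* 2 bytes lid + 4 bytes qpn */

/* Server side: pack in network byte order and send. Returns 0 on success. */
static int send_addr_info(int fd, const struct ud_addr_info *info)
{
    unsigned char buf[UD_ADDR_WIRE_LEN];
    uint16_t lid;
    uint32_t qpn;

    assert(info != NULL);
    lid = htons(info->lid);
    qpn = htonl(info->qpn);
    memcpy(buf, &lid, sizeof(lid));
    memcpy(buf + sizeof(lid), &qpn, sizeof(qpn));
    return send(fd, buf, sizeof(buf), 0) == (ssize_t)sizeof(buf) ? 0 : -1;
}

/* Client side: read the whole record, then convert to host byte order. */
static int recv_addr_info(int fd, struct ud_addr_info *info)
{
    unsigned char buf[UD_ADDR_WIRE_LEN];
    size_t got = 0;
    uint16_t lid;
    uint32_t qpn;

    assert(info != NULL);
    while (got < sizeof(buf)) {
        ssize_t n = recv(fd, buf + got, sizeof(buf) - got, 0);
        if (n <= 0)
            return -1; /* peer closed or error */
        got += (size_t)n;
    }
    memcpy(&lid, buf, sizeof(lid));
    memcpy(&qpn, buf + sizeof(lid), sizeof(qpn));
    info->lid = ntohs(lid);
    info->qpn = ntohl(qpn);
    return 0;
}
```

With an exchange of this kind the client needs only the server hostname;
the remaining addressing values arrive over the socket.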
Change the multiple EP mode to send from EP 0 to EP[0-3] on
server side and EP[0-3] to EP[0-3] on client side.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Amir Hanania [Thu, 25 Sep 2014 23:32:06 +0000 (16:32 -0700)]
common: add srq support for openib verbs providers
Add necessary components and hooks to support ib_verbs shared
receive queues for both RC and UD QP's. External interfaces
were already provided per DAT 2.0 specification but internal
support was missing.
A new dtestsrq will be provided with package for testing and
example code.
Arlin Davis [Thu, 25 Sep 2014 23:06:33 +0000 (16:06 -0700)]
openib: add IB UD cm_free/ah_free extension support in UCM provider
Make changes to UCM provider for new CM and AH destroy extensions.
Allow consumer to schedule CM object destroy after CM connection
event has been processed. Active side will put CM object in
TIMEWAIT in case RTU is dropped, passive side can schedule
CM object destroy immediately when called. In the case where
consumer requests CM object destroy, the provider will remove
all internal references to AH since consumer will call AH
destroy directly when finished with UD sends.
All other providers, MCM, CMA, SCM will return UNSUPPORTED
if new extensions are called.
See dtestx source for code examples of new extensions.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Sep 2014 22:42:38 +0000 (15:42 -0700)]
extension: add IB UD extensions to reduce provider CM and AH memory footprint
dat_ib_ud_cm_free, dat_ib_ud_ah_free added to allow consumers
the option to free provider CM and AH objects, related to AH resolution,
immediately after consuming CONN events instead of waiting for
EP destroy. With existing UD service providers the CM and AH objects
are linked to EP and not destroyed until consumer calls dat_ep_free.
dat_ib_ud_cm_free() frees CM object after AH and private data are copied
and stored by consumer. Provider will destroy internal object
and memory associated with CM and AH resolution.
MAY be called after CM establishment and before EP is destroyed.
dat_ib_ud_ah_free() destroys UD Address Handle (AH).
MUST be called after all UD sends are complete and
before UD EP is destroyed.
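The ordering rules above can be illustrated with a toy state model. This is
only a sketch of the stated constraints: the struct and the model_cm_free
and model_ah_free functions are hypothetical stand-ins and do NOT reflect
the actual dat_ib_ud_cm_free/dat_ib_ud_ah_free signatures, which are defined
in dat_ib_extensions.h:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the DAT extension signatures. */
struct ud_peer_model {
    bool conn_event_consumed; /* CONN event seen, AH and private data copied */
    bool cm_freed;            /* provider CM object released */
    bool ah_freed;            /* address handle released */
    bool ep_freed;            /* endpoint destroyed */
    int  sends_in_flight;     /* UD sends not yet completed */
};

/* CM free: allowed once, after establishment and before EP destroy. */
static bool model_cm_free(struct ud_peer_model *p)
{
    assert(p != NULL);
    if (!p->conn_event_consumed || p->ep_freed || p->cm_freed)
        return false;
    p->cm_freed = true;
    return true;
}

/* AH free: allowed once, after all sends complete and before EP destroy. */
static bool model_ah_free(struct ud_peer_model *p)
{
    assert(p != NULL);
    if (!p->conn_event_consumed || p->ep_freed || p->ah_freed)
        return false;
    if (p->sends_in_flight > 0)
        return false; /* the AH is still referenced by pending sends */
    p->ah_freed = true;
    return true;
}
```

In this model the CM object can be released as soon as the connection event
is consumed, while the AH release waits until no sends remain outstanding.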
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec,
and dapl_os_atomic_assign function implementations to the dapl
userspace package to provide the DAPL API support on the s390x
platform by adding an Assembler language implementation of those
platform specific functions.
Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com> Acked-by: Arlin Davis <arlin.r.davis@intel.com>
Amir Hanania [Tue, 26 Aug 2014 22:41:10 +0000 (15:41 -0700)]
dtest server exchange connection info with client
The server and client create connection for the server to send the setup info to the client.
When using dtest, the client only needs to use -h <hostname/IP address> option and it will get the rest of the info from the server.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Arlin Davis [Mon, 25 Aug 2014 15:59:50 +0000 (08:59 -0700)]
mcm: implement proxy mix_prov_attr function, add fields CPU model and family
Provide MIC consumers with a provider specific query for proxy CPU model and family
to identify platform type from MIC side. Supported in MCM provider only.
The following provider specific name attributes were added to MCM:
DAT_IB_PROXY_CPU_FAMILY
DAT_IB_PROXY_CPU_MODEL
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 11 Aug 2014 16:49:08 +0000 (09:49 -0700)]
mpxyd: change MIC cpu_mask to per numa node instead of adapter
The proxy processing threads for multiple cards in same socket will overlap
same cpu cores with existing cpumask per adapter. Change thread affinity
and cpumask to a per socket method.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 1 Aug 2014 17:54:14 +0000 (10:54 -0700)]
mpxyd: MXS based alltoall benchmark hangs or returns post_send timeout
Clean-up shared proxy buffer slot management during IO completions.
Current code adjusts proxy buffer tail, using m_idx, incorrectly
if freeing multiple in order buffer slots. Also, when processing
immediate in-order slot, m_po_buf_tl() failed to continue parsing
list to free other in-order !busy slots.
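A simplified sketch of the in-order reclaim behavior described above,
assuming a fixed-size ring where slots complete in any order but the tail
may only advance across consecutive non-busy slots. The names slot_ring,
ring_alloc and ring_complete are hypothetical and are not the mpxyd data
structures or m_po_buf_tl() itself:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NSLOTS 8

struct slot_ring {
    bool   busy[NSLOTS]; /* true while IO on the slot is outstanding */
    size_t head;         /* next slot to hand out */
    size_t tail;         /* oldest slot not yet reclaimed */
    size_t used;         /* slots handed out and not yet reclaimed */
};

/* Hand out the next slot, or return -1 when the ring is full. */
static int ring_alloc(struct slot_ring *r)
{
    size_t idx;

    if (r->used == NSLOTS)
        return -1;
    idx = r->head;
    r->busy[idx] = true;
    r->head = (r->head + 1) % NSLOTS;
    r->used++;
    return (int)idx;
}

/*
 * Mark one slot complete, then keep walking from the tail and reclaim
 * every consecutive completed slot, not just the first one.
 * Returns the number of slots reclaimed by this call.
 */
static size_t ring_complete(struct slot_ring *r, size_t idx)
{
    size_t freed = 0;

    assert(idx < NSLOTS && r->busy[idx]);
    r->busy[idx] = false;
    while (r->used > 0 && !r->busy[r->tail]) {
        r->tail = (r->tail + 1) % NSLOTS;
        r->used--;
        freed++;
    }
    return freed;
}
```

If slots 0 through 3 are outstanding and 2, 1, 0 complete in that order, the
first two completions reclaim nothing and the third reclaims all three.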
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 21 Jul 2014 18:18:07 +0000 (11:18 -0700)]
Add new MIC RDMA proxy service daemon (MPXYD)
New service created to support MIC based proxy RDMA. Includes
services to manage connectivity of multi-path heterogeneous
endpoints and use data paths based on platform constraints.
It will create and manage multiple QP's per endpoint if needed. This
allows optimal performance per direction based on various platform
constraints. For example, if the MIC is on same socket as HCA, only
proxy out is needed and not proxy in. In this case, data can go direct
from MPXYD->MIC. However, if the MIC is on a different CPU socket
from HCA, the provider will use both proxy out and proxy in services
to avoid additional constraints of the server platform.
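The locality rule above can be sketched as a small decision function. The
enum and struct names are hypothetical, and the assumption that a host
resident endpoint needs no proxy service at all is an illustration rather
than something stated in the commit:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical locality classes for an endpoint relative to the HCA. */
enum ep_locality {
    EP_HOST,             /* endpoint runs on the host CPU */
    EP_MIC_SAME_SOCKET,  /* MIC attached to the same CPU socket as the HCA */
    EP_MIC_CROSS_SOCKET  /* MIC attached to a different CPU socket */
};

struct proxy_plan {
    bool proxy_out; /* outbound data is staged through the host proxy */
    bool proxy_in;  /* inbound data is staged through the host proxy */
};

static struct proxy_plan plan_for(enum ep_locality loc)
{
    struct proxy_plan plan = { false, false };

    switch (loc) {
    case EP_HOST:
        break; /* assumption: host endpoints need no proxy */
    case EP_MIC_SAME_SOCKET:
        plan.proxy_out = true; /* proxy out only */
        break;
    case EP_MIC_CROSS_SOCKET:
        plan.proxy_out = true; /* both directions are proxied */
        plan.proxy_in = true;
        break;
    default:
        assert(!"unknown endpoint locality");
        break;
    }
    return plan;
}
```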
The MCM provider and MPXYD will support connections between
MIC and non MIC endpoints.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 21 Jul 2014 17:58:37 +0000 (10:58 -0700)]
add new dapl MIC provider (MCM) to support MIC RDMA proxy services
Provider supports all modes of connectivity and will setup data paths
based on endpoint locality and platform constraints. Provides
transparent DAT API support for RDMA writes, RDMA write with
immediate data, Sends, and Recvs. No RDMA read or atomic support.
To use MCM provider an application can use the new ofa-v2-mcm
device definitions in dat.conf. Intel MPSS is required for
MCM provider build and usage.
The following shows connectivity modes and data paths:
HST -> HST to HCA
MSS -> MIC to HCA same socket
MXS -> MIC to HCA cross socket
Arlin Davis [Mon, 21 Jul 2014 15:03:46 +0000 (08:03 -0700)]
MCM: new MIC provider and proxy service definitions
Definitions for MIC Proxy RDMA services
MCM <-> MPXYD over SCIF (Symmetric Communications InterFace) - ops, cm, events
MCM <-> MCM over IB - CM, WR/WC proxy-in and proxy-out wire protocol
This service enables MIC based DAPL provider (MCM) to use
proxy data service (host CPU) for SND/RCV and RDMA write operations.
RDMA reads and atomics are not supported. This service communicates
within a server platform over PCI-E bus using SCIF and an MCM specific
MIX (MIC exchange) messaging protocol. The MCM provider uses a new MCM
CM protocol on the wire along with a Proxy WR/WC protocol.
This service is designed to improve bandwidth on larger IO
when direct MIC based IO is constrained.
This new MCM provider maintains the DAT level API semantics, including
strict ordering requirements of data flow. RDMA write with immediate
data is the only IB extension supported.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 18 Jul 2014 15:51:00 +0000 (08:51 -0700)]
common: add CQ,QP,MR abstractions for new MIC provider and data proxy service
The new MIC (many integrated core) based provider (MCM) has the capability to
shadow QPs,CQs,MRs on the host side of the platform for optimal performance
based on locality of endpoints and platform constraints. Each endpoint (DAPL_EP),
transparent to consumer, may have multiple connections via MCM provider.
openib_common ib_cq/ib_qp code base has been expanded, MCM only, to support
separate send and receive channels per endpoint.
openib_common dapl_mr code base has been expanded, MCM only, to support
MIC based DMA interfaces for MIC to HOST communications.
openib_common post_send,post_recv inline code base, MCM only, has been
modified to proxy data services via the new MCM provider.
dapl_ib_async_str added for better logging across openib providers.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 11 Jul 2014 15:39:01 +0000 (08:39 -0700)]
new lightweight open_query/close_query IB extension for fast attribute query
Consumers that need provider attributes must do a full device open
in order to get any provider/device information. With so many static device
entries in /etc/dat.conf consumers are building classification
mechanisms to identify provider type, locality, name, device
mode, and decide which device is appropriate. The existing DAT interface
doesn't provide a lightweight mechanism for queries.
The following fast query functions have been added to dat_ib_extensions.h:
In addition, DAT extension interface, dat_extension_op, has been
expanded to include new internal calls to handle quick provider load
and function linkage via udat_extension_open, and udat_extension_close
functions. Extended operations needing DAT open/close services need
to be defined from a DAT_OPEN_EXTENSION_BASE or DAT_CLOSE_EXTENSION_BASE
respectively.
NOTE: The ia_handle returned with open query must be closed with subsequent
close_query and not used with any other dat_ia_ operations. Attribute
storage from open_query is not valid after close_query call.
The IB extensions have been rolled to version 2.0.8 with this new API.
The changes are backward compatible.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 8 Jul 2014 23:14:51 +0000 (16:14 -0700)]
cma: long delays when opening cma provider with no IPoIB configured
The rdma_cm provider (ofa-v2-ib0) can take netdev, ip address, or hostname
for local address bindings. When trying to open a non-existent netdev (ib0)
the provider will fall through and use the getaddrinfo sys call assuming
dat.conf parameter is either an IP address or hostname and not a netdev.
This patch changes getipaddr() error handling when opening the cma provider
on a non-existent netdev. It will only call getaddrinfo with AI_CANONNAME
hints after checking for a valid hostname.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 24 Jun 2014 22:49:20 +0000 (15:49 -0700)]
IB extension: segfault in create collective group with non-vector type IA handle
The dats_get_ia_handle call was changed in 2.0.34 to convert IA handle from
both vector to handle and handle to vector to fix query calls that
incorrectly returned IA handles in non-vector form. If a caller uses a
non vector IA handle it will get converted incorrectly to a vector and cause
a segfault. Add additional check to verify an IA handle type before calling
get ia handle to avoid incorrect translation.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 14 Mar 2014 17:47:06 +0000 (10:47 -0700)]
dapltest: change server port, from 45278 to 62000, out of registered IANA range
The existing port 45278 is in the registered port range.
RFC 6335:
System Ports, well known, 0-1023 (assigned by IANA)
User Ports, registered, 1024-49151 (assigned by IANA)
Dynamic Ports, private or Ephemeral, 49152-65535 (never assigned)
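The ranges listed above can be expressed as a small classifier, which shows
why 45278 falls in the registered range and 62000 does not. The enum and
function names are illustrative only and are not part of dapltest:

```c
#include <assert.h>
#include <stdint.h>

enum port_class { PORT_SYSTEM, PORT_USER, PORT_DYNAMIC };

/* Classify a TCP/UDP port number using the RFC 6335 ranges. */
static enum port_class classify_port(uint16_t port)
{
    if (port <= 1023)
        return PORT_SYSTEM;  /* 0-1023, well known */
    if (port <= 49151)
        return PORT_USER;    /* 1024-49151, registered */
    return PORT_DYNAMIC;     /* 49152-65535, never assigned */
}

/* Human-readable name for log output. */
static const char *port_class_name(enum port_class c)
{
    static const char *const names[] = { "system", "user", "dynamic" };

    assert((unsigned)c <= (unsigned)PORT_DYNAMIC);
    return names[c];
}
```

Here classify_port(45278) yields PORT_USER and classify_port(62000) yields
PORT_DYNAMIC.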
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
dapltest: Add final send/recv "sync" for transaction tests.
The transaction tests need both sides to send a sync message after running the test. This ensures that all remote operations are complete before dapltest deregisters memory and disconnects the endpoints.
Without this logic, we see intermittent async errors on iwarp devices because a read response or write arrives after the rmr has been destroyed.
I believe this is more likely to happen with iWARP than IB because iWARP completions only indicate the local buffer can be reused. It doesn't imply that the message has even arrived at the peer, let alone been placed in the peer application's memory.
Changes from V1:
- allocate new send/recv buffers for the Final Sync message.
- post the Final Sync recv buffer at the beginning of the final iteration of a test.
- tests ok on cxgb4 and mlx4 devices.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Arlin Davis [Tue, 1 Oct 2013 22:40:17 +0000 (15:40 -0700)]
SCM: getifaddrs modifications for better out of the box experience
socket cm will now walk list of interfaces and ignore loopback
and ignore IB devices, unless the IB netdev is the only device.
Works better in a heterogeneous environment with a mix of net devices.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.
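A rough sketch of the selection policy, written against a list in the form
returned by getifaddrs(3). Two assumptions are made for illustration only:
IB netdevs are recognized by an "ib" name prefix, and only AF_INET entries
are considered. The actual SCM provider may detect IB devices differently,
and pick_netdev is a hypothetical name:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for IFF_LOOPBACK on strict glibc builds */
#endif
#include <assert.h>
#include <ifaddrs.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Assumption: an IB netdev is identified by its name prefix. */
static int looks_like_ib(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "ib", 2) == 0;
}

/*
 * Walk the interface list, skip loopback, prefer the first non-IB
 * IPv4 interface, and fall back to an IB netdev only when nothing
 * else qualifies. Returns NULL when no usable interface exists.
 */
static const struct ifaddrs *pick_netdev(const struct ifaddrs *list)
{
    const struct ifaddrs *ib_fallback = NULL;
    const struct ifaddrs *ifa;

    for (ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (ifa->ifa_flags & IFF_LOOPBACK)
            continue;
        if (looks_like_ib(ifa->ifa_name)) {
            if (ib_fallback == NULL)
                ib_fallback = ifa; /* remember, but keep looking */
            continue;
        }
        return ifa;
    }
    return ib_fallback;
}
```

In real use the list would come from getifaddrs() and be released with
freeifaddrs() once the chosen address has been copied.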
Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com> Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 1 Oct 2013 21:03:51 +0000 (14:03 -0700)]
ucm, scm: UD mode triggers list_head assert with large scale alltoall test
1024+ ranks, IMB alltoall may hit assert when running Intel MPI in UD mode.
CR clean up was implemented with EP to CR references still linked.
During cr_accept, the CR remote_ia_address is linked to EP object
by mistake with UD mode. UD mode may have multiple CRs per EP so
no direct mappings to CR memory can exist unless RC mode which
always has one EP to CR mapping.
In scm, ucm: for CM object free with CR references the search and
unlinking from SP must be under SP lock to serialize. Also,
cleanup thread wakeup logic to only trigger the thread if
reference count indicates the need for more processing.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)
Modify all tests and commands to take a new -n parameter option for server
listen port. The default port, when running multiple EP's and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up
After connection is established and the AH is provided to consumer
on UD connect establishment there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.
Change ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP and CONN_EST state.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 25 Apr 2012 20:07:10 +0000 (13:07 -0700)]
config/build: remove post/postun hacking used to modify dat.conf
Return to the tried and true method of managing configuration
files via %config directive and remove ugly sed editing methods.
The dat.conf includes both v1 and v2 device entries to ensure
backward compatibility. Add doc/dat.conf
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>