Amir Hanania [Wed, 5 Aug 2015 21:55:30 +0000 (14:55 -0700)]
mpxyd: add MFO support on proxy side
Add checking for MFO and MXS and provide proxy-in and proxy-out
services for each mode. MXS_EP check is now MXF_EP (MFO or MXS).
Add new MIX device open, query, port query, pz operations.
Add new pz list and object management via scif_dev structure.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Amir Hanania [Wed, 5 Aug 2015 20:41:32 +0000 (13:41 -0700)]
mcm: add MFO support to openib_common code base
Provide full proxy support of CQ, QP, PZ, MR and device.
Use new MXF_EP macro to switch proxy service based
on MXS (cross socket) or MFO (full offload) modes.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Arlin Davis [Fri, 24 Jul 2015 23:01:29 +0000 (16:01 -0700)]
mcm: add intra-node support via ibscif device and mcm provider
- New device entry ofa-v2-scif0-m
- Support for different CM and EP locality (MIC vs proxy LID)
- MSS mode for all scif device opens via proxy
- logging changes for multi-lid options
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 14 Jul 2015 21:58:32 +0000 (14:58 -0700)]
mcm,mpxyd: fix dreq processing to defer QP flush when proxy WRs still pending
The proxy will now defer DREQ flushing of proxy QPs if PI and PO
data engines have outstanding requests. Add mcm_qp_busy routine
for checking PI and PO data engines. When MIC calls disconnect
always send DREQ up to proxy in order to handle deferred flush
of proxy side posted rcv messages.
Change QP free to modify both local and proxy QPs and check for
outstanding rcv message before qp_destroy to avoid infinite wait
in dapls_ep_flush_cqs.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
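The deferred flush described above can be sketched roughly as follows. This is a minimal illustration, not the mpxyd code: the struct, its fields, and the function names (qp_sketch, qp_busy, on_dreq, on_wr_done) are hypothetical stand-ins for the real QP object and the mcm_qp_busy routine.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified QP state; field names are illustrative only. */
struct qp_sketch {
    int po_pending;     /* proxy-out (PO) requests still in flight */
    int pi_pending;     /* proxy-in (PI) requests still in flight */
    bool dreq_deferred; /* DREQ arrived while a data engine was busy */
    bool flushed;       /* QP flush has been performed */
};

/* Busy while either data engine has outstanding requests. */
static bool qp_busy(const struct qp_sketch *qp)
{
    return qp->po_pending > 0 || qp->pi_pending > 0;
}

/* On DREQ: flush now if idle, otherwise remember to flush later. */
static void on_dreq(struct qp_sketch *qp)
{
    if (qp_busy(qp))
        qp->dreq_deferred = true;
    else
        qp->flushed = true;
}

/* On completion of one PO or PI request: run a deferred flush once idle. */
static void on_wr_done(struct qp_sketch *qp, bool is_po)
{
    int *pending = is_po ? &qp->po_pending : &qp->pi_pending;

    assert(*pending > 0);
    (*pending)--;
    if (qp->dreq_deferred && !qp_busy(qp)) {
        qp->dreq_deferred = false;
        qp->flushed = true;
    }
}
```

With one PO and one PI request outstanding, on_dreq() only records the request; the flush happens on the on_wr_done() call that drains the last engine.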
Amir Hanania [Wed, 17 Jun 2015 17:12:24 +0000 (10:12 -0700)]
mcm: bug fixes for non-inline devices
mcm proxy mi_send_pi setup registered WR structure properly for no
inline data support but incorrectly overwrote sg.addr with the
WR structure on stack.
qp create didn't check for no inline and setup create accordingly
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 5 Jun 2015 19:14:37 +0000 (12:14 -0700)]
mpxyd,mcm: RDMA write with immed data not signaled on request side
With eager completions set, the wc_flags is not set properly on event.
With eager completions not set, the proxy CQ reference is incorrect
and event is forwarded to MCM receive EVD instead of transmit EVD.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 20 May 2015 18:43:03 +0000 (11:43 -0700)]
mcm: add HST side provider support for device without inline data capability
Add registered WR buffers for HST->MXS (proxy in) mode
when inline data is not supported by device. Use registered
memory for source WR buffer instead of stack when sending
RDMA write request to peer proxy-in service.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 18 May 2015 21:51:08 +0000 (14:51 -0700)]
ucm: CM changes for UD extended port space and indexer
Tested on 1200n 28ppn cluster, AlltoAll Intel MPI, UD mode.
Both static and dynamic modes, over 500m connections.
Change port manager to indexer and service ID manager
to bitarray indexer. Reduces footprint for service IDs
and allows direct lookup on CM messages.
New insert, remove, lookup functions for processing ID
based CM objects. Inbound requests, with the exception
of new CM requests, will no longer parse list but
use hash table lookups.
AH caching is now used to prevent unnecessarily
creating multiple AH's for same QP destination.
Add 24-bit port space support to CM processing code and
to wire protocol via DCM message reserve space.
Add version check to limit to 16-bit for backward compatibility.
Bump CM protocol version to 8 for xport and rtns fields.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
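As an illustration of the insert, remove, lookup idea above, here is a minimal sketch of a bit-array ID allocator paired with a directly indexed object table. It is not the ucm indexer: the names (sk_indexer, sk_insert, sk_lookup, sk_remove) and the 1024-entry size are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IDS 1024u /* illustrative size only, far below a 24-bit space */

struct sk_indexer {
    uint64_t used[SK_IDS / 64]; /* one bit per ID */
    void *obj[SK_IDS];          /* direct lookup: ID -> CM object */
};

/* Claim the lowest free ID (0 is reserved) and bind it; 0 means full. */
static uint32_t sk_insert(struct sk_indexer *ix, void *obj)
{
    for (uint32_t id = 1; id < SK_IDS; id++) {
        uint64_t bit = 1ULL << (id % 64);

        if (!(ix->used[id / 64] & bit)) {
            ix->used[id / 64] |= bit;
            ix->obj[id] = obj;
            return id;
        }
    }
    return 0;
}

/* Constant-time lookup by ID; NULL if the ID is out of range or unused. */
static void *sk_lookup(const struct sk_indexer *ix, uint32_t id)
{
    if (id >= SK_IDS || !(ix->used[id / 64] & (1ULL << (id % 64))))
        return NULL;
    return ix->obj[id];
}

/* Release an ID so that a later insert can reuse it. */
static void sk_remove(struct sk_indexer *ix, uint32_t id)
{
    assert(id > 0 && id < SK_IDS);
    ix->used[id / 64] &= ~(1ULL << (id % 64));
    ix->obj[id] = NULL;
}
```

An inbound message that carries the ID can then reach its object through sk_lookup() without walking a list.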
Arlin Davis [Mon, 18 May 2015 21:21:07 +0000 (14:21 -0700)]
ucm: optimizations for large scale UD communication management
AH caching per QP, AH space set to 48K for LID unicast
Bump port space up to 24 bits
Reduce CM object and reduce private data to 68 bytes
Add xport space and rtns to DCM reserve fields.
New indexer macros for port space hash table management
Add hash table storage to ibtrans device objects
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
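A per-QP AH cache indexed by destination LID might look like the sketch below. The 48K slot count matches the InfiniBand unicast LID range (0x0001 to 0xBFFF); everything else, including the names sk_ah, sk_ah_cache, sk_ah_get and the tiny fixed pool that stands in for real address handle creation, is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_AH_SLOTS (48u * 1024u) /* unicast LIDs are 0x0001..0xBFFF */
#define SK_AH_POOL  4u            /* tiny pool so the sketch needs no malloc */

struct sk_ah { uint16_t dlid; };  /* stand-in for a real address handle */

struct sk_ah_cache {
    struct sk_ah *slot[SK_AH_SLOTS]; /* indexed directly by destination LID */
    struct sk_ah pool[SK_AH_POOL];
    unsigned created;                /* number of handles actually created */
};

/* Return the cached handle for dlid, creating it on first use only. */
static struct sk_ah *sk_ah_get(struct sk_ah_cache *c, uint16_t dlid)
{
    assert(dlid > 0 && dlid < SK_AH_SLOTS);
    if (c->slot[dlid] == NULL) {
        struct sk_ah *ah;

        assert(c->created < SK_AH_POOL);
        ah = &c->pool[c->created++];
        ah->dlid = dlid;
        c->slot[dlid] = ah;
    }
    return c->slot[dlid];
}
```

Repeated sends to the same destination LID reuse one handle, so the created counter grows only with the number of distinct destinations.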
Arlin Davis [Thu, 12 Feb 2015 20:21:37 +0000 (15:21 -0500)]
mpxyd: add support for devices without inline data support
Add function to check for inline support during device open.
If inline data is not supported, the CM service and Proxy
data mover will not use inline data option on small IO.
The PO->PI service will now allocate and register necessary
memory to send mcm_wr_rx and mcm_wc_rx operations from
registered memory locations if inline data not supported.
If inline is supported, no extra memory will be allocated
and src buffer will be built on stack as before.
Cleanup some build warnings.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 22 Jan 2015 23:49:25 +0000 (15:49 -0800)]
openib: add inline data support check during device open
Not all rdma devices support inline data, however without
a verbs device attribute the only way to determine
support is with a QP create with max_inline_send set.
Add a common function to verify inline data support
before setting default to 64 bytes.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 15 Dec 2014 20:15:54 +0000 (12:15 -0800)]
dapl: mpxyd service changes to support multi-thread single-core option
The proxy service has been changed to reduce the number of cores required
on the host side. Provides new option, via mpxyd.conf, to use single-core
and allow system administrator to bind to specific core id for all Intel
Xeon Phi adapters in the platform.
mcm_affinity = 2 will set to single core (per Intel Xeon Phi).
mcm_affinity_base_mic will set to specific core for all adapters.
Best performance can be achieved with mcm_affinity = 2 and
mcm_affinity_base_mic == 0. This option will cause single core
to remain busy, polling operations from clients, as long
as device is open and being used by clients for data
transfers.
Arlin Davis [Mon, 15 Dec 2014 20:05:33 +0000 (12:05 -0800)]
dapl: add rdma_write_imm and write only option to dtest
New write_only (-w) option with rdma_write_imm can
be used with providers that support IB extensions.
Allows more options for write bandwidth profiling
with immediate data and signaling rate options
to increase write data rates, especially on MIC
clients that use proxy services.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 21 Nov 2014 22:26:40 +0000 (14:26 -0800)]
mpxyd: DTO completion ERR: status 12, op RDMA_WRITE running MPI alltoall test
Running MIC scale-up configuration with mcm provider on a MXS node
instead of shm causes DTO error due to heavy use of proxy-in buffer pools.
Hit corner case where proxy buffer management hd ptr crossed tl
ptr due to 64 byte alignment on start when hd < 64 bytes behind tl.
Add additional checking on PO and PI buffer management to handle
the case of HD passing TL on start locations. Also changed PO
processing to hold lock until hd ptr is registered with buf_wc slot
management to preserve order of memory usage across threads.
Reduced the size of WC queue for PO and PI buffer management.
Profiling, via MCM_PROFILE, was added to monitor and trigger buffer
management errors.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
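The corner case above, where aligning the start pushes hd past tl, can be reproduced with a small ring allocator. This is a sketch under assumed conventions (hd == tl means empty, offsets in bytes); sk_ring and sk_reserve are hypothetical names, not the proxy buffer code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_ALIGN 64u

struct sk_ring {
    size_t size; /* total bytes, a multiple of SK_ALIGN */
    size_t hd;   /* next free byte (producer side) */
    size_t tl;   /* oldest byte still in use (consumer side) */
};

/* Reserve len bytes at a 64-byte aligned start; false if it would reach tl. */
static bool sk_reserve(struct sk_ring *r, size_t len, size_t *start)
{
    /* Round hd up to the next 64-byte boundary before any space check. */
    size_t s = (r->hd + SK_ALIGN - 1) & ~(size_t)(SK_ALIGN - 1);

    assert(len > 0 && len < r->size);
    if (r->hd >= r->tl) {
        /* Free space is [hd, size) followed by [0, tl). */
        if (s + len > r->size) {
            s = 0; /* no room at the end, wrap to the start */
            if (len >= r->tl)
                return false;
        }
    } else {
        /* Free space is [hd, tl): test the ALIGNED start, not raw hd. */
        if (s + len >= r->tl)
            return false;
    }
    *start = s;
    r->hd = s + len;
    return true;
}
```

With hd = 100 and tl = 120, an 8-byte request looks safe when measured from hd (108 < 120), but the aligned start is 128, already beyond tl, so the reservation has to be refused.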
Arlin Davis [Tue, 30 Sep 2014 21:07:52 +0000 (14:07 -0700)]
dtestx: update IB extension example test with new v2.0.9 features
Add support for new IB extensions for CM and AH resource cleanup.
Check for v2.0.9 and call dat_ib_ud_cm_free after connection
establishment and dat_ib_ud_ah_free after all data has been
transferred on UD endpoints.
Also add socket based address exchange to eliminate the need
to include lid and qpn parameters on the client side.
Change the multiple EP mode to send from EP 0 to EP[0-3] on
server side and EP[0-3] to EP[0-3] on client side.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Amir Hanania [Thu, 25 Sep 2014 23:32:06 +0000 (16:32 -0700)]
common: add srq support for openib verbs providers
Add necessary components and hooks to support ib_verbs shared
receive queues for both RC and UD QP's. External interfaces
were already provided per DAT 2.0 specification but internal
support was missing.
A new dtestsrq will be provided with package for testing and
example code.
Arlin Davis [Thu, 25 Sep 2014 23:06:33 +0000 (16:06 -0700)]
openib: add IB UD cm_free/ah_free extension support in UCM provider
Make changes to UCM provider for new CM and AH destroy extensions.
Allow consumer to schedule CM object destroy after CM connection
event has been processed. Active side will put CM object in
TIMEWAIT in case RTU is dropped, passive side can schedule
CM object destroy immediately when called. In the case where
consumer requests CM object destroy, the provider will remove
all internal references to AH since consumer will call AH
destroy directly when finished with UD sends.
All other providers, MCM, CMA, SCM will return UNSUPPORTED
if new extensions are called.
See dtestx source for code examples of new extensions.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Sep 2014 22:42:38 +0000 (15:42 -0700)]
extension: add IB UD extensions to reduce provider CM and AH memory footprint
dat_ib_ud_cm_free, dat_ib_ud_ah_free added to allow consumers
the option to free provider CM and AH objects, related to AH resolution,
immediately after consuming CONN events instead of waiting for
EP destroy. With existing UD service providers the CM and AH objects
are linked to EP and not destroyed until consumer calls dat_ep_free.
dat_ib_ud_cm_free() frees CM object after AH and private data are copied
and stored by consumer. Provider will destroy internal object
and memory associated with CM and AH resolution.
MAY be called after CM establishment and before EP destroyed
dat_ib_ud_ah_free() destroys UD Address Handle (AH).
MUST be called after all UD sends are complete and
before UD EP is destroyed.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec,
and dapl_os_atomic_assign function implementations to the dapl
userspace package to provide the DAPL API support on the s390x
platform by adding Assembler language implementation of those
platform specific functions.
Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
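For readers unfamiliar with these primitives, the sketch below shows portable equivalents written with C11 <stdatomic.h> rather than s390x assembler, so it is a different technique from the patch itself. The sk_ names are hypothetical, and the return conventions (new value for inc and dec, previously observed value for assign, with assign acting as a compare-and-swap) are assumptions about the intended semantics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Atomically add one and return the new value. */
static int sk_atomic_inc(atomic_int *v)
{
    return atomic_fetch_add(v, 1) + 1;
}

/* Atomically subtract one and return the new value. */
static int sk_atomic_dec(atomic_int *v)
{
    return atomic_fetch_sub(v, 1) - 1;
}

/* Store new_val only if *v equals match; return the value seen before. */
static int sk_atomic_assign(atomic_int *v, int match, int new_val)
{
    int expected = match;

    /* On failure the current value is written back into expected. */
    atomic_compare_exchange_strong(v, &expected, new_val);
    return expected;
}
```

A caller can tell whether sk_atomic_assign() took effect by comparing its return value with the match argument.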
Amir Hanania [Tue, 26 Aug 2014 22:41:10 +0000 (15:41 -0700)]
dtest server exchange connection info with client
The server and client create connection for the server to send the setup info to the client.
When using dtest, the client only needs to use -h <hostname/IP address> option and it will get the rest of the info from the server.
Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Arlin Davis [Mon, 25 Aug 2014 15:59:50 +0000 (08:59 -0700)]
mcm: implement proxy mix_prov_attr function, add fields CPU model and family
Provide MIC consumers with a provider specific query for proxy CPU model and family
to identify platform type from MIC side. Supported in MCM provider only.
The following provider specific name attributes were added to MCM:
DAT_IB_PROXY_CPU_FAMILY
DAT_IB_PROXY_CPU_MODEL
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 11 Aug 2014 16:49:08 +0000 (09:49 -0700)]
mpxyd: change MIC cpu_mask to per numa node instead of adapter
The proxy processing threads for multiple cards in same socket will overlap
same cpu cores with existing cpumask per adapter. Change thread affinity
and cpumask to a per socket method.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 1 Aug 2014 17:54:14 +0000 (10:54 -0700)]
mpxyd: MXS based alltoall benchmark hangs or returns post_send timeout
Clean-up shared proxy buffer slot management during IO completions.
Current code adjusts proxy buffer tail, using m_idx, incorrectly
if freeing multiple in order buffer slots. Also, when processing
immediate in-order slot, m_po_buf_tl() failed to continue parsing
list to free other in-order !busy slots.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
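The in-order release described above amounts to advancing the tail past every contiguous slot that is no longer busy, not just the one that completed. A minimal sketch, assuming a small fixed slot array; sk_slots and sk_complete are hypothetical names and do not reflect m_po_buf_tl() itself.

```c
#include <assert.h>
#include <stdbool.h>

#define SK_SLOTS 8u

struct sk_slots {
    bool busy[SK_SLOTS]; /* slot still owned by an in-flight IO */
    unsigned hd;         /* next slot to hand out */
    unsigned tl;         /* oldest slot not yet released */
};

/* Mark one slot complete, then advance tl past every contiguous idle slot.
 * Returns the number of slots released by this call. */
static unsigned sk_complete(struct sk_slots *s, unsigned idx)
{
    unsigned freed = 0;

    assert(idx < SK_SLOTS && s->busy[idx]);
    s->busy[idx] = false;
    while (s->tl != s->hd && !s->busy[s->tl]) {
        s->tl = (s->tl + 1) % SK_SLOTS;
        freed++;
    }
    return freed;
}
```

If slot 1 completes before slot 0, nothing is released at first; when slot 0 completes, the loop releases both in one pass instead of stopping after the first slot.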