Amir Hanania [Wed, 28 Sep 2016 21:41:56 +0000 (14:41 -0700)]
common: set atomic attributes based on provider/device capabilities
DAT_IB_FETCH_AND_ADD and DAT_IB_CMP_AND_SWAP values in provider_specific_attr are always set to TRUE.
Set their value according to the device atomic capability.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
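A minimal sketch of the change described above, assuming a verbs-style atomic capability value is available from the device query. The names `atomic_cap`, `provider_attr`, and `set_atomic_attrs` are hypothetical stand-ins for illustration, not the actual DAPL structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the atomic capability a device reports. */
enum atomic_cap { ATOMIC_NONE, ATOMIC_HCA, ATOMIC_GLOB };

/* Hypothetical stand-in for the two provider-specific attribute values. */
struct provider_attr {
    bool fetch_and_add;   /* models DAT_IB_FETCH_AND_ADD */
    bool cmp_and_swap;    /* models DAT_IB_CMP_AND_SWAP */
};

/* Report atomics as supported only when the device advertises them,
 * instead of unconditionally setting both values to TRUE. */
static void set_atomic_attrs(enum atomic_cap cap, struct provider_attr *attr)
{
    bool supported = (cap != ATOMIC_NONE);

    assert(attr != NULL);
    attr->fetch_and_add = supported;
    attr->cmp_and_swap = supported;
}
```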
Amir Hanania [Mon, 19 Sep 2016 23:42:39 +0000 (16:42 -0700)]
MCM MIX: When mmap req from MIC return with fail stat print WARN.
When the MIC mmap request response returns with a failure status, print a WARN instead of an ERR.
The failure only means that the host is not in polling mode and does not support send operations via mmap.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Amir Hanania [Thu, 26 May 2016 21:28:32 +0000 (14:28 -0700)]
mcm proxy: push WR from MIC to host with scif mmap memory instead of scif_send.
Map host memory to the MIC and use it as a ring buffer to send
post send work requests from MIC to host. This replaces the
scif_send to scif_recv path and the recv data FD event mechanism.
Since no FD is used to wake up the host proxy service,
the host needs to run in polling mode to use this option.
How to run the host in polling mode:
By default, the proxy now runs in polling mode.
You can verify this in the mpxyd.log file.
Otherwise, edit the mpxyd.conf file and set mcm_affinity to 2.
This optimization improves small message latencies on MFO
devices by as much as 50%.
Signed-off-by: Amir Hanania <amir.hanania@intel.com> Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
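The ring buffer idea above can be sketched roughly as follows. This is a single-producer, single-consumer illustration only: `struct wr`, `struct wr_ring`, `RING_SLOTS`, `ring_push`, and `ring_poll` are hypothetical names, the real work request layout is not shown in this log, and a real shared-memory implementation would also need memory barriers between writing a slot and publishing the index.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SLOTS 8u   /* hypothetical size; must be a power of two */

/* Hypothetical work request; the real WR layout is not shown in the log. */
struct wr { uint64_t id; uint32_t len; };

/* Models the host memory that is mapped into the MIC. */
struct wr_ring {
    volatile uint32_t head;          /* advanced by the producer (MIC side) */
    volatile uint32_t tail;          /* advanced by the consumer (host side) */
    struct wr slot[RING_SLOTS];
};

/* Producer: copy a WR into the next free slot; false when the ring is full. */
static bool ring_push(struct wr_ring *r, const struct wr *w)
{
    assert(r && w);
    if (r->head - r->tail == RING_SLOTS)
        return false;
    r->slot[r->head & (RING_SLOTS - 1)] = *w;
    r->head++;                       /* publish only after the slot is written */
    return true;
}

/* Consumer: called from the host polling loop; false when nothing is pending. */
static bool ring_poll(struct wr_ring *r, struct wr *out)
{
    assert(r && out);
    if (r->tail == r->head)
        return false;
    *out = r->slot[r->tail & (RING_SLOTS - 1)];
    r->tail++;
    return true;
}
```

Because nothing signals the consumer, something on the host has to call `ring_poll` repeatedly, which is why the commit requires polling mode.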
Arlin Davis [Wed, 6 Apr 2016 22:41:44 +0000 (15:41 -0700)]
mpxyd: m_req_event assert during large io streams, HST to MIC
When proxy-in (PI) WR queue is full and client is blocked on
new WR entries, the WR completion processing can
incorrectly reference a PI WR field after the client is
given remote access.
After data is forwarded to the appropriate MIC, the proxy
service will send a RW_imm WC message. This releases
the m_wr_rx entry for re-use by remote mcm provider client.
At the same time, the proxy can be processing the RW_imm
completion and incorrectly use the wr_rx->context field for
m_qp reference. Change the proxy_in event processing code
to avoid dependencies on any wr_rx content.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 6 Apr 2016 21:02:59 +0000 (14:02 -0700)]
mcm: HST->MXS IO streams can overrun MPXYD proxy-in WR queue
MPXYD proxy-in service cannot consume HST->MIC WR's fast
enough on 100Gb/s fabrics and from server based clients. This
results in post_send failing with DAT_INSUFFICIENT_RESOURCES.
Add retry mechanism, with limited retries, for the
host side mcm provider via dat_ep_post_send.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
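A bounded retry of this kind might look like the sketch below. The return codes, the retry limit, and the `post_fn` callback are hypothetical; `fake_post` is only a demo stand-in for the real post call, and a real provider would probably yield or back off between attempts.

```c
#include <assert.h>

#define POST_OK        0
#define POST_NO_RES    1   /* models DAT_INSUFFICIENT_RESOURCES */
#define POST_MAX_RETRY 5   /* hypothetical limit */

typedef int (*post_fn)(void *ctx);

/* Make one attempt, then retry up to max_retry times while the queue is full. */
static int post_with_retry(post_fn post, void *ctx, int max_retry)
{
    int ret, tries = 0;

    assert(post);
    do {
        ret = post(ctx);
    } while (ret == POST_NO_RES && ++tries <= max_retry);

    return ret;
}

/* Demo stand-in for the post call: reports a full queue *busy more times. */
static int fake_post(void *ctx)
{
    int *busy = ctx;

    if (*busy > 0) {
        (*busy)--;
        return POST_NO_RES;
    }
    return POST_OK;
}
```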
Arlin Davis [Thu, 17 Mar 2016 23:06:35 +0000 (16:06 -0700)]
scm: backward compatibility issue with MTU negotiation
The scm provider builds the CM reply message on the stack
and doesn't memset it to zero, so the resv fields are unknown.
The client therefore cannot check the mtu/resv field for MTU adjustments.
Bump provider CM message version to DCM_VER_MTU and add
check for appropriate version.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
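The version check described above can be illustrated as follows. `SKETCH_VER_MTU` is a placeholder value (the real value of DCM_VER_MTU is not shown in this log), and `struct cm_msg` is a trimmed-down, hypothetical message layout.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_VER_MTU 2   /* placeholder; the real DCM_VER_MTU value is not shown */

/* Hypothetical, trimmed-down CM message. */
struct cm_msg {
    uint16_t ver;
    uint8_t  mtu;   /* formerly a reserved field */
};

/* Trust the mtu field only when the peer's version says it is populated. */
static uint8_t cm_peer_mtu(const struct cm_msg *msg, uint8_t fallback)
{
    assert(msg);
    if (msg->ver < SKETCH_VER_MTU)
        return fallback;   /* older peer: the field may hold stack garbage */
    return msg->mtu;
}
```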
New MCM provider on MIC side needs to open in compat mode
with MTU set to 2048. It needs to allow proxy, if new, to
adjust to active MTU. If old proxy is on host side, 2048
is returned as normal and new MCM provider remains in
compat mode with MTU at 2048.
New proxy on host side needs to support an old version of
MCM provider and adjust MTU only if MIC side changes
dev_attr.mtu settings. It will bump up to active_MTU
only if the MCM provider is new and sets the MIX_OP_SET
bit on the mic->host proxy device open call.
Proxy open device MUST set new dev attributes in client SMD
device object and not in the shared MD device object since
there can be multiple clients with different attribute
settings from MIC side.
MCM provider MUST query and set up MTU in open instead of in query
so subsequent queries don't override the negotiated setting.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 10 Feb 2016 22:45:12 +0000 (14:45 -0800)]
openib_common: set providers mtu to active_mtu instead of 2048
Better out of the box performance when setting mtu to active_mtu
instead of the default setting of 2K. The new mtu settings are applied
on a per QP basis and negotiated via the CM mtu 8-bit field. One of the
reserved 8-bit CM message fields is used to ensure compatibility
with older versions.
If older endpoints are mixed with newer versions it will fall back to
the pre-existing 2K MTU settings, unless overridden by DAPL_IB_MTU.
The change has been made across all providers including ucm, scm, mcm,
and cma (rdma_cm). The mcm provider on a MIC will notify the CCL Proxy
service of a DAPL_IB_MTU override via a new MIX_OP_FLAGS bit
MIX_OP_MTU during the open call.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
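One way to picture the per-QP negotiation is the sketch below. It assumes the 8-bit field carries the verbs-style MTU encoding (1=256 up to 5=4096), that an older peer leaves the formerly reserved field at 0, and that the result is the smaller of the two values. These are assumptions for illustration, and `negotiate_mtu` and `mtu_bytes` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Encoded like the verbs MTU enum: 1=256, 2=512, 3=1024, 4=2048, 5=4096. */
#define MTU_2K 4

/* Assumption: an older peer leaves the formerly reserved field at 0. */
static uint8_t negotiate_mtu(uint8_t local, uint8_t remote)
{
    assert(local >= 1 && local <= 5);
    if (remote == 0 || remote > 5)
        return MTU_2K;                       /* fall back to the pre-existing 2K */
    return local < remote ? local : remote;  /* both ends must support the result */
}

/* Convert the encoded value to bytes. */
static unsigned mtu_bytes(uint8_t enc)
{
    assert(enc >= 1 && enc <= 5);
    return 128u << enc;
}
```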
Amir Hanania [Tue, 26 Jan 2016 22:03:16 +0000 (14:03 -0800)]
dtest: enhancement to test, -D option for data check
With -D option, dtest will run pingpong rdma write test
with data validation. Changes pattern during iterations.
Aborts and reports location/pattern with any miscompare.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
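A data check of this style could be sketched as below. The pattern function is an assumption chosen for illustration (the actual dtest pattern is not shown in this log); `fill_pattern` and `check_pattern` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill the buffer with a pattern that changes with the iteration number. */
static void fill_pattern(uint8_t *buf, size_t len, unsigned iter)
{
    assert(buf || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(iter + i);
}

/* Return the offset of the first miscompare, or -1 if the buffer matches. */
static long check_pattern(const uint8_t *buf, size_t len, unsigned iter)
{
    assert(buf || len == 0);
    for (size_t i = 0; i < len; i++)
        if (buf[i] != (uint8_t)(iter + i))
            return (long)i;
    return -1;
}
```

On a miscompare, the caller has both the offset and the iteration number, which is enough to report the location and expected pattern before aborting.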
Amir Hanania [Mon, 25 Jan 2016 20:30:38 +0000 (12:30 -0800)]
mcm: add support for Intel Omni-Path driver (hfi) via mic MFO mode
Set MIC based consumer to MFO (full offload) mode for both qib and new hfi devices.
Add dat.conf entries for hfi verbs support. This can be run from mic or host
endpoints.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Arlin Davis [Mon, 25 Jan 2016 19:51:33 +0000 (11:51 -0800)]
mpxyd: fix ordering issues with the CCL Proxy receive side forwarding mechanism
Unlike IB rdma writes, scif_writeto doesn't guarantee ordering on DMA posting.
Since CCL Proxy is emulating IB semantics we must preserve the order of
the rdma write requests from MIC consumers across any proxy scif operations.
Proxy-in now defers forwarding of RR completed segments
unless they are middle segments of a larger write operation. For a first
segment (FS) or last segment (LS), the previous scif_writeto DMA operations
must be completed and signaled before the segment is posted. The last segment
scif_writeto operation is ordered to ensure its last byte is the last byte of
the complete proxied rdma write operation.
On scif_wt errors, send a WC error status for each pending segment
of the rdma write operation for accurate proxy-out error processing.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
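The forwarding rule above reduces to a small predicate. This is a sketch of the decision only, with hypothetical flag names and a hypothetical counter of outstanding scif_writeto DMA operations; it does not model the proxy's actual data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define SEG_FIRST 0x1u   /* hypothetical flag for a first segment (FS) */
#define SEG_LAST  0x2u   /* hypothetical flag for a last segment (LS) */

/* Middle segments may be posted at once; a first or last segment must wait
 * until all previously posted DMA operations have completed. */
static bool can_post_segment(unsigned seg_flags, unsigned dma_outstanding)
{
    assert((seg_flags & ~(SEG_FIRST | SEG_LAST)) == 0);
    if (seg_flags & (SEG_FIRST | SEG_LAST))
        return dma_outstanding == 0;
    return true;
}
```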
Arlin Davis [Thu, 10 Dec 2015 22:48:05 +0000 (14:48 -0800)]
mpxyd: with abnormal CM termination a CM object can be referenced after QP destroy
The proxy-in CQ is not flushed and processed properly during
mix_qp_destroy. Depending on the EP mode there can be 2 separate
connections with multiple CQs to process. Add new mix_cq_flush
function that will flush all pending work on the TX and RX sides of
the proxy engine. The CM object is destroyed and reset only after all
pending work is processed on ALL endpoint CQ associations.
Add error logging when WR resources are exhausted.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Error caused by a cm_msg size compatibility issue between the new v8
protocol and older socket cm providers (2.1.4 and older).
The ucm, cma, and mcm providers are not affected.
Modify socket data sizes for SCM request/reply to interoperate
between new v8 with smaller private data and older protocols.
Adjust SCM reply/rtu based on remote CM version and retry a failed
request with pre-v8 adjusted size in case of server side failure.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
In function dapls_ib_qp_free(), pointers qp and cm_ptr->cm_id->qp point to the same qp
structure, initialized in function dapls_ib_qp_alloc(). The memory they reference is freed
twice in dapls_ib_qp_free(): first by rdma_destroy_qp() when _OPENIB_CMA is defined and
then again by ibv_destroy_qp(), causing a segmentation fault while freeing the qp. Therefore,
assign NULL to qp to avoid freeing illegal memory.
Fixes: 7ff4f840bf11 ("common: add CM-EP linking to support mutiple CM's and proper protection during
destruction")
Signed-off-by: Bharat Potnuri <bharat@chelsio.com> Acked-by: Arlin Davis <arlin.r.davis@intel.com>
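The aliasing problem and the fix can be reproduced in miniature as below. This is a generic sketch, not the dapls_ib_qp_free() code: `struct qp`, `struct cm_id`, `destroy_qp`, and `qp_free` are hypothetical, and `destroy_calls` exists only so the example can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct qp { int num; };
struct cm_id { struct qp *qp; };

static int destroy_calls;   /* demo counter */

static void destroy_qp(struct qp *qp)
{
    destroy_calls++;
    free(qp);
}

/* Destroy through the cm_id path first, then skip the direct path if both
 * pointers referred to the same object. */
static void qp_free(struct cm_id *id, struct qp **qp_ptr)
{
    struct qp *qp;

    assert(qp_ptr);
    qp = *qp_ptr;
    if (id && id->qp) {
        if (id->qp == qp)
            qp = NULL;        /* same object: do not destroy it twice */
        destroy_qp(id->qp);
        id->qp = NULL;
    }
    if (qp)
        destroy_qp(qp);
    *qp_ptr = NULL;
}
```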
Amir Hanania [Wed, 23 Sep 2015 21:43:38 +0000 (14:43 -0700)]
mpxyd: add P2P inline support for data size <= 96 bytes
Improve small message latency for proxy to proxy service
by including data with the proxy work request. Necessary
changes made to preserve order across WR's regardless
of size. Additional logging included. Improves single byte
one-way latency by about 27% on MFO configurations.
Changes made to avoid forwarding 0-byte rdma write to
scif_writeto, remove CPU hand copies, and order.
Changes for numa_node == -1 such that mic0 assumes MSS
and mic1 assumes MXS modes.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
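The inline decision can be sketched as follows. Only the 96-byte threshold comes from the commit; `struct proxy_wr` and `build_wr` are hypothetical, and treating 0-byte requests as non-inline is a choice made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define P2P_INLINE_MAX 96   /* from the commit: data size <= 96 bytes */

/* Hypothetical proxy WR with room for a small inline payload. */
struct proxy_wr {
    uint32_t len;
    uint8_t  is_inline;
    uint64_t addr;                    /* used when the data is not inline */
    uint8_t  data[P2P_INLINE_MAX];    /* used when it is */
};

/* Carry small payloads inside the WR itself; reference larger ones by address. */
static void build_wr(struct proxy_wr *wr, const void *buf, uint32_t len)
{
    assert(wr && (buf || len == 0));
    wr->len = len;
    wr->is_inline = (len > 0 && len <= P2P_INLINE_MAX);
    if (wr->is_inline) {
        memcpy(wr->data, buf, len);
        wr->addr = 0;
    } else {
        wr->addr = (uint64_t)(uintptr_t)buf;
    }
}
```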
Arlin Davis [Mon, 21 Sep 2015 22:48:15 +0000 (15:48 -0700)]
dtest: change rdma_write_ping_pong so client is always last receiver
The server always waits for the DREQ event after the test loops, so to
shut down gracefully the client should always receive the last handshake
message and issue the DREQ. Remove logging in loop.
Always init data and increase min rdma buffer size to 4KB.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Carol L Soto [Mon, 24 Aug 2015 19:58:58 +0000 (12:58 -0700)]
dapltest: dapltest with no argument not working in ppc64 arch
If dapltest is run with no args then the client was getting
Warning: conn_event_wait DAT_CONNECTION_EVENT_NON_PEER_REJECTED
Reference to RH1056487- dapltest Read and Write performance
tests are not working
Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
Arlin Davis [Wed, 12 Aug 2015 16:46:30 +0000 (09:46 -0700)]
mpxyd: proxy_in data transfers can improperly start before RTU received
Proxy-in data transfers must be deferred until RTU is received
and QP is in CONN state. Otherwise, the remote PI WC address/rkey
information is still uninitialized.
Check for initial CONN state before processing RR or WT data phase
and set RR to pause state until RTU and remote PI WRC information
is processed. Update pi_req_event error logging.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
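The pause-until-RTU behavior can be modeled as a small state check. The enums, `struct pi_qp`, and both functions are hypothetical names for illustration; the real proxy tracks considerably more state.

```c
#include <assert.h>
#include <stdbool.h>

enum qp_state { QP_INIT, QP_CONN };
enum rr_state { RR_IDLE, RR_PAUSED, RR_ACTIVE };

/* Hypothetical, trimmed-down proxy-in QP state. */
struct pi_qp {
    enum qp_state state;
    enum rr_state rr;
};

/* Start the data phase only in CONN state; otherwise park the request. */
static bool pi_start_rr(struct pi_qp *qp)
{
    assert(qp);
    if (qp->state != QP_CONN) {
        qp->rr = RR_PAUSED;   /* remote PI information is not yet valid */
        return false;
    }
    qp->rr = RR_ACTIVE;
    return true;
}

/* On RTU, enter CONN state and resume a request that was paused earlier. */
static bool pi_rtu_received(struct pi_qp *qp)
{
    assert(qp);
    qp->state = QP_CONN;
    return qp->rr == RR_PAUSED ? pi_start_rr(qp) : false;
}
```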
Amir Hanania [Wed, 5 Aug 2015 21:55:30 +0000 (14:55 -0700)]
mpxyd: add MFO support on proxy side
Add checking for MFO and MXS and provide proxy-in and proxy-out
services for each mode. MXS_EP check is now MXF_EP (MFO or MXS).
Add new MIX device open, query, port query, pz operations.
Add new pz list and object management via scif_dev structure.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Amir Hanania [Wed, 5 Aug 2015 20:41:32 +0000 (13:41 -0700)]
mcm: add MFO support to openib_common code base
Provide full proxy support of CQ, QP, PZ, MR and device.
Use new MXF_EP macro to switch proxy service based
on MXS (cross socket) or MFO (full offload) modes.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com> Signed-off-by: Amir Hanania <amir.hanania@intel.com>
Arlin Davis [Fri, 24 Jul 2015 23:01:29 +0000 (16:01 -0700)]
mcm: add intra-node support via ibscif device and mcm provider
- New device entry ofa-v2-scif0-m
- Support for different CM and EP locality (MIC vs proxy LID)
- MSS mode for all scif device opens via proxy
- logging changes for multi-lid options
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 14 Jul 2015 21:58:32 +0000 (14:58 -0700)]
mcm,mpxyd: fix dreq processing to defer QP flush when proxy WRs still pending
The proxy will now defer DREQ flushing of proxy QPs if PI and PO
data engines have outstanding requests. Add mcm_qp_busy routine
for checking PI and PO data engines. When MIC calls disconnect
always send DREQ up to proxy in order to handle deferred flush
of proxy side posted rcv messages.
Change QP free to modify both local and proxy QPs and check for
outstanding rcv message before qp_destroy to avoid infinite wait
in dapls_ep_flush_cqs.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
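The deferred flush can be sketched as below. `qp_busy` is loosely modeled on the mcm_qp_busy routine named in the commit, but the structure, counters, and handlers here are hypothetical and only the proxy-out completion path is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down proxy QP state. */
struct proxy_qp {
    unsigned pi_pending;    /* outstanding proxy-in requests */
    unsigned po_pending;    /* outstanding proxy-out requests */
    bool dreq_deferred;
    bool flushed;
};

/* True while either data engine still has outstanding requests. */
static bool qp_busy(const struct proxy_qp *qp)
{
    assert(qp);
    return qp->pi_pending > 0 || qp->po_pending > 0;
}

/* On DREQ: flush immediately if idle, otherwise remember to flush later. */
static void on_dreq(struct proxy_qp *qp)
{
    if (qp_busy(qp))
        qp->dreq_deferred = true;
    else
        qp->flushed = true;
}

/* On a proxy-out completion: run the deferred flush once the QP drains. */
static void on_po_complete(struct proxy_qp *qp)
{
    assert(qp && qp->po_pending > 0);
    qp->po_pending--;
    if (qp->dreq_deferred && !qp_busy(qp)) {
        qp->dreq_deferred = false;
        qp->flushed = true;
    }
}
```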
Amir Hanania [Wed, 17 Jun 2015 17:12:24 +0000 (10:12 -0700)]
mcm: bug fixes for non-inline devices
mcm proxy mi_send_pi set up the registered WR structure properly for no
inline data support but incorrectly overwrote sg.addr with the
WR structure on the stack.
qp create didn't check for no inline and set up the create accordingly.
Signed-off-by: Amir Hanania <amir.hanania@intel.com> Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 5 Jun 2015 19:14:37 +0000 (12:14 -0700)]
mpxyd,mcm: RDMA write with immed data not signaled on request side
With eager completions set, the wc_flags is not set properly on the event.
With eager completions not set, the proxy CQ reference is incorrect
and the event is forwarded to the MCM receive EVD instead of the transmit EVD.
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>