git.openfabrics.org - ~ardavis/dapl.git/log
~ardavis/dapl.git
Arlin Davis [Tue, 18 Feb 2014 22:47:02 +0000 (14:47 -0800)]
dtest: add times for open_query, remove sleep

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 18 Feb 2014 22:45:18 +0000 (14:45 -0800)]
common: add provider name and len to DTO error logging

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 12 Feb 2014 22:55:25 +0000 (14:55 -0800)]
new lightweight open_query/close_query IB extension for fast attribute query

Consumers that need provider attributes must do a full device open
in order to get any provider/device information. With so many static device
entries in /etc/dat.conf consumers are building classification
mechanisms to identify provider type, locality, name, device
mode, and decide which device is appropriate. The existing DAT interface
doesn't provide a lightweight mechanism for queries.

The following fast query functions have been added to dat_ib_extensions.h:

dat_ib_open_query(name, ia_handle, ia_mask, ia_attr, prov_mask, prov_attr)
dat_ib_close_query(ia_handle)

In addition, DAT extension interface, dat_extension_op, has been
expanded to include new internal calls to handle quick provider load
and function linkage via udat_extension_open, and udat_extension_close
functions. Extended operations needing DAT open/close services need
to be defined from a DAT_OPEN_EXTENSION_BASE or DAT_CLOSE_EXTENSION_BASE
respectively.

NOTE: The ia_handle returned by open_query must be closed with a subsequent
close_query and must not be used with any other dat_ia_ operations. Attribute
storage from open_query is not valid after the close_query call.

The IB extensions have been rolled to version 2.0.8 with this new API.
The changes are backward compatible.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 12 Feb 2014 21:41:37 +0000 (13:41 -0800)]
mpxyd: need CM to QP linking with CM references

Complete coding support for ref_cnt on CM to allow for
proper destruction of CM resources. Ref count for CM alloc,
QP linking, and queue list. List dequeue will trigger CM
free, move to destroy state, and dealloc if ref_cnt is zero.
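
The reference counting described above can be sketched as follows. This is a
minimal illustration, not the mpxyd code: struct cm_obj, cm_alloc, cm_get,
cm_put, and cm_dequeue are hypothetical names, and the assumption is that
allocation, QP linking, and list queueing each hold one reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a CM object; not the mpxyd structure. */
enum cm_state { CM_ACTIVE, CM_DESTROY };

struct cm_obj {
    int ref_cnt;            /* one reference per holder */
    enum cm_state state;
};

/* Allocation takes the first reference. */
static struct cm_obj *cm_alloc(void)
{
    struct cm_obj *cm = calloc(1, sizeof(*cm));

    if (cm != NULL) {
        cm->ref_cnt = 1;
        cm->state = CM_ACTIVE;
    }
    return cm;
}

/* QP linking and list queueing each take an extra reference. */
static void cm_get(struct cm_obj *cm)
{
    assert(cm->ref_cnt > 0);
    cm->ref_cnt++;
}

/* Drop one reference; returns 1 if the object was deallocated. */
static int cm_put(struct cm_obj *cm)
{
    assert(cm->ref_cnt > 0);
    if (--cm->ref_cnt > 0)
        return 0;
    free(cm);
    return 1;
}

/* List dequeue moves the object to the destroy state and drops the
 * list reference; memory is released only when no holder remains. */
static int cm_dequeue(struct cm_obj *cm)
{
    cm->state = CM_DESTROY;
    return cm_put(cm);
}
```

With three holders (alloc, QP link, list), the object survives the dequeue and
the QP unlink and is released only by the final cm_put.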

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 11 Feb 2014 22:31:49 +0000 (14:31 -0800)]
dist: ib collective and MIC extension include files missing

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 11 Feb 2014 18:19:05 +0000 (10:19 -0800)]
dapltest: the quit command is missing changes for -n option.

Server-port was not being set properly during param init phase on the client side.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 11 Feb 2014 18:17:04 +0000 (10:17 -0800)]
NULL undefined on Fedora, incorrectly using kernel stddef.h

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 10 Feb 2014 17:45:35 +0000 (09:45 -0800)]
Merge branch 'proxy' of ssh://beany.openfabrics.org/home/ardavis/scm/dapl into proxy

Arlin Davis [Tue, 4 Feb 2014 03:17:33 +0000 (19:17 -0800)]
ucm: fix CM service, initial rcv msg posts incorrect

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 03:15:57 +0000 (19:15 -0800)]
ucm: add/cleanup debug log information

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 03:14:31 +0000 (19:14 -0800)]
scm: add/cleanup debug log information

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 03:13:01 +0000 (19:13 -0800)]
makefile: update for MCM proxy-in changes

add mpxyd.h to dist files
separate functionality into multiple source files,
util.c, mix.c, mcm.c, mpxy_out.c, mpxy_in.c

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 03:12:06 +0000 (19:12 -0800)]
dtest: update for ep_mode on MCM providers

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 03:09:44 +0000 (19:09 -0800)]
mpxyd.conf: updated for proxy-in parameters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 02:58:04 +0000 (18:58 -0800)]
mpxyd: proxy-in added to proxy-out service to increase cross socket performance

Proxy-in service added to MCM dapl providers to
improve cross socket MIC adapter performance.

Additional RX thread created to handle PI service,
new CM wire protocol to exchange WR and WC references,
and new DTO wire protocol to Read remote PO data
segments and forward via SCIF writeto.

In order to maintain DAT API compatibility the IB
MR addr, rkeys are translated to SCIF addresses
and TPT entries created on the MPXYD to handle inbound
rdma writes targeted to MIC adapters.

Code broken out into separate source files:
mpxy_in.c - proxy_in service
mpxy_out.c - proxy_out service
util.c - general utilities
mix.c - MIC to HOST operations
mpxyd.c - device open, RX, TX, OP, CM threads.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 02:49:07 +0000 (18:49 -0800)]
mcm: add proxy in support to MCM provider and MPXYD interface

Add dapli_mix_post_recv, dapli_mix_mr_create, dapli_mix_mr_free
no QPr exist on MIC with MXS to MXS connections
cm addr becomes addr1, save all QPr addr1 info during rejects
verify CM service exists before freeing port space, could be on mpxyd
system guid support to verify locality to inside/outside the box
change UD mode checking on EP instead of QP, QP doesn't exist on MXS
add reject support
Fix for CM service RX posting, walking queue doesn't include GRH.
Add system_guid field to MCM provider ib_hca_transport struct

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 02:37:34 +0000 (18:37 -0800)]
open_ib common: qp, cq, and post_recv changes for proxy-in

Modify common QP, CQ, and DTO services to support proxy-in
service that eliminates the need for local QP and CQ resources
on the MIC adapter.

Change WR UD type check to support no QP mode.
Add dapli_mix_post_recv functionality for PI, QPr on mpxyd.
Store platform unique guid for EP locality - inside/outside.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 02:31:31 +0000 (18:31 -0800)]
common: add lmr support for proxy in service

Registration details must be transferred to proxy service
to enable proxy-in data transfers. IB registration
and SCIF registration is sent to mpxyd for inbound
rdma write TPT services for IB RW store and SCIF writeto
forward capabilities. Extend DAT LMR to include
scif information and ID. If proxy service is
in use call new functions dapli_mix_mr_create/free
to sync with mpxyd.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 4 Feb 2014 02:26:53 +0000 (18:26 -0800)]
new definitions and states for CCL Proxy-in support

MCM proxy data limits, new CM free state, EP mode support for EP locality
New ep mapping field in address structure
New MIX ops mix_recv and mix_cm_reject_user
Expanded MIX ops mr structure to include IB and SCIF details
Changed dat_mix_send struct name to dat_mix_sr for send and recv

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 5 Dec 2013 17:43:44 +0000 (09:43 -0800)]
Release dapl-2.0.39.1-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
pmmccorm [Fri, 22 Nov 2013 20:31:34 +0000 (12:31 -0800)]
Rename back to dapl, small fixes for spec file

Arlin Davis [Wed, 25 Sep 2013 22:14:19 +0000 (15:14 -0700)]
Release intel-mic-ofed-dapl-2.0.36.12-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 25 Sep 2013 22:10:56 +0000 (15:10 -0700)]
ucm, scm: UD mode triggers list_head assert with large scale alltoall test

1024+ ranks, IMB alltoall may hit assert when running Intel MPI in UD mode.

CR clean up was implemented with EP to CR references still linked.
During cr_accept, the CR remote_ia_address is linked to EP object
by mistake with UD mode. UD mode may have multiple CRs per EP so
no direct mappings to CR memory can exist unless RC mode which
always has one EP to CR mapping.

In scm, ucm: for CM object free with CR references the search and
unlinking from SP must be under SP lock to serialize. Also,
cleanup thread wakeup logic to only trigger the thread if
reference count indicates the need for more processing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 13 Sep 2013 22:12:05 +0000 (15:12 -0700)]
mpxyd: ERR: stalled, insufficient proxy memory

When scaling up/out with lots of QP's using shared
proxy buffer the rdma writes can block waiting for
memory to free. The signal rate on the posted
writes must be reduced to ensure proxy buffers
are freed in a more timely manner.

Add logic to return failure if stalling becomes
excessive.

Allow administrator to adjust IB mcm_signal_rate
via mpxyd.conf. Default is now 10 instead of 100.
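
As a rough illustration of what a signal rate means here, the sketch below
requests a completion on every Nth posted write. The function name
wr_needs_signal and the counting scheme are assumptions for illustration;
how mpxyd actually applies mcm_signal_rate is not shown in this log.

```c
#include <assert.h>

#define MCM_SIGNAL_RATE_DEFAULT 10  /* default per the commit message */

/* Returns 1 if the wr_index-th posted write (counting from 1) should
 * request a completion, so buffers behind it can be reclaimed. */
static int wr_needs_signal(unsigned long wr_index, unsigned int signal_rate)
{
    assert(wr_index > 0);
    if (signal_rate == 0)
        signal_rate = 1;    /* guard against a zero setting */
    return (wr_index % signal_rate) == 0;
}
```

Under this scheme a rate of 10 yields a completion, and therefore a chance to
free proxy buffer space, ten times as often as a rate of 100.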

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 12 Sep 2013 21:03:58 +0000 (14:03 -0700)]
mpxyd: handle catastrophic IB async events, including IBV_EVENT_LID_CHANGE

cleanup mdev destroy functions, use mcm_ib_async_str for all IB events.
Destroy all mdev resources, including CM services, and abort all
open clients when receiving the following IB async events:

IBV_EVENT_PATH_MIG
IBV_EVENT_PATH_MIG_ERR
IBV_EVENT_DEVICE_FATAL
IBV_EVENT_PORT_ERR
IBV_EVENT_LID_CHANGE
IBV_EVENT_PKEY_CHANGE
IBV_EVENT_SM_CHANGE

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 12 Sep 2013 21:01:05 +0000 (14:01 -0700)]
mcm: add get/set prov attributes op, str print for ib async events

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 12 Sep 2013 16:12:55 +0000 (09:12 -0700)]
mcm: reduce max qp depth and msg size in proxy mode, allow override

DAPL_MCM_WR_MAX is used to set max qp depth on mcm provider, default=500
DAPL_MCM_MSG_MAX is used to set max msg size on mcm provider, default=8388608
DAPL_WR_MAX is used to override max qp depth on all IB providers.
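
A minimal sketch of reading such an override from the environment with a
fallback default. The helper names parse_positive and env_or_default are
hypothetical, and the precedence between DAPL_WR_MAX and DAPL_MCM_WR_MAX is
not modeled because the log does not state it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Parse a positive integer; fall back to dflt on missing or invalid input. */
static long parse_positive(const char *val, long dflt)
{
    char *end;
    long n;

    if (val == NULL || *val == '\0')
        return dflt;
    errno = 0;
    n = strtol(val, &end, 0);   /* base 0 accepts decimal, hex, octal */
    if (errno != 0 || *end != '\0' || n <= 0)
        return dflt;
    return n;
}

/* Look up an environment variable and apply the default if unset. */
static long env_or_default(const char *name, long dflt)
{
    assert(name != NULL);
    return parse_positive(getenv(name), dflt);
}
```

For example, env_or_default("DAPL_MCM_WR_MAX", 500) and
env_or_default("DAPL_MCM_MSG_MAX", 8388608) would apply the defaults quoted
above whenever the variables are unset or malformed.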

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 11 Sep 2013 22:04:37 +0000 (15:04 -0700)]
mpxyd: CM_REPLY: RETRIES (7) EXHAUSTED

The client's RTU is not processed by the mpxyd thread in corner cases.
The SCIF EP, handling the client cm thread (scif_ev_ep) operations,
was not added to select FD set so the op_thread didn't wake up in the
case where RTU's were sent on scif_ev_ep and no operations are
being sent on scif_op_ep.
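
The class of bug described above can be illustrated with plain file
descriptors: a thread that waits in select() only wakes for descriptors that
were added to the set. The sketch below is an assumption-laden stand-in, not
the mpxyd code; wait_op_or_ev is a hypothetical name and ordinary descriptors
take the place of the SCIF endpoints.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>     /* pipe(), write() for the usage described below */

/* Wait until either descriptor is readable.
 * Returns a bit mask: 0x1 = op channel readable, 0x2 = ev channel
 * readable, 0 on timeout, -1 on error. */
static int wait_op_or_ev(int op_fd, int ev_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int nfds, ret, mask = 0;

    assert(op_fd >= 0 && ev_fd >= 0);
    FD_ZERO(&rfds);
    FD_SET(op_fd, &rfds);
    FD_SET(ev_fd, &rfds);   /* omitting this line reproduces the stall */
    nfds = (op_fd > ev_fd ? op_fd : ev_fd) + 1;

    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ret = select(nfds, &rfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    if (FD_ISSET(op_fd, &rfds))
        mask |= 0x1;
    if (FD_ISSET(ev_fd, &rfds))
        mask |= 0x2;
    return mask;
}
```

With two pipes standing in for the channels, writing a byte to the ev pipe
makes the call return 0x2 even though the op pipe stays idle.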

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 11 Sep 2013 18:56:09 +0000 (11:56 -0700)]
mpxyd: reduce default proxy buffer and max message size

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 10 Sep 2013 16:26:17 +0000 (09:26 -0700)]
mpxyd: set eager completion on by default

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 10 Sep 2013 16:19:18 +0000 (09:19 -0700)]
common: cleanup async event processing and logging

Add formatted string print for ib verbs async events
Remove unnecessary logging and duplicate async callbacks
Modify all IB providers to use dapli_async_event_cb()

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 9 Aug 2013 18:18:04 +0000 (11:18 -0700)]
Release intel-mic-ofed-dapl-2.0.36.11-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 9 Aug 2013 18:14:50 +0000 (11:14 -0700)]
allow DAPL_DBG_TYPE settings between device opens

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 8 Aug 2013 23:42:04 +0000 (16:42 -0700)]
mpxyd: add warning message and enable counters for CQ/QP

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 6 Aug 2013 19:54:38 +0000 (12:54 -0700)]
mpxyd: add new logging levels, bit mapped for better control

 log_level:
 Indicates the amount of detailed data written to the log file.  Log levels
 are bit mapped as follows: 0xf for full verbose

 0x0 - errors always reported
 0x1 - warnings
 0x2 - cm operations
 0x4 - data operations
 0x8 - all other operations

default is still 0 == errors only
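
A bit-mapped level like the one above is typically tested with a mask. The
sketch below shows one way to do that; the MLOG_* names and log_enabled are
hypothetical and not taken from mpxyd.

```c
#include <assert.h>

/* Hypothetical names for the bit values listed in the commit message. */
enum {
    MLOG_ERR  = 0x0,    /* errors are always reported */
    MLOG_WARN = 0x1,
    MLOG_CM   = 0x2,
    MLOG_DTO  = 0x4,
    MLOG_ALL  = 0x8     /* all other operations */
};

/* Returns 1 if a message of class msg_level should be written
 * under the configured log_level mask. */
static int log_enabled(unsigned int log_level, unsigned int msg_level)
{
    assert(msg_level <= MLOG_ALL);
    if (msg_level == MLOG_ERR)
        return 1;
    return (log_level & msg_level) != 0;
}
```

A setting of 0x3 would then enable warnings and cm operations while keeping
data operations quiet, and 0xf enables everything.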

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 5 Aug 2013 21:55:12 +0000 (14:55 -0700)]
mpxyd: duplicate cm_req check across multi devices is wrong, causing incorrect reject on retries

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 5 Aug 2013 20:52:37 +0000 (13:52 -0700)]
mpxyd: segfault with NO LISTENER case, null smd-> on reject

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Jul 2013 22:42:27 +0000 (15:42 -0700)]
Release intel-mic-ofed-dapl-2.0.36.10-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Jul 2013 07:09:09 +0000 (00:09 -0700)]
mpxyd: roll version of mcm command messages to v4

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Jul 2013 07:06:20 +0000 (00:06 -0700)]
mpxyd: return cmd response on device open failure

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Jul 2013 07:05:13 +0000 (00:05 -0700)]
mcm: add debug info for device open failures

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Jul 2013 16:08:50 +0000 (09:08 -0700)]
mcm,mpxyd: pack all proxy command structures

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Jul 2013 16:06:01 +0000 (09:06 -0700)]
mpxyd: return cq_id on create_cq proxy command

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 25 Jul 2013 16:05:06 +0000 (09:05 -0700)]
mcm: convert IB wr to MIX wr on proxy post_send

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 24 Jun 2013 21:19:22 +0000 (14:19 -0700)]
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM

Signed-off-by: Matthew Finlay <matt@mellanox.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 18 Jul 2013 17:17:11 +0000 (10:17 -0700)]
mcm: support incompatible verbs definitions inter-node within the platform

OFA verbs 3.5 and 1.5.4 are incompatible so there can be no direct
mappings to verbs within any MIC to Host communications. Remove
all direct verbs mappings in MIX and create inline construct functions
to convert verbs to new dat_mix_wr and dat_mix_wc types for both work requests
and work completions.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)

Modify all tests and commands to take a new -n parameter option for server
listen port. The default port, when running multiple EP's and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that needs cleaned up

After connection is established and the AH is provided to consumer
on UD connect establishment there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.

Change ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP and CONN_EST state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 26 Jun 2013 23:58:12 +0000 (16:58 -0700)]
Release intel-mic-ofed-dapl-2.0.36.9-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 26 Jun 2013 23:45:12 +0000 (16:45 -0700)]
mpxyd: fix build with/without scif, init.d directory conflict, and missing -lpthread

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 21 Jun 2013 20:43:17 +0000 (13:43 -0700)]
mpxyd: add support for dynamic affinity support

Add query feature via mic sysfs files numa_node and local_cpulist
for proper thread bindings - host to device.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Fri, 21 Jun 2013 20:41:00 +0000 (13:41 -0700)]
config: update dat.conf with more consistent naming conventions for device/provider types

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 20 Jun 2013 18:13:07 +0000 (11:13 -0700)]
initialize new DAPL_DBG_LEVEL default to zero

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 20 Jun 2013 18:10:54 +0000 (11:10 -0700)]
add DAPL_DBG_LEVEL for more debug log control

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 12 Jun 2013 16:45:45 +0000 (09:45 -0700)]
cma: long delays when opening cma provider with no IPoIB configured

The rdma_cm provider (ofa-v2-ib0) can take netdev, ip address, or hostname
for local address bindings. When trying to open a non-existent netdev (ib0)
the provider will fall through and use the getaddrinfo sys call assuming
dat.conf parameter is either an IP address or hostname and not a netdev.

When trying hostname option it will attempt to resolve the name via the
name services. On a KNC this can result in long timeouts depending on the
configuration. This changes the error handling when opening the cma provider
on a non-existent netdev and will only call getaddrinfo with AI_CANONNAME
hints after checking the dat.conf parameter for a valid hostname.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 5 Jun 2013 22:40:19 +0000 (15:40 -0700)]
Release intel-mic-ofed-dapl-2.0.36.8-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 5 Jun 2013 22:22:48 +0000 (15:22 -0700)]
update README.mcm

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 5 Jun 2013 22:21:09 +0000 (15:21 -0700)]
config: set eager_completion default setting to disabled

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 5 Jun 2013 22:16:35 +0000 (15:16 -0700)]
config: add mlx5 entries to dat.conf

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 23:12:09 +0000 (16:12 -0700)]
mpxyd: only dump cm_lists with debug build

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 23:00:32 +0000 (16:00 -0700)]
mpxyd: CM optimizations for MIC clients, improved checking on inbound CM messages

allow CM operations to be received on OP or EV channels from
MIC clients and provide each SMD channel with aligned message buffer
for scif_recv processing.

add checking for NO match at MD level after checking all SMD children
for inbound CM message match and add dump_cm_lists function for debug.

add check for inline message threshold, DAT_MIX_INLINE_MAX

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 22:36:29 +0000 (15:36 -0700)]
mcm: scif_recv err on mpxyd when scaling up on MPI IMB scatter benchmark

The inline send changes incorporated fragmented scif_send options which
de-serialized the stream operation on the scif endpoint. This can result
in a CM operation from the CM thread to interleave with the post_send
inline operation that sends a hdr and inline data separately.

Modify the post_send to use only one scif_send operation for inline.
Also optimize CM and Operations by moving all CM message to the
scif_ev_ep. Cleanup operation log messages to include op strings
for easier debug.
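
One way to keep a header and its inline payload from being interleaved with
another thread's message is to build both into one contiguous buffer and issue
a single send. The sketch below shows only the packing step; struct msg_hdr
and pack_inline are hypothetical, the 256-byte limit is borrowed from the
inline threshold commit in this log, and the actual send call is left to the
caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INLINE_MAX 256      /* inline threshold named elsewhere in this log */

struct msg_hdr {            /* hypothetical header, not the MIX wire format */
    uint32_t op;
    uint32_t len;
};

/* Copy header and payload into one contiguous buffer so the caller can
 * hand it to a single send call. Returns total bytes, or 0 if the
 * payload exceeds the inline limit or the buffer is too small. */
static size_t pack_inline(uint8_t *buf, size_t buf_len, uint32_t op,
                          const void *data, uint32_t data_len)
{
    struct msg_hdr hdr;

    assert(buf != NULL);
    if (data_len > INLINE_MAX || buf_len < sizeof(hdr) + data_len)
        return 0;
    hdr.op = op;
    hdr.len = data_len;
    memcpy(buf, &hdr, sizeof(hdr));
    if (data_len > 0)
        memcpy(buf + sizeof(hdr), data, data_len);
    return sizeof(hdr) + data_len;
}
```

Because the stream sees one write per message, a CM message sent from another
thread can land before or after it but not between header and data.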

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 22:14:18 +0000 (15:14 -0700)]
mcm: add inline data threshold definition, 256 bytes

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 29 May 2013 17:31:34 +0000 (10:31 -0700)]
common: change debug level of EP free warning

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 28 May 2013 23:20:36 +0000 (16:20 -0700)]
common: increase DCM RTU time from 400 to 800ms

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 28 May 2013 23:03:53 +0000 (16:03 -0700)]
common: increase DCM retry from 1 to 5

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 28 May 2013 23:02:29 +0000 (16:02 -0700)]
common: increase level of ASYNC error logging, include in free builds

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 28 May 2013 22:59:39 +0000 (15:59 -0700)]
mcm: improved logging with CM retry/poll errors

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 22 May 2013 21:27:54 +0000 (14:27 -0700)]
mpxyd: MPI IMB scatter on 12 ranks, 2 KNCs + 2 Hosts, fails

multiple QP's processing multiple completions
hit bug in mix_dto_event when copying multiple
cq wc entries into a single dat_mix_dto_comp_t msg.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 21 May 2013 21:11:07 +0000 (14:11 -0700)]
mcm: segfault with counters, no CM object when not found and duplicate case

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
pmmccorm [Mon, 13 May 2013 21:03:04 +0000 (14:03 -0700)]
Enable ccl-proxy support if possible by default:
yes, if nothing specified and scif.h is present
no, if scif.h is not present and nothing specified
no, if --enable-mcm=no is specified
yes, if --enable-mcm=yes and scif.h is present
error, if --enable-mcm=yes and scif.h missing

Make the corresponding changes to the spec file so that whatever
options are specified, the RPM will contain the right files (before
we were shipping the mpxyd service and conf regardless).

pmmccorm [Mon, 13 May 2013 20:47:36 +0000 (13:47 -0700)]
clean up configure.in and make consistent patch

Arlin Davis [Fri, 17 May 2013 19:33:26 +0000 (12:33 -0700)]
update for dapl-2.0.36.7

Arlin Davis [Fri, 17 May 2013 19:27:36 +0000 (12:27 -0700)]
SCM: getifaddrs modifications for better out of the box experience with MIC

socket cm will now walk list of interfaces and ignore loopback
and ignore IB devices, unless the IB netdev is the only device.
Works better in a heterogeneous environment with a mix of MICs.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Tue, 7 May 2013 21:56:39 +0000 (14:56 -0700)]
mpxyd: mr_create and mr_free not returning error

change mr_create and mr_free to return error
to MIC client if incorrectly called. Unsupported
feature at this time.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Apr 2013 19:00:39 +0000 (12:00 -0700)]
mpxyd: cleanup port space, qp and cm objects

Port space leak during close, and CM disconnect.
Changes to link and unlink CM and QP during
QP create/destruction and CM disconnect states.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 29 Apr 2013 18:55:17 +0000 (11:55 -0700)]
mcm: memory leak of scif EP's, cleanup listen errors

The new ev and tx EP's created for performance
were not destroyed properly during close.

Listen returned incorrect error instead of EADDRINUSE
so consumers didn't retry appropriately.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 22:12:34 +0000 (15:12 -0700)]
dapltest: fix endian adjustments for different platform types

if local and remote endpoints are different endian then swap meminfo
and key information for RDMA. Was only swapping big endian side.
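
The rule described above, swapping only when the two sides differ and doing so
regardless of which side is big endian, can be sketched as follows. The
function names are hypothetical and this is not the dapltest code.

```c
#include <assert.h>
#include <stdint.h>

/* Detect local byte order at run time. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;

    return *(const uint8_t *)&probe == 1;
}

/* Reverse the byte order of a 64-bit value. */
static uint64_t swap64(uint64_t v)
{
    uint64_t r = 0;
    int i;

    for (i = 0; i < 8; i++) {
        r = (r << 8) | (v & 0xFF);
        v >>= 8;
    }
    return r;
}

/* Convert a value received from the peer into local byte order.
 * Swaps only when the two sides differ, whichever side is big endian. */
static uint64_t from_remote64(uint64_t v, int remote_is_little_endian)
{
    assert(remote_is_little_endian == 0 || remote_is_little_endian == 1);
    if (remote_is_little_endian == host_is_little_endian())
        return v;
    return swap64(v);
}
```

The same check would apply to each exchanged field, such as the memory address
and key, before it is used for RDMA.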

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 17:53:03 +0000 (10:53 -0700)]
build/packaging: auto scan cleanup

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 17:43:51 +0000 (10:43 -0700)]
mcm: Correct some error logging, avoids err=Success messages when failing to open a device

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 17:19:08 +0000 (10:19 -0700)]
mpxyd: allow separate CPU affinity bindings for MIC and IB

allow configuration of different CPU bindings as follows:
mcm_affinity_base_hca 1
mcm_affinity_base_mic 8

Note: when set to 0, mpxyd will dynamically set affinity
based on locality of HCA and MIC adapter specified
during device open.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 17:05:21 +0000 (10:05 -0700)]
dtest: set rdma_read_in/out attributes to 0 on write only

when running in write_only mode, create the EP with rdma read
attributes set to 0.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Wed, 17 Apr 2013 16:57:01 +0000 (09:57 -0700)]
dat.conf: add new device definitions for 2nd mlx4 adapter

add mlx4_1 entries for ucm, scm, and mcm providers
in dat.conf. New entries appended to existing list for
backward compatibility.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 15 Apr 2013 18:58:59 +0000 (11:58 -0700)]
mcm: returning device rdma read depth of 0 causes MPI to fail

return the RDMA read support via provider query and simply return
the rdma_read values from device via dat_ia_query/dat_ep_query.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 15 Apr 2013 18:56:36 +0000 (11:56 -0700)]
common: add error logging on ep_create attribute checking

add logging to help distinguish between transport and general EP
attribute failures for ARG6.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Thu, 11 Apr 2013 22:34:57 +0000 (15:34 -0700)]
cma: user reject is IB specific, should be transport agnostic

remove check for IB type, private data is enough
context for user specific reject type.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 25 Mar 2013 17:24:54 +0000 (10:24 -0700)]
openib: add new provider specific attributes

DAT_IB_PROVIDER_NAME = MCM/UCM/CMA/SCM
DAT_IB_DEVICE_NAME = mlx4_0/scif0/ipath0/etc
DAT_IB_CONNECTIVITY_MODE = DIRECT/PROXY
DAT_IB_RDMA_READ = TRUE/FALSE
DAT_IB_NODE_GUID = xxxx:xxxx:xxxx:xxxx

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
Arlin Davis [Mon, 25 Mar 2013 17:18:05 +0000 (10:18 -0700)]
mpxyd: add support for full work request or memory pool

Current implementation fails when the WR queue or memory pool is full.
Change to throttle and retry mix post_send operations while the work queue
is full. New wr_pp (post pending) added to m_qp for tracking outstanding
IB work requests in flight.

Add counters for full wr and mem pool cases. Print mix-version on
startup.
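
The throttling idea can be sketched as a pending counter that gates new
posts. This is a minimal illustration, not the mpxyd code: struct qp_state,
wr_max, try_post and on_completion are hypothetical names, and only wr_pp
comes from the description above.

```c
#include <stdbool.h>

/* Hypothetical queue state; only wr_pp is named in the commit message. */
struct qp_state {
    int wr_max; /* work queue depth (assumed field) */
    int wr_pp;  /* work requests posted and still in flight */
};

/* Try to post one work request. Returns false when the queue is full,
 * so the caller can hold the request and retry after a completion. */
bool try_post(struct qp_state *qp)
{
    if (qp->wr_pp >= qp->wr_max)
        return false;
    qp->wr_pp++;
    return true;
}

/* Called when a completion is reaped; frees one slot for a retry. */
void on_completion(struct qp_state *qp)
{
    if (qp->wr_pp > 0)
        qp->wr_pp--;
}
```

A caller that gets false back would keep the request queued and call
try_post again once on_completion has run, rather than failing the send.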

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agoRelease intel-mic-ofed-dapl-2.0.36.7-1
Arlin Davis [Mon, 18 Mar 2013 18:49:50 +0000 (11:49 -0700)]
Release intel-mic-ofed-dapl-2.0.36.7-1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agompxyd: reducing logging and add configuration for counters
Arlin Davis [Mon, 18 Mar 2013 18:46:16 +0000 (11:46 -0700)]
mpxyd: reducing logging and add configuration for counters

reduce logging level of resource allocation to reduce noise.
mcm_counters setting added to mpxyd.conf to control counters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agoconfig: add dat.conf entry for RDMA CM provider for IB SCIF
Arlin Davis [Fri, 15 Mar 2013 23:22:46 +0000 (16:22 -0700)]
config: add dat.conf entry for RDMA CM provider for IB SCIF

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agompxyd: mpxyd does not start on SLES11SP2
Arlin Davis [Fri, 15 Mar 2013 21:55:19 +0000 (14:55 -0700)]
mpxyd: mpxyd does not start on SLES11SP2

add checking for SUSE and RH and process
accordingly.

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agompxyd: Intel MPI Library functional tests failed with CCL-proxy enabled
Arlin Davis [Fri, 15 Mar 2013 21:45:35 +0000 (14:45 -0700)]
mpxyd: Intel MPI Library functional tests failed with CCL-proxy enabled

fix for too many open files on SCIF. CCL proxy missed cleanup
of new scif_ev_ep for CM processing. Every MIC client open/close
leaked a scif EP.

Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agodtest: fix test to ack data before time stamp for accurate perf results
Arlin Davis [Fri, 15 Mar 2013 21:36:21 +0000 (14:36 -0700)]
dtest: fix test to ack data before time stamp for accurate perf results

Add rdma_write with immediate data on last message and returned message to
ensure all data is received. The existing write time was not accurate.

Fix signaling rate support, default = 10.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agompxyd: add inline support, eager completion, improve proxy resource management
Arlin Davis [Fri, 15 Mar 2013 21:27:17 +0000 (14:27 -0700)]
mpxyd: add inline support, eager completion, improve proxy resource management

Add inline support for MIX and IB dma channels

Add eager completion, configurable, to signal writes or sends
after scif_readfrom is signaled and all data is local to proxy
instead of waiting for IB signal. User data on MIC is available
for reuse.

Combine sends and writes into the mix_post_send command and provide
ordering guarantees between inline and dma data. Allows
direct posting from OP thread if at head of queue.

Add new counters for inline and signaled IO.

Extend m_wr to include flags for controlling eager completions
and proxy buffer and work request management.

cq event FD is now non-blocking and processed via TX thread
instead of OP thread. Allows for polling > 1 event at a time.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agompxyd config: add new options for inline and eager completions
Arlin Davis [Fri, 15 Mar 2013 21:17:24 +0000 (14:17 -0700)]
mpxyd config: add new options for inline and eager completions

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agomix: protocol change to v3, reduce sge add inline options
Arlin Davis [Fri, 15 Mar 2013 21:14:26 +0000 (14:14 -0700)]
mix: protocol change to v3, reduce sge add inline options

post send changes to reduce sge entries to 4,
add inline options, and remove hard coded
wr size.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agoopenib: add m_inline field to provider QP object
Arlin Davis [Fri, 15 Mar 2013 19:29:30 +0000 (12:29 -0700)]
openib: add m_inline field to provider QP object

add mix to mpxyd inline configuration

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agoopenib: cleanup warnings/logging in dto post send function
Arlin Davis [Fri, 15 Mar 2013 19:27:59 +0000 (12:27 -0700)]
openib: cleanup warnings/logging in dto post send function

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years agomcm: add support for mix inline data, improve mix_poll events
Arlin Davis [Fri, 15 Mar 2013 19:23:16 +0000 (12:23 -0700)]
mcm: add support for mix inline data, improve mix_poll events

mpxyd can be configured for inline data for posted
writes and sends. This will use scif_send/recv instead
of scif_readfrom based on threshold set in mpxyd.conf

change the mix_poll command to NOT issue the request
on scif and simply wait for mpxyd to write completion
back to EVD. This removes unnecessary SCIF command
traffic.
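
The inline decision described above amounts to a size check against the
configured threshold. The sketch below assumes an inclusive comparison and
uses hypothetical names (xfer_path, choose_path); the actual option name
and comparison in mpxyd.conf and the provider are not shown here.

```c
#include <stddef.h>

/* Which transfer path a posted write or send would take. */
enum xfer_path {
    XFER_INLINE, /* small payload copied with scif_send/recv */
    XFER_DMA     /* larger payload pulled with scif_readfrom */
};

/* Pick the path for a payload of `len` bytes given the inline threshold.
 * Assumption: payloads up to and including the threshold go inline. */
enum xfer_path choose_path(size_t len, size_t inline_threshold)
{
    return (len <= inline_threshold) ? XFER_INLINE : XFER_DMA;
}
```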

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>