git.openfabrics.org - ~ardavis/dapl.git/log
9 years ago mpxyd/mcm: add provider specific attribute DAT_IB_PROXY_VERSION
Arlin Davis [Mon, 15 Sep 2014 17:30:56 +0000 (10:30 -0700)]
mpxyd/mcm: add provider specific attribute DAT_IB_PROXY_VERSION

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: log warning if running in COMPAT mode
Arlin Davis [Mon, 15 Sep 2014 17:28:40 +0000 (10:28 -0700)]
mpxyd: log warning if running in COMPAT mode

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago add provider and proxy support for GUID across platform
Arlin Davis [Fri, 5 Sep 2014 15:07:04 +0000 (08:07 -0700)]
add provider and proxy support for GUID across platform

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago common: return appropriate handles with affiliated EP and EVD async events
Arlin Davis [Wed, 3 Sep 2014 22:47:51 +0000 (15:47 -0700)]
common: return appropriate handles with affiliated EP and EVD async events

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago Release 2.1.2 dapl-2.1.2-1
Arlin Davis [Tue, 2 Sep 2014 21:54:51 +0000 (14:54 -0700)]
Release 2.1.2

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: add global routing support for proxy connections
Arlin Davis [Tue, 2 Sep 2014 19:53:23 +0000 (12:53 -0700)]
mpxyd: add global routing support for proxy connections

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mcm: only call mix_get_attr if running on MIC
Arlin Davis [Tue, 2 Sep 2014 19:52:06 +0000 (12:52 -0700)]
mcm: only call mix_get_attr if running on MIC

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago openib: modify check for link_layer to handle unspecified
Arlin Davis [Tue, 2 Sep 2014 15:47:29 +0000 (08:47 -0700)]
openib: modify check for link_layer to handle unspecified

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec,
Alexey Ishchuk [Tue, 2 Sep 2014 15:34:19 +0000 (08:34 -0700)]
This patch adds the dapl_os_atomic_inc, dapl_os_atomic_dec,
and dapl_os_atomic_assign function implementations to the dapl
userspace package to provide DAPL API support on the s390x
platform by adding assembler-language implementations of those
platform-specific functions.

Signed-off-by: Alexey Ishchuk <aishchuk@linux.vnet.ibm.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago dtest server exchange connection info with client
Amir Hanania [Tue, 26 Aug 2014 22:41:10 +0000 (15:41 -0700)]
dtest server exchange connection info with client

The server and client create a connection for the server to send the setup info to the client.
When using dtest, the client only needs to use the -h <hostname/IP address> option and it will get the rest of the info from the server.

Signed-off-by: Amir Hanania <amir.hanania@intel.com>
9 years ago mpxyd: 2 MICs in same numa_node will overlap CPU affinity, don't reset base
Arlin Davis [Mon, 25 Aug 2014 23:30:45 +0000 (16:30 -0700)]
mpxyd: 2 MICs in same numa_node will overlap CPU affinity, don't reset base

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mcm: implement proxy mix_prov_attr function, add fields CPU model and family
Arlin Davis [Mon, 25 Aug 2014 15:59:50 +0000 (08:59 -0700)]
mcm: implement proxy mix_prov_attr function, add fields CPU model and family

Provide MIC consumers with a provider specific query for proxy CPU model and family
to identify platform type from MIC side. Supported in MCM provider only.

The following provider specific name attributes were added to MCM:

DAT_IB_PROXY_CPU_FAMILY
DAT_IB_PROXY_CPU_MODEL

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: tx thread may not be signaled on small segment writes
Arlin Davis [Fri, 22 Aug 2014 17:27:46 +0000 (10:27 -0700)]
mpxyd: tx thread may not be signaled on small segment writes

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago Release 2.1.1 dapl-2.1.1-1
Arlin Davis [Wed, 13 Aug 2014 18:58:49 +0000 (11:58 -0700)]
Release 2.1.1

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago common: add provider name to log messages
Arlin Davis [Wed, 13 Aug 2014 18:12:29 +0000 (11:12 -0700)]
common: add provider name to log messages

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: log warning message if numa_node invalid
Arlin Davis [Wed, 13 Aug 2014 18:10:03 +0000 (11:10 -0700)]
mpxyd: log warning message if numa_node invalid

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago include debuginfo with build
Arlin Davis [Mon, 11 Aug 2014 20:45:12 +0000 (13:45 -0700)]
include debuginfo with build

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: tx thread doesn't sleep during no pending IO state
Arlin Davis [Mon, 11 Aug 2014 17:50:05 +0000 (10:50 -0700)]
mpxyd: tx thread doesn't sleep during no pending IO state

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: change MIC cpu_mask to per numa node instead of adapter
Arlin Davis [Mon, 11 Aug 2014 16:49:08 +0000 (09:49 -0700)]
mpxyd: change MIC cpu_mask to per numa node instead of adapter

With the existing per-adapter cpumask, the proxy processing threads for
multiple cards in the same socket overlap on the same CPU cores. Change
thread affinity and cpumask to a per-socket method.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
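
The per-socket idea in the commit above can be sketched as follows. This is only an illustration under stated assumptions, not the mpxyd code: the class name `NodeCpuAllocator`, its methods, and the round-robin cursor are hypothetical. The point it shows is that the CPU cursor is kept per NUMA node, so a second card on the same node continues where the first one stopped instead of restarting at the base CPU.

```python
from collections import defaultdict


class NodeCpuAllocator:
    """Hypothetical helper: hand out CPU ids per NUMA node, not per adapter."""

    def __init__(self, node_cpus: dict[int, list[int]]):
        self.node_cpus = node_cpus        # node id -> CPU ids on that node
        self.next_idx = defaultdict(int)  # node id -> cursor shared by all cards

    def assign(self, node: int, nthreads: int) -> list[int]:
        """Return CPU ids for one card's proxy threads on the given node."""
        cpus = self.node_cpus[node]
        start = self.next_idx[node]
        # Wrap around when the node runs out of CPUs.
        picked = [cpus[(start + i) % len(cpus)] for i in range(nthreads)]
        # The cursor is NOT reset for the next card on the same node.
        self.next_idx[node] = (start + nthreads) % len(cpus)
        return picked
```

With two cards on node 0 and two threads each, the first call yields CPUs from the start of the node's list and the second call yields the following ones, so the sets do not overlap until the node's CPUs are exhausted.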
9 years ago mpxyd: set to MXS mode if device numa_node is invalid (-1)
Arlin Davis [Fri, 1 Aug 2014 18:10:47 +0000 (11:10 -0700)]
mpxyd: set to MXS mode if device numa_node is invalid (-1)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: MXS based alltoall benchmark hangs or returns post_send timeout
Arlin Davis [Fri, 1 Aug 2014 17:54:14 +0000 (10:54 -0700)]
mpxyd: MXS based alltoall benchmark hangs or returns post_send timeout

Clean up shared proxy buffer slot management during IO completions.
Current code adjusts the proxy buffer tail, using m_idx, incorrectly
when freeing multiple in-order buffer slots. Also, when processing an
immediate in-order slot, m_po_buf_tl() failed to continue parsing the
list to free other in-order !busy slots.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
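
The tail-advance behavior described in the commit above can be sketched as follows. This is a minimal model, not the mpxyd implementation: `SlotRing` and its fields are hypothetical names. It shows why the tail walk must keep going after the first free slot, since one completion can release several slots that completed earlier out of order.

```python
class SlotRing:
    """Hypothetical ring of buffer slots: head = next to allocate, tail = oldest in use."""

    def __init__(self, size: int):
        self.busy = [False] * size
        self.head = 0
        self.tail = 0
        self.used = 0  # number of slots between tail and head

    def alloc(self) -> int:
        """Take the slot at head and mark it busy."""
        if self.used == len(self.busy):
            raise BufferError("no free slots")
        idx = self.head
        self.busy[idx] = True
        self.head = (self.head + 1) % len(self.busy)
        self.used += 1
        return idx

    def complete(self, idx: int) -> int:
        """Mark idx done, then free every in-order, non-busy slot at the tail.

        Returns the number of slots released by this call.
        """
        self.busy[idx] = False
        freed = 0
        # Keep walking: do not stop after the first in-order slot.
        while self.used and not self.busy[self.tail]:
            self.tail = (self.tail + 1) % len(self.busy)
            self.used -= 1
            freed += 1
        return freed
```

If slots 0, 1, and 2 are allocated and 1 and 2 complete first, nothing is released until slot 0 completes, at which point all three are released in one pass.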
9 years ago mpxyd: add IO profile capabilities to help debug alltoall stall cases
Arlin Davis [Thu, 31 Jul 2014 16:50:30 +0000 (09:50 -0700)]
mpxyd: add IO profile capabilities to help debug alltoall stall cases

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: retry stalled inline post_send, init m_idx only when signaled
Arlin Davis [Thu, 31 Jul 2014 16:37:27 +0000 (09:37 -0700)]
mpxyd: retry stalled inline post_send, init m_idx only when signaled

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago Release 2.1.0 dapl-2.1.0-1
Arlin Davis [Fri, 25 Jul 2014 15:35:31 +0000 (08:35 -0700)]
Release 2.1.0

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago build: add missing NEWS file
Arlin Davis [Wed, 23 Jul 2014 22:32:06 +0000 (15:32 -0700)]
build: add missing NEWS file

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago update autogen.sh
Arlin Davis [Mon, 21 Jul 2014 19:55:54 +0000 (12:55 -0700)]
update autogen.sh

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago Add MCM provider and MPXYD service to build
Arlin Davis [Mon, 21 Jul 2014 19:33:12 +0000 (12:33 -0700)]
Add MCM provider and MPXYD service to build

update package version to 2.1.0
MCM provider is dependent on Intel MPSS SCIF library.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago mpxyd: service startup script and configuration file
Arlin Davis [Mon, 21 Jul 2014 19:05:44 +0000 (12:05 -0700)]
mpxyd: service startup script and configuration file

mpxyd -       Starts/Stops MIC SCIF/DAPL RDMA proxy server
mpxyd.conf -  Config details: service logs, CM timers, proxy buffers, data segment size, etc.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago add readme for MCM provider and MPXYD service
Arlin Davis [Mon, 21 Jul 2014 18:55:09 +0000 (11:55 -0700)]
add readme for MCM provider and MPXYD service

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago update Copyright dates
Arlin Davis [Mon, 21 Jul 2014 18:51:11 +0000 (11:51 -0700)]
update Copyright dates

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago Add new MIC RDMA proxy service daemon (MPXYD)
Arlin Davis [Mon, 21 Jul 2014 18:18:07 +0000 (11:18 -0700)]
Add new MIC RDMA proxy service daemon (MPXYD)

New service created to support MIC based proxy RDMA. Includes
services to manage connectivity of multi-path heterogeneous
endpoints and use data paths based on platform constraints.

It will create and manage multiple QP's per endpoint if needed. This
allows optimal performance per direction based on various platform
constraints.  For example, if the MIC is on same socket as HCA, only
proxy out is needed and not proxy in. In this case, data can go direct
from MPXYD->MIC. However, if the MIC is on a different CPU socket
from HCA, the provider will use both proxy out and proxy in services
to avoid additional constraints of the server platform.

The MCM provider and MPXYD will support connections between
MIC and non MIC endpoints.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
9 years ago add new dapl MIC provider (MCM) to support MIC RDMA proxy services
Arlin Davis [Mon, 21 Jul 2014 17:58:37 +0000 (10:58 -0700)]
add new dapl MIC provider (MCM) to support MIC RDMA proxy services

Provider supports all modes of connectivity and will setup data paths
based on endpoint locality and platform constraints. Provides
transparent DAT API support for RDMA writes, RDMA write with
immediate data, Sends, and Recvs. No RDMA read or atomic support.
To use the MCM provider, an application can use the new ofa-v2-mcm
device definitions in dat.conf. Intel MPSS is required for
MCM provider build and usage.

The following shows connectivity modes and data paths:

HST -> HST to HCA
MSS -> MIC to HCA same socket
MXS -> MIC to HCA cross socket

1.  HST->HST:    Xeon->HCA->fabric->HCA->Xeon  (direct->direct)
    HST<-HST:    Xeon<-HCA<-fabric<-HCA<-Xeon  (direct<-direct)

2.  MSS->MSS:    KNC->Xeon->HCA->fabric->HCA->KNC  (proxy->direct)
    MSS<-MSS:    KNC<-HCA<-fabric<-HCA<-Xeon<-KNC  (direct<-proxy)

3.  MXS->MXS:    KNC->Xeon->HCA->fabric->HCA->Xeon->KNC  (proxy->proxy)
    MXS<-MXS:    KNC<-Xeon<-HCA<-fabric<-HCA<-Xeon<-KNC  (proxy<-proxy)

4.  MSS->MXS:    KNC->Xeon->HCA->fabric->HCA->Xeon->KNC  (proxy->proxy)
    MSS<-MXS:    KNC<-HCA<-fabric<-HCA<-Xeon<-KNC        (direct<-proxy)

5.  MSS->HST:    KNC->Xeon->HCA->fabric->HCA->Xeon  (proxy->direct)
    MSS<-HST:    KNC<-HCA<-fabric<-HCA<-Xeon        (direct<-direct)

6.  MXS->HST:    KNC->Xeon->HCA->fabric->HCA->Xeon  (proxy->direct)
    MXS<-HST:    KNC<-Xeon<-HCA<-fabric<-HCA<-Xeon  (proxy<-direct)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
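
The connectivity table in the commit above reduces to a simple rule, sketched here. The rule is inferred from the table rows, not taken from the provider source, and the function names are hypothetical. The mode names follow the legend (HST, MSS, MXS). A MIC sender always uses proxy-out, and only a cross-socket MIC receiver needs proxy-in.

```python
# Mode names from the commit message legend:
#   HST = host to HCA, MSS = MIC to HCA same socket, MXS = MIC to HCA cross socket
MODES = ("HST", "MSS", "MXS")


def send_side(mode: str) -> str:
    """Outbound data from a MIC goes through the host proxy (proxy-out)."""
    if mode not in MODES:
        raise ValueError(mode)
    return "direct" if mode == "HST" else "proxy"


def recv_side(mode: str) -> str:
    """Inbound data needs proxy-in only when the MIC is cross-socket from the HCA."""
    if mode not in MODES:
        raise ValueError(mode)
    return "proxy" if mode == "MXS" else "direct"


def data_path(src: str, dst: str) -> str:
    """Describe the src->dst path in the table's notation, e.g. 'proxy->direct'."""
    return f"{send_side(src)}->{recv_side(dst)}"
```

For example, `data_path("MSS", "HST")` gives `proxy->direct` (row 5, forward direction), and `data_path("HST", "MXS")` gives `direct->proxy`, which the table writes from the receiver's side as `proxy<-direct` (row 6, reverse direction).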
9 years ago MCM: new MIC provider and proxy service definitions
Arlin Davis [Mon, 21 Jul 2014 15:03:46 +0000 (08:03 -0700)]
MCM: new MIC provider and proxy service definitions

Definitions for MIC Proxy RDMA services

 MCM <-> MPXYD over SCIF (Symmetric Communications InterFace) - ops, cm, events
 MCM <-> MCM over IB -  CM, WR/WC proxy-in and proxy-out wire protocol

This service enables MIC based DAPL provider (MCM) to use
proxy data service (host CPU) for SND/RCV and RDMA write operations.
RDMA reads and atomics are not supported. This service communicates
within a server platform over the PCI-E bus using SCIF and an MCM-specific
MIX (MIC exchange) messaging protocol. The MCM provider uses a new MCM
CM protocol on the wire along with a Proxy WR/WC protocol.

This service is designed to improve bandwidth on larger IO
when direct MIC based IO is constrained.

This new MCM provider maintains the DAT level API semantics, including
strict ordering requirements of data flow. RDMA write with immediate
data is the only IB extension supported.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago cleanup build warnings
Arlin Davis [Fri, 18 Jul 2014 18:17:03 +0000 (11:17 -0700)]
cleanup build warnings

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago common: add CQ,QP,MR abstractions for new MIC provider and data proxy service
Arlin Davis [Fri, 18 Jul 2014 15:51:00 +0000 (08:51 -0700)]
common: add CQ,QP,MR abstractions for new MIC provider and data proxy service

The new MIC (many integrated core) based provider (MCM) has the capability to
shadow QPs, CQs, and MRs on the host side of the platform for optimal performance
based on locality of endpoints and platform constraints. Each endpoint (DAPL_EP),
transparent to consumer, may have multiple connections via MCM provider.

openib_common ib_cq/ib_qp code base has been expanded, MCM only, to support
separate send and receive channels per endpoint.

openib_common dapl_mr code base has been expanded, MCM only, to support
MIC-based DMA interfaces for MIC to HOST communications.

openib_common post_send,post_recv inline code base, MCM only, has been
modified to proxy data services via the new MCM provider.

dapl_ib_async_str added for better logging across openib providers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago openib: cleanup, use inet_ntop for GIDs, remove some logs, destroy pipes on release
Arlin Davis [Wed, 16 Jul 2014 22:25:44 +0000 (15:25 -0700)]
openib: cleanup, use inet_ntop for GIDs, remove some logs, destroy pipes on release

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago common: new dapls_evd_cqe_to_event call, cqe to event
Arlin Davis [Tue, 15 Jul 2014 22:06:08 +0000 (15:06 -0700)]
common: new dapls_evd_cqe_to_event call, cqe to event

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago common: init ring_buffer, assign hd/tl pos in range
Arlin Davis [Tue, 15 Jul 2014 21:39:44 +0000 (14:39 -0700)]
common: init ring_buffer, assign hd/tl pos in range

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago allow log level changes during device open
Arlin Davis [Fri, 11 Jul 2014 18:32:43 +0000 (11:32 -0700)]
allow log level changes during device open

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago ucm: fix cm rbuf setup, include grh pad on initialization
Arlin Davis [Fri, 11 Jul 2014 16:53:27 +0000 (09:53 -0700)]
ucm: fix cm rbuf setup, include grh pad on initialization

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago ucm: remove duplicate async_event code, use common async event call
Arlin Davis [Fri, 11 Jul 2014 16:11:25 +0000 (09:11 -0700)]
ucm: remove duplicate async_event code, use common async event call

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago new lightweight open_query/close_query IB extension for fast attribute query
Arlin Davis [Fri, 11 Jul 2014 15:39:01 +0000 (08:39 -0700)]
new lightweight open_query/close_query IB extension for fast attribute query

Consumers that need provider attributes must do a full device open
in order to get any provider/device information. With so many static device
entries in /etc/dat.conf consumers are building classification
mechanisms to identify provider type, locality, name, device
mode, and decide which device is appropriate. The existing DAT interface
doesn't provide a lightweight mechanism for queries.

The following fast query functions have been added to dat_ib_extensions.h:

dat_ib_open_query(name, ia_handle, ia_mask, ia_attr, prov_mask, prov_attr)
dat_ib_close_query(ia_handle)

In addition, DAT extension interface, dat_extension_op, has been
expanded to include new internal calls to handle quick provider load
and function linkage via udat_extension_open, and udat_extension_close
functions. Extended operations needing DAT open/close services need
to be defined from a DAT_OPEN_EXTENSION_BASE or DAT_CLOSE_EXTENSION_BASE
respectively.

NOTE: The ia_handle returned with open query must be closed with subsequent
close_query and not used with any other dat_ia_ operations. Attribute
storage from open_query is not valid after the close_query call.

The IB extensions have been rolled to version 2.0.8 with this new API.
The changes are backward compatible.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dtestcm: add more detailed debug during disconnect phase
Arlin Davis [Wed, 9 Jul 2014 16:43:47 +0000 (09:43 -0700)]
dtestcm: add more detailed debug during disconnect phase

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago cma: long delays when opening cma provider with no IPoIB configured
Arlin Davis [Tue, 8 Jul 2014 23:14:51 +0000 (16:14 -0700)]
cma: long delays when opening cma provider with no IPoIB configured

The rdma_cm provider (ofa-v2-ib0) can take netdev, ip address, or hostname
for local address bindings. When trying to open a non-existent netdev (ib0)
the provider will fall through and use the getaddrinfo sys call assuming
dat.conf parameter is either an IP address or hostname and not a netdev.

This patch changes getipaddr() error handling when opening the cma provider
on a non-existent netdev. It will only call getaddrinfo with AI_CANONNAME
hints after checking for a valid hostname.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago common: new debug levels for low system memory, IA stats, and package info
Arlin Davis [Tue, 8 Jul 2014 21:20:27 +0000 (14:20 -0700)]
common: new debug levels for low system memory, IA stats, and package info

DAPL_DBG_TYPE_SYS_WARN = 0x800000
DAPL_DBG_TYPE_VER      = 0x1000000
DAPL_DBG_TYPE_IA_STATS = 0x2000000

export DAPL_DBG_SYS_MEM=5 sets the low-memory check threshold to 5%
when DAPL_DBG_TYPE is set with bit DAPL_DBG_TYPE_SYS_WARN.

The package must be built with --enable-counters for memory checking and
IA stats capabilities.

In addition, if DAPL_DBG_TYPE is set with bit DAPL_DBG_TYPE_VER then
the package rev and build date will be sent to stdout during library
init.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
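
The three bit values listed in the commit above combine as an ordinary bitmask. The sketch below decodes them. The constants are copied from the commit message, while the helper names and the assumption that DAPL_DBG_TYPE is given as a hex or decimal integer string are illustrative only and do not describe how the library parses the variable.

```python
import os

# Bit values quoted in the commit message.
DAPL_DBG_TYPE_SYS_WARN = 0x800000
DAPL_DBG_TYPE_VER = 0x1000000
DAPL_DBG_TYPE_IA_STATS = 0x2000000

FLAG_NAMES = {
    DAPL_DBG_TYPE_SYS_WARN: "SYS_WARN",
    DAPL_DBG_TYPE_VER: "VER",
    DAPL_DBG_TYPE_IA_STATS: "IA_STATS",
}


def decode_dbg_type(mask: int) -> list[str]:
    """Return the names of the three new debug bits that are set in mask."""
    return [name for bit, name in FLAG_NAMES.items() if mask & bit]


def mask_from_env(environ=os.environ) -> int:
    """Hypothetical parser: read DAPL_DBG_TYPE as 0x-prefixed hex or decimal."""
    return int(environ.get("DAPL_DBG_TYPE", "0"), 0)
```

For example, a mask of `0x1800000` has both the SYS_WARN and VER bits set, so `decode_dbg_type(0x1800000)` returns both names.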
10 years ago build: remove library check for mverbs with --enable-fca
Arlin Davis [Thu, 26 Jun 2014 22:40:46 +0000 (15:40 -0700)]
build: remove library check for mverbs with --enable-fca

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago IB extension: segfault in create collective group with non-vector type IA handle
Arlin Davis [Tue, 24 Jun 2014 22:49:20 +0000 (15:49 -0700)]
IB extension: segfault in create collective group with non-vector type IA handle

The dats_get_ia_handle call was changed in 2.0.34 to convert IA handle from
both vector to handle and handle to vector to fix query calls that
incorrectly returned IA handles in non-vector form. If a caller uses a
non-vector IA handle it will get converted incorrectly to a vector and cause
a segfault. Add an additional check to verify the IA handle type before calling
get ia handle to avoid incorrect translation.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago build: change configure help to correctly state collective default=none
Arlin Davis [Tue, 24 Jun 2014 22:48:38 +0000 (15:48 -0700)]
build: change configure help to correctly state collective default=none

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago Release 2.0.42 dapl-2.0.42-1
Arlin Davis [Mon, 5 May 2014 16:11:18 +0000 (09:11 -0700)]
Release 2.0.42

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: increase DTO evd size to prevent CQ overflow on limit_rpost test
Arlin Davis [Tue, 15 Apr 2014 21:48:54 +0000 (14:48 -0700)]
dapltest: increase DTO evd size to prevent CQ overflow on limit_rpost test

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago Creation of reserved SP moves EP state to DAT_EP_STATE_RESERVED even in failure
Arlin Davis [Tue, 15 Apr 2014 20:44:16 +0000 (13:44 -0700)]
Creation of reserved SP moves EP state to DAT_EP_STATE_RESERVED even in failure
cases. Reserve EP after successfully binding the listening port.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapl: fix string bug in dapls_dto_op_str
Dave Goodell [Mon, 24 Mar 2014 21:07:37 +0000 (14:07 -0700)]
dapl: fix string bug in dapls_dto_op_str

This led to indexing off the end of the array and gave surprising
results for OP_RECV_UD.

10 years ago Release 2.0.41 dapl-2.0.41-1
Arlin Davis [Mon, 17 Mar 2014 21:20:08 +0000 (14:20 -0700)]
Release 2.0.41

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: change server port, from 45278 to 62000, out of registered IANA range
Arlin Davis [Fri, 14 Mar 2014 17:47:06 +0000 (10:47 -0700)]
dapltest: change server port, from 45278 to 62000, out of registered IANA range

The existing port 45278 is in the registered port range.

RFC 6335:
 System Ports, well known, 0-1023 (assigned by IANA)
 User Ports, registered, 1024-49151 (assigned by IANA)
 Dynamic Ports, private or Ephemeral, 49152-65535 (never assigned)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
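
The port ranges quoted in the commit above can be checked with a few lines. The ranges are the ones listed in the message (RFC 6335). The function name `port_class` is only illustrative.

```python
# Port ranges as listed in the commit message (RFC 6335).
SYSTEM_PORTS = range(0, 1024)        # well known, assigned by IANA
USER_PORTS = range(1024, 49152)      # registered, assigned by IANA
DYNAMIC_PORTS = range(49152, 65536)  # private or ephemeral, never assigned


def port_class(port: int) -> str:
    """Return the RFC 6335 range name for a port number."""
    if port in SYSTEM_PORTS:
        return "system"
    if port in USER_PORTS:
        return "user"
    if port in DYNAMIC_PORTS:
        return "dynamic"
    raise ValueError(f"port out of range: {port}")
```

The old default 45278 falls in the user (registered) range, while the new default 62000 falls in the dynamic range, which is the reason given for the change.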
10 years ago dat: lower log level on load errors of provider library
Arlin Davis [Thu, 13 Mar 2014 16:55:29 +0000 (09:55 -0700)]
dat: lower log level on load errors of provider library

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dat: dat_ia_open should close provider after failure
Arlin Davis [Tue, 4 Mar 2014 18:52:49 +0000 (10:52 -0800)]
dat: dat_ia_open should close provider after failure

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: set default limit max to 1000
Arlin Davis [Tue, 4 Mar 2014 18:48:55 +0000 (10:48 -0800)]
dapltest: set default limit max to 1000

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago openib: add new provider specific attributes
Arlin Davis [Tue, 4 Mar 2014 18:30:02 +0000 (10:30 -0800)]
openib: add new provider specific attributes

DAT_IB_PROVIDER_NAME = UCM/CMA/SCM
DAT_IB_DEVICE_NAME = ibv_get_device_name
DAT_IB_CONNECTIVITY_MODE = DIRECT/PROXY
DAT_IB_RDMA_READ = TRUE/FALSE
DAT_IB_NODE_GUID = xxxx:xxxx:xxxx:xxxx
DAT_IB_PORT_STATE = ibv_port_state_str

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: update scripts for regression testing purposes
Arlin Davis [Mon, 3 Mar 2014 23:04:12 +0000 (15:04 -0800)]
dapltest: update scripts for regression testing purposes

cl.sh and srv.sh updated to provide better examples and
a method to quickly regression test any dapltest changes.

 usage: srv.sh devicename
   where devicename is provider (default = ofa-v2-mlx4_0-1)

 usage: cl.sh hostname testname devicename
   where testname
     stop - request DAPLtest server to exit.
     conn - simple connection with limited data transfer
     trans - single transaction test
     transm - transaction test: multiple transactions [RW SND, RDMA]
     transt - transaction test: multi-threaded, single transaction
     transme - transaction test: multi-endpoints per thread
     transmet - transaction test: multi: threads and endpoints per thread
     transmete - transaction test: multi threads == endpoints
     perf - Performance test
     threads - multi-threaded single transaction test.
     threadsm - multi: threads and endpoints, single transaction test.
     rdma-write - RDMA write
     rdma-read - RDMA read
     bw - bandwidth
     latb - latency tests, blocking for events
     latp - latency tests, polling for events
     lim - limit tests.
     regression - loop over a collection of all tests.
   where devicename is provider (default = ofa-v2-mlx4_0-1)

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: Add final send/recv "sync" for transaction tests.
swise@opengridcomputing.com [Mon, 3 Mar 2014 22:35:43 +0000 (14:35 -0800)]
dapltest: Add final send/recv "sync" for transaction tests.

The transaction tests need both sides to send a sync message after running the test.  This ensures that all remote operations are complete before dapltest deregisters memory and disconnects the endpoints.

Without this logic, we see intermittent async errors on iwarp devices because a read response or write arrives after the rmr has been destroyed.
I believe this is more likely to happen with iWARP than IB because iWARP completions only indicate the local buffer can be reused.  It doesn't imply that the message has even arrived at the peer, let alone been placed in the peer application's memory.

Changes from V1:

- allocate new send/recv buffers for the Final Sync message.

- post the Final Sync recv buffer at the beginning of the final iteration of a test.

- tests ok on cxgb4 and mlx4 devices.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
10 years ago Release 2.0.40
Arlin Davis [Mon, 10 Feb 2014 21:07:00 +0000 (13:07 -0800)]
Release 2.0.40

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dist: ib collective extension include files missing
Arlin Davis [Mon, 10 Feb 2014 07:34:43 +0000 (23:34 -0800)]
dist: ib collective extension include files missing

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: the quit command is missing changes for -n option.
Arlin Davis [Mon, 10 Feb 2014 07:24:29 +0000 (23:24 -0800)]
dapltest: the quit command is missing changes for -n option.

Server-port was not being set properly during param init phase on the client side.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC
Arlin Davis [Mon, 10 Feb 2014 06:55:17 +0000 (22:55 -0800)]
dat.conf: remove v1, add Mellanox Connect-IB and Intel Xeon Phi MIC

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago NULL undefined on Fedora, incorrectly using kernel stddef.h
Arlin Davis [Mon, 10 Feb 2014 21:01:47 +0000 (13:01 -0800)]
NULL undefined on Fedora, incorrectly using kernel stddef.h

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago Release 2.0.39
Arlin Davis [Thu, 3 Oct 2013 23:05:06 +0000 (16:05 -0700)]
Release 2.0.39

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago dapltest: fix endian swap issue with performance test
Arlin Davis [Thu, 3 Oct 2013 22:21:08 +0000 (15:21 -0700)]
dapltest: fix endian swap issue with performance test

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago SCM: getifaddrs modifications for better out of the box experience
Arlin Davis [Tue, 1 Oct 2013 22:40:17 +0000 (15:40 -0700)]
SCM: getifaddrs modifications for better out of the box experience

socket cm will now walk list of interfaces and ignore loopback
and ignore IB devices, unless the IB netdev is the only device.
Works better in a heterogeneous environment with a mix of net devices.
Tested with br0, mic0, and mic0:ib netdev mixes.
Overriding with DAPL_SCM_NETDEV still works as is.

Signed-off-by: Patrick Mccormick <patrick.m.mccormick@intel.com>
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
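
The interface selection rules in the commit above can be sketched as follows. This is an illustration only: the `NetIf` tuple and `pick_netdev` are hypothetical, and how the real code detects loopback or IB devices is not stated in the message, so the sketch takes those as precomputed flags. Only the DAPL_SCM_NETDEV variable name comes from the commit.

```python
import os
from typing import NamedTuple, Optional


class NetIf(NamedTuple):
    """Hypothetical interface record; the flags are assumed to be known already."""
    name: str
    is_loopback: bool
    is_ib: bool


def pick_netdev(ifaces: list[NetIf], environ=os.environ) -> Optional[str]:
    """Choose an interface name following the rules in the commit message."""
    override = environ.get("DAPL_SCM_NETDEV")
    if override:
        return override  # the explicit override still works as before
    candidates = [i for i in ifaces if not i.is_loopback]
    non_ib = [i for i in candidates if not i.is_ib]
    if non_ib:
        return non_ib[0].name  # first non-loopback, non-IB device
    if candidates:
        return candidates[0].name  # the IB netdev is the only device
    return None
```

With `lo`, `ib0`, and `br0` present the sketch picks `br0`. With only `lo` and `ib0` it falls back to `ib0`.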
10 years ago ucm, scm: UD mode triggers list_head assert with large scale alltoall test
Arlin Davis [Tue, 1 Oct 2013 21:03:51 +0000 (14:03 -0700)]
ucm, scm: UD mode triggers list_head assert with large scale alltoall test

1024+ ranks, IMB alltoall may hit assert when running Intel MPI in UD mode.

CR clean up was implemented with EP to CR references still linked.
During cr_accept, the CR remote_ia_address is linked to EP object
by mistake with UD mode. UD mode may have multiple CRs per EP so
no direct mappings to CR memory can exist unless RC mode which
always has one EP to CR mapping.

In scm, ucm: for CM object free with CR references the search and
unlinking from SP must be under SP lock to serialize. Also,
cleanup thread wakeup logic to only trigger the thread if
reference count indicates the need for more processing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
10 years ago Release 2.0.38
Arlin Davis [Mon, 22 Jul 2013 19:37:21 +0000 (12:37 -0700)]
Release 2.0.38

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago dapltest: add -n parameter to override default server port number (45278)
Arlin Davis [Tue, 16 Jul 2013 23:12:37 +0000 (16:12 -0700)]
dapltest: add -n parameter to override default server port number (45278)

Modify all tests and commands to take a new -n parameter option for server
listen port. The default port, when running multiple EP's and threads,
will sometimes collide and fail with EADDRINUSE on iWARP configurations
using rdma_bind_addr with sin_port=0.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up
Arlin Davis [Fri, 12 Jul 2013 18:52:33 +0000 (11:52 -0700)]
ucm,scm: UD mode creates many CR objects per EP that need to be cleaned up

After connection is established and the AH is provided to consumer
on UD connect establishment there is no need to keep the CR object
on the SP. For large clusters this results in a growing memory
footprint for CR objects and long cleanup times on device close.

Change ucm and scm providers to unlink and free CR resources
during CM object free if this is a UD QP and CONN_EST state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM
Arlin Davis [Mon, 24 Jun 2013 21:19:22 +0000 (14:19 -0700)]
cma: add DAPL_CM_TOS environment variable to enable passing a TOS to the RDMA CM

Signed-off-by: Matthew Finlay <matt@mellanox.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago Release 2.0.37 dapl-2.0.37-1
Arlin Davis [Fri, 7 Jun 2013 01:22:52 +0000 (18:22 -0700)]
Release 2.0.37

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago common: add support for ia name during dat_ia_query
Arlin Davis [Wed, 29 May 2013 23:59:09 +0000 (16:59 -0700)]
common: add support for ia name during dat_ia_query

The device name was not being updated during a query. Copy
the hca name into ia_attr->adapter_name for consumers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago common: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines.
Arlin Davis [Wed, 29 May 2013 23:53:18 +0000 (16:53 -0700)]
common: dapl_os_atomic_inc/dec() not working as expected on ppc64 machines.

Signed-off-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
11 years ago dapltest: ppc64 endian issue with exchanged mem handle and address
Arlin Davis [Wed, 29 May 2013 23:45:20 +0000 (16:45 -0700)]
dapltest: ppc64 endian issue with exchanged mem handle and address

Signed-off-by: Pradeep Satyanarayana <pradeep@us.ibm.com>
Signed-off-by: Aravinda Venkatramana <Aravinda.Venkatramana@emulex.com>
Acked-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago Release 2.0.36
Arlin Davis [Thu, 5 Jul 2012 17:00:28 +0000 (10:00 -0700)]
Release 2.0.36

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago scm: increase ACK timeout to 20 for a default value to match other providers.
Arlin Davis [Thu, 5 Jul 2012 16:58:21 +0000 (09:58 -0700)]
scm: increase ACK timeout to 20 for a default value to match other providers.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago common: allow qp modify in init state
Arlin Davis [Mon, 14 May 2012 21:51:38 +0000 (14:51 -0700)]
common: allow qp modify in init state

Allow consumer to modify attributes via dat_ep_modify
in init state.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago common: check for valid states during ep posting
Arlin Davis [Thu, 10 May 2012 21:57:31 +0000 (14:57 -0700)]
common: check for valid states during ep posting

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago dat.conf: keep list of providers in order for backward compatibility
Arlin Davis [Thu, 10 May 2012 20:35:55 +0000 (13:35 -0700)]
dat.conf: keep list of providers in order for backward compatibility

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years ago ucm: record and silently drop a duplicate reject CM message
Arlin Davis [Thu, 10 May 2012 17:49:09 +0000 (10:49 -0700)]
ucm: record and silently drop a duplicate reject CM message

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agowindows: new version of getlocalipaddr not portable
Arlin Davis [Wed, 25 Apr 2012 20:37:53 +0000 (13:37 -0700)]
windows: new version of getlocalipaddr not portable

revert to the original getaddrinfo method for windows

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agodapltest: DFLT_QLEN is defined in multiple tests
Arlin Davis [Wed, 25 Apr 2012 20:36:52 +0000 (13:36 -0700)]
dapltest: DFLT_QLEN is defined in multiple tests

add #ifdef checking in transaction test.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoRelease 2.0.35 dapl-2.0.35-1
Arlin Davis [Wed, 25 Apr 2012 20:10:39 +0000 (13:10 -0700)]
Release 2.0.35

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoconfig/build: remove post/postun hacking used to modify dat.conf
Arlin Davis [Wed, 25 Apr 2012 20:07:10 +0000 (13:07 -0700)]
config/build: remove post/postun hacking used to modify dat.conf

Return to the tried and true method of managing configuration
files via the %config directive and remove ugly sed editing methods.
The dat.conf includes both v1 and v2 device entries to ensure
backward compatibility. Add doc/dat.conf.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoconfig: clean up help option displays with ext-type options
Arlin Davis [Mon, 23 Apr 2012 17:35:24 +0000 (10:35 -0700)]
config: clean up help option displays with ext-type options

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agowindows: Provide auto-detect between RoCE and InfiniBand for Windows.
stan smith [Mon, 23 Apr 2012 17:32:00 +0000 (10:32 -0700)]
windows: Provide auto-detect between RoCE and InfiniBand for Windows.

For RoCE, enable transport global ID use.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: update UD cm provider to support new CM stat and error counters
Arlin Davis [Fri, 20 Apr 2012 00:40:45 +0000 (17:40 -0700)]
ucm: update UD cm provider to support new CM stat and error counters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: update socket cm provider to support new CM stat and error counters
Arlin Davis [Fri, 20 Apr 2012 00:40:03 +0000 (17:40 -0700)]
scm: update socket cm provider to support new CM stat and error counters

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add cm, link, and diag event counters in IB extended builds
Arlin Davis [Fri, 20 Apr 2012 00:15:22 +0000 (17:15 -0700)]
common: add cm, link, and diag event counters in IB extended builds

Add additional event monitoring capabilities during runtime to help
isolate issues during scaling in lieu of logging/printing warning
messages. Counters have been added to provider CM services, and
further counters have been added and mapped to the sysfs ib_cm,
device port, and device diag counters. ibdev_path is used for
device sysfs counters.

uDAPL CM events are tracked per IA instance via internal
provider counters. The ib_cm, link, and diag events are tracked
per platform via sysfs. For these running counters, start
and stop functions are provided for sampling and mapping to DAPL
64-bit counters. All counters, along with the new start and stop
functions, are provided via dat_ib_extensions.h. The new IB
extension version is 2.0.7.

New DCNT_IA_xx counters include 40 cm, 9 link, and 9 diag types.

To enable new counters (default build is disabled):
./configure --enable-counters

New bitmappings have been added to DAPL_DBG_TYPE environment
variable to automatically start/stop counters and log
errors if counters are enabled. The following will control
CM, LINK, and DIAG respectively:

   DAPL_DBG_TYPE_CM_ERRS = 0x080000,
   DAPL_DBG_TYPE_LINK_ERRS = 0x100000,
   DAPL_DBG_TYPE_DIAG_ERRS = 0x400000,
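
A minimal sketch of how a bitmask like this might be read and tested. The three bit values are taken from the message above; the helper names dbg_type_parse, dbg_type_from_env, and dbg_enabled are hypothetical, and accepting both hex and decimal input through strtoul is an assumption, not something the commit states.

```c
#include <stdlib.h>

/* Bit values quoted from the commit message. */
enum {
    DAPL_DBG_TYPE_CM_ERRS   = 0x080000,
    DAPL_DBG_TYPE_LINK_ERRS = 0x100000,
    DAPL_DBG_TYPE_DIAG_ERRS = 0x400000
};

/* Hypothetical helper: parse a mask string; base 0 accepts "0x..." or decimal. */
static unsigned long dbg_type_parse(const char *s)
{
    return s ? strtoul(s, NULL, 0) : 0UL;
}

/* Hypothetical helper: read the mask from the environment, 0 if unset. */
static unsigned long dbg_type_from_env(void)
{
    return dbg_type_parse(getenv("DAPL_DBG_TYPE"));
}

/* Nonzero when the given bit is set in the mask. */
static int dbg_enabled(unsigned long mask, unsigned long bit)
{
    return (mask & bit) != 0;
}
```

Under these assumptions, DAPL_DBG_TYPE=0x180000 would select the CM and LINK bits and leave DIAG off.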

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: use ioctl SIOCGIFCONF to get complete list of configured netdev interfaces
Arlin Davis [Tue, 17 Apr 2012 22:24:22 +0000 (15:24 -0700)]
scm: use ioctl SIOCGIFCONF to get complete list of configured netdev interfaces

Replace usage of getaddrinfo since it doesn't actually return bound addresses
and can return the loopback address in some configurations. Some
systems may not have eth0 configured, so you cannot assume eth0 as a
non-loopback default netdev.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)
Arlin Davis [Fri, 17 Feb 2012 18:28:48 +0000 (10:28 -0800)]
ucm: UD send failures at scale, ucm_send ERR: get_smsg(hd=149,tl=150)

Full sendq should retry polling completions instead of failing.
When the sendq is full and all requests are pending, the get send message
code should retry polling for completions and not return an error on the
first empty CQ attempt. Give the HCA a chance to complete some batched
requests. Also, clean up the send message error logging.
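
The retry behaviour described above can be sketched as follows. Everything here is a hypothetical model, not the ucm provider's code: struct sendq, sketch_get_smsg, the queue depth, and the retry budget are all invented for illustration, and fake_poll stands in for a real CQ poll.

```c
#include <stddef.h>

#define SENDQ_DEPTH  4      /* hypothetical queue depth */
#define POLL_RETRIES 100    /* hypothetical retry budget */

struct sendq {
    int hd;                 /* next slot to hand out */
    int tl;                 /* oldest slot still pending */
    int pending;            /* number of in-flight sends */
    int (*poll)(void *ctx); /* completions reaped; 0 if the CQ is empty */
    void *ctx;
};

/* Return a free slot index, or -1 if the queue is still full after
 * POLL_RETRIES polling attempts. An empty CQ on the first attempt
 * is not treated as an error. */
static int sketch_get_smsg(struct sendq *q)
{
    int tries, slot;

    for (tries = 0; q->pending == SENDQ_DEPTH && tries < POLL_RETRIES; tries++) {
        int done = q->poll(q->ctx);

        /* Retire one pending slot per completion reaped. */
        while (done-- > 0 && q->pending > 0) {
            q->tl = (q->tl + 1) % SENDQ_DEPTH;
            q->pending--;
        }
    }
    if (q->pending == SENDQ_DEPTH)
        return -1;

    slot = q->hd;
    q->hd = (q->hd + 1) % SENDQ_DEPTH;
    q->pending++;
    return slot;
}

/* Fake CQ for demonstration: empty on the first two polls, then one
 * completion per poll. ctx points to an int call counter. */
static int fake_poll(void *ctx)
{
    int *calls = ctx;
    return ++(*calls) > 2 ? 1 : 0;
}
```

With the fake CQ, a request against a full queue succeeds on the third poll instead of failing on the first empty one.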

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoscm: fix retry count on connection pending timeout
Arlin Davis [Mon, 6 Feb 2012 22:04:37 +0000 (14:04 -0800)]
scm: fix retry count on connection pending timeout

Retry count was not being decremented on connection TIMEOUT.
Also, clean up log messages on CONN and REP pending and
add local port to output.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agoucm: cleanup debug message, ntohl on p_size is incorrect
Arlin Davis [Mon, 6 Feb 2012 22:03:20 +0000 (14:03 -0800)]
ucm: cleanup debug message, ntohl on p_size is incorrect

Private data size is a short; change to ntohs in the log message.
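
A small sketch of why the conversion width matters for a 16-bit field. The function names are hypothetical; ntohs and ntohl are the standard byte-order conversions.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Decode a 16-bit private-data size received in network byte order. */
static uint16_t p_size_to_host(uint16_t wire)
{
    return ntohs(wire);
}

/* What the incorrect conversion computes: the 16-bit value is widened
 * to 32 bits first, so on a little-endian host the swapped bytes land
 * in the upper half of the word. */
static uint32_t p_size_to_host_wrong(uint16_t wire)
{
    return ntohl((uint32_t)wire);
}
```

On a big-endian host both functions return the same value, so a mistake of this kind only shows up on little-endian machines.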

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocma, scm, ucm: allow EP (QP) creation without EVD (CQ)
Arlin Davis [Mon, 30 Jan 2012 18:19:29 +0000 (10:19 -0800)]
cma, scm, ucm: allow EP (QP) creation without EVD (CQ)

Provide the ability to create an EP/QP with no EVD/CQ on either the
request or receive queue. The current implementation allows this on the
receive queue but not the request queue. Not all OFA devices support
a null CQ, so if necessary create a dummy CQ at the time of
QP creation. Also, if no CQ is specified, set the appropriate QP
max wr/sge attributes to zero.
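
A sketch of the attribute selection described above, using stand-in types rather than the real verbs structures. struct sketch_cq, struct sketch_qp_attr, and sketch_fill_qp_attr are hypothetical; the field names are only loosely modeled on the send/receive CQ and capability fields of a QP init attribute.

```c
#include <stddef.h>

struct sketch_cq { int id; };   /* hypothetical stand-in for a CQ */

struct sketch_qp_attr {         /* hypothetical stand-in for QP init attributes */
    struct sketch_cq *send_cq;
    struct sketch_cq *recv_cq;
    unsigned int max_send_wr, max_send_sge;
    unsigned int max_recv_wr, max_recv_sge;
};

/* Fill QP attributes. A missing request or receive CQ is replaced by
 * dummy_cq, and the matching wr/sge limits are forced to zero. */
static void sketch_fill_qp_attr(struct sketch_qp_attr *attr,
                                struct sketch_cq *req_cq,
                                struct sketch_cq *rcv_cq,
                                struct sketch_cq *dummy_cq,
                                unsigned int max_wr, unsigned int max_sge)
{
    attr->send_cq = req_cq ? req_cq : dummy_cq;
    attr->recv_cq = rcv_cq ? rcv_cq : dummy_cq;

    attr->max_send_wr  = req_cq ? max_wr  : 0;
    attr->max_send_sge = req_cq ? max_sge : 0;
    attr->max_recv_wr  = rcv_cq ? max_wr  : 0;
    attr->max_recv_sge = rcv_cq ? max_sge : 0;
}
```

Zeroing the limits on the side without a CQ means nothing can be posted there, so the dummy CQ never has to deliver a completion.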

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: add DAPL_DBG_TYPE_CM_STATS (0x40000) to debug log options
Arlin Davis [Mon, 30 Jan 2012 18:09:42 +0000 (10:09 -0800)]
common: add DAPL_DBG_TYPE_CM_STATS (0x40000) to debug log options

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: dapls_ep_flush_cq will segfault when no CQ is attached to EP
Arlin Davis [Wed, 25 Jan 2012 19:54:29 +0000 (11:54 -0800)]
common: dapls_ep_flush_cq will segfault when no CQ is attached to EP

Add a check for a NULL request/receive EVD (CQ) before flushing.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>
12 years agocommon: ep_create should allow max_request_iov attribute setting of zero
Arlin Davis [Wed, 25 Jan 2012 19:50:21 +0000 (11:50 -0800)]
common: ep_create should allow max_request_iov attribute setting of zero

When creating an EP without a request EVD (cq) the max_request_iov
and max_request_sge will be 0. Allow this combination when checking
attribute settings for ARG6.

Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>