git.openfabrics.org - ~shefty/librdmacm.git/log

rstream: Group latency/bandwidth tests together

Rather than grouping tests by transfer size, group by the test type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rstream: Set rsocket nonblocking if set to async operation

If asynchronous use is specified (use of poll/select), set the
rsocket to nonblocking. This matches the common usage case for
asynchronous sockets.

When asynchronous support is enabled, the nonblocking/blocking
test option determines whether the poll/select call will block,
or if rstream will spin on the calls.

This provides more flexibility with how the rsocket is used.
Specifically, MPI often uses nonblocking sockets, but spins on
poll/select. However, many apps will use nonblocking sockets,
but wait on poll/select.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
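
The two usage patterns described in the commit above (a nonblocking
descriptor combined with either a blocking or a spinning poll) can be
sketched with plain POSIX calls.  This is only an illustration, not the
rstream code: set_nonblocking() and wait_and_read() are hypothetical
helpers, and an rsocket version would presumably go through the rfcntl,
rpoll, and rrecv calls listed elsewhere in this log.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Put a descriptor into nonblocking mode; 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Read up to len bytes from a nonblocking descriptor.
 * block_in_poll != 0: sleep inside poll() until the descriptor is ready.
 * block_in_poll == 0: call poll() with a zero timeout and spin on it.
 */
static ssize_t wait_and_read(int fd, void *buf, size_t len, int block_in_poll)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int timeout = block_in_poll ? -1 : 0;

    for (;;) {
        int ret = poll(&pfd, 1, timeout);
        ssize_t n;

        if (ret < 0 && errno != EINTR)
            return -1;
        if (ret <= 0)
            continue;               /* nothing ready yet, or interrupted */

        n = read(fd, buf, len);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;               /* data, EOF, or a real error */
    }
}
```

Spinning trades CPU time for lower wakeup latency, which is the MPI-style
usage the commit mentions; blocking in poll is the more common choice.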

rstream: Clarify use of async test option

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/rstream: Set rsocket nonblocking for base tests

The base set of rstream tests wants nonblocking rsockets, but doesn't
actually set the rsocket to nonblocking.  Instead, it relies on the
MSG_DONTWAIT flag.  Make the code match the expected behavior:
set the rsocket to nonblocking and make nonblocking the default.

Provide a test option to switch it back to blocking mode.  We keep
the existing nonblocking test option for compatibility.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rstream: Always set TCP_NODELAY on rsocket

The NODELAY option is coupled with whether the socket is blocking
or nonblocking. Remove this coupling and always set the NODELAY
option.

NODELAY currently has no effect on rsockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/rsocket: Succeed setsockopt REUSEADDR on connected sockets

The RDMA CM fails calls to set REUSEADDR on an rdma_cm_id if
it is not in the idle state. As a result, this causes a failure
in NetPipe when run with socket calls intercepted by rsockets.
Fix this by returning success when REUSEADDR is set on an rsocket
that has already been connected. When running over IB, REUSEADDR
is not necessary, since the TCP/IP addresses are mapped.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rsockets: Optimize synchronization to improve performance

Hotspot performance analysis using VTune showed pthread_mutex_unlock()
as the most significant hotspot when transferring small messages using
rstream.  To reduce the impact of using pthread mutexes, replace them
with a custom lock built using an atomic variable and a semaphore.
When there's no contention for the lock (which is the expected case
for nonblocking sockets), the synchronization is reduced to
incrementing and decrementing an atomic variable.

A test that acquired and released a lock 2 billion times reported that
the custom lock was roughly 20% faster than the mutex: 26.6 seconds
versus 33.0 seconds.

Unfortunately, further analysis showed that using the custom lock
provided a minimal performance gain on rstream itself, and simply
moved the hotspot to the custom unlock call.  The hotspot is likely
a result of some other interaction, rather than caused by slowness
in releasing a lock.  However, we keep the custom lock based on
the results of the direct lock tests that were done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
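
A lock built from an atomic counter and a semaphore, as the commit above
describes, might look roughly like the following.  This is a minimal
sketch using C11 atomics and POSIX semaphores; the names are illustrative
and the actual librdmacm implementation may differ.  The counter holds
the lock owner plus any waiters, so the uncontended path is a single
atomic increment or decrement.

```c
#include <semaphore.h>
#include <stdatomic.h>

typedef struct {
    sem_t      sem;     /* waiters sleep here when the lock is contended */
    atomic_int cnt;     /* number of threads holding or waiting for the lock */
} fastlock_t;

static int fastlock_init(fastlock_t *lock)
{
    atomic_init(&lock->cnt, 0);
    return sem_init(&lock->sem, 0, 0);
}

static void fastlock_destroy(fastlock_t *lock)
{
    sem_destroy(&lock->sem);
}

static void fastlock_acquire(fastlock_t *lock)
{
    /* A previous value above 0 means another thread holds the lock.
     * (A production version would also retry sem_wait on EINTR.) */
    if (atomic_fetch_add(&lock->cnt, 1) > 0)
        sem_wait(&lock->sem);
}

static void fastlock_release(fastlock_t *lock)
{
    /* A previous value above 1 means at least one thread is waiting. */
    if (atomic_fetch_sub(&lock->cnt, 1) > 1)
        sem_post(&lock->sem);
}
```

Each release posts the semaphore at most once, so exactly one waiter is
woken and ownership is handed over without touching the counter again.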

rping: Replace sprintf with snprintf to protect from buffer overflow

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
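
The kind of change named in the rping commit above can be illustrated
with a small helper.  format_label() and its format string are
hypothetical and are not taken from rping; the point is only that
snprintf is given the buffer size and that truncation is detectable from
its return value.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<name>-<seq>" into buf without ever writing past size bytes.
 * Returns 0 on success, -1 if formatting failed or the text was truncated.
 */
static int format_label(char *buf, size_t size, const char *name, int seq)
{
    int n = snprintf(buf, size, "%s-%d", name, seq);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```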

rsocket: Succeed setting SO_KEEPALIVE option

memcached sets SO_KEEPALIVE, so succeed any requests to set
that option.  We don't actually implement keepalive at this time.

To implement keepalive, we would need to record the last time
that a message was sent or received on an rsocket.  If no
new messages are processed within the keepalive timeout, then
we would need to issue a keepalive message.  For rsockets,
this would simply mean sending a 0-byte control message that
gets ignored on the remote side.

The only real difficulty with handling keepalive is doing it
without adding threads.  This requires additional work in
poll to occasionally timeout, send keepalive messages, then
return to polling if no new data has arrived.  Alternatively,
we can add a thread to handle sending keepalive messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
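
The thread-free approach outlined above comes down to two small
calculations: how long until a keepalive is due, and how to shorten the
caller's poll timeout so that poll wakes up in time.  The helpers below
are a hypothetical sketch of that arithmetic only (rsockets does not
implement keepalive, per the commit); all names and the millisecond
units are assumptions.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Milliseconds until a keepalive must be sent, given the time of the last
 * send or receive.  Returns 0 when the connection has already been idle
 * for the full keepalive interval.
 */
static int keepalive_wait_ms(uint64_t last_activity_ms, uint64_t now_ms,
                             uint64_t keepalive_ms)
{
    uint64_t idle = now_ms - last_activity_ms;
    uint64_t left;

    if (idle >= keepalive_ms)
        return 0;
    left = keepalive_ms - idle;
    return left > INT_MAX ? INT_MAX : (int) left;
}

/*
 * Shorten the caller's poll timeout (negative means wait forever) so that
 * poll returns no later than the next keepalive deadline.
 */
static int clamp_poll_timeout(int user_timeout_ms, int keepalive_wait)
{
    if (user_timeout_ms < 0 || user_timeout_ms > keepalive_wait)
        return keepalive_wait;
    return user_timeout_ms;
}
```

A poll wrapper would call these before each wait, send the 0-byte
control message when keepalive_wait_ms() returns 0, and then resume
polling with whatever remains of the caller's original timeout.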

rsocket: Succeed SO_LINGER socket option

Succeed calls to set the SO_LINGER socket option. We don't
actually implement SO_LINGER semantics because we never place
an rsocket into a timewait state. Unlike socket behavior,
we do wait for all pending data to be transferred by the hardware.
This is done so that the disconnect message can be sent over
the rsocket connection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rsocket: Handle socket option toggling on/off

If the user turns a socket option off, record that, so that
rgetsockopt returns the correct state of the option.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
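
One simple way to record option state so that a later query reports it
correctly is a bitmask that is updated on both enable and disable.  The
structure and flag names below are hypothetical and only illustrate the
idea; they are not the rsocket internals.

```c
#include <stdint.h>

/* Hypothetical per-socket option bits. */
#define OPT_REUSEADDR  0x1u
#define OPT_KEEPALIVE  0x2u
#define OPT_NODELAY    0x4u

struct sock_state {
    uint32_t opts;      /* bit set = option currently enabled */
};

/* Record the new state of an option: set the bit on enable, clear on disable. */
static void record_opt(struct sock_state *s, uint32_t opt, int on)
{
    if (on)
        s->opts |= opt;
    else
        s->opts &= ~opt;
}

/* Report the recorded state, as a getsockopt-style query would. */
static int query_opt(const struct sock_state *s, uint32_t opt)
{
    return (s->opts & opt) != 0;
}
```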

rsocket: Discard unrecognized control messages

If we receive a control message that is not known, simply discard it.
This provides some ability to support forward compatibility.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
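
Discarding unknown control messages usually amounts to a default case
that does nothing harmful.  The opcodes and connection structure here
are hypothetical, chosen only to show the shape of such a handler.

```c
/* Hypothetical control opcodes known to this version of the protocol. */
enum ctrl_op {
    CTRL_DISCONNECT = 0,
    CTRL_SHUTDOWN   = 1
};

struct conn {
    int      disconnected;
    int      peer_shutdown;
    unsigned ignored;       /* count of control messages we did not recognize */
};

static void handle_ctrl(struct conn *c, unsigned op)
{
    switch (op) {
    case CTRL_DISCONNECT:
        c->disconnected = 1;
        break;
    case CTRL_SHUTDOWN:
        c->peer_shutdown = 1;
        break;
    default:
        /* Sent by a newer peer: drop it rather than fail the connection. */
        c->ignored++;
        break;
    }
}
```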

rsocket: Work-arounds to support RH EL5

Discard ENOSYS errors when trying to set address reuse.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rsocket: Allow use of LD_PRELOAD to intercept socket calls

Intercept socket calls and convert TCP socket operation to
streaming over RDMA.

Allow falling back from rsockets to normal sockets on error
or when trying to bind/connect to a reserved port. This is
needed to handle MPI job startup, where MPI should use rsockets,
but mpiexec needs to communicate using ssh over normal sockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
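
The reserved-port fallback mentioned above needs a test along these
lines.  This is an assumption-laden sketch: is_reserved_port() is a
hypothetical helper, the limit of 1024 is the conventional privileged
port boundary, and treating port 0 (let the system choose) as not
reserved is a guess at sensible behavior rather than something the
commit states.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define RESERVED_PORT_LIMIT 1024    /* assumed boundary for reserved ports */

/*
 * Return 1 if addr names a reserved TCP port, in which case an
 * interception layer could hand the call to the normal socket API.
 */
static int is_reserved_port(const struct sockaddr *addr)
{
    unsigned port;

    if (addr->sa_family == AF_INET)
        port = ntohs(((const struct sockaddr_in *) addr)->sin_port);
    else if (addr->sa_family == AF_INET6)
        port = ntohs(((const struct sockaddr_in6 *) addr)->sin6_port);
    else
        return 0;

    return port != 0 && port < RESERVED_PORT_LIMIT;
}
```

With a check like this, a connect to port 22 (ssh) would be routed to a
normal socket, while an application port such as 7471 stays on rsockets.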

rsocket: Add sample application to copy files over rsockets

rcopy will copy files from a source system to a specified remote
server. It's essentially a really dumb FTP type program that can
be used to quickly transfer files between systems, which can be
useful to verify data integrity.

(It was easier to create this program than modify an existing FTP
client and server application, which was my first choice. Fork
support is difficult.)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rsocket: Add example program that uses rsocket

rstream provides an example that uses either rsocket or socket
APIs. The latter allows rstream to be used to verify rsocket
behavior compared to socket.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
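
Because the rsocket calls mirror the socket calls, a test program can
select either API at build time.  The sketch below is not the rstream
source; the rs_* macro names, the USE_RSOCKETS switch, and the two
helpers are hypothetical.  Only rsend, rrecv, rclose, and the
<rdma/rsocket.h> header come from librdmacm.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

#ifdef USE_RSOCKETS
#include <rdma/rsocket.h>
#define rs_send(s, b, l, f)  rsend(s, b, l, f)
#define rs_recv(s, b, l, f)  rrecv(s, b, l, f)
#define rs_close(s)          rclose(s)
#else
#define rs_send(s, b, l, f)  send(s, b, l, f)
#define rs_recv(s, b, l, f)  recv(s, b, l, f)
#define rs_close(s)          close(s)
#endif

/* Send exactly len bytes; returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len) {
        ssize_t n = rs_send(fd, p, len, 0);
        if (n < 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}

/* Receive exactly len bytes; returns 0 on success, -1 on error or EOF. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len) {
        ssize_t n = rs_recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t) n;
    }
    return 0;
}
```

Building the same source twice, with and without -DUSE_RSOCKETS, gives
two binaries whose behavior can be compared directly.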

librdmacm: Define streaming over RDMA interface (rsockets)

Introduces a new set of APIs that support a byte streaming interface
over RDMA devices. The new interface matches sockets, except that all
function calls are prefixed with an 'r'.

The following functions are defined:

rsocket
rbind, rlisten, raccept, rconnect
rshutdown, rclose
rrecv, rrecvfrom, rrecvmsg, rread, rreadv
rsend, rsendto, rsendmsg, rwrite, rwritev
rpoll, rselect
rgetpeername, rgetsockname
rsetsockopt, rgetsockopt, rfcntl

Functions take the same parameters as those used for sockets. The
following capabilities and flags are supported at this time:

PF_INET, PF_INET6, SOCK_STREAM, IPPROTO_TCP
MSG_DONTWAIT, MSG_PEEK
SO_REUSEADDR, TCP_NODELAY, SO_ERROR, SO_SNDBUF, SO_RCVBUF
O_NONBLOCK

The rpoll call supports polling both rsockets and normal fd's.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

ucmatose: Fix segfault on address error

Client connect_events() should fail if it received some error,
otherwise the program will try to reach a non-existent QP
resource resulting in a segfault. Return an error from
cma_handler() if we had a connection error.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Automatically detect if ibacm is installed

If the ibacm header file is available, automatically have the
librdmacm configured to use it. This removes the --with-ib_acm
configure option.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Update rdma_disconnect to indicate both sides should call it.

rdma_disconnect should be called from both sides to quickly disconnect.
Clarify this in the man page.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Check for valid route in ucma_set_ib_route

ucma_set_ib_route will call rdma_getaddrinfo to obtain IB path
information.  However, rdma_getaddrinfo will return success,
but not provide routing data if no route can be found (the IB
ACM service is not running).  In this case, we can call
rdma_set_option without a valid route.  Although the kernel
will trap this and fail, we can detect the error in the library.
This will speed up the connection rate if IB ACM is not in use.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fix warning 'resolve_msg' breaks aliasing rules

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Return an error if user specifies AF_IB but it is not supported

If the user specifies an AF_IB address into rdma_bind_addr,
rdma_resolve_addr, rdma_join_multicast, or rdma_leave_multicast,
but the kernel does not support AF_IB, return an error.

Note that rdma_getaddrinfo will never return an AF_IB address to the
user unless kernel support is present. An application would need
to construct an AF_IB address by hand before making one of the
above mentioned calls. This check prevents overrunning the
command buffer written to the kernel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

udaddy: Update udaddy to use rdma_getaddrinfo

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

cmatose: Replace use of getaddrinfo with rdma_getaddrinfo

Now that rdma_getaddrinfo exists, use it rather than getaddrinfo.
This will eventually allow us to specify native IB addresses into
cmatose once AF_IB support is there.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Report AF_IB as second rdma_addrinfo

If AF_IB is supported, the librdmacm will attempt to convert
AF_INET or AF_INET6 to AF_IB. Rather than replacing the
AF_INET/6 rdma_addrinfo, provide the AF_IB addresses as a
second rdma_addrinfo linked from the AF_INET/6 version.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Set errno correctly in ucma_complete

The status value is negative, convert it to positive before setting errno.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rdma_verbs: Set errno correctly in rdma_get_send/recv_comp

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Merge branch 'sor'

librdmacm: Update web site and email addresses

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Merge branch 'sor'

udaddy/ucmatose: allow easy setting of tos in hex

Under IBoE, the 3 MSBits of the TOS map to the SL, hence letting
the user specify them in hex makes the interface friendlier.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
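
Accepting the TOS in hex and deriving the SL from its three most
significant bits can be sketched as follows.  parse_tos() and
tos_to_sl() are hypothetical helpers, not the udaddy/ucmatose option
parsing; strtol with base 0 is one standard way to accept decimal,
octal, and hex input alike.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a TOS value given as decimal ("96"), octal ("0140"), or hex
 * ("0x60").  Returns 0 on success, -1 on malformed or out-of-range input.
 */
static int parse_tos(const char *arg, uint8_t *tos)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(arg, &end, 0);     /* base 0: prefix selects the radix */
    if (errno || end == arg || *end != '\0' || val < 0 || val > 255)
        return -1;

    *tos = (uint8_t) val;
    return 0;
}

/* Under IBoE the top 3 bits of the 8-bit TOS select the SL. */
static unsigned tos_to_sl(uint8_t tos)
{
    return tos >> 5;
}
```

For example, a TOS of 0x60 is binary 011 00000, so its SL is 3; reading
that off the hex form is much easier than from the decimal value 96.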

librdmacm: Return ECONNREFUSED from rdma_connect on reject

Make the errno return code from rdma_connect consistent with
connect. The underlying status value is available by reading
the event data.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rdma/cma: minor code refactoring when saving a string content

In this case, using strdup will provide cleaner code
(and maybe a little bit faster too).

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/udaddy: Fix resource leak in case of error

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Verify size of route_len

If the user specifies route information on input to rdma_getaddrinfo,
verify that the size of the routing data is something that we're
prepared to handle.

The routing data is only useful if IB ACM is enabled and may be
either struct ibv_path_record or struct ibv_path_data on input.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fix duplicate free of connect

The connect data stored with the cma_id_private is freed in
rdma_connect, since it is no longer needed. Avoid duplicating
the free in rdma_destroy_id by checking for connect_len = 0,
rather than connect to be NULL.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rdma/verbs: Fix race polling for completions

To avoid hanging in rdma_get_send/recv_comp, we need to rearm
the CQ inside of the while loop.  If the CQ is armed,
the HCA will write an entry to the CQ, then generate a CQ
event.  However, a caller could poll the CQ, find the entry,
then attempt to rearm the CQ before the HCA generates the CQ
event.  In this case, the rearm call (ibv_req_notify_cq) will
act as a no-op, since the HCA hasn't finished generating the
event for the previous completion.  At this point, the event
will be queued.

A call to ibv_get_cq_event will find the event, but not
a CQ entry.  The CQ is now not armed, and a call to
ibv_get_cq_event will block waiting for an event that will
never occur.

Problem was found in an rdma_cm example test under development.
The test can ping-pong messages between two applications.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

v1.0.15

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fix resource leak in rdma_migrate_id() error flow

Prevent resource leak by destroying the event channel before returning from
function in an error flow.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fix resource leak when CMA_CREATE_MSG_CMD_RESP fails

If resources are allocated before CMA_CREATE_MSG_CMD_RESP or
CMA_CREATE_MSG_CMD are called, and those calls fail, we need
to cleanup the resources before returning.

Fix this by changing the CMA_CREATE_MSG macros to remove the
alloca and calling return. The request and response structures
are now declared directly on the stack. To accomplish this,
we merge the abi header definition into each command structure.

Problem reported by: Dotan Barak <dotanb@dev.mellanox.co.il>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rdma_xserver/client: Add new test apps

Add new versions of the rdma_server and rdma_client tests that
support other types of connections and show how to use more
RDMA features. We keep the existing rdma_server and rdma_client
tests as simple examples.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Do not wait in rdma_accept for UD QPs

There are no additional connection events to process for UD QPs
after calling rdma_accept(). When using synchronous rdma_cm_id's,
simply return to the user after sending the reply. Do not wait
for additional events.

This fixes a hang on the server side when setting up UD QP
communication.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Specify QP type separately from port space

We need to know the QP type separately from the port space. In
order to support XRC, UC, and other QP types, we use RDMA_PS_IB,
which no longer provides a 1:1 mapping between the port space
and QP type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Abstract ibverbs SRQ creation

Support QPs with SRQs. If a user allocates an SRQ on an
rdma_cm_id, we post receive messages directly to the SRQ.
This also allows us to handle XRC SRQs, which may be associated
with an rdma_cm_id, but without a corresponding QP.

To handle registering memory, we store the PD associated
with an rdma_cm_id directly with the id, rather than finding
the PD using a QP pointer.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Add support for XRC qp types

Support XRC send/receive qp types.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Renumber RDMA_PS_IB to match kernel patch

RDMA_PS_IB is only a placeholder and not usable yet. Update
the assigned value to match that specified for the kernel.

Update rdma_getaddrinfo to use the port space when formatting
responses.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Limit autotools output

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Use union with sockaddr structures

To avoid strict aliasing compiler warnings, use an unnamed union
to store the src and dst addresses. This eliminates the need
for padding and sockaddr casts.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
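
An unnamed union lets the same storage be viewed as a generic sockaddr
or as a specific address family without casts.  The structure below is a
sketch in that spirit; the member names are illustrative and may not
match the librdmacm headers.

```c
#include <netinet/in.h>
#include <sys/socket.h>

struct addr_pair {
    union {
        struct sockaddr         src_addr;       /* generic view */
        struct sockaddr_in      src_sin;        /* IPv4 view */
        struct sockaddr_in6     src_sin6;       /* IPv6 view */
        struct sockaddr_storage src_storage;    /* sizes the union */
    };
    union {
        struct sockaddr         dst_addr;
        struct sockaddr_in      dst_sin;
        struct sockaddr_in6     dst_sin6;
        struct sockaddr_storage dst_storage;
    };
};

/* Fill in an IPv4 source address without casting between sockaddr types. */
static void set_src_ipv4(struct addr_pair *a, in_addr_t addr_be, in_port_t port_be)
{
    a->src_sin.sin_family = AF_INET;
    a->src_sin.sin_addr.s_addr = addr_be;
    a->src_sin.sin_port = port_be;
}
```

The sockaddr_storage member makes each union large enough for any
address family, which is what removes the need for explicit padding.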

librdmacm: Fix crash in rdma_connect

When using rdma_connect for UD QP lookup, there may not be any
QP associated with the rdma_cm_id.  Plus there may not be any use
for the conn_param parameter.  Allow conn_param to be optional
in this situation.  This fixes a crash exposed by rdma_xclient
sample using XRC QPs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fail ucma_init if ibv_get_device_list is empty

From the ibv_get_device_list man page:

   ibv_get_device_list() returns the array of available RDMA devices, or
   sets errno and returns NULL if the request fails. If no devices are
   found then num_devices is set to 0, and non-NULL is returned.

The librdmacm handles the failure case, but not the case where no
devices are found.  Handle that case as well.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

rdma_server: fix typo in print

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: define REUSEADDR option

Support equivalent of SO_REUSEADDR socket option.  When specified
the rdma_cm_id will be bound to a reusable address.  This will
allow other users to bind to that same address.  This is needed
to support lustre on clusters larger than 1024 nodes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/man: Size of private data for accept is 196, not 160

The rdma cm header is not used on the reply.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Document the fact that errno of EISCONN (Transport endpoint
is already connected) isn't a failure

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/doc: Document private data length limitations

Document the limitations on the user provided private data length
over InfiniBand networks. These limitations are calculated by
subtracting the rdma-cm header size (see IBA Annex A11 "RDMA CM IP
Service") from IB's private data len for the REQ (rdma_connect) and
REP (rdma_accept) messages.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/man: fixup rdma_accept documentation for responder_resources

Responder_resources may be greater than that specified in the
connect request, up to the maximum supported by the device.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Document to set accept_local sysctl for loopback

For loopback connections between ports on a single system to work,
accept_local sysctl must be set to 1. Add this to the readme.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Only allow a user to allocate a single QP off an rdma_cm_id

Add a simple check to rdma_create_qp() to see if a QP has already been
associated with an rdma_cm_id. Otherwise a user can allocate a
second QP with an ID, with the reference to the first QP replaced
by the second allocation.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

Update rdma_join_multicast documentation to include attaching QP

The man pages for rdma_join_multicast are not clear on how the
user can attach the QP to the multicast group. Clarify this
behavior.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Add a check for libpthread during librdmacm configure.

Add a check for libpthread during librdmacm configure. This will add
libpthread to the list of libraries that librdmacm is linked to.
Currently librdmacm gets libpthread implicitly through libibverbs, but
this breaks when using a linker that does not implicitly link with such
dependencies; e.g., the new gold linker is such a linker:

<http://wiki.debian.org/qa.debian.org/FTBFS#A2009-11-02Packagesfailingbecausebinutils-gold.2BAC8-indirectlinking>

Addresses: http://bugs.debian.org/555380
Addresses: https://bugs.launchpad.net/ubuntu/+source/librdmacm/+bug/687983

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

v1.0.14

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Support RAI_NUMERICHOST and no delay options

Add support similar to getaddrinfo AI_NUMERICHOST.  This
indicates that lengthy address resolution protocols should
not be used.  Also allow a caller of rdma_getaddrinfo to
indicate that lengthy route resolution protocols should not
be used.

Since rdma_getaddrinfo is a synchronous call, this allows a
user to obtain locally available data only without long
delays that may block an application thread.  Callers can then
use the asynchronous librdmacm calls to complete any missing
information.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: support non-default ACM port number

By default, ACM uses port 6125. The actual port number
used is now published in /var/run/ibacm.port. Attempt to
obtain the correct port number from here, and if that fails
revert to using the default port number of 6125.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
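
The lookup-with-fallback logic described above can be sketched in a few
lines.  read_acm_port() is a hypothetical helper, and the assumption
that the file holds a single decimal port number is not confirmed by the
commit message; the default of 6125 and the path /var/run/ibacm.port are
taken from it.

```c
#include <stdio.h>

#define ACM_DEFAULT_PORT 6125

/*
 * Read the ACM port from the given file.  Falls back to the default when
 * the file is missing, unreadable, or does not hold a valid port number.
 */
static unsigned short read_acm_port(const char *path)
{
    unsigned int port = 0;
    FILE *f = fopen(path, "r");

    if (f) {
        if (fscanf(f, "%u", &port) != 1 || port == 0 || port > 65535)
            port = 0;
        fclose(f);
    }
    return port ? (unsigned short) port : ACM_DEFAULT_PORT;
}
```

A caller would pass "/var/run/ibacm.port" and use the returned value
when connecting to the ACM service.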

librdmacm/rping: Make sure CQ event thread exits before destroying the CQ

It is possible for the CQ event thread to poll the CQ after it has been
destroyed which can result in a seg fault on T3 interfaces. This patch
waits for the thread to exit before destroying the CQ.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: fix compiler warning of void * arithmetic

void * pointer arithmetic is non-standard.

Signed-off-by: Jonathan Rosser <jrosser@rd.bbc.co.uk>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
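
Arithmetic on void * is a GNU extension; the portable form casts to a
byte pointer first.  ptr_add() is a hypothetical helper that shows the
usual fix.

```c
#include <stddef.h>
#include <stdint.h>

/* Advance a generic pointer by offset bytes using standard C only. */
static void *ptr_add(void *base, size_t offset)
{
    return (uint8_t *) base + offset;
}
```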

librdmacm: fix make install

make install fails if the include files in the install prefix
include/rdma,infiniband already exist. install claims that the <src>
and <destination> file are the same and exits with an error.

This patch modifies Makefile.am so that the rdma and infiniband include
files explicitly reference the source directory rather than the build
directory.

Also, EXTRA_DIST now only lists files that are not referenced anywhere
else in Makefile.am.

Signed-off-by: Jonathan Rosser <jrosser@rd.bbc.co.uk>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: Fix autotools to include the necessary M4 files

Otherwise running autogen.sh with a new version of autotools and then
building on a system with an older version tends to explode.
Unfortunately this is sometimes necessary since the new version is
required by the package.

This is how GNU envisions this mess works at least..

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

RPING: Remove printf for FLUSH completion.

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: do not modify qp_init_attr in rdma_get_request

rdma_create_qp modifies the qp_init_attr structure passed in
by the user to return the actual QP capabilities that were
allocated.  If the qp_init_attr does not specify CQs, the
librdmacm will allocate CQs for the user and return them.

rdma_get_request will allocate a QP off a newly connected rdma_cm_id
if the corresponding listen request is associated with a
qp_init_attr structure.  The librdmacm passes in the listen->
qp_init_attr structure into the rdma_create_qp call.
rdma_create_qp ends up modifying the qp_init_attr's associated
with the listen.  The result is that future calls to
rdma_get_request will use the modified qp attributes, rather
than those specified by the user.

Fix this by having rdma_get_request pass in a copy of the
qp_init_attr, rather than modifying those associated with the
listen.  Also update the man page for rdma_create_qp to indicate
that the qp_init_attr structure may be modified on output.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
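
The bug and its fix can be shown with stand-in types.  Everything below
is hypothetical: qp_caps stands in for the QP attributes, and
create_qp_stub() imitates a create call that rounds the request up to a
power of two and reports the result through the same structure.  The
point is that the caller hands the create call a copy, so the saved
listen attributes keep the values the user asked for.

```c
struct qp_caps {
    unsigned max_send_wr;
    unsigned max_recv_wr;
};

struct listener {
    struct qp_caps requested;   /* attributes supplied by the user */
};

static unsigned round_up_pow2(unsigned v)
{
    unsigned p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/* Stand-in for a create call that overwrites attr with what it allocated. */
static void create_qp_stub(struct qp_caps *attr)
{
    attr->max_send_wr = round_up_pow2(attr->max_send_wr);
    attr->max_recv_wr = round_up_pow2(attr->max_recv_wr);
}

/* Create a QP for a new connection without touching the listener's copy. */
static struct qp_caps accept_one(const struct listener *l)
{
    struct qp_caps attr = l->requested;     /* work on a private copy */

    create_qp_stub(&attr);
    return attr;
}
```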

librdmacm: only allocate qp in rdma_create_ep if qp_attr provided

The comments and documentation for rdma_create_ep indicate that
it will only allocate a QP if initial QP attributes are provided.
However, the code always attempts to create a QP off an associated
active rdma_cm_id endpoint.

By _not_ allocating the QP, this allows a user to first determine
what RDMA device a rdma_cm_id was associated with. The user can
then create a QP that references an existing CQ, SRQ, or PD.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: expand support for hints to rdma_getaddrinfo

If a user passes in hints into rdma_getaddrinfo, they can
specify resolved source and destination addresses. In this
case, there's no need for the user to specify the node or
service parameters. This differs from getaddrinfo, which
indicates that either node or service must be provided, but
is useful if rdma_getaddrinfo is being used to obtain
routing data.

Supporting this option allows the librdmacm to call
rdma_getaddrinfo internally from rdma_resolve_route when IB ACM
is enabled.

In addition to specifying the source and destination addresses
as part of the hints, a user could instead specify partial
routing data and rdma_getaddrinfo can resolve the full route.

This helps to support MPI applications that exchange endpoint
data, such as LIDs, out of band, but require SL data from the
SA to avoid potential deadlock conditions.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: do not pass uninitialized ai_hints into getaddrinfo

If rdma_getaddrinfo is called with hints set to NULL, then an
uninitialized ai_hints structure will be passed into getaddrinfo.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
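
The fix amounts to always starting from a zeroed hints structure before
calling getaddrinfo.  The wrapper below is a sketch using the standard
getaddrinfo API only; resolve_numeric() is a hypothetical name, and
forcing AI_NUMERICHOST is an extra choice made here so the example never
triggers a name lookup.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Resolve a numeric host and service.  user_hints may be NULL, in which
 * case a zeroed hints structure is used instead of uninitialized stack data.
 * Returns 0 on success or a getaddrinfo error code.
 */
static int resolve_numeric(const char *node, const char *service,
                           const struct addrinfo *user_hints,
                           struct addrinfo **res)
{
    struct addrinfo hints;

    if (user_hints)
        hints = *user_hints;
    else
        memset(&hints, 0, sizeof hints);

    hints.ai_flags |= AI_NUMERICHOST;   /* no lengthy name resolution */
    return getaddrinfo(node, service, &hints, res);
}
```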

librdmacm: release 1.0.13

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: document event status field

Clarify the value returned in the event status field.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/man: fix typos in man pages

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: specify return value in man pages

Document the return value of all calls in their respective man pages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: fix all calls to set errno

The librdmacm documentation (rdma_cm.7 man page) specifies that librdmacm
functions return 0 on success and -1 on error, with errno set correctly.
Update places in the code where errno is not set correctly and
cleanup setting the error code.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: convert ibv error values to errno

The librdmacm documentation (rdma_cm.7 man page) specifies that librdmacm
functions return 0 on success and -1 on error, with errno set correctly.
The libibverbs abstractions simply pass the libibverbs return codes
through to the user. Since libibverbs may return errno directly through
a given call, convert the return status to use errno where appropriate.

This fixes an issue with rdma_get_send_comp and rdma_get_recv_comp, where
a return value of 1 could indicate both success (1 completed request
returned) and failure (EPERM error).

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
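
The convention described in the two commits above (return 0 or -1, with
the reason in errno) can be captured in two small helpers.  The names
are hypothetical.  The first handles calls that return a positive
errno-style code directly; the second handles a negative status value,
as in the ucma_complete fix earlier in this log.

```c
#include <errno.h>

/* Convert a positive errno-style return code into -1 with errno set. */
static int code_to_errno(int ret)
{
    if (ret) {
        errno = ret;
        ret = -1;
    }
    return ret;
}

/* Convert a negative status (for example -ETIMEDOUT) into -1 with errno set. */
static int status_to_errno(int status)
{
    if (status) {
        errno = -status;
        status = -1;
    }
    return status;
}
```

With this in place a caller can tell "one completion returned" (1) apart
from "the call failed with EPERM" (-1 with errno set to EPERM), which a
raw return value of 1 could not distinguish.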

librdmacm/man: add man pages for calls in rdma_verbs.h

rdma_verbs.h defines several inline functions that wrap around verb
routines. Add man pages for these calls.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/man: add rdma_get_request man page to release

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: remove 32-bit build warnings

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: support 2.6.9

Redhat 4.x is based on 2.6.9. Add support for older kernels.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: release 1.0.12

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: fix makefile installation path

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/man: add man pages for new APIs

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: define struct ibv_path_record if not defined

librdmacm relies on struct ibv_path_record from libibverbs. However,
to support older versions of libibverbs, define struct ibv_path_record
if it is not already defined.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: disable AF_IB support

To avoid potential compatibility issues, disable AF_IB support
until it has been queued for inclusion upstream. We will re-enable
AF_IB support after releasing version 1.0.12 for OFED 1.5.2, which
will include support for IB ACM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: initialize path_cnt

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: use IB ACM to resolve IB path

Starting with 2.6.33, the kernel supports the ability to
manually specify the path record that a connection should
use. Allow the librdmacm to contact the IB ACM to acquire
path record data, even if rdma_getaddrinfo is not used and
the kernel does not support AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: check if kernel supports AF_IB

Add check during initialization to determine if the kernel
supports AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/mckey: use AF_IB for unmapped multicast addresses

If the user joins an unmapped multicast address, use AF_IB,
rather than AF_INET6, to communicate that information with the
kernel.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: update man pages

Update man pages to reflect recent changes to the APIs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/rdma_server: add new sample server application

Provide a simple server application to demonstrate the minimal
amount of coding needed to accept a connection request from
a client and exchange messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm/rdma_client: add new client sample

Provide a very simple client application that shows the
minimal coding needed to establish a connection and exchange
messages with a server. The client makes use of the new
rdma_getaddrinfo and rdma_create_ep calls, plus rdma verbs
abstractions.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: provide abstracted verb calls

Provide abstractions to the verb calls to simplify the user
interface for more casual verbs consumers. Users still have
access to the full range of verbs functionality by calling
verbs directly.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: format IB CM private data RDMA CM header

When IB ACM is used, the address and route resolution is
done entirely in user space. Before converting AF_INET or
AF_INET6 addresses to AF_IB, format the connection private
data for IB CM REQ messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: use sockaddr_ib addressing if IB ACM is in use

If IB ACM route resolution succeeds, provide the user with
AF_IB addresses, rather than AF_INET or AF_INET6 addressing.
AF_IB identifies the local and remote devices directly,
eliminating the need to perform address resolution a second
time via rdma_resolve_addr.

AF_IB addresses are returned using sockaddr_ib as part of
rdma_getaddrinfo.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

librdmacm: define RDMA_PS_IB

AF_IB uses the IB port space.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>