git.openfabrics.org - ~shefty/librdmacm.git/log
10 years ago[librdmacm] man/rstream.1: Update man page to be consistent with rstream -h
Hal Rosenstock [Wed, 11 Sep 2013 19:37:11 +0000 (15:37 -0400)]
[librdmacm] man/rstream.1: Update man page to be consistent with rstream -h

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years ago[librdmacm] rstream.c: Indicate when specified address family is unknown
Hal Rosenstock [Wed, 11 Sep 2013 18:44:32 +0000 (14:44 -0400)]
[librdmacm] rstream.c: Indicate when specified address family is unknown

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years ago[librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description
Hal Rosenstock [Wed, 11 Sep 2013 18:44:28 +0000 (14:44 -0400)]
[librdmacm] man/rdma_create_id.3: Add RDMA_PS_IB port space description

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agoexamples: Add cmtime to .gitignore
Yann Droneaud [Tue, 27 Aug 2013 18:37:54 +0000 (11:37 -0700)]
examples: Add cmtime to .gitignore

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agorsocket: Update rsocket man page
Sean Hefty [Thu, 22 Aug 2013 22:29:15 +0000 (15:29 -0700)]
rsocket: Update rsocket man page

Update fork support and RDMA_ROUTE socket option.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agocmtime: Add retry support for address and route resolution
Sean Hefty [Thu, 22 Aug 2013 19:00:54 +0000 (12:00 -0700)]
cmtime: Add retry support for address and route resolution

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agocmtime: Allow user to specify timeout values
Sean Hefty [Thu, 22 Aug 2013 18:54:56 +0000 (11:54 -0700)]
cmtime: Allow user to specify timeout values

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agocmtime: Add ability to time rdma_bind_addr calls
Sean Hefty [Thu, 22 Aug 2013 18:30:33 +0000 (11:30 -0700)]
cmtime: Add ability to time rdma_bind_addr calls

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agocmtime: Add example program that times rdma cm calls
Sean Hefty [Mon, 5 Aug 2013 17:57:43 +0000 (10:57 -0700)]
cmtime: Add example program that times rdma cm calls

cmtime is a new sample program that measures how long it
takes for each step in the connection process to complete.
It can be used to analyze the performance of the various
CM steps.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agorstream: Use rsocket option to set route directly
Sean Hefty [Fri, 26 Jul 2013 16:52:55 +0000 (09:52 -0700)]
rstream: Use rsocket option to set route directly

If we're using GID addressing, rdma_getaddrinfo can return
routing data directly.  Add an option for the user to
indicate that rdma_getaddrinfo should be called in place of
getaddrinfo.  And if routing data is available, call
rsetsockopt to set the route.

This helps test rsockets when ibacm and AF_IB support are
available.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agorsocket: Return 0 on success for SOL_RDMA options
Sean Hefty [Fri, 2 Aug 2013 21:18:06 +0000 (14:18 -0700)]
rsocket: Return 0 on success for SOL_RDMA options

The processing of SOL_RDMA does not set the return value in
the case of successfully handled options.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agorsockets: Add ability to set the IB route directly
Sean Hefty [Mon, 10 Jun 2013 19:33:20 +0000 (12:33 -0700)]
rsockets: Add ability to set the IB route directly

Add an RDMA specific rsocket option that allows the user
to program the RDMA route directly.  This is useful
for apps that have path record data available, e.g. from
ibacm.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agoexamples: Add support for native IB addressing to samples
Sean Hefty [Sun, 21 Jul 2013 02:22:55 +0000 (19:22 -0700)]
examples: Add support for native IB addressing to samples

Allow the user to specify GID addresses (AF_IB) into
udaddy and rstream.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years agorsockets: Support native IB addressing on connected rsockets
Sean Hefty [Thu, 18 Jul 2013 20:26:15 +0000 (13:26 -0700)]
rsockets: Support native IB addressing on connected rsockets

Update rsockets to support AF_IB addresses on connected rsockets.
Support for datagram rsockets is more difficult as a result of
using real UDP sockets for QP resolution, so that support is
deferred.  For connected sockets, we need to update internal
checks to handle AF_IB.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
10 years ago[4/4] Declare 'server_port' as an unsigned variable
Bart Van Assche [Sun, 28 Jul 2013 09:20:54 +0000 (11:20 +0200)]
[4/4] Declare 'server_port' as an unsigned variable

Change the data type of the 'server_port' variable from signed to
unsigned such that the cast in the fscanf() call can be removed.
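
As an illustration only (this is not the code touched by the patch), reading
a port number into an unsigned short lets "%hu" be used with no pointer cast.
The helper name read_port and the fallback parameter are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse a port number from an open stream.
 * Because 'port' is already an unsigned short, its address matches
 * what "%hu" expects and no cast is needed in the fscanf() call.
 */
static unsigned short read_port(FILE *f, unsigned short fallback)
{
    unsigned short port = fallback;

    /* fscanf() returns the number of items converted; 1 means success. */
    if (fscanf(f, "%hu", &port) != 1)
        return fallback;

    return port;
}
```

A caller would pass the opened port file and a default value to use when
the file cannot be parsed.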

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
10 years ago[3/4] rsocket: Remove the unused variable 'ret'
Bart Van Assche [Sun, 28 Jul 2013 09:19:48 +0000 (11:19 +0200)]
[3/4] rsocket: Remove the unused variable 'ret'

The variable 'ret' is assigned a value but that value is never used.
This triggers the following compiler warning:

src/rsocket.c:3720:9: warning: variable 'ret' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
10 years ago[2/4] cma: Remove the unused variable 'id_priv'
Bart Van Assche [Sun, 28 Jul 2013 09:19:15 +0000 (11:19 +0200)]
[2/4] cma: Remove the unused variable 'id_priv'

The variable 'id_priv' is assigned a value but is never used.
This triggers the following compiler warning:

src/cma.c:1178:25: warning: variable 'id_priv' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
10 years ago[1/4] acm: Remove the unused variable 'pri_path'
Bart Van Assche [Sun, 28 Jul 2013 09:18:36 +0000 (11:18 +0200)]
[1/4] acm: Remove the unused variable 'pri_path'

The variable 'pri_path' is assigned a value but is never used.
This triggers the following compiler warning:

src/acm.c:301:26: warning: variable 'pri_path' set but not used [-Wunused-but-set-variable]

Hence remove this variable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
11 years agoinit: Remove USE_IB_ACM configuration option
Sean Hefty [Mon, 10 Jun 2013 17:57:56 +0000 (10:57 -0700)]
init: Remove USE_IB_ACM configuration option

When the librdmacm is configured, it sets the USE_IB_ACM option
if infiniband/acm.h is found.  We can remove this option with
very little overhead, which would allow a user to install
ACM after installing the librdmacm, and the librdmacm would be
able to make use of ACM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoacm: Define needed ACM protocol messages
Sean Hefty [Mon, 10 Jun 2013 18:07:12 +0000 (11:07 -0700)]
acm: Define needed ACM protocol messages

The librdmacm needs message definitions used to communicate
with the ibacm.  It currently pulls these from infiniband/acm.h,
which is installed by ibacm.  This creates an install order
dependency on ibacm.  However, work on the scalable SA has
the ibacm using the librdmacm (via rsockets) for communication
between the different SSA components.

To resolve this issue, have the librdmacm define the message
structures that it needs to communicate with ibacm.  The
librdmacm already defines some ACM messages through configuration
checks.  We just expand that capability, which isolates the librdmacm
package from the ibacm package.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agocmatose: Allow user to specify address format
Sean Hefty [Wed, 29 Aug 2012 22:02:54 +0000 (15:02 -0700)]
cmatose: Allow user to specify address format

Provide an option for the user to indicate the type of
addresses used as input.  Support hostname, IPv4, IPv6,
and GIDs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoRemove executable mode bit on text files
Yann Droneaud [Tue, 16 Jul 2013 23:03:42 +0000 (16:03 -0700)]
Remove executable mode bit on text files

Source code and man page should not be executable.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoOpen files with "close on exec" flag
Yann Droneaud [Tue, 16 Jul 2013 21:59:52 +0000 (23:59 +0200)]
Open files with "close on exec" flag

Files opened by librdmacm are not supposed to be inherited across
exec*(): most of the files are of no use to another program, and
others cannot be used without the associated memory mapping.

This patch changes fopen(), open(), and socket() to always set the
close-on-exec flag.

This patch also adds checks to configure to guess whether fopen() supports
the "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported, fopen() should
support "e". If not supported, it's discarded according to POSIX. Many
operating systems have support for fopen("e").
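
A minimal sketch of the three forms of close-on-exec, assuming a Linux/glibc
system where O_CLOEXEC, SOCK_CLOEXEC, and the fopen() "e" flag are available.
The wrapper names below are hypothetical and are not the functions changed
by this patch.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* open(): O_CLOEXEC sets the flag atomically when the fd is created. */
static int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* fopen(): the "e" mode character asks libc to set close-on-exec. */
static FILE *fopen_cloexec(const char *path)
{
    return fopen(path, "re");
}

/* socket(): SOCK_CLOEXEC is OR'ed into the type argument. */
static int socket_cloexec(int domain, int type)
{
    return socket(domain, type | SOCK_CLOEXEC, 0);
}

/* Returns 1 if FD_CLOEXEC is set on fd, 0 if not, -1 on error. */
static int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```

has_cloexec() can be used to confirm that a descriptor returned by any of
the wrappers will not be inherited across exec*().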

You might find more information about close on exec in the following articles:

- "Excuse me son, but your code is leaking !!!" by Dan Walsh
  http://danwalsh.livejournal.com/53603.html

- "Secure File Descriptor Handling" by Ulrich Drepper
  http://udrepper.livejournal.com/20407.html

Note: this patch won't set close on exec flag on file descriptors
created by the kernel for completion channel and such.
This is addressed by another kernel patch.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoAdd .gitignore rules
Yann Droneaud [Tue, 16 Jul 2013 21:59:50 +0000 (23:59 +0200)]
Add .gitignore rules

Add the list of files/patterns to be excluded from git status output.
Additionally, it will prevent such files/patterns from being added and committed.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoconfigure: Use automake's option "subdir-objects"
Yann Droneaud [Tue, 16 Jul 2013 21:59:49 +0000 (23:59 +0200)]
configure: Use automake's option "subdir-objects"

Following advice in "Autotool Mythbuster" [1], option subdir-objects
can be used to have Makefiles create object files in the same
directory as their source files.

It reduces clobbering in the build directory.

[1] "Autotool Mythbuster", by Diego Elio "Flameeyes" Pettenò
http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoconfigure: Apply updates proposed by autoupdate
Yann Droneaud [Tue, 16 Jul 2013 21:59:48 +0000 (23:59 +0200)]
configure: Apply updates proposed by autoupdate

'autoupdate' is a tool that helps developers update configure.ac.

This patch applies a few fixes as suggested by autoupdate.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoautogen.sh: Use autoreconf in autogen.sh
Jeff Squyres [Tue, 16 Jul 2013 21:59:47 +0000 (23:59 +0200)]
autogen.sh: Use autoreconf in autogen.sh

The old sequence of Autotools commands listed in autogen.sh is no
longer correct.  Instead, just use the single "autoreconf" command,
which will invoke all the Right Autotools commands in the correct
order.

Signed-off-by: Jeff Squyres <jsquyres@cisco.com>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoMakefile.am: Fix an automake warning
Bart Van Assche [Tue, 16 Jul 2013 21:59:46 +0000 (23:59 +0200)]
Makefile.am: Fix an automake warning

Fix the following automake warning message:

    Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')

A quote from the automake manual:

    INCLUDES
        This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
        if it is used). It is an older name for the same functionality. This
        variable is deprecated; we suggest using AM_CPPFLAGS and per-target
        _CPPFLAGS instead.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoAdd "foreign" option to AM_INIT_AUTOMAKE
Bart Van Assche [Tue, 16 Jul 2013 21:59:45 +0000 (23:59 +0200)]
Add "foreign" option to AM_INIT_AUTOMAKE

Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
automake that the librdmacm package does not follow the GNU
standards. This change makes it possible to use 'autoreconf' for the
librdmacm package.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolib: Rename configure.in to configure.ac
Sean Hefty [Thu, 2 May 2013 20:47:51 +0000 (13:47 -0700)]
lib: Rename configure.in to configure.ac

Update to latest autotools naming.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Add support for iWarp
Sean Hefty [Thu, 11 Apr 2013 17:05:29 +0000 (10:05 -0700)]
rsocket: Add support for iWarp

iWarp does not support RDMA writes with immediate data.
Instead of sending messages using immediate data, allow
the rsocket protocol to exchange messages using sends.

The rsocket protocol remains the same.  RDMA writes are
used for data transfers, with send messages used to transfer
rsocket protocol messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Merge usage of wr_id between stream and datagram svcs
Sean Hefty [Fri, 12 Apr 2013 21:41:52 +0000 (14:41 -0700)]
rsocket: Merge usage of wr_id between stream and datagram svcs

The rsocket data streaming and datagram services use different
formats for the wr_id.  Although some differences are needed,
we can make them more similar.  This will be useful when the
wr_id is used for iwarp support, plus eliminates use of wr_id
bits that aren't actually needed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Release 1.0.17 v1.0.17
Sean Hefty [Wed, 6 Mar 2013 01:18:11 +0000 (17:18 -0800)]
librdmacm: Release 1.0.17

11 years agolibrdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown
Sean Hefty [Wed, 20 Feb 2013 04:03:58 +0000 (20:03 -0800)]
librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown

Shutdown switches an rsocket from nonblocking to blocking to
ensure that all data has been sent.  After completing all
transfers, it should switch back to nonblocking; this handles
partial shutdown situations, where only half the connection
is shut down.  However, the code uses the value of '1' to
set the nonblocking flag, rather than O_NONBLOCK.  Fix this.
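
For illustration, the same pattern on an ordinary file descriptor, using
fcntl() rather than the rsocket calls; the helper names are hypothetical.
O_NONBLOCK is a platform-defined bit and should not be assumed to equal 1
(on Linux, for example, the value 1 is O_WRONLY).

```c
#include <fcntl.h>

/*
 * Hypothetical helper: turn nonblocking mode on or off for fd.
 * The flag is always named symbolically as O_NONBLOCK; passing a
 * literal such as 1 would set or clear an unrelated bit.
 */
static int set_nonblocking(int fd, int enable)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;

    if (enable)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;

    return fcntl(fd, F_SETFL, flags);
}

/* Returns 1 if fd is nonblocking, 0 if blocking, -1 on error. */
static int is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) ? 1 : 0;
}
```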

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm/rstream: Reduce default transfer count
Sean Hefty [Tue, 5 Feb 2013 00:52:18 +0000 (16:52 -0800)]
librdmacm/rstream: Reduce default transfer count

1 million ping-pong transfers takes over 3 seconds to
complete, and I'm impatient.  Reduce the default number of
transfers for small messages to speed up running
performance tests, especially when running over slower
connections, like TCP sockets or over a WAN.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Work-around kernel bug returning uid = 0
Sean Hefty [Sat, 2 Feb 2013 01:17:34 +0000 (17:17 -0800)]
librdmacm: Work-around kernel bug returning uid = 0

Older kernels have a bug where they can report an event with the
uid set to 0.  The librdmacm crashes when casting the uid to
an rdma_cm_id and dereferencing the NULL pointer.

There are a limited number of events where this can occur and
in most cases it's safe to simply discard the event.  (This is
what the kernel does anyway.)  However, it's possible for us
to process an RDMA_CM_EVENT_ESTABLISHED event with the uid
set to 0.  (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

Although it's rare for this to occur, it does in fact happen
in practice.  To work-around the kernel bug, when the uid of an
established event is set to 0, we first try to locate the correct
user space id based on related data before discarding the event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Define ucma_ib_init when IB_ACM is disabled
Sean Hefty [Mon, 28 Jan 2013 22:56:25 +0000 (14:56 -0800)]
librdmacm: Define ucma_ib_init when IB_ACM is disabled

ucma_ib_init is only defined if IB_ACM is enabled, which is
determined by looking for the infiniband/acm.h header file.
Define ucma_ib_init when IB_ACM is disabled.

Problem reported by Suresh Shelvapille <suri@baymicrosystems.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Update rsocket man page
Sean Hefty [Mon, 21 Jan 2013 23:28:39 +0000 (15:28 -0800)]
rsockets: Update rsocket man page

Update man page to include recently added rsocket options
and undocumented configuration file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Add support for existing UDP apps
Sean Hefty [Wed, 9 Jan 2013 22:54:47 +0000 (14:54 -0800)]
rsockets: Add support for existing UDP apps

Support for existing UDP applications is done via the rspreload
library.  However, when the preload library is loaded, socket
calls used by rsockets get intercepted and converted into
rsocket calls.

The preload library was able to handle this for TCP rsockets
by using a per thread variable and checking for recursive calls
coming from rsockets back into the preload library.  The preload
library would direct such calls to the real socket calls.

The problem is more complex for UDP rsockets, which can invoke
socket calls from an internal rsocket thread.  The result is that
the preload library intercepts socket calls that originate from
the rsocket library which are not recursive.

Although, this is really a problem with the preload library,
the simplest solution is for rsockets to fully initialize the
library when allocating the first rsocket, versus deferring
initialization until required.  The preload library can then
detect the recursive calls.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoexamples/udpong: Add test program for rsocket datagrams
Sean Hefty [Wed, 5 Dec 2012 23:58:03 +0000 (15:58 -0800)]
examples/udpong: Add test program for rsocket datagrams

Add a sample test program to test datagram rsockets.  Move
common routines used by udpong and other test programs into
a common source file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Add datagram support
Sean Hefty [Fri, 9 Nov 2012 18:26:38 +0000 (10:26 -0800)]
rsocket: Add datagram support

Add datagram support through the rsocket API.

Datagram support is handled through an entirely different protocol and
internal implementation than streaming sockets.  Unlike connected rsockets,
datagram rsockets are not necessarily bound to a network (IP) address.
A datagram socket may use any number of network (IP) addresses, including
those which map to different RDMA devices.  As a result, a single datagram
rsocket must support using multiple RDMA devices and ports, and a datagram
rsocket references a single UDP socket, plus zero or more UD QPs.

Rsockets uses headers inserted before user data sent over UDP sockets to
resolve remote UD QP numbers.  When a user first attempts to send a datagram
to a remote address (IP and UDP port), rsockets will take the following steps:

1. Store the destination address into a lookup table.
2. Resolve which local network address should be used when sending
   to the specified destination.
3. Allocate a UD QP on the RDMA device associated with the local address.
4. Send the user's datagram to the remote UDP socket.

A header is inserted before the user's datagram.  The header specifies the
UD QP number associated with the local network address (IP and UDP port) of
the send.

A service thread is used to process messages received on the UDP socket.  This
thread updates the rsocket lookup tables with the remote QPN and path record
data.  The service thread forwards data received on the UDP socket to an
rsocket QP.  After the remote QPN and path records have been resolved, datagram
communication between two nodes is done over the UD QP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[librdmacm] Fixed build problem due to missing macro
Or Gerlitz [Sun, 2 Dec 2012 12:04:23 +0000 (12:04 +0000)]
[librdmacm] Fixed build problem due to missing macro

rsocket.c failed to compile because of a missing definition of the
container_of macro; fix it. Reported-by: Eyal Salamon <esalomon@mellanox.com>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Remove fscanf build warnings
Sean Hefty [Mon, 5 Nov 2012 19:53:03 +0000 (11:53 -0800)]
rsocket: Remove fscanf build warnings

Cast fscanf return values to (void) to indicate that we don't
care if the call fails.  In the case of a failure, we simply
fall back to using default values.

Problem reported by Or Gerlitz <ogerlitz@mellanox.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoriostream: Add example program for using iomap routines.
Sean Hefty [Wed, 24 Oct 2012 17:23:52 +0000 (10:23 -0700)]
riostream: Add example program for using iomap routines.

riostream is based on rstream, but uses the new riomap, riounmap,
and riowrite calls instead.  It runs a series of latency and
bandwidth tests using remote iomapped memory.

riostream is limited to using zero copy transfers at the
receiving side only at this time.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Add APIs for direct data placement
Sean Hefty [Sun, 21 Oct 2012 21:16:03 +0000 (14:16 -0700)]
rsocket: Add APIs for direct data placement

We introduce rsocket extensions for supporting direct
data placement (also known as zero copy).  Direct data
placement avoids data copies into network buffers when
sending or receiving data.  This patch implements zero
copies on the receive side, but adds some basic framework for
supporting it on the sending side.

Integrating zero copy support into the existing socket APIs
is difficult to achieve when the sockets are set as
nonblocking.  Any such implementation is likely to be unusable
in practice.  The problem stems from the fact that socket
operations are synchronous in nature.  Support for asynchronous
operations is limited to connection establishment.

Therefore we introduce new calls to handle direct data placement.
The use of the new calls is optional and does not affect the
use of the existing calls.  An attempt is made to have the new
routines integrate naturally with the existing APIs.  The new
functions are: riomap, riounmap, and riowrite.  The basic operation
can be described as follows:

1. App A calls riomap to register a data buffer with the local
   RDMA device.  Riomap returns an off_t offset value that
   corresponds to the registered data buffer.  The app may
   select the offset value.
2. Rsockets will transmit an internal message to the remote
   peer with information about the registration.  This exchange
   is hidden from the applications.
3. App A sends a notification message to app B indicating that
   the remote iomapped buffer is now available to receive data.
4. App B calls riowrite to transmit data directly into the
   riomapped data buffer.
5. App B sends a notification message to app A indicating that
   data is available in the mapped buffer.
6. After all transfers are complete, app A calls riounmap to
   deregister its data buffer.

Riomap and riounmap are functionally equivalent to RDMA
memory registration and deregistration routines.  They are loosely
based on the mmap and munmap APIs.

off_t riomap(int socket, void *buf, size_t len,
     int prot, int flags, off_t offset)

Riomap registers an application buffer with the RDMA hardware
associated with an rsocket.  The buffer is registered either for
local only access (PROT_NONE) or for remote write access (PROT_WRITE).
When registered for remote access, the buffer is mapped to a given
offset.  The offset is either provided by the user, or if the user
selects -1 for the offset, rsockets selects one.  The remote peer may
access an iomapped buffer directly by specifying the correct offset.
The mapping is not guaranteed to be available until after the remote
peer receives a data transfer initiated after riomap has completed.

int riounmap(int socket, void *buf, size_t len)

Riounmap removes the mapping between a buffer and an rsocket.

size_t riowrite(int socket, const void *buf, size_t count,
off_t offset, int flags)

Riowrite allows an application to transfer data over an rsocket
directly into a remotely iomapped buffer.  The remote buffer is specified
through an offset parameter, which corresponds to a remote iomapped buffer.
From the sender's perspective, riowrite behaves similarly to rwrite.  From
a receiver's view, riowrite transfers are silently redirected into a
predetermined data buffer.  Data is received automatically, and the receiver
is not informed of the transfer.  However, riowrite data is still considered
part of the data stream, such that riowrite data will be written before a
subsequent transfer is received.  A message sent immediately after
initiating a riowrite may be used to notify the receiver of the riowrite.

It should be noted that the current implementation primarily focused
on being functional for evaluation purposes.  Some checks have been
deferred for subsequent patches, and performance is currently limited
by linear lookups.
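
The offset-based addressing can be pictured with a single-process toy model.
This is only a sketch under stated assumptions: it does not call librdmacm,
the toy_* names and the fixed-size table are hypothetical, the "remote"
write is simulated with memcpy, and overlap checks are omitted.  It shows an
offset being chosen when -1 is passed, and a linear lookup resolving an
offset to a mapped buffer.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TOY_MAX_MAPS 8

/* One mapping: a buffer exposed at [offset, offset + len). */
struct toy_iomap {
    void  *buf;
    size_t len;
    off_t  offset;
    int    used;
};

/* Stand-in for the per-rsocket mapping table (hypothetical). */
struct toy_rsocket {
    struct toy_iomap maps[TOY_MAX_MAPS];
    off_t next_offset;   /* next offset handed out when caller passes -1 */
};

/* Map buf at 'offset', or pick an offset when -1 is given. */
static off_t toy_riomap(struct toy_rsocket *rs, void *buf, size_t len,
                        off_t offset)
{
    int i;

    if (offset == -1)
        offset = rs->next_offset;

    for (i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_iomap *m = &rs->maps[i];

        if (!m->used) {
            m->buf = buf;
            m->len = len;
            m->offset = offset;
            m->used = 1;
            if (offset + (off_t) len > rs->next_offset)
                rs->next_offset = offset + (off_t) len;
            return offset;
        }
    }
    return -1;   /* table full */
}

/* Remove the mapping that was created for (buf, len). */
static int toy_riounmap(struct toy_rsocket *rs, void *buf, size_t len)
{
    int i;

    for (i = 0; i < TOY_MAX_MAPS; i++) {
        struct toy_iomap *m = &rs->maps[i];

        if (m->used && m->buf == buf && m->len == len) {
            m->used = 0;
            return 0;
        }
    }
    return -1;
}

/* Linear lookup of 'offset', then copy straight into the mapped buffer. */
static ssize_t toy_riowrite(struct toy_rsocket *rs, const void *data,
                            size_t count, off_t offset)
{
    int i;

    for (i = 0; i < TOY_MAX_MAPS; i++) {
        const struct toy_iomap *m = &rs->maps[i];
        size_t skip;

        if (!m->used || offset < m->offset)
            continue;
        skip = (size_t) (offset - m->offset);
        if (skip > m->len || count > m->len - skip)
            continue;
        memcpy((char *) m->buf + skip, data, count);
        return (ssize_t) count;
    }
    return -1;   /* no mapping covers the requested range */
}
```

In the real API the mapping lives on app A and the write is issued by app B
over the rsocket; the toy model collapses both sides into one table purely
to show how an offset selects a buffer.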

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agordma_xserver/client: Fix man page formatting
Roland Dreier [Tue, 16 Oct 2012 19:44:39 +0000 (19:44 +0000)]
rdma_xserver/client: Fix man page formatting

Putting 'r' at the beginning of a line in the nroff source for man pages
is confusing to nroff because lines that start with a single quote
character ' or a dot character . are treated as control lines, which is
not what's intended here.  Some of the man page text ends up left out of
the formatted output.

Fix this by just wrapping the text slightly differently in the source
(which doesn't matter since nroff reflows the text anyway).  Also add a
missing ".TP" so that the -p and -c options are not run together in the
formatted output.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Disable ACM support if ibacm.port is not found
Sean Hefty [Mon, 8 Oct 2012 17:33:21 +0000 (10:33 -0700)]
librdmacm: Disable ACM support if ibacm.port is not found

The librdmacm will try to connect to port 6125 if ibacm.port is
not found.  The problem is that some other service or application
could be using that port and respond with garbage.  Rather
than falling back to a hard coded port number, if ibacm.port
is not found, simply disable ACM support.

This has the effect of removing support for older versions
of ibacm, unless the port file is created manually.

Patch created based on feedback from Doug Ledford and Florian
Weimer from RedHat.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[5/5,librdmacm] rping: added checks to the return values functions
Dotan Barak [Tue, 9 Oct 2012 12:27:52 +0000 (12:27 +0000)]
[5/5,librdmacm] rping: added checks to the return values functions

This makes rping exit with a nonzero return value in case of an
error.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[4/5,librdmacm] rstream: added missing return if accept() failed
Dotan Barak [Tue, 9 Oct 2012 12:27:51 +0000 (12:27 +0000)]
[4/5,librdmacm] rstream: added missing return if accept() failed

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[3/5,librdmacm] rstream: initialize return value in server_connect()
Dotan Barak [Tue, 9 Oct 2012 12:27:50 +0000 (12:27 +0000)]
[3/5,librdmacm] rstream: initialize return value in server_connect()

If use_async == 0 and rs_accept() passes (i.e. non negative value), then
the return value from the function was uninitialized.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[2/5,librdmacm] rsocket: added missing break
Dotan Barak [Tue, 9 Oct 2012 12:27:49 +0000 (12:27 +0000)]
[2/5,librdmacm] rsocket: added missing break

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago[1/5,librdmacm] rsocket: add missing va_end() after calling va_start()
Dotan Barak [Tue, 9 Oct 2012 12:27:48 +0000 (12:27 +0000)]
[1/5,librdmacm] rsocket: add missing va_end() after calling va_start()

Not doing so may lead to a resource leak.
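
A minimal sketch of the rule, using a hypothetical fcntl-style helper rather
than the rsocket code itself: every va_start() is matched by a va_end() on
all paths out of the function, including the error path.

```c
#include <stdarg.h>

/* Hypothetical commands for the toy helper below. */
enum { TOY_GET = 0, TOY_SET = 1 };

/*
 * Hypothetical variadic helper: TOY_SET takes one extra int argument,
 * TOY_GET takes none.  The single exit point guarantees va_end() runs
 * no matter which branch was taken.
 */
static int toy_fcntl(int *state, int cmd, ...)
{
    va_list args;
    int ret = 0;

    va_start(args, cmd);
    switch (cmd) {
    case TOY_GET:
        ret = *state;
        break;
    case TOY_SET:
        *state = va_arg(args, int);
        break;
    default:
        ret = -1;
        break;
    }
    va_end(args);   /* always paired with the va_start() above */

    return ret;
}
```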

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoucmatose: Remove connect parameter passed into rdma_accept
Sean Hefty [Thu, 4 Oct 2012 19:01:50 +0000 (12:01 -0700)]
ucmatose: Remove connect parameter passed into rdma_accept

Pass in NULL for conn_param into rdma_accept to indicate
that the passive side will use the values specified by the
active side.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoucmatose: Fix number of connections to disconnect
Sean Hefty [Thu, 4 Oct 2012 18:49:59 +0000 (11:49 -0700)]
ucmatose: Fix number of connections to disconnect

When ucmatose aborts because of issues trying to connect
to the server, it moves to disconnecting all connections.
However, not all connections may have been established.
The result is that ucmatose will hang in disconnect_events.
Fix this by setting the number of times that we need to
disconnect to the number of times that we successfully
connect.

This problem is based on a report by Doug Ledford
<dledford@redhat.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorping: Reduce retry_count to fit in 3-bits
Sean Hefty [Wed, 3 Oct 2012 22:05:20 +0000 (15:05 -0700)]
rping: Reduce retry_count to fit in 3-bits

retry_count is a 3 bit value on IB, reduce it from
10 to 7.
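
As a sketch of the constraint (the macro and helper names are hypothetical,
not part of rping): a 3-bit field can hold at most 7, so a requested value
can be clamped before it is placed in the connection parameters.

```c
#include <stdint.h>

/* Largest value representable in a 3-bit field. */
#define TOY_IB_RETRY_COUNT_MAX 7u

/* Hypothetical helper: clamp a requested retry count to 3 bits. */
static uint8_t clamp_retry_count(unsigned int requested)
{
    if (requested > TOY_IB_RETRY_COUNT_MAX)
        return (uint8_t) TOY_IB_RETRY_COUNT_MAX;
    return (uint8_t) requested;
}
```

With this helper, the old value of 10 would be reduced to 7, and values
already in range pass through unchanged.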

A value of 10 prevents rping from working over the Intel
IB HCA.  Problem reported by Doug Ledford <dledford@redhat.com>

The retry_count is also not set when calling rdma_accept.
Rather than passing different values into rdma_accept than
what was specified by the remote side, use the values given
in the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Place container_of inside #ifdef
Sean Hefty [Sat, 22 Sep 2012 00:16:09 +0000 (17:16 -0700)]
librdmacm: Place container_of inside #ifdef

verbs.h defines container_of.  Only define it if it is not already defined.
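
A minimal sketch of the guard, assuming a conventional offsetof-based
definition; the exact macro body in librdmacm may differ, and the toy_*
types are hypothetical.

```c
#include <stddef.h>

/* Define container_of only when no earlier header has provided it. */
#ifndef container_of
#define container_of(ptr, type, field) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))
#endif

/* Hypothetical types used only to exercise the macro. */
struct toy_link {
    struct toy_link *next;
};

struct toy_item {
    int id;
    struct toy_link link;
};

/* Recover the enclosing toy_item from a pointer to its 'link' member. */
static struct toy_item *item_from_link(struct toy_link *link)
{
    return container_of(link, struct toy_item, link);
}
```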

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agoaddrinfo: Remove debug printf calls
Sean Hefty [Wed, 3 Oct 2012 22:18:29 +0000 (15:18 -0700)]
addrinfo: Remove debug printf calls

These never should have made it into the commit.  :P

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Document rsocket protocol and design
Sean Hefty [Mon, 10 Sep 2012 21:32:45 +0000 (14:32 -0700)]
rsockets: Document rsocket protocol and design

Include a brief overview of the rsocket protocol and underlying design
with the source code to make it easier for someone trying to decipher
the actual code.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Support using GIDs with rdma_getaddrinfo
Sean Hefty [Tue, 28 Aug 2012 19:33:04 +0000 (12:33 -0700)]
librdmacm: Support using GIDs with rdma_getaddrinfo

Allow the user to specify a GID as the node parameter into
rdma_getaddrinfo.

To distinguish between the node being an IPv6 address or a GID,
we add a new flag, RAI_FAMILY, which can be set as part of the
hints to rdma_getaddrinfo.  When set, this flag indicates that the
value of ai_family in the hints should be used when interpreting
the node parameter.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Fix state checks in dup2
Sean Hefty [Fri, 7 Sep 2012 17:20:53 +0000 (10:20 -0700)]
rspreload: Fix state checks in dup2

The patch to add dup2 support was never updated to handle the fd
state.  The check for the fd type == fd_fork is no longer valid.
We need to instead check the fd state before handling forking.

Problem pointed out by Alex Couvrard <acouvrard@gmail.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Rename ucma_copy_rai_addr to ucma_set_ep_data
Sean Hefty [Wed, 29 Aug 2012 00:37:30 +0000 (17:37 -0700)]
librdmacm: Rename ucma_copy_rai_addr to ucma_set_ep_data

Simple function rename to better indicate operation.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Enable AF_IB support
Sean Hefty [Fri, 17 Aug 2012 21:02:45 +0000 (14:02 -0700)]
librdmacm: Enable AF_IB support

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Set address family for source address returned by ACM
Sean Hefty [Thu, 23 Aug 2012 22:48:06 +0000 (15:48 -0700)]
librdmacm: Set address family for source address returned by ACM

Set the sa_family type when saving the source address returned
by ACM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Report error level in error messages
Yann Droneaud [Mon, 27 Aug 2012 23:37:29 +0000 (16:37 -0700)]
librdmacm: Report error level in error messages

Report error messages as either 'Warning' or 'Fatal'.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Use common prefix for error messages
Yann Droneaud [Mon, 27 Aug 2012 23:35:32 +0000 (16:35 -0700)]
librdmacm: Use common prefix for error messages

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Report error messages on stderr
Yann Droneaud [Mon, 27 Aug 2012 23:33:50 +0000 (16:33 -0700)]
librdmacm: Report error messages on stderr

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Avoid rsocket calls until after fork
Sean Hefty [Thu, 23 Aug 2012 18:20:08 +0000 (11:20 -0700)]
rspreload: Avoid rsocket calls until after fork

When an rsocket call is made before an application calls fork(),
the forked applications can hang.  This can be seen by running
netserver and two netperf clients simultaneously.  The second
netperf client will eventually stop performing data transfers.

LD_PRELOAD=librspreload.so netserver -D

LD_PRELOAD=librspreload.so netperf -v2 -c -C -H 192.168.0.101 -l30
LD_PRELOAD=librspreload.so netperf -v2 -c -C -H 192.168.0.101 -l30

It's not clear what the specific problem is.  The best guess is
that libibverbs or the provider library (e.g. libmlx4) performs
some initialization, such as mmap'ing device memory, which does not
work when fork is called.

As a work-around, avoid calling rsocket routines until immediately
before they are needed.  This allows the process to fork before
the libraries are initialized.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Fix checks in fork_active/passive
Sean Hefty [Mon, 20 Aug 2012 16:06:49 +0000 (09:06 -0700)]
rspreload: Fix checks in fork_active/passive

Fix passing in wrong variable to rconnect(), check state instead
of type, and move the call to getpeername until after we are sure that
the normal socket connection has completed.

Problems pointed out by Sridhar Samudrala <sri@us.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrdmacm: Re-enable ibacm support
Sean Hefty [Fri, 17 Aug 2012 23:41:04 +0000 (16:41 -0700)]
librdmacm: Re-enable ibacm support

Commit 272c3cc024d0e5854cbafa6c2f1e8560398a68d7, "Delay ACM
connection until resolving an address", removed the call to
ucma_ib_init without adding it back in the correct location.
As a result, the librdmacm no longer uses ibacm.  Fix this
by adding the initialization call when resolving an address.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorstream: Use MSG_WAITALL for blocking test
Sean Hefty [Thu, 16 Aug 2012 22:41:35 +0000 (15:41 -0700)]
rstream: Use MSG_WAITALL for blocking test

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Add support for MSG_WAITALL rrecv() flag
Sean Hefty [Thu, 28 Jun 2012 18:34:38 +0000 (11:34 -0700)]
rsockets: Add support for MSG_WAITALL rrecv() flag

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Add fstat support
Sean Hefty [Tue, 7 Aug 2012 16:37:24 +0000 (09:37 -0700)]
rspreload: Add fstat support

vsftpd calls fstat on a socket.  Fake it out.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Support sendfile
Sean Hefty [Tue, 14 Aug 2012 00:00:42 +0000 (17:00 -0700)]
rspreload: Support sendfile

Handle users calling sendfile with an rsocket.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Do not block connect when supporting fork
Sean Hefty [Sat, 11 Aug 2012 04:44:39 +0000 (21:44 -0700)]
rspreload: Do not block connect when supporting fork

Many FTP servers require fork support.  However, FTP clients,
such as ncftp, will perform the following call sequence:

send PASV request to server over connection 1
         server will listen for connection 2
issue nonblocking connect to server
send ACCEPT request to server over connection 1
         server will accept connection 2

The current fork support converts all nonblocking connect
calls to blocking.  The result is that the FTP client ends up
blocked waiting for the server to accept the connection,
which it will never do.

To handle this case, we have the active side follow the same
rule as the server side and defer establishing the rsocket
connection until the user calls the first data transfer routine.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Minor cleanup of fork_passive handling
Sean Hefty [Mon, 13 Aug 2012 23:00:16 +0000 (16:00 -0700)]
rspreload: Minor cleanup of fork_passive handling

Minor code cleanup in passive side handling of fork support.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Support SO_OOBINLINE
Sean Hefty [Wed, 8 Aug 2012 04:31:12 +0000 (21:31 -0700)]
rsockets: Support SO_OOBINLINE

We don't support urgent data, so just return success.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Support dup2 calls
Sean Hefty [Mon, 30 Jul 2012 23:06:32 +0000 (16:06 -0700)]
rspreload: Support dup2 calls

vsftpd requires dup2() support.  To handle dup2, we need to add
reference count tracking to the preload fd's.
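
Reference counting for duplicated descriptors might look roughly like
the following.  This is a single-threaded sketch under assumed names
(struct fd_info, fd_info_new, fd_info_get, fd_info_put); the preload
library's actual structures are not shown in this log and may differ,
for example by using atomic counters.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record shared by every index that refers to the same fd. */
struct fd_info {
	int fd;		/* underlying descriptor */
	int refcnt;	/* number of indexes referring to this record */
};

/* Allocate a record holding one reference; returns NULL on failure. */
static struct fd_info *fd_info_new(int fd)
{
	struct fd_info *fdi = calloc(1, sizeof(*fdi));

	if (fdi) {
		fdi->fd = fd;
		fdi->refcnt = 1;
	}
	return fdi;
}

/* dup2 onto a new index: share the record and take another reference. */
static struct fd_info *fd_info_get(struct fd_info *fdi)
{
	assert(fdi != NULL && fdi->refcnt > 0);
	fdi->refcnt++;
	return fdi;
}

/*
 * Drop one reference.  Returns 1 if this was the last reference, in which
 * case the record is freed and the caller should close the descriptor.
 */
static int fd_info_put(struct fd_info *fdi)
{
	assert(fdi != NULL && fdi->refcnt > 0);
	if (--fdi->refcnt > 0)
		return 0;
	free(fdi);
	return 1;
}
```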

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Call real.close in fd_close
Sean Hefty [Wed, 1 Aug 2012 23:26:11 +0000 (16:26 -0700)]
rspreload: Call real.close in fd_close

The index into the preload lookup table is obtained by opening
/dev/null and using the returned value.  When closing the file,
use the real close call and not the preload close call.  This
is a minor optimization, but clarifies the expected operation.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsocket: Improve disconnect time under normal conditions
Sean Hefty [Fri, 27 Jul 2012 17:46:42 +0000 (10:46 -0700)]
rsocket: Improve disconnect time under normal conditions

When both sides of a connection attempt to close at the same
time, one of the two sides can easily get an error when sending
a disconnect message.  This results in that side hanging
during close until the send times out.  (The time out is caused
by the remote side destroying its QP.)

We can reduce the chance of this occurring by immediately
assuming that the disconnect has been successful once we've
received the remote side's disconnect message, or we've
polled a send completion for the local disconnect message.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Use wr_id to determine completion type
Sean Hefty [Thu, 26 Jul 2012 22:35:32 +0000 (15:35 -0700)]
rsockets: Use wr_id to determine completion type

If a work request has completed in error, the completion type
field is undefined.  Use the wr_id to determine if the failed
completion was a send or receive.
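
One way to make wr_id carry the completion type is to reserve a bit in
it, since wr_id is still valid in a completion that failed.  The sketch
below assumes the top bit marks receives; the names WR_ID_FLAG_RECV,
send_wr_id, recv_wr_id, and wr_is_recv are illustrative and are not
claimed to match the rsockets source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bit 63 of wr_id marks a receive work request. */
#define WR_ID_FLAG_RECV (UINT64_C(1) << 63)

/* Build the wr_id for a send; the payload must not use the flag bit. */
static uint64_t send_wr_id(uint64_t data)
{
	assert((data & WR_ID_FLAG_RECV) == 0);
	return data;
}

/* Build the wr_id for a receive by setting the flag bit. */
static uint64_t recv_wr_id(uint64_t data)
{
	return data | WR_ID_FLAG_RECV;
}

/* Classify a completion from its wr_id alone, even if it failed. */
static int wr_is_recv(uint64_t wr_id)
{
	return (wr_id & WR_ID_FLAG_RECV) != 0;
}
```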

This fixes an issue where MPI can hang during finalize.  With
both sides of a connection shutting down simultaneously, one
side may complete quicker and delete its QP before the other
side receives an acknowledgement to their disconnect message.
Eventually, the disconnect message will time out, but because
the completion type field is undefined, it may be processed
as a failed receive, rather than a failed send.  The end
result is that the second side hangs waiting for the send to
complete.

This problem showed up more easily after commit
2e5b0fc95964f74ea59dd725e849027faa0cd526, but existed beforehand.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorsockets: Enable support for privileged ports
Sean Hefty [Wed, 25 Jul 2012 18:11:56 +0000 (11:11 -0700)]
rsockets: Enable support for privileged ports

Allow the preload library to use rsockets with privileged
ports.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorspreload: Call init from getsockname()
Sean Hefty [Tue, 24 Jul 2012 21:13:55 +0000 (14:13 -0700)]
rspreload: Call init from getsockname()

netperf for some unknown reason calls getsockname() using a
hard coded value of 0, without first allocating a socket.
This causes the rsocket preload library to crash, since the
library has not been properly initialized.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agorstream: Add option to test fork support
Sean Hefty [Tue, 17 Jul 2012 22:32:54 +0000 (15:32 -0700)]
rstream: Add option to test fork support

If the user specifies '-T f', rstream will process
connections in a child process.  The server continues
to run until all child processes have completed their
tests.

Fork support requires use of the librspreload library.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years agolibrspreload: Support server apps that call fork()
Sean Hefty [Tue, 24 Jul 2012 18:40:10 +0000 (11:40 -0700)]
librspreload: Support server apps that call fork()

Provide limited support for applications that call fork().  To
handle fork(), we establish connections using normal sockets.
The socket is later converted to an rsocket when the user
makes the first call to a data transfer function (e.g. send,
recv, read, write, etc.).

Fork support is indicated by setting the environment variable
RDMAV_FORK_SAFE = 1.  When set, the preload library will delay
converting to an rsocket until the user attempts to send or receive
data on the socket.  To convert from a normal socket to an
rsocket, the preload library must inject a message on the
normal socket to synchronize between the client and server.  As
a result, if the rsocket connection fails, the ability to
silently fall back to the normal socket may be compromised.  Fork
support is disabled by default.

The current implementation works for simple test apps under
ideal conditions.  Although it supports nonblocking sockets, it
uses blocking rsockets when migrating connections.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrspreload: Make socket_fallback() call more generic
Sean Hefty [Mon, 16 Jul 2012 21:17:58 +0000 (14:17 -0700)]
librspreload: Make socket_fallback() call more generic

socket_fallback is used to switch from an rsocket to a normal
socket in the case of failures.  Rename the call and make it
more generic, so that it can switch between an rsocket and
a normal socket in either direction.  This will be used to
support fork().

As part of this change, we move the list of hooked and rsocket
calls into structures, versus maintaining a large number of
static variables.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Only allocate verbs resources when needed
Sean Hefty [Thu, 19 Jul 2012 17:09:48 +0000 (10:09 -0700)]
librdmacm: Only allocate verbs resources when needed

The librdmacm allocates a PD per device on initialization.  Although
we need to maintain the device list while the library is loaded
(see rdma_get_devices), we can reduce the overhead by only allocating
verbs resources when they are needed.

This allows the rsocket preload library to support fork for
applications that spawn connections off to child processes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Remove unused 'ib' variable from ucma_init
Sean Hefty [Thu, 19 Jul 2012 17:13:50 +0000 (10:13 -0700)]
librdmacm: Remove unused 'ib' variable from ucma_init

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agov1.0.16 v1.0.16
Sean Hefty [Wed, 11 Jul 2012 00:55:32 +0000 (17:55 -0700)]
v1.0.16

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrspreload: Fix typecast to eliminate compile warnings
Hal Rosenstock [Thu, 12 Jul 2012 16:24:10 +0000 (09:24 -0700)]
librspreload: Fix typecast to eliminate compile warnings

src/preload.c: In function 'bind':
src/preload.c:350: warning: assignment from incompatible pointer type
src/preload.c: In function 'connect':
src/preload.c:397: warning: assignment from incompatible pointer type

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorspreload: Document use of librspreload library
Sean Hefty [Wed, 11 Jul 2012 22:13:29 +0000 (15:13 -0700)]
rspreload: Document use of librspreload library

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Include src/common.h in distribution
sean.hefty@intel.com [Wed, 11 Jul 2012 19:39:08 +0000 (12:39 -0700)]
librdmacm: Include src/common.h in distribution

Add missing header file to distribution to allow rpmbuild to
work.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Validate source address protocol family in rdma_resolve_addr
Yann Droneaud [Wed, 11 Jul 2012 18:54:39 +0000 (11:54 -0700)]
librdmacm: Validate source address protocol family in rdma_resolve_addr

If a source address is provided but its protocol family is not recognized,
return an error.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Build librspreload library as part of build
Sean Hefty [Mon, 9 Jul 2012 21:58:14 +0000 (14:58 -0700)]
rsocket: Build librspreload library as part of build

Build the rsocket preload library as part of the build.  To reduce the
risk of the preload library intercepting calls without the user's
knowledge, the preload library is installed into {_libdir}/rsocket.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Support IPV6_V6ONLY socket option
Sean Hefty [Tue, 12 Jun 2012 19:02:04 +0000 (12:02 -0700)]
rsocket: Support IPV6_V6ONLY socket option

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Handle other shutdown option
Sean Hefty [Mon, 25 Jun 2012 21:19:54 +0000 (14:19 -0700)]
rsocket: Handle other shutdown option

Handle SHUT_RD and SHUT_WR shutdown options.

In order to handle shutting down the send and receive sides
separately, we break the connection state into multiple sub-states.
This allows us to be partially connected (i.e. for either just
reads or just writes).
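
The sub-state idea can be sketched with two independent flags, one per
direction.  The flag names and the apply_shutdown helper are
hypothetical; only SHUT_RD, SHUT_WR, and SHUT_RDWR come from the
standard sockets API.

```c
#include <assert.h>
#include <sys/socket.h>

/* Assumed sub-states: each direction of the connection is tracked alone. */
enum {
	CONN_READABLE  = 1 << 0,
	CONN_WRITABLE  = 1 << 1,
	CONN_CONNECTED = CONN_READABLE | CONN_WRITABLE
};

/* Return the connection state that results from shutdown(how). */
static int apply_shutdown(int state, int how)
{
	assert(how == SHUT_RD || how == SHUT_WR || how == SHUT_RDWR);

	switch (how) {
	case SHUT_RD:
		return state & ~CONN_READABLE;
	case SHUT_WR:
		return state & ~CONN_WRITABLE;
	case SHUT_RDWR:
		return state & ~CONN_CONNECTED;
	default:
		return state;
	}
}
```

After SHUT_WR the state in this sketch is CONN_READABLE only, which
corresponds to the partially connected case described above.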

Support for SHUT_WR is needed to handle netperf properly, which
shuts down a socket by having the client use SHUT_WR, followed by
the server completing the disconnect with SHUT_RDWR.  The following
patch eliminates an error message from netperf:

'shutdown_control: no response received  errno 95'

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Set readfds event if rsocket has been disconnected
Sean Hefty [Mon, 25 Jun 2012 22:04:52 +0000 (15:04 -0700)]
rsocket: Set readfds event if rsocket has been disconnected

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Add rsocket man page
Sean Hefty [Mon, 11 Jun 2012 20:20:18 +0000 (13:20 -0700)]
rsocket: Add rsocket man page

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Use configuration files to specify default settings
Sean Hefty [Tue, 5 Jun 2012 22:28:18 +0000 (15:28 -0700)]
rsocket: Use configuration files to specify default settings

Give an administrator control over the default settings
used by rsockets.  Use files under %sysconfig%/rdma/rsocket as shown:

mem_default - default size of receive buffer(s)
wmem_default - default size of send buffer(s)
sqsize_default - default size of send queue
rqsize_default - default size of receive queue
inline_default - default size of inline data

If configuration files are not available, rsockets will continue to
use internal defaults.
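
Reading a setting with a fallback to an internal default could be done
as in this sketch.  The function name read_setting and the example
values are assumptions for illustration; the file format is assumed to
be a single unsigned integer.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read one unsigned integer from 'path'.  If the file is missing or does
 * not start with a number, return the caller's default instead.
 */
static unsigned int read_setting(const char *path, unsigned int def)
{
	unsigned int value;
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return def;

	if (fscanf(f, "%u", &value) != 1)
		value = def;

	fclose(f);
	return value;
}
```

A caller would pass the path of one of the files listed above together
with the built-in default for that setting.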

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Spin before blocking on an rsocket
Sean Hefty [Mon, 4 Jun 2012 21:51:41 +0000 (14:51 -0700)]
rsocket: Spin before blocking on an rsocket

The latency cost of blocking is significant compared to round
trip ping-pong time.  Spin briefly on rsockets before calling
into the kernel and blocking.

The time to spin before blocking is read from an rsocket
configuration file %sysconfig%/rdma/rsocket/polling_time.  This
is user adjustable.
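
The spin-then-block pattern can be sketched independently of rsockets.
Everything named here (spin_then_block, the try_op and block_op
callbacks, struct countdown) is hypothetical; the sketch only shows
polling a nonblocking operation for a time budget before falling back to
a blocking call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic clock reading in microseconds. */
static uint64_t now_usec(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t) ts.tv_sec * 1000000u + (uint64_t) ts.tv_nsec / 1000u;
}

/*
 * try_op:   nonblocking attempt, returns 1 when done, 0 if it would block.
 * block_op: blocking fallback, called once if the spin budget runs out.
 * Returns 1 if the operation completed while spinning, 0 if we blocked.
 */
static int spin_then_block(int (*try_op)(void *), void (*block_op)(void *),
			   void *ctx, uint64_t polling_usec)
{
	uint64_t start = now_usec();

	assert(try_op != NULL && block_op != NULL);
	do {
		if (try_op(ctx))
			return 1;
	} while (now_usec() - start < polling_usec);

	block_op(ctx);
	return 0;
}

/* Demo callbacks: the operation becomes ready after 'remaining' attempts. */
struct countdown {
	int remaining;
	int blocked;
};

static int countdown_try(void *ctx)
{
	struct countdown *c = ctx;

	if (c->remaining > 0) {
		c->remaining--;
		return 0;
	}
	return 1;
}

static void countdown_block(void *ctx)
{
	struct countdown *c = ctx;

	c->blocked = 1;
}
```

A larger polling time trades CPU usage for lower latency; a budget of
zero makes a single attempt and then blocks.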

As a completely unintentional side effect, this just happens to
improve application performance in benchmarks, like netpipe,
significantly. ;)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Handle TCP_MAXSEG socket option
Sean Hefty [Mon, 4 Jun 2012 20:22:10 +0000 (13:22 -0700)]
rsocket: Handle TCP_MAXSEG socket option

netperf uses the TCP_MAXSEG socket option.  Add support for it.
Problem reported by Sridhar Samudrala <sri@us.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>