git.openfabrics.org - ~shefty/librdmacm.git/log
11 years ago rsocket: Add support for iWarp
Sean Hefty [Thu, 11 Apr 2013 17:05:29 +0000 (10:05 -0700)]
rsocket: Add support for iWarp

iWarp does not support RDMA writes with immediate data.
Instead of sending messages using immediate data, allow
the rsocket protocol to exchange messages using sends.

The rsocket protocol remains the same.  RDMA writes are
used for data transfers, with send messages used to transfer
rsocket protocol messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsocket: Merge usage of wr_id between stream and datagram svcs
Sean Hefty [Fri, 12 Apr 2013 21:41:52 +0000 (14:41 -0700)]
rsocket: Merge usage of wr_id between stream and datagram svcs

The rsocket data streaming and datagram services use different
formats for the wr_id.  Although some differences are needed,
we can make them more similar.  This will be useful when the
wr_id is used for iWarp support, and it eliminates use of wr_id
bits that aren't actually needed.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Release 1.0.17 v1.0.17
Sean Hefty [Wed, 6 Mar 2013 01:18:11 +0000 (17:18 -0800)]
librdmacm: Release 1.0.17

11 years ago librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown
Sean Hefty [Wed, 20 Feb 2013 04:03:58 +0000 (20:03 -0800)]
librdmacm/rsocket: Fix resetting O_NONBLOCK after calling shutdown

Shutdown switches an rsocket from nonblocking to blocking to
ensure that all data has been sent.  After completing all
transfers, it should switch back to nonblocking; this handles
partial shutdown situations, where only half the connection
is shut down.  However, the code uses the value of '1' to
set the nonblocking flag, rather than O_NONBLOCK.  Fix this.
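
The same class of bug is easy to show with an ordinary file descriptor.
The sketch below uses standard fcntl() rather than the rsocket code path,
and the helper name set_nonblocking is hypothetical.  It only illustrates
why the flag has to be the O_NONBLOCK constant and not the literal 1.

```c
#include <fcntl.h>

/* Hypothetical helper: turn O_NONBLOCK on or off for a descriptor.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
int set_nonblocking(int fd, int enable)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;

	/* Use the O_NONBLOCK constant.  Its value is platform specific;
	 * on Linux the value 1 is O_WRONLY, not O_NONBLOCK. */
	if (enable)
		flags |= O_NONBLOCK;
	else
		flags &= ~O_NONBLOCK;

	return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}
```

A caller that switches to blocking for the duration of shutdown can call
set_nonblocking(fd, 0) first and set_nonblocking(fd, 1) afterwards.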

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm/rstream: Reduce default transfer count
Sean Hefty [Tue, 5 Feb 2013 00:52:18 +0000 (16:52 -0800)]
librdmacm/rstream: Reduce default transfer count

1 million ping-pong transfers take over 3 seconds to
complete, and I'm impatient.  Reduce the default number of
transfers for small messages to speed up running
performance tests, especially when running over slower
connections, like TCP sockets or over a WAN.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Work-around kernel bug returning uid = 0
Sean Hefty [Sat, 2 Feb 2013 01:17:34 +0000 (17:17 -0800)]
librdmacm: Work-around kernel bug returning uid = 0

Older kernels have a bug where they can report an event with the
uid set to 0.  The librdmacm crashes when casting the uid to
an rdma_cm_id and dereferencing the NULL pointer.

There are a limited number of events where this can occur and
in most cases it's safe to simply discard the event.  (This is
what the kernel does anyway.)  However, it's possible for us
to process an RDMA_CM_EVENT_ESTABLISHED event with the uid
set to 0.  (See kernel commit 418edaaba96e58112b15c82b4907084e2a9caf42.)

Although it's rare for this to occur, it does in fact happen
in practice.  To work around the kernel bug, when the uid of an
established event is set to 0, we first try to locate the correct
user space id based on related data before discarding the event.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Define ucma_ib_init when IB_ACM is disabled
Sean Hefty [Mon, 28 Jan 2013 22:56:25 +0000 (14:56 -0800)]
librdmacm: Define ucma_ib_init when IB_ACM is disabled

ucma_ib_init is only defined if IB_ACM is enabled, which is
determined by looking for the infiniband/acm.h header file.
Define ucma_ib_init when IB_ACM is disabled.

Problem reported by Suresh Shelvapille <suri@baymicrosystems.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Update rsocket man page
Sean Hefty [Mon, 21 Jan 2013 23:28:39 +0000 (15:28 -0800)]
rsockets: Update rsocket man page

Update man page to include recently added rsocket options
and undocumented configuration file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Add support for existing UDP apps
Sean Hefty [Wed, 9 Jan 2013 22:54:47 +0000 (14:54 -0800)]
rsockets: Add support for existing UDP apps

Support for existing UDP applications is done via the rspreload
library.  However, when the preload library is loaded, socket
calls used by rsockets get intercepted and converted into
rsocket calls.

The preload library was able to handle this for TCP rsockets
by using a per thread variable and checking for recursive calls
coming from rsockets back into the preload library.  The preload
library would direct such calls to the real socket calls.
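
The per-thread recursion check can be sketched as follows.  This is a
minimal illustration and not the rspreload code: the names
in_rsocket_call, real_socket_call, rsocket_call, and hooked_call are
hypothetical, and __thread is a GCC/Clang extension.

```c
/* Per-thread flag: nonzero while this thread is inside the rsocket path. */
static __thread int in_rsocket_call;

/* Counters used only to observe which path was taken. */
static int real_calls;
static int rsocket_calls;

int hooked_call(void);

/* Stand-in for the real (libc) socket call. */
static int real_socket_call(void)
{
	real_calls++;
	return 0;
}

/* Stand-in for an rsocket routine that internally makes a socket call,
 * which the preload library intercepts again. */
static int rsocket_call(void)
{
	rsocket_calls++;
	return hooked_call();
}

/* The intercepted entry point: a recursive call from the same thread
 * is sent to the real socket call instead of back into rsockets. */
int hooked_call(void)
{
	int ret;

	if (in_rsocket_call)
		return real_socket_call();

	in_rsocket_call = 1;
	ret = rsocket_call();
	in_rsocket_call = 0;
	return ret;
}
```

Because the flag is per thread, a socket call made from a different
thread never sees it set, so this check alone cannot classify such calls.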

The problem is more complex for UDP rsockets, which can invoke
socket calls from an internal rsocket thread.  The result is that
the preload library intercepts socket calls that originate from
the rsocket library which are not recursive.

Although this is really a problem with the preload library,
the simplest solution is for rsockets to fully initialize the
library when allocating the first rsocket, versus deferring
initialization until required.  The preload library can then
detect the recursive calls.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago examples/udpong: Add test program for rsocket datagrams
Sean Hefty [Wed, 5 Dec 2012 23:58:03 +0000 (15:58 -0800)]
examples/udpong: Add test program for rsocket datagrams

Add a sample test program to test datagram rsockets.  Move
common routines used by udpong and other test programs into
a common source file.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsocket: Add datagram support
Sean Hefty [Fri, 9 Nov 2012 18:26:38 +0000 (10:26 -0800)]
rsocket: Add datagram support

Add datagram support through the rsocket API.

Datagram support is handled through an entirely different protocol and
internal implementation than streaming sockets.  Unlike connected rsockets,
datagram rsockets are not necessarily bound to a network (IP) address.
A datagram socket may use any number of network (IP) addresses, including
those which map to different RDMA devices.  As a result, a single datagram
rsocket must support using multiple RDMA devices and ports, and a datagram
rsocket references a single UDP socket, plus zero or more UD QPs.

Rsockets uses headers inserted before user data sent over UDP sockets to
resolve remote UD QP numbers.  When a user first attempts to send a datagram
to a remote address (IP and UDP port), rsockets will take the following steps:

1. Store the destination address into a lookup table.
2. Resolve which local network address should be used when sending
   to the specified destination.
3. Allocate a UD QP on the RDMA device associated with the local address.
4. Send the user's datagram to the remote UDP socket.

A header is inserted before the user's datagram.  The header specifies the
UD QP number associated with the local network address (IP and UDP port) of
the send.
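
As a rough sketch of the header idea only: the layout below is
hypothetical and is not the rsocket wire format.  The struct dgram_hdr
and the two helpers are made-up names that show a QP number being
placed in front of the user's datagram and read back by the receiver.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header; the real rsocket header is not shown here. */
struct dgram_hdr {
	uint32_t qpn;	/* sender's UD QP number, network byte order */
};

/* Write header + payload into out.  Returns the total number of bytes,
 * or 0 if out is too small. */
size_t dgram_pack(void *out, size_t out_len, uint32_t qpn,
		  const void *payload, size_t payload_len)
{
	struct dgram_hdr hdr = { .qpn = htonl(qpn) };

	if (out_len < sizeof(hdr) || payload_len > out_len - sizeof(hdr))
		return 0;

	memcpy(out, &hdr, sizeof(hdr));
	memcpy((char *) out + sizeof(hdr), payload, payload_len);
	return sizeof(hdr) + payload_len;
}

/* Read the sender's QPN from a received message.  Returns 0 on success,
 * -1 if the message is shorter than the header. */
int dgram_unpack_qpn(const void *msg, size_t msg_len, uint32_t *qpn)
{
	struct dgram_hdr hdr;

	if (msg_len < sizeof(hdr))
		return -1;

	memcpy(&hdr, msg, sizeof(hdr));
	*qpn = ntohl(hdr.qpn);
	return 0;
}
```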

A service thread is used to process messages received on the UDP socket.  This
thread updates the rsocket lookup tables with the remote QPN and path record
data.  The service thread forwards data received on the UDP socket to an
rsocket QP.  After the remote QPN and path records have been resolved, datagram
communication between two nodes is done over the UD QP.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [librdmacm] Fixed build problem due to missing macro
Or Gerlitz [Sun, 2 Dec 2012 12:04:23 +0000 (12:04 +0000)]
[librdmacm] Fixed build problem due to missing macro

rsocket.c failed to compile because of a missing definition for the
container_of macro; fix it.
Reported-by: Eyal Salamon <esalomon@mellanox.com>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsocket: Remove fscanf build warnings
Sean Hefty [Mon, 5 Nov 2012 19:53:03 +0000 (11:53 -0800)]
rsocket: Remove fscanf build warnings

Cast fscanf return values to (void) to indicate that we don't
care if the call fails.  In the case of a failure, we simply
fall back to using default values.

Problem reported by Or Gerlitz <ogerlitz@mellanox.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago riostream: Add example program for using iomap routines.
Sean Hefty [Wed, 24 Oct 2012 17:23:52 +0000 (10:23 -0700)]
riostream: Add example program for using iomap routines.

riostream is based on rstream, but uses the new riomap, riounmap,
and riowrite calls instead.  It runs a series of latency and
bandwidth tests using remote iomapped memory.

riostream is limited to using zero copy transfers at the
receiving side only at this time.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsocket: Add APIs for direct data placement
Sean Hefty [Sun, 21 Oct 2012 21:16:03 +0000 (14:16 -0700)]
rsocket: Add APIs for direct data placement

We introduce rsocket extensions for supporting direct
data placement (also known as zero copy).  Direct data
placement avoids data copies into network buffers when
sending or receiving data.  This patch implements zero
copies on the receive side, but adds some basic framework for
supporting it on the sending side.

Integrating zero copy support into the existing socket APIs
is difficult to achieve when the sockets are set as
nonblocking.  Any such implementation is likely to be unusable
in practice.  The problem stems from the fact that socket
operations are synchronous in nature.  Support for asynchronous
operations is limited to connection establishment.

Therefore we introduce new calls to handle direct data placement.
The use of the new calls is optional and does not affect the
use of the existing calls.  An attempt is made to have the new
routines integrate naturally with the existing APIs.  The new
functions are: riomap, riounmap, and riowrite.  The basic operation
can be described as follows:

1. App A calls riomap to register a data buffer with the local
   RDMA device.  Riomap returns an off_t offset value that
   corresponds to the registered data buffer.  The app may
   select the offset value.
2. Rsockets will transmit an internal message to the remote
   peer with information about the registration.  This exchange
   is hidden from the applications.
3. App A sends a notification message to app B indicating that
   the remote iomapped buffer is now available to receive data.
4. App B calls riowrite to transmit data directly into the
   riomapped data buffer.
5. App B sends a notification message to app A indicating that
   data is available in the mapped buffer.
6. After all transfers are complete, app A calls riounmap to
   deregister its data buffer.

Riomap and riounmap are functionally equivalent to RDMA
memory registration and deregistration routines.  They are loosely
based on the mmap and munmap APIs.

off_t riomap(int socket, void *buf, size_t len,
     int prot, int flags, off_t offset)

Riomap registers an application buffer with the RDMA hardware
associated with an rsocket.  The buffer is registered either for
local only access (PROT_NONE) or for remote write access (PROT_WRITE).
When registered for remote access, the buffer is mapped to a given
offset.  The offset is either provided by the user, or if the user
selects -1 for the offset, rsockets selects one.  The remote peer may
access an iomapped buffer directly by specifying the correct offset.
The mapping is not guaranteed to be available until after the remote
peer receives a data transfer initiated after riomap has completed.

int riounmap(int socket, void *buf, size_t len)

Riounmap removes the mapping between a buffer and an rsocket.

size_t riowrite(int socket, const void *buf, size_t count,
     off_t offset, int flags)

Riowrite allows an application to transfer data over an rsocket
directly into a remotely iomapped buffer.  The remote buffer is specified
through an offset parameter, which corresponds to a remote iomapped buffer.
From the sender's perspective, riowrite behaves similarly to rwrite.  From
a receiver's view, riowrite transfers are silently redirected into a
predetermined data buffer.  Data is received automatically, and the receiver
is not informed of the transfer.  However, riowrite data is still considered
part of the data stream, such that riowrite data will be written before a
subsequent transfer is received.  A message sent immediately after
initiating a riowrite may be used to notify the receiver of the riowrite.

It should be noted that the current implementation primarily focused
on being functional for evaluation purposes.  Some checks have been
deferred for subsequent patches, and performance is currently limited
by linear lookups.
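
To make the offset-to-buffer mapping and the linear lookup concrete,
here is a small self-contained sketch.  It is an assumption about the
general shape only: struct iomap_entry and iomap_lookup are hypothetical
names, and no RDMA registration is performed.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical record of one iomapped buffer. */
struct iomap_entry {
	off_t  offset;	/* offset the peer uses to address the buffer */
	void  *buf;	/* local address of the registered buffer */
	size_t len;	/* length of the buffer in bytes */
};

/* Linear search: return the local address that corresponds to
 * [offset, offset + count), or NULL if no single mapping covers it. */
void *iomap_lookup(const struct iomap_entry *tbl, size_t n,
		   off_t offset, size_t count)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (offset < tbl[i].offset)
			continue;

		size_t delta = (size_t) (offset - tbl[i].offset);

		if (delta <= tbl[i].len && count <= tbl[i].len - delta)
			return (char *) tbl[i].buf + delta;
	}
	return NULL;
}
```

Each lookup walks the whole table in the worst case, which is the cost
the note about linear lookups refers to.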

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rdma_xserver/client: Fix man page formatting
Roland Dreier [Tue, 16 Oct 2012 19:44:39 +0000 (19:44 +0000)]
rdma_xserver/client: Fix man page formatting

Putting 'r' at the beginning of a line in the nroff source for man pages
is confusing to nroff because lines that start with a single quote
character ' or a dot character . are treated as control lines, which is
not what's intended here.  Some of the man page text ends up left out of
the formatted output.

Fix this by just wrapping the text slightly differently in the source
(which doesn't matter since nroff reflows the text anyway).  Also add a
missing ".TP" so that the -p and -c options are not run together in the
formatted output.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Disable ACM support if ibacm.port is not found
Sean Hefty [Mon, 8 Oct 2012 17:33:21 +0000 (10:33 -0700)]
librdmacm: Disable ACM support if ibacm.port is not found

The librdmacm will try to connect to port 6125 if ibacm.port is
not found.  The problem is that some other service or application
could be using that port and respond with garbage.  Rather
than falling back to a hard coded port number, if ibacm.port
is not found, simply disable ACM support.
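
The resulting behavior can be sketched like this.  The helper name
read_acm_port and the idea of passing the file path as a parameter are
assumptions made for illustration; the point is only that a missing or
unparsable port file yields "disabled" rather than a hard coded port.

```c
#include <stdio.h>

/* Hypothetical helper: read the ibacm service port from port_file.
 * Returns the port, or -1 if the file is missing or invalid, in which
 * case the caller disables ACM support instead of guessing a port. */
int read_acm_port(const char *port_file)
{
	FILE *f = fopen(port_file, "r");
	int port;

	if (!f)
		return -1;

	if (fscanf(f, "%d", &port) != 1 || port <= 0 || port > 65535)
		port = -1;

	fclose(f);
	return port;
}
```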

This has the effect of removing support for older versions
of ibacm, unless the port file is created manually.

Patch created based on feedback from Doug Ledford and Florian
Weimer from Red Hat.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [5/5,librdmacm] rping: added checks to the return values of functions
Dotan Barak [Tue, 9 Oct 2012 12:27:52 +0000 (12:27 +0000)]
[5/5,librdmacm] rping: added checks to the return values of functions

This makes rping exit with a nonzero return value in case of an
error.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [4/5,librdmacm] rstream: added missing return if accept() failed
Dotan Barak [Tue, 9 Oct 2012 12:27:51 +0000 (12:27 +0000)]
[4/5,librdmacm] rstream: added missing return if accept() failed

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [3/5,librdmacm] rstream: initialize return value in server_connect()
Dotan Barak [Tue, 9 Oct 2012 12:27:50 +0000 (12:27 +0000)]
[3/5,librdmacm] rstream: initialize return value in server_connect()

If use_async == 0 and rs_accept() succeeds (i.e. returns a non-negative
value), then the return value from the function was uninitialized.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [2/5,librdmacm] rsocket: added missing break
Dotan Barak [Tue, 9 Oct 2012 12:27:49 +0000 (12:27 +0000)]
[2/5,librdmacm] rsocket: added missing break

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago [1/5,librdmacm] rsocket: add missing va_end() after calling va_start()
Dotan Barak [Tue, 9 Oct 2012 12:27:48 +0000 (12:27 +0000)]
[1/5,librdmacm] rsocket: add missing va_end() after calling va_start()

Not doing so may lead to a resource leak.
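
For reference, the pairing rule from <stdarg.h> looks like this in a
small standalone function.  sum_ints is a hypothetical example and is
not taken from rsocket.c.

```c
#include <stdarg.h>

/* Hypothetical variadic helper: add up 'count' int arguments. */
int sum_ints(int count, ...)
{
	va_list args;
	int i, sum = 0;

	va_start(args, count);
	for (i = 0; i < count; i++)
		sum += va_arg(args, int);
	va_end(args);	/* every va_start needs a matching va_end */

	return sum;
}
```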

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago ucmatose: Remove connect parameter passed into rdma_accept
Sean Hefty [Thu, 4 Oct 2012 19:01:50 +0000 (12:01 -0700)]
ucmatose: Remove connect parameter passed into rdma_accept

Pass in NULL for conn_param into rdma_accept to indicate
that the passive side will use the values specified by the
active side.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago ucmatose: Fix number of connections to disconnect
Sean Hefty [Thu, 4 Oct 2012 18:49:59 +0000 (11:49 -0700)]
ucmatose: Fix number of connections to disconnect

When ucmatose aborts because of issues trying to connect
to the server, it moves to disconnecting all connections.
However, not all connections may have been established.
The result is that ucmatose will hang in disconnect_events.
Fix this by setting the number of times that we need to
disconnect to the number of times that we successfully
connect.

This problem is based on a report by Doug Ledford
<dledford@redhat.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rping: Reduce retry_count to fit in 3-bits
Sean Hefty [Wed, 3 Oct 2012 22:05:20 +0000 (15:05 -0700)]
rping: Reduce retry_count to fit in 3-bits

retry_count is a 3-bit value on IB; reduce it from
10 to 7.

A value of 10 prevents rping from working over the Intel
IB HCA.  Problem reported by Doug Ledford <dledford@redhat.com>
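
The limit is simple arithmetic: 3 bits hold the values 0 through 7, so
10 does not fit.  A minimal sketch of a guard, using the hypothetical
names MAX_RETRY_COUNT and clamp_retry_count:

```c
#include <stdint.h>

#define MAX_RETRY_COUNT 7	/* largest value that fits in 3 bits */

/* Hypothetical helper: clamp a requested retry count to the 3-bit range. */
uint8_t clamp_retry_count(unsigned int requested)
{
	if (requested > MAX_RETRY_COUNT)
		requested = MAX_RETRY_COUNT;
	return (uint8_t) requested;
}
```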

The retry_count is also not set when calling rdma_accept.
Rather than passing different values into rdma_accept than
what was specified by the remote side, use the values given
in the connection request.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Place container_of inside #ifdef
Sean Hefty [Sat, 22 Sep 2012 00:16:09 +0000 (17:16 -0700)]
librdmacm: Place container_of inside #ifdef

verbs.h defines container_of.  Only define it if it is not already defined.
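
A guarded definition of this kind typically looks like the sketch below.
The macro body is the common offsetof-based form and may differ from the
one in the tree; struct list_node, struct conn, and conn_from_node are
hypothetical and exist only to exercise the macro.

```c
#include <stddef.h>

/* Define container_of only if no earlier header already did. */
#ifndef container_of
#define container_of(ptr, type, field) \
	((type *) ((char *) (ptr) - offsetof(type, field)))
#endif

/* Hypothetical types used only to exercise the macro. */
struct list_node {
	struct list_node *next;
};

struct conn {
	int id;
	struct list_node entry;
};

/* Recover the enclosing struct conn from a pointer to its entry field. */
struct conn *conn_from_node(struct list_node *node)
{
	return container_of(node, struct conn, entry);
}
```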

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago addrinfo: Remove debug printf calls
Sean Hefty [Wed, 3 Oct 2012 22:18:29 +0000 (15:18 -0700)]
addrinfo: Remove debug printf calls

These never should have made it into the commit.  :P

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Document rsocket protocol and design
Sean Hefty [Mon, 10 Sep 2012 21:32:45 +0000 (14:32 -0700)]
rsockets: Document rsocket protocol and design

Include a brief overview of the rsocket protocol and underlying design
with the source code to make it easier for someone trying to decipher
the actual code.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Support using GIDs with rdma_getaddrinfo
Sean Hefty [Tue, 28 Aug 2012 19:33:04 +0000 (12:33 -0700)]
librdmacm: Support using GIDs with rdma_getaddrinfo

Allow the user to specify a GID as the node parameter into
rdma_getaddrinfo.

To distinguish between the node being an IPv6 address or a GID,
we add a new flag, RAI_FAMILY, which can be set as part of the
hints to rdma_getaddrinfo.  When set, this flag indicates that the
value of ai_family in the hints should be used when interpreting
the node parameter.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Fix state checks in dup2
Sean Hefty [Fri, 7 Sep 2012 17:20:53 +0000 (10:20 -0700)]
rspreload: Fix state checks in dup2

The patch to add dup2 support was never updated to handle the fd
state.  The check for the fd type == fd_fork is no longer valid.
We need to instead check the fd state before handling forking.

Problem pointed out by Alex Couvrard <acouvrard@gmail.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Rename ucma_copy_rai_addr to ucma_set_ep_data
Sean Hefty [Wed, 29 Aug 2012 00:37:30 +0000 (17:37 -0700)]
librdmacm: Rename ucma_copy_rai_addr to ucma_set_ep_data

Simple function rename to better indicate operation.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Enable AF_IB support
Sean Hefty [Fri, 17 Aug 2012 21:02:45 +0000 (14:02 -0700)]
librdmacm: Enable AF_IB support

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Set address family for source address returned by ACM
Sean Hefty [Thu, 23 Aug 2012 22:48:06 +0000 (15:48 -0700)]
librdmacm: Set address family for source address returned by ACM

Set the sa_family type when saving the source address returned
by ACM.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Report error level in error messages
Yann Droneaud [Mon, 27 Aug 2012 23:37:29 +0000 (16:37 -0700)]
librdmacm: Report error level in error messages

Report error messages as either 'Warning' or 'Fatal'.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Use common prefix for error messages
Yann Droneaud [Mon, 27 Aug 2012 23:35:32 +0000 (16:35 -0700)]
librdmacm: Use common prefix for error messages

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Report error messages on stderr
Yann Droneaud [Mon, 27 Aug 2012 23:33:50 +0000 (16:33 -0700)]
librdmacm: Report error messages on stderr

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Avoid rsocket calls until after fork
Sean Hefty [Thu, 23 Aug 2012 18:20:08 +0000 (11:20 -0700)]
rspreload: Avoid rsocket calls until after fork

When an rsocket call is made before an application calls fork(),
the forked applications can hang.  This can be seen by running
netserver and two netperf clients simultaneously.  The second
netperf client will eventually stop performing data transfers.

LD_PRELOAD=librspreload.so netserver -D

LD_PRELOAD=librspreload.so netperf -v2 -c -C -H 192.168.0.101 -l30
LD_PRELOAD=librspreload.so netperf -v2 -c -C -H 192.168.0.101 -l30

It's not clear what the specific problem is.  The best guess is
that libibverbs or the provider library (e.g. libmlx4) performs
some initialization, such as mmap'ing device memory, which does not
work when fork is called.

As a work-around, avoid calling rsocket routines until immediately
before they are needed.  This allows the process to fork before
the libraries are initialized.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Fix checks in fork_active/passive
Sean Hefty [Mon, 20 Aug 2012 16:06:49 +0000 (09:06 -0700)]
rspreload: Fix checks in fork_active/passive

Fix passing in wrong variable to rconnect(), check state instead
of type, and move call to getpeername until after we are sure that
the normal socket connection has completed.

Problems pointed out by Sridhar Samudrala <sri@us.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librdmacm: Re-enable ibacm support
Sean Hefty [Fri, 17 Aug 2012 23:41:04 +0000 (16:41 -0700)]
librdmacm: Re-enable ibacm support

Commit 272c3cc024d0e5854cbafa6c2f1e8560398a68d7, "Delay ACM
connection until resolving an address", removed the call to
ucma_ib_init without adding it back in the correct location.
As a result, the librdmacm no longer uses ibacm.  Fix this
by adding the initialization call when resolving an address.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rstream: Use MSG_WAITALL for blocking test
Sean Hefty [Thu, 16 Aug 2012 22:41:35 +0000 (15:41 -0700)]
rstream: Use MSG_WAITALL for blocking test

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Add support for MSG_WAITALL rrecv() flag
Sean Hefty [Thu, 28 Jun 2012 18:34:38 +0000 (11:34 -0700)]
rsockets: Add support for MSG_WAITALL rrecv() flag

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Add fstat support
Sean Hefty [Tue, 7 Aug 2012 16:37:24 +0000 (09:37 -0700)]
rspreload: Add fstat support

vsftpd calls fstat on a socket.  Fake it out.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Support sendfile
Sean Hefty [Tue, 14 Aug 2012 00:00:42 +0000 (17:00 -0700)]
rspreload: Support sendfile

Handle users calling sendfile with an rsocket.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Do not block connect when supporting fork
Sean Hefty [Sat, 11 Aug 2012 04:44:39 +0000 (21:44 -0700)]
rspreload: Do not block connect when supporting fork

Many FTP servers require fork support.  However, FTP clients,
such as ncftp, will perform the following call sequence:

send PASV request to server over connection 1
         server will listen for connection 2
issue nonblocking connect to server
send ACCEPT request to server over connection 1
         server will accept connection 2

The current fork support converts all nonblocking connect
calls to blocking.  The result is that the FTP client ends up
blocked waiting for the server to accept the connection,
which it will never do.

To handle this case, we have the active side follow the same
rule as the server side and defer establishing the rsocket
connection until the user calls the first data transfer routine.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Minor cleanup of fork_passive handling
Sean Hefty [Mon, 13 Aug 2012 23:00:16 +0000 (16:00 -0700)]
rspreload: Minor cleanup of fork_passive handling

Minor code cleanup in passive side handling of fork support.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Support SO_OOBINLINE
Sean Hefty [Wed, 8 Aug 2012 04:31:12 +0000 (21:31 -0700)]
rsockets: Support SO_OOBINLINE

We don't support urgent data, so just return success.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Support dup2 calls
Sean Hefty [Mon, 30 Jul 2012 23:06:32 +0000 (16:06 -0700)]
rspreload: Support dup2 calls

vsftpd requires dup2() support.  To handle dup2, we need to add
reference count tracking to the preload fd's.
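
Reference counting of this kind can be sketched as below.  The record
layout and the names fd_info_new, fd_info_get, and fd_info_put are
hypothetical, and the sketch is not thread safe; a real implementation
would need atomic updates or a lock.

```c
#include <stdlib.h>

/* Hypothetical per-descriptor record shared by all duplicates. */
struct fd_info {
	int fd;		/* underlying socket or rsocket descriptor */
	int refcnt;	/* number of preload fds that refer to this record */
};

/* Allocate a record holding one reference.  Returns NULL on failure. */
struct fd_info *fd_info_new(int fd)
{
	struct fd_info *fdi = calloc(1, sizeof(*fdi));

	if (fdi) {
		fdi->fd = fd;
		fdi->refcnt = 1;
	}
	return fdi;
}

/* dup2-style sharing: the new index points at the same record. */
struct fd_info *fd_info_get(struct fd_info *fdi)
{
	fdi->refcnt++;
	return fdi;
}

/* Drop one reference.  Returns 1 if this was the last reference, in
 * which case the record is freed and the caller should close the
 * underlying descriptor; returns 0 otherwise. */
int fd_info_put(struct fd_info *fdi)
{
	if (--fdi->refcnt > 0)
		return 0;

	free(fdi);
	return 1;
}
```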

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Call real.close in fd_close
Sean Hefty [Wed, 1 Aug 2012 23:26:11 +0000 (16:26 -0700)]
rspreload: Call real.close in fd_close

The index into the preload lookup table is obtained by opening
/dev/null and using the returned value.  When closing the file,
use the real close call and not the preload close call.  This
is a minor optimization, but clarifies the expected operation.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsocket: Improve disconnect time under normal conditions
Sean Hefty [Fri, 27 Jul 2012 17:46:42 +0000 (10:46 -0700)]
rsocket: Improve disconnect time under normal conditions

When both sides of a connection attempt to close at the same
time, one of the two sides can easily get an error when sending
a disconnect message.  This results in that side hanging
during close until the send times out.  (The time out is caused
by the remote side destroying its QP.)

We can reduce the chance of this occurring by immediately
assuming that the disconnect has been successful once we've
received the remote side's disconnect message, or we've
polled a send completion for the local disconnect message.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Use wr_id to determine completion type
Sean Hefty [Thu, 26 Jul 2012 22:35:32 +0000 (15:35 -0700)]
rsockets: Use wr_id to determine completion type

If a work request has completed in error, the completion type
field is undefined.  Use the wr_id to determine if the failed
completion was a send or receive.
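
One way to carry the type in the wr_id is to reserve a bit for it when
the work request is posted.  The layout below is an assumption for
illustration (top bit marks a receive) and is not necessarily the
encoding rsockets uses; WR_ID_RECV_FLAG, wr_id_make, and wr_id_is_recv
are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical layout: the top bit of the 64-bit wr_id marks receives. */
#define WR_ID_RECV_FLAG	(UINT64_C(1) << 63)

/* Build a wr_id from a caller cookie and the request type. */
uint64_t wr_id_make(uint64_t cookie, int is_recv)
{
	cookie &= ~WR_ID_RECV_FLAG;
	return is_recv ? (cookie | WR_ID_RECV_FLAG) : cookie;
}

/* The wr_id is chosen by the poster and is reported for failed
 * completions as well, so this check does not depend on the
 * completion type field. */
int wr_id_is_recv(uint64_t wr_id)
{
	return (wr_id & WR_ID_RECV_FLAG) != 0;
}
```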

This fixes an issue where MPI can hang during finalize.  With
both sides of a connection shutting down simultaneously, one
side may complete quicker and delete its QP before the other
side receives an acknowledgement to its disconnect message.
Eventually, the disconnect message will time out, but because
the completion type field is undefined, it may be processed
as a failed receive, rather than a failed send.  The end
result is that the second side hangs waiting for the send to
complete.

This problem showed up more easily after commit
2e5b0fc95964f74ea59dd725e849027faa0cd526, but existed beforehand.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rsockets: Enable support for privileged ports
Sean Hefty [Wed, 25 Jul 2012 18:11:56 +0000 (11:11 -0700)]
rsockets: Enable support for privileged ports

Allow the preload library to use rsockets with privileged
ports.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rspreload: Call init from getsockname()
Sean Hefty [Tue, 24 Jul 2012 21:13:55 +0000 (14:13 -0700)]
rspreload: Call init from getsockname()

netperf for some unknown reason calls getsockname() using a
hard coded value of 0, without first allocating a socket.
This causes the rsocket preload library to crash, since the
library has not been properly initialized.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago rstream: Add option to test fork support
Sean Hefty [Tue, 17 Jul 2012 22:32:54 +0000 (15:32 -0700)]
rstream: Add option to test fork support

If the user specifies '-T f', rstream will process
connections in a child process.  The server continues
to run until all child processes have completed their
tests.

Fork support requires use of the librspreload library.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
11 years ago librspreload: Support server apps that call fork()
Sean Hefty [Tue, 24 Jul 2012 18:40:10 +0000 (11:40 -0700)]
librspreload: Support server apps that call fork()

Provide limited support for applications that call fork().  To
handle fork(), we establish connections using normal sockets.
The socket is later converted to an rsocket when the user
makes the first call to a data transfer function (e.g. send,
recv, read, write, etc.).

Fork support is indicated by setting the environment variable
RDMAV_FORK_SAFE = 1.  When set, the preload library will delay
converting to an rsocket until the user attempts to send or receive
data on the socket.  To convert from a normal socket to an
rsocket, the preload library must inject a message on the
normal socket to synchronize between the client and server.  As
a result, if the rsocket connection fails, the ability to
silently fall back to the normal socket may be compromised.  Fork
support is disabled by default.
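
A minimal sketch of an opt-in check of this kind is shown below.
Whether the library compares the value against 1 or only tests that the
variable is set is not stated here, so the strict comparison is an
assumption, and fork_support_enabled is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical check: fork support is on only when RDMAV_FORK_SAFE
 * is set to "1"; unset or any other value leaves it disabled. */
int fork_support_enabled(void)
{
	const char *val = getenv("RDMAV_FORK_SAFE");

	return val != NULL && strcmp(val, "1") == 0;
}
```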

The current implementation works for simple test apps under
ideal conditions.  Although it supports nonblocking sockets, it
uses blocking rsockets when migrating connections.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years ago librspreload: Make socket_fallback() call more generic
Sean Hefty [Mon, 16 Jul 2012 21:17:58 +0000 (14:17 -0700)]
librspreload: Make socket_fallback() call more generic

socket_fallback is used to switch from an rsocket to a normal
socket in the case of failures.  Rename the call and make it
more generic, so that it can switch between an rsocket and
a normal socket in either direction.  This will be used to
support fork().

As part of this change, we move the list of hooked and rsocket
calls into structures, versus maintaining a large number of
static variables.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years ago librdmacm: Only allocate verbs resources when needed
Sean Hefty [Thu, 19 Jul 2012 17:09:48 +0000 (10:09 -0700)]
librdmacm: Only allocate verbs resources when needed

The librdmacm allocates a PD per device on initialization.  Although
we need to maintain the device list while the library is loaded
(see rdma_get_devices), we can reduce the overhead by only allocating
verbs resources when they are needed.

This allows the rsocket preload library to support fork for
applications that spawn connections off to child processes.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years ago librdmacm: Remove unused 'ib' variable from ucma_init
Sean Hefty [Thu, 19 Jul 2012 17:13:50 +0000 (10:13 -0700)]
librdmacm: Remove unused 'ib' variable from ucma_init

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agov1.0.16 v1.0.16
Sean Hefty [Wed, 11 Jul 2012 00:55:32 +0000 (17:55 -0700)]
v1.0.16

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrspreload: Fix typecast to eliminate compile warnings
Hal Rosenstock [Thu, 12 Jul 2012 16:24:10 +0000 (09:24 -0700)]
librspreload: Fix typecast to eliminate compile warnings

src/preload.c: In function 'bind':
src/preload.c:350: warning: assignment from incompatible pointer type
src/preload.c: In function 'connect':
src/preload.c:397: warning: assignment from incompatible pointer type

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorspreload: Document use of librspreload library
Sean Hefty [Wed, 11 Jul 2012 22:13:29 +0000 (15:13 -0700)]
rspreload: Document use of librspreload library

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Include src/common.h in distribution
sean.hefty@intel.com [Wed, 11 Jul 2012 19:39:08 +0000 (12:39 -0700)]
librdmacm: Include src/common.h in distribution

Add missing header file to distribution to allow rpmbuild to
work.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Validate source address protocol family in rdma_resolve_addr
Yann Droneaud [Wed, 11 Jul 2012 18:54:39 +0000 (11:54 -0700)]
librdmacm: Validate source address protocol family in rdma_resolve_addr

If a source address is provided but its protocol family is not recognized,
return an error.
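
A minimal sketch of this kind of check follows.  The helper names
src_addr_len() and check_src_addr() are hypothetical, and the sketch
only recognizes AF_INET and AF_INET6; it is not the librdmacm code.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: return the address length for a recognized
 * family, or 0 if the family is not recognized. */
static socklen_t src_addr_len(const struct sockaddr *addr)
{
	if (!addr)
		return 0;

	switch (addr->sa_family) {
	case AF_INET:
		return sizeof(struct sockaddr_in);
	case AF_INET6:
		return sizeof(struct sockaddr_in6);
	default:
		return 0;
	}
}

/* Hypothetical check: a NULL source address is allowed, but a supplied
 * address must have a recognized family. */
static int check_src_addr(const struct sockaddr *src_addr)
{
	if (src_addr && !src_addr_len(src_addr)) {
		errno = EINVAL;
		return -1;
	}
	return 0;
}
```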

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Build librspreload library as part of build
Sean Hefty [Mon, 9 Jul 2012 21:58:14 +0000 (14:58 -0700)]
rsocket: Build librspreload library as part of build

Build the rsocket preload library as part of the build.  To reduce the
risk of the preload library intercepting calls without the user's
knowledge, the preload library is installed into {_libdir}/rsocket.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Support IPV6_V6ONLY socket option
Sean Hefty [Tue, 12 Jun 2012 19:02:04 +0000 (12:02 -0700)]
rsocket: Support IPV6_V6ONLY socket option

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Handle other shutdown option
Sean Hefty [Mon, 25 Jun 2012 21:19:54 +0000 (14:19 -0700)]
rsocket: Handle other shutdown option

Handle SHUT_RD and SHUT_WR shutdown options.

In order to handle shutting down the send and receive sides
separately, we break the connection state into multiple sub-states.
This allows us to be partially connected (i.e. for either just
reads or just writes).
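
As a rough illustration of the sub-state idea, the connected state can
be held as two bits that shutdown clears independently.  The enum
names and values here are made up for the sketch and do not reflect
the actual rsocket state definitions.

```c
#include <sys/socket.h>

/* Hypothetical state bits; names and values are illustrative only. */
enum conn_state {
	CONN_RD   = 1 << 0,	/* receive side still open */
	CONN_WR   = 1 << 1,	/* send side still open */
	CONN_RDWR = CONN_RD | CONN_WR
};

/* Clear the sub-state(s) selected by a shutdown() 'how' argument and
 * return the remaining state. */
static int conn_shutdown(int state, int how)
{
	switch (how) {
	case SHUT_RD:
		return state & ~CONN_RD;
	case SHUT_WR:
		return state & ~CONN_WR;
	case SHUT_RDWR:
		return state & ~CONN_RDWR;
	default:
		return state;
	}
}
```

After SHUT_WR the state still has CONN_RD set, which corresponds to
being connected for reads only.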

Support for SHUT_WR is needed to handle netperf properly, which
shuts down a socket by having the client use SHUT_WR, followed by
the server completing the disconnect with SHUT_RDWR.  This patch
eliminates the following error message from netperf:

'shutdown_control: no response received  errno 95'

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Set readfds event if rsocket has been disconnected
Sean Hefty [Mon, 25 Jun 2012 22:04:52 +0000 (15:04 -0700)]
rsocket: Set readfds event if rsocket has been disconnected

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Add rsocket man page
Sean Hefty [Mon, 11 Jun 2012 20:20:18 +0000 (13:20 -0700)]
rsocket: Add rsocket man page

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Use configuration files to specify default settings
Sean Hefty [Tue, 5 Jun 2012 22:28:18 +0000 (15:28 -0700)]
rsocket: Use configuration files to specify default settings

Give an administrator control over the default settings
used by rsockets.  Use files under %sysconfig%/rdma/rsocket as shown:

mem_default - default size of receive buffer(s)
wmem_default - default size of send buffer(s)
sqsize_default - default size of send queue
rqsize_default - default size of receive queue
inline_default - default size of inline data

If configuration files are not available, rsockets will continue to
use internal defaults.
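
The fallback behavior can be sketched as below.  read_default() is a
hypothetical helper, and the file format (a single unsigned decimal
value) is an assumption for the example.

```c
#include <stdio.h>

/* Read an unsigned value from a one-line file; return the fallback if
 * the file is missing or does not start with a number. */
static unsigned int read_default(const char *path, unsigned int fallback)
{
	unsigned int val;
	FILE *f = fopen(path, "r");

	if (!f)
		return fallback;
	if (fscanf(f, "%u", &val) != 1)
		val = fallback;
	fclose(f);
	return val;
}
```

A caller would pass the internal default as the fallback, so a missing
file leaves the setting unchanged.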

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Spin before blocking on an rsocket
Sean Hefty [Mon, 4 Jun 2012 21:51:41 +0000 (14:51 -0700)]
rsocket: Spin before blocking on an rsocket

The latency cost of blocking is significant compared to round
trip ping-pong time.  Spin briefly on rsockets before calling
into the kernel and blocking.

The time to spin before blocking is read from an rsocket
configuration file %sysconfig%/rdma/rsocket/polling_time.  This
is user adjustable.

As a completely unintentional side effect, this just happens to
improve application performance in benchmarks, like netpipe,
significantly. ;)

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Handle TCP_MAXSEG socket option
Sean Hefty [Mon, 4 Jun 2012 20:22:10 +0000 (13:22 -0700)]
rsocket: Handle TCP_MAXSEG socket option

netperf uses the TCP_MAXSEG socket option.  Add support for it.
Problem reported by Sridhar Samudrala <sri@us.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agoman: List RDMA_PS_IB as a supported port space in rdma_getaddrinfo man page
Hal Rosenstock [Fri, 1 Jun 2012 17:52:03 +0000 (10:52 -0700)]
man: List RDMA_PS_IB as a supported port space in rdma_getaddrinfo man page

Signed-off-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Use separate connections for latency/bw tests
Sean Hefty [Sun, 27 May 2012 21:07:42 +0000 (14:07 -0700)]
rstream: Use separate connections for latency/bw tests

Optimize each connection for either latency or bandwidth
results.  This improves small message latency under 384
bytes by .5 - 1 us, while increasing bandwidth by
1 - 1.5 Gbps.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Use snprintf in place of sprintf
Sean Hefty [Tue, 29 May 2012 21:15:57 +0000 (14:15 -0700)]
rstream: Use snprintf in place of sprintf

Avoid possible buffer overrun.
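
For illustration, a bounded format call looks like this.  The
format_size() helper and its format string are invented for the
example and are not taken from rstream.

```c
#include <stdio.h>

/* Format a size label into a fixed buffer.  snprintf() writes at most
 * 'len' bytes including the terminating NUL, so a long value is
 * truncated instead of being written past the end of 'buf'. */
static int format_size(char *buf, size_t len, unsigned long size)
{
	return snprintf(buf, len, "%lu bytes", size);
}
```

The return value is the length the full string would have had, so a
result of 'len' or more tells the caller that truncation occurred.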

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Add option to specify size of send/recv buffers
Sean Hefty [Wed, 23 May 2012 18:32:46 +0000 (11:32 -0700)]
rstream: Add option to specify size of send/recv buffers

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Change the default QP size from 512 to 384
Sean Hefty [Thu, 24 May 2012 21:31:12 +0000 (14:31 -0700)]
rsockets: Change the default QP size from 512 to 384

Simple latency/bandwidth tests using rstream showed minimal
difference in performance between using a QP sized to 384
entries versus 512.  Reduce the overhead of a default rsocket
by using 384 entries.  A user can request a larger size by
calling rsetsockopt.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Simplify state checks
Sean Hefty [Sat, 26 May 2012 00:28:44 +0000 (17:28 -0700)]
rsockets: Simplify state checks

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket preload: Use environment variable to set QP size
Sean Hefty [Tue, 22 May 2012 01:46:36 +0000 (18:46 -0700)]
rsocket preload: Use environment variable to set QP size

Allow the user to specify the size of the send/receive queues
and inline data size through environment variables:
RS_SQ_SIZE, RS_RQ_SIZE, and RS_INLINE.
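
Reading such a variable might look like the sketch below.  env_uint()
is a hypothetical helper, and treating unparsable values as "use the
default" is an assumption, not documented preload behavior.

```c
#include <stdlib.h>

/* Return the numeric value of environment variable 'name', or
 * 'fallback' if it is unset, empty, or has trailing non-numeric
 * characters. */
static unsigned int env_uint(const char *name, unsigned int fallback)
{
	const char *str = getenv(name);
	char *end;
	unsigned long val;

	if (!str || !*str)
		return fallback;

	val = strtoul(str, &end, 10);
	return *end ? fallback : (unsigned int) val;
}
```

For example, env_uint("RS_SQ_SIZE", 384) yields 384 unless the variable
is set to a number.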

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Add option to specify size of inline data
Sean Hefty [Tue, 22 May 2012 18:39:21 +0000 (11:39 -0700)]
rsocket: Add option to specify size of inline data

Allow the user to override the default inline data size.
We still require a minimum size in order to transfer receive
buffer update messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Allow user to specify the QP sizes
Sean Hefty [Fri, 18 May 2012 23:56:15 +0000 (16:56 -0700)]
rsockets: Allow user to specify the QP sizes

Add setsockopt options that allow the user to specify the desired
size of the underlying QP.  The provided sizes are used as the
maximum size when creating the QP.  The actual sizes of the QP
are the smaller of the user provided maximum and the maximum
sizes supported by the underlying hardware.
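
The sizing rule above amounts to taking a minimum.  In this sketch the
function and parameter names are illustrative only.

```c
#include <stdint.h>

/* Pick the smaller of the user-requested maximum and the hardware
 * limit for one queue (send or receive). */
static uint32_t qp_queue_size(uint32_t user_max, uint32_t hw_max)
{
	return user_max < hw_max ? user_max : hw_max;
}
```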

A user may retrieve the actual sizes of the QP through the
getsockopt call.

The send and receive queue sizes are specified separately.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Define options specific to rsockets
Sean Hefty [Fri, 18 May 2012 23:36:07 +0000 (16:36 -0700)]
rsockets: Define options specific to rsockets

Allow a user to control some of the RDMA related attributes
of an rsocket through setsockopt/getsockopt.  A user specifies
that the rsocket should be modified through SOL_RDMA level.

This patch provides the initial framework.  Subsequent patches
will add the configurable parameters.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Reduce QP size if larger than hardware maximums
Sean Hefty [Sat, 19 May 2012 00:07:11 +0000 (17:07 -0700)]
rsockets: Reduce QP size if larger than hardware maximums

When porting rsockets to iwarp, it was discovered that the default
QP size (512) was larger than that supported by the hardware.
Decrease the size of the QP if the default size is larger than
the maximum supported by the hardware.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agors-preload: Handle recursive socket() calls
Sean Hefty [Fri, 25 May 2012 19:42:12 +0000 (12:42 -0700)]
rs-preload: Handle recursive socket() calls

When ACM support is enabled in the librdmacm, it will attempt to
establish a socket connection to the ACM daemon.  When the rsocket
preload library is in use, this can result in a recursive call
to socket() that results in the library hanging.  The resulting
call stack is:

socket() -> rsocket() -> rdma_create_id() -> ucma_init() ->
socket() -> rsocket() -> rdma_create_id() -> ucma_init()

The second call to ucma_init() hangs because initialization is
still pending.

Fix this by checking for recursive calls to socket() in the preload
library.  When detected, call the real socket() call instead of
directing the call back into rsocket().  Since rsockets is a part
of the librdmacm, it can call rsockets directly if it wants to use
rsockets instead of standard sockets.
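
One way to detect the recursion is a per-thread flag, sketched here
with stub functions so the example stands alone.  The stubs, the
returned values, and the use of a thread-local flag are assumptions of
the sketch rather than a copy of the preload code.

```c
/* Stand-ins for the real socket() and for rsocket(); both are
 * hypothetical stubs. */
static int real_socket_stub(void);
static int rsocket_stub(void);

/* Per-thread flag: set while this thread is inside rsocket(). */
static __thread int in_rsocket;

/* Hooked socket(): route to rsocket() unless already inside it. */
static int hooked_socket(void)
{
	int ret;

	if (in_rsocket)
		return real_socket_stub();	/* recursive call */

	in_rsocket = 1;
	ret = rsocket_stub();
	in_rsocket = 0;
	return ret;
}

static int real_socket_stub(void)
{
	return 100;	/* pretend fd from the real socket() */
}

static int rsocket_stub(void)
{
	/* Library initialization opens its own socket, which re-enters
	 * the hook. */
	int acm_fd = hooked_socket();

	return acm_fd == 100 ? 200 : -1;	/* 200: pretend rsocket fd */
}
```

The nested call made during initialization reaches the real socket
stub, while the outer call still returns an rsocket.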

This problem and the cause were reported by Chet Murthy <chet@watson.ibm.com>

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Delay ACM connection until resolving an address
Sean Hefty [Fri, 25 May 2012 17:48:47 +0000 (10:48 -0700)]
librdmacm: Delay ACM connection until resolving an address

Avoid creating a connection to the ACM service when
it's not needed.  For example, if the user of the librdmacm
is a server application, it will not use ACM services.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agoacm: Use -1 to indicate an invalid socket rather than 0
Sean Hefty [Fri, 25 May 2012 19:23:10 +0000 (12:23 -0700)]
acm: Use -1 to indicate an invalid socket rather than 0

socket() can return 0 as a valid socket.  This can happen
when using a daemon that closes stdin/out/err.
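
The convention can be sketched as follows; the variable and function
names are hypothetical.

```c
/* -1 marks "no socket"; 0 is a valid descriptor. */
static int sock = -1;

/* Record the descriptor returned by socket(). */
static void set_sock(int fd)
{
	sock = fd;
}

/* A test for "sock != 0" would wrongly reject descriptor 0, so compare
 * against the negative sentinel instead. */
static int is_connected(void)
{
	return sock >= 0;
}
```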

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Fix hang in rrecv/rsend after disconnecting
Sean Hefty [Sat, 26 May 2012 00:24:08 +0000 (17:24 -0700)]
rsocket: Fix hang in rrecv/rsend after disconnecting

If a user calls rrecv() after a blocking rsocket has been disconnected,
it will hang.  This problem and the cause were reported by Sridhar Samudrala
<samudrala@us.ibm.com>.  It can be reproduced by running netserver -f -D
using the rs-preload library.  A similar issue exists with rsend().

Fix this by not blocking on a CQ unless we're connected.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Check for connection error on async connect
Sean Hefty [Mon, 21 May 2012 23:39:46 +0000 (16:39 -0700)]
rstream: Check for connection error on async connect

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Check that send and recv CQs are different before destroying
Sean Hefty [Mon, 21 May 2012 05:41:38 +0000 (22:41 -0700)]
librdmacm: Check that send and recv CQs are different before destroying

ucma_destroy_cqs() destroys both the send and recv CQs if they
are non-null.  If the two CQs are actually the same one, this
results in a crash when trying to destroy the second CQ.  Check
that the CQs are different before destroying the second CQ.

This fixes a crash when using rsockets, which sets the send and
recv CQs to the same CQ.
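
The check can be illustrated with stand-in types.  struct cq,
struct id, and destroy_cq() below are placeholders for the verbs
objects and calls, not the real definitions.

```c
#include <stddef.h>

/* Stand-in for a completion queue; 'destroyed' counts destroy calls. */
struct cq {
	int destroyed;
};

struct id {
	struct cq *send_cq;
	struct cq *recv_cq;
};

static void destroy_cq(struct cq *cq)
{
	cq->destroyed++;
}

/* Destroy the recv CQ, then the send CQ only if it is a different
 * object, so a shared CQ is destroyed exactly once. */
static void destroy_cqs(struct id *id)
{
	if (id->recv_cq)
		destroy_cq(id->recv_cq);
	if (id->send_cq && id->send_cq != id->recv_cq)
		destroy_cq(id->send_cq);

	id->recv_cq = NULL;
	id->send_cq = NULL;
}
```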

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm: Support older acm.h header files
Sean Hefty [Fri, 18 May 2012 17:00:58 +0000 (10:00 -0700)]
librdmacm: Support older acm.h header files

Older versions of acm.h do not include the resolve_data or
perf_data fields in struct acm_msg.  If we're using an older
version of the acm.h header file, use an internal definition
of struct acm_msg.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Add test option to include more sizes
Sean Hefty [Thu, 17 May 2012 16:50:15 +0000 (09:50 -0700)]
rstream: Add test option to include more sizes

Allow user to specify that a full set of transfer sizes should
be tested.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Group latency/bandwidth tests together
Sean Hefty [Thu, 17 May 2012 16:26:13 +0000 (09:26 -0700)]
rstream: Group latency/bandwidth tests together

Rather than grouping tests by transfer size, group by the test type.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Set rsocket nonblocking if set to async operation
Sean Hefty [Wed, 16 May 2012 22:23:41 +0000 (15:23 -0700)]
rstream: Set rsocket nonblocking if set to async operation

If asynchronous use is specified (use of poll/select), set the
rsocket to nonblocking.  This matches the common usage case for
asynchronous sockets.

When asynchronous support is enabled, the nonblocking/blocking
test option determines whether the poll/select call will block,
or if rstream will spin on the calls.

This provides more flexibility with how the rsocket is used.
Specifically, MPI often uses nonblocking sockets, but spins on
poll/select.  However, many apps will use nonblocking sockets,
but wait on poll/select.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Clarify use of async test option
Sean Hefty [Fri, 11 May 2012 17:41:02 +0000 (10:41 -0700)]
rstream: Clarify use of async test option

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm/rstream: Set rsocket nonblocking for base tests
Sean Hefty [Fri, 11 May 2012 17:33:13 +0000 (10:33 -0700)]
librdmacm/rstream: Set rsocket nonblocking for base tests

The base set of rstream tests want nonblocking rsockets, but don't
actually set the rsocket to nonblocking.  They instead rely on the
MSG_DONTWAIT flag.  Make the code match the expected behavior and
set the rsocket to nonblocking and make nonblocking the default.
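
For comparison, this is how a descriptor is normally marked nonblocking
with fcntl().  The sketch operates on a plain file descriptor; the
equivalent on an rsocket would presumably go through rfcntl(), which is
not shown here.

```c
#include <fcntl.h>

/* Set O_NONBLOCK on a descriptor, preserving its other status flags. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}
```

Once the flag is set, every call on the descriptor is nonblocking,
whereas MSG_DONTWAIT only affects the single call it is passed to.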

Provide a test option to switch it back to blocking mode.  We keep
the existing nonblocking test option for compatibility.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorstream: Always set TCP_NODELAY on rsocket
Sean Hefty [Wed, 16 May 2012 22:16:40 +0000 (15:16 -0700)]
rstream: Always set TCP_NODELAY on rsocket

The NODELAY option is coupled with whether the socket is blocking
or nonblocking.  Remove this coupling and always set the NODELAY
option.

NODELAY currently has no effect on rsockets.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agolibrdmacm/rsocket: Succeed setsockopt REUSEADDR on connected sockets
Sean Hefty [Thu, 10 May 2012 18:17:32 +0000 (11:17 -0700)]
librdmacm/rsocket: Succeed setsockopt REUSEADDR on connected sockets

The RDMA CM fails calls to set REUSEADDR on an rdma_cm_id if
it is not in the idle state.  As a result, this causes a failure
in NetPipe when run with socket calls intercepted by rsockets.
Fix this by returning success when REUSEADDR is set on an rsocket
that has already been connected.  When running over IB, REUSEADDR
is not necessary, since the TCP/IP addresses are mapped.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsockets: Optimize synchronization to improve performance
Sean Hefty [Tue, 8 May 2012 00:16:47 +0000 (17:16 -0700)]
rsockets: Optimize synchronization to improve performance

Hotspot performance analysis using VTune showed pthread_mutex_unlock()
as the most significant hotspot when transferring small messages using
rstream.  To reduce the impact of using pthread mutexes, replace them
with a custom lock built using an atomic variable and a semaphore.
When there's no contention for the lock (which is the expected case
for nonblocking sockets), the synchronization is reduced to
incrementing and decrementing an atomic variable.
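
A lock of this shape can be sketched with C11 atomics and a POSIX
semaphore.  The struct and function names are chosen for the example,
and the code is an approximation of the idea described above rather
than the lock added by this patch.

```c
#include <semaphore.h>
#include <stdatomic.h>

struct fastlock {
	sem_t sem;
	atomic_int cnt;	/* current holder plus waiters */
};

static void fastlock_init(struct fastlock *lock)
{
	sem_init(&lock->sem, 0, 0);
	atomic_init(&lock->cnt, 0);
}

static void fastlock_acquire(struct fastlock *lock)
{
	/* The first arrival sees 0 and owns the lock; later arrivals
	 * sleep on the semaphore. */
	if (atomic_fetch_add(&lock->cnt, 1) > 0)
		sem_wait(&lock->sem);
}

static void fastlock_release(struct fastlock *lock)
{
	/* If anyone queued up behind us, hand the lock to one waiter. */
	if (atomic_fetch_sub(&lock->cnt, 1) > 1)
		sem_post(&lock->sem);
}
```

In the uncontended case neither sem_wait() nor sem_post() is called,
so acquire and release each cost one atomic operation.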

A test that acquired and released a lock 2 billion times reported that
the custom lock was roughly 20% faster than using the mutex:
26.6 seconds versus 33.0 seconds.

Unfortunately, further analysis showed that using the custom lock
provided a minimal performance gain on rstream itself, and simply
moved the hotspot to the custom unlock call.  The hotspot is likely
a result of some other interaction, rather than caused by slowness
in releasing a lock.  However, we keep the custom lock based on
the results of the direct lock tests that were done.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorping: Replace sprintf with snprintf to protect from buffer overflow
Dotan Barak [Tue, 24 Apr 2012 18:20:55 +0000 (11:20 -0700)]
rping: Replace sprintf with snprintf to protect from buffer overflow

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Succeed setting SO_KEEPALIVE option rsocket
Sean Hefty [Mon, 9 Apr 2012 19:05:39 +0000 (12:05 -0700)]
rsocket: Succeed setting SO_KEEPALIVE option

memcached sets SO_KEEPALIVE, so succeed any requests to set
that option.  We don't actually implement keepalive at this time.

To implement keepalive, we would need to record the last time
that a message was sent or received on an rsocket.  If no
new messages are processed within the keepalive timeout, then
we would need to issue a keepalive message.  For rsockets,
this would simply mean sending a 0-byte control message that
gets ignored on the remote side.
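
The bookkeeping described above could look like the following sketch.
Since keepalive is not implemented, everything here (struct ka_state,
ka_touch(), ka_due(), second-granularity timestamps) is hypothetical.

```c
#include <time.h>

struct ka_state {
	time_t last_activity;	/* time of the last send or receive */
	time_t timeout;		/* keepalive interval, in seconds */
};

/* Record that a message was sent or received at time 'now'. */
static void ka_touch(struct ka_state *ka, time_t now)
{
	ka->last_activity = now;
}

/* Nonzero if the connection has been idle for at least the timeout,
 * meaning a keepalive message should be sent. */
static int ka_due(const struct ka_state *ka, time_t now)
{
	return now - ka->last_activity >= ka->timeout;
}
```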

The only real difficulty with handling keepalive is doing it
without adding threads.  This requires additional work in
poll to occasionally time out, send keepalive messages, then
return to polling if no new data has arrived.  Alternatively,
we can add a thread to handle sending keepalive messages.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Succeed SO_LINGER socket option
Sean Hefty [Fri, 6 Apr 2012 22:40:10 +0000 (15:40 -0700)]
rsocket: Succeed SO_LINGER socket option

Succeed calls to set the SO_LINGER socket option.  We don't
actually implement SO_LINGER semantics because we never place
an rsocket into a timewait state.  Unlike socket behavior,
we do wait for all pending data to be transferred by the hardware.
This is done so that the disconnect message can be sent over
the rsocket connection.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
12 years agorsocket: Handle socket option toggling on/off
Sean Hefty [Mon, 9 Apr 2012 19:16:21 +0000 (12:16 -0700)]
rsocket: Handle socket option toggling on/off

If the user turns a socket option off, record that, so that
rgetsockopt returns the correct state of the option.
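
Tracking on/off state can be sketched with a bitmask.  The option bits
and helper names below are invented for the example.

```c
#include <stdint.h>

/* Hypothetical option bits. */
enum {
	OPT_REUSEADDR = 1 << 0,
	OPT_KEEPALIVE = 1 << 1,
	OPT_NODELAY   = 1 << 2
};

/* Record an option as on (nonzero 'enable') or off, and return the
 * updated mask. */
static uint32_t opt_set(uint32_t opts, uint32_t bit, int enable)
{
	return enable ? (opts | bit) : (opts & ~bit);
}

/* Report the recorded state of an option as 0 or 1. */
static int opt_get(uint32_t opts, uint32_t bit)
{
	return (opts & bit) != 0;
}
```

Clearing the bit when the option is turned off is what lets a later
query report the current state rather than the last "on" request.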

Signed-off-by: Sean Hefty <sean.hefty@intel.com>