Sean Hefty [Fri, 29 May 2015 18:24:26 +0000 (11:24 -0700)]
rsockets: Delay initializing buffers until the inline size is known
The qib HCA ignores the requested max_inline_size on input and
instead returns the supported value on output. As a result,
the default inline size requested by rsockets is ignored, with 0
being returned. The code catches this after it creates the QP, but
has already initialized its data buffers prior to creating the QP.
The result is that the inline size used in rs_init_bufs() is larger
than that supported by the qib device. This causes a failure when
attempting to update available receive buffer space. The registered
data buffer for the credit message is smaller than what is needed.
Work around this issue by delaying the initialization of the data
buffers until after the QP has been created and the real size of
the inline data is known.
Sean Hefty [Tue, 28 Apr 2015 21:31:15 +0000 (14:31 -0700)]
Remove prints to stderr
The library should just fail operations with ENODEV, rather than
printing to stderr. Printing can result in applications failing,
or displaying incorrect error messages when no verb devices are
actually present.
Ilya Nelkenbaum [Thu, 26 Mar 2015 19:41:11 +0000 (12:41 -0700)]
cma: Workaround for rdma_ucm kernel bug
For certain new kernels, the IB_QP_SMAC bit is erroneously
set in the QP attribute mask. This workaround turns
off that bit. It allows SSA connections (AF_IB)
to work. Without this workaround, there are issues
on the client side.
The kernel patch which caused the issue is commit dd5f03b
IB/core: Ethernet L2 attributes in verbs/cm structures
IB/core: When marshaling ucma path from user-space, clear unused fields
When marshaling a user path to the kernel struct ib_sa_path, we need
to zero smac and dmac and set the vlan id to the "no vlan" value.
This is to ensure that Ethernet attributes are not used with
InfiniBand QPs.
Fixes: dd5f03beb4f7 ("IB/core: Ethernet L2 attributes in verbs/cm structures") Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Roland Dreier <roland@purestorage.com>
The fix was pushed to stable 3.14, 3.18 and 3.19 versions.
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com> Signed-off-by: Hal Rosenstock <hal@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Wed, 11 Feb 2015 00:50:08 +0000 (16:50 -0800)]
rsocket: Fix race in indexer map
Although insertions and removals of rsockets are protected
against accesses to the index map, when reading the map using
a non-rsocket (i.e. normal fd), the reading of the map may
overlap with the removal of an rsocket. This can result in
accessing freed memory.
We can avoid this by not freeing the memory when rsockets
no longer reference an index array. This ensures that the
memory is valid, and protects against reading the memory without
adding locking into the read path.
Problem reported by: Sasha Kotchubievsky <sashakot@mellanox.com>
Sean Hefty [Fri, 6 Feb 2015 05:17:03 +0000 (21:17 -0800)]
rsockets: Fix setting flags in rfcntl
The rfcntl() call to set rsocket flags merely OR's in
the updated flags with the existing ones, rather than
replacing them. Also, it does not handle setting an
rsocket from nonblocking mode back to blocking mode.
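A minimal sketch of the replace-instead-of-OR behavior described above. The helper name apply_setfl and the decision to track only O_NONBLOCK are assumptions made for illustration; this is not the rsockets source.

```c
#include <assert.h>
#include <fcntl.h>

/* Hypothetical helper, not the rsockets source: compute the flags kept
 * after an F_SETFL-style request. Only O_NONBLOCK is tracked here. */
int apply_setfl(int cur_flags, int requested)
{
    /* Clear the old O_NONBLOCK bit and copy it from the request, so a
     * request without O_NONBLOCK returns the socket to blocking mode. */
    return (cur_flags & ~O_NONBLOCK) | (requested & O_NONBLOCK);
}

/* Usage: OR-ing alone could never clear O_NONBLOCK; replacing can. */
void setfl_example(void)
{
    int flags = apply_setfl(0, O_NONBLOCK);   /* blocking -> nonblocking */
    assert(flags & O_NONBLOCK);
    flags = apply_setfl(flags, 0);            /* nonblocking -> blocking */
    assert(!(flags & O_NONBLOCK));
}
```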
Steve Wise [Mon, 12 Jan 2015 16:57:40 +0000 (10:57 -0600)]
rping: create persistent server threads in DETACHED state
Since the persistent server threads aren't joined, they must be created in
the DETACHED state or resources will not be cleaned up when they exit.
This results in pthread_create() failures after thousands of rping
instances are run against a persistent server.
Also check the return from all calls to pthread_create() so we don't
ignore a thread creation failure.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
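As an illustration of the two points above (detached creation and checking the pthread_create() return), here is a small sketch using only POSIX calls. The names worker and spawn_detached are hypothetical and do not come from rping.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Placeholder worker standing in for a persistent server thread. */
void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start a thread that is never joined. Returns 0 or a pthread error code. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int ret;

    assert(fn != NULL);
    ret = pthread_attr_init(&attr);
    if (ret)
        return ret;

    /* Detached threads release their resources on exit without a join. */
    ret = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (!ret)
        ret = pthread_create(&tid, &attr, fn, arg);
    if (ret)
        fprintf(stderr, "thread start failed: %s\n", strerror(ret));

    pthread_attr_destroy(&attr);
    return ret;
}
```

A caller would treat a nonzero return from spawn_detached(worker, NULL) as a failed thread creation instead of ignoring it.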
Hariprasad S [Thu, 6 Nov 2014 09:12:56 +0000 (14:42 +0530)]
rping: Fixes race, where ibv context was getting freed before memory was deregistered
While running rping as a client without a server on the other end,
rping_test_client fails and the ibv context was freed
before memory was deregistered. This patch fixes it.
Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
While waiting for a completion event, rsocket state is incorrectly
set to error when interrupted. Instead, the caller of get
completion event should decide what to do with it based on
errno. The fix is to not change the state to rs_error when
errno is EINTR inside get completion event.
Perform completion event acknowledgments in batches instead
of individually to minimize locking overheads. Size of the
completion queue decides the size of a batch.
Signed-off-by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
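The batching idea can be sketched with plain counters. The struct and function names below are hypothetical; in a verbs program the point marked in the comment is where a single ibv_ack_cq_events() call for the whole batch would go.

```c
#include <assert.h>

/* Hypothetical bookkeeping for batched completion-event acks. */
struct ack_batch {
    unsigned int unacked;    /* events received but not yet acknowledged */
    unsigned int batch;      /* batch size, taken from the CQ size */
    unsigned int ack_calls;  /* number of (simulated) acknowledge calls */
};

/* Record one event; acknowledge only when a full batch has built up.
 * A verbs program would call ibv_ack_cq_events(cq, b->unacked) here. */
void on_cq_event(struct ack_batch *b)
{
    assert(b->batch > 0);
    if (++b->unacked == b->batch) {
        b->ack_calls++;
        b->unacked = 0;
    }
}
```

Any remainder smaller than a batch would still have to be acknowledged before the CQ is destroyed.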
rsockets: Add fine grained interception mechanism for preload library
By default the R-Sockets pre-loading library intercepts all
the stream and datagram sockets belonging to the processes
and threads of a launched program.
However, distributed application and database servers may
require fine grained interception, so that only the
processes which listen for remote connections on the
RDMA transport are enabled with RDMA while the remaining
processes continue to use TCP as before. This allows the
various server components to keep communicating locally.
A configuration file based mechanism is introduced to facilitate
this fine grained interception mechanism. As part of preload
initialization, the configuration file is scanned and an
in-memory record store is created with all the entries found.
When a request is made to intercept a socket, its attributes
are cross checked with stored records to see whether we
should proceed with rsocket switch over.
Note: Right now, the fine grained interception mechanism is
enabled only for newly created sockets. Going forward,
this can be extended to select connections based on the
specified host/IP addresses and ports as well.
"preload_config" is the name of the configuration file which
should exist in the default configuration location
(usually the full path to this configuration file is:
<install-root>/etc/rdma/rsocket/preload_config)
of an installed rsocket library.
The sample format for this configuration file is shown below:
# Sample config file for preloading in a program specific way
#
# Each line entry should have the following format:
#
# program domain type protocol
#
# where,
#
# program - program or command name (string without spaces)
# domain - the socket domain: AF_INET / AF_INET6 / AF_IB
# type - the socket type: SOCK_STREAM / SOCK_DGRAM
# protocol - the socket protocol: IPPROTO_TCP / IPPROTO_UDP
#
# The wildcard value of '*' is supported for any
#
# Note:
# Lines beginning with '#' character are treated as comments.
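A rough sketch of how one line in the format above could be parsed and matched. The struct, the function names and the field sizes are assumptions made for illustration, not the preload library's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory record for one preload_config entry. */
struct config_entry {
    char program[64];
    char domain[16];
    char type[16];
    char protocol[16];
};

/* Parse one line. Returns 1 if an entry was filled in, 0 for comments,
 * blank lines, or lines that do not carry all four fields. */
int parse_config_line(const char *line, struct config_entry *e)
{
    assert(line && e);
    if (line[0] == '#')
        return 0;
    return sscanf(line, "%63s %15s %15s %15s",
                  e->program, e->domain, e->type, e->protocol) == 4;
}

/* A field matches when it is the wildcard or equals the wanted value. */
int field_matches(const char *field, const char *want)
{
    return !strcmp(field, "*") || !strcmp(field, want);
}
```

For example, the (made-up) entry "myserver AF_INET SOCK_STREAM *" would match a TCP stream socket created by a program named myserver, but not a datagram socket.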
Sean Hefty [Thu, 4 Sep 2014 18:19:28 +0000 (11:19 -0700)]
rsockets: Support calling listen multiple times on same rsocket
Standard sockets allow an application to call listen() multiple
times on the same socket without error. This allows a multi-threaded
app to call listen from all threads.
rsockets will fail the second listen call. Modify the behavior to
match standard sockets.
Problem reported by: Sreedhar Kodali <srkodali@linux.vnet.ibm.com>
rsocket: Index map item is cleaned before it is used in iomapping cleanup
The rs_free function clears the index map item corresponding to the
rsocket (in idm_clear called from rs_remove) and then uses it in
iomapping cleanup (in riounmap called from rs_free_iomappings).
Signed-off-by: Sasha Kotchubievsky <sashakot@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
rsocket: Segmentation fault fix in case of multiple connections
When more than 16 rsocket connections are established,
the "svc->rss" buffer is reallocated with more memory.
Index 0 is reserved for the service's communication
socket, and this is not taken into account when data is
copied from the old buffer location to the new one.
Signed-off-by: Ilya Nelkenbaum <ilyan@mellanox.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
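The off-by-one can be shown with a simplified table of ints. This is a sketch under the assumption that slot 0 is reserved and user entries occupy slots 1..cnt; the struct and function are hypothetical, not the rsocket service code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical service table: slot 0 is reserved for the service's own
 * socket, user entries occupy slots 1..cnt. */
struct svc_table {
    int *slots;
    int cnt;    /* number of user entries */
    int size;   /* number of user entries that fit */
};

/* Grow the table. Returns 0 on success, -1 if allocation fails. */
int svc_grow(struct svc_table *t, int grow_by)
{
    int *slots;

    assert(grow_by > 0);
    slots = calloc((size_t)(t->size + grow_by + 1), sizeof(*slots));
    if (!slots)
        return -1;
    /* Copy cnt + 1 entries: the reserved slot 0 plus every user entry.
     * Copying only cnt entries would drop the last user entry. */
    if (t->slots)
        memcpy(slots, t->slots, sizeof(*slots) * (size_t)(t->cnt + 1));
    free(t->slots);
    t->slots = slots;
    t->size += grow_by;
    return 0;
}
```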
Sean Hefty [Wed, 16 Jul 2014 20:44:56 +0000 (13:44 -0700)]
riostream: Only verify last data transfer
Data verification will fail when running the bandwidth
tests or when the transfer count is > 1. The issue is that
subsequent writes by the initiator side will overwrite
the data in the target buffer before the receiver can
verify that it is correct.
To fix this, only verify that the data in the buffer
is correct after the last transfer has completed.
0-byte RDMA writes appear to be working correctly with
HCAs from 2 different vendors. The original problem that
was reported turned out to be a user error.
Sean Hefty [Thu, 3 Jul 2014 20:45:52 +0000 (13:45 -0700)]
rsocket: Update correct rsocket keepalive time
When the keepalive time of an rsocket is updated, the
updated information is forwarded to the keepalive service
thread. However, the thread updates the time for the
wrong service as shown:
Sean Hefty [Thu, 3 Jul 2014 20:55:39 +0000 (13:55 -0700)]
rsocket: Fix removing rsocket from service thread
When removing an rsocket from a service thread, we replace
the removed service with the one at the end of the service list.
This keeps the array tightly packed. However, rs_svc_rm_rs
decrements the rsocket count before doing the swap. The result
is that the entry at the end of the list gets dropped off.
Defer decrementing the count until the swap has been made.
In this case, the cnt value is a valid index into the array,
because we start at index 1. Index 0 is used internally by
the service thread.
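The ordering described above (swap first, decrement afterwards) can be sketched on a plain int array. The helper name svc_remove is hypothetical.

```c
#include <assert.h>

/* Remove entry `idx` from a packed array whose user entries occupy
 * indices 1..*cnt (index 0 is reserved). Hypothetical helper. */
void svc_remove(int *slots, int *cnt, int idx)
{
    assert(idx >= 1 && idx <= *cnt);
    /* *cnt is the index of the last entry, so move it into the hole
     * first and only then shrink the count. */
    slots[idx] = slots[*cnt];
    (*cnt)--;
}
```

Decrementing before the copy would read slots[*cnt - 1] instead, which is how the last entry gets lost.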
Sean Hefty [Wed, 2 Jul 2014 22:37:10 +0000 (15:37 -0700)]
rsocket: Fix crash resulting from keepalive timeout
The following crash was reported by Hal Rosenstock,
<hal@mellanox.com>, with keepalive enabled. The crash
occurs in the keepalive thread attempting to send a
keepalive message.
report:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffecf08700 (LWP 6013)]
rs_post_write (rs=<value optimized out>, sgl=0x0, nsge=0, wr_data=3758096385,
flags=0, addr=0, rkey=0) at src/rsocket.c:1660
1660 return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) p/x rs
$1 = value has been optimized out
So I added in the following to debug:
1660 if (rs == NULL)
1661 abort();
1662 if (rs->cm_id == NULL)
1663 abort();
1664 if (rs->cm_id->qp == NULL)
1665 abort();
1666 return rdma_seterrno(ibv_post_send(rs->cm_id->qp, &wr, &bad));
1667 }
And saw in gdb:
Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffecf08700 (LWP 8096)]
0x00000030d50328a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6.x86_64
(gdb)
(gdb) bt
#0 0x00000030d50328a5 in raise () from /lib64/libc.so.6
#1 0x00000030d5034085 in abort () from /lib64/libc.so.6
#2 0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
#3 0x00007ffff058193d in tcp_svc_send_keepalive (arg=0x7ffff0789f20)
at src/rsocket.c:4245
#4 tcp_svc_run (arg=0x7ffff0789f20) at src/rsocket.c:4279
#5 0x00000030d5807851 in start_thread () from /lib64/libpthread.so.0
#6 0x00000030d50e890d in clone () from /lib64/libc.so.6
(gdb) fr 2
#2 0x00007ffff057fe23 in rs_post_write (rs=<value optimized out>, sgl=0x1fa0,
nsge=6, wr_data=4294967295, flags=0, addr=0, rkey=0) at src/rsocket.c:1665
1665 abort();
So qp is NULL somehow...
:end report
There is an issue if an rsocket is closed without going through
the rshutdown.
int rshutdown(int socket, int how)
{
...
if (rs->opts & RS_OPT_SVC_ACTIVE)
rs_notify_svc(&tcp_svc, rs, RS_SVC_REM_KEEPALIVE);
We remove the rsocket from the keepalive thread in rshutdown.
int rclose(int socket)
{
...
if (rs->state & rs_connected)
rshutdown(socket, SHUT_RDWR);
...
rs_free(rs);
rclose will call shutdown only if we're connected. However, if the
keepalive failed, the socket will be in an error state. So,
no call to rshutdown, which will leave the freed rsocket on
the keepalive thread's list.
The fix is to have rclose remove an rsocket from being processed
by a service thread if it is still active.
Testing has shown that this does not always result in the
keep-alive message working correctly, such that a broken
connection is reported as having failed. The reason for this
behavior is unknown, but revert the patch until the issue has
been resolved.
Doug Ledford [Wed, 18 Jun 2014 17:45:23 +0000 (10:45 -0700)]
rdma_server: handle IBV_SEND_INLINE correctly
Not all RDMA devices support IBV_SEND_INLINE. At least some of those
that don't will ignore the flag passed to rdma_post_send and attempt to
send the command by using an sge entry instead. Because we don't
register the send memory, this fails. The proper way to deal with the
fact that IBV_SEND_INLINE is not guaranteed is to check the returned
value in our cap struct to see if we have support for inline data, and
if not, fall back to non-inline sends and to register the send memory
region.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
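The fallback decision can be sketched without any verbs headers. SKETCH_SEND_INLINE stands in for IBV_SEND_INLINE, and max_inline_data is assumed to be the value the provider wrote back into the QP capabilities after creation; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SEND_INLINE 0x1  /* stand-in for IBV_SEND_INLINE */

struct send_plan {
    int flags;         /* flags to pass to the post-send call */
    int need_send_mr;  /* nonzero if the send buffer must be registered */
};

/* Decide how to post a message of msg_len bytes, given the inline size
 * the device reported back after QP creation. */
struct send_plan plan_send(size_t max_inline_data, size_t msg_len)
{
    struct send_plan p = { 0, 1 };  /* default: registered, non-inline */

    assert(msg_len > 0);
    if (msg_len <= max_inline_data) {
        p.flags = SKETCH_SEND_INLINE;
        p.need_send_mr = 0;
    }
    return p;
}
```

With a device that reports 0 bytes of inline data, every send takes the registered-memory path.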
Doug Ledford [Wed, 18 Jun 2014 17:44:49 +0000 (10:44 -0700)]
rdma_client: handle IBV_SEND_INLINE correctly
Not all RDMA devices support IBV_SEND_INLINE. At least some of those
that don't will ignore the flag passed to rdma_post_send and attempt to
send the command by using an sge entry instead. Because we don't
register the send memory, this fails. The proper way to deal with the
fact that IBV_SEND_INLINE is not guaranteed is to check the returned
value in our cap struct to see if we have support for inline data, and
if not, fall back to non-inline sends and to register the send memory
region.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Doug Ledford [Wed, 18 Jun 2014 17:44:28 +0000 (10:44 -0700)]
rdma_server: use perror, unwind allocs on failure
Our main test function prints out errno directly, which is hard to read
as it's not decoded at all. Instead, use perror() to make failures more
readable. Also redo the failure flow so that we can do a simple unwind
at the end of the function and just jump to the right unwind spot on
error.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
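The unwind style described above, shown on stdio and malloc instead of RDMA resources. The function name and the two resources are illustrative assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Acquire two resources; on failure report with perror() and release
 * only what was already acquired. Returns 0 on success, -1 on error. */
int run_with_unwind(const char *path, size_t len)
{
    int ret = -1;
    FILE *fp;
    char *buf;

    assert(path != NULL && len > 0);
    fp = fopen(path, "r");
    if (!fp) {
        perror("fopen");
        goto out;
    }

    buf = malloc(len);
    if (!buf) {
        perror("malloc");
        goto out_close;
    }

    ret = 0;    /* the real work would happen here */

    free(buf);
out_close:
    fclose(fp);
out:
    return ret;
}
```

Each label releases one resource, so a failure jumps to the label that matches what has been acquired so far.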
Doug Ledford [Wed, 18 Jun 2014 17:44:13 +0000 (10:44 -0700)]
rdma_client: use perror, unwind allocs on failure
Our main test function prints out errno directly, which is hard to read
as it's not decoded at all. Instead, use perror() to make failures more
readable. Also redo the failure flow so that we can do a simple unwind
at the end of the function and just jump to the right unwind spot on
error.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Doug Ledford [Wed, 18 Jun 2014 17:43:04 +0000 (10:43 -0700)]
cmtime: rework program to be multithreaded
When using very large numbers of connections (10,000 was in use here),
resolving a performance problem in the kernel cma.c code exposed a new
problem. With the underlying kernel issue resolved, 10,000 connect
requests would flood the server side of the test and the cmtime
application would respond as quickly as possible.
However, the client side would not bother to check any of the returns
until after having sent all 10,000 connect requests. When the kernel
had a serializing performance problem, this was OK. When it was fixed,
this caused a general slowdown in connect operations due to overruns in
the event processing. This patch causes the client side to fire off
threads that will handle responses to connect requests as they come in
instead of allowing them to backlog uncontrollably. Times for a 10,000
connect run changed from this:
[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+.output
ib1:
step             total ms     max ms     min us  us / conn
create id    :      46.64       0.10       1.00       4.66
bind addr    :      89.61       0.04       7.00       8.96
resolve addr :      50.63      26.18   23976.00       5.06
resolve route:     565.44     538.77   26736.00      56.54
create qp    :    4028.31       5.70     326.00     402.83
connect      :   50077.42   49990.49   90734.00    5007.74
disconnect   :    5277.25    4850.35  380017.00     527.72
destroy      :      42.15       0.04       2.00       4.21
ib0:
step             total ms     max ms     min us  us / conn
create id    :      34.82       0.04       1.00       3.48
bind addr    :      25.94       0.02       1.00       2.59
resolve addr :      48.18      25.01   22779.00       4.82
resolve route:     501.28     476.26   25071.00      50.13
create qp    :    3274.12       6.05     257.00     327.41
connect      :   55549.64   55490.32   62150.00    5554.96
disconnect   :    5263.64    4851.18  375628.00     526.36
destroy      :      47.20       0.07       2.00       4.72
to this:
[root@rdma-dev-01 ~]# more 3.12.0-rc1.cached_gids+optimized_connect+trimmed_cache+-fixed-cmtime.output
ib1:
step             total ms     max ms     min us  us / conn
create id    :      34.45       0.08       1.00       3.44
bind addr    :      88.41       0.04       7.00       8.84
resolve addr :      33.59       4.65     612.00       3.36
resolve route:     618.68       0.61      97.00      61.87
create qp    :    4024.03       6.30     341.00     402.40
connect      :    6983.35    6886.33    8509.00     698.33
disconnect   :    5066.47     230.34     831.00     506.65
destroy      :      37.02       0.03       2.00       3.70
ib0:
step             total ms     max ms     min us  us / conn
create id    :      42.61       0.14       1.00       4.26
bind addr    :      27.05       0.03       2.00       2.70
resolve addr :      40.65      10.73     869.00       4.06
resolve route:     626.75       0.60     103.00      62.68
create qp    :    3334.50       6.48     273.00     333.45
connect      :    6310.29    6251.59   13298.00     631.03
disconnect   :    5111.12     365.87     867.00     511.11
destroy      :      36.57       0.02       2.00       3.66
with this patch.
Signed-off-by: Doug Ledford <dledford@redhat.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Thu, 22 May 2014 23:13:08 +0000 (16:13 -0700)]
indexer: Free index_map resources when cleared
Free memory allocated for index map entries when they are no
longer in use. To handle this, count the number of entries
stored by the index map item arrays and release the arrays when
no items are being tracked.
This reduces valgrind noise.
Problem reported by: Hannes Weisbach <hannes_weisbach@gmx.net>
Sean Hefty [Thu, 17 Apr 2014 05:01:51 +0000 (22:01 -0700)]
rsocket: Relax requirement for minimal inline data
Inline data support is optional. Allow rsockets to work
with devices that do not support inline data, provided
that they do support RDMA writes with immediate data.
This allows rsockets to work over Intel TrueScale HCA.
Patch derived from work by: Amir Hanania
Signed-off-by: Amir Hanania <amir.hanania@intel.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Thu, 17 Apr 2014 05:33:38 +0000 (22:33 -0700)]
rsocket: Modify when control messages are available
Rsockets currently tracks how many control messages (i.e.
entries in the send queue) are available using a
single ctrl_avail counter. Seems simple enough.
However, control messages currently require the use of
inline data. In order to support control messages that
do not use inline data, we need to associate each
control message with a specific data buffer. This will
become easier to manage if we modify how we track when
control messages are available.
We replace the single ctrl_avail counter with two new
counters. The new counters conceptually treat control
messages as if each message had its own sequence number.
The sequence number will then be able to correspond to
a specific data buffer in a follow up patch.
ctrl_seqno will be used to indicate the current control
message being sent. ctrl_max_seqno will track the
highest control message that may be sent.
A side effect of this change is that we will be able to
see how many control messages have been sent. This also
separates the updating of the control count on the
sending side, versus the receiving side.
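One way to picture the two counters is the sketch below. The field names follow the commit text, but the helpers, the 16-bit width and the choice to treat ctrl_max_seqno as an exclusive bound are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Field names follow the commit text; everything else is hypothetical. */
struct ctrl_state {
    uint16_t ctrl_seqno;      /* sequence number of the next control message */
    uint16_t ctrl_max_seqno;  /* assumed: first number that may not be sent */
};

/* Number of control messages that may still be sent. Unsigned
 * subtraction keeps the result correct across counter wraparound. */
uint16_t ctrl_avail(const struct ctrl_state *s)
{
    return (uint16_t)(s->ctrl_max_seqno - s->ctrl_seqno);
}

/* Sending side: consume one slot and return the sequence number used. */
uint16_t ctrl_send(struct ctrl_state *s)
{
    assert(ctrl_avail(s) > 0);
    return s->ctrl_seqno++;
}

/* Completion side: a send finished, so one more message may be sent. */
void ctrl_complete(struct ctrl_state *s)
{
    s->ctrl_max_seqno++;
}
```

Because each message has its own number, ctrl_seqno doubles as a count of control messages sent, and the sending and completion sides update different fields.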
Sean Hefty [Thu, 17 Apr 2014 15:37:47 +0000 (08:37 -0700)]
rsocket: Dedicate a fixed number of SQEs for control messages
The number of SQEs allocated for control messages is set
to 1 of 2 constant values (either 4 or 2). A default
value is used unless the size of the SQ is below a certain
threshold (16 entries). This results in additional code
complexity, and it is highly unlikely that the SQ would
ever be allocated smaller than 16 entries.
Simplify the code to use a single constant value for the
number of SQEs allocated for control messages. This will
also help in subsequent patches that will need to deal
with HCAs that do not support inline data.
Sean Hefty [Thu, 17 Apr 2014 04:42:06 +0000 (21:42 -0700)]
rsocket: Check max inline data after creating QP
The ipath provider will ignore the max_inline_size
specified as input into ibv_create_qp and instead
return the size that it supports (which is 0) on
output.
Update the actual inline size returned from create QP,
and check that it meets the minimum requirement for
rsockets.
Sean Hefty [Wed, 9 Apr 2014 19:19:25 +0000 (12:19 -0700)]
librdmacm: Support lazy initialization
librdmacm currently opens a device context per configured HCA. This is
usually done in rdma_create_event_channel() or the first time
ucma_init() is called. If a process is only going to use one of the
configured HCAs/RDMA IPs then the remaining device contexts are not
used/required. Opening a device context on each device a priori limits the
maximum number of processes that can be supported on a node to the maximum
number of open contexts supported per HCA, regardless of the number of HCAs
present in the system.
Signed-off-by: Shamir Rabinovitch <shamir.rabinovitch@oracle.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Thu, 6 Mar 2014 21:42:31 +0000 (13:42 -0800)]
rsocket: Fix sbuf_bytes_avail counter 'overrun' with iwarp
Reported-by: Jonas Pfefferle1 <JPF@zurich.ibm.com>
"The problem is that on the client side sbuf_bytes_avail overflows
in rs_poll_cq. And from what I debugged so far there are 2
completions for every send and this is because I use iWarp hardware
which does not support write with immediate so there is one completion
for the write and one for the send (both go into the default case
and add the length to sbuf_bytes_avail)."
To avoid the issue, we flag send message operations that are used
in place of immediate data. Other send message operations are
not affected. The completion code can then check whether the
completion is for a send message which was paired with an RDMA
write transaction and adjust the behavior accordingly. Additionally,
such send messages only carry the opcode in their WR_ID, with the
data portion zeroed. This avoids adding the length value twice.
Sean Hefty [Mon, 27 Jan 2014 19:30:34 +0000 (11:30 -0800)]
udaddy: Remove support for port space IB
UD support for the IB port space requires that the application
use rdma_create_ep, rather than rdma_create_id. However, using
rdma_create_ep results in address and route resolution being
performed synchronously as part of the rdma_create_ep call.
Since udaddy is an example, we want to show how it can be used
with asynchronous events. So, rather than update udaddy to
use rdma_create_ep in order to support the IB port space, it
would be better to remove that support.
Sean Hefty [Tue, 26 Nov 2013 21:16:19 +0000 (13:16 -0800)]
librdmacm: Check 'init' under mutex
ucma_ib_init() does a quick check that access to ibacm has
been initialized. This check is done outside of the
acm_lock mutex. We need to check init again while
holding the mutex to ensure that we don't run the
initialization code twice.
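The check, lock, re-check pattern described above, sketched with a generic flag. The names are hypothetical and the one-time setup is reduced to a counter. The unlocked read mirrors the commit text; strictly portable code would use an atomic flag or pthread_once() instead.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static int initialized;
static int init_runs;   /* counts how often the setup body ran */

/* Returns 0; intended to be callable from several threads. */
int lazy_init(void)
{
    if (initialized)            /* quick check outside the lock */
        return 0;

    pthread_mutex_lock(&init_lock);
    if (!initialized) {         /* re-check now that we hold the lock */
        assert(init_runs == 0); /* the body must never run twice */
        init_runs++;            /* one-time setup would go here */
        initialized = 1;
    }
    pthread_mutex_unlock(&init_lock);
    return 0;
}

/* Accessor used to observe how many times the setup body ran. */
int lazy_init_runs(void)
{
    return init_runs;
}
```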
Sean Hefty [Mon, 18 Nov 2013 21:12:04 +0000 (13:12 -0800)]
rping: Fix server reporting error on exit
Commit e57196c71ddd850e14f3e66355f02786e4914f72
rping: added checks to the return values functions
resulted in the rping server always reporting that
it failed. Fix this by only failing in the case of
an unexpected termination, and not the result of
the client completing.
Sean Hefty [Mon, 11 Nov 2013 18:24:54 +0000 (10:24 -0800)]
Retrieve SGID after calling rdma_bind_addr
A change was made to rdma_bind_addr when AF_IB is enabled
to only retrieve the resulting bound address. Previously,
rdma_bind_addr would retrieve the corresponding SGID as
well. This breaks some apps which were checking the
SGID after binding to an IP address. Revert to the
previous behavior of also retrieving the SGID after
calling rdma_bind_addr.
Tested-by: Christoph Lameter <cl@linux.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Sean Hefty [Fri, 16 Aug 2013 22:15:12 +0000 (15:15 -0700)]
rsockets: Handle race between rshutdown and rpoll
Multi-threaded applications which call rpoll and rshutdown
simultaneously can hang. Ceph developers reported an issue
with the rsocket implementation. Ceph calls rpoll in
one thread, and while that thread is blocked in rpoll,
a second thread may call rshutdown on the socket. In
normal sockets, this results in the poll call unblocking
(since a call to read on the socket will no longer block).
However, rsockets does not free the thread blocked on the
rpoll call.
To fix this, we add some additional state checking to
protect against threads calling rpoll and rshutdown
simultaneously. We also have the rshutdown call
transition the QP into an error state. This causes all
posted receives to complete as flushed, which results
in unblocking the thread in rpoll (to process the flushed
receives).
Sean Hefty [Mon, 5 Aug 2013 17:57:43 +0000 (10:57 -0700)]
cmtime: Add example program that times rdma cm calls
cmtime is a new sample program that measures how long it
takes for each step in the connection process to complete.
It can be used to analyze the performance of the various
CM steps.
Sean Hefty [Fri, 26 Jul 2013 16:52:55 +0000 (09:52 -0700)]
rstream: Use rsocket option to set route directly
If we're using GID addressing, rdma_getaddrinfo can return
routing data directly. Add an option for the user to
indicate that rdma_getaddrinfo should be called in place of
getaddrinfo. And if routing data is available, call
rsetsockopt to set the route.
This helps test rsockets when ibacm and AF_IB support are
available.
Sean Hefty [Mon, 10 Jun 2013 19:33:20 +0000 (12:33 -0700)]
rsockets: Add ability to set the IB route directly
Add an RDMA specific rsocket option that allows the user
to program the RDMA route directly. This is useful
for apps that have path record data available, e.g. from
ibacm.
Sean Hefty [Thu, 18 Jul 2013 20:26:15 +0000 (13:26 -0700)]
rsockets: Support native IB addressing on connected rsockets
Update rsockets to support AF_IB addresses on connected rsockets.
Support for datagram rsockets is more difficult as a result of
using real UDP sockets for QP resolution, so that support is
deferred. For connected sockets, we need to update internal
checks to handle AF_IB.
Sean Hefty [Mon, 10 Jun 2013 17:57:56 +0000 (10:57 -0700)]
init: Remove USE_IB_ACM configuration option
When the librdmacm is configured, it sets the USE_IB_ACM option
if infiniband/acm.h is found. We can remove this option with
very little overhead, which would allow a user to install
ACM after installing the librdmacm, and the librdmacm would be
able to make use of ACM.
Sean Hefty [Mon, 10 Jun 2013 18:07:12 +0000 (11:07 -0700)]
acm: Define needed ACM protocol messages
The librdmacm needs message definitions used to communicate
with the ibacm. It currently pulls these from infiniband/acm.h,
which is installed by ibacm. This creates an install order
dependency on ibacm. However, work on the scalable SA has
the ibacm using the librdmacm (via rsockets) for communication
between the different SSA components.
To resolve this issue, have the librdmacm define the message
structures that it needs to communicate with ibacm. The
librdmacm already defines some ACM messages through configuration
checks. We just expand that capability, which isolates the librdmacm
package from the ibacm package.
Files opened by librdmacm are not supposed to be inherited across
exec*(). Most of the files are of no use to another program, and
others cannot be used without the associated memory mapping.
This patch changes fopen(), open() and socket() to always set
the close-on-exec flag.
This patch also adds checks to configure to guess whether fopen()
supports the "e" flag. If O_CLOEXEC and SOCK_CLOEXEC are supported,
fopen() should support "e". If not supported, it is discarded according
to POSIX. Many operating systems have support for fopen("e").
You might find more information about close on exec in the following articles:
- "Excuse me son, but your code is leaking !!!" by Dan Walsh
http://danwalsh.livejournal.com/53603.html
- "Secure File Descriptor Handling" by Ulrich Drepper
http://udrepper.livejournal.com/20407.html
Note: this patch won't set close on exec flag on file descriptors
created by the kernel for completion channel and such.
This is addressed by another kernel patch.
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Following advice in "Autotools Mythbuster" [1], option subdir-objects
can be used to have Makefiles create object files in the same
directory as their source files.
It reduces clobbering in the build directory.
[1] "Autotools Mythbuster", by Diego Elio "Flameeyes" Pettenò
http://www.flameeyes.eu/autotools-mythbuster/automake/nonrecursive.html
Signed-off-by: Yann Droneaud <ydroneaud@opteya.com> Signed-off-by: Sean Hefty <sean.hefty@intel.com>