Sean Hefty [Wed, 1 Aug 2012 23:26:11 +0000 (16:26 -0700)]
rspreload: Call real.close in fd_close
The index into the preload lookup table is obtained by opening
/dev/null and using the returned value. When closing the file,
use the real close call and not the preload close call. This
is a minor optimization, but clarifies the expected operation.
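
A minimal sketch of the idea, not the rspreload source. The struct
real_calls type and the fd_open helper are hypothetical names; only
fd_close and real.close come from the commit title. In this standalone
program real.close is simply bound to libc's close(). Inside an actual
preload library the name close would refer to the wrapper, so the
pointer would have to be resolved some other way (for example with
dlsym and RTLD_NEXT), which is an assumption and is not shown here.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical table of "real" libc entry points. */
struct real_calls {
	int (*close)(int fd);
};

static struct real_calls real;

/* Bind the real call lazily. Assumption: plain close() is the libc call,
 * which holds here because this program is not itself a preload library. */
static void init_real(void)
{
	if (real.close == NULL)
		real.close = close;
}

/* Reserve a lookup-table index by opening /dev/null.
 * Returns the descriptor, or -1 on failure. */
int fd_open(void)
{
	return open("/dev/null", O_RDONLY);
}

/* Release the index with the real close, bypassing any wrapper. */
int fd_close(int index)
{
	init_real();
	assert(real.close != NULL);
	return real.close(index);
}
```

The descriptor returned by fd_open is only a placeholder for an index,
so routing its close through a wrapper would add a table lookup that
can never match.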
Sean Hefty [Fri, 27 Jul 2012 17:46:42 +0000 (10:46 -0700)]
rsocket: Improve disconnect time under normal conditions
When both sides of a connection attempt to close at the same
time, one of the two sides can easily get an error when sending
a disconnect message. This results in that side hanging
during close until the send times out. (The timeout is caused
by the remote side destroying its QP.)
We can reduce the chance of this occurring by immediately
assuming that the disconnect has been successful once we've
received the remote side's disconnect message, or we've
polled a send completion for the local disconnect message.
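
The rule described above can be sketched as a small state update. All
names here (conn_state, conn_event, conn_handle_event,
conn_close_must_wait) are hypothetical and are not taken from the
rsocket source; this only illustrates treating either event as a
completed disconnect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connection states. */
enum conn_state {
	CONN_CONNECTED,
	CONN_DISCONNECTING,	/* local disconnect message posted */
	CONN_DISCONNECTED
};

/* Hypothetical events seen while polling for completions. */
enum conn_event {
	EV_DATA,			/* ordinary data transfer */
	EV_REMOTE_DISCONNECT_MSG,	/* peer's disconnect message received */
	EV_LOCAL_DISCONNECT_SENT	/* send completion for our disconnect */
};

struct conn {
	enum conn_state state;
};

/* Treat either disconnect event as proof that the disconnect is done. */
void conn_handle_event(struct conn *c, enum conn_event ev)
{
	assert(c != NULL);
	switch (ev) {
	case EV_REMOTE_DISCONNECT_MSG:
	case EV_LOCAL_DISCONNECT_SENT:
		c->state = CONN_DISCONNECTED;
		break;
	default:
		break;
	}
}

/* Close only has to wait while the disconnect is still outstanding. */
bool conn_close_must_wait(const struct conn *c)
{
	return c->state != CONN_DISCONNECTED;
}
```

With this shape, a side that has already seen the peer's disconnect
message does not wait for its own send to complete or time out.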
Sean Hefty [Fri, 27 Jul 2012 17:46:42 +0000 (10:46 -0700)]
rsocket: Improve disconnect time under normal conditions
When both sides of a connection attempt to close at the same
time, one of the two sides can easily get an error when sending
a disconnect message. This results in that side hanging
during close until the send times out. (The timeout is caused
by the remote side destroying its QP.)
We can reduce the chance of this occurring by immediately
assuming that the disconnect has been successful once we've
received the remote side's disconnect message.
Sean Hefty [Thu, 26 Jul 2012 22:35:32 +0000 (15:35 -0700)]
rsockets: Use wr_id to determine completion type
If a work request has completed in error, the completion type
field is undefined. Use the wr_id to determine if the failed
completion was a send or receive.
This fixes an issue where MPI can hang during finalize. With
both sides of a connection shutting down simultaneously, one
side may complete more quickly and delete its QP before the other
side receives an acknowledgement of its disconnect message.
Eventually, the disconnect message will time out, but because
the completion type field is undefined, it may be processed
as a failed receive, rather than a failed send. The end
result is that the second side hangs waiting for the send to
complete.
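
One way to make the wr_id carry the completion type is to reserve a
bit in it when the work request is posted. The sketch below assumes
such an encoding. The WR_ID_RECV flag, the helper names, and struct
completion (a stand-in for the verbs work completion) are hypothetical
and are not the layout used by rsockets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: the top bit of the 64-bit wr_id marks a receive. */
#define WR_ID_RECV (1ULL << 63)

/* Stand-in for a work completion. On a failed completion the opcode
 * field must not be trusted, so it is never read below. */
struct completion {
	uint64_t wr_id;
	int status;	/* 0 means success */
	int opcode;	/* undefined when status != 0 */
};

/* Build the wr_id for a send work request. */
uint64_t make_send_wr_id(uint64_t seq)
{
	assert((seq & WR_ID_RECV) == 0);	/* seq must leave the flag bit free */
	return seq;
}

/* Build the wr_id for a receive work request. */
uint64_t make_recv_wr_id(uint64_t seq)
{
	assert((seq & WR_ID_RECV) == 0);
	return seq | WR_ID_RECV;
}

/* Classify a completion from wr_id alone, valid even when it failed. */
bool completion_is_recv(const struct completion *wc)
{
	return (wc->wr_id & WR_ID_RECV) != 0;
}
```

Because the classification never reads opcode, a disconnect send that
times out is still handled as a failed send.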