Sean Hefty [Thu, 22 Dec 2011 08:14:12 +0000 (00:14 -0800)]
libibverbs: Support both OFED verbs and ibverbs
From: Sean Hefty <sean.hefty@intel.com>
This patch allows libibverbs to support both the libibverbs API that
shipped with OFED 1.5 and the upstream libibverbs API. Existing apps
that are compiled against the upstream libibverbs (ibverbs) continue
to be supported. In the ideal case, an application coded to the OFED
version of libibverbs (ofverbs) would only need to be recompiled with
'CFLAGS=-DOFED_VERBS' given as a configuration option.
Support for OFED verbs is implemented using macros that convert the OFED
APIs to ibverbs APIs. In most cases, simple data casts are all
that is necessary, with XRC support being the primary exception.
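A minimal sketch of the macro-plus-cast approach, assuming hypothetical names throughout: the `up_`/`UP_` identifiers stand in for an upstream ibverbs call and enum, and the `of_`/`OF_` identifiers stand in for their OFED counterparts. None of them are real libibverbs symbols. The point is only that an OFED-style call can be rewritten at compile time into the upstream call with a cast between two enums whose values agree.

```c
/* Hypothetical upstream (ibverbs-style) QP type enum and call. */
enum up_qp_type { UP_QPT_RC = 2, UP_QPT_UC = 3, UP_QPT_UD = 4 };

int up_create_qp(enum up_qp_type type)
{
	/* Stand-in for the real work: report the type that arrived. */
	return (int)type;
}

/* Hypothetical OFED-style enum whose values match the upstream enum. */
enum of_qp_type { OF_QPT_RC = 2, OF_QPT_UC = 3, OF_QPT_UD = 4 };

/*
 * Compatibility macro: the OFED-style name expands to the upstream call,
 * casting the argument between the two enum types. A real header would
 * typically guard macros like this with #ifdef OFED_VERBS; the guard is
 * omitted here so the sketch compiles without extra flags.
 */
#define of_create_qp(type) up_create_qp((enum up_qp_type)(type))
```

Because the macro is resolved by the preprocessor, the OFED-style call costs nothing at run time; cases where the data layouts differ (such as XRC) cannot be handled by a cast alone.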
Sean Hefty [Thu, 22 Dec 2011 08:10:11 +0000 (00:10 -0800)]
Add ibv_open_qp()
XRC receive QPs are shareable across multiple processes. Allow
any process with access to the xrc domain to open an existing
QP. After opening the QP, the process will receive events
related to the QP and be able to modify the QP.
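A toy, in-memory model of opening an existing XRC receive QP by number, assuming hypothetical `model_*` names. In libibverbs the QP actually lives in the kernel and the real call is ibv_open_qp(); this sketch only illustrates the idea that a second holder of the same domain finds the existing QP instead of creating a new one.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_QPS 8

/* Hypothetical record for a receive QP living inside an XRC domain. */
struct model_qp {
	uint32_t qp_num;
	int      open_count;   /* holders that currently have this QP open */
};

/* Hypothetical XRC domain that tracks the QPs created in it. */
struct model_xrcd {
	struct model_qp qps[MODEL_MAX_QPS];
	size_t          nqps;
};

/* The first process creates the receive QP in the domain. */
struct model_qp *model_create_qp(struct model_xrcd *xrcd, uint32_t qp_num)
{
	if (xrcd->nqps == MODEL_MAX_QPS)
		return NULL;
	struct model_qp *qp = &xrcd->qps[xrcd->nqps++];
	qp->qp_num = qp_num;
	qp->open_count = 1;
	return qp;
}

/* Any other holder of the domain opens the same QP by its number. */
struct model_qp *model_open_qp(struct model_xrcd *xrcd, uint32_t qp_num)
{
	for (size_t i = 0; i < xrcd->nqps; i++) {
		if (xrcd->qps[i].qp_num == qp_num) {
			xrcd->qps[i].open_count++;
			return &xrcd->qps[i];
		}
	}
	return NULL;   /* no QP with that number in this domain */
}
```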
Sean Hefty [Thu, 22 Dec 2011 08:08:19 +0000 (00:08 -0800)]
Using extensions to define XRC support
Define a common libibverbs extension to support XRC.
XRC introduces several new concepts and structures:
XRC domains: xrcd's are a type of protection domain used to
associate shared receive queues with xrc queue pairs. Since an
xrcd is meant to be shared among multiple processes, we
introduce new APIs to open/close xrcd's.
XRC shared receive queues: xrc srq's are similar to normal
srq's, except that they are bound to an xrcd, rather
than to a protection domain. Based on the current spec
and implementation, they are only usable with xrc qps. To
support xrc srq's, we extend the existing srq_init_attr
structure to include an srq type and other needed information.
The extended fields are ignored unless extensions are being
used to support existing applications.
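A sketch of the "append fields, read them only on the extension path" pattern described above, assuming a hypothetical `model_srq_init_attr` rather than the real ibv_srq_init_attr layout. A legacy caller that never touches the trailing fields always gets a basic SRQ; an XRC request must name the xrcd it is bound to.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical SRQ types: only the XRC type uses the extended fields. */
enum model_srq_type { MODEL_SRQT_BASIC = 0, MODEL_SRQT_XRC = 1 };

/* Stand-in for an opened XRC domain. */
struct model_xrcd { int id; };

struct model_srq_init_attr {
	/* Fields an existing application already fills in. */
	uint32_t max_wr;
	uint32_t max_sge;
	/* Extended fields appended at the end of the structure. */
	enum model_srq_type srq_type;
	struct model_xrcd  *xrcd;   /* an XRC SRQ is bound to an xrcd, not a pd */
};

/*
 * Return the SRQ type to create, or -1 if the request is invalid.
 * The extended fields are read only when the caller came through the
 * extension path (use_ext), so legacy callers always get a basic SRQ.
 */
int model_srq_type_for(const struct model_srq_init_attr *attr, bool use_ext)
{
	if (!use_ext)
		return MODEL_SRQT_BASIC;
	if (attr->srq_type == MODEL_SRQT_XRC && attr->xrcd == NULL)
		return -1;
	return (int)attr->srq_type;
}
```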
XRC queue pairs: xrc defines two new types of QPs. The
initiator, or send-side, xrc qp behaves similarly to a send-only
RC qp. xrc send qp's are managed through the existing QP
functions. The send_wr structure is extended in a backwards
compatible way to support posting sends on a send xrc qp,
which requires specifying the remote xrc srq.
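A sketch of how an appended send_wr field can stay backwards compatible, assuming hypothetical `model_*` types and treating an SRQ number of 0 as "unset" purely for this illustration. Only the XRC send path reads the new field; every other QP type ignores it, so existing callers are unaffected.

```c
#include <stdint.h>

enum model_qp_type { MODEL_QPT_RC, MODEL_QPT_XRC_SEND };

struct model_send_wr {
	uint64_t wr_id;
	uint32_t length;
	/* Appended field: only meaningful on an XRC send QP. */
	uint32_t xrc_remote_srq_num;
};

/* Hypothetical descriptor handed to the "hardware". */
struct model_wqe {
	uint64_t wr_id;
	uint32_t length;
	uint32_t dest_srq;   /* 0 means "not an XRC send" in this model */
};

int model_post_send(enum model_qp_type qpt, const struct model_send_wr *wr,
		    struct model_wqe *out)
{
	out->wr_id = wr->wr_id;
	out->length = wr->length;
	if (qpt == MODEL_QPT_XRC_SEND) {
		/* An XRC send must say which remote SRQ gets the message. */
		if (wr->xrc_remote_srq_num == 0)
			return -1;
		out->dest_srq = wr->xrc_remote_srq_num;
	} else {
		/* Other QP types never read the appended field. */
		out->dest_srq = 0;
	}
	return 0;
}
```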
The target, or receive-side, xrc qp behaves differently
than other implemented qp's. A recv xrc qp can be created,
modified, and destroyed like other qp's through the existing
calls. The qp_init_attr structure is extended for xrc qp's,
with extension support dependent upon the qp_type being
defined correctly.
Because xrc recv qp's are bound to an xrcd, rather than a pd,
they are intended to be used among multiple processes. Any process
with access to an xrcd may allocate and connect an xrc recv qp.
The actual xrc recv qp is allocated and managed by the kernel.
If the owning process explicitly destroys the xrc recv qp, it is
destroyed. However, if the xrc recv qp is left open when the
user process exits or closes its device, then the lifetime of
the xrc recv qp is bound to the lifetime of the xrcd.
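A toy model of the lifetime rule above, assuming hypothetical `model_*` names and a single receive QP for brevity. The xrcd counts the processes holding it; an explicit destroy removes the QP at once, while a QP left open survives until the last holder closes the xrcd.

```c
#include <stdbool.h>

/* Hypothetical model of the kernel-side objects. */
struct model_xrcd {
	int  users;      /* processes holding the xrcd open */
	bool qp_alive;   /* one xrc recv QP, for brevity */
};

void model_xrcd_open(struct model_xrcd *d) { d->users++; }

void model_qp_create(struct model_xrcd *d) { d->qp_alive = true; }

/* Explicit destroy by the owning process: the QP goes away immediately. */
void model_qp_destroy(struct model_xrcd *d) { d->qp_alive = false; }

/*
 * A process closes the xrcd (for example, on exit) without destroying
 * the QP. The QP stays alive while anyone still holds the xrcd; when
 * the last user drops it, the QP is torn down along with the domain.
 */
void model_xrcd_close(struct model_xrcd *d)
{
	if (d->users > 0 && --d->users == 0)
		d->qp_alive = false;
}
```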
The user-to-kernel ABI is extended to account for opening/
closing the xrcd and the creation of the extended srq type.
Sean Hefty [Thu, 22 Dec 2011 08:07:51 +0000 (00:07 -0800)]
Allow 3rd party extensions to verb routines
In order to support OFED, vendor-specific calls, or new ibverbs
operations, define a generic extension mechanism. This allows
OFED, an RDMA vendor, or another registered 3rd party (for
example, the librdmacm) to define RDMA extensions, and provides
a backwards compatible way to add new features to ibverbs.
Users who make use of extensions know that they are using an
extended call, and the name of the extension also tells them
how widely the extension may be supported. E.g. a VENDOR
extension is specific to a vendor, whereas an OFA extension is
standardized within an organization.
Support for extended functions, data structures, and enums is defined.
Extensions are referenced by name. Extension names are assumed to be
prefixed with an identifier of the supporting party.
Until an extension has been incorporated into libibverbs, it
should be defined in an appropriate external header file.
Driver libraries that support extensions are given a new
registration call, ibv_register_device_ext(). Use of this call
indicates to libibverbs that the library allocates extended
versions of struct ibv_device and struct ibv_context.
The following new APIs are added to libibverbs for applications
to use to determine if an extension is supported and to obtain the
extended function calls.
ibv_have_ext_ops - returns true if an extension is supported
ibv_get_device_ext_ops - returns extended operations for a device
ibv_get_ext_ops - returns extended operations for an open context
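A sketch of name-based extension lookup, assuming hypothetical `model_*` functions, a made-up extension name "ofa-xrc", and a made-up ops structure. It does not reproduce the signatures of ibv_have_ext_ops() or ibv_get_ext_ops(); it only shows the pattern of a provider publishing an ops table under a prefixed name and the caller checking for it before use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extended operations published under an extension name. */
struct model_xrc_ops {
	int (*open_xrcd)(int fd);
};

static int fake_open_xrcd(int fd) { return fd + 100; }

static const struct model_xrc_ops xrc_ops = { fake_open_xrcd };

/* Registry a provider would fill in; names carry the owner's prefix. */
static const struct {
	const char *name;
	const void *ops;
} ext_table[] = {
	{ "ofa-xrc", &xrc_ops },
};

/* Return the ops table for an extension, or NULL if it is unknown. */
const void *model_get_ext_ops(const char *name)
{
	for (size_t i = 0; i < sizeof(ext_table) / sizeof(ext_table[0]); i++)
		if (strcmp(ext_table[i].name, name) == 0)
			return ext_table[i].ops;
	return NULL;
}

bool model_have_ext_ops(const char *name)
{
	return model_get_ext_ops(name) != NULL;
}
```

Returning an untyped pointer keeps the lookup generic; the caller, who knows which extension it asked for, converts it to the matching ops structure.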
To maintain backwards compatibility with existing applications,
internally, the library uses the last byte of the device name
to record if the device was registered with extension support.
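A sketch of storing a flag in the spare last byte of a fixed-size name buffer, assuming a hypothetical 64-byte buffer and `model_*` names. It relies on the name's terminating NUL falling before the final byte, and refuses to set the flag otherwise so the string always stays terminated.

```c
#include <stdbool.h>
#include <string.h>

#define MODEL_NAME_MAX 64

/* Hypothetical device structure with a fixed-size name buffer. */
struct model_device {
	char name[MODEL_NAME_MAX];
};

/* Record extension support in the last byte of the name buffer. */
int model_set_ext_flag(struct model_device *dev, bool ext)
{
	/* Refuse if the terminator is not within the first MAX - 1 bytes. */
	if (memchr(dev->name, '\0', MODEL_NAME_MAX - 1) == NULL)
		return -1;
	dev->name[MODEL_NAME_MAX - 1] = ext ? 1 : 0;
	return 0;
}

bool model_has_ext(const struct model_device *dev)
{
	return dev->name[MODEL_NAME_MAX - 1] != 0;
}
```

Existing applications that only read the name as a C string never see the flag, because the string ends at the earlier NUL.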
Add support to ibv_devinfo for displaying extended speeds
Add code to ibv_devinfo to display the following new speeds:
8: FDR-10 is a proprietary link speed which is 10.3125 Gbps with 64b/66b
encoding rather than 8b/10b encoding.
16: FDR - 14.0625 Gbps
32: EDR - 25.78125 Gbps
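A table-driven sketch of decoding the port's active_speed code, in the spirit of what ibv_devinfo displays. Codes 1, 2 and 4 are the pre-existing SDR/DDR/QDR values; 8, 16 and 32 are the new ones listed above. The function and table names are illustrative, not taken from ibv_devinfo's source.

```c
#include <stddef.h>

/* active_speed code -> per-lane rate in Gbps and a short name. */
static const struct {
	unsigned int code;
	double       gbps;
	const char  *name;
} speeds[] = {
	{  1,  2.5,     "SDR"    },
	{  2,  5.0,     "DDR"    },
	{  4, 10.0,     "QDR"    },
	{  8, 10.3125,  "FDR-10" },   /* 64b/66b encoding */
	{ 16, 14.0625,  "FDR"    },
	{ 32, 25.78125, "EDR"    },
};

/* Index of the table entry for a code, or -1 if the code is unknown. */
static int speed_index(unsigned int code)
{
	for (size_t i = 0; i < sizeof(speeds) / sizeof(speeds[0]); i++)
		if (speeds[i].code == code)
			return (int)i;
	return -1;
}

double speed_gbps(unsigned int code)
{
	int i = speed_index(code);
	return i < 0 ? -1.0 : speeds[i].gbps;
}

const char *speed_name(unsigned int code)
{
	int i = speed_index(code);
	return i < 0 ? NULL : speeds[i].name;
}
```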
Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Bart Van Assche [Sun, 7 Aug 2011 18:01:48 +0000 (18:01 +0000)]
Makefile.am: Fix an automake warning
Fix the following automake warning message:
Makefile.am:1: `INCLUDES' is the old name for `AM_CPPFLAGS' (or `*_CPPFLAGS')
A quote from the automake manual:
INCLUDES
This does the same job as AM_CPPFLAGS (or any per-target _CPPFLAGS variable
if it is used). It is an older name for the same functionality. This
variable is deprecated; we suggest using AM_CPPFLAGS and per-target
_CPPFLAGS instead.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Bart Van Assche [Sun, 7 Aug 2011 18:01:08 +0000 (18:01 +0000)]
Add "foreign" option to AM_INIT_AUTOMAKE
Switch to the modern form of the AM_INIT_AUTOMAKE macro and tell
automake that the libibverbs package does not follow the GNU
standards. This change makes it possible to use 'autoreconf' for the
libibverbs package.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Or Gerlitz [Tue, 19 Jul 2011 09:31:32 +0000 (09:31 +0000)]
Update examples for IBoE
Since IBoE requires use of the GRH, update the ibv_*_pingpong examples to
accept GIDs. GIDs are given as an index into the local port's GID table and
are exchanged between the client and the server through the socket
connection.
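A sketch of sending a 16-byte GID over a text socket connection as 32 hex digits and decoding it on the other side. The byte-wise hex encoding and the function names are assumptions of this sketch; the exact wire format the ibv_*_pingpong examples use is not reproduced here.

```c
#include <stdint.h>
#include <stdio.h>

#define GID_LEN 16   /* a GID is 128 bits */

/* Encode raw GID bytes as 32 hex digits plus a terminating NUL. */
void gid_to_hex(const uint8_t gid[GID_LEN], char out[2 * GID_LEN + 1])
{
	for (int i = 0; i < GID_LEN; i++)
		snprintf(out + 2 * i, 3, "%02x", gid[i]);
}

/* Decode 32 hex digits back into GID bytes; returns 0 on success. */
int hex_to_gid(const char *in, uint8_t gid[GID_LEN])
{
	for (int i = 0; i < GID_LEN; i++) {
		unsigned int byte;
		if (sscanf(in + 2 * i, "%2x", &byte) != 1)
			return -1;
		gid[i] = (uint8_t)byte;
	}
	return 0;
}
```

Each side would write its own encoded GID to the socket and decode the peer's string before building an address handle with the GRH filled in.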
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Or Gerlitz [Tue, 19 Jul 2011 09:28:42 +0000 (09:28 +0000)]
Update kernel API header to include link_layer
Modify the code to handle returning the link layer of a port from the
kernel to the library. The kernel has done this since commit 2420b60b1dc4 ("IB/uverbs: Return link layer type to userspace for
query port operation"), merged in 2.6.37-rc1.
The new field does not change the size of struct ibv_query_port_resp
as it replaces a reserved field. Binary compatibility between the
kernel and the library is kept, since old kernels running below a new
library will zero that field, so it will be read as "unspecified,"
while an old library running over a new kernel will ignore the value
returned by the kernel.
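A sketch of why replacing a reserved field keeps the ABI intact, using trimmed, hypothetical response structures rather than the real struct ibv_query_port_resp. A compile-time check confirms the size is unchanged, and a simulated old kernel that zero-fills its reply shows the new field reading as 0 ("unspecified").

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed response layouts before and after the change. */
struct resp_old {
	uint32_t state;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
	uint8_t  reserved[1];
};

struct resp_new {
	uint32_t state;
	uint8_t  active_width;
	uint8_t  active_speed;
	uint8_t  phys_state;
	uint8_t  link_layer;   /* occupies the old reserved byte */
};

_Static_assert(sizeof(struct resp_old) == sizeof(struct resp_new),
	       "replacing a reserved byte must not change the ABI size");

enum { LINK_LAYER_UNSPECIFIED = 0 };

/* Simulate an old kernel: zero the reply, fill known fields, and let
 * the new library reinterpret the same bytes. */
struct resp_new old_kernel_reply(void)
{
	struct resp_old k;
	struct resp_new lib;

	memset(&k, 0, sizeof(k));
	k.state = 4;
	memcpy(&lib, &k, sizeof(lib));
	return lib;
}
```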
The solution was suggested by Roland Dreier <roland@purestorage.com>
and Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Or Gerlitz [Wed, 20 Jul 2011 19:37:24 +0000 (19:37 +0000)]
Add link_layer field to port attributes
The new field has three possible values: IBV_LINK_LAYER_UNSPECIFIED,
IBV_LINK_LAYER_INFINIBAND, IBV_LINK_LAYER_ETHERNET. It can be used by
applications to know the link layer used by the port, which can be
either InfiniBand or Ethernet.
The addition of the new field does not change the size of struct
ibv_port_attr, since it fits into padding left by the alignment of the
preceding fields. Binary
compatibility between the library and applications is kept, since old
apps running over new library do not read this field, and new apps
running over old library will determine the link layer as unspecified
and hence take their IB code path.
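A sketch of how an application might branch on the link layer, assuming hypothetical `MODEL_LL_*` names whose numeric values are an assumption of this sketch (only UNSPECIFIED being 0 matters here, since an old library leaves the field zeroed). Unspecified falls back to the InfiniBand path, and only Ethernet (IBoE) ports force a GRH.

```c
#include <stdbool.h>

enum model_link_layer {
	MODEL_LL_UNSPECIFIED = 0,   /* what an old library leaves behind */
	MODEL_LL_INFINIBAND  = 1,
	MODEL_LL_ETHERNET    = 2,
};

/* New apps over an old library see UNSPECIFIED and take the IB path. */
bool model_use_ib_path(enum model_link_layer ll)
{
	return ll != MODEL_LL_ETHERNET;
}

/* IBoE ports need a GRH, so the address must carry a GID. */
bool model_needs_grh(enum model_link_layer ll)
{
	return ll == MODEL_LL_ETHERNET;
}
```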
The solution was suggested by Roland Dreier <roland@purestorage.com>
and Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Handle huge pages in ibv_fork_init() and madvise tracking
When fork support is enabled in libibverbs, madvise() is called for
every memory page that is registered as a memory region. Memory
ranges that are passed to madvise() must be page aligned and the size
must be a multiple of the page size.
libibverbs uses sysconf(_SC_PAGESIZE) to find out the system page size
and rounds all ranges passed to reg_mr() according to this page size.
When memory from libhugetlbfs is passed to reg_mr(), this does not
work as the page size for this memory range might be different
(e.g. 16MB). So libibverbs would have to use the huge page size to
calculate a page aligned range for madvise.
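A sketch of rounding a registered range outward to whole pages, as madvise() requires, assuming the page size is a power of two. The function name is hypothetical and this is not libibverbs' own code. With a 4 KB page size the result covers only part of a 16 MB huge page, which is exactly the mismatch described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Round [addr, addr + len) outward to whole pages of page_size bytes. */
void align_range(uintptr_t addr, size_t len, size_t page_size,
		 uintptr_t *start, size_t *size)
{
	uintptr_t mask = (uintptr_t)page_size - 1;
	uintptr_t s = addr & ~mask;                  /* round start down */
	uintptr_t e = (addr + len + mask) & ~mask;   /* round end up */

	*start = s;
	*size = (size_t)(e - s);
}
```

For example, 0x100 bytes at 0x1234 become the single 4 KB page starting at 0x1000, but with a 16 MB page size the same range expands to the whole huge page starting at 0.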
As huge pages are provided to the application "under the hood" when
preloading libhugetlbfs, the application does not know whether it is
registering a huge page or a regular page.
To work around this issue, detect the use of huge pages in libibverbs
and align memory ranges passed to madvise according to the huge page
size. Determining the page size of a given memory range by watching
madvise() fail has proven to be unreliable, so we introduce the
RDMAV_HUGEPAGES_SAFE environment variable to let the user decide whether
the page size should be checked on every reg_mr() call. This
requires the user to know whether the running application uses
huge pages.
I did not add an additional API call to enable this, as applications
can use setenv() + ibv_fork_init() to enable checking for huge pages
in the code.
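A sketch of the opt-in logic, assuming hypothetical `model_*` functions and a stand-in page-size lookup. This model treats the mere presence of RDMAV_HUGEPAGES_SAFE as enabling the per-registration check; the real library's parsing of the variable is not shown. In an application, the documented pattern is to call setenv("RDMAV_HUGEPAGES_SAFE", "1", 1) before ibv_fork_init().

```c
#define _POSIX_C_SOURCE 200112L
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: has the user opted in to per-registration page checks? */
bool model_hugepages_safe(void)
{
	return getenv("RDMAV_HUGEPAGES_SAFE") != NULL;
}

/* Hypothetical lookup that pretends every range is backed by 16 MB pages. */
size_t fake_lookup(const void *addr)
{
	(void)addr;
	return (size_t)16 << 20;
}

/* Pick the rounding granularity for one registration. */
size_t model_page_size_for(bool hugepages_safe, size_t base_page,
			   size_t (*lookup)(const void *), const void *addr)
{
	/* Only pay for a per-call page-size lookup when the user opted in. */
	return hugepages_safe ? lookup(addr) : base_page;
}
```

Reading the switch from the environment lets existing binaries opt in without an API change, at the cost of the user having to know that huge pages are in use.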
Signed-off-by: Alexander Schmidt <alexs@linux.vnet.ibm.com>
[ Updated ibv_fork_init() manpage for RDMAV_HUGEPAGES_SAFE. - Roland ]
Signed-off-by: Roland Dreier <roland@purestorage.com>