From: Stan Smith
Date: Fri, 23 Apr 2010 17:53:37 +0000 (+0000)
Subject: [DOC] added librdmacm & libibverbs documentation based on OFED for Linux man pages.
X-Git-Url: https://openfabrics.org/gitweb/?a=commitdiff_plain;h=4418efaf5d13efb0101343d336aec42c2aca0bb0;p=~shefty%2Frdma-win.git

[DOC] added librdmacm & libibverbs documentation based on OFED for Linux man pages.

git-svn-id: svn://openib.tc.cornell.edu/gen1@2796 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
---

diff --git a/trunk/docs/Manual.htm b/trunk/docs/Manual.htm
index 2b5d3e88..184e6b6b 100644
--- a/trunk/docs/Manual.htm
+++ b/trunk/docs/Manual.htm
@@ -4,6 +4,8 @@
@@ -15,7 +17,7 @@ div.Section1

User's Manual

Release 2.3

-03/29/2010

+04/23/2010

Overview

The OpenFabrics Enterprise Distribution for Windows package is composed of software modules intended
@@ -84,8 +86,7 @@ style='background-position: 0% 0%; mso-highlight:yellow; background-image:none;

  • DAPLtest

  • -

    DAPLtest Examples

    -
  • +

    DAPLtest Examples

  • DAT Application Build

     

  • @@ -96,7 +97,16 @@ style='background-position: 0% 0%; mso-highlight:yellow; background-image:none;

    QLogic VNIC_Driver

  • - InfiniBand Software Development Kit

  • + OFED Software Development Kit + +
  • WinVerbs

  • @@ -1701,20 +1711,20 @@ mask bit on

    -g get multicast group info

    --m get multicast member info. If a group is specified, limit the
    -output to the group specified and print one line containing only
    -the GUID and node description for each entry. Example: saquery
    +-m get multicast member info. If a group is specified, limit the +output to the group specified and print one line containing only +the GUID and node description for each entry. Example: saquery -m 0xc000

    -x get LinkRecord info

    --src-to-dst
    -get a PathRecord for <src:dst> where src and dst are either node
    +get a PathRecord for <src:dst> where src and dst are either node names or LIDs

    --sgid-to-dgid
    -get a PathRecord for sgid to dgid where both GIDs are in an IPv6
    -format acceptable to inet_pton(3).
    +get a PathRecord for sgid to dgid where both GIDs are in an IPv6 format +acceptable to inet_pton.

    -C <ca_name>
    use the specified ca_name.
    @@ -1800,8 +1810,8 @@ OPTIONS

    -a set activity count

    -
    COMMON OPTIONS
    +
    Most OFED diagnostics take the following common flags. The exact list
    of supported flags per utility can be found in the usage message and
    can be shown using the util_name -h syntax.
    @@ -5030,9 +5040,9 @@ IOC. So qlgcvnic_config can be used to create multiple VNIC interfaces by giving target ioc guid parameters as input.

    -Usage:-- +Usage:

    -To create child vnic devices +Create child vnic devices

    qlgcvnic_config -c {caguid}  {iocguid}  {instanceid}  {interface description}

    @@ -5053,43 +5063,3493 @@ Executing qlgcvnic_config without any option or with -l option will list the IOC

     <return-to-top>

     

    -

    InfiniBand Software +

    OFED Software Development Kit


    -

    If selected during a OFED install, the IB Software Development Kit will -be installed as '%SystemDrive%\IBSDK'. Underneath the IBSDK\ folder you will find an -include folder 'Inc\',  library definition files 'Lib\'  along with a -'Samples' folder.

    +

    If selected during install, the OFED Software Development Kit will +be installed as '%SystemDrive%\OFED_SDK'. Underneath the OFED_SDK\ folder you will find +the following folders:

    +

    Compilation:

    -

    Add the additional include path '%SystemDrive%\IBSDK\Inc'; resource files +

    Add the additional include path '%SystemDrive%\OFED_SDK\Inc'; resource files may also use this path.

    Linking:

    -

    Add the additional library search path '%SystemDrive%\IBSDK\Lib'.

    +

    Add the additional library search path '%SystemDrive%\OFED_SDK\Lib'.

    Include dependent libraries: ibal.lib and complib.lib, or ibal32.lib & complib32.lib for win32 applications on 64-bit platforms.

    Samples:

    +

    <return-to-top>

    +

     

    +

    OFED InfiniBand Verbs

    +
    + + +

    NAME
    +
        +libibverbs.lib - OpenFabrics Enterprise Distribution (OFED) InfiniBand verbs library

    +SYNOPSIS
    +

        +#include <infiniband/verbs.h>
    +
    DESCRIPTION

    +
    +

    This library is an implementation of the verbs based on the InfiniBand +specification volume 1.2 chapter 11. It handles the control path of creating, +modifying, querying and destroying resources such as Protection Domains (PD), +Completion Queues (CQ), Queue Pairs (QP), Shared Receive Queues (SRQ), Address +Handles (AH) and Memory Regions (MR). It also handles sending and receiving data +posted to QPs and SRQs, and getting completions from CQs using polling and +completion events.

    The control path is implemented through system calls to the uverbs kernel module, +which in turn calls the low-level HW driver. The data path is implemented through +calls to the low-level HW library, which in most cases interacts directly with +the HW, providing kernel and network stack bypass (saving context/mode switches) +along with zero copy and an asynchronous I/O model.

    Typically, under network and RDMA programming, there are operations which +involve interaction with remote peers (such as address resolution and connection +establishment) and remote entities (such as route resolution and joining a +multicast group under IB), where a resource managed through IB verbs such as a QP +or AH is eventually created or affected by this interaction. In such +cases, applications whose addressing semantics are based on IP can use librdmacm +(see rdma_cm) which works in conjunction with libibverbs.

    This library is thread safe: verbs can be called from any thread in +the process (the same resource can even be handled from different threads, for +example: ibv_poll_cq can be called from more than one thread).

    However, it is up to the user to stop using a resource once it has been +destroyed (by the same thread or by any other thread); continuing to use it may result in a +segmentation fault.

    +

    The following shall be declared as functions and may also be defined as +macros.

    +
    +
    +

    Function prototypes are provided in + + %SystemDrive%\OFED_SDK\inc\infiniband\verbs.h.
    +
    Link to + %SystemDrive%\OFED_SDK\lib\libibverbs.lib

    +
    +

    Device functions

    +
    +

    struct ibv_device **ibv_get_device_list(int *num_devices);

    +

    void ibv_free_device_list(struct ibv_device **list);

    +

    const char *ibv_get_device_name(struct ibv_device *device);

    +

    uint64_t ibv_get_device_guid(struct ibv_device *device);

    +
    +

    Context functions

    +
    +

    struct ibv_context *ibv_open_device(struct ibv_device *device);

    +

    int ibv_close_device(struct ibv_context *context);

    +
    +

    Queries

    +
    +

    int ibv_query_device(struct ibv_context *context, +struct ibv_device_attr *device_attr);

    +

    int ibv_query_port(struct ibv_context *context, uint8_t port_num, +struct ibv_port_attr *port_attr);

    +

    int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, +int index, uint16_t *pkey);

    +

    int ibv_query_gid(struct ibv_context *context, uint8_t port_num, +int index, union ibv_gid *gid);

    +
    +

    Asynchronous events

    +
    +

    int ibv_get_async_event(struct ibv_context *context, +struct ibv_async_event *event);

    +

    void ibv_ack_async_event(struct ibv_async_event *event);

    +
    +

    Protection Domains

    +
    +

    struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);

    +

    int ibv_dealloc_pd(struct ibv_pd *pd);

    +
    +

    Memory Regions

    +
    +

    struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, +size_t length, enum ibv_access_flags access);

    +

    int ibv_dereg_mr(struct ibv_mr *mr);

    +
    +

    Address Handles

    +
    +

    struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr);

    int + ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, +struct ibv_wc *wc, struct ibv_grh *grh, +struct ibv_ah_attr *ah_attr);

    struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, +struct ibv_grh *grh, uint8_t port_num);

    int ibv_destroy_ah(struct ibv_ah *ah);

    +
    +

    Completion event channels

    +
    +

    struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context + *context);

    +
    +
    +

    int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);

    +
    +

    Completion Queues Control

    +
    +

    struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, +void *cq_context, +struct ibv_comp_channel *channel, +int comp_vector);

    int ibv_destroy_cq(struct ibv_cq *cq);

    int + ibv_resize_cq(struct ibv_cq *cq, int cqe);

    +
    +

    Reading Completions from CQ

    +
    +

    int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc);

    +
    +

    Requesting / Managing CQ events

    +
    +

    int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);

    +

    int ibv_get_cq_event(struct ibv_comp_channel *channel, +struct ibv_cq **cq, void **cq_context);

    +

    void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents);

    +
    +

    Shared Receive Queue control

    +
    +

    struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr);
    +
    int ibv_destroy_srq(struct ibv_srq *srq);

    int + ibv_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, enum ibv_srq_attr_mask srq_attr_mask);
    +
    int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr);

    +
    +

    eXtended Reliable Connection control

    +
    +

    struct ibv_xrc_domain *ibv_open_xrc_domain(struct ibv_context *context, int fd, int oflag);
    +
    int ibv_close_xrc_domain(struct ibv_xrc_domain *d);

    struct ibv_srq *ibv_create_xrc_srq(struct ibv_pd *pd, struct ibv_xrc_domain *xrc_domain, struct ibv_cq *xrc_cq, struct ibv_srq_init_attr *srq_init_attr);
    +
    int ibv_create_xrc_rcv_qp(struct ibv_qp_init_attr *init_attr, uint32_t *xrc_rcv_qpn);
    +
    int ibv_modify_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num, struct ibv_qp_attr *attr, int attr_mask);
    +
    int ibv_query_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num, struct ibv_qp_attr *attr, int attr_mask, struct ibv_qp_init_attr *init_attr);
    +
    int ibv_reg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num);
    +
    int ibv_unreg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num);

    +
    +

    Queue Pair control

    +
    +

    struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr);
    +
    int ibv_destroy_qp(struct ibv_qp *qp);

    int + ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask);
    +
    int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr);

    +
    +

    Posting Work Requests to QPs/SRQs

    +
    +

    int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr);
    +
    int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr);
    +
    int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr);

    +
    +

    Multicast group

    +
    +

    int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);

    +

    int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);

    +
    +

    General functions

    +
    +

    int ibv_rate_to_mult(enum ibv_rate rate);

    enum ibv_rate + mult_to_ibv_rate(int mult);
     

    +
    +

    SEE ALSO

    +
    +

    ibv_get_device_list, + ibv_free_device_list,
    + ibv_get_device_name, + ibv_get_device_guid, + ibv_open_device,
    + ibv_close_device, + ibv_query_device, + ibv_query_port,
    + ibv_query_pkey, ibv_query_gid, + ibv_get_async_event,
    + ibv_ack_async_event, + ibv_alloc_pd, ibv_dealloc_pd, + ibv_reg_mr,
    ibv_dereg_mr, + ibv_create_ah, ibv_init_ah_from_wc, + ibv_create_ah_from_wc,
    + ibv_destroy_ah, + ibv_create_comp_channel,
    + ibv_destroy_comp_channel, + ibv_create_cq, ibv_destroy_cq,
    + ibv_resize_cq, ibv_poll_cq, + ibv_req_notify_cq,
    + ibv_get_cq_event, + ibv_ack_cq_events, + ibv_create_srq,
    ibv_destroy_srq, + ibv_modify_srq, ibv_query_srq,
    + ibv_open_xrc_domain, + ibv_close_xrc_domain, + ibv_create_xrc_srq,
    + ibv_create_xrc_rcv_qp, + ibv_modify_xrc_rcv_qp,
    + ibv_query_xrc_rcv_qp, + ibv_reg_xrc_rcv_qp, + ibv_unreg_xrc_rcv_qp,
    + ibv_post_srq_recv, ibv_create_qp, + ibv_destroy_qp, ibv_modify_qp,
    + ibv_query_qp, ibv_post_send, + ibv_post_recv,
    ibv_attach_mcast, + ibv_detach_mcast, + ibv_rate_to_mult, + mult_to_ibv_rate

    +
    +


    AUTHORS

    +
    +

    Dotan Barak <dotanb@mellanox.co.il>
    Or Gerlitz <ogerlitz@voltaire.com>
    Stan Smith <stan.smith@intel.com>

    +
    +

    + +<return-to-top>

    +

     

    +

    IBV_GET_DEVICE_LIST

    +

    IBV_FREE_DEVICE_LIST

    +
    +

    NAME

    +ibv_get_device_list, ibv_free_device_list - get and release list of available +RDMA devices

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_device **ibv_get_device_list(int *num_devices);
    +
    +void ibv_free_device_list(struct ibv_device **list);
    +

    DESCRIPTION

    +ibv_get_device_list() returns a NULL-terminated array of RDMA devices +currently available. The argument num_devices is optional; if not NULL, +it is set to the number of devices returned in the array. +

    ibv_free_device_list() frees the array of devices list returned +by ibv_get_device_list().
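
Because the returned array is NULL-terminated, a caller that passed NULL for num_devices can still recover the count by walking the array. The following is a minimal sketch under stated assumptions: it uses only an opaque forward declaration of struct ibv_device so that it compiles without the SDK headers, and count_device_list is a hypothetical helper, not part of libibverbs.

```c
#include <stddef.h>

/* Opaque forward declaration; with the SDK installed, include
 * <infiniband/verbs.h> instead. */
struct ibv_device;

/* Hypothetical helper (not a libibverbs call): count the entries of a
 * NULL-terminated device array such as the one ibv_get_device_list()
 * returns. A NULL list (the failure case) counts as zero devices. */
size_t count_device_list(struct ibv_device **list)
{
    size_t n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```

In real code the argument would be the pointer returned by ibv_get_device_list(NULL), and the same pointer would later be passed to ibv_free_device_list().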

    +

    RETURN VALUE

    +ibv_get_device_list() returns the array of available RDMA devices, or +sets errno and returns NULL if the request fails. If no devices are found +then num_devices is set to 0, and non-NULL is returned. +

    ibv_free_device_list() returns no value.

    +

    ERRORS

    +
    +
    EPERM
    +
    Permission denied. +
    +
    ENOSYS
    +
    No kernel support for RDMA. +
    +
    ENOMEM
    +
    Insufficient memory to complete the operation.
    +
    +

    NOTES

    +Client code should open all the devices it intends to use with +ibv_open_device() before calling ibv_free_device_list(). Once it +frees the array with ibv_free_device_list(), it will be able to use only +the open devices; pointers to unopened devices will no longer be valid. +  +

    SEE ALSO

    +ibv_get_device_name, +ibv_get_device_guid, +ibv_open_device

     

    +

    IBV_GET_DEVICE_GUID

    +
    +

    NAME

    +ibv_get_device_guid - get an RDMA device's GUID +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +uint64_t ibv_get_device_guid(struct ibv_device *device); 
    +

    DESCRIPTION

    +ibv_get_device_guid() returns the Globally Unique Identifier (GUID) of the +RDMA device device.

    RETURN VALUE

    +ibv_get_device_guid() returns the GUID of the device in network byte +order.   +
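
Since the GUID comes back in network (big-endian) byte order, it usually has to be converted before it is printed or compared with a host-order constant. The sketch below shows one portable way to do that. Both guid_to_host and format_guid are hypothetical helpers introduced for illustration, and the colon-separated grouping is just a formatting choice made here.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: convert a 64-bit value held in network (big-endian)
 * byte order, such as the GUID returned by ibv_get_device_guid(), to host
 * byte order. Reading the bytes one at a time makes the result independent
 * of the host's endianness. */
uint64_t guid_to_host(uint64_t net_guid)
{
    unsigned char bytes[sizeof net_guid];
    uint64_t host = 0;
    size_t i;

    memcpy(bytes, &net_guid, sizeof bytes);
    for (i = 0; i < sizeof bytes; i++)
        host = (host << 8) | bytes[i];
    return host;
}

/* Hypothetical helper: format a network-order GUID as four colon-separated
 * 16-bit hex groups. buf must hold at least 20 bytes. */
void format_guid(uint64_t net_guid, char *buf, size_t len)
{
    uint64_t g = guid_to_host(net_guid);

    snprintf(buf, len, "%04x:%04x:%04x:%04x",
             (unsigned)((g >> 48) & 0xffff),
             (unsigned)((g >> 32) & 0xffff),
             (unsigned)((g >> 16) & 0xffff),
             (unsigned)(g & 0xffff));
}
```

A caller would pass the value returned by ibv_get_device_guid() straight to format_guid() together with a 20-byte buffer.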

    SEE ALSO

    +ibv_get_device_list, +ibv_get_device_name, +ibv_open_device

     

    +


    +IBV_GET_DEVICE_NAME

    +
    +

    NAME

    +ibv_get_device_name - get an RDMA device's name

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +const char *ibv_get_device_name(struct ibv_device *device);
    +

    DESCRIPTION

    +ibv_get_device_name() returns a human-readable name associated with the +RDMA device device.

    RETURN VALUE

    +ibv_get_device_name() returns a pointer to the device name, or NULL if +the request fails.

    SEE ALSO

    +ibv_get_device_list, +ibv_get_device_guid, +ibv_open_device

    +
    +
    +IBV_OPEN_DEVICE

    +

    IBV_CLOSE_DEVICE

    +
    +

    NAME

    +ibv_open_device, ibv_close_device - open and close an RDMA device context +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_context *ibv_open_device(struct ibv_device *device);
    +
    +int ibv_close_device(struct ibv_context *context);
    +
    +

    DESCRIPTION

    +ibv_open_device() opens the device device and creates a context +for further use. +

    ibv_close_device() closes the device context context.

    +

    RETURN VALUE

    +ibv_open_device() returns a pointer to the allocated device context, or +NULL if the request fails. +

    ibv_close_device() returns 0 on success, -1 on failure.

    +

    NOTES

    +ibv_close_device() does not release all the resources allocated using +context context. To avoid resource leaks, the user should release all +associated resources before closing a context. +

    SEE ALSO

    +ibv_get_device_list, +ibv_query_device, +ibv_query_port, +ibv_query_gid, ibv_query_pkey

     

    +

     

    +


    +IBV_GET_ASYNC_EVENT

    +

    +
    +
    +IBV_ACK_ASYNC_EVENT

    +
    +

    NAME

    +ibv_get_async_event, ibv_ack_async_event - get or acknowledge asynchronous +events   +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_get_async_event(struct ibv_context *context,
    +                        struct ibv_async_event *event);
    +
    +void ibv_ack_async_event(struct ibv_async_event *event);
    +

    DESCRIPTION

    +ibv_get_async_event() waits for the next async event of the RDMA device +context context and returns it through the pointer event, which is +an ibv_async_event struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_async_event {
    +union {
    +struct ibv_cq  *cq;             /* CQ that got the event */
    +struct ibv_qp  *qp;             /* QP that got the event */
    +struct ibv_srq *srq;            /* SRQ that got the event */
    +int             port_num;       /* port number that got the event */
    +} element;
    +enum ibv_event_type     event_type;     /* type of the event */
    +};
    +
    +

    One member of the element union will be valid, depending on the event_type +member of the structure. event_type will be one of the following events:

    +

    QP events:

    +
    +
    IBV_EVENT_QP_FATAL Error occurred on a QP and it transitioned to + error state
    +
    +
    IBV_EVENT_QP_REQ_ERR Invalid Request Local Work Queue Error
    +
    +
    IBV_EVENT_QP_ACCESS_ERR Local access violation error
    +
    +
    IBV_EVENT_COMM_EST Communication was established on a QP
    +
    +
    IBV_EVENT_SQ_DRAINED Send Queue was drained of outstanding + messages in progress
    +
    +
    IBV_EVENT_PATH_MIG A connection has migrated to the alternate + path
    +
    +
    IBV_EVENT_PATH_MIG_ERR A connection failed to migrate to the + alternate path
    +
    +
    IBV_EVENT_QP_LAST_WQE_REACHED Last WQE Reached on a QP associated + with an SRQ
    +
    +
    +

    CQ events:

    +
    +
    IBV_EVENT_CQ_ERR CQ is in error (CQ overrun)
    +
    +
    +

    SRQ events:

    +
    +
    IBV_EVENT_SRQ_ERR Error occurred on an SRQ
    +
    +
    IBV_EVENT_SRQ_LIMIT_REACHED SRQ limit was reached
    +
    +
    +

    Port events:

    +
    +
    IBV_EVENT_PORT_ACTIVE Link became active on a port
    +
    +
    IBV_EVENT_PORT_ERR Link became unavailable on a port
    +
    +
    IBV_EVENT_LID_CHANGE LID was changed on a port
    +
    +
    IBV_EVENT_PKEY_CHANGE P_Key table was changed on a port
    +
    +
    IBV_EVENT_SM_CHANGE SM was changed on a port
    +
    +
    IBV_EVENT_CLIENT_REREGISTER SM sent a CLIENT_REREGISTER request + to a port
    +
    +
    +

    CA events:

    +
    +
    IBV_EVENT_DEVICE_FATAL CA is in FATAL state
    +
    +
    +

    ibv_ack_async_event() acknowledges the async event event.
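
The grouping of event types above determines which member of the element union a handler may read. A minimal sketch of that mapping follows. The enum is a local stand-in so the snippet compiles without the SDK (its numeric values are arbitrary), element_kind and element_kind_for are hypothetical names, and treating the CA event as having no valid union member is an assumption of this sketch.

```c
/* Local stand-in so the sketch compiles without the SDK. The enumerator
 * names follow the event lists in this section; the numeric values are
 * arbitrary. With <infiniband/verbs.h> available, drop this enum. */
enum ibv_event_type {
    IBV_EVENT_QP_FATAL,
    IBV_EVENT_QP_REQ_ERR,
    IBV_EVENT_QP_ACCESS_ERR,
    IBV_EVENT_COMM_EST,
    IBV_EVENT_SQ_DRAINED,
    IBV_EVENT_PATH_MIG,
    IBV_EVENT_PATH_MIG_ERR,
    IBV_EVENT_QP_LAST_WQE_REACHED,
    IBV_EVENT_CQ_ERR,
    IBV_EVENT_SRQ_ERR,
    IBV_EVENT_SRQ_LIMIT_REACHED,
    IBV_EVENT_PORT_ACTIVE,
    IBV_EVENT_PORT_ERR,
    IBV_EVENT_LID_CHANGE,
    IBV_EVENT_PKEY_CHANGE,
    IBV_EVENT_SM_CHANGE,
    IBV_EVENT_CLIENT_REREGISTER,
    IBV_EVENT_DEVICE_FATAL
};

/* Hypothetical tag naming which member of the element union is valid. */
enum element_kind {
    ELEMENT_NONE,   /* CA-level event: no union member assumed valid */
    ELEMENT_CQ,     /* element.cq */
    ELEMENT_QP,     /* element.qp */
    ELEMENT_SRQ,    /* element.srq */
    ELEMENT_PORT    /* element.port_num */
};

/* Map an event type to the union member a handler should read. */
enum element_kind element_kind_for(enum ibv_event_type type)
{
    switch (type) {
    case IBV_EVENT_QP_FATAL:
    case IBV_EVENT_QP_REQ_ERR:
    case IBV_EVENT_QP_ACCESS_ERR:
    case IBV_EVENT_COMM_EST:
    case IBV_EVENT_SQ_DRAINED:
    case IBV_EVENT_PATH_MIG:
    case IBV_EVENT_PATH_MIG_ERR:
    case IBV_EVENT_QP_LAST_WQE_REACHED:
        return ELEMENT_QP;
    case IBV_EVENT_CQ_ERR:
        return ELEMENT_CQ;
    case IBV_EVENT_SRQ_ERR:
    case IBV_EVENT_SRQ_LIMIT_REACHED:
        return ELEMENT_SRQ;
    case IBV_EVENT_PORT_ACTIVE:
    case IBV_EVENT_PORT_ERR:
    case IBV_EVENT_LID_CHANGE:
    case IBV_EVENT_PKEY_CHANGE:
    case IBV_EVENT_SM_CHANGE:
    case IBV_EVENT_CLIENT_REREGISTER:
        return ELEMENT_PORT;
    case IBV_EVENT_DEVICE_FATAL:
        return ELEMENT_NONE;
    }
    return ELEMENT_NONE; /* unknown event type */
}
```

A handler could switch on the returned tag before touching event.element, which keeps the "one valid member" rule in a single place.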

    +

    RETURN VALUE

    +ibv_get_async_event() returns 0 on success, and -1 on error. +

    ibv_ack_async_event() returns no value.

    +

    NOTES

    +All async events that ibv_get_async_event() returns must be acknowledged +using ibv_ack_async_event(). To avoid races, destroying an object (CQ, +SRQ or QP) will wait for all affiliated events for the object to be +acknowledged; this avoids an application retrieving an affiliated event after +the corresponding object has already been destroyed. +

    ibv_get_async_event() is a blocking function. If multiple threads call +this function simultaneously, then when an async event occurs, only one thread +will receive it, and it is not possible to predict which thread will receive it. +

    +

    EXAMPLES

    +The following code example demonstrates one possible way to work with async +events in non-blocking mode. It performs the following steps: +

    1. Set the async event queue to non-blocking mode
    +2. Poll the queue until it has an async event
    +3. Get the async event and ack it

    +

    +
    /* change the blocking mode of the async event queue */
    +flags = fcntl(ctx->async_fd, F_GETFL);
    +rc = fcntl(ctx->async_fd, F_SETFL, flags | O_NONBLOCK);
    +if (rc < 0) {
    +        fprintf(stderr, "Failed to change file descriptor of async event queue\n");
    +        return 1;
    +}
    +
    +/*
    + * poll the queue until it has an event and sleep ms_timeout
    + * milliseconds between any iteration
    + */
    +my_pollfd.fd      = ctx->async_fd;
    +my_pollfd.events  = POLLIN;
    +my_pollfd.revents = 0;
    +
    +do {
    +        rc = poll(&my_pollfd, 1, ms_timeout);
    +} while (rc == 0);
    +if (rc < 0) {
    +        fprintf(stderr, "poll failed\n");
    +        return 1;
    +}
    +
    +/* Get the async event */
    +if (ibv_get_async_event(ctx, &async_event)) {
    +        fprintf(stderr, "Failed to get async_event\n");
    +        return 1;
    +}
    +
    +/* Ack the event */
    +ibv_ack_async_event(&async_event);
    +
    +
    +

    SEE ALSO

    +ibv_open_device +

     

    +


    +IBV_QUERY_DEVICE

    +
    +

    NAME

    +ibv_query_device - query an RDMA device's attributes   +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_device(struct ibv_context *context,
    +                     struct ibv_device_attr *device_attr);
    +

    DESCRIPTION

    +ibv_query_device() returns the attributes of the device with context +context. The argument device_attr is a pointer to an ibv_device_attr +struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_device_attr {
    +char                    fw_ver[64];             /* FW version */
    +uint64_t                node_guid;              /* Node GUID (in network byte order) */
    +uint64_t                sys_image_guid;         /* System image GUID (in network byte order) */
    +uint64_t                max_mr_size;            /* Largest contiguous block that can be registered */
    +uint64_t                page_size_cap;          /* Supported memory shift sizes */
    +uint32_t                vendor_id;              /* Vendor ID, per IEEE */
    +uint32_t                vendor_part_id;         /* Vendor supplied part ID */
    +uint32_t                hw_ver;                 /* Hardware version */
    +int                     max_qp;                 /* Maximum number of supported QPs */
    +int                     max_qp_wr;              /* Maximum number of outstanding WR on any work queue */
    +int                     device_cap_flags;       /* HCA capabilities mask */
    +int                     max_sge;                /* Maximum number of s/g per WR for non-RD QPs */
    +int                     max_sge_rd;             /* Maximum number of s/g per WR for RD QPs */
    +int                     max_cq;                 /* Maximum number of supported CQs */
    +int                     max_cqe;                /* Maximum number of CQE capacity per CQ */
    +int                     max_mr;                 /* Maximum number of supported MRs */
    +int                     max_pd;                 /* Maximum number of supported PDs */
    +int                     max_qp_rd_atom;         /* Maximum number of RDMA Read & Atomic operations that can be outstanding per QP */
    +int                     max_ee_rd_atom;         /* Maximum number of RDMA Read & Atomic operations that can be outstanding per EEC */
    +int                     max_res_rd_atom;        /* Maximum number of resources used for RDMA Read & Atomic operations by this HCA as the Target */
    +int                     max_qp_init_rd_atom;    /* Maximum depth per QP for initiation of RDMA Read & Atomic operations */ 
    +int                     max_ee_init_rd_atom;    /* Maximum depth per EEC for initiation of RDMA Read & Atomic operations */
    +enum ibv_atomic_cap     atomic_cap;             /* Atomic operations support level */
    +int                     max_ee;                 /* Maximum number of supported EE contexts */
    +int                     max_rdd;                /* Maximum number of supported RD domains */
    +int                     max_mw;                 /* Maximum number of supported MWs */
    +int                     max_raw_ipv6_qp;        /* Maximum number of supported raw IPv6 datagram QPs */
    +int                     max_raw_ethy_qp;        /* Maximum number of supported Ethertype datagram QPs */
    +int                     max_mcast_grp;          /* Maximum number of supported multicast groups */
    +int                     max_mcast_qp_attach;    /* Maximum number of QPs per multicast group which can be attached */
    +int                     max_total_mcast_qp_attach;/* Maximum number of QPs which can be attached to multicast groups */
    +int                     max_ah;                 /* Maximum number of supported address handles */
    +int                     max_fmr;                /* Maximum number of supported FMRs */
    +int                     max_map_per_fmr;        /* Maximum number of (re)maps per FMR before an unmap operation is required */
    +int                     max_srq;                /* Maximum number of supported SRQs */
    +int                     max_srq_wr;             /* Maximum number of WRs per SRQ */
    +int                     max_srq_sge;            /* Maximum number of s/g per SRQ */
    +uint16_t                max_pkeys;              /* Maximum number of partitions */
    +uint8_t                 local_ca_ack_delay;     /* Local CA ack delay */
    +uint8_t                 phys_port_cnt;          /* Number of physical ports */
    +};
    +

    RETURN VALUE

    +ibv_query_device() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).   +
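
Because the return value is itself an errno value, it can be passed directly to strerror() rather than reading the global errno. The following sketch illustrates that convention; describe_verbs_error is a hypothetical helper, not a libibverbs function.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build a message for a verb that returns 0 on success
 * or an errno value on failure (the convention documented for
 * ibv_query_device()). Returns 0 if rc indicates success (buf is left
 * untouched), or 1 if a failure message was written to buf. */
int describe_verbs_error(const char *verb, int rc, char *buf, size_t len)
{
    if (rc == 0)
        return 0;
    snprintf(buf, len, "%s failed: %s (errno %d)", verb, strerror(rc), rc);
    return 1;
}
```

This only suits verbs documented as returning an errno value. Verbs documented as returning -1 on error, such as ibv_query_gid(), should not have their return value passed to strerror().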

    NOTES

    +The maximum values returned by this function are the upper limits of supported +resources by the device. However, it may not be possible to use these maximum +values, since the actual number of any resource that can be created may be +limited by the machine configuration, the amount of host memory, user +permissions, and the amount of resources already in use by other +users/processes.

    SEE ALSO

    +ibv_open_device, +ibv_query_port, ibv_query_pkey, +ibv_query_gid +

     

    +


    +IBV_QUERY_GID

    +
    +

    NAME

    +ibv_query_gid - query an InfiniBand port's GID table

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_gid(struct ibv_context *context, uint8_t port_num,
    +                  int index, union ibv_gid *gid);
    +

    DESCRIPTION

    +ibv_query_gid() returns the GID value in entry index of port +port_num for device context context through the pointer gid.

    +RETURN VALUE

    +ibv_query_gid() returns 0 on success, and -1 on error.

    SEE ALSO

    +ibv_open_device, +ibv_query_device, +ibv_query_port, +ibv_query_pkey +

     

    +


    +IBV_QUERY_PKEY

    +
    +

    NAME

    +ibv_query_pkey - query an InfiniBand port's P_Key table +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_pkey(struct ibv_context *context, uint8_t port_num,
    +                   int index, uint16_t *pkey);
    +

    DESCRIPTION

    +ibv_query_pkey() returns the P_Key value (in network byte order) in entry +index of port port_num for device context context through +the pointer pkey.

    RETURN VALUE

    +ibv_query_pkey() returns 0 on success, and -1 on error. +

    SEE ALSO

    +ibv_open_device, +ibv_query_device, +ibv_query_port, +ibv_query_gid

     

    +


    +IBV_QUERY_PORT

    +
    +

    NAME

    +ibv_query_port - query an RDMA port's attributes +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_port(struct ibv_context *context, uint8_t port_num,
    +                   struct ibv_port_attr *port_attr); 
    +

    DESCRIPTION

    +ibv_query_port() returns the attributes of port port_num for +device context context through the pointer port_attr. The argument +port_attr is an ibv_port_attr struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_port_attr {
    +enum ibv_port_state     state;          /* Logical port state */
    +enum ibv_mtu            max_mtu;        /* Max MTU supported by port */
    +enum ibv_mtu            active_mtu;     /* Actual MTU */
    +int                     gid_tbl_len;    /* Length of source GID table */
    +uint32_t                port_cap_flags; /* Port capabilities */
    +uint32_t                max_msg_sz;     /* Maximum message size */
    +uint32_t                bad_pkey_cntr;  /* Bad P_Key counter */
    +uint32_t                qkey_viol_cntr; /* Q_Key violation counter */
    +uint16_t                pkey_tbl_len;   /* Length of partition table */
    +uint16_t                lid;            /* Base port LID */
    +uint16_t                sm_lid;         /* SM LID */
    +uint8_t                 lmc;            /* LMC of LID */
    +uint8_t                 max_vl_num;     /* Maximum number of VLs */
    +uint8_t                 sm_sl;          /* SM service level */
    +uint8_t                 subnet_timeout; /* Subnet propagation delay */
    +uint8_t                 init_type_reply;/* Type of initialization performed by SM */
    +uint8_t                 active_width;   /* Currently active link width */
    +uint8_t                 active_speed;   /* Currently active link speed */
    +uint8_t                 phys_state;     /* Physical port state */
    +};
    +
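
The max_mtu and active_mtu fields are enum values rather than byte counts, so they need translating before use in size calculations. The sketch below assumes the enum ibv_mtu encoding used by the Linux verbs header (IBV_MTU_256 = 1 through IBV_MTU_4096 = 5). That encoding is not stated in this manual, so check it against the installed verbs.h. mtu_to_bytes is a hypothetical helper.

```c
/* Local stand-in mirroring the enum ibv_mtu encoding of the Linux
 * <infiniband/verbs.h>. This is an assumption of the sketch; with the SDK
 * header available, drop this enum and verify the values there. */
enum ibv_mtu {
    IBV_MTU_256  = 1,
    IBV_MTU_512  = 2,
    IBV_MTU_1024 = 3,
    IBV_MTU_2048 = 4,
    IBV_MTU_4096 = 5
};

/* Hypothetical helper: MTU size in bytes, or 0 for an unrecognized value. */
unsigned int mtu_to_bytes(enum ibv_mtu mtu)
{
    switch (mtu) {
    case IBV_MTU_256:  return 256;
    case IBV_MTU_512:  return 512;
    case IBV_MTU_1024: return 1024;
    case IBV_MTU_2048: return 2048;
    case IBV_MTU_4096: return 4096;
    }
    return 0;
}
```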

    RETURN VALUE

    +ibv_query_port() returns 0 on success, or the value of errno on failure +(which indicates the failure reason). +

    SEE ALSO

    +ibv_create_qp, ibv_destroy_qp, +ibv_query_qp, +ibv_create_ah

     

    +

     

    +

    IBV_ALLOC_PD

    +

    IBV_DEALLOC_PD

    +
    +

    NAME

    +ibv_alloc_pd, ibv_dealloc_pd - allocate or deallocate a protection domain (PD)

    +SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);
    +
    +int ibv_dealloc_pd(struct ibv_pd *pd);
    +

    DESCRIPTION

    +ibv_alloc_pd() allocates a PD for the RDMA device context context. + +

    ibv_dealloc_pd() deallocates the PD pd.

    +

    RETURN VALUE

    +ibv_alloc_pd() returns a pointer to the allocated PD, or NULL if the +request fails. +

    ibv_dealloc_pd() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).  

    +

    NOTES

    +ibv_dealloc_pd() may fail if any other resource is still associated with +the PD being freed.   +

    SEE ALSO

    +ibv_reg_mr, ibv_create_srq, +ibv_create_qp, +ibv_create_ah, +ibv_create_ah_from_wc

     

    +

     

    +

    IBV_REG_MR

    +

    IBV_DEREG_MR

    +
    +

    NAME

    +ibv_reg_mr, ibv_dereg_mr - register or deregister a memory region (MR) +  +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr,
    +                          size_t length, int access);
    +
    +int ibv_dereg_mr(struct ibv_mr *mr);
    +

    DESCRIPTION

    +ibv_reg_mr() registers a memory region (MR) associated with the +protection domain pd. The MR's starting address is addr and its +size is length. The argument access describes the desired memory +protection attributes; it is either 0 or the bitwise OR of one or more of the +following flags: +

    +
    +
    IBV_ACCESS_LOCAL_WRITE Enable Local Write Access
    +
    +
    IBV_ACCESS_REMOTE_WRITE Enable Remote Write Access
    +
    +
    IBV_ACCESS_REMOTE_READ Enable Remote Read Access
    +
    +
    IBV_ACCESS_REMOTE_ATOMIC Enable Remote Atomic Operation Access + (if supported)
    +
    +
    IBV_ACCESS_MW_BIND Enable Memory Window Binding
    +
    +
    +

    If IBV_ACCESS_REMOTE_WRITE or IBV_ACCESS_REMOTE_ATOMIC is set, +then IBV_ACCESS_LOCAL_WRITE must be set too.
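The flag dependency above can be checked before calling ibv_reg_mr(). The following is a minimal sketch, not part of libibverbs: the SKETCH_ACCESS_* names and the helper sketch_access_is_consistent() are hypothetical stand-ins so that the fragment compiles without the verbs headers. Real code would test the IBV_ACCESS_* constants from <infiniband/verbs.h> instead.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Stand-in flag bits for illustration only (hypothetical names).
 * Real code should use the IBV_ACCESS_* constants.
 */
enum sketch_access_flags {
        SKETCH_ACCESS_LOCAL_WRITE   = 1 << 0,
        SKETCH_ACCESS_REMOTE_WRITE  = 1 << 1,
        SKETCH_ACCESS_REMOTE_READ   = 1 << 2,
        SKETCH_ACCESS_REMOTE_ATOMIC = 1 << 3,
        SKETCH_ACCESS_MW_BIND       = 1 << 4
};

/*
 * Returns true if the flag combination obeys the documented rule:
 * remote write or remote atomic access requires local write access.
 */
bool sketch_access_is_consistent(int access)
{
        const int needs_local_write =
                SKETCH_ACCESS_REMOTE_WRITE | SKETCH_ACCESS_REMOTE_ATOMIC;

        assert(access >= 0);

        if ((access & needs_local_write) &&
            !(access & SKETCH_ACCESS_LOCAL_WRITE))
                return false;

        return true;
}
```

A caller might run this check on the access argument and report a configuration error early, rather than waiting for the registration request to fail.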

    +

    Local read access is always enabled for the MR.

    +

    ibv_dereg_mr() deregisters the MR mr.

    +

    RETURN VALUE

    +ibv_reg_mr() returns a pointer to the registered MR, or NULL if the +request fails. The local key (L_Key) field lkey is used as the +lkey field of struct ibv_sge when posting buffers with ibv_post_* verbs, and +the remote key (R_Key) field rkey is used by remote processes to +perform Atomic and RDMA operations. The remote process places this rkey +as the rkey field of struct ibv_send_wr passed to the ibv_post_send function. +

    ibv_dereg_mr() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).

    +

    NOTES

    +ibv_dereg_mr() fails if any memory window is still bound to this MR.

    +SEE ALSO

    +ibv_alloc_pd, ibv_post_send, +ibv_post_recv, +ibv_post_srq_recv +

     

    +


    +IBV_CREATE_AH

    +


    +IBV_DESTROY_AH

    +
    +

    NAME

    +ibv_create_ah, ibv_destroy_ah - create or destroy an address handle (AH)

    +SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_ah *ibv_create_ah(struct ibv_pd *pd,
    +                             struct ibv_ah_attr *attr);
    +
    +int ibv_destroy_ah(struct ibv_ah *ah); 
    +

    DESCRIPTION

    +ibv_create_ah() creates an address handle (AH) associated with the +protection domain pd. The argument attr is an ibv_ah_attr struct, +as defined in <infiniband/verbs.h>. +

    +
    struct ibv_ah_attr {
    +struct ibv_global_route grh;            /* Global Routing Header (GRH) attributes */
    +uint16_t                dlid;           /* Destination LID */
    +uint8_t                 sl;             /* Service Level */
    +uint8_t                 src_path_bits;  /* Source path bits */
    +uint8_t                 static_rate;    /* Maximum static rate */
    +uint8_t                 is_global;      /* GRH attributes are valid */
    +uint8_t                 port_num;       /* Physical port number */
    +};
    +
    +struct ibv_global_route {
    +union ibv_gid           dgid;           /* Destination GID or MGID */
    +uint32_t                flow_label;     /* Flow label */
    +uint8_t                 sgid_index;     /* Source GID index */
    +uint8_t                 hop_limit;      /* Hop limit */
    +uint8_t                 traffic_class;  /* Traffic class */
    +};
    +
    +

    +

    ibv_destroy_ah() destroys the AH ah.

    +

    RETURN VALUE

    +ibv_create_ah() returns a pointer to the created AH, or NULL if the +request fails. +

    ibv_destroy_ah() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    +

    SEE ALSO

    +ibv_alloc_pd, +ibv_init_ah_from_wc, +ibv_create_ah_from_wc +

     

    +


    +IBV_CREATE_AH_FROM_WC

    +


    +IBV_INIT_AH_FROM_WC

    +
    +

    NAME

    +ibv_init_ah_from_wc, ibv_create_ah_from_wc - initialize or create an address +handle (AH) from a work completion   +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num,
    +                        struct ibv_wc *wc, struct ibv_grh *grh,
    +                        struct ibv_ah_attr *ah_attr);
    +
    +struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd,
    +                                     struct ibv_wc *wc,
    +                                     struct ibv_grh *grh,
    +                                     uint8_t port_num);
    +
    +

    DESCRIPTION

    +ibv_init_ah_from_wc() initializes the address handle (AH) attribute +structure ah_attr for the RDMA device context context using the +port number port_num, using attributes from the work completion wc +and the Global Routing Header (GRH) structure grh. + +

    ibv_create_ah_from_wc() creates an AH associated with the protection +domain pd using the port number port_num, using attributes from +the work completion wc and the Global Routing Header (GRH) structure +grh.

    +

    RETURN VALUE

    +ibv_init_ah_from_wc() returns 0 on success, and -1 on error. +

    ibv_create_ah_from_wc() returns a pointer to the created AH, or NULL +if the request fails.  

    +

    NOTES

    +The filled structure ah_attr returned from ibv_init_ah_from_wc() +can be used to create a new AH using ibv_create_ah(). +

    SEE ALSO

    +ibv_open_device, +ibv_alloc_pd, ibv_create_ah, +ibv_destroy_ah, ibv_poll_cq

     

    +

    IBV_CREATE_COMP_CHANNEL

    +

    IBV_DESTROY_COMP_CHANNEL

    +
    +

    NAME

    +ibv_create_comp_channel, ibv_destroy_comp_channel - create or destroy a +completion event channel

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context
    +                                                 *context);
    +
    +int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);
    +

    DESCRIPTION

    +ibv_create_comp_channel() creates a completion event channel for the RDMA +device context context. + +

    ibv_destroy_comp_channel() destroys the completion event channel +channel.

    +

    RETURN VALUE

    +ibv_create_comp_channel() returns a pointer to the created completion +event channel, or NULL if the request fails. +

    ibv_destroy_comp_channel() returns 0 on success, or the value of errno +on failure (which indicates the failure reason).

    +

    NOTES

    +A "completion channel" is an abstraction introduced by libibverbs that does not +exist in the InfiniBand Architecture verbs specification or RDMA Protocol Verbs +Specification. A completion channel is essentially a file descriptor that is used +to deliver completion notifications to a userspace process. When a completion +event is generated for a completion queue (CQ), the event is delivered via the +completion channel attached to that CQ. This may be useful to steer completion +events to different threads by using multiple completion channels. +

    ibv_destroy_comp_channel() fails if any CQs are still associated with +the completion event channel being destroyed.

    +

    SEE ALSO

    +ibv_open_device, +ibv_create_cq, ibv_get_cq_event

     

    +

    IBV_CREATE_CQ

    +

    IBV_DESTROY_CQ

    +
    +

    NAME

    +ibv_create_cq, ibv_destroy_cq - create or destroy a completion queue (CQ) +  +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe,
    +                             void *cq_context,
    +                             struct ibv_comp_channel *channel,
    +                             int comp_vector);
    +
    +int ibv_destroy_cq(struct ibv_cq *cq);
    +

    DESCRIPTION

    +ibv_create_cq() creates a completion queue (CQ) with at least cqe +entries for the RDMA device context context. The pointer cq_context +will be used to set user context pointer of the CQ structure. The argument +channel is optional; if not NULL, the completion channel channel will +be used to return completion events. The CQ will use the completion vector +comp_vector for signaling completion events; it must be at least zero and +less than context->num_comp_vectors. + +
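The range requirement on comp_vector can be met mechanically when an application creates one CQ per worker. The sketch below is one possible policy and an assumption of this example, not something the library prescribes: sketch_pick_comp_vector() is a hypothetical helper that maps a worker index onto a vector in the valid range, and num_comp_vectors stands for the value read from context->num_comp_vectors.

```c
#include <assert.h>

/*
 * Maps a worker index onto a completion vector in the range
 * [0, num_comp_vectors), wrapping around when there are more
 * workers than vectors.
 */
int sketch_pick_comp_vector(unsigned int worker_index, int num_comp_vectors)
{
        assert(num_comp_vectors > 0);

        return (int)(worker_index % (unsigned int)num_comp_vectors);
}
```

The result could then be passed as the comp_vector argument of ibv_create_cq(). Whether spreading CQs across vectors helps depends on the device and the workload.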

    ibv_destroy_cq() destroys the CQ cq.

    +

    RETURN VALUE

    +ibv_create_cq() returns a pointer to the CQ, or NULL if the request +fails. +

    ibv_destroy_cq() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    +

    NOTES

    +ibv_create_cq() may create a CQ with size greater than or equal to the +requested size. Check the cqe attribute in the returned CQ for the actual size. +

    ibv_destroy_cq() fails if any queue pair is still associated with this +CQ.

    +

    SEE ALSO

    +ibv_resize_cq, +ibv_req_notify_cq, +ibv_ack_cq_events, +ibv_create_qp

     

    +

    IBV_POLL_CQ

    +
    +

    NAME

    +ibv_poll_cq - poll a completion queue (CQ)   +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_poll_cq(struct ibv_cq *cq, int num_entries,
    +                struct ibv_wc *wc);
    +

    DESCRIPTION

    +ibv_poll_cq() polls the CQ cq for work completions and returns the +first num_entries (or all available completions if the CQ contains fewer +than this number) in the array wc. The argument wc is a pointer to +an array of ibv_wc structs, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_wc {
    +uint64_t                wr_id;          /* ID of the completed Work Request (WR) */
    +enum ibv_wc_status      status;         /* Status of the operation */
    +enum ibv_wc_opcode      opcode;         /* Operation type specified in the completed WR */
    +uint32_t                vendor_err;     /* Vendor error syndrome */
    +uint32_t                byte_len;       /* Number of bytes transferred */
    +uint32_t                imm_data;       /* Immediate data (in network byte order) */
    +uint32_t                qp_num;         /* Local QP number of completed WR */
    +uint32_t                src_qp;         /* Source QP number (remote QP number) of completed WR (valid only for UD QPs) */
    +int                     wc_flags;       /* Flags of the completed WR */
    +uint16_t                pkey_index;     /* P_Key index (valid only for GSI QPs) */
    +uint16_t                slid;           /* Source LID */
    +uint8_t                 sl;             /* Service Level */
    +uint8_t                 dlid_path_bits; /* DLID path bits (not applicable for multicast messages) */
    +};
    +
    +
    +

    The attribute wc_flags describes the properties of the work completion. It is +either 0 or the bitwise OR of one or more of the following flags:

    +

    +
    +
    IBV_WC_GRH GRH is present (valid only for UD QPs)
    +
    +
    IBV_WC_WITH_IMM Immediate data value is valid
    +
    +
    +

    Not all wc attributes are always valid. If the completion status is +other than IBV_WC_SUCCESS, only the following attributes are valid: wr_id, +status, qp_num, and vendor_err.
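The validity rule above suggests guarding every access to the remaining fields with a status check. A minimal sketch follows, using a hypothetical cut-down struct sketch_wc and SKETCH_WC_SUCCESS in place of struct ibv_wc and IBV_WC_SUCCESS so that it compiles without the verbs headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for IBV_WC_SUCCESS (hypothetical name). */
#define SKETCH_WC_SUCCESS 0

/* Cut-down stand-in for struct ibv_wc, for illustration only. */
struct sketch_wc {
        uint64_t wr_id;      /* always valid */
        int      status;     /* always valid */
        uint32_t vendor_err; /* always valid */
        uint32_t byte_len;   /* valid only on success */
        uint32_t qp_num;     /* always valid */
};

/*
 * Copies byte_len to *bytes_out and returns true only when the
 * completion succeeded; otherwise leaves *bytes_out untouched.
 */
bool sketch_wc_byte_len(const struct sketch_wc *wc, uint32_t *bytes_out)
{
        assert(wc != NULL && bytes_out != NULL);

        if (wc->status != SKETCH_WC_SUCCESS)
                return false;

        *bytes_out = wc->byte_len;
        return true;
}
```

On a failed completion the caller would report only wr_id, status, qp_num, and vendor_err.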

    +

    RETURN VALUE

    +On success, ibv_poll_cq() returns a non-negative value equal to the +number of completions found. On failure, a negative value is returned.

    NOTES

    +

    Each polled completion is removed from the CQ and cannot be returned to it. +

    +

    The user should consume work completions at a rate that prevents CQ overrun +from occurring. In case of a CQ overrun, the async event IBV_EVENT_CQ_ERR +will be triggered, and the CQ cannot be used.

    +

    SEE ALSO

    +ibv_post_send, ibv_post_recv

     

    +

    IBV_RESIZE_CQ

    +
    +

    NAME

    +ibv_resize_cq - resize a completion queue (CQ)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_resize_cq(struct ibv_cq *cq, int cqe);
    +

    DESCRIPTION

    +ibv_resize_cq() resizes the completion queue (CQ) cq to have at +least cqe entries. cqe must be at least the number of unpolled +entries in the CQ cq. If cqe is a valid value less than the +current CQ size, ibv_resize_cq() may not do anything, since this function +is only guaranteed to resize the CQ to a size at least as big as the requested +size.

    RETURN VALUE

    +ibv_resize_cq() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).

    NOTES

    +ibv_resize_cq() may assign a CQ size greater than or equal to the +requested size. The cqe member of cq will be updated to the actual size.

    +SEE ALSO

    + +ibv_create_cq, ibv_destroy_cq +

     

    +


    +IBV_GET_CQ_EVENT

    +


    +IBV_ACK_CQ_EVENTS

    +
    +

    NAME

    +ibv_get_cq_event, ibv_ack_cq_events - get and acknowledge completion queue (CQ) +events +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_get_cq_event(struct ibv_comp_channel *channel,
    +                     struct ibv_cq **cq, void **cq_context);
    +
    +void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents);
    +

    DESCRIPTION

    +ibv_get_cq_event() waits for the next completion event in the completion +event channel channel. It fills the argument cq with the CQ that got +the event and cq_context with the CQ's context. +

    ibv_ack_cq_events() acknowledges nevents events on the CQ cq.

    +

    RETURN VALUE

    +ibv_get_cq_event() returns 0 on success, and -1 on error. +

    ibv_ack_cq_events() returns no value.  

    +

    NOTES

    +All completion events that ibv_get_cq_event() returns must be +acknowledged using ibv_ack_cq_events(). To avoid races, destroying a CQ +will wait for all completion events to be acknowledged; this guarantees a +one-to-one correspondence between acks and successful gets. +

    Calling ibv_ack_cq_events() may be relatively expensive in the +datapath, since it must take a mutex. Therefore it may be better to amortize +this cost by keeping a count of the number of events needing acknowledgement and +acking several completion events in one call to ibv_ack_cq_events().
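The amortization described above can be sketched as a small counter that acknowledges events in batches. Everything named sketch_* here is hypothetical: the callback type stands in for a call to ibv_ack_cq_events(cq, nevents), and sketch_counting_ack() only records calls so that the fragment can run without an RDMA device.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback standing in for ibv_ack_cq_events(cq, nevents). */
typedef void (*sketch_ack_fn)(void *cq, unsigned int nevents);

struct sketch_ack_batcher {
        void         *cq;      /* opaque CQ handle handed to the callback */
        sketch_ack_fn ack;     /* performs the actual acknowledgement */
        unsigned int  pending; /* events received but not yet acknowledged */
        unsigned int  batch;   /* acknowledge once this many are pending */
};

/* Demo callback: records what would have been acknowledged. */
unsigned int sketch_ack_calls;
unsigned int sketch_ack_total;

void sketch_counting_ack(void *cq, unsigned int nevents)
{
        (void)cq;
        sketch_ack_calls++;
        sketch_ack_total += nevents;
}

/* Acknowledges all pending events in a single call, if there are any. */
void sketch_ack_flush(struct sketch_ack_batcher *b)
{
        assert(b != NULL && b->ack != NULL);

        if (b->pending > 0) {
                b->ack(b->cq, b->pending);
                b->pending = 0;
        }
}

/* Call once per successful ibv_get_cq_event(); acks when the batch is full. */
void sketch_ack_event(struct sketch_ack_batcher *b)
{
        assert(b != NULL && b->batch > 0);

        b->pending++;
        if (b->pending >= b->batch)
                sketch_ack_flush(b);
}
```

Because destroying a CQ waits for all of its completion events to be acknowledged, a design like this needs a final sketch_ack_flush() before ibv_destroy_cq() is called.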

    +

    EXAMPLES

    +The following code example demonstrates one possible way to work with completion +events. It performs the following steps: +

    Stage I: Preparation
    +1. Creates a CQ
    +2. Requests notification upon a new (first) completion event

    +

    Stage II: Completion Handling Routine
    +3. Waits for the completion event and acks it
    +4. Requests notification upon the next completion event
    +5. Empties the CQ

    +

    Note that an extra event may be triggered without having a corresponding +completion entry in the CQ. This occurs if a completion entry is added to the CQ +between Step 4 and Step 5, and the CQ is then emptied (polled) in Step 5.

    +

    +
    cq = ibv_create_cq(ctx, 1, ev_ctx, channel, 0);
    +if (!cq) {
    +        fprintf(stderr, "Failed to create CQ\n");
    +        return 1;
    +}
    +
    +/* Request notification before any completion can be created */
    +if (ibv_req_notify_cq(cq, 0)) {
    +        fprintf(stderr, "Couldn't request CQ notification\n");
    +        return 1;
    +}
    +
    +.
    +.
    +.
    +
    +/* Wait for the completion event */
    +if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) {
    +        fprintf(stderr, "Failed to get cq_event\n");
    +        return 1;
    +}
    +
    +/* Ack the event */
    +ibv_ack_cq_events(ev_cq, 1);
    +
    +/* Request notification upon the next completion event */
    +if (ibv_req_notify_cq(ev_cq, 0)) {
    +        fprintf(stderr, "Couldn't request CQ notification\n");
    +        return 1;
    +}
    +
    +/* Empty the CQ: poll all of the completions from the CQ (if any exist) */
    +do {
    +        ne = ibv_poll_cq(cq, 1, &wc);
    +        if (ne < 0) {
    +                fprintf(stderr, "Failed to poll completions from the CQ\n");
    +                return 1;
    +        }
    +
    +        /* there may be an extra event with no completion in the CQ */
    +        if (ne == 0)
    +                continue;
    +
    +        if (wc.status != IBV_WC_SUCCESS) {
    +                fprintf(stderr, "Completion with status 0x%x was found\n", wc.status);
    +                return 1;
    +        }
    +} while (ne);
    +
    +

    The following code example demonstrates one possible way to work with +completion events in non-blocking mode. It performs the following steps:

    +

    1. Set the completion event channel to be non-blocked
    +2. Poll the channel until it has a completion event
    +3. Get the completion event and ack it

    +

    +
    /* change the blocking mode of the completion channel */
    +flags = fcntl(channel->fd, F_GETFL);
    +rc = fcntl(channel->fd, F_SETFL, flags | O_NONBLOCK);
    +if (rc < 0) {
    +        fprintf(stderr, "Failed to change file descriptor of completion event channel\n");
    +        return 1;
    +}
    +
    +
    +/*
    + * poll the channel until it has an event and sleep ms_timeout
    + * milliseconds between any iteration
    + */
    +my_pollfd.fd      = channel->fd;
    +my_pollfd.events  = POLLIN;
    +my_pollfd.revents = 0;
    +
    +do {
    +        rc = poll(&my_pollfd, 1, ms_timeout);
    +} while (rc == 0);
    +if (rc < 0) {
    +        fprintf(stderr, "poll failed\n");
    +        return 1;
    +}
    +ev_cq = cq;
    +
    +/* Wait for the completion event */
    +if (ibv_get_cq_event(channel, &ev_cq, &ev_ctx)) {
    +        fprintf(stderr, "Failed to get cq_event\n");
    +        return 1;
    +}
    +
    +/* Ack the event */
    +ibv_ack_cq_events(ev_cq, 1);
    +

    SEE ALSO

    +ibv_create_comp_channel, +ibv_create_cq, ibv_req_notify_cq, +ibv_poll_cq

     

    +


    +IBV_REQ_NOTIFY_CQ

    +
    +

    NAME

    +ibv_req_notify_cq - request completion notification on a completion queue (CQ)

    +SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);
    +

    DESCRIPTION

    +ibv_req_notify_cq() requests a completion notification on the completion +queue (CQ) cq. + +

    Upon the addition of a new CQ entry (CQE) to cq, a completion event +will be added to the completion channel associated with the CQ. If the argument +solicited_only is zero, a completion event is generated for any new CQE. +If solicited_only is non-zero, an event is only generated for a new CQE +that is considered "solicited." A CQE is solicited if it is a receive +completion for a message with the Solicited Event header bit set, or if the +status is not successful. All other successful receive completions, and all +successful send completions, are unsolicited.
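The solicited/unsolicited rule can be written as a small predicate. This is a sketch of the rule as stated in this section, not library code: struct sketch_cqe and both functions are hypothetical, and the fragment assumes that a notification has already been requested on the CQ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of the properties of a new CQE. */
struct sketch_cqe {
        bool is_receive;    /* receive completion (otherwise a send) */
        bool solicited_bit; /* message carried the Solicited Event bit */
        bool success;       /* completion status is successful */
};

/* A CQE is solicited if it failed, or is a receive with the bit set. */
bool sketch_cqe_is_solicited(const struct sketch_cqe *cqe)
{
        assert(cqe != NULL);

        if (!cqe->success)
                return true;

        return cqe->is_receive && cqe->solicited_bit;
}

/* Decides whether an armed CQ generates an event for this CQE. */
bool sketch_cqe_raises_event(const struct sketch_cqe *cqe, int solicited_only)
{
        return solicited_only == 0 || sketch_cqe_is_solicited(cqe);
}
```

For example, a successful send completion raises an event only when solicited_only is zero, while a failed completion of either kind raises one in both modes.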

    +

    RETURN VALUE

    +ibv_req_notify_cq() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    NOTES

    +The request for notification is "one shot." Only one completion event will be +generated for each call to ibv_req_notify_cq().   +

    SEE ALSO

    +ibv_create_comp_channel, +ibv_create_cq, ibv_get_cq_event

     

    +

     

    +


    +IBV_CREATE_SRQ

    +


    +IBV_CREATE_XRC_SRQ

    +


    +IBV_DESTROY_SRQ

    +
    +

    NAME

    +ibv_create_srq, ibv_destroy_srq - create or destroy a shared receive queue (SRQ) +  +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct 
    +                               ibv_srq_init_attr *srq_init_attr);
    +
    +struct ibv_srq *ibv_create_xrc_srq(struct ibv_pd *pd,
    +                                   struct ibv_xrc_domain *xrc_domain,
    +                                   struct ibv_cq *xrc_cq,
    +                                   struct ibv_srq_init_attr *srq_init_attr);
    +
    +int ibv_destroy_srq(struct ibv_srq *srq);
    +

    DESCRIPTION

    +ibv_create_srq() creates a shared receive queue (SRQ) associated with the +protection domain pd. + +

    ibv_create_xrc_srq() creates an XRC shared receive queue (SRQ) +associated with the protection domain pd, the XRC domain xrc_domain +and the CQ which will hold the XRC completion xrc_cq.

    +

    The argument srq_init_attr is an ibv_srq_init_attr struct, as defined +in <infiniband/verbs.h>.

    +

    +
    struct ibv_srq_init_attr {
    +void                   *srq_context;    /* Associated context of the SRQ */
    +struct ibv_srq_attr     attr;           /* SRQ attributes */
    +};
    +
    +struct ibv_srq_attr {
    +uint32_t                max_wr;         /* Requested max number of outstanding work requests (WRs) in the SRQ */
    +uint32_t                max_sge;        /* Requested max number of scatter elements per WR */
    +uint32_t                srq_limit;      /* The limit value of the SRQ (irrelevant for ibv_create_srq) */
    +};
    +
    +

    The function ibv_create_srq() will update the srq_init_attr +struct with the actual values of the SRQ that was created; the values of +max_wr and max_sge will be greater than or equal to the values requested.

    +

    ibv_destroy_srq() destroys the SRQ srq.

    +

    RETURN VALUE

    +ibv_create_srq() returns a pointer to the created SRQ, or NULL if the +request fails. +

    ibv_destroy_srq() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    +

    NOTES

    +ibv_destroy_srq() fails if any queue pair is still associated with this +SRQ.

    SEE ALSO

    +ibv_alloc_pd, ibv_modify_srq, +ibv_query_srq

     

    +


    +IBV_MODIFY_SRQ

    +
    +

    NAME

    +ibv_modify_srq - modify attributes of a shared receive queue (SRQ)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_modify_srq(struct ibv_srq *srq,
    +                   struct ibv_srq_attr *srq_attr,
    +                   int srq_attr_mask);
    +

    DESCRIPTION

    +ibv_modify_srq() modifies the attributes of SRQ srq with the +attributes in srq_attr according to the mask srq_attr_mask. The +argument srq_attr is an ibv_srq_attr struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_srq_attr {
    +uint32_t                max_wr;      /* maximum number of outstanding work requests (WRs) in the SRQ */
    +uint32_t                max_sge;     /* number of scatter elements per WR (irrelevant for ibv_modify_srq) */
    +uint32_t                srq_limit;   /* the limit value of the SRQ */
    +};
    +
    +

    The argument srq_attr_mask specifies the SRQ attributes to be +modified. The argument is either 0 or the bitwise OR of one or more of the +following flags:

    +

    +
    +
    IBV_SRQ_MAX_WR Resize the SRQ
    +
    +
    IBV_SRQ_LIMIT Set the SRQ limit
    +
    +
    +

    RETURN VALUE

    +ibv_modify_srq() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).

    NOTES

    +If any of the modify attributes is invalid, none of the attributes will be +modified. +

    Not all devices support resizing SRQs. To check if a device supports it, +check if the IBV_DEVICE_SRQ_RESIZE bit is set in the device capabilities +flags.

    +

    Modifying the srq_limit arms the SRQ to produce an +IBV_EVENT_SRQ_LIMIT_REACHED "low watermark" asynchronous event once the +number of WRs in the SRQ drops below srq_limit.

    +

    SEE ALSO

    +ibv_query_device, +ibv_create_srq, ibv_destroy_srq, +ibv_query_srq

     

    +


    +IBV_QUERY_SRQ

    +
    +

    NAME

    +ibv_query_srq - get the attributes of a shared receive queue (SRQ)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr);
    +

    DESCRIPTION

    +ibv_query_srq() gets the attributes of the SRQ srq and returns +them through the pointer srq_attr. The argument srq_attr is an +ibv_srq_attr struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_srq_attr {
    +uint32_t                max_wr;         /* maximum number of outstanding work requests (WRs) in the SRQ */
    +uint32_t                max_sge;        /* maximum number of scatter elements per WR */
    +uint32_t                srq_limit;      /* the limit value of the SRQ */
    +}; 
    +

    RETURN VALUE

    +ibv_query_srq() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).

    NOTES

    +If the value returned for srq_limit is 0, then the SRQ limit reached ("low +watermark") event is not (or no longer) armed, and no asynchronous events will +be generated until the event is rearmed.   +
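The arming behaviour can be pictured with a toy model. This is only a sketch of the semantics described for srq_limit (the event fires once when the number of WRs drops below the limit, after which the limit reads back as 0 until rearmed). It is not how a device or driver implements it, and every sketch_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of an SRQ: a WR count and an armed limit (0 = not armed). */
struct sketch_srq {
        uint32_t posted;    /* WRs currently in the SRQ */
        uint32_t srq_limit; /* armed low watermark, 0 when disarmed */
};

/* Models setting srq_limit through ibv_modify_srq(): arms the event. */
void sketch_srq_arm(struct sketch_srq *srq, uint32_t limit)
{
        assert(srq != NULL);
        srq->srq_limit = limit;
}

/*
 * Models the consumption of one WR. Returns true if the "limit reached"
 * event would fire; firing disarms the limit until it is set again.
 */
bool sketch_srq_consume(struct sketch_srq *srq)
{
        assert(srq != NULL && srq->posted > 0);

        srq->posted--;
        if (srq->srq_limit != 0 && srq->posted < srq->srq_limit) {
                srq->srq_limit = 0;
                return true;
        }
        return false;
}
```

In a real application the handler for the asynchronous event would typically post more receive WRs and then set srq_limit again to rearm the event.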

    SEE ALSO

    +ibv_create_srq, +ibv_destroy_srq, +ibv_modify_srq

     

     

    +

    IBV_CREATE_XRC_RCV_QP

    +
    +

    NAME

    +ibv_create_xrc_rcv_qp - create an XRC queue pair (QP) for serving as a +receive-side only QP

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_create_xrc_rcv_qp(struct ibv_qp_init_attr *init_attr,
    +                          uint32_t *xrc_rcv_qpn); 
    +

    DESCRIPTION

    +ibv_create_xrc_rcv_qp() creates an XRC queue pair (QP) for serving as a +receive-side only QP and returns its number through the pointer xrc_rcv_qpn. +This QP number should be passed to the remote node (sender). The remote node +will use xrc_rcv_qpn in ibv_post_send() when sending to an XRC SRQ +on this host in the same XRC domain as the XRC receive QP. This QP is created in +kernel space, and persists until the last process registered for the QP calls +ibv_unreg_xrc_rcv_qp() (at which time the QP is destroyed). +

    The process which creates this QP is automatically registered for it, and +should also call ibv_unreg_xrc_rcv_qp() at some point, to unregister.

    +

    Processes which wish to receive on an XRC SRQ via this QP should call +ibv_reg_xrc_rcv_qp() for this QP, to guarantee that the QP will not be +destroyed while they are still using it for receiving on the XRC SRQ.

    +

    The argument init_attr is an ibv_qp_init_attr struct, as defined in +<infiniband/verbs.h>.

    +

    +
    struct ibv_qp_init_attr {
    +void                   *qp_context;     /* value is being ignored */
    +struct ibv_cq          *send_cq;        /* value is being ignored */ 
    +struct ibv_cq          *recv_cq;        /* value is being ignored */
    +struct ibv_srq         *srq;            /* value is being ignored */
    +struct ibv_qp_cap       cap;            /* value is being ignored */
    +enum ibv_qp_type        qp_type;        /* value is being ignored */
    +int                     sq_sig_all;     /* value is being ignored */
    +struct ibv_xrc_domain  *xrc_domain;     /* XRC domain the QP will be associated with */
    +};
    +
    +

    Most of the attributes in init_attr are ignored because this +QP is a receive-only QP and all receive requests (RRs) are posted to an SRQ.

    +

    RETURN VALUE

    +ibv_create_xrc_rcv_qp() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    SEE ALSO

    +ibv_open_xrc_domain, +ibv_modify_xrc_rcv_qp, +ibv_query_xrc_rcv_qp, +ibv_reg_xrc_rcv_qp, +ibv_unreg_xrc_rcv_qp, +ibv_post_send

     

    +

    IBV_MODIFY_XRC_RCV_QP

    +
    +

    NAME

    +ibv_modify_xrc_rcv_qp - modify the attributes of an XRC receive queue pair (QP)

    +SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_modify_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num,
    +                          struct ibv_qp_attr *attr, int attr_mask);
    +

    DESCRIPTION

    +ibv_modify_xrc_rcv_qp() modifies the attributes of an XRC receive QP with the +number xrc_qp_num which is associated with the XRC domain xrc_domain +with the attributes in attr according to the mask attr_mask and +moves the QP state through the following transitions: Reset -> Init -> RTR. +attr_mask should indicate all of the attributes which will be used in this +QP transition and the following masks (at least) should be set: +

    +
    Next state     Required attributes
    +----------     ----------------------------------------
    +Init           IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, 
    +               IBV_QP_ACCESS_FLAGS 
    +RTR            IBV_QP_STATE, IBV_QP_AV, IBV_QP_PATH_MTU, 
    +               IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, 
    +               IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER 
    +
    +

    The user can add optional attributes as well.
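The table of required attributes can be turned into a check that runs before the modify call. The sketch below is illustrative only: the SKETCH_QP_* bits and both helpers are hypothetical stand-ins with arbitrary bit positions, so that the fragment compiles without the verbs headers. Real code would combine the IBV_QP_* constants from <infiniband/verbs.h>.

```c
#include <assert.h>

/* Stand-in mask bits (hypothetical names and bit positions). */
enum sketch_qp_attr_mask {
        SKETCH_QP_STATE              = 1 << 0,
        SKETCH_QP_ACCESS_FLAGS       = 1 << 1,
        SKETCH_QP_PKEY_INDEX         = 1 << 2,
        SKETCH_QP_PORT               = 1 << 3,
        SKETCH_QP_AV                 = 1 << 4,
        SKETCH_QP_PATH_MTU           = 1 << 5,
        SKETCH_QP_DEST_QPN           = 1 << 6,
        SKETCH_QP_RQ_PSN             = 1 << 7,
        SKETCH_QP_MAX_DEST_RD_ATOMIC = 1 << 8,
        SKETCH_QP_MIN_RNR_TIMER      = 1 << 9
};

enum sketch_qp_state { SKETCH_QPS_INIT, SKETCH_QPS_RTR };

/* Returns the minimum mask listed in the table for the given next state. */
int sketch_required_attrs(enum sketch_qp_state next)
{
        switch (next) {
        case SKETCH_QPS_INIT:
                return SKETCH_QP_STATE | SKETCH_QP_PKEY_INDEX |
                       SKETCH_QP_PORT | SKETCH_QP_ACCESS_FLAGS;
        case SKETCH_QPS_RTR:
                return SKETCH_QP_STATE | SKETCH_QP_AV | SKETCH_QP_PATH_MTU |
                       SKETCH_QP_DEST_QPN | SKETCH_QP_RQ_PSN |
                       SKETCH_QP_MAX_DEST_RD_ATOMIC | SKETCH_QP_MIN_RNR_TIMER;
        }

        assert(!"unknown next state");
        return 0;
}

/* Returns the required bits that are absent from attr_mask (0 if none). */
int sketch_missing_attrs(enum sketch_qp_state next, int attr_mask)
{
        return sketch_required_attrs(next) & ~attr_mask;
}
```

A non-zero result identifies exactly which required attributes the caller forgot to set, which is easier to diagnose than an errno value returned by the modify call. Optional attributes in attr_mask do not affect the result.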

    +

    The argument attr is an ibv_qp_attr struct, as defined in +<infiniband/verbs.h>.

    +

    +
    struct ibv_qp_attr {
    +enum ibv_qp_state       qp_state;               /* Move the QP to this state */
    +enum ibv_qp_state       cur_qp_state;           /* Assume this is the current QP state */
    +enum ibv_mtu            path_mtu;               /* Path MTU (valid only for RC/UC QPs) */
    +enum ibv_mig_state      path_mig_state;         /* Path migration state (valid if HCA supports APM) */
    +uint32_t                qkey;                   /* Q_Key for the QP (valid only for UD QPs) */
    +uint32_t                rq_psn;                 /* PSN for receive queue (valid only for RC/UC QPs) */
    +uint32_t                sq_psn;                 /* PSN for send queue (valid only for RC/UC QPs) */
    +uint32_t                dest_qp_num;            /* Destination QP number (valid only for RC/UC QPs) */
    +int                     qp_access_flags;        /* Mask of enabled remote access operations (valid only for RC/UC QPs) */
    +struct ibv_qp_cap       cap;                    /* QP capabilities (valid if HCA supports QP resizing) */
    +struct ibv_ah_attr      ah_attr;                /* Primary path address vector (valid only for RC/UC QPs) */
    +struct ibv_ah_attr      alt_ah_attr;            /* Alternate path address vector (valid only for RC/UC QPs) */
    +uint16_t                pkey_index;             /* Primary P_Key index */
    +uint16_t                alt_pkey_index;         /* Alternate P_Key index */
    +uint8_t                 en_sqd_async_notify;    /* Enable SQD.drained async notification (Valid only if qp_state is SQD) */
    +uint8_t                 sq_draining;            /* Is the QP draining? Irrelevant for ibv_modify_qp() */
    +uint8_t                 max_rd_atomic;          /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */
    +uint8_t                 max_dest_rd_atomic;     /* Number of responder resources for handling incoming RDMA reads & atomic operations (valid only for RC QPs) */
    +uint8_t                 min_rnr_timer;          /* Minimum RNR NAK timer (valid only for RC QPs) */
    +uint8_t                 port_num;               /* Primary port number */
    +uint8_t                 timeout;                /* Local ack timeout for primary path (valid only for RC QPs) */
    +uint8_t                 retry_cnt;              /* Retry count (valid only for RC QPs) */
    +uint8_t                 rnr_retry;              /* RNR retry (valid only for RC QPs) */
    +uint8_t                 alt_port_num;           /* Alternate port number */
    +uint8_t                 alt_timeout;            /* Local ack timeout for alternate path (valid only for RC QPs) */
    +};
    +
    +

    For details on struct ibv_qp_cap see the description of ibv_create_qp(). +For details on struct ibv_ah_attr see the description of ibv_create_ah(). +

    +

    The argument attr_mask specifies the QP attributes to be modified. The +argument is either 0 or the bitwise OR of one or more of the following flags: +

    +

    +
    +
    IBV_QP_STATE Modify qp_state
    +
    +
    IBV_QP_CUR_STATE Set cur_qp_state
    +
    +
    IBV_QP_EN_SQD_ASYNC_NOTIFY Set en_sqd_async_notify
    +
    +
    IBV_QP_ACCESS_FLAGS Set qp_access_flags
    +
    +
    IBV_QP_PKEY_INDEX Set pkey_index
    +
    +
    IBV_QP_PORT Set port_num
    +
    +
    IBV_QP_QKEY Set qkey
    +
    +
    IBV_QP_AV Set ah_attr
    +
    +
    IBV_QP_PATH_MTU Set path_mtu
    +
    +
    IBV_QP_TIMEOUT Set timeout
    +
    +
    IBV_QP_RETRY_CNT Set retry_cnt
    +
    +
    IBV_QP_RNR_RETRY Set rnr_retry
    +
    +
    IBV_QP_RQ_PSN Set rq_psn
    +
    +
    IBV_QP_MAX_QP_RD_ATOMIC Set max_rd_atomic
    +
    +
    IBV_QP_ALT_PATH Set the alternative path via: alt_ah_attr, + alt_pkey_index, alt_port_num, alt_timeout
    +
    +
    IBV_QP_MIN_RNR_TIMER Set min_rnr_timer
    +
    +
    IBV_QP_SQ_PSN Set sq_psn
    +
    +
    IBV_QP_MAX_DEST_RD_ATOMIC Set max_dest_rd_atomic
    +
    +
    IBV_QP_PATH_MIG_STATE Set path_mig_state
    +
    +
    IBV_QP_CAP Set cap
    +
    +
    IBV_QP_DEST_QPN Set dest_qp_num
    +
    +
    +

    RETURN VALUE

    +ibv_modify_xrc_rcv_qp() returns 0 on success, or the value of errno on +failure (which indicates the failure reason). +

    NOTES

    +If any of the modify attributes or the modify mask are invalid, none of the +attributes will be modified (including the QP state). +

    Not all devices support alternate paths. To check if a device supports it, +check if the IBV_DEVICE_AUTO_PATH_MIG bit is set in the device +capabilities flags.

    +

    SEE ALSO

    +ibv_open_xrc_domain, +ibv_create_xrc_rcv_qp, +ibv_query_xrc_rcv_qp +

     

    +


    +IBV_OPEN_XRC_DOMAIN

    +


    +IBV_CLOSE_XRC_DOMAIN

    +
    +

    NAME

    +ibv_open_xrc_domain, ibv_close_xrc_domain - open or close an eXtended Reliable +Connection (XRC) domain +

    SYNOPSIS

    +
    #include <fcntl.h>
    +#include <infiniband/verbs.h>
    +
    +struct ibv_xrc_domain *ibv_open_xrc_domain(struct ibv_context *context,
    +                                           int fd, int oflag);
    +int ibv_close_xrc_domain(struct ibv_xrc_domain *d);
    +

    DESCRIPTION

    +ibv_open_xrc_domain() opens an XRC domain for the InfiniBand device +context context or returns a reference to an already opened one. fd is the +file descriptor to be associated with the XRC domain. The argument oflag +describes the desired file creation attributes; it is either 0 or the bitwise OR +of one or more of the following flags: +

    +
    +
    O_CREAT
    +
    If a domain belonging to the device named by context is already associated + with the inode, this flag has no effect, except as noted under O_EXCL + below. Otherwise, a new XRC domain is created and is associated with the inode + specified by fd. + +
    +
    O_EXCL
    +
    If O_EXCL and O_CREAT are set, open will fail if a domain + associated with the inode exists. The check for the existence of the domain + and creation of the domain if it does not exist is atomic with respect to + other processes executing open with fd naming the same inode. +
    +
    +

    If fd equals -1, no inode is associated with the domain, and the +only valid value for oflag is O_CREAT.

    +

    ibv_close_xrc_domain() closes the XRC domain d. If this is the +last reference, the XRC domain will be destroyed.
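    The oflag rules above can be checked before calling ibv_open_xrc_domain(). The following is a minimal sketch in plain C. The helper name xrc_oflag_is_valid() is hypothetical and is not part of libibverbs; it only encodes the constraints stated in this section and does not predict whether the call itself will succeed.

```c
#include <fcntl.h>   /* O_CREAT, O_EXCL */

/*
 * Hypothetical helper (not part of libibverbs): returns 1 if the
 * fd/oflag pair is acceptable according to the rules in this section,
 * 0 otherwise.
 */
static int xrc_oflag_is_valid(int fd, int oflag)
{
    /* With no inode (fd == -1) the only valid value is O_CREAT. */
    if (fd == -1)
        return oflag == O_CREAT;

    /* Otherwise oflag is 0 or any combination of O_CREAT and O_EXCL. */
    return (oflag & ~(O_CREAT | O_EXCL)) == 0;
}
```

    A caller would typically run this check on its arguments first and only then pass the same fd and oflag to ibv_open_xrc_domain().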

    +

    RETURN VALUE

    +ibv_open_xrc_domain() returns a pointer to an opened XRC domain, or NULL if the +request fails. +

    ibv_close_xrc_domain() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    +

    NOTES

    +Not all devices support XRC. To check if a device supports it, check if the +IBV_DEVICE_XRC bit is set in the device capabilities flags. +

    ibv_close_xrc_domain() may fail if any QP or SRQ are still associated +with the XRC domain being closed.

    +

    SEE ALSO

    +ibv_create_xrc_srq, +ibv_create_qp, +ibv_create_xrc_rcv_qp, +ibv_modify_xrc_rcv_qp, +ibv_query_xrc_rcv_qp, +ibv_reg_xrc_rcv_qp

     

    +


    +IBV_QUERY_XRC_RCV_QP

    +
    +

    NAME

    +ibv_query_xrc_rcv_qp - get the attributes of an XRC receive queue pair (QP) +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num,
    +                         struct ibv_qp_attr *attr, int attr_mask,
    +                         struct ibv_qp_init_attr *init_attr);
    +

    DESCRIPTION

    +ibv_query_xrc_rcv_qp() gets the attributes specified in attr_mask +for the XRC receive QP with the number xrc_qp_num which is associated +with the XRC domain xrc_domain and returns them through the pointers +attr and init_attr. The argument attr is an ibv_qp_attr +struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_qp_attr {
    +enum ibv_qp_state       qp_state;            /* Current QP state */
    +enum ibv_qp_state       cur_qp_state;        /* Current QP state - irrelevant for ibv_query_qp */
    +enum ibv_mtu            path_mtu;            /* Path MTU (valid only for RC/UC QPs) */
    +enum ibv_mig_state      path_mig_state;      /* Path migration state (valid if HCA supports APM) */
    +uint32_t                qkey;                /* Q_Key of the QP (valid only for UD QPs) */
    +uint32_t                rq_psn;              /* PSN for receive queue (valid only for RC/UC QPs) */
    +uint32_t                sq_psn;              /* PSN for send queue (valid only for RC/UC QPs) */
    +uint32_t                dest_qp_num;         /* Destination QP number (valid only for RC/UC QPs) */
    +int                     qp_access_flags;     /* Mask of enabled remote access operations (valid only for RC/UC QPs) */
    +struct ibv_qp_cap       cap;                 /* QP capabilities */
    +struct ibv_ah_attr      ah_attr;             /* Primary path address vector (valid only for RC/UC QPs) */
    +struct ibv_ah_attr      alt_ah_attr;         /* Alternate path address vector (valid only for RC/UC QPs) */
    +uint16_t                pkey_index;          /* Primary P_Key index */
    +uint16_t                alt_pkey_index;      /* Alternate P_Key index */
    +uint8_t                 en_sqd_async_notify; /* Enable SQD.drained async notification - irrelevant for ibv_query_qp */
    +uint8_t                 sq_draining;         /* Is the QP draining? (Valid only if qp_state is SQD) */
    +uint8_t                 max_rd_atomic;       /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */
    +uint8_t                 max_dest_rd_atomic;  /* Number of responder resources for handling incoming RDMA reads & atomic operations (valid only for RC QPs) */
    +uint8_t                 min_rnr_timer;       /* Minimum RNR NAK timer (valid only for RC QPs) */
    +uint8_t                 port_num;            /* Primary port number */
    +uint8_t                 timeout;             /* Local ack timeout for primary path (valid only for RC QPs) */
    +uint8_t                 retry_cnt;           /* Retry count (valid only for RC QPs) */
    +uint8_t                 rnr_retry;           /* RNR retry (valid only for RC QPs) */
    +uint8_t                 alt_port_num;        /* Alternate port number */
    +uint8_t                 alt_timeout;         /* Local ack timeout for alternate path (valid only for RC QPs) */
    +};
    +
    +

    For details on struct ibv_qp_cap see the description of ibv_create_qp(). +For details on struct ibv_ah_attr see the description of ibv_create_ah().

    +

    RETURN VALUE

    +ibv_query_xrc_rcv_qp() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    NOTES

    +The argument attr_mask is a hint that specifies the minimum list of +attributes to retrieve. Some InfiniBand devices may return extra attributes not +requested, for example if the value can be returned cheaply. +

    Attribute values are valid if they have been set using +ibv_modify_xrc_rcv_qp(). The exact list of valid attributes depends on the +QP state.

    +

    Multiple calls to ibv_query_xrc_rcv_qp() may yield some differences in +the values returned for the following attributes: qp_state, path_mig_state, +sq_draining, ah_attr (if APM is enabled).

    +

    SEE ALSO

    +ibv_open_xrc_domain, +ibv_create_xrc_rcv_qp, +ibv_modify_xrc_rcv_qp

     

    +


    +IBV_REG_XRC_RCV_QP

    +


    +IBV_UNREG_XRC_RCV_QP

    +
    +

    NAME

    +ibv_reg_xrc_rcv_qp, ibv_unreg_xrc_rcv_qp - register and unregister a user +process with an XRC receive queue pair (QP)   +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_reg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num);
    +int ibv_unreg_xrc_rcv_qp(struct ibv_xrc_domain *xrc_domain, uint32_t xrc_qp_num); 
    +

    DESCRIPTION

    +ibv_reg_xrc_rcv_qp() registers a user process with the XRC receive QP +(created via ibv_create_xrc_rcv_qp()) whose number is xrc_qp_num, +and which is associated with the XRC domain xrc_domain. + +

    ibv_unreg_xrc_rcv_qp() unregisters a user process from the XRC receive +QP number xrc_qp_num, which is associated with the XRC domain +xrc_domain. When the number of user processes registered with this XRC +receive QP drops to zero, the QP is destroyed.

    +

    RETURN VALUE

    +ibv_reg_xrc_rcv_qp() and ibv_unreg_xrc_rcv_qp() return 0 on +success, or the value of errno on failure (which indicates the failure reason).

    +NOTES

    +ibv_reg_xrc_rcv_qp() and ibv_unreg_xrc_rcv_qp() may fail if +xrc_qp_num is not the number of a valid XRC receive QP (the QP is +not allocated or it is the number of a non-XRC QP), or if the XRC receive QP was +created with an XRC domain other than xrc_domain. + +

    If a process is still registered with any XRC RCV QPs belonging to some +domain, ibv_close_xrc_domain() will return failure if called for that +domain in that process.

    +

    ibv_create_xrc_rcv_qp() performs an implicit registration for the +creating process; when that process is finished with the XRC RCV QP, it should +call ibv_unreg_xrc_rcv_qp() for that QP. Note that if no other processes +are registered with the QP at this time, its registration count will drop to +zero and it will be destroyed.  

    +

    SEE ALSO

    +ibv_open_xrc_domain, +ibv_create_xrc_rcv_qp

     

    +

     

    +


    +IBV_CREATE_QP

    +


    +IBV_DESTROY_QP

    +
    +

    NAME

    +ibv_create_qp, ibv_destroy_qp - create or destroy a queue pair (QP)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +struct ibv_qp *ibv_create_qp(struct ibv_pd *pd,
    +                             struct ibv_qp_init_attr *qp_init_attr);
    +
    +int ibv_destroy_qp(struct ibv_qp *qp);
    +

    DESCRIPTION

    +ibv_create_qp() creates a queue pair (QP) associated with the protection +domain pd. The argument qp_init_attr is an ibv_qp_init_attr +struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_qp_init_attr {
    +void                   *qp_context;     /* Associated context of the QP */
    +struct ibv_cq          *send_cq;        /* CQ to be associated with the Send Queue (SQ) */ 
    +struct ibv_cq          *recv_cq;        /* CQ to be associated with the Receive Queue (RQ) */
    +struct ibv_srq         *srq;            /* SRQ handle if QP is to be associated with an SRQ, otherwise NULL */
    +struct ibv_qp_cap       cap;            /* QP capabilities */
    +enum ibv_qp_type        qp_type;        /* QP Transport Service Type: IBV_QPT_RC, IBV_QPT_UC, IBV_QPT_UD or IBV_QPT_XRC */
    +int                     sq_sig_all;     /* If set, each Work Request (WR) submitted to the SQ generates a completion entry */
    +struct ibv_xrc_domain  *xrc_domain;     /* XRC domain the QP will be associated with (valid only for IBV_QPT_XRC QP), otherwise NULL */
    +};
    +
    +struct ibv_qp_cap {
    +uint32_t                max_send_wr;    /* Requested max number of outstanding WRs in the SQ */
    +uint32_t                max_recv_wr;    /* Requested max number of outstanding WRs in the RQ */
    +uint32_t                max_send_sge;   /* Requested max number of scatter/gather (s/g) elements in a WR in the SQ */
    +uint32_t                max_recv_sge;   /* Requested max number of s/g elements in a WR in the RQ */
    +uint32_t                max_inline_data;/* Requested max number of data (bytes) that can be posted inline to the SQ, otherwise 0 */
    +};
    +
    +

    The function ibv_create_qp() updates the qp_init_attr->cap +struct with the actual capabilities of the QP that was +created; the values will be greater than or equal to the values requested.

    +

    ibv_destroy_qp() destroys the QP qp.

    +

    RETURN VALUE

    +ibv_create_qp() returns a pointer to the created QP, or NULL if the +request fails. Check the QP number (qp_num) in the returned QP. +

    ibv_destroy_qp() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    +

    NOTES

    +ibv_create_qp() will fail if it is asked to create a QP of a type other +than IBV_QPT_RC or IBV_QPT_UD associated with an SRQ. +

    The attributes max_recv_wr and max_recv_sge are ignored by ibv_create_qp() +if the QP is to be associated with an SRQ.

    +

    ibv_destroy_qp() fails if the QP is attached to a multicast group.

    +

    SEE ALSO

    +ibv_alloc_pd, ibv_modify_qp, +ibv_query_qp +

     

    +


    +IBV_MODIFY_QP

    +
    +

    NAME

    +ibv_modify_qp - modify the attributes of a queue pair (QP) +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
    +                  int attr_mask); 
    +

    DESCRIPTION

    +ibv_modify_qp() modifies the attributes of QP qp with the +attributes in attr according to the mask attr_mask. The argument +attr is an ibv_qp_attr struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_qp_attr {
    +enum ibv_qp_state       qp_state;               /* Move the QP to this state */
    +enum ibv_qp_state       cur_qp_state;           /* Assume this is the current QP state */
    +enum ibv_mtu            path_mtu;               /* Path MTU (valid only for RC/UC QPs) */
    +enum ibv_mig_state      path_mig_state;         /* Path migration state (valid if HCA supports APM) */
    +uint32_t                qkey;                   /* Q_Key for the QP (valid only for UD QPs) */
    +uint32_t                rq_psn;                 /* PSN for receive queue (valid only for RC/UC QPs) */
    +uint32_t                sq_psn;                 /* PSN for send queue (valid only for RC/UC QPs) */
    +uint32_t                dest_qp_num;            /* Destination QP number (valid only for RC/UC QPs) */
    +int                     qp_access_flags;        /* Mask of enabled remote access operations (valid only for RC/UC QPs) */
    +struct ibv_qp_cap       cap;                    /* QP capabilities (valid if HCA supports QP resizing) */
    +struct ibv_ah_attr      ah_attr;                /* Primary path address vector (valid only for RC/UC QPs) */
    +struct ibv_ah_attr      alt_ah_attr;            /* Alternate path address vector (valid only for RC/UC QPs) */
    +uint16_t                pkey_index;             /* Primary P_Key index */
    +uint16_t                alt_pkey_index;         /* Alternate P_Key index */
    +uint8_t                 en_sqd_async_notify;    /* Enable SQD.drained async notification (Valid only if qp_state is SQD) */
    +uint8_t                 sq_draining;            /* Is the QP draining? Irrelevant for ibv_modify_qp() */
    +uint8_t                 max_rd_atomic;          /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */
    +uint8_t                 max_dest_rd_atomic;     /* Number of responder resources for handling incoming RDMA reads & atomic operations (valid only for RC QPs) */
    +uint8_t                 min_rnr_timer;          /* Minimum RNR NAK timer (valid only for RC QPs) */
    +uint8_t                 port_num;               /* Primary port number */
    +uint8_t                 timeout;                /* Local ack timeout for primary path (valid only for RC QPs) */
    +uint8_t                 retry_cnt;              /* Retry count (valid only for RC QPs) */
    +uint8_t                 rnr_retry;              /* RNR retry (valid only for RC QPs) */
    +uint8_t                 alt_port_num;           /* Alternate port number */
    +uint8_t                 alt_timeout;            /* Local ack timeout for alternate path (valid only for RC QPs) */
    +};
    +
    +

    For details on struct ibv_qp_cap see the description of ibv_create_qp(). +For details on struct ibv_ah_attr see the description of ibv_create_ah(). +

    +

    The argument attr_mask specifies the QP attributes to be modified. The +argument is either 0 or the bitwise OR of one or more of the following flags: +

    +

    +
    +
    IBV_QP_STATE Modify qp_state
    +
    +
    IBV_QP_CUR_STATE Set cur_qp_state
    +
    +
    IBV_QP_EN_SQD_ASYNC_NOTIFY Set en_sqd_async_notify
    +
    +
    IBV_QP_ACCESS_FLAGS Set qp_access_flags
    +
    +
    IBV_QP_PKEY_INDEX Set pkey_index
    +
    +
    IBV_QP_PORT Set port_num
    +
    +
    IBV_QP_QKEY Set qkey
    +
    +
    IBV_QP_AV Set ah_attr
    +
    +
    IBV_QP_PATH_MTU Set path_mtu
    +
    +
    IBV_QP_TIMEOUT Set timeout
    +
    +
    IBV_QP_RETRY_CNT Set retry_cnt
    +
    +
    IBV_QP_RNR_RETRY Set rnr_retry
    +
    +
    IBV_QP_RQ_PSN Set rq_psn
    +
    +
    IBV_QP_MAX_QP_RD_ATOMIC Set max_rd_atomic
    +
    +
    IBV_QP_ALT_PATH Set the alternative path via: alt_ah_attr, + alt_pkey_index, alt_port_num, alt_timeout
    +
    +
    IBV_QP_MIN_RNR_TIMER Set min_rnr_timer
    +
    +
    IBV_QP_SQ_PSN Set sq_psn
    +
    +
    IBV_QP_MAX_DEST_RD_ATOMIC Set max_dest_rd_atomic
    +
    +
    IBV_QP_PATH_MIG_STATE Set path_mig_state
    +
    +
    IBV_QP_CAP Set cap
    +
    +
    IBV_QP_DEST_QPN Set dest_qp_num
    +
    +
    +

    RETURN VALUE

    +ibv_modify_qp() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).
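    Because the failure code is returned directly rather than signalled through -1, the return value can be passed straight to strerror(). The following is a minimal sketch of that convention; the helper names verbs_call_failed() and verbs_result_str() are hypothetical and not part of libibverbs.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helpers (not part of libibverbs) for verbs that return
 * 0 on success or an errno value on failure.
 */

/* Any nonzero return value is a failure; do not test for "< 0". */
static int verbs_call_failed(int ret)
{
    return ret != 0;
}

/* Human-readable description of the return value. */
static const char *verbs_result_str(int ret)
{
    return ret == 0 ? "success" : strerror(ret);
}

/*
 * Intended use (requires libibverbs, shown as a comment only):
 *
 *     int ret = ibv_modify_qp(qp, &attr, attr_mask);
 *     if (verbs_call_failed(ret))
 *         fprintf(stderr, "ibv_modify_qp: %s\n", verbs_result_str(ret));
 */
```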

    NOTES

    +If any of the modify attributes or the modify mask are invalid, none of the +attributes will be modified (including the QP state). +

    Not all devices support resizing QPs. To check if a device supports it, check +if the IBV_DEVICE_RESIZE_MAX_WR bit is set in the device capabilities +flags.

    +

    Not all devices support alternate paths. To check if a device supports it, +check if the IBV_DEVICE_AUTO_PATH_MIG bit is set in the device +capabilities flags.

    +

    The following tables list, for each QP Transport Service Type, the minimum +set of attributes that must be set when transitioning the QP state along: Reset +--> Init --> RTR --> RTS.

    +

    +
    For QP Transport Service Type  IBV_QPT_UD:
    +
    +Next state     Required attributes
    +----------     ----------------------------------------
    +Init           IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, 
    +               IBV_QP_QKEY 
    +RTR            IBV_QP_STATE 
    +RTS            IBV_QP_STATE, IBV_QP_SQ_PSN 
    +
    +

    +
    For QP Transport Service Type  IBV_QPT_UC:
    +
    +Next state     Required attributes
    +----------     ----------------------------------------
    +Init           IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, 
    +               IBV_QP_ACCESS_FLAGS 
    +RTR            IBV_QP_STATE, IBV_QP_AV, IBV_QP_PATH_MTU, 
    +               IBV_QP_DEST_QPN, IBV_QP_RQ_PSN 
    +RTS            IBV_QP_STATE, IBV_QP_SQ_PSN 
    +
    +

    +
    For QP Transport Service Type  IBV_QPT_RC:
    +
    +Next state     Required attributes
    +----------     ----------------------------------------
    +Init           IBV_QP_STATE, IBV_QP_PKEY_INDEX, IBV_QP_PORT, 
    +               IBV_QP_ACCESS_FLAGS 
    +RTR            IBV_QP_STATE, IBV_QP_AV, IBV_QP_PATH_MTU, 
    +               IBV_QP_DEST_QPN, IBV_QP_RQ_PSN, 
    +               IBV_QP_MAX_DEST_RD_ATOMIC, IBV_QP_MIN_RNR_TIMER 
    +RTS            IBV_QP_STATE, IBV_QP_SQ_PSN, IBV_QP_MAX_QP_RD_ATOMIC, 
    +               IBV_QP_RETRY_CNT, IBV_QP_RNR_RETRY, IBV_QP_TIMEOUT
    +
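    The IBV_QPT_RC table above can be turned into a small check that an attr_mask covers the minimum set for a given transition. The sketch below is self-contained and therefore uses stand-in names: the EX_* flags and states are hypothetical, their numeric values are arbitrary assumptions, and real code should use the IBV_QP_* and IBV_QPS_* constants from <infiniband/verbs.h> instead.

```c
/* Stand-in attribute flags. The values are illustrative assumptions;
 * real code uses the IBV_QP_* constants from <infiniband/verbs.h>. */
enum ex_qp_attr_mask {
    EX_QP_STATE              = 1 << 0,
    EX_QP_ACCESS_FLAGS       = 1 << 1,
    EX_QP_PKEY_INDEX         = 1 << 2,
    EX_QP_PORT               = 1 << 3,
    EX_QP_AV                 = 1 << 4,
    EX_QP_PATH_MTU           = 1 << 5,
    EX_QP_TIMEOUT            = 1 << 6,
    EX_QP_RETRY_CNT          = 1 << 7,
    EX_QP_RNR_RETRY          = 1 << 8,
    EX_QP_RQ_PSN             = 1 << 9,
    EX_QP_MAX_QP_RD_ATOMIC   = 1 << 10,
    EX_QP_MIN_RNR_TIMER      = 1 << 11,
    EX_QP_SQ_PSN             = 1 << 12,
    EX_QP_MAX_DEST_RD_ATOMIC = 1 << 13,
    EX_QP_DEST_QPN           = 1 << 14
};

/* Stand-in target states for the Reset --> Init --> RTR --> RTS path. */
enum ex_qp_state { EX_QPS_INIT, EX_QPS_RTR, EX_QPS_RTS };

/* Minimum mask for an RC QP entering the given state, per the table. */
static int ex_rc_required_mask(enum ex_qp_state next)
{
    switch (next) {
    case EX_QPS_INIT:
        return EX_QP_STATE | EX_QP_PKEY_INDEX | EX_QP_PORT |
               EX_QP_ACCESS_FLAGS;
    case EX_QPS_RTR:
        return EX_QP_STATE | EX_QP_AV | EX_QP_PATH_MTU |
               EX_QP_DEST_QPN | EX_QP_RQ_PSN |
               EX_QP_MAX_DEST_RD_ATOMIC | EX_QP_MIN_RNR_TIMER;
    case EX_QPS_RTS:
        return EX_QP_STATE | EX_QP_SQ_PSN | EX_QP_MAX_QP_RD_ATOMIC |
               EX_QP_RETRY_CNT | EX_QP_RNR_RETRY | EX_QP_TIMEOUT;
    }
    return 0;
}

/* Returns 1 if attr_mask contains every required flag, 0 otherwise.
 * Extra (optional) flags in attr_mask are allowed. */
static int ex_rc_mask_is_sufficient(enum ex_qp_state next, int attr_mask)
{
    int required = ex_rc_required_mask(next);
    return (attr_mask & required) == required;
}
```

    The same pattern extends to the IBV_QPT_UD and IBV_QPT_UC tables by adding one lookup function per transport type.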

    SEE ALSO

    +ibv_create_qp, ibv_destroy_qp, +ibv_query_qp, +ibv_create_ah

     

    +

     

    +


    +IBV_POST_RECV

    +
    +

    NAME

    +ibv_post_recv - post a list of work requests (WRs) to a receive queue

    +SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr,
    +                  struct ibv_recv_wr **bad_wr);
    +

    DESCRIPTION

    +ibv_post_recv() posts the linked list of work requests (WRs) starting +with wr to the receive queue of the queue pair qp. It stops +processing WRs from this list at the first failure (that can be detected +immediately while requests are being posted), and returns this failing WR +through bad_wr. + +

    The argument wr is an ibv_recv_wr struct, as defined in <infiniband/verbs.h>. +

    +

    +
    struct ibv_recv_wr {
    +uint64_t                wr_id;     /* User defined WR ID */
    +struct ibv_recv_wr     *next;      /* Pointer to next WR in list, NULL if last WR */
    +struct ibv_sge         *sg_list;   /* Pointer to the s/g array */
    +int                     num_sge;   /* Size of the s/g array */
    +};
    +
    +struct ibv_sge {
    +uint64_t                addr;      /* Start address of the local memory buffer */
    +uint32_t                length;    /* Length of the buffer */
    +uint32_t                lkey;      /* Key of the local Memory Region */
    +};
    +

    RETURN VALUE

    +ibv_post_recv() returns 0 on success, or the value of errno on failure +(which indicates the failure reason). +

    NOTES

    +The buffers used by a WR can only be safely reused after the request is fully +executed and a work completion has been retrieved from the corresponding +completion queue (CQ). +

    If the QP qp is associated with a shared receive queue, you must use +the function ibv_post_srq_recv(), and not ibv_post_recv(), since +the QP's own receive queue will not be used.

    +

    If a WR is being posted to a UD QP, the Global Routing Header (GRH) of the +incoming message will be placed in the first 40 bytes of the buffer(s) in the +scatter list. If no GRH is present in the incoming message, then the first 40 bytes +will be undefined. This means that in all cases, the actual data of the incoming +message will start at an offset of 40 bytes into the buffer(s) in the scatter +list.
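    The 40-byte offset described above can be applied with a small helper when reading a completed UD receive. This is a minimal sketch under two assumptions: the scatter list consists of a single contiguous buffer, and byte_len is the total number of bytes reported for the receive, including the 40 bytes reserved for the GRH. The names GRH_BYTES and ud_payload() are hypothetical and not part of libibverbs.

```c
#include <stddef.h>
#include <stdint.h>

#define GRH_BYTES 40  /* space reserved for the GRH on UD receives */

/*
 * Hypothetical helper: given a single contiguous receive buffer and the
 * total number of bytes received (GRH area included), return a pointer
 * to the message data and store its length in *payload_len.
 * Returns NULL (and a length of 0) if the input cannot hold the GRH area.
 */
static const uint8_t *ud_payload(const uint8_t *buf, size_t byte_len,
                                 size_t *payload_len)
{
    if (buf == NULL || byte_len < GRH_BYTES) {
        *payload_len = 0;
        return NULL;
    }
    *payload_len = byte_len - GRH_BYTES;
    return buf + GRH_BYTES;   /* skip the GRH area, valid or not */
}
```

    For the same reason, a receive buffer posted to a UD QP should be at least 40 bytes larger than the largest message the application expects.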

    +

    SEE ALSO

    +ibv_create_qp, ibv_post_send, +ibv_post_srq_recv, +ibv_poll_cq +

     

    +

     

    +


    +IBV_POST_SEND

    +
    +

    NAME

    +ibv_post_send - post a list of work requests (WRs) to a send queue +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
    +                  struct ibv_send_wr **bad_wr); 
    +

    DESCRIPTION

    +ibv_post_send() posts the linked list of work requests (WRs) starting +with wr to the send queue of the queue pair qp. It stops +processing WRs from this list at the first failure (that can be detected +immediately while requests are being posted), and returns this failing WR +through bad_wr. + +

    The argument wr is an ibv_send_wr struct, as defined in <infiniband/verbs.h>. +

    +

    +
    struct ibv_send_wr {
    +uint64_t                wr_id;                  /* User defined WR ID */
    +struct ibv_send_wr     *next;                   /* Pointer to next WR in list, NULL if last WR */
    +struct ibv_sge         *sg_list;                /* Pointer to the s/g array */
    +int                     num_sge;                /* Size of the s/g array */
    +enum ibv_wr_opcode      opcode;                 /* Operation type */
    +int                     send_flags;             /* Flags of the WR properties */
    +uint32_t                imm_data;               /* Immediate data (in network byte order) */
    +union {
    +struct {
    +uint64_t        remote_addr;    /* Start address of remote memory buffer */
    +uint32_t        rkey;           /* Key of the remote Memory Region */
    +} rdma;
    +struct {
    +uint64_t        remote_addr;    /* Start address of remote memory buffer */ 
    +uint64_t        compare_add;    /* Compare operand */
    +uint64_t        swap;           /* Swap operand */
    +uint32_t        rkey;           /* Key of the remote Memory Region */
    +} atomic;
    +struct {
    +struct ibv_ah  *ah;             /* Address handle (AH) for the remote node address */
    +uint32_t        remote_qpn;     /* QP number of the destination QP */
    +uint32_t        remote_qkey;    /* Q_Key number of the destination QP */
    +} ud;
    +} wr;
    +uint32_t                xrc_remote_srq_num;     /* SRQ number of the destination XRC */
    +};
    +
    +struct ibv_sge {
    +uint64_t                addr;                   /* Start address of the local memory buffer */
    +uint32_t                length;                 /* Length of the buffer */
    +uint32_t                lkey;                   /* Key of the local Memory Region */
    +};
    +
    +

    Each QP Transport Service Type supports a specific set of opcodes, as shown +in the following table:

    +

    +
    OPCODE                      | IBV_QPT_UD | IBV_QPT_UC | IBV_QPT_RC | IBV_QPT_XRC
    +----------------------------+------------+------------+------------+------------
    +IBV_WR_SEND                 |     X      |     X      |     X      |     X
    +IBV_WR_SEND_WITH_IMM        |     X      |     X      |     X      |     X
    +IBV_WR_RDMA_WRITE           |            |     X      |     X      |     X
    +IBV_WR_RDMA_WRITE_WITH_IMM  |            |     X      |     X      |     X
    +IBV_WR_RDMA_READ            |            |            |     X      |     X
    +IBV_WR_ATOMIC_CMP_AND_SWP   |            |            |     X      |     X
    +IBV_WR_ATOMIC_FETCH_AND_ADD |            |            |     X      |     X
    +
    +
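    The opcode table above can be encoded as a lookup so that an unsupported opcode is rejected before a WR is built. The sketch below is self-contained and uses stand-in enums: the EX_* names are hypothetical, and real code should use enum ibv_qp_type and enum ibv_wr_opcode from <infiniband/verbs.h>.

```c
/* Stand-in transport types (columns of the table). */
enum ex_qp_type {
    EX_QPT_UD, EX_QPT_UC, EX_QPT_RC, EX_QPT_XRC,
    EX_QPT_COUNT
};

/* Stand-in opcodes (rows of the table). */
enum ex_wr_opcode {
    EX_WR_SEND,
    EX_WR_SEND_WITH_IMM,
    EX_WR_RDMA_WRITE,
    EX_WR_RDMA_WRITE_WITH_IMM,
    EX_WR_RDMA_READ,
    EX_WR_ATOMIC_CMP_AND_SWP,
    EX_WR_ATOMIC_FETCH_AND_ADD,
    EX_WR_COUNT
};

/* 1 where the table has an X, 0 where the cell is empty. */
static const unsigned char ex_supported[EX_WR_COUNT][EX_QPT_COUNT] = {
    /*                          UD UC RC XRC */
    /* SEND                 */ { 1, 1, 1, 1 },
    /* SEND_WITH_IMM        */ { 1, 1, 1, 1 },
    /* RDMA_WRITE           */ { 0, 1, 1, 1 },
    /* RDMA_WRITE_WITH_IMM  */ { 0, 1, 1, 1 },
    /* RDMA_READ            */ { 0, 0, 1, 1 },
    /* ATOMIC_CMP_AND_SWP   */ { 0, 0, 1, 1 },
    /* ATOMIC_FETCH_AND_ADD */ { 0, 0, 1, 1 },
};

/* Returns 1 if the opcode is listed for the transport type, else 0. */
static int ex_opcode_supported(enum ex_qp_type type, enum ex_wr_opcode op)
{
    if ((unsigned)type >= EX_QPT_COUNT || (unsigned)op >= EX_WR_COUNT)
        return 0;
    return ex_supported[op][type];
}
```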

    The attribute send_flags describes the properties of the WR. +It is either 0 or the bitwise OR of one or more of the +following flags:

    +

    +
    +
    IBV_SEND_FENCE Set the fence indicator. Valid only for QPs with + Transport Service Type IBV_QPT_RC
    +
    +
    IBV_SEND_SIGNALED Set the completion notification indicator. + Relevant only if QP was created with sq_sig_all=0
    +
    +
    IBV_SEND_SOLICITED Set the solicited event indicator. Valid only + for Send and RDMA Write with immediate
    +
    +
    IBV_SEND_INLINE Send data in given gather list as inline data in a send WQE. + Valid only for Send and RDMA Write. The L_Key will not be checked.
    +
    +

    RETURN VALUE

    +ibv_post_send() returns 0 on success, or the value of errno on failure +(which indicates the failure reason). +

    NOTES

    +The user should not alter or destroy AHs associated with WRs until the request is +fully executed and a work completion has been retrieved from the corresponding +completion queue (CQ); doing so earlier may cause unexpected behavior. +

    The buffers used by a WR can only be safely reused after the request is +fully executed and a work completion has been retrieved from the corresponding +completion queue (CQ). However, if the IBV_SEND_INLINE flag was set, the buffer +can be reused immediately after the call returns.

    +

    SEE ALSO

    +ibv_create_qp, +ibv_create_xrc_rcv_qp, +ibv_create_ah, +ibv_post_recv, +ibv_post_srq_recv, +ibv_poll_cq

     

    +

     

    +


    +IBV_POST_SRQ_RECV

    +
    +

    NAME

    +ibv_post_srq_recv - post a list of work requests (WRs) to a shared receive queue +(SRQ)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *wr,
    +                      struct ibv_recv_wr **bad_wr);
    +

    DESCRIPTION

    +ibv_post_srq_recv() posts the linked list of work requests (WRs) starting +with wr to the shared receive queue (SRQ) srq. It stops processing +WRs from this list at the first failure (that can be detected immediately while +requests are being posted), and returns this failing WR through bad_wr. + +

    The argument wr is an ibv_recv_wr struct, as defined in <infiniband/verbs.h>. +

    +

    +
    struct ibv_recv_wr {
    +uint64_t                wr_id;     /* User defined WR ID */
    +struct ibv_recv_wr     *next;      /* Pointer to next WR in list, NULL if last WR */
    +struct ibv_sge         *sg_list;   /* Pointer to the s/g array */
    +int                     num_sge;   /* Size of the s/g array */
    +};
    +
    +struct ibv_sge {
    +uint64_t                addr;      /* Start address of the local memory buffer */
    +uint32_t                length;    /* Length of the buffer */
    +uint32_t                lkey;      /* Key of the local Memory Region */
    +};
    +

    RETURN VALUE

    +ibv_post_srq_recv() returns 0 on success, or the value of errno on +failure (which indicates the failure reason).

    NOTES

    +The buffers used by a WR can only be safely reused after the request is fully +executed and a work completion has been retrieved from the corresponding +completion queue (CQ). +

    If a WR is being posted to a UD QP, the Global Routing Header (GRH) of the +incoming message will be placed in the first 40 bytes of the buffer(s) in the +scatter list. If no GRH is present in the incoming message, then the first 40 bytes +will be undefined. This means that in all cases, the actual data of the incoming +message will start at an offset of 40 bytes into the buffer(s) in the scatter +list.

    +

    SEE ALSO

    +ibv_create_qp, ibv_post_send, +ibv_post_recv, +ibv_poll_cq +

     

    +

     

    +


    +IBV_QUERY_QP

    +
    +

    NAME

    +ibv_query_qp - get the attributes of a queue pair (QP)

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr,
    +                 int attr_mask,
    +                 struct ibv_qp_init_attr *init_attr);
    +

    DESCRIPTION

    +ibv_query_qp() gets the attributes specified in attr_mask for the +QP qp and returns them through the pointers attr and init_attr. +The argument attr is an ibv_qp_attr struct, as defined in <infiniband/verbs.h>. +

    +
    struct ibv_qp_attr {
    +enum ibv_qp_state       qp_state;            /* Current QP state */
    +enum ibv_qp_state       cur_qp_state;        /* Current QP state - irrelevant for ibv_query_qp */
    +enum ibv_mtu            path_mtu;            /* Path MTU (valid only for RC/UC QPs) */
    +enum ibv_mig_state      path_mig_state;      /* Path migration state (valid if HCA supports APM) */
    +uint32_t                qkey;                /* Q_Key of the QP (valid only for UD QPs) */
    +uint32_t                rq_psn;              /* PSN for receive queue (valid only for RC/UC QPs) */
    +uint32_t                sq_psn;              /* PSN for send queue (valid only for RC/UC QPs) */
    +uint32_t                dest_qp_num;         /* Destination QP number (valid only for RC/UC QPs) */
    +int                     qp_access_flags;     /* Mask of enabled remote access operations (valid only for RC/UC QPs) */
    +struct ibv_qp_cap       cap;                 /* QP capabilities */
    +struct ibv_ah_attr      ah_attr;             /* Primary path address vector (valid only for RC/UC QPs) */
    +struct ibv_ah_attr      alt_ah_attr;         /* Alternate path address vector (valid only for RC/UC QPs) */
    +uint16_t                pkey_index;          /* Primary P_Key index */
    +uint16_t                alt_pkey_index;      /* Alternate P_Key index */
    +uint8_t                 en_sqd_async_notify; /* Enable SQD.drained async notification - irrelevant for ibv_query_qp */
    +uint8_t                 sq_draining;         /* Is the QP draining? (Valid only if qp_state is SQD) */
    +uint8_t                 max_rd_atomic;       /* Number of outstanding RDMA reads & atomic operations on the destination QP (valid only for RC QPs) */
    +uint8_t                 max_dest_rd_atomic;  /* Number of responder resources for handling incoming RDMA reads & atomic operations (valid only for RC QPs) */
    +uint8_t                 min_rnr_timer;       /* Minimum RNR NAK timer (valid only for RC QPs) */
    +uint8_t                 port_num;            /* Primary port number */
    +uint8_t                 timeout;             /* Local ack timeout for primary path (valid only for RC QPs) */
    +uint8_t                 retry_cnt;           /* Retry count (valid only for RC QPs) */
    +uint8_t                 rnr_retry;           /* RNR retry (valid only for RC QPs) */
    +uint8_t                 alt_port_num;        /* Alternate port number */
    +uint8_t                 alt_timeout;         /* Local ack timeout for alternate path (valid only for RC QPs) */
    +};
    +
    +

    For details on struct ibv_qp_cap see the description of ibv_create_qp(). +For details on struct ibv_ah_attr see the description of ibv_create_ah().

    +

    RETURN VALUE

    +ibv_query_qp() returns 0 on success, or the value of errno on failure +(which indicates the failure reason).

    NOTES

    +The argument attr_mask is a hint that specifies the minimum list of +attributes to retrieve. Some RDMA devices may return extra attributes not +requested, for example if the value can be returned cheaply. This has the same +form as in ibv_modify_qp(). + +

    Attribute values are valid if they have been set using ibv_modify_qp(). +The exact list of valid attributes depends on the QP state.

    +

    Multiple calls to ibv_query_qp() may yield some differences in the +values returned for the following attributes: qp_state, path_mig_state, +sq_draining, ah_attr (if APM is enabled).

    +

    SEE ALSO

    +ibv_create_qp, ibv_destroy_qp, +ibv_modify_qp, +ibv_create_ah

     

    +

     

    +


    +IBV_ATTACH_MCAST

    +


    +IBV_DETACH_MCAST

    +
    +

    NAME

    +ibv_attach_mcast, ibv_detach_mcast - attach and detach a queue pair (QP) +to/from a multicast group

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_attach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
    +
    +int ibv_detach_mcast(struct ibv_qp *qp, const union ibv_gid *gid, uint16_t lid);
    +

    DESCRIPTION

    +ibv_attach_mcast() attaches the QP qp to the multicast group +having MGID gid and MLID lid. + +

    ibv_detach_mcast() detaches the QP qp from the multicast group +having MGID gid and MLID lid.

    +

    RETURN VALUE

    +ibv_attach_mcast() and ibv_detach_mcast() return 0 on success, or +the value of errno on failure (which indicates the failure reason).

    NOTES

    +Only QPs of Transport Service Type IBV_QPT_UD may be attached to +multicast groups. +

    If a QP is attached to the same multicast group multiple times, the QP will +still receive a single copy of a multicast message.

    +

    In order to receive multicast messages, a join request for the multicast +group must be sent to the subnet administrator (SA), so that the fabric's +multicast routing is configured to deliver messages to the local port.

    +

    SEE ALSO

    +ibv_create_qp

     

    +


    +IBV_RATE_TO_MULT

    +


    +IBV_MULT_TO_RATE

    +
    +

    NAME

    +

    ibv_rate_to_mult - convert IB rate enumeration to multiplier of 2.5 Gbit/sec
    +
    +mult_to_ibv_rate - convert multiplier of 2.5 Gbit/sec to an IB rate enumeration

    +

    SYNOPSIS

    +
    #include <infiniband/verbs.h>
    +
    +int ibv_rate_to_mult(enum ibv_rate rate);
    +
    +enum ibv_rate mult_to_ibv_rate(int mult);
    +

    DESCRIPTION

    +ibv_rate_to_mult() converts the IB transmission rate enumeration rate +to a multiple of 2.5 Gbit/sec (the base rate). For example, if rate is +IBV_RATE_5_GBPS, the value 2 will be returned (5 Gbit/sec = 2 * 2.5 Gbit/sec). +

    mult_to_ibv_rate() converts the multiplier value (of 2.5 Gbit/sec) +mult to an IB transmission rate enumeration. For example, if mult is +2, the rate enumeration IBV_RATE_5_GBPS will be returned.
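The two conversions are inverse table lookups on multiples of the 2.5 Gbit/sec base rate. The following is a minimal standalone sketch of that idea, not the library's implementation: it uses a locally defined stand-in enum (sketch_rate) and hypothetical function names so that it compiles without libibverbs. Only the 2.5 and 5 Gbit/sec rows come from the text above; the 10 and 20 Gbit/sec rows are assumed from the same multiple-of-2.5 rule, and the values returned for unlisted inputs are a choice of this sketch.

```c
#include <stddef.h>

/* Stand-in for enum ibv_rate; names and numeric values are illustrative only. */
enum sketch_rate {
    SKETCH_RATE_UNKNOWN = 0,
    SKETCH_RATE_2_5_GBPS,
    SKETCH_RATE_5_GBPS,
    SKETCH_RATE_10_GBPS,
    SKETCH_RATE_20_GBPS
};

struct rate_entry {
    enum sketch_rate rate;
    int mult;               /* multiple of the 2.5 Gbit/sec base rate */
};

static const struct rate_entry rate_table[] = {
    { SKETCH_RATE_2_5_GBPS, 1 },
    { SKETCH_RATE_5_GBPS,   2 },   /* 5 Gbit/sec = 2 * 2.5 Gbit/sec */
    { SKETCH_RATE_10_GBPS,  4 },
    { SKETCH_RATE_20_GBPS,  8 },
};

/* Rate enumeration -> multiplier; -1 if the rate is not in the table. */
int sketch_rate_to_mult(enum sketch_rate rate)
{
    size_t i;

    for (i = 0; i < sizeof(rate_table) / sizeof(rate_table[0]); i++)
        if (rate_table[i].rate == rate)
            return rate_table[i].mult;
    return -1;
}

/* Multiplier -> rate enumeration; SKETCH_RATE_UNKNOWN if there is no match. */
enum sketch_rate sketch_mult_to_rate(int mult)
{
    size_t i;

    for (i = 0; i < sizeof(rate_table) / sizeof(rate_table[0]); i++)
        if (rate_table[i].mult == mult)
            return rate_table[i].rate;
    return SKETCH_RATE_UNKNOWN;
}
```

In an application the real ibv_rate_to_mult() and mult_to_ibv_rate() would be called instead; the sketch only illustrates that a multiplier with no matching rate (3, for example) has no enumeration to map to.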

    +

    RETURN VALUE

    +ibv_rate_to_mult() returns the multiplier of the base rate 2.5 Gbit/sec. +

    mult_to_ibv_rate() returns the enumeration representing the IB +transmission rate.

    +

    SEE ALSO

    +ibv_query_port +
    +

    <return-to-top>

     

    -
    -

    WinVerbs


    +

    RDMA CM - Communications Manager

    +
    +
    + +
    +

    NAME

    +
    +

    librdmacm.lib - RDMA + communication manager.

    +
    +

    SYNOPSIS

    +
    +

    #include <rdma/rdma_cma.h>

    +
    +

    DESCRIPTION

    +
    +

    Used to establish + communication endpoints over RDMA transports.

    +
    +

    NOTES

    +
    +

    The RDMA CM is a communication manager used to set up reliable, connected + and unreliable datagram data transfers. It + provides an RDMA transport + neutral interface for establishing connections. The + API is based on sockets, but adapted for + queue pair (QP) based semantics: communication must be over a specific RDMA device, and data transfers are + message based.

    +

    The RDMA CM only provides + the communication management  (connection + setup / teardown) portion of an RDMA API.  It works in + conjunction with the verbs API  defined + by the libibverbs  library.   The  + libibverbs library provides the + interfaces needed to send and receive data.

    +
    +

    CLIENT OPERATION

    +

           This section + provides a general overview of the basic operation for the + active, or client, side of communication.  A  + general  connection  flow would + be:

    +

           + rdma_create_event_channel

    +

                + create channel to receive events

    +

           rdma_create_id

    +

                + allocate an rdma_cm_id, this is conceptually + similar to a socket

    +

           + rdma_resolve_addr

    +

                + obtain a local RDMA device to reach the remote + address

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_ADDR_RESOLVED event

    +

           rdma_ack_cm_event

    +

                + ack event

    +

           rdma_create_qp

    +

                + allocate a QP for the communication

    +

           + rdma_resolve_route

    +

                + determine the route to the remote address

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_ROUTE_RESOLVED event

    +

           + rdma_ack_cm_event

    +

                + ack event

    +

           rdma_connect

    +

                + connect to the remote server

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_ESTABLISHED event

    +

           + rdma_ack_cm_event

    +

                + ack event

    +

           Perform data + transfers over connection

    +

           rdma_disconnect

    +

                + tear-down connection

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_DISCONNECTED event

    +

           + rdma_ack_cm_event

    +

                + ack event

    +

           rdma_destroy_qp

    +

                + destroy the QP

    +

           rdma_destroy_id

    +

                + release the rdma_cm_id

    +

           + rdma_destroy_event_channel

    +

                + release the event channel

    +
    +

    An almost identical process is used to set up + unreliable datagram (UD) + communication between nodes.
    + No actual connection is formed between QPs however, so disconnection is not + needed.
    + Although this example shows the client + initiating the disconnect, either side of a + connection may initiate the disconnect.
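The client flow above is event driven: each rdma_get_cm_event / rdma_ack_cm_event pair is followed by the next call in the sequence. The sketch below captures only that ordering as a lookup from the event just acknowledged to the call(s) that follow. It is a standalone illustration: the sketch_event enum and the client_next_calls function are hypothetical stand-ins (the real events are the RDMA_CM_EVENT_* values), and no librdmacm functions are actually invoked.

```c
#include <stddef.h>

/* Stand-in event codes; the real values are the RDMA_CM_EVENT_* enumerators. */
enum sketch_event {
    SK_ADDR_RESOLVED,
    SK_ROUTE_RESOLVED,
    SK_ESTABLISHED,
    SK_DISCONNECTED
};

/*
 * Returns the names of the calls a client makes after acking 'ev',
 * in the order given by the connection flow above.
 * Returns NULL for an event the flow does not expect.
 */
const char *client_next_calls(enum sketch_event ev)
{
    switch (ev) {
    case SK_ADDR_RESOLVED:
        return "rdma_create_qp, rdma_resolve_route";
    case SK_ROUTE_RESOLVED:
        return "rdma_connect";
    case SK_ESTABLISHED:
        /* data transfers happen here, then the client tears down */
        return "rdma_disconnect";
    case SK_DISCONNECTED:
        return "rdma_destroy_qp, rdma_destroy_id, rdma_destroy_event_channel";
    default:
        return NULL;
    }
}
```

A real client would normally also handle the error events (for example RDMA_CM_EVENT_ADDR_ERROR or RDMA_CM_EVENT_REJECTED) at each wait point; they are omitted here to keep the ordering visible.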

    +
    +
    +

    SERVER OPERATION

    +

           This section + provides a general overview of the basic operation for the + passive, or server, side of communication.  + A general  connection  flow + would be:

    +

           + rdma_create_event_channel

    +

                + create channel to receive events

    +

           rdma_create_id

    +

                + allocate an rdma_cm_id, this is conceptually + similar to a socket

    +

           rdma_bind_addr

    +

                + set the local port number to listen on

    +

           rdma_listen

    +

                + begin listening for connection requests

    +

           + rdma_get_cm_event

    +

                + wait  for   + RDMA_CM_EVENT_CONNECT_REQUEST  event with a new  rdma_cm_id.

    +

           rdma_create_qp

    +

                + allocate a QP for the communication on the new + rdma_cm_id

    +

           rdma_accept

    +

                + accept the connection request

    +

           rdma_ack_cm_event

    +

                + ack event

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_ESTABLISHED event

    +

           + rdma_ack_cm_event

    +

                + ack event

    +

           Perform data + transfers over connection

    +

           + rdma_get_cm_event

    +

                + wait for RDMA_CM_EVENT_DISCONNECTED event

    +

           + rdma_ack_cm_event

    +

                + ack event

    +

           rdma_disconnect

    +

                + tear-down connection

    +

           rdma_destroy_qp

    +

                + destroy the QP

    +

           rdma_destroy_id

    +

                + release the connected rdma_cm_id

    +

           rdma_destroy_id

    +

                + release the listening rdma_cm_id

    +

           + rdma_destroy_event_channel

    +

                + release the event channel

    +

    RETURN CODES

    +

           + =  0   success

    +

           = -1   + error - see errno for more details

    +
    +

    Librdmacm functions return 0 to indicate + success, and a -1 return value to indicate + failure.

    +

    If a function operates asynchronously,  + a  return value  of  0  + means  that  the operation was successfully started. 
    + The + operation could still complete in error; + users should check the  status of  the + related event.
    +
    + If the return value is -1, then errno will contain + additional information regarding the reason for the failure.
    + Prior + versions of the library would return -errno and not set errno for + some cases related to ENOMEM, ENODEV, ENODATA, + EINVAL, and EADDRNOTAVAIL codes.
    +
    Applications that want to check these codes and have compatibility  + with prior library versions must manually set errno to the negative + of the return code if it is < -1.
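The compatibility rule in the last paragraph can be wrapped in a small helper. This is a minimal sketch under the assumptions stated in the text (older library versions may return -errno instead of -1); the name normalize_rdma_ret is hypothetical and not part of librdmacm.

```c
#include <errno.h>

/*
 * Normalize a librdmacm return value so callers can always test for -1
 * and read errno. Values below -1 are treated as -errno from an older
 * library version, as described above.
 */
int normalize_rdma_ret(int ret)
{
    if (ret < -1) {
        errno = -ret;   /* e.g. -ENOMEM becomes errno = ENOMEM */
        return -1;
    }
    return ret;         /* 0 (success) and -1 (errno already set) pass through */
}
```

A call site would then read, for example, `if (normalize_rdma_ret(rdma_listen(id, backlog)) == -1)` followed by an inspection of errno, regardless of the library version in use.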

    +
    +

    SEE ALSO

    +
    +

     rdma_create_event_channel, + rdma_get_cm_event, rdma_create_id,
    + rdma_resolve_addr, + rdma_bind_addr, rdma_create_qp,
    + rdma_resolve_route, + rdma_connect, rdma_listen, + rdma_accept,
    + rdma_reject, + rdma_join_multicast, rdma_leave_multicast,
    + rdma_notify, rdma_ack_cm_event, rdma_disconnect,
    + rdma_destroy_qp, + rdma_destroy_id, rdma_destroy_event_channel,
    + rdma_get_devices, rdma_free_devices, rdma_get_peer_addr,
    + rdma_get_local_addr, rdma_get_dst_port, rdma_get_src_port,
    + rdma_set_option

    +
    +
    +

    <return-to-top>

    +
    + +

     

    +


    +RDMA_CREATE_ID

    +
    +

    NAME
    +
    +rdma_create_id - Allocate a communication identifier.

    +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_create_id (struct +rdma_event_channel *channel, struct rdma_cm_id **id, +void *context, enum rdma_port_space ps);

    +

    ARGUMENTS

    +
    +
    channel
    +
    The communication channel that events associated with the allocated + rdma_cm_id will be reported on.
    +
    id
    +
    A reference where the allocated communication identifier will be + returned.
    +
    context
    +
    User specified context associated with the rdma_cm_id.
    +
    ps
    +
    RDMA port space.
    +
    +

    DESCRIPTION

    +Creates an identifier that is used to track communication information. +  +

    NOTES

    +Rdma_cm_id's are conceptually equivalent to a socket for RDMA communication. The +difference is that RDMA communication requires explicitly binding to a specified +RDMA device before communication can occur, and most operations are asynchronous +in nature. Communication events on an rdma_cm_id are reported through the +associated event channel. Users must release the rdma_cm_id by calling +rdma_destroy_id.   +

    PORT SPACE

    +Details of the services provided by the different port spaces are outlined +below. +
    +
    RDMA_PS_TCP
    +
    Provides reliable, connection-oriented QP communication. Unlike TCP, the + RDMA port space provides message, not stream, based communication.
    +
    RDMA_PS_UDP
    +
    Provides unreliable, connectionless QP communication. Supports both + datagram and multicast communication.
    +
    +

    SEE ALSO

    +rdma_cm, +rdma_create_event_channel, +rdma_destroy_id, rdma_get_devices, +rdma_bind_addr, rdma_resolve_addr, +rdma_connect, rdma_listen, +rdma_set_option

     

    +


    +RDMA_DESTROY_ID

    +
    +

    NAME
    +
    +rdma_destroy_id - Release a communication +identifier.

    +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_destroy_id (struct +rdma_cm_id *id);

    ARGUMENTS

    +
    +
    id
    +
    The communication identifier to destroy. +
    +
    +

    DESCRIPTION

    +Destroys the specified rdma_cm_id and cancels any outstanding asynchronous +operation.

    NOTES

    +Users must free any QP associated with the rdma_cm_id before calling this +routine, and ack any related events.

    SEE ALSO

    +rdma_create_id, rdma_destroy_qp, +rdma_ack_cm_event +

     

    +


    +RDMA_CREATE_EVENT_CHANNEL

    +
    +

    NAME

    +rdma_create_event_channel - Open a channel used to report communication events.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     struct rdma_event_channel * +rdma_create_event_channel (void);

    +

    ARGUMENTS

    +
    +
    void
    +
    no arguments +
    +
    +

    DESCRIPTION

    +Asynchronous events are reported to users through event channels. +

    NOTES

    +Event channels are used to direct all events on an rdma_cm_id. For many clients, +a single event channel may be sufficient; however, when managing a large number +of connections or cm_id's, users may find it useful to direct events for +different cm_id's to different channels for processing. All created event +channels must be destroyed by calling rdma_destroy_event_channel. Users should +call rdma_get_cm_event to retrieve events on an event channel. Each event +channel is mapped to a file descriptor. The associated file descriptor can be +used and manipulated like any other fd to change its behavior. Users may make +the fd non-blocking, poll or select the fd, etc. +
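As an illustration of manipulating the channel's file descriptor, the sketch below switches an fd to non-blocking mode with fcntl(). It assumes a POSIX environment, as in the Linux man pages this text is based on; availability of fcntl() on other platforms is not implied. The helper name set_fd_nonblocking is hypothetical.

```c
#include <fcntl.h>

/*
 * Put a file descriptor into non-blocking mode.
 * Returns 0 on success, -1 on failure (errno set by fcntl).
 */
int set_fd_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);   /* read the current status flags */

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;
    return 0;
}
```

With an event channel this would be called as `set_fd_nonblocking(channel->fd)`. After that, rdma_get_cm_event is expected to return an error instead of blocking when no event is pending, so the fd can be driven from a poll or select loop.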

    SEE ALSO

    +rdma_cm, +rdma_get_cm_event, rdma_destroy_event_channel +

     

    +


    +RDMA_DESTROY_EVENT_CHANNEL

    +
    +

    NAME

    +rdma_destroy_event_channel - Close an event communication channel. +  +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     void rdma_destroy_event_channel (struct +rdma_event_channel *channel);  

    +

    ARGUMENTS

    +
    +
    channel
    +
    The communication channel to destroy. +
    +
    +

    DESCRIPTION

    +Releases all resources associated with an event channel and closes the associated +file descriptor. +

    NOTES

    +All rdma_cm_id's associated with the event channel must be destroyed, and all +returned events must be acked before calling this function.   +

    SEE ALSO

    +rdma_create_event_channel, +rdma_get_cm_event, rdma_ack_cm_event +

     

    +


    +RDMA_RESOLVE_ADDR

    +
    +

    NAME

    +rdma_resolve_addr - Resolve destination and optional source addresses. +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_resolve_addr (struct +rdma_cm_id *id, struct sockaddr *src_addr, +struct sockaddr *dst_addr, int timeout_ms);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    src_addr
    +
    Source address information. This parameter may be NULL. +
    +
    dst_addr
    +
    Destination address information. +
    +
    timeout_ms
    +
    Time to wait for resolution to complete. +
    +
    +

    DESCRIPTION

    +Resolve destination and optional source addresses from IP addresses to an RDMA +address. If successful, the specified rdma_cm_id will be bound to a local +device.   +

    NOTES

    +This call is used to map a given destination IP address to a usable RDMA +address. The IP to RDMA address mapping is done using the local routing tables, +or via ARP. If a source address is given, the rdma_cm_id is bound to that +address, the same as if rdma_bind_addr were called. If no source address is +given, and the rdma_cm_id has not yet been bound to a device, then the +rdma_cm_id will be bound to a source address based on the local routing tables. +After this call, the rdma_cm_id will be bound to an RDMA device. This call is +typically made from the active side of a connection before calling +rdma_resolve_route and rdma_connect.   +

    INFINIBAND SPECIFIC

    +This call maps the destination and, if given, source IP addresses to GIDs. In +order to perform the mapping, IPoIB must be running on both the local and remote +nodes.   +

    SEE ALSO

    +rdma_create_id, rdma_resolve_route, +rdma_connect, rdma_create_qp, +rdma_get_cm_event, rdma_bind_addr, +rdma_get_src_port, rdma_get_dst_port, +rdma_get_local_addr, +rdma_get_peer_addr +

     

    +


    +RDMA_GET_CM_EVENT

    +
    +

    NAME

    +rdma_get_cm_event - Retrieves the next pending communication event. +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_get_cm_event (struct +rdma_event_channel *channel, struct rdma_cm_event **event); +

    +

    ARGUMENTS

    +
    +
    channel
    +
    Event channel to check for events. +
    +
    event
    +
    Allocated information about the next communication event. +
    +
    +

    DESCRIPTION

    +Retrieves a communication event. If no events are pending, by default, the call +will block until an event is received. +

    NOTES

    +The default synchronous behavior of this routine can be changed by modifying the +file descriptor associated with the given channel. All events that are reported +must be acknowledged by calling rdma_ack_cm_event. Destruction of an rdma_cm_id +will block until related events have been acknowledged.

    EVENT DATA

    +Communication event details are returned in the rdma_cm_event structure. This +structure is allocated by the rdma_cm and released by the rdma_ack_cm_event +routine. Details of the rdma_cm_event structure are given below. +
    +
    id
    +
    The rdma_cm identifier associated with the event. If the event type is + RDMA_CM_EVENT_CONNECT_REQUEST, then this references a new id for that + communication. +
    +
    listen_id
    +
    For RDMA_CM_EVENT_CONNECT_REQUEST event types, this references the + corresponding listening request identifier. +
    +
    event
    +
    Specifies the type of communication event which occurred. See EVENT + TYPES below. +
    +
    status
    +
    Returns any asynchronous error information associated with an event. The + status is zero unless the corresponding operation failed. +
    +
    param
    +
    Provides additional details based on the type of event. Users should + select the conn or ud subfields based on the rdma_port_space of the + rdma_cm_id associated with the event. See UD EVENT DATA and CONN EVENT DATA + below. +
    +
    +

    UD EVENT DATA

    +Event parameters related to unreliable datagram (UD) services: RDMA_PS_UDP and +RDMA_PS_IPOIB. The UD event data is valid for RDMA_CM_EVENT_ESTABLISHED and +RDMA_CM_EVENT_MULTICAST_JOIN events, unless stated otherwise. +
    +
    private_data
    +
    References any user-specified data associated with + RDMA_CM_EVENT_CONNECT_REQUEST or RDMA_CM_EVENT_ESTABLISHED events. The data + referenced by this field matches that specified by the remote side when + calling rdma_connect or rdma_accept. This field is NULL if the event does + not include private data. The buffer referenced by this pointer is + deallocated when calling rdma_ack_cm_event. +
    +
    private_data_len
    +
    The size of the private data buffer. Users should note that the size of + the private data buffer may be larger than the amount of private data sent + by the remote side. Any additional space in the buffer will be zeroed out. +
    +
    ah_attr
    +
    Address information needed to send data to the remote endpoint(s). Users + should use this structure when allocating their address handle. +
    +
    qp_num
    +
    QP number of the remote endpoint or multicast group. +
    +
    qkey
    +
    QKey needed to send data to the remote endpoint(s).
    +
    +

    CONN EVENT DATA

    +Event parameters related to connected QP services: RDMA_PS_TCP. The connection +related event data is valid for RDMA_CM_EVENT_CONNECT_REQUEST and +RDMA_CM_EVENT_ESTABLISHED events, unless stated otherwise. +
    +
    private_data
    +
    References any user-specified data associated with the event. The data + referenced by this field matches that specified by the remote side when + calling rdma_connect or rdma_accept. This field is NULL if the event does + not include private data. The buffer referenced by this pointer is + deallocated when calling rdma_ack_cm_event. +
    +
    private_data_len
    +
    The size of the private data buffer. Users should note that the size of + the private data buffer may be larger than the amount of private data sent + by the remote side. Any additional space in the buffer will be zeroed out. +
    +
    responder_resources
    +
    The number of responder resources requested of the recipient. This field + matches the initiator depth specified by the remote node when calling + rdma_connect and rdma_accept. +
    +
    initiator_depth
    +
    The maximum number of outstanding RDMA read/atomic operations that the + recipient may have outstanding. This field matches the responder resources + specified by the remote node when calling rdma_connect and rdma_accept. +
    +
    flow_control
    +
    Indicates if hardware level flow control is provided by the sender. +
    +
    retry_count
    +
    For RDMA_CM_EVENT_CONNECT_REQUEST events only, indicates the number of + times that the recipient should retry send operations. +
    +
    rnr_retry_count
    +
    The number of times that the recipient should retry receiver not ready + (RNR) NAK errors. +
    +
    srq
    +
    Specifies if the sender is using a shared-receive queue. +
    +
    qp_num
    +
    Indicates the remote QP number for the connection. +
    +
    +

    EVENT TYPES

    +The following types of communication events may be reported. +
    +
    RDMA_CM_EVENT_ADDR_RESOLVED
    +
    Address resolution (rdma_resolve_addr) completed successfully. +
    +
    RDMA_CM_EVENT_ADDR_ERROR
    +
    Address resolution (rdma_resolve_addr) failed. +
    +
    RDMA_CM_EVENT_ROUTE_RESOLVED
    +
    Route resolution (rdma_resolve_route) completed successfully. +
    +
    RDMA_CM_EVENT_ROUTE_ERROR
    +
    Route resolution (rdma_resolve_route) failed. +
    +
    RDMA_CM_EVENT_CONNECT_REQUEST
    +
    Generated on the passive side to notify the user of a new connection + request. +
    +
    RDMA_CM_EVENT_CONNECT_RESPONSE
    +
    Generated on the active side to notify the user of a successful response + to a connection request. It is only generated on rdma_cm_id's that do not + have a QP associated with them. +
    +
    RDMA_CM_EVENT_CONNECT_ERROR
    +
    Indicates that an error has occurred while trying to establish a + connection. May be generated on the active or passive side of a connection. +
    +
    RDMA_CM_EVENT_UNREACHABLE
    +
    Generated on the active side to notify the user that the remote server + is not reachable or unable to respond to a connection request. +
    +
    RDMA_CM_EVENT_REJECTED
    +
    Indicates that a connection request or response was rejected by the + remote end point. +
    +
    RDMA_CM_EVENT_ESTABLISHED
    +
    Indicates that a connection has been established with the remote end + point. +
    +
    RDMA_CM_EVENT_DISCONNECTED
    +
    The connection has been disconnected. +
    +
    RDMA_CM_EVENT_DEVICE_REMOVAL
    +
    The local RDMA device associated with the rdma_cm_id has been removed. + Upon receiving this event, the user must destroy the related rdma_cm_id. +
    +
    RDMA_CM_EVENT_MULTICAST_JOIN
    +
    The multicast join operation (rdma_join_multicast) completed + successfully. +
    +
    RDMA_CM_EVENT_MULTICAST_ERROR
    +
    An error either occurred joining a multicast group, or, if the group had + already been joined, on an existing group. The specified multicast group is + no longer accessible and should be rejoined, if desired. +
    +
    RDMA_CM_EVENT_ADDR_CHANGE
    +
    The network device associated with this ID through address resolution + changed its HW address, e.g. following a bonding failover. This event can + serve as a hint for applications that want the links used for their RDMA + sessions to align with the network stack. +
    +
    RDMA_CM_EVENT_TIMEWAIT_EXIT
    +
    The QP associated with a connection has exited its timewait state and is + now ready to be re-used. After a QP has been disconnected, it is maintained + in a timewait state to allow any in flight packets to exit the network. + After the timewait state has completed, the rdma_cm will report this event. +
    +
    +

    SEE ALSO

    +rdma_ack_cm_event, +rdma_create_event_channel, +rdma_resolve_addr, +rdma_resolve_route, rdma_connect, +rdma_listen, rdma_join_multicast, +rdma_destroy_id, rdma_event_str

     

    +


    +RDMA_ACK_CM_EVENT

    +
    +

    NAME

    +
    + +rdma_ack_cm_event - Free a communication event.

    SYNOPSIS

    +
    + +#include <rdma/rdma_cma.h>

     int rdma_ack_cm_event (struct +rdma_cm_event *event);

    +

    ARGUMENTS

    +
    + +
    +
    event
    +
    Event to be released. +
    +
    +
    + +

    DESCRIPTION

    +
    + +All events which are allocated by rdma_get_cm_event must be released; there +should be a one-to-one correspondence between successful gets and acks. This +call frees the event structure and any memory that it references.

    SEE ALSO

    +
    + +rdma_get_cm_event, rdma_destroy_id + + +

     

    +
    + +


    +RDMA_CREATE_QP

    +
    +

    NAME

    +rdma_create_qp - Allocate a QP.   +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_create_qp (struct +rdma_cm_id *id, struct ibv_pd *pd, +struct ibv_qp_init_attr *qp_init_attr);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    pd
    +
    protection domain for the QP. +
    +
    qp_init_attr
    +
    initial QP attributes. +
    +
    +

    DESCRIPTION

    +Allocate a QP associated with the specified rdma_cm_id and transition it for +sending and receiving. +

    NOTES

    +The rdma_cm_id must be bound to a local RDMA device before calling this +function, and the protection domain must be for that same device. QPs allocated +to an rdma_cm_id are automatically transitioned by the librdmacm through their +states. After being allocated, the QP will be ready to handle posting of +receives. If the QP is unconnected, it will be ready to post sends.

    SEE ALSO

    +rdma_bind_addr, +rdma_resolve_addr, rdma_destroy_qp, +ibv_create_qp, +ibv_modify_qp +

     

    +


    +RDMA_DESTROY_QP

    +
    +

    NAME

    +rdma_destroy_qp - Deallocate a QP.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     void rdma_destroy_qp (struct +rdma_cm_id *id);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    +

    DESCRIPTION

    +Destroy a QP allocated on the rdma_cm_id.

    NOTES

    +Users must destroy any QP associated with an rdma_cm_id before destroying the +ID.

    SEE ALSO

    +rdma_create_qp, +rdma_destroy_id, ibv_destroy_qp +

     

    +


    +RDMA_ACCEPT

    +
    +

    NAME

    +rdma_accept - Called to accept a connection request.   +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_accept (struct +rdma_cm_id *id, struct rdma_conn_param *conn_param); + 

    +

    ARGUMENTS

    +
    +
    id
    +
    Connection identifier associated with the request. +
    +
    conn_param
    +
    Information needed to establish the connection. See CONNECTION + PROPERTIES below for details. +
    +
    +

    DESCRIPTION

    +Called from the listening side to accept a connection or datagram service lookup +request. +

    NOTES

    +Unlike the socket accept routine, rdma_accept is not called on a listening +rdma_cm_id. Instead, after calling rdma_listen, the user waits for an +RDMA_CM_EVENT_CONNECT_REQUEST event to occur. Connection request events give the +user a newly created rdma_cm_id, similar to a new socket, but the rdma_cm_id is +bound to a specific RDMA device. rdma_accept is called on the new rdma_cm_id.

    +CONNECTION PROPERTIES

    +The following properties are used to configure the communication and specified +by the conn_param parameter when accepting a connection or datagram +communication request. Users should use the rdma_conn_param values reported in +the connection request event to determine appropriate values for these fields +when accepting. Users may reference the rdma_conn_param structure in the +connection event directly, or can reference their own structure. If the +rdma_conn_param structure from an event is referenced, the event must not be +acked until after this call returns. +
    +
    private_data
    +
    References a user-controlled data buffer. The contents of the buffer are + copied and transparently passed to the remote side as part of the + communication request. May be NULL if private_data is not required. +
    +
    private_data_len
    +
    Specifies the size of the user-controlled data buffer. Note that the + actual amount of data transferred to the remote side is transport dependent + and may be larger than that requested. +
    +
    responder_resources
    +
    The maximum number of outstanding RDMA read and atomic operations that + the local side will accept from the remote side. Applies only to RDMA_PS_TCP. + This value must be less than or equal to the local RDMA device attribute + max_qp_rd_atom and the responder_resources value reported in the connect + request event. +
    +
    initiator_depth
    +
    The maximum number of outstanding RDMA read and atomic operations that + the local side will have to the remote side. Applies only to RDMA_PS_TCP. + This value must be less than or equal to the local RDMA device attribute + max_qp_init_rd_atom and the initiator_depth value reported in the connect + request event. +
    +
    flow_control
    +
    Specifies if hardware flow control is available. This value is exchanged + with the remote peer and is not used to configure the QP. Applies only to + RDMA_PS_TCP. +
    +
    retry_count
    +
    This value is ignored. +
    +
    rnr_retry_count
    +
    The maximum number of times that a send operation from the remote peer + should be retried on a connection after receiving a receiver not ready (RNR) + error. RNR errors are generated when a send request arrives before a buffer + has been posted to receive the incoming data. Applies only to RDMA_PS_TCP. +
    +
    srq
    +
    Specifies if the QP associated with the connection is using a shared + receive queue. This field is ignored by the library if a QP has been created + on the rdma_cm_id. Applies only to RDMA_PS_TCP. +
    +
    qp_num
    +
    Specifies the QP number associated with the connection. This field is + ignored by the library if a QP has been created on the rdma_cm_id. +
    +
    +
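The limits on responder_resources and initiator_depth described above amount to taking the smaller of the local device attribute and the value reported in the connect request event. A minimal sketch of that rule follows; the helper name clamp_rd_atomic is hypothetical, and treating a negative device attribute as 0 is a choice of this sketch.

```c
#include <stdint.h>

/*
 * Pick a value for responder_resources or initiator_depth when accepting:
 * no larger than the local device limit and no larger than the value
 * reported in the connect request event.
 */
uint8_t clamp_rd_atomic(int device_max, uint8_t reported)
{
    if (device_max < 0)
        return 0;                    /* defensive: no usable device limit */
    if (device_max < (int)reported)
        return (uint8_t)device_max;  /* device is the tighter bound */
    return reported;                 /* peer's request is the tighter bound */
}
```

For responder_resources the device limit would be max_qp_rd_atom, and for initiator_depth it would be max_qp_init_rd_atom, each paired with the corresponding field from the connect request event.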

    INFINIBAND SPECIFIC

    +In addition to the connection properties defined above, InfiniBand QPs are +configured with minimum RNR NAK timer and local ACK timeout values. The minimum +RNR NAK timer value is set to 0, for a delay of 655 ms. The local ACK timeout is +calculated based on the packet lifetime and local HCA ACK delay. The packet +lifetime is determined by the InfiniBand Subnet Administrator and is part of the +route (path record) information obtained by the active side of the connection. +The HCA ACK delay is a property of the locally used HCA. The RNR retry count is +a 3-bit value.
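For orientation, the local ACK timeout held in a QP is an encoded exponent rather than a duration: in InfiniBand the wait time is 4.096 microseconds multiplied by 2 to the power of the timeout value, and a value of 0 means wait indefinitely. The sketch below decodes that field. The helper name ack_timeout_usec is hypothetical, and the sketch does not reproduce how the librdmacm derives the value from the packet lifetime and HCA ACK delay.

```c
/*
 * Decode a 5-bit local ACK timeout field into microseconds.
 * Returns a negative value when there is no finite timeout to report:
 * 0 encodes "infinite", and values above 31 do not fit the field.
 */
double ack_timeout_usec(unsigned int timeout)
{
    if (timeout == 0 || timeout > 31)
        return -1.0;
    return 4.096 * (double)(1UL << timeout);   /* 4.096 usec * 2^timeout */
}
```

A field value of 14, for example, decodes to roughly 67 milliseconds. In the same spirit, the 3-bit RNR retry count mentioned above can hold values from 0 to 7.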

    SEE ALSO

    +rdma_listen, rdma_reject, +rdma_get_cm_event +

     

    +


    +RDMA_CONNECT

    +
    +

    NAME

    +rdma_connect - Initiate an active connection request.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_connect (struct +rdma_cm_id *id, struct rdma_conn_param *conn_param); +

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    conn_param
    +
    Connection parameters. See CONNECTION PROPERTIES below for details. +
    +
    +

    DESCRIPTION

    +For an rdma_cm_id of type RDMA_PS_TCP, this call initiates a connection request +to a remote destination. For an rdma_cm_id of type RDMA_PS_UDP, it initiates a +lookup of the remote QP providing the datagram service.

    NOTES

    +Users must have resolved a route to the destination address by having called +rdma_resolve_route before calling this routine. +

    CONNECTION PROPERTIES

    +The following properties are used to configure the communication and specified +by the conn_param parameter when connecting or establishing datagram +communication. +
    +
    private_data
    +
    References a user-controlled data buffer. The contents of the buffer are + copied and transparently passed to the remote side as part of the + communication request. May be NULL if private_data is not required. +
    +
    private_data_len
    +
    Specifies the size of the user-controlled data buffer. Note that the + actual amount of data transferred to the remote side is transport dependent + and may be larger than that requested. +
    +
    responder_resources
    +
    The maximum number of outstanding RDMA read and atomic operations that + the local side will accept from the remote side. Applies only to RDMA_PS_TCP. + This value must be less than or equal to the local RDMA device attribute + max_qp_rd_atom and remote RDMA device attribute max_qp_init_rd_atom. The + remote endpoint can adjust this value when accepting the connection. +
    +
    initiator_depth
    +
    The maximum number of outstanding RDMA read and atomic operations that + the local side will have to the remote side. Applies only to RDMA_PS_TCP. + This value must be less than or equal to the local RDMA device attribute + max_qp_init_rd_atom and remote RDMA device attribute max_qp_rd_atom. The + remote endpoint can adjust this value when accepting the connection. +
    +
    flow_control
    +
    Specifies if hardware flow control is available. This value is exchanged + with the remote peer and is not used to configure the QP. Applies only to + RDMA_PS_TCP. +
    +
    retry_count
    +
    The maximum number of times that a data transfer operation should be + retried on the connection when an error occurs. This setting controls the + number of times to retry send, RDMA, and atomic operations when timeouts + occur. Applies only to RDMA_PS_TCP. +
    +
    rnr_retry_count
    +
    The maximum number of times that a send operation from the remote peer + should be retried on a connection after receiving a receiver not ready (RNR) + error. RNR errors are generated when a send request arrives before a buffer + has been posted to receive the incoming data. Applies only to RDMA_PS_TCP. +
    +
    srq
    +
    Specifies if the QP associated with the connection is using a shared + receive queue. This field is ignored by the library if a QP has been created + on the rdma_cm_id. Applies only to RDMA_PS_TCP. +
    +
    qp_num
    +
    Specifies the QP number associated with the connection. This field is + ignored by the library if a QP has been created on the rdma_cm_id. Applies + only to RDMA_PS_TCP. +
    +
    +
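The limits on responder_resources and initiator_depth described above can be sketched as a pair of clamping helpers. This is only an illustration: struct rd_atom_limits and both function names are hypothetical stand-ins, and a real program would read max_qp_rd_atom and max_qp_init_rd_atom from the device attributes rather than from this structure.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two device limits named in the text. */
struct rd_atom_limits {
    unsigned int max_qp_rd_atom;      /* operations accepted as responder */
    unsigned int max_qp_init_rd_atom; /* operations issued as initiator */
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* responder_resources may not exceed the local max_qp_rd_atom or the
 * remote max_qp_init_rd_atom. */
static uint8_t clamp_responder_resources(uint8_t wanted,
                                         const struct rd_atom_limits *local,
                                         const struct rd_atom_limits *remote)
{
    unsigned int limit = min_uint(local->max_qp_rd_atom,
                                  remote->max_qp_init_rd_atom);
    return (uint8_t)min_uint(wanted, limit);
}

/* initiator_depth may not exceed the local max_qp_init_rd_atom or the
 * remote max_qp_rd_atom. */
static uint8_t clamp_initiator_depth(uint8_t wanted,
                                     const struct rd_atom_limits *local,
                                     const struct rd_atom_limits *remote)
{
    unsigned int limit = min_uint(local->max_qp_init_rd_atom,
                                  remote->max_qp_rd_atom);
    return (uint8_t)min_uint(wanted, limit);
}
```

The clamped values would then be stored in the responder_resources and initiator_depth fields of the conn_param passed to rdma_connect.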

    INFINIBAND SPECIFIC

    +In addition to the connection properties defined above, InfiniBand QPs are +configured with minimum RNR NAK timer and local ACK timeout values. The minimum +RNR NAK timer value is set to 0, for a delay of 655 ms. The local ACK timeout is +calculated based on the packet lifetime and local HCA ACK delay. The packet +lifetime is determined by the InfiniBand Subnet Administrator and is part of the +resolved route (path record) information. The HCA ACK delay is a property of the +locally used HCA. Retry count and RNR retry count values are 3-bit values.
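Because retry count and RNR retry count are 3-bit values, only 0 through 7 can be represented. A minimal sketch of keeping a requested value within that range follows; the helper name clamp_retry_3bit and the macro RETRY_FIELD_MAX are hypothetical and not part of the librdmacm.

```c
#include <stdint.h>

#define RETRY_FIELD_MAX 7u /* largest value that fits in 3 bits */

/* Limit a requested retry count to the 3-bit range 0..7. */
static uint8_t clamp_retry_3bit(unsigned int requested)
{
    return (uint8_t)(requested > RETRY_FIELD_MAX ? RETRY_FIELD_MAX
                                                 : requested);
}
```

A caller might apply this to retry_count and rnr_retry_count before filling in conn_param.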

    +IWARP SPECIFIC

    +Connections established over iWarp RDMA devices currently require that the +active side of the connection send the first message. +

    SEE ALSO

    +rdma_cm, +rdma_create_id, +rdma_resolve_route, rdma_disconnect, +rdma_listen, +rdma_get_cm_event +

     

    +


    +RDMA_DISCONNECT

    +
    +

    NAME

    +rdma_disconnect - This function disconnects a connection.   +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_disconnect (struct +rdma_cm_id *id);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    +

    DESCRIPTION

    +Disconnects a connection and transitions any associated QP to the error state, +which will flush any posted work requests to the completion queue. This routine +may be called by both the client and server side of a connection. After +successfully disconnecting, an RDMA_CM_EVENT_DISCONNECTED event will be +generated on both sides of the connection.

    SEE ALSO

    +rdma_connect, rdma_listen, +rdma_accept, +rdma_get_cm_event

     

    +


    +RDMA_RESOLVE_ROUTE

    +
    +

    NAME

    +rdma_resolve_route - Resolve the route information needed to establish a +connection.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_resolve_route (struct +rdma_cm_id *id, int timeout_ms);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    timeout_ms
    +
    Time to wait for resolution to complete. +
    +
    +

    DESCRIPTION

    +Resolves an RDMA route to the destination address in order to establish a +connection. The destination address must have already been resolved by calling +rdma_resolve_addr. +

    NOTES

    +This is called on the client side of a connection after calling +rdma_resolve_addr, but before calling rdma_connect.
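The client-side ordering described above (rdma_resolve_addr, then rdma_resolve_route, then rdma_connect) can be tracked by the application with a small step check. The enum and function below are hypothetical application-side bookkeeping, not library types, and are shown only as a sketch.

```c
/* Hypothetical client-side steps, listed in the order they must occur. */
enum client_step {
    STEP_ID_CREATED,     /* rdma_cm_id exists, nothing resolved yet */
    STEP_ADDR_RESOLVED,  /* rdma_resolve_addr has completed */
    STEP_ROUTE_RESOLVED, /* rdma_resolve_route has completed */
    STEP_CONNECT_ISSUED  /* rdma_connect has been called */
};

/* Returns 1 if `next` is the step that directly follows `current`. */
static int step_allowed(enum client_step current, enum client_step next)
{
    return (int)next == (int)current + 1;
}
```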

    INFINIBAND SPECIFIC

    +This call obtains a path record that is used by the connection. +

    SEE ALSO

    +rdma_resolve_addr, +rdma_connect, rdma_get_cm_event +

     

    +


    +RDMA_BIND_ADDR

    +
    +

    NAME

    +rdma_bind_addr - Bind an RDMA identifier to a source address.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_bind_addr (struct +rdma_cm_id *id, struct sockaddr *addr); +

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    addr
    +
    Local address information. Wildcard values are permitted. +
    +
    +

    DESCRIPTION

    +Associates a source address with an rdma_cm_id. The address may be wildcarded. +If binding to a specific local address, the rdma_cm_id will also be bound to a +local RDMA device.

    NOTES

    +Typically, this routine is called before calling rdma_listen to bind to a +specific port number, but it may also be called on the active side of a +connection before calling rdma_resolve_addr to bind to a specific address. If +used to bind to port 0, the rdma_cm will select an available port, which can be +retrieved with rdma_get_src_port.
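A wildcard address of the kind described above might be prepared as follows before being passed to rdma_bind_addr. This sketch assumes IPv4 and POSIX socket headers (a Windows build would include the Winsock headers instead); the helper name make_wildcard_addr is hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Fill `sin` with an IPv4 wildcard address. `port` is in host byte
 * order; pass 0 to let the rdma_cm select an available port. */
static void make_wildcard_addr(struct sockaddr_in *sin, unsigned short port)
{
    memset(sin, 0, sizeof(*sin));
    sin->sin_family = AF_INET;
    sin->sin_addr.s_addr = htonl(INADDR_ANY);
    sin->sin_port = htons(port);
}
```

The result would be cast to struct sockaddr * for the addr argument.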

    SEE ALSO

    +rdma_create_id, rdma_listen, +rdma_resolve_addr, +rdma_create_qp, rdma_get_local_addr, +rdma_get_src_port +

     

    +


    +RDMA_LISTEN

    +
    +

    NAME

    +rdma_listen - Listen for incoming connection requests.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_listen (struct +rdma_cm_id *id, int backlog);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    backlog
    +
    Backlog of incoming connection requests. +
    +
    +

    DESCRIPTION

    +Initiates a listen for incoming connection requests or datagram service lookup. +The listen will be restricted to the locally bound source address. +

    NOTES

    +Users must have bound the rdma_cm_id to a local address by calling +rdma_bind_addr before calling this routine. If the rdma_cm_id is bound to a +specific IP address, the listen will be restricted to that address and the +associated RDMA device. If the rdma_cm_id is bound to an RDMA port number only, +the listen will occur across all RDMA devices.

    SEE ALSO

    +rdma_cm, +rdma_bind_addr, +rdma_connect, rdma_accept, +rdma_reject, rdma_get_cm_event +

     

    +


    +RDMA_REJECT

    +
    +

    NAME

    +rdma_reject - Called to reject a connection request.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_reject (struct +rdma_cm_id *id, const void *private_data, +uint8_t private_data_len);

    +

    ARGUMENTS

    +
    +
    id
    +
    Connection identifier associated with the request. +
    +
    private_data
    +
    Optional private data to send with the reject message. +
    +
    private_data_len
    +
    Specifies the size of the user-controlled data buffer. Note that the + actual amount of data transferred to the remote side is transport dependent + and may be larger than that requested. +
    +
    +

    DESCRIPTION

    +Called from the listening side to reject a connection or datagram service lookup +request.

    NOTES

    +After receiving a connection request event, a user may call rdma_reject to +reject the request. If the underlying RDMA transport supports private data in +the reject message, the specified data will be passed to the remote side. +
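Since private_data_len is a uint8_t in the synopsis, a buffer longer than 255 bytes cannot be described by it. A sketch of a length check an application might run before calling rdma_reject is shown below. The helper name is hypothetical, and the underlying transport may impose a smaller limit than this check enforces.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores the narrowed length in *out if `len` fits in a
 * uint8_t; returns 0 and leaves *out untouched otherwise. */
static int private_data_len_ok(size_t len, uint8_t *out)
{
    if (len > UINT8_MAX)
        return 0;
    *out = (uint8_t)len;
    return 1;
}
```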

    SEE ALSO

    +rdma_listen, rdma_accept, +rdma_get_cm_event +

     

    +


    +RDMA_GET_SRC_PORT

    +
    +

    NAME

    +rdma_get_src_port - Returns the local port number of a bound rdma_cm_id.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     uint16_t rdma_get_src_port +(struct rdma_cm_id *id);  

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    +

    DESCRIPTION

    +Returns the local port number for an rdma_cm_id that has been bound to a local +address.
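The sketch below converts a returned port for display. It assumes the value is in network byte order, as it is in the Linux librdmacm; this manual does not state the byte order, so verify it against the header in use. The helper name port_for_display is hypothetical.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Convert a port assumed to be in network byte order to host order. */
static unsigned int port_for_display(uint16_t port_net_order)
{
    return ntohs(port_net_order);
}
```

Under that assumption, a caller would pass the result of rdma_get_src_port directly to this helper.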

    SEE ALSO

    +rdma_bind_addr, +rdma_resolve_addr, rdma_get_dst_port, +rdma_get_local_addr, +rdma_get_peer_addr +

     

    +


    +RDMA_GET_DST_PORT

    +
    +

    NAME

    +rdma_get_dst_port - Returns the remote port number of a bound rdma_cm_id.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     uint16_t rdma_get_dst_port +(struct rdma_cm_id *id);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier.
    +
    +

    DESCRIPTION

    +Returns the remote port number for an rdma_cm_id that has been bound to a remote +address.

    SEE ALSO

    +rdma_connect, rdma_accept, +rdma_get_cm_event, +rdma_get_src_port, rdma_get_local_addr, +rdma_get_peer_addr +

     

    +


    +RDMA_GET_LOCAL_ADDR

    +
    +

    NAME

    +rdma_get_local_addr - Returns the local IP address of a bound rdma_cm_id. +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     struct sockaddr * rdma_get_local_addr +(struct rdma_cm_id *id);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    +

    DESCRIPTION

    +Returns the local IP address for an rdma_cm_id that has been bound to a local +device.

    SEE ALSO

    +rdma_bind_addr, +rdma_resolve_addr, rdma_get_src_port, +rdma_get_dst_port, +rdma_get_peer_addr +

     

    +


    +RDMA_GET_PEER_ADDR

    +
    +

    NAME

    +rdma_get_peer_addr - Returns the remote IP address of a bound rdma_cm_id.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     struct sockaddr * rdma_get_peer_addr +(struct rdma_cm_id *id);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier.
    +
    +

    DESCRIPTION

    +Returns the remote IP address associated with an rdma_cm_id.

    SEE ALSO

    +rdma_resolve_addr, +rdma_get_src_port, rdma_get_dst_port, +rdma_get_local_addr

     

    +


    +RDMA_EVENT_STR

    +
    +

    NAME

    +rdma_event_str - Returns a string representation of an rdma cm event.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     char * rdma_event_str (enum rdma_cm_event_type +event);

    +

    ARGUMENTS

    +
    +
    event
    +
    Asynchronous event.
    +
    +

    DESCRIPTION

    +Returns a string representation of an asynchronous event.

    SEE ALSO

    +rdma_get_cm_event

     

    +


    +RDMA_JOIN_MULTICAST

    +
    +

    NAME

    +rdma_join_multicast - Joins a multicast group.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_join_multicast (struct +rdma_cm_id *id, struct sockaddr *addr, +void *context);

    +

    ARGUMENTS

    +
    +
    id
    +
    Communication identifier associated with the request. +
    +
    addr
    +
    Multicast address identifying the group to join. +
    +
    context
    +
    User-defined context associated with the join request.
    +
    +

    DESCRIPTION

    +Joins a multicast group and attaches an associated QP to the group.

    NOTES

    +Before joining a multicast group, the rdma_cm_id must be bound to an RDMA device +by calling rdma_bind_addr or rdma_resolve_addr. Use of rdma_resolve_addr +requires the local routing tables to resolve the multicast address to an RDMA +device, unless a specific source address is provided. The user must call +rdma_leave_multicast to leave the multicast group and release any multicast +resources. After the join operation completes, any associated QP is +automatically attached to the multicast group, and the join context is returned +to the user through the private_data field in the rdma_cm_event.
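An application may wish to confirm that addr really is a multicast address before requesting a join. The sketch below does so with standard socket macros. It assumes POSIX headers, the helper name is hypothetical, and the check is not part of the librdmacm.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Returns 1 if `sa` holds an IPv4 or IPv6 multicast address, else 0. */
static int is_multicast_addr(const struct sockaddr *sa)
{
    if (sa->sa_family == AF_INET) {
        const struct sockaddr_in *sin = (const struct sockaddr_in *)sa;
        return IN_MULTICAST(ntohl(sin->sin_addr.s_addr)) ? 1 : 0;
    }
    if (sa->sa_family == AF_INET6) {
        const struct sockaddr_in6 *sin6 = (const struct sockaddr_in6 *)sa;
        return IN6_IS_ADDR_MULTICAST(&sin6->sin6_addr) ? 1 : 0;
    }
    return 0;
}
```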

    SEE ALSO

    +rdma_leave_multicast, +rdma_bind_addr, +rdma_resolve_addr, rdma_create_qp, +rdma_get_cm_event +

     

    +


    +RDMA_LEAVE_MULTICAST

    +
    +

    NAME

    +rdma_leave_multicast - Leaves a multicast group. +

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_leave_multicast (struct +rdma_cm_id *id, struct sockaddr *addr);

    +

    ARGUMENTS

    +
    +
    id
    +
    Communication identifier associated with the request. +
    +
    addr
    +
    Multicast address identifying the group to leave.
    +
    +

    DESCRIPTION

    +Leaves a multicast group and detaches an associated QP from the group.

    NOTES

    +Calling this function before a group has been fully joined results in canceling +the join operation. Users should be aware that messages received from the +multicast group may still be queued for completion processing immediately +after leaving a multicast group. Destroying an rdma_cm_id will automatically +leave all multicast groups.

    SEE ALSO

    +rdma_join_multicast, +rdma_destroy_qp +

     

    +


    +RDMA_SET_OPTION

    +
    +

    NAME

    +rdma_set_option - Set communication options for an rdma_cm_id.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_set_option (struct +rdma_cm_id *id, int level, int +optname, void *optval, size_t optlen); +

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    level
    +
    Protocol level of the option to set. +
    +
    optname
    +
    Name of the option, relative to the level, to set. +
    +
    optval
    +
    Reference to the option data. The data is dependent on the level and + optname. +
    +
    optlen
    +
    The size of the optval buffer. +
    +
    +

    DESCRIPTION

    +Sets communication options for an rdma_cm_id. This call is used to override the +default system settings.

    NOTES

    +Option details may be found in the relevant header files.

    SEE ALSO

    +rdma_create_id +

     

    +


    +RDMA_GET_DEVICES

    +
    +

    NAME

    +rdma_get_devices - Get a list of RDMA devices currently available.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     struct ibv_context ** rdma_get_devices +(int *num_devices);

    +

    ARGUMENTS

    +
    +
    num_devices
    +
    If non-NULL, set to the number of devices returned. +
    +
    +

    DESCRIPTION

    +Return a NULL-terminated array of opened RDMA devices. Callers can use this +routine to allocate resources on specific RDMA devices that will be shared +across multiple rdma_cm_id's.

    NOTES

    +The returned array must be released by calling rdma_free_devices. Devices remain +opened while the librdmacm is loaded. +
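Because the returned array is NULL-terminated, it can be walked without the num_devices output. A minimal sketch follows. The helper name count_devices is hypothetical, and struct ibv_context is only forward-declared here, where a real program would include the libibverbs header.

```c
#include <stddef.h>

struct ibv_context; /* opaque in this sketch; defined by libibverbs */

/* Count the entries in a NULL-terminated device array. */
static int count_devices(struct ibv_context **list)
{
    int n = 0;

    if (list == NULL)
        return 0;
    while (list[n] != NULL)
        n++;
    return n;
}
```

The same loop shape applies when visiting each device to allocate shared resources, after which the array is released with rdma_free_devices.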

    SEE ALSO

    +rdma_free_devices +

     

    +


    +RDMA_FREE_DEVICES

    +
    +

    NAME

    +rdma_free_devices - Frees the list of devices returned by rdma_get_devices.

    +SYNOPSIS

    +#include <rdma/rdma_cma.h>

     void rdma_free_devices (struct +ibv_context **list);

    +

    ARGUMENTS

    +
    +
    list
    +
    List of devices returned from rdma_get_devices.
    +
    +

    DESCRIPTION

    +Frees the device array returned by rdma_get_devices.

    SEE ALSO

    +rdma_get_devices +

     

    +


    +RDMA_NOTIFY

    +
    +

    NAME

    +rdma_notify - Notifies the librdmacm of an asynchronous event.

    SYNOPSIS

    +#include <rdma/rdma_cma.h>

     int rdma_notify (struct +rdma_cm_id *id, enum ibv_event_type event);

    +

    ARGUMENTS

    +
    +
    id
    +
    RDMA identifier. +
    +
    event
    +
    Asynchronous event.
    +
    +

    DESCRIPTION

    +Used to notify the librdmacm of asynchronous events that have occurred on a QP +associated with the rdma_cm_id.

    NOTES

    +Asynchronous events that occur on a QP are reported through the user's device +event handler. This routine is used to notify the librdmacm of communication +events. In most cases, use of this routine is not necessary. However, if +connection establishment is done out of band (such as through InfiniBand), +it is possible to receive data on a QP that is not yet considered connected. This +routine forces the connection into an established state in this case in order to +handle the rare situation where the connection never forms on its own. Events +that should be reported to the CM are: IBV_EVENT_COMM_EST.

    SEE ALSO

    +rdma_connect, rdma_accept, +rdma_listen +

    + +<return-to-top>

    +

     

    +

     

    +

    WinVerbs

    + +
    +
    +

    WinVerbs is a userspace verbs and communication management interface optimized
    for the Windows operating system. Its lower interface is designed to support
    any RDMA-based device, including InfiniBand and future RDMA devices. Its upper interface is
    capable of providing a low latency verbs interface, plus supports Microsoft's
    NetworkDirect Interface, DAPL and OFED components: libibverbs, libibmad, rdma_cm interfaces and numerous OFED IB