Sean Hefty [Mon, 13 Dec 2010 18:35:40 +0000 (10:35 -0800)]
ibacm: add support for path lookup by LID/GID
Support path record lookup based on DLID or DGID. This allows
ibacm to be used purely for path record caching, with address
resolution being done by a user. This is needed to support
clients that exchange LID/GID information out of band before
connecting, or that make use of ipoib for address resolution.
Sean Hefty [Thu, 16 Dec 2010 01:23:26 +0000 (17:23 -0800)]
ibacm: fix lmc usage
Convert lmc to actual mask for use in slid comparisons. All SA
queries are initiated from the base lid, so set the src_path_bits
to 0. It is currently set incorrectly based on using the lmc as
a mask against the base lid.
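
For illustration only, the mask conversion described above might look like the following sketch. LMC is a bit count, so a port with LMC n owns 2^n consecutive LIDs starting at its base LID. The function names are hypothetical and are not taken from the ibacm source.

```c
#include <stdint.h>

/* Convert an LMC bit count into a mask over the low-order LID bits. */
static uint16_t lmc_to_mask(uint8_t lmc)
{
    return (uint16_t)((1u << lmc) - 1);
}

/* Return nonzero if slid falls inside the LID range owned by the port
 * whose base LID is base_lid. */
static int lid_matches_port(uint16_t slid, uint16_t base_lid, uint8_t lmc)
{
    uint16_t mask = lmc_to_mask(lmc);

    return (slid & (uint16_t)~mask) == base_lid;
}
```

With LMC 2 and base LID 0x10, SLIDs 0x10 through 0x13 match. A query sent from the base LID carries src_path_bits of 0.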
Sean Hefty [Wed, 8 Dec 2010 19:34:36 +0000 (11:34 -0800)]
ib_acm: change default log file
Change the default logging file from stdout to
/var/log/ibacm.log, so we create a usable log file if
ib_acm is run without the user first having run ib_acme -O.
Sean Hefty [Tue, 7 Dec 2010 21:44:44 +0000 (13:44 -0800)]
ib_acme: add an 'unspecified' address format
Support an 'unspecified' address format. If no address
format is given, ib_acme will use getaddrinfo to try to
resolve the destination address.
This simplifies the user interface and lets the user
specify a hostname as input while getaddrinfo resolves
it into an IP address before ib_acme contacts the ACM
service. On a large cluster, the user can then specify
hostnames and still have ib_acme populate the ib_acm
caches with IP address data. This benefits MPI applications
that exchange IP addresses out of band for communication
but rely on hostnames for job distribution.
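
As a rough sketch of the hostname-to-IP step, the following uses the standard getaddrinfo call. The helper name resolve_to_ip and the IPv4-only hint are assumptions made for brevity; this is not the ib_acme code.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <string.h>

/* Resolve a hostname (or numeric string) to a printable IPv4 address.
 * Returns 0 on success or a getaddrinfo error code on failure. */
static int resolve_to_ip(const char *host, char *buf, size_t len)
{
    struct addrinfo hints, *res;
    struct sockaddr_in *sin;
    int ret;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;          /* assumption: IPv4 only */

    ret = getaddrinfo(host, NULL, &hints, &res);
    if (ret)
        return ret;

    sin = (struct sockaddr_in *)res->ai_addr;
    if (!inet_ntop(AF_INET, &sin->sin_addr, buf, (socklen_t)len))
        ret = EAI_SYSTEM;

    freeaddrinfo(res);
    return ret;
}
```

The resulting IP string, rather than the hostname, would then be handed to the ACM service.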
Sean Hefty [Tue, 7 Dec 2010 17:39:52 +0000 (09:39 -0800)]
ib_acme: Allow user to specify multiple destinations
In order to test the ACM service on large clusters and
force population of cached data, allow a user to indicate
that ib_acme should resolve information for multiple
destinations.
Multiple destinations can be specified by using bracket
syntax, such as [1-10,12,15-20], at the end of the
destination name.
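
The bracket syntax can be expanded along the following lines. This is a minimal sketch: the function name expand_dest, the fixed name length, and the exact formatting of generated names (no zero padding) are assumptions, not a description of how ib_acme parses its input.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NAME_LEN 64

/* Expand "node[1-3,7]" into node1, node2, node3, node7.
 * Returns the number of names written, or -1 on a malformed pattern
 * or if more than max names would be produced. */
static int expand_dest(const char *pattern, char names[][NAME_LEN], int max)
{
    const char *open = strchr(pattern, '[');
    const char *p;
    int prefix_len, count = 0;

    if (!open) {                        /* no brackets: one destination */
        if (max < 1)
            return -1;
        snprintf(names[0], NAME_LEN, "%s", pattern);
        return 1;
    }

    prefix_len = (int)(open - pattern);
    p = open + 1;

    while (*p && *p != ']') {
        char *end;
        long first, last;

        if (!isdigit((unsigned char)*p))
            return -1;
        first = last = strtol(p, &end, 10);
        p = end;
        if (*p == '-') {                /* a range such as 15-20 */
            if (!isdigit((unsigned char)p[1]))
                return -1;
            last = strtol(p + 1, &end, 10);
            p = end;
        }
        if (last < first)
            return -1;
        for (long i = first; i <= last; i++) {
            if (count >= max)
                return -1;
            snprintf(names[count++], NAME_LEN, "%.*s%ld",
                     prefix_len, pattern, i);
        }
        if (*p == ',')
            p++;
        else if (*p != ']')
            return -1;
    }
    return *p == ']' ? count : -1;
}
```

Under these assumptions, a destination of node[1-10,12,15-20] expands to 17 names.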
Sean Hefty [Mon, 6 Dec 2010 21:07:41 +0000 (13:07 -0800)]
ibacm: support no delay options
Allow a user to specify that address and/or route resolution
protocols should be suppressed, and that only locally cached
data should be returned.
This helps support rdma_getaddrinfo options RAI_NUMERICHOST
and RAI_NOROUTE. If data for a request is available, it is
immediately returned. Otherwise, the client request is
failed, but the lookup is still initiated. This avoids
blocking a client for an extended period of time while
resolution completes, but allows future calls to find cached
data.
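
The no-delay behavior can be sketched as below. The types and names (cache_entry, handle_request, the result codes) are hypothetical and only model the decision described above; the real service tracks considerably more state.

```c
#include <stdbool.h>

enum lookup_result { LOOKUP_OK, LOOKUP_NOT_CACHED, LOOKUP_PENDING };

struct cache_entry {
    bool resolved;      /* data is available in the cache */
    bool in_progress;   /* a resolution has already been started */
};

/* Hypothetical request handler: with nodelay set, never block the client. */
static enum lookup_result handle_request(struct cache_entry *e, bool nodelay)
{
    if (e->resolved)
        return LOOKUP_OK;           /* answer immediately from the cache */

    if (!e->in_progress)
        e->in_progress = true;      /* start resolution in either case */

    /* nodelay: fail the request now. Otherwise the caller would queue
     * the request until resolution completes. */
    return nodelay ? LOOKUP_NOT_CACHED : LOOKUP_PENDING;
}
```

A later call for the same destination finds the entry resolved and returns cached data without waiting.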
Sean Hefty [Mon, 6 Dec 2010 19:51:38 +0000 (11:51 -0800)]
ibacm: write port to /var/run/ibacm.port
Write used port data to /var/run/ibacm.port. This will allow
librdmacm and other libraries and applications to find the
ibacm service when it has been moved from its default port.
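
A client could locate the service roughly as follows. This sketch assumes the file holds a single decimal port number, which the commit message does not specify; the function name and the fallback parameter are also hypothetical.

```c
#include <stdio.h>

/* Read a TCP port number from path. Fall back to default_port if the
 * file is missing or does not contain a valid port. */
static unsigned short read_service_port(const char *path,
                                        unsigned short default_port)
{
    FILE *f = fopen(path, "r");
    unsigned int port;

    if (!f)
        return default_port;
    if (fscanf(f, "%u", &port) != 1 || port == 0 || port > 65535)
        port = default_port;
    fclose(f);
    return (unsigned short)port;
}
```

A caller would pass /var/run/ibacm.port as the path and the service's compiled-in default port as the fallback.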
Sean Hefty [Mon, 6 Dec 2010 19:06:47 +0000 (11:06 -0800)]
ibacm: automatically generate addresses if missing acm_addr.cfg
If the acm_addr.cfg file is missing, automatically attempt to
generate the file. If no addresses can be assigned to any
endpoints because of a missing file, log an error and exit
the service. This avoids running a service that is basically
useless.
This change allows the ib_acm service to execute using defaults
without requiring that a user first run 'ib_acme -A -O' or
otherwise create the acm configuration files.
Sean Hefty [Mon, 6 Dec 2010 16:42:43 +0000 (08:42 -0800)]
ibacm: modify logging output
Move some level 2 debug output to level 1, some level 0 to 1,
and mark output with 'notice' or 'ERROR' where needed. This
will help debug scaling problems on large clusters without needing to
parse through volumes of data. The original messages were guesses
anyway.
Sean Hefty [Fri, 3 Dec 2010 20:52:49 +0000 (12:52 -0800)]
ib_acme: Hide output for -A -O options unless verbose
Do not display the contents of generated config files by
default. Add a verbose option to ib_acme and only display
the contents if verbose is enabled. This reduces the amount
of noise when using ib_acme to configure address information
across a large cluster.
Sean Hefty [Thu, 2 Dec 2010 22:12:56 +0000 (14:12 -0800)]
ibacm: Add lock to prevent multiple daemons from running
Use a lock file to prevent multiple daemons from running
simultaneously.
Without this lock, a second instance of ib_acm eventually
fails to bind to the server's TCP port and exits, but not
before it overwrites a portion of the log file.
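
One common way to implement such a lock is sketched below. The commit message does not say which locking primitive is used, so the choice of flock here is an assumption, as are the function name and file mode.

```c
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Try to take an exclusive, non-blocking lock on path.
 * Returns a descriptor that must stay open for the life of the daemon,
 * or -1 if the file cannot be opened or the lock is already held. */
static int acquire_daemon_lock(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        close(fd);                  /* another instance holds the lock */
        return -1;
    }
    return fd;
}
```

Taking the lock before opening the log file would let a second instance exit without touching the log.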
Sean Hefty [Wed, 1 Dec 2010 03:38:15 +0000 (19:38 -0800)]
ibacm: decrease default retries
With 15 retries, 2 seconds per timeout, the total timeout ends
up being too long when running on large clusters, where a node
can drop off at any time. Decrease the default number of retries
to 2.
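
To put rough numbers on the change, the sketch below computes a worst-case wait. It assumes one initial send followed by the configured number of retries, each waiting the full timeout; whether ibacm counts the first attempt separately is not stated in the commit message.

```c
/* Worst-case wait in milliseconds before a request is declared failed,
 * assuming one initial send plus `retries` resends. */
static unsigned int worst_case_wait_ms(unsigned int retries,
                                       unsigned int timeout_ms)
{
    return (retries + 1) * timeout_ms;
}
```

Under that assumption, 15 retries at 2 seconds give about 32 seconds per unreachable node, while 2 retries give about 6 seconds.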
Sean Hefty [Tue, 16 Nov 2010 18:39:24 +0000 (10:39 -0800)]
ibacm: Introduce loopback resolution 'protocol'
OFED 1.5.2 introduced a kernel patch that disables looping
back multicast messages to the sending QP. This capability
was enabled by default. The current ACM address resolution
protocol depends on this feature of multicast. To handle
the case where multicast loopback has been disabled, add an
option to ACM to resolve loopback addresses using local data
only. This option is enabled by default.
When loopback resolution is set to 'local', ACM will automatically
insert a loopback destination into each endpoint. Requests
to resolve a loopback address will find this destination
and use it to respond to client requests.
Sean Hefty [Mon, 15 Nov 2010 20:08:46 +0000 (12:08 -0800)]
ibacm: fix issuing SA query after recording address
ACM has two ways that it can complete address resolution. The
first is to receive a response to an address resolution query.
The second is to receive a multicast message carrying an address
resolution request. In the second case, the address request
may be between two other nodes.
When this occurs, ACM will record the address information of
the source of the multicast message. However, it's possible
for ACM to be in the process of trying to resolve that address.
After it records the address, it must see if there's an
outstanding request against that address, so that it can kick
off route resolution.
This fixes an issue where ACM will hang resolving addresses
found during scale-up testing.
Sean Hefty [Wed, 17 Nov 2010 00:05:28 +0000 (16:05 -0800)]
ibacm: enhance debug messages
Prefix all log messages with time stamp information. This
provides useful debugging information regarding the timing of
various operations.
To help match up various operations with each other, improve
and simplify logging of endpoint and destination names. We
save printable names with all destinations. The
function to display address information is removed in favor
of using the normal logging function, a call to format address
data, and per thread logging data.
Finally, we add the ability to handle LID/GID addresses in acm_log_addr.
This is required to display SA and multicast destination addresses
in the acm log file.
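
A time-stamped log prefix can be produced along these lines. The timestamp layout and the function name format_log_line are assumptions for illustration; the actual ibacm log format may differ.

```c
#include <stdarg.h>
#include <stdio.h>
#include <sys/time.h>
#include <time.h>

/* Format "<date>T<time>.<msec> <message>" into buf.
 * Returns the number of characters the full line needs,
 * or -1 if the prefix does not fit or formatting fails. */
static int format_log_line(char *buf, size_t len, const char *fmt, ...)
{
    struct timeval tv;
    struct tm tm;
    char stamp[32];
    va_list args;
    int n, m;

    gettimeofday(&tv, NULL);
    localtime_r(&tv.tv_sec, &tm);
    strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%S", &tm);

    n = snprintf(buf, len, "%s.%03ld ", stamp, (long)(tv.tv_usec / 1000));
    if (n < 0 || (size_t)n >= len)
        return -1;

    va_start(args, fmt);
    m = vsnprintf(buf + n, len - n, fmt, args);
    va_end(args);
    return m < 0 ? -1 : n + m;
}
```

Millisecond resolution is usually enough to order resolution requests and responses against each other in the log.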
Sean Hefty [Wed, 25 Aug 2010 17:26:22 +0000 (10:26 -0700)]
ibacm: support distros with older versions of gcc
ibacm implements atomics using gcc intrinsics that were introduced
in gcc 4.1.2. If an older version of gcc is used to compile the
code, an error results. Check that the required atomic calls are
supported, and if not, provide our own implementation.
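
The fallback pattern might be sketched as follows. The macro USE_OWN_ATOMICS is hypothetical (for example, set by a configure-time check), and the mutex-based fallback is only one possible implementation; neither is taken from the ibacm source.

```c
#ifdef USE_OWN_ATOMICS  /* hypothetical: compiler lacks the intrinsics */

#include <pthread.h>

typedef struct {
    pthread_mutex_t lock;
    int val;
} counter_t;

static void counter_init(counter_t *c)
{
    pthread_mutex_init(&c->lock, NULL);
    c->val = 0;
}

static int counter_add(counter_t *c, int delta)
{
    int v;

    pthread_mutex_lock(&c->lock);
    v = (c->val += delta);
    pthread_mutex_unlock(&c->lock);
    return v;
}

#else                   /* use the gcc __sync intrinsics */

typedef struct {
    volatile int val;
} counter_t;

static void counter_init(counter_t *c)
{
    c->val = 0;
}

static int counter_add(counter_t *c, int delta)
{
    return __sync_add_and_fetch(&c->val, delta);
}

#endif

/* Both variants return the new value after the update. */
static int counter_inc(counter_t *c) { return counter_add(c, 1); }
static int counter_dec(counter_t *c) { return counter_add(c, -1); }
```

Callers use counter_inc and counter_dec in both configurations, so the rest of the code does not depend on the compiler version.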
Sean Hefty [Sat, 24 Jul 2010 00:12:24 +0000 (17:12 -0700)]
ibacm: change location of default configuration file
Move the default location of the configuration files from the
current directory to /etc/ibacm. Change ib_acme to create the
files in this location, and modify ibacm to use the files here
by default.
Sean Hefty [Sat, 8 May 2010 21:53:25 +0000 (14:53 -0700)]
ibacm: wait for SA query to finish before sending reply
When using the SA for route resolution, we need to delay sending
back the client response until SA resolution is complete. If
we cache an ACM address for resolution, but we are not the target
of the resolution, do not respond to queued client requests.
Sean Hefty [Wed, 5 May 2010 18:38:39 +0000 (11:38 -0700)]
ibacm: fix log file usage
The ibacm service inserts a space into the log file name. Remove it.
Flush log file entries to disk, since the file is never closed, to avoid
losing entries. Finally, set the default to write the log file to disk,
rather than to stdout.
Sean Hefty [Mon, 3 May 2010 21:21:10 +0000 (14:21 -0700)]
ib/acm: supporting querying the SA for routing data
To support complex routing topologies, we add the ability to
query the SA for path record data when resolving IB routing data.
Address resolution still relies on the ACM protocol, but route
resolution can select between the ACM protocol and SA queries.
Sean Hefty [Thu, 22 Apr 2010 17:16:04 +0000 (10:16 -0700)]
ib/acm: add state tracking
To support alternate address and route resolution protocols,
add state tracking to destinations. This will allow for the
completion of address resolution separate from route
resolution.
As part of this change, we add locking to destinations,
rather than relying on the endpoint locking.
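
Per-destination state tracking with its own lock could look like the sketch below. The state names, the struct layout, and the helper functions are hypothetical and chosen only to show address resolution completing separately from route resolution.

```c
#include <pthread.h>

enum dest_state {
    DEST_INIT,
    DEST_ADDR_PENDING,      /* address resolution in progress */
    DEST_ROUTE_PENDING,     /* address known, route resolution in progress */
    DEST_READY              /* address and route both resolved */
};

struct dest {
    pthread_mutex_t lock;   /* protects state, independent of the endpoint */
    enum dest_state state;
};

static void dest_init(struct dest *d)
{
    pthread_mutex_init(&d->lock, NULL);
    d->state = DEST_INIT;
}

/* Record that address resolution finished.
 * Returns 1 if the caller should now start route resolution. */
static int dest_addr_resolved(struct dest *d)
{
    int start_route = 0;

    pthread_mutex_lock(&d->lock);
    if (d->state == DEST_ADDR_PENDING) {
        d->state = DEST_ROUTE_PENDING;
        start_route = 1;
    }
    pthread_mutex_unlock(&d->lock);
    return start_route;
}
```

Checking and updating the state under the destination's own lock ensures only one caller starts route resolution, even if the address arrives by two paths at once.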
Sean Hefty [Fri, 9 Apr 2010 18:48:10 +0000 (11:48 -0700)]
ibacm: resolve source address if not given
Allow the user to provide only the destination address to
ACM. If a source address is not given, resolve a usable
source address, and return it to the user.
Sean Hefty [Thu, 29 Oct 2009 21:02:22 +0000 (13:02 -0800)]
ibacm: rework socket interface messages
Rather than indicating the src/dst address type in the acm message
header, indicate the type in the acm_ep_addr structure directly.
This adds more flexibility and will permit additional acm_ep_addr
structures to be carried in a single message request or response.
Allow multiple addresses in a request or response message. This
is needed to support a wider range of fabric topologies, where the
inbound and outbound paths may differ, plus support failover.
Add path records to the resolve message, rather than as a separate
operation. Return path records directly from the lookup.
Sean Hefty [Fri, 2 Oct 2009 19:48:47 +0000 (12:48 -0700)]
libacm: open devices once
Instead of opening IB devices on every call to ib_acm_convert_to_path,
open all devices during initialization and store the necessary data.
This optimizes multiple calls to ib_acm_convert_to_path to improve
connection setup times.