The bug is that the driver issues the CLOSE_PORT command before it has closed all resources (such as QPs).
In some cases this causes completions to be lost.
According to the PRM:
18.2 ConnectX Driver Teardown and Re-initialization
The HCA can be shut down (and re-initialized/restarted later on) by software. This operation is performed while the system shuts down gracefully or when PCI bus re-enumeration and memory re-allocation is required. In this case, software should perform the following steps:
• Stop HCA operations (tear-down all QPs and flush WQEs if required).
• Take down the network links by executing the CLOSE_PORT command.
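
The ordering the PRM asks for can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: `struct mock_hca`, `mock_unregister_device()` and `mock_close_port()` are hypothetical stand-ins invented for illustration, not the real mlx4 structures or calls, and the model assumes that unregistering the IB device is what releases the QPs. It merely counts how many ports get closed while QPs are still alive.

```c
#include <assert.h>

/* Hypothetical model of the HCA state; NOT the real mlx4 structures. */
struct mock_hca {
    int num_ports;      /* number of physical ports */
    int ports_open;     /* ports whose link is still up */
    int open_qps;       /* QPs that have not been torn down yet */
    int unsafe_closes;  /* ports closed while QPs were still alive */
};

/* Stand-in for ib_unregister_device(): assumed to release all QPs. */
void mock_unregister_device(struct mock_hca *hca)
{
    hca->open_qps = 0;
}

/* Stand-in for mlx4_CLOSE_PORT(): records a close issued under live QPs. */
void mock_close_port(struct mock_hca *hca, int port)
{
    (void)port;                     /* port number is unused in this model */
    assert(hca->ports_open > 0);    /* never close more ports than are open */
    if (hca->open_qps > 0)
        hca->unsafe_closes++;
    hca->ports_open--;
}

/* Order after this change: release resources first, then close the ports. */
int teardown_fixed(struct mock_hca *hca)
{
    int p;

    mock_unregister_device(hca);
    for (p = 1; p <= hca->num_ports; ++p)
        mock_close_port(hca, p);
    return hca->unsafe_closes;
}

/* Order before this change: ports are closed while QPs still exist. */
int teardown_buggy(struct mock_hca *hca)
{
    int p;

    for (p = 1; p <= hca->num_ports; ++p)
        mock_close_port(hca, p);
    mock_unregister_device(hca);
    return hca->unsafe_closes;
}
```

In this model, a device with two ports and some open QPs reports zero unsafe closes with `teardown_fixed()` and two with `teardown_buggy()`, which mirrors the window in which completions could be lost.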
git-svn-id: svn://openib.tc.cornell.edu/gen1@2066 ad392aa1-c5ef-ae45-8dd8-e69d62a5ef86
ib_unregister_device(&ibdev->ib_dev);
goto dealloc_dev;
}
+
+ ib_unregister_device(&ibdev->ib_dev);
for (p = 1; p <= dev->caps.num_ports; ++p)
mlx4_CLOSE_PORT(dev, p);
- ib_unregister_device(&ibdev->ib_dev);
iounmap(ibdev->uar_map,PAGE_SIZE);
mlx4_uar_free(dev, &ibdev->priv_uar);
mlx4_pd_free(dev, ibdev->priv_pdn);