scm, cma: dapli_thread doesn't always get terminated on library close.
DAPL doesn't actually wait for the async processing thread to exit before
allowing the library to close. It will wait up to 10 seconds, which under
heavy load isn't enough time. Since the thread is created by an
application-level thread, it will continue to run as long as the
application runs. But if the application closes the library, then all
library data and code are invalid, which can result in the thread executing
memory that no longer holds library code and accessing freed memory.
With this change, I was able to run MPI ping-pong, 16 ranks on a single
system (scm provider), 1300 times without a crash.
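
For illustration only, here is a minimal sketch of the general idea of
blocking until a worker thread has really exited instead of giving up after
a fixed timeout. It is not the DAPL code: lib_init, lib_fini, worker_main
and every variable name are hypothetical, and the use of pthread_join is an
assumption about one way to get this guarantee, not a description of what
this commit does.

```c
#include <pthread.h>
#include <stdbool.h>

/* All names below are hypothetical; they are not DAPL symbols. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static bool stop_requested;
static bool worker_exited;
static pthread_t worker_tid;

/* Stand-in for the async processing thread. */
static void *worker_main(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop_requested)
        pthread_cond_wait(&wake, &lock);   /* real code would process events */
    worker_exited = true;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Library open: start the worker. Returns 0 on success. */
int lib_init(void)
{
    stop_requested = false;
    worker_exited = false;
    return pthread_create(&worker_tid, NULL, worker_main, NULL);
}

/* Library close: ask the worker to stop, then block until it has exited.
 * There is no timeout, so the library cannot be unloaded while the worker
 * may still be executing library code. Returns 0 on success. */
int lib_fini(void)
{
    pthread_mutex_lock(&lock);
    stop_requested = true;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);
    return pthread_join(worker_tid, NULL);
}
```

The trade-off in this sketch is that close can block for as long as the
worker needs to finish, rather than returning after a bounded wait while
the thread is still running.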