mpxyd: scale-up with MPI dapl:dapl hits low mem issue with 1 byte traffic patterns
PI and PO buffer management uses the last byte offset of each work
request and assumes it is non-zero. However, a 1 byte RDMA at offset 0
sets m_idx to zero, which is a valid offset, so the buffer is never
marked complete.
Clean up buffer management: add error reporting when setting the
offset, serialize PO by setting it in the post_send op thread, and
start at a cacheline offset instead of 0 when the buffer ring wraps.
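A minimal sketch of the failure and of the wrap change, not the mpxyd
code itself. Only m_idx comes from this commit; CACHELINE_SKETCH,
offset_looks_set() and ring_reserve() are hypothetical names, and the
64 byte cacheline size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SKETCH 64u /* assumed cacheline size, not from mpxyd */

/* Hypothetical check in the old style: a zero m_idx means "not set". */
static bool offset_looks_set(uint32_t m_idx)
{
    return m_idx != 0;
}

/*
 * Hypothetical reservation of len bytes from a ring of ring_len bytes.
 * *head is the next free offset. On wrap the data starts at wrap_start:
 * 0 models the old behavior, CACHELINE_SKETCH models the new one.
 * Stores the last byte offset in *m_idx. Returns 0 on success, or -1
 * when the request cannot be satisfied.
 */
static int ring_reserve(uint32_t *head, uint32_t ring_len, uint32_t len,
                        uint32_t wrap_start, uint32_t *m_idx)
{
    uint32_t start;

    assert(head != NULL && m_idx != NULL);

    /* Report bad requests instead of silently storing an offset. */
    if (len == 0 || wrap_start >= ring_len || len > ring_len - wrap_start)
        return -1;

    start = *head;
    if (start > ring_len || len > ring_len - start)
        start = wrap_start; /* not enough room at the tail: wrap */

    *m_idx = start + len - 1; /* last byte offset of this request */
    *head = start + len;
    return 0;
}
```

With wrap_start set to 0, a 1 byte request after a wrap stores m_idx 0
and offset_looks_set() reports it as unset. With wrap_start set to
CACHELINE_SKETCH, the same request stores 64, so the non-zero check
still holds.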
Signed-off-by: Arlin Davis <arlin.r.davis@intel.com>