This version uses nonblocking operations for both sending and receiving,
primarily to handle the buffering issues. The sends are posted first,
which often allows receiver-pull rendezvous protocols to avoid
synchronization delays, although this is not guaranteed.
A separate example shows the use of nonblocking operations to express the
overlap of communication and computation.