We are running with the caches disabled when mctl_mem_matches gets called,
but the cpu's write buffer is still there and can still get in the way,
add a memory barrier to fix this.
This avoids mctl_mem_matches always returning false in some cases, which
was resulting in: