Add read buffer to bitbang, improving performance.
Previously for every bit scanned OpenOCD would write the bit, wait for
that bit to be scanned, and then read the result. This involves at least
2 context switches. Most of the time the next bit scanned does not
depend on the last bit we read, so with a buffer we now write a bunch of
bits to be scanned all at once, and then we wait for them all to be
scanned and have a result.
This reduces the time for one testcase where OpenOCD connects to a
simulator from 12.30s to 5.35s!