The original aligned_buffer usage:
a) Uselessly copied data into the aligned buffer even for IN
transactions. Fix this my making the copy conditional.
b) Always programmed the HW to transfer to/from the start of the aligned
buffer. This worked fine for OUT transactions since the memcpy copied
the OUT data to this location too. However, for large IN transactions,
since the copy from the aligned buffer to the "client" buffer was
deferred until after all chunks were transferred. it resulted in each
chunk's transfer over-writing the data for the first transfer. Fix
this by copying IN data as soon as it's received.
Signed-off-by: Stephen Warren <swarren@wwwdotorg.org>