2 QTDs are allocated for each EP. The current allocation scheme aligns
the first QTD in each pair, but simply adds the struct size to calculate
the second QTD's address. This will result in a non-cache-aligned
addresss IF the system's ARCH_DMA_MINALIGN is not 32 bytes (i.e. the
size of struct ept_queue_item).
Similarly, the original ilist_ent_sz calculation aligned the value to
ARCH_DMA_MINALIGN but didn't take the USB HW's 32-byte alignment
requirement into account. This doesn't cause a practical issue unless
ARCH_DMA_MINALIGN < 32 (which I suspect is quite unlikely), but we may
as well fix the code to be explicit, so it's obviously completely
correct.
The new value of ILIST_ENT_SZ takes all alignment requirements into
account, so we can simplify ci_{flush,invalidate}_qtd() by simply using
that macro rather than calling roundup().
Similarly, the calculation of controller.items[i] can be simplified,
since each QTD is evenly spaced at its individual alignment requirement,
rather than each pair being aligned, and entries within the pair being
spaced apart only by structure size.
Signed-off-by: Stephen Warren <swarren@nvidia.com>