There was a lot of needless handshaking overhead in the current
Cortex-A8 DCC/ITR operations, since the status read by each step
was discarded rather than letting the next step know it.
This shrinks the handshaking by: (a) passing status along from
previous steps, avoiding re-fetching; which enables the big win
(b) relying on a useful invariant: that the DSCR_INSTR_COMP bit
is set after every call to a DPM method.
A "reg sp_usr" call previously took 17 flushes; now it takes just 9.
This visibly speeds common operations like entry to debug state and
stepping, as well as "arm reg" and so on.
Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>