+
+/*
+This version has optimized assembly routines for 32-bit operations:
+- read word
+- write word
+- write array of words
+
+Be aware that the MIPS32 CPU always executes the instruction that
+follows a branch instruction (the branch delay slot).
+
+For example:
+
+
+ LW $2, 10($5)
+ B foo
+ LW $1, 100($2)
+
+The LW $1, 100($2) instruction sits in the delay slot, so it is
+also executed. If this is not wanted, a NOP can be inserted:
+
+ LW $2, 10($5)
+ B foo
+ NOP
+ LW $1, 100($2)
+
+or the code can be changed to:
+
+ B foo
+ LW $2, 10($5)
+ LW $1, 100($2)
+
+The original code filled the delay slots with NOPs. I have removed
+these and moved the branches so that a useful instruction occupies
+each delay slot.
+
+I also moved the PRACC_STACK to 0xFF204000. This allows
+the use of 16-bit offsets to get pointers to the input
+and output areas relative to the stack. Note that the stack
+isn't really a stack (the stack pointer does not move)
+but a FIFO simulated in software.
+
+These changes result in a 35% speed increase when programming an
+external flash.
+
+Further improvement could be gained if the registers did not need
+to be preserved, but in that case the routines would have to know
+whether OpenOCD is being used as a flash programmer or as a debug
+tool.
+
+Nico Coesel
+*/
+
+