Firstly, this fixes relocation issues. I had to use part of Dcache as RAM for a
while. I also moved around the lowlevel init code. It turned out so most of the
lowlevel init code ended in cpu.c (and eventually was rewritten into C).
This will also allow easier operation with FDT, multi-CPU-model support etc. in
later releases.
NOTE: This breaks most of the PXA boards (actually, the reloc stuff did already,
this only finishes the doom).
Signed-off-by: Marek Vasut <marek.vasut@gmail.com>