1 <!doctype linuxdoc system>
4 <title>Defining a Custom cc65 Target
5 <author>Bruce Reidenbach
This document provides step-by-step instructions on how to use the cc65
toolset for a custom hardware platform (a target system not currently
supported by the cc65 library set).
14 <!-- Table of contents -->
18 <!-- Begin the document -->
22 The cc65 toolset provides a set of pre-defined libraries that allow the
23 user to target the executable image to a variety of hardware platforms.
24 In addition, the user can create a customized environment so that the
executable can be targeted to a custom platform. The following
sections explain, step by step, how to customize the toolset for a
target that is not supported by the standard cc65
30 The platform used in this example is a Xilinx Field Programmable Gate
31 Array (FPGA) with an embedded 65C02 core. The processor core supports
32 the additional opcodes/addressing modes of the 65SC02, along with the
STP and WAI instructions. Following these instructions produces a set of
files that define a custom target, named SBC, for <bf>S</bf>ingle <bf>B</bf>oard
37 <sect>System Memory Map Definition<p>
The targeted system uses block RAM contained in the Xilinx FPGA for the
system memory (both RAM and ROM). The block RAMs are available in
various aspect ratios, and in this system they are used as 2K*8
devices. Two RAMs are used for data storage, starting at
location $0000 and growing upwards. One ROM (realized as
initialized RAM) is used for code storage, starting at location $FFFF and
47 The cc65 toolset requires a memory configuration file to define the
48 memory that is available to the cc65 run-time environment, which is
53 ZP: start = $0, size = $100, type = rw, define = yes;
54 RAM: start = $200, size = $0E00, define = yes;
55 ROM: start = $F800, size = $0800, file = %O;
59 ZP defines the available zero page locations, which in this case starts
60 at $0 and has a length of $100. Keep in mind that certain systems may
61 require access to some zero page locations, so the starting address may
62 need to be adjusted accordingly to prevent cc65 from attempting to reuse
63 those locations. Also, at a minimum, the cc65 run-time environment uses
64 26 zero page locations, so the smallest zero page size that can be
specified is $1A. The usable RAM area begins after the 6502 stack in
page 1, so it is defined as starting at location $200 and filling the
rest of the 4K of RAM (4096 - 2 * 256 = 3584 = $0E00). The 2K of ROM
space begins at
69 address $F800 and goes to $FFFF (size = $0800).
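The arithmetic behind these MEMORY entries can be checked with a few
lines of host-side C. This is only a sketch: the macro and function
names are hypothetical, and the values are the ones used in the MEMORY
definition (two 2K*8 data RAMs starting at $0000, one 2K ROM ending at
$FFFF).

```c
/* Host-side sketch of the SBC memory-map arithmetic.
   Macro and function names are hypothetical; the values come from
   the MEMORY definition. */
#define BLOCK_RAM_SIZE 2048UL   /* one 2K*8 block RAM */
#define DATA_RAMS      2UL      /* block RAMs used for data storage */
#define PAGE_SIZE      256UL    /* one 6502 page */
#define RAM_START      0x0200UL /* first byte after page 0 (ZP) and page 1 (stack) */
#define ROM_SIZE       0x0800UL /* one 2K block RAM used as ROM */

/* Size of the RAM area: all data RAM minus page 0 and page 1. */
unsigned long ram_area_size(void)
{
    return DATA_RAMS * BLOCK_RAM_SIZE - 2 * PAGE_SIZE;   /* 3584 = $0E00 */
}

/* The ROM ends at $FFFF, so it starts ROM_SIZE bytes below $10000. */
unsigned long rom_start(void)
{
    return 0x10000UL - ROM_SIZE;                          /* $F800 */
}

/* Last address (inclusive) occupied by a memory area. */
unsigned long area_last(unsigned long start, unsigned long size)
{
    return start + size - 1;
}
```

With these values the RAM area occupies $0200 through $0FFF and the ROM
occupies $F800 through $FFFF.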
71 Next, the memory segments within the memory devices need to be defined.
72 A standard segment definition is used, with one notable exception. The
73 interrupt and reset vector locations need to be defined at locations
74 $FFFA through $FFFF. A special segment named VECTORS is defined that
75 resides at these locations. Later, the interrupt vector map will be
76 created and placed in the VECTORS segment, and the linker will put these
77 vectors at the proper memory locations. The segment definition is:
81 ZEROPAGE: load = ZP, type = zp, define = yes;
82 DATA: load = ROM, type = rw, define = yes, run = RAM;
83 BSS: load = RAM, type = bss, define = yes;
84 STARTUP: load = ROM, type = ro;
85 ONCE: load = ROM, type = ro, optional = yes;
86 CODE: load = ROM, type = ro;
87 RODATA: load = ROM, type = ro;
88 VECTORS: load = ROM, type = ro, start = $FFFA;
92 The meaning of each of these segments is as follows.
94 <p><tt> ZEROPAGE: </tt>Data in page 0, defined by ZP as starting at $0 with length $100
<p><tt> DATA: </tt>Initialized data that can be modified by the program, loaded into ROM and copied to RAM at startup
96 <p><tt> BSS: </tt>Uninitialized data stored in RAM (used for variable storage)
97 <p><tt> STARTUP: </tt>The program initialization code, stored in ROM
98 <p><tt> ONCE: </tt>The code run once to initialize the system, stored in ROM
99 <p><tt> CODE: </tt>The program code, stored in ROM
100 <p><tt> RODATA: </tt>Initialized data that cannot be modified by the program, stored in ROM
101 <p><tt> VECTORS: </tt>The interrupt vector table, stored in ROM at location $FFFA
103 A note about initialized data: any variables that require an initial
104 value, such as external (global) variables, require that the initial
105 values be stored in the ROM code image. However, variables stored in
106 ROM cannot change; therefore the data must be moved into variables that
are located in RAM. Specifying <tt>run = RAM</tt> as part of
the DATA segment definition indicates that the segment is loaded into
ROM but used at a RAM address, so its initial values must be copied to
RAM by a call to the <tt>copydata</tt> routine in the startup assembly
code. In addition, some system-level variables need to be initialized
as well, especially if the heap is used via a C-level call to
<tt>malloc ()</tt>.
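Conceptually, <tt>copydata</tt> and <tt>zerobss</tt> each perform one
simple loop. The host-side C sketch below models them on a simulated 64K
address space. It is not the library code (which is written in
assembly), and the array <tt>mem</tt> and the function names are
hypothetical. On the target, the addresses and sizes come from
linker-generated symbols such as <tt>__DATA_LOAD__</tt>,
<tt>__DATA_RUN__</tt>, <tt>__DATA_SIZE__</tt>, <tt>__BSS_RUN__</tt> and
<tt>__BSS_SIZE__</tt>, which the <tt>define = yes</tt> attribute asks
ld65 to create.

```c
/* Host-side model of copydata and zerobss (hypothetical names).
   mem[] stands in for the 6502's 64K address space. */
unsigned char mem[0x10000UL];

/* copydata: copy the DATA image from its load address (in ROM)
   to its run address (in RAM). */
void model_copydata(unsigned long load, unsigned long run, unsigned long size)
{
    unsigned long i;
    for (i = 0; i < size; ++i) {
        mem[run + i] = mem[load + i];
    }
}

/* zerobss: clear the BSS segment so uninitialized variables start at 0. */
void model_zerobss(unsigned long run, unsigned long size)
{
    unsigned long i;
    for (i = 0; i < size; ++i) {
        mem[run + i] = 0;
    }
}
```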
The final section of the definition file defines the constructor and
destructor tables used during system startup and termination. In addition, if the heap is
117 used, the maximum C-level stack size needs to be defined in order for
118 the system to be able to reliably allocate blocks of memory. The stack
119 size selection must be greater than the maximum amount of storage
120 required to run the program, keeping in mind that the C-level subroutine
121 call stack and all local variables are stored in this stack. The
122 <tt>FEATURES</tt> section defines the required constructor/destructor
123 attributes and the <tt>SYMBOLS</tt> section defines the stack size. The
124 constructors will be run via a call to <tt>initlib</tt> in the startup
125 assembly code and the destructors will be run via an assembly language
126 call to <tt>donelib</tt> during program termination.
130 CONDES: segment = STARTUP,
132 label = __CONSTRUCTOR_TABLE__,
133 count = __CONSTRUCTOR_COUNT__;
134 CONDES: segment = STARTUP,
136 label = __DESTRUCTOR_TABLE__,
137 count = __DESTRUCTOR_COUNT__;
141 # Define the stack size for the application
142 __STACKSIZE__: value = $0200, weak = yes;
146 These definitions are placed in a file named "sbc.cfg"
147 and are referred to during the ld65 linker stage.
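To see why <tt>__STACKSIZE__</tt> matters for <tt>malloc ()</tt>, the
following host-side C sketch computes how the free RAM is divided. It
assumes the layout used by the cc65 heap code: the heap starts at the
end of BSS and may grow up to <tt>__STACKSIZE__</tt> bytes below the
initial C stack pointer, which the startup code sets to
<tt>__RAM_START__ + __RAM_SIZE__</tt>. The function names are
hypothetical.

```c
/* Host-side sketch of how RAM is shared between the heap and the
   C stack (assumed layout; function names are hypothetical). */
#define RAM_START  0x0200UL  /* __RAM_START__ */
#define RAM_SIZE   0x0E00UL  /* __RAM_SIZE__ */
#define STACKSIZE  0x0200UL  /* __STACKSIZE__ from the SYMBOLS section */

/* Initial C stack pointer: the stack grows downward from here. */
unsigned long c_stack_top(void)
{
    return RAM_START + RAM_SIZE;          /* $1000 */
}

/* First address the heap must not reach (bottom of the reserved stack). */
unsigned long heap_limit(void)
{
    return c_stack_top() - STACKSIZE;     /* $0E00 */
}

/* Bytes left for malloc () when BSS ends at bss_end. */
unsigned long heap_bytes(unsigned long bss_end)
{
    return (bss_end >= heap_limit()) ? 0 : heap_limit() - bss_end;
}
```

With these values, a program whose BSS ends at $0300 would leave $0B00
(2816) bytes for the heap; every byte added to <tt>__STACKSIZE__</tt>
removes one byte from the heap.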
149 <sect>Startup Code Definition<p>
151 In the cc65 toolset, a startup routine must be defined that is executed
152 when the CPU is reset. This startup code is marked with the STARTUP
153 segment name, which was defined in the system configuration file as
154 being in read only memory. The standard convention used in the
155 predefined libraries is that this code is resident in the crt0 module.
156 For this custom system, all that needs to be done is to perform a little
157 bit of 6502 housekeeping, set up the C-level stack pointer, initialize
158 the memory storage, and call the C-level routine <tt>main ()</tt>.
159 The following code was used for the crt0 module, defined in the file
163 ; ---------------------------------------------------------------------------
165 ; ---------------------------------------------------------------------------
167 ; Startup code for cc65 (Single Board Computer version)
172 .export __STARTUP__ : absolute = 1 ; Mark as startup
173 .import __RAM_START__, __RAM_SIZE__ ; Linker generated
175 .import copydata, zerobss, initlib, donelib
177 .include "zeropage.inc"
179 ; ---------------------------------------------------------------------------
180 ; Place the startup code in a special segment
182 .segment "STARTUP"
184 ; ---------------------------------------------------------------------------
185 ; A little light 6502 housekeeping
187 _init: LDX #$FF ; Initialize stack pointer to $01FF
189 CLD ; Clear decimal mode
191 ; ---------------------------------------------------------------------------
192 ; Set cc65 argument stack pointer
194 LDA #<(__RAM_START__ + __RAM_SIZE__)
196 LDA #>(__RAM_START__ + __RAM_SIZE__)
199 ; ---------------------------------------------------------------------------
200 ; Initialize memory storage
202 JSR zerobss ; Clear BSS segment
203 JSR copydata ; Initialize DATA segment
204 JSR initlib ; Run constructors
206 ; ---------------------------------------------------------------------------
211 ; ---------------------------------------------------------------------------
212 ; Back from main (this is also the _exit entry): force a software break
214 _exit: JSR donelib ; Run destructors
218 The following discussion explains the purpose of several important
219 assembler level directives in this file.
225 This line instructs the assembler that the symbols <tt>_init</tt> and
226 <tt>_exit</tt> are to be accessible from other modules. In this
227 example, <tt>_init</tt> is the location that the CPU should jump to when
228 reset, and <tt>_exit</tt> is the location that will be called when the
235 This line instructs the assembler to import the symbol <tt>_main</tt>
236 from another module. cc65 names all C-level routines as
237 {underscore}{name} in assembler, thus the <tt>main ()</tt> routine
238 in C is named <tt>_main</tt> in the assembler. This is how the startup
239 code will link to the C-level code.
242 .export __STARTUP__ : absolute = 1 ; Mark as startup
245 This line marks this code as startup code (code that is executed when
246 the processor is reset), which will then be automatically linked into
250 .import __RAM_START__, __RAM_SIZE__ ; Linker generated
253 This line imports the RAM starting address and RAM size constants, which
254 are used to initialize the cc65 C-level argument stack pointer.
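The two <tt>LDA</tt> instructions that set the argument stack pointer
use ca65's <tt>&lt;</tt> (low byte) and <tt>&gt;</tt> (high byte)
operators. A host-side C equivalent is sketched below; the helper names
are hypothetical, and the RAM values are those from the MEMORY
definition.

```c
/* Host-side sketch of the byte split done by LDA #<(...) and LDA #>(...)
   in crt0.s. Helper names are hypothetical. */
#define RAM_START 0x0200u   /* __RAM_START__ */
#define RAM_SIZE  0x0E00u   /* __RAM_SIZE__  */

/* ca65 low-byte operator: bits 0-7 of a 16-bit value. */
unsigned char lo_byte(unsigned int value)
{
    return (unsigned char)(value & 0xFFu);
}

/* ca65 high-byte operator: bits 8-15 of a 16-bit value. */
unsigned char hi_byte(unsigned int value)
{
    return (unsigned char)((value >> 8) & 0xFFu);
}

/* Rebuild the 16-bit pointer from its two bytes (low byte first in memory). */
unsigned int make_word(unsigned char lo, unsigned char hi)
{
    return (unsigned int)lo | ((unsigned int)hi << 8);
}
```

The low byte ($00) and high byte ($10) therefore form the initial
argument stack pointer $1000, just above the last RAM byte at $0FFF;
the C stack grows downward from there.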
257 .segment "STARTUP"
260 This line instructs the assembler that the code is to be placed in the
261 STARTUP segment of memory.
264 JSR zerobss ; Clear BSS segment
265 JSR copydata ; Initialize DATA segment
266 JSR initlib ; Run constructors
These three lines initialize the external (global) and system
variables. The first line sets the BSS segment (the memory locations
used for uninitialized external variables) to 0. The second line copies
the initialization values stored in ROM to the RAM locations used for
initialized external variables. The last line runs the constructors
that are used to initialize the system run-time variables.
280 This is the actual call to the C-level <tt>main ()</tt> routine,
281 which is called after the startup code completes.
284 _exit: JSR donelib ; Run destructors
This is the code that will be executed when <tt>main ()</tt>
terminates. The first thing that must be done is to run the destructors
via a call to <tt>donelib</tt>. Then the program can terminate. In
this example, the program is expected to run forever. Therefore, there
needs to be a way of indicating that something has gone wrong and that
the system must be shut down until it is restarted by a hard reset.
The BRK instruction will be used to indicate a software fault. This is
advantageous because the ld65 linker fills gaps in the binary image with
$00 by default, and $00 is the BRK opcode. In addition, the hardware has
been designed to force the data lines to $00 for all illegal memory
accesses, thereby also forcing a BRK instruction into the CPU.
300 <sect>Custom Run-Time Library Creation<p>
302 The next step in customizing the cc65 toolset is creating a run-time
303 library for the targeted hardware. The recommended way to do this is to
start from the platform-independent standard library of the cc65
distribution, "none.lib", which is located in its lib directory.
When using "none.lib", we need to supply our own <tt>crt0</tt>
module with custom startup code. This is done by first copying the
library and giving it a new name, then compiling the startup code with
ca65, and finally using the ar65 archiver to add the module to the new
library.
311 The steps are shown below:
314 cp /usr/local/share/cc65/lib/none.lib sbc.lib
316 ar65 a sbc.lib crt0.o
319 <sect>Interrupt Service Routine Definition<p>
321 For this system, the CPU is put into a wait condition prior to allowing
322 interrupt processing. Therefore, the interrupt service routine is very
323 simple: return from all valid interrupts. However, as mentioned
324 before, the BRK instruction is used to indicate a software fault, which
325 will call the same interrupt service routine as the maskable interrupt
326 signal IRQ. The interrupt service routine must be able to tell the
327 difference between the two, and act appropriately.
The interrupt service routine shown below includes code that detects
when a BRK instruction has occurred and, in that case, stops the CPU
from further processing. The interrupt service routine is in a file
named "interrupt.s".
335 ; ---------------------------------------------------------------------------
337 ; ---------------------------------------------------------------------------
341 ; Checks for a BRK instruction and returns from all valid interrupts.
344 .export _irq_int, _nmi_int
346 .segment "CODE"
348 .PC02 ; Force 65C02 assembly mode
350 ; ---------------------------------------------------------------------------
351 ; Non-maskable interrupt (NMI) service routine
353 _nmi_int: RTI ; Return from all NMI interrupts
355 ; ---------------------------------------------------------------------------
356 ; Maskable interrupt (IRQ) service routine
358 _irq_int: PHX ; Save X register contents to stack
359 TSX ; Transfer stack pointer to X
360 PHA ; Save accumulator contents to stack
361 INX ; Increment X so it points to the status
362 INX ; register value saved on the stack
363 LDA $100,X ; Load status register contents
364 AND #$10 ; Isolate B status bit
365 BNE break ; If B = 1, BRK detected
367 ; ---------------------------------------------------------------------------
368 ; IRQ detected, return
370 irq: PLA ; Restore accumulator contents
371 PLX ; Restore X register contents
372 RTI ; Return from all IRQ interrupts
374 ; ---------------------------------------------------------------------------
377 break: JMP _stop ; If BRK is detected, something very bad
378 ; has happened, so stop running
381 The following discussion explains the purpose of several important
382 assembler level directives in this file.
388 This line instructs the assembler to import the symbol <tt>_stop</tt>
389 from another module. This routine will be called if a BRK instruction
390 is encountered, signaling a software fault.
393 .export _irq_int, _nmi_int
396 This line instructs the assembler that the symbols <tt>_irq_int</tt> and
397 <tt>_nmi_int</tt> are to be accessible from other modules. In this
398 example, the address of these symbols will be placed in the interrupt
402 .segment "CODE"
405 This line instructs the assembler that the code is to be placed in the
406 CODE segment of memory. Note that because there are 65C02 mnemonics in
407 the assembly code, the assembler is forced to use the 65C02 instruction
408 set with the <tt>.PC02</tt> directive.
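The stack arithmetic in <tt>_irq_int</tt> can be modeled in host-side C.
After the CPU pushes PCH, PCL and the status register, and the routine
pushes X, <tt>TSX</tt> copies the stack pointer into X, so the saved
status byte sits at $0100 + X + 2 (the <tt>PHA</tt> that follows moves
the stack pointer but does not change X). The array and function names
in this sketch are hypothetical.

```c
/* Host-side model of the BRK test in _irq_int (hypothetical names).
   page1[] stands in for the 6502 stack page $0100-$01FF and s for the
   value TSX copies into X right after PHX. */
#define B_FLAG 0x10u   /* bit 4 of the pushed status byte */

/* Stack contents at that point, relative to $0100 + s:
   +1 saved X, +2 status register, +3 PCL, +4 PCH. */
int is_brk(const unsigned char page1[256], unsigned char s)
{
    unsigned char x = (unsigned char)(s + 2);   /* TSX, INX, INX (X wraps at 8 bits) */
    unsigned char status = page1[x];             /* LDA $100,X */
    return (status & B_FLAG) != 0;               /* AND #$10, BNE break */
}
```

On the 6502 the B bit exists only in the copy of the status register
pushed onto the stack: BRK pushes it as 1 and a hardware IRQ pushes it
as 0, which is why the routine inspects the stacked byte rather than the
live flags.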
410 The final step is to define the interrupt vector memory locations.
411 Recall that a segment named VECTORS was defined in the memory
configuration file, which starts at location $FFFA. The addresses of
413 the interrupt service routines from "interrupt.s" along with
414 the address for the initialization code in crt0 are defined in a file
415 named "vectors.s". Note that these vectors will be placed in
416 memory in their proper little-endian format as:
418 <p><tt> $FFFA - $FFFB:</tt> NMI interrupt vector (low byte, high byte)
419 <p><tt> $FFFC - $FFFD:</tt> Reset vector (low byte, high byte)
420 <p><tt> $FFFE - $FFFF:</tt> IRQ/BRK interrupt vector (low byte, high byte)
422 using the <tt>.addr</tt> assembler directive. The contents of the file are:
425 ; ---------------------------------------------------------------------------
427 ; ---------------------------------------------------------------------------
429 ; Defines the interrupt vector table.
432 .import _nmi_int, _irq_int
434 .segment "VECTORS"
436 .addr _nmi_int ; NMI vector
437 .addr _init ; Reset vector
438 .addr _irq_int ; IRQ/BRK vector
During the link process, the ld65 linker replaces the symbols referenced
here with the actual addresses of the routines.
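The little-endian layout of the three vectors can be illustrated with a
host-side C sketch that writes them into a 2K image of the ROM starting
at $F800. The names are hypothetical, and the routine addresses used to
exercise it are made up; in the real build, ld65 supplies them.

```c
/* Host-side sketch of the VECTORS segment layout in the 2K ROM image
   ($F800-$FFFF). Names are hypothetical. */
#define ROM_BASE 0xF800u
#define ROM_SIZE 0x0800u

unsigned char rom[ROM_SIZE];

/* Equivalent of one .addr line: store a 16-bit address, low byte first. */
void put_addr(unsigned int cpu_addr, unsigned int value)
{
    rom[cpu_addr - ROM_BASE]     = (unsigned char)(value & 0xFFu);        /* low byte */
    rom[cpu_addr - ROM_BASE + 1] = (unsigned char)((value >> 8) & 0xFFu); /* high byte */
}

/* The three .addr lines in vectors.s, in order. */
void put_vectors(unsigned int nmi, unsigned int reset, unsigned int irq)
{
    put_addr(0xFFFAu, nmi);    /* $FFFA-$FFFB: NMI */
    put_addr(0xFFFCu, reset);  /* $FFFC-$FFFD: reset (_init) */
    put_addr(0xFFFEu, irq);    /* $FFFE-$FFFF: IRQ/BRK */
}
```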
444 <sect>Adding Custom Instructions<p>
446 The cc65 instruction set only supports the WAI (Wait for Interrupt) and
447 STP (Stop) instructions when used with the 65816 CPU (accessed via the
448 --cpu command line option of the ca65 macro assembler). The 65C02 core
449 used in this example supports these two instructions, and in fact the
450 system benefits from the use of both the WAI and STP instructions.
To use the WAI instruction in this case, a C-callable assembly routine
named "wait" was created that consists of the WAI opcode followed by
a subroutine return. It was convenient in this example to put the IRQ
455 interrupt enable in this subroutine as well, since interrupts should
456 only be enabled when the code is in this wait condition.
458 For both the WAI and STP instructions, the assembler is
459 "fooled" into placing those opcodes into memory by inserting a
460 single byte of data that just happens to be the opcode for those
instructions. The assembly code routines are placed in a file named
"wait.s", which is shown below:
465 ; ---------------------------------------------------------------------------
467 ; ---------------------------------------------------------------------------
469 ; Wait for interrupt and return
473 ; ---------------------------------------------------------------------------
474 ; Wait for interrupt: Forces the assembler to emit a WAI opcode ($CB)
475 ; ---------------------------------------------------------------------------
477 .segment "CODE"
481 CLI ; Enable interrupts
482 .byte $CB ; Inserts a WAI opcode
483 RTS ; Return to caller
487 ; ---------------------------------------------------------------------------
488 ; Stop: Forces the assembler to emit a STP opcode ($DB)
489 ; ---------------------------------------------------------------------------
493 .byte $DB ; Inserts a STP opcode
498 The label <tt>_wait</tt>, when exported, can be called by using the
499 <tt>wait ()</tt> subroutine call in C. The section is marked as
500 code so that it will be stored in read-only memory, and the procedure is
501 tagged for 16-bit absolute addressing via the "near"
502 modifier. Similarly, the <tt>_stop</tt> routine can be called from
503 within the C-level code via a call to <tt>stop ()</tt>. In
504 addition, the routine can be called from assembly code by calling
505 <tt>_stop</tt> (as was done in the interrupt service routine).
507 <sect>Hardware Drivers<p>
Oftentimes, it can be advantageous to create small application helpers
in assembly language to decrease code size and increase execution speed
of the overall program. An example of this would be the transfer of
characters to a FIFO (<bf>F</bf>irst-<bf>I</bf>n,
<bf>F</bf>irst-<bf>O</bf>ut) storage buffer for transmission over a
serial port. This simple action can be performed by an assembly
language driver that executes much faster than equivalent C code.
516 The following discussion outlines a method of interfacing a C program
517 with an assembly language subroutine.
519 The first step in creating the assembly language code for the driver is
520 to determine how to pass the C arguments to the assembly language
521 routine. The cc65 toolset allows the user to specify whether the data
522 is passed to a subroutine via the stack or by the processor registers by
523 using the <tt/__fastcall__/ and <tt/__cdecl__/ function qualifiers (note that
524 there are two underscore characters in front of and two behind each
525 qualifier). <tt/__fastcall__/ is the default. When <tt/__cdecl__/ <em/isn't/
526 specified, and the function isn't variadic (i.e., its prototype doesn't have
527 an ellipsis), the rightmost argument in the function call is passed to the
528 subroutine using the 6502 registers instead of the stack. Note that if
529 there is only one argument in the function call, the execution overhead
530 required by the stack interface routines is completely avoided.
With <tt/__cdecl__/, the last argument is loaded into the A and X
registers and then pushed onto the stack via a call to <tt>pushax</tt>.
The first thing the subroutine does is retrieve the argument from the
stack via a call to <tt>ldax0sp</tt>, which copies the values into the A
and X registers. When the subroutine is finished, the values on the stack
must be popped off and discarded via a jump to <tt>incsp2</tt>, which
ends with the RTS subroutine return instruction. This is shown in the following code
lda #<(L0001) ; Load A with the low order byte
ldx #>(L0001) ; Load X with the high order byte
546 jsr pushax ; Push A and X onto the stack
547 jsr _foo ; Call foo, i.e., foo (arg)
553 _foo: jsr ldax0sp ; Retrieve A and X from the stack
554 sta ptr ; Store A in ptr
555 stx ptr+1 ; Store X in ptr+1
556 ... ; (more subroutine code goes here)
557 jmp incsp2 ; Pop A and X from the stack (includes return)
560 If <tt/__cdecl__/ isn't specified, then the argument is loaded into the A
561 and X registers as before, but the subroutine is then called
562 immediately. The subroutine does not need to retrieve the argument
563 since the value is already available in the A and X registers.
Furthermore, the subroutine can be terminated with an RTS instruction
565 since there is no stack cleanup which needs to be performed. This is
566 shown in the following code sample.
lda #<(L0001) ; Load A with the low order byte
ldx #>(L0001) ; Load X with the high order byte
573 jsr _foo ; Call foo, i.e., foo (arg)
579 _foo: sta ptr ; Store A in ptr
580 stx ptr+1 ; Store X in ptr+1
581 ... ; (more subroutine code goes here)
582 rts ; Return from subroutine
585 The hardware driver in this example writes a string of character data to
586 a hardware FIFO located at memory location $1000. Each character is
587 read and is compared to the C string termination value ($00), which will
588 terminate the loop. All other character data is written to the FIFO.
589 For convenience, a carriage return/line feed sequence is automatically
590 appended to the serial stream. The driver defines a local pointer
591 variable which is stored in the zero page memory space in order to allow
592 for retrieval of each character in the string via the indirect indexed
The assembly language routine is stored in a file named
"rs232_tx.s" and is shown below:
599 ; ---------------------------------------------------------------------------
601 ; ---------------------------------------------------------------------------
603 ; Write a string to the transmit UART FIFO
606 .exportzp _rs232_data: near
608 .define TX_FIFO $1000 ; Transmit FIFO memory location
612 _rs232_data: .res 2, $00 ; Reserve a local zero page pointer
614 .segment "CODE"
616 .proc _rs232_tx: near
618 ; ---------------------------------------------------------------------------
619 ; Store pointer to zero page memory and load first character
621 sta _rs232_data ; Set zero page pointer to string address
622 stx _rs232_data+1 ; (pointer passed in via the A/X registers)
623 ldy #00 ; Initialize Y to 0
624 lda (_rs232_data) ; Load first character
626 ; ---------------------------------------------------------------------------
627 ; Main loop: read data and store to FIFO until \0 is encountered
629 loop: sta TX_FIFO ; Loop: Store character in FIFO
630 iny ; Increment Y index
631 lda (_rs232_data),y ; Get next character
632 bne loop ; If character == 0, exit loop
634 ; ---------------------------------------------------------------------------
635 ; Append CR/LF to output stream and return
646 <sect>Hello World! Example<p>
648 The following short example demonstrates programming in C using the cc65
toolset with a custom run-time environment. In this example, a Xilinx
FPGA contains a UART, connected to a 65C02 processor, that uses FIFO
(<bf>F</bf>irst-<bf>I</bf>n, <bf>F</bf>irst-<bf>O</bf>ut) storage to
buffer the data. The C program waits for an interrupt generated by
the receive UART and then, every time a question mark character is
received, responds by transmitting the string "Hello World!" via a
call to the hardware driver <tt>rs232_tx ()</tt>. The driver
prototype uses the <tt>__fastcall__</tt> extension to indicate that the
string pointer is passed to the driver in the A and X registers rather
than on the C stack. The FIFO data interface is at address
658 $1000 and is defined as the symbolic constant <tt>FIFO_DATA</tt>.
659 Writing to <tt>FIFO_DATA</tt> transfers a byte of data into the transmit
660 FIFO for subsequent transmission over the serial interface. Reading
661 from <tt>FIFO_DATA</tt> transfers a byte of previously received data out
662 of the receive FIFO. The FIFO status data is at address $1001 and is
663 defined as the symbolic constant <tt>FIFO_STATUS</tt>. For convenience,
664 the symbolic constants <tt>TX_FIFO_FULL</tt> (which isolates bit 0 of
665 the register) and <tt>RX_FIFO_EMPTY</tt> (which isolates bit 1 of the
666 register) have been defined to read the FIFO status.
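The intended behavior can also be modeled on a host computer before it
is run on the FPGA. The C sketch below is hypothetical throughout (the
names, the buffer size, and the use of an array in place of the
transmit FIFO): it decodes the two status bits described above and
reproduces what the program sends in reply to received characters,
including the CR/LF that <tt>rs232_tx</tt> appends. Unlike the assembly
loop as listed, which stores the first character before testing it, the
model tests every character, so an empty string sends only CR/LF.

```c
/* Host-side model of the Hello World! example (all names hypothetical).
   tx[] plays the role of the transmit FIFO. */
#define TX_FIFO_FULL_BIT  0x01u   /* FIFO_STATUS bit 0 */
#define RX_FIFO_EMPTY_BIT 0x02u   /* FIFO_STATUS bit 1 */

unsigned char tx[256];
unsigned int tx_count;

int rx_fifo_empty(unsigned char status) { return (status & RX_FIFO_EMPTY_BIT) != 0; }
int tx_fifo_full(unsigned char status)  { return (status & TX_FIFO_FULL_BIT) != 0; }

/* Model of rs232_tx: send characters up to the terminating $00,
   then append CR ($0D) and LF ($0A). */
void model_rs232_tx(const char *str)
{
    while (*str != 0) {
        tx[tx_count++] = (unsigned char)*str++;
    }
    tx[tx_count++] = 0x0D;
    tx[tx_count++] = 0x0A;
}

/* Model of the inner loop of main (): drain the received characters and
   answer every '?' with "Hello World!". */
void model_receive(const char *rx, unsigned int n)
{
    unsigned int i;
    for (i = 0; i < n; ++i) {
        if (rx[i] == '?') {
            model_rs232_tx("Hello World!");
        }                      /* any other character is discarded */
    }
}
```

Receiving "a?b" therefore produces exactly one reply: the 12 characters
of "Hello World!" followed by CR and LF.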
668 The following C code is saved in the file "main.c". As this
669 example demonstrates, the run-time environment has been set up such that
all of the behind-the-scenes work is transparent to the user.
673 #define FIFO_DATA (*(unsigned char *) 0x1000)
674 #define FIFO_STATUS (*(unsigned char *) 0x1001)
676 #define TX_FIFO_FULL (FIFO_STATUS & 0x01)
677 #define RX_FIFO_EMPTY (FIFO_STATUS & 0x02)
680 extern void __fastcall__ rs232_tx (char *str);
683 while (1) { // Run forever
684 wait (); // Wait for an RX FIFO interrupt
686 while (RX_FIFO_EMPTY == 0) { // While the RX FIFO is not empty
687 if (FIFO_DATA == '?') { // Does the RX character = '?'
688 rs232_tx ("Hello World!"); // Transmit "Hello World!"
689 } // Discard any other RX characters
693 return (0); // We should never get here!
697 <sect>Putting It All Together<p>
699 The following commands will create a ROM image named "a.out"
700 that can be used as the initialization data for the Xilinx Block RAM
701 used for code storage:
704 cc65 -t none -O --cpu 65sc02 main.c
705 ca65 --cpu 65sc02 main.s
706 ca65 --cpu 65sc02 rs232_tx.s
707 ca65 --cpu 65sc02 interrupt.s
708 ca65 --cpu 65sc02 vectors.s
709 ca65 --cpu 65sc02 wait.s
ld65 -C sbc.cfg -m main.map interrupt.o vectors.o wait.o rs232_tx.o main.o sbc.lib
714 During the C-level code compilation phase (<tt>cc65</tt>), assumptions
715 about the target system are disabled via the <tt>-t none</tt> command
716 line option. During the object module linker phase (<tt>ld65</tt>), the
717 target customization is enabled via inclusion of the <tt>sbc.lib</tt>
718 file and the selection of the configuration file via the <tt>-C
719 sbc.cfg</tt> command line option.
The 65C02 core used in this design most closely matches the cc65 toolset processor
722 named 65SC02 (the 65C02 extensions without the bit manipulation
723 instructions), so all the commands specify the use of that processor via
724 the <tt>--cpu 65sc02</tt> option.