2 How to generate the most effective code with cc65.
7 This will not only help to find errors between separate modules, it will
8 also generate better code, since the compiler need not assume that a
9 variable sized parameter list is in place and need not pass the argument
10 count to the called function. This will lead to shorter and faster code.
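As a minimal sketch of the point above (the function names are hypothetical and not taken from the original text), a full prototype placed before the first call lets the compiler check the call and emit a plain fixed-argument call:

```c
/* Hypothetical helper, for illustration only. The full prototype tells
 * the compiler the exact number and types of the arguments. */
unsigned char clamp_to_byte(unsigned int value);

unsigned char clamp_to_byte(unsigned int value)
{
    /* Limit the value to the range of an unsigned char. */
    return (value > 255u) ? 255u : (unsigned char)value;
}

unsigned char scale_and_clamp(unsigned int value)
{
    /* This call is checked against the prototype above. */
    return clamp_to_byte(value * 2u);
}
```

In a multi-module program the prototype would normally live in a header that both the caller and the implementation include.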
14 2. Don't declare auto variables in nested function blocks.
16 Variable declarations in nested blocks are usually a good thing. But with
17 cc65, there is a drawback: Since the compiler generates code in one pass,
18 it must create the variables on the stack each time the block is entered
19 and destroy them when the block is left. This causes a speed penalty and
24 3. Remember that the compiler does not optimize.
26 The compiler needs hints from you about the code to generate. When
27 accessing indexed data structures, get a pointer to the element and
28 use this pointer instead of calculating the index again and again.
29 If you want to have your loops unrolled, or loop invariant code moved
30 outside the loop, you have to do that yourself.
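The pointer hint might look like the following sketch. The struct, the array and the function name are assumptions made up for this illustration; only the technique (take the element address once, then reuse it) comes from the text above.

```c
/* Hypothetical data structure, for illustration only. */
struct sprite {
    unsigned char x;
    unsigned char y;
    unsigned char color;
    unsigned char flags;
};

struct sprite sprites[8];

void move_sprite(unsigned char index, unsigned char dx, unsigned char dy)
{
    /* Calculate the element address once ... */
    struct sprite* s = &sprites[index];

    /* ... and reuse the pointer instead of writing sprites[index]
     * three times, which would repeat the index calculation. */
    s->x += dx;
    s->y += dy;
    s->flags |= 0x01;
}
```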
36 While long support is necessary for some things, it's really, really slow
37 on the 6502. Remember that any long variable will use 4 bytes of memory,
38 and any operation works on double the data compared to an int.
42 5. Use unsigned types wherever possible.
44 The CPU has no opcodes to handle signed values wider than 8 bits. So
45 sign extension, tests of signedness etc. have to be done by hand. The
46 code to handle signed operations is usually a bit slower than the same
47 code for unsigned types.
51 6. Use chars instead of ints if possible.
53 While in arithmetic operations, chars are immediately promoted to ints,
54 they are passed as chars in parameter lists and are accessed as chars
55 in variables. The code generated is usually not much smaller, but it
56 is faster, since accessing chars is faster. For several operations, the
57 generated code may be better if intermediate results that are known not
58 to be larger than 8 bits are cast to chars.
66 the result of the & operator is an int because of the int promotion
67 rules of the language. So the compare is also done with 16 bits. When
72 if ((unsigned char)(a & 0x0F) == 0)
74 the generated code is much shorter, since the operation is done with
79 7. Make the size of your array elements one of 1, 2, 4, 8.
81 When indexing into an array, the compiler has to calculate the byte
82 offset into the array, which is the index multiplied by the size of
83 one element. When doing the multiplication, the compiler will do a
84 strength reduction, that is, replace the multiplication by a shift
85 if possible. For the values 2, 4 and 8, there are even more specialized
86 subroutines available. So, array access is fastest when using one of
91 8. Expressions are evaluated from left to right.
93 Since cc65 does not build an explicit expression tree when parsing an
94 expression, constant subexpressions may not be detected and optimized
95 properly if you don't help. Look at this example:
101 The expression is parsed from left to right, that is, the compiler sees
102 'i', and puts its contents into the secondary register. Next is OFFS, which
103 is constant. The compiler emits code to add a constant to the secondary
104 register. Same thing again for the constant 3. So the code produced
105 contains a fetch of 'i', two additions of constants, and a store (into
106 'i'). Unfortunately, the compiler does not see that "OFFS + 3" is a
107 constant in itself, since it does its evaluation from left to right.
108 There are some ways to help the compiler to recognize expressions like
111 a. Write "i = OFFS + 3 + i;". Since the first and second operand are
112 constant, the compiler will evaluate them at compile time reducing the
113 code to a fetch, one addition (secondary + constant) and one store.
115 b. Write "i = i + (OFFS + 3)". When seeing the opening parenthesis, the
116 compiler will start a new expression evaluation for the stuff in the
117 parentheses, and since all operands in the subexpression are constant, it
118 will detect this and reduce the code to one fetch, one addition and
123 9. Case labels in a switch statement are checked in source order.
125 Labels that appear first in a switch statement are tested first. So,
126 if your switch statement contains labels that are selected most of
127 the time, put them first in your source code. This will speed up the
132 10. Use the preincrement and predecrement operators.
134 The compiler is not always smart enough to figure out whether the rvalue of an
135 increment is used or not. So it has to save and restore that value when
136 producing code for the postincrement and postdecrement operators, even if
137 this value is never used. To avoid the additional overhead, use the
138 preincrement and predecrement operators if you don't need the resulting
139 value. That means, use
153 11. Use constants to access absolute memory locations.
155 The compiler produces optimized code if the value of a pointer is a
156 constant. So, to access direct memory locations, use
158 #define VDC_DATA 0xD601
159 *(char*)VDC_DATA = 0x01;
161 That will be translated to
166 The constant value detection also works for struct pointers and arrays
167 if the subscript is a constant. So
169 #define VDC ((unsigned char*)0xD600)
175 If you first load the constant into a variable and use that variable to
176 access an absolute memory location, the generated code will be much
177 slower, since the compiler does not know anything about the contents of
182 12. Use initialized local variables - but use them with care.
184 Initialization of local variables when declaring them gives shorter
185 and faster code. So, use
194 But beware: To maximize your savings, don't mix uninitialized and
195 initialized variables. Create one block of initialized variables and
196 one of uninitialized ones. The reason for this is that the compiler
197 will sum up the space needed for uninitialized variables as long as
198 possible, and then allocate the space once for all these variables.
199 If you mix uninitialized and initialized variables, you force the
200 compiler to allocate space for the uninitialized variables each time
201 it parses an initialized one. So do this:
214 The latter will work, but will create larger and slower code.
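A further sketch of the grouping rule, using a hypothetical function that is not part of the original text. The initialized locals form one block and the uninitialized ones a second block, so the two kinds are never interleaved. Which of the two blocks comes first is an arbitrary choice in this sketch.

```c
/* Hypothetical function, for illustration only: sums a buffer and
 * counts its non-zero bytes. */
unsigned int sum_and_count(const unsigned char* data, unsigned char len,
                           unsigned char* nonzero)
{
    /* Block of initialized locals ... */
    unsigned int sum = 0;
    unsigned char count = 0;
    /* ... followed by a block of uninitialized locals. */
    unsigned char i;
    unsigned char value;

    for (i = 0; i < len; ++i) {
        value = data[i];
        sum += value;
        if (value != 0) {
            ++count;
        }
    }
    *nonzero = count;
    return sum;
}
```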
218 13. When using the ?: operator, cast values that are not ints.
220 The result type of the ?: operator is a long, if one of the second or
221 third operands is a long. If the second operand has been evaluated and
222 it was of type int, and the compiler detects that the third operand is
223 a long, it has to add an additional int->long conversion for the
224 second operand. However, since the code for the second operand has
225 already been emitted, this gives much worse code.
231 return (a != 0)? 1 : a;
234 When the compiler sees the literal "1", it does not know that the
235 result type of the ?: operator is a long, so it will emit code to load
236 an integer constant 1. After parsing "a", which is a long, an int->long
237 conversion has to be applied to the second operand. This creates one
238 additional jump, and additional code for the conversion.
240 A better way would have been to write:
244 return (a != 0)? 1L : a;
247 By forcing the literal "1" to be of type long, the correct code is
248 created in the first place, and no additional conversion code is
253 14. Use the array operator [] even for pointers.
255 When addressing an array via a pointer, don't use the plus and
256 dereference operators, but the array operator. This will generate
257 better code in some common cases.
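A minimal sketch of this hint, assuming a hypothetical byte-summing function (the name and signature are made up for illustration):

```c
/* Hypothetical function, for illustration only. */
unsigned int sum_bytes(const unsigned char* p, unsigned char n)
{
    unsigned int sum = 0;
    unsigned char i;

    for (i = 0; i < n; ++i) {
        /* Array operator on the pointer, instead of *(p + i). */
        sum += p[i];
    }
    return sum;
}
```

Both spellings mean the same thing in C. The difference is only in which form the compiler handles better, as described above.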
275 15. Use register variables with care.
277 Register variables may give faster and shorter code, but they do also
278 have an overhead. Register variables are actually zero page
279 locations, so using them saves roughly one cycle per access. Since
280 the old values have to be saved and restored, there is an overhead of
281 about 70 cycles per 2 byte variable. It is easy to see that, apart
282 from the additional code that is needed to save and restore the
283 values - you need to make heavy use of a variable to justify the
286 Pointers are an exception, especially char pointers. The optimizer
287 has code to detect and transform the most common pointer operations
288 if the pointer variable is a register variable. Declaring heavily
289 used character pointers as register may give significant gains in
292 And remember: Register variables must be enabled with -Or.
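As a sketch of the pointer case, here is a hypothetical character counting routine (not from the original text) with a heavily used char pointer declared as register. Whether it actually pays off depends on the save and restore overhead described above, and with cc65 the register keyword only takes effect when -Or is given.

```c
/* Hypothetical routine, for illustration only: counts the characters
 * of a zero terminated string. */
unsigned int count_chars(const char* text)
{
    /* The pointer is read and incremented on every loop iteration,
     * which makes it a candidate for a register variable. */
    register const char* p = text;
    unsigned int n = 0;

    while (*p != '\0') {
        ++p;
        ++n;
    }
    return n;
}
```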
296 16. Decimal constants greater than 0x7FFF are actually long ints
298 The language rules for constant numeric values specify that decimal
299 constants without a type suffix that are not in integer range must be
300 of type long int or unsigned long int. This means that a simple
301 constant like 40000 is of type long int, and may cause an expression
302 to be evaluated with 32 bits.
312 Here, the compare is evaluated using 32 bit precision. This makes the
313 code larger and a lot slower.
331 instead will give shorter and faster code.
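One way to keep such a constant within 16 bits is sketched below. The macro and function names are hypothetical; the value 40000 is the one used in the text. With a 16 bit int, the U suffix makes the decimal constant an unsigned int, and an unsuffixed hexadecimal constant such as 0x9C40 is an unsigned int as well.

```c
/* Hypothetical limit, for illustration only. Written as plain 40000
 * it would be a long int on a target with 16 bit ints; the U suffix
 * makes it an unsigned int instead. */
#define LIMIT 40000U

unsigned char above_limit(unsigned int value)
{
    /* Both operands are unsigned int, so on a 16 bit target the
     * compare can be done with 16 bits. */
    return value > LIMIT;
}
```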