<!doctype linuxdoc system>

<article>
<title>cc65 coding hints
<author>Ullrich von Bassewitz, <htmlurl url="mailto:uz@cc65.org" name="uz@cc65.org">
<date>03.12.2000

<abstract>
How to generate the most effective code with cc65.
</abstract>

<sect>Use prototypes<p>

This will not only help to find errors between separate modules, it will also
generate better code, since the compiler does not have to assume that a
variable sized parameter list is in place and does not have to pass the
argument count to the called function. This will lead to shorter and faster
code.
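
A minimal sketch of the idea: the prototype for the hypothetical function
<tt/scale/ is visible before its first use, so the compiler knows the exact
number and types of the arguments at the call site. The names are
illustrative only; the example shows the source pattern, not the code cc65
generates for it.

<tscreen><verb>
        /* Prototype: declares the parameter list before the first call. */
        unsigned scale (unsigned value, unsigned char factor);

        unsigned use_scale (void)
        {
            /* Fixed argument list, so no argument count is passed. */
            return scale (100, 3);
        }

        /* Definition, which must match the prototype. */
        unsigned scale (unsigned value, unsigned char factor)
        {
            return value * factor;
        }
</verb></tscreen>

In a larger program, the prototype would normally live in a header file that
is included by every module calling <tt/scale/.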



<sect>Don't declare auto variables in nested function blocks<p>

Variable declarations in nested blocks are usually a good thing. But with
cc65, there is a drawback: since the compiler generates code in one pass, it
must create the variables on the stack each time the block is entered and
destroy them when the block is left. This causes a speed penalty and larger
code.
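
As a sketch, the hypothetical function below swaps adjacent pairs of bytes in
a buffer. The temporary <tt/tmp/ is declared once at the top of the function
instead of inside the loop body, so it is set up once per call rather than
once per loop iteration.

<tscreen><verb>
        /* Swap buf[0] with buf[1], buf[2] with buf[3], and so on.
           A trailing odd byte is left unchanged. */
        void swap_pairs (unsigned char* buf, unsigned len)
        {
            unsigned char tmp;      /* declared here, not in the loop */
            unsigned      i;

            for (i = 0; i + 1 < len; i += 2) {
                tmp        = buf[i];
                buf[i]     = buf[i + 1];
                buf[i + 1] = tmp;
            }
        }
</verb></tscreen>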



<sect>Remember that the compiler does not optimize<p>

The compiler needs hints from you about the code to generate. When accessing
indexed data structures, get a pointer to the element and use this pointer
instead of calculating the index again and again. If you want to have your
loops unrolled, or loop invariant code moved outside the loop, you have to do
that yourself.
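
The following sketch illustrates the pointer hint with a hypothetical
<tt/struct point/ type. Instead of indexing the array separately for the
<tt/x/ and <tt/y/ members, which would compute the address of the element for
each access, the loop keeps a pointer to the current element and advances it
once per iteration.

<tscreen><verb>
        /* Hypothetical record type used for illustration. */
        struct point {
            int x;
            int y;
        };

        /* Move all points by (dx, dy). The pointer p is advanced once per
           iteration, so the element address is not recomputed from the
           index for each member access. */
        void move_all (struct point* pts, unsigned char count, int dx, int dy)
        {
            struct point* p = pts;
            unsigned char i;

            for (i = 0; i < count; ++i, ++p) {
                p->x += dx;
                p->y += dy;
            }
        }
</verb></tscreen>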



<sect>Longs are slow!<p>

While long support is necessary for some things, it's really, really slow on
the 6502. Remember that any long variable uses 4 bytes of memory, and any
operation on it has to process twice as much data as the same operation on a
16 bit int.
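
Often a long is not needed at all once the value range has been checked. In
the hypothetical function below, at most 255 bytes with a maximum value of 255
each are added, so the result is at most 255 * 255 = 65025, which still fits
into a 16 bit <tt/unsigned int/. The accumulator can therefore be an
<tt/unsigned/ instead of an <tt/unsigned long/.

<tscreen><verb>
        /* Sum up to 255 byte values. The largest possible result is
           255 * 255 = 65025, which fits into a 16 bit unsigned int,
           so no long accumulator is required. */
        unsigned sum_bytes (const unsigned char* buf, unsigned char len)
        {
            unsigned      sum = 0;
            unsigned char i;

            for (i = 0; i < len; ++i) {
                sum += buf[i];
            }
            return sum;
        }
</verb></tscreen>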



<sect>Use unsigned types wherever possible<p>

The CPU has no opcodes to handle signed values wider than 8 bits, so sign
extension, tests of the sign and similar operations have to be done in
software. The code to handle signed operations is therefore usually a bit
slower than the same code for unsigned types.
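
Division by a power of two is one example of such extra work, and it is not
specific to cc65. For an unsigned operand, division by 2 is a plain right
shift. For a signed operand, C99 requires the quotient to be rounded toward
zero, so -3 / 2 must give -1 rather than the -2 that an arithmetic right shift
would produce, and the compiler has to add a correction for negative values.
The two hypothetical functions below differ only in the signedness of their
argument; the sketch assumes C99 division semantics.

<tscreen><verb>
        /* For unsigned values, halving is a simple right shift. */
        unsigned half_u (unsigned x)
        {
            return x / 2;
        }

        /* For signed values, the quotient is rounded toward zero
           (-3 / 2 == -1 under C99 rules), so a plain shift is not
           enough for negative numbers. */
        int half_s (int x)
        {
            return x / 2;
        }
</verb></tscreen>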



<sect>Use chars instead of ints if possible<p>

While chars are immediately promoted to ints in arithmetic operations, they
are passed as chars in parameter lists and are accessed as chars in variables.
The generated code is usually not much smaller, but it is faster, since
accessing chars is faster. For several operations, the generated code may be
better if intermediate results that are known not to be larger than 8 bits are
cast to chars.

When doing

<tscreen><verb>
        unsigned char a;
        ...
        if ((a & 0x0F) == 0)
</verb></tscreen>

the result of the & operator is an int because of the int promotion rules of
the language. So the compare is also done with 16 bits. When using

<tscreen><verb>
        unsigned char a;
        ...
        if ((unsigned char)(a & 0x0F) == 0)
</verb></tscreen>

the generated code is much shorter, since the operation is done with 8 bits
instead of 16.



<sect>Make the size of your array elements one of 1, 2, 4, 8<p>

When indexing into an array, the compiler has to calculate the byte offset
into the array, which is the index multiplied by the size of one element. When
doing the multiplication, the compiler will do a strength reduction, that is,
replace the multiplication by a shift if possible. For the values 2, 4 and 8,
there are even more specialized subroutines available. So, array access is
fastest when using one of these sizes.
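
A minimal sketch, assuming a hypothetical <tt/struct sprite/ record with three
one-byte fields: without padding, the element size would be 3, and the offset
calculation would need a real multiplication. An unused padding byte rounds
the size up to 4, so the offset can be computed with shifts, at the cost of
one byte of memory per element.

<tscreen><verb>
        /* Three one-byte fields plus one padding byte: the size is 4. */
        struct sprite {
            unsigned char x;
            unsigned char y;
            unsigned char color;
            unsigned char pad;      /* unused, only rounds the size up to 4 */
        };

        struct sprite sprites[16];

        /* The byte offset of element i is i * 4, which a shift can compute. */
        unsigned char sprite_x (unsigned char i)
        {
            return sprites[i].x;
        }
</verb></tscreen>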



<sect>Expressions are evaluated from left to right<p>

Since cc65 does not build an explicit expression tree when parsing an
expression, constant subexpressions may not be detected and optimized properly
if you don't help. Look at this example:

<tscreen><verb>
        #define OFFS   4
        int  i;
        i = i + OFFS + 3;
</verb></tscreen>

The expression is parsed from left to right, which means that the compiler
sees 'i' and puts its contents into the secondary register. Next is OFFS,
which is constant. The compiler emits code to add a constant to the secondary
register. The same happens again for the constant 3. So the code produced
contains a fetch of 'i', two additions of constants, and a store (into 'i').
Unfortunately, the compiler does not see that "OFFS + 3" is itself a constant,
since it does its evaluation from left to right. There are some ways to help
the compiler recognize expressions like this:

<enum>

<item>Write "i = OFFS + 3 + i;". Since the first and second operands are
constant, the compiler will evaluate them at compile time, reducing the code to
a fetch, one addition (secondary + constant) and one store.

<item>Write "i = i + (OFFS + 3);". When seeing the opening parenthesis, the
compiler will start a new expression evaluation for the part inside the
parentheses, and since all operands in the subexpression are constant, it will
detect this and reduce the code to one fetch, one addition and one store.

</enum>
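
Both variants can be written out as complete functions. In the sketch below,
the hypothetical functions <tt/add_slow/ and <tt/add_fast/ compute the same
value; according to the description above, only the second form lets cc65
fold "OFFS + 3" into a single constant.

<tscreen><verb>
        #define OFFS   4

        /* Parsed as (i + OFFS) + 3: two additions of constants. */
        int add_slow (int i)
        {
            i = i + OFFS + 3;
            return i;
        }

        /* The parentheses group the constants, so OFFS + 3 can be
           folded into the single constant 7 at compile time. */
        int add_fast (int i)
        {
            i = i + (OFFS + 3);
            return i;
        }
</verb></tscreen>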



<sect>Case labels in a switch statement are checked in source order<p>

Labels that appear first in a switch statement are tested first. So, if your
switch statement contains labels that are selected most of the time, put them
first in your source code. This will speed up the code.
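
As an illustration, assume a hypothetical command dispatcher where profiling
has shown that the command 'm' occurs far more often than the others. Its
label is therefore placed first, and the rarely used commands come last.

<tscreen><verb>
        /* Hypothetical dispatcher: the most frequent command is listed first. */
        int dispatch (char cmd)
        {
            switch (cmd) {
                case 'm':  return 1;    /* most frequent: tested first */
                case 'f':  return 2;
                case 'q':  return 3;    /* rare: tested last */
                default:   return 0;
            }
        }
</verb></tscreen>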



<sect>Use the preincrement and predecrement operators<p>

The compiler is not always smart enough to figure out whether the result of an
increment is used or not. So it has to save and restore that value when
producing code for the postincrement and postdecrement operators, even if this
value is never used. To avoid the additional overhead, use the preincrement
and predecrement operators if you don't need the resulting value. That means,
use

<tscreen><verb>
        ...
        ++i;
        ...
</verb></tscreen>

instead of

<tscreen><verb>
        ...
        i++;
        ...
</verb></tscreen>



<sect>Use constants to access absolute memory locations<p>

The compiler produces optimized code if the value of a pointer is a constant.
So, to access fixed memory locations, use

<tscreen><verb>
        #define VDC_STATUS   0xD600
        *(char*)VDC_STATUS = 0x01;
</verb></tscreen>

That will be translated to

<tscreen><verb>
        lda     #$01
        sta     $D600
</verb></tscreen>

The constant value detection also works for struct pointers and arrays, if the
subscript is a constant. So

<tscreen><verb>
        #define VDC     ((unsigned char*)0xD600)
        #define STATUS  0x01
        VDC [STATUS] = 0x01;
</verb></tscreen>

will also work.

If you first load the constant into a variable and use that variable to access
an absolute memory location, the generated code will be much slower, since the
compiler does not know anything about the contents of the variable.



<sect>Use initialized local variables - but use them with care<p>

Initialization of local variables when declaring them gives shorter and faster
code. So, use

<tscreen><verb>
        int i = 1;
</verb></tscreen>

instead of

<tscreen><verb>
        int i;
        i = 1;
</verb></tscreen>

But beware: to maximize your savings, don't mix uninitialized and initialized
variables. Create one block of initialized variables and one of uninitialized
ones. The reason for this is that the compiler will sum up the space needed
for uninitialized variables as long as possible, and then allocate the space
once for all these variables. If you mix uninitialized and initialized
variables, you force the compiler to allocate space for the uninitialized
variables each time it parses an initialized one. So do this:

<tscreen><verb>
        int i, j;
        int a = 3;
        int b = 0;
</verb></tscreen>

instead of

<tscreen><verb>
        int i;
        int a = 3;
        int j;
        int b = 0;
</verb></tscreen>

The latter will work, but will create larger and slower code.



<sect>When using the <tt/?:/ operator, cast values that are not ints<p>

The result type of the <tt/?:/ operator is a long if one of the second or
third operands is a long. If the second operand has been evaluated and it was
of type int, and the compiler detects that the third operand is a long, it has
to add an additional <tt/int/ &rarr; <tt/long/ conversion for the second
operand. However, since the code for the second operand has already been
emitted, this gives much worse code.

Look at this:

<tscreen><verb>
        long f (long a)
        {
            return (a != 0)? 1 : a;
        }
</verb></tscreen>

When the compiler sees the literal "1", it does not know that the result type
of the <tt/?:/ operator is a long, so it will emit code to load an integer
constant 1. After parsing "a", which is a long, an <tt/int/ &rarr; <tt/long/
conversion has to be applied to the second operand. This creates one
additional jump, and additional code for the conversion.

A better way would have been to write:

<tscreen><verb>
        long f (long a)
        {
            return (a != 0)? 1L : a;
        }
</verb></tscreen>

By forcing the literal "1" to be of type long, the correct code is created in
the first place, and no additional conversion code is needed.



<sect>Use the array operator &lsqb;&rsqb; even for pointers<p>

When addressing an array via a pointer, don't use the plus and dereference
operators, but the array operator. This will generate better code in some
common cases.

Don't use

<tscreen><verb>
        char* a;
        char b, c;
        ...
        b = *(a + c);
</verb></tscreen>

Use

<tscreen><verb>
        char* a;
        char b, c;
        ...
        b = a[c];
</verb></tscreen>

instead.



<sect>Use register variables with care<p>

Register variables may give faster and shorter code, but they also have an
overhead. Register variables are actually zero page locations, so using them
saves roughly one cycle per access. Since the old values have to be saved and
restored, there is an overhead of about 70 cycles per 2 byte variable. With a
saving of about one cycle per access, such a variable has to be accessed
roughly 70 times per function call just to make up for the save and restore
cycles, and that does not yet count the additional code needed to save and
restore the values. So you need to make heavy use of a variable to justify the
overhead.

Pointers, especially char pointers, are an exception. The optimizer has code
to detect and transform the most common pointer operations if the pointer
variable is a register variable. Declaring heavily used character pointers as
register may give significant gains in speed and size.
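
A sketch of such a case: the hypothetical function <tt/count_char/ walks a
string with a character pointer that is used several times per loop
iteration, which is the kind of heavily used pointer that may benefit from the
<tt/register/ storage class. Whether cc65 actually places it in the zero page
depends on whether register variables are enabled; for other compilers,
<tt/register/ is only a hint.

<tscreen><verb>
        /* Count the occurrences of the character c in the string s.
           The pointer p is used several times per iteration, so it is
           declared with the register storage class. */
        unsigned count_char (const char* s, char c)
        {
            register const char* p = s;
            unsigned n = 0;

            while (*p) {
                if (*p == c) {
                    ++n;
                }
                ++p;
            }
            return n;
        }
</verb></tscreen>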

And remember: Register variables must be enabled with <tt/-Or/.



<sect>Decimal constants greater than 0x7FFF are actually long ints<p>

The language rules for constant numeric values specify that decimal constants
without a type suffix that are not in the range of an int must be of type long
int or unsigned long int. Since an int has 16 bits with cc65, this means that
a simple constant like 40000 is of type long int, and may cause an expression
to be evaluated with 32 bits.

An example is:

<tscreen><verb>
        unsigned val;
        ...
        if (val < 65535) {
            ...
        }
</verb></tscreen>

Here, the compare is evaluated using 32 bit precision. This makes the code
larger and a lot slower.

Using

<tscreen><verb>
        unsigned val;
        ...
        if (val < 0xFFFF) {
            ...
        }
</verb></tscreen>

or

<tscreen><verb>
        unsigned val;
        ...
        if (val < 65535U) {
            ...
        }
</verb></tscreen>

instead will give shorter and faster code, since both constants are of type
unsigned int: a hexadecimal constant that does not fit into an int but fits
into an unsigned int gets that type, and the U suffix requests it explicitly.


<sect>Access to parameters in variadic functions is expensive<p>

Since cc65 has the "wrong" calling order, the location of the fixed parameters
in a variadic function (a function with a variable parameter list) depends on
the number and size of variable arguments passed. Since this number and size
are unknown at compile time, the compiler will generate code to calculate the
location on the stack when needed.

Because of this additional code, accessing the fixed parameters in a variadic
function is much more expensive than accessing parameters in a "normal"
function. Unfortunately, this additional code is also invisible to the
programmer, so it is easy to forget.

As a rule of thumb, if you access such a parameter more than once, you should
think about copying it into a normal variable and using this variable instead.
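
A minimal sketch of this rule, using the standard <tt/stdarg.h/ macros: the
hypothetical function <tt/sum_ints/ reads its fixed parameter <tt/count/
exactly once, apart from the <tt/va_start/ call which needs the name of the
last fixed parameter, and works with the local copy <tt/n/ from then on.

<tscreen><verb>
        #include <stdarg.h>

        /* Add up count int arguments. The fixed parameter count is
           copied into the local variable n once, so the loop does not
           access the fixed parameter again. */
        int sum_ints (unsigned count, ...)
        {
            va_list  ap;
            unsigned n = count;     /* single access to the fixed parameter */
            int      total = 0;

            va_start (ap, count);
            while (n > 0) {
                total += va_arg (ap, int);
                --n;
            }
            va_end (ap);
            return total;
        }
</verb></tscreen>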


</article>
