Merge branch 'master' into master

diff --git a/doc/coding.sgml b/doc/coding.sgml

index 072d7644803bdc6733a2bf7526575caed87108c7..5897837c4ae4601542a369af41f95fb08c439910 100644
--- a/doc/coding.sgml
+++ b/doc/coding.sgml
@@ -2,13 +2,14 @@
  
  <article>
  <title>cc65 coding hints
-<author>Ullrich von Bassewitz, <htmlurl url="mailto:uz@cc65.org" name="uz@cc65.org">
-<date>03.12.2000
+<author><url url="mailto:uz@cc65.org" name="Ullrich von Bassewitz">
  
  <abstract>
  How to generate the most effective code with cc65.
  </abstract>
  
+
+
  <sect>Use prototypes<p>
  
  This will not only help to find errors between separate modules, it will also
@@ -28,13 +29,14 @@ code.
  
  
  
-<sect>Remember that the compiler does not optimize<p>
+<sect>Remember that the compiler does no high level optimizations<p>
  
-The compiler needs hints from you about the code to generate. When accessing
-indexed data structures, get a pointer to the element and use this pointer
-instead of calculating the index again and again. If you want to have your
-loops unrolled, or loop invariant code moved outside the loop, you have to do
-that yourself.
+The compiler needs hints from you about the code to generate. It will try to
+optimize the generated code, but follow the outline you gave in your C
+program. So for example, when accessing indexed data structures, get a pointer
+to the element and use this pointer instead of calculating the index again and
+again. If you want to have your loops unrolled, or loop invariant code moved
+outside the loop, you have to do that yourself.
  
  
  
@@ -48,10 +50,10 @@ operation works on double the data compared to an int.
  
  <sect>Use unsigned types wherever possible<p>
  
-The CPU has no opcodes to handle signed values greater than 8 bit. So sign
-extension, test of signedness etc. has to be done by hand. The code to handle
-signed operations is usually a bit slower than the same code for unsigned
-types.
+The 6502 CPU has no opcodes to handle signed values wider than 8 bits. So
+sign extension, tests of signedness etc. have to be done with extra code. As a
+consequence, the code to handle signed operations is usually a bit larger and
+slower than the same code for unsigned types.
  
  
  
@@ -64,25 +66,8 @@ accessing chars is faster. For several operations, the generated code may be
  better if intermediate results that are known not to be larger than 8 bit are
  cast to chars.
  
-When doing
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((a & 0x0F) == 0)
-</verb></tscreen>
-
-the result of the & operator is an int because of the int promotion rules of
-the language. So the compare is also done with 16 bits. When using
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((unsigned char)(a & 0x0F) == 0)
-</verb></tscreen>
-
-the generated code is much shorter, since the operation is done with 8 bits
-instead of 16.
+You should especially use unsigned chars for loop control variables if the
+loop is known not to execute more than 255 times.
  
  
  
@@ -109,13 +94,13 @@ if you don't help. Look at this example:
        i = i + OFFS + 3;
  </verb></tscreen>
  
-The expression is parsed from left to right, that means, the compiler sees
-'i', and puts it contents into the secondary register. Next is OFFS, which is
+The expression is parsed from left to right, that means, the compiler sees 'i',
+and puts its contents into the secondary register. Next is OFFS, which is
  constant. The compiler emits code to add a constant to the secondary register.
-Same thing again for the constant 3. So the code produced contains a fetch of
-'i', two additions of constants, and a store (into 'i'). Unfortunately, the
+Same thing again for the constant 3. So the code produced contains a fetch
+of 'i', two additions of constants, and a store (into 'i'). Unfortunately, the
  compiler does not see, that "OFFS + 3" is a constant for itself, since it does
-it's evaluation from left to right. There are some ways to help the compiler
+its evaluation from left to right. There are some ways to help the compiler
  to recognize expression like this:
  
  <enum>
@@ -132,17 +117,9 @@ and reduce the code to one fetch, one addition and one store.
  </enum>
  
  
-<sect>Case labels in a switch statments are checked in source order<p>
-
-Labels that appear first in a switch statement are tested first. So, if your
-switch statement contains labels that are selected most of the time, put them
-first in your source code. This will speed up the code.
-
-
-
  <sect>Use the preincrement and predecrement operators<p>
  
-The compiler not always smart enough to figure out, if the rvalue of an
+The compiler is not always smart enough to figure out whether the rvalue of an
  increment is used or not. So it has to save and restore that value when
  producing code for the postincrement and postdecrement operators, even if this
  value is never used. To avoid the additional overhead, use the preincrement
@@ -171,7 +148,7 @@ The compiler produces optimized code, if the value of a pointer is a constant.
  So, to access direct memory locations, use
  
  <tscreen><verb>
-       #define VDC_DATA   0xD601
+       #define VDC_STATUS 0xD601
         *(char*)VDC_STATUS = 0x01;
  </verb></tscreen>
  
@@ -179,7 +156,7 @@ That will be translated to
  
  <tscreen><verb>
         lda     #$01
-       sta     $D600
+       sta     $D601
  </verb></tscreen>
  
  The constant value detection works also for struct pointers and arrays, if the
@@ -188,7 +165,7 @@ subscript is a constant. So
  <tscreen><verb>
         #define VDC     ((unsigned char*)0xD600)
         #define STATUS  0x01
-       VDC [STATUS] = 0x01;
+       VDC[STATUS] = 0x01;
  </verb></tscreen>
  
  will also work.
@@ -199,7 +176,7 @@ compiler does not know anything about the contents of the variable.
  
  
  
-<sect>Use initialized local variables - but use it with care<p>
+<sect>Use initialized local variables<p>
  
  Initialization of local variables when declaring them gives shorter and faster
  code. So, use
@@ -242,44 +219,6 @@ The latter will work, but will create larger and slower code.
  
  
  
-<sect>When using the <tt/?:/ operator, cast values that are not ints<p>
-
-The result type of the <tt/?:/ operator is a long, if one of the second or
-third operands is a long. If the second operand has been evaluated and it was
-of type int, and the compiler detects that the third operand is a long, it has
-to add an additional <tt/int/ &rarr; <tt/long/ conversion for the second
-operand. However, since the code for the second operand has already been
-emitted, this gives much worse code.
-
-Look at this:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1 : a;
-       }
-</verb></tscreen>
-
-When the compiler sees the literal "1", it does not know, that the result type
-of the <tt/?:/ operator is a long, so it will emit code to load a integer
-constant 1. After parsing "a", which is a long, a <tt/int/ &rarr; <tt/long/
-conversion has to be applied to the second operand. This creates one
-additional jump, and an additional code for the conversion.
-
-A better way would have been to write:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1L : a;
-       }
-</verb></tscreen>
-
-By forcing the literal "1" to be of type long, the correct code is created in
-the first place, and no additional conversion code is needed.
-
-
-
  <sect>Use the array operator &lsqb;&rsqb; even for pointers<p>
  
  When addressing an array via a pointer, don't use the plus and dereference
@@ -310,18 +249,22 @@ instead.
  
  Register variables may give faster and shorter code, but they do also have an
  overhead. Register variables are actually zero page locations, so using them
-saves roughly one cycle per access. Since the old values have to be saved and
-restored, there is an overhead of about 70 cycles per 2 byte variable. It is
-easy to see, that - apart from the additional code that is needed to save and
-restore the values - you need to make heavy use of a variable to justify the
-overhead.
+saves roughly one cycle per access. The calling routine may also use register
+variables, so the old values have to be saved on function entry and restored
+on exit. Saving and restoring has an overhead of about 70 cycles per 2 byte
+variable. It is easy to see, that - apart from the additional code that is
+needed to save and restore the values - you need to make heavy use of a
+variable to justify the overhead.
+
+As a general rule: Use register variables only for pointers that are
+dereferenced several times in your function, or for heavily used induction
+variables in a loop (with several hundred accesses).
  
-An exception are pointers, especially char pointers. The optimizer has code to
-detect and transform the most common pointer operations if the pointer
-variable is a register variable. Declaring heavily used character pointers as
-register may give significant gains in speed and size.
+When declaring register variables, try to keep them together, because this
+will allow the compiler to save and restore the old values in one chunk, and
+not in several.
  
-And remember: Register variables must be enabled with <tt/-Or/.
+And remember: Register variables must be enabled with <tt/-r/ or <tt/-Or/.
  
  
  
@@ -329,43 +272,35 @@ And remember: Register variables must be enabled with <tt/-Or/.
  
  The language rules for constant numeric values specify that decimal constants
  without a type suffix that are not in integer range must be of type long int
-or unsigned long int. This means that a simple constant like 40000 is of type
-long int, and may cause an expression to be evaluated with 32 bits.
-
-An example is:
+or unsigned long int. So a simple constant like 40000 is of type long int!
+This is often unexpected and may cause an expression to be evaluated with 32
+bits. While in many cases the compiler takes care of it, in some places it
+can't. So be careful when you get a warning like
  
  <tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535) {
-           ...
-       }
+        test.c(7): Warning: Constant is long
  </verb></tscreen>
  
-Here, the compare is evaluated using 32 bit precision. This makes the code
-larger and a lot slower.
+Use the <tt/U/, <tt/L/ or <tt/UL/ suffixes to tell the compiler the desired
+type of a numeric constant.
  
-Using
  
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 0xFFFF) {
-           ...
-       }
-</verb></tscreen>
  
-or
+<sect>Access to parameters in variadic functions is expensive<p>
  
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535U) {
-           ...
-       }
-</verb></tscreen>
+Since cc65 has the "wrong" calling order, the location of the fixed parameters
+in a variadic function (a function with a variable parameter list) depends on
+the number and size of variable arguments passed. Since this number and size
+are unknown at compile time, the compiler will generate code to calculate the
+location on the stack when needed.
+
+Because of this additional code, accessing the fixed parameters in a variadic
+function is much more expensive than access to parameters in a "normal"
+function. Unfortunately, this additional code is also invisible to the
+programmer, so it is easy to forget.
  
-instead will give shorter and faster code.
+As a rule of thumb, if you access such a parameter more than once, you should
+think about copying it into a normal variable and using this variable instead.
  
  
  </article>
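
Note on the new variadic section: the rule of thumb it adds (copy a fixed parameter into a normal variable if it is accessed more than once) can be illustrated with a minimal sketch. This is not part of the commit. The function name `sum_ints` and its signature are hypothetical, chosen only for illustration, and whether the copy actually pays off depends on the code cc65 generates for a given function.

```c
#include <stdarg.h>

/* Hypothetical example, not taken from the cc65 sources: adds up
 * `count` int arguments passed after the fixed parameter.
 */
int sum_ints(unsigned count, ...)
{
    /* Copy the fixed parameter into a normal local once. Per the text
     * above, each access to `count` itself would need extra code to
     * locate it on the stack.
     */
    unsigned n = count;
    int total = 0;
    va_list ap;

    va_start(ap, count);
    while (n != 0) {
        total += va_arg(ap, int);
        --n;    /* predecrement, as another section of the document advises */
    }
    va_end(ap);

    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` reads the fixed parameter only in the copy and in `va_start`; the loop itself works on the local `n`.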