remove TABs in doc/ and test/

diff --git a/doc/coding.sgml b/doc/coding.sgml

index 89719166726ed46fbf64a5a673114bd9b40e7f67..dc07c091a33647b175a6b312ef0c162dc6df41b9 100644
--- a/doc/coding.sgml
+++ b/doc/coding.sgml
@@ -2,13 +2,14 @@
  
  <article>
  <title>cc65 coding hints
-<author>Ullrich von Bassewitz, <htmlurl url="mailto:uz@cc65.org" name="uz@cc65.org">
-<date>03.12.2000
+<author><url url="mailto:uz@cc65.org" name="Ullrich von Bassewitz">
  
  <abstract>
  How to generate the most effective code with cc65.
  </abstract>
  
+
+
  <sect>Use prototypes<p>
  
  This will not only help to find errors between separate modules, it will also
@@ -28,13 +29,14 @@ code.
  
  
  
-<sect>Remember that the compiler does not optimize<p>
+<sect>Remember that the compiler does no high level optimizations<p>
  
-The compiler needs hints from you about the code to generate. When accessing
-indexed data structures, get a pointer to the element and use this pointer
-instead of calculating the index again and again. If you want to have your
-loops unrolled, or loop invariant code moved outside the loop, you have to do
-that yourself.
+The compiler needs hints from you about the code to generate. It will try to
+optimize the generated code, but follow the outline you gave in your C
+program. So for example, when accessing indexed data structures, get a pointer
+to the element and use this pointer instead of calculating the index again and
+again. If you want to have your loops unrolled, or loop invariant code moved
+outside the loop, you have to do that yourself.
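
The advice in the new paragraph can be sketched as follows. This is a minimal illustration and not part of the patch; `struct point`, `sum_indexed` and `sum_pointer` are hypothetical names chosen for the example. The first function recalculates the index for every access, the second takes a pointer to the element once and moves the loop-invariant end address out of the loop by hand.

```c
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct point {
    int x;
    int y;
};

/* Straightforward version: pts[i] is indexed again for every member access. */
int sum_indexed(const struct point *pts, size_t count)
{
    int sum = 0;
    size_t i;
    for (i = 0; i < count; ++i) {
        sum += pts[i].x;
        sum += pts[i].y;
    }
    return sum;
}

/* Hand-optimized version: walk the array with a pointer to the element. */
int sum_pointer(const struct point *pts, size_t count)
{
    int sum = 0;
    const struct point *p = pts;
    const struct point *end = pts + count;   /* loop invariant, computed once */
    for (; p != end; ++p) {
        sum += p->x;
        sum += p->y;
    }
    return sum;
}
```

Both functions return the same result; how much the second form saves depends on the compiler version and the optimization options in use.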
  
  
  
@@ -48,10 +50,10 @@ operation works on double the data compared to an int.
  
  <sect>Use unsigned types wherever possible<p>
  
-The CPU has no opcodes to handle signed values greater than 8 bit. So sign
-extension, test of signedness etc. has to be done by hand. The code to handle
-signed operations is usually a bit slower than the same code for unsigned
-types.
+The 6502 CPU has no opcodes to handle signed values wider than 8 bits. So
+sign extension, tests of signedness etc. have to be done with extra code. As a
+consequence, the code to handle signed operations is usually a bit larger and
+slower than the same code for unsigned types.
  
  
  
@@ -64,25 +66,8 @@ accessing chars is faster. For several operations, the generated code may be
  better if intermediate results that are known not to be larger than 8 bits are
  cast to chars.
  
-When doing
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((a & 0x0F) == 0)
-</verb></tscreen>
-
-the result of the & operator is an int because of the int promotion rules of
-the language. So the compare is also done with 16 bits. When using
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((unsigned char)(a & 0x0F) == 0)
-</verb></tscreen>
-
-the generated code is much shorter, since the operation is done with 8 bits
-instead of 16.
+You should especially use unsigned chars for loop control variables if the
+loop is known not to execute more than 255 times.
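
A minimal sketch of the loop-counter advice, assuming a hypothetical helper `sum_bytes` that is not part of the patch. The caller is assumed to guarantee that `len` never exceeds 255, so the loop control variable fits into an unsigned char.

```c
/* Sum the first len bytes of buf. len is an unsigned char, so the loop
   is known not to execute more than 255 times. */
unsigned sum_bytes(const unsigned char *buf, unsigned char len)
{
    unsigned sum = 0;
    unsigned char i;            /* 8 bit loop control variable */
    for (i = 0; i < len; ++i) {
        sum += buf[i];
    }
    return sum;
}
```

If the loop could run 256 times or more, an unsigned char counter would wrap around, so this form is only appropriate when the bound is known.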
  
  
  
@@ -109,13 +94,13 @@ if you don't help. Look at this example:
        i = i + OFFS + 3;
  </verb></tscreen>
  
-The expression is parsed from left to right, that means, the compiler sees
-'i', and puts it contents into the secondary register. Next is OFFS, which is
+The expression is parsed from left to right, that means, the compiler sees 'i',
+and puts its contents into the secondary register. Next is OFFS, which is
  constant. The compiler emits code to add a constant to the secondary register.
-Same thing again for the constant 3. So the code produced contains a fetch of
-'i', two additions of constants, and a store (into 'i'). Unfortunately, the
+Same thing again for the constant 3. So the code produced contains a fetch
+of 'i', two additions of constants, and a store (into 'i'). Unfortunately, the
  compiler does not see that "OFFS + 3" is a constant by itself, since it does
-it's evaluation from left to right. There are some ways to help the compiler
+its evaluation from left to right. There are some ways to help the compiler
  to recognize expressions like this:
  
  <enum>
@@ -142,17 +127,17 @@ and predecrement operators if you don't need the resulting value. That means,
  use
  
  <tscreen><verb>
-               ...
-               ++i;
-               ...
+        ...
+        ++i;
+        ...
  </verb></tscreen>
  
      instead of
  
  <tscreen><verb>
-       ...
-       i++;
-       ...
+        ...
+        i++;
+        ...
  </verb></tscreen>
  
  
@@ -163,24 +148,24 @@ The compiler produces optimized code, if the value of a pointer is a constant.
  So, to access direct memory locations, use
  
  <tscreen><verb>
-       #define VDC_DATA   0xD601
-       *(char*)VDC_STATUS = 0x01;
+        #define VDC_STATUS 0xD601
+        *(char*)VDC_STATUS = 0x01;
  </verb></tscreen>
  
  That will be translated to
  
  <tscreen><verb>
-       lda     #$01
-       sta     $D600
+        lda     #$01
+        sta     $D601
  </verb></tscreen>
  
  The constant value detection works also for struct pointers and arrays, if the
  subscript is a constant. So
  
  <tscreen><verb>
-       #define VDC     ((unsigned char*)0xD600)
-       #define STATUS  0x01
-       VDC [STATUS] = 0x01;
+        #define VDC     ((unsigned char*)0xD600)
+        #define STATUS  0x01
+        VDC[STATUS] = 0x01;
  </verb></tscreen>
  
  will also work.
@@ -191,20 +176,20 @@ compiler does not know anything about the contents of the variable.
  
  
  
-<sect>Use initialized local variables - but use it with care<p>
+<sect>Use initialized local variables<p>
  
  Initialization of local variables when declaring them gives shorter and faster
  code. So, use
  
  <tscreen><verb>
-       int i = 1;
+        int i = 1;
  </verb></tscreen>
  
  instead of
  
  <tscreen><verb>
-       int i;
-       i = 1;
+        int i;
+        i = 1;
  </verb></tscreen>
  
  But beware: To maximize your savings, don't mix uninitialized and initialized
@@ -216,62 +201,24 @@ variables, you force the compiler to allocate space for the uninitialized
  variables each time it parses an initialized one. So do this:
  
  <tscreen><verb>
-       int i, j;
-       int a = 3;
-       int b = 0;
+        int i, j;
+        int a = 3;
+        int b = 0;
  </verb></tscreen>
  
  instead of
  
  <tscreen><verb>
-       int i;
-       int a = 3;
-       int j;
-       int b = 0;
+        int i;
+        int a = 3;
+        int j;
+        int b = 0;
  </verb></tscreen>
  
  The latter will work, but will create larger and slower code.
  
  
  
-<sect>When using the <tt/?:/ operator, cast values that are not ints<p>
-
-The result type of the <tt/?:/ operator is a long, if one of the second or
-third operands is a long. If the second operand has been evaluated and it was
-of type int, and the compiler detects that the third operand is a long, it has
-to add an additional <tt/int/ &rarr; <tt/long/ conversion for the second
-operand. However, since the code for the second operand has already been
-emitted, this gives much worse code.
-
-Look at this:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1 : a;
-       }
-</verb></tscreen>
-
-When the compiler sees the literal "1", it does not know, that the result type
-of the <tt/?:/ operator is a long, so it will emit code to load a integer
-constant 1. After parsing "a", which is a long, a <tt/int/ &rarr; <tt/long/
-conversion has to be applied to the second operand. This creates one
-additional jump, and an additional code for the conversion.
-
-A better way would have been to write:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1L : a;
-       }
-</verb></tscreen>
-
-By forcing the literal "1" to be of type long, the correct code is created in
-the first place, and no additional conversion code is needed.
-
-
-
  <sect>Use the array operator &lsqb;&rsqb; even for pointers<p>
  
  When addressing an array via a pointer, don't use the plus and dereference
@@ -281,17 +228,17 @@ common cases.
  Don't use
  
  <tscreen><verb>
-       char* a;
-       char b, c;
-       char b = *(a + c);
+        char* a;
+        char b, c;
+        b = *(a + c);
  </verb></tscreen>
  
  Use
  
  <tscreen><verb>
-       char* a;
-       char b, c;
-       char b = a[c];
+        char* a;
+        char b, c;
+        b = a[c];
  </verb></tscreen>
  
  instead.
@@ -302,18 +249,22 @@ instead.
  
  Register variables may give faster and shorter code, but they do also have an
  overhead. Register variables are actually zero page locations, so using them
-saves roughly one cycle per access. Since the old values have to be saved and
-restored, there is an overhead of about 70 cycles per 2 byte variable. It is
-easy to see, that - apart from the additional code that is needed to save and
-restore the values - you need to make heavy use of a variable to justify the
-overhead.
+saves roughly one cycle per access. The calling routine may also use register
+variables, so the old values have to be saved on function entry and restored
+on exit. Saving and restoring has an overhead of about 70 cycles per 2 byte
+variable. It is easy to see, that - apart from the additional code that is
+needed to save and restore the values - you need to make heavy use of a
+variable to justify the overhead.
  
-An exception are pointers, especially char pointers. The optimizer has code to
-detect and transform the most common pointer operations if the pointer
-variable is a register variable. Declaring heavily used character pointers as
-register may give significant gains in speed and size.
+As a general rule: Use register variables only for pointers that are
+dereferenced several times in your function, or for heavily used induction
+variables in a loop (with several hundred accesses).
  
-And remember: Register variables must be enabled with <tt/-Or/.
+When declaring register variables, try to keep them together, because this
+will allow the compiler to save and restore the old values in one chunk, and
+not in several.
+
+And remember: Register variables must be enabled with <tt/-r/ or <tt/-Or/.
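
The general rule above can be sketched like this. The function `count_char` is a hypothetical example, not part of the patch. Its pointer is dereferenced twice per iteration, which is the kind of use the text recommends for a register variable. With cc65, the `register` keyword only has an effect when register variables are enabled as described above.

```c
/* Hypothetical helper: count how often c occurs in the string s. */
unsigned count_char(const char *s, char c)
{
    /* Heavily dereferenced pointer, declared as a register variable. */
    register const char *p = s;
    unsigned n = 0;

    while (*p != '\0') {
        if (*p == c) {
            ++n;
        }
        ++p;
    }
    return n;
}
```

For very short strings the save and restore overhead may outweigh the gain, so measuring the real workload is advisable.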
  
  
  
@@ -321,43 +272,18 @@ And remember: Register variables must be enabled with <tt/-Or/.
  
  The language rules for constant numeric values specify that decimal constants
  without a type suffix that are not in integer range must be of type long int
-or unsigned long int. This means that a simple constant like 40000 is of type
-long int, and may cause an expression to be evaluated with 32 bits.
-
-An example is:
+or unsigned long int. So a simple constant like 40000 is of type long int!
+This is often unexpected and may cause an expression to be evaluated with 32
+bits. While in many cases the compiler takes care of it, in some places it
+can't. So be careful when you get a warning like
  
  <tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535) {
-           ...
-       }
+        test.c(7): Warning: Constant is long
  </verb></tscreen>
  
-Here, the compare is evaluated using 32 bit precision. This makes the code
-larger and a lot slower.
-
-Using
-
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 0xFFFF) {
-           ...
-       }
-</verb></tscreen>
-
-or
-
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535U) {
-           ...
-       }
-</verb></tscreen>
+Use the <tt/U/, <tt/L/ or <tt/UL/ suffixes to tell the compiler the desired
+type of a numeric constant.
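
A minimal sketch of the suffix advice, assuming a hypothetical function `below_max` that is not part of the patch. On a target with 16 bit ints, the plain decimal constant 65535 does not fit into an int and is therefore of type long; the `U` suffix makes it an unsigned int, so the compare has no reason to be widened.

```c
/* Return 1 if val is below 65535, 0 otherwise. The U suffix gives the
   constant the type unsigned int instead of long on a 16 bit int target. */
int below_max(unsigned val)
{
    return val < 65535U;
}
```

The result is the same with or without the suffix; only the type used for the comparison, and with it the size and speed of the generated code, may differ.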
  
-instead will give shorter and faster code.
  
  
  <sect>Access to parameters in variadic functions is expensive<p>
@@ -365,7 +291,7 @@ instead will give shorter and faster code.
  Since cc65 has the "wrong" calling order, the location of the fixed parameters
  in a variadic function (a function with a variable parameter list) depends on
  the number and size of variable arguments passed. Since this number and size
-is unknown at compiler time, the compiler will generate code to calculate the
+is unknown at compile time, the compiler will generate code to calculate the
  location on the stack when needed.
  
  Because of this additional code, accessing the fixed parameters in a variadic