Updated AppleCommander URL.

diff --git a/doc/coding.sgml b/doc/coding.sgml

index 19c7074f08b174fc6576247a64d172b2bd1e06b1..f6dfc3a357c891fb15c12af445d2c19d146542b9 100644
--- a/doc/coding.sgml
+++ b/doc/coding.sgml
@@ -2,13 +2,15 @@
  
  <article>
  <title>cc65 coding hints
-<author>Ullrich von Bassewitz, <htmlurl url="mailto:uz@cc65.org" name="uz@cc65.org">
-<date>03.12.2000
+<author><url url="mailto:uz@cc65.org" name="Ullrich von Bassewitz">
+<date>2000-12-03, 2009-09-01
  
  <abstract>
  How to generate the most effective code with cc65.
  </abstract>
  
+
+
  <sect>Use prototypes<p>
  
  This will not only help to find errors between separate modules, it will also
@@ -28,13 +30,14 @@ code.
  
  
  
-<sect>Remember that the compiler does not optimize<p>
+<sect>Remember that the compiler does no high level optimizations<p>
  
-The compiler needs hints from you about the code to generate. When accessing
-indexed data structures, get a pointer to the element and use this pointer
-instead of calculating the index again and again. If you want to have your
-loops unrolled, or loop invariant code moved outside the loop, you have to do
-that yourself.
+The compiler needs hints from you about the code to generate. It will try to
+optimize the generated code, but follow the outline you gave in your C
+program. So for example, when accessing indexed data structures, get a pointer
+to the element and use this pointer instead of calculating the index again and
+again. If you want to have your loops unrolled, or loop invariant code moved
+outside the loop, you have to do that yourself.
  
  
  
@@ -48,10 +51,10 @@ operation works on double the data compared to an int.
  
  <sect>Use unsigned types wherever possible<p>
  
-The CPU has no opcodes to handle signed values greater than 8 bit. So sign
-extension, test of signedness etc. has to be done by hand. The code to handle
-signed operations is usually a bit slower than the same code for unsigned
-types.
+The 6502 CPU has no opcodes to handle signed values greater than 8 bit. So
+sign extension, test of signedness etc. has to be done with extra code. As a
+consequence, the code to handle signed operations is usually a bit larger and
+slower than the same code for unsigned types.
  
  
  
@@ -64,25 +67,8 @@ accessing chars is faster. For several operations, the generated code may be
  better if intermediate results that are known not to be larger than 8 bit are
  casted to chars.
  
-When doing
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((a & 0x0F) == 0)
-</verb></tscreen>
-
-the result of the & operator is an int because of the int promotion rules of
-the language. So the compare is also done with 16 bits. When using
-
-<tscreen><verb>
-       unsigned char a;
-       ...
-       if ((unsigned char)(a & 0x0F) == 0)
-</verb></tscreen>
-
-the generated code is much shorter, since the operation is done with 8 bits
-instead of 16.
+You should especially use unsigned chars for loop control variables if the
+loop is known not to execute more than 255 times.
  
  
  
@@ -109,13 +95,13 @@ if you don't help. Look at this example:
        i = i + OFFS + 3;
  </verb></tscreen>
  
-The expression is parsed from left to right, that means, the compiler sees
-'i', and puts it contents into the secondary register. Next is OFFS, which is
+The expression is parsed from left to right, that means, the compiler sees 'i',
+and puts its contents into the secondary register. Next is OFFS, which is
  constant. The compiler emits code to add a constant to the secondary register.
-Same thing again for the constant 3. So the code produced contains a fetch of
-'i', two additions of constants, and a store (into 'i'). Unfortunately, the
+Same thing again for the constant 3. So the code produced contains a fetch
+of 'i', two additions of constants, and a store (into 'i'). Unfortunately, the
  compiler does not see, that "OFFS + 3" is a constant for itself, since it does
-it's evaluation from left to right. There are some ways to help the compiler
+its evaluation from left to right. There are some ways to help the compiler
  to recognize expression like this:
  
  <enum>
@@ -163,7 +149,7 @@ The compiler produces optimized code, if the value of a pointer is a constant.
  So, to access direct memory locations, use
  
  <tscreen><verb>
-       #define VDC_DATA   0xD601
+       #define VDC_STATUS 0xD601
         *(char*)VDC_STATUS = 0x01;
  </verb></tscreen>
  
@@ -171,7 +157,7 @@ That will be translated to
  
  <tscreen><verb>
         lda     #$01
-       sta     $D600
+       sta     $D601
  </verb></tscreen>
  
  The constant value detection works also for struct pointers and arrays, if the
@@ -180,7 +166,7 @@ subscript is a constant. So
  <tscreen><verb>
         #define VDC     ((unsigned char*)0xD600)
         #define STATUS  0x01
-       VDC [STATUS] = 0x01;
+               VDC[STATUS] = 0x01;
  </verb></tscreen>
  
  will also work.
@@ -191,7 +177,7 @@ compiler does not know anything about the contents of the variable.
  
  
  
-<sect>Use initialized local variables - but use it with care<p>
+<sect>Use initialized local variables<p>
  
  Initialization of local variables when declaring them gives shorter and faster
  code. So, use
@@ -234,44 +220,6 @@ The latter will work, but will create larger and slower code.
  
  
  
-<sect>When using the ternary operator, cast values that are not ints<p>
-
-The result type of the <tt/?:/ operator is a long, if one of the second or
-third operands is a long. If the second operand has been evaluated and it was
-of type int, and the compiler detects that the third operand is a long, it has
-to add an additional <tt/int/ &rarr; <tt/long/ conversion for the second
-operand. However, since the code for the second operand has already been
-emitted, this gives much worse code.
-
-Look at this:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1 : a;
-       }
-</verb></tscreen>
-
-When the compiler sees the literal "1", it does not know, that the result type
-of the <tt/?:/ operator is a long, so it will emit code to load a integer
-constant 1. After parsing "a", which is a long, a <tt/int/ &rarr; <tt/long/
-conversion has to be applied to the second operand. This creates one
-additional jump, and an additional code for the conversion.
-
-A better way would have been to write:
-
-<tscreen><verb>
-       long f (long a)
-       {
-           return (a != 0)? 1L : a;
-       }
-</verb></tscreen>
-
-By forcing the literal "1" to be of type long, the correct code is created in
-the first place, and no additional conversion code is needed.
-
-
-
  <sect>Use the array operator &lsqb;&rsqb; even for pointers<p>
  
  When addressing an array via a pointer, don't use the plus and dereference
@@ -302,11 +250,12 @@ instead.
  
  Register variables may give faster and shorter code, but they do also have an
  overhead. Register variables are actually zero page locations, so using them
-saves roughly one cycle per access. Since the old values have to be saved and
-restored, there is an overhead of about 70 cycles per 2 byte variable. It is
-easy to see, that - apart from the additional code that is needed to save and
-restore the values - you need to make heavy use of a variable to justify the
-overhead.
+saves roughly one cycle per access. The calling routine may also use register
+variables, so the old values have to be saved on function entry and restored
+on exit. Saving and restoring has an overhead of about 70 cycles per 2 byte
+variable. It is easy to see, that - apart from the additional code that is
+needed to save and restore the values - you need to make heavy use of a
+variable to justify the overhead.
  
  As a general rule: Use register variables only for pointers that are
  dereferenced several times in your function, or for heavily used induction
@@ -324,43 +273,18 @@ And remember: Register variables must be enabled with <tt/-r/ or <tt/-Or/.
  
  The language rules for constant numeric values specify that decimal constants
  without a type suffix that are not in integer range must be of type long int
-or unsigned long int. This means that a simple constant like 40000 is of type
-long int, and may cause an expression to be evaluated with 32 bits.
-
-An example is:
+or unsigned long int. So a simple constant like 40000 is of type long int!
+This is often unexpected and may cause an expression to be evaluated with 32
+bits. While in many cases the compiler takes care about it, in some places it
+can't. So be careful when you get a warning like
  
  <tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535) {
-           ...
-       }
+        test.c(7): Warning: Constant is long
  </verb></tscreen>
  
-Here, the compare is evaluated using 32 bit precision. This makes the code
-larger and a lot slower.
-
-Using
-
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 0xFFFF) {
-           ...
-       }
-</verb></tscreen>
-
-or
-
-<tscreen><verb>
-       unsigned val;
-       ...
-       if (val < 65535U) {
-           ...
-       }
-</verb></tscreen>
+Use the <tt/U/, <tt/L/ or <tt/UL/ suffixes to tell the compiler the desired
+type of a numeric constant.
  
-instead will give shorter and faster code.
  
  
  <sect>Access to parameters in variadic functions is expensive<p>