I am comparing the speed of port writes on the Infineon XC167 and the ARM-based LPC2294, because I need to do some rapid bit-bashing on my peripherals. The C166 compiler generates a single instruction that executes in one cycle, while the ARM compiler generates three instructions to write an immediate value to the port, and that sequence appears to take about 10 machine cycles. Is the ARM really that much worse at I/O, or is the compiler just doing a poor job of using the ARM instruction set?
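
For concreteness, the kind of port write being compared might look like the sketch below. It is only an illustration under stated assumptions: `fake_port`, `port_write` and `demo_port_write` are hypothetical names, and the port is an ordinary variable standing in for a real XC167 or LPC2294 register so that the code builds on a host PC.

```c
#include <stdint.h>

/* ASSUMPTION: stand-in for a memory-mapped port register. On real hardware
 * this would be a fixed address taken from the device's user manual; here it
 * is a plain variable so the sketch runs anywhere. */
static volatile uint32_t fake_port;

/* Hypothetical helper: store a value to a port through a volatile pointer,
 * so the compiler must actually perform the write. */
static inline void port_write(volatile uint32_t *port, uint32_t value)
{
    *port = value;
}

/* Write an immediate value to the simulated port and read it back. */
uint32_t demo_port_write(void)
{
    port_write(&fake_port, 0x55u);
    return fake_port;
}
```

On a load/store core such as the ARM7 in the LPC2294, a store like this generally needs both the port address and the value in registers first, which would be consistent with the three instructions observed. Whether the remaining cycles come from the compiler or from the bus to the peripheral is exactly what is in question.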