I and a co-worker are programming an 8051 uController in C. Last week we were struggling with a poor perfomance time of our program. So I talked to the assembly guy and we figured that our way of software timer handling was too slow. We were using 16 bit variables. Now I tweaked the timer software a bit and it runs about 4x faster, which is fast enough.
Digging a little deeper in the produced assembly code we found a 'persistent nuisance'. As a test I wrote these lines in code:
uint8 j; for(j=10;j--;){ rightTorqueArray[j] = j; }
The array is an unsigned char array but when we observe the assembly
MOV R7,#0AH ?C0001: MOV R6,AR7 DEC R7 MOV A,R6 JZ ?C0002 ; rightTorqueArray[j] = j; } ; SOURCE LINE # 52 MOV A,#LOW (rightTorqueArray) ADD A,R7 MOV DPL,A CLR A ADDC A,#HIGH (rightTorqueArray) MOV DPH,A MOV A,R7 MOVX @DPTR,A SJMP ?C0001 ?C0002:
We noticed that the array is adressed with LOW and HIGH so apparantly it is treated as a 16 bit variable. But my assembly-nese is not so well, so please correct me if I am wrong.
I set the Code Optimalization at level 8: reuse Common entry code and the emphasis at Favor speed.
The assembly was produced as a .SRC file using #pragma SRC on top of the C-file.
I forgot: I want to ask if there is something we can do to let Keil produce more efficient assembly when it comes to arrays?
it is inherent to te '51 architecture that moving down in an xdata array is very inefficient (there is inc dptr, but not dec dptr) turn the loop aroind,
for j = 0 ... j++
Unless your particular chip has some weird, non-standard extended instructions ...
www.onarm.com/.../
We noticed that the array is adressed with LOW and HIGH so apparantly it is treated as a 16 bit variable.
If that "we" included aforementioned "assembly guy", you need a better person to fill that job. No, your array is not "treated as a 16-bit variable". It's treated as an array living in a 16-bit address space, because of either a) how you explicitly declared that array, or b) your choice of memory model, which had the same effect implicitly. Since you didn't show your array's declaration, nobody know which it is.
That declaration is the actual reason this loop is so slow. You basically got exactly what you asked for --- you just didn't know how to ask for what you actually wanted.
That may have been counter-productive. #pragma SRC has a side effect of turning off some optimizations, because those are not really expressible in assembly.
You really should look at list files from normal compilation, and/or post-linking, "absolute" list file instead.
I did mention that the array was an unsigned char type. It is declared as follows
xuint8 leftTorqueArray[10];
xuint8 is defined as unsigned char xdata. I am not aware of other ways of declaring arrays.
The for-mentioned guy was not the assembly guy. The assembly guy has programmed nearly all of our machines and they are rather complex.
These lines
result in
; leftTorqueArray[j] = 5; MOV A,#LOW (leftTorqueArray) ADD A,R7 MOV DPL,A CLR A ADDC A,#HIGH (leftTorqueArray) MOV DPH,A MOV A,#05H MOVX @DPTR,A ; leftTorqueArray[5] = 5; MOV DPTR,#leftTorqueArray+05H MOVX @DPTR,A
I don't understand why there is such a huge difference between
leftTorqueArray[j] = 5;
and
leftTorqueArray[5] = 5;
How can I make
as quick as the other?
I have been told and I have read on several websites about that decrementing variables is quicker than incrementing them. That's why some of my for-loops run backwards instead of forwards.
The difference in assembly code is
; uint8 j; ; for(j=0;j<10;j++){ ; SOURCE LINE # 51 ;---- Variable 'j?040' assigned to Register 'R7' ---- CLR A MOV R7,A ?C0001: ; rightTorqueArray[j] = j; } ; SOURCE LINE # 52 MOV A,#LOW (rightTorqueArray) ADD A,R7 MOV DPL,A CLR A ADDC A,#HIGH (rightTorqueArray) MOV DPH,A MOV A,R7 MOVX @DPTR,A INC R7 MOV A,R7 CJNE A,#0AH,?C0001 ?C0002: ; uint8 j=10; ; SOURCE LINE # 50 ;---- Variable 'j?040' assigned to Register 'R7' ---- ; for(j=0;j--;){ ; SOURCE LINE # 51 CLR A MOV R7,A ?C0001: MOV R6,AR7 DEC R7 MOV A,R6 JZ ?C0002 ; rightTorqueArray[j] = j; } ; SOURCE LINE # 52 MOV A,#LOW (rightTorqueArray) ADD A,R7 MOV DPL,A CLR A ADDC A,#HIGH (rightTorqueArray) MOV DPH,A MOV A,R7 MOVX @DPTR,A SJMP ?C0001 ?C0002:
I honestly can't you tell which is actual faster. The code size is however 3 bytes smaller with incrementing for-loops.
Do you understand the meaning & significance of the xdata keyword in that?
"I don't understand why there is such a huge difference between
"
Thnk about it: one is a fixed address which can be computed at compile-time; the other has to be evaluated and accessed at runtime...
I remember reading that when I was at primary school of the programming. It is being wrong to think always the same. it is depending on the processor achitecture and situations.
if you using the assembler with the 8051 a simple 8 bit decrement counter is being very efficient with the DJNZ instruction but the compiler cannot always use it.
To writing efficient code of the 8051 you must goodly understand the processor. it is old and it is lots of limitations. The KEIL C51 compiler is surprising good but it is not magic and cannot make your bad understanding code very fast. you must help to make it good code.
"To writing efficient code of the 8051 you must goodly understand the processor. it is old and it is lots of limitations. The KEIL C51 compiler is surprising good but it is not magic and cannot make your bad understanding code very fast. you must help to make it good code."
@Kalib
I am asking for precisely this. I am specifcally asking you guys to help me to help the keil C51 to produce better code.
Can I make adressing array elements faster? And if so, how can I?
Fundamental to answering that question is understanding the meaning & significance of the xdata keyword - hence my question above.
Surely, your "assembly guy" can help you with this ... ?
Have you looked at Appendix C, "Writing Optimum Code", in the Compiler manual ?
try incrementing 'j' (I HATE these nondescriptive variables) there is as stated above an INC DPTR and I would hope it is used by the compiler.
Not looking at the assembler posted above.
It is computing the DPTR value for each individual element access - rather than pointing the DPTR at the start of the array, and then incrementing it for each element.
This is where the "inefficiency" lies here.
Maybe incrementing a pointer - rather than using an index - would yield better results in this case ... ?
The absolutely foremost cause of inefficiency is that it's an XDATA array. Once that decision has been made, worrying about execution speed is, to a large extent, just a waste of time.
@Neil The "assembly guy" cannot help, he is 66 years, worked his entire live at this company and most importantly: he does not know anything about C or Keil. The one programming language he ever learned is assembly.
Well a funny thing. If I remove xdata there is no difference in the ammount of assembly lines. We (me and he co-worker who isn't the assembly guy) already suspected it.
We only have 64 bytes memory for non-xdata memory and over 65kB of xdata, which I personally call "unlimited variable memory".
Co-worker already tried using pointers but his attempt produced the same problem. According to the link I sent: Optimal C Constructs for 8051 Microcontrollers by Nigel Jones.
Accessing xdata via for-loop supposed to be faster than using pointers. I will however examin appendix C first. See what that files has to say over the matter
I have another smaller question. For my software timers I removed most loops and I used constants instead. It uses more code but should have a quicker execution timer. Most of the arrays are still members of structs. Does it benefit if I were to remove the structs and use separate arrays instead?
But, surely, he can explain the assembler to you? And explain why this is inherent in the 8051 architecture?
"Well a funny thing. If I remove xdata there is no difference in the ammount of assembly lines"
That's not funny at all - it is to be expected if you are using a Memory Model which defaults to XDATA.
As already noted, XDATA memory access is inherently slow & "inefficient".
"Co-worker already tried using pointers but his attempt produced the same problem"
So show what, exactly, he tried - and what, exactly, was the result.
As the "inefficiency" is inherent in the underlying hardware architecture, it is bound to happen!
"Accessing xdata via for-loop supposed to be faster than using pointers"
Says who?
"Does it benefit if I were to remove the structs and use separate arrays instead?"
Most likely not. More detail needed to answer specifically.
void test(void) { uint8 rightTorqueArray[10], j; for(j = 10;j > 0; --j) { rightTorqueArray[j] = j; } } ///////////////////////////////////////////// 88: void test(void) 89: { 90: uint8 rightTorqueArray[10], j; 91: for(j = 10; j > 0; --j) C:0x199C 7F0A MOV R7,#0x0A 92: { 93: rightTorqueArray[j] = j; C:0x199E 7453 MOV A,#0x53 C:0x19A0 2F ADD A,R7 C:0x19A1 F8 MOV R0,A C:0x19A2 A607 MOV @R0,0x07 93: } C:0x19A4 DFF8 DJNZ R7,C:199E C:0x19A6 22 RET