I just noticed this recently... whenever I have an xdata DWORD (32 bits) and I try to set it to a constant value (or any value, I think), the assembly that gets generated seems all messed up for such a simple routine. See below. Writing a WORD (TotalSleepTime.m16.ab) works fine, but writing a DWORD (NextSleepTime) makes a call to some external routine, which is excessively long, and is then followed by a series of NOPs.
    77:   TotalSleepTime.m16.ab = 0;
C:0x3347    900181   MOV      DPTR,#TotalSleepTime(0x0181)
C:0x334A    F0       MOVX     @DPTR,A
C:0x334B    A3       INC      DPTR
C:0x334C    F0       MOVX     @DPTR,A
    78:   TotalSleepTime.m16.cd = 0;
    79:
C:0x334D    900183   MOV      DPTR,#0x0183
C:0x3350    F0       MOVX     @DPTR,A
C:0x3351    A3       INC      DPTR
C:0x3352    F0       MOVX     @DPTR,A
    80:   NextSleepTime = 1;
C:0x3353    900185   MOV      DPTR,#NextSleepTime(0x0185)
C:0x3356    120FB2   LCALL    C?LSTKXDATA(C:0FB2)
C:0x3359    00       NOP
C:0x335A    00       NOP
C:0x335B    00       NOP
C:0x335C    0122     AJMP     C:3022
I noticed that something like
variable.m32.abcd = 0
would generate screwed-up code like this, so I thought it might have to do with the union not being accessed correctly. I tried changing the variable to a plain DWORD (NextSleepTime, as seen above), but the same thing happens.
Is this behavior normal? I can't follow the undocumented library assembly very well, but it definitely seems excessively complicated for a simple command. I just checked a blank project with
void main(void)
{
    unsigned long xdata test;
    test = 0;
}
Same thing.
So, do you routinely call any and all code you don't understand "messed up", or do you have actual observations to back such a claim?
I.e., did you look at what it actually does on execution? Did it put the value 1 into the variable where it was supposed to go, or didn't it? If it did, what's your problem?
It's normal for C51 to call library routines to move objects in and out of memory, particularly large objects in xdata. The 8051 is an 8-bit processor, and access to xdata is available only through MOVX instructions, addressed via the DPTR register (or R0/R1 for paged access). So moving 32-bit words to xdata is somewhat awkward and requires a fair number of instructions. That sequence might well be encoded as a subroutine to save space, rather than generated inline for every variable.
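According to the documentation, LSTKXDATA is the name of a routine that stores constant values to xdata. The "K" indicates that the constant data is encoded right after the library call, so the NOPs in this case are presumably the constant you're writing: the value 1, stored as the big-endian bytes 00 00 00 01. The three 00 bytes disassemble as NOPs, and the final 01 gets lumped together with the next byte (0x22, the opcode for RET) into the bogus-looking "AJMP C:3022".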
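To make the "constant encoded after the call" idea concrete, here is a rough host-side model in portable C. It is only a sketch under stated assumptions: the function name lstk_xdata_model, the flat code[] and xram[] arrays and the 4-byte MSB-first layout are illustrative, not Keil's actual library source. The helper uses the return address pushed by LCALL to find the inline constant, copies it to xdata at DPTR, and resumes execution just past it, so those bytes are never executed as instructions.

```c
#include <assert.h>
#include <stdint.h>

#define XRAM_SIZE 0x10000u  /* full 64 KB xdata space (assumption for the model) */

/*
 * Hypothetical model of an LSTKXDATA-style helper.
 *   code     : simulated code memory
 *   ret_addr : return address pushed by the LCALL, i.e. the address of the
 *              first byte after the 3-byte LCALL instruction
 *   xram     : simulated xdata memory
 *   dptr     : target address, loaded by the caller before the LCALL
 * Returns the address where execution resumes: just past the 4 inline
 * constant bytes.
 */
static uint16_t lstk_xdata_model(const uint8_t *code, uint16_t ret_addr,
                                 uint8_t *xram, uint16_t dptr)
{
    assert((uint32_t)dptr + 4u <= XRAM_SIZE);
    for (int i = 0; i < 4; i++) {
        /* copy byte i of the inline constant to xdata, MSB first */
        xram[dptr + i] = code[ret_addr + i];
    }
    return (uint16_t)(ret_addr + 4u);
}
```

Fed the bytes from the listing above (12 0F B2 00 00 00 01 22, with the LCALL at offset 0 and DPTR = 0x0185), the model writes 00 00 00 01 to xdata 0x0185..0x0188 and resumes at offset 7, which holds the 0x22 (RET) byte.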
"Excessively long" is an opinion. How long is the routine? How does it compare to the source given in XBANKING.A51 or L51_BANK.A51? Can you post the routine? How would you have written a routine to do this job?
More information on the set of library routines for data access: http://www.keil.com/support/docs/1964.htm
"'Excessively long' is an opinion."
In fact, it might not be totally unreasonable to say that 32-bit data is "excessively long" for use with an 8-bit controller...!
;-)
The use of the name "DWORD" to mean 32 bits is also suspicious: the 8051 is an 8-bit processor, i.e., its "word size" is 8 bits, so a DWORD (double word) should be 16 bits...
This tends to suggest that the code was not originally designed for an 8051 (or any other 8-bit processor, for that matter); more likely it's PC code being shoe-horned into an 8051, so it's not too surprising if the resulting code is a bit ugly...!
:-0
"DWORD" is, IMO, a poor choice for a type name, as it gives no explicit indication of either size or signed-ness!
Something like "U32" would be much better, IMO.
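A minimal sketch of that kind of naming, assuming a compiler that provides C99 <stdint.h>; the names U8/U16/U32/S32 are a hypothetical project convention, and static_assert needs C11, so on an older toolchain such as C51 the size check would have to be done another way.

```c
#include <assert.h>
#include <stdint.h>

/* Size- and sign-explicit names (hypothetical project convention).
 * If <stdint.h> is unavailable, the same names can be mapped onto native
 * types; on C51: unsigned char = 8 bits, unsigned int = 16 bits,
 * unsigned long = 32 bits. */
typedef uint8_t  U8;
typedef uint16_t U16;
typedef uint32_t U32;
typedef int32_t  S32;

/* Compile-time guards: fail the build if a type is not the size its name promises. */
static_assert(sizeof(U16) == 2, "U16 must be 16 bits");
static_assert(sizeof(U32) == 4, "U32 must be 32 bits");
```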
it definitely seems excessively complicated for a simple command
Every operation on a '51 that is 'stretching' its capabilities will seem "excessively complicated for a simple command". However, how else would you propose to do 32-bit operations on an 8-bitter?
If you are doing a lot of 32-bit work, it may be a problem.
If you are doing only a few 32-bit operations, it may not be a problem.
Erik
Firstly, the DWORD terminology for a 32-bit variable comes from FX2.h, which is supplied by Cypress (the processor manufacturer) for the FX2LP processor. I too thought it was weird that a "WORD" was defined as 16 bits for an 8-bit processor, but I stuck with their naming scheme anyway. I also use a "U32UNION" union, as in my post, which identifies and provides easy access to each individual BYTE or WORD in the proper endian order, without needing fancy type casts to get the compiler to do the most efficient thing.
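For readers unfamiliar with the pattern, here is a hypothetical reconstruction of a U32UNION-style type; the actual definition isn't shown in the thread, so the member names are only inferred from the accesses seen above (.m16.ab, .m16.cd, .m32.abcd). Which half "ab" names depends on byte order: on C51, which stores multi-byte values big-endian, ab is the high word and a the most significant byte; on a little-endian host it is the other way round.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reconstruction: one 32-bit value viewed as bytes, words or a whole. */
typedef union {
    struct { uint8_t  a, b, c, d; } m8;
    struct { uint16_t ab, cd; }     m16;
    struct { uint32_t abcd; }       m32;
} U32UNION;

static_assert(sizeof(U32UNION) == 4, "U32UNION must overlay exactly 4 bytes");

/* Clear all 32 bits with two 16-bit stores, the same pattern that compiled
 * to short inline MOVX sequences for TotalSleepTime in the original post. */
static void u32union_clear(U32UNION *v)
{
    v->m16.ab = 0;
    v->m16.cd = 0;
}
```

The clear helper works the same on either byte order, since both halves are zeroed; only code that inspects individual bytes or words has to care which end "a" is.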
I understand that the 8-bit core is extremely inefficient at working with 32-bit values. This is the only place where I even use a 32-bit C statement like this, and I'm trying to optimize all my C functions to produce the smallest amount of code (or rewriting them in assembly) because I am nearing my code space limit. I'm interfacing to a 16-bit wireless transceiver that has 32-bit timestamps, and on the other end I'm dealing with data coming from the USB bus. The point is, I can't completely avoid 16-bit or 32-bit accesses, but mostly they are just for storing and moving data, with limited math performed.
By "messed up," I did not mean it literally, because something as common as an unsigned long immediate store has obviously been tested millions of times. I just meant that the default implementation was a bit complicated, or, as I was quoted as saying, "excessively long," for something that should take about 10 instructions. The function call includes at least 30 instructions with a bunch of nested LCALLs. I'm not at my work PC, so I can't post it now; if you want to see the instructions, create an empty project and assign 0 to an unsigned long.
Thank you, Drew, for posting the link on the meaning of the library functions. The "K" explains what the NOPs were doing, and it now makes more sense why the library function starts by popping data off the stack. The way I would expect it to be implemented is already in my initial post: the first two lines of C code and the assembly they compile to. For example...
unsigned long test;
test = 0;
CLR   A
MOV   DPTR,#test
MOVX  @DPTR,A
INC   DPTR
MOVX  @DPTR,A
INC   DPTR
MOVX  @DPTR,A
INC   DPTR
MOVX  @DPTR,A
It wouldn't be much more complicated for any immediate value other than 0, either. I can understand that if I were using 32-bit variables all over my code, the library routine would probably save a lot of code SIZE at the expense of speed, but this is the ONLY place where I ever assign an immediate value to a 32-bit variable. I noticed that my code size jumped by at least 50 bytes when I added this statement, so I looked into it to see why.
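One way to check the size of the inline approach for an arbitrary constant is to generate it. The sketch below is a hypothetical emitter (emit_store_u32_xdata is an illustrative name, not a Keil function) that writes the 8051 machine code for storing a 32-bit constant to xdata, MSB first as C51 lays out longs, and only reloads A when the next byte differs from what A already holds. It emits MOV DPTR before CLR A, unlike the hand-written sequence above; the byte count is the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 8051 opcodes used by the inline sequence */
#define OP_MOV_DPTR_IMM 0x90  /* MOV DPTR,#data16 (3 bytes) */
#define OP_CLR_A        0xE4  /* CLR A            (1 byte)  */
#define OP_MOV_A_IMM    0x74  /* MOV A,#data      (2 bytes) */
#define OP_MOVX_DPTR_A  0xF0  /* MOVX @DPTR,A     (1 byte)  */
#define OP_INC_DPTR     0xA3  /* INC DPTR         (1 byte)  */

/* Emit inline code storing `value` at xdata `addr` into out[] (room for at
 * most 18 bytes needed) and return the number of bytes emitted. */
static size_t emit_store_u32_xdata(uint8_t *out, uint16_t addr, uint32_t value)
{
    size_t n = 0;
    int a = -1;                         /* contents of A unknown at entry */

    assert(out != NULL);
    out[n++] = OP_MOV_DPTR_IMM;
    out[n++] = (uint8_t)(addr >> 8);    /* DPH */
    out[n++] = (uint8_t)(addr & 0xFF);  /* DPL */

    for (int i = 3; i >= 0; i--) {      /* most significant byte first */
        uint8_t byte = (uint8_t)(value >> (8 * i));
        if (a != byte) {                /* reload A only when it must change */
            if (byte == 0) {
                out[n++] = OP_CLR_A;
            } else {
                out[n++] = OP_MOV_A_IMM;
                out[n++] = byte;
            }
            a = byte;
        }
        out[n++] = OP_MOVX_DPTR_A;
        if (i > 0)
            out[n++] = OP_INC_DPTR;     /* no INC needed after the last byte */
    }
    return n;
}
```

For 0 this produces 11 bytes (90 hi lo E4 F0 A3 F0 A3 F0 A3 F0), for 1 it produces 13, and for a constant with four distinct non-zero bytes such as 0x12345678 it produces 18.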
The simple solution is to just cast it to two 16-bit values and then do the move, so it's not like this is some pressing matter. I was simply looking for a good explanation of why the compiler chooses this routine.
I can understand that if I were using 32-bit variables all over my code, using the library routine would probably save a lot of code SIZE while compromising speed, but this is the ONLY place that I ever assign a 32-bit variable to an immediate value.
You pretty much answered your own question right there. You presumably asked for size-optimized code, and the tools do what is most likely to yield the smallest code in a typical situation. And your counter-example is biased: 0 is an atypically simple case. Once you generalize it to an arbitrary immediate value, code following your pattern grows from 11 bytes to a whopping 18, compared to the compiler's 10 per call site. Each call site therefore saves 8 bytes, which means the shared library routine pays for itself, and the compiler wins, as soon as there are about 5 of these operations in the entire program.
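The break-even point is simple arithmetic. A sketch, assuming the shared routine costs roughly 40 bytes; that figure is not stated in the thread, it is only loosely inferred from the reported ~50-byte growth minus the 10-byte call site. The function name break_even_stores is hypothetical.

```c
#include <assert.h>

/* Code-size figures (bytes) for one 32-bit constant store to xdata. */
#define INLINE_BYTES    18  /* MOV DPTR + 4 x (MOV A,#imm + MOVX) + 3 x INC DPTR */
#define CALL_SITE_BYTES 10  /* MOV DPTR + LCALL + 4 inline constant bytes */

/*
 * Smallest number of stores n for which
 *     routine_bytes + n * CALL_SITE_BYTES <= n * INLINE_BYTES,
 * i.e. the shared library routine has paid for itself.
 */
static int break_even_stores(int routine_bytes)
{
    const int saving_per_store = INLINE_BYTES - CALL_SITE_BYTES;  /* 8 bytes */

    assert(routine_bytes >= 0);
    /* ceiling division: routine_bytes / saving_per_store, rounded up */
    return (routine_bytes + saving_per_store - 1) / saving_per_store;
}
```

With an assumed 40-byte routine this gives 5 stores; a routine even one byte larger pushes it to 6, so "about 5" is the right order of magnitude rather than an exact threshold.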
The only insight missing is that there's no way for the compiler to guess that this is going to be the single such operation in the whole program, because the compiler doesn't usually see the whole program. Only the linker sees the whole program, but it doesn't get to decide about micro-scale code generation.