I'm wrapping up some product code on a modern 8051.
I actually surprised myself, because I ran out of memory in the data area.
I need to check with the manufacturer, because supposedly this chip has 256 bytes of RAM in each of IRAM and XDATA. I have to dig through the device setup files from the manufacturer, since it isn't like an ARM where you specify the address ranges in KEIL directly.
I understand legacy 8051s in some cases had an external memory device that you could access using the XDATA syntax. I understand the architecture issue to a degree, since I've been building a mock 8-bit MCU in Verilog.
On a contemporary 8051, is it really that big of a performance hit to use the XDATA area for variables?
Am I correct that this is really a micro-optimization in terms of system gain? Like fractions of a microsecond (µs) of difference, or is it worse?
Wish I could edit -- anyway, I get that XDATA is some die-level memory area created on the IC, so the overheads seem really low to access it. The bus-level access may be sub-optimal on the die, but it's not like a separate IC going over copper traces....
Okay, so the DATA area is split between data / idata. IDATA is some indirect-addressing area that is a bit slower than DATA. 128 bytes of data, and in my case 128 bytes of idata.
So ideally you'd use the idata next...
Okay, so the DATA area is split between data / idata.
Not really. They're not even separate areas. IDATA is primarily an access method, not a memory region.
The 8051 is an ancient class of microcontrollers which, unlike more modern architectures, you really have to understand at the machine-language level to a considerable degree before you try to program it in C. Partly to avoid exactly the kind of confusion you're suffering right now, partly to know why and how to use the special features of the C compilers to work around its quirks and limitations.
So your chip has 256 bytes of data memory. All of that is accessible by indirect addressing (IDATA). The stack uses part of that. The lower 128 bytes of it are, in addition, also addressable directly (DATA). The lowest 32 bytes of that are also used as the core's register banks. The next 32 bytes are also addressable bitwise (BDATA).
On top of that, almost any modern 8051 chip will have some amount of (originally external, but now internalized) XRAM memory, sometimes also referred to as "MOVX" memory after the only opcode that handles it. The C compilers use this for XDATA. One 256-byte page of that is accessible by a paged/"near" addressing scheme (PDATA).
And yes, accessing XRAM is a good deal slower than IDATA, which in turn is slower than registers or DATA. Not because of caches or other such new-fangled trickery, but because it takes considerably more code to prepare any access to such a variable, and reading and processing that code takes time. It can be really illuminating to compile a C program using variables in the various memory spaces and then look at the machine code that the compiler had to create to work with them.
I definitely agree, the DATA / IDATA distinction is a bit odd looking at it from 2018! I hadn't even thought of IDATA, largely because the KEIL compiler output showed the compiler failing at 128 bytes of memory usage, which is suspiciously 50% of the stated 256 bytes.... I didn't think I was close to being out of memory!
Maybe you can enable something in KEIL to show the IDATA usage when compiling -- or maybe it's in the map file somewhere....
You'd really have to drill down into the MCU architecture and instruction set architecture to see why that was done way back when (if you really wanted to; I can't say I'm that interested). I am guessing they just kept bolting on new memory areas and kept the system backward compatible with the original instruction set architecture as best they could. I don't really know the 8051 ISA to any real depth.
Here's a good start for the next person: www.circuitstoday.com/8051-addressing-modes
In your experience, have you ever profiled a system using XData to see what the performance hit is on a "modern" 8051?
I suppose I could wire up an experiment, but I can't say it would be worth the few hours of time vs. asking on a forum.
I'm always trying to get the best performance out of these little cheap MCUs...
I am guessing they just kept bolting on new memory areas and kept the system backward compatible with the original instruction set architecture as best they could. I don't really know the 8051 ISA to any real depth.
Nope. The original 8051 had all of these memory regions. There was then a variant (the 8052) that had double the amount of internal RAM (accessible only as stack or IDATA).
Back in the day I used to keep pushing for that 8052 instead of the 8051, and frequently got refused. (I was only a junior programmer in the 1980s and didn't have the clout to authorise it.)
Later on various manufacturers added multiple data pointers and extra interrupt priority levels and stuff, but the basic instruction set remained the same.
The one good thing with having such limited resources was having to know the architecture well in order to code efficiently. I sometimes think that this skill is now sorely lacking.
I don't really know the 8051 ISA to any real depth.
Those two statements are in violent disagreement.
Didn't need to. I knew the data sheet well enough to be able to pretty much tell the speed of the assembly code just by looking at it. But then I was writing that code directly in assembly, so I kind-a had to.
Come on... you know the execution time of each assembly instruction... you've measured down to the nanosecond per each opcode? .... ;)
For perspective, I have a task that needs to execute every 760 µs. If I idle my 8051, I see around 200 µs of delay to wake up from the idle state. I'm borderline at 8 MHz to get this task done with the idle. If XDATA access introduced 50 µs of additional overhead across 50 program variables, I might have to increase the MCU speed, etc. For shipping a product, it's all relevant....
I've been working in Verilog modeling microcontrollers for the past few weeks. And I can say that even writing assembly isn't necessarily a sign of "depth" of knowledge of a microcontroller architecture! It depends how far down you want to go...
You could write assembly commands and not need to know anything about the actual architecture.... (well, very little). The more I learn, the more I realize what I don't know.
Like, really, why did they use a pointer register for that IDATA section, while the registers for DATA are directly addressed... That's some deep research to get a good answer. (No need for an answer. I can get a textbook on 8051 architecture if I really wanted to pursue it at that level.)
Sadly, research time is done for now. I have some code I need to get banged out in C, to ship a product!
Textbook: web.njit.edu/.../CSOA.html
Also, I really appreciated the help on the ARM stuff! I've got my head wrapped around that microcontroller, at least to the degree of making working software and blinking LEDs. That's a week or two of time to get comfortable.
WOW -- yes, the ARM has a LOT of registers.... I have been having a philosophical battle in my head about using vendor peripheral libraries vs not. I am going to have to go to therapy.
For perspective, I have a task that needs to execute every 760 µs. If I idle my 8051, I see around 200 µs of delay to wake up from the idle state. I'm borderline at 8 MHz to get this task done with the idle. If XDATA access introduced 50 µs of additional overhead across 50 program variables, I might have to increase the MCU speed, etc. For shipping a product, it's all relevant....
Why are you running that slow?
Anyhow, you clearly need to have a gander here:
www.danlhenry.com/.../80C51_FAM_ARCH_1.pdf
www.danlhenry.com/.../80C51_FAM_HARDWARE_1.pdf
www.danlhenry.com/.../80C51_FAM_PROG_GUIDE_1.pdf
@Erik for the win... I started scrolling through the first manual; they call out the execution time for each assembly instruction.
My no-name Asian 8051 doesn't give you anything like that.
That's perfect. Thank you very much.
So you could conceivably see several hundred µs disappear using the other memory areas (depending on what registers are used and how; you could estimate from the assembly output).
I'd even guess that if I just put every variable I have into XDATA, my interrupt timing might literally crash and burn for my required sampling rate of 760 µs. You'd certainly have to revisit and confirm the timing.
8 MHz is blazing fast! I wish we could run the code at 250 kHz... I feel like with every doubling of the clock speed, I see ~0.3 mA of current increase. Every 0.3 mA matters when you have a 300 mAh battery. I am sitting here babysitting every peripheral to minimize current drain.
Literally sitting here on a Saturday agonizing over a 300 ms resample rate on my flaky ADC, vs. just sampling the thing at 50 ms and paving over some little warts with regard to a blinking dual-color LED.
Sick sorta guy chasing to this level I guess? ;)
I'd even guess that if I just put every variable I have into XDATA, my interrupt timing might literally crash and burn for my required sampling rate of 760 µs. You'd certainly have to revisit and confirm the timing.
Basically we all go by the rule: most-used variables in DATA, least-used variables in XDATA, and everything in between in IDATA.
8 MHz is blazing fast! I wish we could run the code at 250 kHz... I feel like with every doubling of the clock speed, I see ~0.3 mA of current increase. Every 0.3 mA matters when you have a 300 mAh battery. I am sitting here babysitting every peripheral to minimize current drain.
I know of no current '51 derivative that can't handle 24 MHz; SiLabs has one-clockers that run at 100 MHz.
Also, many (most) modern derivatives use fewer than the "steam driven" 12 clocks per cycle.
If you are looking for blazing speed, look for a SiLabs F5xx, which runs about 100 times faster (1 clock cycle, 100 MHz) than the original '51.
Yeah, my MCU could run up to 32 MHz using the on-board oscillator. I'm the nut job who is profiling the system down to the last dotted i to minimize current consumption.
I totally hear you on the DATA, IDATA, XDATA front. In a pinch, I could re-factor the software and prioritize variables.
(I took pity on myself and did speed up the ADC sampling after like 6 hours of battling the ADC readings. Frigging tri-stated output on a LiPo charging IC vs. a flaky on-board ADC. Put two crappy ICs together and voila, a giant *** sandwich [pardon the language].)
It's unclear from the Taiwanese tech support whether this 8051 MCU has the updated instruction set / 1-clock operation. They claim so, but it doesn't seem like it when I am setting up timers.
This Asian MCU is a real dog (I don't want to call out the brand, since I have actually been to Taiwan to meet guys who work for the MCU manufacturer; nice people, kind of a so-so MCU, but you get what you pay for).... SiLabs would be a dream.
---
I've gotten surface-level deep with the Cortex M0 / M3. I'd like to power-profile a Cortex M0 vs. the updated SiLabs 8051s and see how this 760 µs task plays out from a current consumption perspective. That learning curve is a bit nasty on those Cortex MCUs.
Appreciate those manuals!
Yeah, my MCU could run up to 32 MHz using the on-board oscillator. I'm the nut job who is profiling the system down to the last dotted i to minimize current consumption.
Think! You say you idle the processor; if you run 24 MHz instead of 8 MHz, you use 3× the current for 1/3 the time, which equals the same.
It's unclear from the Taiwanese tech support whether this 8051 MCU has the updated instruction set / 1-clock operation. They claim so, but it doesn't seem like it when I am setting up timers.
Timers are not related to instruction cycle time.
SiLabs bought Energy Micro and thus has the power-stingiest Cortexes.
The learning curve is not that bad if you use the manufacturer's "bring-up packages".
Yeah, I agree that at 32 MHz the system should be at optimal current consumption with idling... the electrical characteristics in the MCU datasheet say the same.
I actually started with the core running at 32MHz.
For some reason, with this particular task and whatever overhead comes with the MCU coming out of idle, it turned out that running the MCU at 8 MHz + idling resulted in an improvement in current consumption of around 0.4 mA vs. running the same task at 32 MHz + idling.
(So I got to re-wire all the interrupt timing, always fun).
I can't even start to speculate why this is the case, but the measured results are definitely valid.
I actually posted this line of thought on StackExchange for the Cortex-M. It seems a mixed bag for other users in terms of their testing. In most cases "burst processing" is optimal.
I remember reading an app note from Microchip with power tips, and they called out burst processing vs. reducing clock speed. The conclusion was "it depends".