| Details |
Message |
|
Read-Only
Author Tamir Michael
Posted 12-Aug-2009 18:33 GMT
Toolset ARM
|
 Inducing RTX failure
Tamir Michael
Hello all,
I do not wish to repeat myself as I have addressed this issue in a
recent thread, but this is too important to turn a blind eye to as
even lives could be at stake which certainly makes it worth a
separate thread: I believe I managed to conceive a program that
causes a failure of RTX on an LPC2468/2478 (the simulator does not induce
the failure). Quite a few people have reported problems with RTX on
this forum, so hopefully they can download my stripped test program
here http://dl.getdropbox.com/u/1730945/LPC2468_RTX_Demo_min.zip
to try their own variants. I have of course informed Keil support
about this issue and I am currently waiting for feedback. I would
very much appreciate any feedback you might have.
Tamir
|
|
|
Read-Only
Author Tamir Michael
Posted 13-Aug-2009 10:15 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Hello all,
Just a quick status update: Keil support have confirmed that my
program exposes a timing problem within the RTX kernel. RTX expert
Franc Urbanc will address the problem as soon as possible.
|
|
|
Read-Only
Author Franc Urbanc
Posted 25-Aug-2009 10:44 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Franc Urbanc
We found no problems in the RTX kernel; the problems you describe are
most likely a device configuration or a chip problem.
The LPC2468 device had some problems in the past. Most of them are
corrected in the latest silicon rev. 'D'.
Please read the device errata sheet ES_LPC2468_6.pdf
The items that can cause the application crash are listed
under:
PLL.1, Flash.1 and MAM.1 topics.
Check your device revision level first and then correctly set the
cpu clock and MAM mode.
Franc
|
|
|
Read-Only
Author Tamir Michael
Posted 25-Aug-2009 15:36 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
I have implemented the errata sheet recommendations (regarding
PLL) for the LPC2468 on the LPC2478 with success. The problem is that
the LPC2478's errata sheet does not specify any of these issues! Does
anybody know which hardware revision issues of the LPC2468 apply to
the LPC2478?
|
|
|
Read-Only
Author Eric Severson
Posted 25-Aug-2009 17:09 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Eric Severson
Hi,
NXP's link to the ES_LPC2468_6.pdf appears not to be working this
morning:
http://www.nxp.com/acrobat/erratsheets/ES_LPC2468_6.pdf
Any chance one of you could send me the file?
My email address is "erics" at the domain of "forwardpay.com".
Thanks,
Eric
|
|
|
Read-Only
Author Eric Severson
Posted 25-Aug-2009 17:21 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Eric Severson
Never mind, I have found a copy of the latest errata.
FYI: I am well within the operating limits of all of these listed
errata -- running on a 12MHz system clock, PLL output of 96, MAM
disabled.
Franc -- doesn't the fact that Tamir got this condition to occur
in the simulator mean that this is not a hardware problem?
|
|
|
Read-Only
Author Tamir Michael
Posted 25-Aug-2009 18:29 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Eric,
correction: I did not get it to occur on the simulator...!
|
|
|
Read-Only
Author Eric Severson
Posted 25-Aug-2009 18:32 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Eric Severson
Oh, I apologize. I read your post wrong.
Just to be clear: you have not yet had this occur in your test
application on LPC2478 hardware after making the changes that the
LPC2468 errata suggested, correct?
|
|
|
Read-Only
Author Tamir Michael
Posted 25-Aug-2009 19:11 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Eric,
That is correct. I had 2 boards running my test application with
the lowest values of M,N (for PLL determination) that yield 72 MHz
for about an hour without a failure. I forgot to leave such a system
running all night long, though...will do so tomorrow! It is weird
that some errata stuff that was written for the LPC2468 applies to
the LPC2478 as well - while the errata sheet of the LPC2478 is
"clean"...!
|
|
|
Read-Only
Author Tamir Michael
Posted 26-Aug-2009 11:05 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
I forgot to mention that I have addressed the technical support of
NXP regarding the errata sheet (in)compatibility issue.
|
|
|
Read-Only
Author Tamir Michael
Posted 27-Aug-2009 07:17 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Hello,
I have received this reply from NXP:
"The PLL.1 erratum has been resolved since version "A" of the
silicon. It doesn't apply to the LPC2478. It is therefore unlikely
that the PLL frequency is the cause of your system instability.
LPC2478 started with rev C silicon, current LPC2468 are rev B
silicon."
Why is it then that choosing the most stable clock settings that
yield 72[MHz] on an LPC2478 stops RTX from failing? It is NXP's
spreadsheet that I used to calculate the clock settings, and nowhere
is it specified that some of the results are illegal or that you must
choose the lowest possible M,N.
|
|
|
Read-Only
Author Tamir Michael
Posted 27-Aug-2009 08:28 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Hello,
More information: I have conducted additional tests along with
Franc Urbanc and found out that M=12, N=1 (=72[MHz] with a crystal of
12[MHz]) and a tick rate of 50 microseconds make RTX fail when the startup
file is augmented with NOPs to align the binary structure with that of
the M=24, N=2 version. We are seeing large shifts in generated code and an
increase in binary size (4 bytes) compared to settings of M=24, N=1.
The errata sheet of LPC2468 does not apply to LPC2478.
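As a rough check of the arithmetic above, here is a minimal sketch, assuming the LPC24xx PLL relation Fcco = 2 * M * Fin / N (with M and N as the actual multiplier and divider values, not the register fields) and a CPU clock divider of 4. The divider value and the function names are assumptions for illustration; the thread only states the 72 MHz result.

```c
#include <stdint.h>

/* PLL output frequency (Fcco) in Hz, assuming Fcco = 2 * M * Fin / N.
   m and n are the actual multiplier/divider values (hypothetical helper). */
uint32_t pll_fcco_hz(uint32_t fin_hz, uint32_t m, uint32_t n)
{
    return (uint32_t)((2ULL * m * fin_hz) / n);
}

/* CPU clock in Hz after the CPU clock divider.
   cclk_div = 4 is an assumption; it is not stated in the thread. */
uint32_t cclk_hz(uint32_t fin_hz, uint32_t m, uint32_t n, uint32_t cclk_div)
{
    return pll_fcco_hz(fin_hz, m, n) / cclk_div;
}
```

Under these assumptions, both M=12, N=1 and M=24, N=2 with a 12 MHz crystal give the same 288 MHz PLL output and the same 72 MHz CPU clock, so the two configurations differ only in the PLL's internal settings, not in the resulting clock.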
|
|
|
Read-Only
Author Tamir Michael
Posted 27-Aug-2009 09:35 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
I have conducted more tests for Franc. It seems to be a problem
with the LPC2478 revision C MAM (not certain yet), as my test program
does not crash when executed from RAM.
|
|
|
Read-Only
Author ryan williams
Posted 31-Aug-2009 17:33 GMT
Toolset ARM
|
 RE: Inducing RTX failure
ryan williams
i've changed mine to run from ram, it still crashes, maybe less
often. still always in that same function.
i haven't run your test program but i will as soon as i get a
chance.
|
|
|
Read-Only
Author Tamir Michael
Posted 31-Aug-2009 17:48 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Hello Ryan,
Have you tried to shut down the MAM altogether? Do notice that I
have determined, together with Franc Urbanc, that the actual
structure of the internal flash image determines whether there is a crash
or not, as long as the program runs using the MAM.
|
|
|
Read-Only
Author ryan williams
Posted 1-Sep-2009 15:48 GMT
Toolset ARM
|
 RE: Inducing RTX failure
ryan williams
tamir,
i ran your test program for a while. I have not seen it crash, but
three times so far, the watchdog function in each task shows that
some of those tasks are no longer running. this happened after 5-30
minutes of running. i was running this on the lpc2478 board.
I have seen this behavior on my own project lately while doing
tests. It seems that sometimes a task is lost, sometimes a task is in
the list more than once (next pointer points to itself and gets stuck
in a loop), and most often the dabt error occurs on the null
pointer.
i've run it with MAM off and these things still happen, perhaps
less often. my tests now include a mcb2470 as well as 2 different EA
lpc2468 OEM boards on my own base board design.
|
|
|
Read-Only
Author Tamir Michael
Posted 1-Sep-2009 15:50 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Ryan,
Franc Urbanc confirmed that this is indeed an RTX problem. Tests
are being conducted now to check if Franc's fix is valid. Be patient, help
is on the way...
|
|
|
Read-Only
Author Eric Severson
Posted 1-Sep-2009 15:54 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Eric Severson
It sounds like you guys are making great progress. Thank you for
continuing to post information as you go. I appreciate it and I am
sure others on the forum do as well.
-Eric
|
|
|
Read-Only
Author Robert Rostohar
Posted 2-Sep-2009 08:00 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Robert Rostohar
The actual reason for all the sporadic RTX failures you
have been seeing is most likely the NXP LPC2xxx VIC
undocumented "feature" (described below), which RTX was not aware
of.
VIC behavior: After an interrupt is disabled (writing to
VICIntEnClr) the interrupt is not immediately blocked but can still
happen for a few cycles (time needed for VIC to process the request).
Special tests were performed which confirm this behavior.
This "feature" was not taken into account by the RTX kernel.
Therefore in some rare situations (very timing specific) it could
happen that a blocked interrupt was still executed, which eventually
led to RTX failure. Such situations are very rare (they can happen sooner
when the system time tick interrupt happens more often) and even less
likely when the MAM is disabled, because then an instruction fetch
takes longer than the few cycles that the VIC requires. This also
explains why the problem was not detected sooner and why it was almost
gone when MAM was disabled.
The updated RTX kernel now takes the described VIC behavior
into account which should eliminate the reported problems (at the
cost of a few additional CPU cycles).
BTW: Interrupt controller behavior similar to that described for the
NXP VIC also applies to ST's STR7 EIC. In reality the EIC is
even worse in this respect since the time to process the interrupts is
even longer. Therefore this behavior had already been seen and the RTX
kernel already handled it. On the other hand, it was considered that for
the NXP VIC this was not necessary.
In general ARM7/9 cores do not have interrupt controllers, so
silicon vendors added their own external implementations, and this
leads to behavior such as that described above. Much better in this respect
are the new ARM Cortex-M cores, which have an advanced Nested Vectored
Interrupt Controller (NVIC) already tightly integrated with the core.
This has many benefits (faster interrupt response, late-arriving
interrupts, tail chaining ...) and also eliminates such problems as
seen with VIC and EIC.
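The VIC behavior described above can be illustrated with a small host-side toy model. This is only a sketch, not the RTX fix: the latency of 3 cycles, the struct, and all function names are hypothetical, since the real number of cycles is not documented. It models a controller whose disable write takes effect only after a delay, and shows the two ways of closing the window: padding with enough cycles, or masking interrupts in the core first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical latency in CPU cycles; the real figure is not published. */
#define VIC_LATENCY_CYCLES 3u

typedef struct {
    uint32_t cycles_until_disabled; /* countdown started by the disable write */
    bool     source_enabled;        /* what the controller currently applies */
    bool     core_irq_masked;       /* models the I bit in the core's CPSR */
} vic_model_t;

/* Models a write to VICIntEnClr: the disable is requested, not yet applied. */
void vic_write_int_en_clr(vic_model_t *v)
{
    v->cycles_until_disabled = VIC_LATENCY_CYCLES;
}

/* Advances the model by one CPU cycle. */
void cpu_cycle(vic_model_t *v)
{
    if (v->cycles_until_disabled > 0u && --v->cycles_until_disabled == 0u) {
        v->source_enabled = false;
    }
}

/* True while the "disabled" source could still interrupt the core. */
bool irq_can_reach_core(const vic_model_t *v)
{
    return v->source_enabled && !v->core_irq_masked;
}

/* Disables the source, optionally masks the core first, burns
   padding_cycles cycles, then reports whether the window is still open. */
bool window_open_after_disable(uint32_t padding_cycles, bool mask_core_first)
{
    vic_model_t v = { 0u, true, false };
    uint32_t i;

    v.core_irq_masked = mask_core_first;
    vic_write_int_en_clr(&v);
    for (i = 0u; i < padding_cycles; i++) {
        cpu_cycle(&v);
    }
    return irq_can_reach_core(&v);
}
```

In this model, window_open_after_disable(0, false) and window_open_after_disable(2, false) both return true (the interrupt can still arrive), while padding with 3 cycles or masking the core first returns false. It also shows why a padding value that is one cycle short still leaves the system exposed.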
|
|
|
Read-Only
Author Tamir Michael
Posted 2-Sep-2009 08:04 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Tamir Michael
Hello Robert,
Franc provided me with a patch that seems to work fine. I guess we
need to thank you all for putting so much effort into this. When can we
expect a new official release of RL-ARM containing this fix?
Tamir
|
|
|
Read-Only
Author Per Westermark
Posted 2-Sep-2009 08:35 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Per Westermark
Is it really undocumented? Isn't that behaviour common to all ARM
chips that have an external interrupt controller, and one of the
reasons why code either has to wait a fixed number of cycles or
deactivate interrupts in the core instead of in the interrupt
controller?
|
|
|
Read-Only
Author Robert Rostohar
Posted 2-Sep-2009 08:51 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Robert Rostohar
Tamir,
The new RL-ARM which includes this fix will be released soon (in a
few weeks).
Per,
Yes, this behavior seems to be common to ARM7/9 with external
interrupt controllers. However the number of cycles varies between
interrupt controller implementations and I haven't seen any
documentation about this.
|
|
|
Read-Only
Author Per Westermark
Posted 2-Sep-2009 09:50 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Per Westermark
"However the number of cycles varies between interrupt controller
implementations and I haven't seen any documentation about this."
Neither have I. And it isn't easy to guesstimate the required number
either. Something that seems to work after extensive testing can
still be one clock off, just waiting for that other interrupt to come
and catch you with your pants down :(
|
|
|
Read-Only
Author bruce yu
Posted 22-Oct-2009 09:45 GMT
Toolset ARM
|
 RE: Inducing RTX failure
bruce yu
Hi Robert, thank you very much for your explanation. My ARM7 uses the EIC
and I encountered a similar issue, see here: http://www.keil.com/forum/docs/thread15796.asp
To fix the problem, our solution is to execute a short for loop (as a
delay) after disabling the EIC interrupt. Does this method work?
|
|
|
Read-Only
Author Per Westermark
Posted 22-Oct-2009 09:56 GMT
Toolset ARM
|
 RE: Inducing RTX failure
Per Westermark
A number of chip vendors have recommended the use of a couple of
NOPs after disabling interrupts. Any combination of instructions that
takes - at least - the required number of cycles should do fine.
The only issue is that the exact number of cycles isn't always
known because the manufacturer hasn't published it in any
datasheet.
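A minimal sketch of the delay approach discussed above, assuming the disable register is passed in as a pointer so the same function can be tried on a host or pointed at the real register on a target. The function name and the settle_iterations parameter are hypothetical, and the right iteration count is unknown for the reasons given above; each loop pass costs several cycles, and the exact cost depends on the compiler and memory timing.

```c
#include <stdint.h>

/* Writes `mask` to an interrupt-disable register, then busy-waits.
   int_en_clr:        address of the disable register (assumption: a
                      write-1-to-disable register such as VICIntEnClr)
   settle_iterations: hypothetical padding; the required cycle count
                      is not published, so this must be chosen with margin. */
void disable_irq_source(volatile uint32_t *int_en_clr,
                        uint32_t mask,
                        uint32_t settle_iterations)
{
    /* volatile keeps the compiler from removing the empty loop. */
    volatile uint32_t i;

    *int_en_clr = mask;
    for (i = 0u; i < settle_iterations; i++) {
        /* intentionally empty: only the elapsed cycles matter */
    }
}
```

A plain non-volatile counter in an empty loop may be optimized away entirely, which would silently remove the delay. Masking interrupts in the core around the write avoids having to guess the count at all.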
|
|