Hi,
I recently had problems with events (RTX ISR events) resulting in the OS_ERR_FIFO_OVF error in os_error(). This prompted me to try to understand more about this mechanism. I will explain what I know and what I don't know; any input would be appreciated.
When isr_evt_set() is called, it seems that two entries are placed in the array os_fifo[]. The first, I am guessing, is the task control block for the task that must receive the event, and the second is the event ID. The size of this queue is set in RTX_Conf_xx.c via the variable OS_FIFOSZ.
What I don't know:
1. How do I flush this queue?
2. Does os_evt_wait..() remove an event from the queue? How? I can't see that to be true.
3. How can one check how full the queue is?
4. Does os_evt_clr() clear one event (which one?) or all events with the specified flags?
It's a pity that Keil/ARM don't make this information readily available.
If you want to understand the internal workings, you are going to need to look at the source code. My guess is there is less source code for RTX than for your whole project.
There is no such thing as an event queue in RTX.
OS_ERR_FIFO_OVF happens when too many "post service requests" are queued up before being processed. If this happens, you are either spending too much time in interrupt service routines, you are queuing too many items into the "post service request" queue (FIFO), or you need to make the queue bigger. Don't make it bigger until you understand how long you are in ISRs and how many post service requests you might post before returning to non-ISR mode.
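To make the overflow condition concrete, here is a plain-C sketch of a fixed-size post-service FIFO of the kind described above. This is illustrative only, not RTX's actual implementation; the struct, function names, and 32-bit entry type are invented for the example, and only the default size of 16 (OS_FIFOSZ) comes from the discussion.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative fixed-size "post service request" FIFO. OS_FIFOSZ defaults
 * to 16 in RTX_Conf_xx.c; everything else here is made up for the sketch. */
#define OS_FIFOSZ 16

typedef struct {
    uint32_t entries[OS_FIFOSZ];
    unsigned head, tail, count;
} post_fifo_t;

/* Returns 0 on success, -1 on overflow (the analogue of OS_ERR_FIFO_OVF,
 * which RTX treats as a terminal error). */
static int fifo_post(post_fifo_t *f, uint32_t request)
{
    if (f->count == OS_FIFOSZ)
        return -1;                       /* queue full */
    f->entries[f->head] = request;
    f->head = (f->head + 1) % OS_FIFOSZ;
    f->count++;
    return 0;
}

/* Drain one pending request; returns 0 if the queue was empty. The real
 * kernel drains the whole queue before leaving ISR mode. */
static int fifo_process(post_fifo_t *f, uint32_t *out)
{
    if (f->count == 0)
        return 0;
    *out = f->entries[f->tail];
    f->tail = (f->tail + 1) % OS_FIFOSZ;
    f->count--;
    return 1;
}
```

The point of the sketch: each post takes exactly one slot, and overflow can only happen if more posts arrive between drains than there are slots, which is why the fix is to reduce ISR time or the number of posts per interrupt rather than to "flush" anything.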
Answers to your questions (as best I can):
0) Only one entry is used up any time you put something into the post service request queue. Yes, each entry is more than 32 bits, but it still takes up only one slot. What exactly it contains is not overly important; what matters is that it occupies one entry. By default, there are 16 entries.
1) You absolutely cannot flush this queue; that really has no meaning. To "flush" it, you process it. If you don't want something in the post service request queue, don't put it into the queue.
2) No, it does not. There is no "event queue" in RTX. The rt_evt_wait function is about 20 lines of code; you should feel free to look at it. If you really want to know what it is doing inside, read it. It is clear that you are making assumptions that are simply not true, and they are getting in the way of your understanding of what is really going on.
3) You cannot check whether this queue is full; that is not something even the OS does. It checks for overflow only (you seem to know that from the description of your problem), but that is a terminal error: the OS cannot know how to recover from it. The programmer needs to know the limits of their code so that the OS never gets into this situation. This is a VERY temporary queue that should almost never have anything in it. It is fully processed before the processor leaves ISR mode; items do not "stay" in this queue. (Whenever a user task makes an OS call, this queue will ALWAYS be empty.)
4) These are not "events" in the sense that would be useful for an event-driven state machine. There is a set of "event flags" that indicate that something has happened. There is no queue of these "events"; there is just a 16-bit value with either a 0 or a 1 in each bit position. A 1 indicates that the "event flag" is set; a 0 indicates that it is not. Setting an event flag that is already set adds no new information. Clearing an "event flag" just sets the indicated flag bits to 0 (and if a flag was set, that information is now gone; it is not queued anywhere). There are no "individual events" or "all events" to be removed. There is just the state of the "event flags".
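The flag semantics above can be shown with a few lines of plain C. This is a behavioral sketch, not RTX source; the function names (evt_set, evt_clr, evt_wait_or) are invented stand-ins for the real os_evt_* calls, and the wait is simplified to a non-blocking check.

```c
#include <assert.h>
#include <stdint.h>

/* One 16-bit event-flags word per task; one bit per flag.
 * There is no queue: setting an already-set flag adds nothing,
 * and clearing a flag discards its state. */
static uint16_t task_flags;

static void evt_set(uint16_t mask) { task_flags |= mask; }
static void evt_clr(uint16_t mask) { task_flags &= (uint16_t)~mask; }

/* Rough analogue of an os_evt_wait "or" operation: succeeds if any
 * requested flag is set, and consumes the matching flags. A real OS
 * would block the task here instead of returning 0. */
static int evt_wait_or(uint16_t mask)
{
    uint16_t hit = task_flags & mask;
    if (hit == 0)
        return 0;
    task_flags &= (uint16_t)~hit;   /* consume: the wait clears them */
    return 1;
}
```

Note how setting the same flag twice leaves exactly one bit set, which is why "flushing" or "counting" events has no meaning here: there is nothing but the current bit pattern.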
Hi Robert,
Thank you for your very extensive reply, much appreciated!
I will try to find the source; it appears not to be included in my install.
If I may ask you to clarify something, please. You say repeatedly that "there is no such thing as an event queue in RTX", but in many places you say things like "...happens when too many 'post service requests' are queued up before being processed". If there is no queue, then where are the posts being queued? And what about the variable os_fifo[]? That looks like a queue to me; call it a buffer if you like, but events get stacked up in there when isr_evt_set() is called.
In fact, having the ability to flush an event queue is very useful; it would allow one to recover more gracefully from a malfunction (let the first person who doesn't write buggy code throw a stone). Having the ability to test how full a queue is is also useful, as it can be used to sense a pending problem and take action. Many industrial RTOSes include these capabilities.
Thank you
I don't see how flushing all events would represent recovery; it would leave a number of threads very confused, since you would have violated the assumptions made when they were designed.
That is why a reboot is a good idea. An even better idea is to make sure you don't have too high an ISR load, and that your ISRs do not set many events per interrupt.
The important thing here is that you can have many interrupt sources, and it's possible to set up the hardware so that the processor just moves from one ISR to the next without ever returning to "normal" operation, a situation in which you can't expect a superloop or OS threads to get anything done. So a proper design must take into account how much the interrupts can stack up, to make sure the processor always has capacity for your real-time requirements. Your issue comes from your program failing this part.
In that case you don't get any overflow of any queue of events for your threads to process. The ISRs let your interrupts run their things as fast as possible; the primary goal is to have a minimum of latency to do what the ISR needs to do. Then, when the pending interrupts have been handled, the OS processes the results of any state changes your ISRs introduced, allowing it to reschedule which thread should next get processing time. So "clearing this queue" means "making the OS unable to know how to schedule". And overflowing this internal state means that you don't have enough time for normal code after the ISRs have been taken care of.
Hello Trevor,
A neat feature of Cortex-M devices is their low interrupt latency. A neat feature of the CMSIS-RTOS RTX is how fast it handles interrupts: it will not get in the way of the next interrupt.
But the interrupt may still need to talk to the RTOS, e.g. one interrupt calls osMessagePut() and the next interrupt calls osMailPut().
So rather than running the complete osMessagePut() function, and potentially delaying the next interrupt, the kernel stores the bare minimum of data in the FIFO queue and processes it later, when there are no pending interrupts.
Some programmers take advantage of this by turning a large interrupt handler function into a high-priority thread and signalling that thread when the interrupt occurs. This reduces nesting headaches, etc.
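A minimal sketch of that pattern, using the CMSIS-RTOS v1 API that RTX implements: the ISR does nothing but signal a high-priority worker thread, which does the heavy processing in thread context. The handler name, signal value, and thread body here are invented for illustration; only osSignalSet/osSignalWait and osThreadDef/osThreadCreate are real CMSIS-RTOS v1 calls. This fragment needs the RTX kernel and a target device to run, so it cannot be executed standalone.

```c
#include "cmsis_os.h"            /* CMSIS-RTOS API (implemented by RTX) */

static osThreadId workerId;

/* Example ISR name; the actual handler is device-specific.
 * Keep the ISR tiny: just flag the worker thread and return. */
void UART0_IRQHandler(void)
{
    osSignalSet(workerId, 0x0001);
}

/* High-priority worker: sleeps until the ISR signals it, then does the
 * long processing that used to live inside the interrupt handler. */
static void worker(const void *arg)
{
    (void)arg;
    for (;;) {
        osSignalWait(0x0001, osWaitForever);
        /* ...lengthy processing runs here, in thread context... */
    }
}

osThreadDef(worker, osPriorityHigh, 1, 0);

void start_worker(void)
{
    workerId = osThreadCreate(osThread(worker), NULL);
}
```

The design benefit is exactly what the post describes: interrupt latency stays minimal because the ISR never runs long, and the long work is preemptible and schedulable like any other thread.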
If your FIFO queue is overflowing... troubleshoot your interrupt behavior.
**The newer versions of the Keil debugger have an enhanced RTOS event window: you can see the threads and the interrupts all at once.
www.keil.com/.../uv4_db_dbg_event_viewer.htm
When your program errors, it should be stuck in RTX_Conf_CM.c in the function os_error() with the error code OS_ERR_FIFO_OVF.
You can open the event viewer and see the combination of interrupts that caused the issue. Typically this is some sort of nested interrupt issue.
You could:
- Turn long interrupt handlers into high-priority threads
- Use BASEPRI to mask out low-importance interrupts
- Change the priorities of the interrupts
- Increase the FIFO size in the RTX_Conf_CM.c file
When diagnosing interrupts, these windows may also be useful:
**The trace exceptions window (shows the minimum and maximum time for an exception to enter, exit, and run; useful to see if there are nesting issues): www.keil.com/.../ulink2_trace_exception.htm
**The Nested Vectored Interrupt Controller window (turn exceptions off and on): http://www.keil.com/support/man/docs/gsac/gsac_nvic.htm
These windows need to have data tracing setup for the debugger: http://www.keil.com/support/man/docs/ulink2/ulink2_ctx_trace.htm
Hi Kevin,
That's great advice; thanks for the comprehensive reply. I will try some of those points.