
Hard fault at misaligned memcpy memset


Author
Werner Meier
Posted
15-Apr-2013 07:41 GMT
Toolset
ARM
Hard fault at misaligned memcpy memset

We ran some simple tests on the STM32F100 Value Line eval board:

//------------------------------------------------------------------------------
// Variables
static unsigned char sDstBuf[1024]; // 1KiB
static unsigned char sSrcBuf[sizeof(sDstBuf)];

printf("Copying words from misaligned src to aligned dst buffer... ");
memset(sDstBuf, 0xcd, sizeof(sDstBuf));

With optimization level 3 (optimize for time), this takes 120 µs.

With optimization level 0, it takes 155 µs.

The timing is almost the same if memcpy is used:
memcpy(sDstBuf, sSrcBuf, sizeof(sDstBuf));

It runs into a hard fault if the optimization level is >= 1 and "optimize for time" is not set.

I think this is a compiler error.

We ran into this before with MDK 4.60; we now use 4.70A.

Werner

Author
Werner Meier
Posted
16-Apr-2013 08:12 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

Sorry, there is more to it: it is not memset/memcpy at all. I had not understood the code correctly.

The offending code is:

    for (pDstWord = (unsigned int *)(sDstBuf + 0),   // Aligned!
         pSrcWord = (unsigned int *)(sSrcBuf + 1);   // Misaligned!
         pSrcWord < (unsigned int *)(sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
         pSrcWord++)
    {
        *pDstWord = *pSrcWord;
    }

With optimization level >= 1 (optimize for size), this loop leads to the following disassembly:
0x08002446 CC02      LDM      r4!,{r1} ; >>>> after this: Hardfault occurs
0x08002448 6001      STR      r1,[r0,#0x00]
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x0800244A 42B4      CMP      r4,r6
0x0800244C D3FB      BCC      0x08002446


With optimization level 0, the compiler generates this:
   370:     pSrcWord = (unsigned int*) (sSrcBuf + 1);
   371:         // Misaligned!
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
0x080027FE 4C39      LDR      r4,[pc,#228]  ; @0x080028E4
0x08002800 1C64      ADDS     r4,r4,#1
0x08002802 E002      B        0x0800280A
   376:         *pDstWord = *pSrcWord;
   377:     }
0x08002804 6820      LDR      r0,[r4,#0x00]
0x08002806 6038      STR      r0,[r7,#0x00]
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x08002808 1D24      ADDS     r4,r4,#4
   372:         pSrcWord
   373:             < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
   374:         pSrcWord++)
   375:     {
   376:         *pDstWord = *pSrcWord;
   377:     }
0x0800280A 4847      LDR      r0,[pc,#284]  ; @0x08002928
0x0800280C 42A0      CMP      r0,r4
0x0800280E D8F9      BHI      0x08002804

Offending instruction:
LDM r4!,{r1} ; >>>> after this: Hardfault occurs
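Why the two builds behave differently: on ARMv7-M cores such as the Cortex-M3 in the STM32F100, single LDR/STR and LDRH/STRH accesses may be unaligned (unless SCB->CCR.UNALIGN_TRP is set), but LDM/STM and LDRD/STRD always require word alignment and raise a UsageFault, which escalates to a HardFault when the UsageFault handler is not enabled. At level 0 the compiler happened to emit LDR; optimizing for size, it used LDM r4!,{r1}, which it is entitled to do because an `unsigned int *` is assumed to be 4-byte aligned. A minimal sketch of a loop that does not make that false promise, assuming the goal is to read 32-bit words starting at an arbitrary byte offset (the function names are hypothetical):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one 32-bit word from an address with no alignment guarantee.
   memcpy makes no alignment promise, so the compiler must emit an access
   sequence that is valid for any address; it cannot legitimately use LDM. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Copy nwords 32-bit words from a possibly misaligned byte pointer into
   a properly aligned uint32_t array. */
static void copy_words_from_bytes(uint32_t *dst, const unsigned char *src,
                                  size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        dst[i] = load_u32(src + i * sizeof(uint32_t));
    }
}
```

Note that the destination here is a real `uint32_t` array: an `unsigned char` buffer such as `sDstBuf` is not guaranteed to be word-aligned, even if it happens to be in a particular build.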

Author
Tamir Michael
Posted
16-Apr-2013 08:15 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

I think you need to have a look at the user manual of your chip to understand how LDR interacts with unaligned addresses. Many ARM chips differ in that respect.

Author
Tamir Michael
Posted
16-Apr-2013 08:16 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

More precisely, you need to have a look at the assembler manual of your toolchain (are you using the ARM compiler?).

Author
Werner Meier
Posted
16-Apr-2013 08:21 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

It's the assembly that is produced and shown by the uVision debugger:
ARM MDK 4.70A
ARMCC.EXE V5.03.0.24

The first post mentions this.
With "optimize for time" or optimization level 0, it runs without problems.

Werner

Author
Tamir Michael
Posted
16-Apr-2013 08:27 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

But what does the manual say about LDR's behavior under such conditions?

Author
Werner Meier
Posted
16-Apr-2013 08:44 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

I do not write assembly, and I do not know much about it. The assembly code is produced by compiling the C source (as mentioned in my second post).

With optimization >= 1, it produces the first (offending) code.
With optimization 0, it produces the second code (which works fine).

Werner

Author
Tamir Michael
Posted
16-Apr-2013 08:57 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

Look, it does not matter that you don't work with assembly directly. You need to understand what's wrong, and the answer is right under your nose. It is up to you to decide whether to burn the 250 calories finding out...

Author
Werner Meier
Posted
16-Apr-2013 13:10 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

Thank you all for your insights and warnings. I will inform if I learn something from Keil support.

Werner

Author
m-g hua
Posted
16-Apr-2013 09:22 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

>>
I just think the compiler should not produce code that leads to a hard fault.
<<

Sorry mate, but a statement like that sends a shiver up my spine.

Author
John Linq
Posted
16-Apr-2013 09:24 GMT
Toolset
ARM
RE: in detail what could be wrong.

Casting a 1-byte-aligned pointer to a pointer type that requires 4-byte alignment can confuse the compiler.

For a 1-byte-aligned pointer -> LDR
For a 4-byte-aligned pointer at higher optimization -> LDM

Author
Cássio de Lazaro
Posted
3-Feb-2014 11:42 GMT
Toolset
ARM
RE: Hard fault at misaligned memcpy memset

You don't need to be that stupid.

Author
Andrew Neil
Posted
16-Apr-2013 09:05 GMT
Toolset
None
Optimisation very often breaks flawed code.

>>
with optimise >=1 it produces the first (offending code)
with optimise 0 the second code is produced (which works fine)
<<

Optimisation very often breaks flawed code.

>>
the offending code is

    for (pDstWord = (unsigned int *)(sDstBuf + 0),   // Aligned!
         pSrcWord = (unsigned int *)(sSrcBuf + 1);   // Misaligned!
         pSrcWord < (unsigned int *)(sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
         pSrcWord++)
    {
        *pDstWord = *pSrcWord;
    }
<<

How can you be sure that those casts don't end up giving you unaligned addresses?

Author
Andrew Neil
Posted
16-Apr-2013 09:21 GMT
Toolset
None
missed one!
    for (pDstWord = (unsigned int*) (sDstBuf + 0), // Aligned!
    pSrcWord = (unsigned int*) (sSrcBuf + 1);
        // Misaligned!
        pSrcWord
            < (unsigned int*) (sSrcBuf + sizeof(sSrcBuf) - sizeof(*pSrcWord));
        pSrcWord++)
    {
        *pDstWord = *pSrcWord;
    }

Author
b t
Posted
16-Apr-2013 09:31 GMT
Toolset
None
RE: Cross Post

You should set the compiler switch "--no_unaligned_access" in Keil for Cortex-M3/M4. (In fact, it would be better if it were set by default.)

ARM7 can in principle support LDR and STR accesses at 2-byte (halfword) addresses, but this is of little use, as it is no faster than two aligned 4-byte (32-bit) accesses. So you should switch this off in the compiler. (If you want to use it, you have to enable it in the CPU; see the "system ... .c" file, and search for the keyword "aligned" in the ARM7 TRM / STM32F4 Programming Manual / Cortex-M4 TRM.)

Author
Per Westermark
Posted
16-Apr-2013 16:29 GMT
Toolset
None
RE: Cross Post

Note that some memory controllers can hide unaligned accesses: they just force the core to wait extra wait states while the memory controller performs multiple memory accesses and then glues together the partial reads.

I hope no chip gets a memory controller that performs such unaligned hiding for any peripheral device, or really bad things can happen. For peripherals, it isn't always safe to do an extra read. An unaligned memory access can also trigger special hardware logic for the neighboring word, potentially signaling that a UART status register has been read and is now "cleared".

In almost all situations, code should make sure that zero unaligned accesses are performed. The main exception is when storing a big array of "data records", where a significant amount of memory can be saved by packing the data.

Author
scott douglass
Posted
16-Apr-2013 16:46 GMT
Toolset
None
--no_unaligned_access

>>
You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...).
<<

No; that's not what --no_unaligned_access means.

When you use --no_unaligned_access, it tells armcc that it must not access unaligned data with LDR/STR (and so the processor can be set to disallow unaligned access). This means that other, less efficient code sequences will be used to access unaligned data. Accessing data that is guaranteed to be aligned, like (int *), will still use LDR/STR (or even LDM/STM).

Using --no_unaligned_access does *not* allow you to cast unaligned values to (int *). Doing that is *undefined behavior* and the compiler can cause anything to happen that it wants, up to and including, but limited to, causing you to waste a lot of effort tracking down the problem in the hope that you'll learn never to lie to the compiler again.
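To make the "don't lie to the compiler" point concrete: C only allows a pointer to be converted to `int *` (or `uint32_t *`) if it is suitably aligned for that type, and the conversion itself is undefined behavior otherwise, whatever `--no_unaligned_access` says. One way to keep casts honest is to check the alignment first and fall back to a byte-wise path. A minimal sketch; `is_aligned_u32` and `as_u32_ptr` are hypothetical helper names, and inspecting the low bits of a pointer via `uintptr_t` is implementation-defined, although it behaves as expected with ARM and common host compilers:

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p meets the alignment requirement of uint32_t on this target. */
static int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % alignof(uint32_t)) == 0;
}

/* Convert p to (const uint32_t *) only when that conversion is valid;
   otherwise return NULL so the caller must take a byte-wise access path. */
static const uint32_t *as_u32_ptr(const void *p)
{
    return is_aligned_u32(p) ? (const uint32_t *)p : NULL;
}
```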

Author
b t
Posted
17-Apr-2013 06:55 GMT
Toolset
None
RE: --no_unaligned_access

Hi Scott,
thanks for your warning "Using --no_unaligned_access does *not* allow you to cast aligned values to (int *).". I did not know this before.

But I am not sure whether I understand your warning correctly. To make it clearer, could you give a short C code snippet where this would typically happen?

I started using "--no_unaligned_access" when I ran into a problem with the function

memcpy(&ac1, &ac2, 6);


to copy 6 bytes from ac2 to ac1. In this case (at optimization level 1), the compiler used the halfword-aligned LDR/STR accesses described in ARMv7 TRM A3.2.1, which require the bit SCB->CCR.UNALIGN_TRP to be 0. I thought this bit was set to 1 by default in the ST4Discovery and Blinky examples, but checking more thoroughly, I realized that I had set it to 1 myself at some point in my InitTraps function (this was a bit over-ambitious, I think).

I have now recompiled my complete code without the "--no_unaligned_access" flag, and there is quite a difference: my code size shrinks from 40192 to 40032 bytes (optimization level 1). This means the flag really affects many parts of my program (I use memcpy with 6 bytes only once or twice, which cannot explain 160 bytes less code). So I think I will remove the "--no_unaligned_access" flag and also remove the setting of SCB->CCR.UNALIGN_TRP.

... Still, to complete my understanding of the "int *" issue you touched on, a short typical code example would be very helpful to me. (The ARM Cortex-M4 TRM, "3.3.2 Load/store timings", says "Unaligned word or halfword loads or stores add penalty cycles...", which sounds to me a bit like a warning to the reader NOT to use such unaligned LDR/STR. In fact, I was a bit bewildered why unaligned LDR/STR is allowed at all; I thought it might be some sort of historical compatibility feature, since it was introduced in ARMv6 and they just wanted to keep it in ARMv7.)
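For reference, UNALIGN_TRP is bit 3 of the Cortex-M3/M4 Configuration and Control Register (SCB->CCR). When it is 1, every unaligned word or halfword LDR/STR raises a UsageFault, so compiler output that relies on unaligned single loads (such as the inlined 6-byte memcpy above) may trap. In CMSIS-based target code, clearing it would typically be written `SCB->CCR &= ~SCB_CCR_UNALIGN_TRP_Msk;`. The sketch below shows the same read-modify-write on a plain `uint32_t` so it can run anywhere; the macro and function names are hypothetical, not CMSIS names:

```c
#include <stdbool.h>
#include <stdint.h>

/* UNALIGN_TRP occupies bit 3 of the Cortex-M3/M4 CCR
   (CMSIS calls this mask SCB_CCR_UNALIGN_TRP_Msk). Hypothetical local name. */
#define CCR_UNALIGN_TRP_MASK (UINT32_C(1) << 3)

/* True if this CCR value makes unaligned word/halfword accesses fault. */
static bool unaligned_traps_enabled(uint32_t ccr)
{
    return (ccr & CCR_UNALIGN_TRP_MASK) != 0;
}

/* Return the CCR value with unaligned-access trapping switched off,
   leaving every other bit untouched. */
static uint32_t ccr_without_unalign_trap(uint32_t ccr)
{
    return ccr & ~CCR_UNALIGN_TRP_MASK;
}
```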

Author
b t
Posted
17-Apr-2013 08:32 GMT
Toolset
None
RE: --no_unaligned_access

Thank you for the interesting link to the discussion.

But I must admit that after reading all this, I am still unsure whether it would be better to specify the compiler switch "--no_unaligned_access".

Even in your answer, you said that the memcpy functions might be slower if I skip the "--no_unaligned_access".

You did not elaborate further on the problem with the "(int *)" usage. Is it difficult to give a basic code example for this?

Author
b t
Posted
17-Apr-2013 10:38 GMT
Toolset
None
RE: --no_unaligned_access

Thank you, this is instructive (in my next life I will learn Chinese :)).

Author
scott douglass
Posted
17-Apr-2013 19:02 GMT
Toolset
None
RE: --no_unaligned_access

To clarify what I meant about (int *), consider:

char a[10];
char *cp = &a[1];
void *vp = &a[1];
int *ip1, *ip2;
...
    ip1 = (int *)cp;  // may be undefined behavior
    ip2 = vp;         // C++ requires a cast here, C does not; still may be undefined behavior
...

The two statements both invoke undefined behavior if '&a[1]' does not have the alignment required for 'int' (and it probably doesn't). Probably, nothing bad will happen until '*ip1' or '*ip2' is used. Of course, it might also be the case that nothing bad happens right away or ever -- there are no requirements at all on what happens after undefined behavior.

In my opinion, casts (and converting from 'void *') are something to be avoided when at all possible because they can easily hide errors. There are cases where they are necessary, but I prefer to avoid using them.

[In my weak attempt at lawyer-style humor in a post above, I, of course, meant "..., up to and including, but not limited to, ...".]

Author
H Falloon
Posted
17-Apr-2013 19:40 GMT
Toolset
None
50 ways to leave your

You could consider using a __packed pointer.

Understanding the underlying hardware, along with what the processor does, would help you make a sensible decision about what is best.

My advice would be not to simply accept the typical "you are wrong", "the best way to do it is to" and the frequent "the only way to do it is to" that appear from beginners right through to experts and professionals. Just remember that there is more than one way to skin a cat.

If you listen carefully, you might just be able to hear the cries of outrage.
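As an illustration of the packed option: with armcc the keyword is `__packed`; GCC and Clang express the same idea with `__attribute__((packed))`, which this sketch uses so it can be compiled on a host. Members of a packed struct may sit at odd offsets, and because the compiler knows this, it generates alignment-safe accesses for them. The struct and field names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* One data record stored without padding: the 32-bit field follows a
   single byte, so it lands at offset 1. GCC/Clang syntax; armcc would
   use the __packed qualifier instead. */
struct __attribute__((packed)) sample_record {
    uint8_t  channel;
    uint32_t timestamp;
};

/* The compiler knows r->timestamp may be misaligned and emits code that
   is safe for that, unlike a plain (uint32_t *) cast to the same address. */
static uint32_t record_timestamp(const struct sample_record *r)
{
    return r->timestamp;
}
```

Each record takes 5 bytes instead of the 8 a padded layout would typically use on 32-bit targets, at the cost of slower access to the misaligned field.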

Author
Per Westermark
Posted
17-Apr-2013 20:59 GMT
Toolset
None
RE: 50 ways to leave your

Instead of using packed pointers, the code should do its best to align the data manually when storing big objects in arrays of smaller objects.
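One reading of "align the data manually": when laying out larger objects inside a byte array, round each object's offset up to a multiple of its alignment, so the objects can later be accessed through correctly aligned pointers. A minimal sketch, assuming power-of-two alignments; `align_up` is a hypothetical helper name:

```c
#include <stddef.h>

/* Round offset up to the next multiple of align.
   align must be a non-zero power of two (e.g. 2, 4, 8). */
static size_t align_up(size_t offset, size_t align)
{
    return (offset + align - 1) & ~(align - 1);
}
```

For example, a 32-bit value that would follow 5 bytes of other data is placed at `align_up(5, 4)`, i.e. offset 8; the 3 bytes in between are the price of aligned, fast access. This only helps if the byte array itself starts at a suitably aligned address, for example by declaring it with `_Alignas(uint32_t)` or as a `uint32_t` array.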

Author
Per Westermark
Posted
17-Apr-2013 22:26 GMT
Toolset
None
RE: Some people just can't resist it

And packed is just too easy to use. So people don't take 30 seconds to consider alternatives when all it takes is zero thought and the magic packed keyword.

Author
Tamir Michael
Posted
18-Apr-2013 08:27 GMT
Toolset
None
RE: Some people just can't resist it

>>
the code should
<<

Well, doing that has far-reaching consequences! Actually _thinking_ is cheaper...

Author
Per Westermark
Posted
18-Apr-2013 08:47 GMT
Toolset
None
RE: Some people just can't resist it

1) People don't like to think.

2) People like to see everything as defects in the compiler or the processor.

Author
Werner Meier
Posted
19-Apr-2013 07:47 GMT
Toolset
None
RE: --no_unaligned_access

Just got answer from Keil:

>>
This isn't actually a compiler fault, as unless the integer pointer is declared as packed, it must be suitably aligned. It just happens to be fortuitous that at -O0 it generates safe code.
<<

I was not aware of this. I also can't remember seeing the keyword packed in any of the Cortex-M3 software examples from ST, TI (Luminary) or NXP.

Thank you all again, also for urging me to dig deeper into the machine/assembly fundamentals. With the 8051 microcontrollers I worked with before, this was relatively easy. With the nicely available libraries and examples for Luminary, which was my starting point some years ago, it was tempting to just write programs and develop applications and products, which fortunately have worked so far without much trouble.

Author
Per Westermark
Posted
19-Apr-2013 08:02 GMT
Toolset
None
RE: --no_unaligned_access

The 8051 is an 8-bit processor. With a processor that only does 8-bit memory accesses, a 16-bit read is two 8-bit accesses anyway.

The x86 is one of the few architectures that handles unaligned accesses silently, because the memory controller can hide it. So, all the way back, MS-DOS programs could do whatever they wanted. As soon as those developers got their hands on a Sun workstation, they started to look surprised and ask "What is a bus error?" when their "correct" code suddenly didn't work.

You normally don't see examples with packed pointers or structures for the simple reason that the vast majority of code is written to make sure that no data is ever unaligned. And since the way to describe packed data differs between compilers, portable code that needs to access large data at odd addresses often contains "manual" code that performs multiple reads and merges the data into the full size. The compilers are normally intelligent enough to spot this kind of code and make use of whatever special instructions the processor may have for handling unaligned data.
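The "manual" approach described above usually means assembling the value from individual bytes with shifts, which in C terms never performs an unaligned multi-byte access and also pins down the byte order. A minimal sketch, assuming the stored data is little-endian; the function name is hypothetical:

```c
#include <stdint.h>

/* Build a 32-bit value from four bytes stored in little-endian order.
   Only single bytes are read, so p may have any alignment. An optimizing
   compiler may merge this into one load where the target permits it. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```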

With ARM, it depends on the specific chip (and potentially the specific memory region) whether the hardware handles unaligned accesses or not. The same goes for support for exceptions when trying to access non-existent memory. The core interacts with the memory controller(s), and different chips may integrate different memory controller features.

But having the hardware (or the compiler) silently hide unaligned data will just make developers write programs that are sometimes very inefficient, without knowing it. Better to have the majority of the code run at maximum speed and only pack special "big data", where the amount of data is so large that packing it gives huge space savings, while normally only a few parts of the code access it, and often quite seldom.

IBM decided in their OS to trap unaligned accesses and have the exception handler silently fix the issue and then return. This resulted in programs now and then running _extremely_ (like 100+ times) slower than normal. The cause was a bug in the program loader that sometimes loaded a program at an odd address, so suddenly almost all memory accesses were unaligned, despite the compiler having done its best to keep them aligned. After that OOPS, IBM realized that hiding the issue isn't really practical: better to have programs fail and the developer fix the source code specifically for unaligned accesses. Then the developer knows exactly where the extra costs are introduced.

Author
Hans-Bernhard Broeker
Posted
16-Apr-2013 18:37 GMT
Toolset
None
RE: Cross Post

>>
You should set the compiler switch "--no_unaligned_access" in Keil for Cortex M3/M4.(In fact it would be better, if it would be set by default already ...).
<<

No, that's not what he should do.

What he should do is stop expecting a compiler to magically turn badly flawed source code into working machine code. The problem here is not in the compiler ... it's in that source code, because that code wilfully disregards both the programming language's and the target architecture's properties.

Author
H Warsulhan
Posted
16-Apr-2013 19:54 GMT
Toolset
None
No, that's not what he should do.

Who do you think you are directing your late response to?

Author
Per Westermark
Posted
16-Apr-2013 20:02 GMT
Toolset
None
RE: No, that's not what he should do.

Thread create time: 15-Apr-2013 07:41 GMT
Last post before yours: 16-Apr-2013 18:37 GMT

(Note that the Keil servers never manage to show the correct time; they have incorrect code and/or configuration for the time zone used. They can't even show the same time stamp in the thread list as for the individual posts.)

But anyway: not a single post in this thread can be called a "late response" unless you consider the time of day when the poster made the post. And that would be hard to do if you don't know exactly where in the world they actually live.

Author
H Warsulhan
Posted
16-Apr-2013 20:24 GMT
Toolset
None
RE: No, that's not what he should do.

>>
And that would be hard to do if you don't know exactly where in the world they actually live.
<<

Err, no. A typical method of determining elapsed time is End-Start. If you remembered the local time of day when the original post appeared and you also know the local time of when a condescending response appeared, you can then easily calculate the delay in that condescending response.

Author
Per Westermark
Posted
16-Apr-2013 20:44 GMT
Toolset
None
RE: No, that's not what he should do.

If you had read my post and pondered a bit, you wouldn't "err, no" me.

A large percentage of threads on this forum run for a number of days.

Look at the time stamps of the first and last posts here: this is a young thread, so _no_ answer here can be considered a late response.

Which was why I noted that the only way you could debate 'late' was if someone posted it late at night. But that requires that you know what time zone the person lives in.

You realize this is a web forum? Not a chat program where you get a 'beep' or something on your mobile phone and then instantly write a response?

Author
Hans-Bernhard Broeker
Posted
16-Apr-2013 21:16 GMT
Toolset
None
RE: No, that's not what he should do.

>>
Who do you think you are directing your late response to?
<<

And who do you think you are? Let's see: so far all you've exhibited here is

a) a knack for totally missing the topic of discussion in every single one of your three, well, "contributions",
b) a total lack of understanding of the medium you're using, and
c) a strangely narrow selection of targets for your insinuations: me

Here's some free advice for you: the next time you decide to launch a random campaign of throwing stuff at someone, you might want to step out of that glass house of yours, first.

Author
H Warsulhan
Posted
16-Apr-2013 21:29 GMT
Toolset
None
RE: No, that's not what he should do.

>>
And who do you think you are?
<<

I'm one who knows how to use a search facility effectively. It's interesting going back to see historical posts.

Author
erik malund
Posted
16-Apr-2013 20:39 GMT
Toolset
None
oh, how often

>>
What he should do is stop expecting a compiler to magically turn badly flawed source code into working machine code. The problem here is not in the compiler ... it's in that source code, because that code wilfully disregards both the programming language's and the target architecture's properties.
<<

oh, how often do we see the "what the #$#$ is that, I do not need to know about stupid hardware I am a software person"

If that is your attitude STAY OUT OF EMBEDDED.

Erik

Author
b t
Posted
18-Apr-2013 07:51 GMT
Toolset
None
RE: this is instructive

Are you sure about the quality of this link? My PC showed some strange error messages about hanging software when shutting down yesterday. And this link does not look very official.
