Hi,
I use the CMSIS DSP library to calculate an f32 RFFT with a block size of 1024. It takes 262823 cycles. I read in the white paper that it should take 55538 cycles. Because the measurement is too slow, I have to speed it up. What could be the reason for the extra cycles?
Thanks
My setup: Cortex-M4 on a TI MSP432, TI Code Composer Studio, wait states = 0, function: arm_rfft_fast_f32()
white-paper: community.arm.com/.../7563.ARM-white-paper-_2D00_-DSP-capabilities-of-Cortex_2D00_M4-and-Cortex_2D00_M7.pdf
Check the optimization settings, and that the build has the FPU enabled. Review the listing file or disassembly for FPU instructions and to see how tight the generated code is.
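As a quick complement to reading the disassembly, the FPU check above can be sketched as a compile-time probe. This is only a sketch under assumptions: `__ARM_FP` is the ACLE feature macro (bit 2 set means single-precision hardware float), and `__FPU_PRESENT` / `__FPU_USED` are the macros the CMSIS device and core headers normally provide. Whether your particular toolchain defines them is something to confirm in its documentation. The helper names `build_uses_hw_float` and `cmsis_fpu_macros_ok` are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if this translation unit was compiled
 * for single-precision hardware floating point according to the ACLE
 * macro __ARM_FP (bit 2 = single precision), 0 otherwise.
 * Assumption: the compiler implements the ACLE feature macros. */
static int build_uses_hw_float(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 0x4)
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: returns 1 if the CMSIS headers report that an FPU
 * is present and in use. The macros are only visible after the device
 * header has been included, so this returns 0 in a bare host build. */
static int cmsis_fpu_macros_ok(void)
{
#if defined(__FPU_PRESENT) && (__FPU_PRESENT == 1) && \
    defined(__FPU_USED) && (__FPU_USED == 1)
    return 1;
#else
    return 0;
#endif
}

/* Print both results, e.g. over a debug console. */
static void report_fpu_build(void)
{
    printf("hardware float (ACLE): %d\n", build_uses_hw_float());
    printf("CMSIS FPU macros:      %d\n", cmsis_fpu_macros_ok());
}
```

If either helper reports 0 on the target, the library may have been built with software floating point, which the disassembly would confirm through calls to runtime float routines instead of VFP instructions.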
What white paper?
Have you carefully compared all your options and settings to those used in that "white paper"?
Are you sure that your use case is comparable, i.e., are you comparing like with like?
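Part of comparing like with like is how the cycle count is taken. The following is a minimal sketch, not the method used in the white paper: a hypothetical `measure_cycles` helper that times only the work function and subtracts a measured call overhead. On a Cortex-M4 the reader function would typically return `DWT->CYCCNT` after the counter has been enabled through the CMSIS core registers; here a fake counter stands in so the logic can be tried on a host.

```c
#include <stdint.h>

/* Reads a free-running 32-bit cycle counter. */
typedef uint32_t (*cycle_reader_t)(void);

/* Hypothetical helper: cycles spent in work(), minus a fixed overhead.
 * Unsigned subtraction gives the right result across one counter wrap. */
static uint32_t measure_cycles(cycle_reader_t now, void (*work)(void),
                               uint32_t overhead)
{
    uint32_t start = now();
    work();
    uint32_t end = now();
    return (end - start) - overhead;
}

/* Host-side stand-ins (assumptions for illustration only):
 * each counter read costs 100 "cycles", the work costs 1000. */
static uint32_t fake_counter = 0u;
static uint32_t fake_now(void)  { fake_counter += 100u; return fake_counter; }
static void     fake_work(void) { fake_counter += 1000u; }

/* On the target, the reader would look roughly like this, using the
 * CMSIS core definitions from the device header:
 *
 *   CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk;  // enable trace block
 *   DWT->CYCCNT = 0;                                 // reset the counter
 *   DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;            // start counting
 *   static uint32_t dwt_now(void) { return DWT->CYCCNT; }
 *
 * and work() would call only arm_rfft_fast_f32(), with
 * arm_rfft_fast_init_f32() done once outside the timed region.
 */
```

With the stand-ins above, `measure_cycles(fake_now, fake_work, 100u)` yields 1000. Keeping initialization, buffer copies, and interrupts out of the timed region makes the number easier to compare with a published figure.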
Oh - that one: community.arm.com/.../7563.ARM-white-paper-_2D00_-DSP-capabilities-of-Cortex_2D00_M4-and-Cortex_2D00_M7.pdf !
Will this forum ever manage to recognise HTTPS URLs ?!
The email notification ended up in my spam folder...
- Yes, the build has the FPU enabled.
- Optimization settings: at first I used the default setting, then I set it to full speed. The difference was marginal.
- Yes, I think I am comparing like with like. I calculate an f32 RFFT with a block size of 1024 on a Cortex-M4 without wait states.
What else could cause the difference?
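To put the gap in numbers, here is a small sketch using only the two cycle counts quoted in this thread. The helper names `slowdown_ratio` and `cycles_per_sample` are hypothetical.

```c
#include <stdint.h>

/* Ratio of measured cycles to the reference figure. */
static double slowdown_ratio(uint32_t measured_cycles,
                             uint32_t reference_cycles)
{
    return (double)measured_cycles / (double)reference_cycles;
}

/* Average cycles spent per input sample of the transform. */
static double cycles_per_sample(uint32_t cycles, uint32_t block_size)
{
    return (double)cycles / (double)block_size;
}

/* Figures from this thread:
 *   slowdown_ratio(262823, 55538)   is about 4.73
 *   cycles_per_sample(262823, 1024) is about 256.7
 *   cycles_per_sample(55538, 1024)  is about 54.2
 */
```

A factor of roughly 4.7 is a useful number to keep in mind when checking the suggestions in the earlier replies, since each candidate cause should be able to account for a gap of that size.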
My new reply from 11th February ended up between the other replies... maybe it will not be noticed.