I'm running a signal processing algorithm on a Cortex-A53.
The code is written in C with NEON intrinsics.
I measured the performance of two operations: multiplication of a complex matrix by a scalar matrix, and scalar multiplication of a complex float vector by a complex float vector.
It seems that when the input/output is interleaved (re0,im0,re1,im1...), performance is lower than with non-interleaved input/output.
For the interleaved case I'm using: vld2q_f32, vst2q_f32
For the non-interleaved case: vld1q_f32, vst1q_f32
Do you think it makes sense to create a c2c that takes non-interleaved input?
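To make the comparison concrete, here is a minimal sketch of an element-wise complex multiply in both layouts. It is not the code that was measured: the function names `cmul_interleaved` and `cmul_split`, the assumption that the element count is a multiple of 4, and the scalar fallback for non-NEON builds are all hypothetical and added only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON 1
#else
#define USE_NEON 0   /* portable scalar fallback so the sketch builds anywhere */
#endif

/* Interleaved layout: {re0, im0, re1, im1, ...}.
 * n is the number of complex elements, assumed to be a multiple of 4. */
void cmul_interleaved(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
#if USE_NEON
    for (size_t i = 0; i < n; i += 4) {
        /* vld2q_f32 de-interleaves on load: val[0] = re, val[1] = im */
        float32x4x2_t va = vld2q_f32(a + 2 * i);
        float32x4x2_t vb = vld2q_f32(b + 2 * i);
        float32x4x2_t vr;
        /* re = ar*br - ai*bi */
        vr.val[0] = vmlsq_f32(vmulq_f32(va.val[0], vb.val[0]), va.val[1], vb.val[1]);
        /* im = ar*bi + ai*br */
        vr.val[1] = vmlaq_f32(vmulq_f32(va.val[0], vb.val[1]), va.val[1], vb.val[0]);
        /* vst2q_f32 re-interleaves on store */
        vst2q_f32(out + 2 * i, vr);
    }
#else
    for (size_t i = 0; i < n; i++) {
        float ar = a[2 * i], ai = a[2 * i + 1];
        float br = b[2 * i], bi = b[2 * i + 1];
        out[2 * i]     = ar * br - ai * bi;
        out[2 * i + 1] = ar * bi + ai * br;
    }
#endif
}

/* Non-interleaved (split) layout: separate arrays for real and imaginary parts.
 * n is the number of complex elements, assumed to be a multiple of 4. */
void cmul_split(const float *are, const float *aim,
                const float *bre, const float *bim,
                float *ore, float *oim, size_t n)
{
    assert(n % 4 == 0);
#if USE_NEON
    for (size_t i = 0; i < n; i += 4) {
        /* plain contiguous loads, no de-interleaving needed */
        float32x4_t ar = vld1q_f32(are + i), ai = vld1q_f32(aim + i);
        float32x4_t br = vld1q_f32(bre + i), bi = vld1q_f32(bim + i);
        vst1q_f32(ore + i, vmlsq_f32(vmulq_f32(ar, br), ai, bi));
        vst1q_f32(oim + i, vmlaq_f32(vmulq_f32(ar, bi), ai, br));
    }
#else
    for (size_t i = 0; i < n; i++) {
        float ar = are[i], ai = aim[i], br = bre[i], bi = bim[i];
        ore[i] = ar * br - ai * bi;
        oim[i] = ar * bi + ai * br;
    }
#endif
}
```

The arithmetic is identical in both functions; only the load and store instructions differ, so timing the two against each other isolates the cost of the layout itself.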
Thank you,
Zvika