14 points

Loop unrolling is not really the source of the speedup; autovectorization is. Loop unrolling often helps with autovectorization, but it is not enough on its own, especially with floating-point numbers. To vectorize the accumulation, the compiler has to reorder the additions, which is only valid if the operation is associative, and floating-point addition is not: (x + y) + z is not always equal to x + (y + z). Autovectorizing the code would therefore change its semantics, and the compiler is not allowed to do that.
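To make the associativity point concrete, here is a minimal sketch. It is not the loop from the original post; the function names `sum_sequential` and `sum_four_lanes` and the input values are made up for illustration. The second function does by hand the kind of reordering a vectorizer would need (four independent partial sums), and on this input it returns a different result from the straight left-to-right loop.

```rust
/// Sums left to right with a single accumulator, so the order of additions is fixed.
fn sum_sequential(xs: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Sums with four independent accumulators (hypothetical "lanes"),
/// mimicking the reordering a 4-wide vectorized sum would perform.
fn sum_four_lanes(xs: &[f32]) -> f32 {
    let mut lanes = [0.0f32; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        for i in 0..4 {
            lanes[i] += chunk[i];
        }
    }
    // Elements that did not fill a whole chunk of four.
    let tail: f32 = chunks.remainder().iter().sum();
    (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]) + tail
}

fn main() {
    // Same three values, two groupings, two answers.
    let (x, y, z) = (1.0e20f32, -1.0e20f32, 1.0f32);
    println!("(x + y) + z = {}", (x + y) + z); // prints 1
    println!("x + (y + z) = {}", x + (y + z)); // prints 0

    // Same slice, two summation orders, two answers.
    let xs = [1.0e20f32, 1.0, -1.0e20, 1.0];
    println!("sequential = {}", sum_sequential(&xs)); // prints 1
    println!("four lanes = {}", sum_four_lanes(&xs)); // prints 0
}
```

Neither result is the exact sum (which is 2); the point is only that the two orders disagree, so the compiler cannot silently swap one for the other.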

7 points

So if (somehow) the accumulator were an integer, this loop would autovectorize and the performance differences would be smaller?

4 points

Very likely, yes.
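As a rough illustration of why the integer case is different, here is a sketch with made-up function names (`sum_sequential`, `sum_four_lanes`) and made-up data. It uses `wrapping_add` so the overflow behavior is explicit: wrapping integer addition is associative, so the reordered sum always matches the sequential one, even when an intermediate result overflows. Whether a given loop actually gets vectorized still depends on the optimization level and target, so treat this as an argument about what the compiler is allowed to do, not a guarantee of what it does.

```rust
/// Sums left to right with wrapping (modulo 2^32) addition.
fn sum_sequential(xs: &[u32]) -> u32 {
    xs.iter().fold(0u32, |acc, &x| acc.wrapping_add(x))
}

/// Sums with four independent accumulators (hypothetical "lanes"),
/// the same reordering a 4-wide vectorized sum would perform.
fn sum_four_lanes(xs: &[u32]) -> u32 {
    let mut lanes = [0u32; 4];
    let mut chunks = xs.chunks_exact(4);
    for chunk in &mut chunks {
        for i in 0..4 {
            lanes[i] = lanes[i].wrapping_add(chunk[i]);
        }
    }
    // Elements that did not fill a whole chunk of four.
    let tail = chunks
        .remainder()
        .iter()
        .fold(0u32, |acc, &x| acc.wrapping_add(x));
    lanes[0]
        .wrapping_add(lanes[1])
        .wrapping_add(lanes[2].wrapping_add(lanes[3]))
        .wrapping_add(tail)
}

fn main() {
    // u32::MAX + 1 wraps to 0 partway through, yet both orders agree.
    let xs = [u32::MAX, 1, 2, 3, 4, 5];
    println!("sequential = {}", sum_sequential(&xs)); // prints 14
    println!("four lanes = {}", sum_four_lanes(&xs)); // prints 14
}
```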
