11 points

Does the author know about the difference between static and dynamic dispatch? 🤦🏻‍♂️
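For readers who want the distinction spelled out, here is a minimal sketch. The `Shape` trait, the `Square` and `Circle` types, and both function names are assumptions made up for illustration; they are not taken from the article being discussed.

```rust
use std::f64::consts::PI;

/// Hypothetical trait, invented for this illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
}

struct Circle {
    radius: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

impl Shape for Circle {
    fn area(&self) -> f64 {
        PI * self.radius * self.radius
    }
}

/// Static dispatch: the function is monomorphized for each concrete `S`,
/// so the `area` call is resolved at compile time and can be inlined.
/// The slice must hold a single concrete type.
fn total_area_static<S: Shape>(shapes: &[S]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

/// Dynamic dispatch: each element is a trait object, so the `area` call
/// typically goes through a vtable lookup at run time.
/// The slice may mix concrete types.
fn total_area_dynamic(shapes: &[Box<dyn Shape>]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The trade-off is that the static version needs a homogeneous slice, while the dynamic one accepts mixed shapes at the cost of an indirect call and a pointer hop per element.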

6 points

I agree with the conclusion, and the exploration is interesting enough that I think it was worth sharing. Still, while the author seemingly knows this already based on their conclusion, it’s worth stressing: these kinds of microbenchmarks rarely reflect real-world performance.

This toy case doesn’t have many (if any) real-world performance-sensitive applications. At best, using shapes in games comes to mind, but shapes there are often represented as meshes, and if you really need the area that much, you might find that precalculating the area once has more impact on performance than optimizing how fast the area is calculated.
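A minimal sketch of the "precalculate once" idea, assuming a hypothetical `CachedRect` type (the names are invented here, not from the article): the area is computed in the constructor, so a later sum only reads a stored field.

```rust
/// Hypothetical shape that stores its area at construction time.
/// The fields are private, so the cached value cannot go stale through
/// outside mutation (assuming the type lives in its own module).
struct CachedRect {
    width: f64,
    height: f64,
    area: f64, // computed once in `new`
}

impl CachedRect {
    fn new(width: f64, height: f64) -> Self {
        Self {
            width,
            height,
            area: width * height,
        }
    }

    /// Returns the stored area; no arithmetic happens here.
    fn area(&self) -> f64 {
        self.area
    }

    /// Read-only access to the dimensions.
    fn dimensions(&self) -> (f64, f64) {
        (self.width, self.height)
    }
}

/// Summing is now a plain walk over stored values.
fn total_area(shapes: &[CachedRect]) -> f64 {
    shapes.iter().map(CachedRect::area).sum()
}
```

Whether this beats recomputing depends on how often the shapes change relative to how often the area is read, which is exactly what a microbenchmark on a fixed array cannot tell you.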

Still, the author seems aware, and it seems to just be the author sharing their fun experiment.

43 points

Casey’s video is interesting, but his example is framed as moving from 35 cycles/object to 24 cycles/object being a 1.5x speedup.

Another way to look at this is, it’s a 12-cycle speedup per object.

If you’re writing a shader or a physics sim this is a massive difference.

If you’re building typical business software, it isn’t; that 10,000-line monster method does crop up, and it’s a maintenance disaster.

I think the takeaway “clean code principles lead to a 50% cost increase” is a message that needs to be taken with some context.

5 points

For what it’s worth, the cache locality of Vec<Box<dyn Trait>> is terrible in general. I feel like if you’re iterating over a large array of things and applying a polymorphic function, you’re making a mistake.

Cache locality isn’t a problem when you’re only accessing something once, though.

So IMO polymorphism has its place for work that isn’t iterative compute, e.g. web server handler functions and event-driven systems.
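One common alternative for the hot-loop case, sketched under the assumption that the set of shapes is closed and known up front: store an enum by value so the elements sit contiguously in the Vec's buffer instead of behind one heap pointer each. `ShapeKind` and `total_area_inline` are hypothetical names.

```rust
/// Hypothetical closed set of shapes, stored inline rather than boxed.
#[derive(Clone, Copy)]
enum ShapeKind {
    Square { side: f64 },
    Rect { width: f64, height: f64 },
}

impl ShapeKind {
    /// Dispatch is a `match` on the discriminant, not a vtable call.
    fn area(&self) -> f64 {
        match *self {
            ShapeKind::Square { side } => side * side,
            ShapeKind::Rect { width, height } => width * height,
        }
    }
}

/// The slice's elements are laid out back to back, so this loop walks
/// memory linearly. With `Vec<Box<dyn Trait>>` the Vec holds only
/// pointers, and each element may live anywhere on the heap.
fn total_area_inline(shapes: &[ShapeKind]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}
```

The cost is that every element takes the size of the largest variant, and adding a new shape means touching the enum, which is the extensibility trade-off the thread is arguing about.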

15 points

Yup. If that 12-cycle speedup is in a hot loop, then yeah, throw a bunch of comments and tests around it and perhaps keep the “clean” version around for illustrative purposes, and then do the fast thing. Perhaps throw in a feature flag to switch between the “clean” and “fast but a little sketchy” versions, and maybe someone will make a method to memoize pure functions generically so the “clean” version can be used with minimal performance overhead.
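As a rough sketch of what a generic memoizer for pure functions could look like (the `Memo` type is hypothetical and written for this thread, not an existing library API): results are cached in a `HashMap` keyed by the argument. Note the key must be `Eq + Hash`, so `f64` arguments would need some encoding (for example their bit pattern) before they could be used here.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: caches the results of a pure function by argument.
struct Memo<K, V, F> {
    func: F,
    cache: HashMap<K, V>,
}

impl<K, V, F> Memo<K, V, F>
where
    K: Eq + Hash + Clone,
    V: Clone,
    F: Fn(&K) -> V,
{
    fn new(func: F) -> Self {
        Self {
            func,
            cache: HashMap::new(),
        }
    }

    /// Returns the cached value for `key`, computing and storing it on
    /// the first request. Only correct if `func` is pure.
    fn get(&mut self, key: K) -> V {
        if let Some(v) = self.cache.get(&key) {
            return v.clone();
        }
        let v = (self.func)(&key);
        self.cache.insert(key, v.clone());
        v
    }
}
```

For something as cheap as a shape's area, the hash lookup almost certainly costs more than the computation, so this only pays off when the wrapped function is expensive.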

Clean code should be the default; optimizations should come later as necessary.

1 point

Keeping the clean version around seems like dangerous advice.

You know it won’t get maintained when there are changes or fixes. So by the time someone needs to rewrite that part, or the whole application, many years later (think of a migration to a different language), it will be more confusing than helpful.

5 points

Easy solution: write tests to ensure equivalent behavior.
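A minimal sketch of such a test, assuming two hypothetical implementations (`sum_areas_clean` as the readable reference and `sum_areas_fast` as a hand-optimized variant). Because the fast version adds floating-point values in a different order, the comparison uses a tolerance rather than exact equality.

```rust
/// "Clean" reference version (hypothetical): sum of squared side lengths.
fn sum_areas_clean(sides: &[f64]) -> f64 {
    sides.iter().map(|s| s * s).sum()
}

/// "Fast" version (hypothetical): four independent accumulators.
fn sum_areas_fast(sides: &[f64]) -> f64 {
    let mut acc = [0.0f64; 4];
    let chunks = sides.chunks_exact(4);
    let rest = chunks.remainder();
    for c in chunks {
        for i in 0..4 {
            acc[i] += c[i] * c[i];
        }
    }
    let tail: f64 = rest.iter().map(|s| s * s).sum();
    acc.iter().sum::<f64>() + tail
}

/// Relative comparison; summation order differs between the two
/// versions, so bit-exact equality is not guaranteed in general.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    (a - b).abs() <= rel_tol * a.abs().max(b.abs()).max(1.0)
}

#[test]
fn fast_matches_clean() {
    // Cover lengths that are and are not multiples of the chunk size.
    for n in 0..50 {
        let sides: Vec<f64> = (0..n).map(|i| 0.1 * i as f64).collect();
        assert!(approx_eq(
            sum_areas_fast(&sides),
            sum_areas_clean(&sides),
            1e-12
        ));
    }
}
```

A test like this keeps the clean version honest as an executable specification, though it only guards the inputs it actually exercises.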

16 points

Correct me if I am wrong, but isn’t “loop unrolling/unwinding” something that the C++ and Rust compilers do? Why does the loop here not get unrolled?

14 points

Loop unrolling is not really the speedup; autovectorization is. Loop unrolling often helps with autovectorization, but it is not enough, especially with floating-point numbers. For the loop to be vectorized, the accumulation operation needs to be associative, and floating-point addition is not associative (i.e. (x + y) + z is not always equal to x + (y + z)). Hence autovectorizing the code would change the semantics, and the compiler is not allowed to do that.
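To make the non-associativity concrete, here is a small sketch. The function names are made up for illustration; `sum_two_lanes` mimics the kind of reordering a vectorized sum would perform by keeping two interleaved accumulators.

```rust
/// Sums strictly left to right, as the scalar loop is required to.
fn sum_sequential(xs: &[f64]) -> f64 {
    let mut acc = 0.0;
    for &x in xs {
        acc += x;
    }
    acc
}

/// Sums even-indexed and odd-indexed elements into separate
/// accumulators and combines them at the end, which is the reordering
/// a two-lane vectorized sum would imply.
fn sum_two_lanes(xs: &[f64]) -> f64 {
    let (mut a, mut b) = (0.0, 0.0);
    for pair in xs.chunks(2) {
        a += pair[0];
        if let Some(&y) = pair.get(1) {
            b += y;
        }
    }
    a + b
}
```

With the input `[1e16, 1.0, -1e16, 1.0]`, the sequential sum loses the first `1.0` when it is added to `1e16` and returns `1.0`, while the two-lane sum cancels the large values first and returns `2.0`. Same numbers, different order, different result, which is why the compiler may not reorder the additions on its own.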

7 points

So if (somehow) the accumulator were an integer, this loop would autovectorize and the performance differences would be smaller?

4 points

Very likely, yes.
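The reason is that wrapping integer addition is associative and commutative, so any regrouping gives the same result. A sketch with hypothetical function names, using `wrapping_add` to make the overflow behaviour explicit; whether a given compiler actually vectorizes the simple loop is something to check in the generated assembly rather than assume.

```rust
/// Sums strictly left to right with wrapping arithmetic.
fn sum_sequential_u64(xs: &[u64]) -> u64 {
    xs.iter().fold(0u64, |acc, &x| acc.wrapping_add(x))
}

/// Sums into four independent lanes, then combines the lanes and the
/// leftover elements. Unlike the floating-point case, this regrouping
/// cannot change the result, even when the sum overflows.
fn sum_four_lanes_u64(xs: &[u64]) -> u64 {
    let mut lanes = [0u64; 4];
    let chunks = xs.chunks_exact(4);
    let rest = chunks.remainder();
    for c in chunks {
        for i in 0..4 {
            lanes[i] = lanes[i].wrapping_add(c[i]);
        }
    }
    let mut total = 0u64;
    for &v in lanes.iter().chain(rest) {
        total = total.wrapping_add(v);
    }
    total
}
```

Both functions agree on every input, including ones that wrap around, so the compiler is free to pick whichever grouping is fastest.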
