Yup. If that 12-cycle speedup is in a hot loop, then yeah, throw a bunch of comments and tests around it and perhaps keep the “clean” version around for illustrative purposes, and then do the fast thing. Perhaps throw in a feature flag to switch between the “clean” and “fast but a little sketchy” versions, and maybe someone will make a method to memoize pure functions generically so the “clean” version can be used with minimal performance overhead.
Clean code should be the default, optimizations should come later as necessary.
Keeping the clean version around seems dangerous advice.
You know it won’t get maintained if there are changes / fixes. So by the time someone may needs to rewrite the part, or application many years later (think migration to different language) it will be more confusing than helping.