Reversal knowledge, in this case, means: if an LLM knows that A is B, does it also know that B is A? Apparently the answer is a pretty resounding no! I’d be curious to see whether chain-of-thought (CoT) prompting affects the results at all.
Meh. Either I’m doing something wrong, or we should stop linking (only) Twitter posts. I can only see the original 42 words and a picture, with no link to the mentioned paper and no thread that clarifies what this means.
For other people with the same problem, here’s the website of the person who posted it: https://owainevans.github.io/
And here’s the mentioned paper: https://owainevans.github.io/reversal_curse.pdf