Small rant: basically, the title. If it said it doesn’t know the answer instead of trying to answer every question, it would be trustworthy.

2 points

Do you have a source for the “smiling when you don’t really mean it” thing? I’ve been digging around but couldn’t find that anywhere.

1 point

It’s right in the research I was mentioning:

https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html

Find the section on the model’s representation of self and then the ranked feature activations.

I misremembered the top feature slightly. It was: responding “I’m fine” or giving a positive but insincere response when asked how they are doing.


ChatGPT

!chatgpt@lemmy.world


Unofficial ChatGPT community to discuss anything ChatGPT
