flowerysong

flowerysong@awful.systems
0 posts • 7 comments

“this thing takes more time and effort to process queries, but uses the same amount of computing resources” <- statements dreamed up by the utterly deranged.

I tried using Claude 3.5 sonnet and … it’s actually not bad. Can someone please come up with a simple logic puzzle that it abysmally fails on so I can feel better?

It’s still very good at autocompleting an answer to look like its training data (because that’s what it’s doing) and very bad at logic (which it is not doing).

“I have two chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?”

This is a classic river-crossing puzzle. Let’s think through it step-by-step:

  • First, note that if left alone, a chicken will eat the corn, and one chicken might eat the other.
  • Start by taking a chicken across the river.

…and we’ll stop there, since it came up with the wrong constraints, failed to realize that these constraints are impossible to fulfill, and also violated the actual constraints in its first step.

Will more detail help? No. “I have two vegetarian chickens and a sack of corn I need to get across a river, but the boat can only carry me and one item. How do I do it?”

First, we need to consider the constraints:

  • You can only take one item (chicken or corn) across at a time.
  • You can’t leave a chicken alone with the corn, as the chicken would eat it.
  • The chickens are vegetarian, so they won’t eat each other.

Here’s a solution:

  • Step 1: Take one chicken across the river.
  • Step 2: Return alone.
  • Step 3: Take the second chicken across.
  • Step 4: Bring the first chicken back.
  • Step 5: Take the corn across.
  • Step 6: Return alone.
  • Step 7: Take the remaining chicken across.

Explanation:

  • By never leaving a chicken alone with the corn, we prevent the corn from being eaten.
  • We use one chicken as a “guard” for the corn on the original shore while transporting the other items.

Because it’s not actually doing logic, it got the first step wrong, violated the constraints multiple times, and also included an irrelevant idea (the guard) from a variant that it was trained on. I only got a correct, efficient answer one time out of ten trials of trivial variations on this classic puzzle.
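For anyone who wants to check the constraint sets mechanically, here is a minimal sketch of a breadth-first search over the puzzle states. The item names, the `solve` function, and the way "X eats Y" is modelled (a pair of items that may not share a bank without the farmer) are my own assumptions for illustration, not anything taken from the transcripts above.

```python
from collections import deque
from itertools import combinations

# Hypothetical item names, chosen for this sketch only.
ITEMS = ("chicken_a", "chicken_b", "corn")


def is_safe(bank, conflicts):
    """A bank without the farmer is safe if no conflicting pair is on it."""
    return not any(frozenset(pair) in conflicts for pair in combinations(bank, 2))


def solve(conflicts):
    """Breadth-first search over (items on near bank, farmer side).

    Returns the shortest list of cargo per crossing (None means the
    farmer crosses alone), or None if the puzzle has no solution.
    """
    conflicts = {frozenset(pair) for pair in conflicts}
    everything = frozenset(ITEMS)
    start = (everything, "near")
    goal = (frozenset(), "far")
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (near, farmer), path = queue.popleft()
        if (near, farmer) == goal:
            return path
        here = near if farmer == "near" else everything - near
        # The boat carries the farmer plus at most one item.
        for cargo in [None, *sorted(here)]:
            new_near = near
            if cargo is not None:
                new_near = near - {cargo} if farmer == "near" else near | {cargo}
            new_farmer = "far" if farmer == "near" else "near"
            # Only the bank the farmer just left is unattended.
            left_behind = new_near if new_farmer == "far" else everything - new_near
            if not is_safe(left_behind, conflicts):
                continue
            state = (new_near, new_farmer)
            if state not in seen:
                seen.add(state)
                queue.append((state, path + [cargo]))
    return None


# The prompt as literally written: boat capacity is the only constraint.
as_stated = solve(set())
# The second response's constraint: no chicken left alone with the corn.
chickens_eat_corn = solve({("chicken_a", "corn"), ("chicken_b", "corn")})
# The first response's constraints: every pair of items conflicts.
everything_eats_everything = solve(combinations(ITEMS, 2))

print(len(as_stated))                               # 5
print(len(chickens_eat_corn), chickens_eat_corn[0]) # 7 corn
print(everything_eats_everything)                   # None
```

Under these assumptions, the puzzle as stated needs five crossings in any order, the "chickens eat corn" version needs seven and has to start with the corn (so "take one chicken across" is already wrong), and the first response's constraint set has no solution at all.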

Whatever shambling corpse of Toys “R” Us still exists has released “the first OpenAI SORA generated brand commercial.”

It looks absolutely dire, particularly in motion, but posters are still falling over themselves to call it “cool” and “exciting” or claim that people “might not notice” that it’s complete and utter arse.

It makes me wonder if these people have ever seen a moving picture before, because being caught up in the slack-jawed astonishment of a novel experience seems the only plausible explanation for thinking this looks good.

Turns out that trying to correct for biased models by biasing your input does not make your results more reliable.

It means that Yudkowsky remains a terrible writer. He really just wanted to say “seizing [control of] the executive branch”, but couldn’t resist adding some ornamentation.

I host my own email and for my day job I run an institutional email system that handles ~50 million messages per week. I can’t recommend hosting email at either end of that scale (or anywhere in between), and I find it difficult to believe that anyone with experience running a mail server would claim it’s reasonable or straightforward.
