If GPTs only predict the next word, how do they decide between "a" and "an"? Wouldn't this have a massive effect on their abilities?

posted 1 year ago

spread@programming.dev

askscience@lemmy.world


GarrettBird@lemmy.world

2 points

1 year ago

Well, my example of the word ‘elephant’ has the same property as ‘herb’ where the use of ‘a’ or ‘an’ can depend on who you ask. I chose my example trying to anticipate this exact question, and I believe I gave you an answer.

Let me put it this way: it depends… It depends on the data the LLM (ChatGPT, for example) was trained on. If an LLM is trained only on text written by people in the United Kingdom, the data will favor “a herb”, as the ‘h’ is pronounced there, whereas data from the United States will favor “an herb”, as the ‘h’ is usually silent when spoken out loud.

As a fairly general rule, people use the article “an” before a vowel sound (as in a word that starts with a silent “h”) and “a” before a consonant sound (like a pronounced, or aspirated, “h”). Usually the training data is gathered from multiple English-speaking countries, so both “an herb” and “a herb” will exist in it, and from there the LLM will favor the one that shows up more often (as the data will be biased toward it).
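To make the "whichever shows up more often wins" idea concrete, here is a toy sketch in Python. It is not how an LLM works internally (a real model assigns probabilities to tokens given the whole context rather than counting phrases), and the function names and the tiny corpora are made up purely for illustration.

```python
import re
from collections import Counter


def article_counts(corpus, noun):
    """Count how often 'a <noun>' and 'an <noun>' appear in the text."""
    pattern = re.compile(r"\b(an?)\s+" + re.escape(noun) + r"\b", re.IGNORECASE)
    return Counter(match.group(1).lower() for match in pattern.finditer(corpus))


def pick_article(corpus, noun):
    """Return the article seen most often before the noun, or None if unseen."""
    counts = article_counts(corpus, noun)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical stand-ins for UK-only, US-only, and mixed training text.
uk_corpus = "Basil is a herb. Add a herb such as thyme. Mint is a herb too."
us_corpus = "Basil is an herb. Add an herb such as thyme. Mint is an herb too."
mixed_corpus = uk_corpus + " " + us_corpus + " Rosemary is an herb."

for name, corpus in [("UK", uk_corpus), ("US", us_corpus), ("mixed", mixed_corpus)]:
    print(name, dict(article_counts(corpus, "herb")), "->", pick_article(corpus, "herb"))
```

In the mixed corpus the "an herb" examples outnumber the "a herb" ones by one, so the toy picker leans that way, which is roughly the kind of bias described above.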

Just for fun, I asked the LLM running on my local machine. Prompt: “Fill in the blank: ‘It is _ herb.’” Response: “It is an herb.”

