How do I calculate if a test like this is statistically significant?

posted 1 year ago

I let people rate how much they like different things on a scale of 1-10. How do I actually tell if people like one thing more than another thing if the sample sizes are different? This is not about any real scientific study, more like a personal test :)

For example, if one thing got voted on 10 times and has an average value of 6.5, and another thing got voted on 6 times and has a 6.1, is the 6.5 thing actually more liked? Or is a sample this small still so random that it could easily go either way?

I’ve never done anything like this, if someone could explain it or direct me to the correct key words/links, that would be hugely appreciated :)

I’ve read up a bit on p-value determination, but I’m not sure what my “null hypothesis” is here actually, numerically. If I’d put it in words I guess my hypothesis would be “this thing is more liked than the other thing”, but honestly, it seems like my specific case would be much simpler than all the stuff I’m reading here :D


altairabove@lemmy.world

7 points

1 year ago

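You could use a few different null hypotheses here. One with minimal assumptions would be that both things have the same distribution of ratings, so that a random vote for one is just as likely to be higher as lower than a random vote for the other (this is often loosely summarized as "the medians are equal"). This can be tested using the Mann-Whitney U test, which works fine with different sample sizes. https://en.m.wikipedia.org/wiki/Mann–Whitney_U_test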
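As a rough sketch of what that test does, here is a small pure-Python version that computes U and an exact two-sided p-value by trying every possible way of splitting the pooled votes into the two groups. The ratings are invented for illustration (10 votes averaging 6.5 and 6 votes averaging about 6.2), and the function names are made up for this example, not taken from any library.

```python
from itertools import combinations

def u_statistic(xs, ys):
    """Count the (x, y) pairs where x beats y; a tie counts as half a win."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)

def mann_whitney_exact(xs, ys):
    """Return (U, two-sided exact p-value) by enumerating all regroupings."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    centre = n * m / 2  # value of U expected when neither thing is preferred
    u_observed = u_statistic(xs, ys)
    observed = abs(u_observed - centre)
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count regroupings at least as lopsided as the real one.
        if abs(u_statistic(a, b) - centre) >= observed - 1e-9:
            extreme += 1
    return u_observed, extreme / total

# Invented ratings, only for illustration.
thing_a = [7, 5, 8, 6, 9, 4, 7, 6, 8, 5]  # 10 votes, mean 6.5
thing_b = [6, 7, 5, 8, 4, 7]              # 6 votes, mean about 6.2

u, p = mann_whitney_exact(thing_a, thing_b)
print(f"U = {u}, p = {p:.3f}")
```

For these made-up numbers U is 33.5, only slightly above the no-preference value of 30, so the p-value comes out large and there is no real evidence that one thing is liked more. Enumerating every split is only practical for small samples like these; for larger ones a statistics library would be the sensible choice.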


Azzu@lemm.ee (OP)

2 points

1 year ago

This seems like exactly the case here :) I will read up and try this


Jlafs@lemmy.world

5 points

1 year ago

Your null hypothesis is the thing you’re trying to disprove. For example, if I wanted to run a study to assess the effect of adding a certain growth hormone to a cell culture, my null hypothesis would be “there is no effect”. In your case, it would be “there is no difference in how much different things are liked”. From there, you’d run your study and do your statistical analysis. There are different methods for that based on the type of data, the number of groups you’re comparing, sample size, etc. I’m not a statistician, so I can’t say which methods are best for what you’re planning.

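When it comes to p-value, to really simplify it, you can think of it as how surprising your data would be if the null hypothesis were true: it’s the probability of seeing a difference at least as big as yours purely by chance when there is actually no difference. People often remember it as the likelihood that the null hypothesis is true, but that’s not what it means. A small p-value is evidence against the null hypothesis, nothing more.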
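To make that definition concrete, here is a minimal permutation-test sketch in Python. It assumes the null hypothesis "the labels don't matter", regroups the pooled votes in every possible way, and reports the fraction of regroupings whose difference in averages is at least as large as the one actually observed. The ratings and the function name are invented for illustration.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(xs, ys):
    """Two-sided exact p-value for the difference in means."""
    pooled = list(xs) + list(ys)
    n, m = len(xs), len(ys)
    pooled_sum = sum(pooled)
    observed = abs(mean(xs) - mean(ys))
    extreme = total = 0
    for idx in combinations(range(n + m), n):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n - (pooled_sum - sum_a) / m)
        total += 1
        # Under the null, every regrouping is equally likely.
        if diff >= observed - 1e-9:
            extreme += 1
    return extreme / total

# Invented ratings, only for illustration.
thing_a = [7, 5, 8, 6, 9, 4, 7, 6, 8, 5]  # 10 votes, mean 6.5
thing_b = [6, 7, 5, 8, 4, 7]              # 6 votes, mean about 6.2

p = permutation_p_value(thing_a, thing_b)
print(f"p = {p:.3f}")
```

If p is large, a gap like the one between the two averages shows up all the time even when the labels are shuffled at random, so the data can't tell the two things apart.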


TauZero@mander.xyz

5 points

1 year ago

Your situation reminded me of the way IMDB sorts movies by rating, even though different movies may receive vastly different numbers of votes. They use something called a credibility formula, which is apparently a Bayesian statistics way of doing it, unlike the frequentist statistics with p-values and null hypotheses that you are looking at atm.
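A minimal sketch of that kind of formula, usually called a Bayesian average: each item's average is pulled toward the overall average, and the pull is stronger the fewer votes the item has. The exact formula and constants IMDB uses are not given here, so `prior_votes` (how many "phantom" votes at the overall average each item starts with) and the function name are assumptions chosen for illustration.

```python
def weighted_rating(item_mean, item_votes, overall_mean, prior_votes):
    """Shrink an item's mean toward the overall mean.

    Equivalent to adding `prior_votes` imaginary votes at `overall_mean`
    to the item's real votes and averaging the lot.
    """
    total = item_votes * item_mean + prior_votes * overall_mean
    return total / (item_votes + prior_votes)

# Numbers from the question: 10 votes averaging 6.5, 6 votes averaging 6.1.
overall = (10 * 6.5 + 6 * 6.1) / 16  # average over all 16 votes
PRIOR_VOTES = 5                      # assumed; tune to taste

score_a = weighted_rating(6.5, 10, overall, PRIOR_VOTES)
score_b = weighted_rating(6.1, 6, overall, PRIOR_VOTES)
print(f"A: {score_a:.2f}  B: {score_b:.2f}")
```

This gives a ranking that is less easily swayed by items with very few votes, but unlike a p-value it does not say whether the remaining gap is bigger than chance.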


JWBananas@startrek.website

2 points

1 year ago

People are inherently bad at rating things. Why not run a “This or that?” style study instead?

Given a list of items to rate, pair them up randomly. Ask a person which item they like better out of each pair. Run through Final Four type eliminations until you get down to their number one preference.

Run through this process for each person, beginning with different random pairings every time.

Record data on all the choices - not just the final ones. You should be able to get good data like that.

For example, there will probably be a thing that is so disliked that it gets eliminated in the first round more frequently than anything else. The inverse will likely be true of a highly-preferred item. And I am sure you can identify other insights as well.
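The procedure above could be sketched like this in Python. Everything in it is hypothetical: the item names, the `scores` standing in for one person's tastes, and the `prefers` callback that would really be a person answering "this or that?". Items left over in an odd-sized round get a bye.

```python
import random
from collections import Counter

def run_bracket(items, prefers, rng):
    """Single-elimination bracket with random pairings each round.

    Returns the final winner and every (winner, loser) choice made.
    """
    remaining = list(items)
    choices = []
    while len(remaining) > 1:
        rng.shuffle(remaining)
        next_round = []
        if len(remaining) % 2 == 1:
            next_round.append(remaining.pop())  # odd one out gets a bye
        for a, b in zip(remaining[0::2], remaining[1::2]):
            if prefers(a, b):
                winner, loser = a, b
            else:
                winner, loser = b, a
            choices.append((winner, loser))
            next_round.append(winner)
        remaining = next_round
    return remaining[0], choices

# Hypothetical items and tastes; a real study would ask a person instead.
items = ["tea", "coffee", "juice", "water", "soda"]
scores = {"tea": 7, "coffee": 9, "juice": 4, "water": 6, "soda": 2}

rng = random.Random(0)  # fixed seed so the pairings are reproducible
wins, losses = Counter(), Counter()
for _ in range(20):  # 20 simulated participants with identical tastes
    favourite, choices = run_bracket(items, lambda a, b: scores[a] > scores[b], rng)
    for winner, loser in choices:
        wins[winner] += 1
        losses[loser] += 1

print("wins:", dict(wins))
print("losses:", dict(losses))
```

Keeping the full list of choices, not just the final favourite, is what lets you see things like which item gets knocked out most often.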


Azzu@lemm.ee (OP)

2 points

1 year ago

Sounds like a good idea, but my participants don’t have the attention span for it, and I don’t have the resources to do anything else :) After all, like I said, it’s just a small personal thing :)

