nsa (nsa@kbin.social)
16 posts • 11 comments

Averaging model weights seems to help across textual domains as well, see Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models and Scaling Expert Language Models with Unsupervised Domain Discovery. I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.
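
For concreteness, here’s a minimal sketch of the kind of weight averaging involved, assuming every checkpoint comes from a model with an identical architecture; `average_state_dicts` is a hypothetical helper, not something from either paper:

```python
import torch

def average_state_dicts(state_dicts):
    """Uniformly average checkpoints from models with identical
    architectures (a simple "model soup" / weight-averaging step)."""
    return {
        key: torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
        for key in state_dicts[0]
    }

# Usage sketch: merge checkpoints trained on different domains (or with
# different hyperparameters) into a single set of weights.
# merged = average_state_dicts([torch.load(p) for p in checkpoint_paths])
```

Combining the two axes could be as simple as averaging over the cross product of domain experts and hyperparameter settings, though whether that actually helps is an open question.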

Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.
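
As a rough illustration (a generic sketch, not the paper’s actual algorithm), the core second-order idea is to precondition the gradient with a damped diagonal curvature estimate; `diagonal_newton_step` and `hessian_diag` below are hypothetical names standing in for whatever estimate the optimizer maintains:

```python
import torch

def diagonal_newton_step(param, grad, hessian_diag, lr=1.0, damping=1e-4):
    # Scale each coordinate of the gradient by a damped diagonal
    # curvature estimate before taking the step; the damping term
    # keeps the update stable where curvature is near zero.
    param.data.add_(-lr * grad / (hessian_diag.abs() + damping))
```

The per-step cost is higher than plain SGD/Adam, so wall-clock gains only show up if the better-conditioned steps reduce the total number of steps enough to pay for it.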

If there isn’t any discussion on reddit (as is the case here), I don’t see a reason to link to reddit; you can just link to the project page. That said, if you think there is important discussion happening there that helps with understanding the paper, then use a teddit link instead, like:

https://teddit.net/r/MachineLearning/comments/14pq5mq/r_hardwiring_vit_patch_selectivity_into_cnns/

Please don’t post links to reddit.

For creative text generation tasks, automatic metrics have been shown to be deficient; this holds even for the newer model-based metrics. That leaves human evaluation (both intrinsic and extrinsic) as the gold standard for those kinds of tasks. I wonder whether the results from this paper (and future papers that examine automatic CV metrics) will lead reviewers to demand more human evaluation for CV tasks, as they already do for certain NLP tasks.

hmmm… not sure which model you’re referring to. do you have a paper link?
