this post was submitted on 20 Jun 2023

Technology


Rumors, happenings, and innovations in the technology sphere. If it's technological news or discussion of technology, it probably belongs here.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.


Using model-generated content in training causes irreversible defects, a team of researchers says. "The tails of the original content distribution disappears," writes co-author Ross Anderson from the University of Cambridge in a blog post. "Within a few generations, text becomes garbage, as Gaussian distributions converge and may even become delta functions."

Here's the study: http://web.archive.org/web/20230614184632/https://arxiv.org/abs/2305.17493
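The "Gaussian distributions converge and may even become delta functions" claim can be illustrated with a toy simulation (my own sketch, not from the paper): repeatedly fit a Gaussian to a small sample, then generate the next generation's "training data" from the fitted model. Estimation noise compounds, and the fitted variance collapses toward zero.

```python
import random
import statistics

def collapse_demo(n_samples=5, generations=200, seed=0):
    """Toy model-collapse loop: each generation is trained (a Gaussian
    is fitted) on samples drawn from the previous generation's model."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0: the "real" data distribution
    for _ in range(generations):
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        mu = statistics.fmean(data)    # refit the model on generated data
        sigma = statistics.stdev(data)
    return sigma

# After many generations the fitted spread is essentially zero:
# the distribution has degenerated toward a delta function.
print(collapse_demo())
```

The tails vanish first because small samples rarely contain rare events, so each refit narrows the distribution slightly; the shrinkage compounds across generations.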

top 3 comments
[–] [email protected] 0 points 1 year ago (1 children)

This isn't an actual problem. Can you train on post-ChatGPT internet text? No, but you can train on the pre-ChatGPT Common Crawl snapshots, on the millions of conversations people have with the models, and on audio, video, and images. As we improve training techniques and model architectures, we will need even less of this data to train even more performant models.

[–] [email protected] 0 points 1 year ago (1 children)

But then you're training on more and more outdated data

[–] [email protected] 1 points 1 year ago

Afaik, there are already solutions to that.

You first train the model on the outdated but correct data, to establish the correct "thought" patterns.

Then you can train the AI on the fresh but flawed data, without it tripping over the mistakes.
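The two-phase idea above can be sketched with a toy linear model (a minimal illustration of curriculum-style fine-tuning, not anything from the study; the data, learning rates, and step counts are all made up): a long first phase on older, clean data establishes the right parameters, then a short, low-learning-rate second phase on fresh but noisy data updates the model without letting the noise dominate.

```python
import random

def fit(w, data, lr, steps):
    # Plain full-batch gradient descent on mean squared error for y ≈ w * x.
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

rng = random.Random(42)
xs = [i / 10 for i in range(1, 21)]
clean = [(x, 2.0 * x) for x in xs]                     # old but correct data
noisy = [(x, 2.0 * x + rng.gauss(0, 0.5)) for x in xs]  # fresh but flawed data

w = fit(0.0, clean, lr=0.05, steps=200)  # phase 1: learn from reliable data
w = fit(w, noisy, lr=0.005, steps=50)    # phase 2: gentle update on new data
```

Because phase 2 uses a much smaller learning rate and fewer steps, the final weight stays close to the true value of 2.0 despite the noise in the fresh data.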