Article gives the AI game away - generative models require human creativity to train on, and produce inferior approximations of their own.

AbbysMuscles [she/her] · 2 years ago

Article gives the AI game away - generative models require human creativity to train on, and produce inferior approximations of their own.

VernetheJules [they/them] · edit-2 2 years ago

Calling it now, NFTs are gonna make a comeback because they'll be used to assign provenance to human-produced work

AbbysMuscles [she/her] · 2 years ago

Oh that is deeply fucking cursed

VernetheJules [they/them] · 2 years ago

The researchers conclude that in a future filled with gen AI tools and their content, human-created content will be even more valuable than it is today — if only as a source of pristine training data for AI.

Nonsense, we have a bright future ahead of us! As

:soypoint-1: CONTENT CREATORS

LeninsBeard [he/him] · 2 years ago

Hooking up every living human to the matrix so I can AI generate a New Yorker article about how my trip to Colombia was subpar because the locals didn't bow to me.

Owl [he/him] · 2 years ago

You can just make an NFT of the AI generated work though. NFTs can't actually prove any real-world concept.

This isn't to say that NFT grifters won't try.

Flyberius [comrade/them] · 2 years ago

God damn, you're lathing that into existence

ssjmarx [he/him] · 2 years ago

:lathe-of-heaven: :stalin-gun-1::stalin-gun-2:

Anyway what's actually going to be used to assign provenance to human-produced work is freely-accessible video of that work being created. No-one can accuse you of using generative tools if you have video proof of your brush touching canvas.

oh shit "touch canvas" is gonna be an internet meme

Frank [he/him, he/him] · 2 years ago

Yeah, but they'll just make a plagiarism bot that fakes the video. They've already got plagiarism bots that fake the steps of drawing an image in reverse.

ssjmarx [he/him] · 2 years ago

I think there will be an "arms race" between the generators and the verification methods, but the speedrun community for example has been dealing with this exact problem for a while and the methods of spotting fake runs are really sophisticated for the most popular games. At the very least you can ask an artist technical questions and 90% of cheaters will get weeded out because they won't be able to talk about their process.

mayo_cider [he/him] · 2 years ago

I've been fighting against AI all my life by producing only trash content.

AbbysMuscles [she/her] · 2 years ago

:rat-salute:

Tommasi [she/her, pup/pup's] · 2 years ago

:data-laughing:

LLMs being fed more and more generated garbage and producing increasingly worse results would be the funniest way for the AI hype to collapse.

AbbysMuscles [she/her] · 2 years ago

It's so funny that the capitalist overbosses were so excited to remove labor from the equation, only to realize that their precious new toy depends on human labor just as much as anything else

AOCapitulator [they/them, she/her] · 2 years ago

They are discovering jpeg compression

wopazoo [he/him] · 2 years ago

No shit? LLMs imitate (imperfectly) human writing. An LLM trained on LLM output is going to imperfectly imitate the imperfect imitation. This is called generation loss.
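A toy sketch of that loop, not anything from the article: the "model" here is just a table of word frequencies, and each generation is trained only on text sampled from the previous one. The vocabulary, corpus size, and function names are all made up for illustration. Because every generation can only reuse words it saw in the previous corpus, the set of distinct words can stay the same or shrink, but never grow back.

```python
import random
from collections import Counter


def train(corpus):
    """'Train' a toy model: it only memorizes token frequencies."""
    return Counter(corpus)


def generate(model, n, rng):
    """Sample n tokens from the frequencies the model learned."""
    tokens = list(model.keys())
    weights = list(model.values())
    return rng.choices(tokens, weights=weights, k=n)


def simulate(generations=20, corpus_size=200, seed=0):
    """Return the number of distinct tokens in each generation's corpus."""
    rng = random.Random(seed)
    # Hypothetical "human" corpus: 50 words with a long tail of rare ones.
    vocab = [f"w{i}" for i in range(50)]
    weights = [1 / (i + 1) for i in range(50)]
    corpus = rng.choices(vocab, weights=weights, k=corpus_size)

    sizes = [len(set(corpus))]
    for _ in range(generations):
        model = train(corpus)                        # learn from the last output
        corpus = generate(model, corpus_size, rng)   # produce the next corpus
        sizes.append(len(set(corpus)))
    return sizes


if __name__ == "__main__":
    print(simulate())
```

The rare words are the first to drop out, since a word that fails to get sampled once is gone for every later generation.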

motherofmonsters [she/her] · 2 years ago

Eugenics but for computers?

wopazoo [he/him] · 2 years ago

https://en.wikipedia.org/wiki/Generation_loss

It's a lossy compression thing

motherofmonsters [she/her] · 2 years ago

nods knowingly like how people in genesis lived 1000 years and now we die at 67

IceWallowCum [he/him] · 2 years ago

Is your username a reference to Maupassant?

motherofmonsters [she/her] · 2 years ago

I wish

heartheartbreak [fae/faer] · 2 years ago

Real reason why China forces AI output to be watermarked? :thonk:

FloridaBoi [he/him] · 2 years ago

Common China W

invalidusernamelol [he/him] · 2 years ago

That's specifically for "copying" the training data from projects like OpenAI. Or filtering it down to only parts of the data that you want.

It's different when you're compiling a new training dataset and now have to worry about a large portion of your input data being totally garbage and ruining the model.

And that "copying" process introduces tons of errors and bugs anyways so it's only good if you are building a hyper specific LLM, like one that is meant to emulate the style of one specific artist.

iridaniotter [she/her] · 2 years ago

The article’s main solution is to keep some kind of master backup of work labelled as existing before the rise of LLMs

Low-background steel but for plagiarism machines lmao

Flyberius [comrade/them] · 2 years ago

If only there was some way of generating original content...

SuperZutsuki [they/them] · 2 years ago

Can't wait for people to start asking, "What did you prompt your AI with to get that idea?" when someone says something novel that just popped into their head.

Infamousblt [any] · 2 years ago

Unironically, it's praxis to poison this data somehow

blobjim [he/him] · 2 years ago

LLM, more like LTV

thisonethatone [he/him] · 2 years ago

I've been using AI as a fun little writing tool since 2019 and anyone who actually uses it knows it is only as good as the writer using it.

Garbage in, garbage out.

AbbysMuscles [she/her] · 2 years ago

Oh don't get me wrong, it's a fun toy and it has its uses. If I'm struggling to remember a specific word for example, I can describe the word and it'll usually realize what I mean. I can also paste difficult to understand text and it can help me figure out what's going on. It's a good tool for aiding language processing. Creating its own shit? Absolutely not.

thisonethatone [he/him] · 2 years ago

Totally agreed on that point. I love using AI to break writer's block because it pulls interesting ideas out that I never would have come up with on my own. Sort of like an electronic writers' room.

AI under capitalism sucks though. It feels like the next blockchain hype.

IceWallowCum [he/him] · 2 years ago

How exactly do you use it for that? Is it specific for each situation you're facing, or do you have go-to directions for it?

thisonethatone [he/him] · 2 years ago

It depends on the model you're using. I use Novel AI and it has a lore book feature that keeps track of world building. Each character, location, and historical event has its own entry with tags.

You can also mess with generation parameters to limit certain words, increase/decrease generation randomness, and use modules if you want to stick with a specific author style.
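For anyone wondering what those knobs do under the hood, here is a minimal generic sketch. It is not Novel AI's actual implementation or API: `sample_token`, the `banned` argument, and the example scores are hypothetical. The idea is that the model assigns each candidate word a score (a logit), banned words are dropped, and a "temperature" divides the scores before they are turned into probabilities. Low temperature makes the top choice dominate; high temperature flattens the odds.

```python
import math
import random


def sample_token(logits, temperature=1.0, banned=(), rng=random):
    """Pick one token from a {token: logit} dict (illustrative only)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Drop any word the writer has banned.
    allowed = {tok: score for tok, score in logits.items() if tok not in banned}
    if not allowed:
        raise ValueError("every candidate token is banned")
    # Scale scores by temperature, then apply a numerically stable softmax.
    scaled = {tok: score / temperature for tok, score in allowed.items()}
    top = max(scaled.values())
    weights = [math.exp(score - top) for score in scaled.values()]
    return rng.choices(list(scaled), weights=weights, k=1)[0]


if __name__ == "__main__":
    scores = {"the": 2.0, "a": 1.0, "moist": 0.5}  # made-up logits
    print(sample_token(scores, temperature=0.7, banned={"moist"}))
```

Real tools layer more on top of this (top-k, repetition penalties, and so on), but temperature and word bans are the basic version.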

This really helps me avoid writer's block because the AI can refer to lore I wrote. Sometimes it brings things up that I forgot about from previous chapters. Or I can ask it for an alternative way to write something when I find that my writing is getting too repetitive.

It still needs a lot of guidance to make an interesting novel, but as an assistant it's great.

ssjmarx [he/him] · edit-2 2 years ago

I like it to prompt me. I spend a lot of time staring at blank documents, but if a bot can throw a couple sentences at me I can get the creativity going much easier.

But the final product always ends up being 95%+ my own writing anyway. I like what the AI gives me but its form is normally pretty bland. Still useful enough that I'm subbed to Novel AI for their biggest LLM.

maya [she/her, they/them] · 2 years ago

ChatGPT currently produces unworkable garbage if you try to get any kind of high-level writing out of it. Seems a bit unnecessary to worry about "model collapse" when the models aren't actually useful yet.

SuperZutsuki [they/them] · edit-2 2 years ago

https://www.youtube.com/watch?v=jmaUIyvy8E8 (warning: loud)

Basically like saving a JPEG, opening it, and saving it as a JPEG again x1000. "You can never match the original from a lossy source" applies to ideas too, apparently.

ChaosMaterialist [he/him] · 2 years ago

Funny you mention JPEG, because a New Yorker article (original, archive) makes the general case that LLMs are lossy for language in the same way JPEGs are lossy for images.

CarmineCatboy [he/him] · 2 years ago

the singularity but disappointing

frankfurt_schoolgirl [she/her] · 2 years ago

I wonder how close GPT-4 and company are to having used every bit of writing in the English language as training data. Like, assuming you downloaded the entire content of social media sites, used every e-book you could find, and pulled in all the wikis, forums, news sites, and blogs that a web crawler could reach, the only real volume of writing that's left is private communications. At some point, Google or MS or another company with lots of communications will use every message ever sent in their systems as training data. But once you do that, you've run out.

More training data probably brings diminishing returns, so if GPT-4 has already used like 10% of all available writing, then maybe even with the other 90% it won't be good enough to do what people want. Maybe in the future companies will hire vast numbers of writers just to make good content that can be used for the LLMs.

The AI feedback loop: Researchers warn of ‘model collapse’ as AI trains on AI-generated content