people training foundation models will usually drop items that score high on that
Not if they're freaks. And it isn't as though there's a shortage of that in the tech world.
People who make foundation models do NOT want to have it get out that they may have had that in their training sets and tend to aggressively scrub NSFW content of all kinds.
I doubt the bulk of them give it serious thought. The marketing and analytics guys, sure. But when that department is overrun with 4chan-pilled Musklovites, even that's a gamble. Maybe someone in Congress will blow a gasket and drop the hammer on this sooner or later, like Mark Foley did back in the '00s under Bush. But... idk, it feels like all the tech giants are untouchable now anyway.
they are training it in themselves with a smaller dataset, which sounds like what might have happened here
Also definitely possible, which would get us much closer to a "this is obviously criminal misconduct because of how you obtained the dataset" rather than "this is obviously morally repugnant while living in a legal gray area".
I mean, I've seen what OpenAI did for their most recent models: they spent about half of the development time cleaning the dataset, and yes, a large part of their reason for that is that they want to present themselves as "the safe AI company" so they can hopefully get their competitors regulated to death. Stable Diffusion openly discloses what they use for their dataset and what threshold they use for NSFW filtering, and for SD2.0 they ended up using a filter so aggressive (filtering everything with a score above 0.1 where most would use something like 0.98 -- for example, here is an absolutely scandalous photo that would be filtered under that threshold, taken from a dataset I assembled) that it actually made the model significantly worse at rendering humans in general and they had to train it for longer to fix it. Like... all evidence available points to them being, if anything, excessively paranoid.
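To make the threshold point concrete, here's a rough sketch of what score-based filtering looks like. The `Sample` class, the `punsafe` field name, and the scores are all made up for illustration; this is not anyone's actual pipeline, just the general idea of dropping everything a classifier scores above a cutoff.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    url: str
    punsafe: float  # hypothetical: classifier's estimated probability the image is NSFW


def keep_at_or_below(samples, threshold):
    """Drop every sample whose unsafe score is above the threshold."""
    return [s for s in samples if s.punsafe <= threshold]


# Invented scores, purely for illustration.
samples = [
    Sample("a.jpg", 0.02),
    Sample("b.jpg", 0.15),
    Sample("c.jpg", 0.60),
    Sample("d.jpg", 0.99),
]

strict = keep_at_or_below(samples, 0.1)    # the aggressive cutoff mentioned above
lenient = keep_at_or_below(samples, 0.98)  # the more typical cutoff mentioned above

print(len(strict), len(lenient))
```

With the strict cutoff, anything the classifier is even slightly unsure about gets thrown out, which is how you end up losing a lot of perfectly ordinary photos of people.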