The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

FatCat@lemmy.world · 4 months ago

The Irony of 'You Wouldn't Download a Car' Making a Comeback in AI Debates

VoterFrog@lemmy.world · edit-2 4 months ago

a much stronger one would be to simply note all of the works with a Creative Commons “No Derivatives” license in the training data, since it is hard to argue that the model checkpoint isn’t derived from the training data.

Not really. First of all, creative commons strictly loosens the copyright restrictions on a work. The strongest license is actually no explicit license i.e. “All Rights Reserved.” No derivatives is already included under full, default, copyright.

Second, derivative has a pretty strict legal definition. It’s not enough to say that the derived work was created using a protected work, or even that the derived work couldn’t exist without the protected work. Some examples: create a word cloud of your favorite book, analyze the tone of news article to help you trade stocks, produce an image containing the most prominent color in every frame of a movie, or create a search index of the words found on all websites on the internet. All of that is absolutely allowed under even the strictest of copyright protections.

Statistical analysis of copyrighted materials, as in training AI, easily clears that same bar.