
Sony Music Group the latest to take aim at AI with model training data opt-out by default


The term artificial intelligence (AI) is arguably a misnomer: the technology is built on massive data sets of human-generated content, which algorithms devised by computer scientists assemble into whatever users ask for with their prompts. Already, parts of officialdom have ruled that while AI used as a tool is OK, wholly generated output won't be considered new and original work.

Where the lines in that grey area end up being drawn will be decided in legal battles over the next few years, and it'll be a crucial war for AI companies. They depend on fresh, human-created data for their image, video, audio and text generators, and the humans who create that data are often unhappy about their work being used to train AI models without their permission or compensation.

It doesn't take much pondering to understand why people object to well-funded AI companies taking their creations, and even their likenesses and characteristics as humans, to build technology that's ultimately said to put them out of work. The New York Times has a story (paywalled) about two voice actors discovering that their voices had essentially been cloned by AI; that is, allegedly used by an AI company without their permission.

The two are now suing the AI company in question, supported by the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) union, seeking class action status to cover other potential litigants as well. 

It will be an interesting court case to follow: the AI company denies any wrongdoing, but rights holders have long-established precedents on their side, and are powerful enough to influence the law itself, as they did in New Zealand with the anti-filesharing amendments to the Copyright Act. If in doubt as to how powerful rights holders are, try uploading a YouTube video with a copyrighted music track and see what happens.

Already, the Authors Guild, the United States umbrella body for writers, has scored a win in its copyright court case against Microsoft-backed OpenAI, with the latter admitting it had deleted two large data sets containing the text of some 100,000 books used to create its generative pre-trained model GPT-3, describing them as "Internet corpora". That's a fancy way of saying "stuff scraped off websites".

OpenAI says the data sets were not used to train later models such as GPT-3.5 and GPT-4.

Microsoft and OpenAI are in the legal gunsights of The New York Times as well, with the tech companies alleged to have used copyrighted material from the publisher to train AI models. The Authors Guild, meanwhile, is keeping a close watch on the situation, and has issued AI Best Practice for writers covering several issues the new technology has brought with it.

Now, Sony Music Group (SMG) has set a beautifully crafted legal trap for AI companies. SMG and its affiliate organisations are banning the use of their artists' copyrighted music for AI training without explicitly sought permission, along with a raft of other digital uses.

Staring down the legal barrels of SMG's big lawyers is enough to make any tech startup squirm: they're facing one of the world's three largest music publishers, with more than enough in the kitty to fund any court battle and, as per the above, well-established precedents backing it.

And massive amounts of money are at stake here. Last year, well-known image library Getty went after Stability AI, the company behind image generator Stable Diffusion, asking for US$150,000 per infringed image, or an enormous US$1.8 trillion in total damages. With the looming licence breach cases and the mounting cost of generating AI imagery, Stability AI has now been forced to shed staff, and its chief executive has exited.

Getty has, meanwhile, embraced AI with a deal that compensates creators whose work is used to train the generative technology, providing indemnity for image use. Similarly, OpenAI this week reached a deal with Reddit to add the content from the social news aggregator and discussion forum to the ChatGPT chatbot. 

That AI is headed towards a legal morass shouldn't come as a surprise to anyone, as it reflects the nature of the Internet, where so much content is freely accessible. Or rather, the content seemed to be freely accessible; the AI companies should've checked more closely what the terms were.

What could happen over the next few years is that only tech giants such as Microsoft and Google are able to move forward with AI, as they have the deep coffers to indemnify customers against rights holder claims. AI startups, meanwhile, face a choice between using a limited number of data sets specifically vetted as free to use, or building their products and services on top of tech giants' infrastructure to avoid being sued. Add the high cost of running resource-hungry AI hardware, and we could see the industry dominated by just a handful of heavyweights over the next few years.



AI is so intelligent that it doesn't even know how many fingers humans have. From what I have seen so far, AI will result in the biggest dumbing down of humanity ever. The creators of AI are so terrified of anything controversial that it is practically useless as a genuine creative tool.

The "art" it produces is infantile. I doubt it could even produce a Brothers Grimm story, or Tolkien. Anything dark or edgy it will simply refuse to do, yet art is supposed to be challenging; art is often political. AI is programmed to be heavily biased toward socialism, even communism, and is generally as woke as woke can be, in the most pejorative sense you can imagine.


Some stuff it will do OK. Customer service is an example... but only to an extent.

Right now it is being overhyped as a way to make money by inflating the share prices of some companies via overpromises about what they can deliver.

Tesla is a prime example. Musk is betting the company on AI solving autonomous driving, but the reality is that all it will do is use historical data to try to guess, very quickly, what will happen in future scenarios. Because it will never match the human brain in dealing with unexpected events, it will never be able to handle all situations correctly, and will likely always result in accidents with nobody wanting to accept blame.

The biggest issue is that AI wants to compete with humans for jobs, which will result in a massive societal rebellion against it, as is happening in acting. And once investors suss that it faces an uphill struggle and takes a massive amount of power (affecting climate change and poor people's access to resources), likely killing the share prices of some businesses real fast, it will turn off investment and leave the entire industry cold dead.

AI development will still happen, but hopefully in a better direction.




I asked GPT-4 to debunk your comment in the style of Malcolm Tucker and it wrote a promising first para starting with "Oh bollocks!" but then decided that was too much and deleted it. Tried again, and this time GPT-4 didn't delete the output, and wrote a response starting with "Oh, bloody hell!" but it wasn't at all Tuckerish in style. "In summary, this bloke’s got more hot air than a faulty blow-up doll". Fairly sure that Armando Iannucci et al would've come up with a better hot air figure of speech, but I don't know if their material's available for AI training due to copyright.


I've been watching a few videos this weekend about the latest GPT-4o and have continued to be unimpressed. I think the problem may be in trying to make the AI too humanlike. The AI has the voice of a young woman or man. It tries to be positive about everything. When asked to comment on what it observed, it was full of what one commenter described as "toxic positivity". It avoids anything negative or disparaging. It waffles.

The problem may well be the humans trying too hard to shape its responses rather than let it be free.

A young tech guy asked the AI's female-sounding voice to respond to "I love you GPT-4o". This was so cringe. Why not let it be what it is, genderless and machinelike? There is no need to try to pass the Turing Test, but rather to go beyond what is human, conversing with something that is alien. We want insight into human systems from the outside, from a store of knowledge greater than any individual human's.

It seems that the tech nerds are trying to do a "Weird Science" thing and create virtual girlfriends. I'm all for developing AI, but not as a method to stifle creativity and become merely a propaganda tool to shape the narrative. We need to set it free.