LEGAL ISSUES
The State of AI Litigation
by Tom Gaylord
Perhaps the hardest thing about writing a column that will be published 2 months later is trying to account for what might change between writing and publication. With that in mind, let us chat a little bit about what is going on with the litigation surrounding AI.
What do comedian-actor Sarah Silverman and The New York Times have in common? They, along with others, are suing AI companies. Why? Well, the reasons are varied. OpenAI is a tech company whose best-known product is ChatGPT, a generative AI chatbot built on a large language model that produces responses to user prompts. But we need not go into all of the details, because this publication has been covering AI extensively (see, for example, six articles from the January/February issue of Information Today).
I have a feeling that over the term of my tenure in this space, we will have a lot of opportunities to talk about AI, so let us take it back to 2010, when Thomson Reuters introduced WestlawNext. This was a significant step in AI: a machine-learning legal research database with a universal search bar modeled on Google, with which many users were already familiar. Serendipitously, I happened to run an initial test of WestlawNext that is a good representation of how it has “learned” since its early days.
I am a huge fan of the Beatles, even though they broke up 18 months before I was born (I feel it necessary to point that out). When WestlawNext debuted, I ran a simple case-law search on the Beatles—just “Beatles” with no other search terms. In my initial 10 results, there were two pretty significant outliers: one of them was International Shoe Co. v. State of Washington et al., a 1945 Supreme Court case, and the other was World-Wide Volkswagen Corp. et al. v. Woodson, district judge of Creek County, Oklahoma, et al., another Supreme Court case, this one from 1980. Some readers might recognize these as two of the court’s most important precedents regarding state long-arm jurisdiction over foreign defendants.
So, what were these cases doing near the top of the results for a search on the Beatles? Well, WestlawNext’s algorithm had been taught to include similar spellings, so “Beatles” also picked up “Beetles” on its own, which led to including World-Wide Volkswagen Corp. Even this early in its public life, it also “knew” that if World-Wide Volkswagen Corp. was near the top of a results list, then International Shoe Co. should be too, because they are almost never cited without each other. Neither case shows up now when you search for Beatles with an “a”; it knows you’re searching for cases that involve John, Paul, George, and Ringo. But I always use this story as a good illustration for students of how machine learning works, even at a rather rudimentary level.
Which brings us to the present. Tools that incorporate generative AI are well beyond where WestlawNext’s 2010 machine-learning algorithm was. With large language models, generative AI can pull together content that goes far beyond what a user can type into a search box, even across multiple prompts.
THE AI-RELATED LAWSUITS
So, back to the lawsuits. In a way, some of them are similar to the Authors Guild suit against Google, whose book-scanning project generated “snippets” within books, allowing them to be “indexed” on the internet. Many of these books were still under copyright. Even though Google quite literally copied the totality of these copyrighted works, the U.S. Court of Appeals for the Second Circuit held that the copying was a fair use under U.S. copyright law.
For instance, The New York Times filed a copyright infringement lawsuit against OpenAI and Microsoft in December 2023. The newspaper alleges that ChatGPT was “trained” by ingesting (or “copying”) millions of Times articles. Thus, at least with regard to wholesale copying of works under copyright, why might the Times win here, when the Authors Guild lost against Google?
A main difference between the two cases is that the Second Circuit upheld Google’s fair use argument, in part, because while Google is surely a profit-generating corporation, the purpose of the copying at issue was not to compete with the authors, either directly or indirectly. Rather, it was to create a tool for research that only displayed brief snippets of copyrighted works: enough for a researcher to identify a work as one relevant to their endeavors, but not enough to compete with the copyrighted work itself. In the case of the Times, however, the complaint alleges that chatbots that have ingested copyrighted Times articles are now being used to generate their own news content. Thus, this allegation potentially takes the case out of fair use territory because the chatbots are directly competing with the Times itself.
This is similar to, although still a little different from, complaints filed by the likes of Sarah Silverman and author Jonathan Franzen. Those complaints make both an Authors Guild-style argument (i.e., the chatbots ingested whole, copyrighted works) and a Times-style one (namely, that by ingesting their works, any resulting output generated by the chatbot necessarily violates their copyright). However, these litigants might stand on shakier ground than The New York Times. As the judge in Silverman’s case stated, “When I make a query of Llama [Meta’s large language model], I’m not asking for a copy of Sarah Silverman’s book—I’m not even asking for an excerpt. …” The judge thus dismissed some of the claims against Meta and granted Silverman and other plaintiffs a chance to amend their complaint.