DATABASE REVIEW
Thoughts on ChatGPT
by Mick O'Leary
SYNOPSIS
ChatGPT, an advanced AI product from OpenAI, breaks new ground with its remarkable analytical abilities. By generating novel, information-rich content, it diminishes the role of the original content sources—books, articles, webpages, etc.—thus redefining today’s models for information retrieval and usage.
The history of online searching has been marked by increasing power on the part of the systems and decreasing control on the part of the searchers.
COMMAND-DRIVEN
The first online search systems that emerged in the late 1960s and early 1970s were an astonishing, printing-press-level advance over print indexes and catalogs. The advantages of doing an online literature search in Medline, compared to a manual search in Index Medicus, were revolutionary. Then, in the 1980s, the appearance of full-text databases completed the miraculous loop of instant information retrieval.
These systems are command-driven: Searchers employ a complex query system that uses Boolean and proximity operators, descriptors, truncation, and other arcana. Searchers must also understand the makeup of the databases themselves: content, record metadata, controlled vocabularies, etc. Command systems are completely transparent: The searcher knows how the system executes the commands and why the search results appear.
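To make the contrast concrete, a command-style search might look something like this (a hypothetical, generic example rather than any single vendor’s exact syntax):

    (myocardial(W)infarction OR heart(W)attack) AND aspirin/TI AND PY=1980:1985

Every operator, field code, and date limit in such a query is supplied, and understood, by the searcher.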
NATURAL LANGUAGE
The appearance of natural language search systems in the 1990s was another revolution. These systems use AI to match appropriate database records to a few words submitted by the searcher. Their remarkable simplicity and effectiveness—along with the emergence of the web—put the wonders of online searching into the hands of almost anyone.
The actual process of a natural language search, however, is a black box. It’s impossible for the searcher to know exactly what steps the system is taking. The searcher has to trust the system and is happy if the results are satisfactory.
MACHINE LEARNING
Worldwide interest in AI search systems exploded in Fall 2022 with the release of OpenAI’s ChatGPT. Machine learning systems had been developing for years, but the extraordinary capabilities of ChatGPT were unprecedented. It promised to be the long-sought Holy Grail of information-seeking: an Answer Machine that returns not just a few relevant documents, but a customized response that is specific, relevant, coherent, and accurate.
ChatGPT and its peer products complete the transition of search control from the searcher to the system. Even their makers don’t claim to understand how they work. We don’t fully trust them, but they’re here to stay.
COMPOSITIONAL WIZARDRY
To one who has studied the progress of online searching for decades, ChatGPT is mind-boggling. Its ability to gather, organize, and present complex information is wizardry. The “G” in GPT (generative pre-trained transformer) stands for “generative,” because it generates novel content, which is, arguably, an entirely new medium.
I conducted hundreds of searches on ChatGPT Plus, using GPT-3.5 and GPT-4, to examine their recall, accuracy, and presentation. Most of my searches were reference-type queries, rather than creative prompts (“prompt” is AI lingo for a search query) or shopping/travel topics. My most intriguing test was to investigate “election deniers.” In response to this simple prompt, GPT-4 answered admirably. It concisely described the issue, presented its key aspects, and ended with a short summary. The response was not only accurate, but was also exemplary expository writing, with perfect grammar and spelling, clear sentence structure, and good organization. The writing style was formal and impersonal—very similar to technical journal article writing.
But the plain “election deniers” prompt was just the warm-up. I continued using it, but also asked that the reply be written in different rhetorical styles, including sarcasm, irony, cynicism, satire, and humor. I figured that these prompts would be far more difficult—gathering up a few facts is one thing, but writing that applies each of these subtly differing styles would require analysis on a whole new level.
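A prompt in this vein might read something like, “Using the same facts, describe election deniers again, but write the answer as satire.”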
This is where GPT-4 went from amazing to eerie. It nailed each one, with clever, literate essays, each of which deftly applied the requested style. GPT-4’s deeply informed comprehension of language was stunning. As I often asked myself throughout my tests, “How does it do this?”
EMERGENT ABILITIES AND AGI
In the AI research community, there is an energetic debate about whether these systems are simply superb pattern detectors—autocomplete on steroids—or whether they can, on their own, develop unintended cognitive capabilities, so-called emergent abilities. There is presently no answer, because the difference between superb pattern matching and some sort of added-value comprehension is murky and ill-defined.
The notion of emergent abilities is related to another Holy Grail: artificial general intelligence, or AGI. OpenAI’s mission statement defines AGI as “AI systems that are generally smarter than humans.” Niche AI applications, such as robot chess players, are already demonstrably “smarter” than humans, in the narrow sense that they can beat us in a game. However, AGI doubters argue that AI systems can never attain the level of the complete human cognitive capacity.
ALIGNMENT RISK
In mid-2023, large numbers of prominent AI experts called for the regulation or curtailment of AI development, warning that existentially catastrophic outcomes might otherwise occur. The risk that AI systems won’t do what we want or intend them to do is known as “alignment risk.” Alignment risk has a long cultural history, stretching at least from 1818’s Frankenstein through 2001: A Space Odyssey to the Terminator movies.
Alignment risk is largely a result of anthropomorphism. Humans, including the developers of AI systems, expect, often unwittingly, that these alien entities will act just like we do. (I keep doing this with my cats, but it never works.) Humans generally conduct themselves with shared expectations of how things should and do work, but AI neither knows nor cares. The classic thought experiment of “inhuman” AI is the “Paper Clip Maximizer”: a seemingly innocuous AI system, instructed only to make paper clips, that pursues the goal so single-mindedly that it converts everything around it, including us, into paper clips, taking unforeseen and unwanted directions entirely on its own.
A SWIFT ASCENT
ChatGPT was introduced on Nov. 30, 2022. It used GPT-3.5, worked from a training corpus that ended in September 2021, and did not search the live web. Nevertheless, it went viral as the fastest-growing app in history, with 100 million users in its first 2 months (a record since surpassed by Threads, Meta’s competitor to X, formerly Twitter). ChatGPT’s unprecedented ascent was quickly followed by a rapid series of watershed announcements:
- Jan. 31, 2023—ChatGPT signs up 100 million users in 2 months.
- Feb. 1, 2023—OpenAI launches ChatGPT Plus, with improved access, at $20 per month.
- Feb. 24, 2023—Meta releases LLaMA, a large language model whose weights are made openly available to researchers.
- March 14, 2023—OpenAI launches GPT-4, which is much more powerful than its predecessor, GPT-3.5.
- March 23, 2023—OpenAI announces plugins for ChatGPT Plus that enable real-time web search.
- April 25, 2023—Hugging Face releases HuggingChat, an open source, ChatGPT-like chatbot.
- May 4, 2023—Microsoft’s chatbot, Bing, is available to all Microsoft account holders.
- May 10, 2023—Google’s chatbot, Bard, is available in general release.
- May 25, 2023—Google opens access to Search Generative Experience (SGE), an AI supplement to Google search.
Those 6 months were one of the greatest periods of technological innovation in human history. AI power that didn’t exist even a year ago is now in the hands of everyone who has access to the web.
THIRD-GEN SEARCHING IS HERE
All of this completes the transition to third-generation, generative online searching: powerful, effective, and accessible machine learning search. It further weakens the searcher’s control and understanding of the search process. In previous generations, the searcher would retrieve “content sources”—books, articles, web documents, etc. The task of extracting information and meaning from these content sources was then in the hands of the end user. ChatGPT bludgeons the notion of an identified and verifiable content source. Books and their ilk are now pass-through stages toward the Answer Machine. The black box is bigger and darker.
Anyone in an information-driven profession must follow these developments closely and become adept with those that will directly affect their work, because ChatGPT-grade AI search will continue its torrid spread. There are daunting concerns: malevolent apps, access and equity, civil rights erosion from surveillance and monitoring by governments and businesses, and terrifying military outcomes. However, all of these concerns will be solved—or ignored. AI promises god-like wealth and power to those who own, control, and use it. Stay tuned—or just ask ChatGPT.