The rise of AI—its recent rapid development owes a great deal to Big Data—is fueling our growing trust in data and data-based machine learning algorithms. Traditionally, a certain hypothesis—a research question—is posed first. Data play a role either in validating or invalidating the hypothesis. Validated hypotheses form knowledge. Nowadays, it is becoming much more fashionable to simply ask what data show—as if data themselves could replace our thinking. This trend is accelerating.
“Big Data” literally means data that are massively large in volume. It also refers to data that come in exceedingly quickly, in many different formats, and from a variety of sources—such as sensors, online searches, and social media. These three characteristics—volume, velocity, and variety—are referred to as the “three V’s of Big Data.” Not surprisingly, such data are often generated and collected by machines. But this machine-based data collection is not the only component of Big Data. The Big Data supporting technologies allow storing, organizing, searching, retrieving, and analyzing the unprecedented amount of digital data, thereby making it possible to draw correlations and generate insights that were previously inaccessible. Dale Neef, the author of Digital Exhaust: What Everyone Should Know About Big Data, Digitization, and Digitally Driven Innovation (Pearson Education, 2014), summarizes the Big Data phenomenon as “huge amounts of digital data being produced and captured from a variety of sources and new tools to analyze large data sets to extract patterns and correlations that we otherwise could not.”
Recent developments in AI, particularly in machine learning, are among the best-known technologies that support Big Data. In turn, Big Data enables AI research to reach a new level. For example, a Google Brain project built an AI program that learned, by itself, to discover high-level concepts such as “cats” and “people’s faces.” It required more than 10 million images from YouTube videos and a highly distributed neural network with 1-billion-plus parameters trained on 16,000 CPU cores (“How Many Computers to Identify a Cat? 16,000,” John Markoff, The New York Times, June 25, 2012; nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html). We see Big Data and AI regularly featured in the mass media. The advances of AI, such as AlphaGo’s victory in a Go match with 18-time world champion Lee Sedol and the impressive capability of an autonomous car, are leading many to predict that machines will soon take over more perception- and cognition-related tasks, ones currently performed by humans.
RELYING ON DATA
With such predictions widely circulated and accepted, it is not surprising to see increasing trust in and reliance on data within our society. David Brooks observed, in his Feb. 4, 2013, New York Times op-ed column, that this tendency is reaching the point where it is becoming the rising philosophy of the day. He calls it “data-ism.” According to Brooks, data-ism is a belief that “everything that can be measured should be measured; that data is a transparent and reliable lens that allows us to filter out emotionalism and ideology; that data will help us do remarkable things—like foretell the future” (“The Philosophy of Data”; nytimes.com/2013/02/05/opinion/brooks-the-philosophy-of-data.html).
Note that data-ism goes well beyond utilizing data to either validate or invalidate a hypothesis. To dataists, data and algorithms are a superior means to process data and find meaning in them. Information and knowledge are directly extracted from data by algorithms, which are capable of handling today’s immense data flow. Human mediation in this process is no longer necessary. Furthermore, machine-generated information and knowledge are considered to be more transparent and reliable than human thoughts.
Israeli historian Yuval Noah Harari argues that a close examination of this dataist dogma is today’s most urgent political and economic project. He thinks that the notion of organisms being simply data-processing systems (namely, algorithms) is becoming pervasive in our society. In his book, Homo Deus: A Brief History of Tomorrow (Harper, 2017), Harari notes that expanding and facilitating the great data flow has become a new mandate in our times. It is now taking priority over the right of humans to own data and restrict its movement. He writes: “Dataists believe that all good things … depend on the freedom of information” (p. 389). In this dataist worldview, human experiences are only valuable to the degree that they produce data that can contribute to data flow. Epistemologically, socially, and politically, humans are no longer the source of meaning, knowledge, and authority.
GOING WITH THE DATA FLOW
Are we already living under this dataist worldview? It is hard to deny that our society is moving in the direction of constantly increasing, expanding, and facilitating data flow. More and more of our devices are being connected to the internet and designed to constantly generate and send data back and forth. We participate in a variety of social media platforms on a daily basis, generating and sharing a great deal of data and information. This is all uploaded into the cloud, where it is stored, analyzed, and monetized by those platform providers. Some of the applications that we regularly use, such as a GPS map app on our smartphone, require users to be online and share their location data in order to function at all.
It has become a norm not just for internet search companies such as Google or Bing, but for any type of online business, to claim the right to track, collect, analyze, monetize, and sell all kinds of user behavior data on their websites. Doing so lets them profit from such customer data, regardless of what product they sell. Data generation, collection, analysis, and monetization are now taking place almost everywhere we go online. Opting out or retreating from these activities is nearly impossible if we want to be fully participating members of society.
Any effort to limit and restrict such data generation, collection, analysis, and monetization is met with disdain and dismissal. Dataists draw on two powerful rhetorical stances to rebuff these efforts: techno-chauvinism and technological “inevitabilism.” Techno-chauvinism maintains that technology supplies the best solutions for all problems, whether they be social or economic in nature, and always improves people’s lives. Like techno-chauvinism, technological inevitabilism blindly accepts the authority of technology. It equates any questioning or limiting of technological developments with hindering the prosperity of society itself. Any proposal to adjust or modify technological innovation to accommodate social values is treated as a backward objection to change. Technological inevitabilists firmly believe that technology cannot and should not be stopped.
Faced with objections to today’s uninhibited practices of data collection, analysis, and monetization, techno-chauvinists and technological inevitabilists like to point out that increasing data collection, analysis, and monetization have made it possible for innovative new businesses, such as Uber and Airbnb, to start and thrive and for the economy to grow as a result. They also argue that since today’s economic growth is being fueled by data, it is even more important to continue to increase such data and ensure its free flow. They claim that the concern that Big Data is likely to lead to microscopic control over people and the loss of human agency and authority has little merit. How real is the peril of data-ism?
MANNA FROM HEAVEN OR HELL
A sci-fi story by Marshall Brain, Manna: Two Visions of Humanity’s Future (BYG Publishing, Inc., 2012), describes a relatively realistic scenario of how such control may come to be. Manna is a piece of computer software invented to manage a fast-food restaurant. It is initially developed as store management software that, via a headset, provides restaurant employees with instructions for what to do during their work hours, such as taking out garbage and cleaning up the restrooms. In the story, Manna soon evolves and becomes responsible for scheduling employee hours. Then, in its next versions, it gains the ability to call replacements when employees don’t show up and reinforcements when needed. It also is given the authority to hire and fire employees based upon whether they show up on time, arrive late, and so on. The software also measures many aspects of employees’ job performance and stores the data. As it gets widely adopted beyond the fast-food industry to manage minimum-wage workers, the detailed data about each person’s performance as an employee become available to employers. People with poor records get blacklisted, and soon they find themselves unable to find a job at all. This, in turn, makes it impossible for people with jobs to refuse to work extra hours when Manna asks: Refusal would put them at risk of losing the job.
It would be easy to say this is evil. But how many of us can say that this will never happen? The goal of the Internet of Things (IoT) is to measure nearly everything in our environment and capture the data, so that computer algorithms can keep the environment in an optimal state, whether it be the temperature of a building or the timing of traffic lights. The IoT is likely to include many sensors that will also monitor us, so our behavior would be measured and captured as data along the way. Building store management software, deploying it across many businesses, and making multiple instances of the software interoperable are all doable already. The real issue is whether it will be socially and politically acceptable for employers to authorize algorithms to monitor workers, instruct them about what to do, evaluate their performance, and make hiring and firing decisions based upon the captured attendance and performance data.
The obvious problem with the world Manna describes is that it does not accommodate people or their social values. Instead, people have to make accommodations for a world in which algorithms demand and enforce a particular behavioral pattern from people based upon data, and social values are dropped to comply with this new demand of technology. To dataists, this would be the ideal state of the world in which IoT is fully implemented. In this type of world where data and algorithms rule over people, there is no second chance, no right to be forgotten, and no sacred values.
Manna may appear to be the story of an unlikely future event. But the worldview behind it closely matches today’s data-ism. Human behavior is no exception in the data-ism mandate to measure as many things as possible and capture and store them as more data to circulate, analyze, and monetize. Once that becomes acceptable, bringing algorithms to data in order for them to automatically make decisions about people is only a short step away. It will be faster, more efficient, and even more objective, dataists would claim. In their vision, it is one more friction removed in the free flow of information. Will our society be able to resist?
ALGORITHMS ARE NOT THE CULPRIT
It is important to understand that data and algorithms are not the culprit here. Data and algorithms do not desire to tell people what to do or covet people’s jobs. It is people who decide that ever more data must be collected to provide a real-time picture of everything in the world to the smallest detail. It is people who decide to replace workers with data and algorithms. And it is people who decide to deploy, implement, and enforce algorithms as a tool to monitor and shape people’s behavior for more profit.
What enables data-ism is not a particular technology, but the socio-economic interests of a specific group of people who will benefit from implementing data-ism society-wide at the cost of others outside that group. For the purpose of data generation and monetization, today’s businesses such as Facebook or Google exploit people’s most natural desires, such as forming meaningful relationships with others through sharing thoughts, opinions, and recommendations as emails, photos, and videos. In Manna, people who do not perform as the software asks are penalized. But in today’s world, we see data-ism grow and spread through enticement. This is why, in today’s society, there is such a lack of social and political will to question and object to data-ism’s reversed priority of data and algorithms over people.
The proper role for data and algorithms is to augment and empower human autonomy and agency, not to undermine and eliminate them. Only with strong social and political will shall we be able to establish and enforce such a role for data and algorithms. We do not have to stop collecting data or developing more sophisticated and advanced algorithms. But we must find a way to set proper and effective boundaries for them in order to ensure that data and algorithms always serve as tools, not rule as an authority over human behavior. This is easier said than done, but the stakes are too high not to try.
Bohyun Kim is the Associate University Librarian for Library Information Technology at the University of Michigan Library.