Opinions differ about how big is big when it comes to data. The phrase “Big Data” is rarely quantified. A generally accepted rule of thumb is that data is Big Data if analyzing it by hand is beyond human capability and requires a computer. The term is relative: What is Big Data in some contexts might not be Big Data in others. How does Big Data relate to the information professional? Do we agree that more data leads to better insights and that combining information from multiple sources leads to better decision making? If so, then do we play a leading role in a Big Data environment?
Big Data promises to revolutionize our world, although concerns about privacy, anonymity, and interpretation trouble information professionals. In one sense, we deal with Big Data all the time. Traditional bibliographic databases are big, but they don’t usually figure into the discussion. Anyone who has searched the web using Google, Bing, or another web search engine has received millions of results. Surely, that’s a form of Big Data. However, scholars are more likely to consider large datasets generated from government sources, scientific research projects, or archival materials as Big Data.
Information professionals see millions of results as an opportunity to employ narrowing techniques in the search strategy. We use fields, limits, and thesaurus terms to tout our search expertise. Yet sometimes our finding tools let us down. Vocabulary may not mean what we think it does. A glitch in field structure may introduce errors. Overly architected databases can be so intimidating that the abundance of fields and limiting options actually obstructs good searching. And the unstructured data that increasingly dominates the Big Data world does not lend itself to our notions of narrowing results.
Big Data is likely to get bigger. The Independent reported that scientists in Switzerland have developed a method to store an immense amount of information, some 300,000 terabytes, in a single DNA molecule (independent.co.uk/news/science/single-dna-molecule-could-store-information-for-a-million-years-following-scientific-breakthrough-10459560.html). The cost, however, is prohibitive. Molecule-stored data will require entirely new research tools and techniques. Its promise for libraries, archives, and museums is enormous, assuming the cost comes down.
Inserting information professional ethics, values, and procedures into Big Data necessitates getting outside our comfort zone. Machine analysis should be supplemented by human oversight. It’s humans who should ensure that causation and correlation are not confused. It’s humans who can recognize anomalies and their importance. It’s humans who can judge relevance. It’s humans who should oversee data sources to ensure that bias is not introduced by ignoring a relevant dataset.
Information professionals should move from emphasizing the somewhat passive location of relevant information to the more active creation of new information. Big Data opens up new spheres of influence for information professionals. It can create new roles and responsibilities if we are primed to take advantage of them. Big Data offers both promise and perils. Information professionals are poised to exploit the promise and eliminate the perils. All we need is courage.