FEATURE
Metadata Literacy: What Every Librarian Needs to Know About the Changing Landscape
by Tom Adamich
“Metadata liberates us, liberates knowledge.”
—David Weinberger, American technologist and coauthor of The Cluetrain Manifesto
As libraries continue to explore 21st-century approaches to information delivery and library materials access, metadata literacy—in the context of data quality, management, and utilization—ranks high on the list of important continuing-education topics for librarians. Frequently raised questions include “What is data integration?” and “How can the ETL (extract, transform, load) process benefit my library and its users?” To help answer some of these timely metadata literacy questions, I have compiled a list of questions and answers covering several key metadata concepts that are either part of today’s library metadata protocols or emerging as important.

What Are Web Data Stewardship and Data Integration?
While the traditional library catalog of the past did not provide much information on the location of materials in other libraries or in formats other than print, the latest versions of integrated library systems and discovery layers do provide much of this functionality. OCLC’s WorldCat and WorldShare as well as Ex Libris’ Alma and Primo are good examples. However, the challenge is that many library resources are still described using legacy conventions (such as MARC standard metadata) and are still housed in legacy systems using outdated file formats. Library materials metadata is embedded in natural language or primitive graphical presentations, according to Ismail Khalil Ibrahim and Wieland Schwinger in their article, “Data Integration in Digital Libraries: Approaches and Challenges.”
To solve the issue of disparate metadata in disparate systems, companies such as Talend and Smartlogic—in addition to the integrated library platforms mentioned previously—have developed data integration solutions that use data modeling structures and semantic architectural protocols such as ETL to identify appropriate resources, including their location and access information. For example, if a university researcher requires a PDF of a research article and the system knows that the library has access to a partner institution’s scholarly repository, the data integration solution will parse the various metadata components of the query (in this case, article title, journal title, date, location, access rights, etc.) and send that information to the researcher for review.
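The lookup-and-merge step described above can be sketched in a few lines. This is a minimal illustration, not any vendor's actual implementation; the repository names, field names, and records below are all invented for the example.

```python
# A minimal sketch of the lookup step in a data integration layer.
# Repository names, fields, and records are hypothetical illustrations.

local_catalog = [
    {"article_title": "Metadata Quality in Repositories",
     "journal_title": "Journal of Library Metadata",
     "date": "2018", "location": "local", "access_rights": "open"},
]

partner_repository = [
    {"article_title": "Linked Data for Libraries",
     "journal_title": "Code4Lib Journal",
     "date": "2019", "location": "partner", "access_rights": "campus"},
]

def find_article(title_fragment, sources):
    """Search every source and return matching records, keeping the
    location and access metadata the researcher needs for review."""
    hits = []
    for source in sources:
        for record in source:
            if title_fragment.lower() in record["article_title"].lower():
                hits.append(record)
    return hits

results = find_article("linked data", [local_catalog, partner_repository])
for r in results:
    print(r["article_title"], "->", r["location"], "/", r["access_rights"])
```

A production system would of course query live APIs and normalize many more metadata components, but the principle—one query fanned out across disparate sources, with location and rights information carried along—is the same.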
While the results provided by data integration systems have some limitations, such systems greatly enhance the opportunity to obtain quick access to materials previously available only through traditional interlibrary loan (or similar channels).
How Can the ETL Protocol Benefit My Library and Its Users?
As a segue from the data integration concepts discussed earlier, one of the best ways to understand how to use metadata to promote more effective and modular system integration is to look at the role the ETL protocol plays in that process.
Extract is the process of reading data from a database. Transform is the process of converting extracted data from its previous form into the form that it needs to be in so it can be placed in another database (think MARC-to-Dublin Core Metadata Crosswalk as an example). Finally, load is the process of writing data into the target database.
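The three steps can be made concrete with a toy crosswalk. This sketch assumes the MARC data has already been parsed into a simple tag-to-value mapping; the field selection and target structure are simplified illustrations, not a complete MARC-to-Dublin Core crosswalk.

```python
# A toy ETL pass over one record. The MARC tags and crosswalk here are
# deliberately simplified; a real crosswalk covers many more fields.

# Extract: read a (pre-parsed) MARC-like record from the source store.
marc_record = {"245": "The Cluetrain Manifesto",
               "100": "Weinberger, David",
               "260c": "2000"}

# Transform: apply a MARC-to-Dublin Core crosswalk.
CROSSWALK = {"245": "dc:title", "100": "dc:creator", "260c": "dc:date"}

def transform(record):
    """Convert extracted data into the form the target database expects."""
    return {CROSSWALK[tag]: value
            for tag, value in record.items() if tag in CROSSWALK}

# Load: write the converted record into the target store.
target_db = []
target_db.append(transform(marc_record))
print(target_db[0])
```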
While the ETL concept may seem relatively basic, making the process work in a web environment requires some understanding of data warehousing and semantic architecture. Since ETL involves moving data out of one location and making it available for use in others, transforming the data into the desired format is key to successful utilization. For web access purposes, transforming the metadata into linked open data (LOD)—for example, converting catalog data into the LOD-optimized BIBFRAME format—makes library materials information more accessible and promotes better understanding of a resource’s contents and its uses.
Why Is SQL Scripting Important in Managing Library Metadata?
Following the line of thinking associated with the ETL concept and warehousing metadata in a data warehouse for a variety of uses, the importance of SQL scripting—and its various forms—is brought to the forefront of must-know metadata literacy concepts. The ability to locate, manipulate, and move metadata to perform various functions is key to both operation and analysis for most current library repository and discovery systems.
For example, the Evergreen open source ILS relies on PostgreSQL for major functions, including managing bibliographic and item metadata, while Koha does the same with MySQL/MariaDB. Additionally, knowing how to develop and execute SQL queries at various OS access points (i.e., the Linux command line or a Bash or other shell script) gives librarians a better understanding of the questions they are trying to answer with the data as well as a sense of where the source data is located and how it flows.
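A small self-contained example shows the shape of such a query. The table and records below are an invented, radically simplified stand-in—real Evergreen and Koha schemas are far more involved—and SQLite is used here only so the sketch runs anywhere.

```python
import sqlite3

# A simplified stand-in for an ILS bibliographic table; the column names
# and records are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE biblio (
    biblionumber INTEGER PRIMARY KEY,
    title TEXT, author TEXT, copyrightdate INTEGER)""")
conn.executemany("INSERT INTO biblio VALUES (?, ?, ?, ?)", [
    (1, "Everything Is Miscellaneous", "Weinberger, David", 2007),
    (2, "The Cluetrain Manifesto", "Levine, Rick", 2000),
])

# The kind of question a librarian might answer from the command line:
# which titles were published since 2005, newest first?
rows = conn.execute(
    "SELECT title, copyrightdate FROM biblio "
    "WHERE copyrightdate >= ? ORDER BY copyrightdate DESC",
    (2005,)).fetchall()
for title, year in rows:
    print(year, title)
```

The same SELECT-WHERE-ORDER BY pattern carries over directly to the psql or mysql command-line clients against a live ILS database.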
What Are Some 21st-Century Metadata Standards I Should Learn More About?
While most librarians are familiar with MARC and Dublin Core metadata standards for describing library materials—at least in traditional library terms—many may not be familiar with more specialized metadata standards that are becoming more widely used to satisfy the current needs of the populations using them.
For example, the biology and environmental sciences communities have discovered the benefits of using Darwin Core for describing research-related materials and outputs. Darwin Core helps to facilitate the sharing of information about biological diversity by providing identifiers, labels, and definitions. The Darwin Core schema is primarily based on taxa—groups of organisms as commonly recognized by biologists—with associated metadata for a taxon’s occurrence in nature as documented by observations, including the location and conditions under which a particular specimen was collected.
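A single occurrence record makes the schema tangible. The term names below (occurrenceID, scientificName, eventDate, and so on) are genuine Simple Darwin Core terms; the specimen data itself is invented, and the identifier is a placeholder.

```python
import csv
import io

# One occurrence record using genuine Simple Darwin Core term names;
# the specimen data and identifier are invented for illustration.
record = {
    "occurrenceID": "urn:example:occ:0001",  # hypothetical identifier
    "scientificName": "Quercus alba",
    "eventDate": "2019-05-14",
    "decimalLatitude": "40.4406",
    "decimalLongitude": "-79.9959",
    "basisOfRecord": "PreservedSpecimen",
}

# Simple Darwin Core is commonly exchanged as delimited text, with the
# term names as the header row.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
print(buf.getvalue())
```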
Libraries charged with maintaining traditional and digital collections of art, architecture, and cultural works may wish to encourage library professionals to expand their understanding of the Categories for the Description of Works of Art (CDWA) standard. Developed by the J. Paul Getty Trust and the College Art Association, CDWA offers best practices for cataloging that include data mapping from existing metadata standards as well as connections to accepted thesauri in the art area, such as the Art and Architecture Thesaurus (AAT) and the Cultural Objects Name Authority (CONA).
For digital images, another art-related metadata standard is the International Image Interoperability Framework (IIIF). Touted as a semantic-based metadata standard for digitized physical objects, IIIF uses LOD concepts to support a rich online viewing environment. Using elements of the Shared Canvas data model, this JSON-based design is purported to be easy to implement even for adopters who don’t understand the Resource Description Framework (RDF), although it is heavily based on it.
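The JSON-based design is easy to see in practice. The fragment below is a skeletal document in the shape of an IIIF Presentation API 2.x manifest; every URL is a placeholder, and a real manifest would go on to add sequences, canvases, and image resources.

```python
import json

# Skeleton in the shape of an IIIF Presentation API 2.x manifest.
# All URLs are placeholders; a real manifest adds sequences, canvases,
# and image annotations.
manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/book1/manifest",  # placeholder
    "@type": "sc:Manifest",
    "label": "Sample digitized object",
    "metadata": [{"label": "Date", "value": "ca. 1900"}],
}

# Because the design is JSON-based, it serializes with any JSON
# library and needs no RDF tooling, even though the @context key
# grounds the document in linked data.
print(json.dumps(manifest, indent=2))
```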
In the area of cultural heritage projects, the CIDOC-CRM model enables various components associated with cultural heritage presentations and resources to be linked together and presented in concert. Similar to the IIIF, the CIDOC-CRM has made efforts to incorporate semantic-based LOD into its associated metadata recommendations. While not a true metadata standard, the CIDOC-CRM encourages use of related cultural object standards in its utilization, including the aforementioned CDWA.
The EBUCore metadata standard not only describes media assets, such as audio and video recordings, but also provides a platform for automating media production and delivery processes via various add-ons, including EBUCore RDF and the Class Conceptual Data Model (CCDM). XML classification schema connections include the ability to transform EBUCore protocols into RDF SKOS. EBUCore also maps to the Framework for Interoperable Media Services (FIMS), which can connect media asset metadata to semantic-based media production and delivery systems.
What Are Some XML Metadata Formats I Should Consider Learning More About?
While most librarians and library professionals have heard of XML, many have not been exposed to the Standards Tag Suite (STS) maintained by NISO. NISO STS is published in multiple schema languages—DTD (Document Type Definition), XSD (XML Schema Definition), and RNG (RELAX NG)—offering users more flexibility when working with applications or systems that may use particular XML elements and attributes in their design.
For instance, groups using a DTD framework can apply certain XML elements/attributes universally across several web markup platforms, including XML, SGML (Standard Generalized Markup Language), and HTML. This template-development process encourages interoperability for web markup metadata, similar to the ETL and metadata standard crosswalk discussions presented earlier in this article. The end result is more effective metadata management across web platforms and web-based systems.
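A tiny internal DTD shows how such a template pins down a record's structure. The element names below are invented for illustration, and note that Python's standard-library parser is non-validating: it reads the DTD but does not enforce it (a validating parser such as lxml would check the content model).

```python
import xml.etree.ElementTree as ET

# A minimal record template declared as an internal DTD. Element names
# are invented for illustration; the same content model could be reused
# across XML- and SGML-era tool chains.
doc = """<?xml version="1.0"?>
<!DOCTYPE record [
  <!ELEMENT record (title, creator)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT creator (#PCDATA)>
]>
<record>
  <title>Small Pieces Loosely Joined</title>
  <creator>Weinberger, David</creator>
</record>"""

# The stdlib parser accepts the DOCTYPE but skips validation; structure
# errors would surface only under a validating parser.
root = ET.fromstring(doc)
print(root.find("title").text)
```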
How Can Library Materials Metadata Facilitate AI?
Because libraries have been creating and maintaining various types of materials metadata for decades, learning more about and understanding the processes and components associated with AI might provide a glimpse of how libraries and library collections could be included in AI designs.
According to Jelani Harper in “2019 Trends in Metadata Management: Intelligent Applications for Operational Success,” AI’s mapping, shallow learning, predictive analysis, anomaly detection, and edge deployments all require quality metadata to function properly. Think about the operation of the typical library ILS and the cataloging record. If a fixed field in a MARC record is coded improperly, the resulting display for a particular corresponding field—regardless of the accuracy of the metadata contained in that field—will be inaccurate.
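The fixed-field point can be demonstrated in a few lines. In a MARC bibliographic record, character positions 35-37 of the 008 field carry the language code; the record below is invented and padded with blanks where other coded values would normally sit.

```python
# A sketch of why fixed-field coding matters: MARC 008 positions 35-37
# hold the language code, and display layers trust that coding blindly.
# This record is invented, with blanks standing in for other positions.

field_008 = "190101s2019".ljust(15) + "nyu".ljust(20) + "eng d"

def language_code(f008):
    """Return the language code stored at 008/35-37."""
    return f008[35:38]

# If this slice were miscoded (say, "egn"), every downstream display
# would label the work's language wrongly, no matter how accurate the
# title and author fields are.
print(language_code(field_008))
```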
Thus, as libraries determine how AI will streamline the way in which their patrons obtain and utilize user profile-driven results and the composite transaction metadata that is generated from those results, there is a renewed sense of urgency to better understand the components of AI and how AI designers work to develop, test, and execute scenarios that mimic human thought and action. Once again, library metadata becomes the agent that allows knowledge to develop and flow, whether it is in written form, audio, visual, or—in the case of AI—conceptual.