OA/Open Data Designs and Digital Repository Strategies

Computers in Libraries, Vol. 39 No. 4, May 2019

FEATURE
by Tom Adamich

The academic library community ... will need to keep listening, talking, and innovating in order to create the interconnected scholarly communication landscape of the future.

This article explores scholarly communication issues for 21st-century academic and research libraries. I will identify several issues that academic libraries face when developing their scholarly communication programs to include strategies based on linked open data and semantic web architecture.

By Way of Background

In the 21st century’s second decade, scholarly research curation and repository program development have grown rapidly. Academic libraries, especially those supporting their institution’s research agendas, have recognized the importance of managing scholarly communication resources associated with the multiple steps in the scholarly research process: inquiry; data acquisition; data analysis; report development, publication, and dissemination; and postpublication activities.

Equally important is the recognition that scholarly communication resources must be accessible and discoverable in a 21st-century OA/open data (OA/OD) environment. In this context, there is an understanding that descriptive metadata indicating the resource, researcher, and research process information must be optimized using OA/OD designs and strategies (often referred to as linked open data and semantic web architecture).

Development of this framework has required, and will continue to demand, frequent dialogue among academic library professionals and researchers. That dialogue needs to take place before, during, and after the research process so that both groups can evaluate and execute effective strategies and designs.

Sai Deng’s Primer

One of the best current resources to address these concepts is a presentation by Sai Deng (University of Central Florida Libraries) at the 2018 ALA Midwinter Meeting. Deng enumerates a plethora of issues at play:

• Digital repositories, the web, and knowledge organization systems (KOS), also known as controlled thesauri/subject headings and name identifiers
• Environmental scanning of authority control in digital repositories
• Name disambiguation and identification
• Engaging user interfaces (UIs) using author information
• Subjects and keywords; debate on the use of controlled vocabularies
• Web-based metadata and text analysis
• SEO and the evolution of digital repositories (OA and proprietary)
• Authority control; identity management and discovery

In addition to highlighting the best practices in scholarly communication designs and strategies that are metadata-based and OA/OD-friendly, Deng provides copious notes and references to research reported in more than 40 resources. Citations include scholarly articles, presentations, and lists from metadata providers (such as OCLC), OA-based UIs (including Omeka and Islandora), and proprietary-based UIs (such as CONTENTdm).

Deng poses and answers many questions relating to how academic library professionals can better understand 21st-century OA/OD-based web architecture. The topics include the following:

• Semantic web designs: RDFa/triples/JSON designs
• Controlled vocabulary and name identifier concepts (best practices, semantic-enabled vocabulary sets)
• UI options (legacy and emerging)

The concepts are presented in an easy-to-follow narrative; there’s ample opportunity for discussion, reflection, and further study. Particular emphasis is placed on the steps of the research process and the role of the researcher, especially the effective identification of the researcher and her or his body of research.
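
To make the triples/JSON designs mentioned above more concrete, here is a minimal sketch of how a repository record might be written as JSON-LD and flattened into subject-predicate-object triples. It is only an illustration: the repository URL and record are hypothetical, the ORCID iD is the fictitious Josiah Carberry example that ORCID uses in its own documentation, and the flattening function is a simplification rather than a full JSON-LD processor (it does not expand the context).

```python
import json

# Hypothetical repository record written as JSON-LD with schema.org terms.
# The repository URL is a placeholder; the ORCID iD is ORCID's fictitious example.
record = {
    "@context": "https://schema.org/",
    "@id": "https://repository.example.edu/items/42",
    "@type": "Dataset",
    "name": "Example survey data",
    "creator": {
        "@id": "https://orcid.org/0000-0002-1825-0097",
        "@type": "Person",
        "name": "Josiah Carberry",
    },
    "keywords": ["open data", "metadata"],
}


def to_triples(node):
    """Flatten a nested JSON-LD-style node into (subject, predicate, object) triples."""
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        predicate = "rdf:type" if key == "@type" else key
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # A nested node becomes a link to its identifier plus its own triples.
                triples.append((subject, predicate, item["@id"]))
                triples.extend(to_triples(item))
            else:
                triples.append((subject, predicate, item))
    return triples


triples = to_triples(record)
print(json.dumps(triples, indent=2))
```

The point of the design is that the creator is linked by identifier rather than by a name string, which is what lets other systems connect the same person across repositories.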

The ORCID Approach

One of the best systems for identifying and tracking a researcher’s scholarly output is ORCID. ORCID is an alphanumeric code that uniquely identifies scientific and academic authors—including their names and affiliations—and thus can be used to track their body of work. ORCID codes are unique, persistent, and maintained in an OA/OD web environment by a nonprofit organization dedicated to open information-sharing among research scholars worldwide (ORCID 2018).

ORCID codes allow researchers to document research process steps and outputs for specific research projects, including grant details and manuscript status, from draft stage to final publication. There’s also the ability to provide links to a researcher’s body of work and profile information (affiliation provenance, connections to joint research projects, etc.).
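
For readers who handle these identifiers in repository metadata, the following is a small sketch of how an ORCID iD could be checked before it is stored. ORCID documents the final character of an iD as an ISO 7064 MOD 11-2 check character; the function names here are my own, and the sketch only tests whether an iD is well formed, not whether it is actually registered.

```python
import re

# Four groups of four characters; the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]$")


def check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid_id: str) -> bool:
    """Return True if the iD is well formed and its check character matches."""
    if not ORCID_PATTERN.match(orcid_id):
        return False
    compact = orcid_id.replace("-", "")
    return check_character(compact[:15]) == compact[15]


# ORCID's fictitious example researcher, then an altered check character.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1825-0098"))
```

A check like this catches most typing and transcription errors at the point of data entry, which is usually cheaper than disambiguating names later.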

DataCite: Managing Research Data and Projects

Another interesting community-driven OA/OD ID project is DataCite. Similar to ORCID, DataCite is a nonprofit organization that provides and manages persistent identifiers. In this case, it’s DOIs, which are governed by the International DOI Foundation and are best known from Crossref’s use of them for journal article identification, but which DataCite applies to research data. According to DataCite’s mission statement, “We support the creation and allocation of DOIs and accompanying metadata. We provide services that support the enhanced search and discovery of research content. And we promote data citation and advocacy through our community-building efforts and responsive communication and outreach materials.”

In the DataCite 2018 Wrap-Up and 2019 Preview blog post, Robin Dasler (product manager at DataCite) reports that there are several key technology enhancements and DataCite community improvements slated to be introduced and shared this year. One of the most interesting is the launch of the DOI Fabrica API (a combination of the previous REST API and Fabrica API architecture). The advantage is that the Fabrica API aggregates more DataCite DOI management functions under one umbrella, eliminating the need to convert metadata structures to XML, as the Fabrica API is fully JSON-compliant (DataCite 2018).
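
As a rough illustration of what working in JSON rather than XML might look like, the sketch below assembles a metadata body for a single dataset DOI. The field names and nesting reflect my understanding of DataCite’s JSON conventions and should be treated as assumptions to verify against the current DataCite documentation; the DOI, names, and URL are hypothetical placeholders, and nothing is sent over the network.

```python
import json


def build_doi_payload(doi, title, creator_name, publisher, year, landing_url):
    """Assemble a JSON body describing one dataset DOI.

    The field names below are assumptions, not a confirmed schema.
    """
    return {
        "data": {
            "type": "dois",
            "attributes": {
                "doi": doi,
                "titles": [{"title": title}],
                "creators": [{"name": creator_name}],
                "publisher": publisher,
                "publicationYear": year,
                "types": {"resourceTypeGeneral": "Dataset"},
                "url": landing_url,
            },
        }
    }


# Hypothetical values for illustration only.
payload = build_doi_payload(
    doi="10.1234/example-dataset",
    title="Example survey data",
    creator_name="Carberry, Josiah",
    publisher="Example University Libraries",
    year=2019,
    landing_url="https://repository.example.edu/items/42",
)
body = json.dumps(payload, indent=2)
print(body)
```

Because the body is built from ordinary dictionaries and lists, a repository script can generate it directly from its own records without an intermediate XML conversion step.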

Another DataCite technology enhancement (that will align with the Fabrica API upgrade) is the migration from Solr to Elasticsearch for DOI search, which is expected to reduce the time between DataCite DOI creation and DOI indexing. Additionally, in 2019, the Elasticsearch platform will have access to the Fabrica API architecture, which will expose more DataCite DOI search filters and so help retrieve more accurate search results.
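
To show how search filters of this kind are typically combined, here is a hedged sketch that builds a filtered DOI search URL. The endpoint and the parameter names (`query`, `resource-type-id`, `page[size]`) are assumptions based on my reading of DataCite’s public API and may not match the filters described in the blog post, so check them against the current documentation. The code only constructs the URL and does not contact any service.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint for DOI search; confirm against current DataCite documentation.
SEARCH_ENDPOINT = "https://api.datacite.org/dois"


def build_search_url(query, filters=None, page_size=25):
    """Combine a free-text query with optional filters into one search URL."""
    params = {"query": query, "page[size]": page_size}
    params.update(filters or {})
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Assumed filter name: restrict results to datasets.
url = build_search_url("climate", {"resource-type-id": "dataset"})
print(url)

# Decode the query string again to confirm which filters were applied.
print(parse_qs(urlsplit(url).query))
```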

Like ORCID, the DataCite user community is actively engaged. In 2019, there will be efforts to expand the ability of DataCite contributors to create and manage persistent identifier (PID) graphs, which are designed to connect research entities. DataCite user community members are being asked to contribute to the further development of PID graphs and recommend changes and enhancements.
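
The idea behind a PID graph can be sketched in a few lines: identifiers are nodes, relationships between research entities are edges, and discovery becomes a matter of following connections. The class below is my own minimal illustration, not DataCite’s implementation, and every identifier in it is a hypothetical placeholder apart from ORCID’s fictitious example iD.

```python
from collections import defaultdict, deque


class PidGraph:
    """A tiny undirected graph whose nodes are persistent identifiers."""

    def __init__(self):
        self._edges = defaultdict(set)

    def connect(self, pid_a, pid_b):
        """Record a relationship between two identified entities."""
        self._edges[pid_a].add(pid_b)
        self._edges[pid_b].add(pid_a)

    def reachable(self, start, max_hops):
        """Return every identifier within max_hops connections of start."""
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            pid, hops = queue.popleft()
            if hops == max_hops:
                continue
            for neighbor in self._edges.get(pid, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, hops + 1))
        seen.discard(start)
        return seen


graph = PidGraph()
researcher = "orcid:0000-0002-1825-0097"
dataset = "doi:10.1234/example-dataset"
article = "doi:10.1234/example-article"
organization = "org:example-university"

graph.connect(researcher, dataset)    # researcher created the dataset
graph.connect(dataset, article)       # article cites the dataset
graph.connect(article, organization)  # article is affiliated with the organization

print(sorted(graph.reachable(researcher, max_hops=2)))
```

Starting from the researcher, one hop reaches the dataset and two hops reach the article that cites it, which is the kind of indirect connection a PID graph is meant to surface.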

Shared Strategies

As Alainna Wrigley of ORCID mentioned in her discussion of ORCID community opportunities, and as the robust activities of the DataCite community show, the ability to share ideas and strategies that promote consensus across the academic library scholarly communication and related research communities is key to the continued growth and success of OA/OD initiatives, particularly those that are metadata related. From a technology perspective, libraries will also need to understand the present and future capabilities of OA/OD digital repository solutions as they determine what the systems can accomplish, short term and long term, and what data architecture specifications are needed so that both terminology and identifiers allow the systems to function properly.

There is also a question of the dialogue and workflows that are required between academic library professionals and researchers as they continue to explore and question what options would best allow OA/OD research environments to grow and flourish. Will web practices dictate how to manage OA/OD research environments in the future? Is semi-automatic data manipulation (as accomplished by tools such as MarcEdit and Notepad++) still needed to ensure data integrity and encourage periodic review of data integrity and accuracy? Where will academic research-based OA/OD designs and strategies be in 5 years? Or 10?

The academic library community has made great strides in promoting and enabling OA/OD repositories to develop. The same community will need to keep listening, talking, and innovating in order to create the interconnected scholarly communication landscape of the future.

The ORCID Dialogue

I had the pleasure of participating in a dialogue with Alainna Wrigley. At the time of the interview, she was serving as a communications specialist based in ORCID’s Hong Kong offices.

TOM: Is ORCID compatible with other systems for persistent identification?

ALAINNA: A key feature of ORCID is our interoperability. Besides being a registry of identifiers for individual contributors to research, scholarship, and innovation, we are also a hub for exchanging information about individuals, their contributions, and their affiliations—all based on identifiers. So long as a system can access the ORCID API, use the ORCID message schema, and process the identifiers connected to an ORCID record (such as DOIs, GRID IDs …), that system can read and write data about individuals with the ORCID Registry. This enables systems that don’t have connections, or that use different types of metadata standards, to exchange information via the ORCID record.

The current list of work, grant, resource, and peer-review identifiers in the ORCID registry can be found here: pub.orcid.org/v2.0/identifiers.

Any ORCID member writing identifiers to the ORCID registry can request that new identifiers be added by contacting the ORCID team.

We are participating in the Metadata 2020 initiative that is advocating for richer, connected, and reusable open metadata for all research outputs.
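
For readers who want to look at the identifier list programmatically, the following is a minimal sketch of how a request to the address mentioned above might be prepared. The `https://` scheme and the JSON `Accept` header are assumptions on my part, and the v2.0 address may no longer be current. The fetch function needs network access and is defined but not called here.

```python
import json
from urllib.request import Request, urlopen

# Address taken from the interview; the https:// scheme is an assumption.
IDENTIFIERS_URL = "https://pub.orcid.org/v2.0/identifiers"


def build_identifiers_request():
    """Prepare a GET request that asks for the identifier list as JSON."""
    return Request(IDENTIFIERS_URL, headers={"Accept": "application/json"})


def fetch_identifier_types(timeout=10):
    """Download and decode the list; requires network access, so not called here."""
    with urlopen(build_identifiers_request(), timeout=timeout) as response:
        return json.load(response)


request = build_identifiers_request()
print(request.get_method(), request.full_url, request.get_header("Accept"))
```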

TOM: How is ORCID similar to or different from other name identifiers (e.g., ISNI)?

ALAINNA: ORCID and ISNI are separate organizations that address different aspects of unambiguously identifying people and parties. The background, context, and goals of each organization are distinct.

ORCID is an international, interdisciplinary, open, non-proprietary, and not-for-profit organization. Our mission is to help create a world in which all who participate in research, scholarship, and innovation are uniquely identified and connected to their contributions and affiliations, across disciplines, borders, and time.

We work with the research community to embed these identifiers in critical workflows (such as manuscript submission, grant applications, and research information systems) to ensure that researchers are connected with their research. Individuals may register an ORCID iD, create and populate an ORCID record, and search the registry free of charge. Individuals also have control over what data are displayed in their record and may adjust visibility settings and select trusted parties with whom to share information.

Fundamentally, ORCID supports system interoperability by linking information between sources using the unique ORCID identifier. ORCID itself stores only limited metadata, including digital object identifiers and URLs, so that people using the registry can easily navigate to research works. ORCID also supports linkages with other ID schemes, allowing researchers to bring together into their record works of multiple types (e.g., publications, grants, datasets) from multiple sources. One example is Scopus, a repository of research articles. From ORCID, an individual can link their record to their Scopus Author ID and vet and import publication metadata from their Scopus record. This benefits the researcher by improving the accuracy of the Scopus database and by speeding the population of their ORCID record.

ORCID is committed to being interoperable with other identifier schemes, including ISNI. To this end, we coordinate our efforts with ISNI where they overlap in the research and scholarship communities. ORCID identifiers use a format compliant with the ISNI ISO standard. ISNI has reserved a block of identifiers for use by ORCID, so there will be no overlaps in assignments. This range of identifiers is defined between 0000-0001-5000-0007 and 0000-0003-5000-0001. (For more on ORCID iDs, see support.orcid.org/hc/articles/360006897674.) We continue to work together to consider additional opportunities for collaboration.

TOM: What are some current/future goals for ORCID optimization?

ALAINNA: We have a lot coming! ORCID is an open organization. We keep the community abreast of what we have under development, and many of our ideas are sourced from the community:

• Annual road maps: Each year, we publish our road map of our goals for the year. Our 2018 road map can be found at orcid.org/about/what-is-orcid/mission/2018-project-roadmap. Our theme for 2019 is the Year of the Researcher, and many of our projects will focus on making the ORCID Registry more useful to the individuals who use ORCID iDs. The upcoming road map will be announced on our blog in the first quarter of 2019: orcid.org/blog.

• Current development: We use Trello to track our current projects: trello.com/orcid2. Any member of the public can follow the Trello boards, which track what we are working on and if a new feature has been released. All of the development cards are linked to our GitHub repository, where the ORCID registry code is housed. Recent major releases include two additions. The first is research resources, which recognizes the use of specialist resources for research purposes. Specialist resources can include anything from research facilities housing specialized equipment (laboratories, observatories, ships, etc.) to digital repositories, and from museums and galleries to field stations that house physical collections. The second is researcher profile information: affiliations, qualifications, invited positions, distinction, membership, and service.

• Community ideas: There are many ways for the research/library community to get involved. We regularly have working groups and task forces on various topics, such as the Publications and User Facilities Working Group, which advised on the research resources data model. A full list can be found here: orcid.org/about/community. We also have the ORCID iDeas Forum where individuals and organizations can suggest new features that would benefit them and their communities. The iDeas Forum is accessible at support.orcid.org/hc/en-us/community/topics.

References

Deng, S. 2018. “Expanding the Metadata Librarian Horizon: Reflections on the Metadata Practices in the Web and Digital Repositories” [presented at the 2018 ALA Midwinter Meeting]. Retrieved Dec. 26, 2018 from connect.ala.org/HigherLogic/System/DownloadDocumentFile.ashx?DocumentFileKey=9383f187-eca1-4e2e-9b92-9a7aae1bf8df&forceDialog=0.

ORCID. 2018. About. Retrieved Dec. 26, 2018 from orcid.org.

Wrigley, A. Personal email correspondence, Dec. 10, 2018. Retrieved Dec. 26, 2018 from vls@tusco.net.

DataCite. 2018. Mission. Retrieved Dec. 31, 2018 from datacite.org/mission.html.

DataCite. 2018. Preview. Retrieved Dec. 31, 2018 from blog.datacite.org/2019-preview.

Tom Adamich (vls@tusco.net) is the president of the Visiting Librarian Service, a firm he has operated full-/part-time since 1993. Library materials metadata management has been a career focus. Adamich has also been a vehicle and architectural historian since the early 1990s.