The
very nature of scholarly research has fundamentally changed with the increased
availability of reference or citation linking. With the ubiquity of the
Web, it seems natural to users and information professionals alike that
they be able to link painlessly among full-text articles, abstracting and
indexing (A&I) bibliographic information, and article reference lists.
In a perfect world, such linking would occur seamlessly. However, in reality,
technical obstacles, as well as organizational concerns, blur the vision
of perfect, seamless reference linking. Information professionals need
to be aware of the technical and organizational interests and obstacles
associated with reference linking in order to better serve their users.
Caplan defines
reference or citation linking as "the ability to go directly from a citation
to the work cited, or to additional information about the cited work,"1
whether the source and accompanying destination are journal articles, Web
sites, conference proceedings, entries in A&I databases, or even a
link sent via e-mail from one colleague to another. Generally, in the scholarly
community, reference linking is first and foremost thought of as a link
among and between journal articles and bibliographic entries. Accordingly,
linking initiatives in the electronic scholarly publishing community have
first attacked the obstacles associated with this kind of linking.
In the past, linking
generally was built around the collections of a specific publisher (primary
or secondary) or aggregator, whether internal or external. In other words,
links either traveled among articles and records housed together or among
articles and records housed on any number of servers, but the focus was
the legitimate collection of a database provider. For example, publishers
might provide internal links among their own electronic journals, or aggregators
would provide internal links among full text provided by their services.
In terms of reference linking, however, these sorts of closed systems are
only partially successful. As Caplan reminds us, "It is very unlikely that
any article published by Elsevier will cite only other articles in other
Elsevier journals."2
It is also unlikely that any one full-text repository will in itself be
large enough for any one body of literature. Necessarily, then, a hybrid
model of both internal and external linking emerges. While more useful,
inclusive, and most importantly, open, hybrid models of internal and external
linking introduce a new array of technical and organizational issues.3
Electronic Journal Publishing
Defined
Before examining
the technical and organizational issues involved in hybrid reference linking,
however, we should look at the two general categories of scholarly electronic
journal publishing: direct delivery from publishers and full-text aggregation
through third parties.
With direct delivery,
primary publishers offer electronic journal subscriptions via Web sites
or Web services. For example, STM primary publishers such as Elsevier Science,
Springer-Verlag, Academic Press, American Institute of Physics, and the
American Chemical Society all offer access to their journals electronically.
These primary publishers have services such as ScienceDirect (Elsevier
Science), LINK (Springer-Verlag), IDEAL (Academic Press), Online Journal
Publishing Service (American Institute of Physics), and ACS Publications
(American Chemical Society). With some of these primary publishers, such
as Springer-Verlag, a portion of the articles in various journals appear
online before appearing in print.
The second general
category of electronic journal publishing is full-text aggregation through
third parties. In this schema, publishers hand materials over to aggregators,
which then house the full text. Major electronic journal aggregators include
EBSCO, OCLC FirstSearch via Electronic Collections Online (ECO), ProQuest,
Gale Group via InfoTrac, and H.W. Wilson. Unlike direct delivery from publishers,
however, aggregated full text may not include all of any given journal issue,
and publishers may impose embargoes.
While this sort
of categorization is useful, it is sometimes misleading. Increasingly,
primary publishers have become aggregators. Elsevier and Springer-Verlag,
for example, offer access to publications from sources neither company
owns. To further complicate matters, aggregators may choose to house their
full text on internal servers or link to full text housed elsewhere.
Moreover, primary
publishers and aggregators now both offer direct delivery or e-journal
gateways for libraries and information centers. Services such as EBSCO
Online, OCLC's WebExpress, and SwetsnetNavigator attempt to serve
as electronic journal portals for information centers or libraries. Other
companies, such as the Gale Group and ingenta, have initiated partnerships
that offer similar e-journal gateway services. These types of services,
however, may not always accurately reflect a specific institution's holdings.
For example, linking agreements and pricing structures determine which
e-journals from different publishers are available and viewable through
these subscription gateways.
With the ever-increasing
amount of full text available in different formats and from different providers
(e.g., publishers or aggregators, who can be one and the same), information
professionals need a way to evaluate and navigate the technical and organizational
obstacles associated with reference linking for scholarly electronic materials.
Linking Models: Internal,
External, Or Hybrid?
First, one must
understand the type of link provided. Links can be internal, contained
within one service, or external, connecting documents or records provided
by two or more services.4
Internal linking occurs in aggregator services such as EBSCOhost's
Academic Search Elite or Academic Search Premier, OCLC FirstSearch, Gale
Group's InfoTrac, or ProQuest's Research Library. Primary publishers also
employ internal linking in their own direct subscription services. For
instance, reference links are available among journals published by Elsevier's
ScienceDirect.
A prime example
of the external linking model is linkages among secondary and primary publishers
or links among abstracting and indexing services, primary publishers, and
aggregators. Articles are housed on a different server than the bibliographic
records. An A&I service functions as a navigational tool that then
points users to the full text, via services such as SilverPlatter or Cambridge
Scientific Abstracts (CSA). For example, CSA offers links from search results
in its Internet Database Service (IDS) bibliographic databases to full-text
documents offered by Project MUSE from Johns Hopkins University Press,
PsycARTICLES from the American Psychological Association, and ingenta.
The IDS service includes more than 50 databases.
Lines of demarcation
between internal and external linking continue to blur, however. Use of
both internal and external linking, or hybrid reference linking, has become
more and more common, as evidenced by the emergence of the CrossRef initiative,
a joint effort of major primary publishers under the auspices of the Publishers
International Linking Association (PILA). As Caplan noted, it is highly
unlikely that any Elsevier-published journal article only references other
Elsevier-published articles. To provide the service that readers and librarians
want, primary publishers must link within their own services and out to
other services. Aggregators and A&I services, too, have begun linking
in and out of their own services, which often means crafting new linking
agreements with primary publishers. While hybrid linking is becoming the
norm, it does make for a most complicated practice.
Hybrid Linking in Action
In studying the
hybrid linking model, it must be determined just where the link takes the
user. Is the link internal to a service or external? Does a link take the
user to the article or journal level or simply to the front page of a publisher's
site? CrossRef [http://www.crossref.org],
the international publisher initiative launched in 2000, allows for a dissection
of the hybrid linking phenomenon among the primary publishers.
CrossRef describes
itself as a "digital switchboard," linking the content of primary publishers
(more than 91 at the end of 2001) and what CrossRef terms "affiliates"
and "library affiliates." These links are effected through Digital Object
Identifiers (DOIs), unique identifiers tagged to article metadata. Unlike
URLs, which can be inconsistent and point only to a manifestation of an
article or other piece of electronic content, DOIs are persistent and identify
the object itself. DOIs link to URLs through a resolver system, such as
the one run by the International DOI Foundation. CrossRef, then, functions
as what Caplan calls a "reference database," into which CrossRef publisher
members deposit DOIs and associated citation metadata. CrossRef houses
no full text; it is only a cog in a wheel that allows for the association
of persistent identifiers (DOIs) with locations (URLs) and article citation
metadata.
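The distinction between a persistent identifier and a changeable location can be shown with a minimal sketch. It is a toy, in-memory stand-in for a resolver, not the DOI system itself; the class name, the DOI, and the URLs are all invented for illustration.

```python
# Toy DOI resolver: the identifier stays fixed while its location may change.
# The DOI and URLs below are made-up placeholders, not real registrations.

class DoiResolver:
    """In-memory stand-in for a resolver that maps DOIs to current URLs."""

    def __init__(self):
        self._locations = {}  # DOI string -> current URL

    def register(self, doi, url):
        """Deposit or update the URL associated with a DOI."""
        self._locations[doi] = url

    def resolve(self, doi):
        """Return the current URL for a DOI, or None if it is unknown."""
        return self._locations.get(doi)


resolver = DoiResolver()
doi = "10.9999/example.2001.001"  # hypothetical DOI

resolver.register(doi, "http://publisher.example/old/path/article1")
first_url = resolver.resolve(doi)

# The publisher reorganizes its site; only the registry entry changes.
resolver.register(doi, "http://publisher.example/new/path/article1")
second_url = resolver.resolve(doi)
```

A reference list that cites the DOI never has to be edited when the publisher moves the article. Only the resolver's record is updated.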
One should note
that while DOIs are the standardized identifier used by CrossRef, other
standardized identifiers exist as well. The choice of identifier may depend
on the organization or company using them and the level of access being
supported. For example, an aggregator such as the Gale Group offers links
to local holdings through ISSNs for its InfoTrac Web periodical products;
the Gale Group is currently working on this functionality in its Resource
Centers product as well.
CrossRef, however,
has chosen DOIs. When a user clicks on a link in a reference list of a
journal published by a participating publisher, he or she goes to the publisher's
Web site, where access is determined by subscription. The reference list
of an article in the Elsevier service ScienceDirect will have links to
other Elsevier journals, as well as links to other publishers' journals,
such as Blackwell Science, Springer-Verlag, and Wiley InterScience. Again,
however, access is determined by subscription. While a user may have affiliated
access via another route (an aggregator, library print holdings, etc.),
currently, the CrossRef link only takes the user to the publisher-supplied
full text.
Ovid [http://admin.ovid.com/openlinks]
provides another example of the hybrid linking model, this time for aggregators.
Ovid's products employ both internal and external linking through full-text
aggregation, bibliographic databases, and its OpenLinks software. Links
in these services may travel among documents housed at Ovid or housed elsewhere,
on non-Ovid Web-based systems. With the OpenLinks software, connections
can be set up between records in Ovid databases and remote e-journal full
text to which an institution subscribes. OpenLinks also fully supports CrossRef.
According to Ovid, by supporting CrossRef, it has access to the CrossRef
database. Access to the CrossRef database allows Ovid to pair its bibliographic
article metadata to the information in the CrossRef database, thus creating
a link from the article's unique DOI to the publisher-assigned URLs. Access
for the user is wholly based on subscription, and according to Ovid, the
institution can define OpenLinks so that links only appear for subscribed
journals. Ovid also has agreements with other aggregators, such as Project
MUSE, and primary publishers, such as Springer's Online LINK, to allow access
through OpenLinks.
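The pairing of bibliographic metadata with a reference database might be sketched as follows. This is an illustration only: the record fields, the choice of ISSN, volume, and first page as a match key, and the sample DOIs are assumptions, not the actual schema or matching rules used by Ovid or CrossRef.

```python
# Sketch of matching a bibliographic record against a reference database
# of deposited citation metadata. All records and the match key are
# illustrative assumptions, not CrossRef's or Ovid's actual schema.

def match_key(record):
    """Build a normalized lookup key from a citation record."""
    return (
        record["issn"].replace("-", "").upper(),
        str(record["volume"]).strip(),
        str(record["first_page"]).strip(),
    )

# Hypothetical deposits: citation metadata plus the DOI assigned to it.
deposits = [
    {"issn": "1234-5679", "volume": 12, "first_page": 101,
     "doi": "10.9999/example.12.101"},
    {"issn": "1234-5679", "volume": 12, "first_page": 145,
     "doi": "10.9999/example.12.145"},
]
reference_db = {match_key(d): d["doi"] for d in deposits}

def find_doi(record):
    """Return the DOI for a citation, or None if nothing matches."""
    return reference_db.get(match_key(record))

# A record as it might appear in a bibliographic database.
citation = {"issn": "12345679", "volume": "12", "first_page": "145"}
matched = find_doi(citation)
```

Normalizing the key before lookup matters because the same citation is rarely formatted identically in two databases.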
Ovid is not the
only company that has seen the advantage of supporting CrossRef. Secondary
publishers, A&I services, and others can become affiliates in CrossRef
and have access to the CrossRef reference database with its DOIs and journal
article metadata. By becoming CrossRef affiliates, secondary publishers,
A&I services, and others can bypass the tedious process of signing
bilateral linking agreements with separate publishers. CrossRef affiliates
include CSA, EBSCO Publishing, and SwetsBlackwell.
However, for now,
using CrossRef alone means a system of one-to-one relationships between
DOIs and URLs, pointing users to the manifestation of a journal article
available at the journal publisher's Web site. CrossRef alone does not
take into account user affiliations and whether users might have access
to more than one route to a journal article, e.g., through an aggregator.
The "Appropriate Copy" Problem
Rears Its Ugly Head
Once we understand
the type of link being provided (internal or external), we must still look
at several technical and organizational obstacles to understand the intrinsic
complexity of reference linking. The major technical obstacle for information
professionals is providing access to the "appropriate copy" for their constituencies.
While the promises and advertisements of information providers (publishers,
aggregators, A&I services) tout seamless interconnectivity, information
professionals know better.
Behind the user's
simple expectation of clicking on a link in a reference list and being
instantly transported to its corresponding full text lie some very complex
processes. The complexity lies in the multiple availability of any one
article. For example, the full text of one article could be available through
several means: the publisher's Web site; aggregator services such as Ovid,
ProQuest, Gale Group, or EBSCO; subscription agent gateways such as EBSCO
Online or SwetsnetNavigator; locally or consortially hosted copies
of publishers' journal databases; and document delivery services such as
Infotrieve or ingenta. And these are only some of the options for electronic
versions. Of course, an information center might have print subscriptions
or wish to direct users to ILL services. Moreover, an institution might
have access to more than one of these article manifestations and wish to
guide each user to the best option.
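One way to picture the problem is as a filtering and ranking step over every known manifestation of an article. The sketch below assumes a simple model in which an institution lists the services it can use and the order in which it prefers them; the service names, URLs, and function name are hypothetical.

```python
# Sketch of choosing an "appropriate copy" among several manifestations.
# Service names, URLs, and the preference order are illustrative assumptions.

def appropriate_copies(manifestations, subscriptions, preference):
    """Return the manifestations the institution can use, best first."""
    available = [m for m in manifestations if m["service"] in subscriptions]
    rank = {service: i for i, service in enumerate(preference)}
    # Services missing from the preference list sort after ranked ones.
    return sorted(available, key=lambda m: rank.get(m["service"], len(rank)))

manifestations = [
    {"service": "publisher_site", "url": "http://publisher.example/a1"},
    {"service": "aggregator", "url": "http://aggregator.example/a1"},
    {"service": "local_host", "url": "http://library.example/ejournals/a1"},
    {"service": "document_delivery", "url": "http://docdel.example/order/a1"},
]

# What this hypothetical institution has access to, and in what order
# it wants those routes offered.
subscriptions = {"aggregator", "local_host", "document_delivery"}
preference = ["local_host", "publisher_site", "aggregator", "document_delivery"]

choices = appropriate_copies(manifestations, subscriptions, preference)
best = choices[0]["service"] if choices else None
```

Here the publisher's site is filtered out because the institution has no subscription there, and the locally hosted copy is offered first.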
Open-system reference
linking initiatives such as CrossRef and software developments such as
Ovid's OpenLinks, as well as persistent and unique identifier technology
such as the DOI, do much to move us toward a seamless interconnectivity
between bibliographic data and full text housed within different organizations.
Furthermore, efforts on the part of publishers (primary and secondary)
and aggregators to link to local holdings or local ILL services also promote
localization and personalization of resources.
We need one more
step, however, to provide truly painless reference linking. We need linking
architecture that never leaves a user at a dead end, wondering why he or
she is denied access, a linking architecture that points users to their
institutional-affiliated holdings, subscriptions, and chosen services.
Such a linking architecture or system must provide context-sensitive reference
linking.
Linking Initiatives and Technologies
to the Rescue
Context-sensitive
reference linking takes the user's information environment, their context
or situation, into account when linking between references in journal articles
or other online content to full-text collections. In other words, the linking
systems include a localization feature that addresses the user's specific
affiliation or possible subscriptions. Two 2001 articles by Oren Beit-Arie
et al. and Priscilla Caplan, respectively, explain the need for linking
systems to be open, generalized, and robust in order to be context-sensitive
and to solve the "appropriate copy" problem.5
For an effective "appropriate copy" solution, a linking system also needs
standardization and localization.
Chemical Abstracts
Service's (CAS) ChemPort Connection and ChemPort Reference Linking services
offer two examples of the type of services that attempt to address the
"appropriate copy" issue, both in secondary database linking and in reference
linking. The difference between secondary database linking and reference
linking is a matter of nomenclature; often, secondary database linking
is simply included in the notion of reference linking. These two services
from CAS specifically address both.
The ChemPort Connection,
CAS's initial linking effort launched in December 1997, allows searchers
of CAS secondary databases (e.g., STN, SciFinder) to link from CAS records
through the ChemPort Connection to the full text available from the primary
publisher, patent office, or CAS's Document Detective Service. According
to Harry Boyle, Manager, Web Alliances at CAS, some variation of reference
linking has been available in CAS since the introduction of the ChemPort
Connection, with links from CAS databases to the full text
at publishers' Web sites; CAS has agreements with about 135 publishers,
as well as with patent offices and EBSCO, as a subscription agent. Through
these agreements and behind-the-scenes linking technology, ChemPort Connection
allows those with subscriptions or affiliated access to link directly to
the cited article or document. The ChemPort Connection also offers a link
to local library holdings — and CAS works with a broad spectrum of libraries
— that can extend from advanced, localized integration with library systems
to a link that simply takes users to the library's home page. The advanced
localization is a powerful feature of the ChemPort Connection, allowing
local systems administrators to directly set up linking to their library
through the CAS Site Administration Tool.
In December 2000,
CAS announced linking from cited references in full-text articles to
CAS records, thus allowing for links both to and from CAS records; this
is the ChemPort Reference Linking service. Currently, Boyle noted, CAS
is working with a relatively small number of publishers for the ChemPort
Reference Linking service, including ACS Publications, Academic Press,
American Institute of Physics, the Institute of Physics Publishing, the
International Union of Crystallography, Springer-Verlag, and publishers
with full text loaded at CatchWord. On December 4, 2001, CAS unveiled
ChemPort's new "Enhanced Reference Linking Service." With these recent
enhancements to ChemPort, researchers now have the option to view, for
a charge, chemical substances discussed in the cited article or a list
of documents citing the current document.
The OpenURL framework
and SFX, context-sensitive reference linking software, combine to offer
another option for successful linking of heterogeneous materials from different
providers. Most important, the OpenURL framework is non-proprietary, an
open source protocol and a proposed standard under review by the National
Information Standards Organization [http://www.niso.org/committees/comittee_ax].
Herbert Van de Sompel and others developed the concept of OpenURLs, and
the complete, original theoretical papers on the OpenURL were published
in April 1999 (see footnote 6).
According to Harry
Boyle, while not called "OpenURL," CAS's linking initiatives designed to
deal with the "appropriate copy" issue were a precursor to the OpenURL.
Specifically, CAS collaborated with Ohio State University and OhioLINK
to localize the ChemPort Connection for the OhioLINK consortium. Overall,
the "appropriate copy" issue has been a recognized problem for years, and
the CAS/OSU/OhioLINK collaboration, those involved with the development
of the OpenURL framework, as well as other groups, have been working to
find a practical solution. (More information about the development of OpenURLs
and the subsequent development of SFX is available at http://www.sfxit.com/.)
According to NISO [http://www.niso.org],
the OpenURL standard "should incorporate these two syntax options:
- syntax for packaging metadata and identifiers describing information objects
- syntax for pointing to a user-specific resolver that can accept this packaged
data, combine it with user information, and resolve the data into actual links."
The standard should
not focus on any one identifier, such as DOI, but rather take into account
other identifier standards such as SICI, ISSN, and others.
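The two syntax elements NISO describes can be illustrated with a short sketch that packages citation metadata and an identifier and points them at a local resolver. The resolver address, the function name, and the sample values are hypothetical. The key names (genre, issn, volume, spage, date, id) follow the style of early OpenURL drafts but should be treated as assumptions rather than as the text of the standard.

```python
from urllib.parse import urlencode

# Sketch of the two parts NISO describes: a packaged description of the
# item, and a pointer to a user-specific resolver. The resolver address
# and the key names are illustrative assumptions.

def build_openurl(resolver_base, metadata, identifiers=None):
    """Append packaged metadata and identifiers to a resolver address."""
    params = [(key, str(value)) for key, value in metadata.items()]
    for scheme, value in (identifiers or {}).items():
        # Each identifier is carried as scheme:value, e.g. doi:10.9999/x
        params.append(("id", f"{scheme}:{value}"))
    return resolver_base + "?" + urlencode(params)

citation = {
    "genre": "article",
    "issn": "1234-5679",
    "volume": 12,
    "spage": 145,
    "date": 2001,
}

link = build_openurl(
    "http://resolver.library.example/menu",   # hypothetical local resolver
    citation,
    {"doi": "10.9999/example.12.145"},         # hypothetical DOI
)
```

Because the identifier travels as a labeled scheme:value pair, the same packaging could carry an ISSN or SICI in place of a DOI.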
SFX, on the other
hand, is dynamic linking software now marketed by Ex Libris [http://www.exlibris-usa.com].
In other words, OpenURL is part of the underlying framework that allows
an SFX server to work. While marketed by Ex Libris, SFX is vendor-independent
and facilitates an open-linking environment. It is remarkable in its ability
to localize the dynamic creation of links among A&I databases, library
catalogs, citation databases, citations in research papers, e-print archives,
and Web resources. SFX is, in essence, a third party in the effort to connect
the user with his or her "appropriate copy." An institution purchases it
and it "remains under their control and management."7
This allowance for local administrative control is the most powerful feature
of SFX. SFX, however, is only one of several possible local resolution
systems. According to Caplan, systems from Endeavor Information Systems
and OCLC's Open Name Service are in the works.8
SFX generally works
on the concept of sources and targets. Sources could be records in one
database, and targets could be records in another. For example, a user
might access records in one database. A database record retrieved by the
search would carry an SFX link (named SFX or something else). On
clicking the link, the user would see a list of options specific to
their affiliation: library catalog, full-text databases, e-journals, and
more. Both the sources and targets must be OpenURL-compliant, as the requests
passed between the two depend upon it.
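The source-and-target idea can be sketched as a small local resolver that receives a request from a source and answers with a menu of targets. This is not SFX code; the target labels, link templates, holdings list, and request keys are assumptions chosen only to show the shape of the exchange.

```python
from urllib.parse import urlsplit, parse_qs

# Sketch of a local resolver: it accepts a request from a "source" and
# answers with links to the "targets" one institution has configured.
# Target names, link templates, and holdings are illustrative assumptions.

TARGETS = {
    "Library catalog": "http://catalog.library.example/search?issn={issn}",
    "E-journal full text": "http://ejournals.example/{issn}/{volume}/{spage}",
    "Document delivery": "http://docdel.example/order?issn={issn}&vol={volume}",
}

# ISSNs for which this hypothetical library licenses online full text.
FULLTEXT_HOLDINGS = {"1234-5679"}

def resolve_menu(openurl):
    """Turn an incoming request into a list of (label, link) options."""
    query = parse_qs(urlsplit(openurl).query)
    # Keep the first value of each key; supply blanks for missing fields.
    item = {key: values[0] for key, values in query.items()}
    for field in ("issn", "volume", "spage"):
        item.setdefault(field, "")

    menu = []
    for label, template in TARGETS.items():
        if label == "E-journal full text" and item["issn"] not in FULLTEXT_HOLDINGS:
            continue  # do not offer a link the user cannot follow
        menu.append((label, template.format(**item)))
    return menu

request = ("http://resolver.library.example/menu"
           "?genre=article&issn=1234-5679&volume=12&spage=145")
menu = resolve_menu(request)
labels = [label for label, _ in menu]
```

The menu is built from the institution's own configuration, so the same request sent to another library's resolver could yield a different list of options.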
Customer Demand for OpenURL
Compliance
More and more
publishers, aggregators, and vendors have either become OpenURL-compliant
or have begun implementing such compliance. Again, as the SFX software
(and hence the OpenURL framework) employs both sources and targets, a variety
of organizations and companies provide products that are either sources,
targets, or both. OpenURL-enabled resources include the pre-print archive
at the Los Alamos National Laboratory [http://www.arXiv.org];
ProQuest from Bell & Howell Information and Learning; Cambridge Scientific
Abstracts; EBSCOhost from EBSCO Publishing; InfoTrac from the Gale
Group; WilsonWeb from H.W. Wilson; Web of Science from ISI; FirstSearch
from OCLC; Ovid Bibliographic Databases and SilverPlatter ERL/WebSPIRS
from Ovid; and SwetsnetNavigator from SwetsBlackwell. SFX targets
are many more in number. There are bibliographic and A&I databases,
document delivery services, journal publishers and individual journals,
full-text aggregators, library catalogs, and general-interest Web sites.
(For a full list of both sources and targets, see http://www.sfxit.com/.)
According to Gary
Pollack, program director for Product Platforms at the Gale Group, Gale
committed to becoming an OpenURL-enabled resource because customers started
asking for the SFX product by name. Initially, there may have been some
confusion among customers between OpenURL compliance and SFX, but regardless,
the demand for the OpenURL/SFX solution was clear. Pollack noted that OpenURL
is the emerging standard, and SFX is the best implementation of the emerging
OpenURL standard. Gale is both a source and a target, but the real work
for the information provider is becoming a source. Being an OpenURL-enabled
resource involves sending outbound HTTP requests; therefore, information
providers must re-gear their product to become an OpenURL-enabled resource,
requiring a considerable amount of work. CAS also fully supports the OpenURL
protocol, making the SFX software compatible with the ChemPort services.
Out of Many, One Linking System
All these many
components of linking were recently put to the test. A complex linking
system using the OpenURL framework, the DOI resolution system, and a local
resolution system (including SFX) was tested in the spring and summer of
2001. Participants and observers of the prototype project included the
following groups and organizations:
- library participants (Research Library of the Los Alamos National Laboratory,
University of Illinois Grainger Engineering Library, and the Ohio State University
Libraries)
- International DOI Foundation
- Corporation for National Research Initiatives (technology provider for the DOI)
- CrossRef
- Ex Libris
- OhioLINK
- Digital Library Federation
- NISO
- Elsevier Science
- American Institute of Physics
A September 2001 D-Lib
article, "Linking to the Appropriate Copy," fully explains the prototype
project.9
Basically, however, all the following components were successfully used
in the same linking system: DOIs as persistent identifiers; the OpenURL
framework as a standardized transportation of metadata and/or identifiers;
CrossRef as a reference database of DOIs and citation metadata; and SFX
(and other systems) as options for a local resolution system. The most
astounding aspect of the prototype project was getting such divergent groups
to work together to create an effective linking system. It does engender
hope for a truly heterogeneous research environment.
Growing Pains
All the organizational
and technical concerns and obstacles associated with reference linking
return us to the oldest goal of librarians: getting the right resource
to the right user at the right time. The "appropriate copy" problem is
not new, only updated to accommodate a digital world. Publishers, aggregators,
and other information professionals have joined in the effort to create
a truly seamless reference-linking environment. The recent collaboration
to integrate OpenURL and CrossRef gives us hope that similar collaborations
of such differing groups may be on the horizon.
Other sorts of
library- and librarian-initiated efforts also give us hope. In an effort
to deal with the "appropriate copy" issue and to add the crucial localization
aspect to e-journal collections, products such as jake and SerialsSolutions
were developed. Additionally, "advanced" thinkers such as those at CAS
as well as those involved with the OpenURL development continue to tackle
the technical issues associated with reference linking for libraries.
Remember that the
Web-based electronic availability of full texts is relatively new. We only
need look at the numbers. The number of publications listed in Fulltext
Sources Online has grown from approximately 4,400 in 1993, to 13,094
in July 2000, and 15,388 in January 2001 (see footnote 10).
These numbers continue to climb as electronic publishing gains more and
more momentum. Essentially, reference linking is in its adolescence and
experiencing some growing pains. But reference linking also has the energy
of an adolescent, allowing information professionals to offer unprecedented
access to their constituencies.
FOOTNOTES
1. Priscilla Caplan,
"Reference Linking for Journal Articles: Promise, Progress, and Perils,"
Portal:
Libraries and the Academy, vol. 1, no. 3, pp. 352-356.
2. Caplan, "Reference
Linking."
3. Carol Tenopir,
"Links and Bibliographic Databases," Library Journal, vol. 126,
no. 4, March 1, 2001, pp. 34-36.
4. Jill E. Grogg
and Carol Tenopir, "Linking to Full Text in Scholarly Journals: Here a
Link, There a Link, Everywhere a Link," Searcher, vol. 8, no. 10,
November/December 2000, pp. 36-45.
5. Oren Beit-Arie
et al., "Linking to the Appropriate Copy: Report of a DOI-Based Prototype,"
D-Lib
Magazine, vol. 7, no. 9, September 2001; Priscilla Caplan, "A Lesson
in Linking," Library Journal NetConnect, Supplement to Library
Journal and School Library Journal, Fall 2001, pp. 16-18.
6. These papers
are available at http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt.1.html#ref1
and http://www.dlib.org/dlib/april99/van_de_sompel/04van_de_sompel-pt2.html.
7. Jenny Walker,
"Key Issues: SFX — The Context Sensitive Linking System for Libraries,"
Serials,
vol. 14, no. 1, March 2001, pp. 71-72.
8. Caplan, "A Lesson
in Linking."
9. Oren Beit-Arie
et al., "Linking to the Appropriate Copy: Report of a DOI-Based Prototype,"
D-Lib
Magazine, vol. 7, no. 9, September 2001.
10. Carol Tenopir,
"Should We Cancel Print?," Library Journal, September 1, 1999, pp.
138-142.