FEATURE
VIVO: Building a Stronger Network
by Barbara Brynko
As the VIVO team nears the end of Year 1 of its 2-year project, the web of collaboration continues to expand and to connect a broad range of organizations and institutions.
The national network of scientists was just one of the projects in the grant that the National Institutes of Health (NIH) awarded to create a sustainable research platform of profiles and resources. For the people part of the grant, the VIVO team is busy building a platform to support a network of profiles (faculty researchers and scientists) at each of the seven pilot institutions and facilities (University of Florida, Indiana University, Cornell University, the Weill Cornell Medical College, Washington University in St. Louis, the Scripps Research Institute, and the Ponce School of Medicine in Puerto Rico). For the resource part of the grant, the eagle-i Consortium is busy building a research resource discovery network covering the nuts and bolts of research, including specialized services, resources, and tools.
VIVO and eagle-i represent the yin-yang of people and resources in the scientific community. “Our ontologies [VIVO] are focusing on people,” says Jon Corson-Rikert, head of information technology services at Cornell University’s Mann Library and VIVO creator, “as the people relate to projects, publications, where they work, how they are connected to each other, as well as their own personal and professional history.” The VIVO and eagle-i Consortium teams have been working together “to find points of connection and to try to align those as closely as possible to each other,” he says.
The teams are exploring relevant data that bridge these two related areas in order to help each other and to share best practices. The eagle-i Consortium consists of nine institutions (Morehouse School of Medicine, University of Alaska–Fairbanks, Montana State University, University of Hawaii–Manoa, Jackson State University, Dartmouth College, Harvard University, Oregon Health & Science University, and University of Puerto Rico Medical Sciences Campus) that are divided into three teams: the Build team (software development architecture and design), the Resource Navigator team (gathering and compiling data), and the Ontology team.
“A lot of what researchers are looking for in both domains—research and people—is finding the significant overlap in a person’s activities and expertise,” says Melissa Haendel, lead ontologist and assistant professor at the Oregon Health & Science University Library. “We’re exploring how these two projects relate to each other.” She says the eagle-i team specifically analyzes a person as a resource: what activities they’ve been doing, what kinds of protocols they have in their labs, and what kind of resources they have generated. All of this data about resources and services, she says, can also contribute to finding a researcher with a specific kind of expertise who may become a potential collaborator.
Digging for Data
VIVO’s and eagle-i’s approaches to acquiring data for their respective networks differ slightly too. eagle-i currently has Ph.D. scientists (Resource Navigators) who are collecting information about lab resources directly, says Haendel, adding that this data is being used to determine specific ontology requirements and interface functionality. But this direct contact is likely to change as universities and facilities eventually provide more of their data directly, much the same way VIVO can tap into an institution’s existing databases to gather information about people stemming from their involvement with resource systems or research grants.
Carlo Torniai, assistant professor and ontologist at Oregon Health & Science University, says the work at eagle-i is a community effort, so the first step was to engage the biomedical community, find out what data was already available, and apply ontology best practices. By following MIREOT (minimum information to reference an external ontology term) principles, one ontology can selectively import classes and term definitions from another ontology to align data at points of intersection, while avoiding duplication, inconsistencies, and unintended inferences and allowing domain-specific specialization.
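To make the idea concrete, here is a minimal sketch of MIREOT-style reuse using Python’s rdflib library; the ontology URIs, class names, and labels are illustrative stand-ins, not actual eagle-i or VIVO terms.

```python
# Minimal sketch of MIREOT-style term reuse with rdflib.
# All URIs, class names, and labels below are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF, RDFS

EXTERNAL = Namespace("http://example.org/external-ontology/")
LOCAL = Namespace("http://example.org/local-ontology/")

g = Graph()

# Rather than owl:imports-ing the entire external ontology, bring in
# only the referenced term: its URI, a label, where it came from, and
# where it sits in the local hierarchy.
g.add((EXTERNAL.Antibody, RDF.type, OWL.Class))
g.add((EXTERNAL.Antibody, RDFS.label, Literal("antibody")))
g.add((EXTERNAL.Antibody, RDFS.isDefinedBy, URIRef(EXTERNAL)))
g.add((EXTERNAL.Antibody, RDFS.subClassOf, LOCAL.ResearchReagent))

print(g.serialize(format="turtle"))
```

Because only the referenced term is copied in, the local ontology stays small and internally consistent while the term keeps its original identifier, so data annotated with it still lines up with the source ontology.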
The teams are forging ahead, building their networks and collaborating on protocols. For each team, this process has been a blend of searching out existing work, balancing alternative conceptual frameworks, and crafting a coherent structure for the two different domains of people and resources. “At first, we tried to map ontologies to create our data,” says Ying Ding, assistant professor of information science and core faculty member in cognitive science at Indiana University, noting that the VIVO ontology team actually learned how to add roles to identity. “But we have learned a lot by adopting best practices from eagle-i and other scientific ontologies. The VIVO ontology has evolved to support more nuanced roles, processes, and temporal bounds on relationships to support institutional needs and sustainable approaches to change over time.”
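As a rough illustration of what adding roles and temporal bounds to identity can look like in practice, the sketch below models a role as its own node with start and end dates, again using rdflib; the property and class names are invented for the example and are not the actual VIVO ontology terms.

```python
# Hedged sketch: a role modeled as its own node with temporal bounds.
# Property and class names are illustrative, not VIVO's real vocabulary.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

EX = Namespace("http://example.org/vocab/")
DATA = Namespace("http://example.org/people/")

g = Graph()

# Person -> role -> project, with the dates attached to the role
# rather than to the person or the project.
g.add((DATA.jane_doe, EX.bearerOf, DATA.jane_doe_pi_role))
g.add((DATA.jane_doe_pi_role, RDF.type, EX.PrincipalInvestigatorRole))
g.add((DATA.jane_doe_pi_role, EX.roleIn, DATA.network_grant))
g.add((DATA.jane_doe_pi_role, EX.startDate,
       Literal("2009-09-01", datatype=XSD.date)))
g.add((DATA.jane_doe_pi_role, EX.endDate,
       Literal("2011-08-31", datatype=XSD.date)))
```

Because the role is a node in its own right, the same person can hold different roles on the same project at different times without overloading a single person-to-project link.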
But not all information is readily accessible or freely given from external sources. “It’s actually been a fairly good struggle to get scientists to share resource metadata with us,” says Haendel. “From a scientist’s perspective, you want nothing more than to share sources. … But it’s very hard to take the time to dive through all of your data and then make it available. It’s not only time-consuming, but you’ve also given up authority over those materials. … It feels like hard work being given away.” Others, she says, have the opposite reaction and freely provide all kinds of resource information.
VIVO experiences similar pain points about collaboration and privacy, says Corson-Rikert, noting that some researchers are reluctant to consider the VIVO platform a place to store even public information about their research. While some information needs to be transparent for disambiguation, some people are sensitive about having their email addresses exposed or having the locations of their labs listed, for security reasons. Some institutions, such as Indiana University, have an opt-in policy, says Corson-Rikert, while others use an opt-out system, so privacy policies vary from institution to institution.
Building the networks also meant creating viable and sustainable ontologies. “Ontologies, by definition, are a never-ending development,” says Torniai. “From our perspective, we have continuous feedback coming from the data we are collecting.”
Another big issue is granularity, says Corson-Rikert. How much granularity is necessary or desired for a project, grant, or observation? A research center can have many levels of committees and project teams, he says, so decisions need to be made about what the ontologies must accommodate and how the inconsistencies are modeled. He says he is looking for that “happy medium,” where users can get all the data they need without causing an undue burden of entering it into the system.
“You can have a very expressive ontology but not necessarily annotate all data to that level of granularity,” says Haendel. “The ontology can be used to extract data so that you can query at a granular level.” One of eagle-i’s goals is to experiment with that kind of querying capability. “We’re examining the annotation at a very granular level versus using entity extraction versus using text-matching algorithms,” she says.
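As one illustration of that kind of query, the sketch below (with invented class names) uses a SPARQL property path in rdflib so that a search for a broad class also finds resources annotated with more specific subclasses.

```python
# Small sketch of querying annotated data at a chosen level of
# granularity with rdflib and SPARQL; the class names are invented.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/vocab/")

g = Graph()
g.add((EX.MonoclonalAntibody, RDFS.subClassOf, EX.Antibody))
g.add((EX.sample42, RDF.type, EX.MonoclonalAntibody))

# The property path lets a query for the broad class (Antibody) also
# find resources annotated with more specific subclasses.
query = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/vocab/>

SELECT ?resource WHERE {
  ?resource rdf:type/rdfs:subClassOf* ex:Antibody .
}
"""
for row in g.query(query):
    print(row.resource)  # finds ex:sample42 via the subclass path
```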
Finding Common Ground
“As we implement our respective systems for people and research resources, the degree of collaboration that can happen independently of specific technologies is notable,” says Corson-Rikert. Both eagle-i and VIVO are essentially applications built on an underlying ontology, where modeling styles and even terminology can be aligned to facilitate interoperability. “Now, based on the idea of an application being driven by the ontology, where the application is really just a delivery mechanism for data in a standard underlying format, the application can be updated or replaced without affecting the long-term investment in data,” he says. This is an effective way to move large-scale projects forward, he adds, so they can respond to new technology, whether distributed, federated, or whatever else emerges in the world of computer science. Data-intensive projects can then advance and be upgraded much more independently, he says.
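A toy sketch of that separation, with illustrative file names and URIs: one program writes its data out in a standard RDF serialization, and an entirely different program, written independently, reads it back without knowing anything about the first.

```python
# Sketch of the "application as a delivery mechanism" idea with rdflib:
# the data lives in a standard serialization (Turtle), so a later,
# unrelated tool can read it after the original application is replaced.
# File names and URIs are illustrative.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

PEOPLE = Namespace("http://example.org/people/")

# "Application A" writes its data out in a standard format...
producer = Graph()
producer.add((PEOPLE.jane_doe, RDFS.label, Literal("Jane Doe")))
producer.serialize(destination="profiles.ttl", format="turtle")

# ...and "Application B" reads the same file back later,
# without knowing anything about Application A.
consumer = Graph()
consumer.parse("profiles.ttl", format="turtle")
print(len(consumer))  # same triples, different application
```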
And the data is moving forward as well. A focus on common structure and transparent data definition is inherent to data on the semantic web. “The users can see everything about our data and know exactly what we are modeling and how they can use, adapt, or extend it,” says Corson-Rikert. “We guarantee that our data can talk to each other or to other applications because our applications are built with ontologies designed using international standards to enable data to talk to data.”
Year 2 of the NIH project is likely to bring more developments in technology and collaboration, the details of which are now being explored. Haendel points out that while the VIVO and eagle-i software will offer specific windows onto information about research and researchers, “We don’t know what the destiny of either VIVO or eagle-i will be. But we do know that we will have data and we will have ontologies. And if we build the moving parts to be active, so to speak, to be interoperable, these resources can be utilized by other groups and other people in ways we may not have anticipated.”