FEATURE
Looking for Good Art
Part 2: Image Retrieval
by David Mattison
Access Archivist, British Columbia Archives, Royal BC
Museum Corporation
The infrastructure behind art image databases forms the
core of Part 2 of this series of articles. The metadata
problem, both the lack and inaccuracy of information describing
image content, poses a significant challenge for both
searchers and image database developers. Without detailed
image content metadata, search options may be limited
to the kinds of "advanced" search options you
see on Google Images or AltaVista: text in the file name,
file size, file format (limited in Google to JPEG, GIF,
or PNG), and whether the image is color, grayscale, or
black and white. Google's Image Search FAQ indicates that
the search engine places more weight on the image's context
within a Web page, especially textual information in the
image hyperlink (the SRC attribute of the IMG tag).
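To make the idea of indexing an image by its context more concrete, here is a minimal Python sketch. It is not a description of how Google or AltaVista actually work; the class name ImageContextParser, the function index_images, and the sample page are all invented for illustration. It simply gathers the textual clues attached to each IMG tag: the file name in the SRC attribute and any ALT text.

```python
from html.parser import HTMLParser
from os.path import basename
from urllib.parse import urlparse


class ImageContextParser(HTMLParser):
    """Collect textual clues for every <img> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src") or ""
        self.images.append({
            "src": src,
            # Words in the file name often hint at what the picture shows.
            "file_name": basename(urlparse(src).path),
            "alt": attributes.get("alt") or "",
        })


def index_images(html):
    """Return one record of context text per image found in the HTML."""
    parser = ImageContextParser()
    parser.feed(html)
    return parser.images


# Invented sample page, for illustration only.
page = ('<p>Sunflowers by Vincent van Gogh</p>'
        '<img src="/art/van-gogh-sunflowers.jpg" alt="Sunflowers, 1888">')
records = index_images(page)
```

Here records holds a single entry whose file name is van-gogh-sunflowers.jpg and whose ALT text is "Sunflowers, 1888". Nothing in it comes from the pixels themselves, which is exactly the limitation of this kind of search.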
Image Search Engines
Image search engines fall into two categories. General
search engines such as Google and AltaVista index image
file format and property data, that is, metadata about
the image file itself, but not the image content directly.
The content connections come from the language surrounding
where the search engine found the image — in other
words, content-in-context indexing. Specialized search
engines attempt to analyze image content by applying
various techniques such as edge detection (object outline);
color, shape, and texture comparisons; and other basic
values and properties. Most content-based image search
engines work on a search principle called Query by Example.
The user selects from an existing set of images and
the search engine attempts to produce matches from its
image collection. In their review of content-based image
search engines, Gevers and Smeulders point out that
"As data sets grow big and the processing power
matches that growth, the opportunity arises to learn
from experience. Rather than designing, implementing,
and testing an algorithm to detect the visual characteristics
for each different semantic term, the aim is to learn
from the appearance of objects directly." Many
of the specialized search engines depend on the same
techniques used in identifying objects in other fields,
e.g., satellite detection, fingerprint matching, etc.
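The Query by Example principle can be sketched in a few lines of Python. This is a toy under stated assumptions, not the method of any system named in this article: it compares images by color alone, using a coarse color histogram and a histogram-intersection score, and the "images" are invented lists of red, green, and blue pixel values. Real systems add shape, texture, and edge features.

```python
from collections import Counter


def color_histogram(pixels, levels=4):
    """Normalized histogram of RGB pixels quantized to `levels` steps per channel."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bucket: n / total for bucket, n in counts.items()}


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical color mixes, 0.0 for disjoint ones."""
    return sum(min(share, hist_b.get(bucket, 0.0)) for bucket, share in hist_a.items())


def query_by_example(example, collection):
    """Rank the names in `collection` by color similarity to the example image."""
    target = color_histogram(example)
    scores = {name: similarity(target, color_histogram(pixels))
              for name, pixels in collection.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy "images": flat lists of (red, green, blue) pixels, invented for illustration.
collection = {
    "sunflower": [(250, 210, 40)] * 90 + [(30, 120, 40)] * 10,
    "seascape": [(20, 60, 200)] * 80 + [(240, 240, 240)] * 20,
    "wheat_field": [(230, 200, 60)] * 70 + [(90, 150, 230)] * 30,
}
example = [(245, 205, 50)] * 100  # a mostly yellow query image
ranking = query_by_example(example, collection)
```

With a mostly yellow example image, the ranking comes back as sunflower, wheat_field, seascape. Note that the wheat field ranks second purely on color, which hints at why color-only matching struggles to separate one yellow painting from another.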
As far as I could tell, most of the sites I listed
in Part
1's Table 1 (AlltheWeb Pictures, AltaVista, Ditto
Images, Google Image Search, Freenet.de Bildersuche,
Ithaki Multimedia Meta Search Engine, Ixquick Metasearch,
Lycos Multimedia Search, Picsearch, Yahoo! Image Search)
do not access any academic or art museum databases.
AltaVista, however, does let you directly query Corbis,
the commercial image agency started by Bill Gates in
1989 that includes a large body of historic art.
Nicolas G. Tomaiuolo provided a thorough, general overview
of image search engines in "When Image Is Everything"
(Searcher, January 2002; https://www.infotoday.com/searcher/jan02/tomaiuolo.htm).
He updated his article for a chapter on the same topic
in his book The Web Library (Information Today,
Inc., 2004); the Web site for the book includes additional
sites for the two chapters that cover image searching
and art images [http://www.ccsu.edu/library/tomaiuolon/theweblibrary.htm].
Content-Based Image Retrieval: The Holy Grail of Information
Retrieval
Content-Based Image Retrieval (CBIR) is a large field,
taking in other domains such as computer vision and
pattern recognition. For a sense of how the content-based
image retrieval field evolved, you might want to look
at these four surveys of early and current CBIR systems
and technological issues:
Content-Based Image Retrieval: An
Overview by Theo Gevers and Arnold W. M. Smeulders
(Faculty of Science, University of Amsterdam, June 2003)
[http://carol.science.uva.nl/~gevers/pub/overview.pdf].
Content-Based Multimedia Information
Handling: Should We Stick to Metadata? by Paul Lewis,
David Dupplaw, and Kirk Martinez (Cultivate Interactive,
February 2002) [http://www.cultivate-int.org/issue6/retrieval/].
Content-Based Image Retrieval Systems:
A Survey by Remco C. Veltkamp and Mirela Tanase
(Utrecht University, March 8, 2001) [http://www.aa-lab.cs.uu.nl/cbirsurvey/cbir-survey/cbir-survey.html],
lists all known CBIR systems as of late 2000.
Content-Based Image Retrieval: A Report
to the JISC Technology Applications Programme by
John P. Eakins and Margaret E. Graham (Institute for
Image Data Research, Northumbria University, Newcastle,
January 1999) [http://www.unn.ac.uk/iidr/report.html].
For more background information on the field, try
academic and private research institutes involved in
CBIR via their publications pages, site-search engines,
e-print servers, institutional repositories, or through
new kinds of academic search services such as OAIster
(now indexed by Yahoo! Search). SFUjake [http://mercury.lib.sfu.ca/~tholbroo/sfujake-mason/search.html]
can identify specific journals devoted to image recognition
and retrieval, where they're indexed, and their availability
in full text. The Association for Computing Machinery
(ACM) Portal [http://portal.acm.org]
also offers a fruitful resource for current and past
research in CBIR.
Some of the European, British, and U.S. CBIR research
centers include:
The Institut National de Recherche en
Informatique et en Automatique (INRIA, France) IMEDIA
Project [http://www-rocq.inria.fr/imedia/index_UK.html]
and its IKONA software. One of the demonstration IKONA
databases contains art images [http://www-rocq.inria.fr/cgi-bin/imedia/ikona].
Germany's Center for Computing Technologies
(TZI) [http://www.tzi.de]
offers a demonstration of its PictureFinder application
[http://www.tzi.de/bv/pfdemo]
for large image databases. It was designed for the "automatic
indexing and annotation of images in a specific domain"
[http://www-agki.tzi.de/bv/projects/index.html?project=picturefinder&site=short&lang=en].
Greece's Informatics and Telematics
Institute [http://www.iti.gr]
has developed a number of CBIR applications, including
the SCHEMA Network of Excellence in Content-Based
Semantic Scene Analysis and Information Retrieval
[http://www.schema-ist.org/SCHEMA/]
and ISTORAMA: Content Based Image Retrieval over
the Internet [http://uranus.ee.auth.gr/Istorama/].
Although the two SCHEMA demonstration systems [http://media.iti.gr/site/Schema/schema.php
and http://media.iti.gr/SchemaRS/systems/xm/index.html]
utilize photographs of natural objects, the SchemaRS
system, based on MPEG-7, an audio-visual content description
standard [http://www.chiariglione.org/mpeg/standards/mpeg-7/mpeg-7.htm],
contains photographs of art objects contributed by Fratelli
Alinari.
From Italy's Istituto Trentino di Cultura
comes COMPASS (Computer-Aided Search System)
[http://compass.itc.it],
"a distributed application for content-based image retrieval
using remote databases." Among the downloadable demonstration
databases is one of fine art images (Windows, Linux,
and Sun Solaris versions).
Among the achievements of the Intelligent
Sensory Information Systems Research Group, University
of Amsterdam [http://www.science.uva.nl/research/isis/isisNS.html]
in the realm of Web searches for images are PicToSeek
and PicToVision. You can try these out through the Java-based
demonstration ZOMAX site [http://zomax.wins.uva.nl:5345/zomax/
or http://www.science.uva.nl/research/isis/zomax/].
The Intelligence, Agents, Multimedia
Group (IAM) [http://wwwcosm.ecs.soton.ac.uk/],
Electronics and Computer Science Department, University
of Southampton, U.K., continues to generate a wealth
of e-prints on CBIR, and links to various projects.
The Institute for Image Data Research
[http://www.unn.ac.uk/iidr],
Northumbria University, Newcastle, U.K., worked on a
visual search CBIR tool from 2000 to 2002 for the AHDS
Visual Arts image collections.
Along with several other content-based
multimedia retrieval projects, Columbia University's
Digital Video/Multimedia Laboratory [http://www.ee.columbia.edu/dvmm]
created the demonstration WebSEEK: A Content-Based
Image and Video Search and Catalog Tool for the Web
[http://www.ctr.columbia.edu/WebSEEk/
and http://www.ee.columbia.edu/dvmm/researchProjects/MultimediaIndexing/WebSEEK/WebSEEK.htm].
The lab also collaborated with the university's teachers'
college to create a visual arts teaching tool called
EdSearch that incorporates user sketches as part of
the database query.
The University of California at Berkeley's
Digital Library Project, working with Berkeley's Computer
Vision Group [http://elib.cs.berkeley.edu/kobus/famsf/model_2/text_and_blobs/bbox.html],
developed an unnamed, Java-based demonstration image-content
browser using artworks from the Fine Arts Museums of
San Francisco.
The RIEMANN Project (Research
on Intelligent Media Annotation) [http://wang.ist.psu.edu/IMAGE/],
also described as "automatic linguistic indexing of
pictures," contains a sample database of art images
and features past and current CBIR work by professors
James Z. Wang and Jia Li at Pennsylvania State University.
Dr. Wang "developed an art image retrieval system for
the Stanford University Libraries ... [and] later worked
for the IBM QBIC project." Professor Wang's most recent
investigation into applying machine learning techniques
for image retrieval, Advancing Digital Imagery Technologies
for Asian Art and Cultural Heritages [http://art.ist.psu.edu],
started in August 2002. The project also brings together
some of his collaborators' work and includes a version of
SIMPLIcity (Semantics-sensitive Integrated Matching for
Picture Libraries) [http://wang.ist.psu.edu/~jwang/amico],
a demonstration database that uses thumbnails from the
AMICO (Art Museum Image Consortium) collection.
Experimenting with Free Image Retrieval Software
If you'd like to try content-based image retrieval
and have the right computer components, imgSeek
[http://imgseek.sourceforge.net], an open source
application, contains some of the functionality used
in IBM's QBIC technology. The related imgSeekNet
project [http://imgseek.sourceforge.net/net/],
available in prototype form on client-server architecture,
hopes to create "a distributed content-based image search
engine or peer to peer network." The GIFT software
(GNU Image Finding Tool) [http://www.gnu.org/software/gift/
or http://savannah.gnu.org/projects/gift]
from the University of Geneva is another open source
CBIR system based on the "query by example" model. These
products appear to work best with representational photographic
images, rather than fine art images. While not a content-based
image retrieval tool and designed specifically for digital
photographs, Eamonn Coleman's Windows-based free PixVue
image management software [http://www.pixvue.com]
deserves mention because of its support for JPEG and
TIFF metadata, along with PixVue's integration into
Windows Explorer.
More Image Database Search Tools
As of July 2, 2004, the University of Michigan's OAIster
[http://oaister.umdl.umich.edu/o/oaister/]
service contained more than 3.3 million records from
307 institutions gathered through the Open Archives Initiative
Protocol for Metadata Harvesting (OAI-PMH) [http://www.openarchives.org].
Besides the ability to limit a search to a specific
media type such as an image, the home page promises
that "instead of just the catalog records of a slide
collection of van Gogh's works, users will be able to
view images of the actual works." With only keyword
queries available, except for the media type, you should
keep your subject words general and few. A test search
on July 23, 2004, for the resource type "image" and
the subject "art" produced 4,791 records, not all of
them relevant.
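For readers curious about what harvesting through OAI-PMH looks like, the following Python sketch builds a ListRecords request and filters a response for image records. The verb ListRecords and the metadata prefix oai_dc come from the protocol itself; everything else is an assumption made for illustration, including the repository address, the set name "art", the function names, and the sample XML, which is a simplified fragment rather than a complete OAI-PMH response. It does not show how OAIster itself is built.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace of the unqualified Dublin Core elements used by oai_dc records.
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request for unqualified Dublin Core."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


def image_titles(xml_text):
    """Return the titles of records whose dc:type mentions 'image'."""
    root = ET.fromstring(xml_text)
    titles = []
    for record in root.iter("record"):
        types = [t.text.lower() for t in record.iter(DC + "type") if t.text]
        if any("image" in t for t in types):
            titles.extend(t.text for t in record.iter(DC + "title"))
    return titles


# Hypothetical repository address; no network request is made here.
url = list_records_url("https://repository.example.org/oai",
                       set_spec="art", from_date="2004-07-01")

# Simplified, invented fragment. Real responses wrap each record in
# OAI-PMH namespaced elements with headers and datestamps.
sample = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record><dc:title>Sunflowers</dc:title><dc:type>Image</dc:type></record>
  <record><dc:title>Exhibition catalog essay</dc:title><dc:type>Text</dc:type></record>
</records>"""
titles = image_titles(sample)
```

The filter is only as good as the dc:type values that contributing institutions supply, which is one reason a search limited to the resource type "image" can still return records that are not relevant.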
Institutional and learning object repositories, most
of them at academic facilities and some searchable through
OAIster, provide yet another resource category worth
investigating. Cultural institutions such as art museums
with staff engaged in active research may adopt the
institutional repository model for archiving research.
[For an excellent overview, see Miriam A. Drake's "Institutional
Repositories: Hidden Treasures," Searcher,
May 2004 at https://www.infotoday.com/searcher/may04/drake.shtml.]
For example, you can explore art-related materials at
MERLOT (Multimedia Educational Resource for Learning
and Online Teaching) [http://www.merlot.org],
GEM (Gateway to Educational Materials) [http://www.thegateway.org/],
and collections such as Blue Web'n and Filamentality
that form part of the SBC's Knowledge Network Explorer
[http://www.kn.pacbell.com/].
Imaging and Image Retrieval Conferences and Software
Vendors
To stay on the cutting edge of visual arts digitization
research, go to EVA Conferences International
(Electronic Imaging & the Visual Arts) [http://www.eva-conferences.com].
This umbrella site for "a cross-sectoral, multi-disciplinary,
local & global set of events for people interested
in new technologies in the cultural sector" dates back
to 1990, when the European Commission established its
VASARI project (Visual Arts System for Archiving &
Retrieval of Images), aimed at developing a system of
"very high quality digital imaging direct from paintings
for conservation purposes." The EVA site, developed
by the VASARI organization (not the project), offers
access to conference papers back to 1998.
Two other international conferences in this field
are the International Conference on Image and Video
Retrieval [http://www.civr.org],
and the International Conference on Pattern Recognition
[http://www.ee.surrey.ac.uk/icpr2004/].
I also recommend DigiCULT [http://www.digicult.info]
for the European perspective on art image digitization
initiatives within the art museum and gallery community.
In North America, D-LIB Magazine [http://www.dlib.org]
performs a similar service in identifying new art image
databases and conferences.
Here are some commercial CBIR systems:
LTU Technologies [http://www.ltutech.com],
an Anglo-French-American company, partnered with Corbis
on a demonstration database for its Image-Seeker
product [http://corbis.ltutech.com/].
Idée Inc. [http://www.ideeinc.com],
a Canadian company, markets Espion, a visual search
and image management product whose image retrieval capabilities,
according to a product sheet, encompass "collections
of visual images that may contain photographs, video,
graphics, sketches, illustrations, drawings, etc."
Digital library collections management systems are
well worth exploring for art image collections. These
digital content management systems go by a variety of
names: digital library or archival management systems,
institutional repository software, digital asset management
systems (DAMS), and museum collections management systems
or software. A large proportion of the publicly accessible
digital library collections contain digitized photographs
and art images. One company to watch is contentDM
[http://contentdm.com],
connected to OCLC, which features a Customer Collections
page with a category devoted to Art and Drama. The Brigham
Young University BYU Museum of Art Collection
[http://www.lib.byu.edu/hbll/moa/],
for example, contains nearly 9,000 images. With most
contentDM collections you can easily switch from one
collection to another. Art information systems designed
for commercial galleries, such as the collections management
suite sold by Artsystems Ltd. [http://www.artsystems.com],
may also yield some interesting finds.
The Measure of All Art: Metadata and Catalog Systems
To explore some of the practical issues surrounding
art cataloging, e.g., those issues raised in the interviews
with UCAI's project team, read Sherman Clarke's Art
Cataloging [http://artcataloging.net].
Clarke's site mainly covers art-related name authority
issues and links to other art cataloging resources.
Besides associations and organizations such as the
Getty Research Institute [http://www.getty.edu/research/],
other groups maintain a proprietary interest in the
art image cataloging and metadata standards process:
The small but internationally influential
American Library Association's Committee on Cataloging:
Description and Access [http://www.libraries.psu.edu/tas/jca/ccda/
and http://www.ala.org/ala/alctscontent/catalogingsection/catcommittees/ccda/ccda.htm]
works up the ALA's position on changes to the Anglo-American
Cataloguing Rules, Second Edition, 2002 Revision
(AACR2R). If you're trying to keep ahead of the AACR2R
curve on handling digital images, this is the place
to begin.
The Visual Resources Association's new
initiative, Cataloguing Cultural Objects: A Guide
to Describing Cultural Works and Their Images (Draft,
May 2004) [http://www.vraweb.org/CCOweb/index.html],
a data content standard like AACR2R, also covers 2-D
and 3-D artwork.
The Society of American Archivists' Visual
Materials Section [http://www.lib.lsu.edu/SAA/VMhome.html]
maintains some links to archival image cataloging resources.
The RSLP Collection Description
project [http://www.ukoln.ac.uk/metadata/rslp/],
based at the University of Bath's UKOLN (U.K. Office
for Library and Information Networking), established
a metadata standard and software for use by U.K. research
libraries.
Although the U.K.'s Museum Documentation
Association (MDA) [http://www.mda.org.uk/],
like the distributors of the Anglo-American Cataloguing
Rules, sells its primary museum documentation standard,
Spectrum, in a print and electronic version,
you'll find free access to some thesauri and vocabularies
developed for U.K. museums and a large array of links
to other international documentation standards on the
wordHoard page [http://www.mda.org.uk/wrdhrd1.htm].
Conclusion
Despite the allure of content-based image retrieval,
accurate, valid, standardized, and detailed metadata
is the key to the precision recall of online art images.
As pointed out by Christine L. Sundt and the Union Catalog
of Art Images team, whether it's general image search
engines such as Google Images or large-scale image databases
such as ARTstor and the UCAI project, it all comes down
to the depth and quality of the metadata. In large image
databases, I can see that the ability to begin a query
or to filter search results by specifying image content
attributes (for example, "show me only pictures
that contain a round yellow object that resembles a
flower") will narrow the possibilities down, but
no amount of statistical inference and machine learning,
without prior or concurrent human intervention in the
description (cataloging), search, or retrieval processes,
can confirm or distinguish an amateur painting of a
sunflower from the masterpiece by Vincent van Gogh.
Of course art and artistic images represent only a
small subset of the overall problem when searching for
online images. Still, I believe we're a long way from
the kind of artificial intelligence that would
permit a machine to consistently and reliably distinguish
a Rembrandt from a Renoir. The remarkable achievements
of researchers such as Professor James Z. Wang and his
many colleagues around the world in the fields of computer
vision, pattern recognition, machine learning, and content-based
image retrieval, nevertheless, all contribute and help
redefine the possible when it comes to searching for
good art.
The author's opinions do not necessarily reflect those
of his employer.