FEATURE
Infoviz for Info Pros: Information Visualization Software Tools
by Judith Gelernter, Consultant
Search tools swamp us with relevant results, but we usually
only glance at the first few. Why do we tolerate information
gaps? Perhaps we should challenge our partial results
and not leave to a relevance algorithm the judgments we
might make better for ourselves. Information visualization
(or infoviz) software groups a wider range of results
according to visual cues such as color, size, and shape
in order to help us analyze results at a glance and judge
what to overlook and what to examine.
Not everyone values data pictured graphically. Psychologists
explain that preference for graphics stems from an individual's
makeup. A strong left hemisphere of the brain may reflect
superior language skills and logical thinking, whereas
advanced visual processes are attributed to a strong
right hemisphere. Ornstein puts it succinctly in The
Right Mind, 1997, when he links left hemisphere
dominance to text and right hemisphere dominance
to context. Neurologists tend to view hemispheric
asymmetry as an oversimplification of human intelligence,
yet the model is accepted widely enough to extract some
truth from these generalizations [1].
Info pros, regardless of their preference for text
or context, should be aware of the growing availability
and acceptance of context-type infoviz tools. To ease
the transition from old to new models of display, most
current infoviz applications include text alongside
data. The infoviz concept of grouping large data sets
graphically even appears in text-only tools such as
Vivísimo and TripleHop's MatchPoint with results
grouped in categories by subject. However, word categories
do not show data trends as easily as visual clusters.
Infoviz is not mainstream. These days, likely as not,
the terms "infoviz" and "visualization" are used incorrectly.
For example, the title of a June 2003 Economist article
("Grokking the Infoviz") seems to imply that
infoviz is a sort of visual representation of cyberspace,
which of course it is not. As another example, a February
2004 searchenginewatch.com article described search
engines Vivísimo and Zapmeta as visual, presumably
because these two allow data visualization in ways other
than via the standard list [2].
In fact, neither Vivísimo nor Zapmeta exemplifies
information visualization. Will infoviz ever go mainstream?
The impression of readers such as yourselves may help
determine its fate.
A Young Field
Visualization, according to Inxight CTO Ramana Rao,
is "an ingredient technology, not an application in
itself." Infoviz comprises 2-D visualization (including
GIS programs such as MapQuest), 3-D visualization (such
as the Visible Human Head Browser from the University
of Michigan), multidimensional data (such as the HomeFinder
application developed at the University of Maryland),
hierarchical data (as demonstrated by Inxight), and
temporal data (as displayed by Vision) [3].
This article concentrates on 2-D visual interfaces to
the sort of information sources that constitute the
daily diet of the info pro.
It has been said that information visualization branched
off from scientific visualization only 10 or 15 years
ago. Infoviz may depict either physical or abstract
data, whereas visualization in science mostly depicts
physical data (see http://www.infovis.net/E-zine/2003/num_112.htm).
One could maintain, however, that infoviz is more than
a decade old and that the present generation of interfaces
derives from spreadsheet software. A visible calculator,
VisiCalc, emerged in the late 1970s with
data organized in columns and rows. The point of the
spreadsheet was to offer an overview of the data. It
was so successful in the '70s that it became the incentive
for many to purchase a personal computer. In the late
1980s, Excel was one of the first programs released
for the Windows operating system [4].
Only this spring in a telephone interview (dated 3/9/04),
anacubis product manager Paul Stefan named Excel as
one of his product's greatest competitors.
A trusted voice in the interdisciplinary infoviz arena
is that of Yale computer science professor emeritus
Edward Tufte. Professor Tufte discusses strategies to
reconcile display complexity and clarity in his books
such as The Visual Display of Quantitative Information (now
in its second edition). Creators of computer interfaces
have adapted these strategies. (See http://www.edwardtufte.com/tufte/index
for information on single-day
courses on the design of information.)
A more software-oriented approach appears in the writings
and graduate courses of University of Maryland Computer
Science Professor Ben Shneiderman. Professor Shneiderman
is advisory editor for the journal Information Visualization,
established in 2002. Along with Ben Bederson, he edited The
Craft of Information Visualization: Readings and Reflections,
2003, a collection of papers by scholars in the field.
As Professor Shneiderman put it in the introduction
to The Craft of Information Visualization, the
point of infoviz is to "improve the experience of people
using computers, and make that experience more effective
and enjoyable." The economics of selling these software
packages leans on "effective." Infoviz companies aim
to add value by showing a large amount of data on an
information map that can be assimilated at a glance.
The challenge is to develop easy-to-learn systems.
Experimental and mathematical psychologist James Wise
has studied human color perception and determined that
hue alone does not make nearly as effective an impression
as does hue along with brightness and saturation of
color against a background. And color is only one component
of visual language. No language is self-evident, and
invariably learning a language involves dealing with
some cultural component [5].
The testing ground for standardizing visual options
may be the desktop. The standard WIMP system (Windows,
Icons, Menus, and Pointers) has seen no significant
changes since the 1980s. The next version of Windows
scheduled for release in 2006 will employ an animated
desktop. If infoviz integrates toolbars with browsers
on the desktop, it might help conquer the learning curve
and make a market splash. A product which adopts the
visual language of Microsoft, such as the lowercase
"e" of Internet Explorer to symbolize a Web site, will
move ahead of the curve. Even so, the eventual dominance
of one infoviz product or another will depend upon the
effectiveness of its marketing as well as the product's
usefulness and how easy it is to learn.
A January 12, 2004, Wall Street Journal article
reported the expansion of the market for visualization
software. The article speculates that this might be
due to more powerful PCs that in turn allow more powerful
programs. Processing power was the limiting feature
on visualization as recently as 5 years ago. But in
today's tight economy, some executives low on staff
turn to infoviz software for overall corporate pictures
or sales analysis [6]. The Wall
Street Journal prediction is echoed by those
in the information industry. A January 2004 article
in Information Today quotes projections for
2004 from Clare Hart of Factiva and Allen Paschal,
formerly of Gale, that predict the increased use and
popularity of visualization tools (Information Today,
vol. 21, no. 1, January 2004, pp. 1, 13, 21-26, 29).
Information professionals have had mixed reactions
to infoviz. Our own Barbara Quint, editor of Searcher,
mentioned this spring that she hesitates to adopt infoviz
tools because of her preference for verbal over visual
presentation. Other left-brainers with similar preferences
should rest assured that text options accompany visuals
in the present generation of infoviz products. In an EContent article
from last year, Mary Ellen Bates wrote that she values
visualization tools to answer the more general questions
she receives on market trend analysis. (See "Search Show Offs," EContent,
vol. 26, no. 6, June 2003, p. 27.) A skeptical Stephen
Arnold wrote for Searcher ("In Search of . . . the Good
Search: The Invisible Elephant," Searcher, vol.
11, no. 3, March 2003, pp. 40-51) that he believes
such tools "won't do much more than turn off-point hits into
an interesting picture." He emphasized that we really
need better focus for off-point hits, preferably the
result of cleaning up data and improving metadata. His
remarks magnify the ideas of Donald Beagle, who declared
in his article "Visualization of Metadata" (Information
Technology and Libraries, vol. 18, no. 4, December
1999, http://www.lita.org/cfapps/archive.cfm?path=ital/1804_beagle.html)
that "visualization research has advanced further and
faster on the interface side than on the content side"
and that results will improve when metadata processing
improves.
But what kind of results do we seek? Do info pros
use search tools as quick reference agents to find specific
answers to pointed questions or for research to see
what we can find about a topic? Ron Miller remarked
in an April 2004 article on visual search: "...[S]trengths
lie in the research tool market as opposed to pure play,
search-and-find tools." [7]
Infoviz advertising pitches products at research rather
than quick reference, playing up the "information discovery"
qualities of "decision-making software." A few industries
have begun to see the utility of infoviz for very large
data sets. Karl Fast of the University of Western Ontario, who
gave an infoviz presentation at this year's ASIST conference,
may turn out to be right in his optimism for the future
of this industry. (For a transcript of "Information
Visualization: Failed Experiment or Future Revolution?,"
presented at the 5th Annual ASIST Information Architecture
Summit, February 27-29, 2004, see http://www.livingskies.com/writings/2004/ia-summit/.)
Infoviz in Action
Infoviz products can display results quickly when
words match words. That is, a text query might be matched
against an object's metadata rather than run through
the full text of a document or compared with the object
itself. While speeding the search, this places a burden
on the accuracy of the metadata. Images generally include
metadata in text form to speed up the matching. Tools
are not yet sophisticated enough to match the term "apple"
to a red or green or yellow roundish form.
Currently there is an engine in development to search
three-dimensional objects in which the user sketches
what he seeks [8].
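The metadata matching described above can be illustrated with a minimal Python sketch. The class, field, and function names are hypothetical and do not come from any product discussed here; the point is only that the query is compared with short text tags, so an image that shows an apple but is not tagged "apple" goes unfound.

```python
from dataclasses import dataclass, field


@dataclass
class MediaObject:
    """An item in a collection; all names here are hypothetical."""
    title: str
    keywords: set[str] = field(default_factory=set)  # text metadata


def match_metadata(query: str, collection: list[MediaObject]) -> list[MediaObject]:
    """Return objects whose metadata contains every query term.

    Only the keyword set is inspected, never the object itself,
    so an untagged object cannot be found.
    """
    terms = {t.lower() for t in query.split()}
    return [
        obj for obj in collection
        if terms and terms <= {k.lower() for k in obj.keywords}
    ]


collection = [
    MediaObject("orchard.jpg", {"apple", "tree", "red"}),
    MediaObject("still_life.jpg", {"fruit", "bowl"}),  # shows an apple, but is not tagged
]
hits = match_metadata("apple", collection)
print([obj.title for obj in hits])
```

The second image is missed even though it pictures an apple, which is the burden on metadata accuracy noted above.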
The newness of the software adds to the complexity
of the industry. A single company may produce a range
of related products with versions of the flagship product
continually updated and improved. Young infoviz companies
share technologies among themselves. Furthermore, while
some infoviz tools include their own search mechanisms,
others work alongside search engines to uncover results
and then fit the results to their own categories and show
these categories visually. Such complexities, collaborations,
and partnerships should simplify over time.
Graphic environments differ with different infoviz
tools. KartOO and anacubis use a maplike space. Mooter
is very basic in its use of line. TouchGraph Technology
LLC, the basis for applications such as Inxight, TheBrain,
and ThinkMap, and the similar linear-looking Spotfire
appear more like charts [9].
Grokker and Fractal:PC rely on abstract shape and decorative
color. Panopticon uses a type of heat map in which colors
represent data records in rectangular "countries." Antarctica
uses color to show overlapping subjects. Autonomy and
Omniviz offer a selection of graphic environments.
Function varies as widely as graphic display. Spotfire
is used in academia and also by industrial chemists
and biologists. Fractal:Edge provides visual interfaces
to information providers, financial services, manufacturers,
utilities, and telecommunications. Enterprise products
such as Inxight, Nstein, Panopticon, Autonomy, and Antarctica
play to a larger corporate market.
I have limited this article's examples to KartOO,
Grokker, and anacubis, products that have small-business
or desktop versions and whose company representatives
made themselves available for interviewing. Though similar
in function, the three are not equivalent. KartOO v.
4 is dedicated to Web search; Grokker will look out
to the Web or into a local hard drive. anacubis will
search the Web or a local information system or drive
and can deftly combine different information sources
into a single view. To help readers compare products,
I submitted the same search term to all three and set
Grokker and anacubis to overlay Google (not an option
available for KartOO). These conditions allow a comparison
of categories and Web sites retrieved and show how relevancy
works. Please remember that despite the equivalent search,
we are in a functional sense comparing apples to oranges
in that each product has its own strongest applications
and best recipes for success.
KartOO
The French company KartOO, founded in 2001, specializes
in information retrieval, knowledge management, site monitoring,
and visual interfaces. The name suggests cartography and
art and echoes the Internet double "oo" of Yahoo and
Google. The metasearch engine is open to a general audience,
while the genie that flashes as the engine works its magic
should appeal especially to children. The imagery changes
regularly according to the graphic artist's ideas. Past Kartoons
are on view in an online gallery at http://www.kartoo.net/a/en/visuels03.html.
KartOO v. 4 was released in November 2003.
In a personal communication, marketing manager Alexandre
Dos Santos explained a major company goal:
For years now, WWW surfers have been disappointed
by the quality of search engine results. One day we
asked ourselves the question: what does a relevant result
mean? It all depends on the question and context. For
example, if a user types the word "car," what kind of
site do we have to offer? Pages about the car manufacturers,
rental agencies, and car collectors. A traditional search
engine returns all that in a linear list of numbered
sites, without asking more of the user. But if I'm looking
for rental car, I will be very disappointed to find
a list of manufacturers and collectors. We need to alter
the experience in two ways. First, [it must] yield results
by search themes and if possible to position them in
relation to each other, with links to the sites with
those themes. This is where the idea of the map comes
from. Second, we must help the user refine her search,
always remembering that she is not a Boolean mathematician
and that she shouldn't have to learn all the advanced
syntax.
The metasearch engine KartOO can query Voila, AlltheWeb,
AltaVista, MSN, Yahoo!, WiseNut, HotBot, Lycos, Nomade,
Toile de Québec, Exalead, Dmoz, Teoma, and LookSmart.
In default automatic search mode, KartOO bases its choice
of search engine on the language and syntax of the query
term(s). Otherwise, in manual mode, the user chooses
which search engine(s) KartOO should query. KartOO retrieves
results from the first page of an engine's display list
and from additional pages if necessary to meet a minimum
of 50 results for a map. KartOO then determines the
number of sites to be displayed based on score and relevance.
The graphics of the map are taken into memory to speed
display. Users may consult a history of searches, print
a map, send a link for a map to an e-mail address, or
download a KarTOOlbar to give easy access to the main
functions.
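The retrieval step described above (read the first result page, continue to further pages only until 50 results are in hand, then choose what to display by score) might be sketched as follows. This is an illustration under stated assumptions, not KartOO's code: fetch_page, fake_engine, the score field, and the display limit of 12 are all invented for the example.

```python
from typing import Callable

MIN_RESULTS = 50  # minimum pool size the article gives for one map


def gather_results(fetch_page: Callable[[int], list[dict]],
                   minimum: int = MIN_RESULTS,
                   max_pages: int = 10) -> list[dict]:
    """Read result pages until `minimum` results are collected.

    `fetch_page(n)` is a hypothetical stand-in for an engine query; it
    returns the hits on page n, or an empty list when none remain.
    """
    results: list[dict] = []
    for page in range(1, max_pages + 1):
        hits = fetch_page(page)
        if not hits:
            break
        results.extend(hits)
        if len(results) >= minimum:
            break
    return results


def pick_for_map(results: list[dict], limit: int = 12) -> list[dict]:
    """Keep the highest-scoring sites; the limit of 12 is an assumption."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:limit]


def fake_engine(page: int) -> list[dict]:
    """Simulated engine: 20 hits per page, 3 pages in total."""
    if page > 3:
        return []
    start = (page - 1) * 20
    return [{"url": f"site{i}.example", "score": 100 - i}
            for i in range(start, start + 20)]


pool = gather_results(fake_engine)
shown = pick_for_map(pool)
print(len(pool), len(shown), shown[0]["url"])
```

With 20 hits per page, the loop stops after the third page, as soon as the pool passes the 50-result minimum.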
Figure 1 below shows a map for a search on "infoviz."
The larger the icon, the more relevant the site. Here
the icons are Web sites and the smaller adjoining icons
are pages from the same Web site. To save space, the map
may omit a word that a site has in common with the
query. KartOO divides the first
1214 results into categories including (information)
visualization, graphics, topics, and projects. Mouse over
the category "visualization" to see how it relates to
other categories and search results (see Figure 2 on
page 56).
Should you wish to delve into related search categories,
choose "> next map" on screen at the bottom
right to yield a new set of categories and related sites
(see Figure 3 on page 57).
Alternatively, change the result display from graphic
map to text list. Or keep the map and turn your attention
to the categories listed in the left-most column and
site addresses in the right-most column (Figure 3).
Grokker
Grokker's name comes from the invented verb "to grok,"
meaning to understand profoundly through intuition. The term
was coined by Robert Heinlein in his 1961 science fiction
novel, Stranger in a Strange Land. The name of
the company, Groxis, is an elision of "grokking systems."
Grokker, just like KartOO, has been developed for
a general audience. According to CTO Jean-Michel Decombe,
"My goal is to make a beautiful system so that people
feel happy about using it" (personal communication).
To its credit, Grokker is not just another pretty interface.
Its colors organize document clusters on any general
subject.
The technology for information retrieval is built
in four layers, with the foundation layer holding data
graphed with nodes and links. Above is the acquisition
layer with plug-ins to allow the program to work with
different information sources. Above that is the augmentation
layer that adds categories to the data to enable clustering.
On top is the transformation and visualization layer
that uses metadata to filter results and then display
those results in a map.
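A toy Python sketch can make the four layers concrete. It assumes a plain dictionary as the node-and-link store and a keyword lookup for categorizing; every function and field name is hypothetical and none is taken from Grokker itself.

```python
from collections import defaultdict


def new_graph() -> dict:
    """Foundation layer: data held as nodes and links."""
    return {"nodes": {}, "links": []}


def acquire(graph: dict, source_name: str, records: list[dict]) -> None:
    """Acquisition layer: a plug-in loads records from one source."""
    for rec in records:
        graph["nodes"][rec["url"]] = {**rec, "source": source_name}


def augment(graph: dict, rules: dict[str, str]) -> None:
    """Augmentation layer: attach a category to each node and link them.

    `rules` maps a keyword to a category name; a real system would use
    far richer analysis than this keyword lookup.
    """
    for url, node in graph["nodes"].items():
        text = node["title"].lower()
        node["category"] = next(
            (cat for word, cat in rules.items() if word in text), "Other")
        graph["links"].append((node["category"], url))


def visualize(graph: dict, min_rank: int) -> dict[str, list[str]]:
    """Transformation and visualization layer: filter on metadata,
    then group what remains into the clusters a map would draw."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for category, url in graph["links"]:
        if graph["nodes"][url]["rank"] >= min_rank:
            clusters[category].append(url)
    return dict(clusters)


graph = new_graph()
acquire(graph, "web", [
    {"url": "a.example", "title": "Information Visualization Lab", "rank": 3},
    {"url": "b.example", "title": "Graphics Toolkit", "rank": 1},
    {"url": "c.example", "title": "Visualization Journal", "rank": 2},
])
augment(graph, {"visualization": "Visualization", "graphics": "Graphics"})
clusters = visualize(graph, min_rank=2)
print(clusters)
```

Each function touches only the layer beneath it, which is what would let a new source be added as a plug-in without changing how categories are assigned or maps are drawn.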
Grokker 2.1 can retrieve results from several engines
at once (Yahoo!, MSN, AltaVista, FAST, and WiseNut)
and includes a plug-in for Google. The company
plans to start releasing plug-ins regularly to allow
users to query specialized databases such as LexisNexis.
It will also launch a beta version of a software development
kit that offers Grokker's APIs to those who wish to
build custom connections to other information sources.
Future developments aim to speed grokking. The long-term
plan is for the software to use sounds and sensations
to draw the user's attention to relevant documents.
Grokker may be used to search an information source
or local drive. In the example below, it overlays Google.
It takes only a few trial clicks to determine what the
control buttons do, so first-time users are unlikely
to need the "Learning the Basics" screen presented when
the program opens. The user chooses whether the map
should occupy half, all, or none of the screen. Results
display in a standard text list when users select the
option for the map to occupy none of the screen. The
visually inclined user sets text, range, color, and
site-type filters before entering a term into the search
box and clicking "grok." The grok might take a few seconds
if the search is in the computer's cache or up to a
minute for a phrase not attempted before. As the results
come in, the quivering colored balls that represent
categories appear one by one, jiggling and jostling
each other as they make way for themselves within the
spherical boundary. Larger-sized balls indicate either
more items in a category or the greater relevance of
a category to the search term.
A search on "infoviz" over Google with Grokker yielded
Figure 4 (see page 58).
Sixteen first-level categories come up for the 460
items retrieved; the whole subject hierarchy requires
415 categories. Visualization is the largest category:
the most relevant with the most results. Selecting the
Visualization category shows the action of zooming in
on that sphere (see Figure 5 on page 58).
This map thus enlarges the Visualization sphere in
the previous map, retaining the outline of the exterior
Infoviz sphere and the pea-green color in the scheme
as shown in the previous map. Each sphere represents
a category open for further mining. Squares represent
sites. Mouse over a site square to pull up an overview
of the site with name, description, location, domain,
source, and rank to help judge whether to jump to that
site. See, for example, Figure 6, "Understanding Information
Collections with Maps and Visualizations," on page 59.
Maps may be shared or saved in .gxml format that allows
viewing only in Grokker. Version 2.2 works with Windows,
Mac OS, and Linux. A 30-day free download for the PC
is available on the Web with support available via e-mail.
anacubis
The i2 Group released investigative analysis software
13 years ago to aid national and international law enforcement
and intelligence organizations. Now anacubis, a privately
held subsidiary of the i2 Group, has a related product
designed specifically for applications in law and business:
mergers and acquisitions, risk management, competitive
intelligence, patent analysis, and other forms of market
research. The name anacubis derives from Analytical
Cubism, a painting style developed by Picasso and Braque
in which the different parts of the image are deconstructed
into their components and given equal ground.
anacubis offers two products. The first, anacubis Desktop 2.0,
a visual research and analysis product, was released this
spring at prices starting from $1,950 per seat.
Plug-ins offer analysis for specific applications such
as searching intellectual property and patents and are
sold at $750 per subscription. The second is a Web-based
version that has similar visual capabilities but less
analysis functionality and is the solution used for
displaying search results from Google and other information
sources mentioned below.
Both products match user criteria to metadata from
commercial information providers, Web sites, or in-house
data, consolidating data with filters and linguistic
analysis and clustering results using preset taxonomies.
Results display on an XML map in peacock, hierarchy,
or group form animated with a Java applet. Selected
information services from D&B, Hoover's, LexisNexis,
QuestelOrbit, and Google work with the anacubis
interface.
Using the Web-based version, the user sets the number
of sites for display using anacubis' Google-Enabled
Visual Search at http://www.anacubis.com/googledemo/google/index.asp,
a free demonstration released by anacubis to showcase
its visualization technology. Figure 7 (see page 60)
shows a peacock form result screen for "infoviz." Retrieved
objects in anacubis are termed entities and returned
as unexpanded icons. Here the user chose to expand the
golden "e" Web site for "Information Visualization at
Pacific Northwest National Laboratories." The URL displays
when the cursor hovers over the site, as Figure 7 shows.
Green lines indicate linked sites. The "expand linked
sites" command brings up sites linked to the site selected.
Alternatively, the user may bring up sites that are
similar but not necessarily linked to the sites selected.
Red lines indicate similarity, and the "expand similar
sites" command yields the Figure 8 screen (see page
60). The double colons or vertical bars that appear in
some site titles are an artifact of the publicly available
API provided by Google and integrated by anacubis into
the visual search. They appear when a page title on the
Web site concerned uses characters
that the API does not understand.
The anacubis system does not organize results in intermediate
categories as do KartOO and Grokker. Instead, it offers
a "Find text" feature to search within a map. This helps
draw the eye to relevant results in maps with a large
number of entities. Enter "visualization" into the "Find
text" dialogue box.
As you can see in Figure 9 on page 61, the Find search
can direct the gaze to sites with "visualization" in
the title.
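The "Find text" behavior described above, which highlights matching entities without removing the rest of the map, can be sketched in a few lines of Python. The entity structure and function name are assumptions made for illustration, and two of the three titles are made up.

```python
def find_text(entities: list[dict], needle: str) -> list[dict]:
    """Flag entities whose title contains `needle`, ignoring case.

    The full list is returned, so the map keeps every entity and only
    the highlighting changes.
    """
    needle = needle.strip().lower()
    return [
        {**entity, "highlight": bool(needle) and needle in entity["title"].lower()}
        for entity in entities
    ]


map_entities = [
    {"title": "Information Visualization at Pacific Northwest National Laboratories"},
    {"title": "Example Search Portal"},        # made-up title
    {"title": "Example Visualization Group"},  # made-up title
]

for entity in find_text(map_entities, "visualization"):
    marker = "*" if entity["highlight"] else " "
    print(marker, entity["title"])
```

Flagging rather than filtering matches the article's description: the eye is drawn to relevant entities while the surrounding links stay visible.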
A left click on an entity produces another map with
sites on more specific topics; a right click produces
the option of visiting the Web page. The user may zoom
into or pan out of a view, add or delete links, move
nodes around on a map, or, as demonstrated above, search
within a map for specific entities or link labels. Internal
information can be dragged onto anacubis Desktop.
The anacubis Desktop is available for a 10-day trial
at http://www.anacubis.com/products/desktop.
A free download called View Manager can reveal charts
created in the Desktop. The company recently announced
a new search demonstration based on Hoover's data at
http://www.anacubis.com/hoovers.
Both the Google and Hoover's searches are also integrated
into the anacubis Desktop as free information sources.
Conclusion
Infoviz software uses quantitative data to reveal
trends probably undetectable in raw textual or numerical
output. Content retrieval research lags behind visualization
to the extent that, at this point, queries and not graphics
limit the infoviz software's power.
Of the three products surveyed here, KartOO and Grokker
use generic taxonomies for general documents. Grokker
allows users to create their own categories to apply
the tool for more specific data sets. The KartOO genie
and Grokker animated spheres endear them to the young.
The anacubis Desktop, with its nuanced choices and specialized
taxonomy, is geared to those in business and finance.
Infoviz tools tend to be developed for a specific
application or audience and then expanded to a more
general application or wider audience. This approach
in the marketplace is considered slow diffusion rather
than killer app, just as the i2 Group began within law
enforcement and expanded into anacubis legal and business
applications.
When looking to widen the market, developers must
also consider the colorblind, the graphically challenged,
and most of us who instinctively resist a new look and
prefer the familiar. But with an animated Windows desktop
due for release in 2006, visual language might be borrowed
from Microsoft and the way paved for the public to encounter
graphical interfaces commonly. Today's infoviz market
picture is encouraging, even while it remains blurry.
How does infoviz add value?
Comprehensive. Displays very large data sets
and affords a concise overall picture.
Context. Patterns and trends are displayed
that would not be otherwise discernable.
Colorful. Pleasing to the eye.
Endnotes
1 Robert E. Ornstein, The Right Mind: Making Sense of the Hemispheres,
New York, 1997. P. M. Corballis, "Visuospatial processing
and the right-hemisphere interpreter," Brain and Cognition,
vol. 53, no. 2, November 2003, pp. 171-176.
2 "Grokking the Infoviz," Economist, vol. 367, no. 8329, June 19, 2003, http://economist.com/science/tq/displayStory.cfm?story_id=1841120.
Chris Sherman, "ZapMeta: A Promising New Meta
Search Engine," dated February 26, 2004, at http://searchenginewatch.com.
3 See "Making Information
More Accessible: A Survey of Information Visualization
Applications and Techniques" by Gary Geisler, last updated
January 31, 1998, at http://www.ils.unc.edu/~geisg/info/infovis/paper.html
and a list of information visualization software from
the University of Maryland at http://www.cs.umd.edu/hcil/pubs/products.shtml.
4 D. J. Power, "A
Brief History of Spreadsheets," DSSResources.COM,
World Wide Web, http://dssresources.com/history/sshistory.html,
version 3.5, October 4, 2003.
5 On color, see James
A. Wise, "The Ecology of Colour," Inf@Vis!, No. 129,
September 15, 2003, at http://www.infovis.net/E-zine/2003/num_129.htm.
On visual language, see Juan C. Dürsteler, "Visual
Language," Inf@Vis!, No. 120, May 5, 2003,
http://www.infovis.net/E-zine/2003/num_120.htm.
Also see the dissertation by Yuri Engelhardt, The Language
of Graphics, 2002.
6 Jeanette Borzo,
"Get the Picture: In the Age of Information Overload,
Visualization Software Promises to Cut through the Clutter," Wall
Street Journal, January 12, 2004, p. R4.
7 Ron Miller, "Get
the Picture: Visualizing the Future of Search," EContent,
vol. 27, no. 4, April 2004, p. 35.
8 Brian Bergstein,
"Researchers Develop 3D Search Engine," ExtremeTech,
April 16, 2004, http://www.extremetech.com/article2/0,1558,1569245,00.asp.
9 http://touchgraph.sourceforge.net/index.html.
TouchGraph has developed a Java browser for Google at
http://www.touchgraph.com/TGGoogleBrowser.html.