FEATURE
A Web-Based Database of CIA
Declassified Documents on the Vietnam War
By Vinh-The Lam and Darryl Friesen
During the
Vietnam War years (1960-1975), the U.S. government
generated a large volume of classified documents.
The declassification of these documents started with
Executive Order No. 11652 signed by President Richard
Nixon in 1972 [1]. Part of that executive order is
on the Web [www.fas.org/sgp/eprint/legacy_appendix.html].
Thousands of these documents, formerly classified as "Confidential," "Secret," and "Top
Secret," are being declassified, made public, and made
available for educational and research purposes. Primary
Source Microfilm published the documents on microfiche as the
Declassified Documents Reference System (DDRS). The microfiche
are abstracted, indexed, and published in a bimonthly
periodical titled Declassified Documents Catalog (DDC).
The DDC is now also published as a CD-ROM
by Thomson Gale, while the DDRS is available
through subscription on the Internet at www.ddrs.psmedia.com/.
Recently, the Vietnam Center of Texas Tech University
in Lubbock, Texas, through its Virtual Vietnam Archive
(VVA) [www.vietnam.ttu.edu/virtualarchive/], began
providing access to a large number of full-text declassified
documents. The Declassified CIA Documents on the Vietnam War database [http://library.usask.ca/Vietnam] is the result of a sabbatical leave research project approved and supported
by the University of Saskatchewan, Canada. It includes only declassified documents
created by the U.S. Central Intelligence Agency (CIA). It provides an in-depth
indexing of the CIA declassified documents and, where possible, also provides
a link to the full-text documents available at the VVA and offers both simple
and advanced search capabilities.
DATABASE STRUCTURE
Each record in the database contains the following
fields:
Record Number: Automatically created by system.
Title: Title of document.
Date of Creation: Date document was created.
Date of Declassification: Date document was
declassified.
Type of Document: Type of document, e.g.,
Report, Memorandum, Cable, etc.
Level of Classification: Level of classification
of document before it was declassified; only four terms
will be used: CONFIDENTIAL, SECRET, TOP SECRET, and
NOT GIVEN.
Status of Copy: Status of copy of document;
only two terms will be used: ORIGINAL and SANITIZED.
Pagination: Number of pages and illustrations,
such as maps.
Abstract: Abstract of contents of document;
taken mostly from the CD-ROM published by Thomson
Gale.
Indexing Terms: Controlled vocabulary (words,
phrases) describing topics presented in document.
DDRS Location: Document identifier showing
location of document in the Declassified Documents
Reference System.
Link to Full Text: URL of the full-text document
on the VVA Web site, if available.
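As a rough illustration of the record layout above, the fields can be sketched as a small Python data class. The attribute names and the validation step are assumptions made for this sketch; only the two controlled lists (four classification levels, two copy statuses) come from the field descriptions. Record Number is left out because the system assigns it.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled values named in the field list above.
CLASSIFICATION_LEVELS = {"CONFIDENTIAL", "SECRET", "TOP SECRET", "NOT GIVEN"}
COPY_STATUSES = {"ORIGINAL", "SANITIZED"}


@dataclass
class DocumentRecord:
    """One database record; attribute names are illustrative only."""
    title: str
    classification_level: str
    copy_status: str
    creation_date: Optional[str] = None
    declassification_date: Optional[str] = None
    document_type: Optional[str] = None
    pagination: Optional[str] = None
    abstract: Optional[str] = None
    indexing_terms: Optional[str] = None
    ddrs_location: Optional[str] = None
    full_text_url: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject values outside the two controlled lists.
        if self.classification_level not in CLASSIFICATION_LEVELS:
            raise ValueError(
                f"unknown classification level: {self.classification_level!r}"
            )
        if self.copy_status not in COPY_STATUSES:
            raise ValueError(f"unknown copy status: {self.copy_status!r}")
```

Checking the two controlled fields at entry time is one simple way to keep limit-by-classification and limit-by-status searches reliable.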
DOCUMENT INDEXING AND DATABASE CONTENTS
The main reason we created this database is the DDC's
lack of in-depth indexing. The very detailed indexing
provided by the Carrollton Press for the Declassified
Documents Retrospective Collection, published
in 1976, was abandoned when Carrollton Press began publishing
the Declassified Documents Quarterly Catalog,
which preceded the DDC. Research Publications
adopted this practice for the DDC. When
Primary Source Microfilm replaced Research Publications
as publisher of DDC, it continued this
practice. As a result, a very limited number of indexing
terms are used in the DDC:
Vietnam
    Armed Forces
    Foreign relations with -
    Politics and government
    Religion
Vietnam, North
    Commerce
    Foreign relations with -
    Military policy
Vietnam, South
    Armed forces
    Commerce
    Commerce with -
    Economic conditions
    Foreign relations with -
    Politics and government
    Religion
    Social conditions
Vietnamese Conflict, 1961-1975
    Campaigns
    Missing in action
    Peace negotiations
    Prisoners of war
Topical searches such as searches for personal names,
place-names, names of operations/battles, and titles
of U.S. and/or Vietnamese government projects/programs,
which would be very useful for Vietnam War scholars/researchers,
are impossible.
We decided, therefore, to provide an in-depth content
analysis of the documents. Full-text documents were
analyzed thoroughly page-by-page so that names of people
(politicians, military leaders), operations/battles,
military units (U.S., Allied, South Vietnamese, North
Vietnamese, Viet Cong divisions, regiments, battalions),
projects/programs, place-names (provinces, cities,
towns, valleys, mountains, rivers) could be picked
up and used as indexing terms.
For example, a search for the most important Communist
offensive of the war, the Tet Offensive, retrieves
a screen showing the number of results and, for each document,
its ID number and hyperlinked title. (See Figure
1 above.)
The search for Tet Offensive retrieves 63 documents;
one for the famous U.S. 101st Airborne Division yields
six; for Khe Sanh, location of the bloodiest battle
between the U.S. Marines and the North Vietnamese divisions,
28; and for General Duong Van "Big" Minh, leader of
the military coup that overthrew the Ngo Dinh Diem
government on November 1, 1963, 133 documents.
In addition to in-depth indexing, we also tried to
achieve consistency for indexing terms assigned to
records throughout the whole database in order to maximize
retrieval. We decided to give personal names
in the non-inverted form: Duong Van Minh instead
of Minh, Duong Van, and Robert McNamara, not McNamara,
Robert. Since one of the co-authors is of Vietnamese
origin, we detected and corrected wrongly spelled Vietnamese
names in documents. South Vietnamese government program
titles were translated into English. Sometimes both
English and Vietnamese forms of the program titles,
if already familiar within the Vietnam War research
community, were used as equivalent indexing terms,
as with Returnee Program and Chieu Hoi Program. When
the database was populated with about 500 records,
we conducted a thorough review and revision of all
indexing terms to detect and correct typos and inconsistencies.
We did a second review/revision when the database reached
the 1,000-record level. The index now contains 3,461
terms and its complete listing is 101 pages long.
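The consistency rules described above can be sketched in a few lines of Python. This is only an illustration, not the procedure the authors followed: the function names and the lookup table are hypothetical, and the table holds just the one pair of equivalent terms mentioned in the text.

```python
from typing import List

# Hypothetical equivalence table; the article names only this one pair.
EQUIVALENT_TERMS = {
    "Chieu Hoi Program": "Returnee Program",
    "Returnee Program": "Chieu Hoi Program",
}


def to_direct_order(name: str) -> str:
    """Turn 'Minh, Duong Van' into 'Duong Van Minh'; leave other names alone."""
    head, sep, tail = name.partition(",")
    if not sep:
        # No comma: the name is already in direct order; just tidy spacing.
        return " ".join(name.split())
    return " ".join((tail + " " + head).split())


def expand_term(term: str) -> List[str]:
    """Return the term plus its equivalent form, if one is recorded."""
    other = EQUIVALENT_TERMS.get(term)
    return [term, other] if other else [term]
```

For instance, `to_direct_order("McNamara, Robert")` gives `"Robert McNamara"`, and `expand_term("Returnee Program")` also yields the Vietnamese form of the title.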
The database currently contains 1,080 records, 34
percent of which provide a link to the full-text documents
available online at the VVA. The documents analyzed
range from one to a few hundred pages. A document may
be a Memo, a Telegram, a Report (weekly, monthly, etc.),
a Situation Report (or SitRep), a Biographical Sketch,
a National Intelligence Estimate (or NIE), a Special
National Intelligence Estimate (or SNIE), or a Research
Study Report. Sometimes, when an important event was
occurring, such as the Tet Offensive, the CIA produced
Intelligence memos on a daily or even hourly basis.
(See Figure 2 on page 32.)
After the Johnson administration decided to send
combat troops to South Vietnam in 1965, the CIA produced
weekly and monthly reports, called "The Situation in
South Vietnam," which gave details on the political, military,
and economic situation of South Vietnam.
In the "Political Situation" sections, the reports
give detailed information on the activities of the South
Vietnamese government, such as cabinet reshuffles,
inauguration/development/changes of government programs/projects,
and deliberations within the National Assembly. Also
included is information on activities of political
parties and their leaders, on rumors of possible coups,
and on local/provincial/national elections. In the "Military
Situation" sections, the reports give detailed accounts
of operations/battles engaging U.S., Allied, South
Vietnamese, North Vietnamese, and Viet Cong units,
as well as their casualties and weapon losses.
The "Economic Situation" sections report important
economic indicators, such as the retail price index
(especially prices of rice and pork) and weekly and
monthly prices of gold and currency in the Saigon free
market. About 150 such reports are now included in
the database. These reports are extremely useful for
researchers who want to draw a chronological picture
of South Vietnam during the war years, especially between
1965 and 1968. Another series of reports presents monthly
evaluations of the cost-effectiveness of Operation Rolling
Thunder, the sustained U.S.
bombing of North Vietnam. Still another series details
the level of North Vietnamese Army infiltration into
South Vietnam. A close look at those reports, together
with the NIEs and SNIEs on Vietnam, will help database
users understand how U.S. policy on Vietnam was conceived
and implemented. A large number of these declassified
documents are sanitized, with sources of information
and names of informants removed to protect them.
DATABASE DESIGN
We used Microsoft SQL Server 2000 as the database
server. In addition to being an outstanding relational
database server, the rich full-text search capabilities
it offers made it an excellent choice for this project.
The database consists of a single table, although
some normalization could have been done, especially
with respect to the indexing terms. However, considering
the few data elements in the table, the relatively
small number of documents indexed, and the strength
of the SQL server's search capabilities, we favored
a simple design.
All columns in the table, with the exception of DocumentID,
are variable-length character data (varchar). DocumentID
is an auto-incrementing integer value, managed by the
SQL Server, and used as the primary key for the table.
The CreationDate and DeclassificationDate fields were
initially standard SQL datetime data types, but had
to be changed to character fields because of a bug
in one of the underlying software components.
The following SQL statement was used to create the
table in the database:
CREATE TABLE DeclassifiedDocuments (
    DocumentID int IDENTITY (1, 1) NOT NULL,
    Title varchar(512) NOT NULL,
    CreationDate varchar(15) NULL,
    DeclassificationDate varchar(15) NULL,
    DocumentType varchar(254) NULL,
    ClassificationLevel varchar(50) NULL,
    CopyStatus varchar(10) NULL,
    Pagination varchar(100) NULL,
    Abstract varchar(8000) NULL,
    Descriptors varchar(8000) NULL,
    DDRS_Location varchar(50) NULL,
    URL varchar(254) NULL
)
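To see the auto-assigned key at work without a SQL Server installation, the single-table design can be tried in Python's built-in sqlite3 module. This is a stand-in, not the production setup: SQLite's INTEGER PRIMARY KEY AUTOINCREMENT plays the role of IDENTITY (1, 1), every varchar column becomes TEXT, and the helper function and sample titles are made up for the sketch.

```python
import sqlite3

# SQLite stand-in for the SQL Server table; column names follow the article.
SCHEMA = """
CREATE TABLE DeclassifiedDocuments (
    DocumentID INTEGER PRIMARY KEY AUTOINCREMENT,
    Title TEXT NOT NULL,
    CreationDate TEXT,
    DeclassificationDate TEXT,
    DocumentType TEXT,
    ClassificationLevel TEXT,
    CopyStatus TEXT,
    Pagination TEXT,
    Abstract TEXT,
    Descriptors TEXT,
    DDRS_Location TEXT,
    URL TEXT
)
"""


def add_document(conn: sqlite3.Connection, title: str, level: str, status: str) -> int:
    """Insert one record and return the key the database assigned to it."""
    cur = conn.execute(
        "INSERT INTO DeclassifiedDocuments (Title, ClassificationLevel, CopyStatus) "
        "VALUES (?, ?, ?)",
        (title, level, status),
    )
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
# Made-up titles, used only to show that keys are handed out in sequence.
first_id = add_document(conn, "Sample memorandum", "SECRET", "SANITIZED")
second_id = add_document(conn, "Sample cable", "NOT GIVEN", "ORIGINAL")
```

The caller never supplies DocumentID; the database hands out the next integer, which is what lets the data entry form omit the field.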
All access to the database, including data entry
and other administrative functions, is done using a
Web browser. The Web-based user interface was written
in the PHP programming language. PHP has experienced
a rapid growth in popularity in recent years, due in
part to its excellent handling of textual data (such
as the data sent and received by Web browsers and
servers) and its database support.
The Web server is a Sun UltraEnterprise 2 server
running the Solaris 8 operating system and a recent
version of the Apache Web server software. An open
source product called FreeTDS allows the Unix Web server
to communicate with the Microsoft SQL Server directly
using the Tabular Data Stream (TDS) protocol. TDS is
the native protocol used by Microsoft and Sybase for
their database products. Although still something of a
fledgling product, FreeTDS is a workable solution for
establishing connectivity between UNIX machines and
Microsoft or Sybase database servers.
Administrative functions (adding, modifying, and
deleting records) are also performed using the
Web browser. The administrative features, located in
a secure area on the Web server, are password-protected.
The administrative interface closely resembles the
public view, with the addition of links in both the
brief and full records display that allow the document
to be easily modified or deleted. A new record can
be added by simply clicking the "New Record" button
located at the top of the screen. In addition to being
able to browse or search for records requiring modification,
a quick edit feature, located in the upper right, is
available for documents for which the Document ID number
is known. A simple Web-based form is used for data
entry and record editing. (See Figure 3 above.)
DATABASE NAVIGATION
The default view is an alphabetical listing of all
indexed documents, shown in a brief record format.
Included in the brief citation are the document title, creation
date, declassification date, type of document, level
of classification, and status of copy.
The number of documents displayed in the brief format
is limited to 50 per page. A drop-down menu provides
easy access to all indexed documents. Limiting the
display in this manner, rather than listing all 1,080
documents at once, significantly decreases the time
it takes a Web browser to load the page and, from a
usability point of view, increases the functionality
of the database.
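The arithmetic behind the 50-per-page limit is simple enough to sketch. The function names below are hypothetical and the article does not show how the PHP interface computes page boundaries; this is just one straightforward way to do it.

```python
import math
from typing import Tuple

PAGE_SIZE = 50  # brief records per page, as stated in the article


def page_count(total: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to list `total` records."""
    return math.ceil(total / page_size)


def page_bounds(page: int, total: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return the (first, last) 1-based record positions shown on a 1-based page."""
    if not 1 <= page <= page_count(total, page_size):
        raise ValueError("page out of range")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total)
    return first, last
```

With the 1,080 records currently in the database, this works out to 22 pages, the last of which holds records 1,051 through 1,080.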
Clicking the title brings up a full record display
for the selected document. Included in this view are
fields not shown in the brief display, including pagination,
abstract, indexing terms, DDRS location, and URL. (See
Figure 4 below.)
The URL, if present, links to the full text of
the document in the Virtual Vietnam Archive. Indexing terms
are also hyperlinked, and clicking one term will return
all documents sharing that indexing term.
RETRIEVAL MECHANISM
Microsoft SQL Server 2000 allows full-text indexes
to be defined on selected columns in a table. This
permits complex searches to be executed against any
of the columns in the index, or all the columns at
once. Boolean operators, phrase searching, word stemming,
weighting, proximity searching, and wildcard operators
are all supported. Unlike traditional indexes defined
on columns, SQL Server full-text indexes reside outside
the database on the server's local file system. Thus,
additional steps must be taken to populate them.
Index population can be scheduled to occur at any
time. In the case of this database, a full population
and rebuilding of the index occurs once a week during
off hours (5 a.m. on Saturday), and an incremental
population happens hourly during the period when data
entry might normally occur (weekdays between 7 a.m.
and 7 p.m.). This schedule ensures that the full-text
index is up-to-date with any additions or changes to
database records. The document title, document type,
level of classification, status of copy, creation date,
declassification date, abstract, and indexing terms
fields are all included in the full-text index.
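The population schedule can be restated as a small decision function. In practice the scheduling is handled by the database server itself, so the Python below is only a way of making the rules explicit; the function name is hypothetical, and treating both the 7 a.m. and 7 p.m. runs as included is an assumption.

```python
from datetime import datetime


def population_due(now: datetime) -> str:
    """Return 'full', 'incremental', or 'none' for the given moment."""
    if now.minute != 0:
        return "none"  # populations start on the hour
    weekday = now.weekday()  # Monday is 0, Saturday is 5
    if weekday == 5 and now.hour == 5:
        return "full"  # weekly rebuild, 5 a.m. on Saturday
    if weekday < 5 and 7 <= now.hour <= 19:
        return "incremental"  # hourly on weekdays, 7 a.m. to 7 p.m.
    return "none"
```

Under this schedule, a record entered on a weekday becomes searchable within about an hour, while one entered outside those hours waits for the next scheduled run.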
The full-text index is utilized by both simple keyword
and advanced searches. In the case of the simple keyword
search, located conveniently at the top of almost every
page, all terms entered by the user are joined with
the Boolean AND operator and a search is performed
across all fields in the full-text index.
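The simple keyword search described above amounts to turning the user's input into a single search condition. The sketch below builds such a condition in the form accepted by SQL Server's CONTAINS predicate. The article does not show the actual PHP code or query, so the function name, the query text, and the `?` placeholder style are assumptions.

```python
def simple_search_condition(user_input: str) -> str:
    """Join the user's terms with AND into a CONTAINS-style search condition."""
    # Drop double quotes so each term can be safely wrapped in its own pair.
    terms = [t.replace('"', "") for t in user_input.split()]
    terms = [t for t in terms if t]
    if not terms:
        raise ValueError("no search terms given")
    return " AND ".join(f'"{t}"' for t in terms)


# Assumed query shape: CONTAINS(*, ...) searches every column in the
# full-text index; the condition is passed as a bound parameter.
QUERY = "SELECT DocumentID, Title FROM DeclassifiedDocuments WHERE CONTAINS(*, ?)"
```

For the input `tet offensive`, the condition becomes `"tet" AND "offensive"`, so a record matches only if both words occur somewhere in its indexed fields.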
The advanced search is quite powerful, allowing
more control over what is searched for and how,
as well as over the limits that are applied. Users can
construct complex queries that combine specific phrases with
a list of terms, all of which must appear, and limit the results
by classification level, copy status, full-text availability,
and date ranges for both the document creation
and declassification dates.
Refining the search using these advanced options
decreases the result set from 46 (for a simple keyword
search on TET OFFENSIVE) to five. (See Figure 5 at
left.)
The search terms themselves can be considered optional,
and queries making use of just the limiting features
are acceptable.
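One way to picture the advanced search is as a function that assembles a WHERE clause from whichever options the user filled in. The sketch below is hypothetical throughout: the function name, the `?` placeholders, and the idea that a missing link is stored as NULL are all assumptions. The date range limits are left out because the article does not say how dates are formatted in the character columns.

```python
from typing import List, Optional, Tuple


def _quote(text: str) -> str:
    """Wrap a term or phrase in double quotes for a CONTAINS condition."""
    return '"' + text.replace('"', "") + '"'


def advanced_search(
    phrases: Optional[List[str]] = None,
    terms: Optional[List[str]] = None,
    classification_level: Optional[str] = None,
    copy_status: Optional[str] = None,
    full_text_only: bool = False,
) -> Tuple[str, list]:
    """Build a parameterized query; every argument is optional."""
    where, params = [], []
    parts = [_quote(p) for p in (phrases or [])] + [_quote(t) for t in (terms or [])]
    if parts:
        # All phrases and terms must appear, so they are joined with AND.
        where.append("CONTAINS(*, ?)")
        params.append(" AND ".join(parts))
    if classification_level:
        where.append("ClassificationLevel = ?")
        params.append(classification_level)
    if copy_status:
        where.append("CopyStatus = ?")
        params.append(copy_status)
    if full_text_only:
        where.append("URL IS NOT NULL")  # assumes missing links are NULL
    sql = "SELECT DocumentID, Title FROM DeclassifiedDocuments"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params
```

Because the full-text condition is added only when phrases or terms are given, a call such as `advanced_search(copy_status="SANITIZED")` still produces a valid query, which mirrors the point that the search terms themselves are optional.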
This online database was designed to give Vietnam War
scholars/researchers an efficient tool for searching
declassified CIA documents on specific
topics, with some possibility of retrieving full-text
documents. (See Figure 6 below.) The Web-based user
interface, written in the PHP programming language,
makes searching, retrieval, and navigation
easy and smooth. As CIA classified
documents continue to be declassified, and with a firm
commitment from the University of Saskatchewan Library,
this database will continue to grow.
Notes
[1] Morehead, Joe and Mary Fetzer. Introduction
to United States Government Information Sources. 4th
ed. Englewood, Colo.: Libraries Unlimited,
1992. p. 376.
Vinh-The
Lam [vinhthe.lam@usask.ca] is librarian cataloguer,
Technical Services Division, University of Saskatchewan
Library and Darryl Friesen [darryl.friesen@usask.ca]
is programmer analyst, Information Technology Services
Division, University of Saskatchewan Library.