FEATURE
A Web-Based Database of CIA Declassified Documents on the Vietnam War
By Vinh-The Lam and Darryl Friesen
 During the
                            Vietnam War years (1960-1975), the U.S. government
                            generated a large volume of classified documents.
                          The declassification of these documents started with
                          Executive Order No. 11652 signed by President Richard
                          Nixon in 1972 [1]. Part of that executive order is
                          on the Web [www.fas.org/sgp/eprint/legacy_appendix.html].
Thousands of these documents, formerly classified as "Confidential," "Secret," and "Top
Secret," are being declassified and made available to the public for
educational and research purposes. Primary Source Microfilm
published the documents on microfiche as the Declassified Documents
Reference System (DDRS). The microfiche
are abstracted and indexed in a bimonthly
periodical titled Declassified Documents Catalog (DDC).
The DDC is now also published on CD-ROM
by Thomson Gale, while the DDRS is available
through subscription on the Internet at www.ddrs.psmedia.com/.
                          Recently, the Vietnam Center of Texas Tech University
                          in Lubbock, Texas, through its Virtual Vietnam Archive
                          (VVA) [www.vietnam.ttu.edu/virtualarchive/], began
                          providing access to a large number of full-text declassified
documents.
The Declassified CIA Documents on the Vietnam War database [http://library.usask.ca/Vietnam] is the result of a sabbatical leave research project approved and supported
by the University of Saskatchewan, Canada. It includes only declassified documents
created by the U.S. Central Intelligence Agency (CIA). The database provides in-depth
indexing of these documents, offers both simple and advanced search capabilities,
and, where possible, links to the full-text documents available at the VVA.
                          DATABASE STRUCTURE
                          Each record in the database contains the following
                          fields:
                         
Record Number: Created automatically by the system.
Title: Title of the document.
Date of Creation: Date the document was created.
Date of Declassification: Date the document was declassified.
Type of Document: Type of document, e.g., Report, Memorandum, Cable.
Level of Classification: Level of classification of the document before it was declassified; only four terms are used: CONFIDENTIAL, SECRET, TOP SECRET, and NOT GIVEN.
Status of Copy: Status of the copy of the document; only two terms are used: ORIGINAL and SANITIZED.
Pagination: Number of pages and illustrations, such as maps.
Abstract: Abstract of the contents of the document; taken mostly from the CD-ROM published by Thomson Gale.
Indexing Terms: Controlled vocabulary (words, phrases) describing the topics presented in the document.
DDRS Location: Document identifier showing the location of the document in the Declassified Documents Reference System.
Link to Full Text: If available, the URL of the full text of the document at the Web site of the VVA.
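As an illustration only, the record structure above could be modeled as follows. This is a minimal Python sketch, not the authors' code (their interface was written in PHP); the class name and attribute names are hypothetical, and only the two controlled vocabularies are taken from the field list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Controlled values named in the field list above.
CLASSIFICATION_LEVELS = {"CONFIDENTIAL", "SECRET", "TOP SECRET", "NOT GIVEN"}
COPY_STATUSES = {"ORIGINAL", "SANITIZED"}


@dataclass
class DocumentRecord:
    """One database record; attribute names are illustrative assumptions."""
    record_number: int
    title: str
    classification_level: str
    copy_status: str
    creation_date: Optional[str] = None
    declassification_date: Optional[str] = None
    document_type: Optional[str] = None
    pagination: Optional[str] = None
    abstract: Optional[str] = None
    indexing_terms: Tuple[str, ...] = ()
    ddrs_location: Optional[str] = None
    full_text_url: Optional[str] = None  # present only when the VVA holds the full text

    def __post_init__(self) -> None:
        # Reject values outside the two controlled vocabularies.
        if self.classification_level not in CLASSIFICATION_LEVELS:
            raise ValueError(f"unknown classification level: {self.classification_level!r}")
        if self.copy_status not in COPY_STATUSES:
            raise ValueError(f"unknown copy status: {self.copy_status!r}")
```

Validating the two controlled fields at construction time is one simple way to keep stray values out of fields that are later used as search limits.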
                                                    DOCUMENT INDEXING AND DATABASE CONTENTS
The main reason we created this database is the DDC's
lack of in-depth indexing. Carrollton Press provided very detailed
indexing for the Declassified
Documents Retrospective Collection, published
in 1976, but abandoned it when it began publishing
the Declassified Documents Quarterly Catalog,
which preceded the DDC. Research Publications
kept the reduced indexing for the DDC, and
Primary Source Microfilm continued the practice when it replaced
Research Publications as publisher. As a result, only a very
limited number of indexing
terms are used in the DDC:

Vietnam
    Armed Forces
    Foreign relations with -
    Politics and government
    Religion
Vietnam, North
    Commerce
    Foreign relations with -
    Military policy
Vietnam, South
    Armed forces
    Commerce
    Commerce with -
    Economic conditions
    Foreign relations with -
    Politics and government
    Religion
    Social conditions
Vietnamese Conflict, 1961-1975
    Campaigns
    Missing in action
    Peace negotiations
    Prisoners of war

                          Topical searches such as searches for personal names,
                          place-names, names of operations/battles, and titles
                          of U.S. and/or Vietnamese government projects/programs,
                          which would be very useful for Vietnam War scholars/researchers,
                          are impossible.
We decided, therefore, to provide an in-depth content
analysis of the documents. Full-text documents were
analyzed thoroughly, page by page, so that names of people
(politicians, military leaders), operations/battles,
military units (U.S., Allied, South Vietnamese, North
Vietnamese, Viet Cong divisions, regiments, battalions),
projects/programs, and place-names (provinces, cities,
towns, valleys, mountains, rivers) could be picked
up and used as indexing terms.
                          For example, a search for the most important Communist
                          offensive of the war, the Tet Offensive, retrieves
                          a screen showing the number of results, the document's
                          ID number, and a hyperlinked document title. (See Figure
                          1 above.)
                          The search for Tet Offensive retrieves 63 documents;
                          one for the famous U.S. 101st Airborne Division yields
                          six; for Khe Sanh, location of the bloodiest battle
                          between the U.S. Marines and the North Vietnamese divisions,
                          28; and for General Duong Van "Big" Minh, leader of
                          the military coup that overthrew the Ngo Dinh Diem
                          government on November 1, 1963, 133 documents.
                          In addition to in-depth indexing, we also tried to
                          achieve consistency for indexing terms assigned to
                          records throughout the whole database in order to maximize
retrieval. We decided to give personal names
in the non-inverted form: Duong Van Minh instead
of Minh, Duong Van, and Robert McNamara instead of McNamara,
Robert. Because one of the co-authors is of Vietnamese
origin, we were able to detect and correct misspelled Vietnamese
names in documents. South Vietnamese government program
                          titles were translated into English. Sometimes both
                          English and Vietnamese forms of the program titles,
                          if already familiar within the Vietnam War research
                          community, were used as equivalent indexing terms,
                          as with Returnee Program and Chieu Hoi Program. When
                          the database was populated with about 500 records,
                          we conducted a thorough review and revision of all
                          indexing terms to detect and correct typos and inconsistencies.
                          We did a second review/revision when the database reached
                          the 1,000-record level. The index now contains 3,461
                          terms and its complete listing is 101 pages long.
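The article does not say how the review of indexing terms was carried out, and it may well have been done by hand. Purely as a sketch of how such a consistency check might be automated, the Python below groups terms that differ only in case, spacing, or inverted name order. The function names are hypothetical, and treating any single comma as an inverted personal name is a simplifying assumption that would misfire on other comma-bearing terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_name(term: str) -> str:
    """Turn 'Surname, Given' into 'Given Surname'; collapse extra spaces otherwise."""
    if term.count(",") == 1:
        last, first = (part.strip() for part in term.split(","))
        if last and first:
            return f"{first} {last}"
    return " ".join(term.split())


def find_variants(terms: Iterable[str]) -> Dict[str, List[str]]:
    """Group terms whose normalized, case-folded forms collide."""
    groups = defaultdict(set)
    for term in terms:
        groups[normalize_name(term).casefold()].add(term)
    # Keep only the keys that were reached by more than one distinct spelling.
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}
```

For example, passing "Duong Van Minh" and "Minh, Duong Van" together would report them as one group to be reconciled, while a term with a single spelling is not reported.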
                          The database currently contains 1,080 records, 34
                          percent of which provide a link to the full-text documents
available online at the VVA. The documents analyzed
range in length from one page to a few hundred pages. They include
memos, telegrams, reports (weekly, monthly, etc.),
Situation Reports (SitReps), biographical sketches,
National Intelligence Estimates (NIEs), Special
National Intelligence Estimates (SNIEs), and research
study reports. When an important event such as the Tet Offensive
was unfolding, the CIA sometimes produced
intelligence memos on a daily or even hourly basis.
(See Figure 2 on page 32.)
After the Johnson administration decided to send
combat troops to South Vietnam in 1965, the CIA produced
weekly and monthly reports, called "The Situation in
South Vietnam," which gave details on the political, military,
and economic situation of South Vietnam.
In the "Political Situation" sections, the reports
give detailed information on the activities of the South
Vietnamese government, such as cabinet reshuffles,
the inauguration, development, and modification of government programs and projects,
and deliberations within the National Assembly. Also
included is information on the activities of political
parties and their leaders, on rumors of possible coups,
and on local, provincial, and national elections. In the "Military
Situation" sections, the reports give a detailed account
of operations and battles involving U.S., Allied, South
Vietnamese, North Vietnamese, and Viet Cong units,
as well as their casualties and weapon losses.
The "Economic Situation" sections report important
economic indicators. Examples are the retail price index
(especially the prices of rice and pork) and the weekly and
monthly prices of gold and currency in the Saigon free
market. About 150 such reports are now included in
the database. These reports are extremely useful for
researchers who want to draw a chronological picture
of South Vietnam during the war years, especially between
1965 and 1968. Another series of reports presents monthly
evaluations of the cost-effectiveness of Operation Rolling
Thunder, the sustained U.S.
bombing of North Vietnam. Still another series details
the level of North Vietnamese Army infiltration into
South Vietnam. A close look at those reports, together
with the NIEs and SNIEs on Vietnam, will help database
users understand how U.S. policy on Vietnam was conceived
and implemented. A large number of these declassified
documents are sanitized, with sources of information
and names of informants removed for their protection.
                          DATABASE DESIGN
                          We used Microsoft SQL Server 2000 as the database
                          server. In addition to being an outstanding relational
                          database server, the rich full-text search capabilities
                          it offers made it an excellent choice for this project.
The database consists of a single table, although
some normalization could have been done, especially
with respect to the indexing terms. However, considering
the few data elements in the table, the relatively
small number of documents indexed, and the strength
of SQL Server's search capabilities, we favored
a simple design.
All columns in the table, with the exception of DocumentID,
are variable-length character data (varchar). DocumentID
is an auto-incrementing integer value, managed by
SQL Server, and used as the primary key for the table.
                          The CreationDate and DeclassificationDate fields were
                          initially standard SQL datetime data types, but had
                          to be changed to character fields because of a bug
                          in one of the underlying software components.
                          The following SQL statement was used to create the
                          table in the database:
                         
    -- One row per declassified document.
    CREATE TABLE DeclassifiedDocuments (
        DocumentID            int IDENTITY (1, 1) NOT NULL,  -- auto-incrementing record number
        Title                 varchar(512)  NOT NULL,
        CreationDate          varchar(15)   NULL,             -- stored as text, not datetime
        DeclassificationDate  varchar(15)   NULL,             -- stored as text, not datetime
        DocumentType          varchar(254)  NULL,
        ClassificationLevel   varchar(50)   NULL,
        CopyStatus            varchar(10)   NULL,
        Pagination            varchar(100)  NULL,
        Abstract              varchar(8000) NULL,
        Descriptors           varchar(8000) NULL,             -- indexing terms
        DDRS_Location         varchar(50)   NULL,
        URL                   varchar(254)  NULL              -- link to full text at the VVA
    )
All access to the database, including data entry
and other administrative functions, is through a
Web browser. The Web-based user interface was written
in the PHP programming language. PHP has grown
rapidly in popularity in recent years, due in
part to its database support and its excellent handling of
textual data (such as the data exchanged between Web browsers and
servers).
                          The Web server is a Sun UltraEnterprise 2 server
                          running the Solaris 8 operating system and a recent
                          version of the Apache Web server software. An open
                          source product called FreeTDS allows the Unix Web server
                          to communicate with the Microsoft SQL Server directly
                          using the Tabular Data Stream (TDS) protocol. TDS is
                          the native protocol used by Microsoft and Sybase for
their database products. Although still somewhat of a
fledgling product, FreeTDS is a workable solution for
establishing connectivity between Unix machines and
Microsoft or Sybase database servers.
Administrative functions (adding, modifying, and
deleting records) are also performed using the
Web browser. The administrative features, located in
a secure area on the Web server, are password-protected.
The administrative interface closely resembles the
public view, with the addition of links in both the
brief and full record displays that allow the document
to be easily modified or deleted. A new record can
be added by clicking the "New Record" button
located at the top of the screen. Besides browsing
or searching for records that require modification,
administrators can use a quick edit feature, located in the upper right,
for documents whose Document ID number
is known. A simple Web-based form is used for data
entry and record editing. (See Figure 3 above.)
                          DATABASE NAVIGATION
The default view is an alphabetical listing of all
indexed documents, shown in a brief record format.
Each brief citation includes the document title, creation
date, declassification date, type of document, level
of classification, and status of copy.
The brief display is limited to 50 documents
per page. A drop-down menu provides
easy access to all indexed documents. Limiting the
display in this manner, rather than listing all 1,080
documents at once, significantly decreases the time
it takes a Web browser to load the page and
makes the database easier to use.
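The arithmetic behind a 50-per-page display can be sketched in a few lines of Python. The function names here are hypothetical and the article does not show how the PHP interface computes its pages; only the page size and the record count come from the text.

```python
import math
from typing import Tuple

PAGE_SIZE = 50  # brief records per page, as stated in the article


def page_count(total_records: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to list every record."""
    return math.ceil(total_records / page_size)


def page_bounds(page: int, total_records: int, page_size: int = PAGE_SIZE) -> Tuple[int, int]:
    """Return the 1-based (first, last) record positions shown on a page."""
    if not 1 <= page <= page_count(total_records, page_size):
        raise ValueError("page out of range")
    first = (page - 1) * page_size + 1
    last = min(page * page_size, total_records)  # the final page may be short
    return first, last
```

With the 1,080 records reported above, this gives 22 pages, the last of which holds records 1,051 through 1,080.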
                          Clicking the title brings up a full record display
                          for the selected document. Included in this view are
                          fields not shown in the brief display, including pagination,
                          abstract, indexing terms, DDRS location, and URL. (See
                          Figure 4 below.)
The URL, if present, links to the full text of
the document in the Virtual Vietnam Archive. Indexing terms
are also hyperlinked, and clicking a term returns
all documents that share it.
                          RETRIEVAL MECHANISM
                          Microsoft SQL Server 2000 allows full-text indexes
                          to be defined on selected columns in a table. This
                          permits complex searches to be executed against any
                          of the columns in the index, or all the columns at
                          once. Boolean operators, phrase searching, word stemming,
                          weighting, proximity searching, and wildcard operators
                          are all supported. Unlike traditional indexes defined
                          on columns, SQL Server full-text indexes reside outside
                          the database on the server's local file system. Thus,
                          additional steps must be taken to populate them.
                          Index population can be scheduled to occur at any
                          time. In the case of this database, a full population
                          and rebuilding of the index occurs once a week during
                          off hours (5 a.m. Saturday morning), and an incremental
                          population happens hourly during the period when data
                          entry might normally occur (weekdays between 7 a.m.
                          and 7 p.m.). This schedule ensures that the full-text
                          index is up-to-date with any additions or changes to
database records. The document title, document type,
level of classification, status of copy, creation date,
declassification date, abstract, and indexing term
fields are all included in the full-text index.
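The population schedule described above can be restated as a small decision rule. The Python sketch below is only a restatement for clarity: such a schedule would normally be configured on the database server rather than coded by hand, the function name is hypothetical, and treating 7 p.m. as the last incremental run is an assumption.

```python
from datetime import datetime
from typing import Optional


def population_due(when: datetime) -> Optional[str]:
    """Return 'full', 'incremental', or None for the hour containing `when`."""
    # datetime.weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if when.weekday() == 5 and when.hour == 5:        # Saturday, 5 a.m.
        return "full"
    if when.weekday() < 5 and 7 <= when.hour <= 19:   # weekdays, 7 a.m. to 7 p.m.
        return "incremental"
    return None
```

Under this rule a record added on a weekday afternoon becomes searchable within about an hour, while anything entered outside those hours waits for the next scheduled population.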
                          The full-text index is utilized by both simple keyword
                          and advanced searches. In the case of the simple keyword
                          search, located conveniently at the top of almost every
                          page, all terms entered by the user are joined with
                          the Boolean AND operator and a search is performed
                          across all fields in the full-text index.
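The article does not show the query text that the interface sends to the server. As a hedged sketch, the Python function below builds the kind of search condition that SQL Server's CONTAINS predicate accepts, joining each word with AND. The function name is hypothetical, and stripping double quotes from the input is an added precaution rather than something described in the article.

```python
def simple_search_condition(user_input: str) -> str:
    """Join the user's words with AND into a full-text search condition."""
    # Drop double quotes so a stray quote cannot break out of a quoted term.
    words = user_input.replace('"', " ").split()
    if not words:
        raise ValueError("no search terms given")
    return " AND ".join(f'"{word}"' for word in words)
```

The resulting string would serve as the second argument of a predicate such as CONTAINS(*, ...), where the asterisk asks the server to search every column in the full-text index.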
The advanced search is quite powerful, allowing
more control over what is searched for, how it is
searched, and which limits are applied. Users can
construct complex queries that combine specific phrases with
a list of terms, all of which must appear, and then limit the results
by classification level, copy status, full-text availability,
and date ranges for both the document creation
and declassification dates.
Refining the search using these advanced options
decreases the result set from 46 (for a simple keyword
search on TET OFFENSIVE) to five. (See Figure 5 at
left.)
                          The search terms themselves can be considered optional,
                          and queries making use of just the limiting features
                          are acceptable.
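One way to picture how optional search terms and limits combine is a query builder that adds a clause only for the options actually supplied. The Python below is a sketch under several assumptions: the function name and the "?" placeholder style are hypothetical, a missing full-text link is assumed to be stored as NULL, and date-range limits are left out because the article does not give the format of the stored date strings. Column names are taken from the CREATE TABLE statement.

```python
from typing import List, Optional, Tuple


def build_advanced_query(
    phrases: Optional[List[str]] = None,
    required_terms: Optional[List[str]] = None,
    classification_level: Optional[str] = None,
    copy_status: Optional[str] = None,
    full_text_only: bool = False,
) -> Tuple[str, list]:
    """Return (sql, params); every search term and limit is optional."""
    clauses, params = [], []
    # Phrases and single terms are all quoted and must all appear.
    items = [t.replace('"', " ").strip() for t in (phrases or []) + (required_terms or [])]
    items = [t for t in items if t]
    if items:
        clauses.append("CONTAINS(*, ?)")
        params.append(" AND ".join(f'"{t}"' for t in items))
    if classification_level:
        clauses.append("ClassificationLevel = ?")
        params.append(classification_level)
    if copy_status:
        clauses.append("CopyStatus = ?")
        params.append(copy_status)
    if full_text_only:
        clauses.append("URL IS NOT NULL")  # assumes a missing link is stored as NULL
    sql = "SELECT DocumentID, Title FROM DeclassifiedDocuments"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

Calling the function with only copy_status="SANITIZED", for instance, yields a query with a single limiting clause and no full-text condition, which matches the statement that the limits can be used on their own.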
This online database was designed to give
Vietnam War scholars and researchers an efficient tool for finding
declassified CIA documents on specific
topics and, in some cases, for retrieving the full-text
documents. (See Figure 6 below.) The Web-based user
interface, written in the PHP programming language,
makes searching, retrieval, and navigation
simple. As classified CIA
documents continue to be declassified, and with a firm
commitment from the University of Saskatchewan Library,
this database will continue to grow.

Notes
[1] Morehead, Joe and Mary Fetzer. Introduction to United States Government Information Sources. 4th ed. Englewood, Colo.: Libraries Unlimited, 1992. p. 376.
                         
Vinh-The Lam [vinhthe.lam@usask.ca] is librarian cataloguer,
Technical Services Division, University of Saskatchewan
Library, and Darryl Friesen [darryl.friesen@usask.ca]
is programmer analyst, Information Technology Services
Division, University of Saskatchewan Library.
Comments? E-mail letters to the editor to marydee@xmission.com.