Title

Identification of Sensitive Unclassified Information

Document Type

Chapter

Abstract

Sensitive Unclassified information is defined as any unclassified information that may cause adverse consequences against the government facilities. In this chapter, we explore the use of categorization techniques and information extraction to discover this kind of information in scanned documents.

We show here that the combined use of a K-Dependence Bayesian categorization engine and a semi-automated review application reduce by nearly 95% the number of man hours required to redact sensitive unclassified information. We also discuss and provide statistics on how OCR errors can affect the information extraction tasks.

Disciplines

Electrical and Computer Engineering | Engineering

Permissions

Use Find in Your Library, contact the author, or interlibrary loan to garner a copy of the item. Publisher policy does not allow archiving the final published version. If a post-print (author's peer-reviewed manuscript) is allowed and available, or publisher policy changes, the item will be deposited.