Mr Gently Benevolent
05-14-2004, 10:52 AM
For all you intel buffs and OCR geeks like me.
Researchers say they've devised a method of probabilistically identifying blacked-out words in censored documents, based on the pixel length of the redacted text. Governments everywhere are presumably instructing FOI censors to switch to monospaced fonts.
The researchers, David Naccache, the director of an information security lab for Gemplus, a Luxembourg-based maker of banking and security cards, and Claire Whelan, a computer science graduate student at Dublin City University, also applied the technique to a confidential Defense Department memorandum on Iraqi military use of Hughes helicopters.
They said that although the name of a country had been blacked out in that memorandum, their software showed that it was highly likely the document named South Korea as having helped the Iraqis.
By realigning the document, it was possible to use another program Claire Whelan had written to determine that it had been formatted in the Arial font. Next, they found the number of pixels that had been blacked out in the sentence: "An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx service at the same time that Bin Ladin was planning to exploit the operative's access to the U.S. to mount a terrorist strike." They then used a computer to determine the pixel length of words in the dictionary when written in the Arial font.
The program rejected all of the words that were not within three pixels of the length of the word that was probably under the blacked-out area in the document.
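To make the width filter concrete, here is a minimal Python sketch of the idea. It is not the researchers' program. The per-character widths are made-up stand-ins for real Arial metrics, and the names char_width, text_width and candidates_by_width are hypothetical. Only the three-pixel tolerance comes from the article.

```python
# Toy per-character widths in pixels. These are invented values for
# illustration, NOT real Arial metrics.
NARROW = set("iljtfI")
WIDE = set("mwMW")


def char_width(ch):
    """Return an assumed pixel width for one character in a proportional font."""
    if ch in NARROW:
        return 4
    if ch in WIDE:
        return 12
    if ch.isupper():
        return 10
    return 8


def text_width(word):
    """Sum the assumed widths of every character in the word."""
    return sum(char_width(ch) for ch in word)


def candidates_by_width(words, target_px, tolerance_px=3):
    """Keep only the words whose rendered width is within tolerance of the gap."""
    return [w for w in words if abs(text_width(w) - target_px) <= tolerance_px]


if __name__ == "__main__":
    dictionary = ["Egyptian", "Ukrainian", "Ugandan", "Iraqi", "American"]
    # Pretend the blacked-out area measured the same as "Egyptian" would.
    gap_px = text_width("Egyptian")
    print(candidates_by_width(dictionary, gap_px))
```

With real font metrics the surviving list would differ from this toy one. In a monospaced font every character has the same width, so the same filter would only reveal the number of letters.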
The software then reduced the number of possible words to just seven from 1,530 by using semantic guidelines, including the grammatical context. The researchers selected the word "Egyptian" from the seven possible words, rejecting "Ukrainian" and "Ugandan," because those countries would be less likely to have such information.
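The article doesn't say what the "semantic guidelines" actually were, so the following is only a guess at one possible grammatical check: the word before the gap in the quoted sentence is "an", which suggests the hidden word starts with a vowel. The function names are hypothetical and the rule is a crude, letter-based assumption (it ignores pronunciation).

```python
VOWELS = set("AEIOUaeiou")


def fits_article(candidate, preceding_word):
    """Crude check that a candidate agrees with a preceding 'a' or 'an'.

    Assumption: agreement is judged on the first letter only, not on sound.
    """
    if not candidate:
        return False
    starts_with_vowel = candidate[0] in VOWELS
    article = preceding_word.lower()
    if article == "an":
        return starts_with_vowel
    if article == "a":
        return not starts_with_vowel
    # Any other preceding word gives no information under this rule.
    return True


def filter_by_context(candidates, preceding_word):
    """Drop the candidates that clash with the word before the redaction."""
    return [c for c in candidates if fits_article(c, preceding_word)]


if __name__ == "__main__":
    survivors = ["Egyptian", "Ukrainian", "Ugandan", "Pakistani", "Jordanian"]
    print(filter_by_context(survivors, "an"))
```

A check like this only narrows the list. Per the article, the final pick of "Egyptian" over "Ukrainian" and "Ugandan" was a human judgment about which countries would plausibly have the information.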
More info at cryptome.org
http://cryptome.org/cia-decrypt.htm
I don't know when the full paper will be posted, but it could be worthwhile looking at the Gemplus site sometime in the next few months.
http://www.gemplus.com/smart/r_d/publications/