Sunday, March 29, 2009

Microsoft Office Vulnerability Research

The first step in looking for possible security bugs in the Microsoft Office suite is to understand the file format it uses.

Microsoft Office File Format Internals: An MS Office document is organized internally using OLE Structured Storage, a systematic organization of the components of the document. Each document has a root component which contains storage and stream components. OLE Structured Storage is analogous to a file system: 'storage' components are equivalent to directories and 'stream' components are equivalent to files. A storage component may exist on its own, or it may contain one or more sub-storage components and stream components. The root component may also have stream components directly within it.
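The directory/file analogy can be made concrete with a toy model. The sketch below is only an illustration and does not use the Ruby OLE library: a storage is represented as a Hash, a stream as a String of bytes, and the helper name walk and the sample contents are made up for this example.

```ruby
# Toy model of OLE Structured Storage (not the ruby-ole API):
# a storage is a Hash (like a directory), a stream is a String (like a file).
def walk(storage, path = "")
  storage.flat_map do |name, node|
    full = "#{path}/#{name}"
    if node.is_a?(Hash)
      # A storage: list it, then descend into its children.
      ["#{full}/ (storage)"] + walk(node, full)
    else
      # A stream: report its size in bytes.
      ["#{full} (stream, #{node.bytesize} bytes)"]
    end
  end
end

# Hypothetical root component with two streams and one storage.
root = {
  "WordDocument" => "\xEC\xA5".b,
  "1Table"       => "".b,
  "MsoDataStore" => {
    "Item" => "<b:So".b
  }
}

puts walk(root)
```

Each printed line is a path from the root, which is also how streams are addressed when reading them (for example "/WordDocument").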

The actual implementation details are defined in The Windows Compound Binary File Format specification.
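As a rough illustration of what that specification describes, the sketch below decodes a few fields from the 512-byte header at the start of a compound file using plain Ruby. The function name parse_cfb_header and the returned key names are my own choices; the field offsets are the ones I understand the header to use, so check them against the specification before relying on them.

```ruby
# Sketch: decode a few fields of the 512-byte compound file header.
# Offsets are assumptions taken from the compound file header layout.
def parse_cfb_header(bytes)
  magic  = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b
  header = bytes.to_s.b.byteslice(0, 512)
  raise ArgumentError, "header too short" if header.bytesize < 512
  raise ArgumentError, "not a compound file" unless header.byteslice(0, 8) == magic

  # Five little-endian 16-bit fields starting at offset 24.
  minor, major, byte_order, sector_shift, mini_shift =
    header.byteslice(24, 10).unpack("v5")
  # Little-endian 32-bit fields: FAT sector count (44), first directory sector (48).
  fat_sectors, first_dir = header.byteslice(44, 8).unpack("V2")
  # Streams smaller than this cutoff live in the mini stream (offset 56).
  mini_cutoff = header.byteslice(56, 4).unpack1("V")

  {
    minor_version: minor,
    major_version: major,
    byte_order: byte_order,
    sector_size: 1 << sector_shift,
    mini_sector_size: 1 << mini_shift,
    fat_sectors: fat_sectors,
    first_dir_sector: first_dir,
    mini_stream_cutoff: mini_cutoff
  }
end

if __FILE__ == $0 && ARGV.size == 1
  p parse_cfb_header(File.binread(ARGV[0], 512))
end
```

Run against a .doc file, this prints the sector sizes and the location of the first directory sector, which is where the storage and stream entries are described.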

Most of my research on the MS Office file format was conducted using the Ruby OLE library, which allows easy, abstract read-write access to the various streams and storages packed in the internal OLE structures. Install the ruby-ole gem before trying out any of the examples below.


Dumping the OLE structure of a given Word document:
user@sigsegv$ oletool --tree sample2.doc
- #<Dirent:"Root Entry">
|- #<Dirent:"1Table" size=34907 data="^\004\032\000\022...">
|- #<Dirent:"\001CompObj" size=121 data="\001\000\376\377\003...">
|- #<Dirent:"MsoDataStore">
| \- #<Dirent:"F\303\223\303\216\303\226U\303\2261\303\2305U4\303\217\303\2201BEKP\303\235N\303\203\303\200==">
| |- #<Dirent:"Item" size=216 data="<b:So...">
| \- #<Dirent:"Properties" size=341 data="<?xml...">
|- #<Dirent:"WordDocument" size=15429 data="\354\245\301\000}...">
|- #<Dirent:"\005SummaryInformation" size=4096 data="\376\377\000\000\005...">
\- #<Dirent:"\005DocumentSummaryInformation" size=4096 data="\376\377\000\000\005...">

Sample code to display the size of the WordDocument stream inside a doc file:


require 'rubygems'
require 'ole/storage'

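# Open the compound document and read the whole WordDocument stream.
ole = Ole::Storage.open("sample2.doc")
buf = ole.file.read("/WordDocument")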

puts "WordDocument stream size: #{buf.size}"

Sample code to display only the text part of a doc file:

require 'rubygems'
require 'ole/storage'
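require 'lib/fib'   # local helper providing Word::FIB (not shown here)

if __FILE__ == $0
  if ARGV.size != 1
    puts "Usage: #{$0} <doc file>"
    exit 1
  end

  ole = Ole::Storage.open(ARGV[0])
  docbuf = ole.file.read("/WordDocument")

  # fcMin and fcMac are the offsets, within the WordDocument stream,
  # of the start and the end of the document text.
  fib = Word::FIB.load(ole)
  off_start = fib.fcMin
  off_end = fib.fcMac

  puts "Text Offset start: #{off_start}"
  puts "Text offset end: #{off_end}"

  text = docbuf[off_start, off_end - off_start]
  puts text.inspect
end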
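
The lib/fib helper used above is not shown in this post. As a stand-in, the following sketch reads the two offsets directly from the raw WordDocument stream. It assumes the Word 97 style FIB layout, with wIdent at offset 0 (value 0xA5EC), fcMin at offset 24 and fcMac at offset 28; the function name read_fib_text_range is hypothetical, and the offsets should be verified against the file format documentation for the Word version in question.

```ruby
# Sketch: pull fcMin/fcMac out of a raw WordDocument stream.
# Assumed layout (Word 97 style FIB): wIdent @0, fcMin @24, fcMac @28.
def read_fib_text_range(docbuf)
  docbuf = docbuf.to_s.b
  raise ArgumentError, "stream too short for a FIB" if docbuf.bytesize < 32

  w_ident = docbuf.byteslice(0, 2).unpack1("v")
  unless w_ident == 0xA5EC
    raise ArgumentError, format("unexpected wIdent 0x%04X", w_ident)
  end

  # Two little-endian 32-bit offsets into the stream.
  fc_min, fc_mac = docbuf.byteslice(24, 8).unpack("V2")
  raise ArgumentError, "fcMac precedes fcMin" if fc_mac < fc_min

  {
    fc_min: fc_min,
    fc_mac: fc_mac,
    text: docbuf.byteslice(fc_min, fc_mac - fc_min).to_s
  }
end

# Synthetic stream for demonstration: a 32-byte header followed by 5 text bytes.
fake = [0xA5EC].pack("v") + ("\0" * 22) + [32, 37].pack("V2") + "Hello"
range = read_fib_text_range(fake)
puts "Text offset start: #{range[:fc_min]}"
puts "Text offset end: #{range[:fc_mac]}"
puts range[:text].inspect
```

With a real document, the buffer would be the WordDocument stream read through the Ruby OLE library instead of the synthetic string.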

Reverse Engineering a Microsoft Office Patch: Patches for the Microsoft Office suite, as distributed by Microsoft, usually consist of self-extracting MSP or MSI packages, and extracting them is not exactly the same as extracting other patches.


After fetching the patch installer executable, the first thing to do is to have the installer extract the MSI/MSP package:
officexp-KB-XXX.exe /C /T:e:\ms08-042-extracted\
The above command extracts the actual patch installer files to the e:\ms08-042-extracted\ directory. Among the extracted files there will be an MSI or MSP file, which is the main patch installer package.


The MSI/MSP files are special OLE-structured installer packages. Details can be found here, here. There is also a utility for extracting MSI/MSP files here.
msix.exe WINWORD.msp /out:e:\ms08-042-extracted\ /ext
This should extract all the table data and other relevant information, along with a CAB file containing the actual patch binaries which we are interested in. Find the CAB file among the extracted files, extract it normally using WinZIP, WinRAR etc., and BANG!
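
The extracted CAB file does not always carry a .cab extension, so it can help to look for it by content. The sketch below walks a directory and reports files whose first four bytes are "MSCF", the cabinet file signature. The helper names cab_file? and find_cab_files are hypothetical, and the script knows nothing about the output layout of the extraction utility.

```ruby
require "find"

# True if the file starts with the cabinet signature "MSCF".
def cab_file?(path)
  File.file?(path) && File.binread(path, 4) == "MSCF"
rescue SystemCallError
  false # unreadable files are simply skipped
end

# Recursively collect all cabinet files below dir, sorted by path.
def find_cab_files(dir)
  Find.find(dir).select { |path| cab_file?(path) }.sort
end

if __FILE__ == $0 && ARGV.size == 1
  puts find_cab_files(ARGV[0])
end
```

Pointing it at the extraction directory used above (e:\ms08-042-extracted\) should list the candidate files to open with an archive tool.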

Bug Hunting: A good number of bugs, including theoretical security vulnerabilities, were discovered using very trivial bit- and byte-alteration fuzzing of various structures, including the File Information Block (FIB) in Word documents, random structures in the TableStream, etc. A number of structures in the file formats, particularly the Word file format, have sizes that are themselves read from the document. These areas can be good vectors for fuzzing, particularly when multiple structures are loaded from the file with size values read from the file itself.
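
A minimal sketch of this kind of byte-alteration fuzzing is given here. It is not the exact tool used for this research: the function name mutate_bytes, the default of four mutations per test case and the choice of the first 32 bytes as the target region are all assumptions made for illustration. A seeded random generator is used so that any test case that triggers a crash can be regenerated from its seed.

```ruby
# Sketch: overwrite `count` randomly chosen bytes inside buf[range]
# with random values. Returns a new binary string; buf is not modified.
def mutate_bytes(buf, range, count: 4, seed: 0)
  if range.min.nil? || range.min < 0 || range.max >= buf.bytesize
    raise ArgumentError, "range outside buffer"
  end

  rng = Random.new(seed) # same seed, same mutations
  out = buf.b            # String#b returns a binary copy
  count.times do
    pos = rng.rand(range)
    out.setbyte(pos, rng.rand(256))
  end
  out
end

# Demonstration on a dummy 64-byte buffer standing in for a stream.
sample = "\0" * 64
3.times do |seed|
  fuzzed = mutate_bytes(sample, 0...32, seed: seed)
  changed = (0...fuzzed.bytesize).select { |i| fuzzed.getbyte(i) != 0 }
  puts "seed #{seed}: changed offsets #{changed.inspect}"
end
```

In practice the buffer would be a stream such as WordDocument, the mutated copy would be written back into a copy of the document, and the result opened in Word under a debugger.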

