TL;DR News: Feedly:SANS Internet Storm Center, InfoCON: green. Malicious RTF Files, (Fri, Jul 29th)

Friday, July 29, 2016

Feedly:SANS Internet Storm Center, InfoCON: green. Malicious RTF Files, (Fri, Jul 29th)

from SANS Internet Storm Center, InfoCON: green

About a year ago I received RTF samples that I could not analyze with RTFScan or rtfobj (FYI: Philippe Lagadec has improved rtfobj.py significantly since then). So I started to write my own RTF analysis tool (rtfdump), but I was not satisfied enough with the way I presented the analysis result to warrant a release of my tool. Last week, I started analyzing new samples and updating my tool. I released it, and show how I analyze sample 07884483f95ae891845caf0d50ce507f in this diary entry.

This sample is an heavily obfuscated RTF file. RTF files are essentially sets of nested strings that start with { and end with }. Like this (strongly simplified):

{\rtf {data {more data}}}.

Malicious RTF files contain a payload. Objects in RTF files are embedded in hexadecimal, like this (strongly simplified):
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E616D6500000000000000...
}}}

Malicious RTF files obfuscate the hexadecimal data in many ways, one of them is to put extra control strings inside the hexadecimal data, like this:
{\rtf {data
{\*\objdata
01050000
02000000
08000000
46696C656E61{\obj}6D6500000000000000...
}}}

The sample I analyzed takes this to the extreme. After each hexadecimal digit, extra control strings and whitespace are inserted:

(I removed a lot of whitespace to be able to put several hexadecimal digits on the screen).
The hexadecimal digits (highlighted in red) are 01050…

My tool outputs a line of analysis data for each nested string. In this sample, because of the obfuscation, there are a lot of them (22956, which is gigantic for an RTF file).

But you can reduce the output by filtering for entries that (potentially) contain an embedded object using option -f O:

Entry 165 is the one we will take a closer look at first. The information presented for entry 165 is the following: the nesting level is 4, it has 1 child (c=), starts at position 2ae5 in the file (p=), is 1194952 bytes long (l=), has 11429 hexadecimal digits (h=), has no \bin entries (b=), contains an embedded object (O), has 1 unknown character (u=) and is named \*\objdata133765.

We can select entry 165 for closer analysis:

I highlighted the hexademical digits in red.

To decode the hexadecimal data, we use option -H:

You can see the hex data clearly now: 01 05 00 ...

Since this is an embedded object, we use option -i to get more info on the object:

From the magic header, we see that the embedded object is an OLE file (FYI: if we analyze it with oledump, we get parsing errors).

Looking further into the stream (-H), we see stream entries in the output:

And a bit further, we even find a URL:

Taking a closer look, I don't only see a URL, but hex data that looks like shellcode.

We can select this shellcode by cutting if out of the stream (option -c):

And of course also dump it to a file (option -d), so that we scan analyze it with the shellcode analyzer from libemu:

So this RTF file is a downloader.

The presence of shellcode in an RTF file is often an indication of an exploit. rtfdump supports YARA (like many of my *dump tools):

The first YARA search doesn't find anything. But the second search with option -H (to decode the hexadecimal content to binary) has hits for my RTF_ListView2_CLSID YARA rule. This indicates that entry 165 contains a byte sequence for the ListView2 classid, so this is very likely an exploit for vulnerability CVE-2012-0148 in this ListView.

The set of samples I looked at last week are characterized by the following properties:

they start with {\rtfMETAX

they end with this:

If you have interesting tools or techniques to analyze RTF files, please post a comment.

Didier Stevens
Microsoft MVP Consumer Security
blog.DidierStevens.com DidierStevensLabs.com