There has been a dramatic shift in the platforms targeted by attackers over the past few years. Up until 2016, browsers tended to be the most common attack vector used to exploit and infect machines, but Microsoft Office applications are now preferred, according to a report published here in March 2019. The increasing use of Microsoft Office as an exploitation target poses an interesting security challenge, and weaponized documents in email attachments have become a top infection vector.
Object Linking and Embedding (OLE), a technology based on Component Object Model (COM), is one of the features in Microsoft Office documents which allows the objects created in other Windows applications to be linked or embedded into documents, thereby creating a compound document structure and providing a richer user experience. OLE has been massively abused by attackers over the past few years in a variety of ways. OLE exploits in the recent past have been observed loading COM objects to orchestrate and control the process memory, taking advantage of parsing vulnerabilities in COM objects, hiding malicious code, or connecting to external resources to download additional malware.
Microsoft Rich Text Format (RTF) is heavily used in email attachments in phishing attacks. Its wide adoption is primarily attributed to its ability to contain a wide variety of exploits and to serve as an efficient delivery mechanism against victims. RTF files can embed various object types, either to exploit parsing vulnerabilities or to aid further exploitation. The Object Linking and Embedding feature in RTF files is largely abused either to link the RTF document to external malicious code or to embed other file format exploits within it, using the RTF file as the exploit container. This versatility is what makes the RTF file format attractive to attackers.
In the sections below, we outline some of the exploitation and infection strategies used in Microsoft Rich Text Format files in the recent past. Towards the end, we reflect on the key takeaways that can help automate the analysis of RTF exploits and set the direction for a generic analysis approach.
RTF Control Words
Rich Text Format files are heavily formatted using control words, which primarily define the way the document is presented to the user. Since these control words have associated parameters and data, errors in parsing them can become a target for exploitation. Exploits in the past have also used control words to embed malicious resources. Consequently, it is important to examine destination control words that consume data and to extract their streams. The RTF specification describes several hundred control words that consume data.
To aid the analysis process, RTF parsers must also be able to handle the control word obfuscation mechanisms commonly used by attackers. Below is an earlier exploit that uses control word parameters to introduce executable payloads inside the datastore control word.
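As a rough illustration of control word inspection, the following Python sketch counts control words and reports the offsets of a few data-consuming destinations. The function names and the short list of destinations are assumptions made for this example, not part of any existing tool, and the regular expression does not handle escaped backslashes or deliberately obfuscated control words.

```python
import re
from collections import Counter
from typing import Iterator, Tuple

# A control word is a backslash, ASCII letters and an optional signed numeric parameter.
CONTROL_WORD = re.compile(rb"\\([a-zA-Z]{1,32})(-?\d{1,10})?")

# Illustrative subset only; the specification defines many more destinations.
DATA_DESTINATIONS = ("objdata", "datastore", "pict")


def control_word_counts(rtf: bytes) -> Counter:
    """Count every control word found in the raw document bytes."""
    return Counter(m.group(1).decode("ascii") for m in CONTROL_WORD.finditer(rtf))


def data_destinations(rtf: bytes) -> Iterator[Tuple[str, int]]:
    """Yield (name, offset) for destinations whose data should be extracted."""
    for m in CONTROL_WORD.finditer(rtf):
        word = m.group(1).decode("ascii")
        if word in DATA_DESTINATIONS:
            yield word, m.start()
```

A real parser would tokenize groups and escapes properly; this only shows where an extractor might start looking.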
Overlay Data in RTF Files
Overlay data is additional data appended to the end of an RTF document. Exploit authors predominantly use it to embed decoy files or additional resources, either in the clear or in encrypted form, in which case it is usually decrypted when the attacker-controlled code executes. Overlay data beyond a certain size should be deemed suspicious and must be extracted and analysed further, even though the Microsoft Word RTF parser ignores overlay data while processing RTF documents. Below are some instances of RTF exploits with a large volume of overlay data appended at the end of the file, with CVE-2015-1641 embedding both the decoy document and multi-staged shellcodes with markers.
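One simple way to measure overlay data is to find where the root RTF group closes and treat everything after it as overlay. The sketch below does this with a brace-depth scan; the function names and the size threshold are hypothetical, and the scan ignores raw binary runs introduced by the "bin" control word, so it is an approximation rather than a faithful model of Word's parser.

```python
OVERLAY_THRESHOLD = 512  # hypothetical cut-off in bytes; tune for your corpus


def root_group_end(rtf: bytes) -> int:
    """Return the offset just past the brace closing the root group, or -1."""
    i = rtf.find(b"{")
    if i < 0:
        return -1
    depth = 0
    while i < len(rtf):
        c = rtf[i:i + 1]
        if c == b"\\":
            i += 2  # skip the escaped character, e.g. \{ \} or \\
            continue
        if c == b"{":
            depth += 1
        elif c == b"}":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    return -1  # root group never closed


def overlay(rtf: bytes) -> bytes:
    """Return the bytes that follow the root group (empty if none)."""
    end = root_group_end(rtf)
    return rtf[end:] if end >= 0 else b""


def has_suspicious_overlay(rtf: bytes) -> bool:
    """Flag documents whose overlay exceeds the assumed threshold."""
    return len(overlay(rtf)) > OVERLAY_THRESHOLD
```

The extracted overlay can then be passed to whatever decryption or file-type identification step the analysis pipeline uses.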
Object Linking and Embedding in RTF Files
Linked or embedded objects in RTF documents are represented as RTF objects, specifically by the RTF destination control word “object”. The data for the embedded or linked object is stored as the parameter to the RTF sub-destination control word “objdata” in the hex-encoded OLESaveToStream format. The modifier control word “objclass” determines the type of the object embedded in the RTF file and helps the client application render the object. However, the hex-encoded object data passed to the “objdata” control word can also be heavily obfuscated, either to make reverse engineering and analysis more time consuming or to break immature RTF parsers. OLE has been one of the dominant attack vectors in the recent past, with many instances of OLE-based exploits used in targeted attacks. Robust RTF document parsers that extract objects, along with deeper inspection of the object data, are therefore critical.
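The following Python sketch shows one possible way to recover the hex-encoded bytes that follow "objdata" while tolerating some common obfuscation: whitespace between digits, stray control words and nested groups. All names are illustrative, and the decision to ignore text inside nested groups is a simplifying assumption that may not match how Word treats every sample.

```python
import binascii
import re
from typing import List

OBJDATA = re.compile(rb"\\objdata(?![a-zA-Z])")
CONTROL = re.compile(rb"\\[a-zA-Z]+-?\d* ?")  # control word, parameter, delimiter space
HEX_DIGITS = b"0123456789abcdefABCDEF"


def extract_objdata(rtf: bytes) -> List[bytes]:
    """Return the decoded payload of every objdata destination."""
    blobs = []
    for m in OBJDATA.finditer(rtf):
        i, depth, digits = m.end(), 0, bytearray()
        while i < len(rtf):
            c = rtf[i:i + 1]
            if c == b"\\":
                cw = CONTROL.match(rtf, i)
                i = cw.end() if cw else i + 2  # skip control word or control symbol
                continue
            if c == b"{":
                depth += 1
            elif c == b"}":
                if depth == 0:
                    break  # end of the objdata group
                depth -= 1
            elif depth == 0 and c in HEX_DIGITS:
                digits += c
            i += 1
        if len(digits) % 2:
            digits = digits[:-1]  # drop a dangling nibble
        blobs.append(binascii.unhexlify(bytes(digits)))
    return blobs
```

Each returned blob can then be handed to an OLE parser for deeper inspection.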
Object Linking – Linking RTF to External Resource
Using object linking, it is possible to link an RTF file to a remote object, which could be a malicious resource hosted on a remote server. The resulting RTF file then behaves as a downloader and subsequently executes the downloaded resource by invoking the registered application-specific resource handlers. Among the modifier control words nested under “object”, a linked object is indicated by the control word “objautlink”, as represented below in the RTF document.
As indicated in the above representation, the object data passed to the RTF control word “objdata” is an OLE1.0 NativeStream in the OLESaveToStream format, followed by the NativeDataSize, which indicates the size of the OLE2.0 compound document wrapped in the NativeStream. As per the Rich Text Format specifications, if the object is linked to the container application, which in this case is the RTF document, the Root Storage directory entry of the compound document will have the CLSID of the StdOleLink, indicating the linked object. Also, when the object is in the OLE2.0 format, the linked source data is specified in the MonikerStream of the OLEStream structure. As highlighted below, while parsing the object data, the ole32.OleConvertOLESTREAMToIStorage function is responsible for converting the OLE1.0 NativeStream data to the OLE2.0 structured storage format. Following the lpolestream pointer to the OLE stream allows us to view the parsed native data. Below is a memory snapshot from when an RTF document with a linked object was parsed by the winword.exe process.
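The root storage CLSID check described above can be approximated statically. The sketch below locates the compound file signature inside decoded object data, reads the first directory entry (the Root Storage) and compares its CLSID with StdOleLink (00000300-0000-0000-C000-000000000046). The header and directory offsets follow the published compound file binary layout; the function names are made up for this example, and the code assumes the compound file is stored contiguously in the blob.

```python
import struct
import uuid
from typing import Optional

CFB_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
STD_OLE_LINK = uuid.UUID("00000300-0000-0000-c000-000000000046")


def root_clsid(cfb: bytes) -> Optional[uuid.UUID]:
    """Return the CLSID stored in the Root Storage directory entry."""
    if len(cfb) < 512 or cfb[:8] != CFB_MAGIC:
        return None
    (sector_shift,) = struct.unpack_from("<H", cfb, 0x1E)
    (first_dir_sector,) = struct.unpack_from("<I", cfb, 0x30)
    # Sector N starts at (N + 1) * sector_size because the header occupies the first slot.
    root = (first_dir_sector + 1) * (1 << sector_shift)
    if root + 128 > len(cfb):
        return None
    # The CLSID sits at offset 80 of the 128-byte directory entry.
    return uuid.UUID(bytes_le=cfb[root + 80:root + 96])


def is_linked_object(objdata: bytes) -> bool:
    """True if the wrapped compound document has the StdOleLink root CLSID."""
    start = objdata.find(CFB_MAGIC)
    return start >= 0 and root_clsid(objdata[start:]) == STD_OLE_LINK
```

The same `root_clsid` helper can be used to identify the creating application of an embedded object by comparing against other known CLSIDs.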
Launching the RTF document with the link to external object will throw up a dialogue box asking to update the data from the linked object, as shown below.
However, prompting the user in this way is not an ideal exploitation strategy. The dialogue box can be eliminated by inserting another modifier control word, “objupdate”, which internally calls the link object’s IOleObject::Update method to update the link’s source.
Subsequently the urlmon.dll, which is the registered server for the URL Moniker, is instantiated.
Once the COM object is instantiated, the connection is initiated to the external resource and, based on the content-type header returned by the server in the response, URL Moniker consults the Mime database in the registry and invokes registered application handlers.
Details on how the URL Moniker is executed, and the algorithm that determines which handlers to invoke, are described by Microsoft here. There have been multiple such RTF exploits in the past, including CVE-2017-0199, CVE-2017-8759 and others, that use Monikers to download and execute remote code.
The COM objects used in these exploits have since been blacklisted by Microsoft in newer versions, but similar techniques could be used in the future, which makes the analysis of OLE structured storage streams necessary.
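A quick static check for the linking behaviour described in this section is to look for objects that carry both the "objautlink" and "objupdate" modifiers. The Python sketch below does this with regular expressions. The function names are hypothetical, and treating everything between one "object" control word and the next as a single object is an approximation that a determined obfuscator could defeat.

```python
import re
from typing import Dict, List

OBJECT = re.compile(rb"\\object(?![a-zA-Z])")


def _has_word(span: bytes, word: bytes) -> bool:
    """True if the exact control word appears in the span."""
    return re.search(rb"\\" + re.escape(word) + rb"(?![a-zA-Z])", span) is not None


def describe_objects(rtf: bytes) -> List[Dict]:
    """Report, per object, whether it is linked and whether it auto-updates."""
    starts = [m.start() for m in OBJECT.finditer(rtf)]
    report = []
    for begin, end in zip(starts, starts[1:] + [len(rtf)]):
        span = rtf[begin:end]
        linked = _has_word(span, b"objautlink")
        report.append({
            "offset": begin,
            "linked": linked,
            # A linked object that also requests an update needs no user prompt.
            "auto_update": linked and _has_word(span, b"objupdate"),
        })
    return report
```

Objects reported with `auto_update` set are good candidates for extracting the moniker data and checking the link target.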
Object Embedding – RTF Containing OLE Controls
As indicated earlier, embedded objects are represented in the container documents in the OLE2 format. When the object is stored in the OLE2 format, the container application (here Rich Text Format files) creates the OLE Compound File Storage for each of the objects embedded and the respective object data is stored in the OLE Compound File Stream Objects. Layout of the container documents storing embedded objects is as represented below and described in the Microsoft documentation here.
RTF exploits historically have been found embedding and loading multiple OLE controls in order to bypass exploit mitigations and to take advantage of memory corruption vulnerabilities by loading vulnerable OLE controls. Embedded OLE controls in the RTF document are usually indicated by the nested control word “objocx” or “objemb”, followed by “objclass” with the name of the OLE control used to render the object as its argument. Below is an example of an earlier exploit used in targeted attacks, which exploited a vulnerability in a COM object and loaded another OLE control, carrying the staged malicious code, to aid the exploitation process. It is therefore critical to extract this object data, parse the OLE2 compound file storage and extract each of the stream objects for further inspection of hidden malicious shellcode.
Object Embedding – RTF Containing Other Documents
Malicious RTF documents can use the OLE functionality to embed other file formats like Flash files and Word documents, either to exploit the respective file format vulnerabilities or to set the stage for successful exploitation. Multiple RTF exploits have been observed in the past embedding OOXML documents using OLE functionality to manipulate the process heap memory and bypass Windows exploit mitigations. In RTF files, embedded objects are usually indicated by the nested control word “objemb”, with a version-dependent “ProgID” string as the argument to the nested control word “objclass”. One such RTF exploit, used in targeted attacks in the recent past, is indicated below.
Below is another instance where the PDF file was physically embedded within the compound document. As mentioned, the embedded object is stored physically along with all the information required to render it.
In the embedded object, the creating application’s identifier is stored in the CLSID field of the compound file directory entry of the CFB storage object. If we take a look at the previous instance, when the object data is extracted and inspected manually, the following CLSID is observed in the CFB storage object, which corresponds to the CLSID_Microsoft_Word_Document.
When the OLE2 stream objects are parsed and the embedded OOXML is extracted and analysed after deflating the contents, we see suspicious ActiveX object loading activity and malicious code embedded in one of the binary files. It is therefore important to extract the files embedded in RTF documents and analyse them further.
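Since an OOXML document is a ZIP archive, the deflating step can be sketched with Python's standard zipfile module. The example below looks for a ZIP local-file signature inside an extracted stream and returns the contents of any parts stored under word/activeX/. The function name is an assumption for this example, and real samples may need additional handling for damaged or nested archives.

```python
import io
import zipfile
from typing import Dict

ZIP_MAGIC = b"PK\x03\x04"  # local file header signature


def activex_parts(stream: bytes) -> Dict[str, bytes]:
    """Return {part name: raw bytes} for ActiveX parts of an embedded OOXML file."""
    start = stream.find(ZIP_MAGIC)
    if start < 0:
        return {}
    try:
        with zipfile.ZipFile(io.BytesIO(stream[start:])) as archive:
            return {
                name: archive.read(name)
                for name in archive.namelist()
                if name.lower().startswith("word/activex/")
            }
    except zipfile.BadZipFile:
        return {}
```

The returned binary parts are where one would then look for embedded shellcode or references to unusual ActiveX class identifiers.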
OLE Packages in RTF Files
RTF documents can also embed other file types like scripts (VBScript, JavaScript, etc.), XML files and executables via OLE packages. An OLE package in an RTF file is indicated by the ProgID string “package” as the argument to the nested control word “objclass”. The Packager format is a legacy format that does not have an associated OLE server. Looking at the associated CLSID in the registry, there is no specific data format mapped to Packages.
This essentially implies that OLE packages can store multiple file types. If a user clicks the object, it is executed and, if it is a malicious script, the machine is eventually infected. RTF documents have been known to deliver malware by embedding scripts via OLE packages and then using Monikers, as described in the previous sections, to drop files in the desired directory and then execute them. One such instance of a malicious RTF document exploiting CVE-2018-0802, embedding an executable file, is shown below.
Since many RTF documents have been found delivering malware via OLE packages, it is critical to look for these embedded objects and analyse them for additional payloads. Executables and scripts embedded within RTF documents could be malicious, so looking for OLE packages and extracting the embedded files should be a routine part of the analysis.
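As an illustration of package extraction, the sketch below parses the native data of an OLE package and flags risky file extensions. The field layout used here (total size, a 2-byte field, label, source path, two 4-byte fields, temporary path, payload size, payload) is an assumption based on commonly published descriptions of the Packager stream, not something specified in this article, and the extension list and all function names are hypothetical.

```python
import struct
from typing import Dict, Tuple

RISKY_EXTENSIONS = (".exe", ".js", ".vbs", ".sct", ".bat", ".ps1")  # illustrative


def _cstring(buf: bytes, pos: int) -> Tuple[str, int]:
    """Read a NUL-terminated string and return it with the next offset."""
    end = buf.index(b"\x00", pos)
    return buf[pos:end].decode("latin-1"), end + 1


def parse_package(stream: bytes) -> Dict:
    """Parse package native data under the assumed field layout."""
    pos = 4 + 2  # assumed: 4-byte total size, then a 2-byte field
    label, pos = _cstring(stream, pos)
    source_path, pos = _cstring(stream, pos)
    pos += 8  # assumed: two 4-byte fields precede the temporary path
    temp_path, pos = _cstring(stream, pos)
    (size,) = struct.unpack_from("<I", stream, pos)
    pos += 4
    return {"label": label, "source_path": source_path,
            "temp_path": temp_path, "data": stream[pos:pos + size]}


def is_risky(package: Dict) -> bool:
    """Flag packages whose label ends with an executable or script extension."""
    return package["label"].lower().endswith(RISKY_EXTENSIONS)
```

Malformed input raises `ValueError` or `struct.error`, which a caller would normally catch and treat as a reason for manual review.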
The exploit delivery strategies above allow us to take a step towards building analysis frameworks for RTF documents. Inspecting the linked or embedded objects turns out to be the critical aspect of automated analysis, along with inspection of RTF control words. The following are the key takeaways:
- Using the RTF file as the container, many other file format exploits can be embedded inside using the Object Linking and Embedding feature, essentially weaponizing the RTF documents.
- Extracting and analysing embedded or linked objects for malicious code, payloads or resource handler invocations is essential.
- If an RTF document has a large volume of appended data, it must be examined further.
- Non-OLE control words and OLE packages must also be analysed for any malicious content.
McAfee Response
As Microsoft Office vulnerabilities continue to surface, generic inspection methods will have to be improved and enhanced, consequently leading to better detection results. As a reminder, the McAfee Anti-Malware engine used on all our endpoints and most of our appliances has the potential to unpack Office, RTF and OLE documents, expose the streams of content and unpack these streams if necessary.