Friday, November 4, 2022

Reading OneDrive Logs Part 2

In the last OneDrive blog post, I outlined how the ODL file format is structured. A working version of an ODL parser was also created to read these files. One key detail was how strings identifying personal files/folders, locations or credentials were obfuscated, with the original values stored in the ObfuscationStringMap.txt file.

However, sometime in April 2022, Microsoft changed the way the obfuscation works, and the un-obfuscation part of the parser stopped working.

What changed?

OneDrive now appears to encrypt the data, and ObfuscationStringMap.txt is no longer used. The file may still exist on older installations, but newer ones include a different file.

Figure 1 - Contents of \AppData\Local\Microsoft\OneDrive\logs\Business1 folder

As seen in Figure 1 above, there is a new file called general.keystore. It is a JSON file that can be easily read, and it appears to hold the key used to decrypt the encrypted content, stored as a base64-encoded string.

Figure 2 - Sample general.keystore contents
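To illustrate, here is a minimal Python sketch that pulls the key out of a general.keystore file. It assumes the JSON is an array whose first object stores the base64 key under a field named `Key`. That layout is an assumption and should be checked against your own keystore (Figure 2). The sketch also accepts either UTF-8 or UTF-16 text, because the file's encoding is not stated here.

```python
import base64
import json


def key_from_keystore(raw: bytes) -> bytes:
    """Return the AES key held in the bytes of a general.keystore file.

    Assumption: the JSON is a list whose first object stores the key as a
    base64 string under "Key". Check this against your own keystore file.
    """
    # Accept UTF-16 text with a byte order mark, otherwise treat it as UTF-8.
    if raw[:2] in (b"\xff\xfe", b"\xfe\xff"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8-sig")
    key = base64.b64decode(json.loads(text)[0]["Key"])
    # The post identifies the cipher as AES with a 128-bit key.
    if len(key) != 16:
        raise ValueError(f"expected a 16-byte (128-bit) key, got {len(key)} bytes")
    return key
```

For example, `key_from_keystore(open("general.keystore", "rb").read())` returns the 16 raw key bytes. It raises `ValueError` if the decoded key is not 128 bits long.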

Time for some Reverse Engineering

With a bit of digging around in OneDrive's LoggingPlatform.dll with IDA Pro, we can see that it uses the Windows BCrypt APIs. Note that this is not the bcrypt password-hashing algorithm, which bears the same name!

Figure 3 - BCrypt* Imports in LoggingPlatform.dll

Jumping to where these functions are called, it quickly becomes apparent that the encryption used is AES in CBC (Cipher Block Chaining) mode with a 128-bit key.

Figure 4 - IDA Pro Disassembly

In the IDA Pro snippet above (Figure 4), we can see the call to BCryptOpenAlgorithmProvider and then, if that succeeds, a call to the BCryptSetProperty function, which has the following syntax:

NTSTATUS BCryptSetProperty(
  [in, out] BCRYPT_HANDLE hObject,
  [in]      LPCWSTR       pszProperty,
  [in]      PUCHAR        pbInput,
  [in]      ULONG         cbInput,
  [in]      ULONG         dwFlags
);

Without delving into too many boring assembly details, I'll skip to the relevant parts...

For each string to be encrypted, OneDrive initialises a new encryption object with the key stored in the general.keystore file, encrypts the string and then disposes of the encryption object. The encrypted blob is then base64 encoded and written out to the log as the obfuscated string. There are a few other quirks along the way. One is that the characters / and + are replaced with _ and - respectively. Both can appear in base64 text but also have a special meaning in URLs, so swapping them keeps the strings parseable later.
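The unwrapping half of that process can be sketched as follows. The sketch undoes the character substitution, decodes the base64, and checks that the result is a whole number of 16-byte AES blocks. It assumes the substitution is the standard URL-safe base64 alphabet (`+` written as `-`, `/` written as `_`), which Python's `base64.urlsafe_b64decode` reverses. Restoring stripped `=` padding is also an assumption, since the post does not spell out the other quirks. The returned bytes would then be decrypted with AES-128-CBC using the keystore key.

```python
from __future__ import annotations

import base64
import binascii


def unwrap_logged_string(token: str) -> bytes | None:
    """Turn an encrypted string, as written to an ODL log, back into ciphertext bytes.

    Returns None when the token cannot be AES ciphertext (invalid base64, or not a
    whole number of 16-byte blocks), for example because it was never encrypted.
    """
    # Assumption: '=' padding may have been stripped, so restore it before decoding.
    token = token.rstrip("=")
    if len(token) % 4 == 1:
        return None  # no base64 encoding has this length
    token += "=" * (-len(token) % 4)
    try:
        # urlsafe_b64decode maps '-' back to '+' and '_' back to '/'.
        blob = base64.urlsafe_b64decode(token)
    except (binascii.Error, ValueError):
        return None
    if not blob or len(blob) % 16:
        return None
    return blob
```

Returning `None` rather than raising makes it easy to leave non-encrypted strings untouched when scanning a whole log.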

Why the change?

In the previous iteration of ODL (when ObfuscationStringMap.txt was used), the same key (a 3-word combination) was often repeated in the file. This made it difficult or impossible to know which value to substitute to recover the original string.

Encrypting in place, rather than using a lookup table, does appear to be a more robust scheme, and it eliminates the above issue. It does use more disk space, though. AES is a block cipher, so the encrypted blob is always a multiple of 16 bytes (128 bits). In other words, it is inefficient for short text (less than 10 bytes).
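The overhead can be estimated with a little arithmetic. This sketch assumes PKCS#7 padding, which always adds 1 to 16 bytes; the post does not name the padding scheme. It also assumes base64 output with the `=` padding stripped.

```python
def ciphertext_len(plaintext_len: int, block_size: int = 16) -> int:
    """Ciphertext size under PKCS#7 padding, which always adds 1..block_size bytes."""
    return (plaintext_len // block_size + 1) * block_size


def logged_len(plaintext_len: int) -> int:
    """Base64 characters for that ciphertext, ignoring any '=' padding (ceil(4n/3))."""
    return (4 * ciphertext_len(plaintext_len) + 2) // 3


for n in (1, 5, 15, 16, 40):
    print(n, "plaintext bytes ->", ciphertext_len(n), "ciphertext bytes ->", logged_len(n), "characters")
```

Under these assumptions, anything from 1 to 15 bytes of plaintext becomes 16 bytes of ciphertext and 22 characters in the log. A 16-byte string already needs 32 bytes (43 characters).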

Updated code

The Python ODL parser has been updated to accommodate this new format, and works with both the old and new versions. It is available here.

Sunday, February 13, 2022

Reading OneDrive Logs

Due to its popularity, OneDrive has become an important source of evidence in forensics. Last week, Brian Maloney posted about his research on reconstructing the folder tree from the usercid.dat files, and also provided a script to do so. In this brief post, we explore the format of OneDrive logs and provide a tool to parse them. In subsequent posts, I will showcase use case scenarios.

Where to find them?

OneDrive logs have the extension .odl and are found in the user's profile folder at the following locations:

On Windows - 


On macOS - 


At these locations, there are usually 3 folders (Common, Business1 and Personal), each containing logs. As the name suggests, Business1 holds the logs for OneDrive for Business.

Figure 1 - Contents of Business1 folder

The .odl file is the currently active log, while the .odlgz files are older logs that have been compressed. Depending on which folder (and OS) you are in, you may also see .odlsent and .aodl files, which have a similar format.

What is in there?

These are binary files, and cannot be directly viewed in a text editor. Here is what a .odl file looks like in a hex editor.

Figure 2 - .odl file in a hex editor

It is a typical binary file format, with a 256-byte header followed by data blocks. Upon first inspection, it seems to be a log of all important function calls made by the program. This low-level run log can be useful in scenarios where there is no other logging. It can help prove the upload/download or synchronisation of files/folders, or even reveal items that no longer exist on disk or in the cloud.

You may notice some funny-looking strings (highlighted in red). More on that later.

The Format

With a bit of reverse engineering, the header format is worked out as follows:

struct {
    char     signature[8]; // EBFGONED
    uint32   unk_version;  // value seen = 2
    uint32   unknown2;
    uint64   unknown3;     // value seen = 0
    uint32   unknown4;     // value seen = 1
    char     one_drive_version[0x40];
    char     os_version[0x40];
    byte     reserved[0x64];
} Odl_header;
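Every header field has a fixed size, so the header maps directly onto Python's `struct` module. The sizes add up to exactly 256 bytes (8 + 4 + 4 + 8 + 4 + 64 + 64 + 100). The following is a minimal sketch that assumes little-endian integers and NUL-padded ASCII version strings.

```python
import struct

# Little-endian, no alignment padding: 8 + 4 + 4 + 8 + 4 + 64 + 64 + 100 = 256 bytes.
ODL_HEADER = struct.Struct("<8sIIQI64s64s100s")


def _c_string(raw: bytes) -> str:
    """Decode a fixed-size, NUL-padded text field (assumed ASCII)."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_odl_header(data: bytes) -> dict:
    """Parse the Odl_header at the start of a .odl or .odlgz file."""
    if len(data) < ODL_HEADER.size:
        raise ValueError("file too short for an ODL header")
    (signature, unk_version, unknown2, unknown3, unknown4,
     one_drive_version, os_version, _reserved) = ODL_HEADER.unpack_from(data)
    if signature != b"EBFGONED":
        raise ValueError(f"unexpected signature {signature!r}")
    return {
        "unk_version": unk_version,
        "unknown2": unknown2,
        "unknown3": unknown3,
        "unknown4": unknown4,
        "one_drive_version": _c_string(one_drive_version),
        "os_version": _c_string(os_version),
    }
```

The unknown fields are returned raw so they can be compared across files.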

The structures for the data blocks are as follows:

struct {
    uint64     signature; // CCDDEEFF 00000000
    uint64     timestamp; // Unix Millisecond time
    uint32     unk1;
    uint32     unk2;
    byte       unk3_guid[16];
    uint32     unk4;
    uint32     unk5;  // mostly 1
    uint32     data_len;
    uint32     unk6;  // mostly 0
    byte       data[data_len];
} Data_block;

struct {
    uint32    code_file_name_len;
    char      code_file_name[code_file_name_len];
    uint32    unknown;
    uint32    code_function_name_len;
    char      code_function_name[code_function_name_len];
    byte      parameters[];
} Data;
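Walking the data blocks is then a loop. Each iteration reads the fixed 56-byte part of Data_block and then `data_len` bytes of payload. This sketch assumes little-endian fields and that the signature comment lists the bytes in file order. It converts the Unix millisecond timestamp to UTC, and it stops at the first bad signature rather than trying to resynchronise.

```python
import struct
from datetime import datetime, timedelta, timezone

# Fixed 56-byte part of Data_block, little-endian, no alignment padding.
BLOCK_HEADER = struct.Struct("<8sQII16sIIII")
# Assumes the signature comment lists the bytes in file order.
BLOCK_SIGNATURE = bytes.fromhex("CCDDEEFF00000000")
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iter_data_blocks(buf: bytes, offset: int = 256):
    """Yield (timestamp, data) for each Data_block, starting after the 256-byte header."""
    while offset + BLOCK_HEADER.size <= len(buf):
        (signature, timestamp_ms, _unk1, _unk2, _guid,
         _unk4, _unk5, data_len, _unk6) = BLOCK_HEADER.unpack_from(buf, offset)
        if signature != BLOCK_SIGNATURE:
            break  # corrupt or unexpected data; a real parser might try to resync
        start = offset + BLOCK_HEADER.size
        data = buf[start:start + data_len]
        if len(data) < data_len:
            break  # truncated block at the end of the file
        yield UNIX_EPOCH + timedelta(milliseconds=timestamp_ms), data
        offset = start + data_len
```

Adding a `timedelta` to the epoch, rather than dividing by 1000, keeps the millisecond precision exact.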

In the case of .odlgz files, the Odl_header is the same, followed by a single gzip compressed blob. The blob can be uncompressed to parse the Data_block structures.
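Handling .odlgz files therefore only needs an extra decompression step. The following is a minimal sketch using the standard `gzip` module. The returned buffer holds the Data_block structures back to back, so block parsing starts at offset 0 of this buffer rather than after a header.

```python
import gzip

ODL_HEADER_SIZE = 256


def odlgz_to_blocks_buffer(file_bytes: bytes) -> bytes:
    """Return the uncompressed Data_block stream of an .odlgz file."""
    if file_bytes[:8] != b"EBFGONED":
        raise ValueError("missing EBFGONED signature")
    # Everything after the 256-byte Odl_header is a single gzip blob.
    return gzip.decompress(file_bytes[ODL_HEADER_SIZE:])
```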

Now we can try to interpret the data. Leaving aside the few unknowns, a data block mainly consists of a timestamp (when the event occurred), the name of the function that was called, the code file that function resides in, and the parameters passed to the function. The parameters can be of various types such as int, char, float, etc. That part hasn't been fully reverse engineered yet, but simply extracting the strings gives us a lot of useful information. However, the strings are obfuscated!
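The following sketch covers both steps. It splits the Data payload into its length-prefixed names, then pulls printable strings out of the still-undecoded parameters. The string extraction is a `strings`-style heuristic chosen for illustration (ASCII and UTF-16LE runs of at least four characters). It does not decode the parameter types, and the names are assumed to be ASCII.

```python
from __future__ import annotations

import re
import struct

# Printable ASCII runs, or printable ASCII stored as UTF-16LE (each char + NUL).
# The minimum length of 4 is an arbitrary choice, as in the `strings` tool.
_STRING_RUN = re.compile(rb"(?:[\x20-\x7e]\x00){4,}|[\x20-\x7e]{4,}")


def parse_data(data: bytes) -> tuple[str, str, bytes]:
    """Split a Data payload into (code_file_name, code_function_name, parameters)."""
    pos = 0

    def read_counted_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", data, pos)
        raw = data[pos + 4:pos + 4 + length]
        if len(raw) != length:
            raise ValueError("string runs past the end of the block")
        pos += 4 + length
        return raw.decode("ascii", "replace")  # char[] assumed to be ASCII

    code_file_name = read_counted_string()
    pos += 4  # skip the 'unknown' uint32
    code_function_name = read_counted_string()
    return code_file_name, code_function_name, data[pos:]


def extract_strings(parameters: bytes) -> list[str]:
    """Heuristically pull readable strings out of the undecoded parameters blob."""
    return [
        m.group().decode("utf-16-le" if b"\x00" in m.group() else "ascii")
        for m in _STRING_RUN.finditer(parameters)
    ]
```

Trying the UTF-16LE alternative first lets a wide string match as a whole, instead of being skipped as a series of one-character ASCII runs.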

Un-obfuscating the strings

Since Microsoft uploads these logs to its servers for telemetry and debugging, anything that is part of a file/folder name, URL or username is obfuscated. File extensions, however, are not. Each piece of data identified for obfuscation is replaced by a word, which is stored in a dictionary. The dictionary is available as the file ObfuscationStringMap.txt, usually in the same folder as the .odl files. To un-obfuscate, one simply needs to find and replace these words with their original values.

Figure 3 - Snippet of ObfuscationStringMap.txt

This is a tab-separated file, stored as either UTF-8 (macOS) or UTF-16LE (Windows).
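The following is a sketch of reading the file in either encoding. Detecting UTF-16LE by a byte order mark, or by a NUL second byte, is a heuristic. Treating a line without a tab as the continuation of a multi-line value is also a heuristic. Both are assumptions rather than documented behaviour.

```python
from __future__ import annotations


def decode_string_map(raw: bytes) -> list[tuple[str, str]]:
    """Decode ObfuscationStringMap.txt bytes into (key, value) rows in file order."""
    if raw.startswith(b"\xff\xfe"):
        text = raw[2:].decode("utf-16-le")   # UTF-16LE with byte order mark
    elif raw[1:2] == b"\x00":
        text = raw.decode("utf-16-le")       # heuristic: no BOM, but ASCII-range UTF-16LE
    else:
        text = raw.decode("utf-8-sig")       # UTF-8, with or without BOM
    rows: list[tuple[str, str]] = []
    for line in text.replace("\r\n", "\n").rstrip("\n").split("\n"):
        key, sep, value = line.partition("\t")
        if sep:
            rows.append((key, value))
        elif rows:
            # Assumption: a line without a tab continues the previous multi-line value.
            rows[-1] = (rows[-1][0], rows[-1][1] + "\n" + line)
    return rows
```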

Now, referring back to the original funny looking string in Figure 2 - 
  .. after un-obfuscating becomes ..

It is important to note that since extensions are not obfuscated, they can still provide valuable intel even if some or all of the parts in the path cannot be decoded.
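The find-and-replace can be sketched for paths as follows. The sketch assumes that each obfuscated path component is exactly one dictionary key, optionally followed by its unobfuscated extension; this is not stated above. Multi-line and other non-path strings would need different handling. The keys in the usage example are made up.

```python
from __future__ import annotations

import re


def unobfuscate_path(path: str, string_map: dict[str, str]) -> str:
    """Replace obfuscated path components with their original values.

    Components missing from the map are left untouched, and any extension is
    kept as-is, since extensions are not obfuscated.
    """
    parts = re.split(r"([\\/])", path)  # the capture group keeps the separators
    for i, part in enumerate(parts):
        stem, dot, ext = part.partition(".")
        if stem in string_map:
            parts[i] = string_map[stem] + dot + ext
    return "".join(parts)
```

For example, with the hypothetical map `{"KeyA": "alice", "KeyB": "budget"}`, the path `C:\Users\KeyA\Documents\KeyB.xlsx` becomes `C:\Users\alice\Documents\budget.xlsx`. An unknown key such as `KeyC.docx` is left alone but still reveals its `.docx` extension.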

This process seems easy. However, not all obfuscated strings are file/folder names or parts of paths/URLs; some are multi-line strings. Another problem is that the words (the keys to the dictionary) are reused! So you might see the same key several times in ObfuscationStringMap.txt. The thing to remember is that new entries are added at the top of this file, not at the bottom. When reading the file, the first occurrence of a key should therefore be the latest one. Sometimes the key is not found at all, as entries are cleaned out after a period of time. There is also no way to tell whether an entry in the dictionary is stale or valid for the specific log file being parsed. All of this means that the decoded strings need to be taken with a grain of salt.
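Once the rows are read in file order, the first-occurrence rule takes a single `dict.setdefault` pass. The sketch also lists reused keys, so that decodes relying on them can be flagged. It cannot detect stale entries, which is why the results still need to be taken with a grain of salt.

```python
from __future__ import annotations

from collections import Counter


def build_string_map(rows: list[tuple[str, str]]) -> dict[str, str]:
    """Map each key to its first (newest) value, since new entries are added at the top."""
    string_map: dict[str, str] = {}
    for key, value in rows:
        string_map.setdefault(key, value)
    return string_map


def reused_keys(rows: list[tuple[str, str]]) -> set[str]:
    """Keys that appear more than once; decodes using them deserve extra caution."""
    counts = Counter(key for key, _ in rows)
    return {key for key, n in counts.items() if n > 1}
```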

Based on the above, a Python script to parse the ODL logs is available here. A snippet of the output produced by the script is shown below.

Figure 4 - output snippet

In a subsequent post, we'll go through the items of interest in these logs, such as account linking/unlinking, uploads, downloads, file info, etc.