Thursday, 5 January 2017

Flexing SQL muscle for parsing an MS db on OSX

This post is about using recursive SQL queries (or rather a single recursive query) to parse the MicrosoftRegistrationDB.reg file created by Microsoft office on Mac OSX systems.

A little background..

On OSX (mac), there is no registry. Most apps just use plist files instead to save local information. Microsoft Office leaves cache and configuration information in plist files like every other OSX application. However it also keeps a copy in this file – microsoftRegistrationDB.reg. The file can be found here –

/Users/research/Library/Group Containers/xxxxxxxxxx.Office/MicrosoftRegistrationDB.reg

This is an sqlite database which is a flattened version of the registry tree that office would create in windows under HKCU\Software\Microsoft\Office, the format of which is quite straight-forward and documented. Some useful MRU artifacts and configuration settings reside here.

The sqlite database has the same fields as in the registry, namely - key, key_last_modified_time, value_name, value_type and value_data. This is nicely arranged in the following table structure.

Figure 1 - Database table schema

Pulling the data out is fairly simple in SQL. However, if you wish to recreate _all_ the registry paths from the flattened tree, then it’s a bit more involved. In the HKEY_CURRENT_USER table, each key has a single entry and along with the key name, you have the parent key reference. As an analyst, you would like to get full key path (i.e. HKCU\Software\Microsoft\...) for every value. There lies the problem. To recreate the paths for every single registry value, you would have to run several individual SQL queries, each query would fetch a key’s parent, and you keep doing that till you reach the ROOT of the tree. You could do this in a recursive function in python. Or you can let SQL do the recursion by running a recursive query. Sqlite supports recursive queries. You can read up about recursive queries in sqlite here and here.

The Final Query

SELECT t2.node_id, t2.write_time, path as Key,
  HKEY_CURRENT_USER_values.name as valueName,
  HKEY_CURRENT_USER_values.value as value,
  HKEY_CURRENT_USER_values.type as valueType from
(
  WITH RECURSIVE
    under_software(path,name,node_id,write_time) AS 
    (
      VALUES('Software','',1,0)
      UNION ALL
      SELECT under_software.path || '\' || HKEY_CURRENT_USER.name,
          HKEY_CURRENT_USER.name, HKEY_CURRENT_USER.node_id,
          HKEY_CURRENT_USER.write_time
        FROM HKEY_CURRENT_USER JOIN under_software ON
          HKEY_CURRENT_USER.parent_id=under_software.node_id
        ORDER BY 1
    )
  SELECT name, path, write_time, node_id FROM under_software

as t2 LEFT JOIN HKEY_CURRENT_USER_values on
  HKEY_CURRENT_USER_values.node_id=t2.node_id;

Here the ‘WITH RECURSIVE’ part will perform the recursive querying for every value item. It will create the full key path for that value. The line:
SELECT under_software.path || '\' || HKEY_CURRENT_USER.name’ will concatenate the parent key path with the sub-key name using backslash as separator. The double-pipe ‘||’ is the concatenate operator. The 'ORDER BY 1' is not really necessary, but this makes it sort the output by the first parameter to the recursive function, i.e, path.

A python script to do this automatically is available here. This script will run the recursive query on the database and then provide both a csv and a plist as output. I chose to output as a plist too because this data is best viewed as a tree as shown below.

Figure 2 - Sample plist produced by script (viewed in plist Editor Pro)