This post is about using recursive SQL queries (or rather a single recursive query) to parse the MicrosoftRegistrationDB.reg file created by Microsoft office on Mac OSX systems.
/Users/research/Library/Group Containers/xxxxxxxxxx.Office/MicrosoftRegistrationDB.reg
This is an sqlite database which is a flattened version of the registry tree that office would create in windows under HKCU\Software\Microsoft\Office, the format of which is quite straight-forward and documented. Some useful MRU artifacts and configuration settings reside here.
The sqlite database has the same fields as in the registry, namely - key, key_last_modified_time, value_name, value_type and value_data. This is nicely arranged in the following table structure.
Pulling the data out is fairly simple in SQL. However, if you wish to recreate _all_ the registry paths from the flattened tree, then it’s a bit more involved. In the HKEY_CURRENT_USER table, each key has a single entry and along with the key name, you have the parent key reference. As an analyst, you would like to get full key path (i.e. HKCU\Software\Microsoft\...) for every value. There lies the problem. To recreate the paths for every single registry value, you would have to run several individual SQL queries, each query would fetch a key’s parent, and you keep doing that till you reach the ROOT of the tree. You could do this in a recursive function in python. Or you can let SQL do the recursion by running a recursive query. Sqlite supports recursive queries. You can read up about recursive queries in sqlite here and here.
HKEY_CURRENT_USER_values.name as valueName,
HKEY_CURRENT_USER_values.value as value,
HKEY_CURRENT_USER_values.type as valueType from
(
WITH RECURSIVE
under_software(path,name,node_id,write_time) AS
(
VALUES('Software','',1,0)
UNION ALL
SELECT under_software.path || '\' || HKEY_CURRENT_USER.name,
HKEY_CURRENT_USER.name, HKEY_CURRENT_USER.node_id,
HKEY_CURRENT_USER.write_time
FROM HKEY_CURRENT_USER JOIN under_software ON
HKEY_CURRENT_USER.parent_id=under_software.node_id
ORDER BY 1
)
SELECT name, path, write_time, node_id FROM under_software
)
as t2 LEFT JOIN HKEY_CURRENT_USER_values on
HKEY_CURRENT_USER_values.node_id=t2.node_id;
Here the ‘WITH RECURSIVE’ part will perform the recursive querying for every value item. It will create the full key path for that value. The line:
‘SELECT under_software.path || '\' || HKEY_CURRENT_USER.name’ will concatenate the parent key path with the sub-key name using backslash as separator. The double-pipe ‘||’ is the concatenate operator. The 'ORDER BY 1' is not really necessary, but this makes it sort the output by the first parameter to the recursive function, i.e, path.
A python script to do this automatically is available here. This script will run the recursive query on the database and then provide both a csv and a plist as output. I chose to output as a plist too because this data is best viewed as a tree as shown below.
A little background..
On OSX (mac), there is no registry. Most apps just use plist files instead to save local information. Microsoft Office leaves cache and configuration information in plist files like every other OSX application. However it also keeps a copy in this file – microsoftRegistrationDB.reg. The file can be found here –/Users/research/Library/Group Containers/xxxxxxxxxx.Office/MicrosoftRegistrationDB.reg
This is an sqlite database which is a flattened version of the registry tree that office would create in windows under HKCU\Software\Microsoft\Office, the format of which is quite straight-forward and documented. Some useful MRU artifacts and configuration settings reside here.
The sqlite database has the same fields as in the registry, namely - key, key_last_modified_time, value_name, value_type and value_data. This is nicely arranged in the following table structure.
|  | 
| Figure 1 - Database table schema | 
Pulling the data out is fairly simple in SQL. However, if you wish to recreate _all_ the registry paths from the flattened tree, then it’s a bit more involved. In the HKEY_CURRENT_USER table, each key has a single entry and along with the key name, you have the parent key reference. As an analyst, you would like to get full key path (i.e. HKCU\Software\Microsoft\...) for every value. There lies the problem. To recreate the paths for every single registry value, you would have to run several individual SQL queries, each query would fetch a key’s parent, and you keep doing that till you reach the ROOT of the tree. You could do this in a recursive function in python. Or you can let SQL do the recursion by running a recursive query. Sqlite supports recursive queries. You can read up about recursive queries in sqlite here and here.
The Final Query
SELECT t2.node_id, t2.write_time, path as Key,HKEY_CURRENT_USER_values.name as valueName,
HKEY_CURRENT_USER_values.value as value,
HKEY_CURRENT_USER_values.type as valueType from
(
WITH RECURSIVE
under_software(path,name,node_id,write_time) AS
(
VALUES('Software','',1,0)
UNION ALL
SELECT under_software.path || '\' || HKEY_CURRENT_USER.name,
HKEY_CURRENT_USER.name, HKEY_CURRENT_USER.node_id,
HKEY_CURRENT_USER.write_time
FROM HKEY_CURRENT_USER JOIN under_software ON
HKEY_CURRENT_USER.parent_id=under_software.node_id
ORDER BY 1
)
SELECT name, path, write_time, node_id FROM under_software
)
as t2 LEFT JOIN HKEY_CURRENT_USER_values on
HKEY_CURRENT_USER_values.node_id=t2.node_id;
Here the ‘WITH RECURSIVE’ part will perform the recursive querying for every value item. It will create the full key path for that value. The line:
‘SELECT under_software.path || '\' || HKEY_CURRENT_USER.name’ will concatenate the parent key path with the sub-key name using backslash as separator. The double-pipe ‘||’ is the concatenate operator. The 'ORDER BY 1' is not really necessary, but this makes it sort the output by the first parameter to the recursive function, i.e, path.
A python script to do this automatically is available here. This script will run the recursive query on the database and then provide both a csv and a plist as output. I chose to output as a plist too because this data is best viewed as a tree as shown below.
|  | 
| Figure 2 - Sample plist produced by script (viewed in plist Editor Pro) |