Friday, 28 April 2017

The mystery of /var/folders on OSX

The /var/folders (which is actually /private/var/folders) has been a known but undocumented entity for a very long time. Apple does not document why its there, or what it is. But there are plenty of guides online that suggest it may be good to periodically clear those folders to recover disk space and perhaps speed up you system. Whether that is wise is beyond the scope of this blog post.

If you've ever done a MacOS (OSX) forensic examination, you've probably noticed some odd folders here, all two character folder names with long random looking subfolders. Something like /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl/. A level below are 3 folders C, T, 0 and you can then see familiar data files under those. The 3 folders represent Cache (C), Temporary files (T) and User files (0)

On a live system these can be queried using the command 'getconf VARIABLE' where VARIABLE can be DARWIN_USER_CACHE_DIR, DARWIN_USER_TEMP_DIR or DARWIN_USER_DIR.

These are locations where mac stores cache and temporary files from various programs and system utilities. They are further segregated on a per-user basis. So each of those folders (Example: /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl) represents a single user's temporary space. 

Whats in it for forensicators?

From a forensics examination perspective, there is not a lot of artifacts here. However some useful artifacts like Notifications databases and Quicklook thumbnailcache databases are located here. The Launchpad dock database is also here. Sometimes you can find useful tidbits of cache from other apps too.

Figure: /private/var/folders on MacOS 10.12 (Sierra) 

It would be nice to be able to determine which user owned a particular folder when analyzing the artifacts within it. This is actually really easy, as you can just look up the owner uid of the folder. But if you are more interested in how the name gets generated, read on.

Reverse engineering the Algorithm

There is an old forum post here that does not provide the algorithm but hints that its likely generated from uuid and uid. Both of which are available from the user's plist (under /var/db/dslocal/nodes/Default/users/<USER>.plist). From the narration, the character set used in the folder names would be '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. However that's not what is seen on newer macs (10.10 onwards)

After analyzing the folder names on my mac test machines, the data set was narrowed down to 0-9, underscore and all alphabets except vowels (aeiou). A little bit of research confirmed this, when the string '0123456789_bcdfghjklmnpqrstvwxyz' was seen in the libsystem_coreservices.dylib binary. The string is 32 char in size, so you would need 5 bits to represent an index which could point to a specific character in the string. After a bit of experimentation with creating a few user accounts and setting custom UUIDs, it was clear how this worked. The  algorithm takes the 128 bit UUID string as a binary bitstream, appends to it the binary bitstream of the UID (4 byte), then runs a single pass over that data. For each 5 bits it reads, it uses that as an index to get a single char from the charset array and copies that to output. A python implementation that generate these folder names (for both new osx versions and older ones) is provided here.


Thursday, 5 January 2017

Flexing SQL muscle for parsing an MS db on OSX

This post is about using recursive SQL queries (or rather a single recursive query) to parse the MicrosoftRegistrationDB.reg file created by Microsoft office on Mac OSX systems.

A little background..

On OSX (mac), there is no registry. Most apps just use plist files instead to save local information. Microsoft Office leaves cache and configuration information in plist files like every other OSX application. However it also keeps a copy in this file – microsoftRegistrationDB.reg. The file can be found here –

/Users/research/Library/Group Containers/xxxxxxxxxx.Office/MicrosoftRegistrationDB.reg

This is an sqlite database which is a flattened version of the registry tree that office would create in windows under HKCU\Software\Microsoft\Office, the format of which is quite straight-forward and documented. Some useful MRU artifacts and configuration settings reside here.

The sqlite database has the same fields as in the registry, namely - key, key_last_modified_time, value_name, value_type and value_data. This is nicely arranged in the following table structure.

Figure 1 - Database table schema

Pulling the data out is fairly simple in SQL. However, if you wish to recreate _all_ the registry paths from the flattened tree, then it’s a bit more involved. In the HKEY_CURRENT_USER table, each key has a single entry and along with the key name, you have the parent key reference. As an analyst, you would like to get full key path (i.e. HKCU\Software\Microsoft\...) for every value. There lies the problem. To recreate the paths for every single registry value, you would have to run several individual SQL queries, each query would fetch a key’s parent, and you keep doing that till you reach the ROOT of the tree. You could do this in a recursive function in python. Or you can let SQL do the recursion by running a recursive query. Sqlite supports recursive queries. You can read up about recursive queries in sqlite here and here.

The Final Query

SELECT t2.node_id, t2.write_time, path as Key, as valueName,
  HKEY_CURRENT_USER_values.value as value,
  HKEY_CURRENT_USER_values.type as valueType from
    under_software(path,name,node_id,write_time) AS 
      SELECT under_software.path || '\' ||,
        FROM HKEY_CURRENT_USER JOIN under_software ON
        ORDER BY 1
  SELECT name, path, write_time, node_id FROM under_software


Here the ‘WITH RECURSIVE’ part will perform the recursive querying for every value item. It will create the full key path for that value. The line:
SELECT under_software.path || '\' ||’ will concatenate the parent key path with the sub-key name using backslash as separator. The double-pipe ‘||’ is the concatenate operator. The 'ORDER BY 1' is not really necessary, but this makes it sort the output by the first parameter to the recursive function, i.e, path.

A python script to do this automatically is available here. This script will run the recursive query on the database and then provide both a csv and a plist as output. I chose to output as a plist too because this data is best viewed as a tree as shown below.

Figure 2 - Sample plist produced by script (viewed in plist Editor Pro)

Saturday, 29 October 2016

WofCompressed streams in Windows 10

On windows 10, there is a new 'System Compression' option that compresses files using reparse points. This is not the NTFS-based compression that earlier versions of windows utilized, its different. This post is about the new compression scheme and how it affects forensic analysts.

With windows 10, a lot of details are automatically managed without user input and this is one of them. Windows can determine if the compression will be beneficial to the host system and automatically trigger it! This usually happens when you upgrade as opposed to clean installing the OS. Some users have reported seeing it as an option in 'Disk Cleanup' too.

Windows provides a utility called Compact.exe to do this processing manually. Using it, you can compress/decompress files and folders or simply query a system to determine if it will be beneficial at all on a specific volume. The compression algorithms are XPRESS (4K, 8K, 16K) or LZX. While the files are compressed on disk, if an application opens/reads such a file, it is still getting the original decompressed data and all decompression is handled on the fly automatically by windows 10.

Figure 1 - Compact.exe and its command usage info
The command 'compact /exe <file>' will compress any file (not just exe)

Lets get to the point, how does this impact forensics

Well, as of now, no tools will recognize and decompress these files. Hence, you can't read, keyword search or extract these files in their original uncompressed form.

Tools tested

Here is a list of tools tested so far:

Tool VersionSupport (as of 10/26/2016)
SIFT Workstation3No
Xways Forensic19.0No

How it works?

System compression utilizes reparse points and creates a new Alternate Data Stream (ADS) having the name 'WofCompressedData'. The compressed data is stored here. Reparse points are an NTFS feature that allow custom implementation like this. However this means that other applications that are not aware of this custom implementation will not be able to read/write to that file. In encase (or other forensic tools), you can see the file and the WofCompressedData stream. Clicking on the file just shows the contents to be all zeroes. Clicking on the stream, you can get the compressed data, but as of now, no automatic transparent decompression (as it does with NTFS compressed files). This is seen in screenshot below.

Note - This isn't to be confused with WOFF compression, which is a compression scheme used in Web Open Font Format!

Figure 2 - Encase shows the WofCompressedData stream. The file's original data was all text.
If you mount a volume containing such compressed files in SIFT Workstation or any linux system (they all use the same NTFS-3g FUSE driver), you will see the message 'Unsupported reparse point' when trying to list these files. Trying to access file contents will result in errors as seen in screenshot below.

Figure 3 - Files DW20.exe and upgrader_default.log are compressed here
If you attach a windows 10 formatted volume/disk to a Windows 7 system, you won't be able to access files as it does not know how to read them. See screenshot below:

Figure 3 - Notepad trying to view upgrader_default.log file (which is compressed)

Workarounds (till supported is added in by tool developers)

For Linux

If you use SIFT or another Linux system to do your forensics, the fix is simple. A few months back, Eric Biggers wrote a plugin to handle this. Its a plugin to the ntfs-3g FUSE driver. Its available here:

For this, you will first need to download, compile and install the latest version of the ntfs-3g driver (but not from Tuxera, that one is missing a file!); then proceed to download, compile and install the above mentioned plugin. You can get this working on SIFT with roughly the following steps:

1. Go to and download the source code for the latest stable release, right now its ntfs-3g_2016.2.22AR.1.orig.tar.gz.
2. Unzip and extract the file downloaded.
3. Open Terminal and browse to the extracted folder.
4. Compile and install using commands:
sudo make install
4. Go to and download the entire code as a zip file.
5. Unzip and extract the archive.
6. Open Terminal and browse to the extracted folder.
7. A few more tools need to be installed to compile this, so run the following commands:
sudo apt-get update
sudo apt-get install autoconf automake libtool
8. Run following commands to generate a configure script:
mkdir m4
autoreconf -i
9. Compile and install
sudo make install
10. If all went well (without errors), you are done!

Now you should be able to view and read those files normally, all decompression is handled on the fly automatically!

Figure 4 - No errors seen listing or reading files after installing the system compression plugin

For Windows

If you use Windows as your host machine for forensics processing, then you should only use a Windows 10 machine for processing evidence files that contain windows 10 images. This applies to tasks such as antivirus scanning, where you would typically share the entire disk out using Disk emulation (if you use Encase) which allow windows to parse and interpret the disk. This would only work (to read system compressed files) if the host system is Windows 10.

If you are looking to identify the system compressed files, you could filter on all files with ADS streams that have the name 'WofCompressedData'.

Fortunately, by default windows only compresses system files (EXE/DLL in windows and system32) and not user files, so you should mostly be fine. However, users can compress any file manually using the compact command.

Friday, 3 June 2016

Parsing the Windows 10 Notification database

Notifications on windows was a new feature added with windows 8 and continues in 10. In this post, I briefly discuss the format and data obtained from these notifications. Notifications can hold useful recent data (and some not so recent data) such as popup messages from applications, email snippets, application specific data like torrent downloaded messages among other information. As of now, not many applications use this feature on windows (when contrasted to apps on mac), but that is changing as more applications begin adding support for sending events to the Notifications Center/Bar.

As pointed out by Brent Muir here, this database is located at:

This Notifications database holds not just the popup notifications which the user sees briefly, but also  any updates to Tiles on the new windows start screen/start menu. Under the notifications scheme used by windows, there are 4 types of notifications, Toasts (popups), Tiles (updates on app live tiles like latest news stories, tweets or weather), Badges (small overlay on tile used to show status or count of items) or Raw push notifications (app specific data).

Appdb.db is a binary database having the signature 'DNPW' as the first 4 bytes. The structure of the file is roughly as shown below:

By default, there are 256 chunks in the file. Each chunk has a header element, however, only the first chunk has the header filled in. The chunk header starts with the DNPW signature, followed by what I believe to be the time the last notification was displayed to the user (8 bytes FILETIME) and the next sequential Notification ID to be used, and some unknown data after that (12 bytes).

The header is followed by data that I assume to be flags (8 bytes), followed then by Push URI (URL used by apps to push data and notifications to the client), Badge XML content and Tile Data (5 metadata objects and 5 corresponding XML data strings). Each of these elements in the chunk has its own data structure, which is quite detailed in itself. I am not reproducing all the structures here. To get this information, download the 010 Template (from link below) containing all the definitions for structures (deciphered so far..). There is also a python script available to parse information from this file and write out to a CSV file.

Tuesday, 10 May 2016

Amcache on Windows 7

The amcache registry hive which made its debut in windows 8, is now also showing up on Windows 7 systems. I was alerted to this by a fellow DFIR analyst Clint Hastings, who noticed this and has been using my scripts to parse them on windows 7 for some time now.

Amcache on Windows 7

So, what happened? After a bit of investigation on my machines, it was traced to Windows Update KB2952664, which updates the application inventory and telemetry (Microsoft terminology for the programs that monitor application usage) executables and libraries.

The update first came out in April 2015, but it appears as if it was not widely deployed (automatically) until around October.

Both Amcache.hve and RecentFileCache.bcf are updated now. I verified this information by parsing both these artifacts. Amcache of-course, had a lot more detail about the same files. So, don't forget to look for amcache on your windows 7 examinations.

Monday, 21 April 2014

Search history on windows 8.1 - Part 2

I have recently blogged about windows 8.1 search history and how searched terms/phrases are recorded as LNK files in a post here. But windows also logs searched terms (search history) to the event log and web history (and cache).

From the LNK files, we know the first time a term was searched for, but not the next time or the last time it was searched, which is usually more relevant from an investigation perspective. However, this information can be obtained from the Connected-Search event log file. On disk it would be under:

Under Event viewer, you can find it under:
 \Applications and Services Logs\Microsoft\Windows\Connected-Search\Operational

Below is a screenshot for one such log entry.

Searched keyword is 'enscript' and machine was online when search was run

Windows logs all URLs and reference links here. Windows, by default tries to search for everything online as well as on the machine. Even if you are offline, a search URL for online searches is generated and seen here. The screenshot below shows the same search run when machine was offline (not connected to internet).

Searching for 'enscript' when machine is offline

Each time a search is run, an entry is created, sometimes multiple entries (this probably has to do with different views when browsing the search results). For searches (if machine was online), the URL requests and responses are also found in the IE web history and cache database (WebcacheV01.dat). The database is located at

The best way to study this data would be by parsing this database either manually (using libesedb and lots of data formatting with additional parsing!) or use a free program like IE10 History reader (or an expensive brand forensic tool). However, if you are just interested in the search terms without dates or other information, a raw search into the datebase files will suffice.

To find searched terms, you will need to search  for URLs beginning with

The screenshot below shows hits when searching the IE web cache files for the above URL using Encase.
Search hits in IE web cache database as seen in encase

Search as you type

An aspect missing from LNK files and the event logs is the search suggestions and interim search results. As Rob Lee hinted to me earlier that a user could search for a term (without hitting Enter or the search button after entering the term in the search box), but not click on any results and none of the above artifacts would be created. Windows uses the 'search as you type' feature and  terms windows guessed for you (as you were typing into the search box) or interim search results are discarded. However you would find some traces of the terms as windows will make online queries for the 'search as you type' feature. If this is not clear, just recall how you search with google. As you begin typing the letters of your keyword into the search box, google automatically suggests most popular searches beginning with those letters. Windows also does the same thing.

Google's search as you type feature

To find such searches which were discarded, search for URLs beginning with

Actually, these are not the suggestions, but the lookups for term/phrase entered into the search box. The data returned (query response) will contain the suggestions.

Below is a screenshot showing the webcacheV01.dat file (and supporting db files) with search hits as displayed in Encase.

Search hits showing Windows querying for popular search term suggestions based on user entered input

Thus there are multiple locations (connected search LNK files, event logs, web cache) where an investigator can find evidence of searches run by a user. Each has its uses and caveats.

Friday, 4 April 2014

Windows 8 Thumbs.db files - still the same and not the same!

Screenshot of folder in Windows 8 showing Thumbs.db

Thumbs.db files have made a comeback in windows 8. Now, like in windows XP, explorer will create these files in every folder containing media files. This used to be a great forensic resource for investigators because thumbnails once created and stored in the Thumbs.db remained there even after the image file itself was deleted. This behavior is also noted with Windows 8.

The only thing that is different is the format of these new Thumbs.db files. It is not the Windows XP format and the usual thumbs.db file viewers including most forensic tools will not parse this file correctly. The format is actually the same as Windows 7 Thumbs.db files. Yes, that was not a typo, I said 'Windows 7'. I had looked into this earlier and the details are available here.

An interesting thing to note is that in windows 8, the same Thumbcache_*.db files are still maintained on a per user basis like windows 7 does. So the Thumbs.db is really a redundant location for these thumbnails as they are already cached in the Thumbcache database. So why the duplication?

Update (Thanks proneer for this tip!):
There are some caveats here. On windows 8, Thumbs.db will only be created in folders under a user profile folder, so anything created in C:\ or C:\program files or C:\program data or any other folder not under a user profile, ie, C:\Users\<USER>\* will not have thumbs.db files. 

But this has got nothing to do with a particular logged in user. A thumbs.db file will be created even when the logged in user browses folders of another user under their profile (as long as file permissions allow that user to write files to the other users' folder).

This behavior is different from Windows 7 thumbs.db where the location does not matter for creation of thumbs.db files.

There is another oddity noted. Sometimes a thumbs.db is created immediately upon folder being opened in explorer, on other occasions it has be triggered by changing the 'view' of the folder to 'Large icons'.