Pages

Tuesday, August 21, 2018

An open source spotlight parser

Spotlight is the name of the indexing system which comes built into macOS. It is responsible for continuous indexing of files and folders on all attached volumes. It keeps a copy of all metadata for almost every single file and folder on disk.

Thus, it can provide some excellent data for your investigation. While much of the same information can be obtained if you have access to the full disk image, it is known that there is information in this database that is not available elsewhere. Details like Date(s) Last Opened or Number of Times (an application or file) is Opened/Used are not available anywhere else on the file system. Unfortunately though it uses a proprietary undocumented format, and no publicly available code existed to read it. So over the last few months, I’ve been studying the file format of these databases and have created a tool/library to read and extract the data contained within.

The library and tool are open sourced now and located here:
https://github.com/ydkhatri/spotlight_parser

The format of the database will be discussed in a later post.

For those familiar with macOS, you know this data (contained in the database) can be obtained on a locally mounted volume using the macOS built-in mdls utility. However to do this, you need to mount your disk image on a mac to do so and the utility can only be run on individual files/folders, not the entire disk. It can be run recursively (with a bit of command line fu) on the entire volume but the output is not easy to read then.

If you don't prefer to do that, run spotlight_parser instead. Just point it to the database files which are named store and .store (located in the /.Spotlight-V100/Store-V2/<UUID> folder) and let it parse out the complete database for you.

Here is a screenshot of spotlight_parser running. Depending on how much data is contained in the database, this can take anywhere between a few seconds to a few minutes (5-10 on very large disks with lots of files).

Figure 1 - Running spotlight_parser

Once done, you will have 2 files as output. One is a text file (prefix_data.txt) containing the database dump of all entries. The other is a CSV (actually tab separated) which tries to build a path for every file/folder using inode number (CNID) from data available in the database. Since not every single folder may be included, some paths may not resolve and you might get ..NOT FOUND.. in the path sometimes along with an error on the console as seen above.

In the prefix_data.txt file, you will see some XML content (configuration information) at the beginning followed by database entries for files and folders.
Below is a snippet of the prefix_data.txt file, showing only output for a single jpg image file.

Figure 2 - Output showing a single jpg file's metadata information from database

Here the text in Red is metadata pertaining to a single entry in the database, including the date and time it was last updated. This is followed by the metadata itself. The items in Blue are information only available in the spotlight database. The last two may be of particular interest to an investigator.

Note - The screenshot above is not from any special version of the code, actual output is plain text, it has no coloring! Colors were added just for explanation.

The spotlight_parser has been incorporated into mac_apt as the SPOTLIGHT plugin. In mac_apt, the output is also available in an sqlite database making it easier to query. Mark McKinnon has forked a version of this library and also added sqlite capability, it is available here.

While this exposes the data and makes it available, it is still not easy to query. Perhaps one of these days, I will write a GUI application with drop-down boxes for easily accessing and querying the output data.