Tuesday, November 6, 2018

The ._ (dot-underscore) file format

If you've ever looked at removable media and found hidden files whose names start with ._ , one for almost every file (or folder) on the disk, that is the result of the media having been used on macOS.

macOS keeps file metadata in a separate area known as Extended Attributes (xattr) on HFS+ or APFS. However, when writing to external media that is not formatted as HFS+ or APFS (and thus cannot store extended attributes), it writes this information out as a separate file with the same name, prefixed with dot-underscore ( ._ ), as seen in the screenshot below.

Figure 1 - Screenshot showing exFAT volume on External USB disk
While this has been well known for many years, it is often overlooked in forensic investigations. On media that has been used with both macOS and Windows (or Linux), macOS creates these files and also deletes them when the original file is deleted. However, if the file is deleted or renamed on Windows or Linux, the dot-underscore file is left behind untouched. A while back, Lee Whitfield touched upon this here, specifically pointing out its use for determining the date and time that a file was copied onto the media. However, there is useful information inside the file too.
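Since leftover dot-underscore files point to originals that were deleted or renamed outside macOS, it can be worth listing them. The following is a minimal sketch that walks a mounted volume and reports ._ files with no companion file or folder of the same name. The function name and the sample file names are hypothetical, and the name comparison is case-sensitive, which may not match the behaviour of a case-insensitive file system such as exFAT.

```python
import os
import tempfile


def find_orphaned_dot_underscore(root):
    """Return sorted paths of ._ files whose companion file/folder is missing."""
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        siblings = set(dirnames) | set(filenames)
        for name in filenames:
            # "._report.docx" belongs to "report.docx" in the same folder
            if name.startswith("._") and len(name) > 2:
                if name[2:] not in siblings:
                    orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)


if __name__ == "__main__":
    # Build a throwaway folder that mimics a volume touched by macOS.
    with tempfile.TemporaryDirectory() as volume:
        for name in ("report.docx", "._report.docx", "._deleted.jpg"):
            open(os.path.join(volume, name), "wb").close()
        for path in find_orphaned_dot_underscore(volume):
            print(os.path.basename(path))
```

On a real case you would pass the mount point of the external volume instead of the temporary folder.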

This file can contain useful metadata such as kMDItemWhereFroms (URL of file if downloaded from internet) and kMDItemDownloadedDate (Date & Time when it was downloaded) among other extended attributes.

After a bit of reverse engineering, I wrote an 010 hex editor template to parse this information. It is available at https://github.com/ydkhatri/MacForensics/blob/master/DotUnderscore_macos.bt

In the screenshot below, you can see it being run on one such file. 

Figure 2 - DotUnderscore_macOS.bt template output

Here is an analysis of the extracted data:

Attribute Name | Value | Meaning
kMDItemWhereFroms | https://upload.wikimedia.org/wikipedia/commons/3/3c/Thiruvalluvar_Statue_at_Kanyakumari_beach.jpg | URL from where it was downloaded
kMDItemDownloadedDate | 0x41BFB51D1CFFA4F8 (11/09/2017 23:32:44) | Timestamp when file was downloaded
com.apple.quarantine | 0083;5a04e59c;Safari;A451620D-2B49-49BD-ADC1-88DEBEA66582 | File was downloaded using the Safari browser*

kMDItemDownloadedDate is stored in a plist as a date value, which is a 64-bit double holding the number of seconds since 01/01/2001.
The template does not parse the plist for you. You can export it and open it in any plist viewer to see the human-readable date value.
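If you only have the raw 8 bytes, the conversion is easy to do by hand. Here is a minimal sketch using the kMDItemDownloadedDate value from the table above. It assumes the bytes are big-endian (as in a binary plist) and that the resulting time is UTC; the function name is made up for illustration.

```python
import struct
from datetime import datetime, timedelta, timezone

# Mac absolute time counts seconds from this date
MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def mac_absolute_to_datetime(raw8):
    """Interpret 8 big-endian bytes as a double of seconds since 2001-01-01."""
    (seconds,) = struct.unpack(">d", raw8)
    return MAC_EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    raw = bytes.fromhex("41BFB51D1CFFA4F8")
    # prints 11/09/2017 23:32:44
    print(mac_absolute_to_datetime(raw).strftime("%m/%d/%Y %H:%M:%S"))
```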

*The 3rd item in com.apple.quarantine's value (separated by ;) is the Application (agent) name which downloads the file. For more details on this, read Howard Oakley's blog post
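The quarantine value is simple to split apart in code. The sketch below is hedged: only the meaning of the third field (agent name) comes from this post. Treating the second field as a hexadecimal Unix timestamp is an assumption, although for this sample it decodes to the same second as kMDItemDownloadedDate. The field names in the returned dictionary are my own labels.

```python
from datetime import datetime, timezone


def parse_quarantine(value):
    """Split a com.apple.quarantine string into its ';' separated parts."""
    parts = (value.split(";") + [""] * 4)[:4]
    flags, stamp_hex, agent, event_id = parts
    timestamp = None
    if stamp_hex:
        # Assumption: hex-encoded seconds since the Unix epoch (UTC)
        timestamp = datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)
    return {
        "flags": flags,
        "timestamp": timestamp,
        "agent": agent,          # application that downloaded the file
        "event_id": event_id,
    }


if __name__ == "__main__":
    sample = "0083;5a04e59c;Safari;A451620D-2B49-49BD-ADC1-88DEBEA66582"
    info = parse_quarantine(sample)
    print(info["agent"], info["timestamp"])
```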

Tuesday, October 16, 2018

The user spotlight database

On macOS, the spotlight database is a central database holding metadata of all files/folders that macOS indexes and is always located at the root of any disk under /.Spotlight-V100

However, while browsing the folders on my macOS 10.14 (Mojave) image, I found a folder that contains yet another spotlight database. It appears that there is now more than one spotlight database on a single disk. There is one for each user located at:
As with the other spotlight database, the files that hold the information are store.db and .store.db.

Mojave (10.14) isn't the first version of macOS to include this database. This appeared first in High Sierra (10.13).

What is in it?

The per-user database store is used to store metadata from items that aren't files or folders. Items seen so far are:
  • Safari browser history (web pages visited)
  • Safari browser bookmarks
  • News App history (web pages visited)
  • Notes App notes
  • Maps App data (locations?)
I would speculate that emails would also be seen here, as a number of email related fields are present too:
  • kMDItemRecipientEmailAddresses
  • kMDItemPrimaryRecipientEmailAddresses
  • kMDItemAdditionalRecipientEmailAddresses
  • kMDItemHiddenAdditionalRecipientEmailAddresses

However, no email metadata was seen with a single IMAP account configured in the Mail app.

Since this is a test environment with very little activity and almost no apps other than those that come with macOS, there is likely to be a lot more metadata from different apps in this database on real world systems.

Why a separate database?

Reading Apple's documentation here and here seems to suggest that this is the implementation of functionality intended to allow app developers to provide in-app content searches and includes the ability to define metadata to do so. Items indexed are not required to be files.

Prior to this (10.12 and below) there existed a folder at

which housed all the *.webhistory files. See pic below.

Figure 1 - webhistory files in ~/Library/Caches/Metadata/Safari/

The individual files were plists which were then indexed by spotlight.

Figure 2 - webhistory file content
That folder no longer exists, and in its place we have the new spotlight database.

Parsing the data

Using mac_apt's single_plugin script to only run the spotlight plugin over individual store.db files, we can easily parse the database.

Figure 3 - mac_apt_singleplugin to parse the store.db file

There are quite a few fields, some more important than others. Below are some screenshots showing selected data (notes, safari history, and news). mac_apt gives you the data as a spreadsheet, sqlite db and a flat text file too (similar to mdls output).

Figure 4 - notes metadata from store.db (not all fields shown here)

Figure 5 - safari history from store.db (not all fields shown here)

Figure 6 - A single entry from News app showing all metadata (parsed from store.db)

mac_apt's spotlight plugin has been updated to automatically handle/process these user spotlight databases now. 

Tuesday, August 21, 2018

An open source spotlight parser

Spotlight is the name of the indexing system which comes built into macOS. It is responsible for continuous indexing of files and folders on all attached volumes. It keeps a copy of all metadata for almost every single file and folder on disk.

Thus, it can provide some excellent data for your investigation. While much of the same information can be obtained if you have access to the full disk image, it is known that there is information in this database that is not available elsewhere. Details like Date(s) Last Opened or Number of Times (an application or file) is Opened/Used are not available anywhere else on the file system. Unfortunately though it uses a proprietary undocumented format, and no publicly available code existed to read it. So over the last few months, I’ve been studying the file format of these databases and have created a tool/library to read and extract the data contained within.

The library and tool are open sourced now and located here:

The format of the database will be discussed in a later post.

For those familiar with macOS, you know this data (contained in the database) can be obtained on a locally mounted volume using the built-in mdls utility. However, this requires mounting your disk image on a Mac, and the utility can only be run on individual files/folders, not the entire disk. It can be run recursively (with a bit of command line fu) on the entire volume, but the output is then not easy to read.

If you would rather not do that, run spotlight_parser instead. Just point it to the database files, which are named store.db and .store.db (located in the /.Spotlight-V100/Store-V2/<UUID> folder), and let it parse out the complete database for you.

Here is a screenshot of spotlight_parser running. Depending on how much data is contained in the database, this can take anywhere from a few seconds to a few minutes (5-10 on very large disks with lots of files).

Figure 1 - Running spotlight_parser

Once done, you will have 2 files as output. One is a text file (prefix_data.txt) containing the database dump of all entries. The other is a CSV (actually tab separated) which tries to build a path for every file/folder using inode number (CNID) from data available in the database. Since not every single folder may be included, some paths may not resolve and you might get ..NOT FOUND.. in the path sometimes along with an error on the console as seen above.

In the prefix_data.txt file, you will see some XML content (configuration information) at the beginning followed by database entries for files and folders.
Below is a snippet of the prefix_data.txt file, showing only output for a single jpg image file.

Figure 2 - Output showing a single jpg file's metadata information from database

Here the text in Red is metadata pertaining to a single entry in the database, including the date and time it was last updated. This is followed by the metadata itself. The items in Blue are information only available in the spotlight database. The last two may be of particular interest to an investigator.

Note - The screenshot above is not from any special version of the code. The actual output is plain text with no coloring! Colors were added just for explanation.

The spotlight_parser has been incorporated into mac_apt as the SPOTLIGHT plugin. In mac_apt, the output is also available in an sqlite database, making it easier to query. Mark McKinnon has forked a version of this library and also added sqlite capability. It is available here.

While this exposes the data and makes it available, it is still not easy to query. Perhaps one of these days, I will write a GUI application with drop-down boxes for easily accessing and querying the output data.

Friday, July 20, 2018

APFS template for 010 Editor

For quite some time, I've been analyzing APFS mostly with custom python code, which is inefficient, time consuming and not visual. Since most people doing any kind of serious hex editing use the 010 Editor (as do I), this was long overdue.

I've created an 010 template, which is basically a port from the apfs.ksy project. This has taken quite a bit of time and I hope you find it useful. Not all structures are known, and some parts may be incorrect. This is a work in progress as more details about APFS emerge.

Link: https://github.com/ydkhatri/APFS_010/blob/master/apfs.010.bt

The template will not parse out the file system tree yet. With APFS this is challenging to do within 010's template capabilities as you cannot create local objects or classes and/or store temporary objects. The template does however define most of the structures and will follow most pointers (to other disk blocks and parse them) automatically when you start expanding the structures in the template viewer.

To use the template, simply load your APFS image (unencrypted only) into 010. Then edit the template to set the Apfs_Offset variable to the byte offset at which your APFS partition starts. Now run the template. The APFS start offset can be located easily by running the GPT template (which you can find on 010's website or in the program's template repository). The GPT template will give you the sector offset. Multiply it by the sector size (usually 512, sometimes 4096) to get the byte offset (location) of the APFS partition.
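The offset arithmetic is trivial, but easy to get wrong when the sector size is not 512. A tiny sketch follows; the starting sector 409640 is just a made-up example value, not something read from a particular image.

```python
def partition_byte_offset(start_sector, sector_size=512):
    """Convert a GPT starting sector (LBA) into a byte offset."""
    if start_sector < 0 or sector_size <= 0:
        raise ValueError("sector number and sector size must be positive")
    return start_sector * sector_size


if __name__ == "__main__":
    # Example: partition starts at sector 409640 on a 512-byte-sector disk
    print(partition_byte_offset(409640))          # prints 209735680
    # Same idea on a disk with 4096-byte sectors
    print(partition_byte_offset(76806, 4096))     # prints 314597376
```

The printed value is what you would assign to Apfs_Offset in the template.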

Thursday, May 3, 2018

Bash sessions in macOS (and why you need to understand its working)

While all versions of macOS have provided .bash_history for users, since macOS 10.11 (El Capitan) we get even more information on terminal history through the bash sessions files. These are not a replacement for the old .bash_history file, which is still there.

There are several problems with .bash_history - you cannot tell when any command in that file was run, the sequence of commands may not be right, and so on. For more on that, refer to Hal Pomeranz's excellent talk - You don't know jack about Bash history

Even if there were no anomalies and only a single terminal was always in use, there is still the issue of how do I know which command was run when? With Bash sessions, macOS gives us more data to work with. Since El Capitan, every new terminal window will be tracked independently with a TERM_SESSION_ID which appears to be a randomly generated UUID.

Figure 1 - Fetching terminal's session id

Each session can also be restored when you shut down and restart your machine with the "Reopen windows when logging back in" option set. Perhaps for this purpose, session history (a subset of bash history) is tracked and saved separately on a per session basis.

Figure 2 - Restored session

Show me the artifacts!

The location you want to go to is  /Users/<USER>/.bash_sessions

You will find 3 files for each session as seen in screenshot below.

Figure 3 - .bash_sessions folder contents

TERM_SESSION_ID.history    --> Contains session history
TERM_SESSION_ID.historynew --> Mostly blank/empty
TERM_SESSION_ID.session    --> Contains the last session resume date and time

Figure 4 - Sample .session file

Figure 5 - Sample .history file showing commands typed at terminal 

How does this help?

Some (but not all) of the problems associated with reading .bash_history are now gone.
Theoretically, as bash history is now also stored on a per session basis, this should make it trivial to track commands run in different windows (sessions). If you were expecting a session's .history file to hold history for that session alone, then you thought wrong. The .history file contains all previous history (from earlier sessions), with the history for this session appended at the very end.

So can we reliably break apart commands per session? Is the sequence of commands intact? Let's run a small experiment to find out.

We create two sessions (2 terminal windows) and run a few commands in each session. Commands are interspersed, so we run a command in Session-1, then another in Session-2 and then again something in Session-1. We will try to see if order is maintained.

Session-1 started 9:44
Session-2 started 9:51
Figure 6 - Commands run with their sequence

Session-1 closed 9:57
Session-2 closed 9:59

Session-1 is closed first, followed by Session-2.  Here is a snippet of relevant metadata from the resulting files:

Figure 7 - Relevant metadata from stat command

Fun Facts

The start and stop time for a session is available if you look at the crtime (File Created time) for the .history and .historynew files. These are in bold in the screenshot above.

Created Time of TERM_SESSION_ID.historynew = Session created time
Created Time of TERM_SESSION_ID.history        = Session end time

Isolating session data

By comparing the data in various .history files (from different sessions), you can find out exactly which commands belong to a particular session. See pic below, where lines 1-181 (not shown) are from older history (other past sessions). Lines 182-184 are from Session-1 and are seen in its history file at the end. Session-2 (closed after Session-1) has the same format, i.e., old session history with this session's history appended (lines 185-189).

Figure 8 - .history files from Session-1 (Left) and Session-2 (Right)

This is easily done in code and the mac_apt BASHSESSIONS plugin parses this information to break out the individual commands per session, along with session start and stop time.
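To illustrate the idea (this is a simplified sketch, not the actual mac_apt plugin code), each session's own commands can be taken as whatever follows the longest run of lines it shares with any earlier history. The function names, the baseline parameter and the sample commands are all hypothetical, and the approach assumes the files are supplied in order of session end time.

```python
def common_prefix_length(a, b):
    """Number of leading lines that two histories have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def isolate_session_commands(histories, baseline=None):
    """histories: list of (session_id, lines), oldest session end time first.

    baseline: optional older history (commands from before all sessions).
    Returns {session_id: commands that appear only at the end of that file}.
    """
    earlier = [list(baseline or [])]
    result = {}
    for session_id, lines in histories:
        shared = max(common_prefix_length(prev, lines) for prev in earlier)
        result[session_id] = lines[shared:]
        earlier.append(lines)
    return result


if __name__ == "__main__":
    old = ["ls -la", "cd Documents"]            # stand-in for lines 1-181
    session1 = old + ["printenv"]
    session2 = old + ["printenv", "cp -h"]
    found = isolate_session_commands(
        [("Session-1", session1), ("Session-2", session2)], baseline=old)
    for session_id, commands in found.items():
        print(session_id, commands)
```

Pairing each result with the created times of the matching .historynew and .history files then gives the time window for that set of commands.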

While you still cannot get the exact time when an individual command was run, the sessions functionality does give you a much narrower time frame to work with. We do not have the absolute order of commands across sessions (for example, whether "cp -h" was run before "printenv"), but we do have a time window for each set of commands ("cp -h" was run between 9:51 and 9:59, and "printenv" between 9:44 and 9:57). This is a big thing for analysts and investigators!

Friday, February 2, 2018

Reading Notes database on macOS

The Notes app has come built-in with every OS X/macOS release since OS X 10.8 (Mountain Lion). It is great for quick notes and keeps notes synced in the cloud. It can hold potentially useful tidbits of user information in an investigation.

Artifact breakdown  & Forensics


Depending on version of macOS, this can vary. Sometimes there is more than one database, probably as a result of an upgrade! But only one is actively used at any given time (not both). So far, we haven't seen any duplicate data when two are present.

Location 1

In here, databases will be named as one of the following:
NotesV1.storedata ← Mountain Lion (Thanks Geoff Black)
NotesV2.storedata ← Mavericks
NotesV4.storedata ← Yosemite
NotesV6.storedata ← El Capitan & Sierra
NotesV7.storedata ← High Sierra
If a note has an attachment, then the attachment is usually stored at the following location:

There does not appear to be much difference between the database types. The following tables have been seen.
Tables for NotesV2
Tables for NotesV6

Each note can be associated with either a local account or an online one. The following account information can be obtained.  

Account email address, id and username from ZACCOUNT table

Individual note data is stored in ZNOTEBODY and the rest of the tables provide information about note parent folder, sync information, and attachment locations. 

Notes converts all data to HTML as seen below.

ZNOTES Table with note Html content
The graphic below shows how you can find and resolve note attachments to their locations on disk. If an attachment is present, ZNOTEBODY.ZHTMLSTRING will contain the UUID of that which can be matched up to the ZATTACHMENT.ZCONTENTID to get a binary plist blob. When parsed, you can find the full path to the attachment in the plist.

Resolving attachment location

The dates and times fetched are Mac Absolute time, which is the number of seconds since 1/1/2001.

Reading the data

The following SQL query will pull out most pertinent information from this database:

SELECT n.Z_PK as note_id,
datetime(n.ZDATECREATED + 978307200, 'unixepoch') as created,
datetime(n.ZDATEEDITED + 978307200, 'unixepoch') as edited,
n.ZTITLE,
(SELECT ZNAME from ZFOLDER where n.ZFOLDER=ZFOLDER.Z_PK) as Folder,
(SELECT zf2.ZACCOUNT from ZFOLDER as zf1 LEFT JOIN ZFOLDER as zf2 on (zf1.ZPARENT=zf2.Z_PK) where n.ZFOLDER=zf1.Z_PK) as folder_parent,
ac.ZEMAILADDRESS as email, ac.ZACCOUNTDESCRIPTION,
b.ZHTMLSTRING, att.ZCONTENTID, att.ZFILEURL
FROM ZNOTE as n
LEFT JOIN ZNOTEBODY as b ON b.ZNOTE = n.Z_PK
LEFT JOIN ZATTACHMENT as att ON att.ZNOTE = n.Z_PK
LEFT JOIN ZACCOUNT as ac ON ac.Z_PK = folder_parent

Location 2

/Users/<USER>/Library/Group Containers/group.com.apple.notes/NoteStore.sqlite
This one has been seen on El Capitan, Sierra and High Sierra. Attachments are stored in the Media folder located here:
/Users/<USER>/Library/Group Containers/group.com.apple.notes/Media/<UUID>/
Here UUID is the unique identifier for each attachment. The database schema is different here.

NoteStore.sqlite in HighSierra

NoteStore.sqlite in El Capitan

Only the account name and identifier (UUID) are available. To get full account information, this will need to be correlated with the account info database stored elsewhere.
Note data is available in the ZICNOTEDATA table. 

That ZICNOTEDATA.ZDATA blob is gzip compressed. Upon decompression, it reveals the note data stored in a proprietary unknown binary format. As seen below, you can spot the text, its formatting information and attachment info.

ZDATA gzipped blob showing gzip signature (1F8B08)
Uncompressed ZDATA blob showing text, formatting and attachment info
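Decompressing the blob takes only a few lines. The sketch below checks for the gzip signature before decompressing. The sample data is compressed on the spot as a stand-in for a real ZDATA value, which you would normally read from the ZICNOTEDATA table with an sqlite library. The function name is my own, and the decompressed bytes still need to be interpreted separately.

```python
import gzip

GZIP_SIGNATURE = bytes.fromhex("1F8B08")


def decompress_zdata(blob):
    """Return the decompressed bytes of a gzip-compressed ZDATA blob."""
    if not blob.startswith(GZIP_SIGNATURE):
        raise ValueError("blob does not start with the gzip signature 1F8B08")
    return gzip.decompress(blob)


if __name__ == "__main__":
    # Stand-in for a blob fetched from ZICNOTEDATA.ZDATA
    sample_blob = gzip.compress(b"Shopping list: milk, eggs")
    print(sample_blob[:3].hex().upper())
    print(decompress_zdata(sample_blob))
```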

Most notable in this database is the presence of several timestamps-
Attachment Modified → ZMODIFICATIONDATE
Attachment Preview Updated → ZPREVIEWUPDATEDATE

Reading NoteStore.sqlite

The following SQL query will pull out most pertinent information from this database:
SELECT n.Z_12FOLDERS as folder_id , n.Z_9NOTES as note_id, d.ZDATA as data,
c2.ZTITLE2 as folder,
datetime(c2.ZDATEFORLASTTITLEMODIFICATION + 978307200, 'unixepoch') as folder_title_modified,
datetime(c1.ZCREATIONDATE + 978307200, 'unixepoch') as created,
datetime(c1.ZMODIFICATIONDATE1 + 978307200, 'unixepoch')  as modified,
c1.ZSNIPPET as snippet, c1.ZTITLE1 as title, c1.ZACCOUNT2 as acc_id,
c5.ZACCOUNTTYPE as acc_type, c5.ZIDENTIFIER as acc_identifier, c5.ZNAME as acc_name,
c3.ZMEDIA as media_id, c3.ZFILESIZE as att_filesize,
datetime(c3.ZMODIFICATIONDATE + 978307200, 'unixepoch') as att_modified,
datetime(c3.ZPREVIEWUPDATEDATE + 978307200, 'unixepoch') as att_previewed,
c3.ZTITLE as att_title, c3.ZTYPEUTI, c3.ZIDENTIFIER as att_uuid,
c4.ZFILENAME, c4.ZIDENTIFIER as media_uuid
ORDER BY note_id
On HighSierra, use this query:
If you are looking for an automated way to read this, use mac_apt. The NOTES plugin will parse it.

Tuesday, December 12, 2017

mac_apt + APFS support

Over the past few months, I've been working on adding APFS support to mac_apt, and it's finally here. Version 0.2 of mac_apt is now available with APFS support. It also adds a new plugin to process print jobs, some enhanced functionality in other plugins and several minor bug fixes.

As of now, basic APFS support is complete. mac_apt can view and extract any file on the file system, including compressed files. It does not have support for FileVault2 (encryption) and will not handle an encrypted volume. The checkpoint feature in APFS is currently not supported or tested, although this may be added later.

This is the first forensic processing tool (in freeware) to support APFS. I believe at this time, Sumuri Recon is the only commercial one. I am unaware of any other that can read APFS.

I would like to thank Kurt-Helge Hansen for publishing the paper detailing the internal structure and working of APFS. He was also helpful in providing proof of concept code for the same.

The implementation we've used is based on the APFS template built with kaitai-struct. Kaitai-Struct is a library that makes it easy to define and read C structures. It will generate all the code required to read those structures. For APFS, the kaitai-struct template was developed originally by Jonas Plum (@cugu_pio) & Thomas Tempelmann (@tempelorg) here.

APFS working and implementation

The approach we've taken is to read all inodes and populate a database with this data. This means we have to read the entire filesystem data upfront before we have the information needed to read a single file. It isn't ideal. In practice it takes 2-4 minutes on an image of a default macOS installation (using my slow regular SATA III external disk over USB3), which is not too bad. But I opted for this path as it is the only solution available for now. Why? APFS does not store files in its b-tree sorted alphabetically by name. Instead, a 3 byte hash is computed for each file name and the b-tree maintains nodes sorted by this hash. The problem is that this hash algorithm is currently unknown. It may just be some sort of CRC variant or something very different. Until this algorithm is known, we cannot write a native parser that walks the b-tree. Hence the database for now.

The database does offer us several advantages though. For compressed files, we can pre-process the logical size and save it for quick retrieval. In APFS, a compressed file will have its logical size set to zero in file metadata. To look up its real size, you have to read its compressed data header (which may be inline or in a resource fork), parse it and get the uncompressed (logical) size. This often means going out to an extent to read it, which makes it slow. Pre-populating this info in a database makes it much quicker for later analysis.
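As a rough illustration of that lookup, here is a sketch that pulls the uncompressed size out of a compression header. The 16-byte layout used here (a 4-byte magic stored on disk as "fpmc", a 4-byte compression type and an 8-byte uncompressed size, all little-endian, at the start of the com.apple.decmpfs extended attribute) is my assumption based on the HFS+ style header and is not taken from this post. The function name and sample values are hypothetical too.

```python
import struct

DECMPFS_MAGIC = b"fpmc"     # assumed on-disk magic ('cmpf' byte-swapped)
HEADER_FORMAT = "<4sIQ"     # magic, compression type, uncompressed size
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)   # 16 bytes


def parse_compression_header(data):
    """Return (compression_type, uncompressed_size) from a header blob."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header is shorter than 16 bytes")
    magic, compression_type, uncompressed_size = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE])
    if magic != DECMPFS_MAGIC:
        raise ValueError("unexpected magic value in compression header")
    return compression_type, uncompressed_size


if __name__ == "__main__":
    # Fabricated header: type 4, logical size of 1 MiB
    sample = struct.pack(HEADER_FORMAT, DECMPFS_MAGIC, 4, 1048576)
    print(parse_compression_header(sample))
```

Storing the returned size alongside the inode record is the kind of pre-processing that saves a trip to the extent later.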

APFS allows extended attributes to be defined and used just the same as HFS+. This means a file can have extended attributes and those are used to save compression parameters (similar to HFS+). APFS also uses Copy-On-Write, which means if you copy a file, the resulting copy will not duplicate the data on disk. Both inodes (original and copy) will point to the same original extents. Only when the copy is changed will new extents be allocated.

If you are not familiar with APFS, the Disk Info output from mac_apt might look strange to you.

Screenshot - Disk Info data from mac_apt showing same offset & size for all APFS volumes
This may be read as 4 partitions, all of type APFS, having exactly the same starting offset and size! The reason for this is that APFS is a little different. It doesn't just define a volume. Rather, it implements a container which can host several volumes! This output is from a default installation of High Sierra, where the disk partitioning scheme is GPT and it defines 2 partitions as seen in the screenshot below.
Illustration showing APFS container and volumes within the Disk

The APFS container by default does not put a limit on the size or location of the volumes within it (Preboot, Macintosh HD, VM, Recovery). Unlike normal partitions on disk, where sectors are allocated to each volume before you can use it, APFS allows all volumes to share a common pool of extents (clusters), and they all report the same total free space. But really it's shared space, so you cannot sum it up across volumes. This also means data from all volumes is interspersed and volumes are not contiguous. The design makes sense as the target media is flash memory (SSD), where data will never be contiguous (as it could be on spinning HDDs) because of the way flash memory chips work.