Tuesday, August 21, 2018

An open source spotlight parser

Spotlight is the name of the indexing system which comes built into macOS. It is responsible for continuous indexing of files and folders on all attached volumes. It keeps a copy of all metadata for almost every single file and folder on disk.

Thus, it can provide some excellent data for your investigation. While much of the same information can be obtained if you have access to the full disk image, it is known that there is information in this database that is not available elsewhere. Details like Date(s) Last Opened or Number of Times (an application or file) is Opened/Used are not available anywhere else on the file system. Unfortunately though it uses a proprietary undocumented format, and no publicly available code existed to read it. So over the last few months, I’ve been studying the file format of these databases and have created a tool/library to read and extract the data contained within.

The library and tool are open sourced now and located here:

The format of the database will be discussed in a later post.

For those familiar with macOS, you know this data (contained in the database) can be obtained on a locally mounted volume using the macOS built-in mdls utility. However to do this, you need to mount your disk image on a mac to do so and the utility can only be run on individual files/folders, not the entire disk. It can be run recursively (with a bit of command line fu) on the entire volume but the output is not easy to read then.

If you don't prefer to do that, run spotlight_parser instead. Just point it to the database files which are named store and .store (located in the /.Spotlight-V100/Store-V2/<UUID> folder) and let it parse out the complete database for you.

Here is a screenshot of spotlight_parser running. Depending on how much data is contained in the database, this can take anywhere between a few seconds to a few minutes (5-10 on very large disks with lots of files).

Figure 1 - Running spotlight_parser

Once done, you will have 2 files as output. One is a text file (prefix_data.txt) containing the database dump of all entries. The other is a CSV (actually tab separated) which tries to build a path for every file/folder using inode number (CNID) from data available in the database. Since not every single folder may be included, some paths may not resolve and you might get ..NOT FOUND.. in the path sometimes along with an error on the console as seen above.

In the prefix_data.txt file, you will see some XML content (configuration information) at the beginning followed by database entries for files and folders.
Below is a snippet of the prefix_data.txt file, showing only output for a single jpg image file.

Figure 2 - Output showing a single jpg file's metadata information from database

Here the text in Red is metadata pertaining to a single entry in the database, including the date and time it was last updated. This is followed by the metadata itself. The items in Blue are information only available in the spotlight database. The last two may be of particular interest to an investigator.

Note - The screenshot above is not from any special version of the code, actual output is plain text, it has no coloring! Colors were added just for explanation.

The spotlight_parser has been incorporated into mac_apt as the SPOTLIGHT plugin. In mac_apt, the output is also available in an sqlite database making it easier to query. Mark McKinnon has forked a version of this library and also added sqlite capability, it is available here.

While this exposes the data and makes it available, it is still not easy to query. Perhaps one of these days, I will write a GUI application with drop-down boxes for easily accessing and querying the output data.

Friday, July 20, 2018

APFS template for 010 Editor

For quite some time, I've been analyzing APFS mostly with custom python code, which is not very efficient and rather time consuming and is not visual. Since most people doing any kind of serious hex editing use the 010 Editor (as do I), this was long overdue.

I've created an 010 template, which is basically a port from the apfs.ksy project. This has taken quite a bit of time and I hope you find it useful. Not all structures are known, there are some parts that may be incorrect. This is a work in progress as more details about APFS emerge..

Link: https://github.com/ydkhatri/APFS_010/blob/master/apfs.010.bt

The template will not parse out the file system tree yet. With APFS this is challenging to do within 010's template capabilities as you cannot create local objects or classes and/or store temporary objects. The template does however define most of the structures and will follow most pointers (to other disk blocks and parse them) automatically when you start expanding the structures in the template viewer.

To use the template, simply load your APFS image (unencrypted only) into 010. Then edit the template to set the Apfs_Offset variable to the byte offset of wherever your APFS partition starts. Now run the template. The APFS start offset can be located easily by running the GPT template (which you can find on 010's website or in the program's template repository). The GPT template will give you the sector offset, multiply it by sector size (usually 512 or sometimes 4096) to get the byte offset (location) of the APFS partition.

Thursday, May 3, 2018

Bash sessions in macOS (and why you need to understand its working)

While all versions of macOS have provided bash_history for users, since macOS 10.11 (El Capitan), we get even more information on terminal history through the bash sessions files. This is not a replacement for the old .bash_history file which is still there.

There are several problems with bash_history - you cannot tell when any command in that file was run, the sequence of commands may not be right, and so on. For more on that, refer Hal Pomeranz's excellent talk - You don't know jack about Bash history

Even if there were no anomalies and only a single terminal was always in use, there is still the issue of how do I know which command was run when? With Bash sessions, macOS gives us more data to work with. Since El Capitan, every new terminal window will be tracked independently with a TERM_SESSION_ID which appears to be a randomly generated UUID.

Figure 1 - Fetching terminal's session id

Each session can also be restored when you shutdown and restart your machine with the "Reopen windows when logging back in" option set. Perhaps for this purpose, session history (a subset of bash history) is tracked and saved separately on a per session basis.

Figure 2 - Restored session

Show me the artifacts!

The location you want to go to is  /Users/<USER>/.bash_sessions

You will find 3 files for each session as seen in screenshot below.

Figure 3 - .bash_sessions folder contents

TERM_SESSION_ID.history    --> Contains session history
TERM_SESSION_ID.historynew --> Mostly blank/empty
TERM_SESSION_ID.session    --> Contains the last session resume date and time

Figure 4 - Sample .session file

Figure 5 - Sample .history file showing commands typed at terminal 

How this helps?

Some (but not all) of the problems associated with reading .bash_history are now gone.
Theoretically, as bash history is now also stored on a per session basis, this should make it trivial to track commands run in different windows (sessions). If you were expecting history for a single session in its .history file, then you thought wrong. The .history file contains all previous history (from earlier sessions) and then appended at the very end, the history for this session.

So can we reliably break apart commands per session? Is the sequence of commands intact? Let's run a small experiment to find out.

We create two sessions (2 terminal windows) and run a few commands in each session. Commands are interspersed, so we run a command in Session-1, then another in Session-2 and then again something in Session-1. We will try to see if order is maintained.

Session-1 started 9:44
Session-2 started 9:51
Figure 6 - Commands run with their sequence

Session-1 closed 9:57
Session-2 closed 9:59

Session-1 is closed first, followed by Session-2.  Here is a snippet of relevant metadata from the resulting files:

Figure 7 - Relevant metadata from stat command

Fun Facts

The start and stop time for a session is available if you look at the crtime (File Created time) for the .history and .historynew files. These are in bold in the screenshot above.

Created Time of TERM_SESSION_ID.historynew = Session created time
Created Time of TERM_SESSION_ID.history        = Session end time

Isolating session data

By comparing the data in various .history files (from different sessions), you can find out exactly which commands belong to a particular session. See pic below, where lines 1-181 (not shown) are from older history (other past sessions). Lines 182-184 are from Session-1 and are seen in its history file at the end. Session-2 (closed after Session-1) has the same format, ie, old session history with this session's history appended (lines 185-189).

Figure 8- .history files from Session-1 (Left) and Session-2 (Right)

This is easily done in code and the mac_apt BASHSESSIONS plugin parses this information to break out the individual commands per session, along with session start and stop time.

While you still cannot get the exact time when an individual command was run, the sessions functionality does give you a very good narrowed time frame to work with. While we do not have the absolute order of commands ("cp -h" was run before "printenv"), we do have a narrowed time-frame for the set of commands ("cp-h" run between 9:51-9:59 and "printenv" run between 9:44-9:57). This is a big thing for analysts and investigators!

Friday, February 2, 2018

Reading Notes database on macOS

The notes app comes built-in with every OSX/macOS release since OSX 10.8 (Mountain Lion). It is great for quick notes and keeps notes synced in the cloud. It can have potentially useful tidbits of user information in an investigation.

Artifact breakdown  & Forensics


Depending on version of macOS, this can vary. Sometimes there is more than one database, probably as a result of an upgrade! But only one is actively used at any given time (not both). So far, we haven't seen any duplicate data when two are present.

Location 1

In here, databases will be named as one of the following:
NotesV1.storedata ← Mountain Lion  (Thanks Geoff Black)
NotesV2.storedata ← Mavericks
NotesV4.storedata ← Yosemite
NotesV6.storedata ← Elcapitan & Sierra
NotesV7.storedata ← HighSierra
If a note has an attachment, then the attachment is usually stored at the following location:

There does not appear to be much difference between the database types. The following tables have been seen.
Tables for NotesV2
Tables for NotesV6

Each note can be associated with either a local account or an online one. The following account information can be obtained.  

Account email address, id and username from ZACCOUNT table

Individual note data is stored in ZNOTEBODY and the rest of the tables provide information about note parent folder, sync information, and attachment locations. 

Notes converts all data to HTML as seen below.

ZNOTES Table with note Html content
The graphic below shows how you can find and resolve note attachments to their locations on disk. If an attachment is present, ZNOTEBODY.ZHTMLSTRING will contain the UUID of that which can be matched up to the ZATTACHMENT.ZCONTENTID to get a binary plist blob. When parsed, you can find the full path to the attachment in the plist.

Resolving attachment location

The dates and times fetched are Mac Absolute time, which is the number of seconds since 1/1/2001.

Reading the data

The following SQL query will pull out most pertinent information from this database:

SELECT n.Z_PK as note_id, datetime(n.ZDATECREATED + 978307200, 'unixepoch') as created, datetime(n.ZDATEEDITED + 978307200, 'unixepoch') as edited, n.ZTITLE, (SELECT ZNAME from ZFOLDER where n.ZFOLDER=ZFOLDER.Z_PK) as Folder,(SELECT zf2.ZACCOUNT from ZFOLDER as zf1  LEFT JOIN ZFOLDER as zf2 on (zf1.ZPARENT=zf2.Z_PK) where n.ZFOLDER=zf1.Z_PK) as folder_parent,ac.ZEMAILADDRESS as email, ac.ZACCOUNTDESCRIPTION, b.ZHTMLSTRING, att.ZCONTENTID, att.ZFILEURLFROM ZNOTE as nLEFT JOIN ZNOTEBODY as b ON b.ZNOTE = n.Z_PKLEFT JOIN ZATTACHMENT as att ON att.ZNOTE = n.Z_PKLEFT JOIN ZACCOUNT as ac ON ac.Z_PK = folder_parent

Location 2

/Users/<USER>/Library/Group Containers/group.com.apple.notes/NoteStore.sqlite
This one has been seen on El Capitan, Sierra and HighSierra. Attachments are stored in the Media folder located here:
/Users/<USER>/Library/Group Containers/group.com.apple.notes/Media/<UUID>/
Here UUID is the unique identifier for each attachment. The database scheme is different here.

NoteStore.sqlite in HighSierra

NotesStore.sqlite in ElCapitan

Only account name and identifier (UUID) is available. To get full account information, this will need to be correlated with the account info database stored elsewhere.
Note data is available in the ZICNOTEDATA table. 

That ZICNOTEDATA.ZDATA blob is gzip compressed. Upon decompression, it reveals the note data stored in a proprietary unknown binary format. As seen below, you can spot the text, its formatting information and attachment info.

ZDATA gzipped blob showing gzip signature (1F8B08)
Uncompressed ZDATA blob showing text, formatting and attachment info

Most notable in this database is the presence of several timestamps-
Attachment Modified → ZMODIFICATIONDATE
Attachment Preview Updated → ZPREVIEWUPDATEDATE

Reading NoteStore.sqlite

The following SQL query will pull out most pertinent information from this database:
SELECT n.Z_12FOLDERS as folder_id , n.Z_9NOTES as note_id, d.ZDATA as data,
c2.ZTITLE2 as folder,
datetime(c2.ZDATEFORLASTTITLEMODIFICATION + 978307200, 'unixepoch') as folder_title_modified,
datetime(c1.ZCREATIONDATE + 978307200, 'unixepoch') as created,
datetime(c1.ZMODIFICATIONDATE1 + 978307200, 'unixepoch')  as modified,
c1.ZSNIPPET as snippet, c1.ZTITLE1 as title, c1.ZACCOUNT2 as acc_id,
c5.ZACCOUNTTYPE as acc_type, c5.ZIDENTIFIER as acc_identifier, c5.ZNAME as acc_name,
c3.ZMEDIA as media_id, c3.ZFILESIZE as att_filesize,
datetime(c3.ZMODIFICATIONDATE + 978307200, 'unixepoch') as att_modified,
datetime(c3.ZPREVIEWUPDATEDATE + 978307200, 'unixepoch') as att_previewed,
c3.ZTITLE as att_title, c3.ZTYPEUTI, c3.ZIDENTIFIER as att_uuid,
c4.ZFILENAME, c4.ZIDENTIFIER as media_uuid
ORDER BY note_id
On HighSierra, use this query:
If you are looking for an automated way to read this, use mac_apt, the NOTES plugin will parse it.

Tuesday, December 12, 2017

mac_apt + APFS support

Over the past few months, I've been working at adding APFS support into mac_apt, and its finally here. Version 0.2 of mac_apt is now available with APFS support. It also adds a new plugin to process print jobs, some enhanced functionality in other plugins and several minor bug fixes.

As of now basic APFS support is complete, mac_apt can view and extract any file on the file system, including compressed files. It does not have support for FileVault2 (encryption) and will not handle an encrypted volume. The checkpoint feature in APFS is currently not supported or tested although this may be added later.

This is the first forensic processing tool (in freeware) to support APFS. I believe at this time, Sumuri Recon is the only commercial one. I am unaware of any other that can read APFS.

I would like to thank Kurt-Helge Hansen for publishing the paper detailing APFS internal structure and working. He was also helpful in providing a proof of concept code for the same.

The implementation we've used is based on the APFS template built with kaitai-struct. Kaitai-Struct is a library that makes it easy to define and read C structures. It will generate all the code required to read those structures. For APFS, the kaitai-struct template was developed originally by Jonas Plum (@cugu_pio) & Thomas Tempelmann (@tempelorg) here.

APFS working and implementation

The approach we've taken is to read all inodes and populate a database with this data. This means we have to read the entire filesystem data upfront before we have information to read a single file. It isn't ideal, it practically takes 2-4 min. time to do this on an image having default macOS installation (using my slow regular SATA III external disk over USB3), which is not too bad. But I opted for this path as it is the only solution available for now. Why? The way APFS stores files in its b-tree, they are not sorted by name alphabetically. Instead, a 3 byte hash is computed for each file name and the b-tree maintains nodes sorted by this hash instead. The problem is that this hash algorithm is currently unknown. It may just be some sort of CRC variant or something very different. Until this algorithm is known, we cannot write a native parser that walks the b-tree. Hence the database for now.

The database does offer us several advantages though. For compressed file information, we can pre-process the logical size and save that for quick retrieval. In APFS, a compressed file will have its logical size set to zero in file metadata. To lookup its real size, you have to go read its compressed data header (which may be inline or in a resource fork), parse it and get the uncompressed (logical) size. This often means going out to an extent to read it, which makes it slow. Pre-populating this info in a database makes it much quicker for later analysis.

APFS allows extended attributes to be defined and used just the same as HFS+. This means a file can have extended attributes and those are used to save compression parameters (similar to HFS+). APFS also uses Copy-On-Write, which means if you copy a file, the resulting copy will not duplicate the data on disk. Both inodes (original and copy) will point to the same original extents. Only when the copy is changed will new extents be allocated.

If you are not familiar with APFS, the Disk Info output from mac_apt might look strange to you.

Screenshot - Disk Info data from mac_apt showing same offset & size for all APFS volumes
This may be read as 4 partitions all type APFS having the same exact starting offset and size! The reason for this is that APFS is a little different. It isn't just defining a volume, rather it implements a container which can host several volumes in it! This output is from a default installation of HighSierra, where the Disk partitioning scheme is GPT and it defines 2 partitions as seen in the screenshot below.
Illustration showing APFS container and volumes within the Disk

The APFS container by default does not put a limit on the size or location of those volumes within it (Preboot, Macintosh HD, VM, Recovery). Unlike normal partitions on disk where sectors are allocated for each volume before you can use the volumes, APFS allows all volumes to share a common pool of extents (clusters) and they all report having total free space as the same. But really its shared space, so you cannot sum it up for all volumes. This also means data from all volumes is interspersed and volumes are not contiguous. The design makes sense as the target media is flash memory (SSD), and it will never be contiguous there (as it did on spinning HDDs) because of the way flash memory chips work.

Thursday, September 28, 2017

APFS timestamps

From APFS documentation it is revealed that the new timestamp has a nano second resolution. From my data I can see several timestamps in what appears to be the root directory for a test APFS volume I created. Armed with this information, it was pretty easy to guess the epoch - it is the Unix epoch.

ApfsTimeStamp = number of nano-seconds since 1-1-1970

If you like to use the 010 editor for analysis, put the following code in your Inspector.bt or InspectorDates.bt file:

// ApfsTime
//  64-bit integer, number of nanoseconds since 01/01/1970 00:00:00

typedef int64 ApfsTime <read=ApfsTimeRead, write=ApfsTimeWrite>;
FSeek( pos ); ApfsTime _aft <name="ApfsTimes">;
string ApfsTimeRead( ApfsTime t )
    // Convert to FILETIME
    return FileTimeToString( t/100L + 116444736000000000L );
int ApfsTimeWrite( ApfsTime &t, string value )
    // Convert from FILETIME
    FILETIME ft;
    int result = StringToFileTime( value, ft );
    t = (((int64)ft - 116444736000000000L)*100L);
    return result;

Now 010 can interpret APFS timestamps :)

010 interprets ApfsTimes

If you need to read this timestamp in python (and convert to python datetime), the following function will do this:

import datetime

def ConvertApfsTime(ts):
        return datetime.datetime(1970,1,1) + datetime.timedelta(microseconds=ts / 1000. )
  return None

Update: As noted by Joachim, the timestamp is an int64, and has been corrected in the above code.

Monday, September 18, 2017

Interpreting volume info from FXDesktopVolumePositions

On a mac, if you wanted to get a list of all connected volumes, you would typically lookup FXDesktopVolumePositions under the com.apple.finder.plist file (for each user). This is available under /Users/<USER>/Library/Preferences/  The volumes listed here can include anything mounted, a CD, a DMG file, a volume from external disk, network mapped volume or anything else that mac sees as a mounted volume/device.

Figure 1: Snippet of 'FKDesktopVolumePositions' from com.apple.finder.plist

The above snippet shows every volume connected to one of our test machines. What's strange is the way the volume information is stored. Ignoring the icon position data (which resides a level below not shown in that screenshot), all that's available is a string that looks like this:


Studying this a bit, it appears that the random looking hex number after the name is not really random. It represents the 'Created date' of the volume's root folder (for most but not all the entries!). The date format is Mac Absolute Time (ie, number of seconds since 2001).
The picture below shows the breakup.

Figure 2: Interpreting volume info from FXDesktopVolumePositions entry

The first part of the entry (till the last underscore) is the name as it appears in the Finder window. The next part ignoring the period is the created date. This is followed by p+ and then a decimal number (usually 0, 26, 27, 28 or 29). I believe that number represents a volume type. From what I can see in some limited testing, if volume type=28, then the hex number is always the created date.

Large external disks (USB) that identify as fixed disks and most DMG volumes will show up as type 28. All USB removable disks show up under type 29 and here the hex number is not a created date, it is unknown what this may be. Sometimes this number is negative for type 29, and a lot of volumes share the same number.

There is still some mystery about the hex number. For type=28, sometimes it is less than 8 digits long, and needs to be padded with zeroes at the end. This does produce an accurate date then. Also, sometimes it is longer than 8 digits! In these cases, truncating the number at 8 digits has again produced an accurate date in limited testing. It is unclear what those extra digits would denote.

Update: As pointed out by Geoff Black, the part after the underscore is just the way floats are represented (including the p+xx part).

Following this discovery, mac_apt has been updated to parse this date.