Friday, February 2, 2018

Reading Notes database on macOS

The notes app comes built-in with every OSX/macOS release since OSX 10.8 (Mountain Lion). It is great for quick notes and keeps notes synced in the cloud. It can have potentially useful tidbits of user information in an investigation.

Artifact breakdown  & Forensics


Depending on version of macOS, this can vary. Sometimes there is more than one database, probably as a result of an upgrade! But only one is actively used at any given time (not both). So far, we haven't seen any duplicate data when two are present.

Location 1

In here, databases will be named as one of the following:
NotesV1.storedata ← Mountain Lion  (Thanks Geoff Black)
NotesV2.storedata ← Mavericks
NotesV4.storedata ← Yosemite
NotesV6.storedata ← Elcapitan
NotesV7.storedata ← HighSierra
If a note has an attachment, then the attachment is usually stored at the following location:

There does not appear to be much difference between the database types. The following tables have been seen.
Tables for NotesV2
Tables for NotesV6

Each note can be associated with either a local account or an online one. The following account information can be obtained.  

Account email address, id and username from ZACCOUNT table

Individual note data is stored in ZNOTEBODY and the rest of the tables provide information about note parent folder, sync information, and attachment locations. 

Notes converts all data to HTML as seen below.

ZNOTES Table with note Html content
The graphic below shows how you can find and resolve note attachments to their locations on disk. If an attachment is present, ZNOTEBODY.ZHTMLSTRING will contain the UUID of that which can be matched up to the ZATTACHMENT.ZCONTENTID to get a binary plist blob. When parsed, you can find the full path to the attachment in the plist.

Resolving attachment location

The dates and times fetched are Mac Absolute time, which is the number of seconds since 1/1/2001.

Reading the data

The following SQL query will pull out most pertinent information from this database:

SELECT n.Z_PK as note_id, datetime(n.ZDATECREATED + 978307200, 'unixepoch') as created, datetime(n.ZDATEEDITED + 978307200, 'unixepoch') as edited, n.ZTITLE, (SELECT ZNAME from ZFOLDER where n.ZFOLDER=ZFOLDER.Z_PK) as Folder,(SELECT zf2.ZACCOUNT from ZFOLDER as zf1  LEFT JOIN ZFOLDER as zf2 on (zf1.ZPARENT=zf2.Z_PK) where n.ZFOLDER=zf1.Z_PK) as folder_parent,ac.ZEMAILADDRESS as email, ac.ZACCOUNTDESCRIPTION, b.ZHTMLSTRING, att.ZCONTENTID, att.ZFILEURLFROM ZNOTE as nLEFT JOIN ZNOTEBODY as b ON b.ZNOTE = n.Z_PKLEFT JOIN ZATTACHMENT as att ON att.ZNOTE = n.Z_PKLEFT JOIN ZACCOUNT as ac ON ac.Z_PK = folder_parent

Location 2

/Users/<USER>/Library/Group Containers/
This one has been seen on El Capitan, Sierra and HighSierra. Attachments are stored in the Media folder located here:
/Users/<USER>/Library/Group Containers/<UUID>/
Here UUID is the unique identifier for each attachment. The database scheme is different here.

NoteStore.sqlite in HighSierra

NotesStore.sqlite in ElCapitan

Only account name and identifier (UUID) is available. To get full account information, this will need to be correlated with the account info database stored elsewhere.
Note data is available in the ZICNOTEDATA table. 

That ZICNOTEDATA.ZDATA blob is gzip compressed. Upon decompression, it reveals the note data stored in a proprietary unknown binary format. As seen below, you can spot the text, its formatting information and attachment info.

ZDATA gzipped blob showing gzip signature (1F8B08)
Uncompressed ZDATA blob showing text, formatting and attachment info

Most notable in this database is the presence of several timestamps-
Attachment Modified → ZMODIFICATIONDATE
Attachment Preview Updated → ZPREVIEWUPDATEDATE

Reading NoteStore.sqlite

The following SQL query will pull out most pertinent information from this database:
SELECT n.Z_12FOLDERS as folder_id , n.Z_9NOTES as note_id, d.ZDATA as data,
c2.ZTITLE2 as folder,
datetime(c2.ZDATEFORLASTTITLEMODIFICATION + 978307200, 'unixepoch') as folder_title_modified,
datetime(c1.ZCREATIONDATE + 978307200, 'unixepoch') as created,
datetime(c1.ZMODIFICATIONDATE1 + 978307200, 'unixepoch')  as modified,
c1.ZSNIPPET as snippet, c1.ZTITLE1 as title, c1.ZACCOUNT2 as acc_id,
c5.ZACCOUNTTYPE as acc_type, c5.ZIDENTIFIER as acc_identifier, c5.ZNAME as acc_name,
c3.ZMEDIA as media_id, c3.ZFILESIZE as att_filesize,
datetime(c3.ZMODIFICATIONDATE + 978307200, 'unixepoch') as att_modified,
datetime(c3.ZPREVIEWUPDATEDATE + 978307200, 'unixepoch') as att_previewed,
c3.ZTITLE as att_title, c3.ZTYPEUTI, c3.ZIDENTIFIER as att_uuid,
c4.ZFILENAME, c4.ZIDENTIFIER as media_uuid
ORDER BY note_id
On HighSierra, use this query:
If you are looking for an automated way to read this, use mac_apt, the NOTES plugin will parse it.

Tuesday, December 12, 2017

mac_apt + APFS support

Over the past few months, I've been working at adding APFS support into mac_apt, and its finally here. Version 0.2 of mac_apt is now available with APFS support. It also adds a new plugin to process print jobs, some enhanced functionality in other plugins and several minor bug fixes.

As of now basic APFS support is complete, mac_apt can view and extract any file on the file system, including compressed files. It does not have support for FileVault2 (encryption) and will not handle an encrypted volume. The checkpoint feature in APFS is currently not supported or tested although this may be added later.

This is the first forensic processing tool (in freeware) to support APFS. I believe at this time, Sumuri Recon is the only commercial one. I am unaware of any other that can read APFS.

I would like to thank Kurt-Helge Hansen for publishing the paper detailing APFS internal structure and working. He was also helpful in providing a proof of concept code for the same.

The implementation we've used is based on the APFS template built with kaitai-struct. Kaitai-Struct is a library that makes it easy to define and read C structures. It will generate all the code required to read those structures. For APFS, the kaitai-struct template was developed originally by Jonas Plum (@cugu_pio) & Thomas Tempelmann (@tempelorg) here.

APFS working and implementation

The approach we've taken is to read all inodes and populate a database with this data. This means we have to read the entire filesystem data upfront before we have information to read a single file. It isn't ideal, it practically takes 2-4 min. time to do this on an image having default macOS installation (using my slow regular SATA III external disk over USB3), which is not too bad. But I opted for this path as it is the only solution available for now. Why? The way APFS stores files in its b-tree, they are not sorted by name alphabetically. Instead, a 3 byte hash is computed for each file name and the b-tree maintains nodes sorted by this hash instead. The problem is that this hash algorithm is currently unknown. It may just be some sort of CRC variant or something very different. Until this algorithm is known, we cannot write a native parser that walks the b-tree. Hence the database for now.

The database does offer us several advantages though. For compressed file information, we can pre-process the logical size and save that for quick retrieval. In APFS, a compressed file will have its logical size set to zero in file metadata. To lookup its real size, you have to go read its compressed data header (which may be inline or in a resource fork), parse it and get the uncompressed (logical) size. This often means going out to an extent to read it, which makes it slow. Pre-populating this info in a database makes it much quicker for later analysis.

APFS allows extended attributes to be defined and used just the same as HFS+. This means a file can have extended attributes and those are used to save compression parameters (similar to HFS+). APFS also uses Copy-On-Write, which means if you copy a file, the resulting copy will not duplicate the data on disk. Both inodes (original and copy) will point to the same original extents. Only when the copy is changed will new extents be allocated.

If you are not familiar with APFS, the Disk Info output from mac_apt might look strange to you.

Screenshot - Disk Info data from mac_apt showing same offset & size for all APFS volumes
This may be read as 4 partitions all type APFS having the same exact starting offset and size! The reason for this is that APFS is a little different. It isn't just defining a volume, rather it implements a container which can host several volumes in it! This output is from a default installation of HighSierra, where the Disk partitioning scheme is GPT and it defines 2 partitions as seen in the screenshot below.
Illustration showing APFS container and volumes within the Disk

The APFS container by default does not put a limit on the size or location of those volumes within it (Preboot, Macintosh HD, VM, Recovery). Unlike normal partitions on disk where sectors are allocated for each volume before you can use the volumes, APFS allows all volumes to share a common pool of extents (clusters) and they all report having total free space as the same. But really its shared space, so you cannot sum it up for all volumes. This also means data from all volumes is interspersed and volumes are not contiguous. The design makes sense as the target media is flash memory (SSD), and it will never be contiguous there (as it did on spinning HDDs) because of the way flash memory chips work.

Thursday, September 28, 2017

APFS timestamps

From APFS documentation it is revealed that the new timestamp has a nano second resolution. From my data I can see several timestamps in what appears to be the root directory for a test APFS volume I created. Armed with this information, it was pretty easy to guess the epoch - it is the Unix epoch.

ApfsTimeStamp = number of nano-seconds since 1-1-1970

If you like to use the 010 editor for analysis, put the following code in your or file:

// ApfsTime
//  64-bit integer, number of nanoseconds since 01/01/1970 00:00:00

typedef uint64 ApfsTime <read=ApfsTimeRead, write=ApfsTimeWrite>;
FSeek( pos ); ApfsTime _aft <name="ApfsTimes">;
string ApfsTimeRead( ApfsTime t )
    // Convert to FILETIME
    return FileTimeToString( t/100L + 116444736000000000L );
int ApfsTimeWrite( ApfsTime &t, string value )
    // Convert from FILETIME
    FILETIME ft;
    int result = StringToFileTime( value, ft );
    t = (((uint64)ft - 116444736000000000L)*100L);
    return result;

Now 010 can interpret APFS timestamps :)

010 interprets ApfsTimes

If you need to read this timestamp in python (and convert to python datetime), the following function will do this:

import datetime

def ConvertApfsTime(ts):
        return datetime.datetime(1970,1,1) + datetime.timedelta(microseconds=ts / 1000. )
  return None

Monday, September 18, 2017

Interpreting volume info from FXDesktopVolumePositions

On a mac, if you wanted to get a list of all connected volumes, you would typically lookup FXDesktopVolumePositions under the file (for each user). This is available under /Users/<USER>/Library/Preferences/  The volumes listed here can include anything mounted, a CD, a DMG file, a volume from external disk, network mapped volume or anything else that mac sees as a mounted volume/device.

Figure 1: Snippet of 'FKDesktopVolumePositions' from

The above snippet shows every volume connected to one of our test machines. What's strange is the way the volume information is stored. Ignoring the icon position data (which resides a level below not shown in that screenshot), all that's available is a string that looks like this:


Studying this a bit, it appears that the random looking hex number after the name is not really random. It represents the 'Created date' of the volume's root folder (for most but not all the entries!). The date format is Mac Absolute Time (ie, number of seconds since 2001).
The picture below shows the breakup.

Figure 2: Interpreting volume info from FXDesktopVolumePositions entry

The first part of the entry (till the last underscore) is the name as it appears in the Finder window. The next part ignoring the period is the created date. This is followed by p+ and then a decimal number (usually 0, 26, 27, 28 or 29). I believe that number represents a volume type. From what I can see in some limited testing, if volume type=28, then the hex number is always the created date.

Large external disks (USB) that identify as fixed disks and most DMG volumes will show up as type 28. All USB removable disks show up under type 29 and here the hex number is not a created date, it is unknown what this may be. Sometimes this number is negative for type 29, and a lot of volumes share the same number.

There is still some mystery about the hex number. For type=28, sometimes it is less than 8 digits long, and needs to be padded with zeroes at the end. This does produce an accurate date then. Also, sometimes it is longer than 8 digits! In these cases, truncating the number at 8 digits has again produced an accurate date in limited testing. It is unclear what those extra digits would denote.

Update: As pointed out by Geoff Black, the part after the underscore is just the way floats are represented (including the p+xx part).

Following this discovery, mac_apt has been updated to parse this date.

Sunday, September 3, 2017

Releasing mac_apt - macOS Artifact Parsing Tool

Over the last several months, I've been developing a tool to process mac full disk images. It's an extendible framework (in python) which has plugins to parse different artifacts on a mac. The idea was to make a tool that an examiner could run as soon as a mac image was obtained to have artifacts parsed out in an automated fashion. And it had to be free and open-source, that ran on windows/linux (or mac) without the requirement of needing a mac to process mac images. 

Here are the specifics:
  • INPUT - E01 or dd image or DMG(not compressed)
  • OUTPUT - Xlsx, Csv, Sqlite
  • Parsed artifacts (files) are also exported for later manual review.
  • lzvn compressed files (quite a few on recent OSX versions including key plist files) are supported too!

As of now, there are plugins to parse network information (ip, dhcp, interfaces), wifi hotspots, basic machine settings (timezone, hostname, HFS info, disk info,..), recent items (file, volume, applications,..) , local & domain user info, notifications, Safari history and Spotlight shortcuts. More are in the works.. Tested on OSX images from 10.9-10.12

The project is currently in alpha status. Let us know of bugs/issues, any feature requests, ..

The project is available on GitHub here.

Some motivations behind this project:
- Learn Python
- Learn more about OSX

Wednesday, August 2, 2017

Finding the Serial number of a Mac from disk image

On a mac (osx/macOS), the serial number is usually not stored on the disk, it is stored in the firmware and available either printed on the backside/underside of your mac/macbook computer or accessible via software on a booted system using 'About My Mac' or System Profiler.

On recent versions of OSX, there are however a few system databases that store this information and make it available for forensic investigators to use (or for verification). These are:
  • consolidated.db
  • cache_encryptedA.db
  • lockCache_encryptedA.db
All the above files are sqlite databases located in the 'root' user's Darwin user cache folder located under /private/var/folders/zz/zyxvpxvq6csfxvn_n00000sm00006d/C/. This location should be the same for all OSX/macOS installations (10.9 & above) because UID and UUID of root is same on all systems and does not change. 
For more information on Darwin folders, see this blog post. 

Screenshot 1 - Table 'TableInfo' inside consolidated.db showing Serial Number

In the above screenshot, the serial number is seen starting with 'VM'. It starts with VM since this was a virtual machine; for real machines, you will see the actual hardware serial number here. I was able to verify this on several macs running osx 10.9 to 10.12. 

In addition, other software might retrieve and store this information too. One such software, is KeyAccess, installed by Sassafras asset management system. KeyAccess leaves behind a binary file /Library/Preferences/KeyAccess/KeyAccess Prefs which also contains the serial number.

Another place where you might find the serial is sysinfo.cache. This is created by Apple Remote Desktop and is found at /var/db/RemoteManagement/caches/sysinfo.cache.

Thursday, April 27, 2017

The mystery of /var/folders on OSX

The /var/folders (which is actually /private/var/folders) has been a known but undocumented entity for a very long time. Apple does not document why its there, or what it is. But there are plenty of guides online that suggest it may be good to periodically clear those folders to recover disk space and perhaps speed up you system. Whether that is wise is beyond the scope of this blog post.

If you've ever done a MacOS (OSX) forensic examination, you've probably noticed some odd folders here, all two character folder names with long random looking subfolders. Something like /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl/. A level below are 3 folders C, T, 0 and you can then see familiar data files under those. The 3 folders represent Cache (C), Temporary files (T) and User files (0)

On a live system these can be queried using the command 'getconf VARIABLE' where VARIABLE can be DARWIN_USER_CACHE_DIR, DARWIN_USER_TEMP_DIR or DARWIN_USER_DIR.

These are locations where mac stores cache and temporary files from various programs and system utilities. They are further segregated on a per-user basis. So each of those folders (Example: /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl) represents a single user's temporary space. 

Whats in it for forensicators?

From a forensics examination perspective, there is not a lot of artifacts here. However some useful artifacts like Notifications databases and Quicklook thumbnailcache databases are located here. The Launchpad dock database is also here. Sometimes you can find useful tidbits of cache from other apps too.

Figure: /private/var/folders on MacOS 10.12 (Sierra) 

It would be nice to be able to determine which user owned a particular folder when analyzing the artifacts within it. This is actually really easy, as you can just look up the owner uid of the folder. But if you are more interested in how the name gets generated, read on.

Reverse engineering the Algorithm

There is an old forum post here that does not provide the algorithm but hints that its likely generated from uuid and uid. Both of which are available from the user's plist (under /var/db/dslocal/nodes/Default/users/<USER>.plist). From the narration, the character set used in the folder names would be '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. However that's not what is seen on newer macs (10.10 onwards)

After analyzing the folder names on my mac test machines, the data set was narrowed down to 0-9, underscore and all alphabets except vowels (aeiou). A little bit of research confirmed this, when the string '0123456789_bcdfghjklmnpqrstvwxyz' was seen in the libsystem_coreservices.dylib binary. The string is 32 char in size, so you would need 5 bits to represent an index which could point to a specific character in the string. After a bit of experimentation with creating a few user accounts and setting custom UUIDs, it was clear how this worked. The  algorithm takes the 128 bit UUID string as a binary bitstream, appends to it the binary bitstream of the UID (4 byte), then runs a single pass over that data. For each 5 bits it reads, it uses that as an index to get a single char from the charset array and copies that to output. A python implementation that generate these folder names (for both new osx versions and older ones) is provided here.