
Tuesday, December 12, 2017

mac_apt + APFS support

Over the past few months, I've been working on adding APFS support to mac_apt, and it's finally here. Version 0.2 of mac_apt is now available with APFS support. It also adds a new plugin to process print jobs, enhanced functionality in other plugins and several minor bug fixes.

As of now, basic APFS support is complete: mac_apt can view and extract any file on the file system, including compressed files. It does not support FileVault2 (encryption) and will not handle an encrypted volume. The checkpoint feature of APFS is currently unsupported and untested, although support may be added later.

This is the first free forensic processing tool to support APFS. I believe that at this time Sumuri Recon is the only commercial one; I am unaware of any other tool that can read APFS.

I would like to thank Kurt-Helge Hansen for publishing the paper detailing the internal structure and workings of APFS. He was also helpful in providing proof-of-concept code for the same.

The implementation we've used is based on the APFS template built with Kaitai Struct. Kaitai Struct is a declarative language for describing binary structures; its compiler generates all the code required to read them. The APFS Kaitai Struct template was originally developed by Jonas Plum (@cugu_pio) & Thomas Tempelmann (@tempelorg) here.
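To give a sense of how a Kaitai-generated parser is used from Python, here is a minimal sketch. It assumes you have compiled the APFS .ksy template to an apfs.py module with the kaitai-struct compiler; the class and field names here are illustrative assumptions, not necessarily what the template actually produces.

from apfs import Apfs  # hypothetical module generated from the .ksy template
                       # (needs the kaitaistruct runtime: pip install kaitaistruct)

# Kaitai-generated classes expose a from_file() constructor that parses
# the structures declared in the template.
container = Apfs.from_file('apfs_container.dd')
print(container.block_size)  # illustrative field name - check the template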

How APFS works and how it's implemented

The approach we've taken is to read all inodes and populate a database with this data. This means we have to read the entire filesystem metadata upfront before we can read even a single file. It isn't ideal; in practice it takes 2-4 minutes on an image with a default macOS installation (using my slow regular SATA III external disk over USB3), which is not too bad. But I opted for this path as it is the only solution available for now. Why? APFS does not sort the files in its b-tree alphabetically by name. Instead, a 3-byte hash is computed for each file name, and the b-tree keeps its nodes sorted by this hash. The problem is that this hash algorithm is currently unknown. It may be some sort of CRC variant or something very different. Until the algorithm is known, we cannot write a native parser that walks the b-tree. Hence the database, for now.
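To make the idea concrete, here is a rough sketch of the pre-processing step (the table layout is illustrative, not mac_apt's actual schema): every inode is visited once and its metadata cached, so later lookups become simple indexed queries instead of b-tree walks.

import sqlite3

db = sqlite3.connect('apfs_metadata.db')
db.execute('''CREATE TABLE Inodes
              (cnid INTEGER, parent_cnid INTEGER, name TEXT,
               logical_size INTEGER, created INTEGER, modified INTEGER)''')

def add_inode(cnid, parent_cnid, name, logical_size, created, modified):
    # Called once per inode during the single pass over the filesystem
    db.execute('INSERT INTO Inodes VALUES (?,?,?,?,?,?)',
               (cnid, parent_cnid, name, logical_size, created, modified))

# After the pass, finding a file is a plain indexed query:
# SELECT * FROM Inodes WHERE name = ? AND parent_cnid = ?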

The database does offer several advantages, though. For compressed files, we can pre-compute the logical size and save it for quick retrieval. In APFS, a compressed file has its logical size set to zero in the file metadata. To look up its real size, you have to read its compressed data header (which may be inline or in a resource fork), parse it, and get the uncompressed (logical) size. This often means going out to an extent to read it, which is slow. Pre-populating this information in a database makes later analysis much quicker.
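For reference, here is a hedged sketch of reading that header, assuming APFS reuses the decmpfs layout from HFS+ compression: a 4-byte magic (the bytes 'fpmc' on disk), a uint32 compression type and a uint64 uncompressed size, all little-endian.

import struct

def get_uncompressed_size(decmpfs_data):
    # First 16 bytes of the com.apple.decmpfs xattr (or resource fork
    # data): magic, compression type, uncompressed (logical) size
    magic, ctype, logical_size = struct.unpack('<4sIQ', decmpfs_data[:16])
    if magic != b'fpmc':
        raise ValueError('Not a decmpfs header')
    return logical_size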

APFS allows extended attributes to be defined and used just as in HFS+. This means a file can have extended attributes, and these are used to save compression parameters (similar to HFS+). APFS also uses copy-on-write, which means that if you copy a file, the copy will not duplicate the data on disk. Both inodes (original and copy) will point to the same original extents; only when the copy is changed will new extents be allocated.

If you are not familiar with APFS, the Disk Info output from mac_apt might look strange to you.

Screenshot - Disk Info data from mac_apt showing same offset & size for all APFS volumes
This may be read as 4 partitions, all of type APFS, having the exact same starting offset and size! The reason is that APFS is a little different: it does not just define a volume, it implements a container which can host several volumes within it! This output is from a default installation of High Sierra, where the disk partitioning scheme is GPT and defines 2 partitions, as seen in the screenshot below.
Illustration showing APFS container and volumes within the Disk

The APFS container by default does not put a limit on the size or location of the volumes within it (Preboot, Macintosh HD, VM, Recovery). Unlike normal disk partitions, where sectors are allocated to each volume before the volumes can be used, APFS lets all volumes share a common pool of extents (clusters), and they all report the same total free space. But this is really shared space, so you cannot sum it up across volumes. This also means data from all volumes is interspersed, and volumes are not contiguous. The design makes sense as the target media is flash memory (SSD), where data will never be contiguous anyway (as it was on spinning HDDs) because of the way flash memory chips work.

Thursday, September 28, 2017

APFS timestamps

APFS documentation reveals that the new timestamp has nanosecond resolution. In my data I can see several timestamps in what appears to be the root directory of a test APFS volume I created. Armed with this information, it was pretty easy to guess the epoch - it is the Unix epoch.

ApfsTimeStamp = number of nanoseconds since 1-1-1970

If you like to use 010 Editor for analysis, put the following code in your Inspector.bt or InspectorDates.bt file:

//----------------------------------------------------------------
// ApfsTime
//  64-bit integer, number of nanoseconds since 01/01/1970 00:00:00

typedef int64 ApfsTime <read=ApfsTimeRead, write=ApfsTimeWrite>;
FSeek( pos ); ApfsTime _aft <name="ApfsTimes">;
string ApfsTimeRead( ApfsTime t )
{   
    // Convert to FILETIME
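    // 116444736000000000 = number of 100-nanosecond intervals between
    // 01/01/1601 (the FILETIME epoch) and 01/01/1970 (the Unix epoch)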
    return FileTimeToString( t/100L + 116444736000000000L );
}
int ApfsTimeWrite( ApfsTime &t, string value )
{
    // Convert from FILETIME
    FILETIME ft;
    int result = StringToFileTime( value, ft );
    t = (((int64)ft - 116444736000000000L)*100L);
    return result;
}

Now 010 can interpret APFS timestamps :)

010 interprets ApfsTimes

If you need to read this timestamp in python (and convert it to a python datetime), the following function will do it:


import datetime

def ConvertApfsTime(ts):
    '''Convert an APFS timestamp (nanoseconds since 1970-01-01) to datetime'''
    try:
        return datetime.datetime(1970, 1, 1) + datetime.timedelta(microseconds=ts / 1000.)
    except (OverflowError, ValueError):
        pass
    return None

Update: As noted by Joachim, the timestamp is an int64; this has been corrected in the code above.

Monday, September 18, 2017

Interpreting volume info from FXDesktopVolumePositions

On a Mac, if you want a list of all volumes that were connected, you would typically look up FXDesktopVolumePositions in the com.apple.finder.plist file (for each user), available under /Users/<USER>/Library/Preferences/. The volumes listed here can include anything mounted - a CD, a DMG file, a volume from an external disk, a network-mapped volume, or anything else the Mac sees as a mounted volume/device.
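Reading that key from Python is straightforward with plistlib (a sketch; substitute a real user name for the <USER> placeholder):

import plistlib

with open('/Users/<USER>/Library/Preferences/com.apple.finder.plist', 'rb') as f:
    finder_prefs = plistlib.load(f)  # handles binary plists too

# The keys of this dict are the volume entry strings discussed below
for entry in finder_prefs.get('FXDesktopVolumePositions', {}):
    print(entry)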

Figure 1: Snippet of 'FXDesktopVolumePositions' from com.apple.finder.plist

The above snippet shows every volume connected to one of our test machines. What's strange is the way the volume information is stored. Ignoring the icon position data (which resides a level below and is not shown in the screenshot), all that's available is a string that looks like this:


HandBrake-0.10.2-MacOSX.6_GUI_x86_64_0x1.b2a122dp+28

Studying this a bit, it appears that the random-looking hex number after the name is not really random. It represents the 'Created date' of the volume's root folder (for most, but not all, of the entries!). The date format is Mac Absolute Time (i.e., the number of seconds since January 1, 2001).
The picture below shows the breakdown.


Figure 2: Interpreting volume info from FXDesktopVolumePositions entry


The first part of the entry (up to the last underscore) is the name as it appears in the Finder window. The next part, ignoring the period, is the created date. This is followed by p+ and then a decimal number (usually 0, 26, 27, 28 or 29). I believe that number represents a volume type. From what I can see in some limited testing, if the volume type is 28, then the hex number is always the created date.

Large external (USB) disks that identify as fixed disks, and most DMG volumes, show up as type 28. All USB removable disks show up as type 29, and here the hex number is not a created date; what it represents is unknown. Sometimes this number is negative for type 29, and many volumes share the same number.

There is still some mystery about the hex number. For type 28, it is sometimes less than 8 digits long and needs to be padded with zeroes at the end; this then produces an accurate date. Sometimes it is longer than 8 digits! In those cases, truncating the number to 8 digits has again produced an accurate date in limited testing. It is unclear what the extra digits denote.

Update: As pointed out by Geoff Black, the part after the underscore is just the standard hexadecimal representation of a floating-point number (the p+xx part is the binary exponent).
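Given that, converting an entry to a date becomes a one-liner with Python's float.fromhex. A sketch (remember, per the above, the value is only a created date for type 28 entries):

import datetime

def parse_volume_entry(entry):
    # Split at the last underscore; the tail is a hex float literal
    # holding Mac Absolute Time (seconds since 2001-01-01)
    name, _, hex_float = entry.rpartition('_')
    created = datetime.datetime(2001, 1, 1) + \
              datetime.timedelta(seconds=float.fromhex(hex_float))
    return name, created

print(parse_volume_entry('HandBrake-0.10.2-MacOSX.6_GUI_x86_64_0x1.b2a122dp+28'))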

Following this discovery, mac_apt has been updated to parse this date.

Sunday, September 3, 2017

Releasing mac_apt - macOS Artifact Parsing Tool

Over the last several months, I've been developing a tool to process Mac full disk images. It's an extensible framework (in Python) with plugins to parse different artifacts on a Mac. The idea was to make a tool that an examiner could run as soon as a Mac image was obtained, to have artifacts parsed out in an automated fashion. And it had to be free and open-source, running on Windows/Linux (or Mac) without needing a Mac to process Mac images.

Here are the specifics:
  • INPUT - E01, dd or DMG (not compressed) image
  • OUTPUT - XLSX, CSV, SQLite
  • Parsed artifacts (files) are also exported for later manual review.
  • LZVN-compressed files (quite a few on recent OSX versions, including key plist files) are supported too!

As of now, there are plugins to parse network information (IP, DHCP, interfaces), wifi hotspots, basic machine settings (timezone, hostname, HFS info, disk info, ..), recent items (files, volumes, applications, ..), local & domain user info, notifications, Safari history and Spotlight shortcuts. More are in the works. Tested on OSX images from 10.9 to 10.12.

The project is currently in alpha status. Let us know of bugs/issues, feature requests, ..

The project is available on GitHub here.

Some motivations behind this project:
- Learn Python
- Learn more about OSX

Wednesday, August 2, 2017

Finding the Serial number of a Mac from disk image

On a Mac (OSX/macOS), the serial number is usually not stored on the disk; it is stored in the firmware and is available either printed on the backside/underside of your Mac/MacBook or via software on a booted system using 'About This Mac' or System Profiler.

On recent versions of OSX, there are, however, a few system databases that store this information and make it available for forensic investigators to use (or for verification). These are:
  • consolidated.db
  • cache_encryptedA.db
  • lockCache_encryptedA.db
All of the above files are SQLite databases located in the 'root' user's Darwin user cache folder, under /private/var/folders/zz/zyxvpxvq6csfxvn_n00000sm00006d/C/. This location should be the same on all OSX/macOS installations (10.9 & above) because the UID and UUID of root are the same on all systems and do not change.
For more information on Darwin folders, see this blog post. 

Screenshot 1 - Table 'TableInfo' inside consolidated.db showing Serial Number

In the above screenshot, the serial number is seen starting with 'VM'. It starts with VM because this was a virtual machine; on real machines you will see the actual hardware serial number here. I was able to verify this on several Macs running OSX 10.9 to 10.12.
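If you want to pull this out yourself, a minimal sketch follows. The table name TableInfo comes from the screenshot above; the column layout is not documented here, so whole rows are printed.

import sqlite3

db_path = '/private/var/folders/zz/zyxvpxvq6csfxvn_n00000sm00006d/C/consolidated.db'
conn = sqlite3.connect(db_path)
for row in conn.execute('SELECT * FROM TableInfo'):
    print(row)  # look for the row holding the serial number
conn.close()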

In addition, other software might retrieve and store this information too. One such program is KeyAccess, installed by the Sassafras asset management system. KeyAccess leaves behind a binary file, /Library/Preferences/KeyAccess/KeyAccess Prefs, which also contains the serial number.

Another place where you might find the serial number is sysinfo.cache. This file is created by Apple Remote Desktop and is found at /var/db/RemoteManagement/caches/sysinfo.cache.

Thursday, April 27, 2017

The mystery of /var/folders on OSX

/var/folders (which is actually /private/var/folders) has been a known but undocumented entity for a very long time. Apple does not document why it's there or what it is. But there are plenty of guides online suggesting it may be good to periodically clear those folders to recover disk space and perhaps speed up your system. Whether that is wise is beyond the scope of this blog post.

If you've ever done a macOS (OSX) forensic examination, you've probably noticed some odd folders here: all two-character folder names with long, random-looking subfolders. Something like /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl/. A level below are 3 folders - C, T and 0 - under which you can see familiar data files. The 3 folders represent Cache (C), Temporary files (T) and User files (0).

On a live system, these can be queried using the command 'getconf VARIABLE', where VARIABLE is DARWIN_USER_CACHE_DIR, DARWIN_USER_TEMP_DIR or DARWIN_USER_DIR.
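From Python on a live machine, you can simply shell out to getconf (a sketch):

import subprocess

# Equivalent to running 'getconf DARWIN_USER_CACHE_DIR' (etc.) in Terminal
for var in ('DARWIN_USER_CACHE_DIR', 'DARWIN_USER_TEMP_DIR', 'DARWIN_USER_DIR'):
    path = subprocess.check_output(['getconf', var], text=True).strip()
    print(var, '=', path)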

These are locations where macOS stores cache and temporary files for various programs and system utilities. They are further segregated on a per-user basis, so each of those folders (example: /var/folders/_p/vbvyjz297778zzwzhfpsqb2w0000gl) represents a single user's temporary space.

What's in it for forensicators?

From a forensic examination perspective, there are not a lot of artifacts here. However, some useful ones, like the Notifications databases and QuickLook thumbnail cache databases, are located here. The Launchpad dock database is also here. Sometimes you can find useful tidbits of cache from other apps too.

Figure: /private/var/folders on MacOS 10.12 (Sierra) 


It would be nice to be able to determine which user owned a particular folder when analyzing the artifacts within it. This is actually really easy, as you can just look up the owner UID of the folder. But if you are more interested in how the name gets generated, read on.

Reverse engineering the Algorithm

There is an old forum post here that does not provide the algorithm but hints that it's likely generated from the UUID and UID, both of which are available from the user's plist (under /var/db/dslocal/nodes/Default/users/<USER>.plist). From the narration, the character set used in the folder names would be '+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. However, that's not what is seen on newer macs (10.9 onwards).

After analyzing the folder names on my Mac test machines, the character set was narrowed down to 0-9, underscore and all letters except vowels (aeiou). A little research confirmed this when the string '0123456789_bcdfghjklmnpqrstvwxyz' was found in the libsystem_coreservices.dylib binary. The string is 32 characters long, so a 5-bit index is enough to address any character in it. After a bit of experimentation with creating a few user accounts and setting custom UUIDs, it was clear how this worked. The algorithm takes the 128-bit UUID as a binary bitstream, appends the binary bitstream of the UID (4 bytes), then runs a single pass over that data: for every 5 bits it reads, it uses the value as an index into the character set and copies that character to the output. A Python implementation that generates these folder names (for both new OSX versions and older ones) is provided here.
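Here is a minimal sketch of that algorithm as described (the big-endian packing of the UID is my assumption; see the linked implementation for the version that also handles older OSX releases):

import uuid

CHARSET = '0123456789_bcdfghjklmnpqrstvwxyz'

def darwin_folder_name(user_uuid, uid):
    # 16-byte UUID + 4-byte UID = 160 bits; read 5 bits at a time,
    # each value indexing into the 32-character set (160/5 = 32 chars)
    data = uuid.UUID(user_uuid).bytes + uid.to_bytes(4, 'big')
    bits = int.from_bytes(data, 'big')
    name = ''.join(CHARSET[(bits >> (155 - 5*i)) & 0x1F] for i in range(32))
    return name[:2], name[2:]  # (parent folder, subfolder)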

References:

http://www.magnusviri.com/Mac/what-is-var-folders.html

https://arstechnica.com/civis/viewtopic.php?f=19&t=42677

Thursday, January 5, 2017

Flexing SQL muscle for parsing an MS db on OSX

This post is about using recursive SQL queries (or rather, a single recursive query) to parse the MicrosoftRegistrationDB.reg file created by Microsoft Office on Mac OSX systems.

A little background..

On OSX (Mac), there is no registry. Most apps just use plist files to save local information, and Microsoft Office leaves cache and configuration information in plist files like every other OSX application. However, it also keeps a copy in this file - MicrosoftRegistrationDB.reg. The file can be found here -

/Users/research/Library/Group Containers/xxxxxxxxxx.Office/MicrosoftRegistrationDB.reg

This is an SQLite database which is a flattened version of the registry tree that Office would create on Windows under HKCU\Software\Microsoft\Office, the format of which is quite straightforward and documented. Some useful MRU artifacts and configuration settings reside here.

The SQLite database has the same fields as the registry - namely key, key last-modified time, value name, value type and value data - arranged in the following table structure.

Figure 1 - Database table schema

Pulling the data out is fairly simple in SQL. However, if you wish to recreate _all_ the registry paths from the flattened tree, it's a bit more involved. In the HKEY_CURRENT_USER table, each key has a single entry containing the key name and a reference to its parent key. As an analyst, you would like the full key path (i.e. HKCU\Software\Microsoft\...) for every value. There lies the problem. To recreate the path for every single registry value, you would have to run several individual SQL queries, each fetching a key's parent, until you reach the ROOT of the tree. You could do this with a recursive function in Python, or you can let SQL do the recursion by running a recursive query; SQLite supports these. You can read up on recursive queries in SQLite here and here.

The Final Query

SELECT t2.node_id, t2.write_time, path as Key,
  HKEY_CURRENT_USER_values.name as valueName,
  HKEY_CURRENT_USER_values.value as value,
  HKEY_CURRENT_USER_values.type as valueType from
(
  WITH RECURSIVE
    under_software(path,name,node_id,write_time) AS 
    (
      VALUES('Software','',1,0)
      UNION ALL
      SELECT under_software.path || '\' || HKEY_CURRENT_USER.name,
          HKEY_CURRENT_USER.name, HKEY_CURRENT_USER.node_id,
          HKEY_CURRENT_USER.write_time
        FROM HKEY_CURRENT_USER JOIN under_software ON
          HKEY_CURRENT_USER.parent_id=under_software.node_id
        ORDER BY 1
    )
  SELECT name, path, write_time, node_id FROM under_software
) as t2 LEFT JOIN HKEY_CURRENT_USER_values on
  HKEY_CURRENT_USER_values.node_id=t2.node_id;

Here the 'WITH RECURSIVE' part performs the recursive querying for every value item, building the full key path for that value. The line
SELECT under_software.path || '\' || HKEY_CURRENT_USER.name
concatenates the parent key path with the sub-key name, using a backslash as the separator; the double pipe '||' is the concatenation operator. The 'ORDER BY 1' is not strictly necessary, but it sorts the output by the first selected column of the recursive query, i.e. path.

A Python script to do this automatically is available here. The script runs the recursive query on the database and produces both a CSV and a plist as output. I chose to also output a plist because this data is best viewed as a tree, as shown below.

Figure 2 - Sample plist produced by script (viewed in plist Editor Pro)