Saturday, January 9, 2021

Gboard has some interesting data...

Gboard, the Google Keyboard, is the default keyboard on Pixel devices and, according to the Play Store, has been installed over a billion times overall.

Although it is not the default on most non-Google brands, it is a popular choice among users who type in languages other than English, because of its good support for, and ease of use with, dozens of Asian and Indian languages.

As a keyboard app, it monitors and analyzes your keystrokes, offering suggestions and corrections for spelling and grammar, sentence completion and even emoji suggestions. 

Now for the interesting part. For the last few versions, Gboard has also retained a lot of data (i.e., user keystrokes!) in its cache. This is seen at least as far back as the January 2020 version (v 8.3.x). From a DFIR perspective, that is GOLD. For a forensic examiner, this can show data that the user typed into an app that has since been deleted, messages that were typed and then deleted, or messages from apps that have the disappearing messages feature turned on! It can also reveal data entered into fields on web pages/online apps, which wouldn't be stored locally at all. It may also help with apps that don't track when a particular item was created or modified.

Note - The Signal app wasn't specifically tested to see whether its data is retained, but based on what we can see here, it seems likely that those messages would end up here too. All testing was on a Pixel 3 running the latest Android 11 with the default keyboard and default settings, and the findings were also verified on other, older images. Josh Hickman's Android 10 Pixel 3 image was also used, and Josh was able to verify that sent Telegram and WhatsApp messages were present here. The specific versions of the Gboard databases studied were:

  • (on Android 10)
  • (on Android 10)
  • (on Android 11)


Gboard's app data (sandbox) folder is located here:


Here you might see a number of databases that start with trainingcache*. These are the files that contain the caches.

Figure 1 - Contents of Gboard's databases folder (v

In different versions of the app, the database formats and names have changed a bit. Of these, useful data can be found in trainingcache2.db, trainingcache3.db and trainingcachev2.db. Let's examine some of them now.

trainingcache2.db (v

The table training_input_events_table contains information about the application in focus, the name of the field where input was sent, the timestamp of the event, and a protobuf BLOB stored in the _payload field, as shown in the screenshot below.

Figure 2 - training_input_events_table (not all columns shown)
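A minimal sketch for pulling these rows out with Python's built-in sqlite3 module. Only the table name and the _payload column are taken from the text; the remaining column names vary between versions, so the query selects every column and builds each row into a dictionary from the cursor description. The database file name in the usage comment is a placeholder for a working copy of the extracted database.

```python
import sqlite3


def read_input_events(conn: sqlite3.Connection) -> list:
    """Return every row of training_input_events_table as a dict.

    Column names other than _payload are not assumed; they are read from
    the cursor description, so this works across schema variations.
    """
    cur = conn.execute("SELECT * FROM training_input_events_table")
    names = [column[0] for column in cur.description]
    events = []
    for values in cur:
        record = dict(zip(names, values))
        # _payload is a protobuf BLOB; keep it as raw bytes for a later decoder
        payload = record.get("_payload")
        record["_payload"] = bytes(payload) if payload is not None else None
        events.append(record)
    return events


# Usage (path is a placeholder):
# conn = sqlite3.connect("trainingcache2.db")
# for event in read_input_events(conn):
#     print(event)
```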

The highlighted entry above is from an app that has since been deleted. The _payload BLOB is decoded in the screenshot below, highlighting the text the user typed into the Email input field. The protobuf also includes all of the data found in the other columns of the table.

Figure 3 - Decoded Protobuf from _payload column

In most instances, however, the protobuf looks like the one in the screenshot below, where the input needs to be put back together as shown. Here you can see the words the user typed as well as the suggestions offered by the app. Suggestions can be for spelling, grammar, contact names, or something else.
Figure 4 - Decoded protobuf - reconstructing user input

Above, you can see the words typed and suggestions offered. On an Android device, the suggestions appear as shown below while typing.

Figure 5 - Android keyboard highlighting suggested words

trainingcache3.db (v

In version 8.x, this same database is named trainingcache2.db and follows the exact same format. The table s_table looks similar to the training_input_events_table seen earlier. However, the _payload field here does not store the keystrokes.

Figure 6 - s_table

Figure 7 - _payload protobuf decoded from s_table

Keystroke data is stored in the table tf_table. Here, most entries are a single key press, so the input again needs to be put back together, as shown below.

Figure 8 - tf_table entries

All keystrokes from the same session have the same f1 value (a timestamp-like field that is not actually used as a timestamp). The order of the keys pressed is stored in f4. Assuming the rows are stored in that order, a short query can concatenate the f3 column values for easy reading (shown below). This isn't perfect, as group_concat() doesn't guarantee the order of concatenation, but it seems to work for now!

Figure 9 - Reading keystroke sessions from tf_table
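Because group_concat() does not guarantee order, a safer variant is to let SQL sort the rows and do the concatenation in Python. A minimal sketch, assuming f1, f3 and f4 are the session, key and sequence columns described above; f4 is cast to an integer in case it is stored as text, and entries are joined with no separator, which assumes spaces are recorded as their own key presses.

```python
import sqlite3


def keystroke_sessions(conn: sqlite3.Connection) -> dict:
    """Group tf_table keystrokes by session (f1), concatenated in f4 order.

    Sorting in SQL and joining in Python avoids relying on group_concat(),
    whose concatenation order SQLite does not guarantee.
    """
    sessions = {}
    query = "SELECT f1, f3 FROM tf_table ORDER BY f1, CAST(f4 AS INTEGER)"
    for session_id, key in conn.execute(query):
        if key is None:  # skip empty entries rather than joining "None"
            continue
        sessions.setdefault(session_id, []).append(str(key))
    # dicts keep insertion order, so sessions come out sorted by f1
    return {sid: "".join(keys) for sid, keys in sessions.items()}
```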

We can combine (join) this data with that from s_table to recreate the same data we got from training_input_events_table earlier.

Figure 10 - joined tables

In the screenshot above, you can even see data being typed into a Google Doc, which is not saved locally. Only a snippet is shown above; to see the full parsed data, get Josh's Android image(s) and the latest version of ALEAPP (code), which now parses this out. Below is a preview (from a different image my students might recognize).

Figure 11 - ALEAPP output showing trainingcache parsed output

Cached keystroke data can also be seen and reconstructed from trainingcachev2.db, whose format is a bit different (not discussed here). Nothing of significance was found in trainingcache4 or the other databases. 


As expected, keystrokes from password fields are not stored or tracked.

In data reconstructed from tf_table, you can see all the spelling mistakes a user made while typing! Any corrections made in the middle of a word/sentence appear at the end, because we are getting the raw keystrokes in the order the keys were pressed, so some input may be difficult to read. Also, if a user types something into a field, deletes one or more words, and retypes, you won't see the final edited (clean) version, as backspaces (deletes) are not tracked. You can see some of this in the output above (figure 9).

The caches are periodically deleted (and are likely size-limited too), so you shouldn't expect to find all user-typed data here.

Sunday, January 3, 2021

iOS Application Groups & Shared data


Tracking down an iOS application's Data folder (aka SandboxPath) is fairly easy. One simply needs to look at the applicationState.db SQLite database located under /private/var/mobile/Library/FrontBoard/. This is well known.

However, locating the sandbox folders for its AppGroups (and Extensions) is not so straightforward. The method suggested by Scott Vance here, and recommended by a few others too, is to look for the file under each of the UUID folders:

  • /private/var/containers/Shared/SystemGroup/UUID/
  • /private/var/mobile/Containers/Shared/AppGroup/UUID/
  • /private/var/mobile/Containers/Data/InternalDaemon/UUID/
  • /private/var/mobile/Containers/Data/PluginKitPlugin/UUID/

As noted by Scott, the iLEAPP tool does this too, reading all the plists and listing out the path and its group name. For manual analysis, this works out great, as you can visually make out the app name from the group name. For example, the Notes app has bundle_id and one of its shared groups (where the actual Notes db is stored!) has the id

The Problem

For automated analysis, this approach does not work, as each app follows its own naming convention for ids. A program cannot know that corresponds to Hence we search for something with a more direct reference connecting Shared Containers to their Apps. Before we proceed further, it's important to understand the relationships between extensions, apps and shared containers. The diagram below does a good job of summarizing this. The shared containers are identified by AppGroups.

Figure 1 - iOS App, Extension, container relationships - Source:

The Solution

Fortunately, there is a database that tracks container information on iOS. It is located at /private/var/root/Library/MobileContainerManager/containers.sqlite3

It precisely lists all Apps, their extensions, AppGroups and Entitlements. As far as I can tell, this is the only place where this information is stored (apart from caches and logs). It does not have information about UUIDs. This database is listed in the SANS smartphone forensics poster, but I couldn't find any details on it elsewhere. 

The database structure is simple with just 3 main tables (and an sqlite_sequence one). 

Figure 2 - containers.sqlite database tables

The child_bundles table lists extensions and their owner Apps. In the figure below, you can see the extensions for the app.

Figure 3 - child_bundles table, filtered on 'notes'

Or one could write a small query to list all apps with their extension names, as shown below.

Figure 4 - App & Extensions - query and output

Information about AppGroups is found in the data field of the code_signing_data table as a BLOB, which stores a binary plist.

Figure 5 - Plist (for - cs_info_id 456) from ''

The Entitlements dictionary has a lot of information in it. If this App creates a shared AppGroup, then it will show up under There may also be groups under

Figure 6 - AppGroup information in Entitlements section (in plist)
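Pulling the groups out of every BLOB can be automated with sqlite3 and plistlib. A minimal sketch: the code_signing_data table, its data BLOB and the cs_info_id column come from the text, while the top-level "Entitlements" key name is assumed from the figure, and the entitlement key that holds the AppGroup ids is left as a parameter because it is not spelled out above (the key in the test is hypothetical).

```python
import plistlib
import sqlite3


def app_groups_by_cs_info(conn, group_key, entitlements_key="Entitlements"):
    """Map cs_info_id -> list of group ids found under Entitlements[group_key].

    entitlements_key and group_key are assumptions supplied by the caller.
    """
    result = {}
    query = "SELECT cs_info_id, data FROM code_signing_data"
    for cs_info_id, blob in conn.execute(query):
        if not blob:
            continue
        try:
            info = plistlib.loads(bytes(blob))  # binary plist stored as a BLOB
        except Exception:
            continue  # skip blobs that are not valid plists
        ents = info.get(entitlements_key, {}) if isinstance(info, dict) else {}
        groups = ents.get(group_key, []) if isinstance(ents, dict) else []
        if isinstance(groups, str):  # tolerate a single string value
            groups = [groups]
        if groups:
            result[cs_info_id] = list(groups)
    return result
```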

So from the above data, we know that the Notes App has 5 extensions and 2 AppGroups, and we have the exact string names(aka ids) too - and . Correlating this data with information we found from files (from each UUID folder earlier), we can programmatically search and link the two as being part of the same App, based on the container id (AppGroup name).

Figure 7 - AppGroup/UUID folder showing plist's content and Container owner id
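Once both sides are available as plain dictionaries, the correlation itself is a lookup keyed on the AppGroup id. A minimal sketch with hypothetical names: one dictionary maps each app to the group ids from its entitlements, the other maps group ids to the UUID folders whose plist names them; how each is built, and how apps are keyed, is up to the caller.

```python
def link_groups_to_folders(app_groups, folders_by_id):
    """Join app -> [group ids] with group id -> UUID folder on the group id.

    app_groups:    {app identifier: [AppGroup ids from its entitlements]}
    folders_by_id: {AppGroup id: UUID folder path}
    Returns {app identifier: {group id: folder path, or None if not found}}.
    """
    return {
        app: {group_id: folders_by_id.get(group_id) for group_id in groups}
        for app, groups in app_groups.items()
    }
```

A None value flags a group named in the entitlements for which no UUID folder was found, which is itself worth noting during an examination.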

This methodology is implemented in the APPS plugin for ios_apt, which now lists every App, its AppGroups, SystemGroups, Extensions, and all the relationships. So you don't have to do any of it manually now. Enjoy!

Figure 8 - Apps Table from ios_apt output (not all columns are shown here)

Figure 9 - AppGroupInfo Table from ios_apt output

Monday, December 28, 2020

Introducing ios_apt - iOS Artifact Parsing Tool

ios_apt is the new shiny companion to mac_apt

ios_apt is not a separate project, it's just a part of the mac_apt framework, and serves as a launch script that processes iOS/iPadOS artifacts. 

Why yet another iOS parsing tool, don't we already have too many?

In addition to paid tools, we have iLEAPP, APOLLO and a few others, and I am also an active contributor to some of them. This isn't meant to compete with them, rather it utilizes the mac_apt framework to prevent duplication of work. 

Many artifacts on iOS and macOS share common backend databases, configuration and artifact types. Among the artifacts that are almost identical are -

  • Spotlight
  • UnifiedLogging logs
  • Network usage database
  • Networking artifacts like hardware info and last IP leases
  • Safari
  • Notes
  • FSevents
  • ScreenTime

There are a few others too that aren't listed here, but you get the picture. Since mac_apt already parsed all of them, it made sense to create an iOS variant that parses these from iOS extractions.

Also many of these artifacts are fairly complex and other FOSS tools don't have the architecture needed to handle them. APOLLO only gathers information from SQLite databases. iLEAPP is geared towards single artifact parsing per plugin. It is not designed for multiple layers of parsing where information parsed from one artifact/file may be used as a key to jump to an artifact elsewhere on disk.


In its first version, ios_apt only works on full file system images extracted out to a folder. No support yet for zip/tar/dar/7z/other archives.

Available Plugins / Modules

The following Plugins are available as of now -

  • APPS
  • WIFI

Download the latest version of mac_apt to get ios_apt.

Sunday, July 19, 2020

KTX to PNG in Python for iOS snapshots

App snapshots on iOS are stored as KTX files; this is fairly well known at this point, thanks to the research by Geraldine Blay (@i_am_the_gia) and Alex Brignoni (@AlexisBrignoni) here and here. They even came up with a way to collect and convert them to PNG format. However, that solution was only for macOS, hence this research.


KTX is a file format used to store textures, and is commonly used by OpenGL programs. The KTX file format is documented and available here. There aren't many standalone utilities that work with KTX files, as the format is mostly used in games and not for reading/distributing standalone textures. There are also no readily available Python libraries to read it! The Khronos Group, which created the format, distributes libktx, but it is C++ only. Even so, it would not be able to read iOS-created KTX files (for the reasons described below). The few Windows applications I could find, like PicoPixel, would not recognize Apple-created KTX files as valid files.

So what is different here? A quick glance over the file in the hex editor showed that the texture data was stored in LZFSE compressed form, which currently only macOS/iOS can read.

Figure - Ascii view of 010 hex editor with ktx template
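The fixed part of the header can be read with struct. A minimal sketch that follows the published KTX 1.1 layout (a 12-byte identifier, an endianness marker, then twelve 32-bit fields, 64 bytes in all). It only parses the header and locates the first mip level's data; it does not decompress the LZFSE payload, and it rejects the 'AAPL'-header variant mentioned later.

```python
import struct

# Fixed 12-byte identifier that opens every KTX 1.1 file
KTX_IDENTIFIER = b"\xabKTX 11\xbb\r\n\x1a\n"
GL_COMPRESSED_RGBA_ASTC_4x4 = 0x93B0

# The twelve uint32 header fields that follow the endianness marker, in order
HEADER_FIELDS = (
    "glType", "glTypeSize", "glFormat", "glInternalFormat",
    "glBaseInternalFormat", "pixelWidth", "pixelHeight", "pixelDepth",
    "numberOfArrayElements", "numberOfFaces", "numberOfMipmapLevels",
    "bytesOfKeyValueData",
)


def parse_ktx_header(data: bytes) -> dict:
    """Parse the 64-byte KTX 1.1 header and locate the first mip level."""
    if len(data) < 64 or data[:12] != KTX_IDENTIFIER:
        raise ValueError("not a KTX 1.1 file")
    # The writer stores 0x04030201 in its own byte order; reading it as
    # little-endian tells us which order the remaining fields use.
    (marker,) = struct.unpack_from("<I", data, 12)
    if marker == 0x04030201:
        order = "<"
    elif marker == 0x01020304:
        order = ">"
    else:
        raise ValueError("invalid endianness marker")
    header = dict(zip(HEADER_FIELDS, struct.unpack_from(order + "12I", data, 16)))
    header["is_astc_4x4"] = header["glInternalFormat"] == GL_COMPRESSED_RGBA_ASTC_4x4
    # Key/value metadata follows the header; each level then starts with a
    # uint32 imageSize followed by that many bytes of image data.
    pos = 64 + header["bytesOfKeyValueData"]
    if len(data) >= pos + 4:
        (header["first_level_size"],) = struct.unpack_from(order + "I", data, pos)
        header["first_level_offset"] = pos + 4
    return header
```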

Now using pyliblzfse, I could decompress the data, and recreate a new KTX file with raw texture data. Even so, it would not render with KTX viewers other than macOS's Finder/Quickview and Preview. So I tried a few different approaches to get to the data.

Attempt 1 - Rendering & Export

Textures are different from 2D images, so there is no direct conversion from a texture to an image format. From available research, it seemed like the easiest way to do this would be to use OpenGL to render the texture data (extracted from the KTX file), then use OpenGL to save a 2D image of the rendered result. There is some sample code available on the internet too, but to get it to work, one would need to know how to use OpenGL to render textures, a learning curve that was too steep for me.

After spending several hours trying to get this to work in Python, I ultimately gave up. Python is not the platform where major OpenGL development takes place, so there is little to no support, and the libraries are platform dependent. I barely got one of the libraries to install correctly on Linux, and every step of the way I got more errors than I wanted to debug, so I threw in the towel.

Attempt 2 - Convert texture data to RAW image data

In the KTX file header, the glInternalFormat field is 0x93B0 for all iOS-produced KTX files (as seen in the screenshot above). This value is the enumeration for COMPRESSED_RGBA_ASTC_4x4. So now we know the format is ASTC (Adaptive Scalable Texture Compression), a lossy compressed format for storing texture data, here with a block size of 4x4 pixels. That simplifies our task to finding a way to convert ASTC data to raw image data. A bit of searching led me to the Python library astc_decomp, which does precisely that. So what I needed now was to put the pieces together as follows:

  1. Read KTX file and parse format to get LZFSE compressed data, and parameters of Width and Height
  2. Decompress LZFSE to get ASTC data
  3. Convert ASTC to RAW image stream
  4. Save RAW image as PNG using PIL library

Combining this together, we are able to create a python script that can convert KTX files to PNG files. Get it here:

There is also a compiled Windows executable there if you need to do this on Windows without Python. Alex Brignoni was helpful in sending me samples of KTX files from multiple images to work with. The code also works with files that are not really KTX, i.e., they have the .ktx extension but the header is 'AAPL'. The format is similar, however, and my code will parse them too. If you come across a file that does not work, send it to me and I can take a look.

A point to note is that not all KTX files use the COMPRESSED_RGBA_ASTC_4x4 format, only the iOS created ones do. So you may come across many KTX files deployed or shipped with apps that can't be parsed with this tool, as it only handles ASTC 4x4 format.


Tuesday, June 9, 2020

Screentime Notifications in Catalina (10.15)

If you routinely perform mac forensics, you've probably done a few macOS Catalina (10.15) examinations already. And if you are the kind who verifies your data, you may have noticed that, for ScreenTime notifications, the database doesn't show the same strings you see in the actual displayed notification, and neither do several forensic tools.

Let's explore why.

To start with, let's review the format of the Notifications database. For macOS High Sierra (10.) and above, it is located at:


where the <xx>/<yyyyyy> portion represents what might appear like random strings, but they are not random. This folder path represents the DARWIN_USER_DIR for a specific user. For more details on this read my old post here.

Inside the database, the record table holds the actual notification data (title, sub-title, body) and date of notification among other fields. A simple database query can get the useful data.

SELECT
  (SELECT identifier FROM app WHERE app.app_id=record.app_id) as app,
  uuid, data, presented, delivered_date
FROM record

The actual notification data is within a plist stored in the column data. Inside this plist, you can easily navigate to the items titl, subt and body to get the title, sub-title and body. However for screentime notifications, the data looks different. Instead of individual strings in these values, they are lists.

Figure 2 - Embedded plist for screentime notification
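A minimal Python sketch that runs this query and pulls the titl, subt and body values out of each data plist. Two assumptions: delivered_date is treated as Mac absolute time (seconds since 2001-01-01 UTC), and because the exact nesting inside the plist isn't spelled out here, the keys are found with a depth-first search rather than a fixed path. For ScreenTime records these values will be lists, as described above, rather than strings.

```python
import plistlib
import sqlite3
from datetime import datetime, timedelta, timezone

MAC_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)

QUERY = """
SELECT
  (SELECT identifier FROM app WHERE app.app_id = record.app_id) AS app,
  uuid, data, presented, delivered_date
FROM record
"""


def find_keys(obj, wanted):
    """Depth-first search of a decoded plist for the first value of each wanted key."""
    found = {}
    stack = [obj]
    while stack:
        item = stack.pop()
        if isinstance(item, dict):
            for key, value in item.items():
                if key in wanted and key not in found:
                    found[key] = value
                stack.append(value)
        elif isinstance(item, list):
            stack.extend(item)
    return found


def read_notifications(conn: sqlite3.Connection):
    """Yield one dict per notification record."""
    for app, uuid, data, presented, delivered in conn.execute(QUERY):
        fields = find_keys(plistlib.loads(bytes(data)), {"titl", "subt", "body"}) if data else {}
        # Assumption: delivered_date is seconds since 2001-01-01 UTC
        when = MAC_EPOCH + timedelta(seconds=delivered) if delivered is not None else None
        yield {"app": app, "uuid": uuid, "presented": presented, "delivered": when, **fields}
```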
Screentime uses format strings and a list of data, which need to be put back together. This is similar to how Event logs in Windows or Unified Logging in macOS work. The format strings are located at the paths shown below (for English) and are available in other languages too:


These files are plists which consist of a single dictionary each. So WeeklyReportNotificationNegativeDeltaBody, seen in the plist above, resolves to the message:
"Your screen time was down %@ last week, for an average of %@ a day." Each %@ is replaced with the data provided (15% and 6 hours, 24 minutes), giving:
"Your screen time was down 15% last week, for an average of 6 hours, 24 minutes a day."

Figure 3 - Snippet of Localizable.strings plist
Similarly, WeeklyReportNotificationTitle becomes "Weekly Report Available". So now we are able to reconstruct the complete original message.
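Putting the pieces back together is a simple substitution. A minimal sketch, assuming the Localizable.strings plist has already been loaded into a dict (for example with plistlib.load) and that the format strings only use plain %@ placeholders; positional forms such as %1$@ are not handled.

```python
import re


def format_localized(template: str, args) -> str:
    """Replace each %@ in an Apple-style format string with the next argument."""
    values = iter(args)
    # A function replacement keeps any '%' or '\' inside the values literal;
    # if the arguments run out, the remaining %@ placeholders are left as-is.
    return re.sub(r"%@", lambda _match: str(next(values, "%@")), template)


def resolve(strings: dict, key: str, args=()) -> str:
    """Look up key in a loaded Localizable.strings dict (falls back to the key)."""
    return format_localized(strings.get(key, key), args)


# Loading the strings (path is a placeholder):
# import plistlib
# with open("Localizable.strings", "rb") as f:
#     strings = plistlib.load(f)
```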

mac_apt's NOTIFICATIONS plugin has now been updated with this functionality.

Monday, March 30, 2020

Parsing unknown protobufs with python

Protocol Buffers are quite popular; more and more apps and system files on both iOS and Android store data in this format. If you aren't familiar with Protocol Buffers, read this post. There I use Google's protoc.exe utility, as does everyone else who needs to view this data when the corresponding .proto file is not available.

This is great! But the raw view/output has one big disadvantage. While this approach (--decode_raw) works fine if you just want to see the text strings stored in your data, it does not always provide the correct conversions for all the raw data types! 

According to Google, when a message (data) is encoded, only 6 different wire types are allowed. Here are the allowed types (below).

Figure - Allowed wire types from

Unless you have the .proto file, you really don't know what the original data type may be. Even protoc.exe just makes a best guess. For instance, all binary blobs are also converted to strings with protoc as both the string and bytes type use the Length-delimited wiretype. There is also no way to tell if a number is to be interpreted as signed or unsigned, because they both use the same underlying type (varint)!
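The ambiguity is easy to demonstrate by reinterpreting raw numbers from the outputs below. A sketch of three common interpretations of the same wire value: int64 (two's complement), sint64 (ZigZag encoding, which protobuf uses for the sint types) and double (the 8 bytes of a 64-bit fixed field). Function names are illustrative.

```python
import struct


def as_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit varint as a two's-complement int64."""
    value &= 0xFFFFFFFFFFFFFFFF
    return value - (1 << 64) if value >= (1 << 63) else value


def zigzag_decode(value: int) -> int:
    """Decode a sint32/sint64 varint: 0, 1, 2, 3 -> 0, -1, 1, -2."""
    return (value >> 1) ^ -(value & 1)


def fixed64_as_double(value: int) -> float:
    """Reinterpret a 64-bit fixed wire value as a little-endian IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<Q", value))[0]
```

For example, the id64 field that protoc prints as 18446744073709551594 is -22 once read as int64, and the 64-bit value 0x401239f559b3d07d is the double 4.5566.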

Now to raw-decode a protobuf in python, there are a couple of libraries I have seen so far that do a decent job. I will list out the libraries, then demonstrate parsing with them, and compare.

This seems to be more than 4 years old and is no longer maintained. It is also written in Python 2, though there is a Python 3 port somewhere. It makes several assumptions regarding data types and attempts to produce output similar to protoc.

This is a more mature library that provides much more functionality. It makes relatively few assumptions about data types. In addition to parsing the protobuf and returning a dictionary object, it also provides a type definition dictionary for the parsed data.
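To see what both libraries are doing underneath, here is a minimal raw decoder written directly against the wire format using only the standard library. It is a sketch, not a replacement for either library: it handles wire types 0, 1, 2 and 5, does not support the deprecated group types (3 and 4), and returns fixed-width values as unsigned integers, leaving the type decision to the caller. The function names are hypothetical.

```python
def read_varint(buf: bytes, pos: int):
    """Decode one base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:
            raise ValueError("varint longer than 10 bytes")


def decode_raw(buf: bytes):
    """Split one protobuf message into (field_number, wire_type, value) tuples.

    Varints come back as unsigned ints, length-delimited fields as raw bytes
    (string, bytes or nested message; the wire format cannot tell which),
    and fixed64/fixed32 fields as unsigned little-endian ints.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value = bytes(buf[pos:pos + length])
            if len(value) != length:
                raise ValueError("truncated length-delimited field")
            pos += length
        elif wire_type in (1, 5):  # 64-bit / 32-bit fixed width
            size = 8 if wire_type == 1 else 4
            chunk = buf[pos:pos + size]
            if len(chunk) != size:
                raise ValueError("truncated fixed-width field")
            value = int.from_bytes(chunk, "little")
            pos += size
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields
```

Nested messages are handled by calling decode_raw again on a length-delimited value; in tester_pb, for instance, the Person fields sit inside outer field 1, as the protoc output shows.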

To demonstrate what I am talking about, I created a demo protocol buffer file called addressbook.proto and defined a protobuf message as shown below.

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  required int64 id64 = 4;
  required uint64 uid64 = 5;
  optional double double = 6;
  optional bytes bytes = 7;
}

I then compiled it using protoc.exe.

protoc --python_out=. addressbook.proto

Next, I used a Python script that imports the compiled Python protobuf module to generate a binary protobuf file called tester_pb. The data contained in it is shown below.

Actual data
  name: "John Doe"
  id: 1234
  email: "[email protected]"
  id64: -22
  uid64: 13360317589766481554
  double: 4.5566
  bytes: b'\x00\x124V'

Protoc output (protoc --decode_raw < ..\tester_pb)
1 {
  1: "John Doe"
  2: 1234
  3: "[email protected]"
  4: 18446744073709551594
  5: 13360317589766481554
  6: 0x401239f559b3d07d
  7: "\000\0224V"
}

protobuf-decoder output
{
  '01:00:string': 'John Doe',
  '02:01:Varint': 1234,
  '03:02:string': '[email protected]',
  '04:03:Varint': 18446744073709551594,
  '05:04:Varint': 13360317589766481554,
  '06:05:64-bit': 4.5566,
  '07:06:string': '\x00\x124V'
}

blackboxprotobuf output (includes data dictionary and types dictionary)
{
  '1': b'John Doe',
  '2': 1234,
  '3': b'[email protected]',
  '4': -22,
  '5': -5086426483943070062,
  '6': 4616816293942907005,
  '7': b'\x00\x124V'
}
{'1': {'type': 'message', 'message_typedef': {
  '1': {'type': 'bytes', 'name': ''},
  '2': {'type': 'int', 'name': ''},
  '3': {'type': 'bytes', 'name': ''},
  '4': {'type': 'int', 'name': ''},
  '5': {'type': 'int', 'name': ''},
  '6': {'type': 'fixed64', 'name': ''},
  '7': {'type': 'bytes', 'name': ''}
 }, 'name': ''}}

As seen in the outputs above, each decoder makes some default assumptions about the data types encountered. The items highlighted in red are the ones that are interpreted using incorrect types. I like blackboxprotobuf because it lets you specify the real type via a types dictionary similar to the one it outputs. So once we have figured out the correct types, we can pass this into the decode_message() function to get the correct output. See code snippet below.

import blackboxprotobuf

with open('tester_pb', 'rb') as f:
    pb = f.read()

# Real field types, worked out by hand, override blackboxprotobuf's defaults.
# Note: 'str' is the type I added in my updated version of the library (see below).
types = {'1': {'type': 'message', 'message_typedef': {
             '1': {'type': 'str', 'name': 'name'},
             '2': {'type': 'int', 'name': 'id'},
             '3': {'type': 'str', 'name': 'email'},
             '4': {'type': 'int', 'name': 'id64'},
             '5': {'type': 'uint', 'name': 'uid64'},
             '6': {'type': 'double', 'name': 'double'},
             '7': {'type': 'bytes', 'name': 'bytes'}
           }, 'name': ''}}
values, _ = blackboxprotobuf.decode_message(pb, types)
print(values)

That produces the desired output -

{
  'name': 'John Doe',
  'id': 1234,
  'email': '[email protected]',
  'id64': -22,
  'uid64': 13360317589766481554,
  'double': 4.5566,
  'bytes': b'\x00\x124V'
}

In summary, I recommend the blackboxprotobuf library; note, however, that it is not exactly plug and play. Since it is not on PyPI, you have to use it from source. Also, to use it with Python 3, I had to make one small tweak. I also added the 'str' type decode, as that was not available. Since then I have tested it with numerous protobuf streams and it has not failed me! For my updated version of this library, get it here.

Update (8/2020): I made more bug fixes and published it to PyPI, so you can now install it via pip.
  pip install blackboxprotobuf

Saturday, March 28, 2020

Google Search & Personal Assistant data on android

The Google app, previously known as Google Now, is installed by default on most phones. From the app's description -

The Google app keeps you in the know about things that matter to you. Find quick answers, explore your interests, and stay up to date with Discover. The more you use the Google app, the better it gets.

Search and browse:
- Nearby shops and restaurants
- Live sports scores and schedules
- Movies times, casts, and reviews
- Videos and images
- News, stock information, and more
- Anything you’d find on the web

It is that ubiquitous bar/widget, sometimes called the Google Assistant Search Bar or just the Google Search widget, found on the phone's home screen.

Figure 1 - Google Search / Personal Assistant Bar 

The internal package goes by the name Its artifacts are found at /data/data/

There are many files and folders here, but the most interesting data is in the sub-folder files/recently.

Your recent searches, along with some full-screen screenshots of search results, are stored here. The screenshots are saved with a .jpg extension but are actually in .webp format. The unique number in each name is referenced by the data in the protobuf file (whose file name is the email address of the logged-in user account). If you are not logged in, nothing is populated in this folder. See the screenshots below.

Figure 2 - Folder 'recently' has no entries when no account was logged on.

Figure 3 - Folder 'recently' has files when searches were performed after logging in

The protobuf file ([email protected] in this case) when decoded has entries that look like this (see below) for a typical search. If you aren't familiar with protobuf decoding, read this.

1 {
  1: 15485946382791341007
  3: 0
  4: 1585414188066
  5: "dolphin"
  8 {
    1: "web"
    2: ""
  }
  9: 10449902870035666886
  17: 1585413397978
}

In the protobuf data (decoded using protoc.exe), as seen above, we can easily distinguish the relevant fields:

Item  Description
1     session id
4     timestamp1 (unix epoch, in milliseconds)
5     search query
8     dictionary
        1 = type of search (web, video, ...)
        2 = search engine
9     screenshot-id (needs conversion from unsigned to signed int)
17    timestamp2 (unix epoch, in milliseconds)
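These fields can be turned into readable values with a few lines of Python. A sketch under stated assumptions: it takes one already-decoded record as a dict keyed by field number (a hypothetical shape, e.g. built from a raw decoder's output), treats fields 4 and 17 as milliseconds since the Unix epoch (they are 13-digit values), and reinterprets field 9 as a signed 64-bit integer, per the unsigned-to-signed note in the table.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_signed64(value: int) -> int:
    """Reinterpret an unsigned 64-bit varint as a signed int64."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >= (1 << 63) else value


def ms_to_utc(ms: int) -> datetime:
    """Milliseconds since the Unix epoch -> aware UTC datetime (exact integer math)."""
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def summarize_recent(entry: dict) -> dict:
    """Map one decoded record (field number -> value) to named fields."""
    return {
        "session_id": entry.get(1),
        "searched_at": ms_to_utc(entry[4]) if 4 in entry else None,
        "query": entry.get(5),
        "screenshot_id": to_signed64(entry[9]) if 9 in entry else None,
        "timestamp2": ms_to_utc(entry[17]) if 17 in entry else None,
    }
```

Applied to the "dolphin" record above, searched_at works out to 2020-03-28 16:49:48.066 UTC.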

Here is the corresponding screenshot saved in the same folder -
Figure 4 - Screenshot of search for "dolphin"

If you clicked on a recent news story in the app, the protobuf entry looks like this (below):

1 {
  1: 9016892896339717414
  3: 1
  4: 1572444614834
  5: ""
  7 {
    1: ""
    2: ""
    3: "Photos of AirPods Pro arriving in stores around the world - 9to5Mac"
  }
  9: 9016892896339717414
  10: 9
  17: 1572444614834
}

Figure 5 - Screenshot for news article clicked from link in google app

Last week, I added a plugin for ALEAPP to read these recent search artifacts. That isn't all; there is actually more data to be read here.

The search widget can be used to make any kind of query, which may then be forwarded to the web browser or Android Auto or the Email or Messaging apps depending on what was queried for. This makes for an interesting artifact.

From my test data, all searches are stored in the app_session folder as protobuf files having the extension .binarypb. See screenshot below.

Figure 6 - .binarypb files
Each of these files is a protobuf that stores a lot of data about the searches, including searches from Android Auto. Josh Hickman did some excellent research on Android Auto and addressed some of this briefly in his talk here. No parser is available to read this, as the format of the data contained in the protobufs is unknown. I've attempted to reverse-engineer enough of it to get the useful bits of information out, such as the search queries. Some of them also contain mp3 recordings of the replies from Google Assistant. Parsing for these is being added to ALEAPP.

The format here is a bit too much to write about. Below is the raw protobuf structure (sans the binary blobs, replaced by ...). The search term here was "tom and jerry".

{
  1: 0x00000053b0c63c1b
  2: 0x11f0299e
  3: "search"
  132242267: ""
  132264001 {
    1: "..."
    2: 0x00000000
    3: 0
    4: 0x00000000000e75fe
  }
  132269388 {
    2: 0x0000000000000040
    3 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
  }
  132269847 {
    1 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
    2 [
      0: "...",
      1: "... tom and jerry ..."
      2: "..."
      3: 1
    ]
  }
  146514374 {
    1: "and.gsa.launcher.allapps.appssearch"
  }
  206022552 {
    1: 0
  }
}

After studying this and several other samples, here are the important pieces in the parsed protobuf dictionary:

1 = session id (same number as in filename)
3 = type of query (search, car_assistant, opa)
      car_assistant = Android Auto
      opa = personal assistant
1 = mp3 recording of response

1 = dictionary
      2 = last query

2 = List of session queries (in blobs)

For more details, refer to the module in ALEAPP. Below is a screenshot of the parsed data.
Figure 7 - ALEAPP output showing Google App / Personal assistant queries