Saturday, January 9, 2021

Gboard has some interesting data..

Gboard - the Google Keyboard, is the default keyboard on Pixel devices, and overall has been installed over a billion times according to the Play Store.

Although not the default on most non-Google brands, it is a popular app installed by foreign language users because of its good support and convenience of use particularly with dozens of Asian and Indian languages.

As a keyboard app, it monitors and analyzes your keystrokes, offering suggestions and corrections for spelling and grammar, sentence completion and even emoji suggestions. 

Now for the interesting part. Since the last few versions, it also retains a lot of data (ie, user keystrokes!) in its cache. This is at least seen from the version from Jan 2020 (v 8.3.x). From a DFIR perspective, that is GOLD. For a forensic examiner, this can possibly show you data that was typed by the user on an app that is now deleted, or show messages typed that were then deleted, or messages from apps that have the disappearing message feature turned on! Or data entered into fields on web pages/online apps (that wouldn't be stored locally at all). Also for some apps that don't track when a particular item was created/modified, this could be useful.

Note - The Signal app wasn't specifically tested to see if data from that app is retained, but based on what we can see here, it seems likely those messages would end up here too. All testing was on a Pixel 3 running latest Android 11 using the default keyboard, and default settings. This was also verified on other earlier taken images. Josh Hickman's Android 10 Pixel 3 image was also used, and Josh was able to verify that Telegram and WhatsApp sent messages were present here. The specific versions of Gboard databases studied were:

  • (on Android 10)
  • (on Android 10)
  • (on Android 11)


Gboard's app data (sandbox) folder is located here:


Here you might see a number of databases that start with trainingcache*. These are the files that contain the caches.

Figure 1 - Contents of Gboard's databases folder (v

In different versions of the app, the database formats and names have changed a bit. Of these, useful data can be found in trainingcache2.db, trainingcache3.db and trainingcachev2.db. Let's examine some of them now.

trainingcache2.db (v

The table training_input_events_table contains information about the application in focus, its field name (where input was sent), the timestamp of event and a protobuf BLOB stored in _payload field, as shown in screenshot below.

Figure 2 - training_input_events_table (not all columns shown)

The highlighted entry above is from an app that was since deleted. The _payload BLOB is decoded in screenshot below, highlighting the text typed by the user in the Email input field. The protobuf has also has all of the data included in the other columns in the table.

Figure 3 - Decoded Protobuf from _payload column

In most instances however, the protobuf looks like this - see screenshot below, where input needs to be put back together as shown.. Here you can see the words the user typed as well as suggestions offered by the app. Suggestions can be for spelling, grammar, or contact names, or something else.
Figure 4 - Decoded protobuf - reconstructing user input

Above, you can see the words typed and suggestions offered. On an Android device, the suggestions appear as shown below while typing.

Figure 5 - Android keyboard highlighting suggested words

trainingcache3.db (v

In version 8.x, this same database is named trainingcache2.db, and follows the same exact format. The table s_table looks similar to the training_input_events_table seen earlier. However, the _payload field does not store the keystokes here.

Figure 6 - s_table

Figure 7 - _payload protobuf decoded from s_table

Keystroke data is stored in the table tf_table. Here, most entries are a single key press, and to read this, it again needs to be put back together as shown below.

Figure 8 - tf_table entries

All keystrokes from the same session have the same f1 value (a timestamp like field but not used as a timestamp). The order of the keys pressed is stored in f4. Assuming they are all in order, we can run a short query to concatenate the f3 column values for easy reading (shown below). This isn't perfect, as group_concat() doesn't guarantee order of concatenation, but it seems to work for now!

Figure 9 - Reading keystroke sessions from tf_table

We can combine (join) this data with the one from s_table to recreate the same data as we got from training_input_events_table earlier. 

Figure 10 - joined tables

In the screenshot shown above, you can even see data being typed into a google doc, not saved locally. Only a snippet is shown above, but if you want to see the full parsed data, get Josh's Android image(s), and the latest version of ALEAPP (code), which now parses this out. Below is a preview (from a different image my students might recognize).

Figure 11 - ALEAPP output showing trainingcache parsed output

Cached keystroke data can also be seen and reconstructed from trainingcachev2.db, whose format is a bit different (not discussed here). Nothing of significance was found in trainingcache4 or the other databases. 


As expected, keystrokes from password fields are not stored or tracked.

In data reconstructed from tf_table, you can see all the spelling mistakes a user made while typing! Any corrections made in the middle of a word/sentence will be seen at the end (because we are getting the raw keystokes in order of keys pressed). Hence it might be difficult to read some input. Also, if a user types something into a field, then deletes a word(s), and retypes, you won't see the final edited (clean) version, as backspaces (delete) are not tracked. You can see some of this in the output above (figure 9).

The caches are periodically deleted (and likely size limited too), and so you shouldn't expect to find all user typed data here. 

Sunday, January 3, 2021

iOS Application Groups & Shared data


Tracking down an iOS application's Data folder, aka, SandboxPath in iOS is fairly easy. One simply needs to look at the applicationState.db sqlite database located under /private/var/mobile/Library/FrontBoard/ This is well known. 

However locating the sandbox folder for its AppGroups (and Extensions) is not so straight-forward. The suggested method by Scott Vance here, and recommended by few others too is to look for the file under each of the UUID folders:

  • /private/var/containers/Shared/SystemGroup/UUID/
  • /private/var/mobile/Containers/Shared/AppGroup/UUID/
  • /private/var/mobile/Containers/Data/InternalDaemon/UUID/
  • /private/var/mobile/Containers/Data/PluginKitPlugin/UUID/

As noted by Scott, the iLEAPP tool does this too, reading all the plists and listing out the path and its group name. For manual analysis, this works out great, as you can visually make out the app name from the group name. For example, the Notes app has bundle_id and one of its shared groups (where the actual Notes db is stored!) has the id

The Problem

For automated analysis, this approach does not work, as each app follows its own convention on naming for ids. A program cannot know that corresponds to Hence we search for something with a more direct reference connecting Shared Containers to their Apps. Before we proceed further, its important to understand the relationships between extensions, apps and shared containers. The diagram below does a good job of summarizing this. The shared containers are identified by AppGroups.

Figure 1 - iOS App, Extension, container relationships - Source:

The Solution

Fortunately, there is a database that tracks container information on iOS. It is located at /private/var/root/Library/MobileContainerManager/containers.sqlite3

It precisely lists all Apps, their extensions, AppGroups and Entitlements. As far as I can tell, this is the only place where this information is stored (apart from caches and logs). It does not have information about UUIDs. This database is listed in the SANS smartphone forensics poster, but I couldn't find any details on it elsewhere. 

The database structure is simple with just 3 main tables (and an sqlite_sequence one). 

Figure 2 - containers.sqlite database tables

The child_bundles table lists extensions and their owner Apps. In figure below, you can see the extensions for the app.

Figure 3 - child_bundles table, filtered on 'notes'

Or one could write a small query to list all apps with their extension names like shown below.

Figure 4 - App & Extensions - query and output

Information about AppGroups is found in the data field of the code_signing_data table as a BLOB, which stores a binary plist.

Figure 5 - Plist (for - cs_info_id 456) from ''

The Entitlements dictionary has a lot of information in it. If this App creates a shared AppGroup, then it will show up under There may also be groups under

Figure 6 - AppGroup information in Entitlements section (in plist)

So from the above data, we know that the Notes App has 5 extensions and 2 AppGroups, and we have the exact string names(aka ids) too - and . Correlating this data with information we found from files (from each UUID folder earlier), we can programmatically search and link the two as being part of the same App, based on the container id (AppGroup name).

Figure 7 - AppGroup/UUID folder showing plist's content and Container owner id

This methodology is implemented in the APPS plugin for ios_apt, which now lists every App, it's AppGroups, SystemGroups, Extensions, and all the relationships. So you don't have to do any of it manually now. Enjoy!

Figure 8 - Apps Table from ios_apt output (not all columns are shown here)

Figure 9 - AppGroupInfo Table from ios_apt output