Pages

Saturday, March 28, 2020

Google Search & Personal Assistant data on android

The Google app, previously known as Google Now, is installed by default on most phones. From the app's description -

The Google app keeps you in the know about things that matter to you. Find quick answers, explore your interests, and stay up to date with Discover. The more you use the Google app, the better it gets.

Search and browse:
- Nearby shops and restaurants
- Live sports scores and schedules
- Movies times, casts, and reviews
- Videos and images
- News, stock information, and more
- Anything you’d find on the web


It is that ubiquitous bar/widget sometimes called the Google Assistant Search Bar or just google Search widget found on the phone's home screen.

Figure 1 - Google Search / Personal Assistant Bar 

The internal package goes by the name com.google.android.googlequicksearchbox. It's artifacts are found at /data/data/com.google.android.googlequicksearchbox/

There are many files and folders here, but the most interesting data is the sub-folder files/recently

Your recent searches along with some full screen screenshots of search results are stored here. Screenshots (saved as jpg) are in .webp format. The unique number in the name is referenced by the data in the protobuf file (file name is the email address of the logged in user account). If you are not logged in, nothing is populated in this folder. See screenshots below.

 
Figure 2 - Folder 'recently' has no entries when no account was logged on.

Figure 3 - Folder 'recently' has files when searches were performed after logging in


The protobuf file ([email protected] in this case) when decoded has entries that look like this (see below) for a typical search. If you aren't familiar with protobuf decoding, read this.

1 {
  1: 15485946382791341007
  3: 0
  4: 1585414188066
  5: "dolphin"
  8 {
    1: "web"
    2: "google.com"
  }
  9: 10449902870035666886
  17: 1585413397978
}

In the protobuf data (decoded using protoc.exe), as seen above, we can easily distinguish the relevant fields:

Item Description
1 session id
4 timestamp1 (unix epoch)
5 search query
8 dictionary
1 = type of search (web, video, ..)
2 = search engine
9 screenshot-id (needs conversion to int from uint)
17 timestamp2 (unix epoch)


Here is the corresponding screenshot saved in the same folder -
Figure 4 - Screenshot of search for"dolphin"

If you clicked on a recent news story in the app, the protobuf entry looks like this (below):

1 {
  1: 9016892896339717414
  3: 1
  4: 1572444614834
  5: ""
  7 {
    1: "https://9to5mac.com/2019/10/30/photos-of-airpods-pro/"
    2: "9to5mac.com"
    3: "Photos of AirPods Pro arriving in stores around the world - 9to5Mac"
  }
  9: 9016892896339717414
  10: 9
  17: 1572444614834
}
Figure 5 - Screenshot for news article clicked from link in google app


Last week, I added a plugin for ALEAPP to read these recent search artifacts. This isn't all, there is actually more data to be read here.

The search widget can be used to make any kind of query, which may then be forwarded to the web browser or Android Auto or the Email or Messaging apps depending on what was queried for. This makes for an interesting artifact.

From my test data, all searches are stored in the app_session folder as protobuf files having the extension .binarypb. See screenshot below.

Figure 6 - .binarypb files
Each of these files is a protobuf that stores a lot of data about the searches. This includes searches from Android Auto too. Josh Hickman did some excellent research on Android Auto and addressed some of this briefly in his talk here. A parser is not available to read this as the format of the data contained in the protobufs is unknown. I've attempted to reverse-engineer parts of it enough to get the useful bits of information out, such as the search queries. There are also mp3 recordings of the replies from google assistant stored in some of them. These are being added to ALEAPP to parse.

The format here is a bit too much to write about. Below is the raw protobuf structure (sans the binary blobs, replaced by ...). The search term here was "tom and jerry".

{
  1: 0x00000053b0c63c1b
  2: 0x11f0299e
  3: "search"
  132242267: ""
  132264001 {
    1: "..."
    2: 0x00000000
    3: 0
    4: 0x00000000000e75fe
  }
  132269388 {
    2: 0x0000000000000040
    3 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
  }
  132269847 {
    1 {
      1: "..."
      2: ""
      3: "and.gsa.launcher.allapps.appssearch"
    }
    2 [
      0: "...",
      1: "... tom and jerry ..."
      2: "..."
      3: 1
    ]
  }
  146514374 {
    1: "and.gsa.launcher.allapps.appssearch"
  }
  206022552 {
    1: 0
  }
}

After studying this and several other samples, here are the important pieces in the parsed protobuf dictionary:


ItemDescription
1session id (same number as in filename)
3type of query (search, car_assistant, opa)
car_assistant = Android Auto
opa = personal assistant
132269388dictionary
1 = mp3 recording of response
132269847dictionary

1 = dictionary
ItemDescription
2last query

2 = List of session queries (in blobs)



For more details, refer the module googleQuickSearchbox.py in ALEAPP. Below is a screenshot of the parsed out data.
Figure 7 - ALEAPP output showing Google App / Personal assistant queries

No comments:

Post a Comment