Friday, June 15, 2012

Windows 7 Thumbcache hash algorithm

What we know
Windows 7 (and vista) store thumbnails in a central database known as the thumbcache in the files thumbcache_32.db, thumbcache_96.db, thumbcache_256.db, thumbcache_1024.db, thumbcache_sr.db and thumbcache_idx.db. The format(s) for these files has been reverse engineered well enough to be able to read and extract the thumbnails.

What is yet unknown is the hashing algorithm used to generate the ThumbnailCacheId. I did some research of my own to fill this gap.

To generate the ThumbnailCacheId windows uses the volume guid (that the file resides on), the fileId (for NTFS volumes), the extension of the file (.xxx) and the file last modified time (as a DOS GMT date). These are ‘blended’ and mangled using a hash function one by one. I believe the easiest way to describe this would be to simply show the code. Feel free to copy this code into your applications.

UInt64 CalculateHashKey (UInt64 seed, byte[] buffer, uint buffer_length)
  int count = 0;
  if (buffer_length) {
      seed ^= ( (seed >> 2) + (2080 * seed) + buffer[count++] );
    } while (count < buffer_length);
  return seed;

UInt64 GetThumbnailCacheId (byte[] VolGUID, byte[] FileID, byte[] FileExtension, byte[] FileModTime)
  UInt64 hash = CalculateHashKey (0x95E729BA2C37FD21, VolGUID, 16);
  hash = CalculateHashKey (hash, FileID, 8);
  hash = CalculateHashKey (hash, FileExtension, FileExtension.GetLength() * 2);
  hash = CalculateHashKey (hash, FileModTime, 4);
  return hash;

Volume GUID can be found in the registry SYSTEM hive under ‘MountedDevices’. On a live machine, this is HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices. If for example, your guid is ‘\\?\Volume\{ 234cf70e-a70c-11de-a48c-806e6f6e6963}’, then your buffer will have the value “0EF74C230CA7DE11A48C806E6f6E6963”.

FileID (or File Reference number) can be obtained from the MFT. Note: The FileID shown to you by tools such as Encase is simply the MFT Record number (without MFT Sequence number bytes). Read more about it here.

The extension buffer is Unicode text starting with the dot(.) as “2E006A0070006700" for “.jpg”. This is case sensitive.

For the DOS time, you can easily convert the file’s modified date from a FILETIME 64 bit value to the GMT DOS timestamp of 4 bytes.

Because this is a hashing scheme requiring so many inputs, it is not possible to reverse a ThumbnailCacheId back to a full path, which is what investigators would really love. Still, now you know how to validate your thumbnails against their respective existing files.