Friday, June 15, 2012

Windows 7 Thumbcache hash algorithm

What we know
Windows 7 (and Vista) store thumbnails in a central database known as the thumbcache, in the files thumbcache_32.db, thumbcache_96.db, thumbcache_256.db, thumbcache_1024.db, thumbcache_sr.db and thumbcache_idx.db. The formats of these files have been reverse engineered well enough to read and extract the thumbnails.

What is not yet known is the hashing algorithm used to generate the ThumbnailCacheId. I did some research of my own to fill this gap.

To generate the ThumbnailCacheId, Windows uses the volume GUID (of the volume the file resides on), the FileID (for NTFS volumes), the extension of the file (.xxx) and the file's last modified time (as a DOS GMT date). These are ‘blended’ one by one into a running hash, each passing through the same hash function. I believe the easiest way to describe this is simply to show the code. Feel free to copy this code into your applications.

UInt64 CalculateHashKey (UInt64 seed, byte[] buffer, uint buffer_length)
{
  int count = 0;
  if (buffer_length > 0) {
    do {
      // Mix one input byte into the running 64-bit hash
      seed ^= ( (seed >> 2) + (2080 * seed) + buffer[count++] );
    } while (count < buffer_length);
  }
  return seed;
}

UInt64 GetThumbnailCacheId (byte[] VolGUID, byte[] FileID, byte[] FileExtension, byte[] FileModTime)
{
  // Each input is hashed in turn, seeded with the result of the previous step
  UInt64 hash = CalculateHashKey (0x95E729BA2C37FD21, VolGUID, 16);
  hash = CalculateHashKey (hash, FileID, 8);
  // Buffer length in bytes: two per UTF-16 character of the extension
  hash = CalculateHashKey (hash, FileExtension, FileExtension.GetLength() * 2);
  hash = CalculateHashKey (hash, FileModTime, 4);
  return hash;
}
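
For readers who prefer a scripting language, here is a minimal Python sketch of the same two functions. The names calculate_hash_key and get_thumbnail_cache_id are my own, and the explicit 64-bit mask is an assumption needed because Python integers do not wrap around the way a UInt64 does. It has not been checked against a real thumbcache entry here.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF          # emulate UInt64 wrap-around
SEED = 0x95E729BA2C37FD21            # initial seed from the listing above


def calculate_hash_key(seed: int, buffer: bytes) -> int:
    """Mix every byte of buffer into the running 64-bit hash."""
    for byte in buffer:
        seed ^= ((seed >> 2) + 2080 * seed + byte) & MASK64
    return seed


def get_thumbnail_cache_id(vol_guid: bytes, file_id: bytes,
                           file_extension: bytes, file_mod_time: bytes) -> int:
    """Chain the four inputs in the order used by the listing above.

    file_extension is expected to be the UTF-16LE bytes of the extension,
    so its byte length already equals (character count * 2).
    """
    if len(vol_guid) != 16 or len(file_id) != 8 or len(file_mod_time) != 4:
        raise ValueError("expected 16-byte GUID, 8-byte FileID, 4-byte DOS time")
    hash_value = calculate_hash_key(SEED, vol_guid)
    hash_value = calculate_hash_key(hash_value, file_id)
    hash_value = calculate_hash_key(hash_value, file_extension)
    hash_value = calculate_hash_key(hash_value, file_mod_time)
    return hash_value
```

Formatting the result with format(value, "016x") gives a 16-digit hexadecimal string that is convenient for comparison with IDs read from a cache file.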

The volume GUID can be found in the registry SYSTEM hive under ‘MountedDevices’. On a live machine, this is HKEY_LOCAL_MACHINE\SYSTEM\MountedDevices. If, for example, your GUID is ‘\\?\Volume{234cf70e-a70c-11de-a48c-806e6f6e6963}’, then your buffer will have the value “0EF74C230CA7DE11A48C806E6F6E6963”. Note that the first three groups of the GUID are byte-swapped (little-endian) while the last two keep their order.
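
The byte layout of the GUID buffer can be reproduced with Python's standard uuid module, whose bytes_le attribute stores the first three fields little-endian. This is only a sketch; the helper name volume_guid_to_bytes is hypothetical.

```python
import uuid


def volume_guid_to_bytes(volume_name: str) -> bytes:
    """Extract the GUID between the braces and return its 16-byte buffer."""
    start = volume_name.index("{")
    end = volume_name.index("}", start)
    guid_text = volume_name[start + 1:end].strip()
    # bytes_le byte-swaps the first three GUID fields, as Windows stores them
    return uuid.UUID(guid_text).bytes_le


buffer = volume_guid_to_bytes(r"\\?\Volume{234cf70e-a70c-11de-a48c-806e6f6e6963}")
print(buffer.hex().upper())  # 0EF74C230CA7DE11A48C806E6F6E6963
```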

FileID (or File Reference number) can be obtained from the MFT. It is 8 bytes long: the 6-byte MFT record number followed by the 2-byte MFT sequence number. Note: The FileID shown to you by tools such as EnCase is simply the MFT record number (without the MFT sequence number bytes). Read more about it here.
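
If your tool reports the MFT record number and the sequence number separately, the 8-byte FileID buffer can be assembled as in the sketch below. The helper name file_reference_bytes is hypothetical, and the little-endian byte order is an assumption based on how NTFS stores the file reference on disk.

```python
import struct


def file_reference_bytes(mft_record_number: int, mft_sequence_number: int) -> bytes:
    """Combine a 48-bit record number and a 16-bit sequence number into 8 bytes."""
    if not 0 <= mft_record_number < (1 << 48):
        raise ValueError("MFT record number must fit in 48 bits")
    if not 0 <= mft_sequence_number < (1 << 16):
        raise ValueError("MFT sequence number must fit in 16 bits")
    file_reference = (mft_sequence_number << 48) | mft_record_number
    # Assumption: little-endian, matching the on-disk NTFS layout
    return struct.pack("<Q", file_reference)


# Hypothetical values: record 0x35537 with sequence number 0xA5
print(file_reference_bytes(0x35537, 0xA5).hex().upper())  # 375503000000A500
```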

The extension buffer is Unicode (UTF-16, little-endian) text starting with the dot (.), for example “2E006A0070006700” for “.jpg”. This is case sensitive.
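
The extension buffer is easy to build in Python with the utf-16-le codec. A small sketch follows; the helper name extension_bytes is hypothetical.

```python
def extension_bytes(extension: str) -> bytes:
    """Encode an extension such as '.jpg' as UTF-16LE, keeping its case as given."""
    if not extension.startswith("."):
        raise ValueError("extension must start with a dot")
    return extension.encode("utf-16-le")


print(extension_bytes(".jpg").hex().upper())  # 2E006A0070006700
```

Because the bytes are hashed as they are, ".JPG" and ".jpg" yield different buffers, so pass the extension exactly as it appears in the file name.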

For the DOS time, you can easily convert the file’s modified date from a 64-bit FILETIME value to the 4-byte GMT DOS timestamp.
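
The conversion can be sketched in Python as below, using the standard DOS date and time bit layout. Two things here are assumptions: odd seconds are truncated to the 2-second DOS resolution (the Windows API may round differently), and the post does not say in which order the date and time words go into the 4-byte buffer, so the function returns them separately. Verify both against a known cache entry. The name filetime_to_dos is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_dos(filetime: int) -> Tuple[int, int]:
    """Convert a FILETIME (100 ns ticks since 1601, UTC) to (dos_date, dos_time)."""
    moment = FILETIME_EPOCH + timedelta(microseconds=filetime // 10)
    if not 1980 <= moment.year <= 2107:
        raise ValueError("DOS timestamps only cover the years 1980 to 2107")
    # Date word: bits 15-9 year since 1980, bits 8-5 month, bits 4-0 day
    dos_date = ((moment.year - 1980) << 9) | (moment.month << 5) | moment.day
    # Time word: bits 15-11 hour, bits 10-5 minute, bits 4-0 seconds / 2
    # Assumption: odd seconds are truncated rather than rounded up
    dos_time = (moment.hour << 11) | (moment.minute << 5) | (moment.second // 2)
    return dos_date, dos_time
```

Each returned word is 16 bits wide; struct.pack("<H", word) turns it into two little-endian bytes once you have established the word order.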

Because this is a hashing scheme requiring so many inputs, it is not possible to reverse a ThumbnailCacheId back to a full path, which is what investigators would really love. Still, now you know how to validate your thumbnails against their respective existing files.


  1. What did you use to find the hashing algorithm? And how did you find the input data that goes into it?

  2. I used IDA Pro, the best disassembler out there. The functions are located in shell32.DLL, GetThumbnailCacheId() and CalculateHashKey().

  3. Great information Yogesh - thanks.

    The value in the INDX entry is the MFT sequence number; it's also to be found at offset 16 for 2 bytes in a file's MFT record.

    BTW, you have a slight typo in your code at the point you increment the array count.


    1. Thanks for spotting that Simon, the 'v3' crept in from my enscript, it should have been replaced by 'count'. Fixed now.

  4. It may not work in every case, but one way you can match the ThumbCacheID to the original full path is using Windows.edb, the Windows Search database.

    1. Yes indeed, from a practical point of view, that is where to match it. However, the post was not meant for that; it is more academic and intends to explain how these IDs are generated in the first place.

  5. Why do only certain caches get populated? Does it depend on which view you see the images in (extra large, large, medium or small)? Does that affect which cache is populated?

    1. That is correct. On first view of a particular file in Explorer, depending on the view size (small, large, ..), the thumbnail(s) will get created and stored in the appropriate cache file.

  6. This comment has been removed by the author.

  7. I'm using the same algorithm but I have had no luck generating the correct hash yet.
    What are byte arrays here and how is data filled in them?
    Consider this file_id = 46443371157476663
    Is this byte array set correctly?

    Thank you