
Hunting for File Signatures


While learning Python, I've been looking for any reason to use the language in whatever project I could think of. I thought it would be a great learning experience to apply Python in a digital forensics setting by automating the search for file signatures within a hex file. The test hex file stands in for a hard drive image, and the same script could be run against an actual image file.

The script takes two input files. The first is a CSV file listing all the file signature values we are looking for, one per line, in the format: File Signature, File Extension. The second is the hex file that we will search for those signatures.
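A minimal sketch of reading that CSV format, assuming Python 3 and the standard csv module. The function name load_signatures is hypothetical and not part of the original script. Signatures are upper-cased so they compare equal to an uppercase hex dump, and blank rows (such as the one a trailing newline produces) are skipped.

```python
import csv
import io


def load_signatures(lines):
    """Return a list of (signature, extension) pairs from CSV lines.

    Hypothetical helper: assumes each row is "HEX_SIGNATURE,EXTENSION".
    Rows with fewer than two fields (e.g. blank lines) are ignored.
    """
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        signature = row[0].strip().upper()  # match an uppercase hex dump
        extension = row[1].strip()
        pairs.append((signature, extension))
    return pairs


# In-memory sample; with a real file, pass open('signatures.csv', newline='').
sample = io.StringIO("25504446,PDF\n504b0304,ZIP\n\n")
print(load_signatures(sample))
```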

I created a test hex file in HxD by inserting 1024 bytes of random data (Edit -> Insert bytes…) and then inserting a few file signatures at various locations throughout the file. I saved the file as “test”, without a file extension, in the same directory as the Python script and the CSV signature file.
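If HxD isn't at hand, a similar test file can be produced in Python. This is a sketch under assumptions: make_test_image is a hypothetical helper, the offsets and signatures in the usage line are only illustrative, and unlike HxD's Insert bytes it overwrites random bytes at the chosen offsets, so the file stays exactly size bytes long.

```python
import os


def make_test_image(path, size=1024, plants=None):
    """Write `size` random bytes to `path`, overwriting chosen offsets with signatures.

    Hypothetical helper: `plants` maps a byte offset to a hex signature string.
    Bytes are overwritten rather than inserted, so the file size never changes.
    """
    data = bytearray(os.urandom(size))
    for offset, sig_hex in (plants or {}).items():
        sig = bytes.fromhex(sig_hex)
        if offset < 0 or offset + len(sig) > size:
            raise ValueError('signature at offset %d does not fit in the file' % offset)
        data[offset:offset + len(sig)] = sig  # overwrite in place
    with open(path, 'wb') as f:
        f.write(data)
    return bytes(data)


# Illustrative offsets: a PDF header at byte 100 and a GIF89a header at byte 600.
image = make_test_image('test', plants={100: '25504446', 600: '474946383961'})
```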

When the script runs, it opens the hex file in binary mode, converts its contents to hexadecimal, and stores the result as a single uppercase string. It also opens the signature file and builds a list of all the signatures and their extensions by splitting the text on commas and newlines.
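To illustrate the conversion step, here is a small sketch assuming Python 3, where binascii.hexlify returns bytes rather than a string and so needs decoding before it can be compared with string signatures. The name to_hex_dump is hypothetical. Each byte becomes two hex characters (nibbles), which is why a position in the string has to be halved to get a byte offset.

```python
import binascii


def to_hex_dump(content):
    """Convert raw bytes into one uppercase hex string, two characters per byte."""
    # hexlify returns bytes on Python 3; decode so it compares with str signatures
    return binascii.hexlify(content).decode('ascii').upper()


dump = to_hex_dump(b'\x00%PDF')
print(dump)       # 0025504446
print(len(dump))  # 10 characters for 5 bytes
```

On Python 3.5 and later, content.hex().upper() produces the same string without the binascii import.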

I added a rough progress indicator that prints as the script works. It's not pretty, but it gives you an idea of where in the file the found signatures reside. When a signature is found, the script prints the possible file type, its byte offset from the beginning of the file, and the matching signature; the surrounding progress lines show roughly what percentage of the way through the file it sits.
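The offset and percentage arithmetic can be sketched as follows, assuming Python 3 integer division (//); describe_position is a hypothetical helper. Because the search runs over the hex string, a position counts nibbles, so the byte offset is the position divided by two (rounded down) and the percentage is taken relative to the length of the hex string.

```python
def describe_position(nibble_index, dump_len):
    """Map a position in the hex string to (byte offset, whole percent through the file).

    Hypothetical helper: two hex characters make one byte, so halve the index.
    """
    byte_offset = nibble_index // 2
    percent = 100 * nibble_index // dump_len
    return byte_offset, percent


# A 1024-byte file produces a 2048-character hex string.
print(describe_position(1200, 2048))  # (600, 58)
```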




File Signature File (signatures.csv):

25504446,PDF
504B030414,MS_Office
146674797071742020,MOV
186674797033677035,MP4
18667479706D703432,M4V
100005374616E64617264204A6574204442,MS_ACCESS
100080001000101,IMG
6E1EF0,PPT
908100000060500,XLS
3026B2758E66CF11A6D900AA0062CE6C,WMV
38425053,PSD
3C3F786D6C2076657273696F6E3D22312E30223F3E,XML
4344303031,ISO
474946383761,GIF
474946383961,GIF
494433,MP3
4C00000001140200,LNK
504B0304,ZIP
504B0304140008000800,JAR
57696E5A6970,WINZIP
5A5753,SWF
5F27A889,JAR
62706C697374,PLIST
6674797033677035,MP4

Python Script:

import binascii

hexFile = 'test'            # image to search (saved without an extension)
sigFile = 'signatures.csv'  # one "signature,extension" pair per line

# 'rb' opens the file in binary mode on every platform, so bytes are read unchanged
with open(hexFile, 'rb') as f:
    content = f.read()

# create one uppercase hex string of the entire file (two characters per byte);
# hexlify returns bytes in Python 3, so decode it to a str first
hexDump = binascii.hexlify(content).decode('ascii').upper()
dumpLen = len(hexDump)  # file length in nibbles

# create parallel lists of signatures and file types, one entry per CSV line;
# strip() removes line endings (including Windows '\r'), and blank lines are skipped
signatures = []
fileTypes = []
with open(sigFile, 'r') as s:
    for line in s:
        line = line.strip()
        if not line or ',' not in line:
            continue
        sig, ext = line.split(',', 1)
        signatures.append(sig.strip().upper())
        fileTypes.append(ext.strip())

lastDecile = -1  # last tenth of the file that was reported

for i in range(dumpLen):
    # which tenth of the file (0-9) position i falls in; print each band's
    # upper bound once, as the scan enters it
    decile = 10 * i // dumpLen
    if decile != lastDecile:
        print((decile + 1) * 10, '%')
        lastDecile = decile
    # skip positions starting with a '0' nibble, as the original script did;
    # a signature whose first hex digit is 0 is therefore never reported
    if hexDump[i] != '0':
        for sig, ext in zip(signatures, fileTypes):  # compare every signature here
            if hexDump.startswith(sig, i):  # found match
                print('Found possible', ext, 'at byte offset', i // 2,
                      'with signature', sig)
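The script compares hex characters at every nibble position, so a match can begin in the middle of a byte. That is what lets the shorter, odd-length entries in the signature file match, but it can also report hits that are not byte-aligned. As an alternative technique (not the author's method), the raw bytes can be searched directly with bytes.find, which only ever reports whole-byte offsets. In this sketch find_signatures is a hypothetical name, and odd-length signatures such as 908100000060500 are skipped because they do not describe a whole number of bytes.

```python
def find_signatures(content, pairs):
    """Yield (byte_offset, extension, signature) for every byte-aligned match.

    Hypothetical helper: `pairs` is a list of (hex_signature, extension) tuples.
    Searching raw bytes means a match can never straddle a byte boundary.
    """
    for sig_hex, ext in pairs:
        if len(sig_hex) % 2:
            continue  # odd number of hex digits: not a whole number of bytes
        needle = bytes.fromhex(sig_hex)
        start = content.find(needle)
        while start != -1:  # report every occurrence, including overlapping ones
            yield start, ext, sig_hex
            start = content.find(needle, start + 1)


data = b'\x11' * 8 + b'PK\x03\x04\x14\x00' + b'\x22' * 4
pairs = [('504B0304', 'ZIP'), ('504B030414', 'MS_Office')]
for offset, ext, sig in sorted(find_signatures(data, pairs)):
    print('Found possible', ext, 'at byte offset', offset, 'with signature', sig)
```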
