While learning Python, I’ve been looking for any excuse to use the language in whatever project I could think of. I thought it would be a great learning experience to use Python in a digital forensics setting to automate finding file signatures within a hex file. The test hex file stands in for a hard drive image, and the script I wrote could also be run against an actual image file.
The script takes two files as input. The first is a CSV file listing all the file signatures we are looking for, one per line, in the following format: File Signature, File Extension. The second is the hex file that will be searched for those signatures.
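As a rough illustration of the signature file format described above, here is a minimal Python 3 sketch that parses "signature,extension" lines with the standard `csv` module. The function name `load_signatures` and the sample text are my own for illustration; this is not how the script below reads the file.

```python
import csv
import io

def load_signatures(csv_text):
    """Return a list of (signature, extension) pairs from CSV text."""
    pairs = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows (e.g. a trailing newline)
        pairs.append((row[0].strip().upper(), row[1].strip()))
    return pairs

# Hypothetical two-line sample in the same format as the signature file
sample = "25504446,PDF\n504B0304,ZIP\n\n"
print(load_signatures(sample))  # [('25504446', 'PDF'), ('504B0304', 'ZIP')]
```

Skipping short rows means a blank line at the end of the file does not turn into an empty signature.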
I created a test hex file using HxD by inserting 1024 bytes of random data (Edit -> Insert bytes…) and then inserting a few file signatures at various locations throughout the file. I saved the file as “test”, without a file extension, in the same directory as the Python script and the CSV signature file.
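If you don't have HxD handy, a similar test file could be produced in Python 3. This is only a sketch under my own assumptions: the helper name `build_test_image` and the offsets are hypothetical, and it overwrites random bytes with each signature rather than inserting extra bytes the way HxD's insert does.

```python
import os

def build_test_image(size=1024, inserts=None):
    """Return `size` random bytes with signatures written at given byte offsets."""
    data = bytearray(os.urandom(size))
    for offset, sig_hex in (inserts or {}).items():
        sig = bytes.fromhex(sig_hex)           # e.g. "25504446" -> b"%PDF"
        data[offset:offset + len(sig)] = sig   # overwrite in place, size unchanged
    return bytes(data)

# Hypothetical layout: a PDF signature at byte 100 and a ZIP signature at byte 600
image = build_test_image(1024, {100: "25504446", 600: "504B0304"})
print(len(image))        # 1024
print(image[100:104])    # b'%PDF'
```

The resulting bytes can be written out with `open('test', 'wb')` to get a file the script can scan.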
When the script is run, it opens the hex file in binary mode and converts the contents into a single uppercase hex string. It also opens the signatures file and creates an array of all the signatures and their extensions, splitting the elements on commas and newlines.
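To make the conversion step concrete, here is a small Python 3 sketch of turning raw bytes into an uppercase hex string. The sample bytes are made up for illustration. In Python 3, `binascii.hexlify` returns `bytes`, so the result is decoded to text before it is compared against signature strings.

```python
import binascii

raw = b"%PDF-1.4"  # sample bytes standing in for the file contents
hex_dump = binascii.hexlify(raw).decode("ascii").upper()

print(hex_dump)       # 255044462D312E34
print(len(hex_dump))  # 16 hex digits (nibbles) for 8 bytes
```

Each byte becomes two hex digits, which is why a position in the string has to be halved to get a byte offset.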
I incorporated a rough progress indicator that prints as the script works. It’s not pretty, but it gives you an idea of where the found signatures reside in the file. When a signature is found, the script displays its byte offset from the beginning of the file, and the progress percentages printed around it show roughly how far into the file it sits.
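The arithmetic behind the offset and percentage is simple enough to sketch on its own. The function name `describe_position` is hypothetical; it assumes the position is an index into the hex string, where every byte takes up two characters.

```python
def describe_position(nibble_index, dump_len):
    """Convert an index into the hex string to (byte offset, percent of file)."""
    byte_offset = nibble_index // 2            # two hex digits per byte
    percent = 100 * nibble_index // dump_len   # whole-number percentage
    return byte_offset, percent

# A 1024-byte file is 2048 hex digits long; index 1024 is halfway through
print(describe_position(1024, 2048))  # (512, 50)
```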
File Signature File:
25504446,PDF
504B030414,MS_Office
146674797071742020,MOV
186674797033677035,MP4
18667479706D703432,M4V
100005374616E64617264204A6574204442,MS_ACCESS
100080001000101,IMG
6E1EF0,PPT
908100000060500,XLS
3026B2758E66CF11A6D900AA0062CE6C,WMV
38425053,PSD
3C3F786D6C2076657273696F6E3D22312E30223F3E,XML
4344303031,ISO
474946383761,GIF
474946383961,GIF
494433,MP3
4C00000001140200,LNK
504B0304,ZIP
504B0304140008000800,JAR
57696E5A6970,WINZIP
5A5753,SWF
5F27A889,JAR
62706C697374,PLIST
6674797033677035,MP4
Python Script:
# Written for Python 3
import binascii

hexFile = 'test'
sigFile = 'signatures.csv'

# 'rb' for windows, read as binary
with open(hexFile, 'rb') as f:
    content = f.read()

# create string 'hexDump' of entire file (hexlify returns bytes, so decode it)
hexDump = binascii.hexlify(content).decode('ascii').upper()
dumpLen = len(hexDump)  # file length in nibbles

# create a list of signatures and a matching list of file types
signatures = []
fileTypes = []
with open(sigFile, 'r') as s:
    for line in s:
        line = line.strip()
        if not line:
            continue  # skip blank lines so an empty signature never matches
        sig, ext = line.split(',', 1)
        signatures.append(sig.strip().upper())
        fileTypes.append(ext.strip())

progress = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

for i in range(dumpLen):
    if i > 0:
        percent = 100 * i // dumpLen
        if percent % 10 == 0:
            percent = percent + 10  # account for starting at 0
            for y in range(len(progress)):  # check if percentage is included in progress list
                if progress[y] == percent:  # percentage found in progress list
                    print(percent, "%")
                    progress[y] = 0  # MARK IT ZERO! Display a percentage once
    if hexDump[i] != '0':  # none of the listed signatures start with a 0 nibble
        for x in range(len(signatures)):  # search for found signature in file signature list
            sigLen = len(signatures[x])
            if hexDump[i:i + sigLen] == signatures[x]:  # found match
                print("Found possible", fileTypes[x], "at byte offset", i // 2, "with signature", signatures[x])
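The nested loop above checks every signature at every position. As a possible alternative (not what the script does), the same search could be expressed with `str.find`, looking for each signature in turn. This is only a sketch in Python 3; the function name `find_signatures` and the sample hex string are hypothetical.

```python
def find_signatures(hex_dump, signatures):
    """Yield (byte_offset, file_type, signature) for every match in hex_dump."""
    for sig, file_type in signatures:
        if not sig:
            continue  # an empty signature would match everywhere
        start = hex_dump.find(sig)
        while start != -1:
            yield start // 2, file_type, sig
            start = hex_dump.find(sig, start + 1)  # keep looking past this hit

# Hypothetical hex string containing two PDF signatures and one ZIP signature
dump = "00AA25504446BB504B0304CC25504446"
sigs = [("25504446", "PDF"), ("504B0304", "ZIP")]
for hit in sorted(find_signatures(dump, sigs)):
    print(hit)
# (2, 'PDF', '25504446')
# (7, 'ZIP', '504B0304')
# (12, 'PDF', '25504446')
```

Like the script, this matches at any position in the hex string, so a hit can begin halfway through a byte; the reported offset is rounded down to the containing byte.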