File Signature Checker

Build a utility application to compare a list of file signatures with the contents of a folder

Background

Due to an intrusion on my web site, I found the need to make sure that none of my web site files had been changed, that no files had been removed, and most importantly to me that no additional content had been added by an unauthorized person. I built a utility application to traverse the content folder for my web site, make a list of signatures for the files in that folder, and compare them with a master list of valid file signatures I create. Ideally the master list would be built automatically whenever I deploy the web site from my development environment. This utility application creates a report that shows which file signatures match the master list, and which files are changed, missing, or not on the list. I schedule this utility to run once per day on my web site and email me the report. As an added feature, I also created a web page that produces the same report that I can run at any time I choose.

The code in this article contains snippets that illustrate the concepts being presented. At the end of this article I will provide a link to the complete utility project on GitHub. I use Visual Studio both to build this utility and to build and deploy my web site.

Definition for the File Signature

Before creating the utility we need to determine just what a meaningful file signature consists of. A starting point is to use the file name and file size in bytes. The obvious vulnerability here is that somebody could substitute a file with an identical name and file size, but completely different content. To work around this vulnerability I chose to also calculate a SHA256 hash of the file. That will help ensure that the actual content of the file hasn't been changed in any way. So now we have a FileSignature that consists of the following fields:

    public string FileName { get; set; } = "";
    public long Length { get; set; } = 0;
    public string Hash { get; set; } = "";
    

I created a FileSignature class with those properties, a default constructor, a constructor that takes a FileInfo as a parameter, and some other useful methods that will be used later. The class ended up like this:


using System;
using System.Collections.Generic;
using System.IO;

namespace FileSignatureChecker.Classes
{
    public class FileSignature : IComparable<FileSignature>, IEqualityComparer<FileSignature>
    {
        public string FileName { get; set; } = "";
        public long Length { get; set; } = 0;
        public string Hash { get; set; } = "";
        public FileSignature() { }
        public FileSignature(FileInfo fileInfo) 
        { 
            FileName = fileInfo.Name;
            Length = fileInfo.Length;
            string fullPath = Path.Combine(fileInfo.Directory.FullName, FileName);
            Hash = UtilityFunctions.CalculateSHA256(fullPath);
        }

        public override string ToString()
        {
            return string.Format("Name: {0}, Length: {1}, Hash: {2}", FileName, Length, Hash);
        }

        public int CompareTo(FileSignature other)
        {
            if (other is null) throw new ArgumentNullException(nameof(other));

            int result = FileName.CompareTo(other.FileName);
            if (result == 0)
            {
                result = Length.CompareTo(other.Length);
            }
            if (result == 0)
            {
                result = Hash.CompareTo(other.Hash);
            }

            return result;
        }

        public override bool Equals(object obj)
        {
            return obj is FileSignature file &&
                   FileName == file.FileName &&
                   Length == file.Length &&
                   Hash == file.Hash;
        }

        public override int GetHashCode()
        {
            int hashCode = 1482071670;
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(FileName);
            hashCode = hashCode * -1521134295 + Length.GetHashCode();
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Hash);
            return hashCode;
        }

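        public bool Equals(FileSignature x, FileSignature y)
        {
            // Two nulls are equal; this also avoids a NullReferenceException when only x is null
            if (x is null) return y is null;
            return x.Equals(y);
        }

        public int GetHashCode(FileSignature obj)
        {
            if (obj is null) throw new ArgumentNullException(nameof(obj));
            return obj.GetHashCode();
        }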
    }
}    
    

One thing still missing is how to compute the SHA256 hash. I separated that code out into a Utility class because I've used it for other projects, but it can be included in the FileSignature class as well. Here is the code for calculating the SHA256 hash:


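        // Requires: using System; using System.IO; using System.Security.Cryptography;
        public static string CalculateSHA256(string filename)
        {
            using (var sha256 = SHA256.Create())
            {
                using (FileStream stream = File.OpenRead(filename))
                {
                    // Hash the file contents; the using blocks close the stream and dispose the hasher
                    byte[] hash = sha256.ComputeHash(stream);
                    // Convert the hash bytes to a lowercase hex string with no separators
                    return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
                }
            }
        }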
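
To illustrate why the hash matters, here is a minimal sketch that writes two files with the same length but different content and compares their signatures. The HashDemo class name and the temporary files are hypothetical and exist only for this demonstration; the hashing method repeats the approach shown above.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashDemo
{
    // Same approach as the CalculateSHA256 method above:
    // hash the file's bytes and return a lowercase hex string.
    public static string CalculateSHA256(string filename)
    {
        using (var sha256 = SHA256.Create())
        using (FileStream stream = File.OpenRead(filename))
        {
            byte[] hash = sha256.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    public static void Main()
    {
        // Hypothetical temp files, created only for this demonstration.
        string original = Path.GetTempFileName();
        string tampered = Path.GetTempFileName();
        try
        {
            // Both strings are 12 ASCII characters, so the files have the same size.
            File.WriteAllText(original, "Hello, world");
            File.WriteAllText(tampered, "Hello, w0rld");

            bool sameLength = new FileInfo(original).Length == new FileInfo(tampered).Length;
            bool sameHash = CalculateSHA256(original) == CalculateSHA256(tampered);

            // The sizes match, but the hashes differ because the content differs.
            Console.WriteLine("Same length: " + sameLength);
            Console.WriteLine("Same hash:   " + sameHash);
        }
        finally
        {
            File.Delete(original);
            File.Delete(tampered);
        }
    }
}
```

A check based only on name and size would treat these two files as identical, while the hash comparison flags the change.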
    

Creating the Signature List File

I wanted to have Visual Studio automatically generate the master file signature list. In order to accomplish this, I built a T4 text template as a part of my web project. A T4 text template contains both text blocks and control logic that can generate a text file. An article that explains more about writing T4 text templates is located at Code Generation and T4 Text Templates. This template traverses the folder containing the files I deploy to my web site and builds a list of FileSignatures. This list is then serialized to an XML file.

The template I developed as a part of my web site project follows.

<#@ template debug="false" hostspecific="true" language="C#" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Core" #>
<#@ assembly name="System.Xml" #>
<#@ assembly name="path to your web project dll goes here" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="System.Text" #>
<#@ import namespace="System.Text.RegularExpressions" #>
<#@ import namespace="System.Collections.Generic" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Xml" #>
<#@ import namespace="System.Xml.Serialization" #>
<#@ import namespace="KB3HHAWebSite.Classes" #>
<# 
Regex reg = new Regex(@"^(?!.*\.(cs|tt)$)"); // exclude only names ending in .cs or .tt, so files such as .css are kept
DirectoryInfo dir = new DirectoryInfo(this.Host.ResolvePath(".."));
List<FileInfo> fileList = dir.GetFiles().Where(fi => reg.IsMatch(fi.Name)).ToList();

List<FileSignature> siteList = new List<FileSignature>();
foreach(FileInfo file in fileList)
{
    siteList.Add(new FileSignature(file));
}
XmlSerializer serializer = new XmlSerializer(typeof(List<FileSignature>));
string xml;

using(StringWriter sww = new StringWriter())
{
    using(XmlWriter writer = XmlWriter.Create(sww))
    {
        writer.WriteProcessingInstruction("xml", "version='1.0'");
        serializer.Serialize(writer, siteList);
        xml = sww.ToString(); // Your XML
    }
 }
 #><#= xml#>
       

I named my template filelist.tt, which creates an output file called filelist.xml. You will notice I used a regular expression to filter out any C# and text template files, since none of those files will be deployed to the target web site. The generated filelist.xml file needs to be published to my web site. The template itself reads the files in a folder that match the regular expression, builds a list of FileSignature class instances, and then serializes that list to the resulting XML file. Because the list was written with XmlSerializer, it is a simple task to read the file signatures from the file back into a list. This bit of code shows how that is done.

    using (var stream = File.OpenRead(sitemapFilePath))
    {
        var serializer = new XmlSerializer(typeof(List<FileSignature>));
        List<FileSignature> list = serializer.Deserialize(stream) as List<FileSignature>;
    }
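
As a rough illustration of the full round trip, the sketch below saves a list to an XML file and loads it back. It is not the project's code: the SignatureListFile class and its Save and Load methods are hypothetical names, and the FileSignature class here is a pared-down stand-in with only the three properties.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Simplified stand-in for the article's FileSignature class (properties only).
public class FileSignature
{
    public string FileName { get; set; } = "";
    public long Length { get; set; } = 0;
    public string Hash { get; set; } = "";
}

// Hypothetical helper class, named here only for illustration.
public static class SignatureListFile
{
    public static void Save(string path, List<FileSignature> list)
    {
        var serializer = new XmlSerializer(typeof(List<FileSignature>));
        using (var stream = File.Create(path))
        {
            serializer.Serialize(stream, list);
        }
    }

    public static List<FileSignature> Load(string path)
    {
        var serializer = new XmlSerializer(typeof(List<FileSignature>));
        using (var stream = File.OpenRead(path))
        {
            // Return an empty list instead of null if the cast fails.
            return serializer.Deserialize(stream) as List<FileSignature>
                   ?? new List<FileSignature>();
        }
    }

    public static void Main()
    {
        // Hypothetical temp file and sample entry, used only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            var master = new List<FileSignature>
            {
                new FileSignature { FileName = "index.html", Length = 1024, Hash = "abc123" }
            };
            Save(path, master);

            List<FileSignature> loaded = Load(path);
            Console.WriteLine(loaded.Count + " signature(s) loaded, first: " + loaded[0].FileName);
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Returning the list from a method, rather than declaring it inside the using block, keeps it available to the comparison code after the stream is closed.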
    

Comparing Lists

At this point we have a way to build the master list, save it to a file, and read the entries back into a list. The next piece of the puzzle is to create a list of file signatures for the folder we want to compare with the master list, do the comparison, and display the results. Building the list is pretty much the same as building the master list. Here is the code to build that list. I left the regular expression portion of the code in place in case I want to use it later. For now the pattern is an empty string, which matches every file name, so nothing is filtered out.

      
      Regex reg = new Regex(@"");
      DirectoryInfo dir = new DirectoryInfo(folderPath);
      List<FileInfo> fileList = dir.GetFiles().Where(fi => reg.IsMatch(fi.Name)).ToList();

      List<FileSignature> siteList = new List<FileSignature>();
      foreach (FileInfo file in fileList)
      {
          siteList.Add(new FileSignature(file));
      }
    

Now that we have both of our lists built, we can do the comparisons. This is probably not the most efficient way to compare the lists, but in my case the lists are short. I was more interested in making things work correctly first and felt that I could always do some optimization if it became necessary later on. Here is the code to find the matches, the changed files, the missing files (in the master list but not on the file system), and any added files (files on the file system but not in the master list).

            if (mapFileSignatures != null && folderSignatures != null)
            {
                IEqualityComparer<FileSignature> allFieldsComparer = new FileSignature();
                
                // files that completely match between the site map file and the local file system
                List<FileSignature> matchingFiles = mapFileSignatures.Intersect(folderSignatures, allFieldsComparer).ToList();
                
                // files that match names between the site map file and the local file system but contents are different
                List<FileSignature> changedFiles = (from mapSig in mapFileSignatures where
                    folderSignatures.Any(x => x.FileName == mapSig.FileName && (x.Length != mapSig.Length || x.Hash != mapSig.Hash))
                    select mapSig).ToList();
                
                // files that are in the signature file but not on the local file system
                List<FileSignature> missingFiles = (from mapSig in mapFileSignatures where 
                    !folderSignatures.Any(x => x.FileName == mapSig.FileName) 
                    select mapSig).ToList();

                // files that are in the local file system but not in the signature file
                List<FileSignature> extraFiles = (from folderSig in folderSignatures where
                    !mapFileSignatures.Any(x => x.FileName == folderSig.FileName)
                    select folderSig).ToList();
            }
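
If the lists ever grow large enough for the nested Any calls to matter, one possible optimization is to index the folder list by file name so each master entry is classified with a single lookup. The following is a minimal sketch, not the code from the project. The ComparisonResult and SignatureComparer names are hypothetical, the FileSignature class is a pared-down stand-in, and the sketch assumes file names are unique within each list and compared case-sensitively, as the == comparisons above do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified stand-in for the article's FileSignature class (properties only).
public class FileSignature
{
    public string FileName { get; set; } = "";
    public long Length { get; set; } = 0;
    public string Hash { get; set; } = "";
}

// Hypothetical container for the four result lists.
public class ComparisonResult
{
    public List<FileSignature> Matching { get; } = new List<FileSignature>();
    public List<FileSignature> Changed { get; } = new List<FileSignature>();
    public List<FileSignature> Missing { get; } = new List<FileSignature>();
    public List<FileSignature> Extra { get; } = new List<FileSignature>();
}

public static class SignatureComparer
{
    public static ComparisonResult Compare(List<FileSignature> master, List<FileSignature> folder)
    {
        // Assumption: names are unique per list; ToDictionary throws on duplicates.
        Dictionary<string, FileSignature> folderByName =
            folder.ToDictionary(f => f.FileName, StringComparer.Ordinal);
        var result = new ComparisonResult();

        // One dictionary lookup per master entry decides its category.
        foreach (FileSignature expected in master)
        {
            if (!folderByName.TryGetValue(expected.FileName, out FileSignature found))
                result.Missing.Add(expected);
            else if (found.Length == expected.Length && found.Hash == expected.Hash)
                result.Matching.Add(expected);
            else
                result.Changed.Add(expected);
        }

        // Anything in the folder whose name is not in the master list is extra.
        var masterNames = new HashSet<string>(master.Select(m => m.FileName), StringComparer.Ordinal);
        result.Extra.AddRange(folder.Where(f => !masterNames.Contains(f.FileName)));
        return result;
    }

    public static void Main()
    {
        // Hypothetical sample data, used only for this demonstration.
        var master = new List<FileSignature>
        {
            new FileSignature { FileName = "index.html", Length = 10, Hash = "aa" },
            new FileSignature { FileName = "about.html", Length = 20, Hash = "bb" },
            new FileSignature { FileName = "gone.html", Length = 30, Hash = "cc" }
        };
        var folder = new List<FileSignature>
        {
            new FileSignature { FileName = "index.html", Length = 10, Hash = "aa" },
            new FileSignature { FileName = "about.html", Length = 20, Hash = "zz" },
            new FileSignature { FileName = "shell.php", Length = 99, Hash = "dd" }
        };

        ComparisonResult r = Compare(master, folder);
        Console.WriteLine("Matching: " + string.Join(", ", r.Matching.Select(f => f.FileName)));
        Console.WriteLine("Changed:  " + string.Join(", ", r.Changed.Select(f => f.FileName)));
        Console.WriteLine("Missing:  " + string.Join(", ", r.Missing.Select(f => f.FileName)));
        Console.WriteLine("Extra:    " + string.Join(", ", r.Extra.Select(f => f.FileName)));
    }
}
```

On a case-insensitive file system it may be worth switching both comparers to StringComparer.OrdinalIgnoreCase, though that would differ from the behavior of the original queries.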
    
    

At this point you can display the resulting lists any way you desire. I display them in columns showing all of the fields (name, length, and hash). To make it look nice I pad the columns so that they line up. I do that by walking through the data to find the widest value in each column, and then walking through the data again and displaying each column padded to that width. The process is pretty simple, but maybe I'll write another short article on another day showing how I do that.
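
The two-pass padding idea described above could look something like the following. This is only a sketch under my own assumptions, not the code from the project: the ReportFormatter class, the FormatColumns method, and the two-space column gap are all hypothetical choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, named here only for illustration.
public static class ReportFormatter
{
    public static List<string> FormatColumns(List<string[]> rows)
    {
        var lines = new List<string>();
        if (rows.Count == 0) return lines;

        // First pass: find the widest value in each column.
        int columnCount = rows.Max(r => r.Length);
        int[] widths = new int[columnCount];
        foreach (string[] row in rows)
        {
            for (int i = 0; i < row.Length; i++)
            {
                widths[i] = Math.Max(widths[i], row[i].Length);
            }
        }

        // Second pass: pad every cell to its column width and join the cells
        // with a two-space gap. Trailing padding on the last cell is trimmed.
        foreach (string[] row in rows)
        {
            IEnumerable<string> cells = row.Select((cell, i) => cell.PadRight(widths[i]));
            lines.Add(string.Join("  ", cells).TrimEnd());
        }
        return lines;
    }

    public static void Main()
    {
        // Hypothetical sample rows: a header plus two signatures.
        var rows = new List<string[]>
        {
            new[] { "Name", "Length", "Hash" },
            new[] { "index.html", "1024", "abc123" },
            new[] { "about.html", "20480", "def456" }
        };

        foreach (string line in FormatColumns(rows))
        {
            Console.WriteLine(line);
        }
    }
}
```

Columns only line up when the output is shown in a fixed-width font, which is worth keeping in mind if the report is sent by email.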

I have uploaded the code that accompanies this article to GitHub at sdcohe/FileSignatureChecker (github.com). It is a work in progress, so it might be updated as I refine the process.


Written by Seth Cohen on 05-Jun-2024