Background
Due to an intrusion on my web site, I found the need to make sure that
none of my web site files had been changed, that no files had been
removed, and most importantly to me that no additional content had been
added by an unauthorized person. I built a utility application to traverse
the content folder for my web site, make a list of signatures for the
files in that folder, and compare them with a master list of valid file
signatures I create. Ideally the master list would be built automatically
whenever I deploy the web site from my development environment. This
utility application creates a report that shows which file signatures
match the master list, and which files are changed, missing, or not on the
list. I schedule this utility to run once per day on my web site and email
me the report. As an added feature, I also created a web page that
produces the same report that I can run at any time I choose.
The code in this article consists of snippets that illustrate the concepts
being presented. At the end of this article I provide a link to the
complete utility project on GitHub. I use Visual Studio to build this
utility and to build and deploy my web site.
Definition for the File Signature
Before creating the utility we need to determine just what a meaningful
file signature consists of. A starting point is to use the file name and
file size in bytes. The obvious vulnerability here is that somebody could
substitute a file with an identical name and file size, but completely
different content. To work around this vulnerability I chose to also
calculate a SHA256 hash of the file. That will help ensure that the actual
content of the file hasn't been changed in any way. So now we have a
FileSignature that consists of the following fields:
public string FileName { get; set; } = "";
public long Length { get; set; } = 0;
public string Hash { get; set; } = "";
I created a FileSignature class with those properties, a default
constructor, a constructor that takes a FileInfo as a parameter, and some
other useful methods that will be used later. The class ended up like
this:
using System;
using System.Collections.Generic;
using System.IO;

namespace FileSignatureChecker.Classes
{
    public class FileSignature : IComparable<FileSignature>, IEqualityComparer<FileSignature>
    {
        public string FileName { get; set; } = "";
        public long Length { get; set; } = 0;
        public string Hash { get; set; } = "";

        public FileSignature() { }

        // Build a signature from a file on disk: name, size in bytes, and SHA256 of the content.
        public FileSignature(FileInfo fileInfo)
        {
            FileName = fileInfo.Name;
            Length = fileInfo.Length;
            Hash = UtilityFunctions.CalculateSHA256(fileInfo.FullName);
        }

        public override string ToString()
        {
            return string.Format("Name: {0}, Length: {1}, Hash: {2}", FileName, Length, Hash);
        }

        // Order by name, then length, then hash.
        public int CompareTo(FileSignature other)
        {
            if (other is null) throw new ArgumentNullException(nameof(other));
            int result = FileName.CompareTo(other.FileName);
            if (result == 0)
            {
                result = Length.CompareTo(other.Length);
            }
            if (result == 0)
            {
                result = Hash.CompareTo(other.Hash);
            }
            return result;
        }

        // Two signatures are equal only when all three fields match.
        public override bool Equals(object obj)
        {
            return obj is FileSignature file &&
                   FileName == file.FileName &&
                   Length == file.Length &&
                   Hash == file.Hash;
        }

        public override int GetHashCode()
        {
            int hashCode = 1482071670;
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(FileName);
            hashCode = hashCode * -1521134295 + Length.GetHashCode();
            hashCode = hashCode * -1521134295 + EqualityComparer<string>.Default.GetHashCode(Hash);
            return hashCode;
        }

        // IEqualityComparer<FileSignature>: lets an instance act as the comparer for LINQ set operations.
        public bool Equals(FileSignature x, FileSignature y)
        {
            // Avoid a NullReferenceException when x is null.
            return x is null ? y is null : x.Equals(y);
        }

        public int GetHashCode(FileSignature obj)
        {
            return obj.GetHashCode();
        }
    }
}
One thing still missing is how to compute the SHA256 hash. I separated
that code out into a utility class because I've used it for other
projects, but it can be included in the FileSignature class as well. Here
is the code for calculating the SHA256 hash:
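// Requires: using System; using System.IO; using System.Security.Cryptography;
public static string CalculateSHA256(string filename)
{
    using (var sha256 = SHA256.Create())
    {
        using (FileStream stream = File.OpenRead(filename))
        {
            // The using block closes the stream, so no explicit Close() is needed.
            byte[] hash = sha256.ComputeHash(stream);
            // Convert "AB-CD-..." to a lowercase hex string without separators.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }
}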
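As a quick sanity check of this kind of helper, the following sketch hashes a temporary file containing the ASCII text "abc" and compares the result with the published SHA-256 test vector for that input. It also shows why the hash belongs in the signature: two files of identical length that differ by one character produce different hashes. The class name HashDemo and the helper Sha256OfText are hypothetical and exist only for this illustration.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class HashDemo
{
    // Same approach as CalculateSHA256: hash the file's bytes, return lowercase hex.
    public static string Sha256OfFile(string path)
    {
        using (var sha256 = SHA256.Create())
        using (FileStream stream = File.OpenRead(path))
        {
            byte[] hash = sha256.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Hypothetical helper: writes ASCII text to a temp file, hashes it, then deletes it.
    public static string Sha256OfText(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, Encoding.ASCII.GetBytes(text));
            return Sha256OfFile(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // Published SHA-256 test vector for the three ASCII bytes "abc".
        const string expected =
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        Console.WriteLine(Sha256OfText("abc") == expected); // True

        // Same length, one character changed: name and size would match, the hash does not.
        string a = Sha256OfText("<p>Hello</p>");
        string b = Sha256OfText("<p>Hellp</p>");
        Console.WriteLine(a == b); // False
    }
}
```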
Creating the Signature List File
I wanted to have Visual Studio automatically generate the master file
signature list. In order to accomplish this, I built a T4 text template as
a part of my web project. A T4 text template contains both text blocks and
control logic that can generate a text file. An article that explains more
about writing T4 text templates is "Code Generation and T4 Text Templates"
in the Microsoft documentation. This template
traverses the folder containing the files I deploy to my web site and
builds a list of FileSignatures. This list is then serialized to an XML
file.
The template I developed as a part of my web site project follows.
<#@ template debug="false" hostspecific="true" language="C#" #>
<#@ output extension=".xml" #>
<#@ assembly name="System.Core" #>
<#@ assembly name="System.Xml" #>
<#@ assembly name="path to your web project dll goes here" #>
<#@ import namespace="System.Linq" #>
<#@ import namespace="System.Text" #>
<#@ import namespace="System.Text.RegularExpressions" #>
<#@ import namespace="System.Collections.Generic" #>
<#@ import namespace="System.IO" #>
<#@ import namespace="System.Xml" #>
<#@ import namespace="System.Xml.Serialization" #>
<#@ import namespace="KB3HHAWebSite.Classes" #>
<#
    // Keep only names that do not contain ".cs" or ".tt".
    // Note: the pattern is not anchored to the end of the name, so it also
    // skips names such as site.css or page.cshtml.
    Regex reg = new Regex(@"^(?!.*\.cs|.*\.tt)");
    DirectoryInfo dir = new DirectoryInfo(this.Host.ResolvePath(".."));
    List<FileInfo> fileList = dir.GetFiles().Where(fi => reg.IsMatch(fi.Name)).ToList();
    List<FileSignature> siteList = new List<FileSignature>();
    foreach(FileInfo file in fileList)
    {
        siteList.Add(new FileSignature(file));
    }

    // Serialize the signature list to an XML string for the template output.
    XmlSerializer serializer = new XmlSerializer(typeof(List<FileSignature>));
    string xml;
    using(StringWriter sww = new StringWriter())
    {
        using(XmlWriter writer = XmlWriter.Create(sww))
        {
            writer.WriteProcessingInstruction("xml", "version='1.0'");
            serializer.Serialize(writer, siteList);
            xml = sww.ToString(); // the serialized signature list
        }
    }
#><#= xml#>
I named my template filelist.tt, which creates an output file called
filelist.xml. You will notice I used a regular expression to filter out
C# and text template files, since none of those files will be deployed to
the target web site. The generated filelist.xml file needs to be published
to my web site. The template itself reads the files in a folder that match
the regular expression, builds a list of FileSignature class instances,
and then serializes that list to the resulting XML file. That makes it a
simple task to read the file signatures from the file back into a list.
This bit of code shows how that is done.
using (var stream = File.OpenRead(sitemapFilePath))
{
    var serializer = new XmlSerializer(typeof(List<FileSignature>));
    List<FileSignature> list = serializer.Deserialize(stream) as List<FileSignature>;
}
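The save and load steps can be tried together without touching the disk. The sketch below round-trips a list through XmlSerializer in memory, using a stand-in FileSignature class that carries only the three properties. The class RoundTripDemo, the sample file name, and the placeholder hash value are assumptions made for the illustration, not data from my site.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Stand-in with the same three properties as the article's FileSignature.
public class FileSignature
{
    public string FileName { get; set; } = "";
    public long Length { get; set; } = 0;
    public string Hash { get; set; } = "";
}

public static class RoundTripDemo
{
    // Serialize the list to an XML string.
    public static string Serialize(List<FileSignature> list)
    {
        var serializer = new XmlSerializer(typeof(List<FileSignature>));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, list);
            return writer.ToString();
        }
    }

    // Read the list back; fall back to an empty list if the cast fails.
    public static List<FileSignature> Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(List<FileSignature>));
        using (var reader = new StringReader(xml))
        {
            return serializer.Deserialize(reader) as List<FileSignature>
                   ?? new List<FileSignature>();
        }
    }

    public static void Main()
    {
        var original = new List<FileSignature>
        {
            // Placeholder values, not a real hash.
            new FileSignature { FileName = "index.html", Length = 1024, Hash = "placeholder" }
        };

        string xml = Serialize(original);
        // XmlSerializer names the root element of a List<FileSignature> ArrayOfFileSignature.
        Console.WriteLine(xml.Contains("<ArrayOfFileSignature"));

        List<FileSignature> copy = Deserialize(xml);
        Console.WriteLine(copy.Count == 1 && copy[0].FileName == "index.html");
    }
}
```

Because the list type is the same on both sides, the reader needs no knowledge of how the file was produced, whether by the template or by code like this.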
Comparing Lists
At this point we have a way to build the master list, save it to a file,
and read the entries back into a list. The next piece of the puzzle is to
create a list of file signatures for the folder we want to compare with
the master list, do the comparison, and display the results.
Building the list is pretty much the same as building the master list. Here
is the code to build that list. I left in the regular expression portion of
the code in case I want to use it later. For now the pattern is an empty
string, which matches every file name, so no files are filtered out.
Regex reg = new Regex(@"");
DirectoryInfo dir = new DirectoryInfo(folderPath);
List<FileInfo> fileList = dir.GetFiles().Where(fi => reg.IsMatch(fi.Name)).ToList();
List<FileSignature> siteList = new List<FileSignature>();
foreach (FileInfo file in fileList)
{
    siteList.Add(new FileSignature(file));
}
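The name filter can be checked in isolation before pointing it at a real folder. This sketch applies two patterns to a handful of made-up file names: the empty pattern, which keeps everything, and an end-anchored variant, ^(?!.*\.(cs|tt)$), which rejects only names that end in .cs or .tt. The variant is my assumption about a stricter filter and is not the pattern used in the template; the class FilterDemo and the sample names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FilterDemo
{
    // Keep the names that the pattern matches, as the Where(...) clause does.
    public static string[] Filter(string pattern, params string[] names)
    {
        var reg = new Regex(pattern);
        return names.Where(n => reg.IsMatch(n)).ToArray();
    }

    public static void Main()
    {
        string[] names = { "index.html", "site.css", "Default.aspx.cs", "filelist.tt" };

        // The empty pattern matches every string, so nothing is filtered out.
        Console.WriteLine(string.Join(", ", Filter("", names)));
        // index.html, site.css, Default.aspx.cs, filelist.tt

        // End-anchored negative lookahead: only names ending in .cs or .tt are dropped.
        Console.WriteLine(string.Join(", ", Filter(@"^(?!.*\.(cs|tt)$)", names)));
        // index.html, site.css
    }
}
```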
Now that we have both of our lists built, we can do the comparisons. This
is probably not the most efficient way to compare the lists, but in my
case the lists are short. I was more interested in making things work
correctly first and felt that I could always do some optimization if it
became necessary later on. Here is the code to find the matches, the
changed files, the missing files (in the master list but not on the file
system), and any added files (files on the file system but not in the
master list).
if (mapFileSignatures != null && folderSignatures != null)
{
    IEqualityComparer<FileSignature> allFieldsComparer = new FileSignature();

    // files that completely match between the site map file and the local file system
    List<FileSignature> matchingFiles = mapFileSignatures.Intersect(folderSignatures, allFieldsComparer).ToList();

    // files that match names between the site map file and the local file system but contents are different
    List<FileSignature> changedFiles = (from mapSig in mapFileSignatures
        where folderSignatures.Any(x => x.FileName == mapSig.FileName && (x.Length != mapSig.Length || x.Hash != mapSig.Hash))
        select mapSig).ToList();

    // files that are in the signature file but not on the local file system
    List<FileSignature> missingFiles = (from mapSig in mapFileSignatures
        where !folderSignatures.Any(x => x.FileName == mapSig.FileName)
        select mapSig).ToList();

    // files that are in the local file system but not in the signature file
    List<FileSignature> extraFiles = (from folderSig in folderSignatures
        where !mapFileSignatures.Any(x => x.FileName == folderSig.FileName)
        select folderSig).ToList();
}
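If the lists ever grow large enough for the nested Any() scans to matter, one possible optimization is to index the folder signatures by file name and classify every entry in a single pass. This is only a sketch of that idea, not the code in the utility. It assumes file names are unique within the folder, and the names Sig, CompareDemo, Classify, and RunSample, along with the sample data, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for FileSignature.
public class Sig
{
    public string FileName = "";
    public long Length;
    public string Hash = "";
}

public static class CompareDemo
{
    public static Dictionary<string, List<string>> Classify(
        IReadOnlyList<Sig> master, IReadOnlyList<Sig> folder)
    {
        var result = new Dictionary<string, List<string>>
        {
            ["matching"] = new List<string>(),
            ["changed"] = new List<string>(),
            ["missing"] = new List<string>(),
            ["extra"] = new List<string>()
        };

        // One dictionary lookup per master entry instead of a scan of the folder list.
        Dictionary<string, Sig> byName = folder.ToDictionary(s => s.FileName);
        var masterNames = new HashSet<string>();

        foreach (Sig m in master)
        {
            masterNames.Add(m.FileName);
            if (!byName.TryGetValue(m.FileName, out Sig onDisk))
                result["missing"].Add(m.FileName);
            else if (onDisk.Length == m.Length && onDisk.Hash == m.Hash)
                result["matching"].Add(m.FileName);
            else
                result["changed"].Add(m.FileName);
        }

        // Anything on disk that the master list never mentioned is an extra file.
        foreach (Sig f in folder)
            if (!masterNames.Contains(f.FileName))
                result["extra"].Add(f.FileName);

        return result;
    }

    // Made-up sample data with placeholder hashes.
    public static Dictionary<string, List<string>> RunSample()
    {
        var master = new List<Sig>
        {
            new Sig { FileName = "a.html", Length = 10, Hash = "h1" },
            new Sig { FileName = "b.css", Length = 20, Hash = "h2" },
            new Sig { FileName = "c.js", Length = 30, Hash = "h3" }
        };
        var folder = new List<Sig>
        {
            new Sig { FileName = "a.html", Length = 10, Hash = "h1" },
            new Sig { FileName = "b.css", Length = 20, Hash = "hX" },
            new Sig { FileName = "d.php", Length = 5, Hash = "h4" }
        };
        return Classify(master, folder);
    }

    public static void Main()
    {
        foreach (var pair in RunSample())
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value));
    }
}
```

With the sample data, a.html matches, b.css is changed, c.js is missing, and d.php is the extra file, which is the case I care about most after an intrusion.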
At this point you can display the resulting lists any way you desire. I
display them in columns showing all of the fields (name, length, and
hash). To make it look nice I pad all of the columns so they line up. I do
that by walking through the data once and, for each row, examining each
column and recording the width of the widest value seen in that column. I
then walk through the data again and display each value padded to the
width recorded for its column. The process is pretty simple, but maybe I'll
write another short article on another day showing how I do that.
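In the meantime, here is a minimal sketch of that two-pass padding idea. It is an illustration of the approach described above rather than the code in the utility; the names TableDemo and FormatRows, the two-space column gap, and the sample rows with placeholder hash values are all assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TableDemo
{
    public static List<string> FormatRows(IReadOnlyList<string[]> rows)
    {
        var lines = new List<string>();
        if (rows.Count == 0) return lines;

        // Pass 1: record the widest value seen in each column.
        var widths = new int[rows.Max(r => r.Length)];
        foreach (string[] row in rows)
            for (int i = 0; i < row.Length; i++)
                widths[i] = Math.Max(widths[i], row[i].Length);

        // Pass 2: pad every value to its column width and join with a two-space gap.
        foreach (string[] row in rows)
            lines.Add(string.Join("  ", row.Select((cell, i) => cell.PadRight(widths[i]))));

        return lines;
    }

    public static void Main()
    {
        var rows = new List<string[]>
        {
            new[] { "Name", "Length", "Hash" },
            new[] { "index.html", "1024", "ab12" }, // placeholder hash values
            new[] { "site.css", "88", "cd34" }
        };

        foreach (string line in FormatRows(rows))
            Console.WriteLine(line);
    }
}
```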
I have uploaded the code that accompanies this article to GitHub at
sdcohe/FileSignatureChecker (github.com). It is a work in progress, so it
might be updated as I refine the process.