How To Parse A Log File

Hello everyone,

I hope everyone’s Independence Day was a great one!

Recently, I had the opportunity to write a console application that lets a user select a log file, which is then parsed and summarized for easier viewing. I’m writing about it in the hope that it helps someone who runs into a similar problem. I would also like input from others who may have done this in a more efficient or elegant way.

First and foremost, there’s the user interface. Since this is a console application, it’s pretty simple. I started with the console application template in Visual Studio and used Console.WriteLine("String here!") to prompt the user to select a log file. To make sure users would read this and know what the application was for, I used Console.ReadKey(true) to pause the app until a key is pressed. I then used OpenFileDialog to let the user select a log file (note: you must reference System.Windows.Forms and add a using directive for it, and because the dialog needs a single-threaded apartment, Main should be marked [STAThread]). To guard against an incorrect selection, one can set the Filter property (see OpenFileDialog link) to show only the wanted file types; in this case I chose files with the ‘.log’ extension. Then, when ShowDialog() returns DialogResult.OK, I store the name of the selected file in a variable for later use. If no file is selected, I show a message and then close the application.

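// Requires: using System; using System.IO; using System.Windows.Forms;
// inputFile and path are string variables declared elsewhere in the class.
Console.WriteLine("Please select a log file to summarize");
Console.WriteLine("- Press any key to browse for .log file...");
Console.ReadKey(true);
using (var oFile = new OpenFileDialog())
{
    // Only show .log files and allow a single selection.
    oFile.Filter = "Log Files (*.log)|*.log";
    oFile.Multiselect = false;
    var result = oFile.ShowDialog();
    if (result == DialogResult.OK)
    {
        // Remember the selected file and its folder for later use.
        inputFile = oFile.FileName;
        path = Path.GetDirectoryName(inputFile) + "\\";
    }
    else
    {
        // Nothing was selected: tell the user and exit.
        Console.Clear();
        Console.WriteLine("A file was not selected. Press any key to close...");
        Console.ReadKey(true);
        Environment.Exit(0);
    }
}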

That wraps up the UI, so let’s get to the meat of this. For the most part, the log files look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

Or it could look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag><Message>Some message here.</Message>
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

…or like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag>The exception message is: Some message here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

The requirements state that only the message and description are needed (without the tags). Another requirement is that only Severity: Error entries be summarized, with all others ignored. With that in mind, I used File.OpenText(), which returns a StreamReader, to open the file. I then read the file line by line, adding each line to a temporary list. Once it reached a line with ‘Severity: ‘ in it, a decision would be made: if the line doesn’t contain ‘Error’, clear the temporary list; if the line does contain ‘Error’, copy the collected lines to a second list. I did this so that I could easily separate the errors from the other log items. EDIT: Matt Groves pointed out that I was not disposing of StreamReader when I’d finished with it, which is a no-no. I’ve wrapped StreamReader in a using block so it is disposed of automatically. For more information on disposing, refer to the Remarks section of the StreamReader MSDN page.

List<string> errorList = new List<string>();
using (StreamReader reader = File.OpenText(inputFile))
{
    string line;
    List<string> tempList = new List<string>();
    // ReadLine() already returns one line at a time, so no further splitting is needed.
    while ((line = reader.ReadLine()) != null)
    {
        tempList.Add(line);
        if (line.Contains("Severity: ") && (!line.Contains("Error")))
        {
            // Not an error entry: discard the lines collected for it.
            tempList.Clear();
        }
        if (line.Contains("Severity: Error"))
        {
            // Error entry: keep its lines, then start collecting the next entry.
            errorList.AddRange(tempList);
            tempList.Clear();
        }
    }
}
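
The same filter can also be written as a small function over any sequence of lines, which makes it easy to try on sample data without opening a file. This is only a sketch: the LogFilter class and KeepErrorEntries method are hypothetical names, not part of the original application, and it assumes that every entry ends with a line containing ‘Severity: ‘.

```csharp
using System.Collections.Generic;

public static class LogFilter
{
    // Hypothetical helper: returns the lines of every entry whose
    // closing line contains "Severity: Error".
    public static List<string> KeepErrorEntries(IEnumerable<string> lines)
    {
        var errorLines = new List<string>();
        var currentEntry = new List<string>();
        foreach (string line in lines)
        {
            currentEntry.Add(line);
            if (!line.Contains("Severity: "))
            {
                continue; // still inside the current entry
            }
            if (line.Contains("Severity: Error"))
            {
                errorLines.AddRange(currentEntry);
            }
            currentEntry.Clear(); // start collecting the next entry
        }
        return errorLines;
    }
}
```

Calling LogFilter.KeepErrorEntries(File.ReadLines(inputFile)) would give roughly the same result as the loop above, assuming the file fits the format shown earlier.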

Now that I have the right items, I can get exactly what I want out of them. The data looks quite a bit like XML, so I first attempted XML parsing techniques. However, I’d have to remove all the non-XML data for that to work easily, so I moved on. I then tried Regex, but it isn’t well suited to sifting through XML-like data. I ended up landing on IndexOf() and Substring(), thanks to a StackOverflow user, and it works well. IndexOf() returns the zero-based index of a character or substring within a string (or -1 if it isn’t found), and Substring() returns the part of the string that starts at a given index and runs for a given length.

// taglessList collects the cleaned-up lines. The enclosing loop is implied by the
// "continue" below; j is presumably each line gathered in errorList above.
List<string> taglessList = new List<string>();
foreach (string j in errorList)
{
    // "<Description>" is 13 characters long and "<Message>" is 9.
    int descriptionBegin = j.IndexOf("<Description>");
    int descriptionEnd = j.IndexOf("</Description>") - 13;
    int messageBegin = j.IndexOf("<Message>");
    int messageEnd = j.IndexOf("</Message>") - 9;
    int descriptionDiff = 0;
    int messageDiff = 0;
    // Both <Message> branches assume a <Description>...</Description> pair is present.
    if (j.Contains("<Message>") && (!j.Contains("</Message>")))
    {
        // No closing tag: take everything after <Message> to the end of the line.
        messageEnd = j.Length - messageBegin;
        descriptionDiff = descriptionEnd - descriptionBegin;
        string description = j.Substring(descriptionBegin + 13, descriptionDiff);
        string message = j.Substring(messageBegin + 9, messageEnd - 9);
        taglessList.Add("Description: " + description + " " + "Message: " + message);
    }
    else if (j.Contains("<Message>") && (j.Contains("</Message>")))
    {
        descriptionDiff = descriptionEnd - descriptionBegin;
        messageDiff = messageEnd - messageBegin;
        string description = j.Substring(descriptionBegin + 13, descriptionDiff);
        string message = j.Substring(messageBegin + 9, messageDiff);
        taglessList.Add("Description: " + description + " " + "Message: " + message);
    }
    // SOME ERRORS DIDN'T HAVE <MESSAGE> TAGS, BUT DID HAVE ERROR MESSAGES
    if (!j.Contains("<Message>"))
    {
        if (j.Contains("<Description>"))
        {
            // "The exception message is: " is 26 characters long.
            messageBegin = j.IndexOf("The exception message is: ");
            messageEnd = j.IndexOf(".. ---&amp;gt; System") - 25;
            descriptionDiff = descriptionEnd - descriptionBegin;
            messageDiff = messageEnd - messageBegin;
            string description = j.Substring(descriptionBegin + 13, descriptionDiff);
            string message = j.Substring(messageBegin + 26, messageDiff);
            taglessList.Add("Description: " + description + " " + "Message: " + message);
            continue;
        }
        // Lines without either tag are kept unchanged.
        taglessList.Add(j);
    }
}

Essentially, I’m finding the index of each tag, adjusting for the length of the tag so its characters aren’t picked up, and then having Substring() grab everything between those indexes based on the number of characters between them. It seems a bit complex, but this is honestly the simplest way I could find to do this. I’ve always wanted to follow the idea of “Don’t be clever; be simple”, but I could find no easier alternative in this case.
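
One way to keep the index arithmetic in a single place is a small helper that takes the tag strings themselves and uses their Length instead of hard-coded 13s and 9s. The sketch below is an illustration only: TagText and TryGetBetween are hypothetical names, and treating a missing closing tag as "read to the end of the line" is an assumption carried over from the first branch of the code above.

```csharp
using System;

public static class TagText
{
    // Hypothetical helper: finds the text between openTag and closeTag.
    // If closeTag is missing, the text runs to the end of the string.
    // Returns false (and an empty value) when openTag is not found.
    public static bool TryGetBetween(string source, string openTag, string closeTag, out string value)
    {
        value = "";
        int open = source.IndexOf(openTag, StringComparison.Ordinal);
        if (open < 0)
        {
            return false;
        }
        int start = open + openTag.Length; // first character after the opening tag
        int close = source.IndexOf(closeTag, start, StringComparison.Ordinal);
        value = close < 0
            ? source.Substring(start)                 // no closing tag: take the rest
            : source.Substring(start, close - start); // length = characters between the tags
        return true;
    }
}
```

For a line such as the second sample above, TagText.TryGetBetween(line, "<Message>", "</Message>", out string message) would leave "Some message here." in message.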

There are also two final requirements: only one instance of each error should be shown in the summary, along with a count of how many times that error occurred in the selected log. Also, the list should be sorted by count, and then alphabetically. I solved these with a little bit of LINQ.

// DupedItem is a small class holding a string Error and an int dupeCount.
// path, outputFile and nl (presumably a newline string) are declared elsewhere.
List<string> distinctList = taglessList.Distinct().ToList();
List<DupedItem> finalList = new List<DupedItem>();
// COUNTS DUPES AND ADDS ONLY DISTINCT ITEMS
foreach (string k in distinctList)
{
    int dupeCount = 0;
    foreach (string l in taglessList)
    {
        if (l == k)
        {
            dupeCount++;
        }
    }
    finalList.Add(new DupedItem
    {
        Error = k,
        dupeCount = dupeCount
    });
}
// SORTS ITEMS BY AMOUNT OF DUPES, THEN ALPHABETICALLY
List<DupedItem> fL = finalList.OrderByDescending(d => d.dupeCount)
                              .ThenBy(d => d.Error).ToList();

// WRITES ONE SUMMARY LINE PER DISTINCT ERROR
for (int i = 0; i < fL.Count; i++)
{
    File.AppendAllText(path + outputFile, "[Total: " + fL[i].dupeCount + "] " + fL[i].Error + nl);
}

Distinct() returns the distinct elements of a sequence, which I then store in a list. I compare the items in the distinct list to the original list and use dupeCount to count how many times each one occurs. I created the DupedItem class to hold both the string and the dupeCount, used LINQ’s OrderByDescending() and ThenBy() to sort by dupeCount and then alphabetically, and then appended the data to a summarized log using AppendAllText.
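
For comparison, the counting and sorting could probably be collapsed into a single LINQ query with GroupBy(), which groups equal strings so each one is counted once instead of being compared against the whole list. This is a different technique from the nested loops above, offered as a sketch: ErrorSummary and Summarize are hypothetical names, and the output format simply copies the "[Total: n]" prefix used in the original code.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ErrorSummary
{
    // Hypothetical helper: one summary line per distinct error,
    // most frequent first, ties broken alphabetically.
    public static List<string> Summarize(IEnumerable<string> errors)
    {
        return errors
            .GroupBy(e => e)                   // one group per distinct string
            .OrderByDescending(g => g.Count()) // highest count first
            .ThenBy(g => g.Key)                // then alphabetically
            .Select(g => "[Total: " + g.Count() + "] " + g.Key)
            .ToList();
    }
}
```

The resulting list could then be written in one call with File.WriteAllLines(path + outputFile, ErrorSummary.Summarize(taglessList)), which also avoids reopening the output file for every line.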

Well, that wraps this one up! Please leave comments below or on Facebook/Twitter/LinkedIn if you have any questions or suggestions.

Thanks for reading!
