How To Perform A Natural Sort On A List Of Numbers

Hello everyone,

For those who have no idea what the title is about, the post will explain it in due time.

For the log parsing project I’ve written about in previous posts, I received new requirements to parse another type of log. This log bears almost no resemblance to the others, so parsing it was a whole new ball game. To keep this post focused, I’ll go into more detail on the other additions at a later date.

These new logs contained messages associated with threads. The messages showed the start of an action and logged every step of it until it succeeded or failed, along with the time of each step. The new requirements stated that I must show when the thread started, how long it ran, its ID number, and any error message in the case of a failure. I gathered this data, used DateTime to parse the times, and then used the Subtract method to find the difference between the first and last messages of the thread. I then stored this TimeSpan result in my custom class object as a string, not knowing the trouble I was stumbling into.
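Here is a minimal sketch of that step, written as a small C# 9+ top-level program. It assumes timestamps shaped like the ones in the logs from the post below (M/dd/yy h:mm:ss tt). The thread log’s real format isn’t shown here, so the format string and the sample times are assumptions. The program parses two times with DateTime.ParseExact, subtracts them to get a TimeSpan, and stores the result as text, as the class did.

```csharp
using System;
using System.Globalization;

// Assumed timestamp pattern and made-up times for one thread's first and last messages
const string format = "M/dd/yy h:mm:ss tt";
DateTime first = DateTime.ParseExact("7/07/14 3:05:35 AM", format, CultureInfo.InvariantCulture);
DateTime last = DateTime.ParseExact("7/07/14 3:06:45 AM", format, CultureInfo.InvariantCulture);

// Subtract returns a TimeSpan: how long the thread ran
TimeSpan elapsed = last.Subtract(first);

// Storing it as a string, as the post describes
string elapsedText = elapsed.ToString();
Console.WriteLine(elapsedText); // 00:01:10
```

TimeSpan.ToString() with no arguments uses the invariant "c" format, which is why 70 seconds comes out as 00:01:10.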

This list of times had to be sorted by that difference in descending order, so I performed an OrderByDescending sort in LINQ. However, the values were ordered like this (listed here in ascending order; a descending sort simply reverses it): 1, 10, 100, 2, 20, 200, 3, 30, 300, etc. That’s because strings are compared character by character (lexicographic order, which is computer-friendly) rather than by numeric value (natural order, which is human-friendly). It would have sorted just fine if the number had been stored in an int, but because it was a string, it sorted this way. Since I couldn’t change the type in the class without breaking other things, and creating another class for this one scenario seemed like overkill, I looked for other options.
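A short C# sketch that reproduces the problem with a made-up list of numbers stored as strings. It passes StringComparer.Ordinal to make the character-by-character comparison explicit. LINQ’s default string comparer is culture-aware, but it orders plain digit strings the same way.

```csharp
using System;
using System.Linq;

// Numbers stored as strings (sample values)
string[] values = { "1", "2", "3", "10", "20", "30", "100", "200", "300" };

// Compared one character at a time, "3" comes after "200" because '3' > '2'
string[] descending = values.OrderByDescending(s => s, StringComparer.Ordinal).ToArray();
Console.WriteLine(string.Join(", ", descending)); // 300, 30, 3, 200, 20, 2, 100, 10, 1
```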

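One option was padding, either with zeroes or with spaces. Although this wouldn’t look very pretty, it would solve the problem. Once every value is padded on the left to the same width, comparing the strings character by character gives the same result as comparing the numbers, because both a zero and a space sort before the digits 1 through 9. Using spaces, the values would sort like so:
  1
  2
  3
 10
 20
 30
100
200
300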
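A minimal sketch of the padding approach using String.PadLeft. It assumes non-negative whole numbers stored as strings, and the sample values are made up. Each value is padded to the width of the longest one, and after that an ordinal string sort matches the numeric order.

```csharp
using System;
using System.Linq;

string[] values = { "1", "300", "20", "3", "100", "10", "2", "200", "30" };

// Pad every value to the same width; with equal lengths, ordinal order matches numeric order
int width = values.Max(s => s.Length);
string[] padded = values
    .Select(s => s.PadLeft(width, '0'))   // use ' ' instead of '0' for space padding
    .OrderByDescending(s => s, StringComparer.Ordinal)
    .ToArray();
Console.WriteLine(string.Join(", ", padded)); // 300, 200, 100, 030, 020, 010, 003, 002, 001
```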

The string form of a TimeSpan is already padded, since it’s written as hh:mm:ss (plus fractional seconds when there are any). So if my time difference was 1 minute, 10 seconds, it would read 00:01:10. If I then had a difference of 1 minute, 9 seconds, it would read 00:01:09, so it would sort below the first in descending order. This is the way I did it, because I’m building a log summarizer: a user needs to read through the summary quickly, and fields of equal length make that easy. It also avoids giving the summary a ‘staircase’ look (like the example above).
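The sketch below checks that claim with a few made-up durations. It also shows one caveat: once a duration reaches a full day, TimeSpan.ToString() adds a days prefix (1.01:00:00), and text order breaks. Sorting on the parsed TimeSpan rather than on the string avoids that while the stored value stays a string. The parse uses TimeSpan.ParseExact with the invariant "c" format.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Durations stored as text via TimeSpan.ToString() (the default "c" format)
string[] durations =
{
    new TimeSpan(0, 1, 9).ToString(),     // "00:01:09"
    new TimeSpan(0, 1, 10).ToString(),    // "00:01:10"
    new TimeSpan(23, 0, 0).ToString(),    // "23:00:00"
    new TimeSpan(1, 1, 0, 0).ToString(),  // "1.01:00:00" (a day or more adds a "d." prefix)
};

// Text order is right while every duration is under a day...
string[] byText = durations.OrderByDescending(s => s, StringComparer.Ordinal).ToArray();

// ...but it puts "1.01:00:00" below "23:00:00". Parsing back to TimeSpan sorts by the real duration.
string[] byDuration = durations
    .OrderByDescending(s => TimeSpan.ParseExact(s, "c", CultureInfo.InvariantCulture))
    .ToArray();

Console.WriteLine(string.Join(" | ", byText));     // 23:00:00 | 1.01:00:00 | 00:01:10 | 00:01:09
Console.WriteLine(string.Join(" | ", byDuration)); // 1.01:00:00 | 23:00:00 | 00:01:10 | 00:01:09
```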

Alternatively, one could convert the string to an int. This is a good solution, because LINQ’s OrderBy and OrderByDescending would then sort the values numerically, which is the natural order. However, TimeSpan formatting or padding may work just as well. It’s a matter of figuring out what’s best for your scenario.
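A minimal sketch of the int approach, assuming every string holds a whole number. The values stay strings in the list, and int.Parse is used only to build the sort key.

```csharp
using System;
using System.Linq;

string[] values = { "1", "10", "100", "2", "20", "200", "3", "30", "300" };

// Parse only inside the key selector; the stored values remain strings
string[] sorted = values.OrderByDescending(s => int.Parse(s)).ToArray();
Console.WriteLine(string.Join(", ", sorted)); // 300, 200, 100, 30, 20, 10, 3, 2, 1
```

int.Parse throws a FormatException on anything that isn’t a number. If the strings can hold other text, int.TryParse is the safer choice.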

…and that’s all I know at this point. Please feel free to give suggestions/comments.

Thanks for reading!

How To Parse A Log File

Hello everyone,

I hope everyone’s Independence Day was a great one!

Recently, I’ve had the opportunity to write a console application that would allow a user to select logs which would then be parsed and summarized for easier viewing. I’m writing on this topic hoping that it will help someone along the way if they encounter a similar issue. Also, I would like some input from others who may have done this in a more efficient or elegant way.

First and foremost, there’s the user interface. Since this is a console application, it’s pretty simple. I started with the console application template in Visual Studio and used Console.WriteLine("String here!") to display a message asking the user to select a log file. To make sure users would read this and know what the application was for, I used Console.ReadKey(true) to pause the app until a key is pressed. I then used OpenFileDialog to let the user select a log file. Note that this requires a reference to System.Windows.Forms and a using System.Windows.Forms; directive. The dialog must also run on a single-threaded apartment (STA) thread, so mark the console app’s Main method with [STAThread]. To guard against selecting the wrong kind of file, you can set the Filter property (see the OpenFileDialog documentation) to show only the file types you want; in this case, files with the ‘.log’ extension. Then, when ShowDialog() returns DialogResult.OK, I store the selected file’s path in a variable for later use. I also have it show a message if no file is selected, and then close the application.

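using System;
using System.IO;
using System.Windows.Forms; // needs a project reference to System.Windows.Forms

// Inside Main, which should be marked [STAThread] so OpenFileDialog works
string inputFile = null; // full path of the selected log
string path = null;      // folder of the selected log, used later for the output file

Console.WriteLine("Please select a log file to summarize");
Console.WriteLine("- Press any key to browse for .log file...");
Console.ReadKey(true);
using (var oFile = new OpenFileDialog())
{
    // Only show .log files and allow a single selection
    oFile.Filter = "Log Files (*.log)|*.log";
    oFile.Multiselect = false;
    var result = oFile.ShowDialog();
    if (result == DialogResult.OK)
    {
        inputFile = oFile.FileName;
        path = Path.GetDirectoryName(inputFile) + "\\";
    }
    else
    {
        // No file chosen: tell the user, then exit
        Console.Clear();
        Console.WriteLine("A file was not selected. Press any key to close...");
        Console.ReadKey(true);
        Environment.Exit(0);
    }
}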

That wraps up the UI, so let’s get to the meat of this. For the most part, the log files look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

Or it could look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag><Message>Some message here.</Message>
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

…or like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag>The exception message is: Some message here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

The requirements state that only the message and description are needed (without the tags). Another requirement is that only Severity: Error entries be summarized, and all others ignored. With that in mind, I used File.OpenText(), which returns a StreamReader, to open the file. I then looped through the file line by line, adding each line to a temporary list. Once it reached a line containing ‘Severity: ’, a decision was made:

- If the line doesn’t contain ‘Error’, clear the temporary list.
- If it does, copy the accumulated lines (the whole entry) to a second list and then clear the temporary one.

I did this so that I could easily separate the errors from the other log items. EDIT: Matt Groves pointed out that I was not disposing of the StreamReader when I’d finished with it, which is a no-no. I’ve wrapped the StreamReader in a using block so it is disposed of automatically when the block exits. For more information on disposing, refer to the Remarks section of the StreamReader MSDN page.

using System.Collections.Generic;
using System.IO;

List<string> errorList = new List<string>();
using (StreamReader reader = File.OpenText(inputFile))
{
    string line;
    // Lines of the log entry currently being read
    List<string> tempList = new List<string>();
    while ((line = reader.ReadLine()) != null)
    {
        // ReadLine() already strips the line break, so no Split('\n') is needed
        tempList.Add(line);
        if (line.Contains("Severity: ") && !line.Contains("Error"))
        {
            // The entry ended with a non-error severity: discard it
            tempList.Clear();
        }
        else if (line.Contains("Severity: Error"))
        {
            // The entry ended with Severity: Error: keep all of its lines
            errorList.AddRange(tempList);
            tempList.Clear();
        }
    }
}

Now that I have the right items, I can extract exactly what I want from them. This data looks quite a bit like XML, so I first tried XML parsing techniques. However, I’d have to strip out all the non-XML data for that to work easily, so I moved on. I then tried Regex, but that isn’t really suited to sifting through XML-like data. Thanks to a StackOverflow user, I ended up with IndexOf() and Substring(), and that works well. IndexOf() returns the index of the first occurrence of a character or substring within a string (or -1 if it isn’t found). Substring() returns the part of the string that starts at a given index and runs for a given length.
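To make the index arithmetic concrete, here is the same calculation the code below uses for the <Message> tag, applied to a made-up line. The sample string is an assumption, not a real log entry.

```csharp
using System;

// "Message: Oops" is 13 characters, so the <Message> tag starts at index 13
string s = "Message: Oops<Message>Hi</Message>";
int messageBegin = s.IndexOf("<Message>");     // 13
int messageEnd = s.IndexOf("</Message>") - 9;  // 24 - 9 = 15 ("<Message>" is 9 characters)
int messageDiff = messageEnd - messageBegin;   // 15 - 13 = 2 characters between the tags
string message = s.Substring(messageBegin + 9, messageDiff); // Substring(22, 2)
Console.WriteLine(message); // Hi
```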

// This block runs inside a loop over the error lines (e.g. foreach (string j in errorList))
// and adds the cleaned-up text to taglessList, a List<string>.
// "<Description>" is 13 characters and "<Message>" is 9, hence the 13s and 9s below.
// If a marker is missing, IndexOf returns -1, and these Substring calls throw or grab the wrong text.
int descriptionBegin = j.IndexOf("<Description>");
int descriptionEnd = j.IndexOf("</Description>") - 13;
int messageBegin = j.IndexOf("<Message>");
int messageEnd = j.IndexOf("</Message>") - 9;
int descriptionDiff = 0;
int messageDiff = 0;
if (j.Contains("<Message>") && !j.Contains("</Message>"))
{
    // No closing </Message> tag: take everything after <Message> to the end of the line
    messageEnd = j.Length - messageBegin;
    descriptionDiff = descriptionEnd - descriptionBegin;
    string description = j.Substring(descriptionBegin + 13, descriptionDiff);
    string message = j.Substring(messageBegin + 9, messageEnd - 9);
    taglessList.Add("Description: " + description + " " + "Message: " + message);
}
else if (j.Contains("<Message>") && j.Contains("</Message>"))
{
    // Both tags present: take the text between them
    descriptionDiff = descriptionEnd - descriptionBegin;
    messageDiff = messageEnd - messageBegin;
    string description = j.Substring(descriptionBegin + 13, descriptionDiff);
    string message = j.Substring(messageBegin + 9, messageDiff);
    taglessList.Add("Description: " + description + " " + "Message: " + message);
}
// SOME ERRORS DIDN'T HAVE <MESSAGE> TAGS, BUT DID HAVE ERROR MESSAGES
if (!j.Contains("<Message>"))
{
    if (j.Contains("<Description>"))
    {
        // "The exception message is: " is 26 characters. Subtracting 25 (not 26) keeps
        // the first '.' of ".. ---> System" as the end of the message.
        messageBegin = j.IndexOf("The exception message is: ");
        messageEnd = j.IndexOf(".. ---> System") - 25;
        descriptionDiff = descriptionEnd - descriptionBegin;
        messageDiff = messageEnd - messageBegin;
        string description = j.Substring(descriptionBegin + 13, descriptionDiff);
        string message = j.Substring(messageBegin + 26, messageDiff);
        taglessList.Add("Description: " + description + " " + "Message: " + message);
        continue;
    }
    taglessList.Add(j);
}

Essentially, I’m finding the index of each tag, adjusting by the tag’s length so the tag itself isn’t picked up, and then having Substring() grab the characters between those indexes. It seems a bit complex, but honestly this is the simplest way I could find to do it. I’ve always tried to follow the idea of “Don’t be clever; be simple”, but I couldn’t find an easier alternative in this case.
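If the same pattern is needed for several tags, it can be wrapped in a small helper. TextBetween below is a hypothetical name, not something from the post or from .NET, and this is only a sketch. It uses each tag’s Length instead of hard-coded 13s and 9s, and it searches for the closing tag only after the opening one. When a tag is missing, it returns null instead of throwing. It assumes the closing tag is </Description>, as the code above expects.

```csharp
using System;

// Hypothetical helper: text between the first `open` marker and the next `close` marker,
// or null when either marker is missing
static string? TextBetween(string s, string open, string close)
{
    int start = s.IndexOf(open, StringComparison.Ordinal);
    if (start < 0) return null;
    start += open.Length;                        // skip past the opening tag itself
    int end = s.IndexOf(close, start, StringComparison.Ordinal);
    if (end < 0) return null;
    return s.Substring(start, end - start);      // length = characters between the tags
}

string line = "Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag><Message>Some message here.</Message>";
Console.WriteLine(TextBetween(line, "<Description>", "</Description>")?.Trim()); // Some descriptive stuff here.
Console.WriteLine(TextBetween(line, "<Message>", "</Message>"));                 // Some message here.
Console.WriteLine(TextBetween("no tags here", "<Message>", "</Message>") == null); // True
```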

There are also a couple of final requirements. Each error should appear only once in the summary, along with a count of how many times it occurred in the selected log. The list should also be sorted by count, and then alphabetically. I solved these with a little bit of LINQ.

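using System.Collections.Generic;
using System.IO;
using System.Linq;

List<string> distinctList = taglessList.Distinct().ToList();
List<DupedItem> finalList = new List<DupedItem>();
// COUNTS DUPES AND ADDS ONLY DISTINCT ITEMS
foreach (string k in distinctList)
{
    int dupeCount = 0;
    foreach (string l in taglessList)
    {
        if (l == k)
        {
            dupeCount++;
        }
    }
    finalList.Add(new DupedItem
    {
        Error = k,
        dupeCount = dupeCount
    });
}
// SORTS ITEMS BY AMOUNT OF DUPES, THEN ALPHABETICALLY
List<DupedItem> fL = finalList.OrderByDescending(d => d.dupeCount)
                              .ThenBy(d => d.Error).ToList();

// Append one "[Total: n] error" line per distinct error to the summary file.
// path, outputFile and nl (the line break, e.g. Environment.NewLine) are set elsewhere.
foreach (DupedItem item in fL)
{
    File.AppendAllText(path + outputFile, "[Total: " + item.dupeCount + "] " + item.Error + nl);
}

// Holds one distinct error and how many times it appeared
class DupedItem
{
    public string Error { get; set; }
    public int dupeCount { get; set; }
}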

Distinct() returns the sequence with duplicate entries removed, and I put its output into a list. I then compare each item in the distinct list against the original list, using dupeCount to count the duplicates. I created the DupedItem class to hold both the string and its dupeCount. I then used LINQ’s OrderByDescending() to sort by dupeCount and ThenBy() to sort alphabetically, and appended this data to a summary log using File.AppendAllText().
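LINQ’s GroupBy() is an alternative to Distinct() plus the nested counting loop. It produces each distinct error and its count in one pass over the list, instead of scanning the full list once per distinct error. The minimal sketch below uses an anonymous type in place of DupedItem and made-up sample errors. It prints to the console rather than appending to the summary file.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in data; in the post this would be the real taglessList
List<string> taglessList = new List<string>
{
    "C error", "B error", "A error", "B error", "C error", "A error", "B error"
};

// GroupBy collects identical strings, so each group is one distinct error
var summary = taglessList
    .GroupBy(e => e)
    .Select(g => new { Error = g.Key, Count = g.Count() })
    .OrderByDescending(x => x.Count)   // most frequent first
    .ThenBy(x => x.Error)              // ties broken alphabetically
    .ToList();

foreach (var item in summary)
{
    Console.WriteLine("[Total: " + item.Count + "] " + item.Error);
}
// [Total: 3] B error
// [Total: 2] A error
// [Total: 2] C error
```

The sort is the same OrderByDescending/ThenBy chain as above; only the counting step changes.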

Well, that wraps this one up! Please leave comments below or on Facebook/Twitter/LinkedIn if you have any questions or suggestions.

Thanks for reading!