How To Perform A Natural Sort On A List Of Numbers

Hello everyone,

For those who have no idea what the title is about, the post will explain it in due time.

In my log parsing project that I’ve written about in previous posts, I’ve received new requirements to parse another type of log. This type of log has nearly no similarity to the other logs, so parsing this was a whole new ball game. For the sake of keeping this post focused, I’ll go into more details on the new stuff I added at a later date.

These new logs had messages which were associated with threads. These messages would show the initiation of an action, and would log all steps of that action until its success or failure, along with the times that these actions occurred. The new requirements stated that I must show the time the thread started, the amount of time the thread ran, the thread’s ID number and any error message in the case of a failure. I’d gathered this data, used DateTime to parse the times, and then used the Subtract method to find the difference between the first and last message of the thread. I then inserted this TimeSpan output into my custom class object as a string, not knowing the trouble I was stumbling into.

This list of times should be sorted by this difference in descending order, so I performed an OrderByDescending sort in LINQ. However, it sorted them in this fashion: 1, 10, 100, 2, 20, 200, 3, 30, 300, etc. This is because strings are compared character by character (computer-friendly) instead of in natural, numeric order (human-friendly), so "10" lands before "2" simply because '1' comes before '2'. It would have sorted just fine if my number were stored in an int, but because it was in a string, it sorted in this fashion. Considering I couldn’t change the type in the class without breaking other things, and using another class for this one scenario would be a bit overkill in my opinion, I searched for other options.

One option was padding, either with zeroes or with spaces. Although this wouldn’t look very pretty, it would solve the problem. This is because padding makes every string the same length, and a padding zero or space never sorts after a digit, so a character-by-character comparison gives the same result as a numeric one, like so (if using spaces):
  1
  2
  3
 10
 20
 30
100
200
300

The TimeSpan output to a string is already padded, as it’s in an HH:MM:SS.SSS format. So, if my time difference was 1 minute, 10 seconds, it would read 00:01:10.000. If I then have a difference of 1 minute, 9 seconds, it would read 00:01:09.000, and so it would be sorted below the first when in descending order. This is the way I did it, because I’m attempting to make a log summarizer. A user would need to read through this summary quickly, so having fields of equal length makes reading it easy. Also, it doesn’t give the summary a ‘staircase’ look (like the example above).

Alternatively, one could convert the string to an int. This would be a good solution, because LINQ’s OrderBy and OrderByDescending would then sort the values numerically. However, TimeSpan or padding may work just as well. It’s a case of figuring out what’s best for your scenario.
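To make the difference concrete, here is a minimal sketch of the three approaches. The class and method names (DurationSortSketch, SortAsText and so on) are made up for illustration and are not from my project. It assumes every value is either a valid integer or a TimeSpan string in the hh:mm:ss.fff format.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper class, for illustration only.
public static class DurationSortSketch
{
    // Sorts the values as text, comparing character codes (ordinal order).
    public static List<string> SortAsText(IEnumerable<string> values)
    {
        return values.OrderByDescending(v => v, StringComparer.Ordinal).ToList();
    }

    // Sorts by numeric value; assumes every string is a valid integer.
    public static List<string> SortAsNumbers(IEnumerable<string> values)
    {
        return values
            .OrderByDescending(v => int.Parse(v, CultureInfo.InvariantCulture))
            .ToList();
    }

    // Parses each string back into a TimeSpan before sorting, so the
    // order no longer depends on how the text happens to be padded.
    public static List<string> SortAsDurations(IEnumerable<string> values)
    {
        return values
            .OrderByDescending(v => TimeSpan.Parse(v, CultureInfo.InvariantCulture))
            .ToList();
    }
}
```

With the input "1", "10", "100", "2", "20", SortAsText returns 20, 2, 100, 10, 1, while SortAsNumbers returns 100, 20, 10, 2, 1. The trade-off is that parsing will throw on malformed input, whereas sorting the padded text never does.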

…and that’s all I know at this point. Please feel free to give suggestions/comments.

Thanks for reading!

How To Programmatically Unzip A .Zip File

Hello everyone,

Since writing my last post on the log parser, I’ve added some new functionality and refactored the project. One of those features is a method for unzipping files in the .zip format, as this was something that was taking a long time to do manually for hundreds of files. I found this post on using Windows Shell, which fit my needs just fine in only a few lines of code. However, if anyone has any suggestions on why this wouldn’t be good or knows of a more elegant way, please feel free to leave a comment. The efficiency seems to be just fine for my needs, but I wouldn’t mind tips for security or more efficiency.

public static void Unzip(string inputFile)
{
    dynamic shellApplication = Activator.CreateInstance(Type.GetTypeFromProgID("Shell.Application"));
    dynamic compressedFolderContents = shellApplication.NameSpace(inputFile).Items;
    dynamic destinationFolder = shellApplication.NameSpace(Path.GetDirectoryName(inputFile));
    destinationFolder.CopyHere(compressedFolderContents);
}
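For comparison, here is a minimal sketch of the same idea using the ZipFile class from System.IO.Compression instead of Windows Shell. This is not what my project uses; the class name ZipFileUnzipper is made up for illustration, and it assumes .NET Framework 4.5 or later with a reference to the System.IO.Compression.FileSystem assembly.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical alternative to the Shell-based Unzip method above.
public static class ZipFileUnzipper
{
    // Extracts every entry of the archive into the folder that contains
    // the archive, which is the same destination the Shell version uses.
    public static void Unzip(string inputFile)
    {
        string destination = Path.GetDirectoryName(Path.GetFullPath(inputFile));
        ZipFile.ExtractToDirectory(inputFile, destination);
    }
}
```

One difference to be aware of: ExtractToDirectory throws an IOException if a file with the same name already exists in the destination, so a real implementation would need to decide how to handle that.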

I implemented multi-select as well, so I have to loop through these selected files in order to unzip them. Also, just for cleanliness, I delete the original zip file after unzipping completes.

using (var oFile = new OpenFileDialog())
{
    oFile.Filter = "Log Files (*.log*)|*.log*";
    oFile.Multiselect = true;
    var result = oFile.ShowDialog();
    if (result == DialogResult.OK)
    {
        inputFile = oFile.FileNames;
        totalNumberOfFiles = inputFile.Count();
        for (int i = 0; i < inputFile.Count(); i++)
        {
            // ONLY FILES WITH A .ZIP EXTENSION NEED EXTRACTING
            if (inputFile[i].EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                string path = Path.GetDirectoryName(inputFile[i]);
                Unzipper.Unzip(inputFile[i]);
                File.Delete(inputFile[i]); // DELETES ZIP FILE
                inputFile[i] = Path.Combine(path, Path.GetFileNameWithoutExtension(inputFile[i])); // SETS THE INPUT FILE TO THE UNZIPPED FILE
            }
        }
        return inputFile;
    }
    else
    {
        Console.Clear();
        Console.WriteLine("A file was not selected. Press any key to close...");
        Console.ReadKey(true);
        Environment.Exit(0);
        return null;
    }
}

…and that’s all for today. Leave a comment if you have any questions or suggestions on how to make this code better.

Thanks for reading!

How To Parse A Log File

Hello everyone,

I hope everyone’s Independence Day was a great one!

Recently, I’ve had the opportunity to write a console application that would allow a user to select logs which would then be parsed and summarized for easier viewing. I’m writing on this topic hoping that it will help someone along the way if they encounter a similar issue. Also, I would like some input from others who may have done this in a more efficient or elegant way.

First and foremost, there’s the user interface. Considering this is a console application, it’s a pretty simple thing to do. I started by using the console application template in Visual Studio, and used Console.WriteLine("String here!") to display a message for the user to select a log file. To make sure users would read this and know what the application was for, I utilized Console.ReadKey(true) to ensure the app would pause until a key is pressed. I then used OpenFileDialog() to allow the user to select a log file (note: you must reference System.Windows.Forms and add a using directive for it to utilize this). To guard against incorrect selection, one can use Filter (see OpenFileDialog link) to show only the file types they want; in this case I chose files with the ‘.log’ extension. Then, when the dialog returns DialogResult.OK, I store the name of the selected file in a variable for later use. Also, I have it show a message in case a file is not selected, and then close the application.

Console.WriteLine("Please select a log file to summarize");
Console.WriteLine("- Press any key to browse for .log file...");
Console.ReadKey(true);
using (var oFile = new OpenFileDialog())
{
    oFile.Filter = "Log Files (*.log)|*.log";
    oFile.Multiselect = false;
    var result = oFile.ShowDialog();
    if (result == DialogResult.OK)
    {
        inputFile = oFile.FileName;
        path = Path.GetDirectoryName(inputFile) + "\\";
    }
    else
    {
        Console.Clear();
        Console.WriteLine("A file was not selected. Press any key to close...");
        Console.ReadKey(true);
        Environment.Exit(0);
    }
}

That wraps up the UI, so let’s get to the meat of this. For the most part, the log files look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

Or it could look like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag><Message>Some message here.</Message>
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

…or like this:

7/07/14 3:05:35 AM SERVERNAME1
Message: Something happened here.<Description> Some descriptive stuff here.</Description><SomeOtherTag></SomeOtherTag>The exception message is: Some message here.
    at Some.Where
    at Out.There
Module: Some.Service.dll
Severity: Error

The requirements state that only the message and description are needed (without the tags). There’s also another requirement that only Severity: Error messages be summarized, and all others ignored. With that in mind, I utilized StreamReader and used File.OpenText() to open the file. I then looped through the file line by line, adding each line to a temporary list. Once it reached a line with ‘Severity: ‘ in it, a decision would be made: if the line doesn’t contain ‘Error’, clear the temporary list; if the line does contain ‘Error’, copy the temporary list’s lines to a second list and then clear it. I did this so that I could easily separate the errors from the other log items. EDIT: Matt Groves pointed out that I was not disposing of StreamReader when I’d finished with it, which is a no-no. I’ve implemented a using block for StreamReader so it will be disposed of automatically. For more information on disposing, refer to the Remarks section of the StreamReader MSDN page.

List<string> errorList = new List<string>();
using (StreamReader reader = File.OpenText(inputFile))
{
    string line;
    // HOLDS THE LINES OF THE LOG ENTRY CURRENTLY BEING READ
    List<string> tempList = new List<string>();
    // ReadLine() ALREADY RETURNS ONE LINE AT A TIME, SO NO SPLITTING IS NEEDED
    while ((line = reader.ReadLine()) != null)
    {
        tempList.Add(line);
        if (line.Contains("Severity: ") && (!line.Contains("Error")))
        {
            // NOT AN ERROR: DISCARD THE ENTRY
            tempList.Clear();
        }
        if (line.Contains("Severity: Error"))
        {
            // AN ERROR: KEEP THE WHOLE ENTRY, THEN START A NEW ONE
            errorList.AddRange(tempList);
            tempList.Clear();
        }
    }
}

Now that I have the right items, I can get exactly what I want out of them. This data looks quite a bit like XML, so I attempted to use XML parsing techniques. However, I’d have to remove all the non-XML data for that to work easily, so I moved on. I then tried Regex, but that wasn’t really made to sift through XML-like data. I ended up landing on IndexOf() and Substring(), thanks to a StackOverflow user, and it works well. IndexOf() finds the index of a character or substring within a string (or returns -1 if it isn’t found), and Substring() returns the part of the string that starts at a given index and runs for a given length.

int descriptionBegin = 0;
int descriptionEnd = 0;
int messageBegin = 0;
int messageEnd = 0;
descriptionBegin = j.IndexOf("<Description>");
descriptionEnd = j.IndexOf("</Description>") - 13;
messageBegin = j.IndexOf("<Message>");
messageEnd = j.IndexOf("</Message>") - 9;
int descriptionDiff = 0;
int messageDiff = 0;
if (j.Contains("<Message>") && (!j.Contains("</Message>")))
{
    messageEnd = j.Length - messageBegin;
    descriptionDiff = descriptionEnd - descriptionBegin;
    messageDiff = messageEnd - messageBegin;
    string description = j.Substring(descriptionBegin + 13, descriptionDiff);
    string message = j.Substring(messageBegin + 9, messageEnd - 9);
    taglessList.Add("Description: " + description + " " + "Message: " + message);
}
else if (j.Contains("<Message>") && (j.Contains("</Message>")))
{
    descriptionDiff = descriptionEnd - descriptionBegin;
    messageDiff = messageEnd - messageBegin;
    string description = j.Substring(descriptionBegin + 13, descriptionDiff);
    string message = j.Substring(messageBegin + 9, messageDiff);
    taglessList.Add("Description: " + description + " " + "Message: " + message);
}
// SOME ERRORS DIDN'T HAVE <MESSAGE> TAGS, BUT DID HAVE ERROR MESSAGES
if (!j.Contains("<Message>"))
{
    if (j.Contains("<Description>"))
    {
        messageBegin = j.IndexOf("The exception message is: ");
        messageEnd = j.IndexOf(".. ---&gt; System") - 25;
        descriptionDiff = descriptionEnd - descriptionBegin;
        messageDiff = messageEnd - messageBegin;
        string description = j.Substring(descriptionBegin + 13, descriptionDiff);
        string message = j.Substring(messageBegin + 26, messageDiff);
        taglessList.Add("Description: " + description + " " + "Message: " + message);
        continue;
    }
    taglessList.Add(j);
}

Essentially, I’m finding the index of each tag, adjusting for the length of the tag so the tag itself isn’t picked up, and then having Substring() grab the characters between those indexes. It seems a bit complex, but this is honestly the simplest way I could find to do this. I’ve always wanted to follow the idea of “Don’t be clever; be simple”, but I could find no easier alternative in this case.
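The same IndexOf() and Substring() idea can be wrapped in one small helper so the tag-length arithmetic lives in a single place. This is only a sketch and not the code from my project: the names TagTextSketch and ExtractBetween are made up for illustration, and it assumes the tags are matched exactly (case-sensitive).

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class TagTextSketch
{
    // Returns the text between openTag and closeTag.
    // Returns null when openTag is missing, and everything after
    // openTag when closeTag is missing.
    public static string ExtractBetween(string source, string openTag, string closeTag)
    {
        int open = source.IndexOf(openTag, StringComparison.Ordinal);
        if (open < 0)
        {
            return null;
        }

        // Skip past the opening tag so it isn't part of the result.
        int start = open + openTag.Length;
        int end = source.IndexOf(closeTag, start, StringComparison.Ordinal);
        if (end < 0)
        {
            return source.Substring(start);
        }

        return source.Substring(start, end - start);
    }
}
```

Calling TagTextSketch.ExtractBetween(j, "<Message>", "</Message>") would then cover both the closed and the unclosed message cases, and passing "The exception message is: " as the opening marker handles entries without a message tag.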

There are also a couple of final requirements: only one instance of each error should be shown in the summary, along with a count of how many times that error occurred in the selected log. Also, the list should be sorted by count, and then alphabetically. I solved these with a little bit of LINQ.

List<string> distinctList = taglessList.Distinct().ToList();
List<DupedItem> finalList = new List<DupedItem>();
// COUNTS DUPES AND ADDS ONLY DISTINCT ITEMS
foreach (string k in distinctList)
{
    int dupeCount = 0;
    foreach (string l in taglessList)
    {
        if (l == k)
        {
            dupeCount++;
        }
    }
    finalList.Add(new DupedItem
    {
        Error = k,
        dupeCount = dupeCount
    });
}
// SORTS ITEMS BY AMOUNT OF DUPES, THEN ALPHABETICALLY
List<DupedItem> fL = finalList.OrderByDescending(d => d.dupeCount).
                     ThenBy(d => d.Error).ToList();

for (int i = 0; i < fL.Count(); i++)
{
    File.AppendAllText(path + outputFile, "[Total: " + fL[i].dupeCount + "] " + fL[i].Error + nl);
}

Distinct() removes all duplicate instances from a sequence, and I put its output into a list. I then compare each item in the distinct list to the original list, and use dupeCount to count how many times it occurs. I created the DupedItem class to hold both the string and the dupeCount, used LINQ’s OrderByDescending() and ThenBy() to sort by dupeCount and then alphabetically, and then appended this data to a summarized log using AppendAllText.
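As a possible alternative, LINQ’s GroupBy() can do the de-duplicating and counting in one pass. The following is a minimal sketch rather than my project’s code: the class name ErrorSummarySketch and the method Summarize are made up for illustration, and it returns the summary lines instead of writing them to a file.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class ErrorSummarySketch
{
    // Groups identical errors, counts each group, sorts by count
    // (highest first) and then alphabetically, and formats each line
    // the same way as the summarized log above.
    public static List<string> Summarize(IEnumerable<string> errors)
    {
        return errors
            .GroupBy(e => e)
            .Select(g => new { Error = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count)
            .ThenBy(x => x.Error)
            .Select(x => "[Total: " + x.Count + "] " + x.Error)
            .ToList();
    }
}
```

The resulting lines could then be written in one call with File.WriteAllLines instead of appending to the file once per error.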

Well, that wraps this one up! Please leave comments below or on Facebook/Twitter/LinkedIn if you have any questions or suggestions.

Thanks for reading!

It Seems I Enjoy Peril

For the most part, the last post had described how far I’d gotten in that ASP.NET project. I’m not done by any means, but I had to move on because I’d gotten an opportunity to write a desktop app. This new application would need to accept text that consists only of digits and is always either 8 or 9 characters long, show two columns for viewing the data that was entered, and associate all 9 character items with the last 8 character item that was entered. This is because the program will be used to scan boxes (8 character barcode on the front) and files within those boxes (9 character barcode on them). Also, the PC this would be used on would be remote and not have easy access to an internal database, so it would need the ability to store this data locally. With that in mind, we decided to export the data into a CSV file in the exact same format as it is in my app.

Considering I’d not done a desktop app yet, I immediately opened Visual C# 2010 Express that I’d already installed on my machine and looked at what project types were available. I saw WPF and Windows Forms, and since I had no idea of the difference between the two, I looked them up. I was on a tight schedule, so I didn’t read through all the pages on either (I had about a week to make solid progress). Instead, I searched for a comparison of the two, and came up with this. Considering some of the people on this post insisted that Windows Forms was easy to get started with and well documented, I decided to go with it. However, if I had noticed that this post was 5 years old, I would’ve rethought it. Many of my buddies at Code and Coffee have suggested that this may have been easier if WPF was used, but since I’ve already done most of the project in Windows Forms, I’ll still write about it regardless.

Moving on, I started my Windows Forms project. Right at the very start, you’re given an empty window. One very interesting part of Windows Forms as opposed to Web Forms is the Properties pane down in the bottom right. In this area, you can add and remove a bunch of different things to your Windows Form (you can do this in Web Forms, but it seems to be a bit less useful in my opinion). Also, just like in other projects that have a UI, you can add controls using the toolbox. I generally don’t use this just so I know how to do it manually and understand it more, but considering my limited amount of time, I decided to use it. I added a TreeView, since I figured that would show a good parent/child relationship between the boxes and files. So I started in the same way my other project went, where I created a list of strings and started trying to find a way to add it to the TreeView. TreeView uses Nodes as its data containers, so I was looking for ways to add items to them.

This is simple enough though: I did a foreach loop that selected each item in my list and added them using treeview1.Nodes.Add() (where treeview1 is the name of your TreeView). However, when you add an item, it adds it as a root node (as if it’s the top of the tree). I wanted to differentiate between 8 and 9 character items and make 9 character items the child items. So then I looked into child nodes and ran into a brick wall in the form of a lack of time. If you observe this post as a noob, you’ll notice that it has a bit of complexity to it. I had no time for complexity, so I bailed on the TreeView idea and thought a bit further. I then thought it may be a good idea to use a CheckedListBox and change the font style or color when the item is 8 characters long to easily show the difference between the two different kinds of items. However, after searching for quite some time and only finding things like this, I decided to try another route.

I went and talked to my boss about what’s really necessary for this project to be successful. After clarifying, all I needed was for 8 character items to go into a left column and for 9 character items to go into a right column. Since they’ll be scanned sequentially, I don’t really need any checking; just a raw data feed in a simple list. If there are any issues, they can easily be sorted out afterwards in the CSV output file. Since that sounded decently easy, I added a ListView. Then, using the Properties box in the bottom right, I added two columns (calling them Boxes and Files) and set GridLines to true (just to make it look better than a blank box). I then used a for loop to run through my list and tried to use an if statement that would add the items to their respective column based on length. The columns have an index number, so I hoped that by just trying to add the items to the index, they would keep adding one after another in the column. However, they just kept adding to the first column and were ignoring my if statements. Below is the code, where BoxNumberRepository is the class that holds my list, _boxAndFileList.

for (int i = 0; i < BoxNumberRepository._boxAndFileList.Count; i++)
{
    var item = BoxNumberRepository._boxAndFileList[i];
    if (item.Length == 8)
    {
        BoxAndFileList.Items.Insert(0, item);
    }
    else
    {
        BoxAndFileList.Items.Insert(1, item);
    }
}
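One likely reason for this behavior is that the first argument to Items.Insert() is the position of the row in the list, not a column index; the text of a second column lives in a row’s sub-items. The sketch below is not necessarily the workaround I ended up with. It only builds the row data, with hypothetical names (BoxFileRowsSketch, BuildRows), assuming each scanned item gets its own row.

```csharp
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class BoxFileRowsSketch
{
    // Builds one two-column row per scanned item: column 0 holds an
    // 8 character box number, column 1 holds a 9 character file number.
    public static List<string[]> BuildRows(IEnumerable<string> scannedItems)
    {
        var rows = new List<string[]>();
        foreach (string item in scannedItems)
        {
            if (item.Length == 8)
            {
                rows.Add(new[] { item, string.Empty });
            }
            else
            {
                rows.Add(new[] { string.Empty, item });
            }
        }
        return rows;
    }
}
```

In a Windows Forms project, each row could then be handed to the ListViewItem constructor that takes a string array, for example BoxAndFileList.Items.Add(new ListViewItem(row)), with the ListView’s View property set to Details so both columns show.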

Next week, I’ll continue to talk about this project and what I did to work around this problem. Until then, have a good week, and thanks for reading!

Learning New (And Secure) Tricks

Last time, I’d explained the start of my form, where I’m now adding items to a list and then adding those items to a checkboxlist. I also said in the last post that I wanted to give users the ability to delete items from the list in case any mistakes were made. This one actually wasn’t too hard to do, but it was difficult at first because I didn’t know the syntax to get it done.

When the Remove button is clicked, I do a foreach loop through the checkboxlist. For each checkbox that’s selected, I remove the item from the list _items based on the checkbox item’s value. Note: when I first did this, I was trying to remove the items from the checkboxlist. That didn’t work, because every time the page reloads, all the items from _items repopulate the checkboxlist, so the removed item just comes right back. It has to be deleted from the source for it to work.

void btn_remove_Click(object sender, EventArgs e)
        {
            ItemNumberRepository _myTempDatabase = new ItemNumberRepository();
            foreach (ListItem checkbox in UPCList.Items)
            {
                if (checkbox.Selected)
                {
                    _items.Remove(checkbox.Value);
                }
            }
            PopulateItemsList();
        }

At this point, I had the ability to add and delete. I’m done, right?… Wrong. Considering this information is pretty sensitive stuff, I wanted to be able to keep track of when items were added and by whom. Also, since two people were going to be keying this information in, I needed a centralized database for them both to add to. Thankfully, we had an instance of MS SQL that wasn’t being used heavily, so I started asking questions about how I could add my items to a SQL table. There are some frameworks that help, but Matt Groves encouraged me to start humbly by using ADO.NET. That way I know exactly how it all works without being pampered by a framework.

I first started by getting a background of how and why ADO.NET was created. Wikipedia has a pretty good post about it, but I didn’t confirm the resources, so reader beware. I then took a look at the ADO.NET Overview on MSDN, which has many links to great resources that’ll teach about all you’d want to know about it. Also, a quick search through Pluralsight’s library can bring up many good examples and resources.

After wrapping my head around the fundamentals, I jumped in and started writing. I didn’t want to hit the database too much, so I decided to do a final submit that would send the entire list in one shot instead of sending each item individually. However, my boss wants me to have a backup in case the machine crashes in the middle of data entry, so I may go with adding individually if I can’t find a better solution. Enough about that though; on to the code! First, you need a connectionString that specifies what server and database to connect to. This can be put in the config of the project, but I’ve been lazy and just haven’t done it yet. Next, I do a for loop to run through all the items in the checkboxlist and run a command that adds each item to the table, along with the account number that’s typed into the AcctNum field.

protected void final_submit_Click(object sender, EventArgs e)
        {
            // "server", "database", "userid" AND "password" ARE PLACEHOLDERS
            string connectionString =
            "Data Source=server;Initial Catalog=database;"
            + "User Id=userid;"
            + "Password=password;";

            using (SqlConnection connection = new SqlConnection(connectionString))
            {
                for (int i = 0; i < UPCList.Items.Count; i++)
                {
                    string acctnum = AcctNum.Text;
                    // EACH PIECE ENDS WITH A SPACE OR SEMICOLON SO THE JOINED QUERY STAYS READABLE AND VALID
                    string finalSubmit =
                        "INSERT INTO Boxes (BoxNumber) "
                        + "VALUES (@boxnumber); "
                        + "INSERT INTO DestructionOrder (AccountNumber) "
                        + "VALUES (@accountNumber);";

                    SqlCommand command = new SqlCommand(finalSubmit, connection);
                    command.Parameters.AddWithValue("@boxnumber", UPCList.Items[i].ToString());
                    command.Parameters.AddWithValue("@accountNumber", acctnum.ToString());
                    command.Connection.Open();
                    command.ExecuteNonQuery();
                    command.Connection.Close();
                }
            }
        }

The variables @boxnumber and @accountNumber are parameters that are passed in instead of the live data from the textboxes. This will stop people from being able to inject SQL into my form and mess around with my tables. More info about this and many other security issues can be found in Keith Brown’s ASP.NET Security Pluralsight course.

The insert here is pretty simple: you specify which table (Boxes) and which column (BoxNumber) the data (@boxnumber) should go into (FYI: The pluses(+) are necessary if you decide to put your query on more than one line. They’re not necessary if you put your entire query on the same line). After that, you have to open your connection, execute the command and then close the connection. I would explain why closing the connection immediately is necessary in more detail, but I think Keith Brown does an amazing job at it, so just go watch that.

Well, that’s it for this week. I’m writing this to help people out where I’ve struggled in the past, so if you know anyone that’s new to ASP.NET, webforms, or ADO.NET, please share this with them…and once again, thanks for reading!

Never fear, help is here!

Last time, I’d just realized I didn’t know enough to get the ball rolling on my new ASP.NET Web Forms project. I’d begun to watch some videos on Pluralsight to try to get acquainted with the fundamentals, but lost focus quickly and couldn’t retain very much information. I was going nowhere fast, and really needed to do well on this project so I could amaze my boss and company and get more opportunities in this field.

Considering going to school would’ve taken quite a bit of time (I’m still considering going anyway), I decided to ask for help. Utilizing all the contacts I had on Twitter, I put out a broad tweet for some help. I’d gotten replies from quite a few, but went with the first guy that I was most acquainted with, which was Matt Groves. He and I utilized Skype to speak to each other, and Join.Me for him to give me direct advice. I also received some help from Jonathan Stevens and John Nastase from CodeandCoffee, looked up some things myself, found answers on StackOverflow, and posted some questions myself. All this help and information went into making the application that I’ll be explaining in this post and some of the ones that follow. Without further ado, here goes…

I wanted to give my users the ability to add items to a list and then select and delete them if any mistakes were made. Through some searching, I’d found the CheckBoxList class, which is made of a list and check boxes beside each item that can select multiple items individually. This has to bind to a data source to display the items of that source, so Matt showed me how to add items to a list of strings and then return the items to the CheckBoxList to display them.

The following code shows the button click event for my add button on my form. On this event, I’m setting the text in a field of my form as a variable (so I don’t have to type FRCID.Text every time), then using Trim() to remove all the leading and trailing whitespace from the text input (just in case someone accidentally hits the spacebar when inputting). I then run some validation that makes sure the data being put in is exactly 8 characters long (won’t go into too much detail with this; let me know if you want more info and I’ll elaborate. I will say 2 things though: RegularExpressionValidator and JavaScript’s value.length).

void btn_submit_Click(object sender, EventArgs e)
        {
            // Trim() RETURNS A NEW STRING, SO THE RESULT HAS TO BE KEPT
            var itemCode = FRCID.Text.Trim();
            Page.Validate();
                if (Page.IsValid == false)
                {
                    Response.Write("Not enough characters");
                }
                string UpcCode = itemCode.TrimNullSafe();
                

Anyway, moving on. In the next area, I had an if statement that was checking for null or empty values. The ! before String.IsNullOrEmpty means not, so this line can be read “if the string in itemCode is not null or empty, then move on”. It then adds the item to the list _items and calls PopulateItemsList. In that, the checkboxlist is cleared (UPCList.Items.Clear()) as well as the textbox (I set FRCID.Text to an empty string, so the user doesn’t have to empty the field themselves to add another item), a new list is created if _items is still null, and then a foreach runs through all the items in _items and adds them to the checkboxlist.

if (!String.IsNullOrEmpty(itemCode))
                {
                    _items.Add(itemCode);
                    PopulateItemsList();
                }
        }
        void PopulateItemsList()
        {
            UPCList.Items.Clear();
            FRCID.Text = string.Empty;
            if (_items == null)
                _items = new List<string>(_itemNumberRepository.GetAllItems());
            foreach (var itemNumber in _items)
                UPCList.Items.Add(itemNumber);
        }

I clear the UPCList because the foreach will add the entire list in _items every time it runs. If I didn’t clear it, it would look something like this:

Type in 321321 -> click Add -> 321321 shows in checkboxlist
Type in 321322 -> click Add -> 321321, 321321, and 321322 show in checkboxlist
Type in 321323 -> click Add -> 321321, 321321, 321322, 321321, 321322 and 321323 show in checkboxlist...and so on.
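The effect can be reproduced without any web controls. In this small sketch, plain lists stand in for _items and the checkboxlist; the names RepopulateSketch and Populate are made up for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for PopulateItemsList, for illustration only.
public static class RepopulateSketch
{
    // Copies every source item into display, optionally clearing
    // display first (which is what UPCList.Items.Clear() does above).
    public static void Populate(List<string> source, List<string> display, bool clearFirst)
    {
        if (clearFirst)
        {
            display.Clear();
        }
        display.AddRange(source);
    }
}
```

Adding three items one at a time and calling Populate after each add leaves six entries in display when clearFirst is false, but only three when it is true.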

There are also a few other things that have been done to this project, but considering this post is getting a bit long, I’ll hold off until next week. Please let me know if you have any questions or comments either here or on Facebook/Twitter…and once again, thanks for reading!