How To Perform A Natural Sort On A List Of Numbers

Hello everyone,

For those who have no idea what the title is about, the post will explain it in due time.

For the log parsing project I’ve written about in previous posts, I received new requirements to parse another type of log. This log type has almost nothing in common with the others, so parsing it was a whole new ball game. To keep this post focused, I’ll go into more detail on the other things I added in a later post.

These new logs contained messages associated with threads. The messages showed the initiation of an action and logged every step of that action until it succeeded or failed, along with the time each step occurred. The new requirements said I must show the time the thread started, how long the thread ran, the thread’s ID number, and any error message in the case of a failure. I gathered this data, used DateTime to parse the times, and then used the Subtract method to find the difference between the thread’s first and last messages. I then stored this TimeSpan result in my custom class object as a string, not knowing the trouble I was stumbling into.
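Here’s a minimal sketch of that parse-and-subtract step in C#. The timestamp strings and the yyyy-MM-dd HH:mm:ss.fff pattern are hypothetical, because the actual log format isn’t shown in this post. DateTime.ParseExact and DateTime.Subtract are standard .NET calls.

```csharp
using System;
using System.Globalization;

// Hypothetical timestamp pattern; the real log format is an assumption here.
const string TimestampFormat = "yyyy-MM-dd HH:mm:ss.fff";

// Hypothetical first and last message times for one thread.
DateTime first = DateTime.ParseExact("2016-03-01 09:15:02.250", TimestampFormat, CultureInfo.InvariantCulture);
DateTime last = DateTime.ParseExact("2016-03-01 09:16:12.250", TimestampFormat, CultureInfo.InvariantCulture);

// DateTime.Subtract(DateTime) returns the elapsed time as a TimeSpan.
TimeSpan duration = last.Subtract(first);

// Storing the result as a string is what later causes the sorting trouble.
string durationText = duration.ToString();
Console.WriteLine(durationText); // 00:01:10
```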

The list had to be sorted by this duration in descending order, so I used LINQ’s OrderByDescending. However, it sorted the strings as text, giving an order like 1, 10, 100, 2, 20, 200, 3, 30, 300, etc. (reversed, since I was sorting descending). This is because strings are compared character by character, essentially in ASCII order (computer-friendly), instead of by numeric value, which is natural order (human-friendly). It would have sorted just fine if my number had been stored in an int, but because it was a string, it sorted this way. Since I couldn’t change the type in the class without breaking other things, and creating another class for this one scenario seemed like overkill, I looked for other options.
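To see the problem in isolation, here’s a small C# sketch using plain digit strings in place of the real durations (the values are illustrative only). OrderByDescending with no comparer uses the default string comparer, which still orders these digit-only strings character by character.

```csharp
using System;
using System.Linq;

// Illustrative values stored as strings, as in the post's example.
string[] values = { "1", "10", "100", "2", "20", "200", "3", "30", "300" };

// Sorting the strings themselves compares them character by character,
// so "3" ranks above "200" because '3' > '2' in the first position.
string[] asText = values.OrderByDescending(v => v).ToArray();
Console.WriteLine(string.Join(", ", asText)); // 300, 30, 3, 200, 20, 2, 100, 10, 1
```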

One option is padding, with either zeroes or spaces. Although this doesn’t look very pretty, it solves the problem. It works because a space and a zero both come before the digits 1 to 9 in ASCII, so once every value is padded on the left to the same width, comparing character by character gives natural order, like so (if using spaces):
  1
  2
  3
 10
 20
 30
100
200
300
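A minimal C# sketch of the padding approach, using String.PadLeft with both spaces and zeroes. The sample values are illustrative. I’ve used StringComparer.Ordinal so the strings are compared by character code, which is what the padding argument relies on. The default culture-aware comparer may weigh spaces differently, so that’s an assumption worth checking if you rely on space padding.

```csharp
using System;
using System.Linq;

// Illustrative unpadded values.
string[] values = { "1", "10", "100", "2", "20", "200", "3", "30", "300" };

// Pad every value on the left to the width of the longest one.
int width = values.Max(v => v.Length);
string[] spacePadded = values.Select(v => v.PadLeft(width, ' ')).ToArray();
string[] zeroPadded = values.Select(v => v.PadLeft(width, '0')).ToArray();

// Ordinal comparison orders by character code: ' ' (32) and '0' (48)
// both come before '1'..'9', so equal-width strings now sort numerically.
string[] sortedSpaces = spacePadded.OrderBy(v => v, StringComparer.Ordinal).ToArray();
string[] sortedZeros = zeroPadded.OrderBy(v => v, StringComparer.Ordinal).ToArray();

Console.WriteLine(string.Join("|", sortedSpaces)); //   1|  2|  3| 10| 20| 30|100|200|300
Console.WriteLine(string.Join("|", sortedZeros));  // 001|002|003|010|020|030|100|200|300
```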

The string form of my TimeSpan was already padded, since it’s in HH:MM:SS.SSS format. So if a time difference was 1 minute, 10 seconds, it would read 00:01:10.000. If another difference was 1 minute, 9 seconds, it would read 00:01:09.000, so it would be sorted below the first in descending order. This is the approach I took, because I’m building a log summarizer: a user needs to read through the summary quickly, and fields of equal length make that easy. It also avoids giving the summary a ‘staircase’ look (like the example above).
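Here’s a rough C# sketch of producing that fixed-width format explicitly with a custom TimeSpan format string. The post doesn’t say which format call was used, so the @"hh\:mm\:ss\.fff" pattern is an assumption. Note that the default TimeSpan.ToString() omits the fractional part when it is zero, so it isn’t always the same width. The sketch also assumes every duration is non-negative and under 24 hours, because "hh" shows only the hours component and drops whole days.

```csharp
using System;
using System.Linq;

// Hypothetical durations for a few threads.
TimeSpan[] durations =
{
    TimeSpan.FromSeconds(69),        // 1 minute, 9 seconds
    TimeSpan.FromSeconds(70),        // 1 minute, 10 seconds
    TimeSpan.FromMilliseconds(4500), // 4.5 seconds
};

// Two-digit hours, minutes and seconds plus three fractional digits,
// so every string has the same width (assuming durations under a day).
string[] texts = durations.Select(d => d.ToString(@"hh\:mm\:ss\.fff")).ToArray();

// Equal-width strings sort correctly even as plain text.
string[] descending = texts.OrderByDescending(t => t, StringComparer.Ordinal).ToArray();
Console.WriteLine(string.Join(", ", descending)); // 00:01:10.000, 00:01:09.000, 00:00:04.500
```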

Alternatively, you could convert the string to an int when sorting. This is a good solution, because LINQ’s OrderBy and OrderByDescending then compare the numbers themselves and sort them in natural order. However, the fixed-width TimeSpan format or padding may work just as well. It’s a case of figuring out what’s best for your scenario.
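A minimal C# sketch of that option: parse inside the sort key, so the stored values stay strings and only the comparison is numeric. The sample data is illustrative. The second half applies the same idea to TimeSpan strings with TimeSpan.Parse. Because TimeSpan is comparable, this stays correct even for durations of a day or more, where the text form gains a day prefix.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Illustrative unpadded integer strings.
string[] counts = { "1", "10", "100", "2", "20", "200", "3", "30", "300" };

// Parse each string only for the sort key; the list itself stays strings.
string[] byNumber = counts
    .OrderByDescending(s => int.Parse(s, CultureInfo.InvariantCulture))
    .ToArray();
Console.WriteLine(string.Join(", ", byNumber)); // 300, 200, 100, 30, 20, 10, 3, 2, 1

// Same idea for durations stored via TimeSpan.ToString(): "1.02:00:00" is
// 1 day 2 hours, which plain text sorting would rank below "23:00:00".
string[] durations = { "00:01:09", "1.02:00:00", "00:01:10", "23:00:00" };
string[] byDuration = durations
    .OrderByDescending(s => TimeSpan.Parse(s, CultureInfo.InvariantCulture))
    .ToArray();
Console.WriteLine(string.Join(", ", byDuration)); // 1.02:00:00, 23:00:00, 00:01:10, 00:01:09
```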

…and that’s all I know at this point. Please feel free to give suggestions/comments.

Thanks for reading!
