Writing a String Numeric Comparer in .NET: Handling Numeric Sorting with Spans
In this post, Khalid Abuhakmeh explores building a numeric string comparer in .NET using modern C# features like Span APIs. He shares practical code, discusses challenges, and offers guidance for developers handling complex sorting scenarios.
By Khalid Abuhakmeh
Introduction
.NET 10 introduces a numeric comparer that sorts strings containing numeric values, such as movie sequels or software version numbers. As Khalid Abuhakmeh points out, this functionality was previously absent from .NET, and implementing it yourself reveals many subtleties and edge cases, such as:
- Deciding the order of numbers versus Roman numerals
- Parsing decimals
- Determining how to handle subjective sorting scenarios
Khalid shares a custom implementation utilizing new .NET features, particularly C# Span APIs, and reflects on both its strengths and limitations.
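For comparison, the built-in option mentioned above can be sketched as follows. This assumes the .NET 10 SDK and that the feature is reached through `CompareOptions.NumericOrdering` passed to `StringComparer.Create`; the sample titles are illustrative and not taken from the original post.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Assumes .NET 10 or later, where CompareOptions.NumericOrdering is available.
var numericComparer = StringComparer.Create(
    CultureInfo.InvariantCulture,
    CompareOptions.NumericOrdering);

var titles = new[] { "Windows 11", "Windows 7", "Windows 10" };

// Digit runs are compared by numeric value instead of character by character.
foreach (var title in titles.Order(numericComparer))
{
    Console.WriteLine(title);
}
```

The custom comparer in the rest of the post is still worth reading for projects on earlier runtimes or with sorting rules the built-in option does not cover.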
Motivating Example: A List of Numbered Things
Numbers at the end of text elements are common for:
- Movies (e.g., “Godfather 3”)
- Software (e.g., “Windows 10”)
- Version numbers (e.g., “1.10”)
Sample C# code to sort such data:
    var numberedThings = new List<string> {
        "Godfather",
        "Godfather 3",
        "Scream",
        "Scream 2",
        "Scream 3",
        "Scream 1",
        "Windows 10",
        "Windows 7",
        "Windows 11",
        "Rocky 5",
        "Rocky 2",
        "Rocky 4",
        "Rocky 3",
        "Rock",
        "1.2",
        "1.3",
        "1.1",
        "Rocky",
        "Windows XP",
        "Godfather 2",
        "1.11",
        "1.10",
        "10.0",
        "1.0"
    };

    var numericOrderer = new NumericOrderer();
    var sorted = numberedThings
        .OrderBy(x => x, numericOrderer)
        .ToList();

    foreach (var item in sorted) {
        Console.WriteLine(item);
    }
Expected output:
1.0
1.1
1.2
1.3
1.10
1.11
10.0
Godfather
Godfather 2
Godfather 3
Rock
Rocky
Rocky 2
Rocky 3
Rocky 4
Rocky 5
Scream
Scream 1
Scream 2
Scream 3
Windows 7
Windows 10
Windows 11
Windows XP
Implementing NumericOrderer
Using Spans
Khalid provides an implementation using C# Span APIs for efficient string manipulation:
    using System;
    using System.Collections.Generic;

    public sealed class NumericOrderer : IComparer<string>
    {
        public int Compare(string? x, string? y)
        {
            // Two nulls are equal; a single null sorts first.
            if (x == null && y == null) return 0;
            if (x == null) return -1;
            if (y == null) return 1;

            var xSpan = x.AsSpan();
            var ySpan = y.AsSpan();

            // Strip the shared prefix so only the differing tails remain.
            var commonPrefixLength = xSpan.CommonPrefixLength(ySpan);
            while (commonPrefixLength > 0)
            {
                xSpan = xSpan[commonPrefixLength..];
                ySpan = ySpan[commonPrefixLength..];
                commonPrefixLength = xSpan.CommonPrefixLength(ySpan);
            }

            // If both tails parse as whole numbers, compare them numerically.
            if (int.TryParse(xSpan, out var xNumber) && int.TryParse(ySpan, out var yNumber))
            {
                return xNumber.CompareTo(yNumber);
            }

            // Otherwise fall back to a case-sensitive ordinal comparison.
            return xSpan.CompareTo(ySpan, StringComparison.Ordinal);
        }
    }
Considerations for Configuration
- CommonPrefixLength can accept a custom comparer (an IEqualityComparer<char>), useful for case-insensitive comparisons.
- The CompareTo method uses StringComparison.Ordinal (case-sensitive), which can be changed based on your needs.
- int.TryParse parses whole numbers only; use double.TryParse to support decimals.
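The three knobs above can be sketched together as follows. This is a minimal illustration, not the author's code: `IgnoreCaseCharComparer` and `ConfigurableNumericOrderer` are hypothetical names, and parsing with the invariant culture and `NumberStyles.Float` is an assumption about what "decimals" should mean here.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

var comparer = new ConfigurableNumericOrderer();
Console.WriteLine(comparer.Compare("rocky 2", "Rocky 10") < 0);  // True
Console.WriteLine(comparer.Compare("Item 2.5", "Item 10") < 0);  // True

// Hypothetical helper: two chars are equal when their invariant upper-case forms match.
public sealed class IgnoreCaseCharComparer : IEqualityComparer<char>
{
    public bool Equals(char a, char b) =>
        char.ToUpperInvariant(a) == char.ToUpperInvariant(b);

    public int GetHashCode(char c) => char.ToUpperInvariant(c).GetHashCode();
}

// Hypothetical variant of the comparer with all three options changed.
public sealed class ConfigurableNumericOrderer : IComparer<string>
{
    private static readonly IgnoreCaseCharComparer CharComparer = new();

    public int Compare(string? x, string? y)
    {
        if (x is null && y is null) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        var xSpan = x.AsSpan();
        var ySpan = y.AsSpan();

        // Case-insensitive shared prefix, via the custom char comparer.
        var prefix = xSpan.CommonPrefixLength(ySpan, CharComparer);
        xSpan = xSpan[prefix..];
        ySpan = ySpan[prefix..];

        // Decimal-aware parsing (assumption: invariant culture, '.' as separator).
        if (double.TryParse(xSpan, NumberStyles.Float, CultureInfo.InvariantCulture, out var xNumber) &&
            double.TryParse(ySpan, NumberStyles.Float, CultureInfo.InvariantCulture, out var yNumber))
        {
            return xNumber.CompareTo(yNumber);
        }

        // Case-insensitive fallback for non-numeric tails.
        return xSpan.CompareTo(ySpan, StringComparison.OrdinalIgnoreCase);
    }
}
```

Because the shared prefix is removed before parsing, the decimal parser only ever sees the differing tail, so this variant helps with values like "2.5" versus "10" but does not turn the comparer into a general decimal sorter.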
Edge Cases and Limitations
Certain edge cases and subjective choices demand attention, including:
- Should numbers be ordered before/after Roman numerals?
- Should Roman numerals be parsed as numbers?
- Decimals and multi-part values (e.g., “1.10”, “1.11”)
- Arbitrary string content such as “Windows XP” versus “Windows 11”
As Khalid notes, building a universal comparer is complex and often overkill. Designing for specific use cases is advisable.
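One further edge case, not listed in the original post, is worth checking before adopting a prefix-stripping comparer: when two numbers share leading digits, the prefix removal consumes those digits too, so "File 100" versus "File 11" is compared as "00" versus "1". A possible adjustment, sketched below under the hypothetical name `DigitAwareNumericOrderer`, backs the prefix up to the start of the digit run. It assumes .NET 7 or later for `char.IsAsciiDigit` and still only handles a number at the end of the string.

```csharp
using System;
using System.Collections.Generic;

var files = new List<string> { "File 100", "File 11", "File 9" };
files.Sort(new DigitAwareNumericOrderer());
Console.WriteLine(string.Join(", ", files)); // File 9, File 11, File 100

// Hypothetical variant: keeps shared leading digits as part of the number.
public sealed class DigitAwareNumericOrderer : IComparer<string>
{
    public int Compare(string? x, string? y)
    {
        if (x is null && y is null) return 0;
        if (x is null) return -1;
        if (y is null) return 1;

        var xSpan = x.AsSpan();
        var ySpan = y.AsSpan();

        var prefix = xSpan.CommonPrefixLength(ySpan);

        // Back up over shared digits so the whole number is compared,
        // e.g. "100" vs "11" instead of "00" vs "1".
        while (prefix > 0 && char.IsAsciiDigit(xSpan[prefix - 1]))
        {
            prefix--;
        }

        xSpan = xSpan[prefix..];
        ySpan = ySpan[prefix..];

        if (int.TryParse(xSpan, out var xNumber) && int.TryParse(ySpan, out var yNumber))
        {
            return xNumber.CompareTo(yNumber);
        }

        return xSpan.CompareTo(ySpan, StringComparison.Ordinal);
    }
}
```

Numbers too large for `int` fall back to the ordinal comparison in this sketch, which is another choice to revisit for a specific data set.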
Reflection on the Approach
- The Span APIs in C# are efficient for working with string segments without creating extra memory allocations.
- For production systems with more complex data, relying on actual numeric or date fields for sorting is preferable to string-based sorting.
- The shared implementation is ideal as a starting point for custom needs, but likely needs further adaptation for specific edge cases.
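The second point, sorting on real numeric fields instead of parsing strings, might look like the sketch below. The `Release` record and its property names are hypothetical and stand in for whatever structured data a production system already has.

```csharp
using System;
using System.Linq;

var releases = new[]
{
    new Release("Rocky", 5),
    new Release("Rocky", 2),
    new Release("Godfather", 3),
    new Release("Rocky", 10),
    new Release("Godfather", 2),
};

// Sort by series name first, then by the numeric installment.
var sorted = releases
    .OrderBy(r => r.Series, StringComparer.Ordinal)
    .ThenBy(r => r.Number)
    .ToList();

foreach (var r in sorted)
{
    Console.WriteLine($"{r.Series} {r.Number}");
}

// Hypothetical record: the installment number is stored as an int, not embedded in the title.
public sealed record Release(string Series, int Number);
```

With the number stored separately, no parsing or prefix handling is needed, and the ordering rules are explicit in the query.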
Conclusion
Khalid ultimately recommends customizing numeric comparers as needed and notes the utility of Span for low-allocation string work. He suggests that for more robust and reliable sorting in production code, dedicated numeric fields or predictable sortable data are preferable. Still, the presented approach offers a solid foundation for most straightforward scenarios.
About Khalid Abuhakmeh
Khalid is a developer advocate at JetBrains focusing on .NET technologies and tooling.
This post appeared first on “Khalid Abuhakmeh’s Blog”.