Sorting like a Pro

Every one of us has used the sorting in our applications. Rarely the sorting is implemented by ourselves. In most of the cases the IComparer<T> interface and TComparison<T> functions are used.

To sort a collection the following syntax is used:

FCollection.Sort(TComparer<TMyType>.Construct(
  function (const L, R: TMyType): Integer
  begin
    Result := L.Value – R.Value;
  end));

If you are using the Spring4d framework you can omit the comparer construction and just throw the comparison function directly. Nothing to improve, right?

Multi-field sorting

When it comes to multi-field sorting (ordering) it is also simple and usually done like following:

FCollection.Sort(TComparer<TMyType>.Construct(
  function (const L, R: TMyType): Integer
  begin
    Result := L.Value1 – R.Value1;
    if Result = 0 then
      Result := L.Value2 – R.Value2;
  end));

What I do not like from readability point of view is the following things:

  • You write code to get value for the left/right operand twice (once for left, once for right);
  • You need to care about “comparer – specific” business with integer comparison result;
  • For nested sorting, you need to add the nested IF statements.

Ordering like a pro

Lets sort the list of record by 3 fields now (by ParentID field descending, then by Name case insensitive and then by ID).

type
  TRec = record
    ID, ParentID: Integer;
    Name: String;
  end;
var
  LList: IList<TRec>;

So instead of the following code:

LList.Sort(
  function (const L, R: TRec): Integer
  begin
    Result := R.ParentID - L.ParentID;
    if Result = 0 then
      begin
        Result := CompareText(L.Name, R.Name);
        if Result = 0 then
          Result := L.ID - R.ID;
      end;
  end);

we will have the following code:

LList.Sort(TComparerEx<TRec>.Create
  .OrderByDesc(
    function (const A: TRec): Integer
    begin
      Result := A.ParentID;
    end)
  .OrderBy(
    function (const A: TRec): String
    begin
      Result := A.Name;
    end)
  .OrderBy(
    function (const A: TRec): Integer
    begin
      Result := A.ID;
    end));

And the TComparerEx<T> class implements the IComparerEx<T> interface which descends from the standard IComparer<T>. Each call to the OrderBy constructs a copy of the comparer so that the source instance is not affected in case if you are passing the comparer to the outer world and want to append comparisons there. You can also use Ascending and Descending methods to get a new instance of the comparer with changed direction of the last comparison.

When you use the extended comparer, you get following advantages:

  • Better readable code as there is no comparer-specific code (in most cases);
  • More transparent for the complex types;
  • You write the value retrieve/lookup only once and not twice;
  • Unit tests are much easier for the standard data types as there is no branching and you do not need to test the TComparerEx<T> itself every time.

What you lose:

  • Some kind of compactness in a line count (subjectively negative);
  • Performance drop – not noticeable unless you sort multiple hundreds of thousands of records by 3 fields (for 600’000 records of TRec it is 1050ms vs 440ms when using standard comparison with nested IFs).

The source code for the TComparerEx is available on GitHub. You can use it on your own risk, modify it as you want – I do not give you any guarantees on that.

Delphi still lacks of support for lambda expressions. If there will be lambda support then we will be able to make sorting similar to C# or Java languages in the very nice way like:

FList.OrderByDesc(x => x.ParentID).ThenBy(x => x.Name).ThenBy(x => x.ID);

2 comments

  1. The performance drop comes from the string comparing part. While in the “all in one” delegate the compiler can generate quite efficient code
    CompareText(left.Name, right.Name);
    mov edx,[esi+$08]
    mov eax,[ebx+$08]
    call CompareText
    it generates terribly inefficient code for the OrderByCustom generated delegate. I already optimized some code to determine case sensitivity and ordering direction when creating the comparer and it still looks like this:
    Comparers.pas.480: begin
    004E4358 55 push ebp
    004E4359 8BEC mov ebp,esp
    004E435B 6A00 push $00
    004E435D 6A00 push $00
    004E435F 6A00 push $00
    004E4361 53 push ebx
    004E4362 56 push esi
    004E4363 894DFC mov [ebp-$04],ecx
    004E4366 8BF2 mov esi,edx
    004E4368 8BD8 mov ebx,eax
    004E436A 33C0 xor eax,eax
    004E436C 55 push ebp
    004E436D 68B5434E00 push $004e43b5
    004E4372 64FF30 push dword ptr fs:[eax]
    004E4375 648920 mov fs:[eax],esp
    Comparers.pas.481: Result := -CompareText(AFunc(L), AFunc(R));
    004E4378 8D55F8 lea edx,[ebp-$08]
    004E437B 8B45FC mov eax,[ebp-$04]
    004E437E FF530C call dword ptr [ebx+$0c]
    004E4381 8B45F8 mov eax,[ebp-$08]
    004E4384 50 push eax
    004E4385 8D55F4 lea edx,[ebp-$0c]
    004E4388 8BC6 mov eax,esi
    004E438A FF530C call dword ptr [ebx+$0c]
    004E438D 8B45F4 mov eax,[ebp-$0c]
    004E4390 5A pop edx
    004E4391 E83AB0F3FF call CompareText
    004E4396 F7D8 neg eax
    004E4398 8BD8 mov ebx,eax
    Comparers.pas.482: end);
    004E439A 33C0 xor eax,eax
    004E439C 5A pop edx
    004E439D 59 pop ecx
    004E439E 59 pop ecx
    004E439F 648910 mov fs:[eax],edx
    004E43A2 68BC434E00 push $004e43bc
    004E43A7 8D45F4 lea eax,[ebp-$0c]
    004E43AA BA02000000 mov edx,$00000002
    004E43AF E88866F2FF call @UStrArrayClr
    004E43B4 C3 ret
    004E43B5 E93E5CF2FF jmp @HandleFinally
    004E43BA EBEB jmp $004e43a7
    004E43BC 8BC3 mov eax,ebx
    004E43BE 5E pop esi
    004E43BF 5B pop ebx
    004E43C0 8BE5 mov esp,ebp
    004E43C2 5D pop ebp
    004E43C3 C3 ret
    That’s because of the implicit variable the compiler generates because of the function call.
    While this might look bad at first you would get a similar performance result if you put Name as a property with a getter function.
    Bottom line: yes, if you compare fields of a record or directly field backed properties you might get a performance drop – but that only results in the indirection created by the anonymous “getter” functions.
    In my test with 600k records and Name being a property backed by a function getter I got almost equal results of both ways (including my aforementioned optimizations).

    Liked by 1 person

  2. First of all, i like the idea. If Delphi would support lambda expressions you would get rid of the line overhead (permanently declaring function bla bla bla). One thing i have to tell is about the following code snippet you posted:
    LList.Sort(
    function (const L, R: TRec): Integer
    begin
    Result := R.ParentID – L.ParentID;
    if Result = 0 then
    begin
    Result := CompareText(L.Name, R.Name);
    if Result = 0 then
    Result := L.ID – R.ID;
    end;
    end);
    I am not sure but i think you can just skip the nested if like:
    LList.Sort(
    function (const L, R: TRec): Integer
    begin
    Result := R.ParentID – L.ParentID;
    if Result = 0 then
    Result := CompareText(L.Name, R.Name);
    if Result = 0 then
    Result := L.ID – R.ID;
    end);
    I mean its still not readable, but more readable than the code sample above.

    Liked by 2 people

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s