Custom sort for Linked Lists

Share your advanced PureBasic knowledge/code with the community.
zikitrake
Addict
Addict
Posts: 834
Joined: Thu Mar 25, 2004 2:15 pm
Location: Spain

Re: Custom sort for Linked Lists

Post by zikitrake »

Thank you very much :!: You saved my day. Really useful functions.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Many thanks, Fred! :-)

zikitrake, I'm glad that the functions are useful for you.
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

New version 1.22, 2019-01-31

Changes
  • Simplified and improved the MergeSort routine, so that it's even faster now.
    As a consequence, the default value of sInsertionSortMaxSize was reduced from 50 to 35.
  • The procedure SetInsertionSortMaxSize() now returns the previous value of sInsertionSortMaxSize for convenience.
  • Some cosmetic changes.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Custom sort for Linked Lists

Post by wilbert »

Did you consider using a (natural) bottom-up merge sort instead of top-down ?
Bottom-up is non-recursive.
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

wilbert wrote:Did you consider using a (natural) bottom-up merge sort instead of top-down ?
Thanks for your suggestion!
Yes, I considered using a bottom-up merge sort. Comparing its speed with the speed of the existing version is on my ToDo list.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Custom sort for Linked Lists

Post by wilbert »

Little John wrote:Speed test

The following code demonstrates that the hybrid sorting algorithm is considerably faster than using self-written MergeSort alone. On my system, 35 is a good threshold value.
I got some strange results when I ran your speed test.
I found out it had to do with the copies of lists that you are using. Since they are located in different parts of memory, one list may be in a better place compared to another one.
When I changed your code so every test is done with the same list, the results were very different on my computer.

Code: Select all

;-============  Structures  ============

Structure Person
  Idx.i
  Name.s
  Age.i
EndStructure


Procedure CreateRandomStructures (List x.Person(), n.i)
  Protected m$, a, i, r=Int(n/2)
  
  For i = 1 To r
    m$ = RandomString(Random(1000, 200))
    a = Random(80, 20)
    
    AddElement(x())
    x()\Idx = i*2 - 1
    x()\Name = m$
    x()\Age = a
    
    AddElement(x())
    x()\Idx = i*2
    x()\Name = m$
    x()\Age = a + 10
  Next
EndProcedure

Procedure.i ComparePersons (*pa.Integer, *pb.Integer, mode.i)
  ; -- custom comparison function of type 'ProtoCompare'
  ; in : *pa, *pb: pointers to pointers to structures to be compared
  ;      mode    : mode of comparison:
  ;                #PB_Sort_Ascending/#PB_Sort_Descending
  ; out: return value: #True/#False
  Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers passed as parameters
  
  If (mode & #PB_Sort_Descending)
    ProcedureReturn Bool(*a\Name < *b\Name)
  Else
    ProcedureReturn Bool(*a\Name > *b\Name)
  EndIf
EndProcedure


Define u0, u1, u2, u3, u4

NewList p0.Person()
CreateRandomStructures(p0(), n)

; --------------------------------------------------------------------------

threshold = 35

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
u0 = ElapsedMilliseconds()
SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Name), TypeOf(Person\Name))
u0 = ElapsedMilliseconds() - u0

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(2 * threshold)
u1 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u1 = ElapsedMilliseconds() - u1

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(threshold)
u2 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u2 = ElapsedMilliseconds() - u2

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(0.5 * threshold)
u3 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u3 = ElapsedMilliseconds() - u3

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(1)
u4 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u4 = ElapsedMilliseconds() - u4
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

Hi wilbert!
wilbert wrote: I got some strange results when I ran your speed test.
I found out it had to do with the copies of lists that you are using. Since they are located in different parts of memory, one list may be in a better place compared to another one.
I wasn't aware of that problem. Thank you for your investigations!

Now I also did a speed test the way you proposed (with PB 5.70 x64 on Windows 10). Here is the complete executable code I used:

Code: Select all

EnableExplicit

XIncludeFile "SortListCustom.pbi"

CompilerIf #PB_Compiler_Debugger
   MessageRequester("Error",
                    "Switch the Debugger off!",
                    #PB_MessageRequester_Error)
   End
CompilerEndIf


Procedure.s RandomString (length.i)
   Protected *char.Character, ret$=Space(length)
   
   *char = @ ret$
   While *char\c <> 0
      If Random(1) = 0
         *char\c = Random('Z', 'A')
      Else
         *char\c = Random('z', 'a')
      EndIf
      *char + SizeOf(Character)
   Wend
   
   ProcedureReturn ret$
EndProcedure


;-============  Structures  ============

Structure Person
  Idx.i
  Name.s
  Age.i
EndStructure


Procedure CreateRandomStructures (List x.Person(), n.i)
  Protected m$, a, i, r=Int(n/2)
 
  For i = 1 To r
    m$ = RandomString(Random(1000, 200))
    a = Random(80, 20)
   
    AddElement(x())
    x()\Idx = i*2 - 1
    x()\Name = m$
    x()\Age = a
   
    AddElement(x())
    x()\Idx = i*2
    x()\Name = m$
    x()\Age = a + 10
  Next
EndProcedure

Procedure.i ComparePersons (*pa.Integer, *pb.Integer, mode.i)
  ; -- custom comparison function of type 'ProtoCompare'
  ; in : *pa, *pb: pointers to pointers to structures to be compared
  ;      mode    : mode of comparison:
  ;                #PB_Sort_Ascending/#PB_Sort_Descending
  ; out: return value: #True/#False
  Protected.Person *a = *pa\i, *b = *pb\i     ; dereference the pointers passed as parameters
 
  If (mode & #PB_Sort_Descending)
    ProcedureReturn Bool(*a\Name < *b\Name)
  Else
    ProcedureReturn Bool(*a\Name > *b\Name)
  EndIf
EndProcedure


Define u0, u1, u2, u3, u4
Define msg$, threshold, n = 500000

NewList p0.Person()
CreateRandomStructures(p0(), n)

; --------------------------------------------------------------------------

threshold = 35

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
u0 = ElapsedMilliseconds()
SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Name), TypeOf(Person\Name))
u0 = ElapsedMilliseconds() - u0

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(2 * threshold)
u1 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u1 = ElapsedMilliseconds() - u1

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(threshold)
u2 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u2 = ElapsedMilliseconds() - u2

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(0.5 * threshold)
u3 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u3 = ElapsedMilliseconds() - u3

SortStructuredList(p0(), #PB_Sort_Ascending, OffsetOf(Person\Idx), TypeOf(Person\Idx))
CS::SetInsertionSortMaxSize(1)
u4 = ElapsedMilliseconds()
CS::SortListAny(p0(), @ ComparePersons())
u4 = ElapsedMilliseconds() - u4

; --------------------------------------------------------------------------

msg$ + "-- Sorting structures" + #LF$ +
       "u0 = " + StrD(u0/1000,3) + " Sec. (built-in MergeSort)" + #LF$ +
       "u1 = " + StrD(u1/1000,3) + " Sec."   + #LF$ +
       "u2 = " + StrD(u2/1000,3) + " Sec. <" + #LF$ +
       "u3 = " + StrD(u3/1000,3) + " Sec."   + #LF$ +
       "u4 = " + StrD(u4/1000,3) + " Sec. (self-written MergeSort alone)"

MessageRequester("Duration", msg$)
wilbert wrote: When I changed your code so every test is done with the same list, the results were very different on my computer.
Here are the results on my system with both methods.
Results of my original test wrote:-- Sorting structures
u0 = 0.396 Sec. (built-in MergeSort)
u1 = 1.214 Sec.
u2 = 1.177 Sec. <
u3 = 1.197 Sec.
u4 = 2.679 Sec. (self-written MergeSort alone)
Results of your test wrote:-- Sorting structures
u0 = 0.424 Sec. (built-in MergeSort)
u1 = 1.221 Sec.
u2 = 1.198 Sec. <
u3 = 1.234 Sec.
u4 = 2.742 Sec. (self-written MergeSort alone)
The results are pretty similar. Especially using the proposed threshold (u2) is with both methods faster than using a bigger threshold (u1) or a smaller one (u3).

What are the results that you get on macOS with both my original and your new method?
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Custom sort for Linked Lists

Post by wilbert »

Your original code.
-- Sorting strings
t0 = 0.667 Sec. (built-in MergeSort)
t1 = 2.530 Sec.
t2 = 2.249 Sec. <
t3 = 1.612 Sec.
t4 = 1.829 Sec. (self-written MergeSort alone)

-- Sorting structures
u0 = 3.444 Sec. (built-in MergeSort)
u1 = 1.913 Sec.
u2 = 1.844 Sec. <
u3 = 1.807 Sec.
u4 = 2.197 Sec. (self-written MergeSort alone)
The changed version
-- Sorting structures
u0 = 0.383 Sec. (built-in MergeSort)
u1 = 1.105 Sec.
u2 = 1.050 Sec. <
u3 = 1.034 Sec.
u4 = 1.255 Sec. (self-written MergeSort alone)
A PB linked list is a chain of small parts of allocated memory.
What I suspect is that when using different lists, for some lists, the assigned memory is more fragmented compared to other lists.
If you have to jump around in memory to do comparisons, this might make a significant difference.
That's the only explanation I can think of :?

The string sorting benchmark might also suffer from this but I wouldn't know how to restore that list to its original presorted order in that case.

Edit:
After some more thinking ...
Modern versions of MacOS can use memory compression.
So another possibility could be that all the duplicated lists of the original version take up so much memory that the OS decides to compress the memory.
Windows (x64)
Raspberry Pi OS (Arm64)
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

wilbert wrote:Your original code.
[...]

-- Sorting structures
u0 = 3.444 Sec. (built-in MergeSort)
u1 = 1.913 Sec.
u2 = 1.844 Sec. <
u3 = 1.807 Sec.
u4 = 2.197 Sec. (self-written MergeSort alone)
This is really strange. It looks as if the built-in routine were the slowest one. :D
In contrast, the results of your changed version make sense.

I have run my original speed test rather often (but only on Windows), and always got consistent results. So I think the main cause of the strange results that you encountered might be the different memory handling of Windows and MacOS.

I've changed now my original benchmark code, also for sorting the string lists. (I'm storing the order of the elements in a list of pointers, and use the pointers for rearranging the elements of the string list before each new speed test.)

Thanks again for finding this issue and reporting it!
kinglestat
Enthusiast
Enthusiast
Posts: 732
Joined: Fri Jul 14, 2006 8:53 pm
Location: Malta
Contact:

Re: Custom sort for Linked Lists

Post by kinglestat »

It's nice.....I was going to do something similar (simpler!) unfortunately I couldn't translate it to SpiderBasic...which is rather finicky on a good day
I may not help with your coding
Just ask about mental issues!

http://www.lulu.com/spotlight/kingwolf
http://www.sen3.net
User_Russian
Addict
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: Custom sort for Linked Lists

Post by User_Russian »

Why doesn't it sort?

Code: Select all

Structure struct
  x.l
EndStructure

Procedure.i CompareStruc(*pa.struct, *pb.struct, mode.i)
  Res = #False
  
  If *pa\x > *pb\x
    Res = #True
  EndIf
  
  ProcedureReturn Res
EndProcedure

NewList l.struct()

For i=1 To 100
  AddElement(l())
  l()\x = Random(100, 10)
Next

CS::SortListAny(l(), @CompareStruc())

ForEach l()
  Debug l()\x
Next
Little John
Addict
Addict
Posts: 4519
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: Custom sort for Linked Lists

Post by Little John »

I have provided a lot of demo code in this thread. It's all there.
User_Russian wrote:Why doesn't it sort?
Because you are not using a proper custom comparison function.

For usage with SortListAny(), the pointers passed to the comparison function must be dereferenced.
With your code, this function works:

Code: Select all

Procedure.i CompareStruc (*pa.Integer, *pb.Integer, mode.i)
   Protected.struct *a = *pa\i, *b = *pb\i    ; dereference the pointers
   Protected.i Res
   
   Res = #False
   If *a\x > *b\x
      Res = #True
   EndIf
   
   ProcedureReturn Res
EndProcedure
Post Reply