SQUINT, Sparse Quad Union Indexed Nibble Trie

Share your advanced PureBasic knowledge/code with the community.
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

SQUINT, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Squint is a O(k*2) compact Trie in the league of crtibit, QP, Hamt Art, and Burst tries both for memory use and speed.
Supports: Set, Get, Enum, Walk, Delete and Prune

examples provided
Autocomplete in a scintilla gadget
SquintMap just compares performance between a map and squint, compile as console with debugger off
FindStrings a basic example of a grep like function

Update 1.0.2
separated test files

https://github.com/idle-PB/SQUINT

Code: Select all

; SQUINT, Sparse Quad Union Indexed Nibble Trie 
; Copyright (c) 2020 Andrew Ferguson aka Idle
; Version 1.0.2 
; PB 5.71 x86 x64 Linux OSX Windows
; Thanks Wilbert for the high low insight to save a bit more memory
; Squint is the result of noting that you can reduce a trie from 256 nodes to 16 nodes
; at only the cost of 2 times the lookup and index by nibbles, then you realise you can
; use a quad to store the index to 16 offsets into a sparse array and also reduce the structure
; down to two control words per node. *vector,(squint.q | value.i): key->squint->*vector\e[offset] 
; resulting in a very compact trie with O(K*2) performance and a memory size 32 times smaller!
; four times smaller than a fixed 16 node trie and eight times smaller again than a 256 node trie.  
; Overhead on x64 reduces to ~8 bytes per byte of input with Unicode16 strings 
; Overhead on x86 reduces to ~6 bytes per byte of input with Unicode16 strings
; In pointer sizes thats ~1 word per word of input on x64 and ~1.5 words on x86
; which are similar to the overheads of a QP Trie thats a 1/3 smaller than a Critbit.
; In terms of speed it should be ~2 times that of a map lookup on average but that's OK as it is O(k*2)
; though if you're frequently looking up values that don't exist it will improve further as it will bail
; out earlier, plus as it's also very cache friendly it may come in closer to 1:1 on lookups. 
; Enumerations are of course magnitudes faster, which is why you'd be using a trie in the 1st place  

; see https://dotat.at/prog/qp/blog-2015-10-04.html
;     https://cr.yp.to/critbit.html 
;     https://en.wikipedia.org/wiki/Trie

; Keys can either be numeric integers, UTF8 or Unicode strings, default is Unicode for convienence
; and both can be used together though UTF8 is better for speed and size with strings. 
;
; Supports Set Get Enum Walk Delete and Prune with a flag in Delete
; 
; todo: add in a string buffer so value can be used for something else and still return the keys and value 

; License MIT
; Copyright (c) 2020 Andrew Ferguson 
; Permission is hereby granted, free of charge, to any person obtaining a copy
; of this software and associated documentation files (the "Software"), to deal
; in the Software without restriction, including without limitation the rights
; To use, copy, modify, merge, publish, distribute, sublicense, and/or sell
; copies of the Software, and to permit persons to whom the Software is
; furnished to do so, subject to the following conditions:
; 
; The above copyright notice and this permission notice shall be included in all
; copies or substantial portions of the Software.
; 
; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS Or
; IMPLIED, INCLUDING BUT Not LIMITED To THE WARRANTIES OF MERCHANTABILITY,
; FITNESS For A PARTICULAR PURPOSE And NONINFRINGEMENT. IN NO EVENT SHALL THE
; AUTHORS Or COPYRIGHT HOLDERS BE LIABLE For ANY CLAIM, DAMAGES Or OTHER
; LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT Or OTHERWISE, ARISING FROM,
; OUT OF Or IN CONNECTION With THE SOFTWARE Or THE USE Or OTHER DEALINGS IN THE
; SOFTWARE. 

DeclareModule SQUINT 
  
  Structure squint_node 
    *vertex.edge       ;vector of child nodes max 16 
    StructureUnion
      squint.q         ;4 bit index of 16 child nodes
      value.i          ;value 
    EndStructureUnion 
  EndStructure   
  
  Structure edge 
    e.squint_node[0] 
  EndStructure 
  
  Structure squint 
    *vt
    size.l
    datasize.l
    count.l
    *root.squint_node 
  EndStructure
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
    #Squint_Pmask = $ffffffff
  CompilerElse 
    #Squint_Pmask = $ffffffffffff
  CompilerEndIf
  
  Prototype Squint_CB(*key,*userdata=0)
  
  Declare Squint_New() 
  Declare Squint_Free(*this.Squint) 
  Declare Squint_Delete(*this.squint,*key,len,prune=0,stringformat=#PB_Unicode)
  Declare Squint_Set(*this.squint,*key,len,value.i=0,stringformat=#PB_Unicode)
  Declare Squint_Get(*this.squint,*key,len,stringformat=#PB_Unicode)
  Declare Squint_Enum(*this.squint,*key,len,*pfn.squint_CB,*userdata=0,ReturnMatch=0,stringformat=#PB_Unicode)
  Declare Squint_Walk(*this.squint,*pfn.squint_CB,*userdata=0) 
  
  Interface iSquint
    Free()
    Delete(*key,len,prune=0,stringformat=#PB_Unicode)
    Set(*key,len,value.i=0,stringformat=#PB_Unicode)
    Get(*key,len,stringformat=#PB_Unicode)
    Enum(*key,len,*pfn.squint_CB,*userdata=0,ReturnMatch=0,stringformat=#PB_Unicode)
    Walk(*pfn.squint_CB,*userdata=0) 
  EndInterface  
  
  DataSection: vtSquint:
    Data.i @Squint_Free()
    Data.i @Squint_Delete() 
    Data.i @Squint_Set()
    Data.i @Squint_Get()
    Data.i @Squint_Enum()
    Data.i @Squint_Walk() 
  EndDataSection   
  
EndDeclareModule 

Module SQUINT 
  
  EnableExplicit 
  
  Macro SetIndex(in,index,number)
    in = in & ~(15 << (index << 2)) | (number << (index << 2))
  EndMacro
  
  Macro GetNodeCount() 
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86 
      nodecount = MemorySize(*node\vertex) / SizeOf(squint_node) 
    CompilerElse 
      nodecount = (*node\vertex >> 48)
    CompilerEndIf   
  EndMacro   
  
  Procedure Squint_New() 
    Protected *this.squint,a
    *this = AllocateMemory(SizeOf(squint)) 
    If *this 
      *this\vt = ?vtSquint 
      *this\root = AllocateMemory(SizeOf(squint_node)) 
      *this\root\vertex = AllocateMemory(SizeOf(squint_node)*16) 
      CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
        *this\root\vertex | (16 << 48)
      CompilerEndIf
      *this\size = 17 * SizeOf(squint_node)
      *this\count=1
      *this\root\squint=-1
      For a = 0 To 15 
        SetIndex(*this\root\squint,a,a)
      Next 
      ProcedureReturn *this
    EndIf 
  EndProcedure  
  
  Procedure Squint_Ifree(*this.squint,*node.squint_node=0) 
    Protected a,offset,nodecount 
    If Not *node 
      ProcedureReturn 0
    EndIf
    For a=0 To 15 
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex 
        GetNodeCount()
        If (offset <> 15 Or nodecount = 16)
          Squint_IFree(*this,*node\Vertex\e[offset] & #Squint_Pmask)
        EndIf
      EndIf   
    Next
    If *node\vertex   
      GetNodeCount()
      *this\size - nodecount 
      *this\count - 1 
      FreeMemory(*node\Vertex & #Squint_Pmask) 
      *node\vertex=0
    EndIf
    ProcedureReturn *node
  EndProcedure   
  
  Procedure Squint_Free(*this.squint)
    Protected a,offset,*node.squint_node,nodecount 
    *node = *this\root 
    For a=0 To 15 
      offset = (*node\squint >> (a<<2)) & $f
      If *node\vertex 
        GetNodeCount()
        If (offset <> 15 Or nodecount = 16)
          Squint_IFree(*this,*node)
        EndIf
      EndIf   
    Next  
    FreeMemory(*this\root)
    nodecount = *this\count  
    FreeMemory(*this)
    ProcedureReturn nodecount 
  EndProcedure 
  
  Procedure Squint_Delete(*this.squint,*key,len,prune=0,stringformat=#PB_Unicode) 
    Protected *ikey,ilen,*node.squint_node,a,b,idx,*mem.Ascii,offset,nodecount
    If stringformat = #PB_Unicode 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey)-1 
    Else 
      *ikey = *key 
      len-1
    EndIf 
    ilen = (len*2)-1
    *node = *this\root   
    For a = 0 To ilen
      *mem = *ikey + (a>>1) 
      idx = (*mem\a >> (((a+1)&1) << 2)) & $f 
      offset = (*node\squint >> (idx<<2)) & $f  
      If *node\vertex 
        GetNodeCount()
        If (offset <> 15 Or nodecount = 16)  
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf   
      Else 
        If stringformat = #PB_Unicode 
          FreeMemory(*ikey)
        EndIf   
        ProcedureReturn 
      EndIf
    Next 
    If prune 
      Squint_Ifree(*this,*node)
      If (*node\vertex & #Squint_Pmask) = 0 
        *node\squint = 0  
      EndIf 
    Else 
      ilen+2
      For b = a To ilen
        *mem = *ikey + (b>>1)
        idx = (*mem\a >> (((b+1)&1) << 2)) & $f  
        offset = (*node\squint >> (idx<<2)) & $f 
        If *node\vertex 
          GetNodeCount()
          If (offset <> 15 Or nodecount = 16)  
            *node = *node\Vertex\e[offset] & #Squint_Pmask
          EndIf   
          If (*node\vertex & #Squint_Pmask) = 0 
            *node\squint = 0 
          EndIf
        EndIf
      Next  
    EndIf
    If stringformat = #PB_Unicode 
      FreeMemory(*ikey)
    EndIf 
    
  EndProcedure   
  
  Procedure Squint_Set(*this.squint,*key,len,value.i=0,stringformat=#PB_Unicode)
    Protected *node.squint_node,*ikey,ilen,a,idx,*mem.Ascii,offset,nodecount
    *node = *this\root & #Squint_Pmask
    *this\datasize + len 
    If stringformat = #PB_Unicode 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey)
    Else 
      *ikey = *key
    EndIf 
    ilen = (len*2)-1
    For a = 0 To ilen
      *Mem = *ikey + (a>>1)  
      idx = (*mem\a >> (((a+1)&1) << 2)) & $f  ;look up nibble high low order 
      offset = (*node\squint >> (idx<<2)) & $f ;look up offset in vertext vector  
      If *node\vertex                          ;check it's allocated  
        GetNodeCount()                         ;get node count required for check  
        If offset <= nodecount                 ;if the node is set proceed to next node  
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        Else                                   ;append child node  
          *this\size + SizeOf(squint_node)
          offset = nodecount 
          nodecount+1
          *node\vertex = ReAllocateMemory(*node\vertex & #Squint_Pmask,(nodecount)*SizeOf(squint_node))
          CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
            *node\vertex | ((nodecount) << 48) 
          CompilerEndIf  
          SetIndex(*node\squint,idx,offset)      ;poke the offset to the squint index 
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf   
      Else                                     ;allocate a new node 
        *this\size + SizeOf(squint_node)
        *node\vertex = AllocateMemory(SizeOf(squint_Node))
        CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 
          *node\vertex | (1 << 48) 
        CompilerEndIf
        *node\squint = -1 
        SetIndex(*node\squint,idx,0) 
        *node = *node\Vertex\e[0] & #Squint_Pmask
        *this\count+1
      EndIf 
    Next 
         
    If stringformat = #PB_Unicode 
      FreeMemory(*ikey)
    EndIf
    
    If value 
      *node\value = value
    EndIf   
    
    ProcedureReturn *node 
    
  EndProcedure 
  
  Procedure Squint_Get(*this.squint,*key,len,stringformat=#PB_Unicode)
    Protected *ikey,ilen,*node.squint_Node,a,idx,*mem.Ascii,offset,result,nodecount
    If stringformat = #PB_Unicode 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey) 
    Else 
      *ikey = *key 
    EndIf 
    *node = *this\root & #Squint_Pmask
    ilen = (len*2)-1
    For a = 0 To ilen
      *Mem = *ikey + (a>>1) 
      idx = (*mem\a >> (((a+1)&1) << 2)) & $f  ;look up nibble high low order
      offset = (*node\squint >> (idx<<2)) & $f ;look up offset in vertext vector 
      If *node\vertex   
        GetNodeCount()                         ;get node count 
        If offset <= nodecount                 ;check it's a valid child 
          *node = (*node\Vertex\e[offset] & #Squint_Pmask) 
          result = *node\value
        Else   
          result=0
          Break 
        EndIf
      Else 
        result=0
        Break 
      EndIf
    Next
    If stringformat = #PB_Unicode 
      FreeMemory(*ikey)
    EndIf
    ProcedureReturn result
  EndProcedure 
  
  Procedure Squint_IEnum(*this.squint,*node.squint_Node,depth,*pfn.squint_CB,*userdata=0)
    Protected a.i,offset,nodecount 
    If Not *node 
      ProcedureReturn 0
    EndIf
    For a=0 To 15 
      offset = (*node\squint >> (a<<2)) & $f   ;need to loop 0 to 15 to maintain order
      If (*node\vertex And *node\squint)
        GetNodeCount()                         ;get node count 
        If (offset <> 15 Or nodecount = 16)    ;check it's valid 
          Squint_IEnum(*this,*node\Vertex\e[offset] & #Squint_Pmask,depth+1,*pfn,*userdata)
        EndIf
      EndIf 
    Next
    If (*node\vertex=0 And *node\squint)
      If *pfn
        *pfn(*node\value,*userdata)
      EndIf
    EndIf   
    ProcedureReturn *node
  EndProcedure
  
  Procedure Squint_Enum(*this.squint,*key,len,*pfn.squint_CB,*userdata=0,ReturnMatch=0,stringformat=#PB_Unicode)
    Protected *node.squint_Node,idx,ilen,*mem.Ascii,a,*ikey,lkey,offset,nodecount
    If stringformat = #PB_Unicode    
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey) 
      ilen =((len-1)*2)-1 
    Else 
      *ikey = *key
      ilen =(len*2)-1 
    EndIf 
    If ReturnMatch
      If squint_Get(*this,*ikey,len,stringformat)
        If *pfn
          *pfn(*ikey,*userdata)
        EndIf
        ProcedureReturn
      EndIf
    EndIf   
    *node = *this\root
    For a = 0 To ilen
      *Mem = *ikey + (a>>1) 
      idx = (*mem\a >> (((a+1)&1) << 2)) & $f  ;look up nibble high low order
      offset = (*node\squint >> (idx<<2)) & $f ;look up offset in vertext vector
      If (*node\vertex And *node\squint)       ;check node has children   
        GetNodeCount()                         ;get node count 
        If (offset <> 15 Or nodecount = 16)    ;check it's valid 
          *node = *node\Vertex\e[offset] & #Squint_Pmask
        EndIf
      EndIf
      If Not *node Or *node\vertex = 0
        If stringformat = #PB_Unicode 
          FreeMemory(*ikey)
        EndIf   
        ProcedureReturn 0
      EndIf
    Next   
    Squint_IEnum(*this,*node,a,*pfn,*userdata)
    If stringformat = #PB_Unicode 
      FreeMemory(*ikey)
    EndIf 
  EndProcedure
  
  Procedure Squint_Walk(*this.squint,*pfn.squint_CB,*userdata=0) 
    Squint_IEnum(*this,*this\root,0,*pfn,*userdata)
  EndProcedure  
    
  DisableExplicit
  
EndModule 

Windows 11, Manjaro, Raspberry Pi OS
Image
dcr3
Enthusiast
Enthusiast
Posts: 165
Joined: Fri Aug 04, 2017 11:03 pm

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by dcr3 »

Very generous of you. Thanks for sharing it :D
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by Kwai chang caine »

Like several of your magicals codes i have not all understand the function of this jewel of code :oops:
Except perhaps to be faster than a MAP :?:
But apparently that works, this is the result on my little VIVO W10 X64 / v5.70 x86
Thanks for sharing 8)
Console wrote:Squint add time = 1714
Map add time = 65
Squint lookup = 1345
Map lookup = 47
Squint enum time = 112
Map enum time = 829
-----------------------------------------------
Squint overhead 2.583663702 Integers per Integer of Data overhead
Squint overhaed 10.334654808 Bytes per Byte of Data overhead
Squint Memory Size 8412936
-----------------------------------------------
testing garbage word 0
-----------------------------------------------
Enum from cat
cat
cat-and-dog
cat-bear
cat-sit
catabolic
catabolise
catabolism
catabolite
catabolize
catachreses
catachresis
catachrestic
cataclasis
cataclastic
cataclysm
catacomb
catadioptric
catadromous
catafalque
catahoula
catalan
catalase
catalepsy
cataleptic
catalina
catalogue
catalogued
catalonia
catalonian
catalpa
catalufa
catalyse
catalysis
catalyst
catalytic
catamaran
cataplectic
cataplexy
catapult
cataract
catarina
catarrh
catastrophe
catastrophiser
catastrophism
catastrophizer
catatonia
catatonic
catawba
catbird
catboat
catcall
catch
catch-all
catchfly
catchlight
catchline
catchpenny
catchphrase
catchup
catchweight
catchword
catchy
catcliffe
catcott
catechesis
catechetical
catechetics
catechin
catechise
catechism
catechist
catechize
catechol
catecholamine
catechumen
categoricity
categorise
categorised
categorize
category
catenary
catenate
catenation
catenative
cater
caterham
caterham-on-the-hill
catering
caterpillar
caterwaul
catesby
catfield
catfight
catfish
catgut
cathar
catharine
catharines
catharism
catharses
catharsis
cathartic
cathay
cathays
cathcart
cathead
cathedine
cathedral
catherine
catherington
catherston
catheter
catheterise
catheterize
cathkin
cathode
cathodoluminescence
catholic
catholicism
catholicity
catholicly
cathy
cation
catkin
catley
catlick
catlike
catlin
catlinite
catmint
catmore
catnap
catnip
cato
caton
caton-with-littledale
catoosa
catrine
catsfield
catshill
catskill
catsuit
cattail
cattal
cattaraugus
catterall
catterick
catterlen
catterline
catterton
cattery
catthorpe
cattistock
cattle
catton
catty
catullus
catv
catwalk
catwick
catworth
-----------------------------------------------
Test Delete 0
-----------------------------------------------
Before Prune of cat 615138 nodes
After Prune of cat 613999 nodes
Test Enum from cat
-----------------------------------------------
Squint add ratio = 26.3692302704
Squint lookup ratio = 28.6170215607
Squint enum ratio = 0.1351025403
Map add ratio = 0.0379229859
Map lookup ratio = 0.0349442363
Map enum ratio = 7.4017858505
ImageThe happiness is a road...
Not a destination
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by idle »

Kwai chang caine wrote:Like several of your magicals codes i have not all understand the function of this jewel of code :oops:
Except perhaps to be faster than a MAP :?:
But apparently that works, this is the result on my little VIVO W10 X64 / v5.70 x86
Thanks for sharing 8)
It's designed for prefix lookups or enumerating from a given root like in the example "cat", it's very fast for that but lookup for a single key is slower than a map, which is due to the compromise of saving memory at the cost of doing 2 times the lookup on a key, it's on average 2 times slower though it will get very close to a maps look up speed if you start looking for keys that don't exist.

Try it without the debugger and also try it commenting out #CompareToMap ~line 413 to see Autocomplete example

X64 results
Squint add ratio = 3.2068965435
Squint lookup ratio = 1.3999999762
Squint enum ratio = 0.0088495575
Map add ratio = 0.3118279576
Map lookup ratio = 0.7142857313
Map enum ratio = 113
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by Kwai chang caine »

Waaaaoooouhhh !!!! :shock:
I have understand now !!!! it's magical, i have write "CAL" in the editor and a listbox appears as if by magic, with all the words begining by "cal" inside :shock:
Thanks a lot MASTER IDLE for your explanation and the great sharing 8)
It's sure it's really more nice and impressive, when i understand the use :oops: :mrgreen: :lol:
ImageThe happiness is a road...
Not a destination
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by idle »

Kwai chang caine wrote:Waaaaoooouhhh !!!! :shock:
I have understand now !!!! it's magical, i have write "CAL" in the editor and a listbox appears as if by magic, with all the words begining by "cal" inside :shock:
Thanks a lot MASTER IDLE for your explanation and the great sharing 8)
It's sure it's really more nice and impressive, when i understand the use :oops: :mrgreen: :lol:
It helps when you understand what your looking at. :) Returning sorted data is what a Trie does best. As a structure a Trie is a bit like a map and sorted list combined. Squint optimises for memory and speed, it should use less memory than a map, it also grows quicker than a map and lookup is almost on parity under certain conditions and when you want to output sorted data it's at least 10 times faster than any other methods combined. It's a question that's long been asked, why do we still use maps? A trie is O(k) lookup, where k is the length of your key, a map is O(1) lookup but that's not entirely true as a map degrades to O(n) number of items stored in it's worst case. A map is really more like O(k*1) and Squint is O(k*2) but it's that all of the time.
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
Kwai chang caine
Always Here
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by Kwai chang caine »

Master of the sea wrote: It's a question that's long been asked, why do we still use maps?
The best is asking that to "Kwaï chang FRED" :mrgreen:
Because me, i'm already so happy and proud to have understand how simply use MAP 8)
At the begining, i not see the interest :oops: but now...for search sentence quickly..... 8)
In all case, you are too strong for have the courage to try to create something that already exist, but in better :shock:
Thanks again for your talents, who have always impressive me :shock: i'm so proud to know you 8)
ImageThe happiness is a road...
Not a destination
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT, Sparse Quad UTF8 Indexed Nibble Trie

Post by idle »

Thanks for your kind words Kcc.
It can be used just like a map, though as you see it can also return a list of items with a common prefix which are already sorted

Updated to v 1.0.1
Replaced pokeN to SetIndex macro removing the branching
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Updated to v 1.0.2
Redid SetIndex and tidied a bit.
Added FindStrings example a grep like function to find multiple words
Separated tests and put up on github
https://github.com/idle-PB/SQUINT
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
Derren
Enthusiast
Enthusiast
Posts: 313
Joined: Sat Jul 23, 2011 1:13 am
Location: Germany

Re: SQUINT, Sparse Quad Union Indexed Nibble Trie

Post by Derren »

Nice. I have to take a loot at the source some time.
When I tried to implement a Binary Search Tree after learning about them in uni, I found that just adding things to a list and then sorting the list once before a lookup was way faster than my BST.
So I thought, if the whole advantage of a BST, a faster lookup, is not there, why bother...
Since then I have never attempted to create any data-structures again and purely rely on Maps and Lists and Arrays...
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: SQUINT, Sparse Quad Union Indexed Nibble Trie

Post by idle »

Derren wrote:Nice. I have to take a loot at the source some time.
When I tried to implement a Binary Search Tree after learning about them in uni, I found that just adding things to a list and then sorting the list once before a lookup was way faster than my BST.
So I thought, if the whole advantage of a BST, a faster lookup, is not there, why bother...
Since then I have never attempted to create any data-structures again and purely rely on Maps and Lists and Arrays...
That just may have been a case of implementation and the size of the data set you tested it with. I think everyone can benefit from writing their own implementation of a given data structure and PB's good for that since the compiler output is pretty much the naked truth, what you see is what you get and you get to see the consequences of your choices pretty quickly, for instance I replaced one IF statement with logical ops and it yielded a 30% performance increase! So yeah you can gain a lot by trying and trying again...

I'm reasonably satisfied with the performance of squint given that it's a compromise of 2 times the look up for 32 times less memory than a 256 node, plus it's lexicographic enumeration is still magnitudes faster than what you could do with a map lists and arrays. Tries are just as fundamental but you just don't see them as often as their implementation is somewhat domain specific and usually built into operating systems, database engines, regular expression engines, DNS caches or Autocomplete for example.
Windows 11, Manjaro, Raspberry Pi OS
Image
Post Reply