Page 1 of 2

autocomplete Trie

Posted: Wed Jan 15, 2020 12:53 am
by idle
Working on a replacement for Autocomplete with a Trie structure bit of a balancing act trying to constrain memory use, though I think I've got a workable compromise between memory and speed by interpreting the strings in nibbles, this reduces the memory by 8 times at the cost of 2 times the lookup, it should be more efficient than current implementation but before I put in place it should get a look over for any bugs. These kind of algorithms do my head in.

Test code loads mozilla GB dictionary from marco's git, not sure if my cursory pass of the file is 100% but it should show it working


Update 0.95
changed structure to array based to facilitate load, save and catch and to keep the memory size the same as x86 on x64
while it will dynamically grow it will incur excessive page faults, so you need to allocate enough slots

Update 0.96
Mashed it into scintiila autocomplete
*note it's still a little buggy
Return enter selection space bar to cancel
Tested on xp with debugger off works better, linux x64 with debugger seems fine

Update 0.96.1

Code: Select all

;note if your rebuilding this between versions please make sure you deleted the file en-GB_trie.tri
 DeclareModule TRIE
  
  ; Nibbled TRIE, for unicode strings; Access time O(m*2) 
  ; Array based can be saved loaded and caught from memory with compression 
  ; Note node value -1 is reserved for traversal checks  
  ; Author: Idle
  ; Version: 0.96.1
  ; Pb 5.71 lts 
  ; Created: 24/01/2020
  ; Licence MIT
  
  #TRIE_MAX = 1024 ;Max key length in bytes for enumeration of Trie
  
  Structure trie_node
    index.l[16]
    value.l
  EndStructure
  
  Structure Trie 
    *vt
    size.l
    avail.l
    position.l
    datasize.l
    nodecount.l
    count.l
    sb.a[#TRIE_MAX]
    *root.Trie_node
  EndStructure   
  
  Prototype Trie_CB(*Input,len)
  Declare Trie_New(size=512)
  Declare Trie_Free(*this.Trie)
  Declare Trie_Set(*this.Trie,*key,len,value.l=1,mode=#PB_UTF8)
  Declare Trie_Get(*this.Trie,*key,len,mode=#PB_UTF8)
  Declare Trie_iEnum(*this.Trie,*node.Trie_Node,depth,*pfn.Trie_CB) 
  Declare Trie_Enum(*this.Trie,*key,len,*pfn.Trie_CB,ReturnMatch=0,mode=#PB_UTF8)
  Declare Trie_Clear(*this.Trie)
  
  Interface ITrie
    Free()
    Set(*key,StringByteLength,value.l=1,mode=#PB_UTF8)
    Get(*key,StringBytelength,mode=#PB_UTF8)
    Enum(*key,StringBytelength,*pfn.Trie_CB,ReturnMatch=0,mode=#PB_UTF8)
    Clear()
    Load(file.s)
    Save(File.s,CompressLevel=4)
    Catch(Address)
    Stats(item)
  EndInterface   
  
EndDeclareModule

Module TRIE
  
  CompilerIf #PB_Compiler_OS = #PB_OS_Windows
    ImportC "zlib.lib"
      compress2(*dest,*destlen,*source,sourcelen,level)
      uncompress(*dest,*destlen,*source,sourcelen)
    EndImport
  CompilerElse
    ImportC "-lz"   
      compress2(*dest,*destlen,*source,sourcelen,level)
      uncompress(*dest,*destlen,*source,sourcelen)
    EndImport
  CompilerEndIf
  
  EnableExplicit
  ;Thanks Wilbert for a sane version of PeekN and PokeN 
  ;Peek Nibble in order High Low 
  Macro PeekNHL(in,Index,out)
    *Mem.Ascii = in
    *Mem + Index >> 1
    If Index & 1
      out = *Mem\a & $f
    Else
      out = *Mem\a >> 4
    EndIf
  EndMacro
  
  Macro PokeNHL(in, Number, Index)
    *Mem.Ascii = in
    *Mem + Index >> 1
    If Index & 1
      *Mem\a = (*Mem\a & $f0) | (Number & $f)
    Else
      *Mem\a = (*Mem\a & $0f) | (Number << 4)
    EndIf
  EndMacro
  
  Procedure Trie_Stats(*this.TRIE,item=1)
    Select item
      Case 1
        ProcedureReturn *this\datasize
      Case 2
        ProcedureReturn MemorySize(*this\root) 
      Case 3
        ProcedureReturn MemorySize(*this\root) / *this\datasize
      Case 4
        ProcedureReturn *this\nodecount
      Case 5  
        ProcedureReturn *this\count
    EndSelect
  EndProcedure   
  
  Procedure Trie_Clear(*this.Trie)
    Protected idx,*node.Trie_node,size 
    size = MemorySize(*this\root)
    While idx < size
      *node = *this\root+idx
      If *node\value 
        *node\value = -1
      EndIf   
      idx + SizeOf(Trie_node)     
    Wend     
  EndProcedure
  
  Procedure Trie_New(size=512)
    Protected *obj.Trie = AllocateMemory(SizeOf(Trie))
    If *obj
      *obj\vt=?vt_Trie
      *obj\size = size * SizeOf(Trie_node)
      *obj\avail= *obj\size  - SizeOf(Trie_node)
      *obj\position = SizeOf(Trie_node)
      *obj\root = AllocateMemory(*obj\size)
    EndIf
    ProcedureReturn *obj
  EndProcedure
  
  Procedure Trie_Free(*this.Trie)
    FreeMemory(*this\root)
    FreeMemory(*this)
  EndProcedure
  
  ;Set or Unset a value in the Trie, len must be StringByteLength
  Procedure Trie_Set(*this.Trie,*key,len,value.l=1,mode=#PB_UTF8)
    Protected *node.Trie_node,*ikey,ilen,a,idx,*mem.Ascii,offset
    *node = *this\root
    
    If mode <> #PB_UTF8 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey) 
    Else 
      *ikey = *key 
    EndIf 
    
    *this\datasize + len 
    
    ilen = (len*2)-1
    
    For a = 0 To ilen
      PeekNHL(*ikey,a,idx)
      If *node\index[idx]     
        offset = *node\index[idx]
        *node = *this\root + offset
      ElseIf *this\avail  > 0
        offset = *this\Position
        *node\index[idx] = offset
        *node = *this\root + offset
        *node\value = -1
        *this\nodecount + 1
        *this\avail - SizeOf(Trie_node)
        *this\Position+SizeOf(Trie_node)
      Else
        *this\avail = 512*SizeOf(Trie_node)
        *this\size + *this\avail
        *this\root = ReAllocateMemory(*this\root,*this\size)
        *node = *this\root + offset
        offset = *this\Position
        *node\index[idx] = offset ;
        *node = *this\root + offset
        *node\value = -1
        *this\nodecount + 1
        *this\avail - SizeOf(Trie_node)
        *this\Position+SizeOf(Trie_node)
      EndIf
    Next   
    
    If *node\value
      *this\count+1
      *node\value = value
    EndIf   
    If mode <> #PB_UTF8 
      FreeMemory(*ikey) 
    EndIf   
  EndProcedure
  
  Procedure Trie_Get(*this.Trie,*key,len,mode=#PB_UTF8)
    ;Find the value of the key returns the value or 0
    Protected *ikey,ilen,*node.Trie_Node,a,idx.i,*mem.Ascii
    
    If mode <> #PB_UTF8 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey) 
    Else 
      *ikey = *key 
    EndIf 
    
    ilen = (len*2)-1
    *node.Trie_node = *this\root
    
    For a = 0 To ilen
      PeekNHL(*ikey,a,idx)
      If *node   
        If *node\index[idx]     
          *node = *this\root + *node\index[idx]
        Else
          If mode <> #PB_UTF8
             FreeMemory(*ikey)
          EndIf 
          ProcedureReturn 0
        EndIf
      Else
        FreeMemory(*ikey)
        ProcedureReturn 0
      EndIf
    Next   
    If mode <> #PB_UTF8 
      FreeMemory(*ikey)
    EndIf   
    If *node\value > -1     
      ProcedureReturn *node\value
    EndIf
    
  EndProcedure
  
  Procedure Trie_iEnum(*this.Trie,*node.Trie_Node,depth,*pfn.Trie_CB)
    Protected a.i,tkey.s,*mem.Ascii
    
    If *node\value = 0
      ProcedureReturn 0
    EndIf
    ;build the key   
    For a=0 To 15
      If *node\index[a] 
        PokeNHL(@*this\sb,a,depth)
        Trie_iEnum(*this,*this\root + *node\index[a],depth+1,*pfn)
      EndIf
    Next
    ;lookup the key value
    If *node\value > 0
      If *pfn
        *pfn(@*this\sb,depth/2-1)
      EndIf
    EndIf   
    ProcedureReturn *node
    
  EndProcedure
  
  Procedure Trie_Enum(*this.Trie,*key,len,*pfn.Trie_CB,ReturnMatch=0,mode=#PB_UTF8)
    ;Enumerate all the entries below the given root key
    Protected *node.Trie_Node,*tnode.Trie_Node,tkey.i,ilen.i,pos.i,*mem.Ascii,a,*ikey,lkey
    If mode <> #PB_UTF8 
      *ikey = UTF8(PeekS(*key,len))
      len = MemorySize(*ikey) 
      ilen =((len-1)*2)-1 
    Else 
      *ikey = *key
      ilen =(len*2)-1 
    EndIf 
    
    FillMemory(@*this\sb[0],#TRIE_MAX,0) 
    
    If ReturnMatch
      If Trie_Get(*this,*ikey,len)
        If *pfn
          *pfn(*ikey,len)
        EndIf
        ProcedureReturn
      EndIf
    EndIf   
    
    *node = *this\root   
    
    For a = 0 To ilen 
      PeekNHL(*ikey,a,tkey)
      *node = *this\root + *node\index[tkey]
      If *node\value = 0
        If mode <> #PB_UTF8 
          FreeMemory(*ikey)
        EndIf   
        ProcedureReturn 0
      EndIf
    Next   
    
    CopyMemory(*ikey,@*this\sb,MemorySize(*ikey))
    Trie_iEnum(*this,*node,a,*pfn)
    
    If mode <> #PB_UTF8 
      FreeMemory(*ikey)
    EndIf   
  EndProcedure
  
  Procedure Trie_SaveToFile(*this.trie,file.s,CompressLevel=4)
    Protected fn,len,*dst,destlen,srclen
    fn = OpenFile(#PB_Any,file)
    If fn
      srclen = MemorySize(*this\root) 
      destlen = (srclen * 1.5) + SizeOf(trie) + 8
      *dst = AllocateMemory(destlen)
      If *dst                                               
        If Compress2(*dst+(SizeOf(trie)+8),@destlen,*this\root,srclen,CompressLevel) = 0   
          PokeL(*dst,srclen)
          PokeL(*dst+4,destlen)
          CopyMemory(*this,*dst+8,SizeOf(Trie))
          len = WriteData(fn,*dst,destlen+SizeOf(trie)+8)
          CloseFile(fn)
          FreeMemory(*dst)
          ProcedureReturn len   
        EndIf
      EndIf
    EndIf   
  EndProcedure   
  
  Procedure Trie_LoadFile(*this.trie,file.s)
    Protected fn,destlen,srclen,*src,*dst
    fn = OpenFile(#PB_Any,file)
    If fn
      destlen = ReadLong(fn) 
      *src = AllocateMemory(destlen)
      srclen = ReadLong(fn)
      *src = AllocateMemory(srclen)
      FreeMemory(*this\root)
      ReadData(fn,*this,SizeOf(trie))
      ReadData(fn,*src,srclen)
      *this\vt=?vt_Trie
      *this\root = AllocateMemory(destlen)
      If uncompress(*this\root,@destlen,*src,srclen) = 0
        FreeMemory(*src)
        CloseFile(fn)
        ProcedureReturn destlen
      EndIf
    EndIf
  EndProcedure   
  
  Procedure Trie_Catch(*this.Trie,*mem,len)
    Protected destlen,srclen,*src
    If *mem
      FreeMemory(*this\root)
      destlen = PeekL(*mem) 
      srclen = PeekL(*mem+4)
      CopyMemory(*mem+8,*this,SizeOf(Trie))
      *this\vt=?vt_Trie 
      *this\root = AllocateMemory(destlen)
      If uncompress(*this\root,@destlen,*mem+SizeOf(Trie)+8,srclen) = 0
        ProcedureReturn destlen
      EndIf
    EndIf
  EndProcedure   
  
  DataSection: vt_Trie:
    Data.i @Trie_Free()
    Data.i @Trie_Set()
    Data.i @Trie_Get()
    Data.i @Trie_Enum()
    Data.i @Trie_Clear()
    Data.i @Trie_LoadFile()
    Data.i @Trie_SaveToFile()
    Data.i @Trie_Catch()
    Data.i @Trie_Stats()
  EndDataSection 
  
EndModule


CompilerIf #PB_Compiler_IsMainFile
  ;set compiler to create exe in current directory
  ;On 1st run it will downloads mozilla GB english dictionary and build the Trie
  ;when you close the window it will compress and save the Trie to file
  ;On 2nd run it will Load the saved Trie
  ;if you uncoment Include binary down the bottom it will catch the Trie from memory
  UseModule TRIE
  EnableExplicit 
  #Auto_Start =0 
  #Auto_Get =1
  #Auto_Enum =2 
  
  Global TmpPath.s = GetTemporaryDirectory() +"en-GB.dic"
  Global TriePath.s = GetPathPart(ProgramFilename()) + "en-GB_trie.tri"
  Global *mt.ITrie = Trie_New(700384) ;specify number of nodes required, realloc can be very slow  
  
  Global index,result
  Global *buf = AllocateMemory(1024) 
  Global itemsize=1024*1024
  Global itempos 
  Global *items = AllocateMemory(itemsize)  ;utf8 list of words for lookup
  Global *text
  
  Procedure ClearItems() 
    FillMemory(*items,itemsize,0) 
  EndProcedure  
  
  Procedure cbTrie(*key,len)
    Protected out.s 
    If (itempos + len + 1) < MemorySize(*items) 
      CopyMemory(*key,*items+itempos,len)
      out = PeekS(*items+itempos,len,#PB_UTF8) 
      Debug out 
      Debug Str(Len(out)) + " " + Str(len) 
      itempos + len 
      PokeA(*items+itempos,32)
      itempos + 1 
    Else 
      itemsize + len + 1 
      *items = ReAllocateMemory(*items,itemsize)
      CopyMemory(*key,*items+itempos,len)
      itempos + len 
      PokeA(*items+itempos,32)
      itempos + 1 
    EndIf   
  EndProcedure   
  
  Procedure LoadDicFile(Dicfile.s,*trie.ITrie)
    Protected fn,fm,pos,word.s,count 
    fn = ReadFile(#PB_Any,Dicfile)
    If fn
      fm = ReadStringFormat(fn)
      While Not Eof(fn)
        word = ReadString(fn,fm,-1)
        pos = FindString(word,"/")
        If pos > 0
          word = Left(word,pos-1)
        EndIf
        word = LCase(word)
        *trie\Set(@word,StringByteLength(word),1,#PB_Unicode)
        count+1
      Wend
      CloseFile(fn)
    EndIf
    ProcedureReturn count
  EndProcedure
  
  Procedure ScintillaCallBack(Gadget, *scinotify.SCNotification)
    Static Auto_state.i     
    Static pos.i,space.i
         
    Protected mods = (#SC_MOD_DELETETEXT | #SC_PERFORMED_USER)
    Select *scinotify\nmhdr\code  
        
      Case #SCN_AUTOCSELECTION
        Debug "selected " + PeekS(*scinotify\text,-1,#PB_UTF8) 
        Auto_state = #Auto_Start 
        ;ScintillaSendMessage(0,#SCI_AUTOCCANCEL) 
        pos = 0 
      Case #SCN_MODIFIED
        If *scinotify\modificationType = mods Or *scinotify\modificationType =  (mods | #SC_STARTACTION)
          pos - *scinotify\length  
          Debug "back spaced " + *scinotify\length 
          If pos < 1 
            Debug "cancel from back space" 
            Auto_state = #Auto_Start 
            ScintillaSendMessage(0,#SCI_AUTOCCANCEL) 
            pos = 0 
          EndIf 
        EndIf   
      Case #SCN_CHARADDED   
        Select *scinotify\ch
          Case $A,$20  
            If Auto_state < #Auto_Enum  
               Debug "by space " + PeekS(*buf,pos,#PB_UTF8) 
               If *mt\Get(*buf,pos+1) = 0 
                 Debug "added "
                 *mt\Set(*buf,pos+1) 
               EndIf 
               Auto_state = #Auto_Start 
               pos=0 
            EndIf   
                      
          Default
            PokeA(*buf+pos,*scinotify\ch)
            If pos > 1 
              Auto_state = #Auto_Get 
              ClearItems()
              *mt\enum(*buf,pos+1,@cbTrie())
              itempos=0
              If ScintillaSendMessage(0,#SCI_AUTOCACTIVE) = 0
                ScintillaSendMessage(0,#SCI_AUTOCSHOW,pos+1,*items)   
              Else 
                ScintillaSendMessage(0,#SCI_AUTOCCANCEL) 
                ScintillaSendMessage(0,#SCI_AUTOCSHOW,pos+1,*items)  
              EndIf 
            EndIf
            pos+1
        EndSelect        
      Default 
        
    EndSelect       
    
  EndProcedure
  
  If ?TRIE_DICTEND - ?TRIE_DICT > 0
    result = *mt\Catch(?TRIE_DICT) 
    Debug "Caught Trie :" + Str(result) + " bytes"
  ElseIf FileSize(TriePath) > 0
    result = *mt\Load(TriePath)
    Debug "Loaded Trie :" + Str(result) + " bytes"
  Else
    InitNetwork()
    If ReceiveHTTPFile("https://raw.githubusercontent.com/marcoagpinto/aoo-mozilla-en-dict/master/en_GB%20(Marco%20Pinto)/en-GB.dic",TmpPath)
      loadDicfile(TmpPath,*mt)
      Debug "raw bytes loaded =" + Str(*mt\Stats(1))
      Debug "size of data stored = " + Str(*mt\Stats(2))
      Debug "data / raw = " + Str(*mt\Stats(3)) 
      Debug "nodes " + Str(*mt\Stats(4))
      Debug "items " + Str(*mt\Stats(5))
    Else
      MessageRequester("OOPS","failed to download")
      End
    EndIf   
  EndIf
  
  If OpenWindow(0, 0, 0, 800, 600, "ScintillaGadget", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
    
    If InitScintilla()
      ScintillaGadget(0,0,0,800, 600,@ScintillaCallBack())
      ScintillaSendMessage(0,#SCI_AUTOCSETIGNORECASE,#True)
      ScintillaSendMessage(0,#SCI_AUTOCSETAUTOHIDE,0)
      ScintillaSendMessage(0,#SCI_AUTOCSETMAXHEIGHT,10)
      ScintillaSendMessage(0,#SCI_AUTOCSETORDER,#SC_ORDER_PRESORTED)
      ScintillaSendMessage(0, #SCI_STYLESETFORE, 0, RGB(0, 0, 0))
      *Text=UTF8(" ")
      ScintillaSendMessage(0, #SCI_AUTOCSTOPS,0,*text)
      FreeMemory(*Text) 
      *Text=UTF8("Autocomplete test in ScintillaGadget...")
      ScintillaSendMessage(0, #SCI_SETTEXT, 0,*Text)
      FreeMemory(*Text) 
    EndIf
    
    Repeat : Until WaitWindowEvent() = #PB_Event_CloseWindow
  EndIf
  CloseWindow(0) 
  
  If FileSize(TriePath) < 0
    result = *mt\Save(TriePath)
    Debug "Saved Trie :" + Str(result) + " bytes"
  EndIf   
  
  *mt\clear()
  *mt\free()
  
  DataSection : TRIE_DICT:
    ;IncludeBinary #PB_Compiler_FilePath + "en-GB_trie.tri"
    TRIE_DICTEND:
  EndDataSection
  
CompilerEndIf

Re: autocomplete Trie

Posted: Wed Jan 15, 2020 10:04 pm
by box_80
Interesting, might try it out one of these days. Have not been programming much lately. But still enjoy see stuff like this. Thanks for sharing. :D

Re: autocomplete Trie

Posted: Wed Jan 15, 2020 11:29 pm
by Trond
Is this for the IDE?

Can I ask some questions?

What is the current data structured used in the IDE, and does it have any practical problems that justify the need for a more complicated data structure (assuming this is)?

What about removing keys?

Re: autocomplete Trie

Posted: Wed Jan 15, 2020 11:47 pm
by idle
Hello Trond,

Nice to see you back!

Yes that's the plan, current autocomplete is built with a map and lists, it rebuilds each time and sorts the list, it's not ideal, plus it doesn't work half the time on linux but that's a separate issue which needs resolving. While it works a Trie is much better fit for the task, load it once, add to and remove on the fly.
An alternative would be a Fuzzy Search rather than an exact match, that could be better from the onset, though until someones written one a Trie will be an improvement.
I'm still trying to work out what the best strategy is to handle structures and modules, once I've got a handle on how the IDE does it, I should be able to add it using the nodes value to say that the item is a root, like TRIE:: then if UseModule is active you don't want the root in the results, but if you didn't usemodule and typed in TRIE:: it would still enumerate it with the root TRIE::Get() instead of Get() if usemodule was active.

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 12:24 am
by Trond
idle wrote:Nice to see you back!
Thank you.

Remember to benchmark size and time. Sometimes performance isn't intuitive.

I did try to make a trie a long time ago, but couldn't make it work.

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 1:21 am
by idle
It'll be interesting to see how it compares, it's going to take some time to do anyhow and I really want to get the display working on linux before I start butchering it, the enumeration of the Trie could well be slower than the current implementation even once it's optimised.

I'm still not happy with the memory consumption, changing it to utf8 will save ~3rd and it shouldn't impact the enumeration, I could also make it array based rather than heap based which would at least reduce the overhead on x64 by ~1/2

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 1:21 pm
by chi
Works great! (if the debugger is disabled :lol:)

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 3:41 pm
by kenmo
Great start! I'm very excited for IDE contributions. I have a few in progress.

For you I have 1 comment and 1 "bug report"

Comment: If you type a complete word, can you still show other suggestions? Or maybe make that a toggle setting?
If I type "end" I would want to see "EndIf" "EndProcedure" etc.

Bug: I get some wrong results...
type "mur" top result is "mubarak"
"murt" shows "murder" and other similar words with "d" instead of "t"
"ag" shows a lot of words starting with "aw"

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 6:45 pm
by freak
idle wrote:Yes that's the plan, current autocomplete is built with a map and lists, it rebuilds each time and sorts the list, it's not ideal, plus it doesn't work half the time on linux but that's a separate issue which needs resolving. While it works a Trie is much better fit for the task, load it once, add to and remove on the fly.
The list is rebuilt every time because AutoComplete is context-sensitive (procedure and module scope, With-Statements for structured items etc.). The data is not static.

You could probably get rid of the sorting with a new data structure, but you should do some measurements first to see if this is actually problematic enough to be worth the increased complexity.

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 7:18 pm
by Josh
@idle
There are many wrong results. If I type 'bb', there are only 4 entries that really start with 'bb', but I get several hundred results.


@freak
freak wrote:... AutoComplete is context-sensitive (procedure and module scope ...
Really? If I type the following example, then I immediately get a suggestion for 'MyVar'.

Code: Select all

Define MyVar

Procedure MyProc()
  MyV

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 8:32 pm
by kenmo
Josh wrote:Really? If I type the following example, then I immediately get a suggestion for 'MyVar'.
It's definitely procedure-aware. Try below.

But you're right, inside a procedure it suggests a non-global defined outside. That seems like a bug, but a pretty harmless one.

Code: Select all

Global This_Is_Global
Define This_Is_Define

Procedure Test1(This_Is_1.i)
  ; type "this" here
  
EndProcedure

Procedure Test2(This_Is_2.i)
  ; type "this" here
  
EndProcedure
EDIT: Also, hello freak!

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 9:02 pm
by idle
kenmo wrote:Great start! I'm very excited for IDE contributions. I have a few in progress.

For you I have 1 comment and 1 "bug report"

Comment: If you type a complete word, can you still show other suggestions? Or maybe make that a toggle setting?
If I type "end" I would want to see "EndIf" "EndProcedure" etc.

Bug: I get some wrong results...
type "mur" top result is "mubarak"
"murt" shows "murder" and other similar words with "d" instead of "t"
"ag" shows a lot of words starting with "aw"
Thanks and yes I had noticed it was populating the list out of order since I changed it to interpret the key by nibbles, I'm in the process of making it array based to reduce the memory and make it cache friendly, so will sort it out then.

@Josh
Thanks,

@Freak
Yes I understand why it's doing it but it's really not necessary to rebuild it every time to make it context sensitive, each token can have a type identifier and to handle things like modules it can be done a couple of ways. If you have Usemodule TRIE you'd first get to the rootnode of TRIE:: and then enumerate from there, if you don't have usemodule you would just enumerate it from the root as TRIE::Get ..., same will work for structures.
So the Trie can be statically built and added to when needed and the context is dynamic by the node values.

My main concern is to actually fix the display issues on Linux, which has nothing to do with the structure of the code

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 10:53 pm
by freak
If you don't want to rebuild the structure at display time you have to do more work to keep it up to date when edits are made which is an even more performance critical code path. If you spend too much time there the editing will feel sluggish very quickly. The IDE had this problem a lot with the early implementations of AutoComplete when used with larger source files. This is why the current implementation is a compromise: The parsing work during editing is done only with minimal context which also means that reacting to an edit in the code means that only very little parsing is required to update the data (only the edited line in the best case) but there is work needed to build the list when AutoComplete is displayed. I think the resulting performance is well balanced on both sides.

This problem is a lot bigger than just a question of data structures. Trust me, I worked on this part a lot over the years ;)

Re: autocomplete Trie

Posted: Thu Jan 16, 2020 11:05 pm
by freak
@Josh:

You are right, stuff in the main scope is actually treated as "global" even if it isn't. There is still room for improvement there.

It does work well the other way around though:

Code: Select all

Procedure MyProc(ParameterVar)
  Protected ProtectedVar
  Global GlobalVar
  Shared SharedVar  
  Static StaticVar
EndProcedure
The variables inside the procedure are suggested in AutoComplete only when the cursor is actually inside the procedure, except for the Global/Shared one which are visible outside as it should be.

Re: autocomplete Trie

Posted: Fri Jan 17, 2020 12:08 am
by idle
freak wrote:If you don't want to rebuild the structure at display time you have to do more work to keep it up to date when edits are made which is an even more performance critical code path. If you spend too much time there the editing will feel sluggish very quickly. The IDE had this problem a lot with the early implementations of AutoComplete when used with larger source files. This is why the current implementation is a compromise: The parsing work during editing is done only with minimal context which also means that reacting to an edit in the code means that only very little parsing is required to update the data (only the edited line in the best case) but there is work needed to build the list when AutoComplete is displayed. I think the resulting performance is well balanced on both sides.

This problem is a lot bigger than just a question of data structures. Trust me, I worked on this part a lot over the years ;)
I will see if I can improve it, though like I said it might not work out faster, more due to the technicalities of implementing a memory lean trie, reducing it to nibbles and bit indexes in an array should address the memory overhead and eliminate cache thrashing but the added steps to access from nibbles and converting to utf8 will hurt it, though it should still perform close to a map but slower with the added benefit of having sorted output. Also if it's array based it could then be stored and loaded on startup and shutdown eliminating the need to rebuild it.