Lack of "fast strings"

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Lack of "fast strings"

Post by idle »

The inline c version is likely copying the string onto the stack so its doing 2 copies it should push the address obtain the length then it can allocate a chunk and write it out fast.

Also we don't know how large the stack is so it's probably reallocating that as well.
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: Lack of "fast strings"

Post by Sicro »

A StringBuilder alone is not the solution. All functions that take strings must look for the null character to determine the end of the string and because the determined length is not stored, each function must calculate the length again and again.

Here is a test code:

Code: Select all

; Fast string functions were taken from the code of mk-soft:
; FastString - Structured Memory String
; https://www.purebasic.fr/english/viewtopic.php?t=75758

Structure udtFastString
  Len.i
  Char.c[0]
EndStructure

Procedure AllocateString(String.s = "") ; Result String Pointer
  Protected len, *String.udtFastString, *pointer
  len = StringByteLength(String)
  *String = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *String
    *Pointer = @*String\Char
    CopyMemoryString(@String, @*pointer)
    *String\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *String
EndProcedure

Macro FreeString(String)
  FreeMemory(String)
EndMacro

Procedure.s GetString(*String.udtFastString)
  ProcedureReturn PeekS(@*String\Char, *String\Len / SizeOf(Character))
EndProcedure

Procedure RightString(*String.udtFastString, Length)
  Protected len, *NewString.udtFastString
  Length * SizeOf(Character)
  If Length <= *String\Len
    len = Length
  Else
    len = *String\Len
  EndIf
  *NewString = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *NewString
    CopyMemory(@*String\Char + *String\Len - len, @*NewString\Char, len)
    *NewString\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *NewString
EndProcedure

CompilerIf #PB_Compiler_Debugger
  CompilerError "Turn off the debugger mode!"
CompilerEndIf

Define example$, test$
Define.udtFastString *string, *result
Define.q startTime, endTime, time1, time2

example$ = RSet("Hello", 1000000)
*string = AllocateString(example$)

OpenConsole("")

startTime = ElapsedMilliseconds()
For i = 1 To 10000
  test$ = Right(example$, 5)
  PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time1 = endTime - startTime

ClearConsole()

startTime = ElapsedMilliseconds()
For i = 1 To 10000
  *result = RightString(*string, 5)
  test$ = GetString(*result)
  FreeString(*result)
  PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time2 = endTime - startTime

CloseConsole()

FreeString(*string)

MessageRequester("Result", "Right(): " + Str(time1) + "ms" + #CRLF$ +
                           "RightString(): " + Str(time2) + "ms")

My times:
  • Right(): 2143ms
  • RightString(): 32ms
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
ChrisR
Addict
Addict
Posts: 1127
Joined: Sun Jan 08, 2017 10:27 pm
Location: France

Re: Lack of "fast strings"

Post by ChrisR »

Yes, a StringBuilder alone is not A solution for ALL string functions but it can be a quick solution for the concatenation, waiting for a native solution.
Uncalled procedures are not deleted unless you use chi's pbOptimizer or other.

It is good to know that for small chains, without too much manipulation, the current native PB way can be faster. See the example part below:
Otherwise, of course, I would also like to see a native solution in the future for large chains, avoiding the current bottleneck.

Code: Select all

example$ = "Hello World"

OpenConsole("")

startTime = ElapsedMilliseconds()
For i = 1 To 100
  test$ = Right(example$, 5)
  PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time1 = endTime - startTime

; ClearConsole()  : the console has to be in graphical mode
PrintN(~"\n\n----- Fast string functions -----\n\n")

startTime = ElapsedMilliseconds()
*string = AllocateString(example$)
For i = 1 To 100
  *result = RightString(*string, 5)
  test$ = GetString(*result)
  FreeString(*result)
  PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time2 = endTime - startTime

FreeString(*string)

MessageRequester("Result", "Right(): " + Str(time1) + "ms" + #CRLF$ +
                           "RightString(): " + Str(time2) + "ms")
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: Lack of "fast strings"

Post by Sicro »

ChrisR wrote: Thu Feb 16, 2023 1:16 am Yes, a StringBuilder alone is not A solution for ALL string functions but it can be a quick solution for the concatenation, waiting for a native solution.
Sure, it can help. Just wanted to mention here again that the other string functions also negatively affect the speed, resulting in the program still being too slow in the end, despite using a StringBuilder.
ChrisR wrote: Thu Feb 16, 2023 1:16 am It is good to know that for small chains, without too much manipulation, the current native PB way can be faster. See the example part below:
In this example, the additional calls to GetString() and FreeString() are probably too time-consuming. With a native implementation, things could look different again with the speed.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Lack of "fast strings"

Post by idle »

proof of concept hijack the "+" string operator to work with bstrings so the following code
it still has the same runtime as the native PB method but I'm sure I can fix that.

Code: Select all

Define.bstrings *s1,*s2,*s3

*s1 = Bstr("Hello ") 
*s2 = Bstr("BString ") 
*s3 = Bstr("world..") 

*s1\s + *s2\s + *s3\s   

Debug *s1\s 
Debug *s1\len 
Will output
Hello BString world..
42

Code: Select all

;BSTRING Wraps PB strings to produce bstrings  
;facilitates using regular + string operator to concat bstrings 

DeclareModule BSTRING
  
  EnableExplicit 
  
  Structure bStrings  
    len.l
    s.s
  EndStructure 
  
  Declare Bstr(s.s,*bstr.bStrings=0) 
  Declare BstrStr(val.i)  
  
EndDeclareModule 

Module BSTRING 
  
  Global NewList *lStrings.bstrings() 
  
  ProcedureC _SYS_PushStringBasePosition();
    PushListPosition(*lStrings()) 
  EndProcedure 
  
  ProcedureC _SYS_PopStringBasePosition()
    ProcedureReturn PopListPosition(*lStrings())  
  EndProcedure 
  
  ProcedureC _SYS_CopyString(*bstr);
    Protected *str.bStrings 
    
    If *bstr
      AddElement(*lStrings()) 
      *lStrings() = *bstr-4 
    EndIf   
  EndProcedure   
  
  Global lp 
  
  ProcedureC _SYS_AllocateString(*str.bstrings,inpos)
    Protected size = 65596 
    Protected  pos 
    Protected  *buff
    
    If *str
      *str - 4 
      *buff = AllocateMemory(size)
       While NextElement(*lStrings()) 
         If  *lStrings()\len + pos > size 
           While *lStrings()\len + pos > size 
             size + 65596  
           Wend   
           *buff = ReAllocateMemory(*buff,size)
        EndIf   
                     
        CopyMemory(@*lStrings()\s,*buff+pos,*lStrings()\len)  
        pos + *lStrings()\len 
        DeleteElement(*lStrings())   
      Wend   
           
      *str\len = pos 
      *str\s = PeekS(*buff,-1) 
      FreeMemory(*buff) 
      
    EndIf  
    
  EndProcedure  
  
  _SYS_PushStringBasePosition()
  _SYS_CopyString(0)
  _SYS_PopStringBasePosition() 
  _SYS_AllocateString(0,0)

  
  !#define SYS_PushStringBasePosition() bstringXf__sys_pushstringbaseposition();
  !#define SYS_CopyString(bstr) bstringXf__sys_copystring(&(bstr)); 
  !#define SYS_PopStringBasePosition() bstringXf__sys_popstringbaseposition();
  !#define SYS_AllocateString4(bstr,inpos) bstringXf__sys_allocatestring(bstr,bstringXf__sys_popstringbaseposition()); 
    
  Procedure Bstr(s.s,*bstr.bStrings=0) 
    Protected len
    len = StringByteLength(s)
    If *bstr  
      *bstr\s = "" 
      FreeMemory(*bstr) 
    EndIf   
    *bstr = AllocateMemory(SizeOf(bStrings)) 
    *bstr\len = len 
    *bstr\s = s 
    ProcedureReturn *bstr
  EndProcedure   
  
  Procedure BstrStr(val.i) 
    Protected out.s  
    
    Debug val 
    
    !SYS_PushStringBasePosition();
    !PB_Str(v_val,SYS_PopStringBasePosition()); 
    !SYS_AllocateString4(&v_out,SYS_PopStringBasePosition()); 
    
    
    ProcedureReturn Bstr(out)
  EndProcedure   
  
EndModule 

CompilerIf #PB_Compiler_IsMainFile  
  
  UseModule BSTRING
    
  Define.bstrings *s1,*s2,*s3,*out
  
  *s1 = Bstr("Hello ") 
  *s2 = Bstr("BString ") 
  *s3 = Bstr("world..") 
  
  *s1\s + *s2\s + *s3\s   ;use normal syntax to concat bstrings together 
  
  Debug *s1\s 
  Debug *s1\len 
  
  ;Reuse a bstring frees it 
  *s1 = Bstr("foo",*s1) 
  
  Debug *s1\s 
  Debug *s1\len 
  
  *s1 = Bstr("1",*s1) 
  
  st = ElapsedMilliseconds() 
  For a =0 To 10000
    *s1\s + *s2\s 
  Next   
  et = ElapsedMilliseconds() 
  
  Debug *s1\s 
  
  Debug et-st
  
  *out = bstrStr(et-st) 
  MessageRequester("test",*out\s) 
 
    
CompilerEndIf 


Axolotl
Enthusiast
Enthusiast
Posts: 435
Joined: Wed Dec 31, 2008 3:36 pm

Re: Lack of "fast strings"

Post by Axolotl »

@idle, thanks for the example.

I found out that firstly the C backend is needed and secondly I got an error message:
PureBasic Debugger
The debugged executable quit unexpectedly.

BTW: After following the exciting competition for the best large string implementation, i have two questions:
At what point should you stop using standard strings for performance reasons?
Is there a rule of thumb?
Mostly running PureBasic <latest stable version and current alpha/beta> (x64) on Windows 11 Home
User avatar
Sicro
Enthusiast
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany
Contact:

Re: Lack of "fast strings"

Post by Sicro »

Axolotl wrote: Sat Feb 18, 2023 12:10 pm At what point should you stop using standard strings for performance reasons?
Is there a rule of thumb?
The decision of when to stop using the current native string functions can't be based on the length of the string alone, because how many operations are applied to those long strings is also crucial. Thus, we can't simply say, starting with a string of length x or more, the current native string functions should no longer be used.

But as a rule of thumb:
  • Is speed very important?
  • Is it to be expected that the strings to be processed are often very long?
If you can answer at least one of the questions with "yes", you should not use the current string functions.
Image
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
AZJIO
Addict
Addict
Posts: 1312
Joined: Sun May 14, 2017 1:48 am

Re: Lack of "fast strings"

Post by AZJIO »

If a loop is used, then this leads to noticeable delays. If the program makes you wait for the execution of the operation, then you should take measures to improve the algorithm. This is programming - the ability not to make useless moves.
User avatar
HeX0R
Addict
Addict
Posts: 980
Joined: Mon Sep 20, 2004 7:12 am
Location: Hell

Re: Lack of "fast strings"

Post by HeX0R »

There is no rule of thumb, it's more like, if you have an application which has to handle very long strings, and you are asking yourself "who the heck stands on the brake here?", then you should consider to use one of all the offered possibilities.
As said already, string handling of PB is fine in most situations.
I personally would not really need that to be changed for those handfull of applications, which are "special".
User avatar
mk-soft
Always Here
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Lack of "fast strings"

Post by mk-soft »

HeX0R wrote: Sat Feb 18, 2023 3:06 pm There is no rule of thumb, it's more like, if you have an application which has to handle very long strings, and you are asking yourself "who the heck stands on the brake here?", then you should consider to use one of all the offered possibilities.
As said already, string handling of PB is fine in most situations.
I personally would not really need that to be changed for those handfull of applications, which are "special".
That's exactly how I see it.
For very large TXT files I use LinkedList. It is much easier and faster to process the data.
To create a large TXT stream, you can use your own or my string builder and send it directly as a memory.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Lack of "fast strings"

Post by idle »

Axolotl wrote: Sat Feb 18, 2023 12:10 pm BTW: After following the exciting competition for the best large string implementation, i have two questions:
At what point should you stop using standard strings for performance reasons?
Is there a rule of thumb?
Don't stop using the regular methods unless there's a clear performance bottle neck that will effect you or users of your code.
The performance issue with strings are generally simple to address, just avoid situations where you need to obtain the length and you're 1/2 way there.

PB's string handling is reasonably complicated we have immediate strings stored in the data section, the PB string manger, and then heap managed strings utf8() ascii() The heap managed strings have their length readily available. Windows provides the memory size for heap allocated memory, Linux and OSX doesn't, so the pb memory allocator stores the size of the chunk of memory before the returned pointer. So a call to memorysize() is cheap.

Yes there's an issue with the bstring sample it crashes when the debugger is on but if I reduce the timing loop to a 1000 times it runs, I had found an issues with peeks, when I set the size parameter it could only handle 8192 bytes, not sure what to make of that, when I switch to -1 so it copies until a null is found, it can handle more than that, so not sure there.
I could do the same with fasm but I'm not sure about nasm/yasm macro capability.
Axolotl
Enthusiast
Enthusiast
Posts: 435
Joined: Wed Dec 31, 2008 3:36 pm

Re: Lack of "fast strings"

Post by Axolotl »

@idle, @mk-soft. @HeXOR, @AZJIO, @Sicro,

thank you for your explanations.
Now I don't feel so bad when I use the standard functions. :)
And still I will continue to experiment with the pointers, because it's just fun. :oops:
I know, I'm definitely weird.

Happy coding and stay healthy.
Mostly running PureBasic <latest stable version and current alpha/beta> (x64) on Windows 11 Home
SMaag
Enthusiast
Enthusiast
Posts: 110
Joined: Sat Jan 14, 2023 6:55 pm

Re: Lack of "fast strings"

Post by SMaag »

@marcoagpinto

How you are doing your string handling. What is very slow? Can you give us an example?
We had a discussion on string speed in the german forum some weeks ago.
There are some points you can speed up PB's String handling.

I'm writing a PrueBasic Framework for my applications. The code I publish on Github.
There is a String Module where you can see some of the optimations.
It's in developper state! But if you exactly know what you are doing, maybe you can use some parts of it.
There is one greate trick to speed up pasing strings to a procedure.
Use a Prototype to call your Procedure what receives a String. In this case the PB do not copy the string.
This trick you can see in my module!

https://github.com/Maagic7/PureBasicFra ... _STRING.pb

An other way to speed up StringLength-Command is to use Assembler-SSE String-Commands.
I put my proof of concept in the german forum.
https://www.purebasic.fr/german/viewtopic.php?t=32931

Maybe open a seperate Thread in the standard forum, then we can talk about how to optimize String operations with the
code from your applilcation.
User avatar
skywalk
Addict
Addict
Posts: 3972
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Lack of "fast strings"

Post by skywalk »

SMaag wrote: Wed Mar 22, 2023 4:36 pmThere is one greate trick to speed up pasing strings to a procedure.
Use a Prototype to call your Procedure what receives a String. In this case the PB do not copy the string.
Nice trick, but why can't you pass your strings byref, using a pointer?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
User avatar
mk-soft
Always Here
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Lack of "fast strings"

Post by mk-soft »

skywalk wrote: Wed Mar 22, 2023 10:26 pm Nice trick, but why can't you pass your strings byref, using a pointer?
Because you have to pass the pointer to the variable where the pointer to the string is stored.

Code: Select all

Procedure foo(*strVal.string)
  *strVal\s = UCase(*strVal\s)
EndProcedure

var.string
var\s = "hello world!"
foo(var)
Debug var\s
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Post Reply