Lack of "fast strings"
Re: Lack of "fast strings"
The inline c version is likely copying the string onto the stack so its doing 2 copies it should push the address obtain the length then it can allocate a chunk and write it out fast.
Also we don't know how large the stack is so it's probably reallocating that as well.
Also we don't know how large the stack is so it's probably reallocating that as well.
Re: Lack of "fast strings"
A StringBuilder alone is not the solution. All functions that take strings must look for the null character to determine the end of the string and because the determined length is not stored, each function must calculate the length again and again.
Here is a test code:
My times:
Here is a test code:
Code: Select all
; Fast string functions were taken from the code of mk-soft:
; FastString - Structured Memory String
; https://www.purebasic.fr/english/viewtopic.php?t=75758
Structure udtFastString
Len.i
Char.c[0]
EndStructure
Procedure AllocateString(String.s = "") ; Result String Pointer
Protected len, *String.udtFastString, *pointer
len = StringByteLength(String)
*String = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
If *String
*Pointer = @*String\Char
CopyMemoryString(@String, @*pointer)
*String\Len = len
Else
MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
End
EndIf
ProcedureReturn *String
EndProcedure
Macro FreeString(String)
FreeMemory(String)
EndMacro
Procedure.s GetString(*String.udtFastString)
ProcedureReturn PeekS(@*String\Char, *String\Len / SizeOf(Character))
EndProcedure
Procedure RightString(*String.udtFastString, Length)
Protected len, *NewString.udtFastString
Length * SizeOf(Character)
If Length <= *String\Len
len = Length
Else
len = *String\Len
EndIf
*NewString = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
If *NewString
CopyMemory(@*String\Char + *String\Len - len, @*NewString\Char, len)
*NewString\Len = len
Else
MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
End
EndIf
ProcedureReturn *NewString
EndProcedure
CompilerIf #PB_Compiler_Debugger
CompilerError "Turn off the debugger mode!"
CompilerEndIf
Define example$, test$
Define.udtFastString *string, *result
Define.q startTime, endTime, time1, time2
example$ = RSet("Hello", 1000000)
*string = AllocateString(example$)
OpenConsole("")
startTime = ElapsedMilliseconds()
For i = 1 To 10000
test$ = Right(example$, 5)
PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time1 = endTime - startTime
ClearConsole()
startTime = ElapsedMilliseconds()
For i = 1 To 10000
*result = RightString(*string, 5)
test$ = GetString(*result)
FreeString(*result)
PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time2 = endTime - startTime
CloseConsole()
FreeString(*string)
MessageRequester("Result", "Right(): " + Str(time1) + "ms" + #CRLF$ +
"RightString(): " + Str(time2) + "ms")
My times:
- Right(): 2143ms
- RightString(): 32ms
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: Lack of "fast strings"
Yes, a StringBuilder alone is not A solution for ALL string functions but it can be a quick solution for the concatenation, waiting for a native solution.
Uncalled procedures are not deleted unless you use chi's pbOptimizer or other.
It is good to know that for small chains, without too much manipulation, the current native PB way can be faster. See the example part below:
Otherwise, of course, I would also like to see a native solution in the future for large chains, avoiding the current bottleneck.
Uncalled procedures are not deleted unless you use chi's pbOptimizer or other.
It is good to know that for small chains, without too much manipulation, the current native PB way can be faster. See the example part below:
Otherwise, of course, I would also like to see a native solution in the future for large chains, avoiding the current bottleneck.
Code: Select all
example$ = "Hello World"
OpenConsole("")
startTime = ElapsedMilliseconds()
For i = 1 To 100
test$ = Right(example$, 5)
PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time1 = endTime - startTime
; ClearConsole() : the console has to be in graphical mode
PrintN(~"\n\n----- Fast string functions -----\n\n")
startTime = ElapsedMilliseconds()
*string = AllocateString(example$)
For i = 1 To 100
*result = RightString(*string, 5)
test$ = GetString(*result)
FreeString(*result)
PrintN(test$)
Next
endTime = ElapsedMilliseconds()
time2 = endTime - startTime
FreeString(*string)
MessageRequester("Result", "Right(): " + Str(time1) + "ms" + #CRLF$ +
"RightString(): " + Str(time2) + "ms")
Re: Lack of "fast strings"
Sure, it can help. Just wanted to mention here again that the other string functions also negatively affect the speed, resulting in the program still being too slow in the end, despite using a StringBuilder.
In this example, the additional calls to GetString() and FreeString() are probably too time-consuming. With a native implementation, things could look different again with the speed.
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: Lack of "fast strings"
proof of concept hijack the "+" string operator to work with bstrings so the following code
it still has the same runtime as the native PB method but I'm sure I can fix that.
Will output
it still has the same runtime as the native PB method but I'm sure I can fix that.
Code: Select all
Define.bstrings *s1,*s2,*s3
*s1 = Bstr("Hello ")
*s2 = Bstr("BString ")
*s3 = Bstr("world..")
*s1\s + *s2\s + *s3\s
Debug *s1\s
Debug *s1\len
Hello BString world..
42
Code: Select all
;BSTRING Wraps PB strings to produce bstrings
;facilitates using regular + string operator to concat bstrings
DeclareModule BSTRING
EnableExplicit
Structure bStrings
len.l
s.s
EndStructure
Declare Bstr(s.s,*bstr.bStrings=0)
Declare BstrStr(val.i)
EndDeclareModule
Module BSTRING
Global NewList *lStrings.bstrings()
ProcedureC _SYS_PushStringBasePosition();
PushListPosition(*lStrings())
EndProcedure
ProcedureC _SYS_PopStringBasePosition()
ProcedureReturn PopListPosition(*lStrings())
EndProcedure
ProcedureC _SYS_CopyString(*bstr);
Protected *str.bStrings
If *bstr
AddElement(*lStrings())
*lStrings() = *bstr-4
EndIf
EndProcedure
Global lp
ProcedureC _SYS_AllocateString(*str.bstrings,inpos)
Protected size = 65596
Protected pos
Protected *buff
If *str
*str - 4
*buff = AllocateMemory(size)
While NextElement(*lStrings())
If *lStrings()\len + pos > size
While *lStrings()\len + pos > size
size + 65596
Wend
*buff = ReAllocateMemory(*buff,size)
EndIf
CopyMemory(@*lStrings()\s,*buff+pos,*lStrings()\len)
pos + *lStrings()\len
DeleteElement(*lStrings())
Wend
*str\len = pos
*str\s = PeekS(*buff,-1)
FreeMemory(*buff)
EndIf
EndProcedure
_SYS_PushStringBasePosition()
_SYS_CopyString(0)
_SYS_PopStringBasePosition()
_SYS_AllocateString(0,0)
!#define SYS_PushStringBasePosition() bstringXf__sys_pushstringbaseposition();
!#define SYS_CopyString(bstr) bstringXf__sys_copystring(&(bstr));
!#define SYS_PopStringBasePosition() bstringXf__sys_popstringbaseposition();
!#define SYS_AllocateString4(bstr,inpos) bstringXf__sys_allocatestring(bstr,bstringXf__sys_popstringbaseposition());
Procedure Bstr(s.s,*bstr.bStrings=0)
Protected len
len = StringByteLength(s)
If *bstr
*bstr\s = ""
FreeMemory(*bstr)
EndIf
*bstr = AllocateMemory(SizeOf(bStrings))
*bstr\len = len
*bstr\s = s
ProcedureReturn *bstr
EndProcedure
Procedure BstrStr(val.i)
Protected out.s
Debug val
!SYS_PushStringBasePosition();
!PB_Str(v_val,SYS_PopStringBasePosition());
!SYS_AllocateString4(&v_out,SYS_PopStringBasePosition());
ProcedureReturn Bstr(out)
EndProcedure
EndModule
CompilerIf #PB_Compiler_IsMainFile
UseModule BSTRING
Define.bstrings *s1,*s2,*s3,*out
*s1 = Bstr("Hello ")
*s2 = Bstr("BString ")
*s3 = Bstr("world..")
*s1\s + *s2\s + *s3\s ;use normal syntax to concat bstrings together
Debug *s1\s
Debug *s1\len
;Reuse a bstring frees it
*s1 = Bstr("foo",*s1)
Debug *s1\s
Debug *s1\len
*s1 = Bstr("1",*s1)
st = ElapsedMilliseconds()
For a =0 To 10000
*s1\s + *s2\s
Next
et = ElapsedMilliseconds()
Debug *s1\s
Debug et-st
*out = bstrStr(et-st)
MessageRequester("test",*out\s)
CompilerEndIf
Re: Lack of "fast strings"
@idle, thanks for the example.
I found out that firstly the C backend is needed and secondly I got an error message:
PureBasic Debugger
The debugged executable quit unexpectedly.
BTW: After following the exciting competition for the best large string implementation, i have two questions:
At what point should you stop using standard strings for performance reasons?
Is there a rule of thumb?
I found out that firstly the C backend is needed and secondly I got an error message:
PureBasic Debugger
The debugged executable quit unexpectedly.
BTW: After following the exciting competition for the best large string implementation, i have two questions:
At what point should you stop using standard strings for performance reasons?
Is there a rule of thumb?
Mostly running PureBasic <latest stable version and current alpha/beta> (x64) on Windows 11 Home
Re: Lack of "fast strings"
The decision of when to stop using the current native string functions can't be based on the length of the string alone, because how many operations are applied to those long strings is also crucial. Thus, we can't simply say, starting with a string of length x or more, the current native string functions should no longer be used.
But as a rule of thumb:
- Is speed very important?
- Is it to be expected that the strings to be processed are often very long?
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: Lack of "fast strings"
If a loop is used, then this leads to noticeable delays. If the program makes you wait for the execution of the operation, then you should take measures to improve the algorithm. This is programming - the ability not to make useless moves.
Re: Lack of "fast strings"
There is no rule of thumb, it's more like, if you have an application which has to handle very long strings, and you are asking yourself "who the heck stands on the brake here?", then you should consider to use one of all the offered possibilities.
As said already, string handling of PB is fine in most situations.
I personally would not really need that to be changed for those handfull of applications, which are "special".
As said already, string handling of PB is fine in most situations.
I personally would not really need that to be changed for those handfull of applications, which are "special".
{Home}.:|:.{Dialog Design0R}.:|:.{Codes}.:|:.{Downloads}.:|:.{History Viewer Online}
Re: Lack of "fast strings"
That's exactly how I see it.HeX0R wrote: ↑Sat Feb 18, 2023 3:06 pm There is no rule of thumb, it's more like, if you have an application which has to handle very long strings, and you are asking yourself "who the heck stands on the brake here?", then you should consider to use one of all the offered possibilities.
As said already, string handling of PB is fine in most situations.
I personally would not really need that to be changed for those handfull of applications, which are "special".
For very large TXT files I use LinkedList. It is much easier and faster to process the data.
To create a large TXT stream, you can use your own or my string builder and send it directly as a memory.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Re: Lack of "fast strings"
Don't stop using the regular methods unless there's a clear performance bottle neck that will effect you or users of your code.
The performance issue with strings are generally simple to address, just avoid situations where you need to obtain the length and you're 1/2 way there.
PB's string handling is reasonably complicated we have immediate strings stored in the data section, the PB string manger, and then heap managed strings utf8() ascii() The heap managed strings have their length readily available. Windows provides the memory size for heap allocated memory, Linux and OSX doesn't, so the pb memory allocator stores the size of the chunk of memory before the returned pointer. So a call to memorysize() is cheap.
Yes there's an issue with the bstring sample it crashes when the debugger is on but if I reduce the timing loop to a 1000 times it runs, I had found an issues with peeks, when I set the size parameter it could only handle 8192 bytes, not sure what to make of that, when I switch to -1 so it copies until a null is found, it can handle more than that, so not sure there.
I could do the same with fasm but I'm not sure about nasm/yasm macro capability.
Re: Lack of "fast strings"
@idle, @mk-soft. @HeXOR, @AZJIO, @Sicro,
thank you for your explanations.
Now I don't feel so bad when I use the standard functions.
And still I will continue to experiment with the pointers, because it's just fun.
I know, I'm definitely weird.
Happy coding and stay healthy.
thank you for your explanations.
Now I don't feel so bad when I use the standard functions.
And still I will continue to experiment with the pointers, because it's just fun.
I know, I'm definitely weird.
Happy coding and stay healthy.
Mostly running PureBasic <latest stable version and current alpha/beta> (x64) on Windows 11 Home
Re: Lack of "fast strings"
@marcoagpinto
How you are doing your string handling. What is very slow? Can you give us an example?
We had a discussion on string speed in the german forum some weeks ago.
There are some points you can speed up PB's String handling.
I'm writing a PrueBasic Framework for my applications. The code I publish on Github.
There is a String Module where you can see some of the optimations.
It's in developper state! But if you exactly know what you are doing, maybe you can use some parts of it.
There is one greate trick to speed up pasing strings to a procedure.
Use a Prototype to call your Procedure what receives a String. In this case the PB do not copy the string.
This trick you can see in my module!
https://github.com/Maagic7/PureBasicFra ... _STRING.pb
An other way to speed up StringLength-Command is to use Assembler-SSE String-Commands.
I put my proof of concept in the german forum.
https://www.purebasic.fr/german/viewtopic.php?t=32931
Maybe open a seperate Thread in the standard forum, then we can talk about how to optimize String operations with the
code from your applilcation.
How you are doing your string handling. What is very slow? Can you give us an example?
We had a discussion on string speed in the german forum some weeks ago.
There are some points you can speed up PB's String handling.
I'm writing a PrueBasic Framework for my applications. The code I publish on Github.
There is a String Module where you can see some of the optimations.
It's in developper state! But if you exactly know what you are doing, maybe you can use some parts of it.
There is one greate trick to speed up pasing strings to a procedure.
Use a Prototype to call your Procedure what receives a String. In this case the PB do not copy the string.
This trick you can see in my module!
https://github.com/Maagic7/PureBasicFra ... _STRING.pb
An other way to speed up StringLength-Command is to use Assembler-SSE String-Commands.
I put my proof of concept in the german forum.
https://www.purebasic.fr/german/viewtopic.php?t=32931
Maybe open a seperate Thread in the standard forum, then we can talk about how to optimize String operations with the
code from your applilcation.
Re: Lack of "fast strings"
Nice trick, but why can't you pass your strings byref, using a pointer?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Re: Lack of "fast strings"
Because you have to pass the pointer to the variable where the pointer to the string is stored.
Code: Select all
Procedure foo(*strVal.string)
*strVal\s = UCase(*strVal\s)
EndProcedure
var.string
var\s = "hello world!"
foo(var)
Debug var\s
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive