String length should be stored for string variables

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String length should be stored for string variables

Post by mk-soft »

I quickly put together some code for building very large strings fast.
It could also be reworked for UTF-8, for example to save the data directly as UTF-8. :wink:

For me it takes about 50 ms to append the 100000 characters.

Code: Select all

; Moved to Tricks 'n' Tips
Link: Fast String
Last edited by mk-soft on Sat Jul 25, 2020 1:41 pm, edited 2 times in total.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

mk-soft wrote: I quickly put together some code for building very large strings fast.
It could also be reworked for UTF-8, for example to save the data directly as UTF-8. :wink:

It's even possible to store the internal format and create string procedures that can use Ascii, Unicode or UTF8 internally. 8)
Windows (x64)
Raspberry Pi OS (Arm64)
Sicro
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany

Post by Sicro »

User_Russian wrote: I asked for the same thing about 6 years ago. viewtopic.php?f=3&t=58892
But it is foolish to expect that it will ever be done, because requests are rarely fulfilled!
The requests section should be closed, because it's useless. viewtopic.php?f=3&t=75576

I always try to find existing feature requests before I open one myself. Your thread later drifts too far into string builders (concatenating strings).
With my thread I wanted to restart the discussion about fast strings from scratch, so that fast reading of strings also gets more consideration.

At least the wish to update the third-party libraries was fulfilled recently.
But I agree with you that feature requests should get more reactions from the PB team. For example, the PB team could tell us what the obstacles are when a feature request stays unimplemented for a long time, so we could perhaps help look for a solution. With feedback from the PB team, creating feature requests would also be more fun, because we would know that the PB team is paying attention to them.

kenmo wrote: However, it would be a major change to the string library, and could break 20 years of existing code (memory bugs)

Currently I don't see anything in this feature request that could break existing code once it is integrated.

mk-soft wrote: The current string management works and is sufficient for 99.99% of applications.

Here in the forum you can often see pointer-based code being used instead of the normal string functions when speed matters. I think the PB string functions should be the preferred choice and should therefore be fast, so that pointer-based code is only needed in special cases.

STARGÅTE wrote: I think some confuse a string with a memory buffer.

With a string size of a few megabytes there should be no problems. For much larger sizes I also recommend using memory buffers rather than string variables, because AllocateMemory() lets the program react to a failure: if the memory cannot be allocated, the program does not have to crash and we can decide what should happen instead. Unfortunately I then have to work with memory functions, although I would prefer the string functions, because I am actually working with strings. A difficult trade-off.
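
A minimal sketch of that failure check, assuming an arbitrary 64 MB buffer size picked only for illustration:

Code: Select all

; Sketch: react to a failed allocation instead of crashing.
; The 64 MB size is an assumption chosen for illustration.
Define *buffer = AllocateMemory(64 * 1024 * 1024)
If *buffer
  PokeS(*buffer, "Large text goes here") ; work on the buffer with memory functions
  Debug PeekS(*buffer)
  FreeMemory(*buffer)
Else
  Debug "Could not allocate the buffer"  ; we decide what happens on failure
EndIf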

Rinzwind wrote: Also storing the length with the string makes it possible to use them for any binary storage (since null doesn't have to have special meaning anymore if you choose so). Can be quite convenient.

My feature request does not aim to abolish the terminating null character. Memory buffers should still be used for binary data. Unlike with string variables, we cannot write the string length in front of a string in a memory buffer, because existing code might then no longer work correctly. Even after this feature request is implemented, PeekS(), for example, must still search for the end of the string (the null character).
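
A small sketch of that null search in today's PB, assuming a scratch buffer of 16 characters chosen only for illustration: PeekS() without a length stops at the first null character, so text stored behind it is only reachable through an explicit offset.

Code: Select all

; Sketch: PeekS() without a length reads up to the first null character.
Define *buffer = AllocateMemory(16 * SizeOf(Character)) ; zero-filled by default
If *buffer
  PokeS(*buffer, "AB")                            ; writes A, B and a terminating null
  PokeS(*buffer + 3 * SizeOf(Character), "CD")    ; placed behind that null
  Debug PeekS(*buffer)                            ; AB
  Debug PeekS(*buffer + 3 * SizeOf(Character))    ; CD
  FreeMemory(*buffer)
EndIf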

helpy wrote: -1

If the string length were stored internally with each PB string, a new problem would arise ;-)
This problem would occur if you manipulate a PB string using pointers, memory functions and writing directly to the string memory using Poke or other *PointerToCharacter. Manipulating a string this way would not update the internal string length and PB functions would not work correctly ... :-(

Extending string variables with memory functions (e.g. PokeS()) is already risky today, because the PB string management does not notice it:

Code: Select all

Define text$ = "Hello"
PokeS(@text$, "Hello world!") ; Overflow in a string memory block
NicTheQuick wrote: Isn't there something like MemorySize() for a string's buffer? How does this usually work together with AllocateMemory()?
If the operating system already knows how big the memory buffer for a given pointer is, this could be another idea.

For performance reasons PB always allocates more memory for a string than necessary, so that the memory does not have to be reallocated for every small extension of the string. Your imagined MemorySize() function would therefore return the string size plus the extra memory, not the actual string size.

@mk-soft: Thanks for your code, but it only partially solves the problem. Every PB function that handles strings would have to be rewritten, so this has to be implemented natively in PB.
Last edited by Sicro on Sat Jul 25, 2020 6:05 pm, edited 1 time in total.
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version

Post by mk-soft »

The string management of PureBasic is old and needs to be updated.

Here is a sad result for comparison ...
Link: viewtopic.php?f=13&t=75750#p557919

So a big plus for the feature request after all.
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Post by Saki »

mk-soft wrote: The string management of PureBasic is old and needs to be updated.

Here is a sad result for comparison ...
Link: viewtopic.php?f=13&t=75750#p557919

So a big plus for the feature request after all.

Great, mk-soft. Above all, I think we can say it's buggy :(
Peace on Earth
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Post by User_Russian »

Sicro wrote: I edited my first post to add a note about the fixed-string structure. With fixed strings we don't need to store the string length, because the memory buffer is exactly the size needed to store the string, so the string length can easily be calculated:

Code: Select all

stringLength = MemorySize(*fixedString) / SizeOf(Character)
Your code will show the maximum length of the string, not how many characters it stores.
Compare with this code.

Code: Select all

s.s{128}       ; fixed string with room for 128 characters

s="1234"
Debug Len(s)   ; 4, the number of characters stored, not 128

Post by Sicro »

@User_Russian: Argh, yes, of course. I reverted my edit.
BarryG
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Post by BarryG »

kenmo wrote: Sat Jul 18, 2020 7:40 pm
However it would be a major change to the string library, and could break 20 years of existing code (memory bugs)

How? It causes no changes to our source code, so old compiled exes would still work the same, and new compiled exes would just use the new compiled string code. So what's to break?

[Edit] Fixed a typo.
Last edited by BarryG on Tue Oct 18, 2022 8:21 am, edited 1 time in total.
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Post by kenmo »

Hi @BarryG, that was 2 years ago so I don't know exactly what bugs I was thinking of. But off the top of my head:

- sometimes in PB, people allocate a string variable and then poke a different string over it (or an API call writes over it)

- sometimes people poke a NUL char to truncate a string

In these two cases, the stored string length could get out of sync with the actual contents... I'm assuming null chars would still be terminators in this hypothetical new PureBasic string library.
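
A minimal sketch of the second case as it behaves in today's PB, where Len() simply follows the new terminator. The remark about a stored length field is an assumption about the proposed design, not existing behaviour:

Code: Select all

; Sketch: truncating a string in place by poking a null character.
Define text$ = "Hello World"
Debug Len(text$)                           ; 11
PokeC(@text$ + 5 * SizeOf(Character), 0)   ; overwrite the space with a null character
Debug Len(text$)                           ; 5
Debug text$                                ; Hello
; A separately stored length field (hypothetical) would still say 11 here.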

Post by Sicro »

kenmo wrote: Tue Oct 18, 2022 4:12 am
- sometimes in PB, people allocate a string variable, and then poke a different string over it (or an API call writes over it)

Yes, I obviously didn't think of that when I created the feature request, although I sometimes did it in my own code too. This is mostly done with WinAPI functions, because unlike the API functions of the other operating systems they are compatible with PB strings.

In this case the following has to be done after the API function call, which means that old code has to be adapted, so there is no backward compatibility:

Code: Select all

result$ = Space(255)
WinAPI_Function(@result$)
result$ = PeekS(@result$)
Debug result$
or, with a proposed new function UpdateStringLength() that does not require creating a new string:

Code: Select all

result$ = Space(255)
WinAPI_Function(@result$)
; -1 = search for the null character
UpdateStringLength(result$, -1)
Debug result$

Code: Select all

result$ = Space(255)
length = WinAPI_Function(@result$)
UpdateStringLength(result$, length)
Debug result$
kenmo wrote: Tue Oct 18, 2022 4:12 am
- sometimes people poke a NUL char to truncate a string shorter

That's apparently pretty rare. I can't remember where I've seen this.

----------------

Also, the new format for fixed strings that I suggested in the opening post is problematic, because fixed strings are embedded directly in the structure, whereas normal strings only store a pointer there. The structure layout would change once this feature is implemented, which breaks backward compatibility with old code.

----------------

The alternatives with full backward compatibility would be:
  • The new string variable type is added alongside the existing string variable type. All PB string functions must then support both string variable types.

    Code: Select all

    newString.z = "Hello"
    oldString.s = newString
    oldString + " World"
    newString = oldString
    Debug newString ; Outputs 'Hello World'
    Debug oldString ; Outputs 'Hello World'

    Code: Select all

    oldString.s{5} = "Hello World"
    newString.z{5} = "Hello World"
    Debug oldString ; Outputs 'Hello'
    Debug newString ; Outputs 'Hello'
    
    The new '.z' declares the new string variable type.
  • Each string function gets an optional parameter through which the already known string length can be passed, so that the function does not have to determine the string length again.

    Code: Select all

    oldString.s = "Hello World"
    length = Len(oldString)
    Debug Left(oldString, 5, length) ; Outputs 'Hello'
    Debug Right(oldString, 5, length) ; Outputs 'World'
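
    A rough sketch of how the second alternative can be approximated in today's PB, assuming a hypothetical helper RightKnown() (not an existing PB function) that receives the already known length and therefore does not scan for the null character:

    Code: Select all

    ; RightKnown() is a hypothetical helper used only for illustration.
    ; It jumps straight to the last 'count' characters using the known length.
    Procedure.s RightKnown(*text, count, length)
      If count > length : count = length : EndIf
      If count <= 0 : ProcedureReturn "" : EndIf
      ProcedureReturn PeekS(*text + (length - count) * SizeOf(Character), count)
    EndProcedure

    Define text$ = "Hello World"
    Define length = Len(text$)            ; determined once
    Debug RightKnown(@text$, 5, length)   ; World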
    

Post by mk-soft »

I don't see a problem here: PB allocates the memory for the string, and the string functions read up to the null character. There is no memory leak either, since the whole memory block is freed, even if the null character is no longer at the end.

With an API that takes a string buffer as a parameter, the length of the buffer must always be passed as well.
Very old APIs that only support ASCII have to be used with a memory buffer anyway.

Code: Select all


; API Dummy
Procedure AnyApiW(*String, cbByte)
  Protected r1.s, len
  r1 = "Hello World"
  len = StringByteLength(r1)
  If *String
    If len <= cbByte
      PokeS(*String, r1)
      ProcedureReturn len
    EndIf
  Else
    ProcedureReturn Len
  EndIf
EndProcedure

Debug "****"
t1.s = Space(1024)
r1 = AnyApiW(@t1, StringByteLength(t1))
Debug Left(t1, 5)
Debug Right(t1, 5)

Debug "****"
r1 = AnyApiW(0, 0)
t1.s = Space(r1 >> 1)
r1 = AnyAPIW(@t1, StringByteLength(t1))
Debug Left(t1, 5)
Debug Right(t1, 5)

Debug "****"
Structure sData
  iVal.i
  text.s{20}
  null.w
EndStructure

Define d1.sData
r1 = AnyApiW(@d1\text, 40)
ShowMemoryViewer(d1, 80)
Debug Left(d1\text, 5)
Debug Right(d1\text, 5)

Post by Sicro »

@mk-soft

After this feature is implemented, the PB string functions Left(), Right() etc. would no longer search for a null character, but would use the length field stored in front of the string. The problem: the WinAPI function does not update the value of that length field.

Code: Select all

Procedure WinAPI_Function(*value)
  PokeS(*value, "Test")
EndProcedure

Define value$ = "Example"
; value$\charLength = 7

WinAPI_Function(@value$)

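Debug value$
; With a stored length of 7 the characters read are 'Test', a null character and 'le',
; because PokeS() also writes the terminating null over the 'p'
; value$\charLength = 7 (unchanged, although the text is now 'Test')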

Post by mk-soft »

The normal string functions stop at the null character, as PB does now.
If PB switched to a BSTR-like string type, this would of course lead to problems.
AZJIO
Addict
Posts: 1312
Joined: Sun May 14, 2017 1:48 am

Post by AZJIO »

Sicro wrote: Sun Nov 13, 2022 5:53 pm After this feature is implemented, the PB string functions Left(), Right() etc. no longer search for a null character
I was trying to figure out which way is faster. If the length of the string is known, you use "For i=1 To Len"; if it is not known, you use the character test "While *c\c". In either case there is one check per character: either whether the counter has passed the string length, or whether the character is null.
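
A small sketch of the two loop forms side by side. The procedure names are made up for illustration and counting spaces is just an example task; no timing claim is intended:

Code: Select all

; Walks the string until the terminating null character.
Procedure CountSpacesNullScan(*c.Character)
  Protected count
  While *c\c
    If *c\c = ' '
      count + 1
    EndIf
    *c + SizeOf(Character)
  Wend
  ProcedureReturn count
EndProcedure

; Walks the string using a length that is already known.
Procedure CountSpacesKnownLength(*c.Character, length)
  Protected count, i
  For i = 1 To length
    If *c\c = ' '
      count + 1
    EndIf
    *c + SizeOf(Character)
  Next
  ProcedureReturn count
EndProcedure

Define text$ = "a b c d"
Debug CountSpacesNullScan(@text$)                  ; 3
Debug CountSpacesKnownLength(@text$, Len(text$))   ; 3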

The idea of a new .z string type is interesting: it would allow the new feature to be added without removing the old type, and whether to remove the old one could be decided later.
wilbert wrote: Mon Jul 20, 2020 6:11 am And some functions like Split and Join would be a welcome addition.
I assumed that the authors do not add such string functions because they can be built by users from the existing functionality. Why not add a section to the help file with examples of interesting solutions? Many beginners find it difficult to write functions on their own, so offering these as ready-made examples in the help file would make the language more attractive during the learning phase.
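
As an example of such a ready-made solution, here is a minimal sketch of a Split built only from existing functions. SplitString() is a made-up name, and this simple version rescans the text for every field, so it is meant for small inputs:

Code: Select all

; Splits text$ at separator$ into parts$() and returns the number of parts.
Procedure SplitString(text$, separator$, Array parts$(1))
  Protected i, count = CountString(text$, separator$) + 1
  ReDim parts$(count - 1)
  For i = 1 To count
    parts$(i - 1) = StringField(text$, i, separator$)
  Next
  ProcedureReturn count
EndProcedure

Dim parts$(0)
Define n = SplitString("a,b,c", ",", parts$())
Debug n           ; 3
Debug parts$(1)   ; b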
juergenkulow
Enthusiast
Posts: 544
Joined: Wed Sep 25, 2019 10:18 am

Post by juergenkulow »

Code: Select all

; Linux only x64 8 Byte before string. 
For i=1 To 35 Step 1
  s.s="TEST"+Space(i)
  *plen.Integer=@s-8
  slen=Len(s)
  Debug Str(*plen\i)+" "+Str(slen)+" "+Str((*plen\i-19)/2) ;+" "+Hex(*p)
Next 
ShowMemoryViewer(@s-8,Len(s)*2+10)
CompilerIf #PB_OS_Linux<>#PB_Compiler_OS : CompilerError "only LINUX" :CompilerEndIf
; 33 5 7
; 33 6 7
; 33 7 7
; 49 8 15
; 49 9 15
; 49 10 15
; 49 11 15
; 49 12 15
; 49 13 15
; 49 14 15
; 49 15 15
; 65 16 23
; 65 17 23
; 65 18 23
; 65 19 23
; 65 20 23
; 65 21 23
; 65 22 23
; 65 23 23
; 81 24 31
; 81 25 31
; 81 26 31
; 81 27 31
; 81 28 31
; 81 29 31
; 81 30 31
; 81 31 31
; 97 32 39
; 97 33 39
; 97 34 39
; 97 35 39
; 97 36 39
; 97 37 39
; 97 38 39
; 97 39 39
; 0000000000773878  61 00 00 00 00 00 00 00 54 00 45 00 53 00 54 00  a.......T.E.S.T.
; 0000000000773888  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 0000000000773898  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738A8  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738B8  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738C8  20 00 20 00 20 00 00 00                           . . ...