String length should be stored for string variables

Sicro
Enthusiast
Posts: 538
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany

String length should be stored for string variables

Post by Sicro »

As it is now:
  • Each string function must recalculate the string length. This is an extreme performance killer.
As it should be:
  • The length should be stored in memory directly before the string:


    Structure DynamicStringVariable
      stringLength.i
      *stringPointer
    EndStructure
    
    Structure FixedStringVariable
      stringLength.i
      string.c[0]
    EndStructure


    Define text$ = "Example string"
    
    Debug @text$ ; Prints the value of *stringPointer
    
    Debug PeekI(@text$ - SizeOf(Integer)) ; Prints the value of stringLength
Edit:
A StringBuilder is not the solution, because we also want to read strings fast.
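For readers coming from C, the proposed layout corresponds to what is usually called a length-prefixed string. Below is a minimal C sketch under that assumption; the names `LenString`, `ls_new`, `ls_len` and `ls_free` are hypothetical and do not describe PureBasic's actual internals. As in the proposal, the pointer handed out points at the characters (like `@text$`), and the length sits directly in front of them (like `PeekI(@text$ - SizeOf(Integer))`), so reading it no longer requires scanning for the terminating null.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a length header followed directly by the characters. */
typedef struct {
    size_t length;  /* number of characters, excluding the terminator */
    char   data[];  /* characters, still null-terminated for C interop */
} LenString;

/* Create a length-prefixed copy of a C string.
   Returns a pointer to the characters, not to the header. */
char *ls_new(const char *src)
{
    size_t n = strlen(src);                 /* one scan, only at creation */
    LenString *s = malloc(sizeof(LenString) + n + 1);
    if (s == NULL)
        return NULL;
    s->length = n;
    memcpy(s->data, src, n + 1);            /* copy including the terminator */
    return s->data;
}

/* O(1): read the length stored just before the characters. */
size_t ls_len(const char *str)
{
    const LenString *s = (const LenString *)(str - offsetof(LenString, data));
    return s->length;
}

/* Free the whole block, starting from the header. */
void ls_free(char *str)
{
    if (str != NULL)
        free(str - offsetof(LenString, data));
}
```

The string stays null-terminated, so it can still be passed to APIs that expect a plain C string; only the length lookup changes from a scan to a single read.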
Last edited by Sicro on Sat Jul 25, 2020 6:01 pm, edited 3 times in total.
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: String length should be stored for string variables

Post by User_Russian »

I asked for the same thing about 6 years ago. viewtopic.php?f=3&t=58892
But it is foolish to expect that it will ever be done, because requests are rarely fulfilled!
The requests section needs to be closed, because it's useless. viewtopic.php?f=3&t=75576
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Re: String length should be stored for string variables

Post by kenmo »

This would be a nice change; nowadays we can spare the extra bytes for faster performance :)

However, it would be a major change to the string library and could break 20 years of existing code (memory bugs).

I would save this for maybe a big Version 5.0 update, and include it with other big changes - such as fully going to UTF-8 for PB strings, instead of the current UCS-2 style implementation.
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

kenmo wrote: However, it would be a major change to the string library and could break 20 years of existing code (memory bugs).
The effort and the problems are certainly too great.
Many users don't know how PB strings work; they just wonder why things get slow.
Others do not notice it at all because they do not use large strings.
For "Hello World", it is always very fast.
It is not a bug but a property of the method.
地球上の平和 (Peace on Earth)
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String length should be stored for string variables

Post by mk-soft »

On Windows, a change to UTF-8 is not useful, because the APIs all use wide characters (WideChar).
On macOS and Linux, maybe.
But then there would be no static strings anymore, and arrays of characters would no longer work.
A UTF-8 character can take from 1 to 4 bytes in memory.
This leads to more problems than advantages.

The current string management works and is sufficient for 99.99% of applications.
If you process strings that are several megabytes long, you should consider whether your approach to processing the data is flawed.
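The point about arrays of characters comes from UTF-8's variable width, which a minimal C sketch can illustrate (the function names `utf8_seq_len` and `utf8_offset` are hypothetical). In UTF-8 the byte length of a character is encoded in its first byte, so locating the character at a given index means walking the string from the start. With a fixed 2-byte encoding, as PureBasic strings currently use, the byte offset is simply the index multiplied by 2.

```c
#include <stddef.h>

/* Number of bytes in the UTF-8 sequence starting with lead byte b (1 to 4).
   Returns 0 for a continuation byte or an invalid lead byte.
   Simplified: does not reject overlong forms or every invalid lead byte. */
int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* 10xxxxxx continuation, or invalid */
}

/* Byte offset of the character with the given index, found by walking
   from the start of the string: O(n) instead of O(1) array indexing.
   Returns (size_t)-1 if the string is too short or malformed. */
size_t utf8_offset(const char *s, size_t index)
{
    size_t pos = 0;
    while (index > 0) {
        if (s[pos] == '\0')
            return (size_t)-1;
        int n = utf8_seq_len((unsigned char)s[pos]);
        if (n == 0)
            return (size_t)-1;
        /* Check continuation bytes one by one; a '\0' stops the walk. */
        for (int k = 1; k < n; k++)
            if (((unsigned char)s[pos + k] & 0xC0) != 0x80)
                return (size_t)-1;
        pos += (size_t)n;
        index--;
    }
    return s[pos] == '\0' ? (size_t)-1 : pos;
}
```

For the string "aé€😀", the characters start at byte offsets 0, 1, 3 and 6, which is why character indexing stops being a plain array subscript with UTF-8.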
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

Moreover, UTF-8 is quite slow due to its variable character length, and it makes many things more complicated.
I think there are more important areas to work on.
Starting there now would probably produce an avalanche of new bugs.
STARGÅTE
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: String length should be stored for string variables

Post by STARGÅTE »

mk-soft wrote: The current string management works and is sufficient for 99.99% of applications.
If you process strings that are several megabytes long, you should consider whether your approach to processing the data is flawed.
+1

I think some people confuse a string with a memory buffer.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: String length should be stored for string variables

Post by User_Russian »

Saki wrote: Others do not notice it at all because they do not use large strings.
100,000 characters is a small string.
But PB needs more than 20 seconds to complete this:


DisableDebugger
s.s=""
t=ElapsedMilliseconds()
For i=1 To 100000
  s+"x"
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
If the string contains a million characters, you get tired of waiting for the code to execute.
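Why such a loop slows down as the string grows can be modelled in a few lines of C. The sketch below (hypothetical names; it models a null-terminated string without a stored length, not PureBasic's actual runtime) counts how many characters are read just to find the end of the string before each append.

```c
#include <stdlib.h>
#include <string.h>

/* Append one character to a heap-allocated, null-terminated string that has
   no stored length: the end must first be found by scanning, as strlen() does.
   *scanned accumulates how many characters those scans read. */
char *append_char_naive(char *s, char c, unsigned long long *scanned)
{
    size_t len = 0;
    while (s[len] != '\0')          /* O(len) search for the terminator */
        len++;
    *scanned += len;

    char *t = realloc(s, len + 2);  /* room for the new char + terminator */
    if (t == NULL) {
        free(s);
        return NULL;
    }
    t[len] = c;
    t[len + 1] = '\0';
    return t;
}

/* Closed form of the total scan work for n appends starting from "":
   0 + 1 + ... + (n - 1) = n(n - 1) / 2 */
unsigned long long naive_scan_total(unsigned long long n)
{
    return n * (n - 1) / 2;
}
```

For the 100,000 appends above, `naive_scan_total(100000)` gives 4,999,950,000 character reads for the end-of-string search alone, so the work grows quadratically with the final length. Whether that scan or the reallocation and copying dominates in PureBasic is what the following posts debate.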
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

You have to edit the string's binary data!

Oh dear,
unsigned integers would be more important, and they would even be feasible without problems.
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String length should be stored for string variables

Post by mk-soft »

0 ms ...


DisableDebugger
s.s=""
t=ElapsedMilliseconds()
s = LSet("", 100000, "x")
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

LOL, you were faster, mk-soft, but here is a binary way that also takes 0 ms:


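DisableDebugger
s.s=Space(100000)
bytes=StringByteLength(s) ; 200,000 bytes: each character takes 2 bytes
t=ElapsedMilliseconds()
For i=0 To bytes-8 Step 8
 PokeQ(@s+i, $0078007800780078) ; four "x" characters ($0078) per 8-byte write
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
EnableDebugger
ShowMemoryViewer(@s, bytes)
Debug Len(s)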
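This loop works because PureBasic strings currently store each character as a 2-byte code unit, and $0078007800780078 packs four copies of $0078 ("x") into one 8-byte write. Since Space(100000) occupies 200,000 bytes, the loop bound has to be counted in bytes, not characters. The same idea as a minimal C sketch (the name `fill_utf16` is hypothetical), counting in characters so the two units cannot be mixed up:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fill nchars UTF-16 code units with ch, four at a time via 8-byte stores,
   like the PokeQ loop. The 64-bit pattern repeats ch in every 16-bit lane,
   so it reads back correctly on both little- and big-endian machines. */
void fill_utf16(uint16_t *buf, size_t nchars, uint16_t ch)
{
    uint64_t pattern = (uint64_t)ch * 0x0001000100010001ULL; /* ch in all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= nchars; i += 4)
        memcpy(buf + i, &pattern, sizeof pattern);   /* 8 bytes = 4 chars */
    for (; i < nchars; i++)                          /* leftover chars */
        buf[i] = ch;
}
```

Using `memcpy` for the 8-byte store avoids alignment and aliasing problems that a direct `uint64_t *` cast could cause in C.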
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String length should be stored for string variables

Post by mk-soft »

All kidding aside.

Depending on the application, PB is fast enough.
You just have to split the data sensibly and not pack everything into one huge string.

For our IT department, I regularly have to convert, fill, and export data.
The program processes about 600,000 records with 40 fields each in about 10-20 seconds.
So it is fast enough.
STARGÅTE
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: String length should be stored for string variables

Post by STARGÅTE »

User_Russian wrote:
Saki wrote: Others do not notice it at all because they do not use large strings.
100,000 characters is a small string.
But PB needs more than 20 seconds to complete this:


DisableDebugger
s.s=""
t=ElapsedMilliseconds()
For i=1 To 100000
  s+"x"
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
If the string contains a million characters, you get tired of waiting for the code to execute.
Storing the length of the string (the topic of this thread!) would not make such code faster, because the slow part is re-allocating the memory on every append.
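Both a stored length and cheap reallocation are what the common growable-buffer technique provides, as used by string builders in many languages (this is not PureBasic's API). A minimal C sketch with hypothetical names: the buffer keeps its length, so appends never rescan, and a capacity that doubles whenever it runs out, so most appends do not reallocate at all.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string with a stored length and capacity. */
typedef struct {
    char  *data;
    size_t len;      /* characters in use, excluding the terminator */
    size_t cap;      /* bytes allocated for data */
    size_t reallocs; /* number of times the buffer grew (for illustration) */
} StrBuf;

int sb_init(StrBuf *b)
{
    b->cap = 16;
    b->len = 0;
    b->reallocs = 0;
    b->data = malloc(b->cap);
    if (b->data == NULL)
        return 0;
    b->data[0] = '\0';
    return 1;
}

/* Append n bytes from src. Capacity grows geometrically (doubling),
   so the total copying stays proportional to the final length.
   No overflow check on cap in this sketch. */
int sb_append(StrBuf *b, const char *src, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t cap = b->cap;
        while (b->len + n + 1 > cap)
            cap *= 2;
        char *t = realloc(b->data, cap);
        if (t == NULL)
            return 0;               /* old buffer stays valid */
        b->data = t;
        b->cap = cap;
        b->reallocs++;
    }
    memcpy(b->data + b->len, src, n);  /* no scan: the end is b->len */
    b->len += n;
    b->data[b->len] = '\0';
    return 1;
}

void sb_free(StrBuf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
    b->cap = 0;
}
```

Starting from 16 bytes, 100,000 one-character appends trigger only 13 reallocations in this sketch (the capacity ends at 131,072 bytes), compared with one reallocation per append in the benchmark loop.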
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: String length should be stored for string variables

Post by User_Russian »

STARGÅTE wrote: Storing the length of the string (the topic of this thread!) would not make such code faster, because the slow part is re-allocating the memory on every append.
Re-allocation of memory takes little time.


DisableDebugger
t=ElapsedMilliseconds()
For i=1 To 100000
  *p=ReAllocateMemory(*p, i)
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
Most of the time is spent on calculating the length of the string. The longer the string the more time is needed.
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

Yep, re-allocating memory is very fast.
Searching for the terminating null character is what takes the time.