String length should be stored for string variables

Tenaja
Addict
Posts: 1948
Joined: Tue Nov 09, 2010 10:15 pm

Re: String length should be stored for string variables

Post by Tenaja »

There are a million string libraries & routines for C... Instead of asking Fred to break our existing code, why not just implement something on GitHub? Everybody knows it's not difficult to write a module.
Rinzwind
Enthusiast
Posts: 636
Joined: Wed Mar 11, 2009 4:06 pm
Location: NL

Re: String length should be stored for string variables

Post by Rinzwind »

Tenaja wrote: There are a million string libraries & routines for C... Instead of asking Fred to break our existing code, why not just implement something on GitHub? Everybody knows it's not difficult to write a module.
Because strings are a language feature of PB with special treatment; otherwise every string operator would have to be replaced by a procedure call. The change could be mostly backward compatible, because the string would still be null-terminated. It's quite silly that the C world kept using this thing despite its obvious performance design flaw, and sillier still that PB copied the broken concept. The same blindness probably also kept internationalization support problematic in the C world for so long. Even now. OK, technically C only has arrays, and an array of char is not the same as string functionality.

But indeed, you can sadly forget about any language/syntax improvements, even obviously missing things like inline array and structure initialization. It's painful, verbose and ugly to have to create and fill arrays line by line just to pass them once to a procedure, for example. Split and Join are also obvious candidates for inclusion; they are part of every decent BASIC dialect.

Inspiration? https://github.com/antirez/sds
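Here is a minimal C sketch of the sds-style layout mentioned above. It assumes a simple header holding `len` and `cap`, placed directly in front of the characters. It is not sds's actual API and not PureBasic's runtime, and the names `lstr_new`, `lstr_len`, `lstr_cat` and `lstr_free` are hypothetical. The data still ends in a zero byte, so the returned pointer can be handed to any code that expects a plain null-terminated string. That is the backward-compatibility argument above, while the length is read in O(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header kept directly in front of the characters. */
typedef struct {
    size_t len;   /* chars in use, excluding the terminator */
    size_t cap;   /* chars that fit, excluding the terminator */
    char   buf[]; /* the text, always followed by a '\0' */
} lstr_hdr;

/* Step back from the char pointer handed to callers to its header. */
static lstr_hdr *lstr_hdr_of(char *s) {
    assert(s != NULL);
    return (lstr_hdr *)(s - offsetof(lstr_hdr, buf));
}

/* Create a string from a C string. The result is a plain char*,
   usable wherever a null-terminated string is expected. */
char *lstr_new(const char *init) {
    size_t n = strlen(init);
    lstr_hdr *h = malloc(sizeof *h + n + 1);
    if (!h) return NULL;
    h->len = n;
    h->cap = n;
    memcpy(h->buf, init, n + 1); /* includes the terminator */
    return h->buf;
}

/* O(1): read from the header instead of scanning for the terminator. */
size_t lstr_len(char *s) { return lstr_hdr_of(s)->len; }

/* Append n bytes from t (t must not point into s). When capacity runs out
   it grows to twice the required size, so repeated appends are amortized
   O(1) per byte. On allocation failure returns NULL and leaves s unchanged. */
char *lstr_cat(char *s, const char *t, size_t n) {
    lstr_hdr *h = lstr_hdr_of(s);
    if (h->len + n > h->cap) {
        size_t newcap = (h->len + n) * 2;
        lstr_hdr *nh = realloc(h, sizeof *nh + newcap + 1);
        if (!nh) return NULL;
        h = nh;
        h->cap = newcap;
    }
    memcpy(h->buf + h->len, t, n);
    h->len += n;
    h->buf[h->len] = '\0';
    return h->buf;
}

void lstr_free(char *s) {
    if (s) free(lstr_hdr_of(s));
}
```

Appending no longer rescans the existing text. The price is two `size_t` fields per string (16 bytes on a typical 64-bit target).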

Writing a usable, stable and fast set of string functions is not easy at all, especially once internationalization is thrown in.

I worked around it by using lists as a string builder.
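A list used as a string builder boils down to collecting the pieces first and concatenating them once at the end. A Join function does the same thing. The following is a minimal C sketch that assumes the pieces are already null-terminated C strings. `join` is a hypothetical name, not an existing PureBasic or C library function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Join n pieces with sep into one newly allocated string (caller frees).
   The output size is computed first and the buffer is allocated once, so
   each piece is touched a fixed number of times (once to measure, once to
   copy) instead of the whole result being rescanned on every append. */
char *join(const char *const *pieces, size_t n, const char *sep) {
    size_t seplen = strlen(sep);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += strlen(pieces[i]);
    if (n > 1)
        total += seplen * (n - 1);

    char *out = malloc(total + 1);
    if (!out) return NULL;

    char *p = out;
    for (size_t i = 0; i < n; i++) {
        if (i > 0) {
            memcpy(p, sep, seplen);
            p += seplen;
        }
        size_t len = strlen(pieces[i]);
        memcpy(p, pieces[i], len);
        p += len;
    }
    *p = '\0';
    assert((size_t)(p - out) == total);
    return out;
}
```

Joining `{"a", "bc", "def"}` with `", "` yields `"a, bc, def"`. The work grows linearly with the result, whereas repeated `s + piece` grows quadratically.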
STARGÅTE
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: String length should be stored for string variables

Post by STARGÅTE »

User_Russian wrote: Most of the time is spent on calculating the length of the string. The longer the string the more time is needed.
I see: with s+"x", the length of s has to be determined first, every single time.
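To put a number on that, the following C sketch counts the bytes examined when a null-terminated string is grown one character at a time. It is only an illustration, not PureBasic's actual code. `append_char_naive` has to find the end first, as `s+"x"` does when no length is stored. `append_char_tracked` keeps the length in a variable instead. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Total bytes examined by the length scans below. */
static size_t bytes_scanned;

/* strlen-equivalent that records how much work it does. */
static size_t counted_strlen(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') n++;
    bytes_scanned += n + 1; /* every char plus the terminator */
    return n;
}

/* Naive append: scan for the end first, because no length is stored. */
static void append_char_naive(char *buf, char c) {
    size_t n = counted_strlen(buf);
    buf[n] = c;
    buf[n + 1] = '\0';
}

/* Same append with the length tracked by the caller: no scan at all. */
static void append_char_tracked(char *buf, size_t *len, char c) {
    buf[*len] = c;
    (*len)++;
    buf[*len] = '\0';
}
```

After 100 naive appends, `bytes_scanned` is 1 + 2 + ... + 100 = 5050. For n appends it is n(n+1)/2, which is quadratic. The tracked version scans nothing.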
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and more
Typeface - Sprite-based font include/module
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

PB's string handling is problematic.
There is still a leak when very large strings are allocated: for every 100 MB, about 200 MB are never released again.
So at 500 MB, a whole GB stays occupied.
That is an enormous amount, and it means the string's memory is not released at all.
Furthermore, memory can still get overwritten when strings are passed to procedures.
I noticed this with Val(), which blows up depending on memory usage.
These things can be very hard to track down.

Code:

x$=Space(5e8)
x$=#Null$
Repeat : ForEver
地球上の平和 (Peace on Earth)
BarryG
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Re: String length should be stored for string variables

Post by BarryG »

Don't set it to #NULL$ to release it. Use this workaround instead and look at Task Manager:

Code:

Macro FreeString(name)
  name=Left(name,1)
  name=""
EndMacro

x$=Space(5e8)
FreeString(x$)

Repeat : Delay(1) : ForEver
See this thread too -> viewtopic.php?f=7&t=30684
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

Yes, but then the manual should say that you have to use a workaround :D

I don't know how many workarounds are in my software by now,
but there are quite a few.
I don't even know whether they are still necessary.

Why do I need #Null$ or "" if they don't work :?:
Mijikai
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: String length should be stored for string variables

Post by Mijikai »

I think the current string handling is OK, but storing the size along with the string would be an improvement.
mk-soft wrote: ... But then there is no static string anymore and Array of Chars won't work anymore.
A UTF8 character can take from 1 byte to 6 bytes in memory.
This leads to more problems than advantages.
I agree, UTF8 would make no sense at all.
BarryG wrote: Don't set it to #NULL$ to release it. Use this workaround instead and look at Task Manager:
...
Resizing memory may look good but...

General Rules:
There are two special constants for strings:
#Empty$: represents an empty string (exactly the same as "")
#Null$ : represents a null string. This can be used for API functions requiring a null pointer to a string, or to really free a string.
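The difference between an empty string and a null string can be sketched in C. This is only an analogy: it assumes a string that owns a heap buffer with a capacity, and it is not how PureBasic's string manager is implemented. `dstr`, `dstr_set` and `dstr_free` are hypothetical names. Assigning `""` keeps the buffer allocated for reuse; only the explicit free releases it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dynamic string: a heap buffer plus its capacity. */
typedef struct {
    char  *data; /* NULL means "no string at all" (the null-string case) */
    size_t cap;  /* bytes allocated for data, including the terminator */
} dstr;

/* Assign text. The buffer only ever grows here: assigning a shorter or
   empty string reuses the existing allocation. Returns 0 on failure. */
int dstr_set(dstr *s, const char *text) {
    assert(text != NULL);
    size_t need = strlen(text) + 1;
    if (need > s->cap) {
        char *p = realloc(s->data, need);
        if (!p) return 0;
        s->data = p;
        s->cap = need;
    }
    memcpy(s->data, text, need);
    return 1;
}

/* Release the allocation; afterwards the string is null, not empty. */
void dstr_free(dstr *s) {
    free(s->data);
    s->data = NULL;
    s->cap = 0;
}
```

Setting "hello world" and then "" leaves `cap` at 12 bytes. After `dstr_free`, `data` is NULL and `cap` is 0. If PB behaved similarly (an assumption), assigning "" would not by itself show the memory coming back in Task Manager.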
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String length should be stored for string variables

Post by mk-soft »

FreeString is not required: since v5.72, assigning the constant #Null$ to a string works correctly again and assigns nothing (a null string) to it.
(Edit: in v5.70 and v5.71 an empty string was assigned instead of nothing.)

The large memory requirement comes from the internal helper string buffer (PB_StringBasePosition), which is shrunk again when needed.

Code:

; Set a breakpoint here (F9) and step through (F8)

x$=Space(5e8)
;
x$ = ""

; Force PB to free the StringBasePointer buffer
dummy$ = LSet("",1)
;

x$=Space(5e8)
;
x$ = #Null$

; Force PB to free the StringBasePointer buffer
dummy$ = LSet("",1)
;
End
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Re: String length should be stored for string variables

Post by kenmo »

I don't want to get off topic, but since I brought up UTF-8, I have some responses:

- PureBasic's "Unicode" strings are sort of an incomplete implementation of Unicode... sometimes PB treats strings as UCS-2 (fixed 2-byte) and other times as UTF-16 (2-4 byte), requiring workarounds for handling chars > $FFFF

- The Windows API has used UTF-16 since Windows 2000... which is a variable-length 2-4 byte encoding http://zuga.net/articles/text-does-wind ... -or-ucs-2/

- I believe Linux and MacOS use UTF-8 for their APIs, so conversions are always happening. Using UTF-8 internally would benefit PB on these platforms
A UTF8 character can take from 1 byte to 6 bytes in memory
- It is actually 1 to 4 bytes to cover all Unicode, but that doesn't change your point :) https://en.wikipedia.org/wiki/UTF-8
But then there is no static string anymore and Array of Chars won't work anymore.
- Since PB is using UTF-16, which is variable length, Array of Chars already fail if you support any Unicode chars > $FFFF
Moreover, UTF8 is quite slow due to the variable length and many things become more complicated.
- The parsing of strings in RAM is dominated by the slowness of what you do with those strings: draw to the screen, file I/O, network I/O... The site http://utf8everywhere.org/ discusses all these pros and cons.
Q: Won’t the conversions between UTF-8 and UTF-16 when passing strings to Windows slow down my application?

A: First, you will do some conversion either way. It’s either when calling the system, or when interacting with the rest of the world, e.g. when sending a text string over TCP. Also, those of OS APIs which accept strings often perform tasks which are inherently slow, such as UI or file system operations.
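Here is a minimal C sketch of the conversion that has to happen at such boundaries: the function turns UTF-16 code units into UTF-8 bytes. It also shows why UTF-16 is not a fixed-width encoding. Characters above $FFFF arrive as a surrogate pair, which must be combined into one code point before encoding. `utf16_to_utf8` is a hypothetical helper, not the Windows API call that normally does this job. Error handling is reduced to returning `(size_t)-1`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-16 code units to UTF-8. Returns the number of bytes
   written, or (size_t)-1 on an unpaired surrogate or if out is too small.
   The output is not null-terminated. */
size_t utf16_to_utf8(const uint16_t *in, size_t n,
                     unsigned char *out, size_t outcap) {
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {           /* high surrogate */
            if (i + 1 >= n || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;                     /* unpaired */
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;                                       /* consumed two units */
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;                         /* lone low surrogate */
        }

        unsigned char buf[4];
        size_t len;
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            len = 1;
        } else if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 2;
        } else if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 3;
        } else {
            buf[0] = (unsigned char)(0xF0 | (cp >> 18));
            buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
            len = 4;
        }
        if (o + len > outcap) return (size_t)-1;
        for (size_t k = 0; k < len; k++) out[o++] = buf[k];
    }
    return o;
}
```

The four UTF-16 units for "A€😀" (0x0041, 0x20AC, 0xD83D 0xDE00) become the eight UTF-8 bytes 41 E2 82 AC F0 9F 98 80.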
From Wikipedia:
Microsoft now recommends UTF-8 for Windows programs, while previously they emphasized "Unicode" (meaning UTF-16) Win32 API, this may mean internal use of UTF-8 will increase in the future.
Microsoft link:
https://docs.microsoft.com/en-us/window ... -code-page
Use UTF-8 character encoding for optimal compatibility between web apps and other *nix-based platforms (Unix, Linux, and variants), minimize localization bugs, and reduce testing overhead.

UTF-8 is the universal code page for internationalization and is able to encode the entire Unicode character set. It is used pervasively on the web, and is the default for *nix-based platforms.

An encoded character takes between 1 and 4 bytes. UTF-8 encoding supports longer byte sequences, up to 6 bytes, but the biggest code point of Unicode 6.0 (U+10FFFF) only takes 4 bytes.
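Because both UTF-8 and UTF-16 are variable-length, "the length of a string" is not a single number. The C sketch below assumes its input is already valid UTF-8 and reports bytes, code points and UTF-16 code units for the same text. `measure_utf8` and `text_lengths` are hypothetical names. A stored length would have to settle on one of these.

```c
#include <assert.h>
#include <stddef.h>

/* Three different "lengths" of the same text. */
typedef struct {
    size_t bytes;
    size_t code_points;
    size_t utf16_units;
} text_lengths;

/* Assumes s holds valid UTF-8; malformed input would be miscounted. */
text_lengths measure_utf8(const unsigned char *s, size_t nbytes) {
    text_lengths r = {nbytes, 0, 0};
    for (size_t i = 0; i < nbytes; i++) {
        unsigned char b = s[i];
        if ((b & 0xC0) == 0x80)
            continue;              /* continuation byte: same character */
        r.code_points++;
        /* 11110xxx starts a 4-byte sequence (U+10000..U+10FFFF),
           which needs a surrogate pair, i.e. two units, in UTF-16 */
        r.utf16_units += (b >= 0xF0) ? 2 : 1;
    }
    return r;
}
```

For "A€😀" the result is 8 bytes, 3 code points and 4 UTF-16 units. The emoji alone needs 4 bytes in UTF-8 and a surrogate pair in UTF-16.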

Just some things to consider :D I think UTF-8 is the closest thing we have to a universal encoding. Now anyway, back to storing-string-length-with-strings....
Rinzwind
Enthusiast
Posts: 636
Joined: Wed Mar 11, 2009 4:06 pm
Location: NL

Re: String length should be stored for string variables

Post by Rinzwind »

Also, storing the length with the string would make it possible to use strings for arbitrary binary data (a zero byte would no longer need a special meaning, if you choose so). That can be quite convenient.
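Here is a minimal C sketch of that point. When the buffer carries its own length, zero bytes are just data, while a terminator-based function such as `strlen` stops at the first one. `counted_buf` and `counted_set` are hypothetical names, and the fixed 16-byte capacity only keeps the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer: the stored length, not a terminator,
   marks where the data ends, so zero bytes are ordinary content. */
typedef struct {
    size_t        len;
    unsigned char data[16];
} counted_buf;

/* Copy n raw bytes in; returns 0 if they do not fit. */
int counted_set(counted_buf *b, const void *src, size_t n) {
    if (n > sizeof b->data) return 0;
    memcpy(b->data, src, n);
    b->len = n;
    return 1;
}
```

Storing the 7 bytes `a b \0 c \0 d e` gives `len == 7`, but `strlen` on the same data reports 2.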
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: String length should be stored for string variables

Post by wilbert »

kenmo wrote: - I believe Linux and MacOS use UTF-8 for their APIs, so conversions are always happening. Using UTF-8 internally would benefit PB on these platforms
Most macOS APIs require NSString/CFString types for strings.

Storing string lengths would make things faster but it's also true that for most cases the current string functions are fast enough.
Instead of adding the length to every string, caching the length of the last accessed strings could also improve things.
And some functions like Split and Join would be a welcome addition.
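Caching the lengths of recently used strings, as suggested above, might look like this minimal C sketch. It assumes a tiny fixed-size table keyed by address, with round-robin replacement. `cached_len` and `cache_invalidate` are hypothetical names. The catch is invalidation: every write to or free of a cached string must remove its entry, or the cache returns a stale length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8

/* Hypothetical per-address length cache with round-robin replacement. */
static struct {
    const char *ptr;
    size_t      len;
} cache[CACHE_SLOTS];
static size_t next_slot; /* slot to overwrite on the next miss */
static size_t scans;     /* how many times strlen actually ran */

size_t cached_len(const char *s) {
    assert(s != NULL);
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].ptr == s)
            return cache[i].len;      /* hit: no scan */
    size_t n = strlen(s);             /* miss: scan once and remember */
    scans++;
    cache[next_slot].ptr = s;
    cache[next_slot].len = n;
    next_slot = (next_slot + 1) % CACHE_SLOTS;
    return n;
}

/* Must be called by anything that modifies or frees the string at s,
   otherwise cached_len keeps returning the old length. */
void cache_invalidate(const char *s) {
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].ptr == s)
            cache[i].ptr = NULL;
}
```

Measuring the same string twice runs `strlen` only once. If the text changes without a call to `cache_invalidate`, the old length comes back.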

It would be nice if the PB string library would be open source like the IDE so we could contribute to it.
Windows (x64)
Raspberry Pi OS (Arm64)
helpy
Enthusiast
Posts: 552
Joined: Sat Jun 28, 2003 12:01 am

Re: String length should be stored for string variables

Post by helpy »

-1

If the string length were stored internally with each PB string, a new problem would arise ;-)
It would occur whenever you manipulate a PB string through pointers or memory functions, writing directly to the string memory with Poke or some other *PointerToCharacter. Manipulating a string this way would not update the internal length, and PB's string functions would then no longer work correctly ... :-(
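The problem can be reproduced with a minimal C sketch of a string that stores its length; the `pstr` type and its functions are hypothetical. Writing through a raw pointer, the equivalent of Poke, changes the text but not the stored length. One conceivable remedy is an explicit re-sync call after raw access, shown here as `pstr_sync`. That remedy is an assumption, not an existing PureBasic function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical string with a stored length and a fixed buffer. */
typedef struct {
    size_t len;
    char   buf[64];
} pstr;

/* Normal assignment: keeps len and buf consistent. */
void pstr_set(pstr *s, const char *text) {
    size_t n = strlen(text);
    assert(n < sizeof s->buf);
    memcpy(s->buf, text, n + 1);
    s->len = n;
}

/* Re-measure after buf was written through a raw pointer; the stored
   length has no way of noticing such writes on its own. */
void pstr_sync(pstr *s) {
    s->len = strlen(s->buf);
}
```

Truncating "abcdef" in place through a raw pointer leaves `len` at 6 while `strlen` sees 3, until `pstr_sync` re-measures it.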
Windows 10 / Windows 7
PB Last Final / Last Beta Testing
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

@Wilbert
Yes, my thought was: why not simply cache the string lengths by address?
That would also prevent overflows when the end of a string is missing or damaged.
I think you could easily do that with your knowledge.
NicTheQuick
Addict
Posts: 1224
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: String length should be stored for string variables

Post by NicTheQuick »

Isn't there something like MemorySize() for a string's buffer? How does that usually work together with AllocateMemory()?
If the operating system already knows how big the memory buffer behind a given pointer is, that could be another approach.
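Knowing the buffer size does not tell you the string length (a 10-byte buffer may hold "abc"), but it does put an upper bound on the search for the terminator. Standard C offers no portable way to ask the allocator for a block's size; platform extensions exist, such as `malloc_usable_size` on glibc or `_msize` on Windows. This minimal sketch therefore assumes the caller passes the capacity, much as `MemorySize()` reports it for an `AllocateMemory()` buffer in PureBasic. `bounded_len` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of the string in buf, scanning at most cap bytes. Returns cap
   when no terminator is found, signalling a missing or damaged end
   instead of reading past the buffer. */
size_t bounded_len(const char *buf, size_t cap) {
    assert(buf != NULL);
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

A missing terminator yields `cap` instead of a read past the end of the buffer, which also covers the damaged-end case Saki mentions.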
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Saki
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: String length should be stored for string variables

Post by Saki »

@NicTheQuick
Yes, as a basic idea, that is in itself the most sensible approach.
The main problem is probably that compatibility with older software or special procedures always breaks a little.
But in principle, it should be like it was with Win 95:
the old has to make way for the better new.
Or, as Gorbachev said: "He who comes too late is punished by life."