Very fast split string to array function

Windows specific forum
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

Here's a small lib I wrote (it contains only 1 function) to split a string into an array. Just download the zip file and extract it into the PB UserLibraries folder. On my computer it splits a 64KB string in a few milliseconds.

It works like this...

Code:

Dim MyArray.s(0)

MyString.s = "This is the string to split."
MyArray() = SplitStringByChar(MyString, " .,", #True, MyArray())

If MyArray()
  MyArrayLength = PeekL(MyArray()-8)
  Debug MyArrayLength
EndIf

As you can see, it takes 4 parameters:
- The first one is the string to split.
- The second one is the set of separator characters to split on (max 8 characters).
- The third one is #True or #False. If #True, empty strings are skipped in the output.
- The fourth one is the array. It is required so the old array can be freed before the new one is created.
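
For readers who don't have the lib installed, the behaviour described by these parameters can be approximated in plain PureBasic. This is only a rough sketch and not the lib's asm code: the name SplitByCharsSketch is made up, it assumes a recent PureBasic version that supports Array parameters with Dim/ReDim, and it returns the item count instead of the array itself.

```purebasic
; Sketch only: SplitByCharsSketch is a hypothetical name, not part of the lib.
; Assumes a PureBasic version with "Array" procedure parameters and ReDim.
Procedure.i SplitByCharsSketch(Text.s, Separators.s, SkipEmpty.i, Array Result.s(1))
  Protected i, start = 1, count = 0, isSeparator
  Protected length = Len(Text)
  Protected piece.s

  ; Worst case: every character is a separator, giving length + 1 items
  Dim Result(length)

  For i = 1 To length + 1
    ; The position just past the end acts as a final separator
    isSeparator = #False
    If i > length
      isSeparator = #True
    ElseIf FindString(Separators, Mid(Text, i, 1))
      isSeparator = #True
    EndIf

    If isSeparator
      If i > start
        piece = Mid(Text, start, i - start)
      Else
        piece = ""
      EndIf
      If piece <> "" Or SkipEmpty = #False
        Result(count) = piece
        count + 1
      EndIf
      start = i + 1
    EndIf
  Next

  ; Shrink the array to the items actually stored
  If count > 0
    ReDim Result(count - 1)
  EndIf
  ProcedureReturn count
EndProcedure

Dim Words.s(0)
Define n = SplitByCharsSketch("This is the string to split.", " .,", #True, Words())
Define k
For k = 0 To n - 1
  Debug Words(k)   ; the six words, one per line
Next
```

A character-by-character loop like this will be far slower than the lib on large strings; it is only meant to show what the parameters do.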

https://w73.nl/pb/split.zip

Enjoy...
:D
Last edited by wilbert on Fri Dec 15, 2023 1:42 pm, edited 2 times in total.
Dare2
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

This is useful.

Many thanks. :)
@}--`--,-- A rose by any other name ..
swan
Enthusiast
Posts: 225
Joined: Sat Jul 03, 2004 9:04 am
Location: Sydney Australia

Post by swan »

Just the ticket. Thanx. Very useful.

SWAN ...
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

Thanks for the feedback :)

I updated the lib to also include a fast function that combines an array of strings into a single string. Make sure your string buffer is large enough to hold the combined string (by default it is 64000 bytes).

Code:

Dim MyArray.s(0)

MyString.s = "This is the string to split."
MyArray() = SplitStringByChar(MyString, " .,", #True, MyArray())

If MyArray()
  MyArrayLength = PeekL(MyArray()-8)
  MyRecombinedArray.s = CombineStringArray(MyArray(), "|")
  Debug MyRecombinedArray
EndIf
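
As a point of comparison, the same result can be built with ordinary string concatenation. This is a minimal sketch, not the lib's implementation: CombineSketch is a made-up name, it assumes a PureBasic version with Array parameters and ArraySize(), and it assumes the separator is placed between items only.

```purebasic
; Sketch only: CombineSketch is a hypothetical name, not the lib's function.
Procedure.s CombineSketch(Array Items.s(1), Separator.s)
  Protected i, last = ArraySize(Items())
  Protected joined.s
  For i = 0 To last
    If i > 0
      joined + Separator   ; separator goes between items only (an assumption)
    EndIf
    joined + Items(i)
  Next
  ProcedureReturn joined
EndProcedure

Dim Parts.s(2)
Parts(0) = "This" : Parts(1) = "is" : Parts(2) = "split"
Debug CombineSketch(Parts(), "|")   ; This|is|split
```

Repeated concatenation gets slow for large arrays, which is presumably where a dedicated function pays off.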
Here's a little speed test

Code:

; free the old string buffer

! pushd [_PB_StringBase]
! pushd 0
! pushd [_PB_MemoryBase]
! call _HeapFree@12

; allocate a new string buffer of 260KB

! pushd 266240
! pushd 8
! pushd [_PB_MemoryBase]
! call _HeapAlloc@12
! mov [_PB_StringBase],eax

; create an array with 16384 16 byte strings making a total size of 256KB

Dim a.s(16383)
Dim b.s(0)

For i=0 To 16383
  a(i) = "1234567890ABCDEF"
Next

Delay(500)

; let's measure how long it takes to combine them into a string

timer0 = GetTickCount_()

s.s = CombineStringArray(a(),"")

; now let's measure how long it takes to split this string into another array
; we'll use two separators so the created array will have 32769 items

timer1 = GetTickCount_()

b() = SplitStringByChar(s, "4D", #False, b())

timer2 = GetTickCount_()

result.s = "Combining 16384 items into a 256KB string took "+Str(timer1-timer0)+" ms"+Chr(13)+Chr(10)+"Splitting this string with two separators into an array with 32769 items took "+Str(timer2-timer1)+" ms"
MessageRequester("Results",result.s,0)

Edit:
If you have a fast computer, the speed of this test may be hard to measure. To give you an idea, I increased the values a bit on my computer (Athlon XP 1600+): combining 262144 items into a 4 MB string took 60 ms, and splitting this 4 MB string with two separators into an array with 524289 items took 80 ms. 8)
NoahPhense
Addict
Posts: 1999
Joined: Thu Oct 16, 2003 8:30 pm
Location: North Florida

Re: Very fast split string to array function

Post by NoahPhense »

Very nice function. You should include your example with your zip.

- np
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

I added them to my regular library with some documentation and a small example.
www.geboortegrond.nl/pb/wb_Lib.zip

Also I added a prefix to the functions to avoid possible problems with other libraries.
NoahPhense
Addict
Posts: 1999
Joined: Thu Oct 16, 2003 8:30 pm
Location: North Florida

Post by NoahPhense »

wilbert wrote:I added them to my regular library with some documentation and a small example.
www.geboortegrond.nl/pb/wb_Lib.zip

Also I added a prefix to the functions to avoid possible problems with other libraries.
Good help file, and nice naming conventions.

- np
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

NoahPhense wrote:Good help file, and nice naming conventions.
Thanks. :)

I was wondering what other array functions are considered to be useful. :?:
Dare2
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Hi Wilbert.

That is a very nifty library.

Would it be too hard to get ubound(x) where x is the dimension in a multi-dimensioned array, eg, dim(x,y,z) - ubound(0), ubound(1), ubound(2)?

I recall discovering what appeared to be the total element size at a negative offset of the array address, but never discovering the dimension info.

Not very important, BTW. So if you have a rainy day ... :)

Again, props on that lib.

(OT: BTW, both you and J. Baker are KoolMovers, are you not?)
@}--`--,-- A rose by any other name ..
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

Dare2 wrote:Would it be too hard to get ubound(x) where x is the dimension in a multi-dimensioned array, eg, dim(x,y,z) - ubound(0), ubound(1), ubound(2)?
Unfortunately, yes.
PB does store these values, but as far as I know there's no way to determine whether an array is multidimensional. The functions I created assume a single-dimensional array.

Another problem is that these dimensions are stored after the pointer to the array, and I haven't found a way to pass this pointer to a function.

What you can do, if you know the name of your array, is this:

Code:

Global tmp1, tmp2, tmp3, tmp4

Dim myArray.l(40,5,6,7)

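; value stored 8 bytes before the array data
; (the total element count, as the arithmetic below implies)
! mov eax,[a_myArray]
! pushd [eax-8]
! popd [v_tmp1]

; per-dimension values stored right after the array pointer
! pushd [a_myArray + 4]
! popd [v_tmp2]

! pushd [a_myArray + 8]
! popd [v_tmp3]

! pushd [a_myArray + 12]
! popd [v_tmp4]

; each quotient is the element count of one dimension; subtract 1 for the upper bound
ubound1 = tmp1 / tmp2 - 1
ubound2 = tmp2 / tmp3 - 1
ubound3 = tmp3 / tmp4 - 1
ubound4 = tmp4 - 1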

Debug ubound1
Debug ubound2
Debug ubound3
Debug ubound4

But that's not very convenient :(
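
For anyone reading this later: newer PureBasic versions provide ArraySize() with an optional dimension argument, which avoids the asm entirely. The sketch below assumes such a version. The second half just redoes the division from the code above with the values written out by hand; the names total and stride2 to stride4 are made up for illustration, and the numbers follow from the Dim statement (41, 6, 7 and 8 elements per dimension).

```purebasic
; Assumes a PureBasic version where ArraySize() accepts a dimension argument.
Dim myArray.l(40, 5, 6, 7)

Debug ArraySize(myArray(), 1)   ; 40
Debug ArraySize(myArray(), 2)   ; 5
Debug ArraySize(myArray(), 3)   ; 6
Debug ArraySize(myArray(), 4)   ; 7

; The same arithmetic as the asm version, with the values written out by hand
Define total   = 41 * 6 * 7 * 8   ; all elements (13776)
Define stride2 = 6 * 7 * 8        ; 336
Define stride3 = 7 * 8            ; 56
Define stride4 = 8

Define ubound1 = total / stride2 - 1     ; 13776 / 336 - 1 = 40
Define ubound2 = stride2 / stride3 - 1   ; 336 / 56 - 1 = 5
Define ubound3 = stride3 / stride4 - 1   ; 56 / 8 - 1 = 6
Define ubound4 = stride4 - 1             ; 8 - 1 = 7

Debug ubound1
Debug ubound2
Debug ubound3
Debug ubound4
```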

About your BTW... yes, we are both KoolMovers :D
(great application)
Shannara
Addict
Posts: 1808
Joined: Thu Oct 30, 2003 11:19 pm
Location: Emerald Cove, Unformed

Post by Shannara »

Any way to update this library for PB 3.92? Right now it crashes when being used.
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

Can you tell me which functions make it crash, or give an example?
When I tried my examples in 3.92 they worked fine :?
Shannara
Addict
Posts: 1808
Joined: Thu Oct 30, 2003 11:19 pm
Location: Emerald Cove, Unformed

Post by Shannara »

Sorry :( It was another piece of code. I tried to come back and erase my post, only to find that you had already replied :) Please disregard my post.
dracflamloc
Addict
Posts: 1648
Joined: Mon Sep 20, 2004 3:52 pm

Post by dracflamloc »

Awesome, I've been wanting this =)

Does this only work with Windows?
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

dracflamloc wrote:Does this only work with windows?
Yes.
Since it's all x86 asm, I think it shouldn't be difficult to make a Linux version, but I know very little about Linux.