High speed split string

linkerstorm
User
Posts: 47
Joined: Sun Feb 18, 2007 11:57 am

Re: High speed split string

Post by linkerstorm »

Hi.

Here is another, less "ASMish" split that is fast enough for general-purpose use. It relies on the good old C library, which fortunately ships with PB.

The version presented here is for Unicode. For ASCII you can use "strstr" instead, with a few small changes to the code.

The function returns the number of array elements, or -1 if the string to split is empty.

Code: Select all

ImportC "crtdll.lib"
	wcsstr.i (*str1, *str2)
EndImport

Procedure.i Split_wcsstr(Array StringArray.s(1), StringToSplit.s, Separator.s = " ")
	
	Protected c = CountString(StringToSplit, Separator)
	
	; We have to have a string to split
	If Len(StringToSplit) = 0
		ProcedureReturn -1
	EndIf
	
	; We return back the string as is if no separator found
	If c = 0
		ReDim StringArray(0)
		StringArray(0) = StringToSplit
		ProcedureReturn ArraySize(StringArray()) + 1
	EndIf
	
	ReDim StringArray(c)

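	Protected *StringToSplit = @StringToSplit
	Protected *pfound = wcsstr(*StringToSplit, @Separator)
	Protected SepBytes = StringByteLength(Separator) ; separator size in bytes, so multi-character separators are skipped whole
	Protected i
	
	While *pfound
		; Copy the characters between the current position and the match
		StringArray(i) = PeekS(*StringToSplit, (*pfound - *StringToSplit) / SizeOf(Character))
		*StringToSplit = *pfound + SepBytes
		*pfound = wcsstr(*StringToSplit, @Separator)
		i + 1
	Wend
	
	; Whatever follows the last separator is the final element
	StringArray(i) = PeekS(*StringToSplit)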
	
	ProcedureReturn c + 1
	
EndProcedure
Enjoy!
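
A minimal usage sketch, assuming the procedure above is compiled as posted in a Unicode build. The names Parts, Count and k are only illustrative and not part of the original code:

```purebasic
; Illustrative usage of Split_wcsstr; Parts, Count and k are hypothetical names
Dim Parts.s(0)
Define Count = Split_wcsstr(Parts(), "one two three")
Define k

; Count is -1 for an empty input, so the loop is simply skipped in that case
For k = 0 To Count - 1
	Debug Parts(k)
Next
```

The array is resized inside the procedure, so its initial size does not matter.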
AZJIO
Addict
Posts: 1315
Joined: Sun May 14, 2017 1:48 am

Re: High speed split string

Post by AZJIO »

In this version, any character from the specified set acts as a separator; the set is not matched as a whole string. Consecutive separators in the input are treated as a single separator, so no empty elements are added.

Code: Select all

EnableExplicit

Procedure SplitL2(String$, List StringList.s(), Separator$ = #CRLF$ + #TAB$ + #FF$ + #VT$ + " ")
	Protected *S.Integer = @String$
	Protected Len1, Len2, Blen, i, j
	Protected *memChar, *c.Character, *jc.Character
	
	Len1 = Len(Separator$)
	Len2 = Len(String$)
	
	ClearList(StringList())
	
	*c.Character = @String$
	*memChar = @Separator$
	
	For i = 1 To Len2
		*jc.Character = *memChar
		
		For j = 1 To Len1
			If *c\c = *jc\c
				*c\c = 0
				If *S <> *c
					AddElement(StringList())
					StringList() = PeekS(*S)
				EndIf
				*S = *c + SizeOf(Character)
				Break
			EndIf
			*jc + SizeOf(Character)
		Next
		
		*c + SizeOf(Character)
	Next
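	; Add the trailing piece only if it is non-empty (the string may end with a separator)
	If *S <> *c
		AddElement(StringList())
		StringList() = PeekS(*S)
	EndIf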
EndProcedure


Define S.s = "This is	a test	   	 	 	 	 	string	   	 	 	 	 	to	   	 	 	 	 	see	   	 	 	 	 	if	split and	join are	working."

Define NewList MyStrings.s()
SplitL2(S, MyStrings(), "	 ")
; Debug ListSize(MyStrings())
ForEach MyStrings()
	Debug MyStrings()
Next
idle
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: High speed split string

Post by idle »

I'm not at a computer, but as a general rule, pass strings in by address. Since strings are null-terminated, there's no need to check the length: just loop until you hit the null. That way you eliminate one pass over the string to get its length for the copy, and a second pass to get the length for the loop.
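
A rough sketch of that idea, assuming a Unicode build. CountChar is a hypothetical helper written only for illustration; it walks the string through a pointer and stops at the null terminator, so Len() is never called:

```purebasic
; Hypothetical helper: counts how often ch occurs, without calling Len()
Procedure.i CountChar(*text.Character, ch.c)
	Protected n.i
	
	If *text
		; *text\c is 0 at the null terminator, which ends the loop
		While *text\c
			If *text\c = ch
				n + 1
			EndIf
			*text + SizeOf(Character)
		Wend
	EndIf
	
	ProcedureReturn n
EndProcedure

Define s.s = "split and join"
Debug CountChar(@s, ' ') ; the string is passed by address, not copied
```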
mk-soft
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: High speed split string

Post by mk-soft »

See SplitStringArray

There the end of the string is detected via the NULL terminator, and there is an option to handle double quotes.
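
To illustrate what a double-quote option can look like, here is a small sketch. It is not the SplitStringArray code referred to above; SplitQuoted and its parameters are hypothetical names, and the behavior (separators inside quotes are ignored, quotes are kept in the output) is an assumption made for the example:

```purebasic
; Hypothetical sketch, not mk-soft's SplitStringArray
Procedure SplitQuoted(*src.Character, List Result.s(), sep.c = ',')
	Protected *start.Character = *src
	Protected inQuotes.i
	
	ClearList(Result())
	If *src = 0 : ProcedureReturn : EndIf
	
	While *src\c ; loop until the null terminator
		If *src\c = 34 ; 34 is the code of the double-quote character
			inQuotes = 1 - inQuotes
		ElseIf *src\c = sep And inQuotes = 0
			AddElement(Result())
			Result() = PeekS(*start, (*src - *start) / SizeOf(Character))
			*start = *src + SizeOf(Character)
		EndIf
		*src + SizeOf(Character)
	Wend
	
	; Last field, from the last separator to the terminator
	AddElement(Result())
	Result() = PeekS(*start, (*src - *start) / SizeOf(Character))
EndProcedure

Define line.s = "a," + #DQUOTE$ + "b,c" + #DQUOTE$ + ",d"
NewList Fields.s()
SplitQuoted(@line, Fields())
ForEach Fields()
	Debug Fields()
Next
```

In this sketch the quoted field is returned with its quotes still in place; stripping them would be a separate step.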
AZJIO
Addict
Posts: 1315
Joined: Sun May 14, 2017 1:48 am

Re: High speed split string

Post by AZJIO »

@idle

(+ While) About 3% faster: 150 -> 145
(+ FindString) Slower, the time roughly doubles: 145 -> 300

Code: Select all

EnableExplicit
DisableDebugger

Procedure SplitL2(String$, List StringList.s(), Separator$ = #CRLF$ + #TAB$ + #FF$ + #VT$ + " ")
	Protected *S.Integer = @String$
	Protected *jc.Character, *c.Character = @String$

	ClearList(StringList())
	
	While *c\c
		*jc.Character = @Separator$
		
		While *jc\c
			If *c\c = *jc\c
				*c\c = 0
				If *S <> *c
					AddElement(StringList())
					StringList() = PeekS(*S)
				EndIf
				*S = *c + SizeOf(Character)
				Break
			EndIf
			*jc + SizeOf(Character)
		Wend
		
		*c + SizeOf(Character)
	Wend
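	; Add the trailing piece only if it is non-empty (the string may end with a separator)
	If *S <> *c
		AddElement(StringList())
		StringList() = PeekS(*S)
	EndIf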
EndProcedure


Define S.s = "This is	a test	   	 	 	 	 	string	   	 	 	 	 	to	   	 	 	 	 	see	   	 	 	 	 	if	split and	join are	working."

Define NewList MyStrings.s()

Define i
Define StartTime = ElapsedMilliseconds()
For i = 1 To 100000
	SplitL2(S, MyStrings(), "	 ")
Next
MessageRequester("","Completed in " + Str(ElapsedMilliseconds() - StartTime) + " ms")
; Debug "Completed in " + Str(ElapsedMilliseconds() - StartTime) + " ms" + #CRLF$

; Debug ListSize(MyStrings())
ForEach MyStrings()
	Debug "|" + MyStrings() + "|"
Next


(+ FindString)

Code: Select all

	While *c\c
		If FindString(Separator$, Chr(*c\c))
			*c\c = 0
			If *S <> *c
				AddElement(StringList())
				StringList() = PeekS(*S)
			EndIf
			*S = *c + SizeOf(Character)
		EndIf
		*c + SizeOf(Character)
	Wend
idle
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: High speed split string

Post by idle »

Try this. It's not exactly the same, as it skips every character at or below the separator character.

Code: Select all

Procedure StringField_List(*source,List StringFields.s(),separator=' ') 
  Protected *inp.Character    
  ClearList(StringFields()) 
  
  If *source 
    *inp = *source 
     While *inp\c <> 0 
      While (*inp\c > separator )
        *inp+2 
      Wend 
      AddElement(StringFields()) 
      StringFields()= PeekS(*source,(*inp-*source)>>1)
      If *inp\c <> 0 
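        ; Skip the run of separators, but stop at the terminator so we never read past the end
        While *inp\c <> 0 And *inp\c <= separator
          *inp+2
        Wend
        *source = *inp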
      Else 
        Break 
      EndIf   
    Wend 
  EndIf 
    
EndProcedure 


Define S.s = "This is	a test	   	 	 	 	 	string	   	 	 	 	 	to	   	 	 	 	 	see	   	 	 	 	 	if	split and	join are	working."

NewList strings.s() 

StringField_List(@S,Strings()) 

ForEach strings() 
  Debug Strings()
Next   
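
For reference, the whitespace and control characters used as separators in this thread all have codes below the space, which is why the comparison against the separator catches them. A quick check using the built-in constants:

```purebasic
; Character codes of the separators discussed in this thread
Debug Asc(#TAB$) ; 9
Debug Asc(#LF$)  ; 10
Debug Asc(#VT$)  ; 11
Debug Asc(#FF$)  ; 12
Debug Asc(#CR$)  ; 13
Debug Asc(" ")   ; 32
```

With the default separator of a space, every one of these is therefore treated as a separator as well.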
Demivec
Addict
Posts: 4086
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: High speed split string

Post by Demivec »

@AZJIO: your code doesn't seem to be working properly yet. It is only detecting some of the separators.

I tested your code with this test string and it seemed to miss the LF and CR characters:

Code: Select all

Define S.s = "This is	a test	   	 " + #LF$ + "	 	 	 	string	   	 	 	 	 	to	  " + #FF$ + #VT$ + #CRLF$ + " 	 	 	 	 	see	   	" + #CR$ + " 	 	 	 	If	split And	join are	working."
AZJIO
Addict
Posts: 1315
Joined: Sun May 14, 2017 1:48 am

Re: High speed split string

Post by AZJIO »

@Demivec

Code: Select all

SplitL2(S, MyStrings(), "	 " + #FF$ + #VT$ + #CRLF$)
Demivec
Addict
Posts: 4086
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: High speed split string

Post by Demivec »

AZJIO wrote: Mon Jun 13, 2022 7:33 am @Demivec

Code: Select all

SplitL2(S, MyStrings(), "	 " + #FF$ + #VT$ + #CRLF$)
Thanks for the hint. I had overlooked the test code passing in the Separator$. :oops: Everything is working as it should now.
AZJIO
Addict
Posts: 1315
Joined: Sun May 14, 2017 1:48 am

Re: High speed split string

Post by AZJIO »

idle wrote: Mon Jun 13, 2022 4:24 am Try this its not exactly the same as it skips anything below the separator character
I had the same idea, since the characters below the space are non-printable, but I wanted a general solution: in my program the user chooses which separator character to use. It could be a comma or a custom Unicode character such as a symbol.

Is reading faster if the length is specified?

Code: Select all

PeekS(*source,(*inp-*source)>>1)

In my tests, this was within the margin of error.

Code: Select all

StringList() = PeekS(*S, (*c - *S) >> 1)
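
A simple way to compare the two calls yourself is sketched below. The test string, the loop count and all variable names are arbitrary choices for this example, and the result will vary by machine and PB version:

```purebasic
DisableDebugger ; timings taken with the debugger enabled are not representative

Define text.s = "This is a test string to see if split and join are working."
Define *p = @text
Define n = Len(text)
Define i, t0, tWith, tWithout
Define tmp.s

; PeekS with an explicit character count
t0 = ElapsedMilliseconds()
For i = 1 To 1000000
	tmp = PeekS(*p, n)
Next
tWith = ElapsedMilliseconds() - t0

; PeekS reading up to the null terminator
t0 = ElapsedMilliseconds()
For i = 1 To 1000000
	tmp = PeekS(*p)
Next
tWithout = ElapsedMilliseconds() - t0

MessageRequester("PeekS", "With length: " + Str(tWith) + " ms" + #CRLF$ + "Without length: " + Str(tWithout) + " ms")
```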
idle
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: High speed split string

Post by idle »

I was only trying to show that it's faster to pass the string by its address. Your code is otherwise fine.
AZJIO
Addict
Posts: 1315
Joined: Sun May 14, 2017 1:48 am

Re: High speed split string

Post by AZJIO »

idle wrote: Mon Jun 13, 2022 11:10 am by its address.
In my code, the separator characters in the string are overwritten with zeros, so I cannot work on the original string without corrupting it.
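
One possible way to combine both approaches is sketched here: the string is passed by address and left untouched, and each piece is copied with an explicit length instead of writing a zero into the source. SplitByAddress is a hypothetical name and this variant has not been benchmarked against the versions above:

```purebasic
; Hypothetical non-destructive variant: takes the string by address and never writes to it
Procedure SplitByAddress(*String.Character, List StringList.s(), Separator$ = " ")
	Protected *S.Character = *String ; start of the current piece
	Protected *c.Character = *String ; current read position
	Protected *jc.Character
	
	ClearList(StringList())
	If *String = 0 : ProcedureReturn : EndIf
	
	While *c\c
		*jc = @Separator$
		While *jc\c
			If *c\c = *jc\c
				If *S <> *c
					AddElement(StringList())
					; The length is given in characters, so no terminator is needed in the source
					StringList() = PeekS(*S, (*c - *S) / SizeOf(Character))
				EndIf
				*S = *c + SizeOf(Character)
				Break
			EndIf
			*jc + SizeOf(Character)
		Wend
		*c + SizeOf(Character)
	Wend
	
	; Trailing piece, if the string does not end with a separator
	If *S <> *c
		AddElement(StringList())
		StringList() = PeekS(*S, (*c - *S) / SizeOf(Character))
	EndIf
EndProcedure

Define Source.s = "split, and join"
NewList Pieces.s()
SplitByAddress(@Source, Pieces(), ", ")
ForEach Pieces()
	Debug Pieces()
Next
```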