String concatenation performance

Just starting out? Need help? Post your questions and find answers here.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

String concatenation performance

Post by wilbert »

I'm puzzled

Code: Select all

DisableDebugger

b.s = "..."
a.s = "hello"

t1 = ElapsedMilliseconds()
For i = 1 To 10000
  a + b
Next
t2 = ElapsedMilliseconds()

a.s = "hello"

t3 = ElapsedMilliseconds()
For i = 1 To 10000
  l = Len(a)
  a = LSet(a, l + Len(b))
  PokeS(@a + l*SizeOf(Character), b)
Next
t4 = ElapsedMilliseconds()

MessageRequester("Timings", Str(t2-t1)+" vs "+Str(t4-t3))
Is it just on my computer the second method of combining strings is faster ? :? :shock:
Windows (x64)
Raspberry Pi OS (Arm64)
#NULL
Addict
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: String concatenation performance

Post by #NULL »

I get approximately 200 vs 270 on linux (debugger off)
Mindphazer
Enthusiast
Enthusiast
Posts: 340
Joined: Mon Sep 10, 2012 10:41 am
Location: Savoie

Re: String concatenation performance

Post by Mindphazer »

419 vs 289 on my MacBook pro
Debugger off
MacBook Pro 14" M1 Pro - 16 Gb - MacOS 14 - Iphone 15 Pro Max - iPad at home
...and unfortunately... Windows at work...
User avatar
Lord
Addict
Addict
Posts: 847
Joined: Tue May 26, 2009 2:11 pm

Re: String concatenation performance

Post by Lord »

These ar my results:
x86 wrote: ---------------------------
Timings
---------------------------
696 vs 512
---------------------------
OK
---------------------------
x64 wrote:---------------------------
Timings
---------------------------
688 vs 500
---------------------------
OK
---------------------------
Image
Bitblazer
Enthusiast
Enthusiast
Posts: 732
Joined: Mon Apr 10, 2017 6:17 pm
Location: Germany
Contact:

Re: String concatenation performance

Post by Bitblazer »

350 vs 285 (windows7 64bit)
webpage - discord chat links -> purebasic GPT4All
BarryG
Addict
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Re: String concatenation performance

Post by BarryG »

Want your mind blown? Check out this version. Original code by Kukulkan here. A very underrated cross-platform tip by him!

Image

Code: Select all

DisableDebugger

b.s = "..."
a.s = "hello"

t1 = ElapsedMilliseconds()
For i = 1 To 10000
  a + b
Next
t2 = ElapsedMilliseconds()

a.s = "hello"

t3 = ElapsedMilliseconds()
For i = 1 To 10000
  l = Len(a)
  a = LSet(a, l + Len(b))
  PokeS(@a + l*SizeOf(Character), b)
Next
t4 = ElapsedMilliseconds()

a.s = "hello"
tmp.s = ""

t5 = ElapsedMilliseconds()
For i = 1 To 10000
  tmp + b
  If Len(tmp) > 2048
    a + tmp
    tmp = ""
  EndIf
Next
a + tmp
t6 = ElapsedMilliseconds()

MessageRequester("Timings", Str(t2-t1)+" vs "+Str(t4-t3)+" vs "+Str(t6-t5))
#NULL
Addict
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: String concatenation performance

Post by #NULL »

@BarryG
It's slightly faster if you don't call Len() all the time

Code: Select all

t5 = ElapsedMilliseconds()
bl = Len(b)
tmpl = 0
For i = 1 To 10000
  tmp + b
  tmpl + bl
  If tmpl > 2048
    a + tmp
    tmp = ""
    tmpl = 0
  EndIf
Next
a + tmp
t6 = ElapsedMilliseconds()
BarryG
Addict
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Re: String concatenation performance

Post by BarryG »

Nice optimisation, #NULL, but it only works if the appended string is always the same size (like for these tests). In real-world apps, you'd need to calculate Len() before each string append anyway.
Last edited by BarryG on Fri Jul 24, 2020 11:08 am, edited 1 time in total.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: String concatenation performance

Post by wilbert »

BarryG wrote:Want your mind blown? Check out this version.
Well, to be honest that doesn't really blow my mind because I already knew you can combine strings faster with custom code or using a memory buffer you reallocate instead of the PureBasic string functions.

What surprised me with my test code is that both versions append string b to string a one at a time yet still the second version with the additional overhead of calling multiple procedures is significantly faster on Mac and Windows.
In this other thread from last week ( http://forums.purebasic.com/english/vie ... =3&t=75726 ) it was requested that PureBasic would store the length of strings to make string handling faster.
But from the code I posted it looks like it would be possible for Fred to make string concatenation significantly faster without making such major internal changes.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
NicTheQuick
Addict
Addict
Posts: 1223
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: String concatenation performance

Post by NicTheQuick »

Wow. Your timings are so high. I've got 149 vs 172 vs 15 (10:11:1) on my Laptop.

I changed the code to 100000 loops and got this: 14651 vs 17922 vs 132 (111:136:1). This shows the difference better.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
mk-soft
Always Here
Always Here
Posts: 5333
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String concatenation performance

Post by mk-soft »

With own structured memory strings works faster.
Need ca 4ms for 10000 loops and 33ms for 100000 loops. CPU Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz

BarryG Timings 34467 vs 24513 vs 325 vs 33 (100000 loops)

Code: Select all

;-TOP

; Comment : Fast String
; Author  : mk-soft
; Version : v1.04
; Create  : 20.07.2020
; Update  :

; *********************

EnableExplicit

Structure udtString
  Len.i
  Char.c[0]
EndStructure

; ---

Procedure AllocateString(String.s) ; Result String Pointer
  Protected len, *String.udtString, *pointer
  len = StringByteLength(String)
  *String = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *String
    *Pointer = @*String\Char
    CopyMemoryString(@String, @*pointer)
    *String\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *String
EndProcedure

; ----

Procedure LenString(*String.udtString)
  ProcedureReturn *String\Len >> 1
EndProcedure

Procedure.s GetString(*String.udtString)
  ProcedureReturn PeekS(@*String\Char, *String\Len >> 1)
EndProcedure

Macro FreeString(String)
  FreeMemory(String)
EndMacro

; ----

Procedure AddString(*String.udtString, String.s) ; Result New String Pointer
  Protected len, len_new, *pointer
  len = StringByteLength(String)
  len_new = *String\Len + len
  *String = ReAllocateMemory(*String, len_new + SizeOf(Integer) + SizeOf(Character), #PB_Memory_NoClear)
  If *String
    *pointer = @*String\Char + *String\Len
    CopyMemoryString(@String, @*pointer)
    *String\Len = len_new
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *String
EndProcedure

Procedure ConcatString(*String.udtString, *Add.udtString) ; Result New String Pointer
  Protected len_new, *pointer
  len_new = *String\Len + *Add\len
  *String = ReAllocateMemory(*String, len_new + SizeOf(Integer) + SizeOf(Character), #PB_Memory_NoClear)
  If *String
    *pointer = @*String\Char + *String\Len
    CopyMemory(@*Add\Char, *pointer, *Add\len + SizeOf(Character))
    *String\Len = len_new
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *String
EndProcedure

Procedure InsertStringEx(*String.udtString, *Insert.udtString, Position) ; Result New String Pointer
  Protected len_new, *pointer
  If *Insert\Len = 0
    ProcedureReturn *String
  EndIf
  Position - 1
  Position << 1
  If Position > *String\Len
    Position = *String\Len
  EndIf
  len_new = *String\Len + *Insert\len
  *String = ReAllocateMemory(*String, len_new + SizeOf(Integer) + SizeOf(Character), #PB_Memory_NoClear)
  If *String
    *pointer = @*String\Char + Position
    MoveMemory(*pointer, *pointer + *Insert\Len, *String\Len - Position + SizeOf(Character))
    CopyMemory(@*Insert\Char, *pointer, *Insert\Len)
    *String\Len = len_new
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *String
EndProcedure

; ----

Procedure LeftString(*String.udtString, Lenght)
  Protected len, *NewString.udtString
  Lenght << 1
  If Lenght <= *String\Len
    len = Lenght
  Else
    len = *String\Len
  EndIf
  *NewString = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *NewString
    CopyMemory(@*String\Char, @*NewString\Char, len)
    *NewString\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *NewString
EndProcedure

Procedure RightString(*String.udtString, Lenght)
  Protected len, *NewString.udtString
  Lenght << 1
  If Lenght <= *String\Len
    len = Lenght
  Else
    len = *String\Len
  EndIf
  *NewString = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *NewString
    CopyMemory(@*String\Char + *String\Len - len, @*NewString\Char, len)
    *NewString\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *NewString
EndProcedure

Procedure MidString(*String.udtString, Position, Lenght = #PB_Ignore)
  Protected len, *NewString.udtString
  Position - 1
  Position << 1
  If Position => *String\Len Or Position < 0
    *NewString = AllocateMemory(SizeOf(Integer) + SizeOf(Character))
    If *NewString
      ProcedureReturn *NewString
    Else
      MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
      End
    EndIf
  EndIf
  
  If Lenght = #PB_Ignore
    Lenght = *String\Len - Position
  Else
    Lenght << 1
  EndIf
  
  If Lenght + Position <= *String\Len
    len = Lenght
  Else
    len = *String\Len - Position
  EndIf
  *NewString = AllocateMemory(len + SizeOf(Integer) + SizeOf(Character))
  If *NewString
    CopyMemory(@*String\Char + Position, @*NewString\Char, len)
    *NewString\Len = len
  Else
    MessageRequester("Fatal Error", "Out Of Memory - Stop", #PB_MessageRequester_Error)
    End
  EndIf
  ProcedureReturn *NewString
EndProcedure

; ****

CompilerIf #PB_Compiler_IsMainFile
  
  DisableDebugger
  
  Define i, a.s, b.s, c.s, *a, *b, t1, t2
  
  b.s = "..."
  a.s = "hello"
  
  t1 = ElapsedMilliseconds()
  *a = AllocateString(a)
  *b = AllocateString(b)
  For i = 1 To 10000
    *a = ConcatString(*a, *b)
  Next
  c.s = GetString(*a)
  t2 = ElapsedMilliseconds() - t1
  
  EnableDebugger
  Debug "Time " + t2 + "ms"
  Debug "Len c = " + Len(c) 
CompilerEndIf
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Rinzwind
Enthusiast
Enthusiast
Posts: 636
Joined: Wed Mar 11, 2009 4:06 pm
Location: NL

Re: String concatenation performance

Post by Rinzwind »

Wilbert is surprised about how something that looks more complex can be faster than the concatenate operator + (and knows about the workarounds). So possibly there are some easy improvements to be made under the hood? We know there are faster methods (but it should be transparent, JavaScript does it too behind the scenes, no need for stringbuilder etc). Anyway, mk-soft, thanks again for that nice code.

For what its worth:

Code: Select all

EnableExplicit

Define l
Define b.s = "..."
Define a.s = "hello"

a + b
a + b
Debug a

a.s = "hello"
l = Len(a)
a = LSet(a, l + Len(b))
PokeS(@a + l * SizeOf(Character), b)
l = Len(a)
a = LSet(a, l + Len(b))
PokeS(@a + l * SizeOf(Character), b)

Debug a
gives this ASM:

Code: Select all

; 
; PureBasic 5.62 (Windows - x86) generated code
; 
; (c) 2016 Fantaisie Software
; 
; The header must remain intact for Re-Assembly
; 
; String
; Memory
; :System
; kernel32.lib
; :Import
; 
format MS COFF
; 
; 
extrn _PB_FreeMemorys@0
extrn _PB_InitMemory@0
extrn _PB_Len@4
extrn _PB_LSet@12
extrn _PB_PokeS@8
extrn _ExitProcess@4
extrn _GetModuleHandleW@4
extrn _HeapCreate@12
extrn _HeapDestroy@4
extrn _memset
extrn _SYS_CopyString@4
extrn _SYS_AllocateString4@8
extrn SYS_FastAllocateStringFree
extrn PB_StringBase
extrn _SYS_InitString@0
extrn _SYS_FreeStrings@0
; 
extrn _PB_StringBasePosition
public _PB_Instance
public _PB_ExecutableType
public _PB_OpenGLSubsystem
public _PB_MemoryBase
public PB_Instance
public PB_MemoryBase
public _PB_EndFunctions

macro pb_public symbol
{
  public  _#symbol
  public symbol
_#symbol:
symbol:
}

macro    pb_align value { rb (value-1) - ($-_PB_DataSection + value-1) mod value }
macro pb_bssalign value { rb (value-1) - ($-_PB_BSSSection  + value-1) mod value }

public PureBasicStart
; 
section '.code' code readable executable align 4096
; 
; 
PureBasicStart:
; 
  PUSH   dword I_BSSEnd-I_BSSStart
  PUSH   dword 0
  PUSH   dword I_BSSStart
  CALL  _memset
  ADD    esp,12
  PUSH   dword 0
  CALL  _GetModuleHandleW@4
  MOV    [_PB_Instance],eax
  PUSH   dword 0
  PUSH   dword 4096
  PUSH   dword 0
  CALL  _HeapCreate@12
  MOV    [PB_MemoryBase],eax
  CALL  _SYS_InitString@0
  CALL  _PB_InitMemory@0
; EnableExplicit
; 
; Define l
; Define b.s = "..."
  MOV    edx,_S1
  LEA    ecx,[v_b]
  CALL   SYS_FastAllocateStringFree
; Define a.s = "hello"
  MOV    edx,_S2
  LEA    ecx,[v_a]
  CALL   SYS_FastAllocateStringFree
; 
; a + b
  MOV    edx,dword [v_a]
  PUSH   dword [_PB_StringBasePosition]
  PUSH   edx
  CALL  _SYS_CopyString@4
  MOV    edx,dword [v_b]
  PUSH   edx
  CALL  _SYS_CopyString@4
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; a + b
  MOV    edx,dword [v_a]
  PUSH   dword [_PB_StringBasePosition]
  PUSH   edx
  CALL  _SYS_CopyString@4
  MOV    edx,dword [v_b]
  PUSH   edx
  CALL  _SYS_CopyString@4
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; Debug a
; 
; a.s = "hello"
  MOV    edx,_S2
  LEA    ecx,[v_a]
  CALL   SYS_FastAllocateStringFree
; l = Len(a)
  PUSH   dword [v_a]
  CALL  _PB_Len@4
  MOV    dword [v_l],eax
; a = LSet(a, l + Len(b))
  MOV    edx,[_PB_StringBasePosition]
  PUSH   edx
  PUSH   edx
  MOV    ebx,dword [v_l]
  PUSH   dword [v_b]
  CALL  _PB_Len@4
  ADD    ebx,eax
  PUSH   ebx
  PUSH   dword [v_a]
  CALL  _PB_LSet@12
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; PokeS(@a + l * SizeOf(Character), b)
  PUSH   dword [v_b]
  MOV    eax,dword [v_a]
  MOV    ebx,eax
  MOV    edi,dword [v_l]
  ADD    edi,edi
  ADD    ebx,edi
  PUSH   ebx
  CALL  _PB_PokeS@8
; l = Len(a)
  PUSH   dword [v_a]
  CALL  _PB_Len@4
  MOV    dword [v_l],eax
; a = LSet(a, l + Len(b))
  MOV    edx,[_PB_StringBasePosition]
  PUSH   edx
  PUSH   edx
  MOV    ebx,dword [v_l]
  PUSH   dword [v_b]
  CALL  _PB_Len@4
  ADD    ebx,eax
  PUSH   ebx
  PUSH   dword [v_a]
  CALL  _PB_LSet@12
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; PokeS(@a + l * SizeOf(Character), b)
  PUSH   dword [v_b]
  MOV    eax,dword [v_a]
  MOV    ebx,eax
  MOV    edi,dword [v_l]
  ADD    edi,edi
  ADD    ebx,edi
  PUSH   ebx
  CALL  _PB_PokeS@8
; 
; Debug a
_PB_EOP_NoValue:
  PUSH   dword 0
_PB_EOP:
  CALL  _PB_EndFunctions
  CALL  _SYS_FreeStrings@0
  PUSH   dword [PB_MemoryBase]
  CALL  _HeapDestroy@4
  CALL  _ExitProcess@4
_PB_EndFunctions:
  CALL  _PB_FreeMemorys@0
  RET
; 
; 
section '.data' data readable writeable
; 
_PB_DataSection:
_PB_OpenGLSubsystem: db 0
pb_public PB_DEBUGGER_LineNumber
  dd     -1
pb_public PB_DEBUGGER_IncludedFiles
  dd     0
pb_public PB_DEBUGGER_FileName
  db     0
pb_public PB_Compiler_Unicode
  dd     1
pb_public PB_Compiler_Thread
  dd     0
pb_public PB_Compiler_Purifier
  dd     0
pb_public PB_Compiler_Debugger
  dd     0
_PB_ExecutableType: dd 0
align 4
public _SYS_StaticStringStart
_SYS_StaticStringStart:
_S1: dw 46,46,46,0
_S2: dw 104,101,108,108,111,0
pb_public PB_NullString
  dw     0
public _SYS_StaticStringEnd
_SYS_StaticStringEnd:
align 4
align 4
align 4
s_s:
  dd     0
  dd     -1
align 4
; 
section '.bss' readable writeable
_PB_BSSSection:
align 4
; 
I_BSSStart:
_PB_MemoryBase:
PB_MemoryBase: rd 1
_PB_Instance:
PB_Instance: rd 1
; 
align 4
PB_DataPointer rd 1
v_a rd 1
v_b rd 1
v_l rd 1
align 4
align 4
align 4
align 4
I_BSSEnd:
section '.data' data readable writeable
SYS_EndDataSection:
Way over my head to analyze this ;

ASM of opening post:

Code: Select all

; 
; PureBasic 5.62 (Windows - x86) generated code
; 
; (c) 2016 Fantaisie Software
; 
; The header must remain intact for Re-Assembly
; 
; System
; String
; Requester
; FileSystem
; Date
; Object
; SimpleList
; Memory
; :System
; kernel32.lib
; :Import
; 
format MS COFF
; 
; 
extrn _PB_ElapsedMilliseconds@0
extrn _PB_FreeFileSystem@0
extrn _PB_FreeMemorys@0
extrn _PB_FreeObjects@0
extrn _PB_InitMemory@0
extrn _PB_InitRequester@0
extrn _PB_Len@4
extrn _PB_LSet@12
extrn _PB_MessageRequester@8
extrn _PB_PokeS@8
extrn _PB_Str@12
extrn _ExitProcess@4
extrn _GetModuleHandleW@4
extrn _HeapCreate@12
extrn _HeapDestroy@4
extrn _memset
extrn _SYS_CopyString@4
extrn _SYS_AllocateString4@8
extrn SYS_FastAllocateStringFree
extrn PB_StringBase
extrn _SYS_InitString@0
extrn _SYS_FreeStrings@0
; 
extrn _PB_StringBasePosition
public _PB_Instance
public _PB_ExecutableType
public _PB_OpenGLSubsystem
public _PB_MemoryBase
public PB_Instance
public PB_MemoryBase
public _PB_EndFunctions

macro pb_public symbol
{
  public  _#symbol
  public symbol
_#symbol:
symbol:
}

macro    pb_align value { rb (value-1) - ($-_PB_DataSection + value-1) mod value }
macro pb_bssalign value { rb (value-1) - ($-_PB_BSSSection  + value-1) mod value }

public PureBasicStart
; 
section '.code' code readable executable align 4096
; 
; 
PureBasicStart:
; 
  PUSH   dword I_BSSEnd-I_BSSStart
  PUSH   dword 0
  PUSH   dword I_BSSStart
  CALL  _memset
  ADD    esp,12
  PUSH   dword 0
  CALL  _GetModuleHandleW@4
  MOV    [_PB_Instance],eax
  PUSH   dword 0
  PUSH   dword 4096
  PUSH   dword 0
  CALL  _HeapCreate@12
  MOV    [PB_MemoryBase],eax
  CALL  _SYS_InitString@0
  CALL  _PB_InitMemory@0
  CALL  _PB_InitRequester@0
; DisableDebugger
; 
; b.s = "..."
  MOV    edx,_S1
  LEA    ecx,[v_b]
  CALL   SYS_FastAllocateStringFree
; a.s = "hello"
  MOV    edx,_S2
  LEA    ecx,[v_a]
  CALL   SYS_FastAllocateStringFree
; 
; t1 = ElapsedMilliseconds()
  CALL  _PB_ElapsedMilliseconds@0
  MOV    dword [v_t1],eax
; For i = 1 To 10000
  MOV    dword [v_i],1
  JMP   _ForSkipDebug1
_For1:
_ForSkipDebug1:
  MOV    eax,10000
  CMP    eax,dword [v_i]
  JL    _Next2
; a + b
  MOV    edx,dword [v_a]
  PUSH   dword [_PB_StringBasePosition]
  PUSH   edx
  CALL  _SYS_CopyString@4
  MOV    edx,dword [v_b]
  PUSH   edx
  CALL  _SYS_CopyString@4
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; Next
_NextContinue2:
  INC    dword [v_i]
  JNO   _For1
_Next2:
; t2 = ElapsedMilliseconds()
  CALL  _PB_ElapsedMilliseconds@0
  MOV    dword [v_t2],eax
; 
; a.s = "hello"
  MOV    edx,_S2
  LEA    ecx,[v_a]
  CALL   SYS_FastAllocateStringFree
; 
; t3 = ElapsedMilliseconds()
  CALL  _PB_ElapsedMilliseconds@0
  MOV    dword [v_t3],eax
; For i = 1 To 10000
  MOV    dword [v_i],1
  JMP   _ForSkipDebug3
_For3:
_ForSkipDebug3:
  MOV    eax,10000
  CMP    eax,dword [v_i]
  JL    _Next4
; l = Len(a)
  PUSH   dword [v_a]
  CALL  _PB_Len@4
  MOV    dword [v_l],eax
; a = LSet(a, l + Len(b))
  MOV    edx,[_PB_StringBasePosition]
  PUSH   edx
  PUSH   edx
  MOV    ebx,dword [v_l]
  PUSH   dword [v_b]
  CALL  _PB_Len@4
  ADD    ebx,eax
  PUSH   ebx
  PUSH   dword [v_a]
  CALL  _PB_LSet@12
  PUSH   dword v_a
  CALL  _SYS_AllocateString4@8
; PokeS(@a + l*SizeOf(Character), b)
  PUSH   dword [v_b]
  MOV    eax,dword [v_a]
  MOV    ebx,eax
  MOV    edi,dword [v_l]
  ADD    edi,edi
  ADD    ebx,edi
  PUSH   ebx
  CALL  _PB_PokeS@8
; Next
_NextContinue4:
  INC    dword [v_i]
  JNO   _For3
_Next4:
; t4 = ElapsedMilliseconds()
  CALL  _PB_ElapsedMilliseconds@0
  MOV    dword [v_t4],eax
; 
; MessageRequester("Timings", Str(t2-t1)+" vs "+Str(t4-t3))
  MOV    edx,[_PB_StringBasePosition]
  PUSH   edx
  PUSH   edx
  PUSH   dword [_PB_StringBasePosition]
  MOV    eax,dword [v_t2]
  CDQ
  PUSH   edx
  PUSH   eax
  MOV    eax,dword [v_t1]
  CDQ
  SUB    [esp],eax
  SBB    [esp+4],edx
  POP    eax
  POP    edx
  PUSH   edx
  PUSH   eax
  CALL  _PB_Str@12
  MOV    edx,_S4
  PUSH   edx
  CALL  _SYS_CopyString@4
  MOV    edx,[_PB_StringBasePosition]
  PUSH   edx
  PUSH   edx
  MOV    eax,dword [v_t4]
  CDQ
  PUSH   edx
  PUSH   eax
  MOV    eax,dword [v_t3]
  CDQ
  SUB    [esp],eax
  SBB    [esp+4],edx
  POP    eax
  POP    edx
  PUSH   edx
  PUSH   eax
  CALL  _PB_Str@12
  POP    eax
  ADD    dword [_PB_StringBasePosition],2
  MOV    eax,_S3
  PUSH   eax
  MOV    edx,[PB_StringBase]
  ADD    [esp+4],edx
  CALL  _PB_MessageRequester@8
  POP    dword [_PB_StringBasePosition]
_PB_EOP_NoValue:
  PUSH   dword 0
_PB_EOP:
  CALL  _PB_EndFunctions
  CALL  _SYS_FreeStrings@0
  PUSH   dword [PB_MemoryBase]
  CALL  _HeapDestroy@4
  CALL  _ExitProcess@4
_PB_EndFunctions:
  CALL  _PB_FreeFileSystem@0
  CALL  _PB_FreeObjects@0
  CALL  _PB_FreeMemorys@0
  RET
; 
; 
section '.data' data readable writeable
; 
_PB_DataSection:
_PB_OpenGLSubsystem: db 0
pb_public PB_DEBUGGER_LineNumber
  dd     -1
pb_public PB_DEBUGGER_IncludedFiles
  dd     0
pb_public PB_DEBUGGER_FileName
  db     0
pb_public PB_Compiler_Unicode
  dd     1
pb_public PB_Compiler_Thread
  dd     0
pb_public PB_Compiler_Purifier
  dd     0
pb_public PB_Compiler_Debugger
  dd     0
_PB_ExecutableType: dd 0
align 4
public _SYS_StaticStringStart
_SYS_StaticStringStart:
_S1: dw 46,46,46,0
_S2: dw 104,101,108,108,111,0
_S4: dw 32,118,115,32,0
_S3: dw 84,105,109,105,110,103,115,0
pb_public PB_NullString
  dw     0
public _SYS_StaticStringEnd
_SYS_StaticStringEnd:
align 4
align 4
align 4
s_s:
  dd     0
  dd     -1
align 4
; 
section '.bss' readable writeable
_PB_BSSSection:
align 4
; 
I_BSSStart:
_PB_MemoryBase:
PB_MemoryBase: rd 1
_PB_Instance:
PB_Instance: rd 1
; 
align 4
PB_DataPointer rd 1
v_a rd 1
v_b rd 1
v_i rd 1
v_l rd 1
v_t1 rd 1
v_t2 rd 1
v_t3 rd 1
v_t4 rd 1
align 4
align 4
align 4
align 4
I_BSSEnd:
section '.data' data readable writeable
SYS_EndDataSection:
User avatar
mk-soft
Always Here
Always Here
Posts: 5333
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String concatenation performance

Post by mk-soft »

That's right.

FastString from me is also not faster for normal string lengths and is not necessary for 99.99% of the applications.
Only if very long string chains are put together, this can help speed it up.

Code: Select all

  DisableDebugger
  
  a.s = "Value "
  t7 = ElapsedMilliseconds()
  *a = AllocateString(a)
  For i = 1 To 100000
    *a = AddString(*a, Str(i) + #LF$)
  Next
  c.s = GetString(*a)
  t8 = ElapsedMilliseconds()
  
  MessageRequester("Timings", Str(t8-t7))
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
User avatar
mk-soft
Always Here
Always Here
Posts: 5333
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: String concatenation performance

Post by mk-soft »

PB Strings should be revised in any case.

VB-Script works with variant and strings of type BSTR (with length information)

That here a VB script is faster than Purebasic is very sad.
---------------------------
Timings
---------------------------
Time: PB 324ms / VBS 61ms
Len VBS = 30005
---------------------------
OK
---------------------------

Code: Select all

;-TOP

; Comment   : Modul ActiveScript Example 13
; Version   : v2.09

; Link to ActiveScript  : https://www.purebasic.fr/english/viewtopic.php?f=12&t=71399
; Link to SmartTags     : https://www.purebasic.fr/english/viewtopic.php?f=12&t=71399#p527089
; Link to VariantHelper : https://www.purebasic.fr/english/viewtopic.php?f=12&t=71399#p527090

; ***************************************************************************************

XIncludeFile "Modul_ActiveScript.pb"
;XIncludeFile "Modul_SmartTags.pb"
;XIncludeFile "VariantHelper.pb"

UseModule ActiveScript
;UseModule ActiveSmartTags

; -------------------------------------------------------------------------------------

Procedure.s GetDataSectionText(*Addr.Character)
  Protected result.s, temp.s
  While *Addr\c <> #ETX
    temp = PeekS(*Addr)
    *Addr + StringByteLength(temp) + SizeOf(Character)
    result + temp + #LF$
  Wend
  ProcedureReturn result
EndProcedure

; -------------------------------------------------------------------------------------

Global vbs.s, sValue1.s, sValue2.s

Runtime sValue1, sValue2

; -------------------------------------------------------------------------------------

vbs = GetDataSectionText(?vbs)

Debug "*** New ActiveScript ***"
*Control = NewActiveScript()
If *Control
  Debug "*** Parse ScriptText ***"
  
  DisableDebugger
  
  Define loops = 10000
  Runtime loops
  
  b.s = "..."
  a.s = "hello"
  
  t1 = ElapsedMilliseconds()
  For i = 1 To loops
    a + b
  Next
  t2 = ElapsedMilliseconds()
  
  t3 = ElapsedMilliseconds()
  r1 = ParseScriptText(*Control, vbs)
  If r1 = #S_OK
    Debug "Code Ready 1."
  EndIf
  t4 = ElapsedMilliseconds()
  MessageRequester("Timings", "Time: PB " + Str(t2-t1) + "ms / VBS " + Str(t4-t3) + "ms" + #LF$ + "Len VBS = " + Len(sValue1))
  EnableDebugger
  
  Debug "*** Free ActiveScript ***"
  FreeActiveScript(*Control)
  Debug "*** Finished ActiceScript ***"

  Debug "************************************************************"
EndIf
; -------------------------------------------------------------------------------------

DataSection
  vbs:
  Data.s ~"On Error Resume Next"
  Data.s ~""
  Data.s ~"Dim loops, i, a, b"
  Data.s ~""
  Data.s ~"a = \"Hello\""
  Data.s ~"b = \"...\""
  Data.s ~"loops = Runtime.Integer(\"loops\")"
  Data.s ~""
  Data.s ~"For i = 1 to loops"
  Data.s ~" a = a + b"
  Data.s ~"Next"
  Data.s ~"Runtime.String(\"sValue1\") = a"
  Data.s ~""
  Data.s #ETX$
  Data.i 0
EndDataSection
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
cas
Enthusiast
Enthusiast
Posts: 597
Joined: Mon Nov 03, 2008 9:56 pm

Re: String concatenation performance

Post by cas »

mk-soft wrote:That here a VB script is faster than Purebasic is very sad.
Strings are very low hanging fruit for Fred to optimize. They are currently implemented in a bare-bones way without any dynamic internal buffer, cached length value or small string optimization. The only problem is backwards compatibility dilemma.

I tested string performance in ms visual c++:

Code: Select all

#include <iostream>
#include <chrono>
int main() {
	std::string b = "...";
	std::string a = "hello";

	auto t1 = std::chrono::high_resolution_clock::now();
	for (int k = 0; k < 1000000; k++) { //1M loops
		a += b; //a.append(b);
	}
	auto t2 = std::chrono::high_resolution_clock::now();

	auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(t2 - t1).count();
	std::cout << duration << " ms";
}
Ran every test 5 times and then calculated average:
10k loops: debug:2ms release:0ms
100k loops: debug:24ms release:0ms
1M loops: debug:247ms release:7ms

Ok, not a fair comparison since std::string is 1 byte per character (ansi) and PB strings are 2 bytes per character (ucs-2). Replace std::string with std::wstring to make it 1:1 comparison and here are the results:
10k loops: debug:2ms release:0ms
100k loops: debug:27ms release:1ms
1M loops: debug:270ms release:8ms

Tried /Od and /O2 optimization but there is no difference for this specific code. These results are for x86 build, x64 builds also have no real difference in this specific case (they are a little faster but within margin of error).

For comparison, PB (code in 1st post) benchmark on the same PC:
10k loops: 333 vs 244
100k loops: 33250 vs 25531
1M loops: yeah, not testing that

This c++ code in debug configuration is still many times faster than equivalent PB code with debugger disabled :( . Sad but expected when comparing highly optimized string implementation with implementation that has no optimizations. I tried to beat c++ in PB by preallocating whole final string, use CopyMemory() + pointer aithmetic. Got it to 13ms. Then performed loop unrolling 2 times and that got it down to 7ms for 1M loop (vs 8ms in vc++). Obviously, that code is not practical at all (in comparison with just writing "a+b") and with each use case you need to rewrite it to adapt it to that specific case.

CPUName(): Intel(R) Core(TM) i5-4690K CPU @ 3.50GHz
Post Reply