Terrible slowdown of AddElement used in Threads - Solution: slab allocator

Just starting out? Need help? Post your questions and find answers here.
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Terrible slowdown of AddElement used in Threads - Solution: slab allocator

Post by STARGÅTE »

Recently, I observed a strange terrible slowdown of the AddElement() function when it is used in parallel running threads, even if the list is initiallized as Threaded.

In the main code, adding one million elements needs a certain time.
When I now run the same code in a thread, it should need the same time or similar time.
And when I run the same code in multiple independent threads, they should all need again the same time.

But now see what happens (code edited!):

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Threaded NewList Integer.i()

Procedure Test(Void.i=0)
	Protected I.i
	For I = 1 To 1000000
		AddElement(Integer())
	Next
	ClearList(Integer())
EndProcedure


Define Time.i, Threads.i
Define NewList Thread.i()


OpenConsole()

Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time
PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
		AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
		PrintN("CreateThread")
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	ClearList(Thread())
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next


Input()
(code edited!)
Single Main Thread: 32 ms
CreateThread
1 Thread(s): 32 ms
CreateThread
CreateThread
2 Thread(s): 187 ms
CreateThread
CreateThread
CreateThread
3 Thread(s): 640 ms
CreateThread
CreateThread
CreateThread
CreateThread
4 Thread(s): 1014 ms
As soon as more than one thread is calling AddElement() the running time slows down. But why? There is no interaction between the Lists, because they are initialized with Threaded.
Can anybody confirm this behavior?
Can anybody explain this behavior?

A solution is the use of an own memory allocation module: slab allocator (see later in this thread)

I observed this issue during some parallelization of a code block, but instead of saving time, it takes more time :oops:
Last edited by STARGÅTE on Sun Aug 21, 2022 9:07 am, edited 6 times in total.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
Oso
Enthusiast
Enthusiast
Posts: 595
Joined: Wed Jul 20, 2022 10:09 am

Re: Terrible slowdown of AddElement in Threads

Post by Oso »

STARGÅTE wrote: Thu Aug 18, 2022 9:40 pm As soon as more than one thread is calling AddElement() the running time slows down. But why? There is no interaction between the Lists, because they are initialized with Threaded.
Can anybody confirm this behavior?
Can anybody explain this behavior?
I'm new to PB and I may well be misunderstanding the objective here, but I can see that your thread 2 is being executed twice, thread 3 is being executed three times, thread 4 is being executed four times. You have a loop within a loop and the number of threads launched, increases with each iteration. I'm not sure if that was what you intended.

Single Main Thread: 204 ms
1 count 1 Thread(s): 191 ms
2 count 2 count 2 Thread(s): 380 ms
3 count 3 count 3 count 3 Thread(s): 569 ms
4 count 4 count 4 count 4 count 4 Thread(s): 886 ms

This can be seen by modifying the loop to show the count...

Code: Select all

	For I = 1 To Threads
	  AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
	  Print(Str(threads) + " count ")
	Next
If I remove the inner loop, I get this...

Single Main Thread: 182 ms
1 count 1 Thread(s): 168 ms
2 count 2 Thread(s): 188 ms
3 count 3 Thread(s): 207 ms
4 count 4 Thread(s): 188 ms
Last edited by Oso on Thu Aug 18, 2022 10:35 pm, edited 1 time in total.
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Terrible slowdown of AddElement in Threads

Post by netmaestro »

How many threads are you running? My count is 10. Is that intended? I'm guessing not.

And what kind of hellcat computer have you got that runs it in 32 milliseconds? My result is 44. Tradeya, k?

@Oso: You've probably got the debugger on.
BERESHEIT
User avatar
jacdelad
Addict
Addict
Posts: 1431
Joined: Wed Feb 03, 2021 12:46 pm
Location: Planet Riesa
Contact:

Re: Terrible slowdown of AddElement in Threads

Post by jacdelad »

I can confirm it. So, you'd expect the outcome should be NumberOfThreads*TimeForSingleThread?
Maybe it's overhead to synchronize the threads. When I add a mutex the outcome is time for singlethread multiplied with number of threads:

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Global mutex
mutex=CreateMutex()
Threaded NewList Integer.i()

Procedure Test(Void.i=0)
	Protected I.i
	LockMutex(mutex)
	For I = 1 To 1000000
		AddElement(Integer())
	Next
	ClearList(Integer())
	UnlockMutex(mutex)
EndProcedure


Define Time.i, Threads.i
Define NewList Thread.i()


OpenConsole()

Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time
PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
	  AddElement(Thread())
	  Thread() = CreateThread(@Test(), 0)
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next


Input()
PureBasic 6.04/XProfan X4a/Embarcadero RAD Studio 11/Perl 5.2/Python 3.10
Windows 11/Ryzen 5800X/32GB RAM/Radeon 7770 OC/3TB SSD/11TB HDD
Synology DS1821+/36GB RAM/130TB
Synology DS920+/20GB RAM/54TB
Synology DS916+ii/8GB RAM/12TB
Oso
Enthusiast
Enthusiast
Posts: 595
Joined: Wed Jul 20, 2022 10:09 am

Re: Terrible slowdown of AddElement in Threads

Post by Oso »

netmaestro wrote: Thu Aug 18, 2022 10:34 pm @Oso: You've probably got the debugger on.
Yes, the following after compiling...

Single Main Thread: 89 ms
1 count 1 Thread(s): 73 ms
2 count 2 count 2 Thread(s): 148 ms
3 count 3 count 3 count 3 Thread(s): 218 ms
4 count 4 count 4 count 4 count 4 Thread(s): 294 ms
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Terrible slowdown of AddElement in Threads

Post by idle »

I've noticed this too and just put id down to PB object management.

Also I have never really looked into how Threaded works as it seems incongruous to me (doesn't make any sense)
IF each thread has its own copy, then it's just that, there is no shared resource what is it's purpose?

This works as expected but does it have super powers, no! but its times are flat
Single Main Thread: 31 ms
1 Thread(s): 30 ms
2 Thread(s): 33 ms
3 Thread(s): 34 ms
4 Thread(s): 35 ms

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Structure node 
  value.i 
  *next.node
EndStructure 

Threaded head.node  

Procedure _ClearList(*list.node) 
  Protected *t.node 
  *list = @head 
  *list = *list\next ;can't free head so skip
  While *list\next 
    *t = *list\next 
    FreeMemory(*list) 
    *list=*t
  Wend   
EndProcedure   

Procedure _AddElement(*list.node,value) 
    
  *list\next = AllocateMemory(SizeOf(node))  
  *list = *list\next
  *list\value = value  
  ProcedureReturn *list    
  
EndProcedure  

Procedure Test(Void.i=0)
	Protected I.i
	Protected *sl.node = @head    
	
	For I = 1 To 1000000
	   *sl= _AddElement(*sl,i) 
	Next
	_ClearList(*sl)  
EndProcedure

Define Time.i, Threads.i,output.s 
Define NewList Thread.i()

OpenConsole()

Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time
PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
		AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next

Input()

User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: Terrible slowdown of AddElement in Threads

Post by Mijikai »

This is a bit faster on my pc than just the mutex version alone...

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Global m.i = CreateMutex()
Global NewList Integer.i()

Procedure Test(Void.i=0)
  Protected I.i
  
  LockMutex(m)
	For I = 1 To 1000000
	  AddElement(Integer())
	Next
	NewList Integer()
	ClearList(Integer())
	UnlockMutex(m)
EndProcedure


Define Time.i, Threads.i
Define NewList Thread.i()


OpenConsole()
Test()
Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time

PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
		AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next

Input()

End
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Terrible slowdown of AddElement in Threads

Post by STARGÅTE »

@Oso: I want to start multiple threads in parallel, and then I wait for all these threads.

@netmaestro: I forgot to clear the list Thread(). Here the correct version:
But there are not 10 running threads. Each time I wait for the threads and then start new threads.

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Threaded NewList Integer.i()

Procedure Test(Void.i=0)
	Protected I.i
	For I = 1 To 1000000
		AddElement(Integer())
	Next
	ClearList(Integer())
EndProcedure


Define Time.i, Threads.i
Define NewList Thread.i()


OpenConsole()

Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time
PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
		AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
		PrintN("CreateThread")
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	ClearList(Thread())
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next


Input()
However, the time is still increasing with increasing launched threads.
Single Main Thread: 32 ms
CreateThread
1 Thread(s): 32 ms
CreateThread
CreateThread
2 Thread(s): 187 ms
CreateThread
CreateThread
CreateThread
3 Thread(s): 640 ms
CreateThread
CreateThread
CreateThread
CreateThread
4 Thread(s): 1014 ms
Using a mutex, will run the threads in series instead in parallel, so not an option.

@idle:
Using AllocateMemory() instead of AddElement is an option, thanks.
But the question remains, why AddElement become slower?
idle wrote: Thu Aug 18, 2022 11:27 pm Also I have never really looked into how Threaded works as it seems incongruous to me (doesn't make any sense)
IF each thread has its own copy, then it's just that, there is no shared resource what is it's purpose?
In the finale application, I split a calculation runing 100 times into 4 pices and distribute them on four threads. Each thread now can calculation his own 25 calculations. Later (after Wait Thread) the results are combined (not shown in the example code).
Of cause there is no shared resource, otherwise the parallelization would be slowed down because of using mutex.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Terrible slowdown of AddElement in Threads

Post by idle »

That still runs like a dog did you post the right example?
Single Main Thread: 44 ms
CreateThread
1 Thread(s): 40 ms
CreateThread
CreateThread
2 Thread(s): 341 ms
CreateThread
CreateThread
CreateThread
3 Thread(s): 839 ms
CreateThread
CreateThread
CreateThread
CreateThread
4 Thread(s): 1336 ms
vs
Single Main Thread: 33 ms
CreateThread
1 Thread(s): 30 ms
CreateThread
CreateThread
2 Thread(s): 34 ms
CreateThread
CreateThread
CreateThread
3 Thread(s): 36 ms
CreateThread
CreateThread
CreateThread
CreateThread
4 Thread(s): 39 ms

Code: Select all

CompilerIf Not #PB_Compiler_Thread
	CompilerError "Enable Thread-Safe"
CompilerEndIf

Structure node 
  value.i 
  *next.node
EndStructure 

Threaded head.node  

Procedure _ClearList(*list.node) 
  Protected *t.node 
  *list = @head 
  *list = *list\next ;can't free head so skip
  While *list\next 
    *t = *list\next 
    FreeMemory(*list) 
    *list=*t
  Wend   
EndProcedure   

Macro _AddElement(pList,pvalue) 
    
  pList\next = AllocateMemory(SizeOf(node))  
  pList = pList\next
  pList\value = pvalue  
   
EndMacro  

Procedure Test(Void.i=0)
	Protected I.i
	Protected *sl.node = @head    
	For I = 1 To 1000000
	  _AddElement(*sl,i) 
	Next
	_ClearList(*sl)  
EndProcedure

Define Time.i, Threads.i
Define NewList Thread.i()


OpenConsole()

Time = ElapsedMilliseconds()
Test()
Time = ElapsedMilliseconds() - Time
PrintN("Single Main Thread: "+Str(Time)+" ms")

For Threads = 1 To 4
	
	Time = ElapsedMilliseconds()
	For I = 1 To Threads
		AddElement(Thread()) : Thread() = CreateThread(@Test(), 0)
		PrintN("CreateThread")
	Next
	ForEach Thread()
		WaitThread(Thread())
	Next
	ClearList(Thread())
	Time = ElapsedMilliseconds() - Time
	PrintN(Str(Threads)+" Thread(s): "+Str(Time)+" ms")
	
Next

Input()

the problem is pb object management, if you only need to iterate a list one way just a simple list.
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Terrible slowdown of AddElement in Threads

Post by STARGÅTE »

idle wrote: Fri Aug 19, 2022 6:30 am That still runs like a dog did you post the right example?
Yes I'm sure. Thats my problem! It is even slower, than running the four threads step by step in series like in the example from jacdelad with the mutex.
idle wrote: Fri Aug 19, 2022 6:30 am the problem is pb object management, if you only need to iterate a list one way just a simple list.
But is the PureBasic Linked List an "object" similar to Gadgets, Sprites, Images, etc.?
And if so, the elements are then encapsulated in this list.
I also read this blog article from Freak LinkedList memory management, but I can not find the issue, when this runs in parallel, because the linked lists and especially the elements are not shared between the threads. So why AddElements slows down?
I hope Fred reads this topic and can help us.

For the moment, I will change to your idea, creating my own linked list. But I will improve it even more, and allocate several elements at once, like it is described in the blog article.
I do not really need the features of the linked list, I need a set of elements. Previously, I observed, AddElement() is faster than AllocateStructure() due to the better memeory management. But this seems only true in case of single thread applications.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
idle
Always Here
Always Here
Posts: 5042
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Terrible slowdown of AddElement in Threads

Post by idle »

Yes I hope Fred looks at it to but I fear it will just languish in the hard basket.
It's not good but that was also why I was asking about Threaded, I think I get it but I don't know if it means its instancing a whole copy of a PB object or just making an instance, it just doesn't make much sense to me.

I'd actually call it a bug.
User avatar
NicTheQuick
Addict
Addict
Posts: 1224
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Terrible slowdown of AddElement used in Threads

Post by NicTheQuick »

You still have only one kernel in the background which is responsible for memory management. AddElement() allocates very small pieces of memory everytime and also the element before and the root element has to be updated after adding it. When allocating a piece of memory from the kernel there is also one mutex per process to serialize such requests. Because of that it is better to manage your memory by yourself. Just allocate a big piece at once and distribute it in a way which makes sense the most. We had the same problem with a raytracer in university.

Also if multiple threads are doing the same thing you've got a lot of memory transfers going on because the CPU cache is not that good a caching cluttered small memory pieces.

I guess that's the main reason.

Btw it looks like this on my PC:
Single Main Thread: 31 ms
CreateThread
1 Thread(s): 31 ms
CreateThread
CreateThread
2 Thread(s): 181 ms
CreateThread
CreateThread
CreateThread
3 Thread(s): 304 ms
CreateThread
CreateThread
CreateThread
CreateThread
4 Thread(s): 498 ms
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Terrible slowdown of AddElement used in Threads

Post by STARGÅTE »

NicTheQuick wrote: Fri Aug 19, 2022 9:48 amAddElement() allocates very small pieces of memory everytime and also the element before and the root element has to be updated after adding it.
This behavior was changed in PB 4.3: LinkedList memory management
Now, PureBasic uses here memory blocks for multiple elements to reduce the number of alloc/free calls.
That is the reason why AddElement() is usually faster than using AllocateMemory() or AllocateStructure().
NicTheQuick wrote: Fri Aug 19, 2022 9:48 am When allocating a piece of memory from the kernel there is also one mutex per process to serialize such requests.
I know. The results of idle, using AllocateMemory() with slightly increasing total time, is a behavior what I was expected also for the linked lists.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: Terrible slowdown of AddElement used in Threads

Post by Mijikai »

How fast would it be to copy/merge the lists manually instead of using threaded?
Last edited by Mijikai on Fri Aug 19, 2022 12:39 pm, edited 2 times in total.
User avatar
mk-soft
Always Here
Always Here
Posts: 5335
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Terrible slowdown of AddElement used in Threads

Post by mk-soft »

The problem here is that the list functions are not internally thread-safe and are therefore protected internally with mutex using the thread-safe option.
If you switch off the thread-safe option, this leads to a crash.

So this is not a bug, but an internal optimisation of Fred is necessary.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Post Reply