Quake III - FastInverseSquareRoot

Just starting out? Need help? Post your questions and find answers here.
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Quake III - FastInverseSquareRoot

Post by StarBootics »

Hello everyone,

I have found a video explaining the Quake III - Fast Inverse Square Root function written in C.
https://www.youtube.com/watch?v=p8u_k2LIZyo

There is the code in C :

Code: Select all

float FastInverseSquareRoot(float Number)
{
	long i;
	float x2, y;
	const float ThreeHalfs = 1.5F;
	
	x2 = Number * 0.5F;
	y = Number;
	i = * ( long * ) &y;
	i = 0x5f3759df - (i >> 1);
	y = * ( float * ) &i;
	y = y * (ThreeHalfs - (x2 * y * y));
	// y = y * (ThreeHalfs - (x2 * y * y));
	
	return y 
}
I have try to re-write this function in PureBasic :

Code: Select all

Procedure.f FastInverseSquareRoot(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  i.l = @y
  i = $5f3759df - (i >> 1)
  y = @i
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure
But I can't get the right result, for the input 5.0 I should get 0.44721359014511 but I get -Infinity anyone know how to fix this issue ?

Thanks in advance.

StarBootics
The Stone Age did not end due to a shortage of stones !
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Quake III - FastInverseSquareRoot

Post by STARGÅTE »

I can't read the C-code but in PB "i.l = @y" is nonsense.
"@" gives the address of a variable, and addresses are always integers (32/64 bit) and not a long.
Also "y = @i" makes no sense, because "y" is always just the address of the variable "i".
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
HeX0R
Addict
Addict
Posts: 980
Joined: Mon Sep 20, 2004 7:12 am
Location: Hell

Re: Quake III - FastInverseSquareRoot

Post by HeX0R »

Code: Select all

Structure _H_
	StructureUnion
		l.l
		f.f
	EndStructureUnion
EndStructure

Procedure.f FastInverseSquareRoot(Number.f)
  Protected *i._H_, x2.f, y.f
  x2 = Number * 0.5
  y = Number
  *i = @y
  *i\l = $5f3759df - (*i\l >> 1)
  y = *i\f
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

Debug FastInverseSquareRoot(5.0)
User avatar
luis
Addict
Addict
Posts: 3876
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Quake III - FastInverseSquareRoot

Post by luis »

StarBootics

The equivalent in PB of
i = * ( long * ) &y;
and
y = * ( float * ) &i;

is

i.l = PeekL(@y)
and
y.f = PeekF(@i)


The C code means "get the address of a var, treat that var at that location like if it were of the type I'm casting it to, and get its value".

The method of HeXOR avoids calling a function by using a union and doing the conversion similarly to what C does more cleanly using a cast.

Anyway CPUs today do this faster and more precisely without the need of this (clever) hack.
"Have you tried turning it off and on again ?"
A little PureBasic review
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Quake III - FastInverseSquareRoot

Post by StarBootics »

Hello again,

I have found a way using CopyMemory() but the end result is not fast at all :

Code: Select all

Procedure.f FastInverseSquareRoot(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  
  CopyMemory(@y, @i.l, 4)

  i = $5f3759df - (i >> 1)
  
  CopyMemory(@i, @y, 4)
  
  y = y * (1.5 - (x2 * y * y))
  ;y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

TempsDepart0 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRoot(5.0)
Next

TempsEcoule0 = ElapsedMilliseconds() - TempsDepart0

TempsDepart1 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  B.f = 1.0 / Sqr(5.0)
Next

TempsEcoule1 = ElapsedMilliseconds() - TempsDepart1


MessageRequester("Temps", "FastInverseSquareRoot(5.0) -> (" + StrF(A, 8) + ") -> " + Str(TempsEcoule0) + #LF$ + "1.0 / Sqr(5.0) -> (" + StrF(B, 8) + ") -> " + Str(TempsEcoule1))
In terms of the result the values are close enough but way too slow to be practical. So ...

Best regards
StarBootics
The Stone Age did not end due to a shortage of stones !
User avatar
luis
Addict
Addict
Posts: 3876
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Quake III - FastInverseSquareRoot

Post by luis »

As suggested in my post, and suggested by HeXOR code, calling a function twice in that code is the worst thing you can do.

Also Peek* is probably faster because is not general purpose like CopyMemory.

But even doing the best you can do would change the result only marginally anyway.

https://levelup.gitconnected.com/death- ... fd2eadd7b7
"Have you tried turning it off and on again ?"
A little PureBasic review
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Quake III - FastInverseSquareRoot

Post by StarBootics »

Hello again,

Some speed test here with all proposed solutions.

Code: Select all


Procedure.f FastInverseSquareRootPeek(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  i.l = PeekL(@y)
  i = $5f3759df - (i >> 1)
  y = PeekF(@i)
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

Structure _H_
	StructureUnion
		l.l
		f.f
	EndStructureUnion
EndStructure

Procedure.f FastInverseSquareRootStruct(Number.f)
  
  Protected *i._H_, x2.f, y.f
  
  x2 = Number * 0.5
  y = Number
  *i = @y
  *i\l = $5f3759df - (*i\l >> 1)
  y = *i\f
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

Procedure.f FastInverseSquareRootCopyMem(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  
  CopyMemory(@y, @i.l, 4)

  i = $5f3759df - (i >> 1)
  
  CopyMemory(@i, @y, 4)
  
  y = y * (1.5 - (x2 * y * y))
  ;y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

TempsDepart0 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootStruct(5.0)
Next

TempsEcoule0 = ElapsedMilliseconds() - TempsDepart0

TempsDepart1 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootCopyMem(5.0)
Next

TempsEcoule1 = ElapsedMilliseconds() - TempsDepart1



TempsDepart2 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootPeek(5.0)
Next

TempsEcoule2 = ElapsedMilliseconds() - TempsDepart2

TempsDepart3 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  B.f = 1.0 / Sqr(5.0)
Next

TempsEcoule3 = ElapsedMilliseconds() - TempsDepart3


MessageRequester("Temps FastInverseSquareRoot(5.0) vs 1.0 / Sqr(5.0)", "With Struct Union -> " + Str(TempsEcoule0) + #LF$ +
                                                                    "With Copy Memory -> " + Str(TempsEcoule1) + #LF$ +
                                                                    "With PeekL/F -> " + Str(TempsEcoule2) + #LF$ +
                                                                    "Reference -> " + Str(TempsEcoule3) + #LF$)
There is the result I get :

With Struct Union -> 193 ms
With Copy Memory -> 189 ms
With PeekL/F -> 175 ms
Reference -> 32 ms

1.0 / Sqr(5.0) is faster, no performance gain at all with FastInverseSquareRoot() so ...

Best regards
StarBootics
The Stone Age did not end due to a shortage of stones !
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Quake III - FastInverseSquareRoot

Post by STARGÅTE »

Code: Select all

Procedure.f FastInverseSquareRoot(Number.f)
	! MOVSS   xmm0, [p.v_Number]
	! RSQRTSS xmm0, xmm0
	! MOVSS   [p.v_Number], xmm0
	ProcedureReturn Number
EndProcedure
But using a fast inverse square root algorithm is not a good idea for normalizing vector, because this inverse root is not precise enough.
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Quake III - FastInverseSquareRoot

Post by StarBootics »

Hello once again

The speed test including the STARGATE ASM version :

Code: Select all

Procedure.f FastInverseSquareRootPeek(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  i.l = PeekL(@y)
  i = $5f3759df - (i >> 1)
  y = PeekF(@i)
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

Structure _H_
	StructureUnion
		l.l
		f.f
	EndStructureUnion
EndStructure

Procedure.f FastInverseSquareRootStruct(Number.f)
  
  Protected *i._H_, x2.f, y.f
  
  x2 = Number * 0.5
  y = Number
  *i = @y
  *i\l = $5f3759df - (*i\l >> 1)
  y = *i\f
  y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure

Procedure.f FastInverseSquareRootCopyMem(Number.f)
  
  x2.f = Number * 0.5
  y.f = Number
  
  CopyMemory(@y, @i.l, 4)

  i = $5f3759df - (i >> 1)
  
  CopyMemory(@i, @y, 4)
  
  y = y * (1.5 - (x2 * y * y))
  ;y = y * (1.5 - (x2 * y * y))
  
  ProcedureReturn y
EndProcedure


Procedure.f FastInverseSquareRootASM(Number.f)
	! MOVSS   xmm0, [p.v_Number]
	! RSQRTPS xmm0, xmm0
	! MOVSS   [p.v_Number], xmm0
	ProcedureReturn Number
EndProcedure

TempsDepart0 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootStruct(5.0)
Next

TempsEcoule0 = ElapsedMilliseconds() - TempsDepart0

TempsDepart1 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootCopyMem(5.0)
Next

TempsEcoule1 = ElapsedMilliseconds() - TempsDepart1



TempsDepart2 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootPeek(5.0)
Next

TempsEcoule2 = ElapsedMilliseconds() - TempsDepart2

TempsDepart3 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  A.f = FastInverseSquareRootASM(5.0)
Next

TempsEcoule3 = ElapsedMilliseconds() - TempsDepart3

TempsDepart4 = ElapsedMilliseconds()

For TestID = 0 To 10000000
  B.f = 1.0 / Sqr(5.0)
Next

TempsEcoule4 = ElapsedMilliseconds() - TempsDepart4


MessageRequester("Temps FastInverseSquareRoot(5.0) vs 1.0 / Sqr(5.0)", "With Struct Union -> " + Str(TempsEcoule0) + #LF$ +
                                                                       "With Copy Memory -> " + Str(TempsEcoule1) + #LF$ +
                                                                       "With PeekL/F -> " + Str(TempsEcoule2) + #LF$ +
                                                                       "With ASM -> " + Str(TempsEcoule3) + #LF$ +
                                                                       "Reference -> " + Str(TempsEcoule4) + #LF$)
There is the result I get :

With Struct Union -> 186 ms
With Copy Memory -> 187 ms
With PeekL/F -> 183 ms
With ASM -> 76 ms
Reference -> 33 ms

Again no gain at all.

Best regards
StarBootics
The Stone Age did not end due to a shortage of stones !
User avatar
STARGÅTE
Addict
Addict
Posts: 2067
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke
Contact:

Re: Quake III - FastInverseSquareRoot

Post by STARGÅTE »

On my machine (Ryzen 9 3900X) ASM is the fastest:
---------------------------
Temps FastInverseSquareRoot(5.0) vs 1.0 / Sqr(5.0)
---------------------------
With Struct Union -> 101
With Copy Memory -> 321
With PeekL/F -> 125
With ASM -> 30
Reference -> 40

---------------------------
OK
---------------------------
PB 6.01 ― Win 10, 21H2 ― Ryzen 9 3900X, 32 GB ― NVIDIA GeForce RTX 3080 ― Vivaldi 6.0 ― www.unionbytes.de
Lizard - Script language for symbolic calculations and moreTypeface - Sprite-based font include/module
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Quake III - FastInverseSquareRoot

Post by StarBootics »

STARGÅTE wrote: Mon Oct 25, 2021 1:02 pm On my machine (Ryzen 9 3900X) ASM is the fastest:
My machine : AMD® Ryzen 7 5800x 8-core processor × 16
The Stone Age did not end due to a shortage of stones !
User avatar
luis
Addict
Addict
Posts: 3876
Joined: Wed Aug 31, 2005 11:09 pm
Location: Italy

Re: Quake III - FastInverseSquareRoot

Post by luis »

Using modern CPU instructions is faster on my CPU too (AMD Ryzen 5 3600)

PB5.73 x86
With Struct Union -> 70
With Copy Memory -> 147
With PeekL/F -> 71
With ASM -> 18
Reference -> 40

PB5.73 x64
With Struct Union -> 103
With Copy Memory -> 328
With PeekL/F -> 128
With ASM -> 31
Reference -> 40
"Have you tried turning it off and on again ?"
A little PureBasic review
User avatar
StarBootics
Addict
Addict
Posts: 984
Joined: Sun Jul 07, 2013 11:35 am
Location: Canada

Re: Quake III - FastInverseSquareRoot

Post by StarBootics »

The little gain in performance and the lost of precision don't worth the effort.

The only way to have the same precision it's to do the Newton Iteration (y = y * (1.5 - (x2 * y * y))) twice.

So I'm not going to change anything to my Vector Normalizing procedure.

Thanks to everyone.
StarBootics
The Stone Age did not end due to a shortage of stones !
Post Reply