PureBasic Forum http://forums.purebasic.com/english/ |
|
FPU how to cleanup register? http://forums.purebasic.com/english/viewtopic.php?f=35&t=74625 |
Page 1 of 1 |
Author: | Mijikai [ Sat Feb 15, 2020 1:59 pm ] |
Post subject: | FPU how to cleanup register? |
I want to calculate timing differences but when i use my code without finit it works only a few times in succesion? I suspect that my code has an issue. What can i do/ clean up to make it work without finit? Code: Code: proc Timing
finit fld qword[rcx+0x20] fild qword[rcx+0x30] fild qword[rcx+0x38] fsub st0,st1 fdiv st0,st2 fstp qword[rcx+0x40];<- this pops st0 ret endp |
Author: | wilbert [ Sat Feb 15, 2020 2:14 pm ] |
Post subject: | Re: FPU how to cleanup register? |
You load three values onto the fpu stack and only remove one. That won't work. Use fsubp and fdivp instead of fsub and fdiv. |
Author: | Mijikai [ Sat Feb 15, 2020 2:55 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Thanks wilbert ![]() I changed it and it works but im still not sure if it is correct? Code: Code: proc Timing
fild qword[rcx+0x38] fild qword[rcx+0x30] fsubp fld qword[rcx+0x20] fdivp fstp qword[rcx+0x40] ret endp |
Author: | wilbert [ Sat Feb 15, 2020 4:05 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Mijikai wrote: I changed it and it works but im still not sure if it is correct? The code looks fine to me. |
Author: | Mijikai [ Sat Feb 15, 2020 6:22 pm ] |
Post subject: | Re: FPU how to cleanup register? |
wilbert wrote: Mijikai wrote: I changed it and it works but im still not sure if it is correct? The code looks fine to me. Thank you ![]() I also found no more issues while testing. I hardly touch the fpu. |
Author: | wilbert [ Sat Feb 15, 2020 7:14 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Mijikai wrote: I hardly touch the fpu. In most cases you can use SSE2 instructions if you don't like to use the x87 ones. By the way, if the value you are using for division is a constant, it is a lot faster to calculate the reciprocal once and multiply by that. |
Author: | Mijikai [ Sat Feb 15, 2020 7:27 pm ] |
Post subject: | Re: FPU how to cleanup register? |
wilbert wrote: Mijikai wrote: I hardly touch the fpu. In most cases you can use SSE2 instructions if you don't like to use the x87 ones. By the way, if the value you are using for division is a constant, it is a lot faster to calculate the reciprocal once and multiply by that. Sounds great but i have no idea what a reciprocal is ![]() [rcx+0x20] in my case is static. Is it just: Code: Constant = (1 / Constant)
Result = Input * Constant |
Author: | wilbert [ Sat Feb 15, 2020 7:58 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Mijikai wrote: Is it just: Code: Constant = (1 / Constant) Result = Input * Constant Yes, that is it ![]() Multiply is a much faster cpu operation compared to division. |
Author: | Helle [ Sat Feb 15, 2020 8:23 pm ] |
Post subject: | Re: FPU how to cleanup register? |
With this you can check the FPU-Registers: Code: Procedure FPU()
Global.q SaveRam, SaveRamAli, Ali Global.d FST0, FST1, FST2, FST3, FST4, FST5, FST6, FST7 Global Tag.b Protected Tag0$=" Free", Tag1$=" Free", Tag2$=" Free", Tag3$=" Free", Tag4$=" Free", Tag5$=" Free", Tag6$=" Free", Tag7$=" Free" SaveRam = AllocateMemory(512+64) ;512 Byte FXSAVE Ali = SaveRam % 64 SaveRamAli = SaveRam + Ali !mov rsi,[v_SaveRamAli] !fxsave64 [rsi] !mov al,[rsi+4] ;Tag !mov [v_Tag],al !test [v_Tag],1 !jz @f Tag7$=" Busy" !@@: !test [v_Tag],2 !jz @f Tag6$=" Busy" !@@: !test [v_Tag],4 !jz @f Tag5$=" Busy" !@@: !test [v_Tag],8 !jz @f Tag4$=" Busy" !@@: !test [v_Tag],10h !jz @f Tag3$=" Busy" !@@: !test [v_Tag],20h !jz @f Tag2$=" Busy" !@@: !test [v_Tag],40h !jz @f Tag1$=" Busy" !@@: !test [v_Tag],80h !jz @f Tag0$=" Busy" !@@: !mov rsi,[v_SaveRamAli] !fld tword[rsi+32] !fstp qword[v_FST0] !fld tword[rsi+48] !fstp qword[v_FST1] !fld tword[rsi+64] !fstp qword[v_FST2] !fld tword[rsi+80] !fstp qword[v_FST3] !fld tword[rsi+96] !fstp qword[v_FST4] !fld tword[rsi+112] !fstp qword[v_FST5] !fld tword[rsi+128] !fstp qword[v_FST6] !fld tword[rsi+144] !fstp qword[v_FST7] Reg$ = "ST0 = " + StrD(FST0, 20) + Tag0$ + #LFCR$ Reg$ + "ST1 = " + StrD(FST1, 20) + Tag1$ + #LFCR$ Reg$ + "ST2 = " + StrD(FST2, 20) + Tag2$ + #LFCR$ Reg$ + "ST3 = " + StrD(FST3, 20) + Tag3$ + #LFCR$ Reg$ + "ST4 = " + StrD(FST4, 20) + Tag4$ + #LFCR$ Reg$ + "ST5 = " + StrD(FST5, 20) + Tag5$ + #LFCR$ Reg$ + "ST6 = " + StrD(FST6, 20) + Tag6$ + #LFCR$ Reg$ + "ST7 = " + StrD(FST7, 20) + Tag7$ FreeMemory(SaveRam) MessageRequester("FPU-Register", Reg$) EndProcedure ;Test Data1.d = 20.0 Data2.q = 30 Data3.q = 40 Result.d ;FPU() ;!finit FPU() !fild qword[v_Data3] FPU() !fild qword[v_Data2] FPU() !fsubp FPU() !fdiv qword[v_Data1] FPU() !fstp qword[v_Result] FPU() Debug Result |
Author: | Mijikai [ Sat Feb 15, 2020 8:40 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Thanks wilbert & Helle ![]() That is really helpful Helle ![]() now i can easily see my mistakes XD I created a small timer library in fasm to calculate FPS, frametime and deltatime. I would post the source for everyone if i get some help in optimizing/ making it better. Its not that big (~400 lines) and it might become a nice tutorial for anyone that is interested in asm ![]() |
Author: | Helle [ Sun May 03, 2020 5:58 pm ] |
Post subject: | Re: FPU how to cleanup register? |
Corona: No pubs, no bars... Much time for program -> corrected version (physical register): Code: Procedure FPU(FPU_ID)
Protected.q SaveRam, SaveRamAli, C1, PhysReg, RFlags, CW, SW, Tags Protected.d FST0, FST1, FST2, FST3, FST4, FST5, FST6, FST7 Protected.s sN = "Clear", sY = "Set", sSW = "S E T !", sTag0 = "Free", sTag1 = "Free", sTag2 = "Free", sTag3 = "Free", sTag4 = "Free", sTag5 = "Free", sTag6 = "Free", sTag7 = "Free", sOcc = "Occupied" Protected.s sTag00, sTag01, sTag02, sTag03, sTag04, sTag05, sTag06, sTag07 Protected.s sSF = " Stack Fault: ", sSet = "Settings:", sStat = "Status:", sEF = " Exception Flags:", sCCF = " Condition Code Flags:", sC0 = " C0: ", sC1 = " C1: ", sC2 = " C2: ", sC3 = " C3: " Protected.s sST0 = "ST0 = ", sST1 = "ST1 = ", sST2 = "ST2 = ", sST3 = "ST3 = ", sST4 = "ST4 = ", sST5 = "ST5 = ", sST6 = "ST6 = ", sST7 = "ST7 = ", sTS = " Top of Stack: Physical Register R" Protected.s sEM = " Exception Masks:", sIO = " Invalid Operation: ", sDO = " Denormal Operand: ", sZD = " Zero Divide: ", sOF = " Overflow: ", sUF = " Underflow: ", sPR = " Precision: " Protected.s sPrec = " Precision: ", sRound = " Rounding: Round ", sR1 = "to nearest (even)", sR2 = "down (toward -Infinity)", sR3 = "up (toward +Infinity)", sR4 = "toward zero (Truncate)" Protected.s sPR1 = " (Low Precision! Inexact Result!)", sWarn = " W A R N I N G ! ! !", sPrecE = "Double Extended Precision 64 Bits", sPrecD = "Double Precision 53 Bits", sPrecS = "Single Precision 24 Bits" Protected.s sReg, sReport, sIOM, sIOE, sDOM, sDOE, sZDM, sZDE, sOFM, sOFE, sUFM, sUFE, sPRM, sPRE !pushfq ;save RFlag-Registers; FCOMI(P) etc! !pop qword[p.v_RFlags] SaveRam = AllocateMemory(512 + 64) ;512 Byte for FXSAVE SaveRamAli = SaveRam + (SaveRam % 64) !mov r10,[p.v_SaveRamAli] !fxsave64 [r10] !mov ax,[r10] !mov [p.v_CW],rax ;Control Word !mov ax,[r10+2] !mov [p.v_SW],rax ;Status Word !mov al,[r10+4] !mov [p.v_Tags],rax ;Tags !fld tword[r10+32] ;Values !fstp qword[p.v_FST0] !fld tword[r10+48] !fstp qword[p.v_FST1] !fld tword[r10+64] !fstp qword[p.v_FST2] !fld tword[r10+80] !fstp qword[p.v_FST3] !fld tword[r10+96] !fstp qword[p.v_FST4] !fld tword[r10+112] !fstp qword[p.v_FST5] !fld tword[r10+128] !fstp qword[p.v_FST6] !fld tword[r10+144] !fstp qword[p.v_FST7] ;Control Word !test word[p.v_CW],100h ;FPUCW Precision, Bits 8 and 9 !jnz .ext !test word[p.v_CW],200h !jnz .double sPrec + sPrecS ;Single Precision 24 Bits !jmp .precend !.double: sPrec + sPrecD ;Double Precision 53 Bits !jmp .precend !.ext: sPrec + sPrecE ;Double Extended Precision 64 Bits !.precend: sPrec + #LFCR$ !test word[p.v_CW],0c00h ;FPUCW Rounding, Bits 10 and 11 !jz .round1 ;00 !test word[p.v_CW],0800h !jz .round2 ;01 !test word[p.v_CW],0400h !jz .round3 ;10 sRound + sR4 ;11 toward zero (Truncate) !jmp .roundend !.round3: sRound + sR3 ;up (toward +Infinity) !jmp .roundend !.round2: sRound + sR2 ;down (toward -Infinity) !jmp .roundend !.round1: sRound + sR1 ;to nearest (even) !.roundend: sRound + #LFCR$ !test word[p.v_CW],1 ;FPUCW Invalid Operation !jz @f sIOM = sIO + sY !jmp .IOM !@@: sIOM = sIO + sN !.IOM: !test word[p.v_CW],2 ;FPUCW Denormal Operand !jz @f sDOM = sDO + sY !jmp .DO !@@: sDOM = sDO + sN !.DO: !test word[p.v_CW],4 ;FPUCW Zero Divide !jz @f sZDM = sZD + sY !jmp .ZD !@@: sZDM = sZD + sN !.ZD: !test word[p.v_CW],8 ;FPUCW Overflow !jz @f sOFM = sOF + sY !jmp .OF !@@: sOFM = sOF + sN !.OF: !test word[p.v_CW],10h ;FPUCW Underflow !jz @f sUFM = sUF + sY !jmp .UF !@@: sUFM = sUF + sN !.UF: !test word[p.v_CW],20h ;FPUCW Precision !jz @f sPRM = sPR + sY !jmp .PM !@@: sPRM = sPR + sN !.PM: ;Status Word !mov ax,[p.v_SW] ;FPUSW PhysReg TOS !shr rax,0bh ;shift in Bit 0-2 !and rax,111b !mov [p.v_PhysReg],rax sTS + Str(PhysReg) + #LFCR$ !test word[p.v_SW],1 ;FPUSW Invalid Operation !jz @f sIOE = sIO + sSW + sWarn !jmp .IOE !@@: sIOE = sIO + sN !.IOE: !test word[p.v_SW],2 ;FPUSW Denormal Operand !jz @f sDOE = sDO + sSW + sWarn !jmp .DOE !@@: sDOE = sDO + sN !.DOE: !test word[p.v_SW],4 ;FPUSW Zero Divide !jz @f sZDE = sZD + sSW + sWarn !jmp .ZDE !@@: sZDE = sZD + sN !.ZDE: !test word[p.v_SW],8 ;FPUSW Overflow !jz @f sOFE = sOF + sSW + sWarn !jmp .OFE !@@: sOFE = sOF + sN !.OFE: !test word[p.v_SW],10h ;FPUSW Underflow !jz @f sUFE = sUF + sSW + sWarn !jmp .UFE !@@: sUFE = sUF + sN !.UFE: !test word[p.v_SW],100h ;FPUSW C0 !jz @f sC0 + sSW !jmp .C0E !@@: sC0 + sN !.C0E: !test word[p.v_SW],200h ;FPUSW C1 !jz @f C1 = 1 ;for Stack Fault sC1 + sSW !jmp .C1E !@@: sC1 + sN !.C1E: !test word[p.v_SW],400h ;FPUSW C2 !jz @f sC2 + sSW !jmp .C2E !@@: sC2 + sN !.C2E: !test word[p.v_SW],4000h ;FPUSW C3 !jz @f sC3 + sSW !jmp .C3E !@@: sC3 + sN !.C3E: !test word[p.v_SW],20h ;FPUSW Precision !jz @f sPRE = sPR + sSW + sPR1 !jmp .PRE !@@: sPRE = sPR + sN !.PRE: !test word[p.v_SW],40h ;FPUSW Stack Fault !jz @f sSF + sSW ;If Invalid Operation If C1 sSF + sOF + sWarn Else sSF + sUF + sWarn EndIf ;EndIf !jmp .SF !@@: sSF + sN !.SF: ;Tags !test byte[p.v_Tags],1 !jz @f sTag0 = sOcc !@@: !test byte[p.v_Tags],2 !jz @f sTag1 = sOcc !@@: !test byte[p.v_Tags],4 !jz @f sTag2 = sOcc !@@: !test byte[p.v_Tags],8 !jz @f sTag3 = sOcc !@@: !test byte[p.v_Tags],10h !jz @f sTag4 = sOcc !@@: !test byte[p.v_Tags],20h !jz @f sTag5 = sOcc !@@: !test byte[p.v_Tags],40h !jz @f sTag6 = sOcc !@@: !test byte[p.v_Tags],80h !jz @f sTag7 = sOcc !@@: Select PhysReg ;TOS, better and more correctly infos Case 0 sST0 + "R0 = " : sTag00 = sTag0 sST1 + "R1 = " : sTag01 = sTag1 sST2 + "R2 = " : sTag02 = sTag2 sST3 + "R3 = " : sTag03 = sTag3 sST4 + "R4 = " : sTag04 = sTag4 sST5 + "R5 = " : sTag05 = sTag5 sST6 + "R6 = " : sTag06 = sTag6 sST7 + "R7 = " : sTag07 = sTag7 Case 1 sST0 + "R1 = " : sTag00 = sTag1 sST1 + "R2 = " : sTag01 = sTag2 sST2 + "R3 = " : sTag02 = sTag3 sST3 + "R4 = " : sTag03 = sTag4 sST4 + "R5 = " : sTag04 = sTag5 sST5 + "R6 = " : sTag05 = sTag6 sST6 + "R7 = " : sTag06 = sTag7 sST7 + "R0 = " : sTag07 = sTag0 Case 2 sST0 + "R2 = " : sTag00 = sTag2 sST1 + "R3 = " : sTag01 = sTag3 sST2 + "R4 = " : sTag02 = sTag4 sST3 + "R5 = " : sTag03 = sTag5 sST4 + "R6 = " : sTag04 = sTag6 sST5 + "R7 = " : sTag05 = sTag7 sST6 + "R0 = " : sTag06 = sTag0 sST7 + "R1 = " : sTag07 = sTag1 Case 3 sST0 + "R3 = " : sTag00 = sTag3 sST1 + "R4 = " : sTag01 = sTag4 sST2 + "R5 = " : sTag02 = sTag5 sST3 + "R6 = " : sTag03 = sTag6 sST4 + "R7 = " : sTag04 = sTag7 sST5 + "R0 = " : sTag05 = sTag0 sST6 + "R1 = " : sTag06 = sTag1 sST7 + "R2 = " : sTag07 = sTag2 Case 4 sST0 + "R4 = " : sTag00 = sTag4 sST1 + "R5 = " : sTag01 = sTag5 sST2 + "R6 = " : sTag02 = sTag6 sST3 + "R7 = " : sTag03 = sTag7 sST4 + "R0 = " : sTag04 = sTag0 sST5 + "R1 = " : sTag05 = sTag1 sST6 + "R2 = " : sTag06 = sTag2 sST7 + "R3 = " : sTag07 = sTag3 Case 5 sST0 + "R5 = " : sTag00 = sTag5 sST1 + "R6 = " : sTag01 = sTag6 sST2 + "R7 = " : sTag02 = sTag7 sST3 + "R0 = " : sTag03 = sTag0 sST4 + "R1 = " : sTag04 = sTag1 sST5 + "R2 = " : sTag05 = sTag2 sST6 + "R3 = " : sTag06 = sTag3 sST7 + "R4 = " : sTag07 = sTag4 Case 6 sST0 + "R6 = " : sTag00 = sTag6 sST1 + "R7 = " : sTag01 = sTag7 sST2 + "R0 = " : sTag02 = sTag0 sST3 + "R1 = " : sTag03 = sTag1 sST4 + "R2 = " : sTag04 = sTag2 sST5 + "R3 = " : sTag05 = sTag3 sST6 + "R4 = " : sTag06 = sTag4 sST7 + "R5 = " : sTag07 = sTag5 Case 7 sST0 + "R7 = " : sTag00 = sTag7 sST1 + "R0 = " : sTag01 = sTag0 sST2 + "R1 = " : sTag02 = sTag1 sST3 + "R2 = " : sTag03 = sTag2 sST4 + "R3 = " : sTag04 = sTag3 sST5 + "R4 = " : sTag05 = sTag4 sST6 + "R5 = " : sTag06 = sTag5 sST7 + "R6 = " : sTag07 = sTag6 EndSelect sReg = sST0 + StrD(FST0, 18) + Space(4) + sTag00 + #LFCR$ sReg + sST1 + StrD(FST1, 18) + Space(4) + sTag01 + #LFCR$ sReg + sST2 + StrD(FST2, 18) + Space(4) + sTag02 + #LFCR$ sReg + sST3 + StrD(FST3, 18) + Space(4) + sTag03 + #LFCR$ sReg + sST4 + StrD(FST4, 18) + Space(4) + sTag04 + #LFCR$ sReg + sST5 + StrD(FST5, 18) + Space(4) + sTag05 + #LFCR$ sReg + sST6 + StrD(FST6, 18) + Space(4) + sTag06 + #LFCR$ sReg + sST7 + StrD(FST7, 18) + Space(4) + sTag07 sEM + #LFCR$ + sIOM + #LFCR$ + sDOM + #LFCR$ + sZDM + #LFCR$ + sOFM + #LFCR$ + sUFM + #LFCR$ + sPRM + #LFCR$ ;Exception Masks sCCF + #LFCR$ + sC0 + #LFCR$ + sC1 + #LFCR$ + sC2 + #LFCR$ + sC3 + #LFCR$ ;Condition Code Flags sEF + #LFCR$ + sIOE + #LFCR$ + sDOE + #LFCR$ + sZDE + #LFCR$ + sOFE + #LFCR$ + sUFE + #LFCR$ + sPRE ;Exception Flags sReport = sSet + #LFCR$ + sPrec + sRound + sEM + #LFCR$ + sStat + #LFCR$ + sCCF + sEF + #LFCR$ + sSF + #LFCR$ + sTS + #LFCR$ + sReg MessageRequester("FPU (" + Str(FPU_ID) + ")", sReport) FreeMemory(SaveRam) !push qword[p.v_RFlags] !popfq EndProcedure ;Test ArcusCosinus(Data1) Result.d Data1.d = 0.1 ;FPU(0) ; !fninit FPU(1) !fld1 FPU(2) !lea rsi,[v_Data1] !fld qword[rsi] FPU(3) !fst st2 FPU(4) !fmul st0,st0 FPU(5) !fsubp FPU(6) !fsqrt FPU(7) !fxch FPU(8) !fpatan FPU(9) !lea rsi,[v_Result] !fstp qword[rsi] ;Result FPU(10) Debug Result End |
Page 1 of 1 | All times are UTC + 1 hour |
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group http://www.phpbb.com/ |