PeekS with wrong Byte1 UTF-8 Codes or BOM

juergenkulow
Enthusiast
Posts: 540
Joined: Wed Sep 25, 2019 10:18 am

PeekS with wrong Byte1 UTF-8 Codes or BOM

Post by juergenkulow »

Code: Select all

; PeekS Label arithmetic with Debugger Linux ASM-Backend 6.01 beta 4 
s.s=PeekS(?start,?stop-?start,#PB_UTF8) ; Text$ = PeekS(*MemoryBuffer [, Length [, Format]])
DataSection
  start:
  ; IncludeBinary "/tmp/test.txt" ; works
  IncludeBinary "/media/kulow/BCE4-8C0F/STEP_EXPRESS/PyraKeywords.pb" ; does not work.
  stop: 
EndDataSection
Debug ?Stop-?Start
Debug Len(s)
Debug s
CompilerIf #PB_Compiler_Debugger<>1 Or #PB_Compiler_Backend<>#PB_Backend_Asm Or #PB_Compiler_OS<>#PB_OS_Linux
  CompilerError "Please use the debugger, the ASM backend and Linux"
CompilerEndIf 
SetClipboardText(s+Str(?Stop-?Start))
; PureBasic 6.01 LTS beta 4 (Linux - x64)
; Loading external modules...
; Starting compilation...
; Starting compilation...
; 20 lines processed.
; Creating the executable.
; 
; - Feel the ..PuRe.. Power -
; 
; [Debugger]  2273
; [Debugger]  0
; [Debugger]  
; ; s.s=PeekS(?start,?stop-?start,#PB_UTF8)
;   MOV    rdi,1
;   CALL   DBL
;   PUSH   qword [PB_StringBasePosition]
;   SUB    rsp,8
;   PUSH   qword [PB_StringBasePosition]
;   PUSH   qword 2
;   MOV    rbp,l_stop
;   MOV    r15,rbp
;   MOV    rbp,l_start
;   SUB    r15,rbp
;   MOV    rax,r15
;   PUSH   rax
;   MOV    rbp,l_start
;   MOV    rax,rbp
;   PUSH   rax
;   POP    rdi
;   POP    rsi
;   POP    rdx
;   POP    rcx
;   MOV    r10,PB_PeekS3_DEBUG
;   MOV    r11,0
;   CALL  _DBG_CallDebug
;   CALL   PB_PeekS3
;   ADD    rsp,8
;   LEA    rdi,[v_s]
;   POP    rsi
;   CALL   SYS_AllocateString4
/media/kulow/BCE4-8C0F/STEP_EXPRESS/PyraKeywords.pb:

Code: Select all

ADVANCED_BREP_SHAPE_REPRESENTATION('',(#173),#273);
ADVANCED_FACE('',(#134),#144,.T.);
APPLICATION_PROTOCOL_DEFINITION('international standard','ap242_managed_model_based_3d_engineering',2011,#287);
APPLICATION_CONTEXT('managed model based 3d engineering');
AXIS2_PLACEMENT_3D('',#229,#187,#188);
CARTESIAN_POINT('',(0.,0.,0.))       ;
CLOSED_SHELL('',(#154,#155,#156,#157,#158,#159,#160,#161,#162,#163));
COLOUR_RGB('',0.615686274509804,0.811764705882353,0.929411764705882);
DIRECTION('',(0.,0.,1.));
EDGE_CURVE('',#72,#73,#84,.T.);
EDGE_LOOP('',(#12,#13,#14,#15));
FACE_BOUND('',#124,.T.)        ;
FILL_AREA_STYLE('',(#171));
FILL_AREA_STYLE_COLOUR('',#172);
Line('',#231,#104);
MANIFOLD_SOLID_BREP('Part 1',#164)                                  ;
MECHANICAL_DESIGN_GEOMETRIC_PRESENTATION_REPRESENTATION('',(#165),#273);
ORIENTED_EDGE('',*,*,#52,.F.)                                ;
PLANE('',#177);
PRESENTATION_STYLE_ASSIGNMENT((#167));
PRODUCT('Part 1','Part 1','Part 1',(#285));
PRODUCT_CATEGORY('','');
PRODUCT_CONTEXT('',#287,'mechanical');
PRODUCT_DEFINITION_SHAPE('','',#279);
PRODUCT_DEFINITION('','',#281,#280);
PRODUCT_DEFINITION_CONTEXT('',#287,'design');
PRODUCT_DEFINITION_FORMATION_WITH_SPECIFIED_SOURCE('','',#283,.NOT_KNOWN.);
PRODUCT_RELATED_PRODUCT_CATEGORY('','',(#283));
SHAPE_DEFINITION_REPRESENTATION(#278,#175);
SHAPE_REPRESENTATION('Part 1',(#176),#273);
SHAPE_REPRESENTATION_RELATIONSHIP('','',#175,#11);
STYLED_ITEM('',(#166),#173)                      ;
SURFACE_SIDE_STYLE('',(#169))                    ;
SURFACE_STYLE_FILL_AREA(#170);
SURFACE_STYLE_USAGE(.BOTH.,#168);
UNCERTAINTY_MEASURE_WITH_UNIT(LENGTH_MEASURE(5.E-6),#277,'DISTANCE_ACCURACY_VALUE','Maximum Tolerance applied to model');
VECTOR('',#191,1.);
VERTEX_POINT('',#232)                                          ;

(
GEOMETRIC_REPRESENTATION_CONTEXT(3)
GLOBAL_UNCERTAINTY_ASSIGNED_CONTEXT((#274))
GLOBAL_UNIT_ASSIGNED_CONTEXT((#277,#276,#275))
REPRESENTATION_CONTEXT('Part 1','TOP_LEVEL_ASSEMBLY_PART')
);
(
LENGTH_UNIT()
NAMED_UNIT(*)
SI_UNIT($,.METRE.)
);
(
NAMED_UNIT(*)
PLANE_ANGLE_UNIT()
SI_UNIT($,.RADIAN.)
);
(
NAMED_UNIT(*)
SI_UNIT($,.STERADIAN.)
SOLID_ANGLE_UNIT()
);
Edit Add ;
Edit Title
Last edited by juergenkulow on Mon Feb 27, 2023 10:34 am, edited 2 times in total.
Demivec
Addict
Posts: 4082
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: PeekS Label arithmetic with Debugger

Post by Demivec »

Is there a byte order mark (BOM) at the beginning of /media/kulow/BCE4-8C0F/STEP_EXPRESS/PyraKeywords.pb?
Last edited by Demivec on Wed Feb 22, 2023 10:13 pm, edited 1 time in total.
infratec
Always Here
Posts: 6810
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: PeekS Label arithmetic with Debugger

Post by infratec »

If you use:

Code: Select all

PeekS(?start,?stop-?start,#PB_UTF8)
you should always do it like this:

Code: Select all

PeekS(?start,?stop-?start,#PB_UTF8|#PB_ByteLength)
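Otherwise you can get wrong results: ?stop-?start is a length in bytes, but without #PB_ByteLength PeekS treats Length as a number of characters.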
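A minimal sketch of that difference, using inline Data.a bytes instead of IncludeBinary so that it is self-contained. The labels utf8_start/utf8_stop and the sample text are assumptions for illustration only. The seven bytes encode the five characters of "Grüße", so the label difference is a byte count, not a character count:

```purebasic
; Sketch: ?utf8_stop - ?utf8_start is a length in BYTES.
; #PB_ByteLength tells PeekS to interpret Length that way.
DataSection
  utf8_start:
  Data.a $47, $72, $C3, $BC, $C3, $9F, $65   ; "Grüße" as UTF-8 (7 bytes, 5 characters)
  utf8_stop:
  Data.a 0                                   ; terminator, not part of the text
EndDataSection

byteLen = ?utf8_stop - ?utf8_start
text$ = PeekS(?utf8_start, byteLen, #PB_UTF8 | #PB_ByteLength)

Debug byteLen      ; number of bytes between the labels
Debug Len(text$)   ; number of characters in the decoded string
Debug text$
```

For pure ASCII files the two counts are the same, which may be why the problem only shows up with some files.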
NicTheQuick
Addict
Posts: 1218
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken

Re: PeekS Label arithmetic with Debugger

Post by NicTheQuick »

I do not understand anything in this Bug report. Read this first before posting any bugs: viewtopic.php?t=28694
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.

Re: PeekS Label arithmetic with Debugger

Post by juergenkulow »

PeekS does not always return a string with the contents of the IncludeBinary file.
With the debugger turned off, PeekS works; with it turned on, I get the error.
With the C backend, PeekS works; with the ASM backend, I get the error.
With other files, PeekS works. What is the cause of the error?

Re: PeekS Label arithmetic with Debugger

Post by infratec »

As already written:

If the file has a BOM at the front, you have to skip it.

https://en.wikipedia.org/wiki/Byte_order_mark

Otherwise you get wrong characters at the beginning of the text.
And if the file is not ASCII or UTF-8, you also run into a problem.

You should definitely write a procedure that inspects the first bytes of the included file and sets the encoding for PeekS()
to the right value.
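Such a procedure could look roughly like the sketch below. DecodeWithBOM() is a hypothetical helper name, not a PureBasic function, and the labels bom_start/bom_stop are made up for the demo. It only recognizes the UTF-8 BOM (EF BB BF) and the UTF-16 little-endian BOM (FF FE); UTF-16 big-endian and UTF-32 are not handled, and a buffer without a BOM is simply assumed to be UTF-8:

```purebasic
; Sketch: pick the PeekS format from a leading byte order mark.
; DecodeWithBOM() is a hypothetical helper name, not a PureBasic function.
Procedure.s DecodeWithBOM(*mem, byteLen)
  If *mem = 0 Or byteLen <= 0
    ProcedureReturn ""
  EndIf
  ; UTF-8 BOM: EF BB BF
  If byteLen >= 3 And PeekA(*mem) = $EF And PeekA(*mem + 1) = $BB And PeekA(*mem + 2) = $BF
    If byteLen = 3 : ProcedureReturn "" : EndIf
    ProcedureReturn PeekS(*mem + 3, byteLen - 3, #PB_UTF8 | #PB_ByteLength)
  EndIf
  ; UTF-16 little-endian BOM: FF FE (Length counts 2-byte characters here)
  If byteLen >= 2 And PeekA(*mem) = $FF And PeekA(*mem + 1) = $FE
    If byteLen < 4 : ProcedureReturn "" : EndIf
    ProcedureReturn PeekS(*mem + 2, (byteLen - 2) >> 1, #PB_Unicode)
  EndIf
  ; No BOM recognized: assume UTF-8 (ASCII is a subset of it)
  ProcedureReturn PeekS(*mem, byteLen, #PB_UTF8 | #PB_ByteLength)
EndProcedure

DataSection
  bom_start:
  Data.a $EF, $BB, $BF, $48, $69   ; UTF-8 BOM followed by "Hi"
  bom_stop:
  Data.a 0                         ; terminator, not part of the text
EndDataSection

Debug DecodeWithBOM(?bom_start, ?bom_stop - ?bom_start)
```

With IncludeBinary the call would stay the same; only the two labels would enclose the included file instead of the Data.a line.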

Re: PeekS Label arithmetic with Debugger

Post by juergenkulow »

@infratec
Thank you for the BOM information. I work with the C backend, and the BOM causes no problems in my little program.

Re: PeekS with wrong Byte1 UTF-8 Codes or BOM

Post by juergenkulow »

Code: Select all

; PeekS(*MemoryBuffer,-1,#PB_UTF8) with wrong Byte1 UTF-8 Codes or BOM
; $80 to $BF and $F8 to $FF: the string assignment is cut off (only "_" remains).
; $C0, $E0, $F0 (i % $20 = 0): the string ends here as if terminated by a zero.
; Linux x64, 6.01 beta 4, ASM backend, debugger on
*p=UTF8("Hello   visible universe.")
*b.Byte=*p+6
For i=$80 To $ff
  *b\b=i
  s.s="_"+PeekS(*p,-1,#PB_UTF8)+" Phone home."
  Debug Hex(i)+":"+s
Next 
; 80:_
; 81:_
; 82:_
; 83:_
; 84:_
; 85:_
; 86:_
; 87:_
; 88:_
; 89:_
; 8A:_
; 8B:_
; 8C:_
; 8D:_
; 8E:_
; 8F:_
; 90:_
; 91:_
; 92:_
; 93:_
; 94:_
; 95:_
; 96:_
; 97:_
; 98:_
; 99:_
; 9A:_
; 9B:_
; 9C:_
; 9D:_
; 9E:_
; 9F:_
; A0:_
; A1:_
; A2:_
; A3:_
; A4:_
; A5:_
; A6:_
; A7:_
; A8:_
; A9:_
; AA:_
; AB:_
; AC:_
; AD:_
; AE:_
; AF:_
; B0:_
; B1:_
; B2:_
; B3:_
; B4:_
; B5:_
; B6:_
; B7:_
; B8:_
; B9:_
; BA:_
; BB:_
; BC:_
; BD:_
; BE:_
; BF:_
; C0:_Hello 
; C1:_Hello visible universe. Phone home.
; C2:_Hello visible universe. Phone home.
; C3:_Hello visible universe. Phone home.
; C4:_Hello visible universe. Phone home.
; C5:_Hello visible universe. Phone home.
; C6:_Hello visible universe. Phone home.
; C7:_Hello visible universe. Phone home.
; C8:_Hello visible universe. Phone home.
; C9:_Hello 	visible universe. Phone home.
; CA:_Hello 
; visible universe. Phone home.
; CB:_Hello visible universe. Phone home.
; CC:_Hello visible universe. Phone home.
; CD:_Hello 
; visible universe. Phone home.
; CE:_Hello visible universe. Phone home.
; CF:_Hello visible universe. Phone home.
; D0:_Hello visible universe. Phone home.
; D1:_Hello visible universe. Phone home.
; D2:_Hello visible universe. Phone home.
; D3:_Hello visible universe. Phone home.
; D4:_Hello visible universe. Phone home.
; D5:_Hello visible universe. Phone home.
; D6:_Hello visible universe. Phone home.
; D7:_Hello visible universe. Phone home.
; D8:_Hello visible universe. Phone home.
; D9:_Hello visible universe. Phone home.
; DA:_Hello visible universe. Phone home.
; DB:_Hello visible universe. Phone home.
; DC:_Hello visible universe. Phone home.
; DD:_Hello visible universe. Phone home.
; DE:_Hello visible universe. Phone home.
; DF:_Hello visible universe. Phone home.
; E0:_Hello 
; E1:_Hello visible universe. Phone home.
; E2:_Hello visible universe. Phone home.
; E3:_Hello visible universe. Phone home.
; E4:_Hello visible universe. Phone home.
; E5:_Hello visible universe. Phone home.
; E6:_Hello visible universe. Phone home.
; E7:_Hello visible universe. Phone home.
; E8:_Hello visible universe. Phone home.
; E9:_Hello 	visible universe. Phone home.
; EA:_Hello 
; visible universe. Phone home.
; EB:_Hello visible universe. Phone home.
; EC:_Hello visible universe. Phone home.
; ED:_Hello 
; visible universe. Phone home.
; EE:_Hello visible universe. Phone home.
; EF:_Hello visible universe. Phone home.
; F0:_Hello 
; F1:_Hello visible universe. Phone home.
; F2:_Hello visible universe. Phone home.
; F3:_Hello visible universe. Phone home.
; F4:_Hello visible universe. Phone home.
; F5:_Hello visible universe. Phone home.
; F6:_Hello visible universe. Phone home.
; F7:_Hello visible universe. Phone home.
; F8:_
; F9:_
; FA:_
; FB:_
; FC:_
; FD:_
; FE:_
; FF:_
Wiki UTF-8

Re: PeekS with wrong Byte1 UTF-8 Codes or BOM

Post by infratec »

juergenkulow wrote: Mon Feb 27, 2023 10:37 am

Code: Select all

*p=UTF8("Hello   visible universe.")
*b.Byte=*p+6
For i=$80 To $ff
  *b\b=i
  s.s="_"+PeekS(*p,-1,#PB_UTF8)+" Phone home."
  Debug Hex(i)+":"+s
Next 
Wiki UTF-8
Your code produces invalid UTF-8 (a byte from $80 to $FF followed by $20).

According to your link, $80 to $BF is not allowed as a single byte.
And if the first byte is >= $C0 and < $E0, then the second byte needs to be >= $80 and < $C0;
otherwise the UTF-8 is invalid.
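These rules can be checked before calling PeekS. The following is only a sketch: IsPlausibleUTF8() is a hypothetical helper name, it looks at the lead byte and the number of continuation bytes only, and it does not reject every ill-formed sequence (overlong 3- and 4-byte forms and surrogates pass). It is slightly stricter than the rule above in that $C0 and $C1 are refused as lead bytes, since they can only start overlong encodings:

```purebasic
; Sketch: structural UTF-8 check (lead byte + number of continuation bytes).
; IsPlausibleUTF8() is a hypothetical helper name, not a PureBasic function.
Procedure.i IsPlausibleUTF8(*mem, byteLen)
  Protected i.i, need.i, b.a
  While i < byteLen
    b = PeekA(*mem + i)
    If b < $80
      need = 0                      ; single-byte character (ASCII)
    ElseIf b >= $C2 And b <= $DF
      need = 1                      ; 2-byte sequence
    ElseIf b >= $E0 And b <= $EF
      need = 2                      ; 3-byte sequence
    ElseIf b >= $F0 And b <= $F4
      need = 3                      ; 4-byte sequence
    Else
      ProcedureReturn #False        ; $80..$C1 and $F5..$FF cannot start a sequence
    EndIf
    i + 1
    While need > 0
      If i >= byteLen : ProcedureReturn #False : EndIf          ; truncated sequence
      b = PeekA(*mem + i)
      If b < $80 Or b > $BF : ProcedureReturn #False : EndIf    ; not a continuation byte
      i + 1
      need - 1
    Wend
  Wend
  ProcedureReturn #True
EndProcedure

text$ = "Hello   visible universe."
byteLen = StringByteLength(text$, #PB_UTF8)   ; bytes without the terminating zero
*p = UTF8(text$)
Debug IsPlausibleUTF8(*p, byteLen)   ; untouched buffer
PokeA(*p + 6, $C3)                   ; lead byte followed by a space ($20)
Debug IsPlausibleUTF8(*p, byteLen)
FreeMemory(*p)
```

Every value from $80 to $FF written at offset 6 of the test buffer is followed by a space, so none of the 128 variants in the loop is valid UTF-8 by this check.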

Re: PeekS with wrong Byte1 UTF-8 Codes or BOM

Post by juergenkulow »

I don't want to have to write a .s PeekSUtf8(*mem,len) procedure myself, because in the wild a BOM is not always at the beginning, or junk gets generated. I want robust functions in the compiler.