Ide - changing file format - Ide open source project

Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm


Post by Josh »

As far as I remember, some Pb versions ago a bug was fixed that occurred when changing the file format from plain-text to Utf8 or vice versa. But there is still a bug with special characters. Set the file format to plain-text, insert the following code and change the file format to Utf8. You will see that the special characters in the range $80 - $9F are misinterpreted.


A_00_1F_$ = "................................" ; Control characters replaced by dots
A_20_3F_$ = " !.#$%&'()*+,-./0123456789:;<=>?" ; DQ replaced by dot
A_40_5F_$ = "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_"
A_60_7F_$ = "`abcdefghijklmnopqrstuvwxyz{|}~." ; DEL replaced by dot
A_80_9F_$ = "€.‚ƒ„…†‡ˆ‰Š‹Œ.Ž..‘’“”•–—.™š›œ.žŸ" ; Small tilde ($98) and undefined codes replaced by dots
A_A0_BF_$ = " ¡¢£¤¥¦§¨©ª«¬.®¯°±²³´µ¶·¸¹º»¼½¾¿" ; One character replaced by dot
A_C0_DF_$ = "ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞß"
A_E0_FF_$ = "àáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ"

Debug A_00_1F_$
Debug A_20_3F_$
Debug A_40_5F_$
Debug A_60_7F_$
Debug A_80_9F_$
Debug A_A0_BF_$
Debug A_C0_DF_$
Debug A_E0_FF_$
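A hedged reading of the symptom: suppose the conversion treats every plain-text byte as ISO-8859-1, where the byte value equals the Unicode code point. The result would then be correct for $A0-$FF, because Windows-1252 agrees with ISO-8859-1 there. It would be wrong for $80-$9F, where ISO-8859-1 has invisible C1 control characters and Windows-1252 has €, „, ™ and so on. The minimal sketch below prints the UTF-8 bytes that byte $80 produces under each interpretation. It needs PB 5.50 or later for UTF8(). UTF8Hex() is a hypothetical helper written only for this illustration.

```purebasic
EnableExplicit

; Hypothetical helper: return the UTF-8 bytes of a string as hex, e.g. "E2 82 AC".
; UTF8() allocates a null-terminated buffer that must be freed with FreeMemory().
Procedure.s UTF8Hex(Text$)
  Protected *buf = UTF8(Text$)
  Protected *p.Ascii = *buf
  Protected Result$
  While *p\a <> 0
    If Result$ <> "" : Result$ + " " : EndIf
    Result$ + RSet(Hex(*p\a), 2, "0")
    *p + 1
  Wend
  FreeMemory(*buf)
  ProcedureReturn Result$
EndProcedure

; Byte $80 read as ISO-8859-1: code point U+0080, a C1 control character
Debug "ISO-8859-1 $80 -> " + UTF8Hex(Chr($80))     ; C2 80
; Byte $80 read as Windows-1252: code point U+20AC, the euro sign
Debug "cp1252     $80 -> " + UTF8Hex(Chr($20AC))   ; E2 82 AC
```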
To the specialists of the Ide open source project:

Unfortunately, I haven't been able to set up the environment for building the IDE yet, and I haven't had the chance to get familiar with GitHub. Could someone test this by replacing the procedure AsciiToUTF8() in the file SourceManagement.pb with the following code (including the structure ASCIIARRAY)?


Structure ASCIIARRAY
  a.a[0]
EndStructure

Procedure AsciiToUTF8(*out.ASCIIARRAY, *outlen.LONG, *in.ASCIIARRAY, *inlen.LONG)

  *in_end    = *in + *inlen\l   ; first address past the last input byte
  *out_start = *out

  While *in < *in_end           ; "<", not "<=": *in_end already points past the input
    If     *in\a < $80  :  *out\a=*in\a                                :  *in+1 : *out+1
    ElseIf *in\a < $A0
      Select *in\a
        Case $80        :  *out\a=$E2 : *out\a[1]=$82 : *out\a[2]=$AC  :  *in+1 : *out+3 ; €
        Case $81        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $82        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9A  :  *in+1 : *out+3 ; ‚
        Case $83        :  *out\a=$C6 : *out\a[1]=$92                  :  *in+1 : *out+2 ; ƒ
        Case $84        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9E  :  *in+1 : *out+3 ; „
        Case $85        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A6  :  *in+1 : *out+3 ; …
        Case $86        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A0  :  *in+1 : *out+3 ; †
        Case $87        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A1  :  *in+1 : *out+3 ; ‡
        Case $88        :  *out\a=$CB : *out\a[1]=$86                  :  *in+1 : *out+2 ; ˆ
        Case $89        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$B0  :  *in+1 : *out+3 ; ‰
        Case $8A        :  *out\a=$C5 : *out\a[1]=$A0                  :  *in+1 : *out+2 ; Š
        Case $8B        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$B9  :  *in+1 : *out+3 ; ‹
        Case $8C        :  *out\a=$C5 : *out\a[1]=$92                  :  *in+1 : *out+2 ; Œ
        Case $8D        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $8E        :  *out\a=$C5 : *out\a[1]=$BD                  :  *in+1 : *out+2 ; Ž
        Case $8F        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $90        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $91        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$98  :  *in+1 : *out+3 ; ‘
        Case $92        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$99  :  *in+1 : *out+3 ; ’
        Case $93        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9C  :  *in+1 : *out+3 ; “
        Case $94        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$9D  :  *in+1 : *out+3 ; ”
        Case $95        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$A2  :  *in+1 : *out+3 ; •
        Case $96        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$93  :  *in+1 : *out+3 ; –
        Case $97        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$94  :  *in+1 : *out+3 ; —
        Case $98        :  *out\a=$CB : *out\a[1]=$9C                  :  *in+1 : *out+2 ; ˜
        Case $99        :  *out\a=$E2 : *out\a[1]=$84 : *out\a[2]=$A2  :  *in+1 : *out+3 ; ™
        Case $9A        :  *out\a=$C5 : *out\a[1]=$A1                  :  *in+1 : *out+2 ; š
        Case $9B        :  *out\a=$E2 : *out\a[1]=$80 : *out\a[2]=$BA  :  *in+1 : *out+3 ; › (U+203A)
        Case $9C        :  *out\a=$C5 : *out\a[1]=$93                  :  *in+1 : *out+2 ; œ
        Case $9D        :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2 ; n.a.
        Case $9E        :  *out\a=$C5 : *out\a[1]=$BE                  :  *in+1 : *out+2 ; ž
        Case $9F        :  *out\a=$C5 : *out\a[1]=$B8                  :  *in+1 : *out+2 ; Ÿ
      EndSelect
    ElseIf *in\a < $C0  :  *out\a=$C2 : *out\a[1]=*in\a                :  *in+1 : *out+2
    Else                :  *out\a=$C3 : *out\a[1]=*in\a - 64           :  *in+1 : *out+2
    EndIf
  Wend

  *outlen\l = *out - *out_start

EndProcedure
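The same mapping can also be written as a lookup table for the 32 bytes where Windows-1252 differs from ISO-8859-1, plus a generic UTF-8 encoder. This is a sketch under stated assumptions, not the IDE's code. CP1252ToUTF8(), EncodeUTF8() and the array CP1252_80_9F() are hypothetical names. Like the procedure above, it maps the five undefined bytes to the C1 control characters of the same value. With the table, byte $9B (›, U+203A) encodes as E2 80 BA.

```purebasic
EnableExplicit

; Hypothetical table: Unicode code points for the Windows-1252 bytes $80-$9F.
; The undefined bytes $81, $8D, $8F, $90 and $9D map to themselves (C1 controls).
Global Dim CP1252_80_9F.u(31)

DataSection
  CP1252Table:
  Data.u $20AC, $0081, $201A, $0192, $201E, $2026, $2020, $2021
  Data.u $02C6, $2030, $0160, $2039, $0152, $008D, $017D, $008F
  Data.u $0090, $2018, $2019, $201C, $201D, $2022, $2013, $2014
  Data.u $02DC, $2122, $0161, $203A, $0153, $009D, $017E, $0178
EndDataSection

Define i, v.u
Restore CP1252Table
For i = 0 To 31
  Read.u v
  CP1252_80_9F(i) = v
Next

; Write the UTF-8 form of code point cp (below $10000) at *out; return the byte count.
Procedure EncodeUTF8(cp, *out)
  If cp < $80
    PokeA(*out, cp)
    ProcedureReturn 1
  ElseIf cp < $800
    PokeA(*out,     $C0 | (cp >> 6))
    PokeA(*out + 1, $80 | (cp & $3F))
    ProcedureReturn 2
  Else
    PokeA(*out,     $E0 | (cp >> 12))
    PokeA(*out + 1, $80 | ((cp >> 6) & $3F))
    PokeA(*out + 2, $80 | (cp & $3F))
    ProcedureReturn 3
  EndIf
EndProcedure

; Convert inlen Windows-1252 bytes at *in to UTF-8 at *out.
; *out must hold at least 3 * inlen bytes. Returns the number of bytes written.
Procedure CP1252ToUTF8(*in, inlen, *out)
  Protected i, b, cp, *o = *out
  For i = 0 To inlen - 1
    b = PeekA(*in + i)
    If b >= $80 And b <= $9F
      cp = CP1252_80_9F(b - $80)   ; the only range where cp1252 differs from ISO-8859-1
    Else
      cp = b                       ; $00-$7F and $A0-$FF: byte value = code point
    EndIf
    *o + EncodeUTF8(cp, *o)
  Next
  ProcedureReturn *o - *out
EndProcedure

; Usage: the bytes $80 $9B $E4 'A' ("€›äA" in cp1252)
Define *src = AllocateMemory(4), *dst = AllocateMemory(12), n
PokeA(*src, $80) : PokeA(*src + 1, $9B) : PokeA(*src + 2, $E4) : PokeA(*src + 3, 'A')
n = CP1252ToUTF8(*src, 4, *dst)
Debug n                           ; 9 (3 + 3 + 2 + 1 bytes)
Debug PeekS(*dst, -1, #PB_UTF8)   ; €›äA
FreeMemory(*src) : FreeMemory(*dst)
```

A table keeps the conversion data in one place, which would make other code pages easier to add. For Windows-1251 (Cyrillic), however, the table would have to cover all 128 bytes from $80 to $FF, because that code page also differs from ISO-8859-1 in $A0-$FF.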

This only covers one direction (plain text to UTF-8). If it works, I will also take a look at the reverse functions.
sorry for my bad english
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: Ide - changing file format - Ide open source project

Post by User_Russian »

Josh wrote:As far as I remember, some Pb versions ago a bug was fixed that occurred when changing the file format from plain-text to Utf8 or vice versa.
Is this a joke? Nothing is fixed.
Here is the text in UTF-8:

[Screenshot: the text in a UTF-8 file]

If I change the encoding to ASCII, this is what happens:

[Screenshot: the same text after switching the encoding to ASCII]

Therefore, when changing the encoding, you have to copy the text to the clipboard, change the encoding, and paste the text back into the editor.
Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Ide - changing file format - Ide open source project

Post by Josh »

In my first post, I was speaking about converting from Ascii to Utf8. This should always be possible.

As I have just noticed, the code table in the PB help differs depending on the selected language:
English codepage
German codepage

Let's try the German code table first. If that works, we can take it from there.
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: Ide - changing file format - Ide open source project

Post by User_Russian »

Josh wrote:I was speaking about converting from Ascii to Utf8. This should always be possible.
ASCII:

[Screenshot: the text in a plain-text (ASCII) file]

In the IDE, I changed the encoding to UTF-8. Result:

[Screenshot: the same text after switching the encoding to UTF-8]
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Re: Ide - changing file format - Ide open source project

Post by kenmo »

Disclaimer: I haven't investigated the IDE code yet...

But I think this is Scintilla's behavior when you change its encoding in place, instead of changing the encoding and then replacing the text.

For example, in Notepad++ (which has like 100 billion users :D ) the same thing occurs.
Create a new file with the encoding "ANSI" (what PB calls "plain text")
Paste in your example code.
Change the encoding to "UTF-8" and some characters become garbled.

The easy Notepad++ solution is Copy All, then change the encoding, then paste it all back in. The characters are correct and the encoding is really changed now.

The same workaround works in the PB IDE!


All that being said, I think both PB and Notepad++ should handle this automatically: change the encoding, but also convert all characters. Internally I think this would be GetAllText --> SetEncoding --> SetAllText.
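Here is a minimal sketch of that GetAllText --> SetEncoding --> SetAllText sequence. It uses the standard Scintilla messages SCI_GETLENGTH, SCI_GETTEXT, SCI_SETCODEPAGE and SCI_SETTEXT through ScintillaSendMessage(). ConvertScintillaToUTF8() is a hypothetical name, not a procedure from the IDE source. The decode step relies on an assumption that is spelled out in the comments. PB 5.50 or later is assumed for UTF8().

```purebasic
EnableExplicit

; Hypothetical sketch: convert the whole content of a Scintilla gadget from a
; single-byte code page to UTF-8 (GetAllText --> SetEncoding --> SetAllText),
; instead of only switching the code page and leaving the bytes unchanged.
Procedure ConvertScintillaToUTF8(Gadget)
  Protected Length = ScintillaSendMessage(Gadget, #SCI_GETLENGTH)
  Protected *Bytes = AllocateMemory(Length + 1)   ; + 1 for the terminating null
  Protected Text$, *UTF8

  ; 1. GetAllText: copy the raw single-byte text out of the gadget
  ScintillaSendMessage(Gadget, #SCI_GETTEXT, Length + 1, *Bytes)

  ; 2. Decode the bytes with the OLD encoding. Assumption: PeekS() with
  ;    #PB_Ascii interprets the bytes the way the file was written; if it does
  ;    not map $80-$9F as Windows-1252, a table-based decoder is needed here.
  Text$ = PeekS(*Bytes, Length, #PB_Ascii)
  FreeMemory(*Bytes)

  ; 3. SetEncoding: switch the gadget to UTF-8
  ScintillaSendMessage(Gadget, #SCI_SETCODEPAGE, #SC_CP_UTF8)

  ; 4. SetAllText: store the same text again, now encoded as UTF-8
  *UTF8 = UTF8(Text$)
  ScintillaSendMessage(Gadget, #SCI_SETTEXT, 0, *UTF8)
  FreeMemory(*UTF8)
EndProcedure
```

The reverse direction (UTF-8 to plain text) would follow the same pattern. It adds one question, though: what to do with characters that have no single-byte equivalent.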
Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Ide - changing file format - Ide open source project

Post by Josh »

kenmo wrote:For example, in Notepad++ (which has like 100 billion users :D ) the same thing occurs.
Create a new file with the encoding "ANSI" (what PB calls "plain text")
Paste in your example code.
Change the encoding to "UTF-8" and some characters become garbled.
No, in Notepad++ it works fine, no matter how often I convert the text back and forth. It always shows the correct code. In Notepad++ you have to use the lower menu items 'Convert to ...' in the 'Encoding' menu.

But maybe I didn't think it through. I used code page 1252 (Windows-1252), which corresponds to the code table in the German PB help.
gurj
Enthusiast
Posts: 664
Joined: Thu Jan 22, 2009 3:48 am
Location: china

Re: Ide - changing file format - Ide open source project

Post by gurj »

CutAllText --> SetCodeFormat --> Paste
my pb for chinese:
http://ataorj.ys168.com
kenmo
Addict
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Re: Ide - changing file format - Ide open source project

Post by kenmo »

Josh wrote:No, in Notepad++ it works fine, no matter how often I convert the text back and forth. It always shows the correct code. In Notepad++ you have to use the lower menu items 'Convert to ...' in the 'Encoding' menu.
Oops, I never noticed the separate "Encode" and "Convert" actions! So "Convert" is probably doing exactly what I said: grab the text, change the encoding, restore the text.

If you can confirm that copy > encoding > paste works in the PB IDE for you, I can implement that in the open source IDE and submit it.
User_Russian
Addict
Posts: 1443
Joined: Wed Nov 12, 2008 5:01 pm
Location: Russia

Re: Ide - changing file format - Ide open source project

Post by User_Russian »

kenmo wrote:If you can confirm that copy > encoding > paste works in the PB IDE
Yes it works.