ASCII Conversion Bug in Windows Unicode Version

swhite
Enthusiast
Posts: 727
Joined: Thu May 21, 2009 6:56 pm

Post by swhite

Hi

This code shows that ASCII characters 128-159 are changed to 63 in the Windows version of PB, but it works perfectly in the Linux version. It seems to me that, whether characters are defined or not, the ASCII value put into memory should not be altered. I have equipment that sends text strings that can contain these values, which represent the status of the equipment, so having the value changed means I cannot accurately process the data in the Windows version. I will now have to keep a Windows and a Linux version of this particular routine to handle the data properly.

Code:

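*txBuffer=AllocateMemory(5)
For ln.i=1 To 255
   lcTxt.s = Chr(ln)   ; one-character string with code point ln
   ; Convert the string to ASCII and poke it (StringByteLength() returns 1 here)
   PokeS(*txBuffer,lcTxt,StringByteLength(lcTxt,#PB_Ascii),#PB_Ascii)
   If PeekA(*txBuffer) <> ln   ; report every value that was altered
      Debug Str(ln)+" "+Str(PeekA(*txBuffer))
   EndIf
Next
FreeMemory(*txBuffer)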
Simon
Simon White
dCipher Computing
infratec
Always Here
Posts: 6866
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Post by infratec

That the behaviour differs between OSs is definitely a no-go.

For Chr(), maybe an additional format flag is required (#PB_Ascii, #PB_UTF8, ...).
Little John
Addict
Posts: 4527
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Post by Little John

infratec wrote: That the behaviour is different on different OSs is definitely a no go.
I agree.

And changing values 128-159 to 63 doesn't make much sense either. Why change them to 63 and not to, say, 27 or 95? ;-)
I think it is best to leave those values untouched, so that the example code above by swhite never displays a Debug message.
mk-soft
Always Here
Posts: 5387
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Post by mk-soft

On Mac and Linux, unknown chars are passed through unchanged.
On Windows, they are replaced with char 63.

I don't know if this is a bug, but it is not good...

Code:

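Structure aArray
  a.a[0]           ; byte-array view of a memory block
EndStructure

Define s1.s
Define *mem.aArray
Define *mem2.aArray = AllocateMemory(256)
Define i.i

; Build a string containing the characters 32..254
For i = 32 To 254
  s1 + Chr(i)
Next

Debug s1
*mem = Ascii(s1)                  ; Ascii() allocates a new buffer
PokeS(*mem2, s1, -1, #PB_Ascii)   ; same conversion via PokeS()

; Compare both conversions with the original character codes
For i = 0 To 222
  Debug "ASCII = " + Str(*mem\a[i]) + "  POKE = " + Str(*mem2\a[i]) + "  STRING = " + Str(Asc(Mid(s1, i+1)))
Next

FreeMemory(*mem)
FreeMemory(*mem2)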
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive
Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Post by Josh

I don't think it's a bug.

In line 3 (lcTxt.s = Chr(ln)) you create a character that has no printable form in Unicode (your program is running in Unicode mode). In Unicode, the characters 128-159 are control characters, so there is no associated character for display.

See here
sorry for my bad english
#NULL
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Post by #NULL

I'm with Josh.
ASCII is in the range 0..127 (7 bit). Above that is extended ASCII, which doesn't map directly to Unicode.
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Post by kenmo

63 is '?', which is used as the replacement for unmappable characters.

PB is probably calling the Windows API for the conversion, which gives the same result:

Code:

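; Windows only: convert each character with the system ANSI code page (#CP_ACP)
Define i.i, Uni.s, AsciiByte.a, DefaultChar.a = '?'
For i = 1 To 255
  Uni = Chr(i)
  ; One UTF-16 character -> one ANSI byte; unmappable characters become DefaultChar
  WideCharToMultiByte_(#CP_ACP, 0, @Uni, 1, @AsciiByte, 1, @DefaultChar, #Null)
  
  If (AsciiByte <> i)
    Debug Str(i) + "  " + Str(AsciiByte)
  EndIf
Next i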
These don't map because most Ascii (actually ANSI) characters in the 128-159 range map to Unicode characters > 255.


Example: Copying Unicode character 145 to ASCII character 145 is NOT the right thing to do! ASCII char 145 actually pairs with Unicode 8216 (U+2018, LEFT SINGLE QUOTATION MARK).

Code:

A.a = 145                            ; ANSI byte $91
Debug Asc(PeekS(@A, 1, #PB_Ascii))   ; 8216 (U+2018) with the Windows-1252 code page
Confirmed in Windows-1252 table here (quick Google result): http://www.alanwood.net/demos/ansi.html
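The reason the result is 63 can be modelled in a few lines. The following is a simplified sketch, not the code actually used by PB or Windows: the helper name UnicodeToCp1252() is hypothetical, the five undefined Windows-1252 bytes ($81, $8D, $8F, $90, $9D) are treated as unmapped, and any best-fit mapping the real API may apply is ignored.

```purebasic
; Simplified model of a Unicode -> Windows-1252 conversion (hypothetical helper).
; Code points 0-127 and 160-255 map to the same byte value, the characters
; assigned to bytes $80-$9F are looked up in a table, and anything else
; becomes '?' (63). Undefined bytes are stored as 0 and treated as unmapped.

DataSection
  Cp1252High:   ; Unicode code points for the bytes $80..$9F
  Data.u $20AC, 0, $201A, $0192, $201E, $2026, $2020, $2021
  Data.u $02C6, $2030, $0160, $2039, $0152, 0, $017D, 0
  Data.u 0, $2018, $2019, $201C, $201D, $2022, $2013, $2014
  Data.u $02DC, $2122, $0161, $203A, $0153, 0, $017E, $0178
EndDataSection

Procedure.a UnicodeToCp1252(CodePoint.i)
  Protected i.i, Mapped.u
  If (CodePoint >= 0 And CodePoint < 128) Or (CodePoint >= 160 And CodePoint <= 255)
    ProcedureReturn CodePoint   ; ASCII and Latin-1 ranges are identical in Windows-1252
  EndIf
  Restore Cp1252High
  For i = 0 To 31
    Read.u Mapped
    If Mapped = CodePoint       ; undefined slots hold 0, which never matches here
      ProcedureReturn 128 + i
    EndIf
  Next
  ProcedureReturn '?'           ; 63, the replacement for unmappable characters
EndProcedure

Debug UnicodeToCp1252(145)      ; U+0091 (a C1 control character) -> 63
Debug UnicodeToCp1252($2018)    ; U+2018 LEFT SINGLE QUOTATION MARK -> 145
Debug UnicodeToCp1252(233)      ; U+00E9 (é) -> 233
```

In this model the C1 control characters U+0080-U+009F simply have no byte in the table, which is why they come out as '?' rather than as some other arbitrary value.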
kenmo
Addict
Posts: 1967
Joined: Tue Dec 23, 2003 3:54 am

Post by kenmo

Or, looking at it from the other direction...

You have Unicode strings in the 128-159 ($80-$9F) char range, which are "C1 Control Characters"
https://en.wikipedia.org/wiki/Latin-1_S ... ode_block)

When you poke these to ASCII, there are no C1 control characters defined in the ASCII table, so they are replaced with '?'.

PB+Windows sees the problem and replaces them with question marks.

Maybe Linux and Mac just pass the values through, even though this changes their meaning...


Anyway, this does what you want:

Code:

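; Writes each character's code as one byte, without any code page conversion.
; Code points above 255 are truncated to their low byte.
Procedure.i PokeAsciiString(*Memory, Text.s, Length.i = -1, Flags.i = #Null)
  Protected Result.i = 0
  Protected *In.CHARACTER = @Text
  Protected *Out.ASCII = *Memory
  ; Copy until the string's terminator or until Length characters are written
  While (*In\c And ((Length = -1) Or (Result < Length)))
    *Out\a = *In\c
    Result + 1
    *Out + SizeOf(ASCII)
    *In  + SizeOf(CHARACTER)
  Wend
  ; Append a zero byte unless #PB_String_NoZero is set
  If (Not (Flags & #PB_String_NoZero))
    *Out\a = #NUL
  EndIf
  ProcedureReturn (Result)   ; bytes written, not counting the terminator
EndProcedure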


EDIT: Oh, I see a lot of this was already discussed here:
viewtopic.php?f=13&t=71305
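For reading such data back, a counterpart to PokeAsciiString() could turn each byte into the character with the same number (a Latin-1 style mapping), so bytes 128-159 come back as U+0080-U+009F instead of being reinterpreted through a code page. This is a minimal sketch: PeekLatin1String() is a hypothetical name, and it assumes the data contains no zero bytes, since PB strings are zero-terminated.

```purebasic
; Hypothetical helper: byte value n becomes the character U+00nn,
; so no code page lookup (and no '?' substitution) takes place.
Procedure.s PeekLatin1String(*Memory, Length.i)
  Protected Result.s, i.i
  For i = 0 To Length - 1
    Result + Chr(PeekA(*Memory + i))
  Next
  ProcedureReturn Result
EndProcedure

; Demo: poke the raw bytes 128..159 and check that they survive the round trip
Define *Buf = AllocateMemory(32)
Define i.i, Text.s, Mismatches.i
For i = 0 To 31
  PokeA(*Buf + i, 128 + i)
Next
Text = PeekLatin1String(*Buf, 32)
For i = 0 To 31
  If Asc(Mid(Text, i + 1, 1)) <> 128 + i
    Mismatches + 1
  EndIf
Next
FreeMemory(*Buf)
Debug "Length: " + Str(Len(Text)) + ", mismatches: " + Str(Mismatches)
```

Building the string with repeated concatenation keeps the sketch short; for long buffers, writing into a preallocated string through a CHARACTER pointer (as PokeAsciiString() does in the other direction) would avoid the repeated reallocation.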
swhite
Enthusiast
Posts: 727
Joined: Thu May 21, 2009 6:56 pm

Post by swhite

Hi

I had no idea that the Unicode table had these mapping issues in the range 128-159. I can accept that the Unicode value may be different from what I expected, but I was specifically asking for an ASCII string. PokeS() had the format set to ASCII, not Unicode, so it seems to me that it should use the ASCII values regardless of the Unicode mappings, and the same should apply to the new Ascii() function.

Now that I am aware of the issue I can code around it, but the fact that the Windows version differs from both the Linux and Mac versions is a problem.

Thanks,
Simon
Olliv
Enthusiast
Posts: 542
Joined: Tue Sep 22, 2009 10:41 pm

Post by Olliv

This subject would fit well in the Feature Requests and Wishlists forum.

Here are the extended ASCII characters of code page 437 (bytes 128-255 as Unicode code points):

Code:

;***********************************************************************************************************************************************
Global CNV.S
CNV = "00c700fc00e900e200e400e000e500e700ea00eb00e800ef00ee00ec00c400c5"
CNV + "00c900e600c600f400f600f200fb00f900ff00d600dc00a200a300a520a70192"
CNV + "00e100ed00f300fa00f100d100aa00ba00bf231000ac00bd00bc00a100ab00bb"
CNV + "259125922593250225242561256225562555256325512557255d255c255b2510"
CNV + "25142534252c251c2500253c255e255f255a25542569256625602550256c2567"
CNV + "2568256425652559255825522553256b256a2518250c25882584258c25902580"
CNV + "03b100df039303c003a303c300b503c403a6039803a903b4221e03c603b52229"
CNV + "226100b1226522642320232100f7224800b0221900b7221a207f00b225a000a0"

Procedure.S PeekANSI(*A, Length = 1)
        Protected String.S, Part.S
        Protected I, Code, *X
        If Length
                *X = AllocateMemory(Length * 2)
                For I = 1 To Length
                        Code = PeekA(*A + (I - 1) )
                        If Code >= 128          ; bytes 128-255 are looked up in the CP437 table
                                Part = Mid(CNV, ((Code - 128) * 4) + 1, 4)
                                Code = Val("$" + Part)
                        EndIf
                        PokeU(*X + (2 * (I - 1) ), Code)   ; store as one UTF-16 character
                Next
                String = PeekS(*X, Length)
                FreeMemory(*X)               
        EndIf
        ProcedureReturn String
EndProcedure
swhite
Enthusiast
Posts: 727
Joined: Thu May 21, 2009 6:56 pm

Post by swhite

Hi

I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62:

Code:

*txBuffer=AllocateMemory(5)
For ln.i=1 To 255
   lcTxt.s = Chr(ln)
   PokeS(*txBuffer,lcTxt,StringByteLength(lcTxt,#PB_Ascii))
   If PeekA(*txBuffer) <> ln
      Debug Str(ln)+" "+Str(PeekA(*txBuffer))
   EndIf
Next
FreeMemory(*txBuffer)
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert

swhite wrote: I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62.
You aren't poking an ASCII string.
What you are doing is poking a one-character Unicode string and reading back only the lowest 8 bits of the 16-bit character.
There is no conversion in this code.
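What PokeS() writes without a format flag can be seen by dumping the buffer. A minimal sketch, assuming a Unicode executable (the only mode since PB 5.50) on a little-endian CPU such as x86, x64 or ARM64:

```purebasic
; PokeS() without a format flag stores UTF-16, two bytes per character,
; low byte first, so PeekA() only ever sees the low byte.
Define *Buf = AllocateMemory(8)
Define Low.a, High.a, LowOfQuote.a

PokeS(*Buf, Chr(145), 1)     ; U+0091 stored as $91 $00, followed by a 2-byte terminator
Low  = PeekA(*Buf)           ; 145: matches only because the code point is below 256
High = PeekA(*Buf + 1)       ; 0

PokeS(*Buf, Chr($2018), 1)   ; U+2018 stored as $18 $20
LowOfQuote = PeekA(*Buf)     ; 24 ($18): the high byte $20 is silently ignored

FreeMemory(*Buf)
Debug "U+0091 -> " + Str(Low) + " " + Str(High) + ", U+2018 low byte -> " + Str(LowOfQuote)
```

The second case shows the limit of this workaround: it only gives the expected byte for characters whose code point is already below 256.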
Windows (x64)
Raspberry Pi OS (Arm64)
Josh
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Post by Josh

swhite wrote: I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62.
I just don't understand what you're trying to prove with your Peeks, Pokes and the nonsensically used StringByteLength.

The only problem is that you want to use something that just doesn't exist. In Unicode, the characters 128-159 are control characters and will always remain control characters, no matter what constructs you use to try to prove otherwise.
swhite
Enthusiast
Posts: 727
Joined: Thu May 21, 2009 6:56 pm

Post by swhite

Hi

The reason for the supposedly nonsensical StringByteLength() is that I wanted the test to be as close as possible to the actual code, which uses longer strings where PeekS() and PokeS() are required. The problem arose because I have a lot of code talking to equipment that expects to receive single-byte characters between 0 and 255. Currently that code is compiled in ASCII mode and everything works fine. Now the same applications require some additional features that need Unicode, so I am in the process of updating the code to run in Unicode mode, hence the problem with characters in the range 128-159.

As Wilbert mentioned, I am only using the low byte, which is all I need. The code demonstrates that the #PB_Ascii flag makes PB replace the values with converted ones whose low byte is not what I expected. That seems counter-intuitive, given that I am asking for the ASCII values, not the Unicode equivalents. So I discovered that as long as I do not use the #PB_Ascii flag in PokeS(), I get the result I was expecting.

Simon
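For outgoing equipment data like this, another option is to skip strings entirely and write the byte values directly with PokeA(), so no string-to-ASCII conversion can substitute anything. A minimal sketch; the frame layout (STX, a command letter, one status byte, ETX) is made up for illustration and is not the actual protocol discussed here.

```purebasic
; Build the outgoing frame byte by byte; no string conversion is involved.
; Hypothetical frame layout: STX, command letter, status byte, ETX.
Define *Frame = AllocateMemory(4)
PokeA(*Frame + 0, $02)   ; STX
PokeA(*Frame + 1, 'S')   ; command letter (83 = $53)
PokeA(*Frame + 2, 145)   ; status value from the 128-159 range, stored as-is
PokeA(*Frame + 3, $03)   ; ETX

; Dump the frame as hex to check exactly what would be sent
Define i.i, Dump.s
For i = 0 To MemorySize(*Frame) - 1
  Dump + RSet(Hex(PeekA(*Frame + i)), 2, "0") + " "
Next
FreeMemory(*Frame)
Debug Trim(Dump)         ; 02 53 91 03
```

Keeping the wire data in a memory buffer and converting to a string only for display keeps the byte values independent of the OS and of any code page.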