ASCII Conversion Bug in Windows Unicode Version

swhite · Post by **swhite** » Thu Aug 30, 2018 12:54 am

Hi

This code shows that ASCII characters in the range 128-159 are changed to 63 in the Windows version of PB, while it works perfectly in the Linux version. It seems to me that, whether or not the characters are defined, the ASCII value put into memory should not be altered. I have equipment that sends text strings that can contain these values, which represent the status of the equipment, so having the values changed means I cannot accurately process the data in the Windows version. For now I will need separate Windows and Linux versions of this particular routine to handle the data properly.

Code: Select all

*txBuffer=AllocateMemory(5)
For ln.i=1 To 255
   lcTxt.s = Chr(ln)
   PokeS(*txBuffer,lcTxt,StringByteLength(lcTxt,#PB_Ascii),#PB_Ascii)
   If PeekA(*txBuffer) <> ln
      Debug Str(ln)+" "+Str(PeekA(*txBuffer))
   EndIf
Next
FreeMemory(*txBuffer)

Simon

infratec · Post by **infratec** » Thu Aug 30, 2018 9:03 am

That the behaviour is different on different OSs is definitely a no-go.

For Chr(): maybe an additional format flag is required (#PB_Ascii, #PB_UTF8 ...).

Little John · Post by **Little John** » Thu Aug 30, 2018 9:54 am

infratec wrote: That the behaviour is different on different OSs is definitely a no-go.

I agree.

And changing values 128-159 to 63 doesn't make much sense either. Why change them to 63 and not to, say, 27 or 95?

I think it would be best to leave those values untouched, so that the above example code by swhite never displays a Debug message.

mk-soft · Post by **mk-soft** » Thu Aug 30, 2018 11:28 am

On Mac and Linux it works fine with unknown chars.
On Windows they are replaced with char 63.

I don't know whether this is a bug, but it is not good...

Code: Select all

Structure aArray
  a.a[0]
EndStructure

Define s1.s
Define *mem.aArray
Define *mem2.aArray = AllocateMemory(256)
Define i.i

For i = 32 To 254
  s1 + Chr(i)
Next

Debug s1
*mem = Ascii(s1)
PokeS(*mem2, s1, -1, #PB_Ascii)

For i = 0 To 222
  Debug "ASCII = " + Str(*mem\a[i]) + "  POKE = " + Str(*mem2\a[i]) + "  STRING = " + Asc(Mid(s1, i+1))
Next

Josh · Post by **Josh** » Thu Aug 30, 2018 12:21 pm

I don't think it's a bug.

In line 3 you try to get a character that does not exist in Unicode (your program is running in Unicode). In Unicode the characters 128 - 159 are control characters and therefore there is no associated character for display.

See here

#NULL · Post by **#NULL** » Thu Aug 30, 2018 12:59 pm

I'm with Josh.
ASCII is in the range 0..127 (7 bit). Above is Extended ASCII which doesn't map to Unicode.

kenmo · Post by **kenmo** » Thu Aug 30, 2018 1:58 pm

63 is '?' which is used as a replacement for unmappable characters.

PB is probably calling Windows API for the conversion, which gives the same result:

Code: Select all

For i = 1 To 255
  Uni.s = Chr(i)
  AsciiByte.a
  WideCharToMultiByte_(#CP_ACP, 0, @Uni, 1, @AsciiByte, 1, @"?", #Null)
  
  If (AsciiByte <> i)
    Debug Str(i) + "  " + Str(AsciiByte)
  EndIf
Next i

These don't map because most Ascii (actually ANSI) characters in the 128-159 range map to Unicode characters > 255.

Example: Copying Unicode character 145 to Ascii character 145 is NOT the right thing to do! Ascii char 145 actually pairs with Unicode 8216.

Code: Select all

A.a = 145
Debug Asc(PeekS(@A, 1, #PB_Ascii))

Confirmed in Windows-1252 table here (quick Google result): http://www.alanwood.net/demos/ansi.html
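
The opposite direction can be checked the same way. This is only a minimal sketch, assuming a Unicode build and a Windows system whose ANSI code page is Windows-1252; on such a system Unicode 8216 should come back as byte 145, while other code pages or operating systems may give a different byte.

Code: Select all

; Sketch: convert Unicode 8216 (left single quotation mark) to a single byte.
; Assumption: the ANSI code page is Windows-1252, where this character is byte 145.
Define *Buffer = AllocateMemory(4)
If *Buffer
  PokeS(*Buffer, Chr(8216), -1, #PB_Ascii)
  Debug PeekA(*Buffer)
  FreeMemory(*Buffer)
EndIf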

kenmo · Post by **kenmo** » Thu Aug 30, 2018 2:14 pm

Or, looking at it from the other direction...

You have Unicode strings in the 128-159 ($80-$9F) char range, which are "C1 Control Characters"
https://en.wikipedia.org/wiki/Latin-1_S ... ode_block)

When you Poke these to Ascii, there are no "C1 Control Characters" defined in the Ascii table, so they are replaced with '?'

PB+Windows sees the problem and replaces them with question marks.

Maybe Linux+Mac see this and just pass the values, despite this changing their meaning...

Anyway, this does what you want:

Code: Select all

Procedure.i PokeAsciiString(*Memory, Text.s, Length.i = -1, Flags.i = #Null)
  Protected Result.i = 0
  Protected *In.CHARACTER = @Text
  Protected *Out.ASCII = *Memory
  While (*In\c And ((Length = -1) Or (Result < Length)))
    *Out\a = *In\c
    Result + 1
    *Out + SizeOf(ASCII)
    *In  + SizeOf(CHARACTER)
  Wend
  If (Not (Flags & #PB_String_NoZero))
    *Out\a = #NUL
  EndIf
  ProcedureReturn (Result)
EndProcedure

EDIT: Oh I see a lot of this was already discussed here
viewtopic.php?f=13&t=71305
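
For reading, a counterpart to PokeAsciiString() could look like the following. It is only a sketch: PeekByteString is a hypothetical name, not a PureBasic function, and it assumes a Unicode build and data without zero bytes (a zero byte would end the resulting string early).

Code: Select all

; Hypothetical helper (not part of PureBasic): copies each byte into one
; character with the same numeric value, without any code page conversion.
Procedure.s PeekByteString(*Memory, Length.i)
  Protected Result.s = Space(Length)   ; reserve Length characters
  Protected *Out.Character = @Result
  Protected *In.Ascii = *Memory
  Protected i.i
  For i = 1 To Length
    *Out\c = *In\a                     ; byte value becomes the character code
    *Out + SizeOf(Character)
    *In + SizeOf(Ascii)
  Next
  ProcedureReturn Result
EndProcedure

; Usage: byte 145 stays 145 instead of becoming Unicode 8216.
Define *Buffer = AllocateMemory(2)
Define Text.s
If *Buffer
  PokeA(*Buffer, 145)
  PokeA(*Buffer + 1, 65)
  Text = PeekByteString(*Buffer, 2)
  Debug Asc(Text)          ; 145
  Debug Asc(Mid(Text, 2))  ; 65
  FreeMemory(*Buffer)
EndIf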

swhite · Post by **swhite** » Thu Aug 30, 2018 2:22 pm

Hi

I had no idea that the Unicode table had these mapping issues in the range 128 - 159. I can accept that the Unicode value may be different from what I expected, but I was specifically asking for an ASCII string. It seems to me that it should use the ASCII values regardless of the Unicode mappings, because the PokeS() function had the type set to ASCII, not Unicode. That should also apply to the new Ascii() function.

Now that I am aware of the issue I can code around it, but it is a problem that the Windows version behaves differently from both the Linux and Mac versions.

Thanks,
Simon

Olliv · Post by **Olliv** » Sun Sep 02, 2018 12:30 am

This subject belongs in Feature Requests and Wishlists.

Here is a conversion for the extended ASCII characters (code page 437):

Code: Select all

;***********************************************************************************************************************************************
Global CNV.S
CNV = "00c700fc00e900e200e400e000e500e700ea00eb00e800ef00ee00ec00c400c5"
CNV + "00c900e600c600f400f600f200fb00f900ff00d600dc00a200a300a520a70192"
CNV + "00e100ed00f300fa00f100d100aa00ba00bf231000ac00bd00bc00a100ab00bb"
CNV + "259125922593250225242561256225562555256325512557255d255c255b2510"
CNV + "25142534252c251c2500253c255e255f255a25542569256625602550256c2567"
CNV + "2568256425652559255825522553256b256a2518250c25882584258c25902580"
CNV + "03b100df039303c003a303c300b503c403a6039803a903b4221e03c603b52229"
CNV + "226100b1226522642320232100f7224800b0221900b7221a207f00b225a000a0"

Procedure.S PeekANSI(*A, Length = 1)
        Define.S String
        Define I, Code
        If Length
                *X = AllocateMemory(Length * 2)
                For I = 1 To Length
                        Code = PeekA(*A + (I - 1) )
                        If Code >= 128 ; the first CNV entry ("00c7") belongs to code 128
                                Part.S = Mid(CNV, ((Code - 128) * 4) + 1, 4)
                                Code = Val("$" + Part)
                        EndIf
                        PokeU(*X + (2 * (I - 1) ), Code)
                Next
                String = PeekS(*X, Length)
                FreeMemory(*X)               
        EndIf
        ProcedureReturn String
EndProcedure

swhite · Post by **swhite** » Thu Sep 27, 2018 9:19 pm

Hi

I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62.

Code: Select all

*txBuffer=AllocateMemory(5)
For ln.i=1 To 255
   lcTxt.s = Chr(ln)
   PokeS(*txBuffer,lcTxt,StringByteLength(lcTxt,#PB_Ascii))
   If PeekA(*txBuffer) <> ln
      Debug Str(ln)+" "+Str(PeekA(*txBuffer))
   EndIf
Next
FreeMemory(*txBuffer)

wilbert · Post by **wilbert** » Fri Sep 28, 2018 5:39 am

swhite wrote: I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62.

Code: Select all

*txBuffer=AllocateMemory(5)
For ln.i=1 To 255
   lcTxt.s = Chr(ln)
   PokeS(*txBuffer,lcTxt,StringByteLength(lcTxt,#PB_Ascii))
   If PeekA(*txBuffer) <> ln
      Debug Str(ln)+" "+Str(PeekA(*txBuffer))
   EndIf
Next
FreeMemory(*txBuffer)

You aren't poking an ASCII string.
What you are doing is poking a one-character Unicode string, and you are reading back only the lowest 8 bits of the 16-bit character.
There is no conversion in this code.
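
A two-character string makes this visible. The sketch below assumes a Unicode build, where each character occupies two bytes (low byte first on x86/x64); without a format flag PokeS() copies the string in that native format, so a zero byte follows every character below 256.

Code: Select all

; Sketch: PokeS() without a format flag writes the string in its native format.
Define *Buffer = AllocateMemory(8)
Define i.i
If *Buffer
  PokeS(*Buffer, "AB")        ; 2 characters + terminating null = 6 bytes in a Unicode build
  For i = 0 To 3
    Debug PeekA(*Buffer + i)  ; 65, 0, 66, 0
  Next
  FreeMemory(*Buffer)
EndIf

So the approach only appears to work for a single character read back with PeekA(); a longer string sent this way would carry an extra zero byte after each character.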

Josh · Post by **Josh** » Fri Sep 28, 2018 6:31 am

swhite wrote: I discovered that the problem only occurs when you include the #PB_Ascii flag. The following works correctly in PB 5.62.

I just don't understand what you're trying to prove with your Peeks, Pokes and the nonsensically used StringByteLength.

The only problem that exists is that you want to use something that just doesn't exist. In Unicode the characters 128 - 159 are control characters and will always remain control characters, no matter if you try to prove something else with any constructs.

swhite · Post by **swhite** » Sat Sep 29, 2018 12:47 am

Hi

The reason for the "nonsensical" StringByteLength() is that I wanted the test to be as close as possible to the actual code, which uses longer strings where PeekS and PokeS are required. The problem arose because I have a lot of code talking to equipment that expects to receive single-byte characters between 0-255. Currently that code is compiled in ASCII mode and everything works fine. Now the same applications require some additional features that need Unicode, so I am in the process of updating the code to run in Unicode mode, hence the problem with characters in the range 128-159.

As Wilbert mentioned, I am only using the low byte, which is all I need. The code demonstrates that the #PB_Ascii flag triggers PB to replace ASCII values with the Unicode values, where the low byte is not what I expected. That seems counter-intuitive given that I am asking for the ASCII values, not the Unicode equivalents. So I discovered that as long as I do not use the #PB_Ascii flag in the PokeS() function, I get the result I was expecting.

Simon
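
One way to sidestep the conversion entirely would be to write the status bytes as bytes rather than as part of a string. The following is only a sketch under assumptions: the message layout (some 7-bit text followed by one status byte) is hypothetical and not the real equipment protocol, and it relies on PokeS() returning the number of bytes written.

Code: Select all

; Sketch: keep 7-bit text and raw status bytes separate when building a message.
; The layout (text followed by one status byte) is hypothetical, for illustration only.
Define *Frame = AllocateMemory(16)
Define Length.i
If *Frame
  ; 7-bit text converts identically on every platform; no terminating null is written.
  Length = PokeS(*Frame, "ABC", -1, #PB_Ascii | #PB_String_NoZero)
  ; The status byte never passes through a string, so no conversion can alter it.
  PokeA(*Frame + Length, 145)
  Debug PeekA(*Frame + Length) ; 145
  FreeMemory(*Frame)
EndIf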

PureBasic Forums - English
