Hello.
In the PB editor's "Tools" menu there is a useful option called "Character table", but it covers extended ASCII only.
Since newer versions of the compiler are Unicode-only, it would be interesting to implement a Unicode character table.
Wish to the list: PBEditor Character table also in unicode
- Psychophanta
- Addict
- Posts: 4997
- Joined: Wed Jun 11, 2003 9:33 pm
- Location: Lípetsk, Russian Federation
http://www.zeitgeistmovie.com
While world=business:world+mafia:Wend
Will never leave this forum until the absolute bugfree PB
Re: Wish to the list: PBEditor Character table also in unicode
The idea is good, but which characters should it show?
All of them?
Peace on Earth
Re: Wish to the list: PBEditor Character table also in unicode
The unicode codepoints are quite extensive and also still in a state of change.
Perhaps a link to the symbols would be better.
http://www.unicode.org/charts/
Re: Wish to the list: PBEditor Character table also in unicode
PureBasic uses UCS-2 (2 bytes per character) and is therefore limited to the character codes from 0 to 65,535 (see PB help).
But even with this limitation the filling of the list takes some seconds (I tested it with the source code of the PureBasic IDE).
Maybe it would be better not to display all characters at once. At the top of the window we could place several buttons for different character ranges, or use a ComboBoxGadget to switch the character range displayed in the list.
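A rough sketch of the range-switching idea, written here only as an illustration: a ComboBoxGadget selects one block of 256 codes, and only that block is put into a ListIconGadget. The block boundaries, the gadget layout and the names FillRange, RangeLabel and rangeStart are assumptions made for this example; this is not code from the PureBasic IDE.

```purebasic
; Sketch only: block list, layout and all names are assumptions for illustration.
Enumeration
  #Window
  #RangeCombo
  #CharList
EndEnumeration

#BlockSize = 256

; Start codes of a few example blocks; a real table would offer many more.
Global Dim rangeStart(2)
rangeStart(0) = $0100
rangeStart(1) = $0400
rangeStart(2) = $3000

; Build a label such as "U+0100 - U+01FF" for the block starting at firstCode.
Procedure.s RangeLabel(firstCode)
  ProcedureReturn "U+" + RSet(Hex(firstCode), 4, "0") + " - U+" + RSet(Hex(firstCode + #BlockSize - 1), 4, "0")
EndProcedure

; Show only one block in the list instead of all 65,536 codes.
Procedure FillRange(firstCode)
  Protected code
  ClearGadgetItems(#CharList)
  For code = firstCode To firstCode + #BlockSize - 1
    ; Chr(10) separates the columns of a ListIconGadget item.
    AddGadgetItem(#CharList, -1, "U+" + RSet(Hex(code), 4, "0") + Chr(10) + Chr(code))
  Next
EndProcedure

If OpenWindow(#Window, 0, 0, 300, 400, "Character table sketch", #PB_Window_SystemMenu | #PB_Window_ScreenCentered)
  ComboBoxGadget(#RangeCombo, 5, 5, 290, 25)
  ListIconGadget(#CharList, 5, 35, 290, 360, "Code", 80)
  AddGadgetColumn(#CharList, 1, "Character", 180)

  For i = 0 To ArraySize(rangeStart())
    AddGadgetItem(#RangeCombo, -1, RangeLabel(rangeStart(i)))
  Next
  SetGadgetState(#RangeCombo, 0)
  FillRange(rangeStart(0))

  Repeat
    event = WaitWindowEvent()
    ; Refill the list whenever another block is chosen.
    If event = #PB_Event_Gadget And EventGadget() = #RangeCombo And EventType() = #PB_EventType_Change
      state = GetGadgetState(#RangeCombo)
      If state >= 0
        FillRange(rangeStart(state))
      EndIf
    EndIf
  Until event = #PB_Event_CloseWindow
EndIf
```

Because only 256 items are added per selection, the list should fill far faster than a list of all 65,536 codes, at the cost of having to pick a block first.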
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version
Re: Wish to the list: PBEditor Character table also in unicode
Demivec wrote: Sun Apr 25, 2021 4:33 pm
The unicode codepoints are quite extensive and also still in a state of change.
Perhaps a link to the symbols would be better.
http://www.unicode.org/charts/

It just doesn't work, there are way too many.
The many foreign PB users must then also be supported.
This website from @Demivec is so far the best I have seen.
http://www.columbia.edu/kermit/ucs2.html
Re: Wish to the list: PBEditor Character table also in unicode
Sicro wrote: Sun Apr 25, 2021 5:59 pm
PureBasic uses UCS-2 (2 bytes per character) and is therefore limited to the character codes from 0 to 65,535 (see PB help).
But even with this limitation the filling of the list takes some seconds (I tested it with the source code of the PureBasic IDE).
Maybe it would be better if not all characters are displayed at once. At the top of the window we could place several buttons with different character ranges or take a ComboBoxGadget for it, with which we could switch the displayed character ranges in the list.

@Sicro: PureBasic says it uses UCS-2 internally but I think that is a bit fiddly. I think the truth is that all of its string functions like Mid(), LSet() and so on simply operate on codepoints as if they were all two bytes long. Many functions that utilize strings actually make use of UTF-16. UTF-16 allows all of the Unicode codepoints to be written using either two or four bytes with a surrogate mechanism.
Here is a demonstration:
Code:
Procedure handleError(value, text.s)
  ;show a message and quit if a required resource could not be created
  If Not value
    MessageRequester("Error", text)
    End
  EndIf
EndProcedure

Procedure.s _Chr(v.i) ;return a proper surrogate pair for unicode values outside the BMP (Basic Multilingual Plane)
  Protected high, low
  If v < $10000
    ProcedureReturn Chr(v)
  Else
    ;calculate surrogate pair of unicode codepoints to represent value in UTF-16
    v - $10000
    high = v / $400 + $D800 ;high/lead surrogate value
    low = v % $400 + $DC00 ;low/tail surrogate value
    ProcedureReturn Chr(high) + Chr(low)
  EndIf
EndProcedure

#imageWidth = 310
#imageHeight = 310

handleError(LoadFont(0, "Courier", 200), "Can't load font.")
handleError(CreateImage(0, #imageWidth, #imageHeight), "Can't create image.")

a$ = _Chr($1F600) ;smiling emoji, a codepoint outside the BMP
Debug a$ ;added so the debug window check described below has something to show

If StartDrawing(ImageOutput(0))
  DrawingFont(FontID(0))
  DrawText(0, 0, a$)
  StopDrawing()
EndIf

handleError(OpenWindow(0, 0, 0, #imageWidth, #imageHeight + 20, a$ + "Unicode Test" + a$), "Can't open window.")
ImageGadget(0, 0, 0, 0, 0, ImageID(0))
TextGadget(1, 5, #imageHeight, #imageWidth, 20, ReplaceString(Space(25), " ", a$))

Repeat: Until WaitWindowEvent() = #PB_Event_CloseWindow
- If you see a smiling emoji "😀" after running the above code, you can see that UTF-16 is being used by the DrawText() function and not UCS-2.
- If you see a line of smiling emoji in the TextGadget, then you can see that UTF-16 is being used by the TextGadget() and not UCS-2.
- If you see a smiling emoji at the beginning and end of the window's title, then you can see that UTF-16 is being used by the OpenWindow() function and not UCS-2.
- If you see a smiling emoji in the debug window while debugging, then you can see that the Debug command is using UTF-16 and that the font you are using in the debug window also has a character for that codepoint.
As far as a chart of Unicode or even only UCS-2 codepoints (and characters) goes, the number is very large, and it wouldn't really make much sense to put that much info in picture form into the Help file. Also, as stated earlier, the codepoint definitions are still in a process of change. UCS-2 is updated to keep it synchronized with changes in the BMP (Basic Multilingual Plane) of Unicode. You'll notice that the chart Saki linked to has many visible characters with a description of '(unknown)', which shows that the chart is not up to date; the website it was posted on was last updated in 2011 (by my guess). One example is codepoint U+0220 ('Ƞ'), which has the description 'LATIN CAPITAL LETTER N WITH LONG RIGHT LEG' in the Unicode charts available from the link I posted.
I don't think buttons would work very well to select portions of the codepoint range to display simply because it is such a large range.
Note: I verified that the forum update now allows Unicode characters outside the BMP to be posted in messages. The Smiley emoticon in this message is the test case. Here are a few more 🀁🀂🀃🀢🀣🀤🀥🀦🀧🀨🀩(mahjong tiles).
Re: Wish to the list: PBEditor Character table also in unicode
Demivec wrote: Thu Apr 29, 2021 2:44 am
@Sicro: PureBasic says it uses UCS-2 internally but I think that is a bit fiddly. I think the truth is that all of its string functions like Mid(), LSet() and so on simply operate on codepoints as if they were all two bytes long. Many functions that utilize strings actually make use of UTF-16. UTF-16 allows all of the Unicode codepoints to be written using either two or four bytes with a surrogate mechanism.
Here is a demonstration:
[...]

Yes, the functions that display or draw strings interpret the UCS-2 string as UTF-16 (which is an extension of UCS-2). But it is actually the OS API functions that do that, not the PB functions.
But ok, since UTF-16 can be displayed and drawn and the PB string functions do not destroy the other UTF-16 characters in the UCS-2 string, it can be seen that PB supports UTF-16, even if you have to take into account that Len(one character string) can then be greater than 1.
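To illustrate the point about Len(): a minimal sketch, assuming strings are stored as 16-bit units as discussed in this thread. CodepointCount is a hypothetical helper written for this example, not a PureBasic function; it counts every unit except low (trailing) surrogates.

```purebasic
; Hypothetical helper: counts codepoints instead of 16-bit units.
; Assumes well-formed UTF-16 (every low surrogate follows a high surrogate).
Procedure.i CodepointCount(text$)
  Protected i, unit, count
  For i = 1 To Len(text$)
    unit = Asc(Mid(text$, i, 1))
    ; A low (trailing) surrogate only continues the previous character.
    If unit < $DC00 Or unit > $DFFF
      count + 1
    EndIf
  Next
  ProcedureReturn count
EndProcedure

emoji$ = Chr($D83D) + Chr($DE00) ; surrogate pair for U+1F600

Debug Len(emoji$)            ; number of 16-bit units
Debug CodepointCount(emoji$) ; number of codepoints
```

With a single emoji in the string, Len() should report two units while the helper reports one codepoint. The same difference affects Mid(), Left() and similar functions, which can cut a surrogate pair in half.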
I didn't know about the surrogate mechanism thing, thanks. I don't deal much with the different Unicode encodings.
Ok, then I also think it would be better if a link to an always-up-to-date web page is inserted at the bottom of the characters table window.
That's cool. Probably the old forum did not use <meta charset="utf-8">.