Removing 'ASCII' switch from PureBasic

wilbert · Post by **wilbert** » Fri Aug 08, 2014 6:31 am

User_Russian wrote:Unicode strings in PB, much slower than the ASCII

It doesn't have to stay this way.
When there's only one version, there might be more room for the PB team to optimize the PB string system. I don't see why it would have to be much slower.

RichAlgeni wrote:If we could get SSE4.2 based string functions as Wilbert suggested

I suggested SSE2, since that is part of the x86-64 specification and guaranteed to work on all x86-64 processors.

juror wrote:PB 5.22 LTS expires in mid 2015

When support expires that doesn't mean you can't use it anymore.
If 5.2x LTS is stable enough for someone to release a commercial application at this time, why wouldn't it be suitable anymore after support ends?
You could still keep the 5.2x LTS compiler for existing projects that require ASCII only, and use a newer PB version for new projects.

Little John · Post by **Little John** » Fri Aug 08, 2014 6:45 am

wilbert wrote:When support expires that doesn't mean you can't use it anymore.
If 5.2x LTS is stable enough for someone to release a commercial application at this time, why wouldn't it be suitable anymore after support ends?

When we write new code, there is always the possibility that we encounter new bugs.
So I can understand that, especially for writing commercial programs, people want to use a PB version that is actively supported (read: gets bug fixes from time to time).

Lebostein · Post by **Lebostein** » Fri Aug 08, 2014 6:52 am

All string-based operations are slower in Unicode mode. Everything Map-based, for example:

Code: Select all

NewMap test.l()

; Prepare keys
#count = 200000
Dim key$(#count)
For i = 0 To #count
  key$(i) = "Testing Key " + Str(i)
Next i

a1 = ElapsedMilliseconds()
For i = 0 To #count
  AddMapElement(test(), key$(i))
Next i
a2 = ElapsedMilliseconds()
For i = 0 To #count
  FindMapElement(test(),key$(i))
Next i
a3 = ElapsedMilliseconds()

text$ = "Add: " + Str(a2-a1) + #CRLF$ + "Find: " + Str(a3-a2)
SetClipboardText(text$)
MessageRequester("Results", text$)

UNICODE
Add: 2815
Find: 2655

ASCII
Add: 2250
Find: 2139

If you remove the word "highly" from "compiler which creates highly optimized executables" on the PB homepage, then I agree with this step (reluctantly).

Samuel · Post by **Samuel** » Fri Aug 08, 2014 7:36 am

wilbert wrote: When support expires that doesn't mean you can't use it anymore.
If 5.2x LTS is stable enough for someone to release a commercial application at this time, why wouldn't it be suitable anymore after support ends?
You could still keep the 5.2x LTS compiler for existing projects that require ASCII only, and use a newer PB version for new projects.

You're exactly right.

If people are having issues with this, I wonder what will happen when Fred drops 32-bit.
Is everyone going to go grab their pitchforks and torches?

Deluxe0321 · Post by **Deluxe0321** » Fri Aug 08, 2014 8:30 am

Let's make a deal then:

If Fred fixes the speed related issues in the string library I would fully support the transition.

In addition, that would of course mean that he implements an easy way to output content in ASCII too, via memory (ToAscii()?) or any other way.
What happens inside PB is not the problem; just make sure we won't lose speed.
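
As a rough illustration of the ASCII output asked for here: ToAscii() is only a hypothetical name from the post, but PokeS() with the #PB_Ascii flag can already write a string into a memory buffer as one byte per character. A minimal sketch, with variable names chosen just for this example:

```purebasic
; Sketch only: ToAscii() is a hypothetical name, not an existing PB function.
; PokeS() with #PB_Ascii converts the string to ASCII bytes when writing it out.
Define Text.s = "Hello"
Define *Buffer = AllocateMemory(Len(Text) + 1)   ; one byte per character plus the terminating null

If *Buffer
  PokeS(*Buffer, Text, -1, #PB_Ascii)            ; -1 = write the full string
  Debug PeekS(*Buffer, -1, #PB_Ascii)            ; read it back to check the round trip
  FreeMemory(*Buffer)
EndIf
```

The buffer could then be handed to a file or network function that expects ASCII data.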

Thank you!

luis · Post by **luis** » Fri Aug 08, 2014 11:13 am

Deluxe0321 wrote: What happens inside PB is not the problem; just make sure we won't lose speed.

UCS-2 data is twice as large as ASCII data, and all of it has to be moved around in memory.

EDIT: By the way, this could be offset by using a different instruction set instead of 386+FPU only, since you can then move more data at once, but that would improve speed everywhere and not only for Unicode string operations.
Also, I don't know whether the PB string library is written in C (I suppose it is). If it is, enabling SSE/SSE2 in the C compiler might be enough to get a boost (as someone already said, I think). The optimum would be for PB to generate SSE/SSE2 instructions for our code too.
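
The size difference itself is easy to check with StringByteLength(), which reports how many bytes a string needs in a given format, not counting the terminating null. A small sketch using the sample text from the benchmarks in this thread:

```purebasic
; Compare the storage needed for the same text in both formats.
Define Text.s = "1234567890"

Debug StringByteLength(Text, #PB_Ascii)     ; 10 bytes: one per character
Debug StringByteLength(Text, #PB_Unicode)   ; 20 bytes: two per character (UCS-2)
```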

codeprof · Post by **codeprof** » Fri Aug 08, 2014 11:19 am

freak wrote:1) Support for the 3 OS
2) Support for ascii/unicode
3) Support for quirks of specific OS versions within the same OS type (largely fixes for glitches in specific Windows versions)
4) Support for threaded programs
5) Support for 32bit/64bit

This sounds a bit strange to me. Supporting a whole processor architecture is less effort than supporting ascii strings?

Personally, I need the UCase()/LCase() commands really often when I work with strings. However, these are very much slower in Unicode mode.

Code: Select all

DisableDebugger

Str.s
#Text = "1234567890"

Time = ElapsedMilliseconds()

For i=1 To 10000
  Str = UCase(Str+#Text)
Next i

MessageRequester("", StrF((ElapsedMilliseconds()-Time)/1000, 3))

;Results:  (Tested with Linux 64Bit)
;0.7s ASCII
;5.5s Unicode

luis · Post by **luis** » Fri Aug 08, 2014 11:28 am

codeprof wrote: This sounds a bit strange to me. Supporting a whole processor architecture is less effort than supporting ascii strings?

Now that you've written it out, it sounds reasonable to me.
Theoretically, the code generation always stays the same as long as it has no bugs, while the support libraries change and grow.
The changes and additions to the libraries must support ASCII and Unicode (case in point) and must be hand-tailored for that every time.
The code generation step does not care about any of this and stays the same.

heartbone · Post by **heartbone** » Fri Aug 08, 2014 11:32 am

juror wrote:And if you don't understand Rescator you need to understand Dunning-Kruger.

Thank you for that perspective juror.
I was feeling somewhat stupid and obsolete for considering text to be a string of characters.

Message body:
Enter your message here, it may contain no more than 60000 characters.

Now I won't need to ponder the true meaning of that instruction.

wilbert · Post by **wilbert** » Fri Aug 08, 2014 12:34 pm

codeprof wrote:Personally, I need the UCase()/LCase() commands really often when I work with strings. However, these are very much slower.

They are a bit slower, but not that much. What is slow in your code is the string concatenation.
Try this to compare UCase only:

Code: Select all

DisableDebugger

Str.s
#Text = "1234567890"

For i=1 To 1000
  Str + #Text
Next i

Time = ElapsedMilliseconds()

For i=1 To 10000
  UCase(Str)
Next i

MessageRequester("", StrF((ElapsedMilliseconds()-Time)/1000, 3))

The speed of string handling could probably be increased a lot if PB cached the length of strings.
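
To illustrate the idea (this is not how PB manages strings internally, only a sketch of the principle): if the program remembers where the text currently ends, each append can write directly at that position without touching the earlier text. The buffer, offset and variable names below are assumptions made for this example.

```purebasic
; Sketch only: shows the idea of remembering the string's end,
; not PB's actual internal string handling.
EnableExplicit

#Text  = "1234567890"
#Count = 1000

Define *Buffer, Offset.i, i.i, Result.s
Define PartBytes.i = StringByteLength(#Text)   ; bytes per piece, without the terminating null

; Room for all pieces plus one terminating null character
*Buffer = AllocateMemory(#Count * PartBytes + SizeOf(Character))

If *Buffer
  For i = 1 To #Count
    PokeS(*Buffer + Offset, #Text)   ; write at the remembered end, no rescan of earlier text
    Offset + PartBytes               ; update the cached length (in bytes)
  Next i
  Result = PeekS(*Buffer)            ; copy the buffer into a regular PB string
  FreeMemory(*Buffer)
  Debug Len(Result)                  ; #Count * 10 characters
EndIf
```

The same append loop written as Str + #Text would be the comparison case for timing.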

juror · Post by **juror** » Fri Aug 08, 2014 1:10 pm

wilbert wrote:
juror wrote:PB 5.22 LTS expires in mid 2015
When support expires that doesn't mean you can't use it anymore.
If 5.2x LTS is stable enough for someone to release a commercial application at this time, why wouldn't it be suitable anymore after support ends?
You could still keep the 5.2x LTS compiler for existing projects that require ASCII only, and use a newer PB version for new projects.

That is certainly a possibility - unless you are a small vendor (as we are) and have contractual agreements with larger customers, especially govt agencies, who in the interests of removing all liability from themselves, have a contract provision which states to the effect "any/all software/utilities/products provided to (customer) by HTC (us) are warranted to have been developed and maintained using fully licensed and supported hardware and software. Furthermore, HTC warrants continuing support of all HTC provided products throughout the licensing period."

The exact terminology will vary and is not present in all contracts, but you get the idea. They want to protect themselves from any liability from 1) us using illegal software in our development environment and/or on hardware which isn't ours/theirs (either of which they feel could make them partially liable) and 2) assure our provided products are and will be maintained. This forces us to make the conversion to unicode while 5.22 LTS is still supported. We can do it, but it would be nicer to have more of a cushion, e.g. another ascii LTS.

Sure, we can push back, but that may well mean we do not get the contract and frankly, we can't afford to lose contracts. Selling to agencies has become the lifeblood of our company. We could not make it on individual sales to end users.

DK_PETER · Post by **DK_PETER** » Fri Aug 08, 2014 4:24 pm

@Juror

That is certainly a possibility - unless you are a small vendor (as we are) and have contractual agreements with larger customers, especially govt agencies, who in the interests of removing all liability from themselves, have a contract provision which states to the effect "any/all software/utilities/products provided to (customer) by HTC (us) are warranted to have been developed and maintained using fully licensed and supported hardware and software. Furthermore, HTC warrants continuing support of all HTC provided products throughout the licensing period."

Those conditions are in my view completely intolerable.
If you can satisfy their needs using an obsolete version of PB, why should they have a problem?
Do they require that you show receipts for your PB purchase, and do they examine your version of PB?
This is pure madness. Under no circumstances would I agree to such terms. If I can provide the services they need to the required standards
by using COBOL, Pascal or Casper Fudd's minor league programming language, then the demands are met.
If these truly are the terms, then I would recommend that you switch to Unicode ASAP.

IdeasVacuum · Post by **IdeasVacuum** » Fri Aug 08, 2014 4:37 pm

we have not had 1 request for Unicode

Same here - but isn't that because the customers do not program themselves? What I have got are customers that want their app to support a multitude of different languages (inc. Chinese Simplified/Traditional) and using Unicode has made that easy to develop and test.

graph100 · Post by **graph100** » Fri Aug 08, 2014 4:54 pm

I took codeprof's code and ran some tests:

Code: Select all

DisableDebugger

Str.s
#Text = "1234567890"

Time = ElapsedMilliseconds()

For i=1 To 10000
  Str = UCase(Str+#Text)
Next i

Time = ElapsedMilliseconds()-Time

MessageRequester("", StrF(Time/1000, 3))

;Results:  (Tested With Linux 64Bit)
;5.5s Unicode
;0.7s ASCII


;Windows 8 x64, PB 5.21 x64
; 3.38 - 3.47 unicode
; 3.26 - 3.21 ascii

;Windows 8 x64, PB 5.22 x86
; 3.56 - 3.50 unicode
; 3.13 - 3.10 ascii

;Windows 8 x64, PB 5.30 x86
; 3.55 - 3.45 unicode
; 3.37 - 3.39 ascii

;Mandriva 2010.2 x86, PB 5.22 x86
; 5.38 - 5.22 unicode
; 0.52 - 0.50 ascii

;MacOS X, PB 5.21 x86
; 3.72 - 3.62 unicode
; 3.25 - 3.13 ascii

-> We can see that ASCII is always faster than Unicode.
-> On my Windows and Mac machines, ASCII and Unicode are fairly close: Unicode is roughly 2% to 16% slower in the figures above.
-> On Linux (mine x86 and his x64), Unicode is much, much slower than ASCII, roughly 8 to 10 times slower.

The difference on Linux presumably comes from some really optimized routines for ASCII.
Only the devs can know where it comes from.

juror · Post by **juror** » Fri Aug 08, 2014 9:16 pm

DK_PETER wrote:@Juror
Those conditions are in my view completely intolerable.

Actually, they're not that unusual for government agencies in this country. I've worked for large companies where the contracts with small vendors were even worse.

DK_PETER wrote: If you can satisfy their needs using an obsolete version of PB, why should they have a problem?

Most of these types of contracts have grown over the years to accommodate each/every eventuality. Some agency may have been burned sometime using unsupported software. As I said, I've seen worse than these. When I worked in the pharmaceutical industry, our contracts were much worse than these, largely because we had to cover every eventuality from an FDA regulatory view. From what I've read in the forums, frequent contributor DoubleDutch may work in an environment even more restrictive than ours.

DK_PETER wrote:Do they require, that you show receipts for your PB purchase and examine your version of PB?

They haven't yet, but our records are auditable by them any time they demand.

DK_PETER wrote:This is pure madness. Under no circumstance would I agree to such terms.

You're obviously much more successful than we are. We don't love the terms, but we can live with them in order to get the business. As I said, without their contract business, we don't remain a viable business. Individual end-user sales are not sufficient. We're actually phasing out our end-user sales in order to concentrate on securing additional contracts. It's possible that since many small suppliers feel like you do, and will not agree to their terms, we are at an advantage because we will.

Sorry. I'm beginning to stray far too "off-topic". We will accommodate the unicode change. It would just be nicer if we had a little more time. And it's not an option for us to "just use an unsupported version".

PureBasic Forums - English
