ReadString slowness...

Rinzwind
Enthusiast
Posts: 636
Joined: Wed Mar 11, 2009 4:06 pm
Location: NL

Post by Rinzwind »

This can't be intended behaviour?

Code:

EnableExplicit

Procedure.s LoadFile1(Filename.s) ;SLOW!
  Protected r.s
  Protected f = ReadFile(#PB_Any, Filename, #PB_UTF8)
  If f
    r = ReadString(f, #PB_File_IgnoreEOL)
    CloseFile(f)
    ProcedureReturn r
  EndIf
EndProcedure

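Procedure.s LoadFile2(Filename.s)
  Protected *p, r.s, size
  Protected f = ReadFile(#PB_Any, Filename)
  If f
    size = Lof(f)
    ; One extra byte for a terminator: PeekS with -1 reads up to the first null byte
    *p = AllocateMemory(size + 1, #PB_Memory_NoClear)
    If *p
      If size > 0 : size = ReadData(f, *p, size) : EndIf
      PokeA(*p + size, 0) ; terminate the buffer so PeekS cannot run past it
      r = PeekS(*p, -1, #PB_UTF8)
      FreeMemory(*p)
    EndIf
    CloseFile(f)
    ProcedureReturn r
  EndIf
EndProcedure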

Define t1, r1, r2, r.s
t1 = ElapsedMilliseconds()
r = LoadFile1("c:\test\test1.html")
r1 = ElapsedMilliseconds() - t1


r = ""
t1 = ElapsedMilliseconds()
r = LoadFile2("c:\test\test1.html")
r2 = ElapsedMilliseconds() - t1

MessageRequester("", Str(r1) + #TAB$ + Str(r2))


The test file is the HTML front page of some website. Message box output in milliseconds (LoadFile1, then LoadFile2):

295 17

If anything, you would expect ReadString to be faster, since no PeekS and no freeing are needed, but it's 17 times slower...

It seems #PB_File_IgnoreEOL doesn't make it any faster than reading lines one by one.

Also odd: specifying the size with PeekS seems to be slower than just passing -1.

// Moved from "Bugs - Windows". Slowness is not a bug. (Kiffi)
Last edited by Rinzwind on Wed Sep 29, 2021 3:38 am, edited 1 time in total.
BarryG
Addict
Posts: 3292
Joined: Thu Apr 18, 2019 8:17 am

Re: ReadString slowness...

Post by BarryG »

We can't confirm it's a bug without having access to the HTML file to see. For the record, I use ReadString() with #PB_File_IgnoreEOL on large text files and it's pretty much instant.

Re: ReadString slowness...

Post by Rinzwind »

Just go to any website and save the page as an HTML file. To make good use of UTF-8, let's say https://www.thairath.co.th/home
(a file of around 2.6 MB)

277 ms vs 18 ms

Re: ReadString slowness...

Post by BarryG »

Okay, you're right... but is slow speed a bug, or a feature request? ReadString() and ReadData() probably work very differently behind the scenes.
Fred
Administrator
Posts: 16618
Joined: Fri May 17, 2002 4:39 pm
Location: France

Re: ReadString slowness...

Post by Fred »

Of course, you can't compare a raw read without doing anything against a small read which creates a new string buffer, parses every byte to detect the end of line, etc. ReadString() uses an internal cache, which makes it much faster than it was before (pre-4.00 IIRC).
#NULL
Addict
Posts: 1440
Joined: Thu Aug 30, 2007 11:54 pm
Location: right here

Re: ReadString slowness...

Post by #NULL »

@Fred, can you clarify what you mean by that?
Fred wrote: Wed Sep 29, 2021 9:16 am
"Of course, you can't compare a raw read without doing anything"
That sounds like it should be faster.
"against a small read"
What do you mean by "small read"? Both read the whole file/bytes, don't they?
"which creates a new string buffer, parses every byte to detect the end of line, etc."
So why is doing all that stuff faster in the end?
"ReadString() uses an internal cache, which makes it much faster than it was before (pre-4.00 IIRC)."
But what good stuff is it doing that makes it slower than the manual way? Maybe some additional handling of the BOM, null bytes, etc. for correctness?

Re: ReadString slowness...

Post by Fred »

Forget what I said, I misread the original post. This particular case is slower, because we don't read the whole file at once, but chunk by chunk. That way we don't have to reserve a massive memory area if the file is very big.
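
For illustration only, a chunk-by-chunk read could look roughly like the sketch below. This is not PureBasic's internal ReadString() implementation, which is not shown in this thread. LoadFileChunked and its ChunkSize parameter are hypothetical names, and 65536 is an arbitrary value. Like LoadFile2 above, it does not skip a BOM and stops at the first null byte in the file. The buffer is decoded only once at the end, so a multi-byte UTF-8 sequence split across two chunks is still handled correctly.

```purebasic
EnableExplicit

; Hypothetical helper: read a UTF-8 file in fixed-size chunks into a growing
; buffer, then decode it once. ChunkSize is an arbitrary illustrative value.
Procedure.s LoadFileChunked(Filename.s, ChunkSize = 65536)
  Protected r.s, *buf, *grown, total, n
  Protected f = ReadFile(#PB_Any, Filename)
  If f
    ; Room for one chunk plus 1 byte for the final null terminator.
    *buf = AllocateMemory(ChunkSize + 1, #PB_Memory_NoClear)
    While *buf And Eof(f) = 0
      n = ReadData(f, *buf + total, ChunkSize)
      If n <= 0 : Break : EndIf
      total + n
      ; Make room for the next chunk; if this fails the old block is still valid.
      *grown = ReAllocateMemory(*buf, total + ChunkSize + 1, #PB_Memory_NoClear)
      If *grown = 0
        FreeMemory(*buf) : *buf = 0 ; give up and return an empty string
      Else
        *buf = *grown
      EndIf
    Wend
    CloseFile(f)
    If *buf
      PokeA(*buf + total, 0) ; terminate so PeekS with -1 stops at the end
      r = PeekS(*buf, -1, #PB_UTF8)
      FreeMemory(*buf)
    EndIf
  EndIf
  ProcedureReturn r
EndProcedure
```

Each pass through the loop costs one read and one possible reallocation, which hints at where extra time can go compared with a single ReadData() call over the whole file. Whether that accounts for the timings measured above is an assumption, not something this thread confirms.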

Re: ReadString slowness...

Post by #NULL »

That explains it, thanks for clarifying.

Re: ReadString slowness...

Post by Rinzwind »

The #PB_File_IgnoreEOL flag also says to read the whole file into memory at once. It's not just a little slower, but 17 times slower. I would expect similar behavior in that case. I expected it to be the fastest way, because it needs no extra steps from the programmer, but it's the slowest. Counterintuitive, for me at least. So behind the scenes it still reads line by line with #PB_File_IgnoreEOL? That's unnecessary overhead. Anyway, I thought it worth mentioning since I found out by chance.