ReadString() performance best practice

Oso
Enthusiast
Posts: 595
Joined: Wed Jul 20, 2022 10:09 am

Re: ReadString() performance best practice

Post by Oso »

infratec wrote: Mon Aug 15, 2022 10:38 pm You are right:

Code: Select all

*FileEnd = *File + MemorySize(*File) - 1
If you have enough RAM ... no problem.
I already read 1GB files in RAM without problems.

Or you have to manage chunks yourself, which is not trivial if the remaining bytes need to be copied to the beginning of the buffer.
If I remember correctly, I already posted such an example.
It's only 1 byte longer than the size of the file. I was wondering, though: if something else happens to be occupying that memory location (1 byte past the end of our file data), then its byte could be included in the processing. It's a minor point, but I just wanted to make sure I was following it correctly ;-)
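To make the pointer arithmetic concrete, here is a minimal sketch (not infratec's actual code). It assumes the buffer comes from AllocateMemory() with a made-up size of 16 bytes; *File and *FileEnd follow the quoted line, the other names are only for illustration. With the - 1, *FileEnd is the address of the last valid byte, so a loop bounded by <= never reads the byte behind the buffer. Without the - 1, the same loop would read one byte too far.

```purebasic
EnableExplicit

Define Size.i = 16                    ; hypothetical size; a real program would use Lof()
Define *File = AllocateMemory(Size)
Define *FileEnd, *Ptr, Count.i

If *File
  FillMemory(*File, Size, 'A')        ; fill with 'A' so there is something to scan

  ; Address of the LAST valid byte, not one past it
  *FileEnd = *File + MemorySize(*File) - 1

  *Ptr = *File
  While *Ptr <= *FileEnd              ; inclusive bound: ends after the last byte
    If PeekA(*Ptr) = 'A'
      Count + 1
    EndIf
    *Ptr = *Ptr + 1
  Wend

  Debug Count                         ; number of bytes visited that held 'A'
  FreeMemory(*File)
EndIf
```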
jacdelad
Addict
Posts: 1431
Joined: Wed Feb 03, 2021 12:46 pm
Location: Planet Riesa

Re: ReadString() performance best practice

Post by jacdelad »

Oso wrote: Tue Aug 16, 2022 1:18 am
jacdelad wrote: Mon Aug 15, 2022 11:34 pm Keep in mind that if you don't need to read the full file, you probably shouldn't. It's not clear to me whether you need to process the whole file.
Yes, understood. The code that I've written looks for an identifying key in the file, but once it has found it, then the process is complete and it doesn't need to look any further.
I assume this key isn't always in the same position?! For large files I would chunk it into 1MB pieces (or some other useful size) and analyze them. ReadData() would be your friend.
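A rough sketch of that chunked loop, under a few assumptions: the 1 MB size is just the figure mentioned above, and AnalyzeChunk() is a hypothetical placeholder for whatever test identifies the key. The loop stops reading as soon as the placeholder reports a hit, so the rest of the file is never touched.

```purebasic
EnableExplicit

#ChunkSize = 1024 * 1024   ; 1 MB as suggested; any useful size works

; Hypothetical placeholder: examine Length bytes at *Buf and return #True
; once the key has been found, so the caller can stop reading.
Procedure.i AnalyzeChunk(*Buf, Length.i)
  ProcedureReturn #False
EndProcedure

Procedure.i ScanFile(FileName$)
  Protected File.i, *Buf, Got.i, Found.i = #False

  File = ReadFile(#PB_Any, FileName$)
  If File = 0
    ProcedureReturn #False                     ; file could not be opened
  EndIf

  *Buf = AllocateMemory(#ChunkSize)
  If *Buf
    While Found = #False And Eof(File) = 0
      Got = ReadData(File, *Buf, #ChunkSize)   ; number of bytes actually read
      If Got <= 0
        Break
      EndIf
      Found = AnalyzeChunk(*Buf, Got)          ; leave the loop on the first hit
    Wend
    FreeMemory(*Buf)
  EndIf

  CloseFile(File)
  ProcedureReturn Found
EndProcedure
```

Note that this skeleton on its own does not handle a key that is split across two chunks.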
PureBasic 6.04/XProfan X4a/Embarcadero RAD Studio 11/Perl 5.2/Python 3.10
Windows 11/Ryzen 5800X/32GB RAM/Radeon 7770 OC/3TB SSD/11TB HDD
Synology DS1821+/36GB RAM/130TB
Synology DS920+/20GB RAM/54TB
Synology DS916+ii/8GB RAM/12TB

Re: ReadString() performance best practice

Post by Oso »

jacdelad wrote: Tue Aug 16, 2022 3:20 am I assume this key isn't always in the same position?! For large files I would chunk it into 1MB pieces (or some other useful size) and analyze them. ReadData() would be your friend.
That's right, there are no fixed positions in the file, only delimited sequences of variable-length data. I might see better performance using ReadData() into fixed-size blocks, as you say. The only problem is that a byte string I'm searching for might straddle two consecutive blocks, which makes it difficult to find. Of course there are ways around that, such as keeping the last 'n' bytes from the previous block.
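One way the "keep the last 'n' bytes" idea could look, as a sketch only: here 'n' is taken to be KeyLen - 1, which is the longest piece of the key that can sit at the end of a block without already being a full match. FindKey(), the block size and all variable names are assumptions made for this example.

```purebasic
EnableExplicit

#BlockSize = 1024 * 1024          ; assumed block size

; Returns the 0-based file offset of the first match, or -1 if not found.
; *Key / KeyLen describe the raw bytes to look for.
Procedure.q FindKey(FileName$, *Key, KeyLen.i)
  Protected File.i, *Buf, Got.i, Avail.i, i.i
  Protected Carry.i = 0           ; bytes kept from the previous block
  Protected BasePos.q = 0         ; file offset of the first byte in *Buf
  Protected Result.q = -1

  If KeyLen < 1 : ProcedureReturn -1 : EndIf
  File = ReadFile(#PB_Any, FileName$)
  If File = 0 : ProcedureReturn -1 : EndIf

  *Buf = AllocateMemory(#BlockSize + KeyLen)   ; room for carried bytes + one block
  If *Buf
    While Eof(File) = 0
      Got = ReadData(File, *Buf + Carry, #BlockSize)  ; append behind the carried bytes
      If Got <= 0 : Break : EndIf
      Avail = Carry + Got

      For i = 0 To Avail - KeyLen               ; loop is skipped if Avail < KeyLen
        If CompareMemory(*Buf + i, *Key, KeyLen)
          Result = BasePos + i
          Break 2                               ; leave both loops
        EndIf
      Next

      ; Keep the tail so a key split across two blocks is seen next time
      Carry = KeyLen - 1
      If Carry > Avail : Carry = Avail : EndIf
      If Carry > 0
        MoveMemory(*Buf + Avail - Carry, *Buf, Carry)
      EndIf
      BasePos = BasePos + Avail - Carry
    Wend
    FreeMemory(*Buf)
  EndIf

  CloseFile(File)
  ProcedureReturn Result
EndProcedure
```

Because fewer than KeyLen bytes are carried over, the carried region can never contain a complete match by itself, so nothing is reported twice.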

But at the moment, using ReadByte() to process small delimited sections at a time is giving me fairly good performance. It takes 15 seconds to find something in a 45 MB file. I don't know if that sounds reasonable.

Re: ReadString() performance best practice

Post by jacdelad »

Yeah, just subtract the length of the search string (be aware of the ASCII and Unicode differences) from the current read position and read the next block.
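Roughly like this, as a sketch under some assumptions: the file is opened with ReadFile(), the key is passed as raw bytes, and the procedure name and block size are made up. It steps back KeyLen - 1 bytes rather than the full length, which is enough to catch a key that straddles two blocks. The length has to be counted in bytes, so the same text is twice as long in Unicode as in ASCII.

```purebasic
EnableExplicit

#BlockSize = 1024 * 1024          ; assumed block size, must exceed the key length

; Returns the 0-based file offset of the first match, or -1 if not found.
Procedure.q FindKeyBySeeking(FileName$, *Key, KeyLen.i)
  Protected File.i, *Buf, Got.i, i.i
  Protected BlockPos.q, Result.q = -1

  If KeyLen < 1 Or KeyLen >= #BlockSize : ProcedureReturn -1 : EndIf
  File = ReadFile(#PB_Any, FileName$)
  If File = 0 : ProcedureReturn -1 : EndIf

  *Buf = AllocateMemory(#BlockSize)
  If *Buf
    While Eof(File) = 0
      BlockPos = Loc(File)                      ; file offset of this block
      Got = ReadData(File, *Buf, #BlockSize)
      If Got < KeyLen : Break : EndIf           ; too little data left to hold the key

      For i = 0 To Got - KeyLen
        If CompareMemory(*Buf + i, *Key, KeyLen)
          Result = BlockPos + i
          Break 2                               ; leave both loops
        EndIf
      Next

      If Eof(File) = 0
        ; Step back so the next block overlaps the end of this one
        FileSeek(File, Loc(File) - (KeyLen - 1))
      EndIf
    Wend
    FreeMemory(*Buf)
  EndIf

  CloseFile(File)
  ProcedureReturn Result
EndProcedure

; Usage with a hypothetical key and file name
Define Key$ = "NAME"
Define KeyLen.i = StringByteLength(Key$, #PB_Ascii)   ; 4 bytes; #PB_Unicode gives 8
Define *Key = Ascii(Key$)                             ; ASCII copy of the key text
If *Key
  Debug FindKeyBySeeking("data.bin", *Key, KeyLen)
  FreeMemory(*Key)
EndIf
```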