Faster load a very big text file in memory [Resolved]

Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Post by Kwai chang caine »

Hello everyone,

I want to load an IMDb dataset into memory. It can be found at this address:

https://datasets.imdbws.com/

I chose "title.basics.tsv.gz" and uncompressed it; it's a plain text file 8)
The size of this text file is 775 MB :shock:

I ran this code:

Code: Select all

Fichier$ = "D:\Title.basics du 221004.tsv"
Canal = ReadFile(#PB_Any, Fichier$, #PB_UTF8)
        
If Canal
 
 TailleFichier = Lof(Canal)
 *Ptr = AllocateMemory(TailleFichier)
 ReadData(Canal, *Ptr, TailleFichier)
 Donnee$ = PeekS(*Ptr, TailleFichier, #PB_UTF8)
 FreeMemory(*Ptr)
 CloseFile(Canal)
 
Else

 MessageRequester("Erreur fichier", "Le fichier" + #CRLF$ + Fichier$ + #CRLF$ + "n'a pu être ouvert.")
 
EndIf
and it raises an IMA (invalid memory access) error, "Write error at address 0", at line 9 :|
Does anyone know why, and how I can load this file?

Have a good day
Last edited by Kwai chang caine on Tue Oct 04, 2022 4:47 pm, edited 1 time in total.
mk-soft
Always Here
Posts: 5334
Joined: Fri May 12, 2006 6:51 pm
Location: Germany

Re: Faster load a very big text file in memory

Post by mk-soft »

You must check whether such a large amount of memory could be requested at all.
Even if it takes a little longer, I would transfer the file to a LinkedList.
Then transfer it to an SQLite database.
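A minimal sketch of that suggestion, assuming a hypothetical path in `#ImdbFile$` and a list named `Lines()` (neither comes from the thread). It first probes whether a block as large as the file can be requested at all, then loads the file line by line into a LinkedList, so no single huge buffer or string is needed. The transfer to SQLite is left out here.

```purebasic
EnableExplicit

; Hypothetical path: adjust it to where the uncompressed TSV lives.
#ImdbFile$ = "D:\title.basics.tsv"

Define File
Define Size.q
Define *Probe

NewList Lines.s()

File = ReadFile(#PB_Any, #ImdbFile$, #PB_UTF8)

If File

  ; Optional check: can a block the size of the whole file be requested?
  Size = Lof(File)
  *Probe = AllocateMemory(Size, #PB_Memory_NoClear)
  If *Probe
    Debug "A block of " + Str(Size) + " bytes could be allocated"
    FreeMemory(*Probe)
  Else
    Debug "A block of " + Str(Size) + " bytes could NOT be allocated"
  EndIf

  ; Line-by-line load: each line becomes one list element.
  While Not Eof(File)
    AddElement(Lines())
    Lines() = ReadString(File)
  Wend

  CloseFile(File)
  Debug "Lines loaded: " + Str(ListSize(Lines()))

Else

  Debug "Could not open " + #ImdbFile$

EndIf
```

Always testing the result of `AllocateMemory()` before using the pointer is the main point; the list itself still needs enough memory for every line, so this trades one large request for many small ones.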
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Faster load a very big text file in memory

Post by Kwai chang caine »

Thanks a lot for your quick answer 8)

It's a good idea :idea:
But it will probably need several hours :|
This file apparently has 9,270,357 lines; it's crazy how many films exist :shock: :D

I know that because I can read it with an amazing piece of software: LTF (Large Text Viewer) 8)
Furthermore, it eats this file in a few seconds without any problem :shock:
That's why I know my PC can read it... but not with the PB method :|

Notepad can fully read it too... incredible, no? :shock:
Sure, less quickly than LTF, but in a reasonable time all the same 8)
Kiffi
Addict
Posts: 1353
Joined: Tue Mar 02, 2004 1:20 pm
Location: Amphibios 9

Re: Faster load a very big text file in memory

Post by Kiffi »

Here is a short piece of code that loads the file into an in-memory SQLite database (for me it takes about 70 seconds):

Code: Select all

EnableExplicit

UseSQLiteDatabase()

Define Database
Define Line.s
Define Fields.s
Define Values.s
Define FF
Define T1, T2

T1 = ElapsedMilliseconds()

Database = OpenDatabase(#PB_Any, ":memory:", "", "", #PB_Database_SQLite)

Fields.s = "tconst, titleType, primaryTitle, originalTitle, isAdult, startYear, endYear, runtimeMinutes, genres"

DatabaseUpdate(Database, "Create Table titlebasics (" + fields + ")")

DatabaseUpdate(Database, "Begin Transaction")

FF = ReadFile(#PB_Any, "[YourPathTo]\title.basics.tsv\data.tsv")

If FF
  
  While Not Eof(FF)
    
    Line = ReplaceString(ReadString(FF), "'", "''")
    
    Values = "'" + ReplaceString(Line, #TAB$, "' , '") + "'"
    
    DatabaseUpdate(Database, "Insert Into titlebasics (" + Fields + ") Values (" + Values + ")" )
    
    If DatabaseError()
      Debug DatabaseError()
      Break
    EndIf
    
  Wend
  
  CloseFile(FF)
  
EndIf

DatabaseUpdate(Database, "Commit")

DatabaseQuery(Database, "Select Count(tconst) As Count From titlebasics")

NextDatabaseRow(Database)

Debug "Number of records: " + GetDatabaseLong(Database, 0)

T2 = ElapsedMilliseconds()

Debug "In approx. " + Str(T2-T1) + " msecs"

CloseDatabase(Database)
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Faster load a very big text file in memory

Post by Kwai chang caine »

Thanks a lot KIFFI 8)
That works, but apparently the MASTER's machine is like its owner :lol:
Kiffi wrote: (For me it takes about 70 seconds)
Kcc's machine is really slower to learn than the master's :mrgreen:
Here is my result:
Number of records: 9270356
In approx. 94004 msecs
Thanks again to both of you MASTERS 8)
miskox
User
Posts: 95
Joined: Sun Aug 27, 2017 7:37 pm
Location: Slovenia

Re: Faster load a very big text file in memory [Resolved]

Post by miskox »

My guess as to why LTF and Notepad load the file so quickly: they load *only* the part of the file that is currently displayed (plus a bit more, of course) and then load the rest while you are working with it (Total Commander's View/Lister (F3), for example, works like that).

At the DOS command prompt you can use this

Code: Select all

find /v /c "" *.txt
to count the records (I suspect these files are in UNIX format, so maybe Windows 10's find.exe can handle them correctly).

Saso
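
The "load only the displayed part" idea can be sketched in PureBasic as below. This is only an illustration under stated assumptions: `ReadWindow()`, `#WindowSize` and the file path are hypothetical names, and nothing here is taken from how LTF or Notepad are actually implemented. It seeks to a byte offset and decodes one fixed-size window instead of the whole file.

```purebasic
EnableExplicit

#WindowSize = 65536 ; bytes per window, an arbitrary choice

; Hypothetical helper: returns up to #WindowSize bytes of the file,
; starting at Offset, decoded as UTF-8. Returns "" on any failure.
; Note: a window boundary can cut a multi-byte UTF-8 character in two;
; a real viewer would align the window on line breaks.
Procedure.s ReadWindow(Path.s, Offset.q)
  Protected Result.s, BytesRead, *Buffer
  Protected File = ReadFile(#PB_Any, Path)

  If File
    *Buffer = AllocateMemory(#WindowSize)
    If *Buffer
      FileSeek(File, Offset)
      BytesRead = ReadData(File, *Buffer, #WindowSize)
      If BytesRead > 0
        ; #PB_ByteLength makes the length a byte count, not a character count
        Result = PeekS(*Buffer, BytesRead, #PB_UTF8 | #PB_ByteLength)
      EndIf
      FreeMemory(*Buffer)
    EndIf
    CloseFile(File)
  EndIf

  ProcedureReturn Result
EndProcedure

; Usage: show the first window of a (hypothetical) file.
Debug ReadWindow("D:\title.basics.tsv", 0)
```

Only 64 KB is ever held in memory per call, which would explain why such a viewer feels fluid regardless of the file size.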
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Faster load a very big text file in memory [Resolved]

Post by Kwai chang caine »

miskox wrote: because they load *only* part of the file that is currently displayed
It's possible :wink:
But what is certain is that it's really fluid :shock:
So much so that it looks as if there are only a few lines in the file
And it's only when you read the line count at the bottom that you get scared :shock: :lol:

Thanks for your advice 8)
ricardo_sdl
Enthusiast
Posts: 109
Joined: Sat Sep 21, 2019 4:24 pm

Re: Faster load a very big text file in memory [Resolved]

Post by ricardo_sdl »

How I've done it:

Code: Select all

#File_Buffer_RW_Size = 4096;in bytes

;reads the contents from pathinputfile into a memory buffer
;puts the size of the buffer in *BufferSize
;returns the buffer address, or #null if some error occurred
Procedure.i ReadFileToMemoryBuffer(PathInputFile.s, *BufferSize.Quad)
  Protected InputFile = ReadFile(#PB_Any, PathInputFile)
  If InputFile = 0
    ProcedureReturn #Null
  EndIf
  
  Protected InputFileSize.q = Lof(InputFile)
  Protected *FileBuffer = AllocateMemory(InputFileSize, #PB_Memory_NoClear)
  If *FileBuffer = 0
    CloseFile(InputFile)
    ProcedureReturn #Null
  EndIf
  
  Protected *FileBufferPos = *FileBuffer
  Protected TotalBytesRead.q = 0
  Protected BytesToRead = #File_Buffer_RW_Size
  
  Repeat
    Protected BytesLeftToReadOnFile.q = InputFileSize - TotalBytesRead
    If BytesLeftToReadOnFile >= #File_Buffer_RW_Size
      BytesToRead = #File_Buffer_RW_Size
    Else
      BytesToRead = BytesLeftToReadOnFile
    EndIf
    
    Protected BytesRead = ReadData(InputFile, *FileBufferPos, BytesToRead)
    *FileBufferPos + BytesRead
    TotalBytesRead + BytesRead
  Until Eof(InputFile)
  
  CloseFile(InputFile)
  *BufferSize\q = InputFileSize
  ProcedureReturn *FileBuffer
  
EndProcedure
Bitblazer
Enthusiast
Posts: 732
Joined: Mon Apr 10, 2017 6:17 pm
Location: Germany

Re: Faster load a very big text file in memory [Resolved]

Post by Bitblazer »

Sounds like a typical case for using a memory-mapped file and demand paging.
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Faster load a very big text file in memory [Resolved]

Post by Kwai chang caine »

Thanks to both of you for your tips 8)
Olli
Addict
Posts: 1071
Joined: Wed May 27, 2020 12:26 pm

Re: Faster load a very big text file in memory [Resolved]

Post by Olli »

My little doe,

for you, I will take all the risks, even writing nonsense. Let's try this:

Code: Select all

Fichier$ = "D:\Title.basics du 221004.tsv"
Canal = ReadFile(#PB_Any, Fichier$)

If Canal
 fmt = ReadStringFormat(Canal)
 TailleFichier = Lof(Canal) - Loc(Canal)
 *Ptr = AllocateMemory(TailleFichier)
 ReadData(Canal, *Ptr, TailleFichier)
 Donnee$ = PeekS(*Ptr, -1, fmt)
 FreeMemory(*Ptr)
 CloseFile(Canal)
 
Else

 MessageRequester("Erreur fichier", "Le fichier" + #CRLF$ + Fichier$ + #CRLF$ + "n'a pu être ouvert.")
 
EndIf
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Faster load a very big text file in memory [Resolved]

Post by Kwai chang caine »

Hello my Bambi friend :mrgreen:

First, thanks for your attempt and your kind message 8)

Veni, vidi,.......and perdidi :lol:

Still the same write error at address 0 at line 9
Thanks all the same for your help 8)