[Done] SQLite - the basis

Just starting out? Need help? Post your questions and find answers here.
ZX80
Enthusiast
Enthusiast
Posts: 331
Joined: Mon Dec 12, 2016 1:37 pm

[Done] SQLite - the basis

Post by ZX80 »

Hello, everyone.

Please tell me how you can create a simple database. I think I need only one table inside and two columns(both type are string). The first is the full path to the file, the second is md5-hash. I already have all data in the list(json-file). And now I would like to transfer them to the database. What is goal? I want to find all duplicate files by hash data. So I decided to use sqlite. Now the list has this structure:

Code: Select all

Structure LogData
  FullPath.s
  FSize.q
  FHash.s
EndStructure

Global NewList FObjects.LogData()
I think I need something like this code:

Code: Select all

UseSQLiteDatabase()
dbfile$ = "db.sqlite"
 
If CreateFile(0, dbfile$)
  CloseFile(0)
  glDBSQLite = OpenDatabase(0, dbfile$, "", "", #PB_Database_SQLite)
  If glDBSQLite
    psSQLRequest.s = "CREATE TABLE IF NOT EXISTS fobjects ("
    psSQLRequest + "id_fobject INTEGER PRIMARY KEY AUTOINCREMENT Not NULL, "
    psSQLRequest + "fobject_fpath TEXT Not NULL, "
    psSQLRequest + "fobject_hash TEXT Not NULL, "
    psSQLRequest + "fobject_size INTEGER"
    psSQLRequest + ")"
    If DatabaseUpdate(glDBSQLite, psSQLRequest)=0
      MessageRequester("error", "can't create the table"+#CRLF$+DatabaseError())
    Else
      FinishDatabaseQuery(glDBSQLite)

;       FirstDatabaseRow(0)
;       ForEach FObjects()
;         SetDatabaseString(glDBSQLite, 0, FObjects()\FullPath)
;         SetDatabaseString(glDBSQLite, 1, FObjects()\FHash)
;         NextDatabaseRow(glDBSQLite)
;       Next
      
      ForEach FObjects()
        psSQLRequest = "INSERT INTO fobjects"
        psSQLRequest + "(fobject_fpath, fobject_hash, fobject_size)"
        psSQLRequest + "VALUES ("
        psSQLRequest + "'"+FObjects()\FullPath+"',"
        psSQLRequest + "'"+FObjects()\FHash+"',"
        psSQLRequest + "'"+FObjects()\FSize+"')"
        If DatabaseUpdate(glDBSQLite, psSQLRequest) = 0
          MessageRequester("error", "can't insert data into the table"+#CRLF$+DatabaseError())
        EndIf
      Next
      CloseDatabase(glDBSQLite)
    EndIf
  Else
    MessageRequester("error", "can't open the table"+#CRLF$+DatabaseError())
  EndIf
Else
  MessageRequester("error", "can't create the db-file")
EndIf
Please, help me.
Last edited by ZX80 on Mon Jun 01, 2020 6:08 pm, edited 1 time in total.
infratec
Always Here
Always Here
Posts: 6883
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: SQLite - the basis

Post by infratec »

If you want only find them temporarily, you can use SQLite in memory, which is much faster.

Code: Select all

EnableExplicit

Structure LogData
  FullPath.s
  FSize.q
  FHash.s
EndStructure

Define.i glDBSQLite
Define.s psSQLRequest
Define NewList FObjects.LogData()

AddElement(FObjects())
FObjects()\FullPath = "123.txt"
FObjects()\FSize = 123
FObjects()\FHash = "H123"

AddElement(FObjects())
FObjects()\FullPath = "456.txt"
FObjects()\FSize = 456
FObjects()\FHash = "H456"

AddElement(FObjects())
FObjects()\FullPath = "123.txt"
FObjects()\FSize = 123
FObjects()\FHash = "H123"


UseSQLiteDatabase()

glDBSQLite = OpenDatabase(#PB_Any, ":memory:", "", "", #PB_Database_SQLite)
If glDBSQLite
  psSQLRequest = "CREATE TABLE IF NOT EXISTS fobjects ("
  psSQLRequest + "id INTEGER PRIMARY KEY AUTOINCREMENT Not NULL, "
  psSQLRequest + "fpath TEXT Not NULL, "
  psSQLRequest + "hash TEXT Not NULL, "
  psSQLRequest + "size INTEGER"
  psSQLRequest + ")"
  If DatabaseUpdate(glDBSQLite, psSQLRequest)=0
    MessageRequester("error", "can't create the table"+#CRLF$+DatabaseError())
  Else
    
    ForEach FObjects()
      psSQLRequest = "INSERT INTO fobjects"
      psSQLRequest + "(fpath, hash, size)"
      psSQLRequest + "VALUES ("
      psSQLRequest + "'"+FObjects()\FullPath+"',"
      psSQLRequest + "'"+FObjects()\FHash+"',"
      psSQLRequest + "'"+FObjects()\FSize+"')"
      If DatabaseUpdate(glDBSQLite, psSQLRequest) = 0
        MessageRequester("error", "can't insert data into the table"+#CRLF$+DatabaseError())
      EndIf
    Next
    
    
    psSQLRequest = "SELECT * FROM fobjects"
    If DatabaseQuery(glDBSQLite, psSQLRequest)
      
      While NextDatabaseRow(glDBSQLite)
        Debug GetDatabaseString(glDBSQLite, 1) + " " + GetDatabaseString(glDBSQLite, 2) + " " + GetDatabaseString(glDBSQLite, 3)
      Wend
      
      FinishDatabaseQuery(glDBSQLite)
    Else
      Debug psSQLRequest
      Debug DatabaseError()
    EndIf
    
    
    psSQLRequest = "SELECT a.id, b.id, a.fpath, b.fpath, a.hash FROM fobjects a, fobjects b WHERE a.hash = b.hash AND a.id <> b.id"
    If DatabaseQuery(glDBSQLite, psSQLRequest)
      
      While NextDatabaseRow(glDBSQLite)
        Debug GetDatabaseString(glDBSQLite, 0) + " - " + GetDatabaseString(glDBSQLite, 1) + " " + GetDatabaseString(glDBSQLite, 2) + " " + GetDatabaseString(glDBSQLite, 3)  + " " + GetDatabaseString(glDBSQLite, 4)
      Wend
      
      FinishDatabaseQuery(glDBSQLite)
    Else
      Debug psSQLRequest
      Debug DatabaseError()
    EndIf
      
  EndIf
  
  CloseDatabase(glDBSQLite)
Else
  MessageRequester("error", "can't create the db-file")
EndIf
But why a list when you want to use SQLite :?:
Then you can use SQLite instead of the list :wink:
User avatar
spikey
Enthusiast
Enthusiast
Posts: 590
Joined: Wed Sep 22, 2010 1:17 pm
Location: United Kingdom

Re: SQLite - the basis

Post by spikey »

Or alternatively - why use SQLite at all when you can get the result you want from the list directly? :D

Code: Select all

Structure LogData
  FullPath.s
  FSize.q
  FHash.s
EndStructure

NewList FObjects.LogData()
Define.S LastHash

AddElement(FObjects())
FObjects()\FullPath = "123.txt"
FObjects()\FSize = 123
FObjects()\FHash = "H123"

AddElement(FObjects())
FObjects()\FullPath = "456.txt"
FObjects()\FSize = 456
FObjects()\FHash = "H456"

AddElement(FObjects())
FObjects()\FullPath = "123.txt"
FObjects()\FSize = 123
FObjects()\FHash = "H123"

; Sort the list into ascending order of hash.
SortStructuredList(FObjects(), #PB_Sort_Ascending, OffsetOf(LogData\FHash), #PB_String)

; Iterate list - any hash which appears more than once is a duplicated item.
LastHash = #Empty$
ForEach FObjects()
  If FObjects()\FHash = LastHash
    ; This hash appeared before.
    Debug FObjects()\FullPath + " is a duplicate item."
  Else
    ; This is a new hash.
    LastHash = FObjects()\FHash
  EndIf
Next FObjects()
infratec
Always Here
Always Here
Posts: 6883
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: SQLite - the basis

Post by infratec »

As always ... it depends ...

If the list is small, SQLite make no sense. That's true.
ZX80
Enthusiast
Enthusiast
Posts: 331
Joined: Mon Dec 12, 2016 1:37 pm

Re: [Done] SQLite - the basis

Post by ZX80 »

Dear, infratec :!:

I have no words...

You are winner! You confirm it again and again. Your code is really very fast! Incredible! What else can I say... ? Cool!

I know that this forum also has other very good specialists. And I really don't know, why they didn't answer my question. Maybe the question was too simple? I dont know... But... And therefore... in this case, all my congratulations are just for you!

This is exactly what I need. Thank you so much for your help! The code works as expected. I changed it a little bit for my needs. I hope, that I didn't destroy it (about speed processing). Because I used some string manipulation. Especially... As we know... assignment operation for string variable... some string... And checking the condition of such a variable using a specific string... It eats a lot of time. And this is unforgivable. Inadmissible luxury. But thanks to your experience, I had this stock. I hope that I did not completely spend it. And found a compromise(balance) between speed and simplicity of code. Here I also used functions, such as: "StringField" and "Val" - inverse conversion of a string to a number (structure field - size). I hope you were not upset when you heard about it. And all your efforts have turned into a slow thing. Sorry if so. But I don't have much experience. It may be easy for you, but I rejoice for every tip from the professionals ... as a child. :oops:

Also I had to use a map to format the result. And copy back to list. Perhaps, you do not understand me now. But I think that you already understood what was going on. And here the mk-soft code really helped me (look here). Thanks to him for that.


spikey, you are absolutely right! Thank you so much too :!:

I already looked at about the same code (here). c4s, thank you!
But in my case sqlite will be much more powerful and flexible. I apologize to both of you for late reply. :oops:
But better late than never.

Returning to the question of the appropriateness of using a list.
But why a list when you want to use SQLite :?:
Because the resulting list is formed by reading multiple files. Each such file = one storage medium. I can
not afford to scan several disks at once pass (in queue). There is a high probability of failure. And this
is a very long time.

I understand that my next question is for another topic, but I wanted to ask you:
what's the best way to count md5 for large files?

I saw that for such files, the calculation is done in parts. But I think that don't need it. Instead, I wanted to use byte comparison of files(compare by content). infratec, thank you again! Is this the right trick or not? Of course, such files will not be written to the list.
But for small files, I also wanted to use byte comparison. What for? To avoid a collision. I am afraid that
this will all work very slowly.

Anyway, thank you again. For all.


P.S. I do not consider calculating hashes in multiple threads. Although I know this is a little faster. But this is a big headache. :shock:

[Added a bit later]
I read this topic. nco2k, thank you!
infratec
Always Here
Always Here
Posts: 6883
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: [Done] SQLite - the basis

Post by infratec »

I would use simply the FileFingerprint() procedure.
But ...

You can use more than 1 thread.
Which makes it faster over all.
ZX80
Enthusiast
Enthusiast
Posts: 331
Joined: Mon Dec 12, 2016 1:37 pm

Re: [Done] SQLite - the basis

Post by ZX80 »

infratec

I already did this with FileFingerprint(). Yes, that’s easy.
Thanks again for your attention to this topic.
You can use more than 1 thread.
Which makes it faster over all.
But then I have to use mutex, right?
As already mentioned in one of the topics to which I gave links (nco2k wrote):
but i think the bigger bottleneck will be the hdd in this case.
Moreover, we are talking about flash drives (including). I don't think that the profits will be big. But significantly complicate the code.
Last edited by ZX80 on Mon Jun 01, 2020 7:03 pm, edited 1 time in total.
User avatar
skywalk
Addict
Addict
Posts: 4003
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [Done] SQLite - the basis

Post by skywalk »

Curious?
How are the multiple threads deployed?
Aside from the main gui thread, is there a master thread building the file list and then sending the filefingerprint() request to multiple threads?
The number of threads are set by number of available cores?
Or let the O/S decide?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
ZX80
Enthusiast
Enthusiast
Posts: 331
Joined: Mon Dec 12, 2016 1:37 pm

Re: [Done] SQLite - the basis

Post by ZX80 »

skywalk
Sorry, but I didn’t understand for whom your question?
If to me, then I do not plan to use GUI. In any case, at first. I only need a list.
Maybe this is bad. Because there is no way to manage the program. Abort at the request of the user this lengthy process. But if you have some time for me, then I will be glad to see your offer.

Sorry, but now I have to go.
User avatar
skywalk
Addict
Addict
Posts: 4003
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [Done] SQLite - the basis

Post by skywalk »

My question is about setting up the threads mentioned in this topic?
I mentioned gui, since that is most often the main thread.
Why would you resist a simple gui? In PB, they are tiny amounts of code.
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
infratec
Always Here
Always Here
Posts: 6883
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: [Done] SQLite - the basis

Post by infratec »

For my directory it needs 9 seconds without threads and 6 seconds with threads.

Code: Select all

EnableExplicit

Enumeration #PB_Event_FirstCustomValue
  #FingerPrintThreadEvent
EndEnumeration

Structure FileFingerPrintStructure
  Filename$
  MD5$
EndStructure


Procedure FingerPrintThread(*FileFingerPrint.FileFingerPrintStructure)
  *FileFingerPrint\MD5$ = FileFingerprint(*FileFingerPrint\Filename$, #PB_Cipher_MD5)
  PostEvent(#FingerPrintThreadEvent)
EndProcedure


Define.i ThreadCounter, Event, StartTime
Define Directory$
Define *Element
Define NewList FileList.FileFingerPrintStructure()

UseMD5Fingerprint()

;Directory$ = GetHomeDirectory() ; Listet alle Dateien und Ordner im 'Home'-Verzeichnis auf
Directory$ = "c:\tmp\"
If ExamineDirectory(0, Directory$, "*.*")
  While NextDirectoryEntry(0)
    If DirectoryEntryType(0) = #PB_DirectoryEntry_File
      *Element = AddElement(FileList())
      If *Element
        FileList()\Filename$ = Directory$ + DirectoryEntryName(0)
      EndIf
    EndIf
  Wend
  FinishDirectory(0)
EndIf





OpenWindow(0, 0, 0, 400, 300, "Test", #PB_Window_ScreenCentered|#PB_Window_SystemMenu)


StartTime = ElapsedMilliseconds()
ForEach FileList()
  FileFingerprint(FileList()\Filename$, #PB_Cipher_MD5)
Next
Debug ElapsedMilliseconds() - StartTime


StartTime = ElapsedMilliseconds()
ForEach FileList()
  CreateThread(@FingerPrintThread(), @FileList())
  ThreadCounter + 1
Next

Repeat
  Event = WaitWindowEvent()
  
  Select Event
    Case #FingerPrintThreadEvent
      ThreadCounter - 1
      If ThreadCounter = 0
        
        Debug ElapsedMilliseconds() - StartTime
        
        ForEach FileList()
          Debug FileList()\Filename$ + " " + FileList()\MD5$
        Next
      EndIf
      
  EndSelect
  
Until Event = #PB_Event_CloseWindow
User avatar
skywalk
Addict
Addict
Posts: 4003
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [Done] SQLite - the basis

Post by skywalk »

Interesting, but I thought true time measurements only without debugger?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
ZX80
Enthusiast
Enthusiast
Posts: 331
Joined: Mon Dec 12, 2016 1:37 pm

Re: [Done] SQLite - the basis

Post by ZX80 »

>skywalk
Why would you resist a simple gui? In PB, they are tiny amounts of code.
I know. I know.
I'm certainly a beginner, but not so much as to be unable to create a window :mrgreen:
Thanks :!:



>infratec

Is it safely? I don't think so. You create a new thread for each file. :shock: :shock: :shock:

And if there are thousands of files? Especially in your code there is no recursion. But it is easy to add. The question was different. Whether the program will freeze in this case? I doubt this is a good point. You probably expected that most files would be small size. But we do not know this in advance. Maybe the user will scan a disk which consist of only films. Even if these are 200 MB files, then your plan will fail. Namely: calculating the hash-sum for a small file will take a little/short time and then it's not scary to create a lot of threads. Because the created threads will be completed almost immediately after launch.

:idea: I would split a common list for 10 files per thread (minimum :!:).

To be honest, I never created more than two threads (to be precise, there is only one additional stream). As skywalk already said: main-thread for GUI, which will be created automatically after launch your program; and second-thread for my work. Because all of them need to be closely controlled. To avoid memory leak.

Anyway, thanks for your participation again :!:
I will think further.

P.S. As an option(alternatively) I can use the BarryG-code (from here) :wink:
User avatar
skywalk
Addict
Addict
Posts: 4003
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: [Done] SQLite - the basis

Post by skywalk »

Threading is a complex topic and should be expanded in another post.
Using threads is easily understood, but efficiently deploying them to reduce a complex task requires more analysis. At least for my simple brain. :oops:
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
infratec
Always Here
Always Here
Posts: 6883
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: [Done] SQLite - the basis

Post by infratec »

You have to enable Thread Safe in the compileroptions.

The internal procedure FileFingerprint() should be threadsafe.
And since I use a pointer to the list element there can be no conflict for write accesses.
The Counter itself is reduced via a PostEvent, which is handled 'serial' in the main thread.
So there is also nothing to fear.

And since the stuff is done in threads, you can still move the window or even close it,
which would not be possible if you don't use threads (at least one)

Ok, you can use a limit for the threads.
And if they went down, to maybe 10, you can start another 20 threads.

This was only an example how to use it safely.
Post Reply