Find And Delete Duplicate Files

Post by collectordave »

Just a small utility to find and delete duplicate files in two folders.

Choose the base folder, then choose the folder to compare; duplicates found in the compare folder are listed.

Select individual files, or all of them, then click Delete Selected. The selected files are deleted and the list is updated.

Careful: there is no code yet to check for duplicate folders. I will be adding that later.

The utility ignores filenames and uses file size and MD5 fingerprint to identify duplicates, so it will detect copies where "(1)" etc. has been added to the name.

Code:

UseMD5Fingerprint()

Global Window_0

Global txtBaseFolder, strBaseFolder, btnBaseFolder, txtCompFolder, strCompFolder, btnCompFolder, lstDuplicates, btnSelectAll, btnClearSelection, btnDelete, btnDone

Global Event.i

Global BaseFolder.s,CompFolder.s

Structure File
  Name.s
  Size.q    ; quad: DirectoryEntrySize() returns a quad, a long overflows above 2 GB
  FPrint.s
EndStructure

Global NewList BaseFiles.File()
Global NewList CompFiles.File()

Procedure.s ChooseBaseFolder()
  
  Define InitialPath.s
  
  ClearGadgetItems(lstDuplicates)
  
  InitialPath = ""
  BaseFolder = PathRequester("Please choose Main Folder", InitialPath)
 
  ProcedureReturn BaseFolder

EndProcedure

Procedure.s ChooseCompareFolder()
  
  Define InitialPath.s
  
  ClearGadgetItems(lstDuplicates)
  
  InitialPath = ""
  CompFolder = PathRequester("Please choose Folder To Compare", InitialPath)
 
  ProcedureReturn CompFolder

EndProcedure

Procedure CheckBaseFolder()
  
  ClearList(BaseFiles())
  
  ; PathRequester() returns an empty string if the user cancelled
  If BaseFolder = ""
    ProcedureReturn
  EndIf
  
  If ExamineDirectory(0, BaseFolder, "*.*")
    While NextDirectoryEntry(0)
      If DirectoryEntryType(0) = #PB_DirectoryEntry_File
        
        ; Record full path, size and MD5 fingerprint of every file
        AddElement(BaseFiles())
        BaseFiles()\Name = BaseFolder + DirectoryEntryName(0)
        BaseFiles()\Size = DirectoryEntrySize(0)
        BaseFiles()\FPrint = FileFingerprint(BaseFiles()\Name, #PB_Cipher_MD5)
        
      EndIf
    Wend
    FinishDirectory(0)
  EndIf
  
  SortStructuredList(BaseFiles(), #PB_Sort_Ascending, OffsetOf(File\Size), TypeOf(File\Size))
  
EndProcedure

Procedure CheckCompFolder()
  
  ClearList(CompFiles())
  
  ; PathRequester() returns an empty string if the user cancelled
  If CompFolder = ""
    ProcedureReturn
  EndIf
  
  If ExamineDirectory(0, CompFolder, "*.*")
    While NextDirectoryEntry(0)
      If DirectoryEntryType(0) = #PB_DirectoryEntry_File
        
        ; Record full path, size and MD5 fingerprint of every file
        AddElement(CompFiles())
        CompFiles()\Name = CompFolder + DirectoryEntryName(0)
        CompFiles()\Size = DirectoryEntrySize(0)
        CompFiles()\FPrint = FileFingerprint(CompFiles()\Name, #PB_Cipher_MD5)
        
      EndIf
    Wend
    FinishDirectory(0)
  EndIf
  
  SortStructuredList(CompFiles(), #PB_Sort_Ascending, OffsetOf(File\Size), TypeOf(File\Size))
  
EndProcedure

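Procedure CompareLists()
  
  ClearGadgetItems(lstDuplicates)
  
  ; Outer loop over the compare folder so that each duplicate is listed
  ; only once, even if several base files have the same content
  ForEach CompFiles()
    
    ForEach BaseFiles()
      
      If BaseFiles()\Size = CompFiles()\Size
        If BaseFiles()\FPrint = CompFiles()\FPrint
          
          AddGadgetItem(lstDuplicates, -1, CompFiles()\Name)
          Break ; one match is enough for this compare file
          
        EndIf
      EndIf
      
    Next
    
  Next
  
EndProcedure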

  
  Window_0 = OpenWindow(#PB_Any, 0, 0, 650, 400, "", #PB_Window_SystemMenu)
  txtBaseFolder = TextGadget(#PB_Any, 20, 20, 100, 20, "Main Folder")
  strBaseFolder = StringGadget(#PB_Any, 130, 20, 470, 25, "")
  btnBaseFolder = ButtonGadget(#PB_Any, 600, 20, 40, 25, "...")
  txtCompFolder = TextGadget(#PB_Any, 20, 60, 100, 20, "Comp Folder")
  strCompFolder = StringGadget(#PB_Any, 130, 60, 470, 25, "")
  btnCompFolder = ButtonGadget(#PB_Any, 600, 60, 40, 25, "...")
  lstDuplicates = ListIconGadget(#PB_Any, 130, 100, 510, 280, "", 20, #PB_ListIcon_CheckBoxes|#PB_ListIcon_GridLines)
  btnSelectAll = ButtonGadget(#PB_Any, 5, 120, 120, 25, "Select All")
  btnClearSelection = ButtonGadget(#PB_Any, 5, 150, 120, 25, "Clear Selected")
  btnDelete = ButtonGadget(#PB_Any, 5, 300, 120, 25, "Delete Selected")
  btnDone = ButtonGadget(#PB_Any, 5, 360, 120, 25, "Done")

  AddGadgetColumn(lstDuplicates, 0, "FileName", 480)
  RemoveGadgetColumn(lstDuplicates,1)
 
  Repeat
    
    
    Event = WaitWindowEvent()
    
    
  Select Event
    Case #PB_Event_CloseWindow
      End

    Case #PB_Event_Gadget
      Select EventGadget()
          
        Case btnSelectAll
          
          For x = 0 To CountGadgetItems(lstDuplicates) -1
            SetGadgetItemState(lstDuplicates,x, #PB_ListIcon_Checked  )
          Next
  
        Case btnClearSelection
            
          For x = 0 To CountGadgetItems(lstDuplicates) -1
            SetGadgetItemState(lstDuplicates, x, 0) ; 0 clears the checked state
          Next
                     
        Case btnDelete
          
          For x = 0 To CountGadgetItems(lstDuplicates) -1
            
            If GetGadgetItemState(lstDuplicates,x) & #PB_ListIcon_Checked

              DeleteFile(GetGadgetItemText(lstDuplicates,x,0))
              
            EndIf
        
          Next
          ;Reset File List
          CheckCompFolder()
          CompareLists()

          
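        Case btnBaseFolder
            
          SetGadgetText(strBaseFolder,ChooseBaseFolder())
          CheckBaseFolder()
          CompareLists() ; refresh the list if a compare folder is already chosen
            
        Case btnCompFolder
            
          SetGadgetText(strCompFolder,ChooseCompareFolder())           
          CheckCompFolder()
          CompareLists()
          
        Case btnDone
          
          End ; the Done button previously had no handler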
          
      EndSelect
      
    EndSelect
  
  ForEver
CD
Any intelligent fool can make things bigger and more complex. It takes a touch of genius — and a lot of courage to move in the opposite direction.

Re: Find And Delete Duplicate Files

Post by infratec »

I think speed is an important point for this program.
For that, it would be much better to calculate the fingerprint only when the sizes are identical.
Otherwise you are just wasting time.
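
A minimal sketch of that idea, not the utility's actual code: the fingerprint field starts out empty and is only filled in the first time two files of equal size have to be compared. The names FileInfo, GetFingerprint() and IsDuplicate() are made up for illustration, and a failed fingerprint (empty string) is assumed to mean "not a duplicate".

```purebasic
; Sketch only: FileInfo, GetFingerprint() and IsDuplicate() are hypothetical
; names and are not part of the utility posted above.
UseMD5Fingerprint()

Structure FileInfo
  Name.s
  Size.q
  FPrint.s   ; stays empty until the fingerprint is actually needed
EndStructure

; Return the MD5 fingerprint, calculating it on first use and caching it
Procedure.s GetFingerprint(*Entry.FileInfo)
  If *Entry\FPrint = ""
    *Entry\FPrint = FileFingerprint(*Entry\Name, #PB_Cipher_MD5)
  EndIf
  ProcedureReturn *Entry\FPrint
EndProcedure

; Compare sizes first, so files of different size are never hashed
Procedure.i IsDuplicate(*A.FileInfo, *B.FileInfo)
  Protected FPrintA.s, FPrintB.s
  
  If *A\Size <> *B\Size
    ProcedureReturn #False
  EndIf
  
  FPrintA = GetFingerprint(*A)
  FPrintB = GetFingerprint(*B)
  
  ; Assumption: an empty fingerprint means the file could not be read
  If FPrintA <> "" And FPrintA = FPrintB
    ProcedureReturn #True
  EndIf
  
  ProcedureReturn #False
EndProcedure
```

With both lists declared as FileInfo lists, the test inside a nested ForEach could then be written as IsDuplicate(@BaseFiles(), @CompFiles()), since @ on a list gives the address of its current element. The directory scan would only need to store name and size.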

Re: Find And Delete Duplicate Files

Post by collectordave »

Hi infratec

Speed will be important as I want to get it to handle sub folders as well later.
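
A rough sketch of how sub folders might be handled, under some assumptions: ScanFolder() and FoundFile are hypothetical names, the folder string is assumed to end with a path separator (as PathRequester() returns it), and #PB_Any is used for the directory number so that each recursion level gets its own handle instead of reusing 0.

```purebasic
; Sketch only: FoundFile and ScanFolder() are hypothetical names.
Structure FoundFile
  Name.s
  Size.q
EndStructure

; Add every file below Folder (including sub folders) to Files()
Procedure ScanFolder(Folder.s, List Files.FoundFile())
  Protected Dir, EntryName.s
  
  ; #PB_Any: each recursion level needs its own directory number
  Dir = ExamineDirectory(#PB_Any, Folder, "")
  If Dir
    While NextDirectoryEntry(Dir)
      EntryName = DirectoryEntryName(Dir)
      Select DirectoryEntryType(Dir)
          
        Case #PB_DirectoryEntry_File
          AddElement(Files())
          Files()\Name = Folder + EntryName
          Files()\Size = DirectoryEntrySize(Dir)
          
        Case #PB_DirectoryEntry_Directory
          ; Skip the "." and ".." entries to avoid endless recursion
          If EntryName <> "." And EntryName <> ".."
            ScanFolder(Folder + EntryName + #PS$, Files())
          EndIf
          
      EndSelect
    Wend
    FinishDirectory(Dir)
  EndIf
EndProcedure

; Example call: collect all files below the current directory
NewList Found.FoundFile()
ScanFolder(GetCurrentDirectory(), Found())
Debug Str(ListSize(Found())) + " files found"
```

Fingerprints are left out here on purpose; with many sub folders the number of files grows quickly, which is where calculating them only for equal sizes should pay off.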

I will check and get it to calculate the fingerprint only for files of equal size, at compare time, as you say.

CD