Search for duplicate files

Applications, Games, Tools, User libs and useful stuff coded in PureBasic
AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Search for duplicate files

Post by AZJIO »

Search duplicates

Download: yandex upload.ee

[Screenshot on Linux]

1. In the context menu, you can select deletion priority levels.
2. You can save the results as CSV and later use the CSV list.
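Below is a minimal sketch of how a deletion priority could be applied to one group of duplicates. It is not the program's actual code. It assumes, hypothetically, that the priority levels are an ordered list of folder prefixes, that the copy in the highest-priority folder is kept, and that all other copies are marked for deletion. `PriorityRank()` and `MarkForDeletion()` are illustrative names.

```purebasic
EnableExplicit

; Hypothetical: returns the index of the first folder prefix that Path$ starts with
; (0 = highest priority), or ListSize(Prefix()) if no prefix matches.
; The comparison is case-sensitive, which suits Linux paths; use LCase() on Windows.
Procedure PriorityRank(Path$, List Prefix.s())
	Protected rank = 0
	ForEach Prefix()
		If Left(Path$, Len(Prefix())) = Prefix()
			ProcedureReturn rank
		EndIf
		rank + 1
	Next
	ProcedureReturn rank
EndProcedure

; Keeps the file with the lowest rank (the first one wins a tie) and appends
; every other path of the group to ToDelete().
Procedure MarkForDeletion(List Group.s(), List Prefix.s(), List ToDelete.s())
	Protected keep$, bestRank = -1, rank
	ForEach Group()
		rank = PriorityRank(Group(), Prefix())
		If bestRank = -1 Or rank < bestRank
			bestRank = rank
			keep$ = Group()
		EndIf
	Next
	ForEach Group()
		If Group() <> keep$
			AddElement(ToDelete())
			ToDelete() = Group()
		EndIf
	Next
EndProcedure

; Usage: two copies of one movie; the copy under /home/user/Video/ is kept
NewList Group.s()
NewList Prefix.s()
NewList ToDelete.s()
AddElement(Group()) : Group() = "/home/user/Downloads/movie.mkv"
AddElement(Group()) : Group() = "/home/user/Video/movie.mkv"
AddElement(Prefix()) : Prefix() = "/home/user/Video/"
MarkForDeletion(Group(), Prefix(), ToDelete())
ForEach ToDelete()
	Debug "delete: " + ToDelete()
Next
```

Ranking by prefix order means that adding more priority levels only means adding more entries to `Prefix()`, which fits the "unlimited priority levels" idea in the changelog, though the real program may store them differently.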

Updated:
Added unlimited priority level selection
Added removal of an item from the search box
Added group item color (set in the ini file)
Added the PseudoHashSize parameter to the ini file
Added saving results to a file (to compare Linux and Windows results)
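A minimal sketch of reading the PseudoHashSize setting with PureBasic's Preferences library and turning it into a sampling step. Only the key name `PseudoHashSize` comes from the changelog. The group name `Settings`, the file name `settings.ini`, the default of 32 samples and the helper names are assumptions for illustration.

```purebasic
EnableExplicit

; Hypothetical: the group name "Settings" and the default of 32 samples are assumptions;
; only the key name PseudoHashSize comes from the changelog.
Procedure ReadPseudoHashSize(IniPath$, DefaultSize = 32)
	Protected value = DefaultSize
	If OpenPreferences(IniPath$)
		PreferenceGroup("Settings")
		value = ReadPreferenceInteger("PseudoHashSize", DefaultSize)
		ClosePreferences()
	EndIf
	If value < 2            ; at least 2 samples, so the step below never divides by zero
		value = DefaultSize
	EndIf
	ProcedureReturn value
EndProcedure

; With N samples the step between reads is about Size / (N - 1);
; the PseudoHash example in this thread uses N = 32, i.e. FileSize(Path$) / 31.
Procedure.q PseudoHashShift(Size.q, Samples)
	ProcedureReturn Size / (Samples - 1)
EndProcedure

; Usage: "settings.ini" next to the executable is a hypothetical file name
Define Samples = ReadPseudoHashSize(GetPathPart(ProgramFilename()) + "settings.ini")
Debug "PseudoHashSize = " + Str(Samples)
Debug "Step for a 3100-byte file = " + Str(PseudoHashShift(3100, Samples))
```

Falling back to the default when the key is missing or too small keeps a damaged ini file from producing a zero divisor.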
Last edited by AZJIO on Fri Jul 01, 2022 9:07 pm, edited 7 times in total.
Kwai chang caine
Always Here
Posts: 5342
Joined: Sun Nov 05, 2006 11:42 pm
Location: Lyon - France

Re: Search for duplicate files

Post by Kwai chang caine »

Thanks for sharing 8)
The happiness is a road...
Not a destination
AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Re: Search for duplicate files

Post by AZJIO »

To speed up the preliminary comparison of files, I divide the file length into 32 sections and read one byte from each section, 32 reads in total. Now, if a group contains 200 files of the same size, then instead of calculating the MD5 of large files with a total size of 100 GB, I read 32 bytes from each file. This is about 10 times faster. Only if this preliminary comparison still suggests that the files are identical do I calculate the MD5.
I have added the source code with the PseudoHash prefix.

Code:

DisableDebugger
EnableExplicit
UseMD5Fingerprint()
Define Path$, StartTime.q, Res.s, md5$

; Samples one byte about every Shift bytes (starting at offset 4) plus the last byte,
; and returns the samples as a hex string. Files of the same size whose
; pseudo-hashes differ cannot be identical, so MD5 is only needed when they match.
Procedure.s GetPseudoHash(Path$, Shift.q)
	Protected res$, length.q, file_id
	file_id = ReadFile(#PB_Any, Path$)
	If file_id
		length = Lof(file_id)                  ; .q so files over 2 GB work on 32-bit builds
		If length > 0                          ; an empty file has no last byte to read
			FileSeek(file_id, 4, #PB_Relative)
			While Eof(file_id) = 0
				res$ + Hex(ReadByte(file_id), #PB_Byte)
				FileSeek(file_id, Shift, #PB_Relative)
			Wend
			FileSeek(file_id, length - 1, #PB_Absolute)
			res$ + Hex(ReadByte(file_id), #PB_Byte)
		EndIf
		CloseFile(file_id)
	EndIf
	ProcedureReturn res$
EndProcedure

; Time the pseudo-hash: a step of FileSize / 31 gives about 32 samples
Path$ = "path_to_video"
StartTime = ElapsedMilliseconds()
md5$ = GetPseudoHash(Path$, FileSize(Path$) / 31)
Res = "hash time = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("hash_0", md5$ + #LF$ + #LF$ + Res)

; For comparison, time a full MD5 of another file of the same size
Path$ = "path_to_movie_of_the_same_size_but_different_hash"
StartTime = ElapsedMilliseconds()
md5$ = FileFingerprint(Path$, #PB_Cipher_MD5)
Res = "hash time md5 = " + Str(ElapsedMilliseconds() - StartTime) + " ms"
MessageRequester("md5", md5$ + #LF$ + #LF$ + Res)
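The staged comparison described above (same size, then pseudo-hash, then MD5) can be sketched as follows. This is a minimal illustration, not the program's actual code: the `FileGroup` structure and the `AddToGroup()`, `Refine()` and `FindDuplicates()` procedures are hypothetical names, and `GetPseudoHash()` is repeated so the block compiles on its own. Each stage only re-examines groups that still contain two or more files, so the full MD5 is computed only for files that survived the cheaper checks.

```purebasic
EnableExplicit
UseMD5Fingerprint()

; Hypothetical container: one group of files that share a key
Structure FileGroup
	List Path.s()
EndStructure

; Same sampling as GetPseudoHash() in the post, repeated so this block compiles on its own
Procedure.s GetPseudoHash(Path$, Shift.q)
	Protected res$, length.q, file_id
	file_id = ReadFile(#PB_Any, Path$)
	If file_id
		length = Lof(file_id)
		If length > 0
			FileSeek(file_id, 4, #PB_Relative)
			While Eof(file_id) = 0
				res$ + Hex(ReadByte(file_id), #PB_Byte)
				FileSeek(file_id, Shift, #PB_Relative)
			Wend
			FileSeek(file_id, length - 1, #PB_Absolute)
			res$ + Hex(ReadByte(file_id), #PB_Byte)
		EndIf
		CloseFile(file_id)
	EndIf
	ProcedureReturn res$
EndProcedure

; Appends Path$ to the group stored under Key$, creating the group if needed
Procedure AddToGroup(Map Groups.FileGroup(), Key$, Path$)
	If FindMapElement(Groups(), Key$) = 0
		AddMapElement(Groups(), Key$)
	EndIf
	AddElement(Groups()\Path())
	Groups()\Path() = Path$
EndProcedure

; Splits every group that still has 2+ files by a finer key:
; Stage 1 = size + pseudo-hash (cheap), Stage 2 = size + MD5 (reads the whole file)
Procedure Refine(Map Groups.FileGroup(), Map Refined.FileGroup(), Stage)
	Protected path$, size.q, key$
	ForEach Groups()
		If ListSize(Groups()\Path()) > 1
			ForEach Groups()\Path()
				path$ = Groups()\Path()
				size = FileSize(path$)
				If Stage = 1
					key$ = Str(size) + "|" + GetPseudoHash(path$, size / 31)
				Else
					key$ = Str(size) + "|" + FileFingerprint(path$, #PB_Cipher_MD5)
				EndIf
				AddToGroup(Refined(), key$, path$)
			Next
		EndIf
	Next
EndProcedure

; Size -> pseudo-hash -> MD5; groups in Result() with 2+ paths are duplicates
Procedure FindDuplicates(List Files.s(), Map Result.FileGroup())
	NewMap BySize.FileGroup()
	NewMap ByPseudo.FileGroup()
	ForEach Files()
		AddToGroup(BySize(), Str(FileSize(Files())), Files())
	Next
	Refine(BySize(), ByPseudo(), 1)
	Refine(ByPseudo(), Result(), 2)
EndProcedure

; Usage: fill Files() with paths (e.g. from a directory scan), then list the groups
NewList Files.s()
NewMap Dupes.FileGroup()
FindDuplicates(Files(), Dupes())
ForEach Dupes()
	If ListSize(Dupes()\Path()) > 1
		Debug "Duplicate group:"
		ForEach Dupes()\Path()
			Debug "  " + Dupes()\Path()
		Next
	EndIf
Next
```

Keeping the size in every key means a pseudo-hash or MD5 is only ever compared between files of equal length, which is the only case where they can be identical.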
IceSoft
Addict
Posts: 1616
Joined: Thu Jun 24, 2004 8:51 am
Location: Germany

Re: Search for duplicate files

Post by IceSoft »

I got the warning:
Couldn't download - Virus detected

Can you also provide the source only?
Believe!
<Wrapper>4PB, PB<game>, =QONK=, PetriDish, Movie2Image, PictureManager,...
AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Re: Search for duplicate files

Post by AZJIO »

https://disk.yandex.ru/d/QvQ5oqebC69uZA
Will the antivirus allow you to compile?
IceSoft
Addict
Posts: 1616
Joined: Thu Jun 24, 2004 8:51 am
Location: Germany

Re: Search for duplicate files

Post by IceSoft »

AZJIO wrote: Tue Jun 28, 2022 3:59 pm
https://disk.yandex.ru/d/QvQ5oqebC69uZA
Will the antivirus allow you to compile?
Sure, I can stop it.
I see the source and can trust it
AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Re: Search for duplicate files

Post by AZJIO »

IceSoft wrote: Tue Jun 28, 2022 6:37 pm
I see the source and can trust it
At the moment, all my projects contain the source, even the archive in which you saw the virus. My free Kaspersky antivirus says that there is no virus in the file.
IceSoft
Addict
Posts: 1616
Joined: Thu Jun 24, 2004 8:51 am
Location: Germany

Re: Search for duplicate files

Post by IceSoft »

AZJIO wrote: Tue Jun 28, 2022 9:29 pm
IceSoft wrote: Tue Jun 28, 2022 6:37 pm
I see the source and can trust it
At the moment, all my projects contain the source, even the archive in which you saw the virus. My free Kaspersky antivirus says that there is no virus in the file.
Avast, Trend Micro, Defender
AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Re: Search for duplicate files

Post by AZJIO »

AZJIO
Addict
Posts: 1313
Joined: Sun May 14, 2017 1:48 am

Re: Search for duplicate files

Post by AZJIO »

Update:
Added a filter/mask for files.
The Windows version does not show checkboxes for groups.
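A minimal sketch of a recursive scan that applies a file mask before duplicates are compared. The mask format here is an assumption: extensions separated by `|`, such as `mp4|mkv|avi`, compared case-insensitively. The program's real mask syntax may differ. `MatchesMask()`, `ScanDirectory()` and `#Sep$` are hypothetical names.

```purebasic
EnableExplicit

CompilerIf #PB_Compiler_OS = #PB_OS_Windows
	#Sep$ = "\"
CompilerElse
	#Sep$ = "/"
CompilerEndIf

; Hypothetical mask format: extensions separated by "|", e.g. "mp4|mkv|avi".
; An empty mask accepts every file; the comparison ignores case.
Procedure MatchesMask(Name$, Mask$)
	Protected i, ext$
	If Mask$ = ""
		ProcedureReturn #True
	EndIf
	ext$ = LCase(GetExtensionPart(Name$))
	For i = 1 To CountString(Mask$, "|") + 1
		If ext$ = LCase(StringField(Mask$, i, "|"))
			ProcedureReturn #True
		EndIf
	Next
	ProcedureReturn #False
EndProcedure

; Recursively collects the files under Dir$ whose names pass the mask.
; Directories are always entered; only file names are filtered.
Procedure ScanDirectory(Dir$, Mask$, List Files.s())
	Protected dirID, name$
	If Right(Dir$, 1) <> #Sep$
		Dir$ + #Sep$
	EndIf
	dirID = ExamineDirectory(#PB_Any, Dir$, "")
	If dirID
		While NextDirectoryEntry(dirID)
			name$ = DirectoryEntryName(dirID)
			If DirectoryEntryType(dirID) = #PB_DirectoryEntry_File
				If MatchesMask(name$, Mask$)
					AddElement(Files())
					Files() = Dir$ + name$
				EndIf
			ElseIf name$ <> "." And name$ <> ".."
				ScanDirectory(Dir$ + name$, Mask$, Files())
			EndIf
		Wend
		FinishDirectory(dirID)
	EndIf
EndProcedure
```

The directory is listed with an empty pattern and the mask is checked per file name, because a pattern such as `*.mp4` passed to `ExamineDirectory()` would also hide the subdirectories the scan needs to enter. A call like `ScanDirectory(GetHomeDirectory(), "mp4|mkv", Files())` would then feed only video files into the size comparison. This sketch does not guard against symbolic-link loops.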