[Solved] Is CRC32 safe for different folder paths?

Post by BarryG »

[Edit] You've convinced me to use MD5 instead, to be on the safer side and not have to constantly worry. Thanks!

I want to reduce a full folder path to its CRC32 fingerprint to differentiate it from other folders. Is there much risk of collision for this? I assume it would be safe, since zip files use CRC32 checksums for the files they contain.

Here's what I'm currently doing. With just two characters transposed, it produces a different checksum (which is correct), so it should be safe? I'll always convert the folder path to lower case before hashing it.

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint(LCase("D:\Files\"),#PB_Cipher_CRC32) ; 04528791 (Original folder)
Debug StringFingerprint(LCase("D:\iFles\"),#PB_Cipher_CRC32) ; 7754a295 (Typo, so different)
Debug StringFingerprint(LCase("d:\files\"),#PB_Cipher_CRC32) ; 04528791 (Same as original)
Last edited by BarryG on Sun May 22, 2022 1:54 am, edited 2 times in total.

Post by Bitblazer »

That depends on your definition of safe ;)

For unimportant things, where you don't mind hitting a strange case once every few years, it should be OK.

Post by NicTheQuick »

You can calculate the probability of a collision. In the case of CRC32 and long path names I would assume it's very high. Also keep in mind that CRC32 is merely meant for error detection in transmission and storage. It is not a cryptographic hash, so anyone who really wants to could create a collision relatively easily.
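
That probability can be estimated with the birthday approximation. A minimal sketch in Python, assuming CRC32 values are spread uniformly over all 2^32 outputs (an idealisation); the helper name `collision_probability` is made up for this example:

```python
import math

def collision_probability(n_paths: int, bits: int = 32) -> float:
    """Approximate chance that at least two of n_paths share a checksum.

    Assumes checksums are uniformly distributed over 2**bits values
    (birthday approximation); real CRC32 output is only roughly uniform.
    """
    pairs = n_paths * (n_paths - 1) / 2
    # 1 - exp(-pairs / 2**bits); expm1 keeps precision for tiny probabilities
    return -math.expm1(-pairs / 2.0**bits)

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9} paths: {collision_probability(n):.4f}")
```

Under that assumption the estimate depends on how many paths are hashed, and it passes 50% at roughly 77,000 paths.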

Post by BarryG »

It's not going to be used for cryptography, but just to test if a folder path exists in a list (not on disk), something like below. I have long path reasons (>#MAX_PATH) for not checking the actual full folder path as a string.

Is what I'm doing below safe from collisions? The user wouldn't be able to enter some text that matches the windir$ checksum somehow?

Code: Select all

UseCRC32Fingerprint()
windir$=StringFingerprint(LCase("c:\windows\"),#PB_Cipher_CRC32)

folder$=InputRequester("Test","Enter a folder path:","")

If StringFingerprint(LCase(folder$),#PB_Cipher_CRC32)=windir$
  MessageRequester("Result","You entered the Windows folder!")
Else
  MessageRequester("Result","You entered: "+folder$)
EndIf

Post by Bitblazer »

BarryG wrote: Sat May 21, 2022 1:30 pm The user wouldn't be able to enter some text that matches the windir$ checksum somehow?

An experienced, motivated user could.

It is good enough for unimportant things but that's all.

Post by NicTheQuick »

For example I just found two collisions on my own filesystem.

Code: Select all

0a52362d /var/lib/docker/overlay2/031ad36c5fc9e231758beb6a3c15383d1de757f889dd393342cf68e754944e07/diff/usr/local/lib/python3.8/lib2to3/fixes/fix_tuple_params.py
0a52362d /var/lib/docker/overlay2/0a3eef54176a2e077e423cf03b3dd81def63af04c4f963d62b8f2b50f977ae55/diff/usr/share/zoneinfo/posix/Europe/Simferopol
370f5c1e /var/lib/docker/overlay2/031ad36c5fc9e231758beb6a3c15383d1de757f889dd393342cf68e754944e07/diff/usr/local/lib/python3.8/site-packages/pip/_internal/resolution/resolvelib/__pycache__/base.cpython-38.pyc
370f5c1e /var/lib/docker/overlay2/7958002daebc70223beabf343aaab5184cfe70171a9ff67ed8c9768f85061409/diff/home/bibdok/plone/buildout-cache/eggs/ldkiid_base-3.1.0.14-py3.8.egg/ldkiid/base
And the simple script I wrote, which is very slow, has only just begun.

Code: Select all

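#!/bin/bash
# Print "<crc32> <path>" for every path on the filesystem, then sort by checksum
# so that colliding paths end up on adjacent lines.

list="/tmp/allfiles"

# Read NUL-delimited paths so names with spaces or newlines survive intact.
while IFS= read -r -d '' file; do
    # printf '%s' avoids echo's option handling for unusual file names.
    echo "$(crc32 <(printf '%s' "$file")) $file"
done < <(sudo find / -print0) | tee "$list.txt"

sort "$list.txt" > "$list.sorted.txt"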
I guess it would be much faster with PureBasic, but Bash was somehow simpler in this case.
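
The same scan can be sketched in Python, which avoids starting one process per path. This is only an illustration, assuming the checksum is taken over the UTF-8 bytes of each path; the names `crc32_hex`, `find_collisions` and `iter_paths` are made up for this example:

```python
import os
import zlib
from collections import defaultdict

def crc32_hex(text: str) -> str:
    """CRC32 of the UTF-8 bytes of text, as 8 lowercase hex digits."""
    data = text.encode("utf-8", "surrogateescape")  # keeps undecodable file names intact
    return f"{zlib.crc32(data) & 0xFFFFFFFF:08x}"

def find_collisions(paths, checksum=crc32_hex):
    """Group paths by checksum; keep only groups with two or more distinct paths."""
    groups = defaultdict(set)
    for path in paths:
        groups[checksum(path)].add(path)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}

def iter_paths(root):
    """Yield every directory and file path below root (unreadable directories are skipped)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        yield dirpath
        for name in filenames:
            yield os.path.join(dirpath, name)
```

A call such as `find_collisions(iter_paths("/usr"))` returns a dictionary that maps each colliding checksum to the paths sharing it.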

Post by BarryG »

@NicTheQuick: Are those first two both folders, though? One ends with a file extension of ".py".

So when I remove the file and add a drive letter and trailing slash, they don't collide (as expected):

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint(LCase("D:/var/lib/docker/overlay2/031ad36c5fc9e231758beb6a3c15383d1de757f889dd393342cf68e754944e07/diff/usr/local/lib/python3.8/lib2to3/fixes/"),#PB_Cipher_CRC32) ; 6c57717a
Debug StringFingerprint(LCase("D:/var/lib/docker/overlay2/0a3eef54176a2e077e423cf03b3dd81def63af04c4f963d62b8f2b50f977ae55/diff/usr/share/zoneinfo/posix/Europe/Simferopol/"),#PB_Cipher_CRC32) ; d49dcab6
Besides, they don't even collide with PureBasic when done exactly as you presented:

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint(LCase("/var/lib/docker/overlay2/031ad36c5fc9e231758beb6a3c15383d1de757f889dd393342cf68e754944e07/diff/usr/local/lib/python3.8/lib2to3/fixes/fix_tuple_params.py"),#PB_Cipher_CRC32) ; 0a52362d
Debug StringFingerprint(LCase("/var/lib/docker/overlay2/0a3eef54176a2e077e423cf03b3dd81def63af04c4f963d62b8f2b50f977ae55/diff/usr/share/zoneinfo/posix/Europe/Simferopol"),#PB_Cipher_CRC32) ; 6bac0707

Post by NicTheQuick »

Yes, these are files. But that was not my point. I wanted to show you that it is easy to find two file paths in your own filesystem which generate the same CRC32 hash.

Also, it makes no sense to use LCase or to add a drive letter. On Linux there are no drive letters and paths are case sensitive. Of course you will get different hashes if you change something in the string, so that's not an argument in any way.

Remove the LCase and you will see that it collides.
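
The effect of the case folding can be checked outside PureBasic as well. A small Python sketch, assuming the checksum is computed over the UTF-8 bytes of the string; `crc32_hex` is a helper made up for this example:

```python
import zlib

def crc32_hex(text: str) -> str:
    """CRC32 of the UTF-8 bytes of text, as 8 lowercase hex digits."""
    return f"{zlib.crc32(text.encode('utf-8')) & 0xFFFFFFFF:08x}"

path = ("/var/lib/docker/overlay2/"
        "0a3eef54176a2e077e423cf03b3dd81def63af04c4f963d62b8f2b50f977ae55"
        "/diff/usr/share/zoneinfo/posix/Europe/Simferopol")

print(crc32_hex(path))          # checksum of the path as it exists on disk
print(crc32_hex(path.lower()))  # checksum after case folding, as LCase would produce
```

The two printed values differ, because lower-casing "Europe/Simferopol" changes the bytes that go into the checksum.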

Post by Caronte3D »

Try one of the SHA algorithms instead of CRC32.

Post by NicTheQuick »

I just scanned only folder paths on my system. I got 16 collisions. With file paths I get up to 1296 collisions.

One of the folder collisions is this:

Code: Select all

d62e8bf5 /snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr
d62e8bf5 /usr/share/scilab/contrib/toolbox_skeleton/help

Post by Paul »

Caronte3D wrote: Sat May 21, 2022 3:25 pm Try one of SHA algorithms instead of CRC32
As Caronte3D said, try another algorithm...

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint("/snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr",#PB_Cipher_CRC32)
Debug StringFingerprint("/usr/share/scilab/contrib/toolbox_skeleton/help",#PB_Cipher_CRC32)

UseMD5Fingerprint()
Debug StringFingerprint("/snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr",#PB_Cipher_MD5)
Debug StringFingerprint("/usr/share/scilab/contrib/toolbox_skeleton/help",#PB_Cipher_MD5)

UseSHA2Fingerprint()
Debug StringFingerprint("/snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr",#PB_Cipher_SHA2)
Debug StringFingerprint("/usr/share/scilab/contrib/toolbox_skeleton/help",#PB_Cipher_SHA2)

Post by Olli »

BarryG wrote: Re: Is CRC32 safe for different folder paths?
No! Look at this: CRC is too rudimentary.
(Source)

Code: Select all

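Global CrcTable ; address of the 256-entry CRC32 lookup table (32-bit entries), set up elsewhere

; Table-driven CRC32 core loop: one table lookup and two XORs per byte.
; Contagem is the number of bytes at *Message. It was undefined in the snippet as
; posted, so passing it as a parameter is an assumption.
Procedure.l RoutineCRC32(*Message, Contagem)

  d.l = -1                           ; initial value $FFFFFFFF
  While Contagem > 0
    a = PeekB(*Message) & $FF        ; next message byte, unsigned
    *Message + 1
    a ! (d & $FF)                    ; table index = byte XOR low byte of the CRC
    d = (d >> 8) & $FFFFFF           ; mask off the bits an arithmetic shift copies in
    d ! PeekL(CrcTable + (a << 2))
    Contagem - 1
  Wend
  ProcedureReturn ~d                 ; final inversion (assumed; the posted snippet returned nothing)

EndProcedure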

Post by BarryG »

Paul wrote: Sat May 21, 2022 7:13 pm

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint("/snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr",#PB_Cipher_CRC32)
Debug StringFingerprint("/usr/share/scilab/contrib/toolbox_skeleton/help",#PB_Cipher_CRC32)
Okay, those two clash, but they don't have the leading drive letter or trailing slash, which my paths will. So no clashes that way:

Code: Select all

UseCRC32Fingerprint()
Debug StringFingerprint("d:/snap/dbeaver-ce/180/usr/share/X11/xkb/symbols/sgi_vndr/",#PB_Cipher_CRC32) ; bf493f8a
Debug StringFingerprint("d:/usr/share/scilab/contrib/toolbox_skeleton/help/",#PB_Cipher_CRC32) ; d6f6a231
But I will just use MD5 instead, to be on the safer side and not have to constantly worry. Thanks to all for replying.
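
A minimal Python sketch of that approach, for comparison: paths are lower-cased as in the examples above and stored only as MD5 digests in a set. The names `path_key`, `known` and `is_known` are made up for this example, and MD5 serves here purely as a wider fingerprint, not as a security measure:

```python
import hashlib

def path_key(folder: str) -> str:
    """MD5 hex digest of the lower-cased folder path (UTF-8 encoded)."""
    return hashlib.md5(folder.lower().encode("utf-8")).hexdigest()

# Hypothetical list of folders to recognise, kept only as 128-bit digests.
known = {path_key(p) for p in ("c:\\windows\\", "d:\\files\\")}

def is_known(folder: str) -> bool:
    """True if the folder's fingerprint is in the known set."""
    return path_key(folder) in known

print(is_known("D:\\Files\\"))  # True: same folder, different case
print(is_known("D:\\iFles\\"))  # False: transposed characters
```

With 128 bits instead of 32, an accidental clash between two ordinary paths is no longer a practical concern, although MD5 still does not protect against someone deliberately constructing collisions.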