Hi guys and gals.
I'm making an app that copies files to an HDF image. One of the things I would like to do is to be able to filter my source files.
To do this I'd like to use a fuzzy search, so for example I have a folder with 10,000 titles inside. I want to find all the "ocean" games, I type Ocean in to my filter box and it will filter the list to show any titles with Ocean in the title.
I'm using an Explorer form at the moment for my source but think I need to write something custom.
Any idea on how this would be achieved (the fuzzy search bit)
Thanks
Fuzzy Search
Fuzzy Search
----
R Tape loading error, 0:1
R Tape loading error, 0:1
Re: Fuzzy Search
A basic start...
Code: Select all
Path.s = GetTemporaryDirectory() + "Fuzzy/"
CreateDirectory(Path)
RandomSeed(0)
For i = 1 To 100
File.s = Str(Random(1000)) + " " + Str(Random(1000)) + ".dat"
File = ReplaceString(File, "9", " ocean ")
If CreateFile(0, Path + File)
CloseFile(0)
EndIf
Next i
; (see below)
CreateRegularExpression(0, ".*ocean.*", #PB_RegularExpression_NoCase)
If ExamineDirectory(0, Path, "")
While NextDirectoryEntry(0)
File.s = DirectoryEntryName(0)
; Simple FindString
If (FindString(File, "ocean", 1, #PB_String_NoCase))
Debug File
EndIf
; RegEx (more powerful)
If (MatchRegularExpression(0, File))
Debug File
EndIf
Wend
FinishDirectory(0)
EndIf
Re: Fuzzy Search
Why not:
But this has nothing todo with Fuzzy
Bernd
Code: Select all
If ExamineDirectory(0, Path, "*ocean*.*")
While NextDirectoryEntry(0)
Debug DirectoryEntryName(0)
Wend
FinishDirectory(0)
EndIf
Bernd
Re: Fuzzy Search
That works too. I didn't do that in my example because
(a) I wanted to show a RegEx match in the loop
(b) on Linux/Mac, I think ExamineDirectory matches case-sensitive, which I assume em_uk does not want(?)
(a) I wanted to show a RegEx match in the loop
(b) on Linux/Mac, I think ExamineDirectory matches case-sensitive, which I assume em_uk does not want(?)
-
- User
- Posts: 69
- Joined: Tue Sep 16, 2008 6:11 am
- Location: ger
Re: Fuzzy Search
A good way to perform a fuzzy search is to use the algo provided by Levenshtein.
Wiki: https://en.wikipedia.org/wiki/Levenshtein_distance
Code: https://github.com/acmeism/RosettaCodeD ... .purebasic
In combination with a prefilled list ( iterate all elements ) or life (check while iterating ), depending on how deep the structure is, the search itself is quite fast --> O(m*n).
You may have to tinker with the distance ( I commonly use a distance of <= 2 ---> means two different letters found in the compared string).
Wiki: https://en.wikipedia.org/wiki/Levenshtein_distance
Code: https://github.com/acmeism/RosettaCodeD ... .purebasic
In combination with a prefilled list ( iterate all elements ) or life (check while iterating ), depending on how deep the structure is, the search itself is quite fast --> O(m*n).
You may have to tinker with the distance ( I commonly use a distance of <= 2 ---> means two different letters found in the compared string).
Re: Fuzzy Search
em, by "fuzzy search" do you actually mean complex search patterns (like regular expressions etc)? because the "search for ocean in the title" example you gave indicates it could be as simple as having the user enter a search word, and then simply enumerating through the list using FindString(nextword, wordtofind)
Re: Fuzzy Search
Hi, thanks for the ideas so far.
Fuzzy search as in simple search terms, not complex regex, just so that titles can be filtered easily.
The source windows is an explorer gadget which the user would browse to a folder with the source files in, but say he wants to add just games by codemasters he could type "codemasters" into the filter and the explorer gadget would filter the rest out.
infratec solution seems to work quite well in my testing and means I wont have to create my own file explorer.
Thanks Keya for the quick framework, if my app needs finer control that will come in very handy.
Fuzzy search as in simple search terms, not complex regex, just so that titles can be filtered easily.
The source windows is an explorer gadget which the user would browse to a folder with the source files in, but say he wants to add just games by codemasters he could type "codemasters" into the filter and the explorer gadget would filter the rest out.
infratec solution seems to work quite well in my testing and means I wont have to create my own file explorer.
Thanks Keya for the quick framework, if my app needs finer control that will come in very handy.
----
R Tape loading error, 0:1
R Tape loading error, 0:1