Quickly find the last file in a numbered sequence

Share your advanced PureBasic knowledge/code with the community.
Seymour Clufley
Addict
Addict
Posts: 1233
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Quickly find the last file in a numbered sequence

Post by Seymour Clufley »

This code is for finding the highest-numbered file in a set of files that are all named in a consistent way. One example would be an image sequence of 100,000 frames, which have the following file-names:
D:\my_folder\imgseqframe_000001.jpg
D:\my_folder\imgseqframe_000002.jpg
D:\my_folder\imgseqframe_000003.jpg
...
D:\my_folder\imgseqframe_100000.jpg
Now let's say you didn't know how many files are in this sequence, and needed to know how long it is. Or, let's say you knew a section in the middle of the sequence had been deleted, and needed to find the location of the gap.

Ordinarily what you would have to do is count up from 1, and check if each file exists. This is very slow. The code here does it much faster by moving upwards but in decreasing increments, to get ever more precision until it identifies the first non-existent file. It then returns the number immediately below that.

The test folder I have been using contains 200,000 files, with a gap of 1000 after #111875. Execution time is 2600ms doing it the ordinary way, or 3ms using this code.

It can be an extremely fast solution. However, this fastness depends on the minimum size of gap. If you know that the gap is just one or two files, this code probably won't be much use.

Since the gap is going to be at least 1000 files, we can afford to jump around in increments of 995, before getting more precise. The maxjumpsize variable must be smaller than the gap can possibly be.

Finally, you will need to adapt this code for the file-name format involved. I have written it to work for the example given above. I dare say a more universal solution could be created using RegEx.

Code: Select all

Procedure.i FindLastFileByCountingUpFrom1(folder.s,pfx.s,ext.s,fillzeroes.i)
  If Left(ext,1)<>"." : ext="."+ext : EndIf
  
  highest_exist_n.i = 0
  
  Repeat
    highest_exist_n+1
    testfn.s = folder+pfx+RSet(Str(highest_exist_n),fillzeroes,"0")+ext
    If FileSize(testfn) = -1
      ProcedureReturn highest_exist_n-1
    EndIf
  ForEver
  
EndProcedure


Procedure.i FindLastFileByCountingUpIncrementallyFrom1(folder.s,pfx.s,ext.s,fillzeroes.i,maxjumpsize.i)
  If Left(ext,1)<>"." : ext="."+ext : EndIf
  
  If FileSize(folder+pfx+RSet(Str(1),fillzeroes,"0")+ext)=-1
    ProcedureReturn 0
  EndIf
  
  highest_exist_n.i = 1
  
  Repeat
    n.i = highest_exist_n+maxjumpsize
    testfn.s = folder+pfx+RSet(Str(n),fillzeroes,"0")+ext
    If FileSize(testfn) = -1
      Break
    EndIf
    highest_exist_n = n
  ForEver
  
  
  jumpsize = maxjumpsize
  Repeat
    jumpsize / 2
    ;Debug "JUMPSIZE: "+Str(jumpsize)
    If jumpsize<1 : Break : EndIf
    n = highest_exist_n+jumpsize
    testfn.s = folder+pfx+RSet(Str(n),fillzeroes,"0")+ext
    If FileSize(testfn) = -1
      Continue
    EndIf
    highest_exist_n = n
  ForEver
  
  Repeat
    testfn.s = folder+pfx+RSet(Str(highest_exist_n+1),fillzeroes,"0")+ext
    If FileSize(testfn) = -1
      Break
    EndIf
    highest_exist_n+1
  ForEver
  
  ProcedureReturn highest_exist_n
  
EndProcedure




frmfolder.s = "D:\my_folder\"
pfx.s = "imgseqframe_"
fillZeroes.i = 6
ext.s = "jpg"

starttime.d = ElapsedMilliseconds()
n.i = FindLastFileByCountingUpFrom1(frmfolder,pfx,ext,fillZeroes)
Debug Str(n)+" ("+StrD(ElapsedMilliseconds()-starttime)+" ms)"

starttime.d = ElapsedMilliseconds()
n.i = FindLastFileByCountingUpIncrementallyFrom1(frmfolder,pfx,ext,fillZeroes,995)
Debug Str(n)+" ("+StrD(ElapsedMilliseconds()-starttime)+" ms)"
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
User avatar
NicTheQuick
Addict
Addict
Posts: 1226
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Quickly find the last file in a numbered sequence

Post by NicTheQuick »

What about using ExamineDirectory and iterating over all files in the directory, putting them in a LinkedList, sort them and then try to find the gap in increments of 1? I did not test it but I could imagine that this is also quite fast.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Seymour Clufley
Addict
Addict
Posts: 1233
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Re: Quickly find the last file in a numbered sequence

Post by Seymour Clufley »

I just wrote that procedure to test it. The speed is 24x as fast as the ordinary method, but the increment method is 40x faster still.

Code: Select all

Procedure.i FindLastFileByLinkedList(folder.s,pfx.s,ext.s,fillzeroes.i)
  If Left(ext,1)<>"." : ext="."+ext : EndIf
  
  NewList a2z.i()
  d = ExamineDirectory(#PB_Any,folder,"*"+ext)
  While NextDirectoryEntry(d)
    testfn.s = DirectoryEntryName(d)
    testfn = Left(testfn,Len(testfn)-Len(ext))
    testfn = RemoveString(testfn,pfx)
    n.i = Val(testfn)
    AddElement(a2z()) : a2z()=n
  Wend
  FinishDirectory(d)
  
  SortList(a2z(),#PB_Sort_Ascending)
  
  count.i = 0
  ForEach a2z()
    count+1
    If count>1
      n = a2z()
      If n>(prev_n+1)
        ProcedureReturn prev_n
      EndIf
    EndIf
    prev_n = a2z()
  Next
  
EndProcedure
A slightly faster way is to just compare filenames on the fly:

Code: Select all

Procedure.i FindLastFileByRealtimeComparison(folder.s,pfx.s,ext.s,fillzeroes.i)
  If Left(ext,1)<>"." : ext="."+ext : EndIf
  
  d = ExamineDirectory(#PB_Any,folder,"*"+ext)
  While NextDirectoryEntry(d)
    testfn.s = DirectoryEntryName(d)
    testfn = Left(testfn,Len(testfn)-Len(ext))
    testfn = RemoveString(testfn,pfx)
    n.i = Val(testfn)
    If prev_n>0 And n>(prev_n+1)
      FinishDirectory(d)
      ProcedureReturn prev_n
    EndIf
    prev_n = n
  Wend
  FinishDirectory(d)
  
  ProcedureReturn prev_n
  
EndProcedure
but the increment method is still 35x faster.
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
User avatar
NicTheQuick
Addict
Addict
Posts: 1226
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Quickly find the last file in a numbered sequence

Post by NicTheQuick »

Cool 8)
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
jacdelad
Addict
Addict
Posts: 1473
Joined: Wed Feb 03, 2021 12:46 pm
Location: Planet Riesa
Contact:

Re: Quickly find the last file in a numbered sequence

Post by jacdelad »

I hope you tested with the debugger turned off. It's beyond my underatanding that testing if a file exists for 10000 times is faster than using ExamineDirectory and storing the highest value. Can someone explain?
PureBasic 6.04/XProfan X4a/Embarcadero RAD Studio 11/Perl 5.2/Python 3.10
Windows 11/Ryzen 5800X/32GB RAM/Radeon 7770 OC/3TB SSD/11TB HDD
Synology DS1821+/36GB RAM/130TB
Synology DS920+/20GB RAM/54TB
Synology DS916+ii/8GB RAM/12TB
Seymour Clufley
Addict
Addict
Posts: 1233
Joined: Wed Feb 28, 2007 9:13 am
Location: London

Re: Quickly find the last file in a numbered sequence

Post by Seymour Clufley »

Testing now with debugger turned off.

Linear: 4222ms

LinkedList: 165ms (25x linear)

RealtimeComparison: 164ms (25x linear)

Incremental: 5ms (844x linear)
JACK WEBB: "Coding in C is like sculpting a statue using only sandpaper. You can do it, but the result wouldn't be any better. So why bother? Just use the right tools and get the job done."
User avatar
jacdelad
Addict
Addict
Posts: 1473
Joined: Wed Feb 03, 2021 12:46 pm
Location: Planet Riesa
Contact:

Re: Quickly find the last file in a numbered sequence

Post by jacdelad »

I may be wrong, but why use the linked list with ExamineDirectory? The task is to find the last file (=the highest number), so wouldn't a sole variable be enough to the store the value?
PureBasic 6.04/XProfan X4a/Embarcadero RAD Studio 11/Perl 5.2/Python 3.10
Windows 11/Ryzen 5800X/32GB RAM/Radeon 7770 OC/3TB SSD/11TB HDD
Synology DS1821+/36GB RAM/130TB
Synology DS920+/20GB RAM/54TB
Synology DS916+ii/8GB RAM/12TB
User avatar
NicTheQuick
Addict
Addict
Posts: 1226
Joined: Sun Jun 22, 2003 7:43 pm
Location: Germany, Saarbrücken
Contact:

Re: Quickly find the last file in a numbered sequence

Post by NicTheQuick »

jacdelad wrote: Wed Nov 10, 2021 8:26 am I may be wrong, but why use the linked list with ExamineDirectory? The task is to find the last file (=the highest number), so wouldn't a sole variable be enough to the store the value?
You also want to find a gap in between.
If your numbers are 1,2,3,5,6,7, then 3 is the highest number because 4 does not exist.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
User avatar
jacdelad
Addict
Addict
Posts: 1473
Joined: Wed Feb 03, 2021 12:46 pm
Location: Planet Riesa
Contact:

Re: Quickly find the last file in a numbered sequence

Post by jacdelad »

Ah yes, my fault.
PureBasic 6.04/XProfan X4a/Embarcadero RAD Studio 11/Perl 5.2/Python 3.10
Windows 11/Ryzen 5800X/32GB RAM/Radeon 7770 OC/3TB SSD/11TB HDD
Synology DS1821+/36GB RAM/130TB
Synology DS920+/20GB RAM/54TB
Synology DS916+ii/8GB RAM/12TB
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Quickly find the last file in a numbered sequence

Post by davido »

@ Seymour Clufley,
Nice, thank you.
DE AA EB
Post Reply