I thought you needed to search the files. (btw: FTS is "full text search")
So you don't need the 'comparison' in code? You just want to view the differences?
WinDiff does that much more elegantly than fc or comp.
There are a few free file comparison tools, but I'm used to WinDiff.
https://learn.microsoft.com/en-us/troub ... ff-utility
Can you post links to examples of the two files you're comparing and show what you want to match/not match?
For example, are these two files "the same"?
apple
pear
peach
and
peach
apple
pear
(they contain the same lines, just in different order)
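If order should not matter, one way to check this in PB is to count the lines in a map. This is only a minimal sketch, assuming both files are already loaded into string lists; SameLinesAnyOrder() is a made-up name for illustration, not something from this thread:

```purebasic
; Hypothetical helper (not from the thread): returns #True when both lists
; contain the same lines the same number of times, in any order.
Procedure.i SameLinesAnyOrder(List A.s(), List B.s())
  Protected NewMap Counts.i()
  If ListSize(A()) <> ListSize(B())
    ProcedureReturn #False        ; different line counts can never match
  EndIf
  ForEach A()
    If FindMapElement(Counts(), A()) = 0
      AddMapElement(Counts(), A()) ; first time this line is seen
    EndIf
    Counts() = Counts() + 1        ; count each line of A
  Next
  ForEach B()
    If FindMapElement(Counts(), B()) = 0
      ProcedureReturn #False       ; line of B never seen in A
    EndIf
    Counts() = Counts() - 1
    If Counts() < 0
      ProcedureReturn #False       ; B has more copies of this line than A
    EndIf
  Next
  ProcedureReturn #True
EndProcedure

NewList FileA.s()
NewList FileB.s()
AddElement(FileA()) : FileA() = "apple"
AddElement(FileA()) : FileA() = "pear"
AddElement(FileA()) : FileA() = "peach"
AddElement(FileB()) : FileB() = "peach"
AddElement(FileB()) : FileB() = "apple"
AddElement(FileB()) : FileB() = "pear"

Debug SameLinesAnyOrder(FileA(), FileB()) ; 1 = same lines, 0 = not
```

Counting occurrences instead of only storing the keys means duplicated lines are compared too.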
Load huge file directly to MAP
Re: Load huge file directly to MAP
Hi Kcc,
So in a way, you want to back up or synchronize two disks, but see the differences first?
If you're doing it in PB for the fun of it, that's fine, but if not, there are tools that do it much faster than you can yourself (command line or GUI, and in any case free and lightweight).
There's also a way to use the Search Everything database, if the latter is installed.
PS. FC.exe (File Compare) has always been present in all versions of MS-DOS and then Windows (now in the %Windir%\System32 directory), but it's poorly documented because young users only want to click.
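For anyone who wants to try FC from PB, here is a small sketch using RunProgram() to capture its output. It is Windows only, the two file names are placeholders, and the meaning of the exit code is an assumption to check against the FC help text (fc /?):

```purebasic
; Windows only. "listA.txt" and "listB.txt" are placeholder file names.
; /N asks fc to show line numbers for a text comparison.
Define Params$ = "/N " + #DQUOTE$ + "listA.txt" + #DQUOTE$ + " " + #DQUOTE$ + "listB.txt" + #DQUOTE$
Define fc = RunProgram("fc.exe", Params$, "", #PB_Program_Open | #PB_Program_Read | #PB_Program_Hide)

If fc
  While ProgramRunning(fc)
    If AvailableProgramOutput(fc)
      Debug ReadProgramString(fc)   ; one line of fc output at a time
    EndIf
  Wend
  ; Assumption: 0 = no differences found, anything else = differences or an error
  Debug "fc exit code: " + Str(ProgramExitCode(fc))
  CloseProgram(fc)
Else
  Debug "could not start fc.exe"
EndIf
```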
Re: Load huge file directly to MAP
Hi
I hope this code will help you.
Code: Select all
; Text file helper procedures
; by benubi
; Public Domain / Free software
Prototype.i EnumBufferLinesCallback(*userdata, String$) ; Returns 0 to continue enumeration OR NON-ZERO (e.g. error number) TO ABORT ENUMERATION
Procedure$ BytesToHex(*B.ascii, count)
Protected result$ = Space((count*3)-1)
Protected i
Protected *S = @result$
While i<count
PokeS(*S,RSet(Hex(*B\a),2,"0"),2,#PB_Unicode|#PB_String_NoZero)
*S + 6
*B+1
i+1
Wend
ProcedureReturn result$
EndProcedure
Procedure$ GuessEOL(*Start.Byte, *Limit, Type = -1) ; *Start: memory pointer, *Limit = *Start + ByteSize, Type = #PB_Any, #PB_Unicode, #PB_UTF8 or #PB_Ascii
;
; Guess & return EOL sequence of a text file (CRLF, LFCR, LF or CR)
; UTF8_BOM = $BFBBEF
; UTF16_BOM = $FEFF
Protected *C.Character, *A.Ascii , cr , lf
If type = -1 ; Default is PB internal format (unicode for all current PB versions, 2023)
type = #PB_Unicode
EndIf
If type = #PB_Unicode
*C = *start
;Debug "unicode"
While *Limit > *C
Select *C\C
Case 13
If cr
; Debug "EOL=CR"
ProcedureReturn #CR$
ElseIf lf
; Debug "EOL=LF"
ProcedureReturn #LFCR$
EndIf
cr + 1
Case 10
If lf
; Debug "EOL=LF"
ProcedureReturn #LF$
ElseIf cr
; Debug "EOL=CR"
ProcedureReturn #CRLF$
EndIf
lf + 1
Default
If lf
; Debug "EOL=LF"
ProcedureReturn #LF$
ElseIf cr
; Debug "EOL=CR"
ProcedureReturn #CR$
EndIf
EndSelect
*C + 2
Wend
Else
*A = *Start
; Debug "ascii/utf8"
While *Limit > *A
Select *a\a
Case #CR
If cr
; Debug "EOL=CR"
ProcedureReturn #CR$
ElseIf lf
; Debug "EOL=LF"
ProcedureReturn #LFCR$
EndIf
cr + 1
Case #LF
If lf
; Debug "EOL=LF"
ProcedureReturn #LF$
ElseIf cr
; Debug "EOL=CRLF"
ProcedureReturn #CRLF$
EndIf
lf + 1
Default
If lf
; Debug "EOL=LF"
ProcedureReturn #LF$
ElseIf cr
; Debug "EOL=CR"
ProcedureReturn #CR$
EndIf
EndSelect
*A + 1
Wend
EndIf
; If cr
; ProcedureReturn #CR$
; ElseIf lf
; ProcedureReturn #LF$
; Else
; ProcedureReturn #LFCR$
; EndIf
EndProcedure
Procedure.i CountBufferLines(*Buffer, BufferSize, Type = -1, Eol$=#Null$)
Protected BOM , BOMLEN
Protected charsize
If Type = -1 And BufferSize >= 3
CopyMemory(*Buffer, @BOM, 3)
If $BFBBEF = BOM ; UTF8 check
Type = #PB_UTF8
BOMLEN = 3
EndIf
EndIf
If BOM & $FFFF = $FEFF ; UTF16 check
BOMLEN = 2
type = #PB_Unicode
EndIf
If type = -1 ; no BOM or defined character format => switch to ascii
type = #PB_Ascii ; use ascii
EndIf
If type = #PB_Unicode ; same constant that the BOM check above assigns
charsize = 2
Else
charsize = 1
EndIf
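If Eol$=#Null$ ; = Empty$ or ""
Eol$ = GuessEOL(*Buffer, BufferSize + *Buffer, type)
If Eol$ = #Empty$ ; no EOL could be guessed: fall back to CRLF so the scan loop below always advances
Eol$ = #CRLF$
EndIf
EndIf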
Protected QEOL.q
Protected eol_len = Len(Eol$)
Protected eol_blen = StringByteLength(eol$, Type)
Protected *Z1.Ascii, *Z2.Ascii, c
Protected *Lim, *start,*limeol,*QEOL
; Debug "EOL$="+BytesToHex(@EOL$, eol_blen)
; Debug "EOLLEN="+eol_len+" / "+eol_blen
; Debug "Type="+Str(type )
; Debug "Ascii: "+#PB_Ascii
; Debug "UTF8:" +#PB_UTF8
; Debug "UTF16: "+#PB_UTF16
; Debug "charsize="+charsize
*start = *Buffer + BOMLEN ; skip the BOM, if any (BOMLEN is an integer, not a pointer)
*Lim = *Buffer + BufferSize
*QEOL = @QEOL
PokeS(*QEOL, Eol$, eol_len, Type | #PB_String_NoZero)
*limeol = *QEOL + eol_blen
While *start<*Lim
*z1 = *start
*Z2 = *qeol
While *Z1\a = *z2\a And *Z1<*LIM And *Z2<*limeol
*Z1+1:*Z2+1
Wend
If *z2 = *limeol
*start=*z1
c=c+1
If *start = *Lim
ProcedureReturn c
EndIf
Continue
EndIf
*Start + charsize
Wend
ProcedureReturn c + 1 ; unterminated last/lone line /empty file
EndProcedure
Procedure.i BufferLines(*Buffer, BufferSize, List Result.s(), Type = -1, Eol$ = #Null$) ; *Buffer: memory pointer of text file, BufferSize: BYTE size of the file in memory, Result.s(): result list where to return the file's lines, Type: character type (force ascii, utf8, unicode, #PB_Any), EOL$: End Of Line sequence. Leave empty to guess or set to force CRLF$ etc.
; --------------------------------------------------------------------
; count = BufferLines(*Buffer, BufferSize, List Result.s(), Type, Eol$)
;
; The procedure adds text lines from a buffer to a PB String List.
;
; *Buffer = *memory pointer of the text file (ascii, utf-8 or utf-16)
; BufferSize = Byte size of the buffer
; Result.s() = The results list where to add the text file's lines
; Type = #PB_Any (default):guess/read BOM, #PB_Ascii, #PB_Unicode, #PB_UTF8
; Eol$ = #Empty$ or #Null$: Guess End Of Line sequence, other$: force use of EOL sequence e.g. Eol$=#CRLF$
; --------------------------------------------------------------------
;
Protected eol.q, eolbytes, charsize, i, *EOL, chars
Protected *C.Character, *A.ascii , *Z1.ascii, *Z2.ascii , *eolim
Protected *START, *LIM
Protected BOM.i, BOMLEN
*START = *Buffer
*LIM = *Buffer + BufferSize
If Type = -1 And BufferSize >= 3
CopyMemory(*Buffer, @BOM, 3)
If $BFBBEF = BOM
BOMLEN = 3
Type = #PB_UTF8 | #PB_ByteLength
EndIf
EndIf
If BOM & $FFFF = $FEFF
BOMLEN + 2
type = #PB_Unicode
EndIf
If type = -1
type = #PB_Ascii
ElseIf type = #PB_UTF8
type = type | #PB_ByteLength
EndIf
If #Empty$ = Eol$ Or #Null$ = Eol$
Eol$ = GuessEOL(*Buffer, *LIM, Type)
If #Empty$ = Eol$
Eol$ = #CRLF$
; Debug "EMPTY EOL?!"
EndIf
EndIf
If type = #PB_Unicode : charsize = 2 : Else : charsize = 1 : EndIf
PokeS(@eol, Eol$, Len(eol$), type | #PB_String_NoZero)
eolbytes = Len(eol$) * charsize
*EOL = @eol
*eolim = *EOL + eolbytes
*A = *Buffer + BOMLEN ; skip the BOM, if any
*START = *A ; *START always points at the first byte of the current line
While *A < *LIM ; Go through buffer character by character
*z1 = *A
*z2 = *EOL
While *z1 < *lim And *z2 < *eolim And *z1\a = *z2\a ; Check for EOL byte by byte (bounds first)
*Z1 + 1
*Z2 + 1
Wend
If *z2 = *eolim ; Found EOL: store the line that ends here
AddElement(Result())
Result() = PeekS(*START, chars, Type)
*START = *START + (chars * charsize) + eolbytes
*A = *START ; resume scanning right after the EOL
i + 1
chars = 0
If *START>=*LIM
ProcedureReturn i
EndIf
Continue
EndIf
chars + 1
*A + charsize
Wend
If *START < *LIM ; unterminated last line
AddElement(Result())
Result() = PeekS(*START, chars, Type)
i + 1
EndIf
ProcedureReturn i
EndProcedure
Procedure.i EnumBufferLines(*Buffer, BufferSize, EnumBufferLinesCallback.EnumBufferLinesCallback, *CallbackCookie = 0, Type = -1, Eol$ = #Null$) ; Enumerate text file lines in *buffer to a Callback procedure
;
; EnumBufferLines(*Buffer, BufferSize, EnumBufferLinesCallback, *CallbackCookie, Type, Eol$)
;
; Enumerate lines from a text file to a callback.
; *Buffer = *memory pointer of the text file (ascii, utf-8 or utf-16)
; BufferSize = Byte size of the buffer
; EnumBufferLinesCallback: Pointer to @YourCallbackProcedure() where to enumerate the text file line string$'s. The Callback has the format: Callback(*Cookie, String$), where *cookie is an arbitrary (optional) value set by the user
; *CallbackCookie: *Userdata
; Type = #PB_Any (default):guess text format (read bom/guess); other: force #PB_Ascii, #PB_Unicode, #PB_UTF8 "text file" format in *Buffer
; EOL$ = #Empty$ or force EOL sequence
;
Protected eol.i, eolbytes, charsize, *EOL, chars
Protected *C.Character, *A.ascii , *Z1.ascii, *Z2.ascii , *eolim
Protected *START, *LIM
Protected BOM.i, BOMLEN
Protected result = #Null
*START = *Buffer
*LIM = *Buffer + BufferSize
If Type = -1 And BufferSize >= 3
CopyMemory(*Buffer, @BOM, 3)
If $BFBBEF = BOM
; Debug ">>> found UTF8 BOM"
Type = #PB_UTF8
BOMLEN = 3
EndIf
EndIf
If BOM & $FFFF = $FEFF
; Debug ">>> Found Unicode UTF16 BOM"
type = #PB_UTF16
BOMLEN = 2
EndIf
If type = -1
; Debug ">>> switch to default/ascii"
type = #PB_Ascii
EndIf
If EOL$=#Empty$
; Debug ">>>Guess EOL..."
Eol$ = GuessEOL(*Buffer, *LIM, Type )
If #Empty$ = Eol$
; Debug ">>> Guess EOL: NO EOL found, set default CRLF"
Eol$ = #CRLF$
Else
; Debug ">>> Guessed EOL:"+BytesToHex(@eol$,Len(eol$)*2)
EndIf
EndIf
If type = #PB_UTF16 : charsize = 2 : Else : charsize = 1 : EndIf
*EOL = @eol
PokeS(*eol, Eol$, Len(Eol$), type | #PB_String_NoZero)
eolbytes = Len(eol$) * charsize
If type=#PB_UTF8
type = type |#PB_ByteLength
EndIf
*eolim = *EOL + eolbytes
*A = *Buffer + BOMLEN
*START = *A
While *A < *LIM ; Go through buffer character by character
*z1 = *A
*z2 = *EOL
While *z1\a = *z2\a And *z1 < *lim And *z2 < *eolim ; Check for EOL byte by byte
*Z1 + 1
*Z2 + 1
Wend
If *z2 = *eolim ; Found EOL, calling callback procedure
result = EnumBufferLinesCallback(*CallbackCookie, PeekS(*START, chars, Type))
If result
ProcedureReturn result
EndIf
*Start = *START + (chars*charsize) + eolbytes
*A=*START
i + 1
chars = 0
If *A => *LIM
ProcedureReturn result
EndIf
Continue
EndIf
chars + 1
*A + charsize
Wend
If *Start < *LIM ; check for unterminated last line, call the callback procedure if it's the case
result = EnumBufferLinesCallback(*CallbackCookie, PeekS(*Start, chars, Type))
EndIf
ProcedureReturn result
EndProcedure
Procedure.i BLOAD(File$, *FileSize.INTEGER) ; Loads a file to new buffer. Returns a *Memory pointer, and file size (write in *FileSize pointer parameter, optional).
Protected fh, *Memory, LOF
fh = ReadFile(-1, file$) ; Open the file to load in read-mode
If fh
Lof = Lof(fh) ; Get file size for buffer allocation
If *FileSize ; Return file size
*FileSize\i = LOF
EndIf
If LOF >1
; Check for minimum memory allocation size (2 bytes on Windows, IDK for the other OS'es)
; Allocate memory without zero-ing it (will be completely overwritten after ReadData()).
*Memory = AllocateMemory(Lof, #PB_Memory_NoClear)
Else ; file size <= 1 bytes
*Memory = AllocateMemory(2) ; Minimum malloc size (Windows or all OS?)
EndIf
If *Memory
ReadData(fh, *Memory, LOF) ; Read complete file in one step
EndIf
CloseFile(fh) ; Close the file
ProcedureReturn *Memory
EndIf
ProcedureReturn #False
EndProcedure
Structure _load_text_file_info
*ArrayBase
index.i
EndStructure
Procedure _load_text_file_callback(*Info._load_text_file_info, String$)
Protected *S.STRING = *Info\ArrayBase + (SizeOf(STRING) * *Info\index)
*info\index = *info\index + 1
*S\s = String$
; Debug "String = "+String$
ProcedureReturn #Null
EndProcedure
Procedure.i Load_Text_File(File$, Array Result.s(1))
Protected c, i
Protected *Memory, fsize
Protected info._load_text_file_info
*Memory = BLOAD(file$,@fsize)
If *Memory
c = CountBufferLines(*Memory, fsize)
ReDim Result(c + 1)
info\ArrayBase=@result() ; + (2 * SizeOf(Integer))
info\index =0
EnumBufferLines(*Memory, fsize, @_load_text_file_callback(), @info)
FreeMemory(*Memory)
EndIf
ProcedureReturn c
EndProcedure
Dim MyFile.s(1)
Define c = Load_Text_File(OpenFileRequester("Load text file to array","file.txt","Text files|*.txt;*.html;*.htm;*.pb;*.pbi;*.css;*.xml;*.xhtml;*.lua;*.php;*.js;*.c;*.cpp;*.h;*.hpp|All|*.*",0), MyFile())
Define i
For i=0 To c-1
Debug RSet(FormatNumber(i+1,0),12)+" "+MyFile(i)
Next
Debug "----------END OF FILE------"
Debug "lines="+c
NewMap KCC_A.i()
NewMap KCC_B.i()
Procedure KCC_CallbackA(*cookie.INTEGER, String$)
Shared KCC_A()
*cookie\i = *cookie\i + 1 ; line counter
AddMapElement(KCC_A(), String$, #PB_Map_NoElementCheck) ; store line
KCC_A() = *cookie\i ; set current line number
ProcedureReturn #Null
EndProcedure
Procedure KCC_CallbackB(*cookie.INTEGER, String$)
Shared KCC_B()
*cookie\i = *cookie\i + 1 ; line counter
AddMapElement(KCC_B(), String$, #PB_Map_NoElementCheck) ; store line
KCC_B() = *cookie\i ; set current line number
ProcedureReturn #Null
EndProcedure
Procedure KCC_MapParser(FileNameA$, FileNameB$)
Shared KCC_A(), KCC_B()
Protected *AFILE, ASIZE
Protected *BFILE, BSIZE
Protected ACOOKIE
Protected BCOOKIE, diff
*AFILE = BLOAD(FileNameA$, @ASIZE)
*BFILE = BLOAD(FileNameB$, @BSIZE)
ClearMap(KCC_A())
ClearMap(KCC_B())
If *AFILE And *BFILE
EnumBufferLines(*AFILE, ASIZE, @KCC_CallbackA(),@ACOOKIE)
EnumBufferLines(*BFILE, BSIZE, @KCC_CallbackB(),@BCOOKIE)
Protected A_MISSING, A_DIFFERENT
Protected B_MISSING, B_DIFFERENT
ForEach KCC_A()
If Not FindMapElement(KCC_B(), MapKey(KCC_A()))
A_MISSING + 1
ElseIf KCC_A()<>KCC_B()
diff = 1
While NextMapElement(KCC_B())
If MapKey(KCC_B())<>MapKey(KCC_A())
Break
EndIf
If KCC_B()=KCC_A()
diff = 0
Break
EndIf
Wend
If diff
A_DIFFERENT + 1
EndIf
EndIf
Next
ForEach KCC_B()
If Not FindMapElement(KCC_A(), MapKey(KCC_B()))
B_MISSING + 1
ElseIf KCC_A()<>KCC_B()
diff = 1
While NextMapElement(KCC_A())
If MapKey(KCC_B())<>MapKey(KCC_A())
Break
EndIf
If KCC_B()=KCC_A()
diff = 0
Break
EndIf
Wend
If diff
B_DIFFERENT + 1
EndIf
EndIf
Next
EndIf
Debug "Map comparison stats:"
Debug "A_MISSING (lines from A missing in B):" + A_MISSING
Debug "A_DIFFERENT (lines in A different position in B): "+ A_DIFFERENT
Debug "A elements: "+MapSize(KCC_A())
Debug "B_MISSING (lines from B missing in A):" + B_MISSING
Debug "B_DIFFERENT (lines in B different position in A): "+ B_DIFFERENT
Debug "B elements: "+MapSize(KCC_B())
If A_DIFFERENT=0 And B_DIFFERENT= 0 And A_MISSING=0 And B_MISSING=0
Debug "files are identical. (they may only be encoded differently e.g. one in UTF16 the other in Ascii with different EOL sequences)"
Else
Debug "files are different."
EndIf
If *AFILE
FreeMemory(*AFILE)
EndIf
If *BFILE
FreeMemory(*BFILE)
EndIf
EndProcedure
KCC_MapParser(OpenFileRequester("Select File A","","",0),OpenFileRequester("Select file B","","",0)) ; the procedure prints its stats with Debug and has no return value
- Kwai chang caine
- Always Here
- Posts: 5357
- Joined: Sun Nov 05, 2006 11:42 pm
- Location: Lyon - France
Re: Load huge file directly to MAP
@JASSING
After reading your post, I tried WinDiff, but without real success so far with my two huge text files.
As for the format, each file is an enumeration of nearly the same HD; only some files have changed (added/removed/modified).
I just want to know whether each file exists on the other HD and, if so, whether it has the same size.
I have tested several times, and PB always enumerates in the same order.
So the only possible differences are a missing line or a different file size in a line.
Code: Select all
Disk MASTER
1|C:\MyPath\MyFile.log|SizeFile|DateCreated|DateAccessed|DateModified
2|MyPhoto.jpg|2222222|DateCreated|DateAccessed|DateModified
3|MyExe.jpg|SizeFile|DateCreated|DateAccessed|DateModified
4|C:\MyOtherPath\MyFile2.log|SizeFile|DateCreated|DateAccessed|DateModified
5|MyPhoto2.jpg|SizeFile|DateCreated|DateAccessed|DateModified
6|MyExe2.jpg|SizeFile|DateCreated|DateAccessed|DateModified
etc ...
Disk SLAVE
1|C:\MyPath\MyFile.log|SizeFile|DateCreated|DateAccessed|DateModified
2|MyPhoto.jpg|SizeFile|DateCreated|DateAccessed|DateModified
3|C:\MyOtherPath\MyFile2.log|SizeFile|DateCreated|DateAccessed|DateModified
4|MyPhoto2.jpg|111111|DateCreated|DateAccessed|DateModified
5|MyExe2.jpg|SizeFile|DateCreated|DateAccessed|DateModified
6|MyDoc.jpg|SizeFile|DateCreated|DateAccessed|DateModified
etc ...
Result:
MyPhoto.jpg => 2222222 <> 111111 = Different Size
MyExe.jpg => Find in DiskMaster not in DiskSlave
MyDoc.jpg => Find in DiskSlave not in DiskMaster
etc ...
@MARC56US and BENUBI
I read your post
The happiness is a road...
Not a destination
Re: Load huge file directly to MAP
A map is a key-value store.
There are several modern incarnations of this ancient technology ( https://en.wikipedia.org/wiki/MUMPS ).
One of the implementations has examples for PureBasic.
minimono, an analogue of SQLite but without SQL, is free: http://minimdb.com/minimono.html
http://minimdb.com/
Last edited by useful on Mon May 29, 2023 7:53 pm, edited 1 time in total.
Dawn will come inevitably.
Re: Load huge file directly to MAP
Since you want to know if a file is modified, you shouldn't rely on dates - you'd need to do a hash, or a CompareMemory() of the files.
A file can have different modified dates, but have the same content. Conversely, a file can have the same dates, but different contents.
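A rough sketch of the hash approach, using PB's fingerprint library (PureBasic 5.40 or later). SameContent() and the two paths are made up for illustration; SHA-1 is just one of the available plugins:

```purebasic
UseSHA1Fingerprint() ; registers the SHA-1 plugin

; Hypothetical helper: #True when both files exist and have the same content.
Procedure.i SameContent(FileA$, FileB$)
  Protected HashA$, HashB$
  If FileSize(FileA$) < 0 Or FileSize(FileB$) < 0
    ProcedureReturn #False ; missing file (-1) or a directory (-2)
  EndIf
  If FileSize(FileA$) <> FileSize(FileB$)
    ProcedureReturn #False ; different sizes can never have the same content
  EndIf
  HashA$ = FileFingerprint(FileA$, #PB_Cipher_SHA1)
  HashB$ = FileFingerprint(FileB$, #PB_Cipher_SHA1)
  ProcedureReturn Bool(HashA$ <> "" And HashA$ = HashB$)
EndProcedure

; Placeholder paths, replace with real files on the two disks
Debug SameContent("D:\Master\MyPhoto.jpg", "E:\Slave\MyPhoto.jpg")
```

Checking the size first avoids reading both files when they cannot be equal anyway. Note that hashing reads every byte of both files, so on whole disks it will be much slower than comparing the listings.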
If all you care about is the size difference, and whether a file is on A but not on B (and vice versa), that seems rather trivial.
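Something along these lines, as a sketch only. It assumes the "index|name|size|..." line format posted above, uses the second field as the key as-is (in a real listing the key should probably be the full path, since file names repeat across folders), and works on a few sample lines instead of the real files. IndexListing() is a made-up name:

```purebasic
; Hypothetical helper: fill Sizes() with name -> size from "n|name|size|..." lines.
Procedure IndexListing(List Lines.s(), Map Sizes.s())
  ForEach Lines()
    If CountString(Lines(), "|") >= 2 ; need at least index, name and size
      Sizes(StringField(Lines(), 2, "|")) = StringField(Lines(), 3, "|")
    EndIf
  Next
EndProcedure

; Sample data only, shaped like the listings posted above
NewList Master.s()
NewList Slave.s()
AddElement(Master()) : Master() = "1|MyPhoto.jpg|2222222|DateCreated|DateAccessed|DateModified"
AddElement(Master()) : Master() = "2|MyExe.jpg|500|DateCreated|DateAccessed|DateModified"
AddElement(Slave())  : Slave()  = "1|MyPhoto.jpg|111111|DateCreated|DateAccessed|DateModified"
AddElement(Slave())  : Slave()  = "2|MyDoc.jpg|300|DateCreated|DateAccessed|DateModified"

NewMap MasterSizes.s()
NewMap SlaveSizes.s()
IndexListing(Master(), MasterSizes())
IndexListing(Slave(), SlaveSizes())

ForEach MasterSizes()
  If FindMapElement(SlaveSizes(), MapKey(MasterSizes())) = 0
    Debug MapKey(MasterSizes()) + " => found in master, not in slave"
  ElseIf SlaveSizes() <> MasterSizes()
    Debug MapKey(MasterSizes()) + " => " + MasterSizes() + " <> " + SlaveSizes() + " = different size"
  EndIf
Next
ForEach SlaveSizes()
  If FindMapElement(MasterSizes(), MapKey(SlaveSizes())) = 0
    Debug MapKey(SlaveSizes()) + " => found in slave, not in master"
  EndIf
Next
```

The sizes are kept as strings here because only equality matters; map iteration order is not the file order, so the report lines can come out in any order.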
- Kwai chang caine
Re: Load huge file directly to MAP
@Marc56US
In fact, it's a bit of both.
Like everyone on this forum, I think, I like to try writing my own software, mainly when I believe it's not really impossible for me.
In this project, the principle is simple... it's just limited by the speed of the machine.
And I'm tired of trying other software that works for a while and then one day, for some reason, stops working without my being able to do anything about it.
My own programs are not always very efficient, but if I have a problem, I can do something about it and modify part of the code (well... with help from all of you).
@BENUBI
YESSSS !!!
What a big and nice piece of code you have given me !!!!
I have tested it with my two huge files, and it has been able to go all the way through so far.
I'll continue testing it to see whether it's fully what I need.
A thousand thanks for this great present.
@USEFUL
Thanks for your link, I'll take a look.