Marc56us, I can see your code returning the identifiers but, not understanding regex, I don't know how it actually returns the string between them.
The list of things I will never understand is growing.
Calibre .OPF file format help needed
- Fangbeast
- PureBasic Protozoa
- Posts: 4749
- Joined: Fri Apr 25, 2003 3:08 pm
- Location: Not Sydney!!! (Bad water, no goats)
Re: Calibre .OPF file format help needed
Amateur Radio, D-STAR/VK3HAF
Re: Calibre .OPF file format help needed
Hi Fangbeast,
It is the parentheses that capture part of the text.
In this new version, I make two captures: one for the name of the field and the other for the field's value.
The first capture is literal text that never changes.
The second one is a "non-greedy" capture, which means it stops at the first occurrence of the text that follows it (the closing tag). Otherwise a single capture would run on to the last closing tag and swallow everything in between.
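To make the greedy versus non-greedy difference concrete, here is a small sketch. The sample string and the regex numbers 0 and 1 are made up for the illustration; only the PureBasic regex functions are the same ones used in the code below.

```purebasic
; Minimal sketch: greedy (.+) versus non-greedy (.+?) capture.
; Sample$ is an invented test string, not taken from a real OPF file.
EnableExplicit

Define Sample$ = "<dc:title>First</dc:title><dc:title>Second</dc:title>"

; Greedy: (.+) runs on to the LAST "</dc:title>" in the text
If CreateRegularExpression(0, "<dc:title>(.+)</dc:title>")
  If ExamineRegularExpression(0, Sample$)
    While NextRegularExpressionMatch(0)
      Debug "greedy     : " + RegularExpressionGroup(0, 1)
    Wend
  EndIf
  FreeRegularExpression(0)
EndIf

; Non-greedy: (.+?) stops at the FIRST "</dc:title>" after each opening tag
If CreateRegularExpression(1, "<dc:title>(.+?)</dc:title>")
  If ExamineRegularExpression(1, Sample$)
    While NextRegularExpressionMatch(1)
      Debug "non-greedy : " + RegularExpressionGroup(1, 1)
    Wend
  EndIf
  FreeRegularExpression(1)
EndIf
```

The greedy version should report a single match containing both titles and the tags between them, while the non-greedy version should report "First" and "Second" as two separate matches.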
Code: Select all
; Extract_OPF_2.pb
EnableExplicit

Enumeration
  #hFile
EndEnumeration

NewList RegEx$()

; Regex how-to:
;
;   ~"<dc:(title)>(.+?)</dc:title>"
;
; This means:
;   Search from left to right for the text "<dc:title>".
;   Capture the literal text "title" as group #1, using ().
;   Then keep reading and capture anything, non-greedily (.+?), until the
;   literal text "</dc:title>" is found. This second () is group #2.
; See below for how to use the groups:
;   Ident$ = RegularExpressionGroup(0, 1)
;   Value$ = RegularExpressionGroup(0, 2)
;
; Some special characters have to be coded in the regex:
;   space = \h (horizontal whitespace), written \\h inside a ~"..." string
;   a quote has to be escaped as \"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(title)>(.+?)</dc:title>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(date)>(.+?)</dc:date>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(language)>(.+?)</dc:language>"
AddElement(RegEx$()) : RegEx$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:(description)>(.+?)</dc:description>"

; Read OPF file
#File_Name = "TestFile.opf"

; ReadFile() opens for reading only and fails if the file is missing
; (OpenFile() would create an empty file instead)
If Not ReadFile(#hFile, #File_Name)
  Debug #File_Name + " can't be found or opened"
  End
EndIf
Debug "Reading: " + #File_Name + #CRLF$

; With #PB_File_IgnoreEOL, ReadString() returns the rest of the file,
; line breaks included, in a single call
Define Txt$ = ReadString(#hFile, #PB_Ascii | #PB_File_IgnoreEOL)
CloseFile(#hFile)

Define Ident$, Value$

; Run every expression over the whole text and print each ident/value pair
ForEach RegEx$()
  If Not CreateRegularExpression(0, RegEx$(), #PB_RegularExpression_DotAll)
    Debug "Bad RegEx (" + RegEx$() + ")"
    Break
  Else
    If ExamineRegularExpression(0, Txt$)
      While NextRegularExpressionMatch(0)
        Ident$ = RegularExpressionGroup(0, 1)
        Value$ = RegularExpressionGroup(0, 2)
        ;Debug " " + Ident$ + " : " + Value$
        Debug " " + Ident$ + Space(12 - Len(Ident$)) + " : " + Value$
      Wend
    EndIf
    FreeRegularExpression(0)
  EndIf
Next

Debug "Done"
End
Code: Select all
Reading: TestFile.opf
title : Skylark of Space
creator : Smith, E. E. 'Doc'" opf:role="aut">E. E. 'Doc' Smith
date : 2011-09-29T22:52:37+00:00
publisher : Berkley
ISBN : 9780425046401
GOOGLE : boBY9bNVQwAC
language : eng
series : Skylark
description : <div><div&g [...]
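Note that the creator line above still carries attribute text, because the second capture starts right after file-as=" and runs to the closing tag. One possible way to get only the display name is to skip all attributes with [^>]*. This is just a sketch: the sample line is reconstructed from the output above, and a real OPF file may order or name the attributes differently.

```purebasic
; Sketch: capture only the element text of <dc:creator>, whatever attributes it has.
; Sample$ is rebuilt from the output shown above and is an assumption about the source line.
EnableExplicit

Define Sample$ = ~"<dc:creator opf:file-as=\"Smith, E. E. 'Doc'\" opf:role=\"aut\">E. E. 'Doc' Smith</dc:creator>"

; [^>]* consumes everything up to the end of the opening tag,
; so group 2 holds only the text between the opening and closing tags
If CreateRegularExpression(0, "<dc:(creator)[^>]*>(.+?)</dc:creator>", #PB_RegularExpression_DotAll)
  If ExamineRegularExpression(0, Sample$)
    While NextRegularExpressionMatch(0)
      Debug RegularExpressionGroup(0, 1) + " : " + RegularExpressionGroup(0, 2)
    Wend
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

With the sample line this should show "creator : E. E. 'Doc' Smith". If the sort name is wanted instead, a separate pattern on the opf:file-as attribute would be needed.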
- Fangbeast
Re: Calibre .OPF file format help needed
Thanks for the help Marcus, I really appreciate it.
As for the horny goat wink, be careful not to let Idle see you.
You know how these Kiwi Goat shaggers get when a goat winks at them:):)
- Fangbeast
Re: Calibre .OPF file format help needed
Marcus, I fangified your code to help me read the results.
Now I just have to find a way to clear out the tags and render readable text. Probably regex again (hehehe).
Code: Select all
;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------
Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()
Define EventID, MenuID, GadgetID, WindowID
; Window Constants
Enumeration 1
#Window_Getopf
EndEnumeration
#WindowIndex = #PB_Compiler_EnumerationValue
; Gadget Constants
Enumeration 1
; Window_Getopf
#Gadget_Getopf_lTitle
#Gadget_Getopf_Title
#Gadget_Getopf_lCreator
#Gadget_Getopf_Creator
#Gadget_Getopf_lDate
#Gadget_Getopf_Date
#Gadget_Getopf_lPublisher
#Gadget_Getopf_Publisher
#Gadget_Getopf_lIsbn
#Gadget_Getopf_ISBN
#Gadget_Getopf_lGoogle
#Gadget_Getopf_Google
#Gadget_Getopf_lLanguage
#Gadget_Getopf_Language
#Gadget_Getopf_lSeries
#Gadget_Getopf_Series
#Gadget_Getopf_lDescription
#Gadget_Getopf_Description
#Gadget_Getopf_Logfile
EndEnumeration
#GadgetIndex = #PB_Compiler_EnumerationValue
Procedure.i Window_Getopf()
If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
HideWindow(#Window_Getopf,#False)
ProcedureReturn WindowID(#Window_Getopf)
EndIf
EndProcedure
;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------
Macro Addlog(Texttoadd)
AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro
;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------
Global OPFFilename.s = "D:\Metadata.opf"
Define quitGetopf = #False
Define Ident$, Value$
;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------
If Window_Getopf()
Addlog("Adding RegEx operators")
NewList RegEx$()
AddElement(Regex$()) : Regex$() = ~"<dc:(title)>(.+?)</dc:title>"
AddElement(Regex$()) : Regex$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
AddElement(Regex$()) : Regex$() = ~"<dc:(date)>(.+?)</dc:date>"
AddElement(Regex$()) : Regex$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:(language)>(.+?)</dc:language>"
AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
AddElement(Regex$()) : Regex$() = ~"<dc:(description)>(.+?)</dc:description>"
Addlog("Trying to open: " + OPFFilename.s)
OPFhandle.i = OpenFile(#PB_Any, OPFFilename.s)
If Not OPFhandle.i
Addlog(OPFFilename.s + " can't be found or opened")
End
EndIf
Addlog("Reading: " + OPFFilename.s)
; Define the text file we are reading the opf file into
Define Txt$
Addlog("Reading text string from: " + OPFFilename.s)
While Not Eof(OPFhandle.i)
Txt$ = ReadString(OPFhandle.i, #PB_Ascii | #PB_File_IgnoreEOL)
Wend
CloseFile(OPFhandle.i)
Addlog(OPFFilename.s + " closed, we are finished with it")
Addlog(#Empty$)
Addlog("Attempting to parse each value, text pair from: " + OPFFilename.s)
Addlog(#Empty$)
ForEach Regex$()
If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
Addlog("Bad RegEx (" + RegEx$() + ")")
Break
Else
If ExamineRegularExpression(0, Txt$)
While NextRegularExpressionMatch(0)
Ident$ = RegularExpressionGroup(0, 1)
Value$ = RegularExpressionGroup(0, 2)
Select Ident$
Case "title" : SetGadgetText(#Gadget_Getopf_Title, Value$)
Case "creator" : SetGadgetText(#Gadget_Getopf_Creator, Value$)
Case "date" : SetGadgetText(#Gadget_Getopf_Date, Value$)
Case "publisher" : SetGadgetText(#Gadget_Getopf_Publisher, Value$)
Case "ISBN" : SetGadgetText(#Gadget_Getopf_ISBN, Value$)
Case "GOOGLE" : SetGadgetText(#Gadget_Getopf_Google, Value$)
Case "language" : SetGadgetText(#Gadget_Getopf_Language, Value$)
Case "series" : SetGadgetText(#Gadget_Getopf_Series, Value$)
Case "description" : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
EndSelect
Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
Wend
EndIf
FreeRegularExpression(0)
EndIf
Next
;Debug "Job done"
Repeat
EventID = WaitWindowEvent()
MenuID = EventMenu()
GadgetID = EventGadget()
WindowID = EventWindow()
Select EventID
Case #PB_Event_CloseWindow
Select WindowID
Case #Window_Getopf
quitGetopf = 1
EndSelect
Case #PB_Event_Gadget
Select GadgetID
;Case #Gadget_Getopf_Title
;Case #Gadget_Getopf_Creator
;Case #Gadget_Getopf_Date
;Case #Gadget_Getopf_Publisher
;Case #Gadget_Getopf_ISBN
;Case #Gadget_Getopf_Google
;Case #Gadget_Getopf_Language
;Case #Gadget_Getopf_Series
;Case #Gadget_Getopf_Description
EndSelect
EndSelect
Until quitGetopf
CloseWindow(#Window_Getopf)
EndIf
End
Re: Calibre .OPF file format help needed
Hi Fangbeast,
Glad to have helped you.
Fangbeast wrote: "Now I just have to find a way to clear out the tags and render readable text. Probably regex again (hehehe)."
Yes, here is a small procedure based on RegEx to filter out some elements. I just use ReplaceRegularExpression() to replace the excess tags with nothing.
It's rudimentary, but you can modify it easily, just add the other expressions separated by a "|"
(There must be better HTML filtering codes on the forum)
I modified the "Select" at the "description" level to make it go through the filter.
Full code below
Code: Select all
;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------
Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()
Define EventID, MenuID, GadgetID, WindowID
Declare Filter_Description()
; Window Constants
Enumeration 1
#Window_Getopf
EndEnumeration
#WindowIndex = #PB_Compiler_EnumerationValue
; Gadget Constants
Enumeration 1
; Window_Getopf
#Gadget_Getopf_lTitle
#Gadget_Getopf_Title
#Gadget_Getopf_lCreator
#Gadget_Getopf_Creator
#Gadget_Getopf_lDate
#Gadget_Getopf_Date
#Gadget_Getopf_lPublisher
#Gadget_Getopf_Publisher
#Gadget_Getopf_lIsbn
#Gadget_Getopf_ISBN
#Gadget_Getopf_lGoogle
#Gadget_Getopf_Google
#Gadget_Getopf_lLanguage
#Gadget_Getopf_Language
#Gadget_Getopf_lSeries
#Gadget_Getopf_Series
#Gadget_Getopf_lDescription
#Gadget_Getopf_Description
#Gadget_Getopf_Logfile
EndEnumeration
#GadgetIndex = #PB_Compiler_EnumerationValue
Procedure.i Window_Getopf()
If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile, "Consolas", 10)) ; "Comic Sans MS",10))
HideWindow(#Window_Getopf,#False)
ProcedureReturn WindowID(#Window_Getopf)
EndIf
EndProcedure
;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------
Macro Addlog(Texttoadd)
AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro
;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------
Global OPFFilename.s = "D:\Metadata.opf"
Define quitGetopf = #False
Global Ident$, Value$
;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------
If Window_Getopf()
Addlog("Adding RegEx operators")
NewList RegEx$()
AddElement(Regex$()) : Regex$() = ~"<dc:(title)>(.+?)</dc:title>"
AddElement(Regex$()) : Regex$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
AddElement(Regex$()) : Regex$() = ~"<dc:(date)>(.+?)</dc:date>"
AddElement(Regex$()) : Regex$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:(language)>(.+?)</dc:language>"
AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
AddElement(Regex$()) : Regex$() = ~"<dc:(description)>(.+?)</dc:description>"
Addlog("Trying to open: " + OPFFilename.s)
OPFhandle.i = OpenFile(#PB_Any, OPFFilename.s)
If Not OPFhandle.i
Addlog(OPFFilename.s + " can't be found or opened")
End
EndIf
Addlog("Reading: " + OPFFilename.s)
; Define the text file we are reading the opf file into
Define Txt$
Addlog("Reading text string from: " + OPFFilename.s)
While Not Eof(OPFhandle.i)
Txt$ = ReadString(OPFhandle.i, #PB_Ascii | #PB_File_IgnoreEOL)
Wend
CloseFile(OPFhandle.i)
Addlog(OPFFilename.s + " closed, we are finished with it")
Addlog(#Empty$)
Addlog("Attempting to parse each value, text pair from: " + OPFFilename.s)
Addlog(#Empty$)
ForEach Regex$()
If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
Addlog("Bad RegEx (" + RegEx$() + ")")
Break
Else
If ExamineRegularExpression(0, Txt$)
While NextRegularExpressionMatch(0)
Ident$ = RegularExpressionGroup(0, 1)
Value$ = RegularExpressionGroup(0, 2)
Select Ident$
Case "title" : SetGadgetText(#Gadget_Getopf_Title, Value$)
Case "creator" : SetGadgetText(#Gadget_Getopf_Creator, Value$)
Case "date" : SetGadgetText(#Gadget_Getopf_Date, Value$)
Case "publisher" : SetGadgetText(#Gadget_Getopf_Publisher, Value$)
Case "ISBN" : SetGadgetText(#Gadget_Getopf_ISBN, Value$)
Case "GOOGLE" : SetGadgetText(#Gadget_Getopf_Google, Value$)
Case "language" : SetGadgetText(#Gadget_Getopf_Language, Value$)
Case "series" : SetGadgetText(#Gadget_Getopf_Series, Value$)
Case "description"
Filter_Description()
AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
EndSelect
Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
Wend
EndIf
FreeRegularExpression(0)
EndIf
Next
;Debug "Job done"
Repeat
EventID = WaitWindowEvent()
MenuID = EventMenu()
GadgetID = EventGadget()
WindowID = EventWindow()
Select EventID
Case #PB_Event_CloseWindow
Select WindowID
Case #Window_Getopf
quitGetopf = 1
EndSelect
Case #PB_Event_Gadget
Select GadgetID
;Case #Gadget_Getopf_Title
;Case #Gadget_Getopf_Creator
;Case #Gadget_Getopf_Date
;Case #Gadget_Getopf_Publisher
;Case #Gadget_Getopf_ISBN
;Case #Gadget_Getopf_Google
;Case #Gadget_Getopf_Language
;Case #Gadget_Getopf_Series
;Case #Gadget_Getopf_Description
EndSelect
EndSelect
Until quitGetopf
CloseWindow(#Window_Getopf)
EndIf
Procedure Filter_Description()
  ; Works on the global Value$. ReplaceRegularExpression() replaces every
  ; match in one call, so no Examine/NextMatch loop is needed here.
  If CreateRegularExpression(1, ~"&.+?;|h\\d|/p|/div|div&|gt;|p class=\"description\"SUMMARY:br")
    Value$ = ReplaceRegularExpression(1, Value$, "")
    FreeRegularExpression(1)
  Else
    Debug "RegEx 2 error " + #CRLF$ + RegularExpressionError()
  EndIf
EndProcedure
End
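As a possible variation on the filter above, the escaped entities can be turned back into real characters first, and then anything that looks like a tag can be removed with one generic pattern. This is only a sketch: StripMarkup() is a hypothetical helper that is not part of the thread's code, the regex number 2 is arbitrary, and the sample description text is invented.

```purebasic
; Sketch: unescape the common entities, then drop everything between < and >.
; StripMarkup() and the sample text are assumptions made for this illustration.
EnableExplicit

Procedure.s StripMarkup(Text$)
  Protected Result$ = Text$
  Result$ = ReplaceString(Result$, "&lt;", "<")
  Result$ = ReplaceString(Result$, "&gt;", ">")
  Result$ = ReplaceString(Result$, "&quot;", #DQUOTE$)
  Result$ = ReplaceString(Result$, "&amp;", "&")  ; done last, so "&amp;lt;" does not turn into "<"
  If CreateRegularExpression(2, "<[^>]+>")
    Result$ = ReplaceRegularExpression(2, Result$, "")  ; replaces all matches in one call
    FreeRegularExpression(2)
  EndIf
  ProcedureReturn Result$
EndProcedure

Debug StripMarkup("&lt;div&gt;&lt;p class=&quot;description&quot;&gt;SUMMARY:&lt;br&gt;A classic space opera.&lt;/p&gt;&lt;/div&gt;")
```

For the invented sample this should leave "SUMMARY:A classic space opera.". Other entities (such as &nbsp; or numeric ones) are not handled and would need their own ReplaceString() lines.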
Re: Calibre .OPF file format help needed
As an alternative to the RegEx approach shown previously, here is one that uses XML only.
It includes a few test options for the source data, which can be selected by setting #TestDataSource to various values (in the full listing, from line 101 onward).
As an experiment I thought I would also try to come up with a procedure to examine fields like the description field that contain repeating text and see whether they can be reduced, in an automated and arbitrary way, to something that doesn't include the repetition. That will take a bit of experimenting and, if successful, may be useful in other areas. If it fails it will most likely crash and burn and never be spoken of again in respectable circles.
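Before the full listing, the core idea can be sketched in a few lines with PureBasic's XML library. Everything specific here is an assumption for illustration: the constant #XML_Demo, the variable names and the tiny sample document, which leaves out most of what a real Calibre OPF file contains.

```purebasic
; Sketch: read one element and one attribute from an OPF-like XML string.
; #XML_Demo and Xml$ are invented for this example.
EnableExplicit

#XML_Demo = 0

Define Root, Node
Define Xml$ = "<package><metadata>" +
              "<dc:title>Skylark of Space</dc:title>" +
              ~"<dc:identifier opf:scheme=\"ISBN\">9780425046401</dc:identifier>" +
              "</metadata></package>"

If ParseXML(#XML_Demo, Xml$)
  If XMLStatus(#XML_Demo) = #PB_XML_Success
    Root = RootXMLNode(#XML_Demo)

    ; A solitary node: fetch it by path and read its text
    Node = XMLNodeFromPath(Root, "/package/metadata/dc:title")
    If Node
      Debug "title : " + GetXMLNodeText(Node)
    EndIf

    ; A repeated node: pick one by index, then use an attribute to tell which kind it is
    Node = XMLNodeFromPath(Root, "/package/metadata/dc:identifier[1]")
    If Node
      Debug GetXMLAttribute(Node, "opf:scheme") + " : " + GetXMLNodeText(Node)
    EndIf
  Else
    Debug XMLError(#XML_Demo)
  EndIf
  FreeXML(#XML_Demo)
EndIf
```

The advantage over the RegEx version is that the parser deals with attributes, their order and escaped characters, so the code only has to name the path it wants.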
Code: Select all
;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------
Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()
Define EventID, MenuID, GadgetID, WindowID
; Window Constants
Enumeration 1
#Window_Getopf
EndEnumeration
#WindowIndex = #PB_Compiler_EnumerationValue
; Gadget Constants
Enumeration 1
; Window_Getopf
#Gadget_Getopf_lTitle
#Gadget_Getopf_Title
#Gadget_Getopf_lCreator
#Gadget_Getopf_Creator
#Gadget_Getopf_lDate
#Gadget_Getopf_Date
#Gadget_Getopf_lPublisher
#Gadget_Getopf_Publisher
#Gadget_Getopf_lIsbn
#Gadget_Getopf_ISBN
#Gadget_Getopf_lGoogle
#Gadget_Getopf_Google
#Gadget_Getopf_lLanguage
#Gadget_Getopf_Language
#Gadget_Getopf_lSeries
#Gadget_Getopf_Series
#Gadget_Getopf_lDescription
#Gadget_Getopf_Description
#Gadget_Getopf_Logfile
EndEnumeration
#GadgetIndex = #PB_Compiler_EnumerationValue
Procedure.i Window_Getopf()
If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
HideWindow(#Window_Getopf,#False)
ProcedureReturn WindowID(#Window_Getopf)
EndIf
EndProcedure
;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------
Macro Addlog(Texttoadd)
AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro
;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
#XML_OPF = 1
#TestDataSource = 0 ;0 = file, 1,2,3 = predefined strings
;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
Structure OPF_XMLInfo
Node.i
Ident$
Value$
EndStructure
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------
Global OPFFilename.s = "D:\Metadata.opf"; GetPathPart(ProgramFilename()) + "content.opf"
Define quitGetopf = #False
Define Ident$, Value$
;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;Declare process_OPF(XID, List OPF_XML.OPF_XMLInfo())
;extract elements from OPF XML file and return results in a list
Procedure process_OPF(XID, List OPF_XML.OPF_XMLInfo())
Protected RootNode_OPF = RootXMLNode(XID)
Protected targetNode, attributeValue$, attributeValue2$
Protected i ;loop counter for the indexed node paths below
ClearList(OPF_XML())
With OPF_XML()
;solitary node values
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:title"): \Ident$ = "title" ;**Required
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:creator"): \Ident$ = "creator"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:date"): \Ident$ = "date"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:publisher"): \Ident$ = "publisher"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:description"): \Ident$ = "description"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
;ISBN & GOOGLE
i = 1
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
attributeValue$ = GetXMLAttribute(targetNode, "opf:scheme")
Select attributeValue$
Case "ISBN"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "ISBN": \Value$ = GetXMLNodeText(OPF_XML()\Node)
Case "GOOGLE"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "GOOGLE": \Value$ = GetXMLNodeText(OPF_XML()\Node)
Default
EndSelect
i + 1
Wend
;language(s) **Required
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[1]")
If targetNode
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "language": \Value$ = GetXMLNodeText(targetNode)
;create a combined entry if there is more than one language node
i = 2
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
\Value$ + ", " + GetXMLNodeText(targetNode)
i + 1
Wend
EndIf
;series
i = 1
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
attributeValue$ = GetXMLAttribute(targetNode, "name")
If attributeValue$ = "calibre:series"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "series": \Value$ = GetXMLAttribute(targetNode, "content")
EndIf
i + 1
Wend
EndWith
EndProcedure
;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------
If Window_Getopf()
NewList OPF_XML.OPF_XMLInfo()
Define parseXMLSuccessful = #False
Select #TestDataSource
Case 0
Addlog("Trying to open: " + OPFFilename.s)
OPFhandle.i = ReadFile(#PB_Any, OPFFilename.s)
; Addlog("Trying to load as XML: " + OPFFilename.s)
; OPFhandle.i = LoadXML(#XML_OPF, OPFFilename.s, #PB_Ascii)
If Not OPFhandle.i
Addlog(OPFFilename.s + " can't be found or opened")
MessageRequester("Error", OPFFilename.s + " can't be found or opened")
End
EndIf
Addlog("Reading text string from: " + OPFFilename.s)
Define Txt$
;the OPF declares encoding='utf-8', so read it as UTF-8 to keep accented characters intact
Txt$ = ReadString(OPFhandle.i, #PB_UTF8 | #PB_File_IgnoreEOL)
CloseFile(OPFhandle.i)
Addlog(OPFFilename.s + " closed, we are finished with it")
Case 1
;original test file
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version='1.0' encoding='utf-8'?>"
Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|uuid_id| version=|2.0|>"
Txt$ + "<metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + "<dc:identifier opf:scheme=|calibre| id=|calibre_id|>231</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|uuid| id=|uuid_id|>9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>"
Txt$ + "<dc:title>Skylark of Space</dc:title>"
Txt$ + "<dc:creator opf:file-as=|Smith, E. E. 'Doc'| opf:role=|aut|>E. E. 'Doc' Smith</dc:creator>"
Txt$ + "<dc:contributor opf:file-as=|calibre| opf:role=|bkp|>calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>"
Txt$ + "<dc:date>2011-09-29T22:52:37+00:00</dc:date>"
Txt$ + "<dc:description>&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
Txt$ + "&lt;/div&gt;"
Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
Txt$ + "&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
Txt$ + "&lt;/div&gt;"
Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
Txt$ + "Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge."
Txt$ + "|With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work.| -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great |space opera| science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>"
Txt$ + "<dc:publisher>Berkley</dc:publisher>"
Txt$ + "<dc:identifier opf:scheme=|GUID|>{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|ISBN|>9780425046401</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|GOOGLE|>boBY9bNVQwAC</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|URI|>http|//www.gutenberg.org/ebooks/20869</dc:identifier>"
Txt$ + "<dc:language>eng</dc:language>"
Txt$ + "<dc:subject>Science Fiction</dc:subject>"
Txt$ + "<dc:subject>Science Fiction/Fantasy</dc:subject>"
Txt$ + "<dc:subject>Space ships -- Fiction</dc:subject>"
Txt$ + "<dc:subject>Space flight -- Fiction</dc:subject>"
Txt$ + "<dc:subject>Action &amp; Adventure</dc:subject>"
Txt$ + "<dc:subject>Fiction</dc:subject>"
Txt$ + "<dc:subject>General</dc:subject>"
Txt$ + "<dc:subject>Space Opera</dc:subject>"
Txt$ + "<meta name=|calibre:author_link_map| content=|{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}|/>"
Txt$ + "<meta name=|calibre:series| content=|Skylark|/>"
Txt$ + "<meta name=|calibre:series_index| content=|1|/>"
Txt$ + "<meta name=|calibre:rating| content=|10|/>"
Txt$ + "<meta name=|calibre:timestamp| content=|2021-06-05T06:47:53.206469+00:00|/>"
Txt$ + "<meta name=|calibre:title_sort| content=|Skylark of Space|/>"
Txt$ + "</metadata>"
Txt$ + "<guide>"
Txt$ + "<reference type=|cover| title=|Cover| href=|cover.jpg|/>"
Txt$ + "</guide>"
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
Case 2
;alternate test file #2
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version=|1.0|?>"
Txt$ + "<package version=|2.0| xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|BookId|>"
Txt$ + ""
Txt$ + " <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <dc:title>Pride and Prejudice</dc:title>"
Txt$ + " <dc:language>en</dc:language>"
Txt$ + " <dc:identifier id=|BookId| opf:scheme=|ISBN|>123456789X</dc:identifier>"
Txt$ + " <dc:creator opf:file-as=|Austen, Jane| opf:role=|aut|>Jane Austen</dc:creator>"
Txt$ + " </metadata>"
Txt$ + ""
Txt$ + " <manifest>"
Txt$ + " <item id=|chapter1| href=|chapter1.xhtml| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|appendix| href=|appendix.xhtml| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|stylesheet| href=|style.css| media-type=|text/css|/>"
Txt$ + " <item id=|ch1-pic| href=|ch1-pic.png| media-type=|image/png|/>"
Txt$ + " <item id=|myfont| href=|css/myfont.otf| media-type=|application/x-font-opentype|/>"
Txt$ + " <item id=|ncx| href=|toc.ncx| media-type=|application/x-dtbncx+xml|/>"
Txt$ + " </manifest>"
Txt$ + ""
Txt$ + " <spine toc=|ncx|>"
Txt$ + " <itemref idref=|chapter1| />"
Txt$ + " <itemref idref=|appendix| />"
Txt$ + " </spine>"
Txt$ + ""
Txt$ + " <guide>"
Txt$ + " <reference type=|loi| title=|List Of Illustrations| href=|appendix.xhtml#figures| />"
Txt$ + " </guide>"
Txt$ + ""
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
Case 3
;alternate test file #3
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version=|1.0| encoding=|UTF-8|?>"
Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| xmlns:dc=|http://purl.org/dc/elements/1.1/|"
Txt$ + " xmln:xsi=|http://www.w3.org/2001/XMLSchema-instance|"
Txt$ + " version=|2.0|"
Txt$ + " unique-identifier=|bookid|>"
Txt$ + " <metadata xmins:dc=|http://purl.org/dc/elements/1.1/| xmlns.opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <dc:creator>Anonymous</dc:creator>"
Txt$ + " <dc:title>The Three Bears</dc:title>"
Txt$ + " <dc:language xsi:type=|dcterms:RFC3066|>en-GB</dc:language>"
Txt$ + " <dc:rights>Public Domain</dc:rights>"
Txt$ + " <dc:publisher>Project Gutenberg (epub version: Bob DuCharme)</dc:publisher>"
Txt$ + " <dc:identifier id=|bookid|>http://www.snee.com/epub/pg23322</dc:identifier>"
Txt$ + " </metadata>"
Txt$ + " <maifest xmlns.opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <item id=|ncx| href=|toc.ncx| media-type=|text/xml|/>"
Txt$ + " <item id=|main| href=|23322+h.htm| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|cover| href=|images/cover.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-4| href=|images/1-4.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-6| href=|images/1-6.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-9| href=|images/1-9.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-10| href=|images/1-10.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-13| href=|images/1-13.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-15| href=|images/1-15.jpg| media-type=|image/jpeg|/>"
Txt$ + " </maifest>"
Txt$ + " <spine toc=|ncx|>"
Txt$ + " <itemref idref=|main|/>"
Txt$ + " </spine>"
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
EndSelect
Addlog(#Empty$)
Addlog("Parsing Text as XML ID:" + #XML_OPF)
ParseXML(#XML_OPF, Txt$)
If XMLStatus(#XML_OPF) = #PB_XML_Success
parseXMLSuccessful = #True
Addlog("Attempting to retrieve each value from XML: " + OPFFilename.s)
process_OPF(#XML_OPF, OPF_XML())
Else
Addlog("error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + ".")
Debug "error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + "."
Debug Left(Txt$, XMLErrorPosition(#XML_OPF))
parseXMLSuccessful = #False
EndIf
Addlog(#Empty$)
ForEach OPF_XML()
Ident$ = OPF_XML()\Ident$
Value$ = OPF_XML()\Value$
Select Ident$
Case "title" : SetGadgetText(#Gadget_Getopf_Title, Value$)
Case "creator" : SetGadgetText(#Gadget_Getopf_Creator, Value$)
Case "date" : SetGadgetText(#Gadget_Getopf_Date, Value$)
Case "publisher" : SetGadgetText(#Gadget_Getopf_Publisher, Value$)
Case "ISBN" : SetGadgetText(#Gadget_Getopf_ISBN, Value$)
Case "GOOGLE" : SetGadgetText(#Gadget_Getopf_Google, Value$)
Case "language" : SetGadgetText(#Gadget_Getopf_Language, Value$)
Case "series" : SetGadgetText(#Gadget_Getopf_Series, Value$)
Case "description" : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
EndSelect
Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
Next
;Debug "Job done"
Repeat
EventID = WaitWindowEvent()
MenuID = EventMenu()
GadgetID = EventGadget()
WindowID = EventWindow()
Select EventID
Case #PB_Event_CloseWindow
Select WindowID
Case #Window_Getopf
quitGetopf = #True
EndSelect
Case #PB_Event_Gadget
Select GadgetID
;Case #Gadget_Getopf_Title
;Case #Gadget_Getopf_Creator
;Case #Gadget_Getopf_Date
;Case #Gadget_Getopf_Publisher
;Case #Gadget_Getopf_ISBN
;Case #Gadget_Getopf_Google
;Case #Gadget_Getopf_Language
;Case #Gadget_Getopf_Series
;Case #Gadget_Getopf_Description
EndSelect
EndSelect
Until quitGetopf
CloseWindow(#Window_Getopf)
EndIf
End
Re: Calibre .OPF file format help needed
As an alternative to the RegEx approach shown earlier, here is one that uses only the XML library:
It includes a few test data sources, selected by setting #TestDataSource (near line 101 of the listing) to 0 to read the file or to 1, 2 or 3 to use one of the predefined strings.
As an experiment I thought I would also try to come up with a procedure that examines fields like the description field, which contain repeated text, and see if they can be reduced in an automated way to something without the repetition. That will take a bit of experimenting and, if successful, may be useful in other areas. If it fails it will most likely crash and burn and never be spoken of again in respectable circles.
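For what it's worth, here is a rough sketch of one way that description clean-up could start. It is not part of the listing below, and the procedure name RemoveRepeatedSentences and the #MinSentenceLength constant are made up for the example. It assumes the text has already had its HTML tags stripped, splits on ". " and keeps only the first occurrence of each longer sentence. Short fragments are always kept so that initials such as "O. G." survive.

```purebasic
; Hypothetical helper, not from the posted program: drop repeated sentences.
EnableExplicit

#MinSentenceLength = 20 ; assumption: shorter fragments (initials etc.) are never treated as repeats

Procedure.s RemoveRepeatedSentences(Text$)
  Protected NewMap Seen.i() ; lower-cased sentences that were already emitted
  Protected Result$, Sentence$, Key$
  Protected i, Count

  Count = CountString(Text$, ". ") + 1
  For i = 1 To Count
    ; take one ". "-delimited field and strip spaces and any trailing full stop
    Sentence$ = RTrim(Trim(StringField(Text$, i, ". ")), ".")
    If Sentence$ = "" : Continue : EndIf
    Key$ = LCase(Sentence$)
    If Len(Sentence$) >= #MinSentenceLength And FindMapElement(Seen(), Key$)
      Continue ; this sentence was already kept once
    EndIf
    Seen(Key$) = #True
    Result$ + Sentence$ + ". "
  Next
  ProcedureReturn RTrim(Result$)
EndProcedure

Debug RemoveRepeatedSentences("The Skylark is fast and very shiny. It flies to the far stars. The Skylark is fast and very shiny.")
; shows: The Skylark is fast and very shiny. It flies to the far stars.
```

Exact matching like this only catches word-for-word repeats. Copies that differ by a prefix such as "SUMMARY:" or by a mis-encoded character would slip through, so some kind of fuzzy comparison would probably be needed for real Calibre descriptions.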
Code: Select all
;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------
Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()
Define EventID, MenuID, GadgetID, WindowID
; Window Constants
Enumeration 1
#Window_Getopf
EndEnumeration
#WindowIndex = #PB_Compiler_EnumerationValue
; Gadget Constants
Enumeration 1
; Window_Getopf
#Gadget_Getopf_lTitle
#Gadget_Getopf_Title
#Gadget_Getopf_lCreator
#Gadget_Getopf_Creator
#Gadget_Getopf_lDate
#Gadget_Getopf_Date
#Gadget_Getopf_lPublisher
#Gadget_Getopf_Publisher
#Gadget_Getopf_lIsbn
#Gadget_Getopf_ISBN
#Gadget_Getopf_lGoogle
#Gadget_Getopf_Google
#Gadget_Getopf_lLanguage
#Gadget_Getopf_Language
#Gadget_Getopf_lSeries
#Gadget_Getopf_Series
#Gadget_Getopf_lDescription
#Gadget_Getopf_Description
#Gadget_Getopf_Logfile
EndEnumeration
#GadgetIndex = #PB_Compiler_EnumerationValue
Procedure.i Window_Getopf()
If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
HideWindow(#Window_Getopf,#False)
ProcedureReturn WindowID(#Window_Getopf)
EndIf
EndProcedure
;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------
Macro Addlog(Texttoadd)
AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro
;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
#XML_OPF = 1
#TestDataSource = 0 ;0 = file, 1,2,3 = predefined strings
;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
Structure OPF_XMLInfo
Node.i
Ident$
Value$
EndStructure
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------
Global OPFFilename.s = "D:\Metadata.opf"; GetPathPart(ProgramFilename()) + "content.opf"
Define quitGetopf = #False
Define Ident$, Value$
;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;Declare process_OPF(XID, List OPF_XML.OPF_XMLInfo())
;extract elements from OPF XML file and return results in a list
Procedure process_OPF(XID, List OPF_XML.OPF_XMLInfo())
Protected RootNode_OPF = RootXMLNode(XID)
Protected targetNode, attributeValue$, attributeValue2$
Protected i ;loop counter for the indexed node paths below
ClearList(OPF_XML())
With OPF_XML()
;solitary node values
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:title"): \Ident$ = "title" ;**Required
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:creator"): \Ident$ = "creator"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:date"): \Ident$ = "date"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:publisher"): \Ident$ = "publisher"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:description"): \Ident$ = "description"
If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
;ISBN & GOOGLE
i = 1
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
attributeValue$ = GetXMLAttribute(targetNode, "opf:scheme")
Select attributeValue$
Case "ISBN"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "ISBN": \Value$ = GetXMLNodeText(OPF_XML()\Node)
Case "GOOGLE"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "GOOGLE": \Value$ = GetXMLNodeText(OPF_XML()\Node)
Default
EndSelect
i + 1
Wend
;language(s) **Required
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[1]")
If targetNode
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "language": \Value$ = GetXMLNodeText(targetNode)
;create a combined entry if there is more than one language node
i = 2
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
\Value$ + ", " + GetXMLNodeText(targetNode)
i + 1
Wend
EndIf
;series
i = 1
While XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
attributeValue$ = GetXMLAttribute(targetNode, "name")
If attributeValue$ = "calibre:series"
AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "series": \Value$ = GetXMLAttribute(targetNode, "content")
EndIf
i + 1
Wend
EndWith
EndProcedure
;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------
;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------
If Window_Getopf()
NewList OPF_XML.OPF_XMLInfo()
Define parseXMLSuccessful = #False
Select #TestDataSource
Case 0
Addlog("Trying to open: " + OPFFilename.s)
OPFhandle.i = ReadFile(#PB_Any, OPFFilename.s)
; Addlog("Trying to load as XML: " + OPFFilename.s)
; OPFhandle.i = LoadXML(#XML_OPF, OPFFilename.s, #PB_Ascii)
If Not OPFhandle.i
Addlog(OPFFilename.s + " can't be found or opened")
MessageRequester("Error", OPFFilename.s + " can't be found or opened")
End
EndIf
Addlog("Reading text string from: " + OPFFilename.s)
Define Txt$
;the OPF declares encoding='utf-8', so read it as UTF-8 to keep accented characters intact
Txt$ = ReadString(OPFhandle.i, #PB_UTF8 | #PB_File_IgnoreEOL)
CloseFile(OPFhandle.i)
Addlog(OPFFilename.s + " closed, we are finished with it")
Case 1
;original test file
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version='1.0' encoding='utf-8'?>"
Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|uuid_id| version=|2.0|>"
Txt$ + "<metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + "<dc:identifier opf:scheme=|calibre| id=|calibre_id|>231</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|uuid| id=|uuid_id|>9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>"
Txt$ + "<dc:title>Skylark of Space</dc:title>"
Txt$ + "<dc:creator opf:file-as=|Smith, E. E. 'Doc'| opf:role=|aut|>E. E. 'Doc' Smith</dc:creator>"
Txt$ + "<dc:contributor opf:file-as=|calibre| opf:role=|bkp|>calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>"
Txt$ + "<dc:date>2011-09-29T22:52:37+00:00</dc:date>"
Txt$ + "<dc:description>&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
Txt$ + "&lt;/div&gt;"
Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
Txt$ + "&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
Txt$ + "&lt;/div&gt;"
Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
Txt$ + "Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge."
Txt$ + "|With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work.| -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great |space opera| science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>"
Txt$ + "<dc:publisher>Berkley</dc:publisher>"
Txt$ + "<dc:identifier opf:scheme=|GUID|>{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|ISBN|>9780425046401</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|GOOGLE|>boBY9bNVQwAC</dc:identifier>"
Txt$ + "<dc:identifier opf:scheme=|URI|>http://www.gutenberg.org/ebooks/20869</dc:identifier>"
Txt$ + "<dc:language>eng</dc:language>"
Txt$ + "<dc:subject>Science Fiction</dc:subject>"
Txt$ + "<dc:subject>Science Fiction/Fantasy</dc:subject>"
Txt$ + "<dc:subject>Space ships -- Fiction</dc:subject>"
Txt$ + "<dc:subject>Space flight -- Fiction</dc:subject>"
Txt$ + "<dc:subject>Action &amp; Adventure</dc:subject>"
Txt$ + "<dc:subject>Fiction</dc:subject>"
Txt$ + "<dc:subject>General</dc:subject>"
Txt$ + "<dc:subject>Space Opera</dc:subject>"
Txt$ + "<meta name=|calibre:author_link_map| content=|{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}|/>"
Txt$ + "<meta name=|calibre:series| content=|Skylark|/>"
Txt$ + "<meta name=|calibre:series_index| content=|1|/>"
Txt$ + "<meta name=|calibre:rating| content=|10|/>"
Txt$ + "<meta name=|calibre:timestamp| content=|2021-06-05T06:47:53.206469+00:00|/>"
Txt$ + "<meta name=|calibre:title_sort| content=|Skylark of Space|/>"
Txt$ + "</metadata>"
Txt$ + "<guide>"
Txt$ + "<reference type=|cover| title=|Cover| href=|cover.jpg|/>"
Txt$ + "</guide>"
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
Case 2
;alternate test file #2
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version=|1.0|?>"
Txt$ + "<package version=|2.0| xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|BookId|>"
Txt$ + ""
Txt$ + " <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <dc:title>Pride and Prejudice</dc:title>"
Txt$ + " <dc:language>en</dc:language>"
Txt$ + " <dc:identifier id=|BookId| opf:scheme=|ISBN|>123456789X</dc:identifier>"
Txt$ + " <dc:creator opf:file-as=|Austen, Jane| opf:role=|aut|>Jane Austen</dc:creator>"
Txt$ + " </metadata>"
Txt$ + ""
Txt$ + " <manifest>"
Txt$ + " <item id=|chapter1| href=|chapter1.xhtml| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|appendix| href=|appendix.xhtml| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|stylesheet| href=|style.css| media-type=|text/css|/>"
Txt$ + " <item id=|ch1-pic| href=|ch1-pic.png| media-type=|image/png|/>"
Txt$ + " <item id=|myfont| href=|css/myfont.otf| media-type=|application/x-font-opentype|/>"
Txt$ + " <item id=|ncx| href=|toc.ncx| media-type=|application/x-dtbncx+xml|/>"
Txt$ + " </manifest>"
Txt$ + ""
Txt$ + " <spine toc=|ncx|>"
Txt$ + " <itemref idref=|chapter1| />"
Txt$ + " <itemref idref=|appendix| />"
Txt$ + " </spine>"
Txt$ + ""
Txt$ + " <guide>"
Txt$ + " <reference type=|loi| title=|List Of Illustrations| href=|appendix.xhtml#figures| />"
Txt$ + " </guide>"
Txt$ + ""
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
Case 3
;alternate test file #3
Addlog("Using text from test string: #" + #TestDataSource +":")
Txt$ = "<?xml version=|1.0| encoding=|UTF-8|?>"
Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| xmlns:dc=|http://purl.org/dc/elements/1.1/|"
Txt$ + " xmlns:xsi=|http://www.w3.org/2001/XMLSchema-instance|"
Txt$ + " version=|2.0|"
Txt$ + " unique-identifier=|bookid|>"
Txt$ + " <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <dc:creator>Anonymous</dc:creator>"
Txt$ + " <dc:title>The Three Bears</dc:title>"
Txt$ + " <dc:language xsi:type=|dcterms:RFC3066|>en-GB</dc:language>"
Txt$ + " <dc:rights>Public Domain</dc:rights>"
Txt$ + " <dc:publisher>Project Gutenberg (epub version: Bob DuCharme)</dc:publisher>"
Txt$ + " <dc:identifier id=|bookid|>http://www.snee.com/epub/pg23322</dc:identifier>"
Txt$ + " </metadata>"
Txt$ + " <manifest xmlns:opf=|http://www.idpf.org/2007/opf|>"
Txt$ + " <item id=|ncx| href=|toc.ncx| media-type=|text/xml|/>"
Txt$ + " <item id=|main| href=|23322+h.htm| media-type=|application/xhtml+xml|/>"
Txt$ + " <item id=|cover| href=|images/cover.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-4| href=|images/1-4.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-6| href=|images/1-6.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-9| href=|images/1-9.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-10| href=|images/1-10.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-13| href=|images/1-13.jpg| media-type=|image/jpeg|/>"
Txt$ + " <item id=|1-15| href=|images/1-15.jpg| media-type=|image/jpeg|/>"
Txt$ + " </manifest>"
Txt$ + " <spine toc=|ncx|>"
Txt$ + " <itemref idref=|main|/>"
Txt$ + " </spine>"
Txt$ + "</package>"
ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
EndSelect
Addlog(#Empty$)
Addlog("Parsing Text as XML ID:" + #XML_OPF)
ParseXML(#XML_OPF, Txt$)
If XMLStatus(#XML_OPF) = #PB_XML_Success
parseXMLSuccessful = #True
Addlog("Attempting to retrieve each value from XML: " + OPFFilename.s)
process_OPF(#XML_OPF, OPF_XML())
Else
Addlog("error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + ".")
Debug "error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + "."
Debug Left(Txt$, XMLErrorPosition(#XML_OPF))
parseXMLSuccessful = #False ; parsing failed, so do not report success
EndIf
Addlog(#Empty$)
ForEach OPF_XML()
Ident$ = OPF_XML()\Ident$
Value$ = OPF_XML()\Value$
Select Ident$
Case "title" : SetGadgetText(#Gadget_Getopf_Title, Value$)
Case "creator" : SetGadgetText(#Gadget_Getopf_Creator, Value$)
Case "date" : SetGadgetText(#Gadget_Getopf_Date, Value$)
Case "publisher" : SetGadgetText(#Gadget_Getopf_Publisher, Value$)
Case "ISBN" : SetGadgetText(#Gadget_Getopf_ISBN, Value$)
Case "GOOGLE" : SetGadgetText(#Gadget_Getopf_Google, Value$)
Case "language" : SetGadgetText(#Gadget_Getopf_Language, Value$)
Case "series" : SetGadgetText(#Gadget_Getopf_Series, Value$)
Case "description" : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
EndSelect
Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
Next
;Debug "Job done"
Repeat
EventID = WaitWindowEvent()
MenuID = EventMenu()
GadgetID = EventGadget()
WindowID = EventWindow()
Select EventID
Case #PB_Event_CloseWindow
Select WindowID
Case #Window_Getopf
quitGetopf = 1
EndSelect
Case #PB_Event_Gadget
Select GadgetID
;Case #Gadget_Getopf_Title
;Case #Gadget_Getopf_Creator
;Case #Gadget_Getopf_Date
;Case #Gadget_Getopf_Publisher
;Case #Gadget_Getopf_ISBN
;Case #Gadget_Getopf_Google
;Case #Gadget_Getopf_Language
;Case #Gadget_Getopf_Series
;Case #Gadget_Getopf_Description
EndSelect
EndSelect
Until quitGetopf
CloseWindow(#Window_Getopf)
EndIf
End
As an experiment, I thought I would also try to come up with a procedure that examines fields with repeating text, such as the description field, and reduces them automatically, for arbitrary text, to something without the repetition. That will take a bit of experimenting and, if successful, may be useful in other areas. If it fails, it will most likely crash and burn and never be spoken of again in respectable circles.
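To make the idea a little more concrete, here is a rough sketch of one way such a procedure might start. It is only an illustration, not the finished experiment: RemoveRepeatedSentences() is a hypothetical name, it assumes that ". " is a good enough sentence boundary, and it assumes a PureBasic version whose StringField() accepts a multi-character delimiter. It only catches exact repeats (ignoring case), so near-duplicates that differ by an HTML entity or a mis-encoded character would still slip through.

```purebasic
EnableExplicit

; Hypothetical helper (not part of the program above): keep only the first
; occurrence of each sentence, comparing case-insensitively.
Procedure.s RemoveRepeatedSentences(Text$)
  Protected NewMap Seen.i()
  Protected Result$, Sentence$, Key$
  Protected i, Count

  ; Assumption: ". " marks a sentence boundary. Abbreviations such as
  ; "O. G." get split as well and are simply rejoined piece by piece.
  Count = CountString(Text$, ". ") + 1

  For i = 1 To Count
    ; Strip surrounding spaces and the trailing full stop of the last field.
    Sentence$ = RTrim(Trim(StringField(Text$, i, ". ")), ".")
    Key$ = LCase(Sentence$)

    ; Only keep sentences that have not been seen before.
    If Sentence$ <> "" And FindMapElement(Seen(), Key$) = 0
      Seen(Key$) = #True
      If Result$ <> "" : Result$ + " " : EndIf
      Result$ + Sentence$ + "."
    EndIf
  Next

  ProcedureReturn Result$
EndProcedure

; The repeated first sentence should be kept only once.
Debug RemoveRepeatedSentences("A fuel is found. A chase begins. A fuel is found. The end.")
```

A map keyed on the lower-cased sentence keeps the check cheap, even for long descriptions. Stripping tags and decoding entities before the comparison would probably be the next thing to try.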
Last edited by Demivec on Tue Aug 24, 2021 3:16 am, edited 1 time in total.
- Fangbeast
Thanks Demivec. As soon as I have cleared my current project bugs, I can play:)