Calibre .OPF file format help needed

Fangbeast
PureBasic Protozoa
Posts: 4667
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: Calibre .OPF file format help needed

Post by Fangbeast »

Marc56us, I can see your code returning the identifiers but not understanding regex, I don't know how that actually returns the string in the middle of them.

The list of things I will never understand is growing.
Amateur Radio, D-STAR/VK3HAF
Marc56us
Addict
Posts: 1108
Joined: Sat Feb 08, 2014 3:26 pm
Location: France

Re: Calibre .OPF file format help needed

Post by Marc56us »

Fangbeast wrote: Thu Aug 19, 2021 8:55 am Marc56us, I can see your code returning the identifiers but not understanding regex, I don't know how that actually returns the string in the middle of them.
The list of things I will never understand is growing.
Hi Fangbeast,

It is the parentheses that capture part of the text.
In this new version, I make two captures: one for the name of the field and the other for its value.
The first capture is literal text that does not change.
The second one is a "non-greedy" capture, which means that the capture stops at the first occurrence of the text that follows it. A greedy capture would instead run on to the last matching closing tag and swallow everything in between.
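
To see the difference between a greedy and a non-greedy capture in isolation, here is a minimal sketch. The sample string and the helper name ShowGroups() are made up for the illustration; they are not part of the OPF file or of the program below.

```purebasic
; Sketch: greedy (.+) versus non-greedy (.+?) on a made-up sample string
EnableExplicit

; Hypothetical sample with two identifier elements on one line
Define Sample$ = "<dc:identifier>111</dc:identifier><dc:identifier>222</dc:identifier>"

; Hypothetical helper: print group #1 of every match of Pattern$ in Text$
Procedure ShowGroups(Pattern$, Text$)
  Protected rx = CreateRegularExpression(#PB_Any, Pattern$)
  If rx
    If ExamineRegularExpression(rx, Text$)
      While NextRegularExpressionMatch(rx)
        Debug Pattern$ + "  ->  " + RegularExpressionGroup(rx, 1)
      Wend
    EndIf
    FreeRegularExpression(rx)
  Else
    Debug "Bad RegEx: " + RegularExpressionError()
  EndIf
EndProcedure

ShowGroups("<dc:identifier>(.+)</dc:identifier>", Sample$)   ; greedy
ShowGroups("<dc:identifier>(.+?)</dc:identifier>", Sample$)  ; non-greedy
```

Run with the debugger enabled, the greedy pattern should report a single capture running from 111 to 222 with the tags in between included, while the non-greedy pattern should report 111 and 222 as two separate captures.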

Code:

; Extract_OPF_2.pb

EnableExplicit

Enumeration 
    #hFile    
EndEnumeration

NewList RegEx$()

; Regex how-to:
;
; ~"<dc:(title)>(.+?)</dc:title>"
; This means:
; Search from left to right for the text "<dc:title>"
; Capture the literal text "title" as group #1 using ()
; Then
; Keep reading and capture any characters (.+?) up to the first literal text "</dc:title>"
; This second () is group #2
; See below how to read the groups:
;                 Ident$ = RegularExpressionGroup(0, 1)
;                 Value$ = RegularExpressionGroup(0, 2)
;
; Some characters need special coding:
; space = \h (horizontal whitespace), written \\h inside a ~"..." string
; a quote must be escaped as \" inside a ~"..." string

AddElement(Regex$()) : Regex$() = ~"<dc:(title)>(.+?)</dc:title>"
AddElement(Regex$()) : Regex$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
AddElement(Regex$()) : Regex$() = ~"<dc:(date)>(.+?)</dc:date>"
AddElement(Regex$()) : Regex$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:(language)>(.+?)</dc:language>"
AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
AddElement(Regex$()) : Regex$() = ~"<dc:(description)>(.+?)</dc:description>"

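; Read the whole OPF file into one string
#File_Name = "TestFile.opf"
; ReadFile() fails on a missing file, whereas OpenFile() would create an empty one
If Not ReadFile(#hFile, #File_Name)
    Debug #File_Name + " can't be found or opened"
    End
EndIf
Debug "Reading: " + #File_Name + #CRLF$
Define Txt$
; With #PB_File_IgnoreEOL a single ReadString() call returns the whole file.
; OPF files are normally UTF-8, hence #PB_UTF8 rather than #PB_Ascii.
Txt$ = ReadString(#hFile, #PB_UTF8 | #PB_File_IgnoreEOL)
CloseFile(#hFile)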

Define Ident$, Value$

ForEach Regex$()
    If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
        Debug "Bad RegEx (" + RegEx$() + ")"
        Break
    Else
        If ExamineRegularExpression(0, Txt$)
            While NextRegularExpressionMatch(0)
                Ident$ = RegularExpressionGroup(0, 1)
                Value$ = RegularExpressionGroup(0, 2)
                ;Debug "    " +  Ident$ + " : " + Value$
                Debug "    " +  Ident$ + Space(12 - Len(Ident$)) + " : " + Value$
            Wend    
        EndIf
        FreeRegularExpression(0)
    EndIf
Next

Debug "Done"

End

Code:

Reading: TestFile.opf

    title        : Skylark of Space
    creator      : Smith, E. E. 'Doc'" opf:role="aut">E. E. 'Doc' Smith
    date         : 2011-09-29T22:52:37+00:00
    publisher    : Berkley
    ISBN         : 9780425046401
    GOOGLE       : boBY9bNVQwAC
    language     : eng
    series       : Skylark
    description  : &lt;div&gt;&lt;div&g [...]
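
Note that the creator line in the output above still carries the rest of the opening tag, because the pattern starts capturing inside the opf:file-as attribute. One possible way around it, as a sketch only: skip all attributes with [^>]* and capture just the element text. The sample line is reconstructed from the output above, and the variable name Creator$ is made up for the illustration.

```purebasic
EnableExplicit

; Sample line reconstructed from the output above
Define Creator$ = ~"<dc:creator opf:file-as=\"Smith, E. E. 'Doc'\" opf:role=\"aut\">E. E. 'Doc' Smith</dc:creator>"

; [^>]* skips every attribute up to the end of the opening tag
If CreateRegularExpression(0, "<dc:(creator)[^>]*>(.+?)</dc:creator>", #PB_RegularExpression_DotAll)
  If ExamineRegularExpression(0, Creator$)
    While NextRegularExpressionMatch(0)
      Debug RegularExpressionGroup(0, 1) + " : " + RegularExpressionGroup(0, 2)
    Wend
  EndIf
  FreeRegularExpression(0)
Else
  Debug RegularExpressionError()
EndIf
```

This should print "creator : E. E. 'Doc' Smith". The trade-off is that the file-as value is no longer captured, and an attribute value containing ">" would break the pattern.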
:wink:
(English is not my native language, I use an online translator.)
Fangbeast
PureBasic Protozoa
Posts: 4667
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: Calibre .OPF file format help needed

Post by Fangbeast »

Thanks for the help Marcus, I really appreciate it.

As for the horny goat wink, be careful not to let Idle see you.

You know how these Kiwi Goat shaggers get when a goat winks at them:):)
Fangbeast
PureBasic Protozoa
Posts: 4667
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: Calibre .OPF file format help needed

Post by Fangbeast »

Marcus, I fangified your code to help me read the results.

Now I just have to find a way to clear out the tags and render readable text. Probably regex again (hehehe).

Code:

;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------

Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()

Define EventID, MenuID, GadgetID, WindowID

; Window Constants

Enumeration 1
  #Window_Getopf
EndEnumeration

#WindowIndex = #PB_Compiler_EnumerationValue

; Gadget Constants

Enumeration 1
  ; Window_Getopf
  #Gadget_Getopf_lTitle
  #Gadget_Getopf_Title
  #Gadget_Getopf_lCreator
  #Gadget_Getopf_Creator
  #Gadget_Getopf_lDate
  #Gadget_Getopf_Date
  #Gadget_Getopf_lPublisher
  #Gadget_Getopf_Publisher
  #Gadget_Getopf_lIsbn
  #Gadget_Getopf_ISBN
  #Gadget_Getopf_lGoogle
  #Gadget_Getopf_Google
  #Gadget_Getopf_lLanguage
  #Gadget_Getopf_Language
  #Gadget_Getopf_lSeries
  #Gadget_Getopf_Series
  #Gadget_Getopf_lDescription
  #Gadget_Getopf_Description
  #Gadget_Getopf_Logfile
EndEnumeration

#GadgetIndex = #PB_Compiler_EnumerationValue

Procedure.i Window_Getopf()
  If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
      TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
      StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
        SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
      TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
        SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
      EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
        SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
      ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
        SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
      HideWindow(#Window_Getopf,#False)
    ProcedureReturn WindowID(#Window_Getopf)
  EndIf
EndProcedure

;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------

Macro Addlog(Texttoadd)
  AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro

;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------

Global OPFFilename.s  = "D:\Metadata.opf"

Define quitGetopf     = #False

Define Ident$, Value$

;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------

If Window_Getopf()
  
  Addlog("Adding RegEx operators")
  
  NewList RegEx$()
  
  AddElement(Regex$()) : Regex$() = ~"<dc:(title)>(.+?)</dc:title>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(date)>(.+?)</dc:date>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
  AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
  AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(language)>(.+?)</dc:language>"
  AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(description)>(.+?)</dc:description>"
  
  Addlog("Trying to open: " + OPFFilename.s)
  
  OPFhandle.i = OpenFile(#PB_Any, OPFFilename.s)
  
  If Not OPFhandle.i
    Addlog(OPFFilename.s  + " can't be found or opened")
    End
  EndIf
  
  Addlog("Reading: "  + OPFFilename.s)
  
  ; Define the string the OPF file is read into
  
  Define Txt$
  
  Addlog("Reading text string from: " + OPFFilename.s)
  
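  ; With #PB_File_IgnoreEOL a single ReadString() call returns the whole file.
  ; OPF files are normally UTF-8, hence #PB_UTF8 rather than #PB_Ascii.
  Txt$ = ReadString(OPFhandle.i, #PB_UTF8 | #PB_File_IgnoreEOL)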
  
  CloseFile(OPFhandle.i)
  
  Addlog(OPFFilename.s  + " closed, we are finished with it")
  
  Addlog(#Empty$)
  
  Addlog("Attempting to parse each value, text pair from: " + OPFFilename.s)
  
  Addlog(#Empty$)
  
  ForEach Regex$()
    If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
      Addlog("Bad RegEx (" + RegEx$() + ")")
      Break
    Else
      If ExamineRegularExpression(0, Txt$)
        While NextRegularExpressionMatch(0)
          Ident$ = RegularExpressionGroup(0, 1)
          Value$ = RegularExpressionGroup(0, 2)
          Select Ident$
            Case "title"        : SetGadgetText(#Gadget_Getopf_Title,           Value$)
            Case "creator"      : SetGadgetText(#Gadget_Getopf_Creator,         Value$)
            Case "date"         : SetGadgetText(#Gadget_Getopf_Date,            Value$)
            Case "publisher"    : SetGadgetText(#Gadget_Getopf_Publisher,       Value$)
            Case "ISBN"         : SetGadgetText(#Gadget_Getopf_ISBN,            Value$)
            Case "GOOGLE"       : SetGadgetText(#Gadget_Getopf_Google,          Value$)
            Case "language"     : SetGadgetText(#Gadget_Getopf_Language,        Value$)
            Case "series"       : SetGadgetText(#Gadget_Getopf_Series,          Value$)
            Case "description"  : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
          EndSelect
          Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
        Wend
      EndIf
      FreeRegularExpression(0)
    EndIf
  Next
  
  ;Debug "Job done"
  
  Repeat
    EventID  = WaitWindowEvent()
    MenuID   = EventMenu()
    GadgetID = EventGadget()
    WindowID = EventWindow()
    Select EventID
      Case #PB_Event_CloseWindow
        Select WindowID
          Case #Window_Getopf
            quitGetopf = 1
        EndSelect
      Case #PB_Event_Gadget
        Select GadgetID
         ;Case #Gadget_Getopf_Title
         ;Case #Gadget_Getopf_Creator
         ;Case #Gadget_Getopf_Date
         ;Case #Gadget_Getopf_Publisher
         ;Case #Gadget_Getopf_ISBN
         ;Case #Gadget_Getopf_Google
         ;Case #Gadget_Getopf_Language
         ;Case #Gadget_Getopf_Series
         ;Case #Gadget_Getopf_Description
        EndSelect
    EndSelect
  Until quitGetopf
  CloseWindow(#Window_Getopf)
EndIf
End
Marc56us
Addict
Posts: 1108
Joined: Sat Feb 08, 2014 3:26 pm
Location: France

Re: Calibre .OPF file format help needed

Post by Marc56us »

Hi Fangbeast,
Glad to have helped you.
Fangbeast wrote: Now I just have to find a way to clear out the tags and render readable text. Probably regex again (hehehe).
Yes, here is a small RegEx-based procedure to filter out some elements. I just use "ReplaceRegularExpression" to replace the excess tags with nothing.
It's rudimentary, but you can modify it easily: just add other expressions, separated by a "|".
(There must be better HTML filtering code on the forum.)
I modified the "Select" at the "description" case so that the value goes through the filter.
Full code below.

Code:

;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------

Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()

Define EventID, MenuID, GadgetID, WindowID

Declare Filter_Description()

; Window Constants

Enumeration 1
  #Window_Getopf
EndEnumeration

#WindowIndex = #PB_Compiler_EnumerationValue

; Gadget Constants

Enumeration 1
  ; Window_Getopf
  #Gadget_Getopf_lTitle
  #Gadget_Getopf_Title
  #Gadget_Getopf_lCreator
  #Gadget_Getopf_Creator
  #Gadget_Getopf_lDate
  #Gadget_Getopf_Date
  #Gadget_Getopf_lPublisher
  #Gadget_Getopf_Publisher
  #Gadget_Getopf_lIsbn
  #Gadget_Getopf_ISBN
  #Gadget_Getopf_lGoogle
  #Gadget_Getopf_Google
  #Gadget_Getopf_lLanguage
  #Gadget_Getopf_Language
  #Gadget_Getopf_lSeries
  #Gadget_Getopf_Series
  #Gadget_Getopf_lDescription
  #Gadget_Getopf_Description
  #Gadget_Getopf_Logfile
EndEnumeration

#GadgetIndex = #PB_Compiler_EnumerationValue

Procedure.i Window_Getopf()
  If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
    TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
    EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
    SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
    ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
    SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile, "Consolas", 10)) ; "Comic Sans MS",10))
    HideWindow(#Window_Getopf,#False)
    ProcedureReturn WindowID(#Window_Getopf)
  EndIf
EndProcedure

;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------

Macro Addlog(Texttoadd)
  AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro

;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------

Global OPFFilename.s  = "D:\Metadata.opf"

Define quitGetopf     = #False

Global Ident$, Value$

;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------

If Window_Getopf()
  
  Addlog("Adding RegEx operators")
  
  NewList RegEx$()
  
  AddElement(Regex$()) : Regex$() = ~"<dc:(title)>(.+?)</dc:title>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(creator) opf:file-as=\"(.+?)</dc:creator>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(date)>(.+?)</dc:date>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(publisher)>(.+?)</dc:publisher>"
  AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(ISBN)\">(.+?)</dc:identifier>"
  AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"(GOOGLE)\">(.+?)</dc:identifier>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(language)>(.+?)</dc:language>"
  AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:(series)\"\\hcontent=\"(.+?)\"/>"
  AddElement(Regex$()) : Regex$() = ~"<dc:(description)>(.+?)</dc:description>"
  
  Addlog("Trying to open: " + OPFFilename.s)
  
  OPFhandle.i = OpenFile(#PB_Any, OPFFilename.s)
  
  If Not OPFhandle.i
    Addlog(OPFFilename.s  + " can't be found or opened")
    End
  EndIf
  
  Addlog("Reading: "  + OPFFilename.s)
  
  ; Define the string the OPF file is read into
  
  Define Txt$
  
  Addlog("Reading text string from: " + OPFFilename.s)
  
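  ; With #PB_File_IgnoreEOL a single ReadString() call returns the whole file.
  ; OPF files are normally UTF-8, hence #PB_UTF8 rather than #PB_Ascii.
  Txt$ = ReadString(OPFhandle.i, #PB_UTF8 | #PB_File_IgnoreEOL)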
  
  CloseFile(OPFhandle.i)
  
  Addlog(OPFFilename.s  + " closed, we are finished with it")
  
  Addlog(#Empty$)
  
  Addlog("Attempting to parse each value, text pair from: " + OPFFilename.s)
  
  Addlog(#Empty$)
  
  ForEach Regex$()
    If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
      Addlog("Bad RegEx (" + RegEx$() + ")")
      Break
    Else
      If ExamineRegularExpression(0, Txt$)
        While NextRegularExpressionMatch(0)
          Ident$ = RegularExpressionGroup(0, 1)
          Value$ = RegularExpressionGroup(0, 2)
          Select Ident$
            Case "title"        : SetGadgetText(#Gadget_Getopf_Title,           Value$)
            Case "creator"      : SetGadgetText(#Gadget_Getopf_Creator,         Value$)
            Case "date"         : SetGadgetText(#Gadget_Getopf_Date,            Value$)
            Case "publisher"    : SetGadgetText(#Gadget_Getopf_Publisher,       Value$)
            Case "ISBN"         : SetGadgetText(#Gadget_Getopf_ISBN,            Value$)
            Case "GOOGLE"       : SetGadgetText(#Gadget_Getopf_Google,          Value$)
            Case "language"     : SetGadgetText(#Gadget_Getopf_Language,        Value$)
            Case "series"       : SetGadgetText(#Gadget_Getopf_Series,          Value$)
            Case "description"  
              Filter_Description() 
              AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
          EndSelect
          Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
        Wend
      EndIf
      FreeRegularExpression(0)
    EndIf
  Next
  
  
  
  ;Debug "Job done"
  
  Repeat
    EventID  = WaitWindowEvent()
    MenuID   = EventMenu()
    GadgetID = EventGadget()
    WindowID = EventWindow()
    Select EventID
      Case #PB_Event_CloseWindow
        Select WindowID
          Case #Window_Getopf
            quitGetopf = 1
        EndSelect
      Case #PB_Event_Gadget
        Select GadgetID
            ;Case #Gadget_Getopf_Title
            ;Case #Gadget_Getopf_Creator
            ;Case #Gadget_Getopf_Date
            ;Case #Gadget_Getopf_Publisher
            ;Case #Gadget_Getopf_ISBN
            ;Case #Gadget_Getopf_Google
            ;Case #Gadget_Getopf_Language
            ;Case #Gadget_Getopf_Series
            ;Case #Gadget_Getopf_Description
        EndSelect
    EndSelect
  Until quitGetopf
  CloseWindow(#Window_Getopf)
EndIf

Procedure Filter_Description()
  ; Remove escaped HTML debris from the global Value$
  If CreateRegularExpression(1, ~"&.+?;|h\\d|/p|/div|div&|gt;|p class=\"description\"SUMMARY:br")
    ; ReplaceRegularExpression() already replaces every match, so no match loop is needed
    Value$ = ReplaceRegularExpression(1, Value$, "")
    FreeRegularExpression(1)
  Else
    Debug "RegEx 2 error " + #CRLF$ + RegularExpressionError() 
  EndIf
EndProcedure

End
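
The filtering idea in Filter_Description() can also be tried in isolation. What follows is a different technique from the one above, offered as a sketch only: it first decodes a few common entities and then strips whatever looks like a tag. The helper name StripHtml() and the sample text are made up for the illustration, and real descriptions may need more entities than the four handled here.

```purebasic
EnableExplicit

; Hypothetical helper: decode a few common entities, then strip the tags
Procedure.s StripHtml(Text$)
  Protected Result$ = Text$
  Protected rx
  ; Decode &amp; last so that "&amp;lt;" stays visible text and never becomes a tag
  Result$ = ReplaceString(Result$, "&lt;", "<")
  Result$ = ReplaceString(Result$, "&gt;", ">")
  Result$ = ReplaceString(Result$, "&quot;", #DQUOTE$)
  Result$ = ReplaceString(Result$, "&amp;", "&")
  ; Remove anything of the form <...>
  rx = CreateRegularExpression(#PB_Any, "<[^>]+>")
  If rx
    Result$ = ReplaceRegularExpression(rx, Result$, "")
    FreeRegularExpression(rx)
  EndIf
  ProcedureReturn Trim(Result$)
EndProcedure

; Made-up sample in the same escaped style as a Calibre description
Debug StripHtml("&lt;div&gt;&lt;p&gt;SUMMARY: A classic space opera.&lt;/p&gt;&lt;/div&gt;")
```

For this sample the debug window should show "SUMMARY: A classic space opera." Because the helper returns a new string instead of changing a global, it could be called directly inside the "description" case.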
:wink:
Demivec
Addict
Posts: 3848
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Calibre .OPF file format help needed

Post by Demivec »

As an alternative to the RegEx approach shown above, here is one that uses only the XML library:

Code:

;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------

Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()

Define EventID, MenuID, GadgetID, WindowID

; Window Constants

Enumeration 1
  #Window_Getopf
EndEnumeration

#WindowIndex = #PB_Compiler_EnumerationValue

; Gadget Constants

Enumeration 1
  ; Window_Getopf
  #Gadget_Getopf_lTitle
  #Gadget_Getopf_Title
  #Gadget_Getopf_lCreator
  #Gadget_Getopf_Creator
  #Gadget_Getopf_lDate
  #Gadget_Getopf_Date
  #Gadget_Getopf_lPublisher
  #Gadget_Getopf_Publisher
  #Gadget_Getopf_lIsbn
  #Gadget_Getopf_ISBN
  #Gadget_Getopf_lGoogle
  #Gadget_Getopf_Google
  #Gadget_Getopf_lLanguage
  #Gadget_Getopf_Language
  #Gadget_Getopf_lSeries
  #Gadget_Getopf_Series
  #Gadget_Getopf_lDescription
  #Gadget_Getopf_Description
  #Gadget_Getopf_Logfile
EndEnumeration

#GadgetIndex = #PB_Compiler_EnumerationValue

Procedure.i Window_Getopf()
  If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
    TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
    EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
    SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
    ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
    SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
    HideWindow(#Window_Getopf,#False)
    ProcedureReturn WindowID(#Window_Getopf)
  EndIf
EndProcedure

;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------

Macro Addlog(Texttoadd)
  AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro

;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
#XML_OPF = 1
#TestDataSource = 0 ;0 = file, 1,2,3 = predefined strings

;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
Structure OPF_XMLInfo
  Node.i
  Ident$
  Value$
EndStructure
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------

Global OPFFilename.s  =  "D:\Metadata.opf"; GetPathPart(ProgramFilename()) + "content.opf"

Define quitGetopf     = #False

Define Ident$, Value$

;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;Declare process_OPF(XID, List OPF_XML.OPF_XMLInfo())

;extract elements from OPF XML file and return results in a list
Procedure process_OPF(XID, List OPF_XML.OPF_XMLInfo())
  Protected RootNode_OPF = RootXMLNode(XID)
  Protected targetNode, attributeValue$, i ;i is the loop counter used below
  
  ClearList(OPF_XML())
  
  With OPF_XML()
    ;solitary node values
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:title"): \Ident$ = "title" ;**Required
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:creator"): \Ident$ = "creator"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:date"): \Ident$ = "date"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:publisher"): \Ident$ = "publisher"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:description"): \Ident$ = "description"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    
    ;ISBN & GOOGLE
    i = 1
    While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
      targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
      attributeValue$ = GetXMLAttribute(targetNode, "opf:scheme")
      Select attributeValue$
        Case  "ISBN"
          AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "ISBN": \Value$ = GetXMLNodeText(OPF_XML()\Node)
        Case "GOOGLE"
          AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "GOOGLE": \Value$ = GetXMLNodeText(OPF_XML()\Node)
        Default
      EndSelect
      i + 1
    Wend
    
    ;language(s) **Required
    targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[1]")
    If targetNode
      AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "language": \Value$ = GetXMLNodeText(targetNode)
      
      ;create a combined entry if there is more than one language node
      i = 2
      While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
        targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
        \Value$ + ", " + GetXMLNodeText(targetNode)
        i + 1
      Wend
    EndIf
    
    ;series
    i = 1
    While XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
      targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
      attributeValue$ = GetXMLAttribute(targetNode, "name")
      
      If attributeValue$ = "calibre:series"
        AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "series": \Value$ = GetXMLAttribute(targetNode, "content")
      EndIf
      i + 1
    Wend
    
  EndWith
EndProcedure

;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------

If Window_Getopf()
  
  NewList OPF_XML.OPF_XMLInfo()
  Define parseXMLSuccessful = #False  
  Select #TestDataSource
    Case 0
      Addlog("Trying to open: " + OPFFilename.s)
      OPFhandle.i = ReadFile(#PB_Any, OPFFilename.s)
      
;       Addlog("Trying to load as XML: " + OPFFilename.s)      
;       OPFhandle.i =  LoadXML(#XML_OPF, OPFFilename.s, #PB_Ascii)
      
      If Not OPFhandle.i
        Addlog(OPFFilename.s  + " can't be found or opened")
        MessageRequester("Error", OPFFilename.s  + " can't be found or opened")
        End
      EndIf
      
      Addlog("Reading text string from: " + OPFFilename.s)
      
      Define Txt$
      ;the OPF declares encoding='utf-8', so read it as UTF-8 to keep accented characters intact
      Txt$ = ReadString(OPFhandle.i, #PB_UTF8 | #PB_File_IgnoreEOL)
      
      CloseFile(OPFhandle.i)
      
      Addlog(OPFFilename.s  + " closed, we are finished with it")
      
    Case 1  
      ;original test file
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version='1.0' encoding='utf-8'?>"
      Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|uuid_id| version=|2.0|>"
      Txt$ + "<metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "<dc:identifier opf:scheme=|calibre| id=|calibre_id|>231</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|uuid| id=|uuid_id|>9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>"
      Txt$ + "<dc:title>Skylark of Space</dc:title>"
      Txt$ + "<dc:creator opf:file-as=|Smith, E. E. 'Doc'| opf:role=|aut|>E. E. 'Doc' Smith</dc:creator>"
      Txt$ + "<dc:contributor opf:file-as=|calibre| opf:role=|bkp|>calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>"
      Txt$ + "<dc:date>2011-09-29T22:52:37+00:00</dc:date>"
      Txt$ + "<dc:description>&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
      Txt$ + "&lt;/div&gt;"
      Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
      Txt$ + "&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
      Txt$ + "&lt;/div&gt;"
      Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
      Txt$ + "Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&amp;#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&amp;#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge."
      Txt$ + "|With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work.| -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great |space opera| science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>"
      Txt$ + "<dc:publisher>Berkley</dc:publisher>"
      Txt$ + "<dc:identifier opf:scheme=|GUID|>{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|ISBN|>9780425046401</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|GOOGLE|>boBY9bNVQwAC</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|URI|>http|//www.gutenberg.org/ebooks/20869</dc:identifier>"
      Txt$ + "<dc:language>eng</dc:language>"
      Txt$ + "<dc:subject>Science Fiction</dc:subject>"
      Txt$ + "<dc:subject>Science Fiction/Fantasy</dc:subject>"
      Txt$ + "<dc:subject>Space ships -- Fiction</dc:subject>"
      Txt$ + "<dc:subject>Space flight -- Fiction</dc:subject>"
      Txt$ + "<dc:subject>Action &amp; Adventure</dc:subject>"
      Txt$ + "<dc:subject>Fiction</dc:subject>"
      Txt$ + "<dc:subject>General</dc:subject>"
      Txt$ + "<dc:subject>Space Opera</dc:subject>"
      Txt$ + "<meta name=|calibre:author_link_map| content=|{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}|/>"
      Txt$ + "<meta name=|calibre:series| content=|Skylark|/>"
      Txt$ + "<meta name=|calibre:series_index| content=|1|/>"
      Txt$ + "<meta name=|calibre:rating| content=|10|/>"
      Txt$ + "<meta name=|calibre:timestamp| content=|2021-06-05T06:47:53.206469+00:00|/>"
      Txt$ + "<meta name=|calibre:title_sort| content=|Skylark of Space|/>"
      Txt$ + "</metadata>"
      Txt$ + "<guide>"
      Txt$ + "<reference type=|cover| title=|Cover| href=|cover.jpg|/>"
      Txt$ + "</guide>"
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
    Case 2
      ;alternate test file #2
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version=|1.0|?>"
      Txt$ + "<package version=|2.0| xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|BookId|>"
      Txt$ + ""
      Txt$ + "  <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <dc:title>Pride and Prejudice</dc:title>"
      Txt$ + "    <dc:language>en</dc:language>"
      Txt$ + "    <dc:identifier id=|BookId| opf:scheme=|ISBN|>123456789X</dc:identifier>"
      Txt$ + "    <dc:creator opf:file-as=|Austen, Jane| opf:role=|aut|>Jane Austen</dc:creator>"
      Txt$ + "  </metadata>"
      Txt$ + ""
      Txt$ + "  <manifest>"
      Txt$ + "    <item id=|chapter1| href=|chapter1.xhtml| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|appendix| href=|appendix.xhtml| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|stylesheet| href=|style.css| media-type=|text/css|/>"
      Txt$ + "    <item id=|ch1-pic| href=|ch1-pic.png| media-type=|image/png|/>"
      Txt$ + "    <item id=|myfont| href=|css/myfont.otf| media-type=|application/x-font-opentype|/>"
      Txt$ + "    <item id=|ncx| href=|toc.ncx| media-type=|application/x-dtbncx+xml|/>"
      Txt$ + "  </manifest>"
      Txt$ + ""
      Txt$ + "  <spine toc=|ncx|>"
      Txt$ + "    <itemref idref=|chapter1| />"
      Txt$ + "    <itemref idref=|appendix| />"
      Txt$ + "  </spine>"
      Txt$ + ""
      Txt$ + "  <guide>"
      Txt$ + "    <reference type=|loi| title=|List Of Illustrations| href=|appendix.xhtml#figures| />"
      Txt$ + "  </guide>"
      Txt$ + ""
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
    Case 3
      ;alternate test file #3
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version=|1.0| encoding=|UTF-8|?>"
      Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| xmlns:dc=|http://purl.org/dc/elements/1.1/|"
      Txt$ + "     xmln:xsi=|http://www.w3.org/2001/XMLSchema-instance|"
      Txt$ + "     version=|2.0|"
      Txt$ + "     unique-identifier=|bookid|>"
      Txt$ + "  <metadata xmins:dc=|http://purl.org/dc/elements/1.1/| xmlns.opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <dc:creator>Anonymous</dc:creator>"
      Txt$ + "    <dc:title>The Three Bears</dc:title>"
      Txt$ + "    <dc:language xsi:type=|dcterms:RFC3066|>en-GB</dc:language>"
      Txt$ + "    <dc:rights>Public Domain</dc:rights>"
      Txt$ + "    <dc:publisher>Project Gutenberg (epub version: Bob DuCharme)</dc:publisher>"
      Txt$ + "    <dc:identifier id=|bookid|>http://www.snee.com/epub/pg23322</dc:identifier>"
      Txt$ + "  </metadata>"
      Txt$ + "  <maifest xmlns.opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <item id=|ncx| href=|toc.ncx| media-type=|text/xml|/>"
      Txt$ + "    <item id=|main| href=|23322+h.htm| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|cover| href=|images/cover.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-4| href=|images/1-4.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-6| href=|images/1-6.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-9| href=|images/1-9.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-10| href=|images/1-10.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-13| href=|images/1-13.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-15| href=|images/1-15.jpg| media-type=|image/jpeg|/>"
      Txt$ + "  </maifest>"
      Txt$ + "  <spine toc=|ncx|>"
      Txt$ + "    <itemref idref=|main|/>"
      Txt$ + "  </spine>"
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
  EndSelect   
   
  Addlog(#Empty$)
  
  Addlog("Parsing Text as XML ID:" + #XML_OPF)
  ParseXML(#XML_OPF, Txt$)
  If XMLStatus(#XML_OPF) = #PB_XML_Success
    parseXMLSuccessful = #True
    Addlog("Attempting to retrieve each value from XML: " + OPFFilename.s)
    process_OPF(#XML_OPF, OPF_XML())
  Else
    Addlog("error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + ".")
    Debug "error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + "."
    Debug Left(Txt$, XMLErrorPosition(#XML_OPF))
    parseXMLSuccessful = #False ;parsing failed, so no values were extracted
  EndIf
  
  Addlog(#Empty$)
    
  ForEach OPF_XML()
    Ident$ = OPF_XML()\Ident$
    Value$ = OPF_XML()\Value$
    Select Ident$
      Case "title"        : SetGadgetText(#Gadget_Getopf_Title,           Value$)
      Case "creator"      : SetGadgetText(#Gadget_Getopf_Creator,         Value$)
      Case "date"         : SetGadgetText(#Gadget_Getopf_Date,            Value$)
      Case "publisher"    : SetGadgetText(#Gadget_Getopf_Publisher,       Value$)
      Case "ISBN"         : SetGadgetText(#Gadget_Getopf_ISBN,            Value$)
      Case "GOOGLE"       : SetGadgetText(#Gadget_Getopf_Google,          Value$)
      Case "language"     : SetGadgetText(#Gadget_Getopf_Language,        Value$)
      Case "series"       : SetGadgetText(#Gadget_Getopf_Series,          Value$)
      Case "description"  : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
    EndSelect
    Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
  Next
  
  ;Debug "Job done"
  
  Repeat
    EventID  = WaitWindowEvent()
    MenuID   = EventMenu()
    GadgetID = EventGadget()
    WindowID = EventWindow()
    Select EventID
      Case #PB_Event_CloseWindow
        Select WindowID
          Case #Window_Getopf
            quitGetopf = 1
        EndSelect
      Case #PB_Event_Gadget
        Select GadgetID
            ;Case #Gadget_Getopf_Title
            ;Case #Gadget_Getopf_Creator
            ;Case #Gadget_Getopf_Date
            ;Case #Gadget_Getopf_Publisher
            ;Case #Gadget_Getopf_ISBN
            ;Case #Gadget_Getopf_Google
            ;Case #Gadget_Getopf_Language
            ;Case #Gadget_Getopf_Series
            ;Case #Gadget_Getopf_Description
        EndSelect
    EndSelect
  Until quitGetopf
  CloseWindow(#Window_Getopf)
EndIf
End
It includes a few options for the test source data, selected by setting #TestDataSource (around line 101 of the code): 0 reads the file named in OPFFilename, while 1, 2 and 3 use the predefined test strings.

As an experiment I thought I would also try to come up with a procedure to examine fields, like the description field, that contain repeated text and see whether they can be reduced automatically to something without the repetition. That will take a bit of experimenting, and if it works it may be useful in other areas. If it fails it will most likely crash and burn and never be spoken of again in respectable circles. :)
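A rough sketch of that idea, not tested against real Calibre data. It assumes the repetition happens at whole-sentence level and that sentences are separated by ". ". The procedure name RemoveRepeatedSentences is made up for illustration and is not part of the program above.

```purebasic
; Hypothetical helper: keeps the first occurrence of each sentence and
; drops later repeats (compared case-insensitively).
Procedure.s RemoveRepeatedSentences(Text$)
  Protected NewMap Seen.i()
  Protected Result$, Sentence$, Key$, i, Count

  ; number of ". "-separated fields in the text
  Count = CountString(Text$, ". ") + 1

  For i = 1 To Count
    ; take one field, strip surrounding spaces and any trailing full stop
    Sentence$ = RTrim(Trim(StringField(Text$, i, ". ")), ".")
    If Sentence$ <> ""
      Key$ = LCase(Sentence$)
      If FindMapElement(Seen(), Key$) = 0
        Seen(Key$) = #True          ; remember this sentence
        Result$ + Sentence$ + ". "  ; and keep it in the output
      EndIf
    EndIf
  Next

  ProcedureReturn RTrim(Result$)
EndProcedure

Debug RemoveRepeatedSentences("One. Two. One. Three.")
```

Splitting on ". " is crude: names such as "E. E. 'Doc' Smith" or "O. G. Estes Jr." get cut into pieces, and the HTML entities in the description (&lt;div&gt; and so on) are not handled, so those would need stripping first.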
Demivec
Addict
Posts: 3848
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Calibre .OPF file format help needed

Post by Demivec »

As an alternative to the RegEx approach shown earlier, here is one that uses only the XML library:

Code: Select all

;--------------------------------------------------------------------------------------------------
; Visual designer created forms and constants
;--------------------------------------------------------------------------------------------------

Global DPIfixX.d = DesktopResolutionX(), DPIfixY.d = DesktopResolutionY()

Define EventID, MenuID, GadgetID, WindowID

; Window Constants

Enumeration 1
  #Window_Getopf
EndEnumeration

#WindowIndex = #PB_Compiler_EnumerationValue

; Gadget Constants

Enumeration 1
  ; Window_Getopf
  #Gadget_Getopf_lTitle
  #Gadget_Getopf_Title
  #Gadget_Getopf_lCreator
  #Gadget_Getopf_Creator
  #Gadget_Getopf_lDate
  #Gadget_Getopf_Date
  #Gadget_Getopf_lPublisher
  #Gadget_Getopf_Publisher
  #Gadget_Getopf_lIsbn
  #Gadget_Getopf_ISBN
  #Gadget_Getopf_lGoogle
  #Gadget_Getopf_Google
  #Gadget_Getopf_lLanguage
  #Gadget_Getopf_Language
  #Gadget_Getopf_lSeries
  #Gadget_Getopf_Series
  #Gadget_Getopf_lDescription
  #Gadget_Getopf_Description
  #Gadget_Getopf_Logfile
EndEnumeration

#GadgetIndex = #PB_Compiler_EnumerationValue

Procedure.i Window_Getopf()
  If OpenWindow(#Window_Getopf,0,0,1250,550,"Get and display Calibre OPF contents",#PB_Window_SystemMenu|#PB_Window_ScreenCentered|#PB_Window_Invisible)
    TextGadget(#Gadget_Getopf_lTitle,10,10,130,25,"Title",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lTitle,LoadFont(#Gadget_Getopf_lTitle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Title,145,10,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Title,LoadFont(#Gadget_Getopf_Title,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lCreator,10,45,130,25,"Creator",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lCreator,LoadFont(#Gadget_Getopf_lCreator,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Creator,145,45,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Creator,LoadFont(#Gadget_Getopf_Creator,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDate,10,80,130,25,"Date",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDate,LoadFont(#Gadget_Getopf_lDate,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Date,145,80,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Date,LoadFont(#Gadget_Getopf_Date,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lPublisher,10,115,130,25,"Publisher",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lPublisher,LoadFont(#Gadget_Getopf_lPublisher,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Publisher,145,115,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Publisher,LoadFont(#Gadget_Getopf_Publisher,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lIsbn,10,150,130,25,"ISBN",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lIsbn,LoadFont(#Gadget_Getopf_lIsbn,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_ISBN,145,150,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_ISBN,LoadFont(#Gadget_Getopf_ISBN,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lGoogle,10,185,130,25,"Google",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lGoogle,LoadFont(#Gadget_Getopf_lGoogle,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Google,145,185,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Google,LoadFont(#Gadget_Getopf_Google,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lLanguage,10,220,130,25,"Language",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lLanguage,LoadFont(#Gadget_Getopf_lLanguage,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Language,145,220,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Language,LoadFont(#Gadget_Getopf_Language,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lSeries,10,255,130,25,"Series",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lSeries,LoadFont(#Gadget_Getopf_lSeries,"Comic Sans MS",10))
    StringGadget(#Gadget_Getopf_Series,145,255,540,25,"",#PB_String_ReadOnly|#PB_String_BorderLess)
    SetGadgetFont(#Gadget_Getopf_Series,LoadFont(#Gadget_Getopf_Series,"Comic Sans MS",10))
    TextGadget(#Gadget_Getopf_lDescription,10,290,130,25,"Description",#PB_Text_Center)
    SetGadgetFont(#Gadget_Getopf_lDescription,LoadFont(#Gadget_Getopf_lDescription,"Comic Sans MS",10))
    EditorGadget(#Gadget_Getopf_Description,145,290,540,250,#PB_Editor_ReadOnly|#PB_Editor_WordWrap)
    SetGadgetFont(#Gadget_Getopf_Description,LoadFont(#Gadget_Getopf_Description,"Comic Sans MS",10))
    ListIconGadget(#Gadget_Getopf_Logfile,695,10,545,530,"Log entry",540,#PB_ListIcon_FullRowSelect|#PB_ListIcon_AlwaysShowSelection|#LVS_NOCOLUMNHEADER)
    SetGadgetFont(#Gadget_Getopf_Logfile,LoadFont(#Gadget_Getopf_Logfile,"Comic Sans MS",10))
    HideWindow(#Window_Getopf,#False)
    ProcedureReturn WindowID(#Window_Getopf)
  EndIf
EndProcedure

;--------------------------------------------------------------------------------------------------
; Macros
;--------------------------------------------------------------------------------------------------

Macro Addlog(Texttoadd)
  AddGadgetItem(#Gadget_Getopf_Logfile, -1, Texttoadd)
EndMacro

;--------------------------------------------------------------------------------------------------
; Constants
;--------------------------------------------------------------------------------------------------
#XML_OPF = 1
#TestDataSource = 0 ;0 = file, 1,2,3 = predefined strings

;--------------------------------------------------------------------------------------------------
; Structures
;--------------------------------------------------------------------------------------------------
Structure OPF_XMLInfo
  Node.i
  Ident$
  Value$
EndStructure
;--------------------------------------------------------------------------------------------------
; Prototypes
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Globals
;--------------------------------------------------------------------------------------------------

Global OPFFilename.s  =  "D:\Metadata.opf"; GetPathPart(ProgramFilename()) + "content.opf"

Define quitGetopf     = #False

Define Ident$, Value$

;--------------------------------------------------------------------------------------------------
; Declarations
;--------------------------------------------------------------------------------------------------
;Declare process_OPF(XID, List OPF_XML.OPF_XMLInfo())

;extract elements from OPF XML file and return results in a list
Procedure process_OPF(XID, List OPF_XML.OPF_XMLInfo())
  Protected RootNode_OPF = RootXMLNode(XID)
  Protected targetNode, attributeValue$, i ;i is the loop counter used below
  
  ClearList(OPF_XML())
  
  With OPF_XML()
    ;solitary node values
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:title"): \Ident$ = "title" ;**Required
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:creator"): \Ident$ = "creator"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:date"): \Ident$ = "date"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:publisher"): \Ident$ = "publisher"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    AddElement(OPF_XML()): \Node = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:description"): \Ident$ = "description"
    If \Node: \Value$ = GetXMLNodeText(\Node): Else: \Value$ = "": EndIf
    
    ;ISBN & GOOGLE
    i = 1
    While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
      targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:identifier[" + i + "]")
      attributeValue$ = GetXMLAttribute(targetNode, "opf:scheme")
      Select attributeValue$
        Case  "ISBN"
          AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "ISBN": \Value$ = GetXMLNodeText(OPF_XML()\Node)
        Case "GOOGLE"
          AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "GOOGLE": \Value$ = GetXMLNodeText(OPF_XML()\Node)
        Default
      EndSelect
      i + 1
    Wend
    
    ;language(s) **Required
    targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[1]")
    If targetNode
      AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "language": \Value$ = GetXMLNodeText(targetNode)
      
      ;create a combined entry if there is more than one language node
      i = 2
      While XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
        targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/dc:language[" + i + "]")
        \Value$ + ", " + GetXMLNodeText(targetNode)
        i + 1
      Wend
    EndIf
    
    ;series
    i = 1
    While XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
      targetNode = XMLNodeFromPath(RootNode_OPF, "/package/metadata/meta[" + i + "]")
      attributeValue$ = GetXMLAttribute(targetNode, "name")
      
      If attributeValue$ = "calibre:series"
        AddElement(OPF_XML()): \Node = targetNode: \Ident$ = "series": \Value$ = GetXMLAttribute(targetNode, "content")
      EndIf
      i + 1
    Wend
    
  EndWith
EndProcedure

;--------------------------------------------------------------------------------------------------
; Datafill
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Bindings
;--------------------------------------------------------------------------------------------------

;--------------------------------------------------------------------------------------------------
; Main Loop
;--------------------------------------------------------------------------------------------------

If Window_Getopf()
  
  NewList OPF_XML.OPF_XMLInfo()
  Define parseXMLSuccessful = #False  
  Select #TestDataSource
    Case 0
      Addlog("Trying to open: " + OPFFilename.s)
      OPFhandle.i = ReadFile(#PB_Any, OPFFilename.s)
      
;       Addlog("Trying to load as XML: " + OPFFilename.s)      
;       OPFhandle.i =  LoadXML(#XML_OPF, OPFFilename.s, #PB_Ascii)
      
      If Not OPFhandle.i
        Addlog(OPFFilename.s  + " can't be found or opened")
        MessageRequester("Error", OPFFilename.s  + " can't be found or opened")
        End
      EndIf
      
      Addlog("Reading text string from: " + OPFFilename.s)
      
      Define Txt$
      Txt$ = ReadString(OPFhandle.i, #PB_Ascii | #PB_File_IgnoreEOL)
      
      CloseFile(OPFhandle.i)
      
      Addlog(OPFFilename.s  + " closed, we are finished with it")
      
    Case 1  
      ;original test file
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version='1.0' encoding='utf-8'?>"
      Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|uuid_id| version=|2.0|>"
      Txt$ + "<metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "<dc:identifier opf:scheme=|calibre| id=|calibre_id|>231</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|uuid| id=|uuid_id|>9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>"
      Txt$ + "<dc:title>Skylark of Space</dc:title>"
      Txt$ + "<dc:creator opf:file-as=|Smith, E. E. 'Doc'| opf:role=|aut|>E. E. 'Doc' Smith</dc:creator>"
      Txt$ + "<dc:contributor opf:file-as=|calibre| opf:role=|bkp|>calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>"
      Txt$ + "<dc:date>2011-09-29T22:52:37+00:00</dc:date>"
      Txt$ + "<dc:description>&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
      Txt$ + "&lt;/div&gt;"
      Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
      Txt$ + "&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;"
      Txt$ + "&lt;/div&gt;"
      Txt$ + "&lt;p class=|description|&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;"
      Txt$ + "Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&amp;#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&amp;#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge."
      Txt$ + "|With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work.| -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great |space opera| science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>"
      Txt$ + "<dc:publisher>Berkley</dc:publisher>"
      Txt$ + "<dc:identifier opf:scheme=|GUID|>{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|ISBN|>9780425046401</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|GOOGLE|>boBY9bNVQwAC</dc:identifier>"
      Txt$ + "<dc:identifier opf:scheme=|URI|>http|//www.gutenberg.org/ebooks/20869</dc:identifier>"
      Txt$ + "<dc:language>eng</dc:language>"
      Txt$ + "<dc:subject>Science Fiction</dc:subject>"
      Txt$ + "<dc:subject>Science Fiction/Fantasy</dc:subject>"
      Txt$ + "<dc:subject>Space ships -- Fiction</dc:subject>"
      Txt$ + "<dc:subject>Space flight -- Fiction</dc:subject>"
      Txt$ + "<dc:subject>Action &amp; Adventure</dc:subject>"
      Txt$ + "<dc:subject>Fiction</dc:subject>"
      Txt$ + "<dc:subject>General</dc:subject>"
      Txt$ + "<dc:subject>Space Opera</dc:subject>"
      Txt$ + "<meta name=|calibre:author_link_map| content=|{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}|/>"
      Txt$ + "<meta name=|calibre:series| content=|Skylark|/>"
      Txt$ + "<meta name=|calibre:series_index| content=|1|/>"
      Txt$ + "<meta name=|calibre:rating| content=|10|/>"
      Txt$ + "<meta name=|calibre:timestamp| content=|2021-06-05T06:47:53.206469+00:00|/>"
      Txt$ + "<meta name=|calibre:title_sort| content=|Skylark of Space|/>"
      Txt$ + "</metadata>"
      Txt$ + "<guide>"
      Txt$ + "<reference type=|cover| title=|Cover| href=|cover.jpg|/>"
      Txt$ + "</guide>"
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
    Case 2
      ;alternate test file #2
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version=|1.0|?>"
      Txt$ + "<package version=|2.0| xmlns=|http://www.idpf.org/2007/opf| unique-identifier=|BookId|>"
      Txt$ + ""
      Txt$ + "  <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <dc:title>Pride and Prejudice</dc:title>"
      Txt$ + "    <dc:language>en</dc:language>"
      Txt$ + "    <dc:identifier id=|BookId| opf:scheme=|ISBN|>123456789X</dc:identifier>"
      Txt$ + "    <dc:creator opf:file-as=|Austen, Jane| opf:role=|aut|>Jane Austen</dc:creator>"
      Txt$ + "  </metadata>"
      Txt$ + ""
      Txt$ + "  <manifest>"
      Txt$ + "    <item id=|chapter1| href=|chapter1.xhtml| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|appendix| href=|appendix.xhtml| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|stylesheet| href=|style.css| media-type=|text/css|/>"
      Txt$ + "    <item id=|ch1-pic| href=|ch1-pic.png| media-type=|image/png|/>"
      Txt$ + "    <item id=|myfont| href=|css/myfont.otf| media-type=|application/x-font-opentype|/>"
      Txt$ + "    <item id=|ncx| href=|toc.ncx| media-type=|application/x-dtbncx+xml|/>"
      Txt$ + "  </manifest>"
      Txt$ + ""
      Txt$ + "  <spine toc=|ncx|>"
      Txt$ + "    <itemref idref=|chapter1| />"
      Txt$ + "    <itemref idref=|appendix| />"
      Txt$ + "  </spine>"
      Txt$ + ""
      Txt$ + "  <guide>"
      Txt$ + "    <reference type=|loi| title=|List Of Illustrations| href=|appendix.xhtml#figures| />"
      Txt$ + "  </guide>"
      Txt$ + ""
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
    Case 3
      ;alternate test file #3
      Addlog("Using text from test string: #" + #TestDataSource +":")
      Txt$ = "<?xml version=|1.0| encoding=|UTF-8|?>"
      Txt$ + "<package xmlns=|http://www.idpf.org/2007/opf| xmlns:dc=|http://purl.org/dc/elements/1.1/|"
      Txt$ + "     xmlns:xsi=|http://www.w3.org/2001/XMLSchema-instance|"
      Txt$ + "     version=|2.0|"
      Txt$ + "     unique-identifier=|bookid|>"
      Txt$ + "  <metadata xmlns:dc=|http://purl.org/dc/elements/1.1/| xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <dc:creator>Anonymous</dc:creator>"
      Txt$ + "    <dc:title>The Three Bears</dc:title>"
      Txt$ + "    <dc:language xsi:type=|dcterms:RFC3066|>en-GB</dc:language>"
      Txt$ + "    <dc:rights>Public Domain</dc:rights>"
      Txt$ + "    <dc:publisher>Project Gutenberg (epub version: Bob DuCharme)</dc:publisher>"
      Txt$ + "    <dc:identifier id=|bookid|>http://www.snee.com/epub/pg23322</dc:identifier>"
      Txt$ + "  </metadata>"
      Txt$ + "  <manifest xmlns:opf=|http://www.idpf.org/2007/opf|>"
      Txt$ + "    <item id=|ncx| href=|toc.ncx| media-type=|text/xml|/>"
      Txt$ + "    <item id=|main| href=|23322+h.htm| media-type=|application/xhtml+xml|/>"
      Txt$ + "    <item id=|cover| href=|images/cover.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-4| href=|images/1-4.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-6| href=|images/1-6.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-9| href=|images/1-9.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-10| href=|images/1-10.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-13| href=|images/1-13.jpg| media-type=|image/jpeg|/>"
      Txt$ + "    <item id=|1-15| href=|images/1-15.jpg| media-type=|image/jpeg|/>"
      Txt$ + "  </manifest>"
      Txt$ + "  <spine toc=|ncx|>"
      Txt$ + "    <itemref idref=|main|/>"
      Txt$ + "  </spine>"
      Txt$ + "</package>"
      ReplaceString(Txt$, "|", Chr(34), #PB_String_InPlace)
  EndSelect   
   
  Addlog(#Empty$)
  
  Addlog("Parsing Text as XML ID:" + #XML_OPF)
  ParseXML(#XML_OPF, Txt$)
  If XMLStatus(#XML_OPF) = #PB_XML_Success
    parseXMLSuccessful = #True
    Addlog("Attempting to retrieve each value from XML: " + OPFFilename.s)
    process_OPF(#XML_OPF, OPF_XML())
  Else
    Addlog("error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + ".")
    Debug "error parsing " + XMLError(#XML_OPF) + " in line " + XMLErrorLine(#XML_OPF) + ", position " + XMLErrorPosition(#XML_OPF) + "."
    Debug Left(Txt$, XMLErrorPosition(#XML_OPF))
    parseXMLSuccessful = #False ; parsing failed, so do not report success
  EndIf
  
  Addlog(#Empty$)
    
  ForEach OPF_XML()
    Ident$ = OPF_XML()\Ident$
    Value$ = OPF_XML()\Value$
    Select Ident$
      Case "title"        : SetGadgetText(#Gadget_Getopf_Title,           Value$)
      Case "creator"      : SetGadgetText(#Gadget_Getopf_Creator,         Value$)
      Case "date"         : SetGadgetText(#Gadget_Getopf_Date,            Value$)
      Case "publisher"    : SetGadgetText(#Gadget_Getopf_Publisher,       Value$)
      Case "ISBN"         : SetGadgetText(#Gadget_Getopf_ISBN,            Value$)
      Case "GOOGLE"       : SetGadgetText(#Gadget_Getopf_Google,          Value$)
      Case "language"     : SetGadgetText(#Gadget_Getopf_Language,        Value$)
      Case "series"       : SetGadgetText(#Gadget_Getopf_Series,          Value$)
      Case "description"  : AddGadgetItem(#Gadget_Getopf_Description, -1, Value$)
    EndSelect
    Addlog(Ident$ + Space(12 - Len(Ident$)) + " : " + Value$)
  Next
  
  ;Debug "Job done"
  
  Repeat
    EventID  = WaitWindowEvent()
    MenuID   = EventMenu()
    GadgetID = EventGadget()
    WindowID = EventWindow()
    Select EventID
      Case #PB_Event_CloseWindow
        Select WindowID
          Case #Window_Getopf
            quitGetopf = 1
        EndSelect
      Case #PB_Event_Gadget
        Select GadgetID
            ;Case #Gadget_Getopf_Title
            ;Case #Gadget_Getopf_Creator
            ;Case #Gadget_Getopf_Date
            ;Case #Gadget_Getopf_Publisher
            ;Case #Gadget_Getopf_ISBN
            ;Case #Gadget_Getopf_Google
            ;Case #Gadget_Getopf_Language
            ;Case #Gadget_Getopf_Series
            ;Case #Gadget_Getopf_Description
        EndSelect
    EndSelect
  Until quitGetopf
  CloseWindow(#Window_Getopf)
EndIf
End
It includes a few options for the test source data, which can be selected near line 101 by setting #TestDataSource to 0, 1, 2, or 3.
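For illustration, the selector might look something like this. It is only a sketch, assuming #TestDataSource is declared as a plain constant; the meaning of each value is taken from the Select block in the main section of the code.

```purebasic
; Assumed form of the declaration; adjust to match the actual line in the listing.
#TestDataSource = 1  ; 0 = read the file named in OPFFilename.s from disk
                     ; 1 = built-in Calibre test string (Skylark of Space)
                     ; 2 = built-in test string #2 (Pride and Prejudice)
                     ; 3 = built-in test string #3 (The Three Bears)
```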

As an experiment, I thought I would also try to write a procedure that examines fields with repeated text, such as the description field, and reduces them automatically to something without the repetition. That will take a bit of experimenting, and if successful it may be useful in other areas. If it fails, it will most likely crash and burn and never be spoken of again in respectable circles. :)
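A rough sketch of one way such a reduction might start is shown here. It is not part of the program above: RemoveRepeatedSentences is a hypothetical helper name, and it assumes that a full stop followed by a space is good enough as a sentence boundary. It keeps the first occurrence of each sentence and drops later copies, so it will not cope with markup or abbreviations such as "O. G. Estes Jr." without more work.

```purebasic
EnableExplicit

; Hypothetical helper: keep only the first occurrence of each sentence.
; Assumption: ". " is an acceptable (naive) sentence delimiter.
Procedure.s RemoveRepeatedSentences(Text$)
  Protected NewMap Seen.i()
  Protected Result$, Sentence$, Key$
  Protected i, Count = CountString(Text$, ". ") + 1
  
  For i = 1 To Count
    Sentence$ = Trim(StringField(Text$, i, ". "))
    ; Compare case-insensitively and ignore a trailing full stop
    Key$ = LCase(RTrim(Sentence$, "."))
    If Sentence$ <> "" And FindMapElement(Seen(), Key$) = 0
      Seen(Key$) = #True
      If Result$ <> "" : Result$ + ". " : EndIf
      Result$ + Sentence$
    EndIf
  Next
  
  ProcedureReturn Result$
EndProcedure

Debug RemoveRepeatedSentences("First novel. A voyage. First novel. A voyage. The end.")
```

Using a map keyed on the normalised sentence keeps the check cheap even for long descriptions, at the cost of only catching exact repeats.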
Last edited by Demivec on Tue Aug 24, 2021 3:16 am, edited 1 time in total.
Fangbeast
PureBasic Protozoa
Posts: 4667
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Re: Calibre .OPF file format help needed

Post by Fangbeast »

Thanks, Demivec. As soon as I have cleared the bugs in my current project, I can play. :)
Amateur Radio, D-STAR/VK3HAF