Calibre .OPF file format help needed

Fangbeast
PureBasic Protozoa
Posts: 4667
Joined: Fri Apr 25, 2003 3:08 pm
Location: Not Sydney!!! (Bad water, no goats)

Post by Fangbeast »

Just had a look at the Calibre ebook reader .OPF file description format, and its non-standard XML fields leave me a little puzzled (as in: I know less than nothing) about how to deal with it to extract data.

Has anyone had a go at it?

Even the smallest opf file is a bit 'windy' so I didn't want to post one. Yet. Unless someone asks:):)
Amateur Radio, D-STAR/VK3HAF
jack
Addict
Posts: 1256
Joined: Fri Apr 25, 2003 11:10 pm

Post by jack »

Couldn't you use the PDF library? I think there is a PBpdf library somewhere on this forum.
Demivec
Addict
Posts: 3849
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Post by Demivec »

What data are you wishing to extract?

Post by Fangbeast »

jack wrote: Mon Aug 16, 2021 3:20 pm couldn't you use the PDF library ? I think that somewhere on this forum there is a PBpdf library
Didn't know you could do that? Never used the PBpdf lib before.

Post by Fangbeast »

Demivec wrote: Mon Aug 16, 2021 3:23 pm What data are you wishing to extract?
Managed to find a smaller opf in my own collection. The fields I want are: title, creator, date, description, publisher, ISBN, GOOGLE, language and series.

But it seems as if the description field is repeated twice or even 3 times (that I can see)

Then it has multiple 'subject' fields which are actually the category/genre


<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="calibre" id="calibre_id">231</dc:identifier>
<dc:identifier opf:scheme="uuid" id="uuid_id">9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>
<dc:title>Skylark of Space</dc:title>
<dc:creator opf:file-as="Smith, E. E. 'Doc'" opf:role="aut">E. E. 'Doc' Smith</dc:creator>
<dc:contributor opf:file-as="calibre" opf:role="bkp">calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>
<dc:date>2011-09-29T22:52:37+00:00</dc:date>
<dc:description>&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;

&lt;/div&gt;
&lt;p class="description"&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;

&lt;div&gt;&lt;div&gt;&lt;h3&gt;Product Description&lt;/h3&gt;&lt;p&gt;This is the first of the famous Skylark novels...a voyage to the ends of the universe. &lt;/p&gt;

&lt;/div&gt;
&lt;p class="description"&gt;SUMMARY:&lt;br&gt;Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.&lt;/p&gt;&lt;/div&gt;

Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&amp;#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&amp;#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.

"With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work." -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great "space opera" science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>
<dc:publisher>Berkley</dc:publisher>
<dc:identifier opf:scheme="GUID">{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>
<dc:identifier opf:scheme="ISBN">9780425046401</dc:identifier>
<dc:identifier opf:scheme="GOOGLE">boBY9bNVQwAC</dc:identifier>
<dc:identifier opf:scheme="URI">http|//www.gutenberg.org/ebooks/20869</dc:identifier>
<dc:language>eng</dc:language>
<dc:subject>Science Fiction</dc:subject>
<dc:subject>Science Fiction/Fantasy</dc:subject>
<dc:subject>Space ships -- Fiction</dc:subject>
<dc:subject>Space flight -- Fiction</dc:subject>
<dc:subject>Action &amp; Adventure</dc:subject>
<dc:subject>Fiction</dc:subject>
<dc:subject>General</dc:subject>
<dc:subject>Space Opera</dc:subject>
<meta name="calibre:author_link_map" content="{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}"/>
<meta name="calibre:series" content="Skylark"/>
<meta name="calibre:series_index" content="1"/>
<meta name="calibre:rating" content="10"/>
<meta name="calibre:timestamp" content="2021-06-05T06:47:53.206469+00:00"/>
<meta name="calibre:title_sort" content="Skylark of Space"/>
</metadata>
<guide>
<reference type="cover" title="Cover" href="cover.jpg"/>
</guide>
</package>
normeus
Enthusiast
Posts: 322
Joined: Fri Apr 20, 2012 8:09 pm

Post by normeus »

Since you are able to edit this data in Calibre, I think someone just copied the SUMMARY twice.
If you load your book in Calibre (the full version, not the reader), you'll see where you can change the metadata (the "Edit metadata" button). There you can download the description from Amazon, Google, etc. You can automate this in Calibre from the command line if you need to do a bunch. Just be careful not to be greedy and download multiple descriptions.

Norm.
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari

Post by Fangbeast »

normeus wrote: Tue Aug 17, 2021 2:26 am There you can download the description from Amazon, google, etc.. you can automate this in Calibre from the command line if you need to do a bunch.
I don't see where you are going with this. All the data I need is already in the opf file and I just need a way to extract it.

Don't want to try 40,000 downloads from the internet just to import the books into my database:):)

I got a bit lost because I was going to 'try' to put a structure together and use one of the extract xml functions but the fields have colons in them. Drat
infratec
Always Here
Posts: 5529
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Post by infratec »

Replace them before with an underscore :wink:

Post by Fangbeast »

infratec wrote: Tue Aug 17, 2021 6:41 am Replace them before with an underscore :wink:
Stop that horny goat winking, that's idle's job:):)

When, where, how???

Given the below code to load a 'normal' xml without colons, I don't know where, how to do this, at what point.

I have no idea what I am doing here.

Code: Select all

Structure opfitem
  title.s
  subject.s
EndStructure

Structure opfbook
  List opflist.opfitem()
EndStructure

; A variable of the structure type is needed to hold the extracted list
Define Book.opfbook
Define ImportFilename$, XML, *MainNode

ImportFilename$ = OpenFileRequester("Choose a file to import", "", "XML|*.xml", 0)

If ImportFilename$ <> ""
  XML = LoadXML(#PB_Any, ImportFilename$)
  If XML
    If XMLStatus(XML) = #PB_XML_Success
      *MainNode = MainXMLNode(XML)
      ; ExtractXMLList() takes the list itself, not the address of a variable
      ExtractXMLList(*MainNode, Book\opflist())
      ForEach Book\opflist()
        Debug Book\opflist()\title
        Debug Book\opflist()\subject
        Debug "------"
      Next
    EndIf
    FreeXML(XML)
  EndIf
EndIf
Marc56us
Addict
Posts: 1108
Joined: Sat Feb 08, 2014 3:26 pm
Location: France

Post by Marc56us »

Hi Fangbeast ,

I don't understand XML, so I do it in plain text.
Here with regular expressions (because I like that).

The expressions are simple to understand and therefore to modify

Code: Select all

- Beginning XML tag
- Part to keep:  (.+?)
- End XML tag
You just have to 'escape' the characters like a string in C

It can also be done with text functions, but then you have to loop if there are several identical fields (e.g. author(s)).

Here is a small functional example, using your sample saved in a file.

Code: Select all

EnableExplicit

Enumeration 
    #hFile    
EndEnumeration

NewList RegEx$()

AddElement(Regex$()) : Regex$() = ~"<dc:title>(.+?)</dc:title>"
AddElement(Regex$()) : Regex$() = ~"<dc:creator opf:file-as=\"(.+?)</dc:creator>"
AddElement(Regex$()) : Regex$() = ~"<dc:date>(.+?)</dc:date>"
AddElement(Regex$()) : Regex$() = ~"<dc:publisher>(.+?)</dc:publisher>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"ISBN\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:identifier\\hopf:scheme=\"GOOGLE\">(.+?)</dc:identifier>"
AddElement(Regex$()) : Regex$() = ~"<dc:language>(.+?)</dc:language>"
AddElement(Regex$()) : Regex$() = ~"<meta\\hname=\"calibre:series\"\\hcontent=\"(.+?)\"/>"
AddElement(Regex$()) : Regex$() = ~"<dc:description>(.+?)</dc:description>"

; Read OPF file
#File_Name = "TestFile.opf"

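; ReadFile() fails if the file is missing (OpenFile() would create it)
If Not ReadFile(#hFile, #File_Name)
     Debug #File_Name + " can't be found or opened"
     End
EndIf
Debug "Reading: " + #File_Name
; The sample declares encoding='utf-8', so read the whole file as UTF-8 in one call
Define Txt$ = ReadString(#hFile, #PB_UTF8 | #PB_File_IgnoreEOL)
CloseFile(#hFile)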

ForEach Regex$()
    If Not CreateRegularExpression(0, Regex$(), #PB_RegularExpression_DotAll)
        Debug "Bad RegEx (" + RegEx$() + ")"
        Break
    Else
        Debug "--- Search for: " + RegEx$()
        Debug ""
        If ExamineRegularExpression(0, Txt$)
            While NextRegularExpressionMatch(0)
                Debug "    " + RegularExpressionGroup(0, 1)
            Wend    
        EndIf
        FreeRegularExpression(0)
        Debug ""
    EndIf
Next

Debug "Done"

End

Feel free to adapt, for example by putting the regexes in a structure to give them a title or by entering the data in your database at each iteration.

The creator field can be broken down into several fields. It depends if this example is particular or general (obviously this file is not strictly in Epub format)

I put the description field at the end for general readability. Anyway regular expressions read the whole file (in memory) each time. This is not a speed problem.
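
The text-function alternative mentioned earlier (looping when a field appears several times) might look roughly like the sketch below. It is only an illustration: the procedure name ExtractAll and the two-line sample string are made up for this example, and it only handles tags without attributes.

```purebasic
EnableExplicit

; Hypothetical helper: collects every value found between OpenTag$ and
; CloseTag$ in Text$ and appends it to the Result() list.
Procedure ExtractAll(Text$, OpenTag$, CloseTag$, List Result.s())
  Protected startPos, endPos

  startPos = FindString(Text$, OpenTag$, 1)
  While startPos
    startPos + Len(OpenTag$)                       ; first character after the opening tag
    endPos = FindString(Text$, CloseTag$, startPos)
    If endPos = 0
      Break                                        ; unterminated tag: stop searching
    EndIf
    AddElement(Result())
    Result() = Mid(Text$, startPos, endPos - startPos)
    startPos = FindString(Text$, OpenTag$, endPos + Len(CloseTag$))
  Wend
EndProcedure

; Made-up sample with a repeated field, in the style of the posted OPF
Define Sample$ = "<dc:subject>Science Fiction</dc:subject>" + #LF$ +
                 "<dc:subject>Space Opera</dc:subject>"

NewList Subjects.s()
ExtractAll(Sample$, "<dc:subject>", "</dc:subject>", Subjects())

ForEach Subjects()
  Debug Subjects()
Next
```

For tags that carry attributes (dc:identifier, dc:creator) the regular expressions above are probably less work than extending this.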

Enjoy
:wink:

Edit:
I installed Calibre to test and realized that it stores all the metadata in a metadata.db file which is a simple SQLite file.
So if you just want to use your e-book data already referenced in Calibre, it may be easier to use PB's SQLite functions on this .db file.
If you don't already have a tool to read/edit SQLite databases, I suggest SQLiteStudio (freeware, small, fast, portable, easy).
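
A minimal sketch of reading metadata.db with PB's SQLite functions, under some assumptions: the path is hypothetical, and the table and column names (books, title, author_sort, pubdate) are what I believe Calibre uses, so check them in your SQLite browser first. It only reads, but working on a copy of the file is safer.

```purebasic
EnableExplicit
UseSQLiteDatabase()

; Hypothetical path: use the folder of your own Calibre library
#CalibreDb = "C:\Calibre Library\metadata.db"

Define db = OpenDatabase(#PB_Any, #CalibreDb, "", "")
If db = 0
  Debug "Can't open database: " + DatabaseError()
  End
EndIf

; Assumed schema: a 'books' table with id, title, author_sort and pubdate columns
If DatabaseQuery(db, "SELECT id, title, author_sort, pubdate FROM books ORDER BY id")
  While NextDatabaseRow(db)
    Debug Str(GetDatabaseLong(db, 0)) + " | " + GetDatabaseString(db, 1) +
          " | " + GetDatabaseString(db, 2) + " | " + GetDatabaseString(db, 3)
  Wend
  FinishDatabaseQuery(db)
Else
  Debug "Query failed: " + DatabaseError()
EndIf

CloseDatabase(db)
```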
(English is not my native language, I use an online translator.)

Post by normeus »

Fagbeast,
I was answering the question of "why multiple descriptions?", and you are right I missed the point.

@Marc56us is right, don't bother with XML, get your data directly from the Calibre database using SQLite.
If you forgot where you told Calibre to save your books, click on the down arrow next to Preferences and choose to run Welcome Wizard, write down the directory it shows, and now you are ready to use "metadata.db"

Smoke from California fires is clouding my vision so don't blame me for misunderstanding, or once again missing the point.

Norm.

Post by Fangbeast »

normeus wrote: Fagbeast,
Fagbeast????? That made my day. Nearly rolled off my chair laughing.

normeus wrote: I was answering the question of "why multiple descriptions?", and you are right I missed the point.
I miss everything else:):)

normeus wrote: @Marc56us is right, don't bother with XML, get your data directly from the Calibre database using SQLite.
Only if there is a db to query. Some of my collections don't have one, although each book has its own separate opf. That is why I need to query a lot of those first, and it takes forever on a normal HDD as I don't have any fast SSDs around.

normeus wrote: Smoke from California fires is clouding my vision so don't blame me for misunderstanding, or once again missing the point.
Heard about that. One of our upper states has so many fires like yours while the rest of us freeze to death down below, it's weird.

Norm, you have always been thankful, don't stress. I'm just getting older and understanding less. Stress for me is daily life now.

Post by Fangbeast »

Marc56us wrote: Here with regular expressions (because I like that).
Eek!! That does my head in.

Marc56us wrote: The expressions are simple to understand and therefore to modify
Suuuure, that's what an intelligent person would say:):)

Yay!! An example to read and see if I can follow. I might have around 70,000 books to go through shortly and many don't have a database, just the individual opf's in some of the collections.
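
For collections that only have the individual opf's, gathering the file names first could be sketched like this. The procedure name FindOpfFiles and the root folder are hypothetical, and it simply assumes every file ending in .opf is wanted.

```purebasic
EnableExplicit

; Hypothetical helper: recursively collects the full path of every
; *.opf file found below Folder$.
Procedure FindOpfFiles(Folder$, List Found.s())
  Protected dir, name$

  If Right(Folder$, 1) <> #PS$
    Folder$ + #PS$                                 ; make sure the path ends with a separator
  EndIf

  dir = ExamineDirectory(#PB_Any, Folder$, "*.*")
  If dir
    While NextDirectoryEntry(dir)
      name$ = DirectoryEntryName(dir)
      If DirectoryEntryType(dir) = #PB_DirectoryEntry_Directory
        If name$ <> "." And name$ <> ".."
          FindOpfFiles(Folder$ + name$, Found())   ; descend into the sub-folder
        EndIf
      ElseIf LCase(GetExtensionPart(name$)) = "opf"
        AddElement(Found())
        Found() = Folder$ + name$
      EndIf
    Wend
    FinishDirectory(dir)
  EndIf
EndProcedure

NewList OpfFiles.s()
FindOpfFiles("D:\Books", OpfFiles())               ; hypothetical root folder
Debug Str(ListSize(OpfFiles())) + " opf files found"
```

Each path in the list can then be handed to whichever extraction method is used (regex, text functions or the XML library).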

My big family project currently gets some of its info from Google or ISFDB and this extra method of yours will cut down on the internet traffic.
:wink:
More horny goat winking. I feel like I am in a black light district (Very evil grin)

Post by normeus »

Sorry Fangbeast!

I am a programmer who never learned to type.

Just for that I should rewrite your XML program. I am sure you wouldn't want it with so many typos.

[EDIT]
The XML.pb example, linked from the main page of the XML section in the help file, is a nice sample of processing an XML file without using Lists or Maps. Don't forget to change the input encoding to #PB_Ascii if that's what the file is.
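
Along the same lines, here is a rough sketch of walking the metadata node by node, with no structures involved, so the colons are not a problem. It assumes the XML library returns node and attribute names with their prefix intact (e.g. "dc:title", "opf:scheme"), which is worth confirming with a quick Debug of GetXMLNodeName(). The file name is hypothetical.

```purebasic
EnableExplicit

; Hypothetical file name: point it at one of your own .opf files
#OpfFile = "metadata.opf"

Define xml, *package, *metadata, *node
Define name$

xml = LoadXML(#PB_Any, #OpfFile)
If xml = 0
  Debug "Could not load " + #OpfFile
  End
EndIf

If XMLStatus(xml) <> #PB_XML_Success
  Debug "XML error: " + XMLError(xml)
  FreeXML(xml)
  End
EndIf

; Find the <metadata> child of the <package> main node
*package = MainXMLNode(xml)
*node = ChildXMLNode(*package)
While *node
  If XMLNodeType(*node) = #PB_XML_Normal And GetXMLNodeName(*node) = "metadata"
    *metadata = *node
    Break
  EndIf
  *node = NextXMLNode(*node)
Wend

If *metadata
  *node = ChildXMLNode(*metadata)
  While *node
    If XMLNodeType(*node) = #PB_XML_Normal
      name$ = GetXMLNodeName(*node)                ; assumed to include the prefix
      Select name$
        Case "dc:title", "dc:creator", "dc:date", "dc:publisher", "dc:language", "dc:subject"
          Debug name$ + " = " + GetXMLNodeText(*node)
        Case "dc:identifier"
          Debug "identifier (" + GetXMLAttribute(*node, "opf:scheme") + ") = " + GetXMLNodeText(*node)
        Case "meta"
          Debug GetXMLAttribute(*node, "name") + " = " + GetXMLAttribute(*node, "content")
      EndSelect
    EndIf
    *node = NextXMLNode(*node)
  Wend
EndIf

FreeXML(xml)
```

Repeated fields such as dc:subject just show up once per node, so they can be appended to a list or joined into one string as needed.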

:lol:
Soory again!
Norm.

Post by Fangbeast »

normeus wrote: Sorry Fangbeast!
Eeek!! Don't be sorry!! You've helped me on and off during the years and I appreciate it.

I'm just getting older and stupider:):)

If srod is listening, "Shaddap ya varmint!"