Calibre .OPF file format help needed
- Fangbeast
- PureBasic Protozoa
- Posts: 4747
- Joined: Fri Apr 25, 2003 3:08 pm
- Location: Not Sydney!!! (Bad water, no goats)
Just had a look at the .OPF file description format used by the Calibre ebook reader, and its non-standard XML fields leave me a little puzzled (as in: I know less than nothing) about how to deal with it to extract data.
Has anyone had a go at it?
Even the smallest opf file is a bit 'windy' so I didn't want to post one. Yet. Unless someone asks:):)
Amateur Radio, D-STAR/VK3HAF
Re: Calibre .OPF file format help needed
Couldn't you use the PDF library? I think there is a PBpdf library somewhere on this forum.
Re: Calibre .OPF file format help needed
What data are you wishing to extract?
- Fangbeast
Re: Calibre .OPF file format help needed
Didn't know you could do that? Never used the PBpdf lib before.
- Fangbeast
Re: Calibre .OPF file format help needed
Managed to find a smaller opf in my own collection. The fields I want are title, creator, date, description, publisher, ISBN, language and series.
But it seems as if the description field is repeated two or even three times (that I can see).
Then it has multiple 'subject' fields, which are actually the category/genre.
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf" unique-identifier="uuid_id" version="2.0">
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:opf="http://www.idpf.org/2007/opf">
<dc:identifier opf:scheme="calibre" id="calibre_id">231</dc:identifier>
<dc:identifier opf:scheme="uuid" id="uuid_id">9ae160cc-033f-4d59-aa6c-ae9e5225bdf0</dc:identifier>
<dc:title>Skylark of Space</dc:title>
<dc:creator opf:file-as="Smith, E. E. 'Doc'" opf:role="aut">E. E. 'Doc' Smith</dc:creator>
<dc:contributor opf:file-as="calibre" opf:role="bkp">calibre (5.20.0) [https://calibre-ebook.com]</dc:contributor>
<dc:date>2011-09-29T22:52:37+00:00</dc:date>
<dc:description><div><div><h3>Product Description</h3><p>This is the first of the famous Skylark novels...a voyage to the ends of the universe. </p>
</div>
<p class="description">SUMMARY:<br>Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.</p></div>
<div><div><h3>Product Description</h3><p>This is the first of the famous Skylark novels...a voyage to the ends of the universe. </p>
</div>
<p class="description">SUMMARY:<br>Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton's fiancée and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author's preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.</p></div>
Brilliant government scientist Richard Seaton discovers a remarkable faster-than-light fuel that will power his interstellar spaceship, The Skylark. His ruthless rival, Marc DuQuesne, and the sinister World Steel Corporation will do anything to get their hands on the fuel. They kidnap Seaton&#39;s fiancäe and friends, unleashing a furious pursuit and igniting a burning desire for revenge that will propel The Skylark across the galaxy and back. ø The Skylark of Space is the first and one of the best space operas ever written. Breezy dialogue, romantic intrigue, fallible heroes, and complicated villains infuse humanity and believability into a conflict of galactic proportions. The Amazing Stories publication of The Skylark of Space in 1928 heralded the debut of a major new voice in American pulp science fiction and ushered in its golden age. Legions of interstellar epics have been written since that time, but none can match the wonder, dazzle, and sheer fun of the original. This commemorative edition features the author&#39;s preferred version of the story, the original illustrations by O. G. Estes Jr., and a new introduction by acclaimed science fiction writer Vernor Vinge.
"With the exception of the works of H. G. Wells, possibly those of Jules Verne -- and almost no other writer -- it has inspired more imitators and done more to change the nature of all the science fiction written after it than almost any other single work." -- Frederik Pohl Finding that his government laboratory coworkers do not believe his discovery of a revolutionary power source that will enable interstellar flight, Dr. Richard Seaton acquires rights to his discovery from the government and commercializes it with the aid of his friend, millionaire inventor Martin Crane. When a former colleague tries to steal the invention, not only the future of Dr. Seaton and his allies, but ultimately the entire world hangs in the balance! The first of the great "space opera" science fiction novels, The Skylark of Space remains a thrilling tale more than 80 years after its creation.</dc:description>
<dc:publisher>Berkley</dc:publisher>
<dc:identifier opf:scheme="GUID">{0F66B19B-DFDA-4596-AD12-68FDF06D9AA7}</dc:identifier>
<dc:identifier opf:scheme="ISBN">9780425046401</dc:identifier>
<dc:identifier opf:scheme="GOOGLE">boBY9bNVQwAC</dc:identifier>
<dc:identifier opf:scheme="URI">http|//www.gutenberg.org/ebooks/20869</dc:identifier>
<dc:language>eng</dc:language>
<dc:subject>Science Fiction</dc:subject>
<dc:subject>Science Fiction/Fantasy</dc:subject>
<dc:subject>Space ships -- Fiction</dc:subject>
<dc:subject>Space flight -- Fiction</dc:subject>
<dc:subject>Action &amp; Adventure</dc:subject>
<dc:subject>Fiction</dc:subject>
<dc:subject>General</dc:subject>
<dc:subject>Space Opera</dc:subject>
<meta name="calibre:author_link_map" content="{&quot;E. E. 'Doc' Smith&quot;: &quot;&quot;}"/>
<meta name="calibre:series" content="Skylark"/>
<meta name="calibre:series_index" content="1"/>
<meta name="calibre:rating" content="10"/>
<meta name="calibre:timestamp" content="2021-06-05T06:47:53.206469+00:00"/>
<meta name="calibre:title_sort" content="Skylark of Space"/>
</metadata>
<guide>
<reference type="cover" title="Cover" href="cover.jpg"/>
</guide>
</package>
Re: Calibre .OPF file format help needed
Since you are able to edit this data in Calibre, I think someone just copied the SUMMARY 2x.
If you load your book in Calibre (the full version, not the reader), you'll see where you can change the metadata ("Edit metadata" button). There you can download the description from Amazon, Google, etc. You can automate this in Calibre from the command line if you need to do a bunch. Just be careful not to be greedy and download multiple descriptions.
Norm.
google Translate;Makes my jokes fall flat- Fait mes blagues tombent à plat- Machte meine Witze verpuffen- Eh cumpari ci vo sunari
- Fangbeast
Re: Calibre .OPF file format help needed
I don't see where you are going with this. All the data I need is already in the opf file and I just need a way to extract it.
Don't want to try 40,000 downloads from the internet just to import the books into my database:):)
I got a bit lost because I was going to 'try' to put a structure together and use one of the extract xml functions but the fields have colons in them. Drat
Re: Calibre .OPF file format help needed
Replace the colons with underscores before parsing.
- Fangbeast
Re: Calibre .OPF file format help needed
Stop that horny goat winking, that's idle's job:):)
When, where, how???
Given the below code to load a 'normal' xml without colons, I don't know where, how to do this, at what point.
I have no idea what I am doing here.
Code: Select all
Structure opfitem
  title.s
  subject.s
EndStructure

Structure opfbook
  List opflist.opfitem()
EndStructure

; A variable of the structure is needed to hold the extracted list
Define book.opfbook

ImportFilename$ = OpenFileRequester("Choose a file to import", "", "XML|*.xml", 0)
If ImportFilename$
  XML = LoadXML(#PB_Any, ImportFilename$)
  If XML
    If XMLStatus(XML) = #PB_XML_Success
      *MainNode = MainXMLNode(XML)
      ; Fill the list from the child nodes of the main node
      ExtractXMLList(*MainNode, book\opflist())
      ForEach book\opflist()
        Debug book\opflist()\title
        Debug book\opflist()\subject
        Debug "------"
      Next
    EndIf
    FreeXML(XML)
  EndIf
EndIf
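For anyone stuck at the same point: a structure can't have a field called dc:title, but the nodes can also be walked one by one and matched on their names as strings, so the colons never have to become field names. The following is only a sketch under a few assumptions: the file name TestFile.opf is a placeholder, the list of fields is just the ones asked about above, and it assumes PureBasic's XML library reports node and attribute names with their prefix exactly as written in the file (worth checking with a quick Debug). It also assumes the real .opf has the HTML inside dc:description escaped; the sample as pasted on the forum has raw tags in it and would probably fail to parse.

```purebasic
; Sketch: list the metadata fields of one OPF file by walking its nodes.
EnableExplicit

Define File$ = "TestFile.opf"   ; placeholder file name
Define xml, *Meta, *Node, Name$

xml = LoadXML(#PB_Any, File$)
If xml
  If XMLStatus(xml) = #PB_XML_Success
    ; <package> is the main node, <metadata> is one of its children
    *Meta = ChildXMLNode(MainXMLNode(xml))
    While *Meta And GetXMLNodeName(*Meta) <> "metadata"
      *Meta = NextXMLNode(*Meta)
    Wend

    If *Meta
      *Node = ChildXMLNode(*Meta)
      While *Node
        If XMLNodeType(*Node) = #PB_XML_Normal
          Name$ = GetXMLNodeName(*Node)
          Select Name$
            Case "dc:title", "dc:creator", "dc:date", "dc:publisher", "dc:language", "dc:subject", "dc:description"
              ; Repeated fields such as dc:subject simply show up once per node
              Debug Name$ + " = " + GetXMLNodeText(*Node)
            Case "dc:identifier"
              ; The scheme attribute says which identifier this is (ISBN, GOOGLE, ...)
              Debug "identifier (" + GetXMLAttribute(*Node, "opf:scheme") + ") = " + GetXMLNodeText(*Node)
            Case "meta"
              ; Calibre extras such as calibre:series live in name/content attributes
              Debug GetXMLAttribute(*Node, "name") + " = " + GetXMLAttribute(*Node, "content")
          EndSelect
        EndIf
        *Node = NextXMLNode(*Node)
      Wend
    EndIf
  Else
    Debug "XML error: " + XMLError(xml) + " (line " + Str(XMLErrorLine(xml)) + ")"
  EndIf
  FreeXML(xml)
Else
  Debug "Can't load " + File$
EndIf
```

Instead of Debug, each Case could fill a structure or go straight into the database.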
Re: Calibre .OPF file format help needed
Hi Fangbeast,
I don't understand XML, so I do it in plain text.
Here with regular expressions (because I like that).
The expressions are simple to understand and therefore to modify:
Code: Select all
- Beginning XML tag
- Part to keep: (.+?)
- End XML tag
You just have to 'escape' the characters, like a string in C.
It can also be done with text functions, but then you have to loop if there are several identical fields (e.g. author(s)).
Here is a small functional example, using your sample saved in a file.
Code: Select all
EnableExplicit

Enumeration
  #hFile
EndEnumeration

#File_Name = "TestFile.opf"

; One pattern per field; group 1 is the part to keep.
; In ~"..." strings, \" is a literal quote and \\h becomes the regex \h (horizontal whitespace).
NewList RegEx$()
AddElement(RegEx$()) : RegEx$() = ~"<dc:title>(.+?)</dc:title>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:creator opf:file-as=\"(.+?)</dc:creator>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:date>(.+?)</dc:date>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:publisher>(.+?)</dc:publisher>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:identifier\\hopf:scheme=\"ISBN\">(.+?)</dc:identifier>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:identifier\\hopf:scheme=\"GOOGLE\">(.+?)</dc:identifier>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:language>(.+?)</dc:language>"
AddElement(RegEx$()) : RegEx$() = ~"<meta\\hname=\"calibre:series\"\\hcontent=\"(.+?)\"/>"
AddElement(RegEx$()) : RegEx$() = ~"<dc:description>(.+?)</dc:description>"

; Read the OPF file (ReadFile opens read-only and fails if the file is missing)
If Not ReadFile(#hFile, #File_Name)
  Debug #File_Name + " can't be found or opened"
  End
EndIf
Debug "Reading: " + #File_Name

; The file declares encoding='utf-8'; IgnoreEOL returns the whole file in one string
Define Txt$ = ReadString(#hFile, #PB_UTF8 | #PB_File_IgnoreEOL)
CloseFile(#hFile)

; Run each pattern over the text and show every match
ForEach RegEx$()
  If Not CreateRegularExpression(0, RegEx$(), #PB_RegularExpression_DotAll)
    Debug "Bad RegEx (" + RegEx$() + ")"
    Break
  Else
    Debug "--- Search for: " + RegEx$()
    Debug ""
    If ExamineRegularExpression(0, Txt$)
      While NextRegularExpressionMatch(0)
        Debug " " + RegularExpressionGroup(0, 1)
      Wend
    EndIf
    FreeRegularExpression(0)
    Debug ""
  EndIf
Next

Debug "Done"
End
The creator match still includes the opf:file-as and opf:role attributes, so it can be broken down into several fields. Whether that is worth doing depends on whether this example is particular or general (obviously this file is not strictly in EPUB format).
I put the description field at the end for general readability. The regular expressions scan the whole file (in memory) each time anyway; this is not a speed problem.
Enjoy
Edit:
I installed Calibre to test and realized that it stores all the metadata in a metadata.db file, which is a simple SQLite file.
So if you just want to use the e-book data already referenced in Calibre, it may be easier to use PB's SQLite functions on this .db file.
If you don't have a tool to read/edit SQLite databases, I suggest SQLiteStudio (freeware, small, fast, portable, easy).
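For the libraries that do have a metadata.db, reading it with PB's SQLite functions could look roughly like the sketch below. It is not tested against a real Calibre library: the path is a placeholder, and the table and column names (a books table with id, title, author_sort and pubdate columns) are assumptions about Calibre's schema that should be checked in a SQLite browser first. Working on a copy of the file is the safer choice, since Calibre may have the original open.

```purebasic
; Sketch: list a few books from a copy of Calibre's metadata.db.
EnableExplicit
UseSQLiteDatabase()

Define DbFile$ = "metadata.db"   ; placeholder path to a copy of the database
Define db, Row$

db = OpenDatabase(#PB_Any, DbFile$, "", "")
If db
  ; Table and column names are assumed, not confirmed in this thread
  If DatabaseQuery(db, "SELECT id, title, author_sort, pubdate FROM books ORDER BY id LIMIT 10")
    While NextDatabaseRow(db)
      Row$ = GetDatabaseString(db, 0) + " | " + GetDatabaseString(db, 1)
      Row$ + " | " + GetDatabaseString(db, 2) + " | " + GetDatabaseString(db, 3)
      Debug Row$
    Wend
    FinishDatabaseQuery(db)
  Else
    Debug "Query failed: " + DatabaseError()
  EndIf
  CloseDatabase(db)
Else
  Debug "Can't open " + DbFile$
EndIf
```

This only reads. Collections without a metadata.db still need the per-book .opf route.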
Re: Calibre .OPF file format help needed
Fagbeast,
I was answering the question of "why multiple descriptions?", and you are right I missed the point.
@Marc56us is right, don't bother with XML, get your data directly from the Calibre database using SQLite.
If you forgot where you told Calibre to save your books, click on the down arrow next to Preferences and choose to run Welcome Wizard, write down the directory it shows, and now you are ready to use "metadata.db"
Smoke from California fires is clouding my vision so don't blame me for misunderstanding, or once again missing the point.
Norm.
- Fangbeast
Re: Calibre .OPF file format help needed
> Fagbeast,
Fagbeast????? That made my day. Nearly rolled off my chair laughing.
> I was answering the question of "why multiple descriptions?", and you are right I missed the point.
I miss everything else:):)
> @Marc56us is right, don't bother with XML, get your data directly from the Calibre database using SQLite.
Only if there is a db to query. Some of my collections don't have one, although each book has its own separate opf, which is why I need to query a lot of those first. It takes forever to do that on a normal hdd as I don't have any fast ssd's around.
> Smoke from California fires is clouding my vision so don't blame me for misunderstanding, or once again missing the point.
Heard about that. One of our upper states has so many fires like yours while the rest of us freeze to death down below, it's weird.
Norm, you have always been thankful, don't stress. I'm just getting older and understanding less. Stress for me is daily life now.
- Fangbeast
Re: Calibre .OPF file format help needed
> Here with regular expressions (because I like that).
Eek!! That does my head in.
> The expressions are simple to understand and therefore to modify
Suuuure, that's what an intelligent person would say:):)
Yay!! An example to read and see if I can follow. I might have around 70,000 books to go through shortly and many don't have a database, just the individual opf's in some of the collections.
My big family project currently gets some of its info from Google or ISFDB and this extra method of yours will cut down on the internet traffic.
More horny goat winking. I feel like I am in a black light district (Very evil grin)
Re: Calibre .OPF file format help needed
Sorry Fangbeast!
I am a programmer who never learned to type.
Just for that I should rewrite your XML program. I am sure you wouldn't want it with so many typos.
[EDIT]
XML.pb from the help file xml main menu, has a nice sample of processing an XML file without using Lists or Maps. Don't forget to change the input type to #PB_Ascii if that's what the file is.
Soory again!
Norm.
- Fangbeast
Re: Calibre .OPF file format help needed
> Sorry Fangbeast!
Eeek!! Don't be sorry!! You've helped me on and off during the years and I appreciate it.
I'm just getting older and stupider:):)
If srod is listening, "Shaddap ya varmint!"