E-books and their formats: FB2 and FB3 - history, pros, cons, and how they work

In a previous article we talked about the features of the DjVu format. Today we focus on FictionBook2, better known as FB2, and its “heir”, FB3.


/ Flickr / Judit Klein / CC

How the format appeared


In the mid-90s, enthusiasts began digitizing Soviet books. They converted and saved literature in a wide variety of formats. One of the first libraries on the Runet, the Maxim Moshkov Library, used formatted plain-text files (TXT).

TXT was chosen for its resistance to byte-level corruption and its universality: it opens on any operating system. However, plain text made the stored content harder to process. For example, to reach the thousandth line, a program had to process the 999 lines before it. Books were also stored as Word documents and PDFs; the latter were hard to convert to other formats, and weak computers opened and rendered PDF documents slowly.
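To illustrate the "thousandth line" problem, here is a minimal Python sketch. The helper name `read_line` and the in-memory sample text are our own, chosen only for the demonstration. A plain text file has no index of line positions, so the only way to find line N is to read everything before it.

```python
import io
from itertools import islice


def read_line(stream, number):
    """Return line `number` (1-based); every earlier line is read and discarded."""
    return next(islice(stream, number - 1, None), None)


# Hypothetical stand-in for a book stored as plain text.
book = io.StringIO("".join(f"line {i}\n" for i in range(1, 2001)))
print(read_line(book, 1000).rstrip())
```

The function returns None when the file has fewer lines than requested.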

HTML was also used to store electronic literature. It simplified indexing, conversion to other formats, and document creation (marking up text with tags), but it brought shortcomings of its own. One of the most significant was the "vagueness" of the standard: it allowed certain liberties in how tags were written. Some tags had to be closed, while others (for example, <p>) did not. Tags could also be nested in arbitrary order.

Although files written this way were discouraged and considered invalid, the standard required reading programs to try to display their contents anyway. This caused difficulties, because each application implemented the “guessing” in its own way. In addition, the reading devices and applications on the market at the time understood only one or two specialized formats. If a book came in a different format, it had to be converted before it could be read. FictionBook2, or FB2, was designed to solve all of these problems by taking over the initial "combing" of the text and its conversion.

Note that the format did have a first version, FictionBook1. It was purely experimental, did not last long, is not supported today, and FB2 is not backward compatible with it. So "FictionBook" most often refers to its successor, FB2.

FB2 was created by a group of developers led by Dmitry Gribov, technical director of LitRes, and Mikhail Matsnev, creator of the Haali Reader application. The format is based on XML, which is stricter than HTML about unclosed and nested tags. An XML document is accompanied by a so-called XML schema: a special file that lists all the tags and describes the rules for using them (order, nesting, whether a tag is required or optional, and so on). In FictionBook, the schema lives in the FictionBook2.xsd file. An example of the XML schema can be found at the link (it is the one used by the LitRes e-book store).
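The difference in strictness is easy to see with Python's standard XML parser. This is a minimal sketch: the helper name `is_well_formed` and both sample strings are ours. It only checks well-formedness. Validating a book against FictionBook2.xsd would need a schema-aware library, which the standard library does not provide.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if `markup` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML-style markup: the <p> tags are never closed.
html_style = "<body><p>First paragraph<p>Second paragraph</body>"
# XML-style markup: every tag is closed and properly nested.
xml_style = "<body><p>First paragraph</p><p>Second paragraph</p></body>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

An XML parser does not "guess": it rejects the first document outright, so every program sees the same result.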

FB2 document structure


The text of a document is stored in special paragraph-type elements: <p>, <v> and <subtitle>. There is also an <empty-line> element, which has no content and is used to insert vertical gaps.

All documents begin with the root <FictionBook> tag, inside which <stylesheet>, <description>, <body> and <binary> may appear.

The <stylesheet> tag contains style sheets that make conversion to other formats easier. The <binary> tag contains base64-encoded data, such as images, that may be needed to render the document.
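A reading program has to decode the <binary> blocks before it can use them. Here is a minimal sketch of that step. The function name `extract_binaries`, the sample document, and the use of `id` and `content-type` attributes on <binary> are our assumptions for the example; check the schema for the exact attribute set.

```python
import base64
import xml.etree.ElementTree as ET

FB2_URI = "http://www.gribuser.ru/xml/fictionbook/2.0"
NS = {"fb": FB2_URI}


def extract_binaries(fb2_text):
    """Map the id of each <binary> element to its decoded bytes."""
    root = ET.fromstring(fb2_text)
    result = {}
    for node in root.findall("fb:binary", NS):
        # Assumption: each <binary> carries an "id" attribute.
        result[node.get("id")] = base64.b64decode(node.text or "")
    return result


# Placeholder payload; a real book would hold image bytes here.
payload = base64.b64encode(b"not a real image").decode("ascii")
sample = (
    f'<FictionBook xmlns="{FB2_URI}">'
    f'<binary id="cover.png" content-type="image/png">{payload}</binary>'
    "</FictionBook>"
)

print(extract_binaries(sample))
```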

The <description> element contains all the necessary information about the book: the genre of the work, the list of authors (full name, e-mail address, and website), the title, a block of keywords, and an annotation. It may also contain information about changes made to the document and about the publisher of the book, if it was issued on paper.

Here is part of the <description> block of a FictionBook file for Arthur Conan Doyle's “A Study in Scarlet”, taken from Project Gutenberg:

<?xml version="1.0" encoding="iso-8859-1"?>
<FictionBook xmlns:xlink="http://www.w3.org/1999/xlink" xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <annotation> </annotation>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
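Because the structure is strict, extracting these fields takes only a few lines. The sketch below uses Python's standard xml.etree.ElementTree. The function name `read_title_info` is ours. The sample repeats the fragment above, with the XML declaration dropped and the closing </FictionBook> tag added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

NS = {"fb": "http://www.gribuser.ru/xml/fictionbook/2.0"}

SAMPLE = """<FictionBook xmlns:xlink="http://www.w3.org/1999/xlink"
    xmlns="http://www.gribuser.ru/xml/fictionbook/2.0">
  <description>
    <title-info>
      <genre match="100">detective</genre>
      <author>
        <first-name>Arthur</first-name>
        <middle-name>Conan</middle-name>
        <last-name>Doyle</last-name>
      </author>
      <book-title>A Study in Scarlet</book-title>
      <date value="1887-01-01">1887</date>
    </title-info>
  </description>
</FictionBook>"""


def read_title_info(fb2_text):
    """Collect title, genre, author name and date from <title-info>."""
    info = ET.fromstring(fb2_text).find("fb:description/fb:title-info", NS)
    author = info.find("fb:author", NS)
    name_parts = [
        author.findtext(f"fb:{tag}", default="", namespaces=NS)
        for tag in ("first-name", "middle-name", "last-name")
    ]
    return {
        "title": info.findtext("fb:book-title", namespaces=NS),
        "genre": info.findtext("fb:genre", namespaces=NS),
        "author": " ".join(part for part in name_parts if part),
        "date": info.find("fb:date", NS).get("value"),
    }


print(read_title_info(SAMPLE))
```

Note that every tag lives in the FictionBook namespace, so lookups need the namespace mapping. Searching for a bare "book-title" would find nothing.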

The key component of a FictionBook document is <body>, which contains the text of the book itself. A document may contain several of these tags: the additional blocks store footnotes, comments, and notes.
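Here is a rough sketch of how a program might separate the main text from the additional blocks. It rests on several assumptions of ours that the fragment above does not show: the first <body> is treated as the main text, additional bodies are told apart by a `name` attribute (such as "notes"), and paragraphs sit inside <section> elements. The function names are also ours.

```python
import xml.etree.ElementTree as ET

FB2_URI = "http://www.gribuser.ru/xml/fictionbook/2.0"
NS = {"fb": FB2_URI}

# Hypothetical two-body document made up for the example.
SAMPLE = f"""<FictionBook xmlns="{FB2_URI}">
  <body>
    <section><p>Main text of the book.</p></section>
  </body>
  <body name="notes">
    <section id="n1"><p>Text of the first footnote.</p></section>
  </body>
</FictionBook>"""


def paragraphs(body):
    """Return the text of every <p> found anywhere inside `body`."""
    return ["".join(p.itertext()).strip() for p in body.iter(f"{{{FB2_URI}}}p")]


def split_bodies(fb2_text):
    """Return (main paragraphs, {body name: paragraphs}) for the extra bodies."""
    bodies = ET.fromstring(fb2_text).findall("fb:body", NS)
    main = paragraphs(bodies[0]) if bodies else []
    extras = {
        body.get("name", f"body-{index}"): paragraphs(body)
        for index, body in enumerate(bodies[1:], start=2)
    }
    return main, extras


print(split_bodies(SAMPLE))
```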

FictionBook also provides several hyperlink tags. They are based on XLink, a specification developed by the W3C for creating links between resources in XML documents.
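In practice this means the link target is stored in an attribute from the XLink namespace (the same `xmlns:xlink` that is declared on the root element in the fragment above). Below is a minimal sketch that collects all link targets. The <a> element and the sample text are our assumptions for the example, and `collect_links` is a name we made up.

```python
import xml.etree.ElementTree as ET

# ElementTree spells a namespaced attribute as "{namespace-uri}local-name".
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

SAMPLE = """<FictionBook xmlns="http://www.gribuser.ru/xml/fictionbook/2.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <p>See <a xlink:href="#n1">note 1</a> and
       <a xlink:href="http://www.gutenberg.org/">the source</a>.</p>
  </body>
</FictionBook>"""


def collect_links(fb2_text):
    """Return every xlink:href value in document order."""
    root = ET.fromstring(fb2_text)
    return [
        element.get(XLINK_HREF)
        for element in root.iter()
        if element.get(XLINK_HREF) is not None
    ]


print(collect_links(SAMPLE))
```

Targets that start with "#" point inside the same document, which is how a footnote reference can lead to a block in an additional <body>.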

Advantages of the format


The FB2 standard includes only the minimum set of tags needed to mark up fiction, which makes it easier for reading programs to process. Moreover, when a reader works with the FB2 format directly, the user can customize almost every display parameter.

The strict document structure makes it possible to automate conversion from FB2 to any other format. The same structure makes it possible to work with individual elements of a document, for example to filter books by author, title, or genre. For this reason FB2 gained popularity on the Runet and became the de facto standard in the electronic libraries of Russia and the CIS countries.

Disadvantages of the format


The simplicity of FB2 is both its strength and its weakness. It limits complex text layout (for example, notes in the margins), and the format supports neither vector graphics nor numbered lists. For this reason it is poorly suited to textbooks, reference books, and technical literature. Even the name of the format, "fiction book", says as much.

In addition, to display even minimal information about a book (title, author, and cover), a program has to process almost the entire XML document. This is because the metadata is located at the beginning of the file and the images at the end.
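The cost can be made visible with a streaming parser. The sketch below counts how many elements Python's `ET.iterparse` has to finish reading before a given tag is complete. The function name `elements_read_until` and the synthetic book of 1000 paragraphs are our own, and the <binary> payload is a placeholder.

```python
import io
import xml.etree.ElementTree as ET

FB2_URI = "http://www.gribuser.ru/xml/fictionbook/2.0"


def elements_read_until(fb2_bytes, tag):
    """Count the elements a streaming parser closes before `tag` is complete."""
    events = ET.iterparse(io.BytesIO(fb2_bytes), events=("end",))
    for count, (_, element) in enumerate(events, start=1):
        if element.tag == f"{{{FB2_URI}}}{tag}":
            return count
    return None


# Synthetic document: metadata first, then the text, then the cover image.
paragraphs = "".join(f"<p>Paragraph {i}</p>" for i in range(1000))
sample = (
    f'<FictionBook xmlns="{FB2_URI}">'
    "<description><title-info><book-title>Demo</book-title></title-info></description>"
    f"<body>{paragraphs}</body>"
    '<binary id="cover.png">AAAA</binary>'
    "</FictionBook>"
).encode("utf-8")

print(elements_read_until(sample, "book-title"))
print(elements_read_until(sample, "binary"))
```

The title is available almost immediately, but the parser reaches the cover only after it has gone through every paragraph of the text.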

FB3 - development of the format


In response to growing demands on book formatting (and to fix some of FB2's shortcomings), Gribov began work on the FB3 format. Development later stalled, but it resumed in 2014.

According to the authors, they studied the real needs of technical publishing, looked at textbooks, reference books, and manuals, and defined a more specific set of tags that makes it possible to display any book.

In the new specification, a FictionBook file is a ZIP archive in which metadata, images, and text are stored as separate files. The requirements for the ZIP file format and the conventions for organizing its contents are specified in ECMA-376, the standard that defines Office Open XML.
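The practical benefit of the archive layout is that a program can read one part without touching the others. The sketch below shows the idea with Python's standard zipfile module. The part names ("meta/description.xml" and so on) are hypothetical and made up for the demonstration. They are not the names required by FB3 or ECMA-376.

```python
import io
import zipfile


def build_demo_package():
    """Build an in-memory ZIP with separate metadata, text and image parts."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Hypothetical part names, for illustration only.
        archive.writestr("meta/description.xml", "<description/>")
        archive.writestr("text/body.xml", "<body/>")
        archive.writestr("images/cover.png", b"placeholder bytes")
    return buffer.getvalue()


def read_part(package_bytes, part_name):
    """Read a single part; the other archive members are not decompressed."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return archive.read(part_name)


package = build_demo_package()
print(read_part(package, "meta/description.xml"))
```

Compare this with FB2, where the cover sits at the end of a single XML file.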

A number of formatting improvements were made (letter-spacing, underlining), and a new object was added, the “block”, which frames an arbitrary fragment of the book as a rectangle and can be embedded in the text with the text flowing around it. Support for numbered and bulleted lists was also added.

FB3 is distributed under a free license and is open source, so all of its tools are available to publishers and users: converters, cloud editors, and readers. The current version of the format, the reader, and the editor can be found in the project's repository on GitHub.

Overall, FictionBook3 is still less widespread than its older sibling, but several digital libraries already offer books in this format, and a couple of years ago LitRes announced its intention to move its entire catalog to the new format. Some reading devices already support all the necessary FB3 functionality. For example, all current ONYX reader models, such as Darwin 3 and Cleopatra 3, can work with this format out of the box.


/ ONYX BOOX Cleopatra 3

Wider adoption of FictionBook3 would make it possible to build an ecosystem geared toward full and efficient work with text on any device with limited resources: a black-and-white or small display, little memory, and so on. According to the developers, a book that has been laid out once will be as convenient as possible to read in any environment.




Source: https://habr.com/ru/post/411755/

