Not supported by all e-readers.
Verdict: HTML is a great format and ideal for ebooks in terms of file specification, but it struggles with compatibility because eReaders tend to assume one file per book.

Advantages: Preserves the print layout of a book, whether it is practical for digital screens or not. Well-supported format on most eReaders and devices.
Disadvantages: Intended for print, PDFs are not resolution independent and do not adapt well to screen sizes. A nearly dead-end format in terms of conversion. Manual copying and pasting of underlying text may be possible if the text has been embedded, but otherwise you cannot convert from PDF to another format. Can produce very large files, depending on image compression options used at the time of creation. Everything from image-type support to font conflicts can also hinder whether content is rendered at all.
Verdict: The PDF format is, in a sense, better suited for delivering style rather than content.

Advantages: Better-than-JPEG compression for facsimiles of the printed page, embeddable text; functionally, a more efficient, open, and simpler replacement for PDF.
Disadvantages: Not widely supported on eReaders. Resolution specific.
Verdict: A good format with benefits in file size and efficiency, but due to limited support, it may not be practical for daily use.

Comic Book Archives
As the name implies, a Comic Book Archive is a format intended for digital storage and consumption of comic books and graphic novels.
Advantages: Open format using only existing technology. No bloat; a very clean format. Mostly indifferent to image type (although your e-reader may not be), so image compression levels are adjustable.
Disadvantages: Resolution specific.
Verdict: As a format for storage and consumption, this is the ideal way to digitally archive comics and scanned facsimiles.

Overview
The eBook format conundrum boils down to this: there are source formats, there are formats for consumption (often dictated by what your e-reading device supports), and there's what a vendor or distributor is offering you.
Unfortunately, these are not always in alignment with one another. About the author. He has worked in the film and computing industry, often at the same time. He is one of the maintainers of the Slackware-based multimedia production project Slackermedia.
Ben Cotton on 30 Nov. Great article, Seth! It occurs to me that plain text is the truest to the original form, in that its disadvantages are the same as for dead-tree books.
Seth Kenlon on 30 Nov. As long as the plain text is consistent, it can be parsed and up-converted to rst or markdown, and then to epub, which brings the otherwise humble format into modern e-reading convenience standards. The problem is when you get fancy stylized plain text with surprising indentations and ASCII art and stuff like that. Then it's a matter of desperately trying to normalise the text into some parse-able format, but in the end it's basically manual conversion.
Or you just live with plain text, funky line breaks and all. I guess the moral of that story is that we as content creators should NEVER assume anything about how people will be consuming the deliverable.
For every 7 people who will use an e-reader, there will be those 3 whackos using a phone, a web browser, and a TI
Jason van Gumster on 01 Dec. Pandoc tends to be my (and, from the look of it, Seth's) go-to app for converting between document formats. As a secondary option, you may also want to try Calibre.
If you can't get up and running with Pandoc, that might be a decent fallback.
Seth Kenlon on 01 Dec. Yep, I agree with Jason. Pandoc makes it pleasantly simple to convert HTML to epub. It's easiest if all your HTML is in one file, but that is not always the case, so you might have to point pandoc to each file. I use this kind of command fairly regularly; I pull a directory off of the web and convert the pages to ebook so that I can read it whilst offline.
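A minimal sketch of that sort of multi-file conversion, written as a small Python helper that only builds the pandoc argument list. This is not the command from the comment above; the file names, the output name, and the title are placeholders, and the helper name pandoc_epub_command is made up for illustration.

```python
import shlex


def pandoc_epub_command(html_files, output="book.epub", title=None):
    """Build (but do not run) a pandoc command that merges HTML files into one EPUB."""
    if not html_files:
        raise ValueError("at least one input file is required")
    cmd = ["pandoc", "--from", "html", "--to", "epub", "--output", str(output)]
    if title:
        # pandoc accepts metadata as KEY=VALUE pairs.
        cmd += ["--metadata", f"title={title}"]
    # Inputs are read in the order given, so list them in reading order.
    cmd += [str(f) for f in html_files]
    return cmd


if __name__ == "__main__":
    # Placeholder file names; substitute the pages you actually downloaded.
    cmd = pandoc_epub_command(["index.html", "chapter1.html"], "manual.epub", "Manual")
    print(shlex.join(cmd))
    # To perform the conversion: subprocess.run(cmd, check=True)
```

Building the list separately from running it makes it easy to inspect the command first, which helps when the page order pulled from a website is not obvious.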
I'm reading the GNU Gawk manual in exactly this way. Hope that helps.
Thanks, pandoc works really well!! I appreciate your quick replies!

While most eBook readers do not directly read files in this format, it is used to create the source eBook files, which are then translated into the native format for the Reader.
The native format encapsulates the various files specified in the manifest. The files themselves may or may not continue to exist in the encapsulating file. The Open eBook specification standard, comprising of Publication Structure 1. The OPF file includes the list of files in the build and the metadata used to define the file contents. The HTML files sometimes have a. They are generally expected to be "well formed" meaning that they include the optional closing tags.
Here is a sample Package File from an actual eBook.
General: Standard format used to publish electronic texts.
Dictionary
dc:Description
dc:Date: YYYY-MM-DD, month and day are optional
dc:Rights
x-Metadata: other proprietary metadata info
Manifest (1): List of all files in the publication. Fallback item if not a core media type. Order not important.
Spine (1): Primary reading order of the document. First itemref link is the file shown on opening. Can include only text items.
Tours (multiple): Alternative reading orders. Reading systems are not required to implement tours.
Guide: Lists key structural components e.
The required manifest must provide a list of all the files that are part of the publication (e.g. Content Documents, style sheets, image files, any embedded font files, any included schemas).
All DTDs and external entities (including, but not limited to, external DTD references) referenced by XML documents listed in the package manifest are considered part of the publication and thus must also be listed in the manifest. As an exception to that rule, DTDs of certain core document types do not need to be included. The list of DTDs that do not need to be included in the manifest is:
The manifest element must contain one or more item elements. Each item describes a document, an image file, a style sheet, or other component that is considered part of the publication.
The manifest must not include item elements referring to the file or files that make up the OPF Package Document. The order of item elements in the manifest is not significant. The URIs in href attributes of item elements in the manifest must not use fragment identifiers.
A single resource href must not be listed in the manifest more than once. For a publication that uses only those media types, the manifest merely lists the publication's component files directly. However, content providers may construct publications that reference items of additional media types. In order for such publications to be read by all conforming Reading Systems, content providers must provide alternative "fallback" items for each such item.
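The manifest constraints stated above (at least one item, no fragment identifiers in href, no href listed twice) can be checked mechanically. The following is a rough sketch using Python's standard library, assuming the OPF 2.0 namespace http://www.idpf.org/2007/opf; the sample manifest and the function name check_manifest are invented for illustration and deliberately contain two violations.

```python
import xml.etree.ElementTree as ET

# Assumed: the OPF 2.0 package namespace.
OPF_NS = "{http://www.idpf.org/2007/opf}"

# Hypothetical manifest with one duplicate href and one fragment identifier.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="c1" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="c2" href="chapter1.html" media-type="application/xhtml+xml"/>
    <item id="img" href="cover.png#top" media-type="image/png"/>
  </manifest>
</package>"""


def check_manifest(opf_xml):
    """Return a list of human-readable problems found in the manifest."""
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{OPF_NS}manifest")
    if manifest is None:
        return ["missing manifest element"]
    items = manifest.findall(f"{OPF_NS}item")
    problems = []
    if not items:
        problems.append("manifest contains no item elements")
    seen = set()
    for item in items:
        href = item.get("href", "")
        if "#" in href:
            problems.append(f"fragment identifier in href: {href}")
        if href in seen:
            problems.append(f"duplicate href: {href}")
        seen.add(href)
    return problems


if __name__ == "__main__":
    for problem in check_manifest(SAMPLE):
        print(problem)
```

The check does not depend on item order, in keeping with the rule that order in the manifest is not significant.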
These are as follows:
For the purpose of fallback specification, schema definition files with media types of supported schema languages should be considered as Core Media Types, thus fallback information must not be provided for these files.
These schema languages are:
In this case, its fallback must be identified with a fallback attribute pointing to another item. An item identifies a fallback item using its fallback attribute, which must specify the ID of the item element that identifies the fallback. Items referenced from fallback attributes may each specify a fallback attribute in turn, forming a multi-level fallback chain. If a fallback attribute points to an item that also has a fallback attribute, a Reading System must continue down the fallback chain until it reaches a reference to an item with a media type it can display or, as specified below, it reaches an item with a fallback-style attribute.
A Reading System may continue further, and may display any item from the chain. In the absence of element-specific i. Fallback chains must terminate; circular references are not permitted.
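The chain walk described above can be sketched as a small function: follow fallback ids until an item with a displayable media type turns up, and stop if an id repeats. This is only an illustration under simplifying assumptions: items are plain dictionaries rather than parsed XML, the fallback-style attribute is ignored, and the item ids and media types are made up.

```python
def resolve_fallback(item_id, items, supported_types):
    """Walk the fallback chain from item_id; return the first displayable id or None."""
    visited = set()
    current = item_id
    while current is not None and current not in visited:
        visited.add(current)
        item = items.get(current)
        if item is None:
            # Dangling fallback reference: nothing to display.
            return None
        if item["media-type"] in supported_types:
            return current
        current = item.get("fallback")
    # The chain ended, or looped back on itself, without a displayable item.
    return None


# Hypothetical manifest items keyed by id.
items = {
    "item1": {"media-type": "application/x-demo", "fallback": "item2"},
    "item2": {"media-type": "image/svg+xml", "fallback": "item3"},
    "item3": {"media-type": "image/png"},
    "loopA": {"media-type": "application/x-demo", "fallback": "loopB"},
    "loopB": {"media-type": "application/x-demo", "fallback": "loopA"},
}

if __name__ == "__main__":
    print(resolve_fallback("item1", items, {"image/png"}))
    print(resolve_fallback("loopA", items, {"image/png"}))
```

Tracking visited ids means a circular chain, although invalid, produces no result instead of an endless loop.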
Nevertheless, Reading Systems should not fail catastrophically if they encounter such a loop. The namespace of an Out-Of-Line XML Island item must be specified with the required-namespace attribute, and its fallback must be identified either with a fallback attribute pointing to another item or by providing CSS styling that can be used to render the island via the fallback-style attribute. If the fallback-style attribute is specified, a Reading System may choose to process the Out-Of-Line XML Island (even though it cannot natively process the vocabulary or Extended Modules used in the island) using the stylesheet specified by the fallback-style attribute's value, which must contain a reference to the id of the item containing an href to the stylesheet desired for the island.
In this case, and with non-Preferred Vocabulary islands utilizing Extended Modules, the required-modules attribute must be present along with the required-namespace attribute.
The attribute value for required-modules must be a comma-separated list containing the name(s) of the Extended Modules used in the Out-of-Line XML Island. The names of the modules are not case-sensitive, unless specifically defined otherwise in the XML vocabulary specification.
Spaces in module names must be replaced with "-" for listing in the required-modules attribute value. Note that listing the names of non-Extended Modules in a required-modules attribute value is also allowed; such modules are always considered to be supported if the XML vocabulary is supported. This can be useful both for clarity and in the case where there is a possibility that some modules could become optional in the later revisions of the specification e.
It is allowed, and sometimes useful, to provide a required-modules attribute on an item specifying a non-Preferred Vocabulary Out-Of-Line XML Island — either for clarity or to specify Extended Modules needed from the non-Preferred Vocabulary.
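The formatting rules for required-modules (comma-separated, spaces replaced with "-", names compared without regard to case) might be handled along these lines. The module names used here are illustrative only, and the sketch assumes the default case-insensitive comparison, not a vocabulary that defines its own rule.

```python
def format_required_modules(names):
    """Join module names into a required-modules attribute value."""
    return ",".join(name.strip().replace(" ", "-") for name in names)


def parse_required_modules(value):
    """Split an attribute value into a set of lower-cased module names."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


def modules_supported(value, supported):
    """True if every listed module is in the Reading System's supported set."""
    return parse_required_modules(value) <= {s.lower() for s in supported}


if __name__ == "__main__":
    # Hypothetical module names for demonstration.
    value = format_required_modules(["Ruby", "Server-side Image Map"])
    print(value)
    print(modules_supported(value, ["ruby", "server-side-image-map"]))
```

Normalising to lower case once, at parse time, keeps the later comparison a plain subset test.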
However, fallback information must be provided for Reading Systems that do not have such native processing ability. In the above example when processing item1, a Reading System could choose to render item1 natively, item2 natively, item2 with only styling from css1, item2.
When processing item4, a Reading System could choose to render item4 natively or item4 with only styling from css1. Inclusion of the required-namespace attribute is not required in item elements referring to XML documents authored in Preferred Vocabularies unless Extended Modules are used, in which case both required-namespace and required-modules attributes must be provided. Following manifest, there must be one and only one spine element, which contains one or more itemref elements.
The order of the itemref elements organizes the associated OPS Content Documents into the linear reading order of the publication. When a document with a media type not from this list (or a document whose fallback chain doesn't include a document with a media type from this list) is referenced in spine, Reading Systems must not include it as part of the spine.
It is valid for this item to appear in the spine because the fallback chain includes (in this case, terminates with) an OPS Content Document. In addition, a specific spine item (from the perspective of its id attribute value in manifest) must not appear more than once in spine. Should a Reading System encounter, by such reference, an OPS Content Document not listed in spine as required in this specification, the Reading System should add it to spine (the placement at the discretion of the Reading System) and assign the value of the linear attribute to no (see next).
It is important that the publication author include some kind of internal reference, such as a hypertext link, to any OPS Content Document that is declared to be auxiliary; it is recommended that references be added to NCX for all auxiliary content. At least one itemref in spine must be declared primary. Specifying whether an OPS Content Document is primary or auxiliary is useful for Reading Systems which opt to present auxiliary content differently than primary content.
For example, a Reading System might opt to render auxiliary content in a popup window apart from the main window which presents the primary content. For an example of the types of content that may be considered auxiliary, refer to the example below and the subsequent discussion.
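Two of the spine constraints stated above lend themselves to an automated check: no item may be referenced more than once, and at least one itemref must be primary. The sketch below assumes the OPF 2.0 namespace and that linear defaults to "yes" when the attribute is absent; the sample spines and the function name check_spine are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed: the OPF 2.0 package namespace.
NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical spine that breaks both rules.
BAD_SPINE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <spine toc="ncx">
    <itemref idref="c1" linear="no"/>
    <itemref idref="c1" linear="no"/>
  </spine>
</package>"""

# Hypothetical spine that satisfies both rules.
GOOD_SPINE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <spine toc="ncx">
    <itemref idref="c1"/>
    <itemref idref="notes" linear="no"/>
  </spine>
</package>"""


def check_spine(opf_xml):
    """Return a list of problems with the spine's itemref elements."""
    root = ET.fromstring(opf_xml)
    itemrefs = root.findall("opf:spine/opf:itemref", NS)
    problems = []
    seen = set()
    for ref in itemrefs:
        idref = ref.get("idref")
        if idref in seen:
            problems.append(f"duplicate itemref: {idref}")
        seen.add(idref)
    # An itemref without a linear attribute is treated as primary (assumption).
    if not any(ref.get("linear", "yes") == "yes" for ref in itemrefs):
        problems.append("no primary itemref in spine")
    return problems


if __name__ == "__main__":
    print(check_spine(BAD_SPINE))
    print(check_spine(GOOD_SPINE))
```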
Reading Systems are not required to differentiate between primary and auxiliary content, and for the requirements and recommendations given in this section may consider all OPS Content Documents in spine to be primary, regardless of the value of the linear attribute. Reading Systems are to use the ordered itemref information in spine to present the publication during reading. Reading Systems must recognize the first primary OPS Content Document in spine to be the beginning of the main reading order of the publication.
Successive primary OPS Content Documents form the remainder of the main reading order in the same order given in spine. Reading Systems may use "next-page" style functionality when moving from one primary OPS Content Document to the next primary one in spine. The spine element must include the toc attribute, whose value is the id attribute value of the required NCX document declared in manifest see Section 2.
Example illustrating spine and the optional linear attribute:
Three of the four are "answer keys," and the fourth is a note of some sort; all four are auxiliary to the main flow of the book and may be viewed separately from the main flow.
Reading Systems which recognize and render auxiliary content separate from primary content will set the main reading order to be the four primary OPS Content Documents: intro, c1, c2 and c3. The auxiliary content documents will be rendered by such Reading Systems, upon activation (such as through a hypertext link or entry in NCX), in some manner distinct from the main reading order.
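To make the primary/auxiliary split concrete, here is a sketch that derives the main reading order from a spine. The primary ids intro, c1, c2 and c3 come from the discussion above; the auxiliary ids (three answer keys and a note) and their positions are assumptions made for this illustration, as is treating a missing linear attribute as "yes".

```python
import xml.etree.ElementTree as ET

# Assumed: the OPF 2.0 package namespace.
NS = {"opf": "http://www.idpf.org/2007/opf"}

# Reconstructed for illustration; the auxiliary ids are hypothetical.
SPINE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <spine toc="ncx">
    <itemref idref="intro"/>
    <itemref idref="c1"/>
    <itemref idref="c1-answerkey" linear="no"/>
    <itemref idref="c2"/>
    <itemref idref="c2-answerkey" linear="no"/>
    <itemref idref="c3"/>
    <itemref idref="c3-answerkey" linear="no"/>
    <itemref idref="note" linear="no"/>
  </spine>
</package>"""


def reading_orders(opf_xml, auxiliary_aware=True):
    """Return (main_order, auxiliary) lists of idref values from the spine."""
    root = ET.fromstring(opf_xml)
    refs = root.findall("opf:spine/opf:itemref", NS)
    if not auxiliary_aware:
        # A Reading System may treat every spine document as primary.
        return [r.get("idref") for r in refs], []
    main = [r.get("idref") for r in refs if r.get("linear", "yes") == "yes"]
    aux = [r.get("idref") for r in refs if r.get("linear", "yes") == "no"]
    return main, aux


if __name__ == "__main__":
    main, aux = reading_orders(SPINE)
    print("main:", main)
    print("auxiliary:", aux)
```

With auxiliary_aware=False the function returns all eight documents in spine order, which corresponds to a Reading System that does not differentiate between the two kinds of content.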
It is important that the publication author provide the necessary references to the auxiliary content documents, otherwise this content might not be reachable in some auxiliary-aware Reading Systems.
This is especially useful for Reading Systems which provide print output, where it is important that all the information in the OPS Content Documents be printed in an author-determined linear order.
A Reading System may, at its discretion, provide both rendering options to the user. The NCX is a portion (Section 8) of this comprehensive multimedia standard. Some optional elements and metadata items are not needed to implement the NCX for this specification. All "exceptions" are described in Section 2.
The NCX is similar to a table of contents in that it enables the reader to jump directly to any of the major structural elements of the document, i. It can be visualized as a collapsible tree familiar to PC users. Its development was motivated by the need to provide quick access to the main structural elements of a document without the need to parse the entire document. Other elements such as pages, footnotes, figures, tables, etc.
It is important to emphasize that these navigation features are intended as a convenience for users who want them, and not a burden to those who do not. The alternative guide to the book may be provided for those users not requiring the navigation features of the NCX. A Reading System should have the ability to, at user selection, provide access to the NCX navMap in a fashion that allows the user to activate the links provided in the navMap, thus relocating the application's current reading position to the destination described by the selected NCX navPoint.
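As a rough illustration of how a Reading System might expose the navMap, the sketch below flattens nested navPoint elements into (depth, label, target) entries that a user interface could present as a collapsible tree. It assumes the NCX namespace http://www.daisy.org/z3986/2005/ncx/; the sample document is invented and trimmed to the navMap only, so it is not a complete NCX file.

```python
import xml.etree.ElementTree as ET

# Assumed: the DAISY NCX 2005-1 namespace.
NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Hypothetical, trimmed NCX: a full file would carry more than the navMap.
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <navMap>
    <navPoint id="p1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="c1.html"/>
      <navPoint id="p1-1" playOrder="2">
        <navLabel><text>Section 1.1</text></navLabel>
        <content src="c1.html#s1"/>
      </navPoint>
    </navPoint>
    <navPoint id="p2" playOrder="3">
      <navLabel><text>Chapter 2</text></navLabel>
      <content src="c2.html"/>
    </navPoint>
  </navMap>
</ncx>"""


def nav_entries(ncx_xml):
    """Flatten navMap into (depth, label, src) tuples in document order."""
    root = ET.fromstring(ncx_xml)
    entries = []

    def walk(parent, depth):
        # Only direct navPoint children, so nesting depth is preserved.
        for point in parent.findall("ncx:navPoint", NCX_NS):
            label = point.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
            content = point.find("ncx:content", NCX_NS)
            src = content.get("src") if content is not None else None
            entries.append((depth, label, src))
            walk(point, depth + 1)

    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is not None:
        walk(nav_map, 0)
    return entries


if __name__ == "__main__":
    for depth, label, src in nav_entries(SAMPLE_NCX):
        print("  " * depth + f"{label} -> {src}")
```

Activating an entry would then amount to moving the reading position to the src value of the selected navPoint.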
Reading System implementors should be aware that in a forthcoming major revision of the EPUB specification, it likely will become a compliance criterion for Reading Systems to support the NCX navMap, pageList and navList as described above.
The NCX-referencing item must not contain any fallback information (required-namespace, fallback or fallback-style attributes). The version and xmlns attributes on the ncx element must be explicitly specified in the document instance, using values drawn from the above-named DTD. One or several navList elements may be included to allow navigation to other arbitrary constructs in the content (see the below informative example).
This difference causes the following exceptions to be noted from Section 8 in that standard:
XML Islands may be referenced from the spine. In the event that a Reading System cannot display the XML Island correctly, then the standard fallback methodology defined in the Open Publication Structure must be used. In short, the Reading System must display the chosen fallback for an XML Island in the event that the island itself cannot be displayed. Much as a tour guide might assemble points of interest into a set of sightseers' tours, a content provider could assemble selected parts of a publication into a set of tours to enable convenient navigation.
An OPS Package Document may, but need not, contain one tours element, which in turn contains one or more tour elements. Each tour must have a title attribute, intended for presentation to the user. Reading Systems may use tours to provide various access sequences to parts of the publication, such as selective views for various reading purposes, reader expertise levels, etc. Because Reading Systems are not required to implement tour support, content providers should also provide other means of accessing content referenced from tours.
Each tour element contains one or more site elements, each of which must have an href attribute and a title attribute. The href attribute must refer to an OPS Content Document included in the manifest, and may include a fragment identifier as defined in section 4.
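The tour requirements above (a title on every tour, at least one site, href and title on every site, and an href that points into the manifest) could be verified roughly as follows. This is a simplified sketch assuming the OPF 2.0 namespace: it matches the href against manifest entries but does not confirm that the target is an OPS Content Document, and the sample package and the function name check_tours are made up.

```python
import xml.etree.ElementTree as ET

# Assumed: the OPF 2.0 package namespace.
NS = {"opf": "http://www.idpf.org/2007/opf"}

# Hypothetical package with one broken site and one empty, untitled tour.
SAMPLE = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="c1" href="c1.html" media-type="application/xhtml+xml"/>
  </manifest>
  <tours>
    <tour id="t1" title="Quick tour">
      <site title="Start" href="c1.html#intro"/>
      <site title="Missing" href="c9.html"/>
    </tour>
    <tour id="t2"/>
  </tours>
</package>"""


def check_tours(opf_xml):
    """Return a list of problems found in the tours element, if any."""
    root = ET.fromstring(opf_xml)
    hrefs = {i.get("href") for i in root.findall("opf:manifest/opf:item", NS)}
    problems = []
    for tour in root.findall("opf:tours/opf:tour", NS):
        name = tour.get("id", "?")
        if not tour.get("title"):
            problems.append(f"tour {name}: missing title")
        sites = tour.findall("opf:site", NS)
        if not sites:
            problems.append(f"tour {name}: no site elements")
        for site in sites:
            href = site.get("href")
            if not href or not site.get("title"):
                problems.append(f"tour {name}: site needs href and title")
                continue
            # A fragment identifier is allowed; compare only the document part.
            if href.split("#", 1)[0] not in hrefs:
                problems.append(f"tour {name}: {href} is not in the manifest")
    return problems


if __name__ == "__main__":
    for problem in check_tours(SAMPLE):
        print(problem)
```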