Reflections on some early E-book experiments

It’s nearly two years since I got a Kindle, and back then I quickly started filling it with as much free stuff as possible from Amazon. I even got about halfway through Middlemarch before I started exploring the other possibilities of the device.

First, I discovered the Amazon email service for document delivery:

You and your approved contacts can e-mail personal documents to your Kindle Keyboard through your Send-to-Kindle e-mail address. You must ensure that:

  • You have approved the sender’s e-mail address.
  • You gave the sender your Send-to-Kindle e-mail address.
  • Your document is a supported file type.

That proved pretty successful and is a free service too (as long as you use wifi, not 3G, to pick up the document). At a Repositories Support Project event in Sheffield I even suggested that a Download To My Kindle button might be a useful addition to an institutional repository. The germ of an idea!
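For anyone wanting to script this step, here is a minimal sketch of how a document could be packaged as an email attachment for a Send-to-Kindle address, using only Python's standard library. The function name and the addresses are hypothetical, and it assumes nothing about the Amazon service beyond what is quoted above (an approved sender, a supported file type).

```python
import mimetypes
from email.message import EmailMessage


def build_kindle_email(sender, kindle_address, filename, payload):
    """Build an email carrying one document as an attachment.

    sender must be an address approved for the Kindle account;
    payload is the raw bytes of the document.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = kindle_address
    msg["Subject"] = filename
    msg.set_content("Document attached.")

    # Guess the MIME type from the file extension, with a generic fallback.
    ctype, _encoding = mimetypes.guess_type(filename)
    maintype, subtype = (ctype or "application/octet-stream").split("/", 1)
    msg.add_attachment(payload, maintype=maintype, subtype=subtype,
                       filename=filename)
    return msg
```

The resulting message could then be handed to smtplib's send_message on whatever mail server the sender normally uses.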

I next discovered how to use Calibre to automate generating E-books/E-magazines from various content sources, such as online newspapers, in both Kindle’s MOBI format and in the open EPUB format (supported on most other tablet and e-book platforms).

Then I thought about making a ‘proper’ book. To save having to write one myself, I thought I’d create an anthology of short fiction. I downloaded ECUB and started harvesting some classic short stories from Project Gutenberg. The HTML versions of the stories on Gutenberg needed some work to normalise the markup across twenty files – nothing particularly difficult, though I did come to regret deciding to standardise the use of single and double quotation marks. The result was a little e-Short Story Anthology – potentially a completely public domain book that I could share or even try to sell online. (In fact one story that I used, and the cover image, are not public domain, but I’d run out of steam by then, and just wanted to get my proof of concept out of the door.)

Playing with ECUB it was easy to see how an EPUB E-book was structured. Essentially it is little different from a simple website. EPUB 2.0 was defined by three open standard specifications, the Open Publication Structure (OPS), Open Packaging Format (OPF) and Open Container Format (OCF), but at the heart of the E-book is a familiar group of HTML, CSS and image files – tightly bound together by the OCF-specified structure, the OPS content specification, and the OPF XML, and zipped into a single file. (EPUB 3 introduces some differences, but is essentially the same.)
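To make that structure concrete, here is a rough sketch of packaging a few HTML chapters as an EPUB 2 file with Python's zipfile module. It is not how ECUB works internally, and the function name, file layout (an OEBPS folder) and placeholder identifier are my own assumptions. I have not run the output through a validator, so treat it as an illustration of the layout rather than a finished tool.

```python
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

# OCF: this fixed file tells a reader where to find the OPF package document.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title, chapters, identifier="urn:uuid:example-anthology"):
    """Return the bytes of a minimal EPUB 2 file.

    chapters is a list of (filename, chapter_title, xhtml_text) tuples.
    The default identifier is a placeholder, not a real UUID.
    """
    manifest, spine, nav = [], [], []
    for n, (name, chapter_title, _text) in enumerate(chapters, start=1):
        manifest.append(f'<item id="c{n}" href={quoteattr(name)} '
                        'media-type="application/xhtml+xml"/>')
        spine.append(f'<itemref idref="c{n}"/>')
        nav.append(f'<navPoint id="n{n}" playOrder="{n}"><navLabel>'
                   f'<text>{escape(chapter_title)}</text></navLabel>'
                   f'<content src={quoteattr(name)}/></navPoint>')

    # OPF: metadata, a manifest of every file, and the reading order (spine).
    opf = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{escape(identifier)}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    {''.join(manifest)}
  </manifest>
  <spine toc="ncx">{''.join(spine)}</spine>
</package>
"""
    # NCX: the navigation file that EPUB 2 readers use for their contents menu.
    ncx = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content={quoteattr(identifier)}/></head>
  <docTitle><text>{escape(title)}</text></docTitle>
  <navMap>{''.join(nav)}</navMap>
</ncx>
"""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/toc.ncx", ncx, compress_type=zipfile.ZIP_DEFLATED)
        for name, _chapter_title, text in chapters:
            zf.writestr(f"OEBPS/{name}", text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file with an .epub extension gives something an E-book reader can at least attempt to open; CSS and images would be added to the manifest in the same way as the chapters.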

I also noticed that ECUB could easily create a Table Of Contents page automatically – just like the web CGI scripts we still sometimes write – and also that it was easy to use hyperlinks between and within the HTML files in the package. This kind of added editorial value, by the way, is noticeably absent from many of the free E-books available on Amazon and elsewhere.
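I don't know how ECUB builds its contents page, but the idea is simple enough to sketch. The version below assumes each story's title sits in the first h1 element of its HTML file, and falls back to the filename otherwise; the function names are hypothetical.

```python
import html
import re

H1 = re.compile(r"<h1[^>]*>(.*?)</h1>", re.IGNORECASE | re.DOTALL)
TAG = re.compile(r"<[^>]+>")


def chapter_title(markup, fallback):
    """Return the text of the first <h1>, or fallback if there is none."""
    match = H1.search(markup)
    if not match:
        return fallback
    # Drop any inline markup inside the heading and tidy the whitespace.
    text = html.unescape(TAG.sub("", match.group(1)))
    return " ".join(text.split()) or fallback


def build_toc(chapters, heading="Contents"):
    """Build a contents page linking to each (filename, markup) chapter."""
    links = []
    for filename, markup in chapters:
        title = chapter_title(markup, fallback=filename)
        href = html.escape(filename, quote=True)
        links.append(f'  <li><a href="{href}">{html.escape(title)}</a></li>')
    items = "\n".join(links)
    safe_heading = html.escape(heading)
    return (
        f"<html><head><title>{safe_heading}</title></head><body>\n"
        f"<h1>{safe_heading}</h1>\n<ol>\n{items}\n</ol>\n</body></html>\n"
    )
```

A regular expression is a blunt instrument for HTML, but for a couple of dozen hand-normalised files it is probably good enough.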

In my explorations, I also discovered that, although Calibre could convert my EPUB to an acceptable Kindle/MOBI format, some of the finer points of formatting that I had implemented with CSS in EPUB are not supported by the Kindle. For example, I’d used CSS rules to render quotation marks:

q:before { content: "\2018"; }
q:after { content: "\2019"; }
q q:before { content: "\201c"; }
q q:after { content: "\201d"; }

The reasoning behind this was that it would make it easy to switch between the two common typesetting conventions: single outer/double inner quotation marks and double outer/single inner quotation marks. But to get the Kindle to show the quotation marks at all, I’d have to revisit the original HTML files and ensure all such niceties were rendered in text or markup only. I may do that one day.
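If I did, it could probably be scripted rather than done by hand. Here is a minimal sketch, assuming the quotations are marked up with q elements as in the CSS above: it replaces each opening and closing q tag with a literal quotation mark, alternating by nesting depth. The function name and the style labels are my own, and unlike the CSS (which only covers two levels) it keeps alternating for deeper nesting.

```python
import re

Q_TAG = re.compile(r"<(/?)q\b[^>]*>", re.IGNORECASE)

# (opening, closing) marks for the outer level, then the inner level.
STYLES = {
    "single-outer": [("\u2018", "\u2019"), ("\u201c", "\u201d")],
    "double-outer": [("\u201c", "\u201d"), ("\u2018", "\u2019")],
}


def flatten_quotes(markup, style="single-outer"):
    """Replace <q>...</q> tags with literal quotation marks."""
    pairs = STYLES[style]
    depth = 0

    def swap(match):
        nonlocal depth
        if match.group(1):  # a closing </q> tag
            depth = max(depth - 1, 0)
            return pairs[depth % 2][1]
        mark = pairs[depth % 2][0]  # an opening <q> tag
        depth += 1
        return mark

    return Q_TAG.sub(swap, markup)
```

Switching convention is then a matter of passing style="double-outer" when generating the Kindle version.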

Turning back to EPrints (and also the theme of that presentation for RSP), we know we can get lists of items in the repository in all sorts of formats – RSS, XML, HTML, Endnote, etc. So it seems we have all the building blocks we need to write a script that takes a list of items in a repository, retrieves them, converts them (if necessary and possible), wraps them up according to the EPUB standard, and makes them available for download, or even emails them directly to your tablet or reader. Anthologizr in a nutshell, perhaps.
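The first of those building blocks, turning an exported list into something a script can work through, might look like the sketch below. It assumes the export is a plain RSS 2.0 feed whose item elements carry a title and a link; I have not checked this against what any particular EPrints installation emits, and the names and example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FeedItem:
    title: str
    link: str


def parse_rss_items(rss_text):
    """Return a FeedItem for every <item> in an RSS 2.0 feed that has a link."""
    root = ET.fromstring(rss_text)
    items = []
    for node in root.iter("item"):
        title = (node.findtext("title") or "").strip()
        link = (node.findtext("link") or "").strip()
        if link:  # an item without a link cannot be retrieved later
            items.append(FeedItem(title=title, link=link))
    return items


SAMPLE = """<rss version="2.0"><channel><title>Search results</title>
<item><title>First paper</title><link>https://repository.example.org/1/</link></item>
<item><title>Second paper</title><link>https://repository.example.org/2/</link></item>
<item><title>No link here</title></item>
</channel></rss>"""

for item in parse_rss_items(SAMPLE):
    print(item.title, item.link)
```

Each link would then be fetched, converted where necessary and possible, and handed to the packaging step.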


Use case #1

Our simplest use case:

We are preparing a study-pack of reading materials related to Soviet Literature, so we go to the UCL Discovery repository and use the EPrints advanced search to search for articles with the words “soviet literature” where full-text is available.

The repository returns us a screen of results, and also gives us the wherewithal to export all these results as citations, in several popular formats.

But instead of a list of citations, what we really want to do is export all the items (full-text included) as an E-book, which we can send to our iPad, Kindle or other E-book reader, by the simplest means possible.

That’s the basics of Anthologizr.

However, there are a number of refinements necessary to make this truly useful, chief among them that the user needs to be able to select/deselect items for inclusion in the E-book.

One might reasonably object that not everything one wants is likely to be in any one institutional repository, but there are many meta-repository services, among them MIMAS Repository Search and CoRE, which could deploy the Anthologizr approach.

Alternatively, one could envisage a growing, collaborative repository, either institutional or national, where the kinds of articles and chapters commonly given out in study packs are gradually accumulated, with OA licences and in E-book friendly formats.

Finally (for now) one might also object that not everything one might want or need is Open Access. That may just be a matter of time, or conscience.

A long time ago…

Xeroxed study packs from the early 1990s

Back in the 1980s, there were no iPads or Kindles, nor E-books, nor even PDFs. But in most universities there were lots of photocopiers, so course tutors would prepare photocopied booklets of key set texts, staple them together between brightly-coloured card, and issue them to students at the start of term (sometimes for a small fee, to cover the cost of paper and toner). As long as everyone stayed the right side of the CLA, it was a relatively cheap and convenient way to make sure students had access to core texts, without having to buy or locate a large number of books or journals themselves.

It is 2012 and we now have iPads and affordable E-book readers. Yet piles of photocopied course booklets are still a common sight on campuses. The contents of the booklets range from anthologies of fiction and criticism, in the humanities, to case studies and research, in the social and natural sciences.

What if tutors had an easy way to create such booklets and packages as E-books, rather than physical objects? Some courses do issue PDFs of scanned texts, via email or VLEs, but these are typically sent as individual articles, and are often only scanned images of the text, with no support for searching or indexing the article.

The open EPUB standard, and the increasing standardisation of E-books, offer a natural, and highly functional, successor to the scanned booklet, and to the ‘dumb’ images in scanned PDFs. With the growing availability of full-text Open Access and public domain items, in Institutional Repositories and other online systems, there is an opportunity to use such items as the raw materials of course study packs, providing greater accessibility and usability of texts, as well as promoting use of OA materials.

This is the essence of the Anthologizr project: to develop a demonstrator repository system in which users can take an arbitrary selection of items (text or images), and export them – ‘one-click’ style – as a viable, usable E-book anthology in EPUB format, for individual use, or for sharing with students by email or in VLEs.

There will be lots of challenges on the way, but this is where it begins.