Full Functionality of an Audiobook: From an MP3 File to True Accessibility

Written by: Monika Zarczuk-Engelsma
Posted on: Oct 21, 2025
Category: Accessibility
For many readers, an audiobook is simply a recording. For a person with a visual impairment, however, “accessibility” means much more than that. A fully accessible audiobook gives the listener equal control over its content: they can navigate it precisely, find quotes, skip chapters, use bookmarks and adjust the playback speed without compromising the quality. Where necessary, the audio is synchronised with the text and includes descriptions of non-narrative elements. In other words, accessibility is not only “spoken text” but also full functionality, comparable to that of a well-prepared text file.
Who Reads Audiobooks
Just a dozen or so years ago, audiobooks were thought of mainly as spoken books published by libraries for blind people. In time, they started to appear in commercial bookstores, and thanks to the development of smartphones and streaming platforms they became one of the most popular ways of consuming literature. Today, everyone listens to audiobooks: on the way to work, while running or cooking. They are no longer a niche but a fully-fledged segment of the publishing market, and one that keeps growing. At the same time, more and more people are aware that an audiobook should be accessible to everyone on both a technical and a functional level.
The first audiobooks were created in the 1930s. Libraries in the United States and Great Britain recorded books on phonograph records so that blind people could access literature. A single book would sometimes take up several dozen records! Today, in the digital age, whole libraries fit in one’s pocket and audiobooks are listened to on a massive scale.
According to Guinness World Records, the longest audiobook is Shree Haricharitramrut Sagar, which is 146444 minutes and 52 seconds long (about 2440 hours and 44 minutes, more than 101 days of listening). The record was set by Gyanjivandasji Swami and Ishwarcharandasji Swami from India, who intended to disseminate the biography of Lord Swaminarayan. This is an exceptional case in the history of audiobooks, as the work is not a typical literary book but a spiritual, religious text. The recording is an interesting example of technology being used in the context of spirituality and religion.
By comparison, the audio version of a typical novel is usually between 8 and 20 hours long. Extreme cases like this one show just how diverse the world of audiobooks can be.
The Department for the Blind of the Main Library of Work and Social Insurance in Warsaw holds about 31 thousand books on cassettes, 6500 on CDs, 15 thousand works in the Czytak format and 16500 works in the DAISY format.
It is worth noting that public libraries in Poland are also making more and more audiobooks available, with the needs of blind and visually impaired people in mind. For example, more than 2141 audiobooks are available in the Library in Wisła, and 3667 in Cieszyn.
What a Fully Accessible Audiobook Needs – A List of Key Elements
An audiobook is commonly seen as a book divided into parts, read aloud and recorded in a studio. The format offers far more than that, however, and has a lot of potential when it comes to accessibility. What needs to be done to make an audiobook functional and useful?
A navigable structure
A user should be able to move quickly between the levels of the structure: chapter → subchapter → section → paragraph → page (when it makes sense). This is much more than “continuous playback” and requires the structure to be marked in the file (e.g. headers, tables of contents, navigation maps). The DAISY and EPUB 3 formats offer mechanisms which enable such navigation.
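As a rough illustration of what marked structure makes possible, the sketch below models a table of contents as a list of navigation points, each with a nesting level and a start time, and uses it to answer two questions a player needs to answer: "where am I?" and "where is the next heading at this level?". The `NavPoint` class, the sample titles and the timestamps are hypothetical; DAISY and EPUB 3 store this information in their own navigation documents.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NavPoint:
    level: int     # 1 = chapter, 2 = subchapter, 3 = section
    title: str
    start: float   # start offset in the audio, in seconds


# Hypothetical table of contents, sorted by start time.
TOC: List[NavPoint] = [
    NavPoint(1, "Chapter 1", 0.0),
    NavPoint(2, "1.1 Background", 95.0),
    NavPoint(2, "1.2 Method", 410.0),
    NavPoint(1, "Chapter 2", 900.0),
    NavPoint(2, "2.1 Results", 960.0),
]


def current_point(toc: List[NavPoint], position: float) -> NavPoint:
    """Return the navigation point that contains the playback position."""
    starts = [point.start for point in toc]
    index = max(bisect_right(starts, position) - 1, 0)
    return toc[index]


def next_at_level(toc: List[NavPoint], position: float, level: int) -> Optional[NavPoint]:
    """Return the next heading at the chosen level or a higher one, if any.

    Moving by subchapter (level 2) also stops at chapter headings (level 1),
    so the listener never jumps silently across a chapter boundary.
    """
    for point in toc:
        if point.start > position and point.level <= level:
            return point
    return None


if __name__ == "__main__":
    position = 500.0  # seconds into the recording
    print("Now in:", current_point(TOC, position).title)
    print("Next chapter:", next_at_level(TOC, position, level=1))
```

A plain MP3 file carries none of this information, which is why a listener can only scrub through it by time.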
Coordinated / synchronised audio and text (if there is text)
When publishing ebooks such as textbooks and guidebooks, or audiobooks whose text can be followed while listening, consider using synchronisation mechanisms (SMIL/Media Overlays in EPUB 3). The reader can then jump to a specific sentence or paragraph and hear it at once. This makes learning, quoting and finding passages much easier.
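To make the idea of synchronisation more concrete, here is a minimal sketch that generates a Media Overlay document: each `par` element pairs a text fragment with the audio clip that narrates it. The element and attribute names (`smil`, `body`, `par`, `text`, `audio`, `clipBegin`, `clipEnd`) come from EPUB 3 Media Overlays; the file names, fragment ids and timings are invented for illustration, and a real publication would also need the matching manifest entries and duration metadata in its package document.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"


def clock(seconds: float) -> str:
    """Format a number of seconds as an h:mm:ss.mmm clock value."""
    millis = round(seconds * 1000)
    hours, rest = divmod(millis, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_overlay(text_file: str, audio_file: str, fragments) -> str:
    """Build a SMIL overlay from (element_id, begin_seconds, end_seconds) tuples."""
    ET.register_namespace("", SMIL_NS)  # serialise SMIL as the default namespace
    smil = ET.Element(f"{{{SMIL_NS}}}smil", {"version": "3.0"})
    body = ET.SubElement(smil, f"{{{SMIL_NS}}}body")
    for number, (element_id, begin, end) in enumerate(fragments, start=1):
        par = ET.SubElement(body, f"{{{SMIL_NS}}}par", {"id": f"par{number}"})
        # The text element points at a fragment of the XHTML content document.
        ET.SubElement(par, f"{{{SMIL_NS}}}text", {"src": f"{text_file}#{element_id}"})
        # The audio element points at the clip that narrates that fragment.
        ET.SubElement(par, f"{{{SMIL_NS}}}audio", {
            "src": audio_file,
            "clipBegin": clock(begin),
            "clipEnd": clock(end),
        })
    return ET.tostring(smil, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sentence ids and timings for one chapter.
    sentences = [("s1", 0.0, 5.2), ("s2", 5.2, 9.75)]
    print(build_overlay("chapter1.xhtml", "audio/chapter1.mp3", sentences))
```

In practice the timings usually come from an alignment tool or from manual marking during production, not from hand-written lists like this one.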
High recording quality and clear narration
Recording technique and mastering affect comprehensibility. You should record in good conditions with low background noise; use clear diction, an appropriate pace and natural pauses; set the right volume level; and avoid compression that degrades the audio when playback is sped up or slowed down. You can find tips on audio quality in the W3C guidelines.
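One simple way to put a number on "low background level" is to compare the level of the narration with the level of the room tone between sentences. The sketch below does this with RMS levels in decibels. The 20 dB target is borrowed from WCAG success criterion 1.4.7 (Low or No Background Audio); the synthetic sine waves merely stand in for real recordings, and a production workflow would rely on proper loudness metering instead.

```python
import math
from typing import List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in the range -1.0..1.0."""
    if not samples:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square)


def background_margin_db(speech: Sequence[float], room_tone: Sequence[float]) -> float:
    """How many dB the narration sits above the background."""
    return rms_dbfs(speech) - rms_dbfs(room_tone)


def sine(amplitude: float, frequency: float = 220.0,
         rate: int = 8000, seconds: float = 1.0) -> List[float]:
    """Synthetic test tone standing in for real audio (illustration only)."""
    count = int(rate * seconds)
    return [amplitude * math.sin(2 * math.pi * frequency * n / rate)
            for n in range(count)]


if __name__ == "__main__":
    speech = sine(0.5)       # stand-in for a narrated passage
    room_tone = sine(0.01)   # stand-in for the noise between sentences
    margin = background_margin_db(speech, room_tone)
    print(f"margin: {margin:.1f} dB, meets 20 dB target: {margin >= 20}")
```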
Complete metadata and marked accessibility
A publication’s metadata should state which formats it is available in (e.g. EPUB 3 with Media Overlays, DAISY/DTBook, MP3 + SMIL, text version), whether its illustrations are accessible (or described) and, where relevant, what limitations apply (e.g. DRM). Correct metadata makes the publication easier to find and to distribute to libraries for people with disabilities (Bookshare, libraries for blind people).
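As a sketch of what a metadata check might look like before distribution, the snippet below inspects a record for accessibility properties. The property names and sample values follow the schema.org accessibility vocabulary used in EPUB packages; the split into "required" and "recommended" and the record itself are assumptions made for this example, so check them against the specification you are targeting.

```python
from typing import Dict, List

# Assumed grouping for this sketch; verify against your target specification.
REQUIRED = (
    "schema:accessMode",
    "schema:accessibilityFeature",
    "schema:accessibilityHazard",
)
RECOMMENDED = (
    "schema:accessModeSufficient",
    "schema:accessibilitySummary",
)


def missing_metadata(metadata: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """List the accessibility properties that are absent or empty."""
    def absent(names):
        return [name for name in names if not metadata.get(name)]

    return {"required": absent(REQUIRED), "recommended": absent(RECOMMENDED)}


if __name__ == "__main__":
    # Hypothetical record for an EPUB 3 audiobook with synchronised text.
    record = {
        "schema:accessMode": ["auditory", "textual"],
        "schema:accessibilityFeature": ["synchronizedAudioText", "tableOfContents"],
        "schema:accessibilityHazard": ["none"],
    }
    print(missing_metadata(record))
```

A check like this only confirms that the fields are present; whether the declared features are really there still has to be verified by a person.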
Player and assistive interface support
Navigation, bookmarks, playback speed and access to the table of contents need to be available both in the app or player itself and through assistive technologies (screen readers, keyboards, voice control). When creating an audiobook, think about how it is going to be played: readers use different apps and devices, and not all players support “uncommon” mechanisms. Playback guidelines and requirements for Media Overlays are available.
Additional content: descriptions, tables, graphics, equations
If an audiobook contains important visual content (diagrams, tables, graphs, photos), it cannot simply be skipped. You need to provide an alternative in words: short descriptions in the narration or references to full descriptions in the text version. In the case of mathematics or complex tables, consider additional text files containing descriptions (or using MathML and synchronising it), so that the information is accessible in sound form and presented in a logical, comprehensible structure.
“Born accessible” versions, formats and conversions
It is best to think about an audiobook’s accessibility from the very beginning of the production process (born accessible) rather than to adapt an ordinary audio file afterwards. If you need to do the latter, however, make sure that you add the navigation and metadata layers (e.g. DAISY production, EPUB 3 with Media Overlays or an MP3 + SMIL package). Projects which publish guidebooks for publishers show that an “accessibility in the process” workflow saves time and money.
Examples of solutions and standards you should know:
DAISY / NISO Z39.86 – the standard for digital talking books created for users with print disabilities; offers complex navigation levels.
EPUB 3 + Media Overlays (SMIL) – a modern way of synchronising audio with HTML/EPUB content; it allows you to create both a “text with narration” and an “audio-only EPUB” with complex navigation.
W3C WAI – audio and video guidelines – practical recommendations on quality, metadata and WCAG compatibility (e.g. adding redundant sensory information).
Bibliography:
https://www.w3.org/publishing/epub3/epub-mediaoverlays.html
https://www.w3.org/WAI/media/av/
https://daisy.org/info-help/navigable-audio-only-epub3/
https://daisy.org/activities/standards/media-overlays-playback-requirements/
https://daisy.org/activities/standards/daisy-niso-z39-86/
https://inclusivepublishing.org/accessible-audiobook-workflow-guide/
https://eboundcanada.org/cnib-accessible-audiobook-production/