Tagged mathematics in PDFs for accessibility and other purposes

Ross Moore*

*Corresponding author for this work

    Research output: Contribution to journal, Conference paper, peer-review

    3 Citations (Scopus)


    PDF has been the preferred format for publishing mathematics for many years now. With changes to methods of delivery (i.e., electronic rather than predominantly paper) there need to be corresponding enhancements in the document format. Not least among the reasons for this are implicit legal obligations to satisfy Accessibility criteria. The answer developed for PDF is tagging of document structure and content types, as described in the PDF/UA Implementation Guide [4]. Wikipedia describes this as "not a separate file-format but simply a way to use PDF" [12]; when it is supported, "reader software will be able to reliably reflow text onto small screens, provide powerful navigation options, transform text appearance, improve search engine functionality, aid in the selection and copying of text, and more" [12]. Academic publishers are starting to see these benefits and will doubtless soon require at least minimal tagging of online PDF documents for Accessibility purposes, in a similar way to how Accessibility tags have been incorporated into HTML. Here is a brief overview of work done by the author to incorporate full MathML tagging of mathematical content in documents produced primarily using the LaTeX typesetting system. Since the publicly available TeX software was not written to support such tagging of document content, further software tools are also required. These include a modified version of pdfTeX, a self-developed Perl program, TeX to MathML conversion software, some standard Unix command-line utilities, and extensive use of self-written TeX and LaTeX macros. As this work is a continuation of work presented at the CICM meetings in 2009 [5], we concentrate here mostly on the advancements made since then.
These include the ability to capture complete math environments from a running LaTeX job and to automatically invoke a conversion of the LaTeX source of the particular piece of mathematics into Presentation MathML, using whatever appropriate conversion software is available. Previously the MathML version needed to be available independently of the LaTeX source. Now this conversion can be done 'on-the-fly', using TeX4ht for example, before merging the MathML and LaTeX descriptions of the same piece of mathematics into a new extended LaTeX description incorporating macros that generate the appropriate tagging and enrichment to satisfy Accessibility requirements. Such automatic conversion and merging can add significantly to the total running time for the whole job, so an indexing system has been developed which allows the resulting enriched LaTeX description to be reused for multiple occurrences of the same source coding within the same job, and to be available for reuse in subsequent LaTeX runs. Another development is better control over the words produced for alternative text, to be read by screen-readers. Where previously this was largely hard-coded in the enriched LaTeX description, it is now produced by macros whose expansion text can be customised. This allows for the possibility of generating speech text in different languages, or customising what is to be spoken according to the field of mathematics being described within the document. Such customisations can be done at the LaTeX level, so that a document author need not be involved with the highly intricate details of conversion to MathML and the enhancements required for tagging.
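
The indexing idea described above can be illustrated with a minimal sketch. It is not the author's implementation (the abstract mentions Perl and TeX macros, not Python), and every name in it is hypothetical: `EnrichedMathIndex`, `source_key`, the JSON index file, and the `convert` callback, which stands in for the slow conversion-and-merging step. The sketch assumes that "the same source coding" means an exact match of the LaTeX source, so entries are keyed by a hash of the unmodified source text.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict


def source_key(latex_source: str) -> str:
    """Index key: SHA-256 of the exact LaTeX source (assumed matching rule)."""
    return hashlib.sha256(latex_source.encode("utf-8")).hexdigest()


class EnrichedMathIndex:
    """Maps the source of a math environment to its enriched description.

    Hypothetical illustration only: `convert` stands in for the expensive
    step (LaTeX to MathML conversion followed by merging).
    """

    def __init__(self, index_file: Path, convert: Callable[[str], str]) -> None:
        self.index_file = index_file
        self.convert = convert
        self.entries: Dict[str, str] = {}
        # Reload entries saved by an earlier run, if any.
        if index_file.exists():
            self.entries = json.loads(index_file.read_text(encoding="utf-8"))

    def enriched(self, latex_source: str) -> str:
        """Return the enriched description, converting only on a cache miss."""
        key = source_key(latex_source)
        if key not in self.entries:
            self.entries[key] = self.convert(latex_source)
        return self.entries[key]

    def save(self) -> None:
        """Persist the index so that subsequent runs can reuse it."""
        self.index_file.write_text(
            json.dumps(self.entries, indent=2), encoding="utf-8"
        )
```

In this sketch, a repeated formula within one job triggers `convert` only once, and calling `save()` at the end of the job lets a later run load the same entries instead of converting again.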

    Original language: English
    Pages (from-to): 1-11
    Number of pages: 11
    Journal: CEUR Workshop Proceedings
    Publication status: Published - 2013
    Event: 8th Workshop on Mathematical User Interfaces - Bath
    Duration: 9 Jul 2013 to 10 Jul 2013

