BoutonJones.com

PDF A11y 101: Don't Remediate your PDFs

Read Me

This is not meant to be a complete or exhaustive explanation of PDF Accessibility.

A Brief History of the Portable Document Format

Adobe created the Portable Document Format (PDF) in 1993 to address the incompatibility of different computer systems, which often distorted document formatting during transfer. PDFs offered a solution by enabling the sharing and printing of documents with consistent layout, fonts, and images across various platforms. Regardless of the computer platform, a PDF is meant to display the same way.

Since then, PDFs have evolved beyond simple printing. They are now widely used for eBooks, contracts, interactive forms, and digital archiving. Features like clickable links, multimedia, and digital signatures have made PDFs adaptable to modern digital needs.

Adobe released the PDF as an ISO open standard in 2008. If Adobe were ever to go out of business, the PDF format would remain available and supported.

PDF allows the user to view a file precisely—down to the pixel, essentially, of what the author had intended.
- Bob Wulff, Adobe's Senior Vice President of Cloud Technology
Cited in "Who Created the PDF?" on Adobe.com

PDF vs Web Pages

PDFs present some limitations. Many lack proper tagging or alternative text, hindering accessibility for users of assistive technologies. Additionally, they can become bloated with unnecessary data, impacting loading and sharing speeds. While PDFs excel at preserving document appearance, they are less suitable for editing and adapting to smaller screens compared to other formats.

Comparison: PDF vs Web Pages
Feature | PDF | Web Pages
Accessibility | Limited if not properly tagged | Highly customizable for accessibility
Layout Consistency | Always consistent | Can vary based on screen size and browser (e.g., Responsive Design)
Editability | Difficult to edit without special tools | Easy to update
Interactivity | Supports forms and multimedia | Highly interactive with JavaScript and CSS
File Size | Can become large with images or media | Typically smaller. Loads dynamically

When are PDFs the best option? PDFs are good for:

 

Don't use a PDF where a web page will do as well or better.

 

Related Usability Articles from the Nielsen Norman Group:

Demonstration: Three Identical PDFs

three visually identical pdfs

They have the exact same content, title, headers, font, and colors. So how are they different?

Blank PDFs

The first of three visually identical PDF documents

To users of assistive technology, scanned PDFs are blank. Click the next button to hear how JAWS perceives a scanned PDF.

JAWS identifies the first PDF as a "document," but it doesn't provide any further information. Click the next button to hear how NVDA understands the document.

NVDA is more informative. It announces "Alert: Empty document. Edit list: Document appears to be empty. It may be a scanned image that needs OCR or it may be a malformed document read only."

"A picture may be worth a thousand words --- but not if it's a picture of a thousand words that you're trying to read with accessible technology."
 - David Ondich
  ADA Program Manager for the City of Austin

When you scan a printed document – e.g., a signed contract, police report, corporate policy, or anything requested for discovery – it is no more than an image of the hard copy. It will not be accessible to screen readers in its current form.

If you can't copy and paste the text, it is most likely an image.

Untagged PDFs

The second of three visually identical PDF documents

Untagged PDFs contain true text but they can't be navigated. Screen readers can read the content but they can't find the headers or the lists. The tags have been stripped from the document.

Question: when is a PDF with text an "untagged" document?

Answer 1 of 2: When semantic markup is not used to tag the structural elements in a document, the document is called "untagged." If the source of a PDF is an untagged document, the PDF will also be untagged.

Tagged PDFs
 
  1. Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on PDF's logical structure framework. It defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.
    Cite: PDF Technology Notes
  2. A PDF file that, in addition to text and graphics, contains metadata for text extraction, content reflow, document accessibility, geographic information in PDF containing maps, etc.

    With the correct tags, a screen reader can:
    • Understand where headings fall
    • Follow the correct reading order
    • Identify footnotes & graphics
    • Understand the structure of tables
    • Complete fillable forms

In most cases, tags are necessary in order to make a PDF file comply with Section 508.

Answer 2 of 2: When you print a document to PDF, the text is saved to the PDF but the tags that indicate structure are lost.

One surprising exception is tables. I printed (to PDF) a Word document containing a table. Both cells of the table included multiple lines of text. I expected the screen reader to ignore the table cells and read the entire content left to right. But both JAWS and NVDA read all the lines of text in the first cell before reading all the lines of text in the next cell. I tried using Word's column function. NVDA was able to read a two-column PDF printed from Word correctly.

Export to PDF

Markup/Styles in HTML
Never Print to PDF! Always Export to PDF instead

Never print to PDF. Always export to PDF (or save as PDF). That way the semantic formatting (the tags) will be included.

Export to PDF is a better option than Save as PDF.

Accessible PDFs

This third PDF is properly tagged and contains real text.

The third of three visually identical PDF documents

Screen readers (such as JAWS and NVDA) will identify headers and lists in a tagged PDF.
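The three demonstration PDFs differ in only two properties: whether they contain real text and whether they contain tags. The Python sketch below maps those two properties onto the three cases. The names `PdfKind`, `classify`, and `sniff` are hypothetical, and `sniff` is only a crude heuristic: it assumes that a `/Font` key signals real text and a `/StructTreeRoot` key signals a tag tree, and it will miss both when they sit inside compressed object streams. It is not a substitute for testing with a screen reader or an accessibility checker.

```python
from enum import Enum


class PdfKind(Enum):
    IMAGE_ONLY = "image only (blank to screen readers)"
    UNTAGGED = "real text, no tags"
    TAGGED = "real text with tags"


def classify(has_text: bool, has_tags: bool) -> PdfKind:
    """Map the two properties onto the three demonstration PDFs."""
    if not has_text:
        return PdfKind.IMAGE_ONLY
    return PdfKind.TAGGED if has_tags else PdfKind.UNTAGGED


def sniff(raw: bytes) -> PdfKind:
    """Crude guess from raw bytes (assumption: keys are not compressed)."""
    has_text = b"/Font" in raw            # font resources suggest real text
    has_tags = b"/StructTreeRoot" in raw  # catalog entry that points to the tag tree
    return classify(has_text, has_tags)
```

A caller could read a file with `pathlib.Path("report.pdf").read_bytes()` (a hypothetical file name) and pass the bytes to `sniff`.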

PDF Tutorial (Video)

How to Create Accessible in PDF [SIC] by Using Adobe Acrobat Pro [2017] (1:50)

The video on YouTube

Transcript:

00:00 In this video, PDF Tutorial: How to Create Accessible in pdf by using adobe acrobat pro
00:01 Go to the tool menu and click the Action Wizard and Click Create Accessibly
00:11 select the Add Document Description and click ok and select destination
00:22 fill the information
01:00 now your accessibility is created
01:40 Please Subscribe My channel Thank you for watching

A11y PDF Basics

PDF Titles

In a PDF, the "title style" refers to the text displayed as the title within the document itself.

The "title metadata" is a separate piece of information embedded within the PDF file that describes the document's title and is used for searching and indexing purposes but isn't necessarily visible directly in the document itself.

Saving the Title Metadata

  1. Open the document in the Adobe Acrobat editor.
  2. Navigate to "File" > "Properties."
  3. Select the "Description" tab.
  4. Enter the desired title in the "Title" field.

This will embed the title as metadata within the PDF file.

Display the Title (Metadata)

By default, the Title (Metadata) is empty and the PDF is set to display the File Name instead of the Title (Metadata).

You must manually enter the Title and set Show to display the Document Title.

  1. Open the PDF in Adobe Acrobat.

  2. Go to "File" > "Properties."

  3. Select the "Initial View" tab.

  4. In "Window Options," choose "Document Title" from the "Show" dropdown.
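The two settings above work together, which the following Python sketch tries to model. Inside the file, the title is stored as the `/Title` entry of the document information dictionary, and the "Show" choice is stored as the `/DisplayDocTitle` viewer preference. The function name `title_bar_text` is hypothetical, and the fallback to the file name when the title is empty is an assumption about viewer behavior, not a documented guarantee.

```python
from typing import Optional


def title_bar_text(file_name: str, title: Optional[str], display_doc_title: bool) -> str:
    """Return what a viewer would be expected to show in its title bar."""
    # The title is shown only when BOTH conditions hold:
    # the metadata title is filled in and "Show" is set to Document Title.
    if display_doc_title and title and title.strip():
        return title.strip()
    # Otherwise the viewer is assumed to fall back to the file name.
    return file_name
```

For example, `title_bar_text("rpt_v3.pdf", "Annual Report", False)` returns the file name, because the title was entered but "Show" was never switched to Document Title.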

Optical Character Recognition (OCR)

OCR Demo (Video)

OCR: How To Convert Scanned Document (PDF) To Text Using Adobe Acrobat Professional (:37)

 

This video has no narration, transcript, or closed captioning.

 

In this video, a PDF is displayed inside Adobe Acrobat. The user tries to select the text, but the PDF is image only, so the text cannot be selected. The user runs OCR on the entire PDF. The pages are de-skewed (i.e., straightened). The text inside the images is converted into real text, which appears to retain the original fancy font face.

Recognize Text (OCR) Output Options in Adobe Acrobat 2017

If you click the Settings button on the Recognize Text menu, you can choose among three options in the Recognize Text dialog box.

  1. Searchable Image
  2. Searchable Image (exact)
  3. Editable Text and Image

Two Output Options:

  1. Editable Text and Image
    • Converts the graphical text into vector text. The vector text will replace the "text in image."
    • It creates and embeds a custom font that looks like the original font in the image.
    • You can now edit the text.
    • The file size is larger.
  2. Searchable Image and Searchable Image (Exact)
    • Places an invisible "layer" (not really a layer in a strict technical sense) on top of the original image.
    • You can't easily edit the document, but the invisible layer can be read by assistive technology (such as JAWS and NVDA).

Searchable Image (Exact)

"Searchable Image" may slightly alter the image (by deskewing it) to improve text recognition, while "Searchable Image (Exact)" preserves the original.

an original skewed page and a deskewed version side by side

Generally, the file size of a deskewed PDF is smaller than that of the skewed version because of the improved text recognition.

Additional Resources on the Output Options

Validating OCR

OCR can be deceptive. It might appear to the human eye to match the original document exactly while being wildly inaccurate to assistive technology. It's always best to check the accuracy. Here are three methods.

  1. Compare the visible text to what a screen reader finds.
  2. Copy and paste the newly editable text into a text document.
  3. Change the font face of the editable text throughout the revised document. (Later you can change it back to the fonts closest to the original document's fonts.)
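The first two methods come down to comparing what you can see on the page with what the OCR layer actually contains. As a rough sketch, assuming you have typed a short reference sample of the visible text and pasted the OCR text out of the PDF, Python's standard `difflib` module can list the words that differ. The function names `ocr_similarity` and `ocr_mismatches` are hypothetical.

```python
import difflib
from typing import List, Tuple


def ocr_similarity(expected: str, extracted: str) -> float:
    """Word-level similarity between 0.0 and 1.0 (1.0 means identical)."""
    matcher = difflib.SequenceMatcher(None, expected.split(), extracted.split(), autojunk=False)
    return matcher.ratio()


def ocr_mismatches(expected: str, extracted: str) -> List[Tuple[str, str]]:
    """Return (expected words, extracted words) pairs wherever the two texts differ."""
    a, b = expected.split(), extracted.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be empty when words were dropped or inserted.
            pairs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return pairs
```

A low similarity score, or a long list of mismatches, suggests the OCR layer should be corrected before the PDF is published.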

Fillable PDF Forms

Using Tables for Layout

It's a best practice not to use tables for layout. But if you do, it's a good idea to remove the tags for tables (used for layout) in Adobe Acrobat's tag tree.

Remove Layout Tables in Adobe Acrobat 2017

  1. Select Tools
  2. Select Edit PDF
  3. Select Tag to see the tag tree
  4. Select the table with a right click
  5. Select the Delete Tag option from the context menu

The challenge here is to identify the correct tag. The tags are not well labeled, and it's hard to tell which item in the document each one corresponds to.

Remediation

I am not an expert on PDF remediation. I am focused on creating accessible PDFs by starting with accessible documents and exporting them into PDF.

But sometimes remediation is unavoidable.

My Best Advice on Remediation: If you must remediate PDFs, save the documents periodically under different names (e.g., ebook_ver01, ebook_ver02, ebook_ver03). It can be hard to recover from mistakes made during remediation, and it is often easier to start over with your last saved copy than to undo the mistake.
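If you would rather not rename files by hand, the habit of saving numbered copies can be scripted. The Python sketch below copies the working file to the next free numbered name in the same folder. The function name `save_next_version` and the `_verNN` pattern are illustrative assumptions that follow the naming example above.

```python
import re
import shutil
from pathlib import Path


def save_next_version(working_file: Path, stem: str) -> Path:
    """Copy working_file to <stem>_verNN<suffix>, using the next free number."""
    folder = working_file.parent
    suffix = working_file.suffix
    pattern = re.compile(re.escape(stem) + r"_ver(\d+)" + re.escape(suffix))
    # Collect the version numbers that already exist in the folder.
    used = [int(m.group(1)) for p in folder.iterdir() if (m := pattern.fullmatch(p.name))]
    next_number = max(used, default=0) + 1
    target = folder / f"{stem}_ver{next_number:02d}{suffix}"
    shutil.copy2(working_file, target)  # copy2 also keeps the file timestamps
    return target
```

Calling `save_next_version(Path("ebook.pdf"), "ebook")` before each risky remediation step leaves a trail of snapshots (ebook_ver01.pdf, ebook_ver02.pdf, and so on) to fall back on.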

Embedding Fonts

Adobe Acrobat's Base 14 Fonts

The Base 14 Fonts are defined in ISO 32000-1:2008(E) in Section 9.6.2.2.

They are also referred to as the "Standard 14 Fonts", the "Standard Type 1 Fonts", and the "Standard Fonts".

Monospaced Fonts: Courier, Courier Bold, Courier Oblique, Courier Bold-Oblique
Proportional Fonts (Sans Serif): Helvetica, Helvetica Bold, Helvetica Oblique, Helvetica Bold-Oblique
Proportional Fonts (Serif): Times Roman, Times Bold, Times Italic, Times Bold-Italic
Symbol
Zapf Dingbats

Since these fonts are standard and assumed to be available on most PDF readers, they don't need to be embedded within the document, resulting in a smaller file size. If you limit your document to these 14 fonts, it should not be necessary to embed any additional fonts.
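As a quick illustration, the Python sketch below checks a list of font names against the 14 standard fonts. It uses the PostScript names that appear in PDF files (for example, Times-Roman and Helvetica-BoldOblique) rather than the display names listed above. The function name `fonts_outside_standard_14` is hypothetical, and the font list you pass in is assumed to have been copied by hand from the Fonts tab of the document properties.

```python
from typing import Iterable, List

# PostScript names of the 14 standard fonts.
STANDARD_14 = frozenset({
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
})


def fonts_outside_standard_14(font_names: Iterable[str]) -> List[str]:
    """Return the fonts that are not among the standard 14, sorted and de-duplicated."""
    return sorted({name for name in font_names if name not in STANDARD_14})
```

Any font the function returns is one that would need to be embedded.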

Checking If Fonts Are Embedded in a PDF

To check whether the fonts are all embedded in your PDF file:

  1. Open your PDF file
  2. Click File > Document Properties
  3. Click on the Fonts Tab to display the list of all fonts
  4. Confirm that all fonts are either Type 1 or TrueType fonts
  5. Confirm that all fonts show as "Embedded Subset"

Cite: How to Embed Fonts in a PDF Document on qoppa.com

Embedding Fonts in PDFs

Note: a font can only be embedded if it contains a setting by the font vendor that permits it to be embedded.

  1. Open the file in Adobe Acrobat (the editor, not the free reader).
  2. In the File menu, click Print.
  3. Click Adobe PDF
  4. Click the Properties button to the right of the Printer Name text box
  5. Select the tab Adobe PDF Settings
  6. Edit the Default Settings
  7. Click Fonts
  8. For Subset embedded fonts when percent of characters used is less than: Set the percentage to 100%
  9. Select the Embed all Fonts option
    • For Embedding, select the folder with the fonts you want to embed from the drop-down list
    • Make sure the fonts you need to embed are in the Always Embed box and not in the Never Embed box

Cite: How to Embed Fonts in PDF on printivity.com

Here's a short video (less than 2 minutes long): How to Embed Fonts in a PDF using Acrobat pro-2017

Bookmarks

What Are Bookmarks?

You might know bookmarks from web browsers, where they save web pages so you can find them later. In PDFs, bookmarks work a little differently. They show up in the navigation panel and let you jump to different sections within the same document.

So, while browser bookmarks help you move between websites, PDF bookmarks help you move around within a document.

Are Bookmarks Required?

Some tools, such as Adobe Acrobat's checker and the PDF Accessibility Checker (PAC), may flag an error if a PDF over 20 pages doesn't have bookmarks.

The user interface of PAC shows that it checks whether Bookmarks are available in PDFs.

The WCAG guidelines don't strictly require bookmarks, but it's considered a good practice to include them in long PDFs.

Some A11y professionals advise: if you use Bookmarks, they should mirror headings. For example, in the U.S. Department of Health and Human Services' Adobe Acrobat PDF Accessibility Reference (in the "Take Additional Measures" section), the HHS advises: "Adding bookmarks to lengthy documents can aid navigation. Open the Bookmarks pane and confirm bookmarks are present. Insert bookmarks by activating the Options menu and selecting New Bookmarks from Structure… Typically heading structure (i.e., H1-H6) is used. Bookmarks must be organized and properly nested." (That document was last revised in August 2020.)

Prior to that, the HHS's advice (in a no longer extant web page) was a little more emphatic.

Sections of a table from the "Required Fixes for PDF Files" page on the HHS.gov website.

This screen capture from the "Required Fixes for PDF Files" page on the HHS.gov website was taken before 2008. It states: "The document contains at least 10 pages and does not contain proper bookmarks. This issue is a violation of section 508 and WCAG 2.0 Success Criterion 2.4.5."

It goes on to recommend: "Add bookmarks for major divisions of the document. Recommend creating based on the heading structure or Table of Contents if one exists. For assistance see: W3 PDF Technique #2"

It concludes with this link: Adobe Bookmark

As mentioned earlier, WCAG doesn't require bookmarks. What WCAG Success Criterion 2.4.5 (Level AA) actually requires is that "More than one way is available to locate a Web page within a set of Web pages except where the Web Page is the result of, or a step in, a process." The intent of the success criterion is to make it possible for users to locate content in a manner that best meets their needs. Users may find one technique easier or more comprehensible to use than another.

Applying that web page criterion to PDFs, I interpret it to mean that the bookmark technique is not required, but it is one technique that can be applied. The goal is to help users find what they need in the way that works best for them. For PDFs, bookmarks are one way to do that.

W3C's PDF Technique #2 (i.e., "Creating bookmarks in PDF documents")

"The intent of this technique is to make it possible for users to locate content using bookmarks (outline entries in an Outline dictionary) in long documents." Furthermore, "A person with cognitive disabilities may prefer a hierarchical outline that provides an overview of the document rather than reading and traversing through many pages. This is also a conventional means of navigating a document that benefits all users."

Notice that the W3C is offering this as a technique, not as a success criterion.

Is it Redundant to Use Both Headings and Bookmarks?

It can seem redundant to include headings as well as bookmarks that mirror headings, but bookmarks offer an additional benefit. They show up in the navigation pane, so users can quickly jump between sections without scrolling or using a mouse. This helps everyone, not just people using screen readers. Headings alone are hidden unless you use assistive tech.

How Do You Use Word Headings for Bookmarks?

How to add bookmarks based on a document's heading structure via Adobe Acrobat Pro DC:

  1. Select the Bookmarks icon on the Accessibility Checker panel
  2. Select the Options icon
  3. Select New Bookmarks from Structure
  4. In the Structure Elements dialog box, select the element(s) (e.g., headings) that you want to use as bookmarks.
  5. Click OK.
screen capture showing the bookmark panel in Adobe Acrobat

Citation and Image Source: "Accessible PDFs: When Bookmarks Are Required," posted on June 1, 2019 by Mary Gillen
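Conceptually, creating bookmarks from structure turns a flat list of headings into a nested outline. The Python sketch below shows one way that nesting could be computed. It is an illustration of the idea only, not Acrobat's actual implementation, and the names `Bookmark` and `build_outline` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Bookmark:
    title: str
    level: int
    children: List["Bookmark"] = field(default_factory=list)


def build_outline(headings: List[Tuple[int, str]]) -> List[Bookmark]:
    """Nest (level, title) headings so each bookmark sits under its nearest shallower heading."""
    roots: List[Bookmark] = []
    open_ancestors: List[Bookmark] = []  # shallowest first, deepest last
    for level, title in headings:
        node = Bookmark(title, level)
        # Close every open heading that is at the same depth or deeper.
        while open_ancestors and open_ancestors[-1].level >= level:
            open_ancestors.pop()
        siblings = open_ancestors[-1].children if open_ancestors else roots
        siblings.append(node)
        open_ancestors.append(node)
    return roots
```

Given `[(1, "Intro"), (2, "History"), (2, "Scope"), (1, "Tools")]`, the result has two top-level bookmarks, with "History" and "Scope" nested under "Intro". A heading that skips a level is simply attached to the nearest shallower heading.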

In Conclusion (Bookmarks)

For long PDFs with headings, it seems prudent and helpful to include bookmarks based on the original document's headings. While not strictly a WCAG success criterion, it will improve the PDF's accessibility, which is the ultimate goal.

That said, I welcome hearing from PDF Accessibility experts who have a different opinion or offer new information.

PDF Tools

Adobe Acrobat

The most popular software for creating PDFs is Adobe Acrobat. Among other functions it provides OCR, editing, and remediation. It's not to be confused with the free Adobe Reader.

For document authors who already have licenses for Acrobat, the built-in Accessibility Checker is an obvious choice for auditing PDFs. In most cases, that checker will be enough.

However, document authors should consider which guidelines they mean to follow when choosing the options for the checker. If they are not following the PDF/UA guidelines, they should not select the "accessibility permission flag" or "missing bookmarks" criteria. The remaining criteria will work for both WCAG and PDF/UA.

Adobe Acrobat's Accessibility Checker's Options Window

Other Adobe PDF Products

Some Alternative Free PDF Editors

This is not meant to be a complete list or intended as recommendations.

Additional PDF Tools

Depending on how much PDF remediation you perform, other tools may be helpful or necessary.

I'm not personally recommending any of the following. I have limited experience using any of them. (As stated previously, I avoid PDF remediation as much as possible, but it can't always be avoided.)

The criteria that PAC checks for: doc. marked as tagged, doc. title available, doc. language defined, permitted security settings, tag follows tag-structure, consistent heading structure, bookmarks available, accessible font encodings, content completely tagged, logical reading order, alt. text available, correct syntax of tags / roles, sufficient contrast for text, and spaces existent.

Links for PDFs

Similar pages on BoutonJones.com