BoutonJones.com

PDF A11y 101: Don't Remediate your PDFs

Read Me

This is not meant to be a complete or exhaustive explanation of PDF Accessibility.

A11y PDF Basics

A Brief History of the Portable Document Format

Adobe created the Portable Document Format (PDF) in 1993 to address the incompatibility of different computer systems, which often distorted document formatting during transfer. PDFs offered a solution by enabling the sharing and printing of documents with consistent layout, fonts, and images across various platforms. Regardless of the computer platform, the PDF will always display exactly the same.

Since then, PDFs have evolved beyond simple printing. They are now widely used for eBooks, contracts, interactive forms, and digital archiving. Features like clickable links, multimedia, and digital signatures have made PDFs adaptable to modern digital needs.

Adobe released the PDF as an ISO open standard in 2008. If Adobe were ever to go out of business, the PDF format would remain available and supported.

PDF allows the user to view a file precisely—down to the pixel, essentially, of what the author had intended.
- Bob Wulff, Adobe's Senior Vice President of Cloud Technology
Cite: Who Created the PDF? on Adobe.com

PDF vs Web Pages

PDFs present some limitations. Many lack proper tagging or alternative text, hindering accessibility for users of assistive technologies. Additionally, they can become bloated with unnecessary data, impacting loading and sharing speeds. While PDFs excel at preserving document appearance, they are less suitable for editing and adapting to smaller screens compared to other formats.

Comparison: PDF vs Web Pages
Feature | PDF | Web Pages
Accessibility | Limited if not properly tagged | Highly customizable for accessibility
Layout Consistency | Always consistent | Can vary based on screen size and browser (e.g., Responsive Design)
Editability | Difficult to edit without special tools | Easy to update
Interactivity | Supports forms and multimedia | Highly interactive with JavaScript and CSS
File Size | Can become large with images or media | Typically smaller. Loads dynamically

When are PDFs the best option? PDFs are good for:

 

Don't use a PDF where a web page will do as well --- or better.

 

Related Usability Articles from the Nielsen Norman Group:

Demonstration: Three Identical PDFs

three visually identical pdfs

They have the exact same content, title, headers, font, and colors. So how are they different?

Blank PDFs

The first of three visually identical PDF documents

To users of assistive technology, scanned PDFs are blank. Click the next button to hear how JAWS perceives a scanned PDF.

JAWS identifies the first PDF as a "document," but it doesn't provide any further information. Click the next button to hear how NVDA understands the document.

NVDA is more informative. It announces "Alert: Empty document. Edit list: Document appears to be empty. It may be a scanned image that needs OCR or it may be a malformed document read only."

"A picture may be worth a thousand words --- but not if it's a picture of a thousand words that you're trying to read with accessible technology."
 - David Ondich
  ADA Program Manager for the City of Austin

When you scan a printed document – e.g., a signed contract, police report, corporate policy, or anything requested for discovery – it is no more than an image of the hard copy. It will not be accessible to screen readers in its current form. It is not possible to copy and paste the text without performing optical character recognition first.

Image of Text (or Image-only Text):
Text that appears inside an image, such as scanned pages or graphics. Screen readers cannot interpret this unless OCR is applied.
Live Text (or Real Text):
Text stored as actual characters in a document, not as an image. Live text can be selected, searched, copied & pasted, and read by assistive technologies.

Untagged PDFs

The second of three visually identical PDF documents

Untagged PDFs contain true text but lack semantic formatting, which means they can't be navigated. Screen readers can read the content, but they can't find the headers or the lists. The tags have been stripped from the document.

Question: when is a PDF with text an "untagged" document?

Answer 1 of 2: When semantic markup is not used to tag the structural elements in a document, the document is called "untagged." If the source of a PDF is an untagged document, the PDF will also be untagged.

Tagged PDFs
 
  1. Tagged PDF (PDF 1.4) is a stylized use of PDF that builds on PDF's logical structure framework. It defines a set of standard structure types and attributes that allow page content (text, graphics, and images) to be extracted and reused for other purposes.
    Cite: PDF Technology Notes
  2. A PDF file that -- in addition to text and graphics -- contains metadata for text extraction, content reflow, document accessibility, geographic information in PDFs containing maps, etc.

    With the correct tags, a screen reader can:
    • Understand where headings fall
    • Follow the correct reading order
    • Identify footnotes & graphics
    • Understand the structure of tables
    • Complete fillable forms

In most cases, tags are necessary in order to make a PDF file comply with Section 508.

Answer 2 of 2: When you print a document to PDF. The text is saved to the PDF but the tags for indicating structure are lost.

One surprising exception is tables. I printed (to PDF) a Word document containing a table. Both cells of the table included multiple lines of text. I expected the screen reader to ignore the table cells and read the entire content left to right. But both JAWS and NVDA read all the lines of text in the first cell before reading all the lines of text in the next cell. I also tried using Word's column function. NVDA was able to read a two-column PDF printed from Word correctly.

Export to PDF

Never print to PDF if there is an alternative. (See Converting Microsoft Office Documents to PDF on this page.)

Accessible PDFs

This third PDF is properly tagged and contains real text.

The third of three visually identical PDF documents

Screen readers (such as JAWS and NVDA) will identify headers and lists in a tagged PDF.
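The three cases in this demonstration (image-only, untagged, and tagged) can be told apart roughly with a script. This is only a minimal sketch, not a validator. It assumes the PDF keys `/Font`, `/StructTreeRoot`, and `/Marked true` are visible in the raw bytes of the file. Files that store these keys in compressed object streams will be misreported, so an accessibility checker and a screen reader remain the real test. The function name `classify_pdf` and its three labels are made up for this illustration.

```python
import re


def classify_pdf(data: bytes) -> str:
    """Roughly sort a PDF into the three cases from the demonstration.

    Heuristic only: it searches the raw bytes for well-known PDF keys.
    """
    has_fonts = b"/Font" in data              # live text needs font resources
    has_tag_tree = b"/StructTreeRoot" in data  # root of the tag tree
    is_marked = re.search(rb"/Marked\s+true", data) is not None

    if not has_fonts:
        return "image-only"   # likely a scan; OCR is needed
    if has_tag_tree and is_marked:
        return "tagged"       # has a tag tree; still needs a real check
    return "untagged"         # live text without structure


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            print(name, classify_pdf(handle.read()))
```

A result of "tagged" only means a tag tree exists. It says nothing about whether the tags are correct.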

Converting Microsoft Office Documents to PDF

Rating the accessibility of methods of converting MSO documents to PDF on a 1-5 scale
Option | Rating | Preserves A11y Features | Advantages | Disadvantages
Scanning to PDF | 1 | No | Converts physical documents to digital format. Sometimes used for making signed contracts, memos, and policies available online. | Creates image-only PDFs; no tags or structure; OCR is required to convert images into true text (live text) that assistive technology like screen readers can read. (OCR should always be tested for accuracy.)
Print to Microsoft PDF | 1 | No | Built into Windows; quick and easy | Strips accessibility tags; creates a flat PDF. The text can be read aloud by assistive technology, but the semantic formatting is missing, making it impossible to navigate using assistive technology. It fails key WCAG 2.x Level AA criteria related to structure, navigation, and semantics.
Print to Adobe PDF | 2 | No | Available with Adobe Acrobat; embeds fonts; preserves the appearance of the digital document. | Does not preserve accessibility features; requires Adobe software
File tab > Export > Create Adobe PDF (Word ribbon) | 5 | Yes | Preserves tags, headings, alt text; supports PDF/UA compliance when the source document is properly structured. | Requires Adobe Acrobat; slightly more steps. The best option here (assuming you have installed Adobe Acrobat on your device).
Create PDF from File in Adobe Acrobat | 4 | Yes | Preserves most accessibility features; allows accessibility checks | Requires opening Acrobat separately; may need manual fixes
Save As PDF (Word built-in) | 4 | Yes | Preserves tags, headings, alt text; no extra software needed | Limited advanced accessibility options: cannot set PDF/UA compliance. (NOTE: PDF/UA compliance, while recommended, is not required by WCAG.)
Third-Party Tools (e.g., Nitro, Foxit) | 2-3 | Varies | Often cheaper than Adobe; some preserve basic tags | Inconsistent accessibility support; may require manual fixes

Recommendations

PDF Tutorial (Video)

How to Create Accessible in PDF [SIC] by Using Adobe Acrobat Pro [2017] (1:50)

The video on YouTube

Transcript:

00:00 In this video, PDF Tutorial: How to Create Accessible in pdf by using adobe acrobat pro
00:01 Go to the tool menu and click the Action Wizard and Click Create Accessibly
00:11 select the Add Document Description and click ok and select destination
00:22 fill the information
01:00 now your accessibility is created
01:40 Please Subscribe My channel Thank you for watching

Converting Web Pages to PDF

When I find a useful web page online, I will often save it to PDF for my own future reference. But if I plan to highlight or copy text, I need to make sure the text is saved as true text instead of as an image of text. So, the method for saving to PDF is important.

The techniques in the table below are performed in web browsers within the Print function.

Screen captures of the print function menus in Edge and Firefox
Rating the accessibility of methods of converting web pages to PDF on a 1-5 scale (in this table, 1 marks the most accessible method and 5 the least)
Option | Rating | Advantages | Disadvantages
Adobe Acrobat "Save as Adobe PDF" (Browser Extension) | 1 | Preserves true selectable text, generates tagged PDFs, maintains structure and links. | Requires Acrobat Pro and extension; occasional tagging issues, which should be addressed in Adobe Acrobat
Browser "Save as PDF" (Edge/Chrome/Firefox) | 2 | Keeps true text, maintains reading order, preserves links, built-in. | Not fully tagged; complex layouts may misconvert. Run an accessibility checker on the resulting PDF before sharing.
Adobe PDF Printer (virtual printer) | 4 | Sometimes preserves real text; consistent output. | No tagging; may flatten complex elements; not WCAG compliant.
Microsoft Print to PDF | 5 | Simple and universally available. | Often flattens text to images; no tags; not accessible.

Recommendations

PDF Titles

In a PDF, the "title style" refers to the text displayed as the title within the document itself.

The "title metadata" is a separate piece of information embedded within the PDF file that describes the document's title and is used for searching and indexing purposes but isn't necessarily visible directly in the document itself.

Saving the Title Metadata

  1. Open the document in the Adobe Acrobat editor.
  2. Navigate to "File" > "Properties."
  3. Select the "Description" tab.
  4. Enter the desired title in the "Title" field

This will embed the title as metadata within the PDF file.

Display the Title (Metadata)

By default, the Title (Metadata) is empty and the PDF is set to display the File Name instead of the Title (Style).

You must manually enter the Title. You must also set the Show setting to display the Document Title.

  1. Open the PDF in Adobe Acrobat.

  2. Go to "File" > "Properties."

  3. Select the "Initial View" tab.

  4. Under "Window Options," choose "Document Title" from the "Show" dropdown.
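For readers curious about what these two settings store inside the file, here is a minimal sketch. In the PDF format, the Title field lives in the document information dictionary (`/Info` with a `/Title` entry), and the "Show: Document Title" choice is the `/DisplayDocTitle true` entry under `/ViewerPreferences` in the catalog. The code below writes a blank one-page PDF by hand with both set. The function names are made up for illustration, the title is assumed to be plain ASCII, and a full tool would typically also write XMP metadata, which this sketch leaves out.

```python
def pdf_string(text: str) -> bytes:
    """Encode ASCII text as a PDF literal string, escaping \\ ( and )."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"


def build_titled_pdf(title: str) -> bytes:
    """Build a blank one-page PDF that carries a title and asks viewers to show it."""
    objects = [
        # Object 1: catalog. DisplayDocTitle is the "Show: Document Title" setting.
        b"<< /Type /Catalog /Pages 2 0 R "
        b"/ViewerPreferences << /DisplayDocTitle true >> >>",
        # Object 2: page tree with a single page.
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        # Object 3: one blank US Letter page.
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
        # Object 4: document information dictionary holding the Title field.
        b"<< /Title " + pdf_string(title) + b" >>",
    ]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of each object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # entry for the reserved object 0
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R /Info 4 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_at
    return bytes(out)


if __name__ == "__main__":
    with open("titled_example.pdf", "wb") as handle:
        handle.write(build_titled_pdf("PDF A11y 101"))
```

Opening the resulting file in a PDF viewer should show the title rather than the file name in the window's title bar, although viewers differ in how they honor this preference.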

Optical Character Recognition (OCR)

OCR Demo (Video)

OCR: How To Convert Scanned Document (PDF) To Text Using Adobe Acrobat Professional (:37)

 

This video has no narration, transcript, or closed captioning.

 

In this video, a PDF is displayed inside Adobe Acrobat. The user tries to select the text. But the PDF is image-only, so the text cannot be selected. The user runs OCR on the entire PDF. The pages are de-skewed (i.e., straightened). The text inside the images is converted into real text. It appears to retain the original fancy font face.

To view the video on YouTube, visit OCR: How To Convert Scanned Document (PDF) To Text Using Adobe Acrobat Professional (:37)

Recognize Text (OCR) Output Options in Adobe Acrobat 2017

If you click the Settings button on the Recognize Text menu, you can choose among three output options in the Recognize Text dialog box.

  1. Searchable Image
  2. Searchable Image (exact)
  3. Editable Text and Image

These three options produce two kinds of output:

  1. Editable Text and Image
    • Converts the graphical text into vector text. The vector text will replace the "text in image."
    • It creates and embeds a custom font that looks like the original font in the image.
    • You can now edit the text.
    • The file size is larger.
  2. Searchable Image and Searchable Image (Exact)
    • Places an invisible "layer" (not really a layer in a strict technical sense) on top of the original image.
    • You can't easily edit the document, but the invisible layer can be read by assistive technology (such as JAWS and NVDA).
    • It might not be possible to correct OCR errors.

Searchable Image (Exact)

"Searchable Image" may slightly alter the image (by deskewing it) to improve text recognition, while "Searchable Image (Exact)" preserves the original.

an original skewed page and a deskewed version side by side

Generally, the file size of a deskewed PDF is smaller than that of the skewed version because of the improved text recognition.

Additional Resources on the Output Options

Validating OCR

OCR can be deceptive. It might appear to the human eye to match the original document exactly while being wildly inaccurate to assistive technology. It's always best to check the accuracy. Here are three methods.

  1. Compare the visible text to what a screen reader finds.
  2. Copy and paste the newly editable text into a text document.
  3. Change the font face --- of the editable text --- throughout the revised document. (Later you can change it back to the fonts closest to the original document's fonts.)
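The second method above (copy and paste the OCR text into a text document) can be followed by an automatic comparison when a trusted transcription of the page exists. Here is a minimal sketch using Python's standard `difflib` module. The function names and the idea of a "trusted transcription" are my assumptions for illustration. The comparison is done word by word and ignores letter case and line breaks.

```python
import difflib


def normalize(text: str) -> list[str]:
    """Lower-case and split on whitespace so layout differences are ignored."""
    return text.lower().split()


def ocr_similarity(expected: str, extracted: str) -> float:
    """Return a word-level similarity between 0.0 (no match) and 1.0 (identical)."""
    matcher = difflib.SequenceMatcher(
        None, normalize(expected), normalize(extracted), autojunk=False
    )
    return matcher.ratio()


def ocr_mismatches(expected: str, extracted: str) -> list[tuple[str, str]]:
    """List (expected words, extracted words) for every place the texts differ."""
    a, b = normalize(expected), normalize(extracted)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


if __name__ == "__main__":
    trusted = "The quick brown fox"
    pasted = "The quick brovvn fox"  # a typical OCR confusion of "w" and "vv"
    print(ocr_similarity(trusted, pasted))
    print(ocr_mismatches(trusted, pasted))
```

A low score or a long mismatch list is a sign that the OCR output needs to be corrected before the PDF is shared.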

Fillable PDF Forms

Using Tables for Layout

Screen readers interpret tables as data structures, so layout tables can confuse users by announcing rows/columns unnecessarily. So it's a best practice not to use tables for layout. But if you do, it's a good idea to remove the tags for tables (used for layout) in Adobe Acrobat's tag tree.

If your PDF contains tables for layout, you should remove the table tags but preserve the visual appearance. Here are three techniques for doing so followed by instructions to test afterward.

Note: As a best practice, always create a backup of the PDF before remediating it.

Technique 1: Use Adobe Acrobat Pro (Recommended)

  1. Open the PDF in Acrobat Pro.
  2. Go to Tools > Accessibility > Reading Order.
  3. In the Reading Order panel:
    • Select the table.
    • Change its tag to Background or Artifact (this removes it from the tag tree but keeps the visual layout).
  4. Alternatively:
    • Open Tags panel.
    • Re-tag the content as paragraphs or appropriate elements.
      • Replace <Table> with <Div> or <Sect> (generic container).
      • Replace <TR> with <Div> (or remove entirely if not needed).
      • Replace <TD> with <P> for text content.
      • Replace <TH> with <P> (or appropriate heading tags like <H1>-<H6> if it's actually a heading).
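The tag replacements listed above can be pictured as a simple mapping applied to the tag tree. The sketch below is only a model for thinking it through. It represents the tag tree as nested Python dictionaries, which is my assumption for illustration. It does not use any Acrobat API, and the actual re-tagging is still done by hand in the Tags panel. It applies the plain `<TH>` to `<P>` replacement, so a cell that is really a heading would still need a heading tag afterward.

```python
# Replacements taken from the list above; heading cells are a manual follow-up.
LAYOUT_RETAG = {"Table": "Div", "TR": "Div", "TD": "P", "TH": "P"}


def retag_layout_table(node):
    """Return a copy of a tag-tree node with table tags swapped for generic ones.

    A node is either a string (text content) or a dict with "tag" and "children".
    Pass in only the subtree of a layout table, never a real data table.
    """
    if isinstance(node, str):
        return node
    return {
        "tag": LAYOUT_RETAG.get(node["tag"], node["tag"]),
        "children": [retag_layout_table(child) for child in node.get("children", [])],
    }


if __name__ == "__main__":
    layout_table = {
        "tag": "Table",
        "children": [
            {"tag": "TR", "children": [
                {"tag": "TD", "children": ["Left column text"]},
                {"tag": "TD", "children": ["Right column text"]},
            ]},
        ],
    }
    print(retag_layout_table(layout_table))
```

The text content and its order are untouched. Only the tag names change, which is why the visual appearance of the page stays the same.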

Technique 2: Use PAC or Other Tag Editors

Technique 3: Automated Tools

Mistakes to avoid

Test After Remediation

  1. Automatically check with Accessibility Checker, PAC, or other checker for quick structural validation. Confirm that:
    • The <Table> tag and related tags are gone from the tag tree.
    • No necessary tags are missing.
    • There are no reading order issues.
    • There are no other structural problems.
  2. Manually verify with a screen reader (NVDA or JAWS) to ensure actual usability. Confirm that:
    • Content flows in the correct order.
    • Layout tables no longer cause confusing navigation.
    • Headings, paragraphs, and artifacts behave as intended.
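Two of the checks above (no table tags left, and content flowing in the correct order) can be expressed as small functions over a model of the tag tree. As before, this is a sketch under assumptions: the nested-dictionary tree and the function names are hypothetical, and the list of table-related tag names is the standard PDF set as I understand it. It does not replace an accessibility checker or a screen reader test.

```python
TABLE_TAGS = {"Table", "THead", "TBody", "TFoot", "TR", "TH", "TD"}


def leftover_table_tags(node, path=""):
    """Return the tree paths of any table-related tags that are still present."""
    if isinstance(node, str):  # text content has no tag
        return []
    here = f"{path}/{node['tag']}"
    found = [here] if node["tag"] in TABLE_TAGS else []
    for child in node.get("children", []):
        found.extend(leftover_table_tags(child, here))
    return found


def reading_order(node):
    """Flatten the text content in tag-tree order, top to bottom."""
    if isinstance(node, str):
        return [node]
    texts = []
    for child in node.get("children", []):
        texts.extend(reading_order(child))
    return texts


if __name__ == "__main__":
    remediated = {
        "tag": "Document",
        "children": [
            {"tag": "H1", "children": ["Title"]},
            {"tag": "Div", "children": [
                {"tag": "P", "children": ["Left"]},
                {"tag": "TD", "children": ["Right"]},  # a cell that was missed
            ]},
        ],
    }
    print(leftover_table_tags(remediated))
    print(reading_order(remediated))
```

An empty list from the first function and the expected sequence from the second are good signs, but listening to the page with NVDA or JAWS is still the final check.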

Redaction

Improperly redacted PDFs can be inaccessible to people with disabilities.

See Hacking Redacted PDFs to learn how to properly redact PDFs and keep them accessible.

Remediation

I am not an expert on PDF remediation. I am focused on creating accessible PDFs by starting with accessible documents and exporting them into PDF.

But sometimes remediation is unavoidable.

My Best Advice on Remediation: If you must remediate PDFs, save the documents periodically under different names (e.g., ebook_ver01, ebook_ver02, ebook_ver03). It can be hard to recover from mistakes made during remediation, and it is often easier to start over with your last saved copy than undo the mistake. If you play computer games, it's similar to saving game play before making a risky move, but it's far more rewarding.
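Saving numbered copies is easy to forget in the middle of remediation, so a small script can do it for you. This is a minimal sketch that follows the naming pattern in the example above (`ebook_ver01`, `ebook_ver02`, and so on). The function names are made up for illustration, and the script only copies files. It never changes the file you are working on.

```python
import re
import shutil
from pathlib import Path


def next_version_path(path: Path) -> Path:
    """Return the next versioned name: ebook_ver01.pdf -> ebook_ver02.pdf.

    A name with no version suffix (ebook.pdf) becomes ebook_ver01.pdf.
    """
    match = re.fullmatch(r"(.*)_ver(\d+)", path.stem)
    if match:
        base, number = match.group(1), int(match.group(2)) + 1
    else:
        base, number = path.stem, 1
    return path.with_name(f"{base}_ver{number:02d}{path.suffix}")


def save_checkpoint(path: Path) -> Path:
    """Copy the working file to the next unused versioned name and return it."""
    target = next_version_path(path)
    while target.exists():  # skip over versions that were already saved
        target = next_version_path(target)
    shutil.copy2(path, target)
    return target


if __name__ == "__main__":
    import sys

    print(save_checkpoint(Path(sys.argv[1])))
```

Run it on the working file before each risky change, and go back to the highest-numbered copy if the change goes wrong.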

Embedding Fonts

Embedded Fonts:
Fonts that are included within a PDF file so the text displays correctly on any device, even if the font is not installed locally. Embedded fonts preserve the document's appearance and ensure that live text remains readable and accessible. Without embedded fonts, PDFs may substitute fonts, causing layout issues or unreadable characters.

Adobe Acrobat's Base 14 Fonts

The Base 14 Fonts are defined in ISO 32000-1:2008(E) in Section 9.6.2.2.

They are also referred to as the "Standard 14 Fonts", the "Standard Type 1 Fonts", and the "Standard Fonts".

Monospaced Fonts: Courier, Courier Bold, Courier Oblique, Courier Bold-Oblique
Proportional Fonts (Sans Serif): Helvetica, Helvetica Bold, Helvetica Oblique, Helvetica Bold-Oblique
Proportional Fonts (Serif): Times Roman, Times Bold, Times Italic, Times Bold-Italic
Symbol
Zapf Dingbats

Since these fonts are standard and assumed to be available on most PDF readers, they don't need to be embedded within the document, resulting in a smaller file size. If you limit your document to these 14 fonts, it should not be necessary to embed any additional fonts.
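If you have a list of the font names used in a PDF (for example, copied from the Fonts tab of the document properties), a short script can point out which ones fall outside the Base 14 set. This is a minimal sketch. The names in `STANDARD_14` are the PostScript-style names these fonts carry inside PDF files, and the function names are made up for illustration. The sketch also strips the six-letter prefix (such as `ABCDEF+`) that marks an embedded subset.

```python
import re

STANDARD_14 = {
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Symbol", "ZapfDingbats",
}


def base_font_name(font_name: str) -> str:
    """Strip a leading slash and a subset prefix like 'ABCDEF+' from a font name."""
    name = font_name.strip().lstrip("/")
    return re.sub(r"^[A-Z]{6}\+", "", name)


def fonts_outside_standard_14(font_names):
    """Return the sorted font names that are not among the Base 14 fonts."""
    return sorted({base_font_name(name) for name in font_names} - STANDARD_14)


if __name__ == "__main__":
    used = ["/Helvetica", "Times-Roman", "ABCDEF+Calibri", "Arial-BoldMT"]
    print(fonts_outside_standard_14(used))
```

Any font this reports is one to look for in the embedded list when you check the file.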

Checking Whether Fonts are Embedded in a PDF

To check whether the fonts are all embedded in your PDF file:

  1. Open your PDF file
  2. Click File > Document Properties
  3. Click on the Fonts Tab to display the list of all fonts
  4. Confirm that all fonts are either Type 1 or TrueType fonts
  5. Confirm that every font shows as "Embedded" or "Embedded Subset"

Cite: How to Embed Fonts in a PDF Document on qoppa.com

Embedding Fonts in PDFs

Note: a font can only be embedded if it contains a setting by the font vendor that permits it to be embedded.

  1. Open the file in Adobe Acrobat (the editor, not the free reader).
  2. In the File menu, click Print.
  3. Click Adobe PDF
  4. Click the Properties button to the right of the Printer Name text box
  5. Select the tab Adobe PDF Settings
  6. Edit the Default Settings
  7. Click Fonts
  8. For Subset embedded fonts when percent of characters used is less than: Set the percentage to 100%
  9. Select the Embed all Fonts option
    • For Embedding, select the folder with the fonts you want to embed from the drop-down list
    • Make sure the fonts you need to embed are in the Always Embed box and not in the Never Embed box

Cite: How to Embed Fonts in PDF on printivity.com

Here's a short video (less than 2 minutes long): How to Embed Fonts in a PDF using Acrobat pro-2017

Bookmarks

What Are Bookmarks?

You might know bookmarks from web browsers, where they save web pages so you can find them later. In PDFs, bookmarks work a little differently. They show up in the navigation panel and let you jump to different sections within the same document.

So, while browser bookmarks help you move between websites, PDF bookmarks help you move around within a document.

Are Bookmarks Required?

Some tools (like Adobe Acrobat's checker and the PDF Accessibility Checker, or PAC) may flag an error if a PDF over 20 pages doesn't have bookmarks.

The user interface of PAC shows that it checks whether Bookmarks are available in PDFs.

The WCAG guidelines don't strictly require bookmarks, but it's considered a good practice to include them in long PDFs.

Some A11y professionals advise: if you use Bookmarks, they should mirror headings. For example, in the U.S. Department of Health and Human Services' Adobe Acrobat PDF Accessibility Reference (in the "Take Additional Measures" section), the HHS advises: "Adding bookmarks to lengthy documents can aid navigation. Open the Bookmarks pane and confirm bookmarks are present. Insert bookmarks by activating the Options menu and selecting New Bookmarks from Structure… Typically heading structure (i.e., H1-H6) is used. Bookmarks must be organized and properly nested." (That document was last revised in August 2020.)

Prior to that, the HHS's advice (in a no longer extant web page) was a little more emphatic.

Sections of a table from the "Required Fixes for PDF Files" page on the HHS.gov website.

This screen capture from the "Required Fixes for PDF Files" page on the HHS.gov website was taken before 2008. It states: "The document contains at least 10 pages and does not contain proper bookmarks. This issue is a violation of section 508 and WCAG 2.0 Success Criterion 2.4.5."

It goes on to recommend: "Add bookmarks for major divisions of the document. Recommend creating based on the heading structure or Table of Contents if one exists. For assistance see: W3 PDF Technique #2"

It concludes with this link: Adobe Bookmark

As mentioned earlier, WCAG does not require bookmarks. What WCAG Success Criterion 2.4.5 (Level AA) actually requires is that "More than one way is available to locate a Web page within a set of Web pages except where the Web Page is the result of, or a step in, a process." The intent of the success criterion is to make it possible for users to locate content in a manner that best meets their needs. Users may find one technique easier or more comprehensible to use than another.

Applying that web page criterion to PDFs, I interpret it to mean the bookmark technique is not required, but it is one technique that can be applied. The goal is to help users find what they need in the way that works best for them. For PDFs, bookmarks are one way to do that.

W3C's PDF Technique #2 (i.e., "Creating bookmarks in PDF documents")

"The intent of this technique is to make it possible for users to locate content using bookmarks (outline entries in an Outline dictionary) in long documents." Furthermore, "A person with cognitive disabilities may prefer a hierarchical outline that provides an overview of the document rather than reading and traversing through many pages. This is also a conventional means of navigating a document that benefits all users."

Notice that the W3C is offering this as a technique, not as a success criterion.

Is it Redundant to Use Both Headings and Bookmarks?

It can seem redundant to include headings as well as bookmarks that mirror headings, but bookmarks offer an additional benefit. They show up in the navigation pane, so users can quickly jump between sections without scrolling or using a mouse. This helps everyone - not just people using screen readers. Headings alone are hidden unless you use assistive tech.

How Do You Use Word Headings for Bookmarks?

How to add bookmarks based on a document's heading structure via Adobe Acrobat Pro DC:

  1. Select the Bookmarks icon on the Accessibility Checker panel
  2. Select the Options icon
  3. Select New Bookmarks from Structure
  4. In the Structure Elements dialog box, select the element(s) (e.g., headings) that you want to use as bookmarks.
  5. Click OK.
screen capture showing the bookmark panel in Adobe Acrobat

Citation and Image Source: Accessible PDFs: When Bookmarks Are Required Posted on June 1, 2019 by Mary Gillen
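To see what "bookmarks that mirror headings" means in practice, here is a minimal sketch that turns a flat list of headings into a nested outline, with each lower-level heading placed under the nearest higher-level heading before it. The `(level, title, page)` input format and the function names are my assumptions for illustration. Acrobat builds the real bookmarks from the tag structure, and this code does not read or write PDF files.

```python
def bookmarks_from_headings(headings):
    """Nest (level, title, page) tuples into a bookmark outline.

    Level 1 is an H1, level 2 an H2, and so on. Each bookmark is a dict
    with "title", "page", "level", and "children".
    """
    root = {"title": None, "level": 0, "children": []}
    stack = [root]  # the chain of currently open parent bookmarks
    for level, title, page in headings:
        # Close any open bookmarks at the same or a deeper level.
        while stack[-1]["level"] >= level:
            stack.pop()
        node = {"title": title, "page": page, "level": level, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


def print_outline(bookmarks, indent=0):
    """Print the outline with indentation, like the Bookmarks pane shows it."""
    for bookmark in bookmarks:
        print("  " * indent + f"{bookmark['title']} (page {bookmark['page']})")
        print_outline(bookmark["children"], indent + 1)


if __name__ == "__main__":
    headings = [
        (1, "A11y PDF Basics", 1),
        (2, "A Brief History of the Portable Document Format", 1),
        (2, "PDF vs Web Pages", 2),
        (1, "PDF Tools", 9),
    ]
    print_outline(bookmarks_from_headings(headings))
```

If the nesting that comes out looks wrong, the heading levels in the source document are usually the thing to fix.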

In Conclusion (Bookmarks)

For long PDFs with headings, it seems prudent and helpful to include bookmarks based on the original document's headings. While not strictly a WCAG success criterion, it will improve the PDF's accessibility --- which should be your ultimate goal. So, I recommend it as a best practice.

That said, I welcome hearing from PDF Accessibility experts who have a different opinion or offer new information.

PDF Tools

Adobe Acrobat

The most popular software for creating PDFs is Adobe Acrobat. Among other functions it provides OCR, editing, and remediation. It's not to be confused with the free Adobe Reader.

For document authors who already have licenses for Acrobat, the built-in Accessibility Checker is an obvious choice for auditing PDFs. In most cases, that checker will be enough.

However, document authors should consider which guidelines they mean to follow when choosing the options for the checker. If they are not following the PDF/UA guidelines, they should not select the "accessibility permission flag" or "missing bookmarks" criteria. The remaining criteria will work for both WCAG and PDF/UA.

Adobe Acrobat's Accessibility Checker's Options Window

Other Adobe PDF Products

Some Alternative Free PDF Editors

This is not meant to be a complete list or intended as recommendations.

Additional PDF Tools

Depending on how much PDF remediation you perform, other tools may be helpful or necessary. Experts on PDF remediation have suggested the tools below to me.

I'm not personally recommending any of the following. I have limited experience using any of them. (As stated previously, I avoid PDF remediation as much as possible. The name of this page is "Don't Remediate Your PDFs" after all. But it can't always be avoided.)

The criteria that PAC checks for: doc. marked as tagged, doc. title available, doc. language defined, permitted security settings, tab follows tag-structure, consistent heading structure, bookmarks available, accessible font encodings, content completely tagged, logical reading order, alt. text available, correct syntax of tags / roles, sufficient contrast for text, and spaces existent.

Links for PDFs

Similar pages on BoutonJones.com