Does Firefox automatically perform OCR on PDF documents?
My bank delivers monthly statements as rasterized copies of their paper statements. They are clearly pixelated and not text. However, when I open one of these PDFs in Firefox I am able to select the rasterized text, as you can see from the attached screenshot clip.
How is this possible?
All replies (9)
I assume that your bank actually sends real PDF files. If you use Print then in some cases Firefox converts the page to an image.
I would have assumed the same thing, except that Sumatra won't allow me to highlight and copy text, and Acrobat will select it but won't copy it. Firefox allows both.
Also, I've never seen a pixelated PDF that still contains text. Will wonders never cease?!
By Helmanfrow
If the PDF consists purely of a series of full-page images, unfortunately, Firefox's PDF viewer doesn't have the ability to OCR it.
I suspect your bank applied "security" to the PDF to prevent certain actions, such as copying, editing, and/or printing. (https://helpx.adobe.com/acrobat/how-to/password-protect-pdf.html)
Firefox's PDF viewer is based on the pdf.js JavaScript library, which ignores these "security" restrictions by default. It is a bit of an annoyance to people who create the PDFs, but Mozilla doesn't seem inclined to enforce the restrictions in Firefox.
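To illustrate why a viewer can simply ignore these restrictions: in the PDF format, they are stored as a set of flag bits in the /P entry of the file's encryption dictionary, and it is up to each viewer to check them. Here is a minimal Python sketch of that idea. The bit positions follow the PDF 1.7 specification; the function names, the `honor_restrictions` switch, and the example value are hypothetical, and none of this is pdf.js's actual code.

```python
# Bit positions (1-based) of the user-access permission flags stored in the
# /P entry of a PDF encryption dictionary, per the PDF 1.7 specification.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_for_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Return which actions the /P value asks a viewer to allow."""
    p &= 0xFFFFFFFF  # /P is a signed 32-bit integer; look at its raw bits
    return {
        name: bool(p & (1 << (bit - 1)))
        for name, bit in PERMISSION_BITS.items()
    }


def may_copy(p: int, honor_restrictions: bool) -> bool:
    """Hypothetical viewer policy: the flag only matters if the viewer checks it."""
    if not honor_restrictions:
        return True
    return decode_permissions(p)["copy"]


# Hypothetical /P value: printing allowed, copying and editing denied.
example_p = -60
print(decode_permissions(example_p))
print(may_copy(example_p, honor_restrictions=True))   # viewer enforcing the flag
print(may_copy(example_p, honor_restrictions=False))  # viewer ignoring the flag
```

The same file gives a different answer depending only on whether the viewer chooses to look at the flag, which would explain why Acrobat refuses to copy while Firefox allows it.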
jscher2000 - Support Volunteer said
I suspect your bank applied "security" to the PDF to prevent certain actions, such as copying, editing, and/or printing. (https://helpx.adobe.com/acrobat/how-to/password-protect-pdf.html)
Yes, I did a little more digging and that appears to be what it is. The document is protected from editing, and apparently this can sometimes cause text to be presented as pixelated images.
By Helmanfrow
jscher2000 - Support Volunteer said
I suspect your bank applied "security" to the PDF to prevent certain actions, such as copying, editing, and/or printing. (https://helpx.adobe.com/acrobat/how-to/password-protect-pdf.html)
Yes, the document is password-protected so that's probably it.
By the way, when you select text in Firefox's PDF viewer, you are selecting a transparent layer of text positioned in front of the page image.
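A rough way to picture that layering is the Python sketch below, which builds a fragment of HTML with a page image and transparent, absolutely positioned text on top of it. It is only an illustration of the idea: `TextItem`, `build_text_layer`, and the file name are made up for this example, the positions are assumed to already be in pixels, and the real viewer is considerably more elaborate.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class TextItem:
    """One run of text with its position on the page (hypothetical type)."""
    text: str
    x: float          # distance from the left edge, in pixels
    y: float          # distance from the top edge, in pixels
    font_size: float  # in pixels


def build_text_layer(items, page_image_url, width, height) -> str:
    """Return HTML with transparent, selectable text over a page image."""
    spans = "".join(
        f'<span style="position:absolute;left:{it.x}px;top:{it.y}px;'
        f'font-size:{it.font_size}px;color:transparent">'
        f"{escape(it.text)}</span>"
        for it in items
    )
    return (
        f'<div style="position:relative;width:{width}px;height:{height}px">'
        f'<img src="{escape(page_image_url, quote=True)}" '
        f'width="{width}" height="{height}" alt="">'
        f'<div class="text-layer" style="position:absolute;inset:0">'
        f"{spans}</div>"
        "</div>"
    )


# The visible pixels come from the image; selection and copy act on the spans.
page_html = build_text_layer(
    [TextItem("Closing balance", 72, 140, 12)],
    "statement-page1.png",  # hypothetical file name
    612,
    792,
)
print(page_html)
```

Because the text is transparent, what you see is still the pixelated image, yet the mouse selection and the clipboard operate on real characters.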
It's funny that "security" can be partially bypassed by simply ignoring it in code.
Helmanfrow said
It's funny that "security" can be partially bypassed by simply ignoring it in code.
Once upon a time, basing "security" on the honor system actually worked, I guess.