What Hidden Data Is Lurking in Your PDFs? A Guide to PDF Metadata and Privacy
When you share a PDF, you probably think you are sharing just the visible content — the text, images, and formatting you can see on the page. But PDFs carry a surprising amount of hidden information that you might not want the recipient to see.
This hidden data, called metadata, can reveal your name, your organisation, the software you used, your operating system, the full edit history of the document, and sometimes even text you thought you deleted.
What Metadata Does a PDF Contain?
Most PDF files contain a document information dictionary, and often an XMP metadata stream as well, that store information about the document. Here is what is typically included:
| Metadata Field | What It Reveals |
|---|---|
| Author | The name of the person or account that created the document |
| Creator Application | The software used (e.g., Microsoft Word, Adobe InDesign, Google Docs) |
| Producer | The PDF library or converter used |
| Creation Date | When the document was first created |
| Modification Date | When it was last changed |
| Title & Subject | Document title and subject fields (often auto-filled from the filename) |
| Keywords | Any keywords added to the document properties |
| Operating System | Often inferable from the Creator and Producer strings |
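
As a rough illustration of how exposed these fields are, the sketch below scans the raw bytes of a PDF for the standard information-dictionary keys. It is a crude check rather than a real parser: it assumes the values are stored as plain literal strings, and it will miss hex-encoded or UTF-16 values, encrypted files, and dictionaries packed inside compressed object streams. The function name `scan_info_fields` is made up for this example.

```python
import re

# Standard keys of the PDF document information dictionary.
INFO_KEYS = ("Title", "Author", "Subject", "Keywords",
             "Creator", "Producer", "CreationDate", "ModDate")


def scan_info_fields(pdf_bytes):
    """Return {key: raw value} for Info entries stored as literal strings.

    Assumption: values look like "/Author (Jane Doe)". Nested unescaped
    parentheses, hex strings, and compressed objects are not handled.
    """
    found = {}
    for key in INFO_KEYS:
        # "/Key", optional whitespace, then "(...)" with backslash escapes allowed.
        pattern = rb"/" + key.encode("ascii") + rb"\s*\(((?:\\.|[^\\()])*)\)"
        match = re.search(pattern, pdf_bytes)
        if match:
            found[key] = match.group(1).decode("latin-1")
    return found
```

To try it on a real file, read the file in binary mode and pass the bytes in, for example `scan_info_fields(open("report.pdf", "rb").read())`, where `report.pdf` is a placeholder name. An empty result does not prove the file is clean, only that this simple scan found nothing.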
This is just the standard metadata. PDF files can also contain much more.
The Hidden Data You Might Not Know About
Embedded Image GPS Data
If your PDF contains photos taken on a phone or digital camera, those images may carry EXIF data, including GPS coordinates showing exactly where the photo was taken. When the image is embedded in a PDF, this location data can come along with it, depending on how the tool that built the PDF handles the image.
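
One hedged way to check for this is sketched below. It assumes the photo was embedded as an unmodified JPEG, which is common but not guaranteed, so that the EXIF block appears byte-for-byte inside the PDF. The code looks for EXIF segments and reports whether each one contains the tag that points to GPS data. The function name `exif_blocks_with_gps` is hypothetical, and images that were re-encoded, compressed again, or encrypted will not be detected.

```python
import re
import struct

# JPEG APP1 marker, two length bytes, then the "Exif" identifier before the TIFF header.
EXIF_SEGMENT = re.compile(rb"\xff\xe1..Exif\x00\x00", re.DOTALL)
GPS_IFD_TAG = 0x8825  # IFD0 entry that points at the GPS sub-directory


def exif_blocks_with_gps(pdf_bytes):
    """Return one bool per EXIF block found: True if it references GPS data."""
    results = []
    for match in EXIF_SEGMENT.finditer(pdf_bytes):
        # An APP1 segment is at most 64 KiB, so this window is large enough.
        tiff = pdf_bytes[match.end():match.end() + 65536]
        byte_order = {b"II": "<", b"MM": ">"}.get(tiff[:2])
        if byte_order is None or len(tiff) < 8:
            continue  # not a recognisable TIFF header
        (ifd_offset,) = struct.unpack(byte_order + "I", tiff[4:8])
        if ifd_offset + 2 > len(tiff):
            continue
        (entry_count,) = struct.unpack(byte_order + "H", tiff[ifd_offset:ifd_offset + 2])
        tags = set()
        for index in range(entry_count):
            start = ifd_offset + 2 + 12 * index  # each IFD entry is 12 bytes
            if start + 2 > len(tiff):
                break
            tags.add(struct.unpack(byte_order + "H", tiff[start:start + 2])[0])
        results.append(GPS_IFD_TAG in tags)
    return results
```

A result such as `[False, True]` would mean two EXIF blocks were found and the second one carries a GPS pointer. An empty list only means this particular scan found no raw EXIF blocks.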
Incremental Save History
The PDF format supports incremental updates, often called incremental saves. When you edit a PDF, many applications append the changes to the end of the file rather than rewriting the original content. This means previous versions of the document, including text that was edited or "deleted," may still be recoverable from the file data.
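
A simple way to spot appended revisions is to count the end-of-file markers in the raw bytes, since each saved revision normally ends with its own `%%EOF`. The sketch below does that and also slices out the earlier revisions so you can open them separately. The function names are invented for this example, and one caveat applies: files saved for "fast web view" (linearized PDFs) can contain more than one marker even if they were never edited.

```python
EOF_MARKER = b"%%EOF"


def revision_end_offsets(pdf_bytes):
    """Return the byte offset just past each %%EOF marker in the file."""
    offsets = []
    position = pdf_bytes.find(EOF_MARKER)
    while position != -1:
        offsets.append(position + len(EOF_MARKER))
        position = pdf_bytes.find(EOF_MARKER, position + 1)
    return offsets


def earlier_revisions(pdf_bytes):
    """Return the file contents as they stood at each earlier %%EOF.

    Each returned prefix is a candidate for an older version of the
    document; the final marker is skipped because it is the current file.
    """
    return [pdf_bytes[:end] for end in revision_end_offsets(pdf_bytes)[:-1]]
```

If `revision_end_offsets` returns more than one offset, you could write each item from `earlier_revisions` to a new file and open it in a PDF viewer to see what an older save looked like.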
Redaction Failures
One of the most common hidden-data problems involves redaction. Many people "redact" sensitive text by placing a black rectangle over it. But the text underneath is often still present in the PDF data. Anyone with a basic PDF editor can remove the rectangle, or simply select and copy the text beneath it, and read the original.
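
If you know the exact text you meant to remove, you can run a rough check for it. The sketch below searches the raw file and any page data compressed with the common Flate (zlib) method for that text. It is a minimal sketch under loose assumptions: the function name `text_still_present` is made up, and a `False` result is not proof of a clean redaction, because PDFs often split text into fragments or store it using font-specific codes that a plain search cannot match.

```python
import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords of each PDF stream object.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def text_still_present(pdf_bytes, needle):
    """Return True if `needle` is found in the file or in a Flate-compressed stream.

    Assumption: `needle` contains only Latin-1 characters.
    """
    target = needle.encode("latin-1")
    if target in pdf_bytes:
        return True
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates the trailing newline before "endstream".
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # stream is not Flate data, or uses a filter this sketch ignores
        if target in data:
            return True
    return False
```

A `True` result is a clear warning that the "redacted" text is still in the file.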
Embedded File Paths
PDFs sometimes contain references to files on the creator's computer — full file system paths that can reveal usernames, folder structures, and internal network details.
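
The sketch below shows one way to look for such paths by scanning the raw bytes for strings shaped like Windows drive paths or macOS and Linux home directories. The patterns and the function name `find_local_paths` are assumptions made for illustration. They will miss paths containing spaces on macOS and Linux, and anything stored inside compressed streams.

```python
import re

PATH_PATTERNS = [
    # Windows drive path; backslashes may appear doubled inside PDF strings.
    re.compile(rb'[A-Za-z]:(?:\\{1,2}[^\\/:*?"<>|()\r\n]+)+'),
    # macOS or Linux home-directory path (no spaces, for simplicity).
    re.compile(rb"/(?:Users|home)/[^/\s()<>]+(?:/[^/\s()<>]+)*"),
]


def find_local_paths(pdf_bytes):
    """Return a sorted list of path-like strings found in the raw PDF bytes."""
    hits = set()
    for pattern in PATH_PATTERNS:
        for match in pattern.finditer(pdf_bytes):
            hits.add(match.group(0).decode("latin-1"))
    return sorted(hits)
```

Any hit that includes a username or an internal share name is worth removing before the file leaves your organisation.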
JavaScript and Actions
PDFs can contain embedded JavaScript code and automated actions. While often used for legitimate form functionality, these can also be used to track when and where a document is opened.
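
As a first-pass check, you can count the dictionary keys that usually signal scripts or automatic actions. The sketch below does this on the raw bytes. The key list and the function name `count_active_content` are choices made for this example, not a complete inventory, and the scan can be fooled by names written with hex escapes or hidden inside compressed object streams.

```python
import re

# Name keys commonly associated with scripts, auto-run actions, and outbound requests.
ACTIVE_KEYS = ("JavaScript", "JS", "OpenAction", "AA", "Launch", "URI", "SubmitForm")


def count_active_content(pdf_bytes):
    """Return {key: occurrences} for action-related names found in the raw bytes."""
    counts = {}
    for key in ACTIVE_KEYS:
        # The lookahead stops "/JS" from matching longer names such as "/JSFoo".
        pattern = rb"/" + key.encode("ascii") + rb"(?![A-Za-z0-9_.#-])"
        occurrences = len(re.findall(pattern, pdf_bytes))
        if occurrences:
            counts[key] = occurrences
    return counts
```

A non-empty result is not automatically a problem, since ordinary links use `/URI` and fillable forms often use JavaScript, but it tells you where to look more closely.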
Real-World Consequences
PDF metadata leaks have caused real embarrassment and security incidents:
- Government documents have exposed internal revision history and classified editing notes when metadata was not removed before public release.
- Law firms have accidentally shared negotiation strategies and client information through PDF metadata in court filings.
- Research has shown that a significant percentage of publicly available PDFs from government and security agencies leak operating system and software version information through metadata.
- Businesses have revealed employee names, internal processes, and competitive information through document metadata shared with partners and clients.
Before sharing any PDF externally, treat metadata removal as a required step — not an optional one. Assume every PDF contains data you did not intend to share.
How to Check What Metadata Your PDFs Contain
You can inspect PDF metadata using several methods:
- Adobe Acrobat Reader: Go to File → Properties to see basic metadata fields.
- Mac Preview: Open the PDF, go to Tools → Show Inspector, and click the info tab.
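- Right-click properties: On Windows, right-click the PDF file, select Properties, then the Details tab. Depending on the PDF software installed, this may show only basic file attributes rather than the document's own metadata.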
- Online PDF analysers: Various tools can extract full metadata — but be cautious about uploading sensitive documents to these services (see our article on PDF tool safety).
How to Remove Metadata From Your PDFs
There are several approaches to cleaning metadata from PDFs before sharing them:
Method 1: Re-process Through a Browser-Based Tool
When you run a PDF through a compression or conversion tool, much of the original metadata is stripped in the process. Compressing a PDF with PDFico rebuilds the file structure, which removes most embedded metadata without uploading your document to any server.
Method 2: Convert to Image and Back
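For maximum metadata removal, convert your PDF to images using PDFico's PDF to Image tool, then convert those images back to a PDF using Image to PDF. This rebuilds the document from scratch, so the original metadata, revision history, and any hidden text do not carry over. The new file will still contain whatever minimal metadata the conversion tool itself writes, so check the result. The trade-off is that text in the resulting PDF will not be selectable.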
Method 3: Use Proper Redaction Tools
If you need to redact text, never just cover it with a black box. Use a proper redaction tool that actually removes the underlying text data, not just covers it visually.
Method 4: Add a Watermark for Ownership
While not metadata removal, adding a visible watermark using PDFico's Watermark tool establishes ownership and can deter unauthorized sharing of your documents.
A Simple Metadata Hygiene Routine
Before sharing any PDF externally, follow this quick process:
- Check the properties — look at what metadata the file contains.
- Compress the file — running it through PDFico's compressor strips most metadata and reduces file size.
- Add protection if needed — password-protect the PDF for confidential documents.
- Verify before sending — check the properties one more time to confirm the metadata has been removed.
All of these steps can be done in your browser using PDFico's tools, without uploading your document to any server.
The Bottom Line
PDFs are not as simple as they appear. Behind the visible pages, there is a layer of metadata that can reveal personal information, editing history, and even deleted content. For anyone sharing documents externally — whether for business, legal, medical, or personal reasons — cleaning this hidden data should be standard practice.
The good news is that it takes less than a minute when you use the right tools. And when those tools run entirely in your browser, you do not have to worry about creating new privacy risks in the process of solving old ones.