Document Tracking System
March 2025
I started this project purely out of curiosity, driven by an intriguing hypothesis: that documents are not always opened by their intended recipients. The objective was to build a mechanism that detects when a PDF file is opened, by embedding a tracking element that sends a request to a hosted server. Each request is logged on the backend, so the sender can monitor access to the document and confirm that the intended party has opened it.
Infrastructure + Technology Stack
Backend: Node.js (Express).
Database: Not required for MVP (logs stored in Railway).
Hosting: Railway (deployed successfully).
Tracking Mechanism: Invisible image (1x1 pixel) and alternative approaches.
Steps Taken + Attempted Approach
Setting Up the Backend
Created an Express server to listen for tracking requests.
Configured Railway.app for deployment, ensuring port binding was set up correctly (changed from 3000 to 8080).
Tested the API endpoint manually in a browser to confirm that it logged access requests.
Verified the logs using the railway logs command.
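The backend steps above can be sketched roughly as follows. This is a minimal illustration, not the deployed code: it uses Node's built-in http module instead of Express so that it runs without dependencies, the /track path is a hypothetical name, and it assumes the hosting platform supplies the port through the PORT environment variable, with 8080 as the fallback.

```javascript
const http = require("node:http");

// Widely used 1x1 transparent GIF, base64-encoded.
const PIXEL = Buffer.from(
  "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7",
  "base64"
);

// Assumption: the platform injects PORT at runtime; 8080 is the local fallback.
const PORT = Number(process.env.PORT) || 8080;

// Handles one request. The "/track" path is a hypothetical name.
// `log` is injectable so the handler can be exercised without a running server.
function handleRequest(req, res, log = console.log) {
  const { pathname } = new URL(req.url, "http://localhost");
  if (pathname !== "/track") {
    res.writeHead(404, { "Content-Type": "text/plain" });
    res.end("Not found");
    return;
  }
  // One JSON line per request; anything written to stdout lands in the host's logs.
  log(JSON.stringify({
    event: "pdf_open",
    time: new Date().toISOString(),
    userAgent: req.headers["user-agent"] || "unknown",
  }));
  res.writeHead(200, {
    "Content-Type": "image/gif",
    "Content-Length": PIXEL.length,
    "Cache-Control": "no-store", // discourage caching so repeat opens are logged
  });
  res.end(PIXEL);
}

// Binds to 0.0.0.0 so the port is reachable from outside the container.
function startServer() {
  return http
    .createServer((req, res) => handleRequest(req, res))
    .listen(PORT, "0.0.0.0", () => console.log(`Listening on ${PORT}`));
}
```

Calling startServer() and opening the /track path in a browser should print one JSON line per request, which is the kind of output the railway logs command would then show.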
Embedding Tracking Code in the PDF
Approach 01: Added a background image to a table and hid the table to force an image request.
Issue: Many PDF viewers did not register the hidden table content, so no request was recorded.
Conclusion: Failed attempt.
Approach 02: Added a 1x1 pixel transparent image with the tracking URL that recorded and logged access requests.
Issue: Some access requests were logged, but most PDF viewers and browsers did not load the image, which made this approach unreliable.
Conclusion: Partially successful.
A screenshot of access request logs captured between 06 - 11 March 2025 through Railway.
Approach 03: Created an HTML file with an iframe to load the tracking URL, then converted the HTML file into a PDF.
Issue: Most PDF readers and browsers blocked the iframe from loading.
Conclusion: Failed attempt.
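To make the pixel idea from Approach 02 concrete, here is a small sketch of how the tracking image could be placed in an HTML source before converting it to a PDF. It assumes the PDF is generated from HTML, as in Approach 03. The function names and the example URL are hypothetical, and the conversion step itself is not shown.

```javascript
// Escapes characters that could break out of a double-quoted HTML attribute.
function escapeAttr(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Returns an <img> tag for a 1x1 pixel that points at the tracking endpoint.
function trackingPixelTag(trackingUrl) {
  return `<img src="${escapeAttr(trackingUrl)}" width="1" height="1" alt="">`;
}

// Wraps the document body in a minimal page and appends the pixel at the end.
function buildTrackedHtml(bodyHtml, trackingUrl) {
  return [
    "<!DOCTYPE html>",
    '<html lang="en">',
    '<head><meta charset="utf-8"></head>',
    "<body>",
    bodyHtml,
    trackingPixelTag(trackingUrl),
    "</body>",
    "</html>",
  ].join("\n");
}

// Hypothetical usage; tracker.example.com stands in for the deployed server.
const html = buildTrackedHtml(
  "<h1>Proposal</h1>",
  "https://tracker.example.com/track?doc=demo"
);
```

Whether the image is actually requested still depends on the viewer, so this only shows the markup and does not work around the viewer restrictions described above.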
Findings + Next Steps
Findings
In my tests, browser-based PDF viewers such as Chrome and Edge loaded external images immediately, which worked in my favour.
Native PDF readers (Adobe, Preview on macOS) block tracking images by default.
Automatic tracking of document interaction isn't reliable at the moment.
Next Steps
Explore embedding JavaScript inside an interactive PDF (works in Adobe Acrobat).
Consider alternative document formats that allow tracking more reliably (e.g., hosted HTML pages instead of PDFs).
Investigate third-party solutions that integrate better with PDF tracking.
Add randomised ID numbers to track specific documents, providing accurate information on how many times each one was accessed.
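The randomised ID idea could look something like the sketch below. It is not implemented yet, so everything here is an assumption: the /track path, the doc query parameter, the function names and the example host are all hypothetical. It generates one ID per document with Node's crypto.randomUUID and tallies opens per ID from a list of logged request URLs.

```javascript
const { randomUUID } = require("node:crypto");

// Creates a unique tracking URL for one document.
// `baseUrl` is the address of the tracking server (hypothetical in this sketch).
function createTrackingUrl(baseUrl) {
  const id = randomUUID();
  const url = new URL("/track", baseUrl);
  url.searchParams.set("doc", id);
  return { id, url: url.toString() };
}

// Counts opens per document ID from logged request URLs or paths.
// Entries without a "doc" parameter are ignored.
function countOpens(requestUrls) {
  const counts = new Map();
  for (const raw of requestUrls) {
    const id = new URL(raw, "http://localhost").searchParams.get("doc");
    if (!id) continue;
    counts.set(id, (counts.get(id) || 0) + 1);
  }
  return counts;
}

// Hypothetical usage: one URL per document sent out.
const proposal = createTrackingUrl("https://tracker.example.com");
const opens = countOpens([
  `/track?doc=${proposal.id}`,
  `/track?doc=${proposal.id}`,
  "/track",
]);
console.log(proposal.url, opens.get(proposal.id));
```

A random UUID keeps IDs hard to guess, which matters because anyone who knows a tracking URL can trigger a logged request and inflate the count.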
Potential Use Cases
Business & Marketing
Track when a client opens a business proposal or sales pitch PDF. Follow up at the right time based on the engagement.
Log when a client or partner views a contract before signing. Useful for B2B transactions to track engagement in real time.
Academic & Research
Researchers can track when their paper or dissertation is accessed to measure outreach without relying on third-party services.
Security & Compliance
Track access to confidential reports, financial documents, or NDAs when auditing document distribution in an organisation.
Confirm that recipients have opened official notices or legal warnings in regulated sectors.
Publishing & Media
Track when journalists or media professionals open press kits, so PR teams can time their follow-ups effectively.
Authors can track engagement on preview chapters or self-published e-books to gauge reader interest before a book launch.
Closing Thoughts
This project successfully deployed a tracking backend, but embedding an automatic tracker inside a PDF remains challenging because many PDF viewers and browsers restrict external requests. Future work will focus on overcoming these limitations using JavaScript-enabled PDFs or alternative formats.