Referrer leaks in self-hosted web apps
Referrer headers are a browser mechanism that websites use to track where their visitors come from. When you follow a link from one site to another, your browser will often tell the new site which URL you were previously looking at. The same thing happens when one site contains images, CSS stylesheets or fonts loaded from an external domain.
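As a rough illustration of what the browser sends, the sketch below models the long-standing default behaviour that the Referrer Policy spec calls no-referrer-when-downgrade. The full URL of the current page, minus any #fragment and embedded credentials, goes out with every link click or external resource request. The one exception is when an HTTPS page requests a plain HTTP URL. (The HTTP header itself is spelled Referer.) This is a simplified Python model, not browser code, and the function names and example hostnames are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def strip_for_referrer(url: str) -> str:
    """Drop credentials (user:password@) and the #fragment, which browsers never send."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))


def default_referrer(current_page: str, target: str) -> Optional[str]:
    """Simplified 'no-referrer-when-downgrade': send the full URL, including
    path and query string, unless an https page requests an http URL."""
    if urlsplit(current_page).scheme == "https" and urlsplit(target).scheme == "http":
        return None  # downgrade: no Referer header at all
    return strip_for_referrer(current_page)


# A pad URL leaks to a font host in full, path and all.
print(default_referrer("https://pad.example.org/p/team-secrets#L5",
                       "https://fonts.example.com/font.woff"))
# -> https://pad.example.org/p/team-secrets
```

The path and query string survive intact, which is exactly where self-hosted apps tend to put pad names, repository names and message IDs.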
Sensitive information can be leaked via the Referrer header, and the leaks are subtle and unexpected because the information is sent invisibly and without the user’s consent. In the past, high-profile sites including Facebook, Dropbox, Google and HealthCare.gov have inadvertently leaked information via the Referrer header. I did some testing with a handful of self-hosted web apps to see if they also contained referrer leaks. The process I followed:
- Open the browser’s network inspector to begin recording HTTP requests.
- Navigate to a self-hosted web app and use it normally.
- Check for any requests to third-party sites and make note of URLs contained within their referrer headers.
- If the app doesn’t make any third-party requests on its own, try inserting and clicking on external links. Insert and view externally-hosted images/videos where possible. Referrer leaks are caused by linking or embedding external content, so the goal is to jam it into any place where it gets displayed back to the user.
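The manual steps above can be partly automated. Both Firefox and Chrome can export the network inspector’s log as a HAR (HTTP Archive) file. The sketch below assumes a HAR exported that way and a hypothetical app hostname. It lists every request to another host that carried a Referer header. It treats any hostname other than the app’s own, subdomains included, as third-party, which is a simplification.

```python
import json
from urllib.parse import urlsplit


def third_party_referrers(har: dict, app_host: str) -> list:
    """Return (request URL, Referer value) pairs for requests that went to a
    host other than the self-hosted app and carried a Referer header."""
    leaks = []
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        url = request.get("url", "")
        if (urlsplit(url).hostname or "") == app_host:
            continue  # first-party request, not a leak
        for header in request.get("headers", []):
            # HTTP/2 captures use lowercase header names, so compare case-insensitively.
            if header.get("name", "").lower() == "referer":
                leaks.append((url, header.get("value", "")))
    return leaks


def scan_har_file(path: str, app_host: str) -> None:
    """Load an exported HAR file and print each third-party referrer found."""
    with open(path, encoding="utf-8") as f:
        har = json.load(f)
    for url, referrer in third_party_referrers(har, app_host):
        print(f"{url}\n    Referer: {referrer}")
```

For example, `scan_har_file("session.har", "notes.example.org")` would print each external request alongside the app URL it revealed (the file name and host are placeholders).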
Out of the 24 apps tested, 21 of them could be made to send at least some information to third-party sites via Referrer headers.
| App | Leaks referrers | Notes |
|---|---|---|
| Cozy Contacts | Yes | Only leaks app URL. |
| Cozy Emails | No | Emails are contained within a sourceless iframe. |
| Diaspora | Yes | Referrers can contain profile IDs, post IDs, tag names. |
| EtherCalc | Yes | Sheet name/ID is revealed. Knowing the sheet ID allows editing. |
| Etherpad | Yes | Pad name/ID is revealed. Knowing the pad ID allows editing. |
| Feedbin | Yes | Also sends referrers to Instagram, Stripe, SubToMe and Twitter during setup. |
| GitLab | Yes | Referrers can contain usernames, group names, repo names, branches, filenames, wiki pages, commit IDs. Also sends referrers to Gravatar by default. |
| Gogs | Yes | Referrers can contain usernames, repo names, branches, filenames, commit IDs. Also sends referrers to Bootstrap CDN, jQuery, Gravatar. |
| IPython Notebook | Yes | Referrer contains notebook filename and folder names, which also get sent to MathJax. |
| Mailpile | Yes | Referrer contains thread ID. |
| MediaGoblin | Yes | Referrers can contain usernames, media titles, collection titles. |
| ownCloud Contacts | Yes | Only leaks app URL. |
| Roundcube | Yes | Nothing leaked unless user clicks “display images”. Referrer contains mailbox name, internal message ID. |
| selfoss | Yes | Only leaks app URL. |
| Shaarli | Yes | Referrers can contain tag names. Optionally uses an online redirection service to mask referrers. |
| ShareLaTeX | Yes | Referrer contains project ID. Also sends referrers to Bootstrap CDN, Google Fonts. |
| Shout | Yes | Also sends referrers to Google Fonts. |
| SquirrelMail | Yes | Referrer contains mailbox name, message ID. |
| Tiny Tiny RSS | Yes | Uses |
| wallabag | Yes | Referrers can contain link ID, tag ID, search terms. |
| YaCy | Yes | Referrer contains search terms, page number. Proxies images on the results page. |
Referrer leaks are widespread and vary greatly in severity. Some of the ones shown here are minor and require jumping through hoops to trigger them (Cozy Contacts), while others result in potentially sensitive information being sent to third parties automatically (GitLab, Gogs).
Leaking user data is obviously a problem, but even if the referrer URL doesn’t contain personal information it can still be used to track users. If you visit a page and your browser reveals that you arrived via the Google search page, that’s not a big issue. Lots of people use Google, so it’s not enough information to identify a single person. Arriving from a smaller site narrows down the pool of people, with the extreme case being self-hosted web apps where there is possibly only a single person using the site.
The referrer mechanism has existed in browsers for ages. It is reasonably well understood among web developers and finding leaks in the wild is neither particularly interesting nor technically challenging. I can see why it’s not a high-priority issue, though it would be great if browsers defaulted to protecting their users by not sending referrers at all.
Mozilla has plans to send shorter referrers by default. This is a good first step and would eliminate some of the worst leaks. However, for self-hosted web apps, revealing the domain is enough to track users so shortened referrers alone wouldn’t fully protect them.
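To see why a shortened referrer still identifies users of a self-hosted app, here is a minimal sketch of origin-only trimming, the reduction the Referrer Policy spec calls origin. Whether Mozilla’s plan trims to exactly this form is an assumption. The function name and URLs are hypothetical.

```python
from urllib.parse import urlsplit


def origin_only(referrer: str) -> str:
    """Reduce a referrer to scheme://host[:port]/, dropping path and query."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    if ":" in host:  # re-bracket IPv6 literals
        host = f"[{host}]"
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return f"{parts.scheme}://{netloc}/"


full = "https://git.alice-smith.example/alice/tax-returns/blob/main/2015.md"
print(origin_only(full))  # -> https://git.alice-smith.example/
```

The repository name and file path are gone, but the domain on its own still points at a server that probably has exactly one user.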
Patching the leaks
I think it’s still worthwhile fixing the leaky apps where possible, and recently that has become a lot easier. Limiting referrer headers previously involved ugly hacks such as redirect pages, image proxying and iframe trickery. Now there are saner methods as described in the upcoming Referrer Policy spec. Here’s a test suite to find out which mechanisms work in browsers today.
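One of those saner methods is a response header. The sketch below is a small Python WSGI middleware that adds Referrer-Policy: no-referrer to every response an app serves. This asks the browser to omit the Referer header for all links and embedded resources on those pages. The header name and the no-referrer keyword come from the Referrer Policy spec. Browser support varied while the spec was still a draft, and older browsers may only honour the equivalent <meta name="referrer"> tag or older keywords, so check the results against the test suite. The middleware and demo app are hypothetical.

```python
def no_referrer(app):
    """WSGI middleware: add 'Referrer-Policy: no-referrer' to every response."""
    def wrapped(environ, start_response):
        def start_with_policy(status, headers, exc_info=None):
            # Replace any policy the app set itself, so it can't be weakened.
            headers = [(k, v) for k, v in headers if k.lower() != "referrer-policy"]
            headers.append(("Referrer-Policy", "no-referrer"))
            return start_response(status, headers, exc_info)
        return app(environ, start_with_policy)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical app page that embeds an external image."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b'<img src="https://images.example.com/cat.png">']
```

To try it locally, the standard library is enough: `wsgiref.simple_server.make_server("127.0.0.1", 8000, no_referrer(demo_app)).serve_forever()`. Wrapping the whole app means every page is covered, including ones added later.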
So far I’ve had patches accepted into EtherCalc, Mailpile, Gogs, MediaGoblin and Shout. There’s so much left to do! If you use self-hosted web apps then I would encourage you to check for leaks. The patches are often single-line code changes and are an easy way to begin contributing to a project.
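For apps that render user-supplied links (bookmarks, feed items, chat messages), the change can be as small as adding rel="noreferrer" to external anchors. This tells the browser not to send a Referer header when the link is followed. The helper below is a hypothetical illustration of that idea, not code from any of the patched projects.

```python
from html import escape
from urllib.parse import urlsplit


def render_link(href: str, text: str, app_host: str) -> str:
    """Render an <a> tag, adding rel="noreferrer" when the link leaves the app.

    Note: this does not sanitise dangerous schemes such as javascript:;
    that needs separate handling.
    """
    host = urlsplit(href).hostname  # None for relative links like "/feeds/42"
    attrs = f'href="{escape(href, quote=True)}"'
    if host is not None and host != app_host:
        # The one-attribute fix: no Referer header when this link is clicked.
        attrs += ' rel="noreferrer"'
    return f"<a {attrs}>{escape(text)}</a>"
```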
If all else fails, Firefox and Chrome both have internal settings to disable referrers across all sites, all the time. In Firefox it’s named network.http.sendRefererHeader (set it to 0) and in Chrome it’s the --no-referrers command-line flag. Revisit the test suite after making those changes and you should see all green.