Picking a privacy-first analytics tool

After spending some time trying Cloudflare Web Analytics, then reviewing all the other privacy-first options, we ended up using Plausible Analytics. This post explains the reasons behind that decision.

Our goal at Console is to track as little as possible, collecting only what helps us improve our products: a free weekly email digest of the best tools and beta releases for developers, and our website, which reviews developer tools. To do that, we want to understand the following metrics:

  • How many visitors have been on our website.
  • Where those visitors are referred from.
  • Which pages they visit.
  • Which browser they use, and whether they are on mobile or desktop.
  • How these metrics trend over time.

Using these simple metrics, we can analyze how many people sign up to our newsletter, which pages are the most popular (so we can invest more in those types of pages), whether our marketing is working (paid vs organic), and how we're doing over time. Knowing browser and device type helps us make sure we test on the right platforms.

We don't need anything more detailed, and we certainly don't need or want to track visitors on an individual level. We especially want to avoid services that conduct mass tracking across the internet, such as Google Analytics and Facebook Pixel, because we don't want to contribute to building profiles of people's web browsing activity.
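The metrics listed above only ever need aggregate counts. As a rough illustration of what "aggregate only" can look like, here is a minimal Python sketch that folds each pageview into per-day counters (by page, by traffic source, by device) and keeps nothing about the individual visitor. Pageview, DailyRollup and roll_up() are hypothetical names for this example, not any vendor's API, and a real tool would also need a way to count unique visitors.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable


@dataclass(frozen=True)
class Pageview:
    """A single pageview as received. Deliberately has no user ID or cookie."""
    day: date
    page: str
    referrer_host: str  # empty string when the browser sent no referrer
    is_mobile: bool


@dataclass
class DailyRollup:
    """Aggregate counters for one day; the raw events are not retained."""
    pageviews: int = 0
    by_page: Counter = field(default_factory=Counter)
    by_source: Counter = field(default_factory=Counter)
    by_device: Counter = field(default_factory=Counter)


def roll_up(events: Iterable[Pageview]) -> Dict[date, DailyRollup]:
    """Fold pageviews into per-day counters, keyed by calendar day."""
    rollups: Dict[date, DailyRollup] = {}
    for ev in events:
        r = rollups.setdefault(ev.day, DailyRollup())
        r.pageviews += 1
        r.by_page[ev.page] += 1
        r.by_source[ev.referrer_host or "(direct)"] += 1
        r.by_device["mobile" if ev.is_mobile else "desktop"] += 1
    return rollups


if __name__ == "__main__":
    events = [
        Pageview(date(2021, 8, 2), "/", "news.ycombinator.com", False),
        Pageview(date(2021, 8, 2), "/reviews", "", True),
        Pageview(date(2021, 8, 3), "/", "google.com", False),
    ]
    daily = roll_up(events)
    # Trend over time: total pageviews per day.
    print({d.isoformat(): r.pageviews for d, r in daily.items()})
    # {'2021-08-02': 2, '2021-08-03': 1}
```

Once an event has been counted it can be thrown away, which is the point: the stored data answers "which pages, which sources, which devices, how over time" without ever describing a person.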

Update (2021-08): We have been using Plausible Analytics for 7+ months and are very happy with it. This post has been updated with a more detailed review and the current state of other tools.

Cloudflare Web Analytics

Console is an email newsletter for developers. When I originally wrote this post in 2021-02, the website was a simple static page that allowed visitors to subscribe. That's it. The site is hosted on Cloudflare Workers because it is the best way to build simple logic and deploy to a fast, globally distributed network.

Since we already used Cloudflare, it made sense to try their analytics products first. Like us, they care about privacy, and sticking with their tools would avoid adding system complexity through new vendors. Our tech stack is very simple.

We first tried Cloudflare Web Traffic Analytics, which is part of the Pro plan. All of our content is already served by Cloudflare, so running analytics on those existing logs means no additional telemetry or beacons on the website.

Confusingly, this product is different from Cloudflare Web Analytics, which uses a JS beacon embedded in the web page to measure "real" users, i.e. visitors whose browsers actually execute the JS. I enabled both so that we could compare the stats.

Web Traffic Analytics showed 10x more traffic than Web Analytics was reporting. I had seen a review last year (admittedly from a competitor) that reported the same problem, but I wanted to see it for myself. Unfortunately, this meant that Web Traffic Analytics was useless for us, so I disabled it and fell back to just Web Analytics. Even if we had to use a beacon to ensure the traffic being measured was "real", at least the data would stay inside Cloudflare. Indeed, their announcement blog post was promising:

Being privacy-first means we don’t track individual users for the purposes of serving analytics. We don’t use any client-side state (like cookies or localStorage) for analytics purposes. Cloudflare also doesn’t track users over time via their IP address, User Agent string, or any other immutable attributes for the purposes of displaying analytics — we consider “fingerprinting” even more intrusive than cookies, because users have no way to opt out.

However, after using it for several weeks, it turned out the data retention period for the online dashboard was only 1 week (not documented anywhere). The data seemed accurate and showed the metrics we were after, but we couldn't get any trend data. Trends are important for understanding how we are doing overall and for comparing sources of traffic over time. I asked Cloudflare Support if there was any way to get longer retention, and they said to query the GraphQL API. At this stage I didn't want to build a custom analytics dashboard.

Disappointed, I went hunting for a privacy-first alternative to Cloudflare Analytics.

Requirements

Now that I had to evaluate several products, I needed to pin down my requirements:

  1. Privacy. Google Analytics is not an option because it is part of Google's mass data mining efforts and tracks you all over the internet. We only want to track high level metrics and trends, not individuals. This means making design decisions to preserve privacy, even where that might harm data accuracy. We want "good enough" data, not "perfect". Data should be minimized and anonymized.
  2. SaaS with an open source option. Console deliberately has no servers, so there is nowhere to run any software. Dealing with data storage, traffic spikes and being on call is something I want to avoid until absolutely necessary, so the first choice is a SaaS product. However, this may change in the future, so there should be a way to run the product on our infrastructure. Ideally this means the product is open source. Bonus if we can export the SaaS data for import into the self-hosted version.
  3. Data ownership. Paying for a product means we should own all the data that is generated, in case we want to export it and migrate it on-premises in the future.

With these in mind, I found several options:

Fathom

Fathom is privacy-first and explains how it implements that as well as providing a technical walkthrough of their cookie-less visitor tracking. Unfortunately it is SaaS-only with no open source option, which ruled it out.

Koko Analytics

I use the open source Koko Analytics plugin on my personal WordPress blog, but it is WordPress-specific, so it is not an option for our website.

Matomo

Formerly Piwik and no longer related to PiwikPRO, Matomo is an open source alternative to Google Analytics. This means its goal is to offer much of the same functionality, from user behavior tracking and heatmaps down to A/B testing and funnels.

Matomo has a SaaS version, can be installed on-premise, and provides ownership of the data. However, it is not privacy-first. It has privacy options, but these are not the defaults.

Simple Analytics

Simple Analytics looks nice, but is SaaS-only and not open source. The T&C don't mention anything about who owns the data, so I assume Simple Analytics do.

Plausible

Plausible is a privacy-first analytics product that takes a strict approach to privacy principles: aggregate data only, no cross-device tracking, and daily rollups. They explain the data they collect, with reasoning for each item, and how unique-visitor counting works without cookies:

Plausible attempts to strike a reasonable balance between de-duplicating pageviews and staying respectful of visitor privacy. We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR. Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt. This generates a random string of letters and numbers that is used to calculate unique visitor numbers for the day. Old salts are deleted to avoid the possibility of linking visitor information from one day to the next. Forgetting used salts also removes the possibility of the original IP addresses being revealed in a brute-force attack.
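To make the quoted scheme concrete, here is a minimal Python sketch of a daily-rotating, salted visitor identifier. It is not Plausible's code: DailyVisitorCounter and its methods are hypothetical names, and keyed BLAKE2b from Python's hashlib stands in for whatever hash function Plausible actually uses. Plausible's data policy describes hashing the daily salt together with the website domain as well as the IP address and User Agent, so the sketch mixes the domain in too.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional, Set


class DailyVisitorCounter:
    """Counts unique visitors per day without cookies or stored IP addresses.

    Sketch only: keyed BLAKE2b stands in for Plausible's real hash function.
    """

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self._day: Optional[date] = None
        self._salt = b""
        self._seen: Set[str] = set()

    def _rotate_if_needed(self, today: date) -> None:
        if today != self._day:
            # New day: overwrite the salt so yesterday's identifiers can no
            # longer be recomputed or linked to today's, and drop the old set.
            self._day = today
            self._salt = secrets.token_bytes(32)
            self._seen = set()

    def visitor_id(self, today: date, ip: str, user_agent: str) -> str:
        """Return an opaque identifier that is stable only within one day."""
        self._rotate_if_needed(today)
        # NUL separators keep ("a", "bc") and ("ab", "c") from colliding.
        material = "\0".join((self.domain, ip, user_agent)).encode("utf-8")
        return hashlib.blake2b(material, key=self._salt, digest_size=16).hexdigest()

    def record(self, today: date, ip: str, user_agent: str) -> None:
        """Count a pageview; only the hash is kept, never the IP or User Agent."""
        self._seen.add(self.visitor_id(today, ip, user_agent))

    def unique_visitors(self) -> int:
        """Unique visitors seen so far on the current day."""
        return len(self._seen)


if __name__ == "__main__":
    counter = DailyVisitorCounter("example.com")
    today = date(2021, 8, 2)
    counter.record(today, "203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)")
    counter.record(today, "203.0.113.7", "Mozilla/5.0 (X11; Linux x86_64)")
    counter.record(today, "198.51.100.2", "Mozilla/5.0 (iPhone)")
    print(counter.unique_visitors())  # 2
```

Once the salt is replaced, an earlier day's identifiers can no longer be recomputed from an IP address and User Agent, which is the property the quote relies on. A production version would also have to share the current salt across server instances, which this sketch ignores.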

Plausible is available as a SaaS product with the data owned by the customer, and it is also open source and can be self-hosted. The only thing missing is import/export, but that is planned for the future.

Privacy-first is important because it informs the development philosophy. We want to adhere to the data protection principle of data minimization. This is why we chose Plausible over Matomo, even though both fit our requirements.

Update (2021-08): 7 months on, we continue to use Plausible for our website and blog. It has helped with everything from spotting traffic spikes, such as when one of our developer interviews hit the front page of Hacker News, to tracking the popularity of our developer tool reviews.

We also run paid marketing campaigns, and being able to track clicks has helped us optimize our spending. This works through Plausible's goal conversion tracking for events on the page: newsletter signups (so we can see how people find out about the newsletter) and outbound link clicks (so we can see which tools are most popular).
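The arithmetic behind those spending decisions is simple: a goal's conversion rate for a traffic source is the share of that source's unique visitors who also triggered the goal, such as a newsletter signup. Plausible's dashboard reports goal conversion rates itself, so this minimal Python sketch only illustrates the calculation. It assumes each event arrives as a hypothetical (source, anonymous visitor ID) pair; conversion_rates() is an illustrative name, not part of Plausible's API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# (traffic source, anonymous daily visitor ID); both values are illustrative.
Event = Tuple[str, str]


def conversion_rates(visits: Iterable[Event], goals: Iterable[Event]) -> Dict[str, float]:
    """Percentage of each source's unique visitors who completed the goal."""
    visitors: Dict[str, Set[str]] = defaultdict(set)
    converted: Dict[str, Set[str]] = defaultdict(set)
    for source, visitor in visits:
        visitors[source].add(visitor)
    for source, visitor in goals:
        converted[source].add(visitor)
    # Count a conversion only if that visitor was actually seen from the source,
    # and count repeat goal events from the same visitor once.
    return {
        source: 100.0 * len(converted[source] & ids) / len(ids)
        for source, ids in visitors.items()
    }


if __name__ == "__main__":
    visits = [("paid", "a"), ("paid", "b"), ("paid", "a"),
              ("organic", "c"), ("organic", "d"), ("organic", "e"), ("organic", "f")]
    signups = [("paid", "a"), ("organic", "c")]
    print(conversion_rates(visits, signups))  # {'paid': 50.0, 'organic': 25.0}
```

Comparing these rates per campaign, rather than raw click counts, is what tells you whether a paid source is worth its cost.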

This privacy-friendly approach means that browsers like Firefox and Safari do not block Plausible by default, but we are starting to see some blocklists include the Plausible tracking beacon. One of the big ones is AdGuard, whose list is imported by tools like Pi-hole and Adblock Plus, giving their users the option to block everything.

Even with Plausible blocked by some clients, the stats are still representative enough for our purposes. We don't need 100% precise tracking; we just want to know things like which pages are most popular and where visitors are coming from.
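A rough way to see why blocking does not change which pages come out on top: suppose, as a simplifying assumption, that a fraction b of visitors block the beacon regardless of which page they view. If n_p is the true number of visitors to page p and \hat{n}_p is what the dashboard shows, then:

```latex
\hat{n}_p = (1 - b)\,n_p
\qquad\Longrightarrow\qquad
\frac{\hat{n}_p}{\hat{n}_q} = \frac{(1 - b)\,n_p}{(1 - b)\,n_q} = \frac{n_p}{n_q}
```

Absolute counts are understated by the factor 1 - b, but ratios and rankings between pages are unchanged. If the blocking rate differs by page or source (b_p instead of b), the observed ratio is scaled by (1 - b_p)/(1 - b_q), so audiences that use blockers more heavily will look smaller than they really are.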

Knowing which metrics matter most means we are not affected as privacy becomes a real choice for web users. We've seen this with the email privacy changes Apple is introducing in iOS 15, which we think are a good thing for small publishers:

The Console weekly developer newsletter has never tracked opens. We don't find them useful because our goal is to deliver interesting tools each week. If someone is interested in the tool then they'll click. If not, they won't. We track these clicks to help us manage the list and to give us an indication of whether our selection criteria are working. Click rate is the way to measure engagement because it aligns the incentives of the reader (they want to check out something interesting) with the sender (provide interesting content). You get privacy as part of that.

Plausible provides everything we need from a web analytics tool without sacrificing user privacy.

Conclusions

Plausible and Matomo tick all the boxes from our requirements. Both have SaaS versions we can pay for, collect the data we want, have privacy functionality, and are open source if we ever want to self-host. However, Plausible wins because of how minimalist it is.

Google Analytics used to be the only option for quick and easy analytics. It is still the industry standard and if you can figure out the UI, you can learn a lot about site visitors. But do you really need to know that much detail?

Over the last few years, the true cost of "free" has become clearer - privacy. A few massive tech companies tracking every click across the web is probably not a good idea because of the Panopticon Effect.

I'm pleased to see several options available to site owners. "Privacy-first" is a real principle that can be implemented in code. Small, independent businesses can thrive with a SaaS version of an open source product. SaaS is great when you don't want to manage the infrastructure of data storage and traffic spikes, but there is an exit route if necessary.

We will keep an eye on Cloudflare Web Analytics because the product does what we want all within our existing infrastructure, but today Plausible is the best choice for privacy-first analytics.

Discover the best tools for developers

Console Newsletter - A free weekly email digest of the best tools and beta releases for developers. Every Thursday.