By Yannick Van Avermaet
2017 has been a great year for HTTPS adoption. According to Scott Helme, in August 2017 over 30% of the Alexa Top 1 Million sites were served over HTTPS.
The research also shows that security headers such as Strict-Transport-Security and Content-Security-Policy have gained traction as well, albeit very slowly.
This article won’t cover what HTTPS is, how it works, and so on. If you want to learn more, https://https.cio.gov/ is a good place to start. Instead, I’d like to tell you how we at De Persgroep went from zero to hero(-ish)!
From zero to hero(-ish)
We started looking into HTTPS during October 2016. As with any new feature we asked ourselves: “Why do we need it?”
After a few weeks we concluded that these items were the most important:
- Security & Privacy
- Android and iOS “forcing” apps to communicate via HTTPS only
- HTTP/2
- SEO
- Being able to use certain browser features/APIs that require a secure context, such as Service Worker and Geolocation (see the sketch after this list)
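That last point is easy to verify in the browser: powerful APIs like Service Worker are only exposed in secure contexts. A minimal TypeScript sketch (the /sw.js path is just a placeholder, not an actual script of ours):

```typescript
// Service Worker (like other powerful APIs) is only exposed in secure
// contexts: HTTPS pages, or localhost during development.
if (window.isSecureContext && 'serviceWorker' in navigator) {
  // '/sw.js' is a placeholder path for an actual service worker script.
  navigator.serviceWorker
    .register('/sw.js')
    .then((reg) => console.log('Service worker registered for', reg.scope))
    .catch((err) => console.error('Registration failed:', err));
} else {
  console.warn('Insecure context: Service Worker is unavailable.');
}
```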
With a lot of heart and dedication we continued our research: “How can we accomplish this?”
Theoretically, it seemed easy: you obtain a certificate, install/configure it, change all HTTP references in your codebase to HTTPS and redirect all HTTP traffic to HTTPS. However, in practice there was much more work that needed to be done.
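To sketch that last step: the redirect itself can live in whatever server or proxy sits in front of your site. A hypothetical Node/Express middleware (Express is an assumption for illustration; our production setup differs):

```typescript
import express from 'express';

const app = express();

// Redirect every plain-HTTP request to its HTTPS equivalent.
app.use((req, res, next) => {
  // 'x-forwarded-proto' assumes a reverse proxy or load balancer up front;
  // without one, req.protocol alone is enough.
  const proto = req.headers['x-forwarded-proto'] ?? req.protocol;
  if (proto !== 'https') {
    // 301: tell browsers and crawlers the move is permanent.
    return res.redirect(301, `https://${req.headers.host}${req.originalUrl}`);
  }
  next();
});

app.listen(8080);
```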
Let’s Encrypt
Nowadays, obtaining a certificate is a piece of cake thanks to Let’s Encrypt, a free, automated and open Certificate Authority (CA). They offer Domain Validation (DV) certificates, but not Organization Validation (OV) or Extended Validation (EV) certificates, because the latter two cannot be automated.
This was a no-brainer: we decided to use Let’s Encrypt as our CA. Automation reduced our maintenance cost, and it’s free: double win!
Just to be clear: all three types (DV, OV and EV) offer the exact same encryption and protection for the end user. The only differences are the validation process (are you who you say you are?), how browsers display the padlock and, of course, the price of the certificate.
The culprit: Mixed Content
Obtaining and installing the certificate turned out to be a walk in the park. Changing the HTTP references to HTTPS wasn’t rocket science either. What managed to delay our process significantly was mixed content, in the form of advertising and “free HTML” in our articles.
Mixed content is insecure content (JavaScript, CSS, images, …) that’s loaded on a secure web page. This content is still vulnerable to sniffing and man-in-the-middle attacks. It could therefore be compromised to inject and execute malicious code in the client’s browser.
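To make this concrete: a page served from https://www.example.com that loads a script from http://cdn.example.com is serving mixed content. A deliberately naive TypeScript detector for such references (real mixed-content rules are more nuanced; the function and its regex are mine, purely for illustration):

```typescript
// Finds http:// subresource references (scripts, styles, images, iframes)
// in an HTML string. A naive sketch: real mixed-content detection must
// also consider CSS, fetch/XHR calls, protocol-relative URLs and redirects.
function findMixedContent(html: string): string[] {
  const pattern = /<(?:script|link|img|iframe)[^>]+(?:src|href)=["'](http:\/\/[^"']+)["']/gi;
  const insecureUrls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(html)) !== null) {
    insecureUrls.push(match[1]);
  }
  return insecureUrls;
}

// Example: flags the insecure script reference.
console.log(findMixedContent('<script src="http://cdn.example.com/ad.js"></script>'));
```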
Advertising
We had no problems with our ad server, Google DoubleClick for Publishers (DFP). All requests from and to DFP have been encrypted since June 2015. The problem lies with the actual ad material: the image, or the JavaScript that fetches the image. This is often code provided by the client or a creative partner. If it requests insecure content, there is nothing we can do about it.
For this problem there was only one thing we could do: contact all clients and partners and ask them to serve only TLS-compliant ads. And that’s exactly what we did. Our Ad Operations department contacted all clients and partners and asked them to support HTTPS. Luckily, most of our partners already had the option to serve over HTTPS, and the delay was reduced to a minimum.
This took us a few months to achieve: creating a list of clients and partners, contacting them, waiting for their changes and testing all of them. We decided to test on a small site first (https://www.trouw.nl) to see what the loss in revenue would be when we only request HTTPS-compliant ads. It turned out to be negligible, so we moved on to our other websites. On certain sites we even noticed a revenue increase. At this point we still don’t know exactly why; we assume it’s because certain clients only wanted their ads shown on secure websites.
Free HTML
In order to understand this one, you need to understand the concept of “Free HTML”.
Our editorial system offers journalists certain components when writing an article: a title, an intro, a paragraph, a quote, a tweet, a video embed, … But every so often they want to add something to the article that’s not supported by the system. To do so, they can use the “free HTML” component. As the name implies, it gives the journalist the possibility to add arbitrary HTML to the article.
A journalist could add insecure content to an article, breaking the padlock and causing browser APIs such as Geolocation to fail.
The solution to this problem was twofold: on the one hand we needed to raise awareness among the journalists, and on the other hand we needed to migrate as much “old” free HTML as seemed sensible.
The first part was simple: we informed them (via e-mail, through their managers, …) and added a warning to the free HTML component.
The second part was a bit more complex: our content team analysed all articles containing free HTML components with HTTP references, and counted how often each reference appeared in the past six months. We then tested the top 100 for HTTPS support and contacted everyone who didn’t support it. To my surprise, a lot of them migrated simply because we asked. Some were already in the process of migrating, and some others just didn’t bother.
The team then implemented a simple function that finds these components and replaces the HTTP references with their HTTPS equivalents. This function was executed on our entire archive, updating millions of articles. To this day it still runs whenever a journalist adds free HTML.
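The actual function lives inside our editorial system, but the core idea is plain string rewriting. A minimal sketch of the approach (the function name and the simplistic replacement are mine, not our production code):

```typescript
// Rewrites http:// references inside a free HTML fragment to https://.
// This is only safe because we first verified that the referenced hosts
// actually support HTTPS; otherwise the rewrite would break the embed.
function upgradeReferences(freeHtml: string): string {
  return freeHtml.replace(/http:\/\//gi, 'https://');
}

console.log(upgradeReferences('<img src="http://images.example.com/chart.png">'));
// -> <img src="https://images.example.com/chart.png">
```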
Just like advertising, this took a few months. Migrating to HTTPS isn’t something that’s done overnight, and we couldn’t expect that from an external party, let alone multiple external parties. Slowly but steadily, both they and we were able to migrate successfully.
Reporting
So, we’ve fixed a bunch of mixed content problems, but journalists can still add free HTML and ads can still reference HTTP resources. Luckily for us, there are tools available that inform us of mixed content issues on our sites. They don’t work proactively, but at least we’ll be notified as soon as mixed content appears.
The one we use is Report URI (https://report-uri.io). The tool by itself is not enough, though: we needed to implement the Content-Security-Policy response header on our websites (start with Content-Security-Policy-Report-Only) and add the “report-to” and/or “report-uri” directives (the latter will be deprecated in favor of the former). We won’t go into the details, as they are beyond the scope of this article. However, we will write a separate one in the near future just for this, as implementing CSP is a crucial step towards better security.
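For illustration, enabling report-only mode can be as simple as setting one response header. A sketch using the same hypothetical Express setup as earlier (the Report URI endpoint shown is a placeholder; the service gives you your own URL):

```typescript
import express from 'express';

const app = express();

// Report-only: nothing is blocked yet, but every violation (including
// mixed content) is POSTed to the reporting endpoint as JSON.
app.use((req, res, next) => {
  res.setHeader(
    'Content-Security-Policy-Report-Only',
    // The endpoint below is a placeholder; Report URI provides your own URL.
    'default-src https:; report-uri https://example.report-uri.io/r/default/csp/reportOnly'
  );
  next();
});

app.listen(8080);
```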
Looking back
It took us 5 to 8 months to migrate ~90% of our major news sites (23 out of 25). We still have many smaller sites, which we are gradually migrating as well.
When I first read the article The Guardian has moved to HTTPS back in November 2016, I couldn’t believe how long it had taken them to migrate. I remember thinking “it’s a sure thing” and “two months maximum and all sites will be on HTTPS”. Oh boy, how naive of me… After going through the entire process myself, I can only applaud them for being one of the first media companies to successfully migrate to HTTPS and for sharing their learnings with the rest of us. It helped guide us along the way.
If you haven’t read their article, take the time to do so. It’s definitely worth it.
What’s next?
Our HTTPS migration is still ongoing: each month we’re switching more of our sites to HTTPS. We’re also actively implementing various security headers (Strict-Transport-Security, Content-Security-Policy and others) and working on building the perfect secure cookie.
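To give a flavour of where we’re heading, here is a sketch of those headers plus a hardened cookie, again on a hypothetical Express stack rather than our production configuration:

```typescript
import express from 'express';

const app = express();

app.use((req, res, next) => {
  // HSTS: tell browsers to use HTTPS only, for one year, on all subdomains.
  res.setHeader('Strict-Transport-Security', 'max-age=31536000; includeSubDomains');
  // A basic enforced CSP; a real policy would whitelist ad and CDN hosts.
  res.setHeader('Content-Security-Policy', 'default-src https:');
  next();
});

app.get('/login', (req, res) => {
  // A hardened cookie: HTTPS-only, invisible to JavaScript, same-site.
  res.cookie('session', 'opaque-token-value', {
    secure: true,   // only sent over HTTPS
    httpOnly: true, // not readable via document.cookie
    sameSite: 'lax' // not sent along with cross-site POSTs
  });
  res.send('Logged in');
});

app.listen(8080);
```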