For most internet users, the notion that websites track your browsing habits won’t come as much of a surprise. Most websites that offer a service, particularly news and shopping sites, set cookies in your browser on each visit to help tailor the content you see and to remind you of what you searched for last time.
More advanced providers will even sell this information to the suppliers of the products you view. This means that you could, for example, see an advert for an album on an entirely unrelated part of the web.
None of this is new. However, fresh research from Princeton University alleges that these practices may be far more invasive than most people anticipated.
In the first instalment of a series titled ‘No Boundaries’, three researchers from Princeton’s Center for Information Technology Policy (CITP) analyse how hundreds of the world’s most-visited sites track keystrokes and send that information to third parties.
According to the researchers, these websites are guilty of running software that records mouse clicks and the words users type. The software is also so thorough that if a customer fills out part of a form and then leaves the page without saving or submitting it, what they typed will already have been captured. Readers may recall that Facebook faced significant backlash in 2013 for recording status updates that users typed but never posted.
The software in question consists of ‘session replay’ scripts. Companies typically use these scripts to analyse how customers use their sites and to identify confusing pages that need improvement. However, the scripts have been found to go beyond identifying general trends: they can record and play back individual browsing sessions. The research team also found them running on pages containing sensitive information, such as passwords or medical records.
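To make the mechanism concrete, here is a highly simplified model of what a session replay recorder does. It timestamps every click and keystroke as it happens, so a session can be reconstructed even if a form is never submitted. The `SessionRecorder` class, the `Event` format and the field names are hypothetical illustrations. Real products run as JavaScript in the browser and use their own formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    """One recorded interaction (hypothetical format, for illustration only)."""
    t_ms: int          # milliseconds since the page loaded
    kind: str          # "click" or "key"
    target: str        # identifier of the element interacted with
    value: str = ""    # the character typed, for "key" events


@dataclass
class SessionRecorder:
    """Toy model of a session replay recorder: logs every event as it occurs."""
    events: List[Event] = field(default_factory=list)

    def click(self, t_ms: int, target: str) -> None:
        self.events.append(Event(t_ms, "click", target))

    def key(self, t_ms: int, target: str, char: str) -> None:
        # Each keystroke is appended immediately; nothing waits for a "submit".
        self.events.append(Event(t_ms, "key", target, char))


def replay_fields(events: List[Event]) -> Dict[str, str]:
    """Rebuild what was typed into each field from the raw event log."""
    fields: Dict[str, str] = {}
    for e in sorted(events, key=lambda ev: ev.t_ms):
        if e.kind == "key":
            fields[e.target] = fields.get(e.target, "") + e.value
    return fields


# A visitor starts typing an email address, then leaves without submitting.
rec = SessionRecorder()
rec.click(100, "#email")
for i, ch in enumerate("ann@example.com"):
    rec.key(200 + i * 50, "#email", ch)

# No submit event ever occurs, yet the typed text is already in the log.
print(replay_fields(rec.events))  # {'#email': 'ann@example.com'}
```

The key point the model captures is timing. Because each keystroke is logged the moment it happens, abandoning the form afterwards does not undo the recording.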
“I’m just happy that users will be made aware of it,” he added.
The combination of confusing privacy policies and the level of detail that session replay scripts can capture has led the researchers to conclude that the information we type online cannot “reasonably be expected to be kept anonymous.” Motherboard also notes that some services, such as FullStory, design their tracking scripts so that recordings can be linked to a user’s identity, down to a specific name and email address.
To discover how widespread these services are, Englehardt and co-researchers Gunes Acar and Arvind Narayanan examined seven of the most popular session replay companies: FullStory, SessionCam, Clicktale, Smartlook, UserReplay, Hotjar, and Yandex, Russia’s most popular search engine.
After creating test pages, the researchers ran the software from six of the seven companies. Their findings indicated that at least one of these services is present on 482 of the world’s 50,000 highest-traffic sites (according to their Alexa ranking).
While 482 sites were found, the team added that this could be just the tip of the iceberg. For example, it is ‘likely’ that the scripts do not record every visit to a site. This means that more sites could have had them in place but inactive at the time of testing.
The researchers have published a full list of the sites found using the scripts.
Since the Princeton researchers released their findings, retailer Bonobos and pharmacy Walgreens have said that they would stop using session replay scripts. “We take the protection of our customers’ data very seriously and are investigating the claims made in the study. As we look into the concerns that were raised, and out of an abundance of caution, we have stopped sharing data with FullStory,” a spokesperson from Walgreens told Motherboard.
Bonobos subsequently contacted Wired and said it had “eliminated data sharing with FullStory in order to evaluate our protocols and operations with respect to their service. We are continually assessing and strengthening systems and processes in order to protect our customers’ data.”
Some session replay services offer opt-in redaction tools that let clients exclude sensitive information from recordings. However, despite these features, and although both firms quickly agreed to stop using replay scripts, the apparent widespread use of the software has huge implications for online privacy.
“Collection of page content by third-party replay scripts may cause sensitive information such as medical conditions, credit card details, and other personal information displayed on a page to leak to the third-party as part of the recording,” the researchers write in their post.
Two of the tested firms, UserReplay and SessionCam, were found to exclude all user input from recordings by default while still tracking where users clicked. This is arguably a much safer approach.
However, the team also notes that it is not only what users type that can be sensitive, but also what is displayed on the screen itself. In this regard, they found that “none of the companies appear to provide automated redaction of displayed content by default; all displayed content ends up leaking.”
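The gap between input redaction and displayed content can be illustrated with a small sketch. Assume a hypothetical recorder that captures two things: values typed into form fields, and a snapshot of the text visible on the page. Masking inputs by default, the approach UserReplay and SessionCam were found to take, hides the first but does nothing for the second. The `record` function and the asterisk masking scheme are illustrative assumptions, not any vendor’s actual implementation.

```python
from typing import Dict, List, Tuple


def record(inputs: Dict[str, str], page_text: List[str],
           mask_inputs: bool) -> Tuple[Dict[str, str], List[str]]:
    """Return what a hypothetical replay script would send to its server.

    inputs: form field name -> value the user typed
    page_text: text currently displayed on the page
    mask_inputs: if True, typed values are replaced by '*' of equal length
    """
    if mask_inputs:
        sent_inputs = {name: "*" * len(value) for name, value in inputs.items()}
    else:
        sent_inputs = dict(inputs)
    # Displayed content is copied unchanged: this sketch applies no redaction
    # to it, mirroring the researchers' finding about default behaviour.
    sent_text = list(page_text)
    return sent_inputs, sent_text


inputs = {"password": "hunter2"}
page = ["Welcome back, Jane Doe", "Prescription: example-drug 20mg"]

sent_inputs, sent_text = record(inputs, page, mask_inputs=True)
print(sent_inputs)   # {'password': '*******'}
print(sent_text[1])  # Prescription: example-drug 20mg
```

Even in this toy version, equal-length masking still reveals how long the password is. A fixed-length placeholder would avoid that, but it would still leave the displayed prescription untouched.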
For example, when testing Walgreens.com, which ran FullStory’s scripts with redaction enabled, they found that information such as medical conditions and prescriptions was still being captured by the session replay script, along with users’ real names. In addition, services such as Yandex, Hotjar and Smartlook lacked basic protections such as HTTPS encryption.
Yandex has since pledged to upgrade from HTTP to HTTPS, adding: “HTTP is used intentionally, as session recordings load websites using iframe. Unfortunately, loading http content from https websites is prohibited on the browser level so http player is required to support http websites for this feature.”
Meanwhile, SessionCam CEO Kevin Goodings wrote in a blog post: “I wouldn’t work at SessionCam if the company didn’t have an ethical and proactive stance to security. I spent too long writing about ethical issues in technology and the importance of privacy to take a wage from a company that doesn’t care about those things. The tone of this latest debate about session replay and privacy has been scary and hyperbolic.
“Many of the articles have not actually discussed why companies would use session replay and other analytics tools. Instead, the implication has been one of a terrifying surveillance economy looking at your every move. Journalists have also tended to ignore the intent of the researchers and the selection of session replay providers they chose to cover. It is notable that IBM Tealeaf, a significant player in the space, is not included.
“Session replay isn’t out to get you. Session replay is there to help websites — many of which you probably find annoying or hard to use — to provide you with a better experience.”
For those concerned about session replay scripts, ad-blocking tools such as Adblock Plus protect against all the scripts referred to in the Princeton study. The extension was recently updated in light of the study itself.