AI for Teachers, An Open Textbook: Edition 1

Cookies and fingerprinting


Cookies are small text files that a Web browser places on a user’s computer system for the purposes of tracking and recording that user’s activities on a Web site.

"Cookies technology is not only embedded in the design of contemporary Web browsers, it is also used by major search engine companies to acquire information about users. In so far as these companies place cookies on users’ computer systems, without first getting their consent, they also seem to contribute to, and perhaps even exacerbate, at least one kind of technology-related bias—i.e., one that threatens values such as privacy and autonomy, while favoring values associated with surveillance and monitoring. However, since this kind of bias also applies to design issues affecting Web browsers, it is not peculiar to search engines per se." Tavani, H., Zimmer, M., Search Engines and Ethics, The Stanford Encyclopedia of Philosophy (Fall 2020 Edition), Edward N. Zalta (ed.)

"1994 The HTTP cookie is developed.
In its development stages, web users could fully restrict what data cookies could collect. However, data privacy measures were quickly removed, and users lost the power to control cookie data before the technology became widespread." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

"1996 Ad networks (platforms that serve as brokers between groups of publishers and groups of advertisers) increasingly emerge, including Doubleclick (now owned by Google)."..."1998 Open Profiling Standard (OPS) is bought and rolled out by Microsoft. OPS could securely store and manage individuals’ personal information and credit card details, allowing user profiles to be exchanged between vendors." ... "2008
Behavioral targeting begins to be integrated into real-time bidding, marking a crucial shift away from media content toward user behavior as key to targeting." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

Cookies and Beyond

"Of course, there are benefits to having services algorithmically rendered “more relevant”: cookies streamline site visits by storing user details, autofilling technologies can quickly complete registration forms, and filtering systems manage otherwise unmanageable amounts of content, all while the data needed for such user benefits is doubly harnessed to make platform profits. Despite (or indeed because of) its monetizable qualities, targeting creates a host of stark ethical problems in relation to identity articulation, collective privacy, data bias, raced and gendered discrimination and socioeconomic inequality." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

"2020 Apple bans third-party cookies and Google pledges to do so by 2022, prompting debates on the so-called “cookie apocalypse.”
Though welcomed by privacy-concerned users, third-party marketing companies such as Criteo experience a fall in share values and argue that the erasure of third-party cookies gives even more power to monopolistic first-party data trackers." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

"The HTTP cookie is “a way of storing information on the user’s computer about a transaction between a user and a server that can be retrieved at a later date by the server.” Cookie tracking works by storing this text file on a user’s computer and sending it to either third- or first-party cookie trackers, who then use this data to attribute characteristics to the user in the form of demographic profiling and other profiling mechanisms. It is important to note that cookies ultimately only capture information that is decipherable through abstracted correlation and “pattern recognition.” These abstract identifiers are then translated back into marketing demographic profiles by data brokers: computational referents of correlational and networked positionality are converted into “man,” “woman,” and so on by complex pre- and post-cookie data categorizations. It is the rendering of cookie data into “traditional social parameters” that makes cookie tracking so common and profitable." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

"Cookieless tracking refers to identifying and anticipating users through technologies alternative to the HTTP cookie. Common types of tracking have included Flash and canvas “fingerprinting,” which are seen as preferential to cookie tracking since fewer web users are aware of these technologies and they cannot be easily deleted. Third-party cookie aggregation is set to be banned by Google and other platforms by 2022. This is partially in response to privacy concerns: however, as the Electronic Frontier Foundation notes, Google is essentially replacing third-party cookie tracking with a new experimental tracking system that still works by “sorting their users into groups based on behavior, then sharing group labels with third-party trackers and advertisers around the web,” but in ways that users cannot necessarily know about or consent to." Kant, T., Identity, Advertising, and Algorithmic Targeting: Or How (Not) to Target Your “Ideal User.” MIT Case Studies in Social and Ethical Responsibilities of Computing, 2021

The easiest and most common method that Web developers use to passively collect user data is through cookies, which are relatively small files that store user-specific information such as preferences, account information, recent site activity, and the contents of a shopping cart. Your browser

Spencer, Stephan. Google Power Search: The Essential Guide to Finding Anything Online With Google (pp. 111-112). Koshkonong. Kindle Edition.

"More recently, Libert studied third-party HTTP requests on the top 1 million sites [31], providing view of tracking across the web. In this study, Libert showed that Google can track users across nearly 80% of sites through its various third-party domains. Web tracking has expanded from simple HTTP cookies to include more persistent tracking techniques. Soltani et al. first examined the use of flash cookies to “respawn” or re-instantiate HTTP cookies [53], and Ayenson et al. showed how sites were using cache E-Tags and HTML5 localStorage for the same purpose [6]. These discoveries led to media backlash [36, 30] and legal settlements [51, 10]. ... Device fingerprinting is a persistent tracking technique which does not require a tracker to set any state in the user's browser. Instead, trackers attempt to identify users by a combination of the device's properties." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/

"the tool is less effective for obscure trackers (prominence < 0.1). In Section 6.6, we show that less prominent fingerprinting scripts are not blocked as frequently by blocking tools. This makes sense given that the block list is manually compiled and the developers are less likely to have encountered obscure trackers." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/

"Cookie syncing, a workaround to the Same-Origin Policy,
allows different trackers to share user identifiers with each other. Besides being hard to detect, cookie syncing enables back-end server-to-server data merges hidden from public view, which makes it a privacy concern. ... Most third parties are involved in cookie syncing. ... More interestingly, we find that the vast majority of top third parties sync cookies with at least one other party: 45 of the top 50, 85 of the top 100, 157 of the top 200, and 460 of the top 1,000. This adds further evidence that cookie syncing is an under-researched privacy concern." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/
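
The ID-sharing step behind cookie syncing, as described in the passage above, can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions: the tracker names, the `partner_uid` query parameter, and the `.example` URLs are hypothetical and are not taken from the study. One tracker sends the browser to a partner's endpoint with its own identifier in the URL; the partner reads its own cookie from that same request and records the pair.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse


def build_sync_url(partner_endpoint: str, own_uid: str) -> str:
    """Tracker A: embed its own user ID in a request to tracker B's endpoint."""
    return f"{partner_endpoint}?{urlencode({'partner_uid': own_uid})}"


class PartnerTracker:
    """Tracker B: pairs the ID found in the URL with the ID from its own cookie."""

    def __init__(self) -> None:
        self.id_map: Dict[str, str] = {}  # B's ID -> A's ID

    def handle_sync(self, request_url: str, own_cookie_uid: str) -> None:
        query = parse_qs(urlparse(request_url).query)
        self.id_map[own_cookie_uid] = query["partner_uid"][0]


# Hypothetical endpoint and IDs, for illustration only.
url = build_sync_url("https://tracker-b.example/sync", "A-123")
tracker_b = PartnerTracker()
tracker_b.handle_sync(url, own_cookie_uid="B-987")
print(tracker_b.id_map)  # {'B-987': 'A-123'}
```

Once both sides hold such a mapping, they can merge what each knows about the same browser on their servers, which is the part the authors describe as hidden from public view.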

"Canvas Fingerprinting
Privacy threat. The HTML Canvas allows web application to draw graphics in real time, with functions to support drawing shapes, arcs, and text to a custom canvas element. In 2012 Mowery and Shacham demonstrated how the HTML Canvas could be used to fingerprint devices [37]. Differences in font rendering, smoothing, anti-aliasing, as well as other device features cause devices to draw the image differently. This allows the resulting pixels to be used as part of a device fingerprint. ... Comparing our results with a 2014 study [1], we find three important trends. First, the most prominent trackers have by-and-large stopped using it, suggesting that the public backlash following that study was effective. Second, the overall number of domains employing it has increased considerably, indicating that knowledge of the technique has spread and that more obscure trackers are less concerned about public perception. As the technique evolves, the images used have increased in variety and complexity, as we detail in Figure 12 in the Appendix. Third, the use has shifted from behavioral tracking to fraud detection, in line with the ad industry's self-regulatory norm regarding acceptable uses of fingerprinting.
6.2 Canvas Font Fingerprinting
Privacy threat. The browser's font list is very useful for device fingerprinting [12]. The ability to recover the list of fonts through Javascript or Flash is known, and existing tools aim to protect the user against scripts that do that [41, 2]. But can fonts be enumerated using the Canvas interface?" Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/

"WebRTC-based fingerprinting
Privacy threat. WebRTC is a framework for peer-to-peer Real Time Communication in the browser, and accessible via Javascript. To discover the best network path between peers, each peer collects all available candidate addresses, including addresses from the local network interfaces (such as ethernet or WiFi) and addresses from the public side of the NAT and makes them available to the web application without explicit permission from the user. This has led to serious privacy concerns: users behind a proxy or VPN can have their ISP's public IP address exposed [59]. We focus on a slightly different privacy concern: users behind a NAT can have their local IP address revealed, which can be used as an identifier for tracking." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/

"AudioContext Fingerprinting
The scale of our data gives us a new way to systematically identify new types of fingerprinting not previously reported in the literature. The key insight is that fingerprinting techniques typically aren't used in isolation but rather in conjunction with each other. ... This is conceptually similar to canvas fingerprinting: audio signals processed on different machines or browsers may have slight differences due to hardware or software differences between the machines, while the same combination of machine and browser will produce the same output." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/
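
The quoted passages describe fingerprinting as combining several device properties into one identifier, with no state stored in the browser. A minimal Python sketch of that idea follows, assuming the individual signals have already been collected. The property names and values (`canvas_hash`, `audio_hash`, `fonts`, `timezone`) and the choice of SHA-256 are illustrative assumptions, not the method of any script in the study.

```python
import hashlib
import json
from typing import Any, Dict


def device_fingerprint(properties: Dict[str, Any]) -> str:
    """Hash a set of device properties into one short, stable identifier."""
    # Sorting keys makes the result independent of collection order.
    canonical = json.dumps(properties, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical readings from one device.
laptop = {
    "canvas_hash": "9f2c",
    "audio_hash": "41ab",
    "fonts": ["Arial", "Noto Sans"],
    "timezone": "Europe/Paris",
}
same_laptop = dict(reversed(list(laptop.items())))  # same values, other order
other_device = {**laptop, "audio_hash": "77d0"}     # one signal differs

print(device_fingerprint(laptop) == device_fingerprint(same_laptop))   # True
print(device_fingerprint(laptop) == device_fingerprint(other_device))  # False
```

Because the identifier is recomputed from the device on every visit, clearing cookies does not change it, which is why the authors call the technique persistent.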

"Battery API Fingerprinting
As a second example of bootstrapping, we analyze the Battery Status API, which allows a site to query the browser for the current battery level or charging status of a host device. Olejnik et al. provide evidence that the Battery API can be used for tracking [43]. The authors show how the battery charge level and discharge time have a sufficient number of states and lifespan to be used as a short-term identifier. These status readouts can help identify users who take action to protect their privacy while already on a site. For example, the readout may remain constant when a user clears cookies, switches to private browsing mode, or opens a new browser before re-visiting the site. We discovered two fingerprinting scripts utilizing the API during our manual analysis of other fingerprinting techniques. ... The second script, http://js.ad-score.com/score.min.js, queries all properties of the BatteryManager interface, retrieving the current charging status, the charge level, and the time remaining to discharge or recharge. As with the previous script, these features are combined with other identifying features used to fingerprint a device." Englehardt, S., Narayanan, A., Online Tracking: A 1-million-site Measurement and Analysis, Extended version of paper at ACM CCS 2016, https://webtransparency.cs.princeton.edu/webcensus/
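
The re-identification scenario in the passage above (a battery readout that stays constant after the user clears cookies) can be sketched in Python as follows. This is a deliberately simplified illustration: the `BatteryLinker` class, its method names, and the sample values are hypothetical, and a real tracker would also have to limit matches to a short time window and combine the readout with other signals.

```python
from typing import Dict, Optional, Tuple

# (charge level, charging?, seconds until discharged)
Readout = Tuple[float, bool, float]


class BatteryLinker:
    """Re-links a visit that arrives without a cookie to a recently seen ID."""

    def __init__(self) -> None:
        self._last_seen: Dict[Readout, str] = {}

    def observe(self, cookie_id: Optional[str], readout: Readout, new_id: str) -> str:
        if cookie_id is None and readout in self._last_seen:
            # Cookie is gone, but the battery state matches an earlier visit.
            visitor_id = self._last_seen[readout]
        else:
            visitor_id = cookie_id or new_id
        self._last_seen[readout] = visitor_id
        return visitor_id


linker = BatteryLinker()
readout = (0.43, False, 8940.0)

first = linker.observe("abc", readout, new_id="n1")   # normal visit with cookie
second = linker.observe(None, readout, new_id="n2")   # cookies cleared, same battery state
third = linker.observe(None, (0.90, True, 0.0), new_id="n3")  # unrelated device

print(first, second, third)  # abc abc n3
```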


Goliath : "Cookies weren’t intended to be surveillance devices; rather, they were designed to make surfing the web easier. Websites don’t inherently remember you from visit to visit or even from click to click. Cookies provide the solution to this problem. Each cookie contains a unique number that allows the site to identify you. So now when you click around on an Internet merchant’s site, you keep telling it, “I’m customer #608431.” This allows the site to find your account, keep your shopping cart attached to you, remember you the next time you visit, and so on.
Companies quickly realized that they could set their own cookies on pages belonging to other sites—with their permission and by paying for the privilege—and the third-party cookie was born. Enterprises like DoubleClick (purchased by Google in 2007) started tracking web users across many different sites. This is when ads started following you around the web."
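
The "customer number" mechanism described above can be sketched with Python's standard `http.cookies` module. The cookie name `visitor_id` and the in-memory `carts` dictionary are assumptions made for illustration; the sketch shows only the round trip of an identifier between server and browser, not a complete web application.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Dict, List, Optional

carts: Dict[str, List[str]] = {}  # server-side state keyed by the cookie's ID


def issue_cookie() -> str:
    """Server: build a Set-Cookie header value carrying a fresh unique ID."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex
    cookie["visitor_id"]["path"] = "/"
    return cookie["visitor_id"].OutputString()


def read_visitor_id(cookie_header: str) -> Optional[str]:
    """Server: recover the ID from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


set_cookie = issue_cookie()               # "visitor_id=<32 hex chars>; Path=/"
cookie_header = set_cookie.split(";")[0]  # the browser returns only name=value
visitor = read_visitor_id(cookie_header)
carts.setdefault(visitor, []).append("notebook")  # cart stays attached to the ID
```

The cookie itself holds nothing but the identifier; everything the site knows about the visitor is stored on the server under that key. A third-party cookie works the same way, except that the identifier is set and read by a domain other than the site being visited.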

Goliath : "Today, Internet surveillance is far more insistent than cookies. In fact, there’s a minor arms race going on. Your browser—yes, even Google Chrome—has extensive controls to block or delete cookies, and many people enable those features. DoNotTrackMe is one of the most popular browser plug-ins. The Internet surveillance industry has responded with “flash cookies”—basically, cookie-like files that are stored with Adobe’s Flash player and remain when browsers delete their cookies. To block those, you can install FlashBlock.
But there are other ways to uniquely track you, with esoteric names like evercookies, canvas fingerprinting, and cookie synching. It’s not just marketers; in 2014, researchers found that the White House website used evercookies, in violation of its own privacy policy. I’ll give some advice about blocking web surveillance in Chapter 15.
Cookies are inherently anonymous, but companies are increasingly able to correlate them with other information that positively identifies us."
