First They Came For The Data
1.2
Welcome back or welcome to arachne 🕸️! This week we’ve got some ramblings on data privacy. Always something to talk about there. Plus some shopping suggestions and why you needed to create a password for this newsletter. Thanks for reading.
Feature
I cannot quantify for you how much information there is. I mean this in both literal and metaphorical senses.
I cannot tell you how much data there is on this planet. There is no useful way to put a number on the actual count of gigabytes of encoded information. New data is born faster than people are. One source I found suggests it could be in the 120 zettabyte range… or 120 trillion gigabytes. Your phone probably has, say, 128 GB in it. That’s 937.5 billion phones’ worth.
I cannot even come up with a meaningful analogy. The number of gigabytes of information on the planet is somewhere in the realm of the number of red blood cells in six adult human beings. But somehow that doesn’t even seem like a lot. Until you remember that there are around 5 million red blood cells in one microliter of blood.
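If you want to check my math, here is a rough sketch in Python. The 120 zettabyte estimate and the 128 GB phone come from the text above; the blood figures (about 5 liters per adult, about 5 million red blood cells per microliter) are round-number assumptions of mine, so treat the result as a ballpark and not a measurement.

```python
# Back-of-the-envelope arithmetic, using decimal (SI) units throughout.
GB_PER_ZETTABYTE = 10**12        # 1 ZB = 10^21 bytes = 10^12 GB

world_data_zb = 120              # estimate quoted in the text
phone_storage_gb = 128           # example phone from the text

world_data_gb = world_data_zb * GB_PER_ZETTABYTE
phones_needed = world_data_gb / phone_storage_gb

# Assumed round numbers (not from the newsletter's source):
RBC_PER_MICROLITER = 5_000_000
MICROLITERS_PER_LITER = 1_000_000
BLOOD_LITERS_PER_ADULT = 5

rbc_per_adult = RBC_PER_MICROLITER * MICROLITERS_PER_LITER * BLOOD_LITERS_PER_ADULT
adults_equivalent = world_data_gb / rbc_per_adult

print(f"{world_data_gb:,} GB of data")
print(f"{phones_needed:,.1f} phones at {phone_storage_gb} GB each")
print(f"about {adults_equivalent:.1f} adults' worth of red blood cells")
```

With these assumptions the comparison lands closer to five adults than six, which is the same ballpark; a slightly lower blood volume or cell count gets you to six.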
A large share of that information is unlikely to be available to you at this moment. It sits behind paywalls, it lights up the pixels in military machines, it amasses in troves of corporate emails. Another large share of that information is data you will likely never come in contact with. The depths of YouTube, an awful Netflix show, years and years of audiobooks that don’t interest you. And then of course there’s the information you will come in contact with, like a Netflix show you like. You’ll stream it and that data will travel along miles of wires to your Roku and turn into colors on your TV.
But then there’s this other pile. What’s in it? Well, here are some examples:
email addresses, phone numbers, names, social security numbers, routing numbers, home addresses, health insurance ID numbers, mothers’ maiden names, credit card numbers, dates of birth, zip codes, shoe sizes
Mountains and mountains of data. All of it willingly offered at checkout, in Google Forms, on login pages.
For around 15 years, the internet has been funded by these sorts of identity markers and the online behaviors of individual users that coincide with them. I mean this literally; the internet as we know it today would not exist without the collection of your data, the synthesis of that data, and the recommendation of advertising that comes with it. Entire sites like Facebook, Pinterest, YouTube, Google, BuzzFeed, and on and on and on are advertising companies fronted by other operations.
For a moment, let’s follow the journey of an individual mobile internet user. For the sake of this hypothetical, we are going to assume that they have never used the internet before. They are creating an account for Boogle, a service that provides an internet browser, the world’s most powerful search engine, document collaboration software, email, cloud storage, video entertainment, calendar and task management, video conferencing, a map app, a weather app, and photo library management. Damn, that feels like a lot, maybe someone should break up Boogle or its competitor Tapple. The user opens up the Boogle internet browser and makes a Boogle account:
Name: Testy McTesterson
Email: testy@bmail.com
So they’ve got an account. They see an offer for new users! They can get 1 TB of Boogle Drive storage for a year for free! They click on the offer. The click gets stored and tracked. They set up cloud storage, adding a credit card, and link their photo library to Boogle Photos. Boogle Photos automatically begins looking at Testy’s photos, identifying pets, people, locations, and text. A screenshot in their photos reminds Testy to look up lawn chairs in a new tab. Their search gets stored and tracked.
Their browser asks for Testy’s location. This will allow automatic news and weather generation. Well, that’s convenient, Testy thinks. They hit allow.
So let’s audit some information Testy has given Boogle so far:
Name, Email, computer IP address, interest in cloud storage, purchase of cloud storage, credit card information, 1057 images, search for lawn chairs, exact location.
Testy offers up all of this information, freely and without reservation, and Boogle provides a host of useful technologies that will help Testy be more ✨productive✨.
In the coming weeks, Testy will leave a trail of cookies. Every move they make on the Boogle browser, every email that they send, receive, or open, every video they watch on BooTube, will be collected, stored, analyzed, and sold. Sold to local businesses advertising in Testy’s town, sold to lawn chair companies, sold to Procter & Gamble, Unilever, Pizza Hut, etc.
In the coming months and years, billions of people every single day will feed Boogle more and more information about interests, political affiliations, gender expression, exact locations, credit card charges…
There are two primary antagonists that make this situation bad.
Corporate power. There are many hypothetical reasons why one single corporate entity knowing an enormous amount about you seems bad, but there are real-world ones as well. The data sourced by individual companies has led to the disruption of democratic elections around the globe (Facebook), the de-competition of industries from paintball guns to paperbacks (Amazon), and class action lawsuits over the unlawful collection of biometric data (Google).
Independent bad actors. “Hackers,” scammers, fraudsters, and bottom feeders of that ilk. Imagine, for example, you spread out your most valuable stuff between three different homes hundreds of miles apart. If someone wanted to steal it all, they’d have to pull off a massive coordinated attack, one that addresses the unique locks and security systems on each home. Now imagine all of your stuff is under one roof in one house. We’ll call that house Target. Bad actors only have to crack that one place for all of your credit card info, your home address, your name, email, phone number, etc.
I want to note here how hackers typically work. There is not, in all likelihood, some random person in a hoodie specifically seeking you out, tracking your keystrokes, following you around the internet, and trying to make purchases on your credit card you won’t notice. Your data sits in massive troves with other people’s information. Despite being safeguarded with high budgets and encryption, it is not invulnerable. Some attacks in the past have been the result of data breaches that give bad actors millions of personal data points. For example, hackers have used enormous data sets of user passwords not necessarily to find your password, but to find the most commonly used passwords. They can run programs that try the highest-likelihood passwords on millions of emails and only gain access to a small percentage of them, but that small percentage can yield a lot in ransomware attacks and other bullshit.
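The "most commonly used passwords" idea is easy to see from the defensive side. Here is a minimal sketch in Python that counts how often each password shows up in a list and checks whether a given password is one of the usual suspects. The sample list is made up for illustration, and the function names are my own; real breach data sets are vastly larger.

```python
from collections import Counter

# Hypothetical, made-up sample standing in for a leaked password list.
leaked_passwords = [
    "123456", "password", "123456", "qwerty", "letmein",
    "123456", "password", "sunshine", "qwerty", "123456",
]

def most_common_passwords(passwords, top_n):
    """Return the top_n most frequent passwords, most frequent first."""
    return [pw for pw, _count in Counter(passwords).most_common(top_n)]

def is_commonly_used(candidate, passwords, top_n=3):
    """Return True if candidate is among the top_n most frequent passwords."""
    return candidate in most_common_passwords(passwords, top_n)

top_three = most_common_passwords(leaked_passwords, 3)
print(top_three)
print(is_commonly_used("password", leaked_passwords))
print(is_commonly_used("correct horse battery staple", leaked_passwords))
```

If your password would show up near the top of a list like this, it is exactly the kind an attacker tries first, which is one more argument for a password keeper that generates unique ones.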
But why should you care? Isn’t this just the cost of entry to doing anything on the internet? It has become the cost of entry, but it certainly does not have to be.
One reason I am sounding an alarm here about offering up so much of your information to these companies is the reactionary, fascistic behavior of the Republican Party since 2020. Despite insisting they love small government, they have used incredibly powerful tools of the state to create open hostility and violence against queer people, people of color, and women.
Since the overturning of Roe v. Wade, multiple states have signed into law policies essentially eliminating access to abortion. These laws criminalize people seeking out an abortion and the doctors who perform them. In some cases, they provide benefits to people who snitch on those people. In a world in which the state and citizens are emboldened to report the private behavior of patients and doctors, the internet becomes a fertile ground for criminalization. Corporate institutions, despite how much they insist that they will not bend the knee to the state, could be forced to give up data. And since our country does not have robust data privacy laws on the books, the nature of the search and seizure of that information could get murky. Independent bad actors could go hunting for data markers of people seeking abortions and their doctors: gender, age, income, and so on. Some apps help users track menstrual cycles, wellbeing, or diet without guarantees of end-to-end encryption. In a world that fast-tracked the Patriot Act, it doesn’t actually seem that far-fetched to future-scope the invasive possibilities of a theo-fascist American government.
As drag and non-conforming gender expression continue to be criminalized and antagonized, online behavior will come under scrutiny. As you read this, Texas is attempting to put into effect a bill for “child safety” online. It specifically refers to “grooming” as a prohibited behavior that companies like Facebook or Google would need to eradicate from their platforms. As you might know, Republicans have long used the language of grooming as a dog whistle for homophobia and transphobia. Under laws like this, states can not only decree the terms of what is “allowed” on these platforms, but also open the doors for targeted harassment, threats, and violence against queer and gender non-conforming people. In environments like Twitter where private or even deleted data can seem to find its way in front of an audience, this is incredibly dangerous.
There is always money in hate. The consolidation of your information under these pliable corporations can be, and has been, used to stand up tyrants, increase the wealth gap, and enact state, corporate, and individual violence against the oppressed.
I worry that I am not making my point adequately. I’ll just say that. I feel like I have both understated and overstated the problem. Dabbling in real-world examples and extending them to hypotheticals can often be seen as hyperbolic catastrophizing. But I am reminded of the long history of tools fascists will use to accumulate power and single out groups. First they will come for trans and gender non-conforming people’s data, and you might not speak out because you don’t have that data. But then they will come for your data, and there will be no one left to speak for you.
These institutions have a financial incentive to own your existence. If your existence becomes criminalized, they will have a financial incentive to criminalize you too.
What can you do about it? A couple things.
1. Keep your data incredibly close to your chest. For me, that means going into my Google settings to ensure that my browser, email, and maps data is wiped automatically every three months. This way, I feel more comfortable clicking the “Sign in with Google” button on various sites. I can be more assured that third parties have limited access to information about me. Consider using more private browsers like Safari or DuckDuckGo.
2. Avoid clicking “Sign in with Google,” or any of the other sign in integrations. I am not very good at this. The convenience is just too much. But this gets at the importance of siloing your information across different servers. Of the available “sign in with” options, Apple likely has the best, most private one.
3. Get a robust password keeper. I keep multiple password libraries, which can often have discrepancies when I update a password on one and not another. If you heed #2, you’ll need this.
4. Tread cautiously. It is better to assume your actions and information are being tracked than not. If you are a Chrome user, every move you make is being collected in some way, even if you are only navigating incognito or on a VPN. Safari is slightly less invasive, but different individual sites may track you. Modern browsers and encryption are incredibly good at rooting out sketchy sites, but they often fail to illustrate the sketchiness of the companies that manage them.
5. Question everything. I often ask myself “Is the value I am providing this company by giving them my information returned to me in a useful way?” If the answer is no, I provide them very little or nothing at all.
6. Enable two-factor authentication on your accounts whenever possible.
7. Regularly review and update your privacy settings on social media and other online accounts to limit the amount of personal information that is shared.
8. Be mindful of the apps you download and the permissions you grant them on your device. Here are some permissions to be aware of:
Location access
Camera access
Microphone access
Contacts access
Calendar access
Email access
Local device access (Bluetooth, etc.)
Health data access
Facial/fingerprint identification access
What can we do about it? We must advocate for good faith data privacy legislation and corporate policy. Illinois is currently the nation’s gold standard in digital privacy legislation, creating restrictions on the ways companies can retrieve and use your data. More states should adopt policy like this.
We should work towards reconstructing the economic model of the internet, away from advertising and toward direct payment for goods and services. As the saying goes, if you are getting something for free, you are the product. Services that reward this sort of peer-to-peer or direct payment include Patreon, Squarespace, Substack, and OnlyFans.
We should decentralize the internet and break up the large tech firms. Currently, there is a significant movement toward decentralization. We have Elon Musk to thank for that. Social sites like Mastodon and Bluesky work on a different protocol than is typical, spreading out data management over many server nodes. But that’s a story for another Feature.
I am certain that I will often return to this subject. I will likely make corrections and updates to this argument in the future, but for now I just wanted to get this conversation started. I haven’t even brought up the ways this data is used for recommendation algorithms, and that’s a whole debacle.
Reading list
This week I want to highlight some recent talk about data.
This week on the Vergecast, the challenges of protecting kids on the internet. You can skip the Xbox part, lol:
An article about why digital ads seem so bad:
Why Are You Seeing So Many Bad Digital Ads Now? by Tiffany Hsu at the New York Times
Lightning
Remember that AI is very good at sounding confident. Engage with chatbots as you might with a new intern. They require clear prompting, a little hand holding, and verification.
If you’re looking for new toys, I typically check a couple places.
Facebook Marketplace
CTBids, an online estate sale service
There’s a big fight going on over Microsoft’s bid to purchase Activision. It’s messy and silly, but I am opposed to the merger. Tech consolidation is typically bad for creative laborers.
Based on continued reports of negative experiences in the animation and visual effects industries, I would not be surprised if a large-scale unionization effort forms for these kinds of workers. It’s difficult ground; much of this labor is already outsourced to Canadian, New Zealand, and South Korean firms, but something has gotta give.
My favorite app this week: NYT Games.
I really want to play with a 15-inch MacBook Air.
Glossary
PII
Personally identifiable information (PII) is any information that can be used to identify a specific individual. Examples of PII include a person's name, Social Security number, date of birth, home address, phone number, email address, and biometric data, such as fingerprints or facial recognition data.
cookies
Cookies are small pieces of data stored in your browser that allow sites and advertisers to recognize you across the internet. They are used to customize experiences, remember that you are logged in to certain sites, and target advertising.
keystrokes
Keystrokes refer to the physical act of pressing a key on a keyboard or other input device, particularly in the context of typing or entering data into a computer. In the context of data privacy, keystrokes can refer to the information that is captured and stored by software or devices as a user types, including passwords and other sensitive data.
ransomware
Ransomware is a type of malicious software that threatens to publish the victim's data or block access to it unless a ransom is paid. While some types of ransomware simply lock the victim's computer screen, others encrypt the files on the system, making them unusable until the ransom is paid and the files are decrypted. It is a form of cyber extortion that can be used to target both individuals and organizations.
end-to-end encryption
A process by which information is encoded so that only the sender and the intended recipient of the data can read or edit it. Applications that use E2EE include Signal, FaceTime, and Facebook Messenger.
biometric data
Fingerprints, retinal scans, facial recognition. This is all considered biometric data.
Answers
Got a question relevant to today’s topic:
Why do I need to create an account for your newsletter?
This is a totally fair question. Fwiw, I should point out that I don’t see any passwords or information beyond your email.
This newsletter is currently behind a free subscriber wall in my Squarespace site. As part of the security protocol Squarespace imposes on member areas, they require validation beyond just an email address. I imagine that they do this for creators with a more robust subscriber area in order to build trust with a creator’s users. In those more robust member areas, money is being exchanged, credit card information is being added, and other PII comes into play. The basic protocol in that situation is to at least password protect the account.
This brings up a good point. Never share credit card, bank, or insurance information while you are not logged in. Companies are quick to remind you that they will never solicit this information in informal ways.
One quick way to tell if a site meets certain data security requirements is to look at the URL bar. Most browsers will show a lock icon like this 🔒 but you can go one step further and take a look at the random stuff at the beginning of the URL. If it says “https://” the S is for secure. No S, no secure.
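The same check can be done in code. This is a minimal sketch in Python using the standard library's `urllib.parse`; the function name `uses_https` and the example.com addresses are placeholders of mine. Keep in mind that it only tells you the address asks for an encrypted connection, and says nothing about whether the company behind the site is trustworthy.

```python
from urllib.parse import urlparse

def uses_https(url):
    """Return True if the URL's scheme is https (case-insensitive)."""
    return urlparse(url).scheme.lower() == "https"

# Placeholder addresses for illustration only.
print(uses_https("https://example.com/checkout"))
print(uses_https("http://example.com/checkout"))
```

The first call reports a secure scheme and the second does not. An address with no scheme at all, like a bare "example.com", also fails the check.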
That’s it for this week! Thanks for reading and see you in your inbox next Sunday. Much love, Alex.