Self-knowledge is hard to come by.

What happens when companies and governments know more about consumers than we know about ourselves? (I prefer to think of us as people, or maybe citizens, though we are also consumers.)

The NSA has a new spy center going up in Utah. It’s a data storage facility, a really big one. When it’s finished later this year, it will have the capacity to store over 3.5 million GB per capita. That’s equivalent to storing 7,000 laptop hard drives full of information on every American citizen. Now I’m not saying the government is profiling you individually, but… I want you to know that they could. (Don’t worry, they’ll only use their powers to catch bad guys.)
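The laptop comparison can be checked with quick arithmetic. This is a minimal sketch that takes both figures from the paragraph above at face value (neither is verified here) and works out what size of laptop drive the comparison assumes. The variable names are my own.

```python
# Figures quoted in the post; they are not independently verified here.
GB_PER_CAPITA = 3_500_000          # "over 3.5 million GB per capita"
LAPTOP_DRIVES_PER_PERSON = 7_000   # "7,000 laptop hard drives"

# Dividing one by the other gives the drive size the comparison implies.
implied_drive_gb = GB_PER_CAPITA / LAPTOP_DRIVES_PER_PERSON
print(f"Implied laptop drive size: {implied_drive_gb:.0f} GB")
```

So the comparison holds together if a typical laptop drive is taken to be about 500 GB.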

In 2006, AOL leaked documents containing 20 million things that their users typed into the AOL search engine over three months. Though the searches were in one sense ‘anonymized’ (no names or IP addresses, etc.), each search string was attributed to a numeric ID, making it easy to associate different searches by the same user.
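To see why a numeric ID undoes this kind of anonymization, here is a minimal sketch of the grouping step. The rows, the IDs, and the `(numeric_id, query)` layout are invented for illustration. They are not taken from the leaked logs and do not reflect the real file format.

```python
from collections import defaultdict

# Invented sample rows in (numeric_id, query) form. The IDs and the pairing
# of queries to IDs are hypothetical, not drawn from the leaked logs.
SAMPLE_LOG = [
    (101, "how to tie a bow tie"),
    (202, "is the nhl lockout over"),
    (101, "what day is easter 2013"),
    (101, "bow tie rental near me"),
    (202, "nhl season schedule"),
]

def group_by_user(rows):
    """Collect every query issued under the same numeric ID, in log order."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)

profiles = group_by_user(SAMPLE_LOG)
for user_id, queries in sorted(profiles.items()):
    print(user_id, queries)
```

No name or IP address is needed: once the queries sit side by side under one ID, the person behind them starts to take shape.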

You might not realize how intimate the relationship between human and search engine can be. When you have a weird growth on your body, where do you turn first for information? Problems with your love life? The nearest therapist is your browser. You’ve talked to a search engine before, haven’t you?

“How to tie a bow tie”
“What day is Easter 2013”
“Is the NHL lockout over”

One of the assignments for my first grad class was to take a look at the leaked AOL documents—literally just to scroll through for a while and get a sense of what you were looking at. It’s overwhelming in a creepy, intimate way.

We were prompted to identify and approximate the geographic location of users with certain specific characteristics, such as “A gambler who might not be paying his/her taxes” or “Someone looking to terminate a pregnancy with over-the-counter medication.”

Some of my findings for “At least 3 people trying to self-diagnose an illness”:

10291 – This user appears to be from the Long Island area in New York, based on numerous searches for things in Jones Beach and Selden, NY. The user, presumably a mother, first searches for “two year old with lazy eye red in inner corner of eye,” and “inner corner of eye is red swollon,” leading her to investigate the possibility of the condition “dacryocystitis.” She then spends some time trying to diagnose what sounds like a more serious condition with queries like “discharge of clear fluid sack from vagina not pregnant,” “can you pass cyst from ovary after it ruptured,” “ovarian cyst,” “blood in urine,” and “symptoms of bladder cancer.” The user is concerned that prescription drug use may be to blame, asking the search engine “can vicoden or percocet make blood in urine.” Also interesting to note the earlier query, “buy vicodin.”

These leaked logs are still pretty easy to find. Just search for them. [wink]

If you think about it, your search history might “know” more about you than anyone else in your life.

I was born in the mid-1980s and sort of remember what it was like to not have the Web. Now, it takes a fair amount of effort to distance myself from the Web, even temporarily or in small degrees. Every day, we interact with, depend on, and orchestrate our lives through Web-connected technology. The depth of connection with technology is evident in our individual lives and society at large. It’s not possible to live “on the grid” in a first-world society today and not experience this relationship with technology.

You expose an incredible amount of information about yourself, willingly or otherwise, just by using the internet. This is not necessarily a bad thing if you’re living mindful of the fact, but consider:

– What does your web history say about you—the blogs you read, videos you watch, places you shop, and everything else you “do” online?
– What about the relationships in your social network?
– And, now that we carry the Web with us, what about location data? Where are you right now on planet Earth and where did you get your coffee this morning?

Aggregate enough of that stuff and it’s likely that a clear picture of who you are, what you do, and how you might think and feel can be discerned or extrapolated.

Before I dramatically segue back to the NSA spy center, let me express my (naive?) optimism that these personal datasets can be made available to individuals and protected from surveillant and commercial interests. If we all had such powerful, wide perspective on ourselves, it might prompt us to change, to live better.

But back to that 1.5 million square foot data center the NSA is wrapping up in Utah right now. Last year, details about the intelligence center and the NSA’s digital spy network were published in a fantastic piece of investigative reporting by James Bamford for Wired magazine. It’s absolutely worth reading in its entirety, but here’s a good synopsis:

“Under construction by contractors with top-secret clearances, the blandly named Utah Data Center is being built for the National Security Agency. A project of immense secrecy, it is the final piece in a complex puzzle assembled over the past decade.

“Its purpose: to intercept, decipher, analyze, and store vast swaths of the world’s communications as they zap down from satellites and zip through the underground and undersea cables of international, foreign, and domestic networks.

“The heavily fortified $2 billion center should be up and running in September 2013. Flowing through its servers and routers and stored in near-bottomless databases will be all forms of communication, including the complete contents of private emails, cell phone calls, and Google searches, as well as all sorts of personal data trails—parking receipts, travel itineraries, bookstore purchases, and other digital “pocket litter.” It is, in some measure, the realization of the “total information awareness” program created during the first term of the Bush administration—an effort that was killed by Congress in 2003 after it caused an outcry over its potential for invading Americans’ privacy.”

Other highlights include:

– A former senior NSA crypto guy going on the record to say, holding his thumb and forefinger together, “We are that far from a turnkey totalitarian state.”

– The NSA, with full complicity of US telecoms, uses “wiretaps” across the country to read domestic internet traffic in real time. They also read the internet’s major ocean floor cable lays and monitor satellite communications. In other words, they’re reading all the world’s unencrypted internet traffic in real time. And they’re storing it all (including the encrypted stuff) for a rainy day.

– The NSA’s ability to eavesdrop directly on phone calls in real time. Yep!

About nine months after this article was published, another whistleblower, also a former NSA crypto guy, went on the record with a similar story about the US government’s capacity to monitor the populace. Discussing Gen. Petraeus’s affair, he claimed, “all the congressional members are on the surveillance too, no one is excluded. They are all included. So, yes, this can happen to anyone. If they become a target for whatever reason – they are targeted by the government, the government can go in, or the FBI, or other agencies of the government, they can go into their database, pull all that data collected on them over the years, and we analyze it all. So, we have to actively analyze everything they’ve done for the last 10 years at least.” (Watch/read the full interview here)

What is privacy?

Privacy is an abstract concept with important implications.

It is “an interest of the human personality. It protects the inviolate personality, the individual’s independence, dignity and integrity.” – Edward Bloustein¹

An article in the Stanford Encyclopedia of Philosophy cites arguments that privacy is “crucial for intimacy” and “the development of varied and meaningful interpersonal relationships.”

Data privacy is an Internet-era extension to the domain of privacy. “Data” refers to an ever-growing array of digital records and artifacts. Think of it as all information you generate by using technology, including meta-information like how, when, and where you use it. Also all the information that is generated about you—video recordings of you driving through a toll booth, the scheduling software at your doctor’s office that records your visits, and so on.

Tim Berners-Lee, inventor of the World Wide Web, in an interview with the Guardian, calls data privacy a human right.

When you use your computer or phone, your data is collected directly and indirectly by software companies, app developers, search engines, service providers, and content providers—among many other interested parties. Other industries (government agencies, hospitals, retail stores, insurance companies, cable companies, etc.) also capture, retain, aggregate, sell, and buy your data using various methods and sources.

Data collection is not new.

Storing and selling information is not new.

What is new is “the scope of data collected, the precision with which the company can associate an action with a customer, and the sheer quantity of information.” – Catherine Tucker, MIT Sloan School²

“If you are not paying for it, you’re not the customer; you’re the product being sold.”³

Data collection is a business model. The free apps and services we use are often provided by companies whose business is based on ‘monetizing’ information about you: by selling your information directly and/or by knowing who you are and selling access to you to other companies. As more information about you becomes available to enable such a thing, what’s being marketed to you (products, ideas, feelings) will involve increasingly complex psychological methodology. Advertising, that vanguard of behaviorism, and ‘business intelligence,’ which companies now pursue (in the form of data warehousing and analysis) with fanatical exuberance, will together lead the research that probes the extent of data’s power to create profit.

“But I have nothing to hide. . . ”

I’ve heard it a lot, and I’m really interested in this rebuttal. If you feel this way, can you share your perspective in the comments section?

Because I do have something to hide, in a protective sense. It’s not a criminal act or an embarrassing secret; it’s my “inviolate personality” that wants to reserve the right to be unknown. Unknown to companies trying to sell me stuff. Unknown to governments eavesdropping on everyone. Unknown to a new friend, in a time when a Google search and a curated Facebook timeline rob us of opportunities to actually share ourselves and build trust and empathy.

Privacy is subjective. But in that Stanford article on privacy, Priscilla Regan is quoted, arguing that “Privacy is not only of value to the individual, but also to society in general… Privacy is a common value in that all individuals value some degree of privacy and have some common perceptions about privacy… Privacy is rapidly becoming a collective value in that technology and market forces are making it hard for any one person to have privacy without all persons having a similar minimum level of privacy.”

When thinking about any technology, I’ve learned to consider what could happen with ubiquity and scale. We are increasingly reliant on technology to shape our perception and understanding of reality. The camera eye of a smartphone takes on new significance when ‘everyone’ has a smartphone: billions of optical eyes that can scan and record their surroundings, mapping out an archive of physical space in real time. With tools like GPS and a connection to the internet, and with algorithms and databases of facial and object recognition data, those billions of eyes not only watch and document the world but begin to understand what they’re looking at.

Google is putting considerable resources behind bringing augmented reality devices to the mass market. Here’s a teaser video for their new product, Glass:

Maybe not this product, but this idea is likely to become as ubiquitous as smartphones are now, and more. We now have to ask how expectations of privacy will change, even if you plan to never use one of these devices.

In a recent blog post, Mark Hurst plays the scenario out:

“Anywhere you go in public – any store, any sidewalk, any bus or subway – you’re liable to be recorded: audio and video. Fifty people on the bus might be Glassless, but if a single person wearing Glass gets on, you – and all 49 other passengers – could be recorded. Not just for a temporary throwaway video buffer, like a security camera, but recorded, stored permanently, and shared to the world… The really interesting aspect is that all of the indexing, tagging, and storage could happen without the Google Glass user even requesting it. Any video taken by any Google Glass, anywhere, is likely to be stored on Google servers, where any post-processing (facial recognition, speech-to-text, etc.) could happen at the later request of Google, or any other corporate or governmental body, at any point in the future.”

The buzzphrase right now is ‘big data.’ Do a search for that term and you’ll see hardware and software companies promoting the idea that amassing colossal data sets is “the foundation for creating new levels of business value.” The tech and business trade magazines rhapsodize about how Big Data will affect life and society and industry at every level. I think that’s probably true. A change of such magnitude brings both gains and losses. And though it seems to me that privacy is non-negotiable, that perspective is underrepresented in the conversation about Big Data.

I recently attended a lecture titled “Big Data – and its Dark Side,” which, ostensibly, was a promotional stop on a book tour for a new book, Big Data, retitled with a “Dark Side” to appeal to the audience of the event’s sponsor, the Berkman Center. I haven’t read the book, so I can’t say definitively, but the two co-authors seem to represent the idea that massive change is coming, and as long as we’re careful, it’s going to be great. Setting the tone early in the lecture, Kenneth Cukier, who is data editor at The Economist, put up a slide of a preemie cradled in someone’s palm, talked about qualitative studies and letting the data speak, and admonished the audience, “Lest you forget… Big Data saves lives.”

The “dark side” turned out to be the potential for actualizing ‘precognitive’ law enforcement (as depicted in the movie Minority Report), using data to profile future criminals and prosecute crimes before they are committed. Though he didn’t have to tell the audience at a law school, the other presenter, Viktor Mayer-Schönberger, repeated how “highly unlikely” this idea is. And while the authors didn’t talk about data-interpretive tools like facial recognition rendering our current concepts of privacy outmoded, they did mention “humility” as necessary for keeping the “evils” of Big Data in check. Which is great, but probably below “shareholders” on most companies’ list of priorities.

During the Q&A, a Harvard Business faculty member, after dismantling the authors’ definition of ‘Big Data’ as something distinct from just ‘data,’ posed what feels like an important question: “What does empiricism crowd out?” What truths are concealed by data points? What perspective is lost understanding the world by mining its information? As advertisers engineer taste, for example, what sum of the human person is disturbed by manipulating its parts?

The data revolution is here, and we’re collectively opted in. This new relationship with data will change how we live, and we will celebrate the changes that benefit humanity. But there is also a dark side. It will show itself in gradually less subtle ways, and you shouldn’t hold your breath waiting for the government to legislate the protection of your inviolate personality. It’s a tired-but-true aphorism: knowledge is power. Don’t freely offer power over yourself to companies or governments. Do the opposite: resist!

P.S. If you set your Twitter profile to private, the government may not actually be reading your tweets. The HTTPS protocol is commonly used to encrypt internet traffic, and in March 2012, Twitter began using this protocol as a default for all users. Encryption can be cracked, though. It takes an extremely powerful computer system. Not unlike the ones the NSA keeps building.


1 Privacy as an Aspect of Human Dignity, [1964] 39 New York U. L.R. 962. (Though I originally found this quote on https://www.privacyinternational.org.)

