A privacy-preserving internet?

How a new generation of cryptographic techniques might allow people to transact in complete privacy.

The internet makes communication easier than ever before in human history. Scalable, low-friction communication is not only about sending text messages to others; it is central to all human activity. For example, marketplaces, both physical and digital, are coordination solutions that help buyers and sellers efficiently discover one another. When it's easy to communicate, it's also easy to create markets for goods, services, and ideas. Today, about half of the Earth's population, some 4 billion people, use their phones to message others, learn, discover, consume digital content, and buy and sell everything from food to medicines to clothing.

So how is cryptography relevant to any of this?

There's an obvious problem with communications networks – as they grow, they can quickly exceed the scale at which everyone knows (and trusts) everyone else. You might feel comfortable giving your neighbor some tomatoes in return for a verbal promise of some future fruit, but you will hopefully be more cautious with `spacecadet42`, who just messaged you out of the blue on Craigslist. Scalable solutions to trust, identity, and security are therefore vital. This problem is closely analogous to the classical key distribution problem of symmetric cryptography – if a king wishes to communicate securely with a few other people, keys can certainly be exchanged by human couriers, but this approach quickly breaks down when millions or billions of people wish to communicate and transact securely. If every pair among n people needs its own shared key, you need n(n-1)/2 keys, a number that grows roughly as n^2. Likewise, if an enemy discovers your key and you wish to change it, having to rely on couriers or pigeons to distribute fresh keys is obviously not ideal.
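To make the scaling argument concrete, here is a small sketch comparing the n(n-1)/2 pairwise symmetric keys with the single key pair per person that public-key cryptography needs. The function names are illustrative only.

```python
def symmetric_keys_needed(n: int) -> int:
    """Distinct shared keys if every pair of n people needs its own secret key."""
    return n * (n - 1) // 2


def public_key_pairs_needed(n: int) -> int:
    """Key pairs needed if each person publishes one public key."""
    return n


for n in (10, 1_000, 1_000_000):
    print(f"{n:>9,} people: {symmetric_keys_needed(n):>15,} pairwise keys "
          f"vs {public_key_pairs_needed(n):>9,} key pairs")
```

For a million people, the pairwise approach needs roughly half a trillion keys, while the public-key approach needs one million key pairs.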

Cryptography to the rescue, Part 1

Solutions to the key exchange problem were discovered in the 1970s – first, in classified work at the UK's GCHQ, beginning with James H. Ellis. Soon thereafter, in 1976, Whitfield Diffie and Martin Hellman published a practical method for two parties to agree on a shared secret key over a public channel, now known as Diffie–Hellman key exchange. Then, in 1977, Ron Rivest, Adi Shamir and Leonard Adleman invented RSA, which offers both public-key encryption and digital signatures. These ideas underpin what you are using right now to read this post, which is hosted on a server that uses SSL/TLS to create a secure channel between your browser and the server. Fundamentally, RSA and related cryptographic methods allow you to securely transact with banks, stores, universities, doctors, search engines, newspapers, and all the other mainstays of our digital lives.
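The core idea of Diffie–Hellman fits in a few lines. The sketch below is a toy: the modulus 2**127 - 1 (a Mersenne prime) and base 3 are assumptions chosen only so the example is self-contained, and they are NOT secure parameters. Real systems use standardized groups (for example those in RFC 3526 or RFC 7919) or elliptic curves such as X25519.

```python
import secrets

# Toy, insecure parameters for illustration only.
P = 2**127 - 1   # Mersenne prime modulus
G = 3            # arbitrary base; not checked to be a generator


def keypair():
    private = secrets.randbelow(P - 2) + 1   # secret exponent in [1, P-2]
    public = pow(G, private, P)              # G^private mod P, safe to publish
    return private, public


def shared_secret(my_private, their_public):
    # (G^b)^a == (G^a)^b mod P, so both sides compute the same value.
    return pow(their_public, my_private, P)


alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()

# Only alice_pub and bob_pub would cross the network.
assert shared_secret(alice_priv, bob_pub) == shared_secret(bob_priv, alice_pub)
print("shared secret established")
```

An eavesdropper sees only the two public values; recovering the shared secret from them is believed to be hard in well-chosen groups, which is exactly why the toy parameters above must not be reused.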

Cryptography is not only good for solving scaling problems; it also has a long history of making entirely new things possible. Most famously, cryptographic hashing, together with elliptic-curve cryptography (ECC) and digital signatures, made it possible to solve the double-spend problem without a trusted intermediary. It did so by creating tamper-evident chains of blocks and a mechanism (proof of work) for a distributed network to converge on one unique history of events, in this case, a succession of peer-to-peer digital value transactions (aka Bitcoin).
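Why a hash chain makes history hard to rewrite can be shown with a minimal sketch: each block stores the hash of the previous block, so editing any earlier entry breaks every link after it. This deliberately omits proof of work, signatures, and networking, and the block format is a hypothetical one, not Bitcoin's.

```python
import hashlib
import json


def block_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over this block's payload and the previous block's hash."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode()).hexdigest()


def build_chain(payloads):
    chain, prev = [], "0" * 64            # first block points at an all-zero hash
    for payload in payloads:
        h = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": h})
        prev = h
    return chain


def is_valid(chain) -> bool:
    prev = "0" * 64
    for block in chain:
        if block["prev"] != prev or block["hash"] != block_hash(prev, block["payload"]):
            return False
        prev = block["hash"]
    return True


chain = build_chain([{"from": "alice", "to": "bob", "amount": 5},
                     {"from": "bob", "to": "carol", "amount": 2}])
print(is_valid(chain))                   # True
chain[0]["payload"]["amount"] = 500      # try to rewrite history
print(is_valid(chain))                   # False: stored hash no longer matches
```

Recomputing the tampered block's hash does not help either, because the next block's `prev` field still points at the old hash; in Bitcoin, proof of work makes redoing all subsequent blocks prohibitively expensive.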

Unfortunately…

As the internet becomes part of our daily lives, I'm sure we have all encountered situations that made us wonder. Here's a really simple example – I recently shipped a package at a UPS store, and several seconds after I paid at the cash register, a message popped up on my iPhone encouraging me to use FedEx "for all my shipping needs". UPS and FedEx are direct competitors in US shipping and logistics. Imagine all the things that needed to happen to make this one message on my iPhone possible. Is my cell phone carrier selling my meter-resolution GPS location data, allowing FedEx to message me right after I left the UPS store? Alternatively, is my bank selling credit card transaction data, allowing FedEx to see that I had just paid for UPS shipping? Or was the shopping mall I was in harvesting Bluetooth packets from my phone to provide hyper-local targeted advertising? That's only a trivial example, of course, but many of us are increasingly concerned about our data and how it's being used.

A privacy <> service tradeoff?

Although data privacy and data use are receiving more attention all over the world, many of us take it for granted that we need to divulge information to receive relevant goods and services, or to be able to transact. If I like pistachio ice cream, I clearly need to tell people that, otherwise I'll almost always get the wrong flavor. Important examples of this privacy <> service tradeoff can be found in finance, banking, healthcare, and education. To sell stocks on the stock market, it seems inescapable that you have to divulge what assets you are selling and the price at which you would sell them. Likewise, perhaps you are looking for a loan from a bank – to qualify, surely the lender needs all your financial information? Finally, when you visit a doctor, you take it for granted that you need to tell the doctor your symptoms to receive a diagnosis. After all, how else could the doctor generate a diagnosis, other than by computing on your symptoms? This is where it gets really interesting. What if you could obtain digital goods and services, relevant to you, without revealing anything about yourself?

Cryptography to the rescue, Part 2

Here's a partial answer. This example is from healthcare AI, but the underlying math is completely general. In Microsoft Research's CryptoNets, leveled homomorphic encryption is used to (1) encrypt images at the source, (2) send those encrypted images to the cloud, (3) have the cloud computer classify the image, despite not being able to decrypt it, and finally, (4) return an encrypted label to the person who initially encrypted the image. Put simply, only the person who initially encrypted the image is able to decrypt the output of the classifier. In the jargon of 'privacy-preserving analytics', the remote computer is an untrusted cloud worker able to perform useful computations without being able to see either the inputs or the results of the work it is doing. It's immediately clear why this could be useful in healthcare – you could use your phone to get a diagnosis from a cloud doctor without your (unencrypted) symptom data ever leaving the phone; moreover, the cloud doctor would have no idea what the diagnosis was.
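The flavor of "computing on data you cannot read" can be sketched with a simpler scheme than the one CryptoNets used. The sketch below swaps in textbook Paillier encryption, which is only additively homomorphic: it supports adding encrypted values and multiplying them by plaintext constants, enough for a linear scoring step but not for the nonlinear layers of a neural network. The primes (the Mersenne primes 2**89 - 1 and 2**107 - 1), features, and weights are all hypothetical, toy-sized, and insecure. Requires Python 3.9+ for `math.lcm`.

```python
import math
import secrets

# Toy Paillier keys from publicly known primes: illustration only, NOT secure.
P, Q = 2**89 - 1, 2**107 - 1
N = P * Q
N2 = N * N
G = N + 1                          # common simple choice of generator
LAM = math.lcm(P - 1, Q - 1)       # private key: Carmichael's lambda(N)
MU = pow(LAM, -1, N)               # private key: valid inverse because G = N + 1


def encrypt(m: int) -> int:
    while True:
        r = secrets.randbelow(N - 1) + 1   # fresh randomness for each ciphertext
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N2) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


# Client: encrypt hypothetical integer-quantized features before upload.
features = [3, 1, 4]
enc_features = [encrypt(x) for x in features]

# Server: sees only ciphertexts; evaluates weights . features + bias.
weights, bias = [2, 5, 1], 7       # hypothetical non-negative integer model
enc_score = encrypt(bias)
for c, w in zip(enc_features, weights):
    # Enc(a) * Enc(b)^w mod N^2 decrypts to a + w*b
    enc_score = (enc_score * pow(c, w, N2)) % N2

# Client: only the private-key holder can read the result.
print(decrypt(enc_score))          # 22
```

The server never holds `LAM` or `MU`, so it returns an encrypted score it cannot interpret; only the client, who generated the keys, learns that 2*3 + 5*1 + 1*4 + 7 = 22.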

This example only scratches the surface of what can be done with new cryptographic techniques such as Fully Homomorphic Encryption (FHE) and Secure Multiparty Computation (SMC). These techniques can be used, for example, to privately match two people based on shared (or overlapping) attributes. Matching is, of course, the foundation of all classical financial markets and exchanges (leaving aside for the moment automated market makers, which do not have traditional order books). If bids and asks can be cryptographically guaranteed to remain private while buyers and sellers can still discover one another, fundamentally new types of digital transactions with unique characteristics become possible.
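One classic way to privately match on overlapping attributes is Diffie–Hellman-style private set intersection (PSI), sketched below. Each party blinds its hashed items with a secret exponent; because (h^a)^b equals (h^b)^a, shared items collide after double blinding, while non-shared items stay hidden. In this sketch Alice learns which of her items Bob also holds, and Bob learns only how many items Alice has. The group (integers modulo 2**127 - 1), the `hash_to_group` helper, and the data are assumptions for illustration; the group is NOT secure, and real deployments use prime-order groups such as elliptic curves.

```python
import hashlib
import math
import secrets

# Toy group: integers modulo the Mersenne prime 2**127 - 1. Insecure; illustration only.
P = 2**127 - 1


def hash_to_group(item: str) -> int:
    """Hypothetical helper: hash an attribute to a nonzero residue mod P."""
    digest = int.from_bytes(hashlib.sha256(item.encode()).digest(), "big")
    return digest % (P - 1) + 1


def secret_exponent() -> int:
    """Random exponent coprime to P - 1, so v -> v**k mod P is a bijection."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


alice_items = ["hiking", "jazz", "chess", "pistachio"]   # hypothetical data
bob_items = ["chess", "pistachio", "surfing"]
a, b = secret_exponent(), secret_exponent()             # never shared

# Step 1 (Alice -> Bob): Alice's hashed items, blinded with her secret a.
alice_blinded = [pow(hash_to_group(x), a, P) for x in alice_items]

# Step 2 (Bob -> Alice): Bob re-blinds Alice's values with b, keeping the order,
# and sends his own hashed items blinded with b as an unordered set.
alice_double = [pow(v, b, P) for v in alice_blinded]
bob_blinded = {pow(hash_to_group(y), b, P) for y in bob_items}

# Step 3 (Alice, locally): (h^a)^b == (h^b)^a mod P, so common items collide.
bob_double = {pow(v, a, P) for v in bob_blinded}
matches = [x for x, v in zip(alice_items, alice_double) if v in bob_double]
print(matches)   # ['chess', 'pistachio']
```

Neither party ever sends a raw attribute; the same pattern, with far more care about the group and about malicious participants, is one building block for private matching.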

At a high level, the essential point is that the privacy <> service tradeoff, a fact of life throughout human history, has, in principle, been eliminated, just as Bitcoin solved the double-spend problem. In healthcare, there is no longer any fundamental reason to divulge your personal information, such as your medical symptoms, to get an accurate diagnosis. In finance, it's now (cryptographically speaking) possible to trade without revealing the nature of your ask or bid, although suitable digital exchanges must first be constructed and deployed globally. Similar considerations apply broadly across the internet, not just to healthcare and finance but to essentially all matching operations in communications, content, and commerce.

2 replies on “A privacy-preserving internet?”

“In finance, it’s now (cryptographically speaking) possible to trade without revealing the nature of your ask or bid, although suitable digital exchanges must first be constructed and deployed globally.”
Other than preventing front-running, what benefits would you see from a blind ordering system?


Beyond front-running – which is a great example – privacy is relevant to any situation with "personalized" pricing. To give a simple example, some online retailers keep track of your kids' birthdays, so if you, as the parent, visit those retailers a day or two before your child's birthday, the toy prices shown to you are conditioned on that prior knowledge. This is similar to 'surge' or 'real-time' pricing, except that in those cases pricing is based on something that affects many people (rain; rush hour) rather than something about you specifically (you need a birthday present by tomorrow!). In both cases (personalized pricing; surge pricing), the more additional information (time, weather, your IP address, entries in your calendar, …) the seller has, the better the seller can estimate your limit price. Overall, the buyer would be better off if the seller had less information about them; the same is generally true for air travel, stock exchanges, and commodities markets (agriculture; gas/oil/electricity; shipping logistics).