Categories
Uncategorized

AI and the rise of zero-cost healthcare

In many AI/ML papers, classifiers are scored by how well they do compared to human doctors. For example, a (made up) title could be “MyNextGenClassifier does 0.6% better at finding 2 mm brain bleeds than human radiologists at Memorial Sloan Kettering”. Let’s unpack that. The title implies that the goal is to do “better” than a human, where better is defined as higher classification accuracy. This is the kind of thinking that got IBM into trouble with their attempt to “revolutionize” cancer care. In their original take on cancer care, the notion was that their technology would serve as an adjunct to 12 world-class human cancer doctors, and make sure that e.g. new therapies or drug combinations would not be missed.

Let’s think about this from a different angle. For code running on a silicon-based computer, there are dozens of potential optimization functions. Beyond accuracy relative to a human, there are also cost-per-diagnosis, energy efficiency, reliability, accessibility, stability over time, scalability, privacy, and ease of use. Unfortunately, considering the set of medical AI papers (of which there are somewhere between 25,000 and 75,000, depending on how you count), we have somehow navigated ourselves into a corner – the vast majority of these papers focus on accuracy, which is not really where Healthcare AI shines.

For something different, we could for example look at the energy-efficiency and climate impact of a hospital with 200 human doctors vs. a hospital with zero human doctors (and only nurses). This type of hospital would presumably also be more cost effective and better able to grow and shrink with real-time patient demand, such as during a pandemic. Similarly, we could ask about the hidden cost of untreated/undiagnosed conditions. Especially in communities of color, the US healthcare system struggles to provide suitable levels of care – patients might not have insurance, they may not trust their local providers, or there may be any one of many barriers that can make it hard to access care. Digital health classifiers and recommendation engines can offer convenience, 24/7 ease of access, and privacy guarantees that are hard to realize in a traditional medical setting.

The real power of digital health is not to be 0.6% better than a typical human doctor, but to provide entirely new capabilities that health systems built around human doctors fundamentally cannot. Most obviously, once a classifier has been trained, deployed, and is used to help 1 million people, it costs almost nothing to use the same classifier to help all 7.9 billion people on earth. Why not make the entire diagnosis step of healthcare all-digital (and free)? With a relatively modest investment, it is entirely conceivable to build (and open-source) classifiers for the top 10 human health conditions and make those available globally. The world’s computers run on open-source software – the world’s health diagnostics system should, too.

A privacy-preserving internet?

How a new generation of cryptographic techniques might allow people to transact in complete privacy.

The internet makes communication easier than ever before in human history. Scalable, low-friction communication is not only about sending text messages to others but is central to all human activity. For example, marketplaces, both physical and digital, are coordination solutions that help buyers and sellers to efficiently discover one another. When it’s easy to communicate, it’s also easy to create markets for goods, services, and ideas. Today, about half of the earth’s population, some 4 billion people, use their phones to message others, learn, discover, consume digital content, and buy and sell everything from food to medicines to clothing.

So how is cryptography relevant to any of this?

There’s an obvious problem with communications networks – as they grow, they can quickly exceed the scale at which everyone knows (and trusts) everyone else. You might feel comfortable giving your neighbor some tomatoes in return for a verbal promise for some future fruit, but you hopefully will be more cautious with `spacecadet42` who just randomly messaged you on Craigslist. Scalable solutions to trust, identity, and security are therefore vital. This problem is equivalent to the classical key exchange problem encountered in symmetric cryptography – if a king wishes to securely communicate with a few other people, keys can certainly be exchanged by human couriers, but this approach quickly breaks down when millions or billions of people wish to securely communicate and transact. If n people wish to securely communicate in pairs, you need one key per pair, or n(n-1)/2 keys, which grows roughly as n^2. Likewise, if an enemy discovers your key and you wish to change it, having to rely on couriers or pigeons to distribute fresh keys is obviously not ideal.
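To make the scaling argument concrete, here is a small sketch that counts the keys involved. The function names are made up for illustration, and the model is the simplest possible one: every pair of people shares exactly one symmetric key.

```python
def pairwise_keys_needed(n: int) -> int:
    """Distinct symmetric keys if every pair of n people shares one key."""
    return n * (n - 1) // 2


def keys_to_replace_after_compromise(n: int) -> int:
    """Keys one person must re-issue (by courier or pigeon) if all of
    their pairwise keys are exposed: one per other participant."""
    return n - 1


# A king and a few advisors, a town, and roughly half the planet.
for n in (10, 1_000, 4_000_000_000):
    print(f"{n:>13,} people -> {pairwise_keys_needed(n):,} keys, "
          f"{keys_to_replace_after_compromise(n):,} to replace per compromise")
```

Ten people need 45 keys, which couriers can handle. At internet scale the count is astronomically large, which is the breakdown described above.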

Cryptography to the rescue, Part 1

Solutions to the key exchange problem were discovered in the 1970s – first, in secret work by James H. Ellis. Soon thereafter, in 1976, Whitfield Diffie and Martin Hellman described the first practical asymmetric key crypto system, now known as Diffie–Hellman key exchange. Finally, in 1977, Ron Rivest, Adi Shamir, and Leonard Adleman invented RSA, which offers both public key encryption and digital signatures. That’s what you are using right now to read this post, which is hosted on a server that uses SSL/TLS to create a secure channel between your browser and the server. Fundamentally, RSA and related cryptographic methods allow you to securely transact with banks, stores, universities, doctors, search engines, newspapers, and all the other mainstays of our digital lives.
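For readers who want to see why no courier is needed, here is a toy sketch of a Diffie–Hellman style exchange. The modulus, the base, and the function names are illustrative assumptions chosen for readability; real deployments use vetted parameters and libraries, not hand-rolled code like this.

```python
import secrets

# Toy public parameters (assumptions for illustration only).
P = 2**127 - 1   # a known Mersenne prime, used here as the public modulus
G = 3            # public base


def make_keypair():
    """Pick a random private exponent and derive the public value G^private mod P."""
    private = secrets.randbelow(P - 3) + 2   # uniform in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_secret(my_private: int, their_public: int) -> int:
    """Both sides compute G^(a*b) mod P without ever sending a or b."""
    return pow(their_public, my_private, P)


alice_private, alice_public = make_keypair()
bob_private, bob_public = make_keypair()

# Only the public values cross the network; the derived secrets still match.
assert shared_secret(alice_private, bob_public) == shared_secret(bob_private, alice_public)
```

Each person publishes one public value, so n people need n published values rather than a separate pre-shared key for every pair.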

Cryptography is not only good for solving scaling problems, but also has a long history of making entirely new things possible. Most obviously, cryptographic hashing (and RSA and digital signatures) allowed the double-spend problem to be solved, by creating immutable digital chains and a unique mechanism for distributed coalescence around one unique history of events, in this case, a noisy succession of peer-to-peer digital value transactions (aka Bitcoin).
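The "immutable digital chain" idea can be sketched in a few lines. This is a deliberately stripped-down illustration, not Bitcoin: it leaves out signatures, proof-of-work, and the peer-to-peer consensus step, and the block layout and function names are invented for the example. It only shows how hashing links each record to the one before it, so that rewriting history becomes detectable.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Add a block that commits to the hash of its predecessor."""
    prev_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "prev_hash": prev_hash})


def is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of the previous one."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
for tx in ("alice pays bob 1", "bob pays carol 1", "carol pays dave 1"):
    append_block(chain, tx)

print(is_valid(chain))                    # chain as written
chain[0]["data"] = "alice pays mallory 1"
print(is_valid(chain))                    # after tampering with an early block
```

Changing any earlier block changes its hash, which no longer matches the `prev_hash` stored in the next block.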

Unfortunately…

As the internet becomes part of our daily lives, I’m sure we all have encountered situations that made us wonder. Here’s a really simple example – I recently posted a package at UPS, and several seconds after paying at the cash register, a message popped up on my iPhone encouraging me to use FedEx “for all my shipping needs”. UPS and FedEx are direct competitors in US shipping/logistics. Imagine all the things that needed to happen to make this one message on my iPhone possible – is my cell phone carrier selling my meter-resolution GPS location data, allowing FedEx to message me right after leaving the UPS store? Alternatively, is my bank selling credit card transaction data, allowing FedEx to see I had just paid for UPS shipping? Or, was the shopping mall I was in harvesting Bluetooth packets from my phone to provide hyper-local targeted advertising? That’s only a trivial example, of course, but many of us are increasingly concerned about our data and how it’s being used.

A privacy <> service tradeoff?

Although data privacy and data use are receiving more attention all over the world, many of us take it for granted that we need to divulge information to receive relevant goods and services, or to be able to transact. If I like pistachio ice cream, I clearly need to tell people that, otherwise I’ll almost always get the wrong flavor. Important examples of this privacy <> service tradeoff can be found in finance, banking, healthcare, and education. To sell stocks on the stock market, it seems inescapable that you have to divulge the price at which you would sell your assets and what those assets are. Likewise, perhaps you are looking for a loan from a bank – to qualify for a loan, surely the lender needs all your financial information? Finally, when you visit a doctor, you take it for granted that you need to tell the doctor your symptoms to receive a diagnosis. After all, how else shall the doctor generate a diagnosis, other than by computing on your symptoms? This is where it gets really interesting. What if you could obtain digital goods and services, relevant to you, without revealing anything about yourself?

Cryptography to the rescue, Part 2

Here’s a partial answer. This example is from healthcare AI, but the underlying math is completely general. In Microsoft’s CryptoNets, leveled homomorphic encryption is used to (1) encrypt images at the source, (2) send those encrypted images to the cloud, (3) have the cloud computer classify the image, despite not being able to decrypt the image, and finally, (4) return an encrypted label to the person who initially encrypted the image. Put simply, only the person who initially encrypted the image is able to decrypt the output of the classifier. In the jargon of ‘privacy-preserving analytics’, the remote computer is an untrusted cloud worker able to perform useful computations without being able to see either the inputs or the results of all the work it is doing. It’s immediately clear why this could be useful in healthcare – you could use your phone to get a diagnosis from a cloud doctor without your (unencrypted) symptom data ever leaving the phone – moreover, the cloud doctor would have no idea what the diagnosis was.
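The four steps above can be imitated with a much simpler scheme. The sketch below uses textbook Paillier encryption, which is only additively homomorphic. It is a stand-in chosen because it fits in a few lines, not the leveled scheme described above, and it can only evaluate a linear score rather than a neural network. The primes, weights, and symptom vector are made-up toy values, and the key size is far too small for real use.

```python
import math
import secrets

# --- Client-side key material (toy primes; assumptions for illustration) ---
P, Q = 2**31 - 1, 2**61 - 1      # two known Mersenne primes
N = P * Q                        # public key: anyone may encrypt with N
N2 = N * N
PHI = (P - 1) * (Q - 1)          # private: stays on the phone
MU = pow(PHI, -1, N)             # private: modular inverse (Python 3.8+)


def encrypt(m: int) -> int:
    """Paillier encryption with g = N + 1; needs only the public key N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Requires the private values PHI and MU; maps back to signed integers."""
    m = ((pow(c, PHI, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m


def encrypted_linear_score(enc_x, weights, bias):
    """Cloud side: computes Enc(w . x + b) from ciphertexts alone.
    Multiplying ciphertexts adds plaintexts; raising to w multiplies by w."""
    acc = encrypt(bias)
    for c, w in zip(enc_x, weights):
        acc = (acc * pow(c, w % N, N2)) % N2
    return acc


symptoms = [1, 0, 1]                               # (1) stays on the phone in the clear
enc_symptoms = [encrypt(v) for v in symptoms]      # (1)-(2) encrypted and sent to the cloud
enc_score = encrypted_linear_score(enc_symptoms, [3, -2, 4], -5)   # (3) cloud computes blindly
print(decrypt(enc_score))                          # (4) only the key holder can read the result
```

The cloud function never calls `decrypt` and never touches `PHI` or `MU`, so it sees neither the symptoms nor the score it produced.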

This example only scratches the surface of what can be done with new cryptographic techniques such as Fully Homomorphic Encryption (FHE) and Secure Multiparty Computation (SMC). These techniques can be used, for example, to privately match two people based on shared (or overlapping) attributes, which is of course the foundation of all classical financial markets and exchanges (leaving aside for the moment automated market makers which do not have traditional order books). If bids and asks can be cryptographically guaranteed to be private, and yet, buyers and sellers can still somehow discover one another, fundamentally new types of digital transactions with unique characteristics can be realized.
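Private matching itself takes more machinery than fits here, but the simplest building block of secure multiparty computation, additive secret sharing, can be sketched briefly. In this toy example several parties learn the sum of their private numbers without any party seeing another's input. The modulus, the function names, and the honest-but-curious setting are all assumptions made for illustration.

```python
import secrets

MODULUS = 2**61 - 1   # all arithmetic is done modulo this public value


def share(secret: int, n_parties: int) -> list:
    """Split a secret into n random-looking shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares: list) -> int:
    """Any single share reveals nothing; all of them together reveal the secret."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs: list) -> int:
    """Each party shares its input; party j adds up the j-th shares it receives.
    Only these partial sums are published, never the inputs themselves."""
    n = len(private_inputs)
    all_shares = [share(x, n) for x in private_inputs]
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS
        for j in range(n)
    ]
    return reconstruct(partial_sums)


print(secure_sum([3, 5, 9]))   # the three inputs are never pooled in the clear
```

The same pattern of computing on shares is what more elaborate protocols build on to compare or match values privately.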

On a high level, the essential point is that the privacy <> services tradeoff, a fact of life throughout human history, has basically been eliminated, just like Bitcoin solved the double spend problem. In healthcare, there is no longer any reason to have to divulge your personal information, such as your medical symptoms, to get an accurate diagnosis. In finance, it’s now (cryptographically speaking) possible to trade without revealing the nature of your ask or bid, although suitable digital exchanges must first be constructed and deployed globally. Equivalent considerations apply broadly throughout the internet, not just for healthcare and finance, but essentially all matching operations across communications, content, and commerce.

FeverIQ: A global deployment of secure multiparty computation

Healthcare involves the exchange of unsecured information between two people, right? After all, how could a doctor possibly help you to stay healthy without knowing anything about you? But things are changing.

There are two major intersecting trends. First, computers double their compute performance every year or two and are beginning to rival and exceed human performance in multiple clinical specialties, such as radiology and dermatology. This allows us to broaden our views of who, or what, doctors are. Second, it’s possible to compute on encrypted data, such that only the person who generated the data can see the computation results.

When you combine those two things – powerful classifiers and ability to compute on encrypted data – you end up with something new. You can begin to imagine a world where healthcare is both affordable, costing fractions of a penny per diagnosis, and completely private. In the last few months, we’ve built the world’s largest deployment of Secure Health, in which computers work on encrypted data to give people useful insights, in this case, a personalized COVID risk estimate.

We’ve also decided to make the data we obtained from millions of people around the world available to scientists and doctors, as a starting point to further discovery and impact.

The preprint is out: https://www.medrxiv.org/content/10.1101/2020.09.23.20200006v2

This is only possible because millions of people in 91 countries thought that this was a good idea, and took a leap of faith to share their symptoms and test results with the FeverIQ effort, which uses Enya’s secure multiparty computation API to classify and learn without their data ever leaving their phone. Thank you, to each one of you.

CancerBase – first lessons

CancerBase.org has been up and running for a few months now. About 1000 people have signed up and many of them suggested at least one new feature. We are preparing to launch CancerBase 2.0 this winter, based on everything we have learned so far. Here’s just one example – dozens of people complained (nicely) that we downsampled their physical location on the map. We did this to ensure anonymity, which makes complete sense if you are approaching medical data as a scientist or a lawyer. However, some patients have a different perspective. Their point is that they are a real person with a real name and a real location, and they want everyone to know about them and their disease, and they want their dot on the CancerBase map to be right on their house. We added a checkbox to CancerBase so that everyone can now specify their personal mapping/geolocation preferences; since medical and other personal data belong to the patient, of course they should have complete control over how the data are used and displayed. It’s been amazing to work with everyone on CancerBase and we are growing quickly. Stay tuned for CancerBase 2.0, which will feature a common API and support several patient-centered applications.

Fast Global Sharing of Medical Data?

Why is it so hard to access medical data for science? Almost without exception, people with cancer are very open and eager to help. They are surprised about this problem, too – they assume that the data they give to medical centers are somehow broadly shared and accessible to the global research community.

In one study I was involved in, it took several years to work through the legal paperwork to access stored medical images, and even then, the images were subject to myriad constraints. If people can go to the moon, and 2.08 billion people on earth are active smartphone users, why are medical data frequently still stuck in, figuratively speaking, local libraries with only a limited selection of books?

The strange thing of course is that, fundamentally, medical data belong to the patient, and therefore, if a patient wants to share his or her information, they should find it easy to do so. The most telling conversation for me was with a father of two kids. When asked about data sharing, he said he could not care less about who saw his medical records; rather, it was much more important to him that as many scientists as possible had access to his data, so that his data would make the largest difference and hopefully reduce the chance of his kids having brain cancer, like he did, at some point in their lives. That made a lot of sense to me.

I still do not fully understand all the barriers to efficient data sharing in cancer biology, but I’m curious about standard web technologies that can help patients share what they want, when they want, and with whom they want. If a patient can share a movie, a picture, or a book within several seconds around the world, why is it sometimes still difficult for them to share their medical information?

For a while I thought the major problems had to do with the rules and regulations surrounding medical data, but that turns out not to be the case. The simplest way to start thinking about crowd-sharing of medical information is that millions of people around the world already crowd-share medical information. For example, women with breast cancer sometimes wear pink t-shirts to raise awareness, and they then circulate these pictures on social networks. That’s an example of someone sharing medical information – namely, their cancer diagnosis – in the form of a picture.

A few months ago, I started to look into web technologies that could potentially be used to help people share some of their medically-relevant information within 1 second. I chose the 1 second standard arbitrarily – it seemed like a reasonable number. Much below one second you run into various technical problems, but if you are willing to wait a few hundred milliseconds, the technologies are all already there: inexpensive, massively scalable, and globally deployed.

What if each cancer patient on earth had the ability to broadcast key pieces of information about their cancers around the world, in one second? 

If you are curious, here is the White House fact sheet announcing CancerBase, and here is a little bit more information about how we started out. The actual site is at CancerBase.org. It’s an experiment run by volunteers, many of whom are cancer patients, so bear with us, and if you can, help out!