Fast Global Sharing of Medical Data?

When I started out in cancer biology, I was surprised by the difficulty of accessing medical data for science. This was puzzling to me because all the cancer patients I met were very open and extremely helpful. When I spoke to patients about this problem, they were surprised, too – many of them assumed that the data they shared with medical centers were broadly shared and accessible to the global research community.

In one study I was involved in, it took several years to work through the legal paperwork to access stored medical images, and even then, the images were subject to myriad constraints. If people can go to the moon, and 2.08 billion people on earth are active smartphone users, why are medical data frequently still stuck in, figuratively speaking, local libraries with only a limited selection of books?

The strange thing, of course, is that, fundamentally, medical data belong to the patient, and therefore, if patients want to share their information, they should find it easy to do so. The most telling conversation for me was with a father of two kids. When asked about data sharing, he said he could not care less about who saw his medical records; rather, it was much more important to him that as many scientists as possible had access to his data, so that his data would make the largest difference and hopefully reduce the chance of his kids having brain cancer, like he did, at some point in their lives. That made a lot of sense to me.

I still do not fully understand all the barriers to efficient data sharing in cancer biology, but I’m curious about standard web technologies that can help patients share what they want, when they want, and with whom they want. If a patient can share a movie, a picture, or a book around the world within several seconds, why is it sometimes still difficult for them to share their medical information?

For a while I thought the major problems had to do with the rules and regulations surrounding medical data, but that turns out not to be the case. The simplest way to start thinking about crowd-sharing of medical information is that millions of people around the world already crowd-share medical information. For example, women with breast cancer sometimes wear pink t-shirts to raise awareness, and they then circulate these pictures on social networks. That’s an example of someone sharing medical information – namely, their cancer diagnosis – in the form of a picture.

A few months ago, I started to look into web technologies that could potentially be used to help people share some of their medically relevant information within one second. I chose the one-second standard arbitrarily – it seemed like a reasonable number. Much below one second you run into various technical problems, but if you are willing to wait a few hundred milliseconds, the technologies are all already there: inexpensive, massively scalable, and globally deployed.

What if each cancer patient on earth had the ability to broadcast key pieces of information about their cancers around the world, in one second? 

If you are curious, here is the White House fact sheet announcing CancerBase, and here is a little bit more information about how we started out. The actual site is at It’s an experiment run by volunteers, many of whom are cancer patients, so bear with us, and if you can, help out!

4 thoughts on “Fast Global Sharing of Medical Data?”

  1. It appears that the cause of the seeming discrepancy between patients’ apparent willingness to release their medical data and the actual availability of such data is twofold: one part legal and one part economic. Neither is simple to solve, but at least the legal issue has the potential to change in the short term.

    To some extent, hospitals have very little to gain from sharing data compared to the apparent risks. Even if the cost of sharing were financially negligible, care providers fear various legal repercussions. If datasets are not handled securely or not properly anonymized, then there could be at best a PR nightmare. Whether or not the risk is actually substantial, it’s the same reason grocery stores discard massive amounts of edible food despite the fact that Good Samaritan laws in many states shield those who make food donations from legal prosecution. While someone who shares a photo or video online takes full responsibility as the “content owner” and is subject to any resulting legal ramifications, the responsibility for protecting and sharing medical data rests chiefly on care providers, despite the fact that patients may own the rights to their own medical data.

    Certainly, laws could be written to decrease the red tape surrounding the release of medical information, but fundamentally, there has got to be some movement whereby sharing medical data is as socially lionized as being an organ donor. There’s too much legal bureaucracy for the impetus for change not to come from the ground up. As it currently stands, there’s much more information on the web about how to sue for damages over HIPAA violations than about how to authorize the release of medical information.

    On the economic side, one faces the problem that electronic medical records (EMR) are stored on legacy systems or current but incompatible systems. There’s relatively little incentive for care providers and EMR systems providers to emphasize interoperability and mutual compatibility among different care providers and EMR systems. In fact, there’s a strong incentive to design systems in a way such that patients are discouraged from transferring to a different healthcare provider, who may also use a different EMR system. Given the size of the healthcare and EMR systems industries and their ballooning Washington lobby, it’s hard not to be pessimistic that there could be a short-term solution.

    As to your actual question: no doubt, the real-time sharing of massive amounts of medical data would prove to be game-changing in any age but perhaps particularly revolutionary in our current era, in which “big data” is the buzzword of the day. To state the obvious, as recently as a decade or two ago, no one was really prepared, not just in terms of computational ability but also in terms of mindset, to make effective use of many large datasets in medicine and industry. Of course, with more data (and better data), one can always narrow confidence intervals and so forth, but the big deal now is the ability to do reasonably fast and robust statistical learning on datasets in the petabyte or even exabyte range, often without subsampling. Considering also the fact that the current approach to cancer is becoming more and more fine-grained (so that what previously was the “one-in-ten-thousand” sample may now be one in a million), the impact could be enormous.

  2. Another dimension to the fog that is active participation is that of “contributor as product”: the idea that contributions made to any epistemic community might, in fact, be monetized. Early personal health sites like and had business models running in the background based on what I call “anecdotal epidemiology”: they were able to aggregate patient data and provide reports to those who have no other legal means to acquire such data. Both sites mentioned are thriving today, one by dint of massive creativity in their activities, the other by being bought out by 23andMe. From my perspective, what is missing in the quest for global improvement in patient outcomes is that each of the present entities remains a silo. That is, I believe we need to get beyond silos and into the realm of global participation and sharing.

  3. I applaud what you’re doing to encourage data sharing by cancer patients. Since you’re appealing to patients themselves, the challenge is that they don’t have their own medical records. Even today, cancer patients are known to take hard copies of scans to get second opinions because there is no way to share the images with other docs outside a provider system. It’s not entirely technical. Obviously we all routinely send images and videos today through social media, cloud services and even email. This problem could be solved if everyone kept their own medical record and had visualizations of their health status to discuss with physicians and allied health providers. Otherwise you’re relegated to a fax machine!

    Have you ever tried screen-scraping software to get EHR data into a patient medical record, assuming the patient makes the request to get data from the provider? Another option is to tell the patient what data you’re seeking, such as cancer diagnosis, subtype, tumor genotype vs. germline, treatments, age, sex, and family history, then give them a place where they can build their own medical record.

    Helping researchers is a great goal of CancerBase, but you have to look beyond altruism to give patients a value proposition.

