Author Archives: jliphardt

CancerBase – first lessons

CancerBase.org has been up and running for a few months now. About 1000 people have signed up and many of them suggested at least one new feature. We are preparing to launch CancerBase 2.0 this winter, based on everything we have learned so far. Here’s just one example – dozens of people complained (nicely) that we downsampled their physical location on the map. We did this to ensure anonymity, which makes complete sense if you are approaching medical data as a scientist or a lawyer. However, some patients have a different perspective. Their point is that they are a real person with a real name and a real location, and they want everyone to know about them and their disease, and they want their dot on the CancerBase map to be right on their house. We added a checkbox to CancerBase so that everyone can now specify their personal mapping/geolocation preferences; since medical and other personal data belong to the patient, of course they should have complete control over how the data are used and displayed. It’s been amazing to work with everyone on CancerBase and we are growing quickly. Stay tuned for CancerBase 2.0, which will feature a common API and support several patient-centered applications.
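One simple way to downsample a location is to snap latitude and longitude to the center of a coarse grid cell, unless the patient has ticked the exact-location box. The sketch below assumes exactly that. The grid approach, the 0.5 degree cell size, and the names `Location`, `coarsen` and `map_location` are hypothetical illustrations, not necessarily how CancerBase implements its map. The sketch also ignores edge cases at the poles and at the 180° meridian.

```python
import math
from dataclasses import dataclass


@dataclass
class Location:
    lat: float  # degrees
    lon: float  # degrees


def coarsen(loc: Location, cell_deg: float = 0.5) -> Location:
    """Snap a point to the center of its cell on a cell_deg x cell_deg grid (hypothetical scheme)."""
    def snap(x: float) -> float:
        return (math.floor(x / cell_deg) + 0.5) * cell_deg
    return Location(snap(loc.lat), snap(loc.lon))


def map_location(loc: Location, share_exact: bool, cell_deg: float = 0.5) -> Location:
    """Location to draw on the map: exact only if the patient opted in via the checkbox."""
    return loc if share_exact else coarsen(loc, cell_deg)


# Usage with a made-up location: the default hides the exact position.
home = Location(37.8716, -122.2727)
print(map_location(home, share_exact=False))
print(map_location(home, share_exact=True))
```

Keeping the coarsening as the default and exact sharing as an explicit opt-in mirrors the idea that the patient, not the site, decides how precisely their dot is displayed.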

Fast Global Sharing of Medical Data?

Why is it so hard to access medical data for science? Almost without exception, people with cancer are very open and eager to help. They are surprised about this problem, too – they assume that the data they give to medical centers are somehow broadly shared and accessible to the global research community.

In one study I was involved in, it took several years to work through the legal paperwork to access stored medical images, and even then, the images were subject to myriad constraints. If people can go to the moon, and 2.08 billion people on earth are active smartphone users, why are medical data frequently still stuck in, figuratively speaking, local libraries with only a limited selection of books?

The strange thing, of course, is that, fundamentally, medical data belong to the patient, and therefore, if a patient wants to share his or her information, they should find it easy to do so. The most telling conversation for me was with a father of two kids. When asked about data sharing, he said he could not care less about who saw his medical records; rather, it was much more important to him that as many scientists as possible had access to his data, so that his data would make the largest difference and hopefully reduce the chance of his kids having brain cancer, like he did, at some point in their lives. That made a lot of sense to me.

I still do not fully understand all the barriers to efficient data sharing in cancer biology, but I’m curious about standard web technologies that can help patients share what they want, when they want, and to whom they want. If a patient can share a movie, a picture, or a book within several seconds around the world, why is it sometimes still difficult for them to share their medical information?

For a while I thought the major problems had to do with the rules and regulations surrounding medical data, but that turns out not to be the case. The simplest way to start thinking about crowd-sharing of medical information is that millions of people around the world already crowd-share medical information. For example, women with breast cancer sometimes wear pink t-shirts to raise awareness, and they then circulate these pictures on social networks. That’s an example of someone sharing medical information – namely, their cancer diagnosis – in the form of a picture.

A few months ago, I started to look into web technologies that could potentially be used to help people share some of their medically-relevant information within 1 second. I chose the 1 second standard arbitrarily – it seemed like a reasonable number. Much below one second you run into various technical problems, but if you are willing to wait a few hundred milliseconds, the technologies are all already there: inexpensive, massively scalable, and globally deployed.

What if each cancer patient on earth had the ability to broadcast key pieces of information about their cancers around the world, in one second? 

If you are curious, here is the White House fact sheet announcing CancerBase, and here is a little bit more information about how we started out. The actual site is at CancerBase.org. It’s an experiment run by volunteers, many of whom are cancer patients, so bear with us, and if you can, help out!

Prediction of Overall Mortality from Fitbit heart rate data

From what I can tell, the Fitbit API returns heart rate data at an effective temporal resolution of 9.98 seconds (min: 5 s, median: 10 s, max: 15 s). Curiously, you are more likely to get either a 5 s or a 15 s interval than a 10 s interval. Using Mathematica, as before, we can plot the distribution of times between samples returned by the Fitbit API:


[Figure: distribution of the times between heart rate samples returned by the Fitbit API]
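For readers without Mathematica, here is a minimal Python sketch of the same interval summary (min, median, max, mean, and a histogram of gap lengths). It assumes each heart rate sample carries an "HH:MM:SS" clock time within a single day; the sample times below are made up, not real Fitbit output, and recordings that cross midnight are not handled.

```python
from collections import Counter
from statistics import mean, median


def to_seconds(hms: str) -> int:
    """Convert an 'HH:MM:SS' clock string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return 3600 * h + 60 * m + s


def interval_stats(times: list[str]) -> dict:
    """Summarize the gaps (in seconds) between consecutive heart rate samples."""
    secs = [to_seconds(t) for t in times]
    gaps = [b - a for a, b in zip(secs, secs[1:])]
    return {
        "min": min(gaps),
        "median": median(gaps),
        "max": max(gaps),
        "mean": mean(gaps),
        "counts": Counter(gaps),  # histogram: gap length -> number of occurrences
    }


# Hypothetical sample times, 5-15 s apart.
times = ["08:00:00", "08:00:05", "08:00:20", "08:00:30", "08:00:35", "08:00:50"]
stats = interval_stats(times)
print(stats["min"], stats["median"], stats["max"], stats["mean"])
print(stats["counts"].most_common())
```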
A ~10 s sampling interval is still (although just barely) usable for measuring heart rate recovery, the change in your heart rate some time t after you stop exercising. For most things you can measure on a wearable, any one datapoint is next to useless; the key is to look at first and second derivatives, such as gradual trends in how your heart rate drops following a few minutes on the treadmill. The key medical study is probably the October 1999 article in NEJM, Heart-rate recovery immediately after exercise as a predictor of mortality. The conclusion of that paper is that “A delayed decrease in the heart rate during the first minute after graded exercise, which may be a reflection of decreased vagal activity, is a powerful predictor of overall mortality”. Their standard for a ‘delayed’ decrease was a drop of ≤ 12 beats per minute from the heart rate at peak exercise, measured 1 minute after cessation of exercise. Since Fitbit is probably not in the “mortality prediction” market, ~10 s temporal resolution is fine; for medical researchers, however, it would be nice to have slightly higher temporal resolution data.
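The ≤ 12 bpm criterion above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a clinical tool: samples are (time in seconds, bpm) pairs, "peak" is taken as the highest heart rate recorded up to the moment exercise stops, and because samples arrive only every 5 to 15 s, the heart rate exactly 60 s later is estimated by linear interpolation between the neighboring samples. The function names and the session data are hypothetical.

```python
from bisect import bisect_left


def hr_at(samples, t):
    """Linearly interpolate heart rate (bpm) at time t (s).

    samples: list of (time_s, bpm) tuples sorted by time, e.g. ~10 s apart.
    """
    times = [ts for ts, _ in samples]
    i = bisect_left(times, t)
    if i < len(times) and times[i] == t:
        return float(samples[i][1])  # an exact sample is available
    if i == 0 or i == len(times):
        raise ValueError("t is outside the sampled range")
    (t0, h0), (t1, h1) = samples[i - 1], samples[i]
    return h0 + (h1 - h0) * (t - t0) / (t1 - t0)


def heart_rate_recovery(samples, stop_time, delay=60.0):
    """Peak HR up to stop_time minus the (interpolated) HR `delay` seconds later."""
    peak = max(bpm for ts, bpm in samples if ts <= stop_time)
    return peak - hr_at(samples, stop_time + delay)


def is_delayed_recovery(samples, stop_time, threshold=12.0):
    """Criterion from the 1999 NEJM study: a drop of <= 12 bpm after 1 minute."""
    return heart_rate_recovery(samples, stop_time) <= threshold


# Hypothetical treadmill session: exercise stops at t = 20 s.
session = [(0, 150), (10, 160), (20, 165), (25, 164), (35, 158),
           (40, 156), (55, 150), (65, 146), (70, 144), (85, 140)]
print(round(heart_rate_recovery(session, stop_time=20), 1))
print(is_delayed_recovery(session, stop_time=20))
```

With samples up to 15 s apart, the interpolated value at the 60 s mark can easily be off by a few bpm, which matters when a measurement sits close to the 12 bpm threshold.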

Directional Quantile Envelopes – making sense of 2D and 3D point clouds

Imagine some large multidimensional dataset; one of the things you might wish to do is to find outliers, and more generally, say something statistically-defined about the structure of clusters of points within that space. One of my favorite techniques for doing that is to use directional quantile envelopes, developed and implemented by Anton Antonov and described here and here. In this approach, Antonov considers a set of uniformly distributed directions and constructs the lines (or planes) that separate the points into quantiles; if you consider enough directions, and do this a few times, you are left with lines (or planes) that define a curve (or surface) that envelops some quantile q of your data. The figures show a cloud of points with some interesting structure and the surface for q = 0.7, with and without the data.
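A minimal 2D sketch of the construction in Python/NumPy, loosely following the description above rather than Antonov's own Mathematica implementation. For each of `n_directions` unit vectors d spread uniformly in angle, the q-quantile of the projections d·x defines a line; intersecting each line with its neighbor gives the vertices of the envelope polygon. The function names, the 60 directions and the Gaussian test cloud are assumptions for illustration. The vertex step assumes every line contributes an edge, which is typical for compact, blob-like clouds. Because the envelope is an intersection of half-planes that each hold a fraction q of the points, it generally contains fewer than a fraction q of them.

```python
import numpy as np


def directional_quantile_envelope(points, q=0.7, n_directions=60):
    """Approximate the directional q-quantile envelope of a 2D point cloud.

    Returns the unit directions, the quantile offset per direction (line d.x = offset),
    and the polygon vertices obtained by intersecting neighboring lines.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])  # (n_directions, 2)
    offsets = np.quantile(points @ dirs.T, q, axis=0)         # one quantile per direction
    nxt = np.roll(np.arange(n_directions), -1)                # index of the neighboring direction
    vertices = np.array([
        np.linalg.solve(np.stack([dirs[k], dirs[j]]), [offsets[k], offsets[j]])
        for k, j in zip(range(n_directions), nxt)
    ])
    return dirs, offsets, vertices


def inside_envelope(points, dirs, offsets):
    """True for points on the inner side of every quantile line."""
    return np.all(points @ dirs.T <= offsets, axis=1)


# Usage on a synthetic Gaussian cloud.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(2000, 2))
dirs, offsets, vertices = directional_quantile_envelope(cloud, q=0.7)
frac = inside_envelope(cloud, dirs, offsets).mean()
print(vertices.shape, round(frac, 3))
```

The same idea carries over to 3D by sampling directions on the sphere and intersecting planes instead of lines.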

Beyond general data analytics, the directional quantile envelope approach has at least one more application: image processing and segmentation. Imagine taking a picture of a locally smooth blob-like object in the presence of various (complicated) artifacts and noise. You could throw the usual approaches at this problem (gradient filter, distance transform, morphological operations, watershed, …), but in many of those approaches you end up having to empirically play with dozens of parameters until things “look nice”, which is unsettling. What you would really like to do is to detect/localize/reconstruct the emitting object in a statistically-defined, principled manner, and this is what Directional Quantile Envelopes allow you to do.

With a quantile envelope, you can compactly communicate what you did to the raw imaging data to get some final picture of a cell or organoid, rather than reporting an inscrutable succession of filters, convolutions, and adaptive nonlinear thresholding steps. The figure shows a cell nucleus imaged with a confocal microscope; in reality, the cell nucleus is quite smooth, but various imaging artifacts result in the appearance of “ears”, which can be detected as outliers via directional quantile envelopes.
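A minimal Python/NumPy sketch of flagging “ears” as outliers, assuming the nucleus has already been thresholded into a binary mask. Each foreground pixel is treated as a 2D point and scored by how far it sticks out past the directional q-quantile lines (its largest excess over all directions). Pixels on the smooth rim only slightly exceed the lines, while a protrusion exceeds them by much more. The synthetic disc, the 3 x 6 pixel “ear”, q = 0.97 and the 5 pixel tolerance are hypothetical choices, not values taken from the figure.

```python
import numpy as np


def envelope_excess(points, q=0.97, n_directions=60):
    """For each point, the largest distance it lies beyond any directional q-quantile line.

    Values <= 0 mean the point is inside every half-plane d.x <= quantile_q(d.X);
    large positive values mark protrusions that stick out of the envelope.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_directions, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])
    proj = points @ dirs.T                    # (n_points, n_directions)
    offsets = np.quantile(proj, q, axis=0)    # one quantile line per direction
    return np.max(proj - offsets, axis=1)


# Hypothetical mask: a smooth disc (the "nucleus") plus a thin protrusion (an "ear").
rows, cols = np.mgrid[0:64, 0:64]
mask = (rows - 32) ** 2 + (cols - 32) ** 2 <= 20 ** 2
mask[31:34, 53:59] = True                     # ear: 3 x 6 pixels on the right-hand side

points = np.argwhere(mask).astype(float)      # foreground pixels as (row, col) points
excess = envelope_excess(points)
ear = points[excess > 5.0]                    # hypothetical tolerance, in pixels
print(len(points), len(ear), ear[:, 1].min())
```

Reporting the envelope parameters (q, number of directions, tolerance) is arguably easier to communicate than a chain of tuned filters, which is the point made above.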