Prediction of Overall Mortality from Fitbit heart rate data

From what I can tell, the Fitbit API returns heart rate data at an effective temporal resolution of 9.98 seconds (min: 5 s, median: 10 s, max: 15 s). Curiously, you are more likely to get either a 5 s or a 15 s interval than a 10 s interval. Using Mathematica, as before, we can plot the distribution of times between samples returned by the Fitbit API:
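(The plot itself was made in Mathematica; as a stand-in, here is a rough Python sketch of the same inter-sample computation, assuming the intraday response has been saved to a hypothetical hr.json and has the usual activities-heart-intraday payload shape.)

    # Sketch: measure the gaps between successive heart rate samples.
    # 'hr.json' is a hypothetical saved copy of the intraday API response.
    import json
    from datetime import datetime

    with open("hr.json") as f:
        data = json.load(f)

    samples = data["activities-heart-intraday"]["dataset"]
    times = [datetime.strptime(s["time"], "%H:%M:%S") for s in samples]
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]

    gaps.sort()
    print("min:", gaps[0], "median:", gaps[len(gaps) // 2], "max:", gaps[-1])
    print("mean:", sum(gaps) / len(gaps))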


[Figure: distribution of times between successive heart rate samples returned by the Fitbit API]
That is still (although just barely) usable for measuring heart rate recovery, the change in your heart rate some time t after you stop exercising. For most things you can measure on a wearable, any one datapoint is next to useless; the key is to look at first and second derivatives, such as gradual trends in how your heart rate drops following a few minutes on the treadmill. The key medical study is probably the October 1999 article in NEJM, Heart-rate recovery immediately after exercise as a predictor of mortality. The conclusion of that paper is that “A delayed decrease in the heart rate during the first minute after graded exercise, which may be a reflection of decreased vagal activity, is a powerful predictor of overall mortality”. Their standard for a ‘delayed’ decrease was a drop of ≤ 12 beats per minute from the heart rate at peak exercise, measured 1 minute after the cessation of exercise. Since Fitbit is probably not in the “mortality prediction” market, ~10 s temporal resolution is fine; for medical researchers, however, it would be nice to have slightly higher temporal resolution data.
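The criterion itself is trivial to compute; a toy sketch in Python, with made-up numbers rather than data from the paper:

    # Toy sketch of the NEJM criterion: heart rate recovery is the peak-exercise
    # HR minus the HR one minute after stopping; a drop of <= 12 bpm is 'delayed'.
    def heart_rate_recovery(peak_hr, hr_one_min_after):
        recovery = peak_hr - hr_one_min_after
        return recovery, recovery <= 12   # (recovery in bpm, delayed?)

    print(heart_rate_recovery(165, 155))  # (10, True): delayed, i.e. abnormal
    print(heart_rate_recovery(165, 140))  # (25, False): normal recovery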

Fitbit API and High Resolution Heart Rate Data

After trying the Jawbone UP3 for a few days and quickly returning it due to multiple limitations, I’m now testing a Fitbit Charge HR. I’m mostly interested in heart rate data, so I had to update my code to OAuth 2.0. Fortunately orcasgit/python-fitbit is completely on top of things and their new gather_keys_oauth2.py works perfectly. All I had to do was to set my callback URL in the ‘manage my apps’ tab at dev.fitbit.com to http://127.0.0.1:8080/, and then gather_keys_oauth2.py returned my OAuth 2.0 access and refresh tokens. I dropped those into a text file (‘config.ini’) and used them to set up my client:
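(A minimal sketch of that setup; the config.ini section and key names are my own convention, and very early OAuth 2.0 builds of python-fitbit took a slightly different constructor, e.g. an oauth2=True flag.)

    # Read the OAuth 2.0 credentials from config.ini and build the client.
    import configparser
    import fitbit

    config = configparser.ConfigParser()
    config.read("config.ini")
    c = config["fitbit"]   # assumed section name

    client = fitbit.Fitbit(
        c["client_id"],
        c["client_secret"],
        access_token=c["access_token"],
        refresh_token=c["refresh_token"],
    )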

There was a small issue with orcasgit/python-fitbit to do with the new ‘1sec’ detail level for heart rate data, but I made that change and the merge is pending on GitHub. Now the data are flowing.
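For example, pulling a day of intraday heart rate data looks something like this (intraday_time_series is python-fitbit’s wrapper around the intraday endpoint; the response keys below are, as far as I can tell, the standard intraday payload shape):

    # Request the intraday series at the '1sec' detail level; the samples
    # are nested under 'activities-heart-intraday' -> 'dataset'.
    hr = client.intraday_time_series(
        "activities/heart",
        base_date="today",
        detail_level="1sec",
    )
    for sample in hr["activities-heart-intraday"]["dataset"][:3]:
        print(sample)   # e.g. {'time': '00:00:05', 'value': 61}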

I was expecting 1 s resolution data (based on the ‘1sec’ parameter), but the timestamps are actually spaced 5 to 15 seconds apart. It would not surprise me if ‘1sec’ is a request rather than a guaranteed minimal temporal resolution; perhaps the device does some kind of (sensible) compression and concatenates runs of identical rates, e.g. if your heart rate is precisely 59 bpm for a while, it is probably silly to continuously report a sequence of {“value”: 59} over and over again. If this is true, are we (basically) dealing with a lossless run-length-encoded (RLE) data stream? Any ideas?

It’s not a simple RLE, as this 4-tuple of data demonstrates: {“value”: 60, “time”: “00:05:00”}, {“value”: 60, “time”: “00:05:15”}, {“value”: 60, “time”: “00:05:30”}, {“value”: 62, “time”: “00:05:40”}. If it were a simple RLE-ish encoding, this sequence would be {“value”: 60, “time”: “00:05:00”}, {“value”: 62, “time”: “00:05:40”}, with the recipient code then assuming 40 seconds of 60 ± 0.5 bpm or something similar. My guess right now is RLE modified to emit at least one datapoint every 15 seconds, updating more quickly when something is changing, which would yield exactly the observed sequence above.
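That guess is easy to state as code; a sketch of the hypothesized encoder (my speculation, not documented Fitbit behavior):

    # Hypothesized scheme: emit a sample when the value changes, or when
    # 15 s have passed since the last emitted sample, whichever comes first.
    def encode(samples, max_gap=15):
        # samples: list of (seconds, bpm) pairs at 1 Hz
        out = []
        for t, v in samples:
            if not out or v != out[-1][1] or t - out[-1][0] >= max_gap:
                out.append((t, v))
        return out

    # 60 bpm from 00:05:00 to 00:05:39, then 62 bpm at 00:05:40 (t in seconds):
    raw = [(t, 60) for t in range(300, 340)] + [(340, 62)]
    print(encode(raw))   # [(300, 60), (315, 60), (330, 60), (340, 62)]

which reproduces the observed 4-tuple.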

Directional Quantile Envelopes – making sense of 2D and 3D point clouds

Imagine some large multidimensional dataset; one of the things you might wish to do is to find outliers and, more generally, say something statistically defined about the structure of clusters of points within that space. One of my favorite techniques for doing that is to use directional quantile envelopes, developed and implemented by Anton Antonov and described here and here. In those posts, Antonov considers a set of uniformly distributed directions and constructs the lines (or planes) that separate the points into quantiles; if you consider enough directions and take the intersection of the resulting half-planes, you are left with a curve (or surface) that envelops some quantile q of your data. The figures show a cloud of points with some interesting structure and the surface for q = 0.7, with and without the data.
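A minimal 2-D sketch of the idea (my own toy reimplementation in Python/NumPy, not Antonov’s Mathematica package):

    # For each direction u, the q-quantile of the projections cloud.u defines
    # a supporting line u.x = d; the envelope is the intersection of the
    # corresponding half-planes u.x <= d.
    import numpy as np

    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(2000, 2)) @ np.array([[2.0, 0.7], [0.0, 1.0]])

    q, n_dirs = 0.7, 60
    angles = np.linspace(0, 2 * np.pi, n_dirs, endpoint=False)
    dirs = np.column_stack([np.cos(angles), np.sin(angles)])  # unit directions

    d = np.quantile(cloud @ dirs.T, q, axis=0)  # one offset per direction

    # Envelope polygon: vertex k is where the supporting lines of adjacent
    # directions intersect.
    vertices = np.array([
        np.linalg.solve(np.vstack([dirs[k], dirs[(k + 1) % n_dirs]]),
                        [d[k], d[(k + 1) % n_dirs]])
        for k in range(n_dirs)
    ])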

Beyond general data analytics, the directional quantile envelope approach has at least one more application: image processing and segmentation. Imagine taking a picture of a locally smooth, blob-like object in the presence of various (complicated) artifacts and noise. You could throw the usual approaches at this problem (gradient filters, distance transforms, morphological operations, watershed, …), but in many of those approaches you end up empirically playing with dozens of parameters until things “look nice”, which is unsettling. What you would really like to do is to detect/localize/reconstruct the emitting object in a statistically defined, principled manner, and this is what Antonov’s directional quantile envelopes allow you to do.

[Figure: cell nucleus segmentation via a directional quantile envelope]

A quantile envelope is well defined and you can compactly communicate what you did to the raw imaging data to get some final picture of a cell or organoid, rather than reporting an inscrutable succession of filters, convolutions, and adaptive nonlinear thresholding steps. The figure shows a cell nucleus imaged with a confocal microscope; in reality, the cell nucleus is quite smooth, but various imaging artifacts result in the appearance of “ears”, which can be detected as outliers via directional quantile envelopes.
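Continuing the toy NumPy sketch from above (same hypothetical cloud, dirs, and d), flagging such ears amounts to finding the points that violate at least one half-plane constraint; for genuine outlier hunting you would use a higher quantile than 0.7, say q = 0.98:

    # A point lies outside the q-envelope if it is beyond the supporting
    # line in any direction.
    outside = (cloud @ dirs.T > d).any(axis=1)
    outliers = cloud[outside]
    print(f"{outside.sum()} of {len(cloud)} points lie outside the q={q} envelope")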

The Fitbit API – Mathematica vs. Python

I’m teaching a class later this year and part of what we will cover is how to explore data coming from wearables. I’ve had a Fitbit Zip for a while, and my plan is to collect data over the summer and then to use those data for class and for problem sets, so the students have real data to look at.

I thought I would use the Mathematica Connector (via its ServiceConnect[“Fitbit”] call) to get the data from the Fitbit API, but I quickly ran into various problems; the ServiceConnect functionality at present seems somewhat rudimentary. After spending a few hours on the internals of Mathematica’s OAuth.m and trying to get valid tokens from the undocumented HTTPClient`OAuthAuthentication call (can anyone tell me how to pass nontrivial OAuth 2.0 scopes into this function?), I gave up and just used the Python Fitbit client API, which worked right away since, among other reasons, there is actually documentation. I followed the instructions at first-steps-into-the-quantified-self-getting-to-know-the-fitbit-api. Once you have the four keys you need, just place them in a config.ini file and use something like this:
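(A sketch of what that looked like in the OAuth 1.0a era; the config.ini layout is my own convention, and the exact constructor keyword names varied across python-fitbit versions.)

    # Read the four OAuth 1.0a keys from config.ini and build the client.
    import configparser
    import fitbit

    config = configparser.ConfigParser()
    config.read("config.ini")
    c = config["fitbit"]   # assumed section name

    client = fitbit.Fitbit(
        c["consumer_key"],
        c["consumer_secret"],
        resource_owner_key=c["user_key"],
        resource_owner_secret=c["user_secret"],
    )

    # e.g. the last week of daily step counts
    steps = client.time_series("activities/steps", period="7d")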

This gives you a JSON dump, which can then be manipulated in Mathematica.
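(For readers without Mathematica, an equivalent sketch of the cumulative-steps computation in Python, assuming the intraday response has been saved to a hypothetical steps.json:)

    # Cumulative steps vs. time from the intraday steps payload.
    import json
    from itertools import accumulate

    with open("steps.json") as f:
        data = json.load(f)

    dataset = data["activities-steps-intraday"]["dataset"]
    times = [s["time"] for s in dataset]
    cumulative = list(accumulate(s["value"] for s in dataset))

    for t, c in zip(times[::60], cumulative[::60]):  # hourly snapshots
        print(t, c)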

Plotting the cumulative steps vs. time (I like CDFs!) is a nice way of seeing when you were moving (slope of the line > 0). This requires partner access to the Fitbit API; I was impressed with their help (emails answered within minutes) and their enthusiastic support for education and our upcoming class.