by Andrew Oram
This was originally published on O’Reilly Media’s Strata blog, October 7, 2013.
Here’s what we all know: a data-rich health care future is coming our way, and we can see in broad outline what it will look like. Health care reformers have learned that no single practice will improve the system. All of the following, which were discussed at O’Reilly’s recent Strata Rx conference, must fall into place.
I’ll warrant you can’t find a single doctor who says, "It works great to wait for people to get sick and then come to me to be fixed up." No insurer will say, "We’re happy taking our cut from the 18% of gross national product that goes to health care, and we’re looking forward to it reaching 24%" (a figure I’ve heard batted around for future costs). Everyone realizes the system will collapse, taking their livelihoods with it, unless we change.
In modern statistics, a model is not just a way of approaching problems mentally, but a set of directions to a computer program for solving those problems. Tuan Dinh, who wrote a recent article on new medical practices, traced the history of model-based medicine at Strata Rx. Dinh touched on most of the themes of modern health reform: collecting data from multiple sources, patient engagement, and analytics.
In the 1970s and 1980s (when the casual meaning of "model" applied), models were based on clinical judgment and expert opinion. They were not supported by well-established evidence, and rested instead on gross oversimplifications and errors.
Then evidence-based medicine (EBM) emerged in the 1990s, based on systematic reviews of the available evidence, of which randomized clinical trials are the gold standard. EBM is seen everywhere now: pay for performance, care processes, EHRs, and so on.
But EBM was designed for the pre-computer era, to let doctors focus on one variable at a time. Dinh said there are already 10 established models for treating cardiovascular disease, 50 for diabetes, and so on. Most are poor, though, because they are based on small, poorly chosen samples. And different models give different advice, so which should doctors trust?
The upcoming stage of analysis, model-based medicine, requires analyzing large numbers of variables and huge sets of patient information that clinical trials cannot supply. Model-based medicine can handle information on real patients (clinical trials use idealized patients: people who are healthy except for a single condition) and gather up complex inputs: lab results, genetic information, family history, comorbidities, and patient preference.
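To make the contrast concrete, here is a minimal sketch (mine, not Dinh's) of the kind of multi-variable patient record a model-based approach consumes, written in Python; every field and function name is illustrative.

    from dataclasses import dataclass

    @dataclass
    class PatientRecord:
        age: int
        lab_results: dict          # e.g. {"hba1c": 7.2, "ldl": 130}
        genetic_markers: list      # e.g. ["APOE-e4"]
        family_history: list       # e.g. ["cvd", "diabetes"]
        comorbidities: list        # real patients rarely have just one condition
        prefers_noninvasive: bool  # patient preference is itself an input

    def feature_vector(p: PatientRecord) -> list:
        """Flatten the record into numeric features a statistical model can use."""
        return [
            p.age,
            p.lab_results.get("hba1c", 0.0),
            p.lab_results.get("ldl", 0.0),
            len(p.genetic_markers),
            len(p.family_history),
            len(p.comorbidities),
            1.0 if p.prefers_noninvasive else 0.0,
        ]

A randomized trial, in effect, holds all but one of these inputs constant; a model-based system has to weigh them all at once.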
A number of talks at Strata Rx dealt with reducing readmissions shortly after a hospital discharge. Why the obsession with this particular cost reduction? Medicare fairly recently announced strong penalties for hospital readmissions, which suddenly catapulted the problem into the health care field’s favorite application of data analysis.
In one such talk, Miriam Paramore and David Talby showed the value of big data. Models for predicting readmissions have existed for some time, but they were based on a single institution, or at best a single geographical area, and did not necessarily apply to other locations with different demographics. The older models were built on a few thousand to at most 1,700,000 samples. Paramore and Talby’s was based on 4.7 billion medical claims, from 120 million patients seeing 500,000 providers.
Although common regression analyses work well with smaller data sets, this gargantuan collection called for newer statistical techniques using Hadoop, some of them fairly weak by themselves, but effective when combined in what the speakers called an "ensemble approach."
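The speakers didn't say which techniques went into their ensemble, so the following Python sketch only illustrates the general idea, with scikit-learn standing in for their Hadoop-scale tooling: several individually modest classifiers are combined by averaging their predicted probabilities.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    # Synthetic stand-in for a claims-derived training set.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    ensemble = VotingClassifier(
        estimators=[
            ("logit", LogisticRegression(max_iter=1000)),
            ("bayes", GaussianNB()),
            ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ],
        voting="soft",  # average predicted probabilities rather than hard votes
    )
    ensemble.fit(X, y)
    print(ensemble.predict_proba(X[:1]))  # a readmission-style risk score

The appeal of the approach is that each learner's blind spots tend to differ, so the combination outperforms any single member.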
Bad data bedeviled this project too, of course. They had to deal with missing data, errors, fraud, outliers, flurries, and duplicates. Still, predictions were 40% better with their model than with the biggest earlier model.
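For anyone who hasn't fought these battles, here is a toy version of a few of those cleaning steps, assuming pandas; the claims table, column names, and outlier threshold are all invented.

    import pandas as pd

    claims = pd.DataFrame({
        "claim_id": [1, 1, 2, 3, 4],
        "patient":  ["a", "a", "b", "c", None],
        "amount":   [120.0, 120.0, 95.0, 1_000_000.0, 80.0],
    })

    claims = claims.drop_duplicates(subset="claim_id")  # duplicate submissions
    claims = claims.dropna(subset=["patient"])          # missing identifiers
    cap = claims["amount"].quantile(0.99)
    claims = claims[claims["amount"] <= cap]            # crude outlier screen
    print(claims)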
The speakers also warned that models must continuously evolve, paying attention to locality (epidemics), seasonality, and changes in the hospital or its population (for instance, if a new center is opened to treat a particular condition). Even deploying the system changes the model, because it cuts out the most preventable hospital admissions and brings to the fore a new group of most at-risk patients.
You can learn a lot by focusing in microscopic fashion on one use of data, and Jo Prichard’s talk on fraud detection taught several principles that can be applied more broadly. He revealed how LexisNexis Risk Solutions identified fraud among Medicaid recipients using graph search, a fairly recent innovation that traces the chains of relationships among people or things. LexisNexis used an SQL database, but there are graph databases that represent such relationships directly, such as the Neo4j database Fred Trotter used to store referral information among doctors.
Medicaid cheaters tend to request small amounts of money that don’t draw attention, but derive value by spreading out lots of fraud among people they know and trust. LexisNexis checked for evidence of ties among Medicaid recipients, including shared houses and shared businesses.
Similarly, they could uncover doctors who were illegally prescribing controlled substances by looking at the people the doctors seemed to be connected to personally.
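In spirit, the graph search works something like this sketch, which uses the networkx library on fabricated data (LexisNexis, remember, did it in SQL): link each recipient to the houses and businesses they share, then look for connected clusters.

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("recipient_1", "addr:12 Elm St"),
        ("recipient_2", "addr:12 Elm St"),
        ("recipient_2", "biz:Acme Home Care"),
        ("recipient_3", "biz:Acme Home Care"),
        ("recipient_4", "addr:9 Oak Ave"),  # unconnected to the others
    ])

    # Connected components reveal clusters of small claims that add up
    # once the people behind them turn out to know one another.
    for cluster in nx.connected_components(G):
        people = sorted(n for n in cluster if n.startswith("recipient"))
        if len(people) > 1:
            print("possible ring:", people)

No single node in such a cluster looks suspicious on its own; it is the chain of shared addresses and businesses that gives the ring away.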
Accountable Care Organizations (ACOs) have insatiable appetites for data (although I don’t know whether their managers understand yet how fundamental data is to their operations). Many people take the term ACO generically to mean a health provider who has to show that their treatments are state of the art and are having a positive effect. But officially, the ACO is a regulatory category created by the Centers for Medicare & Medicaid Services (CMS). As summarized by Michael Gleeson, ACOs use data to find at-risk patients and make sure they come in for treatment, to measure their own success at restoring patients to health, and to pursue other performance improvements. As I mentioned earlier, ACOs have to choose exactly the right treatment, not too little and not too much. They can’t cut off patients arbitrarily, as earlier managed care plans did.
As examples of performance improvement, Gleeson mentioned tracking the length of the doctor’s workday to gauge stress levels, and measuring the time the doctor spends using an EHR to show where the EHR is inefficient.
He emphasized the advantage that data gives large organizations. Small ones will be able to draw fewer conclusions from data, and therefore have less power to use it to improve care. Combining payer data with provider data (that is, insurers along with clinics and hospitals) is also valuable.
Finally, Gleeson lamented that Health Information Exchanges (HIEs) don’t have better data than payers, possibly because the standard data that doctors collect and transmit to HIEs lacks the types of data ACOs need.
Ann Waldo and Roger Magoulas, I think, tapped the angst of the data community in their presentation on HIPAA’s effect on research, which is why it was rated the most popular talk at the conference. HIPAA’s complex regulations create a maze that all developers must navigate, but developers care about HIPAA because data exchange is so important.
In the recent HIPAA regulations announced by HHS, the biggest changes concern "business associates," a category that probably covers many conference attendees because they take patient data from health care providers or insurers and analyze it to cut provider costs or find people who need more intervention.
One recent element of the regulation, according to Waldo, makes it easy for patients to be "data altruists" and release data for future research. Another element makes it easier for patients to release sensor data.
It should be noted that doctors’ common fallback on HIPAA as an excuse for denying patients their data has been repeatedly refuted by government representatives. Claudia Williams stressed in her keynote (4:57 in the keynote video) that HIPAA requires doctors to give patients their information.
Shahid Shah, whom I mentioned earlier, gave a talk promoting better metadata about data. He pointed out that software developers take very good care of their code, using version control and marking the origin of each change.
Shah pointed out that data in the health care field has an unusual degree of variety, and suggested that the data should be marked with its source, its owner, the location where it was collected, and other traits such as whether it has been analyzed by an expert.
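Shah proposed no particular schema, but a record carrying the traits he listed might look something like this minimal Python sketch; every name in it is mine, not his.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class Provenance:
        source: str            # system or feed the value came from
        owner: str             # who is accountable for it
        collected_at: str      # location where it was collected
        expert_reviewed: bool  # has an expert analyzed it yet?
        recorded: datetime = field(
            default_factory=lambda: datetime.now(timezone.utc))

    reading = {"systolic_bp": 142}  # the data itself
    meta = Provenance(source="ward_monitor_7", owner="cardiology",
                      collected_at="Ward 7", expert_reviewed=False)

Note that the provenance record is already bigger than the reading it describes, which is exactly the cost the attendees questioned.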
He got some push-back from several attendees, who worried about the costs of collecting this metadata and whether it would ever actually be used. I worry, too, about the extra dimension this adds, because the metadata becomes data in its own right. If you have to record scads of metadata about each piece of data, and then go in to make a correction, don’t you have to record all that metadata about the correction as well?
On Wednesday evening we were entertained—and perhaps influenced—by a series of five-minute talks in the Ignite! format. The importance of the patient’s emotional care came through in Amik Ahmad’s talk. He pointed out that gift shops serve no medical purpose in hospitals, but are ubiquitous because they help visitors show they care about the patient.
A number of other speakers at Strata Rx expounded on the themes in this article in various guises. Jeff Hammerbacher, the data scientist who made a big splash as he moved from Facebook to medicine, delivered a presentation to a standing-room-only crowd. It was an advantage to have this conference in Boston, where we could draw on national figures such as John Halamka, CIO of Beth Israel Deaconess Medical Center, and figures of local import such as Charlie Baker, the former CEO of two health care organizations who has since turned to politics.