by Andrew Oram
This was originally published on O’Reilly Media’s Strata blog, October 7, 2013.
Here’s what we all know: a data-rich health care future is coming our way, and we can see in broad outline what it will look like. Health care reformers have learned that no single practice will improve the system. All of the following, which were discussed at O’Reilly’s recent Strata Rx conference, must fall into place.
I’ll warrant you can’t find a single doctor who says, "It works great to wait for people to get sick and then come to me to be fixed up." No insurer will say, "We’re happy taking our cut from the 18% of gross national product that goes to health care, and we’re looking forward to it reaching 24%" (a figure I’ve heard batted around for future costs). Everyone realizes the system will collapse, taking their livelihoods with it, unless we change.
In modern statistics, a model is not just a way of approaching problems mentally, but a set of directions to a computer program for solving those problems. Tuan Dinh, who wrote a recent article on new medical practices, traced the history of model-based medicine at Strata Rx. Dinh touched on most of the themes of modern health reform: collecting data from multiple sources, patient engagement, and analytics.
In the 1970s and 1980s (when the casual meaning of "model" applied), models were based on clinical judgment and expert opinion. They were not supported by well-established evidence, and rested instead on gross oversimplifications and errors.
Then evidence-based medicine (EBM) emerged in the 1990s, based on systematic reviews of the available evidence, of which randomized clinical trials are the gold standard. EBM is seen everywhere now: pay for performance, care processes, EHRs, and so on.
But EBM was designed for the pre-computer era, to let doctors focus on one variable at a time. Dinh said there are already 10 established models for treating cardiovascular disease, 50 for diabetes, and so on. Most are poor, though, because they are based on small, poorly chosen samples. And different models give different advice, so which should doctors trust?
The upcoming stage of analysis, model-based medicine, requires analyzing large numbers of variables and huge sets of patient information that clinical trials cannot supply. Model-based medicine can handle information on real patients (clinical trials use idealized patients: people who are healthy except for a single condition) and gather up complex inputs: lab results, genetic information, family history, comorbidities, and patient preference.
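To make the contrast concrete, here is a minimal sketch (mine, not Dinh's) of the kind of multi-variable patient record a model-based approach consumes, written in Python; every field and function name is illustrative.

    from dataclasses import dataclass

    @dataclass
    class PatientRecord:
        age: int
        lab_results: dict          # e.g. {"hba1c": 7.2, "ldl": 130}
        genetic_markers: list      # e.g. ["APOE-e4"]
        family_history: list       # e.g. ["cvd", "diabetes"]
        comorbidities: list        # real patients rarely have just one condition
        prefers_noninvasive: bool  # patient preference is itself an input

    def feature_vector(p: PatientRecord) -> list:
        """Flatten the record into numeric features a statistical model can use."""
        return [
            p.age,
            p.lab_results.get("hba1c", 0.0),
            p.lab_results.get("ldl", 0.0),
            len(p.genetic_markers),
            len(p.family_history),
            len(p.comorbidities),
            1.0 if p.prefers_noninvasive else 0.0,
        ]

A randomized trial, in effect, holds all but one of these inputs constant; a model-based system has to weigh them all at once.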
A number of talks at Strata Rx dealt with reducing readmissions shortly after a hospital discharge. Why the obsession with this particular cost reduction? Medicare fairly recently announced strong penalties for hospital readmissions, which suddenly catapulted the problem into the health care field’s favorite application of data analysis.
In one such talk, Miriam Paramore and David Talby showed the value of big data. Models for predicting readmissions have existed for some time, but they were based on a single institution, or at best a single geographical area, and did not necessarily apply to other locations with different demographics. The older models were built on a few thousand to at most 1,700,000 samples. Paramore and Talby’s was based on 4.7 billion medical claims, from 120 million patients seeing 500,000 providers.
Although common regression analyses work well with smaller data sets, this gargantuan collection called for newer statistical techniques using Hadoop, some of them fairly weak by themselves, but effective when combined in what the speakers called an "ensemble approach."
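The speakers didn't say which techniques went into their ensemble, so the following Python sketch only illustrates the general idea, with scikit-learn standing in for their Hadoop-scale tooling: several individually modest classifiers are combined by averaging their predicted probabilities.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB

    # Synthetic stand-in for a claims-derived training set.
    X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

    ensemble = VotingClassifier(
        estimators=[
            ("logit", LogisticRegression(max_iter=1000)),
            ("bayes", GaussianNB()),
            ("forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ],
        voting="soft",  # average predicted probabilities rather than hard votes
    )
    ensemble.fit(X, y)
    print(ensemble.predict_proba(X[:1]))  # a readmission-style risk score

The appeal of the approach is that each learner's blind spots tend to differ, so the combination outperforms any single member.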
Bad data bedeviled this project too, of course. They had to deal with missing data, errors, fraud, outliers, flurries, and duplicates. Still, predictions were 40% better with their model than with the biggest earlier model.
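For anyone who hasn't fought these battles, here is a toy version of a few of those cleaning steps, assuming pandas; the claims table, column names, and outlier threshold are all invented.

    import pandas as pd

    claims = pd.DataFrame({
        "claim_id": [1, 1, 2, 3, 4],
        "patient":  ["a", "a", "b", "c", None],
        "amount":   [120.0, 120.0, 95.0, 1_000_000.0, 80.0],
    })

    claims = claims.drop_duplicates(subset="claim_id")  # duplicate submissions
    claims = claims.dropna(subset=["patient"])          # missing identifiers
    cap = claims["amount"].quantile(0.99)
    claims = claims[claims["amount"] <= cap]            # crude outlier screen
    print(claims)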
The speakers also warned that models must continuously evolve, paying attention to locality (epidemics), seasonality, and changes in the hospital or its population (for instance, if a new center is opened to treat a particular condition). Even deploying the system changes the model, because it cuts out the most preventable hospital admissions and brings to the fore a new group of most at-risk patients.
You can learn a lot by focusing in microscopic fashion on one use of data, and Jo Prichard’s talk on fraud detection taught several principles that can be applied more broadly. He revealed how LexisNexis Risk Solutions identified fraud among Medicaid recipients using graph search, a fairly recent innovation that traces the chains of relationships among people or things. LexisNexis used an SQL database, but there are graph databases that represent such relationships directly, such as the Neo4j database Fred Trotter used to store referral information among doctors.
Medicaid cheaters tend to request small amounts of money that don’t draw attention, but derive value by spreading out lots of fraud among people they know and trust. LexisNexis checked for evidence of ties among Medicaid recipients, including shared houses and shared businesses.
Similarly, they could uncover doctors who were illegally prescribing controlled substances by looking at the people the doctors seemed to be connected to personally.
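In spirit, the graph search works something like this sketch, which uses the networkx library on fabricated data (LexisNexis, remember, did it in SQL): link each recipient to the houses and businesses they share, then look for connected clusters.

    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([
        ("recipient_1", "addr:12 Elm St"),
        ("recipient_2", "addr:12 Elm St"),
        ("recipient_2", "biz:Acme Home Care"),
        ("recipient_3", "biz:Acme Home Care"),
        ("recipient_4", "addr:9 Oak Ave"),  # unconnected to the others
    ])

    # Connected components reveal clusters of small claims that add up
    # once the people behind them turn out to know one another.
    for cluster in nx.connected_components(G):
        people = sorted(n for n in cluster if n.startswith("recipient"))
        if len(people) > 1:
            print("possible ring:", people)

No single node in such a cluster looks suspicious on its own; it is the chain of shared addresses and businesses that gives the ring away.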
Accountable Care Organizations (ACOs) have insatiable appetites for data (although I don’t know whether their managers understand yet how fundamental data is to their operations). Many people take the term ACO generically to mean a health provider who has to show that their treatments are state of the art and are having a positive effect. But officially, the ACO is a regulatory category created by the Centers for Medicare & Medicaid Services (CMS). As summarized by Michael Gleeson, ACOs use data to find at-risk patients and make sure they come in for treatment, to measure their own success at restoring patients to health, and to pursue other performance improvements. As I mentioned earlier, ACOs have to choose exactly the right treatment, not too little and not too much. They can’t cut off patients arbitrarily, as earlier managed care plans did.
As examples of performance improvement, Gleeson mentioned tracking the length of the doctor’s workday to gauge stress levels, and measuring the time the doctor spends using an EHR to show where the EHR is inefficient.
He emphasized the advantage that data gives large organizations. Small ones will be able to draw fewer conclusions from data, and therefore have less power to use it to improve care. Combining payer data with provider data (that is, insurers along with clinics and hospitals) is also valuable.
Finally, Gleeson lamented that Health Information Exchanges (HIEs) don’t have better data than payers, possibly because the standard data that doctors collect and transmit to HIEs lacks the types of data ACOs need.
Ann Waldo and Roger Magoulas, I think, tapped the angst of the data community in their presentation on HIPAA’s effect on research, which is why it was rated the most popular talk at the conference. HIPAA’s complex regulations create a maze that all developers must navigate, but developers care about HIPAA because data exchange is so important.
In the recent HIPAA regulations announced by HHS, the biggest changes concern "business associates," a category that probably covers many conference attendees because they take patient data from health care providers or insurers and analyze it to cut provider costs or find people who need more intervention.
One recent element of the regulation, according to Waldo, makes it easy for patients to be "data altruists" and release data for future research. Another element makes it easier for patients to release sensor data.
It should be noted that doctors’ common fallback on HIPAA as an excuse for denying patients their data has been repeatedly refuted by government representatives. Claudia Williams stressed in her keynote (4:57 in the keynote video) that HIPAA requires doctors to give patients their information.
Shahid Shah, whom I mentioned earlier, gave a talk promoting better metadata about data. He pointed out that software developers take very good care of their code, using version control and marking the origin of each change.
Shah pointed out that data in the health care field has an unusual degree of variety, and suggested that the data should be marked with its source, its owner, the location where it was collected, and other traits such as whether it has been analyzed by an expert.
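Shah proposed no particular schema, but a record carrying the traits he listed might look something like this minimal Python sketch; every name in it is mine, not his.

    from dataclasses import dataclass, field
    from datetime import datetime, timezone

    @dataclass(frozen=True)
    class Provenance:
        source: str            # system or feed the value came from
        owner: str             # who is accountable for it
        collected_at: str      # location where it was collected
        expert_reviewed: bool  # has an expert analyzed it yet?
        recorded: datetime = field(
            default_factory=lambda: datetime.now(timezone.utc))

    reading = {"systolic_bp": 142}  # the data itself
    meta = Provenance(source="ward_monitor_7", owner="cardiology",
                      collected_at="Ward 7", expert_reviewed=False)

Note that the provenance record is already bigger than the reading it describes, which is exactly the cost the attendees questioned.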
He got some push-back from several attendees, who worried about the costs of collecting this metadata and whether it would ever actually be used. I worry, too, about the extra dimension this adds, because the metadata becomes data in its own right. If you have to record scads of metadata about each piece of data, and then go in to make a correction, don’t you have to record all that metadata about the correction as well?
On Wednesday evening we were entertained—and perhaps influenced—by a series of five-minute talks in the Ignite! format. The importance of the patient’s emotional care came through in Amik Ahmad’s talk. He pointed out that gift shops serve no medical purpose in hospitals, but are ubiquitous because they help visitors show they care about the patient.
A number of other speakers at Strata Rx expounded on the themes in this article in various guises. Jeff Hammerbacher, the data scientist who made a big splash as he moved from Facebook to medicine, delivered a presentation to a standing-room-only crowd. It was an advantage to have this conference in Boston, where we could draw on national figures such as John Halamka, CIO of Beth Israel Deaconess Medical Center, and figures of local import such as Charlie Baker, the former CEO of two health care organizations who has since turned to politics.