I see the process of data science as being similar to the process historians go through to understand our past. In each case, we are looking back at something that has happened and making interpretations about it and how it might influence our future. In doing so, both data scientists and historians rely on primary data sources and sometimes secondary data sources. The challenge of using secondary sources is that they have already had someone else’s interpretation applied. In other words, they are derived data, which is different from source data.
Gaining insights without inadvertently applying another person’s interpretation or subconscious bias is critical in data science. As Eric Haller and Greg Satell wrote in their article, “Data-Driven Decisions Start with These 4 Questions” for Harvard Business Review online, to have confidence in data about a business, we need to first understand:
- Its source
- How it was analyzed
- What it doesn't tell us
- How we can use it
That’s why at XIFIN, we’ve continued to invest in solutions that deliver quality data from primary sources that our clients can rely on to make data-based decisions. Let’s look at revenue cycle management (RCM) data, for example. With some RCM solutions, clients are expected to pull data from multiple reports and bring it together to evaluate it and make decisions. Because it’s coming from other reports, assumptions and interpretations inherently come along with it. What was each report designed to exhibit? What assumptions might have been made by the analysts who designed the reports?
At XIFIN, our goal is to limit incidental interpretation and let the data “speak” for itself. As such, our business intelligence (BI) capabilities and Advanced Analytics platform pull data directly from the source application, i.e., primary source data. The Advanced Analytics platform uses comprehensive, subject-focused datasets, pulled straight from the RPM application. This reduces the risk of errors as well as any bias from prior interpretation.
With many RCM reporting solutions, clients pull subsets of data based on what they want to see — they shine a spotlight on a particular subset of data they believe will direct them to make a decision or solve a problem. If we draw conclusions based on analysis of the data subset, we may be lured into extrapolating the characteristics of that subset to the entire data population. This is similar to a scientist starting with a faulty hypothesis and interpreting data in such a way as to support it; the data “in the spotlight” may confirm our hypothesis, but it is not necessarily true of the data “in the dark.” With XIFIN, on the other hand, there is no data left in the dark. Clients see a complete dataset, prepared in a series of top-level visualizations, with all of the detail built in. Therefore, clients can spot valuable trends they may not have even been looking for. For a deeper look into how RCM metrics might inadvertently mislead a finance, billing, or revenue cycle management team, see Diana Richard’s blog post “Hidden Landmines in Revenue Cycle Management: Misleading Metrics.”
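The spotlight problem described above can be seen in a few lines of code. This is a minimal sketch with entirely hypothetical claims data (payer names and days-to-payment figures invented for illustration, not drawn from any XIFIN dataset): a metric computed on a hand-picked subset can look very different from the same metric computed across the full population.

```python
import statistics

# Hypothetical claims data: (payer, days_to_payment). Values are
# illustrative only, not real RCM figures.
claims = [
    ("Payer A", 12), ("Payer A", 15), ("Payer A", 14),
    ("Payer B", 45), ("Payer B", 60), ("Payer B", 52),
    ("Payer C", 30), ("Payer C", 28),
]

# "Spotlight" analysis: only the subset we chose to look at.
subset = [days for payer, days in claims if payer == "Payer A"]
subset_avg = statistics.mean(subset)

# Full-population analysis: every claim, nothing left in the dark.
full_avg = statistics.mean(days for _, days in claims)

print(f"Subset average:  {subset_avg:.1f} days")  # the spotlight looks healthy
print(f"Overall average: {full_avg:.1f} days")    # the full picture differs sharply
```

Here the spotlighted subset averages about 13.7 days while the full population averages 32.0 days; a decision made from the subset alone would badly misjudge overall payment velocity.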
Look for Part 2 of this post, coming soon, which will outline how our data science process influences diagnostic providers. It will cover the value of aggregating (federating) data, the analysis-action-automation process, and how we use artificial intelligence in conjunction with in-depth subject matter expertise.