I Just Want to See the Raw Data!

Originally posted (by me) on LinkedIn, December 9, 2014.

This is pretty close to raw data. How informative is it?

“I just want to see the raw data! No interpretation, no massaging the numbers, just the raw data straight out of the instrument!”

I sympathized with my non-scientist friend. She felt frustrated after reading a series of news items that began with a promising discovery, followed by a series of caveats, followed by more news stories reporting that no one knew for sure what was going on, and several more years of research would be required to clarify the findings from the initial report.

She didn’t know whom to believe. Scientists presented what looked like clear and convincing evidence, only to be shouted down by political activists and religious leaders claiming, “That’s just your opinion!” and citing past scientific studies proven biased, fraudulent, or just plain wrong.

The problem is, raw data points don’t tell you much of anything. Even an experienced scientist needs some kind of interpretation to convert the numbers into knowledge. What question was being asked and how did the person go about trying to find an answer? What did the instrument measure directly, and what assumptions were used to make indirect observations? What good does it do you to know that an unknown sample produces six times the voltage response at an elution time of 7.3 minutes as it did at 6.9 minutes, if that’s all you know?

What were the conditions of the experiment? The same instrument can generate high- and low-resolution data, depending on how it is set up. Each type of data is useful for some purposes, but not others. Attachments, filters, thermostats — each of these can be added, adjusted, or calibrated to increase the sensitivity toward some observations, but they can also obscure other observations. The star you’re looking at might shine most brightly at the wavelengths you’re filtering out to cut interference from the sodium-vapor street lamps nearby.

How closely do the observations mimic processes in the real world? Lab-scale syntheses often fail to predict the results of a full-scale industrial production run. Data taken in the field under real-life conditions can be biased or inaccurate, because the act of observing has altered the behavior of the thing you’re observing.

The progress of science itself can prove previous science wrong or expose the limitations of previous theories. Some 19th-century physicists thought that physics would shortly become a closed field of inquiry. All the questions had been answered, they thought. Only a few small loose ends needed to be wrapped up, and then the books could be closed. One of those loose ends turned out to be quantum theory, which underlies the technologies behind the automatic doors at the grocery store and flash drives that let you store hundreds of tunes in a device you can carry in your pocket.

Scientists do share raw data among themselves, especially when they are seeking alternate interpretations or reusing data for another purpose. The studies they publish in the journals contain interpreted data, along with information on how the data were obtained and what assumptions and processing steps were used. Other scientists with experience in the strengths and limitations of various instruments and methods review each other’s studies and identify gaps or alternate interpretations.

Some tests have been repeated so often, with such consistent results, that the interpretative steps can be programmed into the instruments themselves. You see this type of lab analysis on television shows where the forensic lab tech puts a paint chip from a crime scene into an instrument, and immediately sees that it could only have come from a 2009 Fiat. These data are not raw; the interpretation has merely been automated.

A wide spectrum of knowledge spans the territory between “That’s just your opinion!” and “There’s so much evidence here that I would stake my life on this.” The difference lies not in seeking some pure spring of unsullied data, but in knowing what questions were asked, how they were answered, and how the answers fit in with everything else. It requires seeing things happen the same way over and over and trusting things to happen that way again under the same conditions. It also requires a willingness to change your thinking if new information puts established knowledge into a new and broader context.