Dan Kowal, associate professor of statistics and data science in the College of Agriculture and Life Sciences, has developed a new Bayesian regression analysis method that’s more flexible, accurate and easy to use.

Machine learning and artificial intelligence wouldn’t be possible without the statistical models that underpin their analytic capabilities. A Cornell statistician and his colleague have developed a revolutionary new method to analyze complex datasets that’s more flexible, accurate and easy to use. 

Dan Kowal, associate professor of statistics and data science, a shared department in the College of Agriculture and Life Sciences, the School of Industrial and Labor Relations and the Cornell Ann S. Bowers College of Computing and Information Science, is lead author of “Monte Carlo Inference for Semiparametric Bayesian Regression,” which was published Oct. 1 in the Journal of the American Statistical Association. The co-author is Bohan Wu, now a Ph.D. student at Columbia University.

“This method gives people more power when they’re working with messy data and trying to untangle the complexities of various effects,” Kowal said. “I want people to be using reliable models so that they can really tease out the signal from the noise. We’ve found empirically that this method can do that across a broad array of different data types, distributions and settings. It’s exactly the kind of contribution that excites me as a statistician.”

Bayesian regression analysis enables researchers to predict a range of outcomes instead of a single estimate. Kowal’s model is specifically designed to analyze “messier data” that doesn’t fit nicely into a bell curve, he said. It can analyze and make predictions on a huge variety of topics, including health care utilization, family incomes, financial markets and climate events. For example, doctors sometimes ask their patients to self-report on their mental health with questions like, “How many days in the last 30 days was your mental health not good?” A large number of people answer “0,” and another large number answer “30,” and the rest generally estimate by answering in increments of 5 or 7, Kowal said. 

“With data like this, you get these spikes in the response that are more about the self-reporting than they are about the data itself,” he said. “If I’m trying to plan for health care capacity, I shouldn’t make decisions based on whether people are answering 14 vs. 15 vs. 16. But having models that can appropriately stretch out or compress these clumped data points enables your analysis to make more sense and ultimately be more useful.”
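To picture what "stretching out or compressing" clumped answers can mean, here is a minimal sketch in Python. It is not the procedure from the paper: it assumes one simple, fixed choice of transformation (the empirical distribution of the answers mapped to normal z-scores), and the `answers` list is made-up data that only mimics the spikes described above. The function name `normal_scores` is likewise hypothetical.

```python
from collections import Counter
from statistics import NormalDist


def normal_scores(values):
    """Map each distinct value to a z-score via its mid-rank empirical CDF."""
    n = len(values)
    counts = Counter(values)
    scores = {}
    below = 0  # number of observations strictly below the current value
    for v in sorted(counts):
        # Mid-rank CDF: share of answers below v plus half the share tied at v.
        # This always lies strictly between 0 and 1, so inv_cdf is defined.
        p = (below + 0.5 * counts[v]) / n
        scores[v] = NormalDist().inv_cdf(p)
        below += counts[v]
    return scores


# Made-up answers to "how many of the last 30 days ...": large spikes at 0 and 30,
# smaller clumps at multiples of 5 and 7. For illustration only.
answers = (
    [0] * 40 + [2] * 8 + [5] * 8 + [7] * 6 + [10] * 7
    + [14] * 5 + [15] * 6 + [16] * 1 + [20] * 4 + [30] * 15
)

scores = normal_scores(answers)
for day in (14, 15, 16):
    print(day, round(scores[day], 3))
```

On the transformed scale, the distance between two answers depends on how many respondents fall between them rather than on the raw day counts, so neighboring answers such as 14, 15 and 16 are not treated as three sharply different outcomes.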

Kowal’s new method is also easier for researchers to use. Bayesian regression analyses typically rely on a complex algorithm called Markov chain Monte Carlo, which demands a huge amount of computing power and multiple diagnostics to confirm that the algorithm itself has not broken down. Kowal’s method avoids that algorithm.

“When people use Markov chain Monte Carlo, they have to do all types of diagnostics to make sure things are working well. The algorithm requires its own effort, independent of the model and the data you really care about,” he said. “In this paper, we actually completely circumvent that but still retain model flexibility and accuracy in predicting outcomes.”
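The difference between ordinary Monte Carlo sampling and Markov chain Monte Carlo can be sketched with a toy example in Python. This is not Kowal's semiparametric method: it assumes a deliberately simple model (one slope, a normal prior, a known noise level) in which the posterior has a closed form, so independent draws can be taken directly with no chain to tune or diagnose. The data, the function names and the default values of `noise_sd` and `prior_sd` are all hypothetical.

```python
import random
from statistics import quantiles


def posterior_slope(x, y, noise_sd=1.0, prior_sd=10.0):
    """Closed-form normal posterior (mean, sd) for b in y = b*x + noise.

    Assumes a zero-mean normal prior on b and a known noise standard deviation.
    """
    precision = 1.0 / prior_sd**2 + sum(xi * xi for xi in x) / noise_sd**2
    mean = (sum(xi * yi for xi, yi in zip(x, y)) / noise_sd**2) / precision
    return mean, precision ** -0.5


def predictive_draws(x_new, mean, sd, noise_sd=1.0, n_draws=5000, seed=0):
    """Independent draws of a new outcome at x_new (plain Monte Carlo, no chain)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        b = rng.gauss(mean, sd)                       # draw a plausible slope
        draws.append(rng.gauss(b * x_new, noise_sd))  # draw an outcome given it
    return draws


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean, sd = posterior_slope(x, y)
draws = predictive_draws(6.0, mean, sd)
cuts = quantiles(draws, n=20)  # 19 cut points at 5%, 10%, ..., 95%
low, high = cuts[0], cuts[-1]
print(f"90% predictive interval at x=6: ({low:.2f}, {high:.2f})")
```

Because every draw is independent of the others, there is no burn-in period or convergence check to run, and the output is a range of plausible outcomes rather than a single estimate.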

Kowal has built a website with documentation and examples of how to use his new method, and he’s published free, downloadable software on the Comprehensive R Archive Network (CRAN), the main repository of open-source packages for the R statistical computing language.

 

Krisy Gashler is a writer for the College of Agriculture and Life Sciences.
