Dan Kowal, associate professor of statistics and data science in the College of Agriculture and Life Sciences, has developed a new Bayesian regression analysis method that’s more flexible, accurate and easy to use.

Machine learning and artificial intelligence wouldn’t be possible without the statistical models that underpin their analytic capabilities. A Cornell statistician and his colleague have developed a revolutionary new method for analyzing complex datasets, one that is more flexible, more accurate and easier to use.

Dan Kowal, associate professor of statistics and data science, a shared department in the College of Agriculture and Life Sciences, the School of Industrial and Labor Relations and the Cornell Ann S. Bowers College of Computing and Information Science, is lead author of “Monte Carlo Inference for Semiparametric Bayesian Regression,” which was published Oct. 1 in the Journal of the American Statistical Association. The co-author is Bohan Wu, now a Ph.D. student at Columbia University.

“This method gives people more power when they’re working with messy data and trying to untangle the complexities of various effects,” Kowal said. “I want people to be using reliable models so that they can really tease out the signal from the noise. We’ve found empirically that this method can do that across a broad array of different data types, distributions and settings. It’s exactly the kind of contribution that excites me as a statistician.”

Bayesian regression analysis enables researchers to predict a range of outcomes instead of a single estimate. Kowal’s model is specifically designed to analyze “messier data” that doesn’t fit nicely into a bell curve, he said. It can analyze and make predictions on a huge variety of topics, including health care utilization, family incomes, financial markets and climate events. For example, doctors sometimes ask their patients to self-report on their mental health with questions like, “How many days in the last 30 days was your mental health not good?” A large number of people answer “0,” and another large number answer “30,” and the rest generally estimate by answering in increments of 5 or 7, Kowal said. 

“With data like this, you get these spikes in the response that are more about the self-reporting than they are about the data itself,” he said. “If I’m trying to plan for health care capacity, I shouldn’t make decisions based on whether people are answering 14 vs. 15 vs. 16. But having models that can appropriately stretch out or compress these clumped data points enables your analysis to make more sense and ultimately be more useful.”
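To make the reporting spikes concrete, here is a minimal simulation sketch. It is not taken from the paper: the reporting rule, the `heap_prob` value and the uniform spread of true day counts are all hypothetical choices made only to show how rounding to multiples of 5 or 7 creates clumps that reflect reporting habits rather than the underlying quantity.

```python
import random
from collections import Counter

def report_days(true_days, rng, heap_prob=0.6):
    """Hypothetical reporting rule (an assumption, not from the study):
    with probability heap_prob, a respondent rounds the true count to the
    nearest multiple of 5 or 7; otherwise the exact count is reported."""
    if true_days in (0, 30) or rng.random() >= heap_prob:
        return true_days
    candidates = [5 * round(true_days / 5), 7 * round(true_days / 7)]
    # Pick whichever rounded value is closest to the true count.
    return min(candidates, key=lambda c: abs(c - true_days))

rng = random.Random(0)
# Made-up "true" counts, spread evenly over 0..30 for simplicity.
true_counts = [rng.randint(0, 30) for _ in range(10_000)]
reported = [report_days(d, rng) for d in true_counts]
counts = Counter(reported)

# Compare a heaped value with its unheaped neighbors.
for day in (14, 15, 16):
    print(day, counts[day])
```

In this toy setup the reported counts at 14 and 15 are far larger than at 16, even though the true counts are spread evenly, which is the kind of artifact a model for such data has to account for.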

Kowal’s new method is also easier for researchers to use. Bayesian regression analyses typically rely on a complex algorithm called Markov chain Monte Carlo, which demands a huge amount of computing power and multiple diagnostics to ensure the algorithm itself doesn’t break. Kowal’s method avoids that algorithm.

“When people use Markov chain Monte Carlo, they have to do all types of diagnostics to make sure things are working well. The algorithm requires its own effort, independent of the model and the data you really care about,” he said. “In this paper, we actually completely circumvent that but still retain model flexibility and accuracy in predicting outcomes.”
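As a rough illustration of the difference between chain-based and direct sampling, the sketch below uses a textbook conjugate normal model, not the semiparametric method from the paper. Because the posterior has a known form here, each draw is independent and no Markov chain (or chain diagnostics) is needed. The observations, prior settings and function name are all made up for the example; the draws are summarized as a range of plausible outcomes rather than a single estimate.

```python
import math
import random
import statistics

def posterior_predictive_draws(y, prior_mean, prior_sd, noise_sd, n_draws, rng):
    """Direct Monte Carlo for y_i ~ Normal(mu, noise_sd^2) with a
    Normal(prior_mean, prior_sd^2) prior on mu and known noise_sd."""
    n = len(y)
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / noise_sd**2
    post_var = 1.0 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * statistics.fmean(y))
    post_sd = math.sqrt(post_var)
    draws = []
    for _ in range(n_draws):
        mu = rng.gauss(post_mean, post_sd)      # independent posterior draw
        draws.append(rng.gauss(mu, noise_sd))   # new outcome given that draw
    return draws

rng = random.Random(1)
y = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7]  # made-up observations
draws = sorted(posterior_predictive_draws(y, 0.0, 10.0, 1.0, 20_000, rng))

point = statistics.fmean(draws)             # single-number summary
lower = draws[int(0.05 * len(draws))]       # approximate 5th percentile
upper = draws[int(0.95 * len(draws)) - 1]   # approximate 95th percentile
print(f"point estimate {point:.2f}, 90% predictive interval ({lower:.2f}, {upper:.2f})")
```

Every draw here comes straight from the target distribution, so there is no burn-in or convergence check to run. For models without a closed-form posterior that shortcut is usually unavailable, which is why Markov chain Monte Carlo is so widely used.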

Kowal has built a website with documentation and examples of how to use his new method, and he’s published free, downloadable software on CRAN, the premier website for open-source programming for statistical computing. 

 

Krisy Gashler is a writer for the College of Agriculture and Life Sciences.
