
Puchun Niu is a visiting postdoctoral associate in the lab of Miel Hostens, the Robert and Anne Everett associate professor of digital dairy management and data analytics at Cornell University. Puchun's work focuses on collecting experimental data for integration and management, processing the data through a pipeline for quality control and profiling, and creating a comprehensive dataset for storage, analysis and visualization.

We spoke with Puchun about creating a data pipeline to standardize large volumes of data collected across multiple dairy cattle experiments. The experiments are conducted as part of the Accelerating Livestock Innovations for Sustainability (ALIS) program, a transdisciplinary collaboration that uses feed additives, ration balancing, technology and data modeling to create solutions for climate-smart animal agriculture. 

Animal agriculture researchers have been conducting experiments for years without a data pipeline. Why do they need one now?

Researchers collect large volumes of data from animal experiments. A typical dairy cattle experiment might include feed chemical analysis, feed intake, milk yield, feed digestibility and energy expenditures—not to mention methane emissions, which are central to the latest ALIS experiments. Managing these diverse measurements, and the metadata that provides information about the measurements, across multiple experiments poses significant challenges for data organization, analysis and efficient utilization.

A data pipeline helps streamline the collection, processing and analysis of data. This is crucial in modern agriculture, where the growing use of technology and data-driven decision-making plays an essential role in improving productivity, sustainability and research outcomes. 

What is wrong with storing data in traditional worksheet-based formats, such as spreadsheets?

The spreadsheet approach has several limitations. There are discrepancies in how data is recorded across experiments and limited ability to scale for large datasets. Frequent errors also happen due to manual input, and the lack of automation makes the process tedious for humans. And then there is the lack of centralized data management, which weakens data security.

These challenges highlight the need for a standardized approach to agricultural data organization that is consistent, easy to understand, and readable by both people and computer systems. An approach like that allows for seamless integration and analysis across different experiments. That makes the process more efficient and reliable.

What kind of data are you using to create the data pipeline?

The experimental data we are working with come from experiments focused on understanding the effects of various dietary interventions on methane emissions in dairy cattle. ALIS researchers investigated the effects of dietary starch, lipid types and levels, castor oil, cashew nut shell liquid and dietary fat on enteric methane emissions. 

They also compared emissions from two types of measurement tools: GreenFeed units, which measure gas emissions, such as methane and carbon dioxide, from individual cows as they visit the unit; and respiration chambers, which measure and monitor methane and other gas emissions from individual cows every 10 minutes.

You are currently tackling the challenges of data management using data from a couple of ALIS experiments. Tell us how that works.

I am collaborating with Dr. Hostens and Enhong Liu, postdoctoral associate, on the data pipeline. Our project’s key goal is to create a streamlined, scalable and reliable data pipeline that significantly improves the organization, quality and usability of experimental data across multiple dairy cattle studies.

We start by providing formatted spreadsheets to ALIS researchers to ensure that data, such as feed intake and daily methane and carbon dioxide production, is submitted in a consistent format. Once we receive the data, we follow a multi-step process to verify that it is correct and complete. Finally, the cleaned datasets are stored and integrated into a single, large dataset. This unified dataset can then be used for statistical analysis, visualization and machine learning applications. The entire process is automated, so data can be processed within seconds.
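
Editor's illustration: the workflow described above (consistently formatted submissions, a verification step, then integration into one dataset) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ALIS team's actual pipeline. The column names, the plausibility ranges and the function names `validate_row` and `run_pipeline` are all hypothetical and chosen only for illustration.

```python
# Hypothetical column names; the article does not list the real template fields.
REQUIRED_COLUMNS = (
    "cow_id", "date", "feed_intake_kg", "methane_g_per_day", "co2_g_per_day",
)

# Placeholder plausibility ranges, assumed for illustration only.
PLAUSIBLE_RANGES = {
    "feed_intake_kg": (0.0, 60.0),
    "methane_g_per_day": (0.0, 1000.0),
    "co2_g_per_day": (0.0, 30000.0),
}


def validate_row(row):
    """Return a list of problems found in one submitted row (empty if clean)."""
    problems = []
    # Completeness check: every required column must have a value.
    for col in REQUIRED_COLUMNS:
        if row.get(col) in (None, ""):
            problems.append(f"missing {col}")
    # Quality check: numeric columns must parse and fall in a plausible range.
    for col, (low, high) in PLAUSIBLE_RANGES.items():
        value = row.get(col)
        if value in (None, ""):
            continue  # already reported as missing
        try:
            number = float(value)
        except (TypeError, ValueError):
            problems.append(f"{col} is not numeric")
            continue
        if not low <= number <= high:
            problems.append(f"{col} out of range")
    return problems


def run_pipeline(experiments):
    """Validate rows from several experiments and merge the clean rows.

    `experiments` maps an experiment identifier to a list of row dicts.
    Returns (unified, rejected): clean rows tagged with their experiment,
    and rejected rows together with the reasons for rejection.
    """
    unified, rejected = [], []
    for experiment_id, rows in experiments.items():
        for row in rows:
            problems = validate_row(row)
            if problems:
                rejected.append((experiment_id, row, problems))
                continue
            clean = {
                "experiment_id": experiment_id,
                "cow_id": str(row["cow_id"]),
                "date": str(row["date"]),
            }
            for col in PLAUSIBLE_RANGES:
                clean[col] = float(row[col])
            unified.append(clean)
    return unified, rejected
```

In this sketch, rejected rows are kept alongside the reasons they failed rather than silently dropped, so they could be returned to the submitting researcher for correction. That is a design choice of the illustration, not something the interview specifies.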

Looking forward, what are the next steps? 

Our next steps involve gathering feedback from the ALIS researchers to learn about their experience with the pipeline. This feedback will help us improve the system and make it easier to use. Our main goal is to manage data from animal agriculture projects more efficiently, reduce the time needed to combine data, and ensure that the data is accurate and consistent.

Jackie Swift is the communications specialist for the Cornell CALS Department of Animal Science.
