Sumanta Basu

November 8, 2016

Sumanta Basu, assistant professor and Shayegani Bruno Family Faculty Fellow, biological statistics and computational biology
Academic focus: learning dynamics of complex systems from high-dimensional data; applications in genomics and financial economics
Previous positions: postdoctoral scholar, University of California, Berkeley and Lawrence Berkeley National Laboratory, 2014-16
Academic background: B.Stat., 2006, and M.Stat., 2008, Indian Statistical Institute, Kolkata; Ph.D., statistics, University of Michigan, 2014
Last book read: “American Gods” by Neil Gaiman
In his own time: reading, cycling and traveling
Current research project: I am working on two collaborative interdisciplinary projects whose common statistical theme is learning the structure of a large, complex system from noisy, high-dimensional datasets. The first project, joint with my collaborators at UC Berkeley and LBNL, is on epigenomics. Our goal is to detect high-order interactions among biomolecules (typically transcription factors and histone modifications) that play central roles in gene regulation and RNA processing. Building upon Random Forests, a popular machine learning algorithm, we have developed a method that shows promising results in recovering known interactions in the fruit fly genome through integrative analysis of next-generation sequencing data. We are now working with datasets on the human epigenome produced by the ENCODE consortium to look for novel interactions that are potentially relevant to alternative splicing. The second project, joint with my collaborators at the University of Michigan, is on monitoring systemic risk and identifying risky institutions in the US economy. Drawing on our prior work on estimating high-dimensional time series models and on recent advances in statistical machine learning, we are developing econometric measures to estimate network connectivity among firms from publicly available time series data on their health characteristics (e.g., return, volatility and leverage). Our approach of graphical modeling with time series datasets holds the promise of monitoring risk build-up in the economy by studying the evolution of network density, and of identifying risky firms by studying the network topology. Both projects present unique theoretical and computational challenges that are leading to novel methodological research in core statistics.
What most excites you about Cornell: Cornell has a long history of statisticians collaborating closely and successfully with experts from diverse scientific fields and making significant contributions to both mathematical and interdisciplinary research. I consider myself very fortunate to be part of such a vibrant academic environment.