Statistical Modelling 17 (4-5) (2017), 245–289

Statistical contributions to bioinformatics: Design, modelling, structure learning and integration

Jeffrey S. Morris
Department of Biostatistics,
The University of Texas M.D. Anderson Cancer Center,
Houston, Texas,
USA
e-mail: jefmorris@mdanderson.org

Veerabhadran Baladandayuthapani
Department of Biostatistics,
The University of Texas M.D. Anderson Cancer Center,
Houston, Texas,
USA


Abstract:

The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data whose analysis poses significant quantitative challenges. The field of bioinformatics has emerged to meet these challenges; it comprises many quantitative and biological scientists working together to process these data effectively and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, focusing on four areas: (1) experimental design and reproducibility, (2) preprocessing and feature extraction, (3) unified modelling and (4) structure learning and integration. In each of these areas, we highlight some key contributions and try to elucidate the statistical principles underlying these methods and approaches. Our goals are to demonstrate major ways in which statisticians have contributed to bioinformatics, to encourage statisticians to get involved early in methods development as new technologies emerge, and to stimulate future methodological work that builds on the statistical principles elucidated in this article and utilizes all available information to uncover new biological insights.

Keywords:

bioinformatics; epigenetics; experimental design; genomics; preprocessing; proteomics; regularization; reproducible research; statistical modelling.