Statistics, the science of learning from data and of measuring, controlling and communicating uncertainty, has become important to science and is vital to its future, Science 2.0.

Over the last 200 years, and certainly with the advent of large-scale computing in the last 30 years, statistics has been an essential part of the social, natural, biomedical and physical sciences, as well as engineering and business analytics.

Statistics helps quantify the reliability, reproducibility and general uncertainty associated with discoveries, because one can easily be fooled by complicated biases and patterns arising by chance.

Image: Freeman Lab

Statistics has matured around making discoveries from data, so statistical thinking will be integral to Big Data challenges like Science 2.0. It therefore seems odd that statisticians are infrequently included in Big Data teams. There can be no Internet of Things without addressing those challenges.

A new paper highlights the role of statistics in facing the future.

Biological Sciences/Bioinformatics – Biology has changed from a data-poor discipline to a data-intensive one. Today, biologists regularly sift through large data sets. Furthermore, many outcomes of interest, such as gene expression, are dynamic quantities. The complexity is further exacerbated by cutting-edge, yet unpolished, technologies that produce noisier measurements than anticipated. This complexity and level of variability make statistical thinking an indispensable part of the analysis. Biologists are now seeking statisticians as collaborators, and these collaborations have led to, among other things, the development of breast cancer recurrence gene expression assays that identify patients at risk of distant recurrence following surgery.

Health Care and Public Health – Personalized predictions of disease risk, as well as of time to onset, progression or disease-related adverse events, have the potential to revolutionize clinical practice. Such predictions are important in the context of mental health, cancer, autoimmune disorders, diabetes, inflammatory bowel diseases, stroke and organ transplants, among others. Statisticians are using huge amounts of medical data to make personalized predictions of disease risk, understand the benefits and harms of drugs and other treatments as well as environmental factors, analyze the quality of care and understand changing health trends.

Society – Statistics and data mining for crime analysis and predictive policing have had a major impact on the way police patrol in major cities and respond to domestic violence cases, as well as on sentencing and crime policy. Statistics was used to demonstrate improved efficiency and better performance for the electric utility grid. Statisticians have also made important contributions to the challenges of measuring traffic and maintaining civic infrastructure. Finally, statisticians have made great progress toward public use of government microdata, with synthetic data techniques that protect respondents’ privacy.

Social Sciences and Humanities – Statistics was central to new United Nations probabilistic projections of population and fertility, obtained by combining demographic models with Bayesian methods. These projections, released by the UN, influence national policies worldwide. The large-scale field experiments that revolutionized political campaigns are another example in which statistical methods, here used to adjust for noncompliance, were critical. Finally, statistics was central to research deducing the behavior of individual group members, work that has subsequently been used in litigation by both sides in every state over the Voting Rights Act.

“In the work of OSTP, NSF and other federal agencies to address STEM workforce issues, we strongly encourage attention to attracting and retaining the next generation of statisticians, especially those who can work seamlessly across disciplines. The statisticians engaged in interdisciplinary research involving Big Data will need to be computationally savvy, possessing expertise in statistical principles and an understanding of algorithmic complexity, computational cost, basic computer architecture and the basics of both software engineering principles and handling/management of large-scale data,” the paper concludes.

Link: Discovery with Data: Leveraging Statistics with Computer Science to Transform Science and Society, by a Working Group of the American Statistical Association. Source: American Statistical Association.