World Library  
Flag as Inappropriate
Email this Article


Article Id: WHEBN0000003878
Reproduction Date:

Title: Biostatistics  
Author: World Heritage Encyclopedia
Language: English
Subject: Public health, Tamara Awerbuch-Friedlander, Ronald Fisher, Statistical hypothesis testing, VCU School of Medicine
Collection: Bioinformatics, Biostatistics, Demography, Public Health
Publisher: World Heritage Encyclopedia


Biostatistics (or biometry) is the application of statistics to a wide range of topics in biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine, pharmacy, agriculture and fishery; the collection, summarization, and analysis of data from those experiments; and the interpretation of, and inference from, the results. A major branch of this is medical biostatistics,[1] which is exclusively concerned with medicine and health.


  • History 1
  • Scope and training programs 2
  • Recent developments in modern biostatistics 3
  • Applications of biostatistics 4
  • See also 5
  • References 6
  • External links 7


Biostatistical reasoning and modeling were of critical importance to the foundation theories of modern biology. In the early 1900s, after the rediscovery of Gregor Mendel's Mendelian inheritance work, the gaps in understanding between genetics and evolutionary Darwinism led to vigorous debate among biometricians, such as Walter Weldon and Karl Pearson, and Mendelians, such as Charles Davenport, William Bateson and Wilhelm Johannsen. By the 1930s, statisticians and models built on statistical reasoning had helped to resolve these differences and to produce the neo-Darwinian modern evolutionary synthesis.

The leading figures in the establishment of population genetics and this synthesis all relied on statistics and developed its use in biology.

These individuals and the work of other biostatisticians, mathematical biologists, and statistically inclined geneticists helped bring together evolutionary biology and genetics into a consistent, coherent whole that could begin to be quantitatively modeled.

In parallel to this overall development, the pioneering work of D'Arcy Thompson in On Growth and Form also helped to add quantitative discipline to biological study.

Despite the fundamental importance and frequent necessity of statistical reasoning, there may nonetheless have been a tendency among biologists to distrust or deprecate results which are not Friden calculator from his department at Caltech, saying "Well, I am like a guy who is prospecting for gold along the banks of the Sacramento River in 1849. With a little intelligence, I can reach down and pick up big nuggets of gold. And as long as I can do that, I'm not going to let any people in my department waste scarce resources in placer mining."[2]

Scope and training programs

Almost all educational programmes in biostatistics are at postgraduate level. They are most often found in schools of public health, affiliated with schools of medicine, forestry, or agriculture, or as a focus of application in departments of statistics.

In the United States, where several universities have dedicated biostatistics departments, many other top-tier universities integrate biostatistics faculty into statistics or other departments, such as epidemiology. Thus, departments carrying the name "biostatistics" may exist under quite different structures. For instance, relatively new biostatistics departments have been founded with a focus on bioinformatics and computational biology, whereas older departments, typically affiliated with schools of public health, will have more traditional lines of research involving epidemiological studies and clinical trials as well as bioinformatics. In larger universities where both a statistics and a biostatistics department exist, the degree of integration between the two departments may range from the bare minimum to very close collaboration. In general, the difference between a statistics program and a biostatistics program is twofold: (i) statistics departments will often host theoretical/methodological research which are less common in biostatistics programs and (ii) statistics departments have lines of research that may include biomedical applications but also other areas such as industry (quality control), business and economics and biological areas other than medicine.

Recent developments in modern biostatistics

The advent of modern computer technology and relatively cheap computing resources have enabled computer-intensive biostatistical methods like bootstrapping and resampling methods. Furthermore, new biomedical technologies like microarrays, next generation sequencers (for genomics) and mass spectrometry (for proteomics) generate enormous amounts of (redundant) data that can only be analyzed with biostatistical methods. For example, a microarray can measure all the genes of the human genome simultaneously, but only a fraction of them will be differentially expressed in diseased vs. non-diseased states. One might encounter the problem of multicolinearity: Due to high intercorrelation between the predictors (in this case say genes), the information of one predictor might be contained in another one. It could be that only 5% of the predictors are responsible for 90% of the variability of the response. In such a case, one would apply the biostatistical technique of dimension reduction (for example via principal component analysis). Classical statistical techniques like linear or logistic regression and linear discriminant analysis do not work well for high dimensional data (i.e. when the number of observations n is smaller than the number of features or predictors p: n < p). As a matter of fact, one can get quite high R2-values despite very low predictive power of the statistical model. These classical statistical techniques (esp. least squares linear regression) were developed for low dimensional data (i.e. where the number of observations n is much larger than the number of predictors p: n >> p). In cases of high dimensionality, one should always consider an independent validation test set and the corresponding residual sum of squares (RSS) and R2 of the validation test set, not those of the training set.

In recent times, random forests have gained popularity. This technique, invented by the statistician Leo Breiman, generates a lot of decision trees randomly and uses them for classification (In classification the response is on a nominal or ordinal scale, as opposed to regression where the response is on a ratio scale). Decision trees have of course the advantage that you can draw them and interpret them (even with a very basic understanding of mathematics and statistics). Random Forrests have thus been used for clinical decision support systems.

Gene Set Enrichment Analysis (GSEA) is a new method for analyzing biological high throughput experiments. With this method, one does not consider the perturbation of single genes but of whole (functionally related) gene sets. These gene sets might be known biochemical pathways or otherwise functionally related genes. The advantage of this approach is that it is more robust: It is more likely that a single gene is found to be falsely perturbed than it is that a whole pathway is falsely perturbed. Furthermore, one can integrate the accumulated knowledge about biochemical pathways (like the JAK-STAT signaling pathway) using this approach.

Applications of biostatistics

See also


  1. ^
  2. ^
  3. ^
  4. ^
  5. ^
  6. ^
  7. ^

External links

  • The International Biometric Society
  • The Collection of Biostatistics Research Archive
  • Guide to Biostatistics (
  • Biomedical Statistics
This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply. World Heritage Encyclopedia content is assembled from numerous content providers, Open Access Publishing, and in compliance with The Fair Access to Science and Technology Research Act (FASTR), Wikimedia Foundation, Inc., Public Library of Science, The Encyclopedia of Life, Open Book Publishers (OBP), PubMed, U.S. National Library of Medicine, National Center for Biotechnology Information, U.S. National Library of Medicine, National Institutes of Health (NIH), U.S. Department of Health & Human Services, and, which sources content from all federal, state, local, tribal, and territorial government publication portals (.gov, .mil, .edu). Funding for and content contributors is made possible from the U.S. Congress, E-Government Act of 2002.
Crowd sourced content that is contributed to World Heritage Encyclopedia is peer reviewed and edited by our editorial staff to ensure quality scholarly research articles.
By using this site, you agree to the Terms of Use and Privacy Policy. World Heritage Encyclopedia™ is a registered trademark of the World Public Library Association, a non-profit organization.

Copyright © World Library Foundation. All rights reserved. eBooks from World eBook Library are sponsored by the World Library Foundation,
a 501c(4) Member's Support Non-Profit Organization, and is NOT affiliated with any governmental agency or department.