Math 058B: Introduction to Biostatistics

Introduction to Biostatistics is a first course in statistics focused on topics and data found in the life sciences. No biological background is needed, but interest in the life sciences is important.

The Course

Art by @allison_horst

Introduction to Biostatistics is an introduction to statistical ideas using R. We will cover most of the statistical methods used in standard analyses (e.g., t-tests, chi-squared analysis, regression, confidence intervals, and binomial tests). The main inferential techniques will be covered using both theoretical approximations (e.g., the central limit theorem) and computational methods (e.g., permutation tests and bootstrapping). The focus will be on understanding the methods and interpreting results.

Student Learning Outcomes

By the end of the semester, students will be able to do the following:

  • Given a study, identify population, sample, parameter, statistic, observational unit, and variable.
  • Describe the differences between observational studies and experiments, the benefits of each, and the conclusions that can be drawn from each.
  • Given a dataset and research query, create an appropriate figure in R.
  • Given a dataset and research query, compute appropriate statistics in R.
  • Describe the difference between the distribution of a sample of data and a sampling distribution of a particular statistic.
  • For a particular research question, identify whether the task requires a descriptive analysis or model, a graphic, a confidence interval, or a hypothesis test.
  • Apply the empirical rule as an approximation to confidence intervals and hypothesis testing in settings of means and proportions.
  • Describe in words what a p-value is and what it is not.
  • Write down appropriate null and alternative hypotheses, and choose the correct analysis technique.
  • Run the hypothesis test / confidence interval analysis in R.
  • Identify when it is and when it is not appropriate to summarize the relationship between two variables using a least squares line. Describe the optimization procedure that leads to a least squares fit (although not necessarily carry out the calculations).
  • Identify the settings in which a causal claim is warranted and those in which a strong correlation may be due to a spurious relationship.
  • Use a regression line to make predictions, and distinguish between a prediction interval for an independent response and a confidence interval for the slope parameter.
  • For each descriptive analysis, visualization, confidence interval, or hypothesis test, communicate in words the conclusion of the analysis in the original context of the data.
  • Use R Markdown to run reproducible analyses that include all aspects of the data analysis.

Course website

Introduction to Biostatistics was last taught in Spring 2023; materials can be found on the course website.