DoSStoolkit

The DoSS Toolkit is a bunch of self-paced modules to help you learn and use R.

We all know that R is a critical part of applied statistics and data science these days, but it can have a steep learning curve and be intimidating to get started with.

The Department of Statistical Sciences (DoSS) toolkit is a free series of open source online modules written by undergraduates, that their fellow students and the public can use to learn the essentials of R.

How to use this resource

If you have never used R before

You use this resource by running R code! This may sound intimidating if you’ve never used R before, so we’ve made a video that walks through what you need to do.

Get started by going to R Studio Cloud - https://rstudio.cloud - and creating an account. When you’ve signed up, start a new project, and copy-paste the code below to install packages. (If you already have R and R Studio working on your local computer then you don’t have to use R Studio Cloud, you can install the packages on your local machine instead.)

install.packages('tidyverse')
install.packages('remotes')
install.packages('opendatatoronto')
remotes::install_github("rstudio-education/gradethis")

Then you can install the DoSStoolkit:

remotes::install_github("RohanAlexander/DoSStoolkit")

You’ll use the function run_tutorial to run each module. At the moment we have nine modules. So you can pick one to start with. For instance, if you wanted to run the ‘hello world’ module then run:

learnr::run_tutorial("hello_world", package = "DoSStoolkit")

If you already have R and R Studio on your computer

You can install DoSStoolkit from GitHub with:

# install.packages("devtools")
devtools::install_github("RohanAlexander/heapsofpapers")

# install.packages('tidyverse')
# install.packages('remotes')
# install.packages('opendatatoronto')
remotes::install_github("rstudio-education/gradethis")

Then you can install the DoSStoolkit:

remotes::install_github("RohanAlexander/DoSStoolkit")

You’ll use the function run_tutorial to run each module. At the moment we have ten modules. So you can pick one to start with. For instance, if you wanted to run the ‘hello world’ module then run:

learnr::run_tutorial("hello_world", package = "DoSStoolkit")

Content

We have ten modules. A complete collection is here:

learnr::run_tutorial("hello_world", package = "DoSStoolkit")
learnr::run_tutorial("holding_the_chaos_at_bay", package = "DoSStoolkit")
learnr::run_tutorial("operating_in_an_error_prone_world", package = "DoSStoolkit")
learnr::run_tutorial("hand_me_my_plyrs", package = "DoSStoolkit")
learnr::run_tutorial("totally_addicted_to_base", package = "DoSStoolkit")
learnr::run_tutorial("he_was_a_d8er_boi", package = "DoSStoolkit")
learnr::run_tutorial("to_ggplot_or_not_to_ggplot", package = "DoSStoolkit")
learnr::run_tutorial("r_marky_markdown", package = "DoSStoolkit")
learnr::run_tutorial("git_outta_here", package = "DoSStoolkit")
learnr::run_tutorial("indistinguishable_from_magic", package = "DoSStoolkit")

Hello world!

How to run this module:

learnr::run_tutorial("hello_world", package = "DoSStoolkit")

Module content:

  • Why I love R, by Liza Bolton.
  • Setting up RStudio, by Annie Collins.
  • Getting to know what is what - console, terminal, etc, by Annie Collins.
  • A fun hello world exercise, by Annie Collins.
  • Another fun hello world exercise, by Shirley Deng.
  • R Weekly newsletter, R Ladies, R meetups, by Annie Collins.

Operating in an error prone world

How to run this module:

learnr::run_tutorial("operating_in_an_error_prone_world", package = "DoSStoolkit")

Module content:

  • Why I love R, by Monica Alexander.
  • Getting help is normal, by Michael Chong.
  • Using Google and Stack Overflow, by Michael Chong.
  • Stack overflow, by Annie Collins.
  • How to problem solve when your code doesn’t work, by Michael Chong.
  • Making reproducible examples, by Marija Pejcinovska.
  • How to make the most of R’s cryptic error messages, by Shirley Deng.

Holding the chaos at bay

How to run this module:

learnr::run_tutorial("holding_the_chaos_at_bay", package = "DoSStoolkit")

Module content:

  • Why I love R, by Samantha-Jo Caetano.
  • R projects and setwd(), by Isaac Ehrlich.
  • Folder set-up, by Isaac Ehrlich.
  • Writing comments, by Isaac Ehrlich.
  • install.packages(), by Haoluan Chen.
  • install_github(), by Haoluan Chen.
  • library(), by Mariam Walaa.
  • update.packages(), by Mariam Walaa.
  • read_csv(), by Marija Pejcinovska.
  • read_table(), read_dta(), and other data types, by Isaac Ehrlich.

Hand me my plyrs

How to run this module:

learnr::run_tutorial("hand_me_my_plyrs", package = "DoSStoolkit")

Module content:

  • Why I love R, by Sabrina Sixta.
  • What is the tidyverse?, by Yena Joo.
  • The pipe, by Mariam Walaa.
  • select(), by Yena Joo.
  • filter(), by Shirley Deng.
  • group_by() and ungroup(), by Matthew Wankiewicz.
  • summarise(), by Mariam Walaa.
  • arrange(), by Isaac Ehrlich.
  • mutate(), by Haoluan Chen.
  • pivot_wider() and pivot_longer(), by Annie Collins.
  • rename(), by Mariam Walaa.
  • count() and uncount(), by Annie Collins.
  • slice(), by Annie Collins.
  • c(), matrix(), data.frame(), and tibble(), by Matthew Wankiewicz.
  • length(), nrow(), and ncol(), by Isaac Ehrlich.

Totally addicted to base

How to run this module:

learnr::run_tutorial("totally_addicted_to_base", package = "DoSStoolkit")

Module content:

He was a d8er boi

How to run this module:

learnr::run_tutorial("he_was_a_d8er_boi", package = "DoSStoolkit")

Module content:

  • head(), tail(), glimpse(), and summary(), written by Haoluan Chen.
  • paste(), paste0(), glue::glue() and stringr, written by Marija Pejcinovska
  • names(), rbind() and cbind(), written by Isaac Ehrlich.
  • left_join(), anti_join(), full_join(), etc, written by Haoluan Chen.
  • Looking for missing data, written by Mariam Walaa.
  • set.seed(), runif(), rnorm(), and sample(), written by Haoluan Chen.
  • Simulating datasets for regression, written by Mariam Walaa.
  • Advanced mutating and summarising, written by Mariam Walaa.
  • Tidying up datasets, written by Mariam Walaa.
  • pull(), pluck(), and unnest(), by Isaac Ehrlich.
  • forcats and factors, written by Matthew Wankiewicz.
  • More on strings, written by Annie Collins.
  • Regular expressions, written by Shirley Deng.
  • Working with dates, written by Mariam Walaa.
  • janitor, written by Mariam Walaa.
  • tidyr, written by Mariam Walaa.

To ggplot or not to ggplot

How to run this module:

learnr::run_tutorial("to_ggplot_or_not_to_ggplot", package = "DoSStoolkit")

Module content:

  • ggplot2::ggplot(), by Shirley Deng.
  • Bar charts, by Matthew Wankiewicz.
  • Histograms, by Haoluan Chen.
  • Scatter plots, by Haoluan Chen.
  • Static maps with ggmap, by Annie Collins.
  • Various useful options, by Yena Joo.
  • Saving graphs, by Yena Joo.

R Marky Markdown and the Funky Docs

How to run this module:

learnr::run_tutorial("r_marky_markdown", package = "DoSStoolkit")

Module content:

  • Introduction to R Markdown, written by Shirley Deng.
  • Top Matter: Title, Date, Author, Abstract, written by Yena Joo.
  • Tables: kable, kableextra, gt, written by Yena Joo.
  • Multiple plots with patchwork, written by Michael Chong
  • References and Bibtex, written by Yena Joo.
  • PDF outputs, written by Yena Joo.
  • here::here() and filepaths, written by Matthew Wankiewicz.

Git outta here

How to run this module:

learnr::run_tutorial("git_outta_here", package = "DoSStoolkit")

Module content:

  • What is version control and GitHub?, written by Mariam Walaa.
  • Git: pull, status, add, commit, push, written by Mariam Walaa.
  • Branches in GitHub, written by Matthew Wankiewicz.
  • Dealing with Conflicts, written by Matthew Wankiewicz.
  • Putting (G)it All together in RStudio, written by Matthew Wankiewicz.

Indistinguishable from magic

How to run this module:

learnr::run_tutorial("indistinguishable_from_magic", package = "DoSStoolkit")

Module content:

  • Coding style, written by Marija Pejcinovska.
  • Writing R Packages, written by Matthew Wankiewicz.
  • Getting started with Blogdown, written by Annie Collins.
  • Getting started with Shiny, written by Matthew Wankiewicz.

Contributors

  • Annie Collins is an undergraduate student in the Department of Mathematics specializing in applied mathematics and statistics with a minor in history and philosophy of science. In her free time, she focuses her efforts on student governance, promoting women’s representation in STEM, and working with data in the non-profit and charitable sector.
  • Haoluan Chen is an undergraduate student in the Department of Statistical Science specializing in applied statistics. He is interested in applying data science techniques, especially in NLP, to gain insight from the data.
  • Isaac Ehrlich is an undergraduate student in Statistics and Cognitive Science at the University of Toronto. He enjoys using R for everything from analysing trends in his recent movie-viewing history, to his past research building models on human categorization.
  • Mariam Walaa is an undergraduate student in the Department of Computer and Mathematical Sciences at University of Toronto Scarborough, majoring in Mathematics and minoring in GIS and statistics. Mariam enjoys learning about how to work with data such as spatial and text data to extract insights.
  • Marija Pejcinovska is a graduate student in the Department of Statistical Sciences. Her research is motivated by modelling challenges that arise with “complicated” data (sparse/highly biased/poor quality data), usually in the context of social or health inequalities.
  • Matthew Wankiewicz is an undergraduate student at the University of Toronto, majoring in Statistics and minoring in Mathematics and the History and Philosophy of Science.
  • Michael Chong is a graduate student in the Department of Statistical Sciences. His research aims to build statistical models for demographic estimation in contexts where high quality data are unavailable. There is almost always an active R session on his computer!
  • Paul Hodgetts is a Master of Information candidate concentrating in Human-Centred Data Science in the Faculty of Information, University of Toronto. He sincerely believes that Calvin and Hobbes is the greatest comic ever produced.
  • Rohan Alexander is an assistant professor in the Faculty of Information and the Department of Statistical Sciences. Some people convert to catholicism upon marriage, Rohan converted to R. His greatest professional achievement is probably getting a pull request accepted into R for Data Science (it was just fixing a minor typo but still).
  • Samantha-Jo Caetano is an assistant professor (teaching stream) in the Department of Statistical Sciences. She loves statistics, socializing, her family, and her dogs, not necessarily in that order.
  • Shirley Deng is an undergraduate student specializing in Statistics and majoring in Mathematics. Meticulous and soft-hearted, she often finds herself engrossed in new pastimes by the second at the influence of her peers. One of which that has remained a longtime constant - spending an excessive amount of time helping people debug their R code.
  • Yena Joo is an undergraduate student majoring in Economics and double minoring in Statistics and Computer Science.

Pedagogical underpinnings

Coming soon.

Acknowledgements

We gratefully acknowledge the support of Professor Bethany White, Chair Radu Craiu, and the U of T Faculty of Arts & Sciences Pedagogical Innovation and Experimentation Fund.

We’d like to acknowledge the help of:

  • Liza Bolton
  • Monica Alexander
  • Sabrina Sixta

We’d like to thank Alex Cookson for his collection of datasets.

This toolkit builds on, and complements, the work of many others, including:

Rohan would like to thank Greg Wilson, for sharing his experience, thoughts, and leadership.

References

We draw on the open-source statistical programming language R and a variety of packages. We are grateful for the work that we build on.

Next steps

  • Assessment
  • Teaching
  • French language version

Citation

We have a pre-print coming soon.

How to contribute

The best way to contribute fixes and minor typos is to make a pull request on GitHub.

If you are interested in contributing lessons or modules, then please contact Rohan Alexander. We are particularly interested in partnering with an institution where the language of instruction is French to develop a French language version.

Contact

Please contact Rohan () with any questions, comments, and suggestions.