Keeping busy with data science

Author

Jo Hardin

Published

May 18, 2020

Teach Data Science

The following entry originally appeared on May 18, 2020 at https://teachdatascience.com/keepbusy/, a blog written by Nick Horton (twitter), Hunter Glanz (twitter), and Jo Hardin (Bluesky).



It has become increasingly clear that many college students have found themselves without summer plans. Unfortunately, this blog entry is not a list of possible employment opportunities. Instead, it is a compilation of statistics and data science projects to enhance a summer spent socially distant.

The list below covers opportunities at a variety of levels. Whether you are just beginning or quite advanced, there are many ideas for you.

1. GitHub

While not necessarily the first task that you should undertake this summer, the first recommendation is to set up a GitHub account and use it to post anything you do. Each project should be a separate repository, and you should make sure to always have a README file so that others (and you six months from now) can easily see what you’ve done.

If you are at all serious about doing data science at any point down the road, now is the time to start collecting your data projects into a single place so that your work can be highlighted.

Are you more advanced?

2. Starting with R

Particularly if you are new to R, an amazing book to work through is called “R for Data Science” by Grolemund & Wickham (https://r4ds.hadley.nz/). There are many problems you can try out, and the text provides a wealth of ideas for working through data analysis problems. Even if you have been using R for many years, my guess is that the text contains many opportunities to learn how to work with new data structures.

  • For a good start to R in general, check out https://education.rstudio.com/learn/

  • If you are an advanced R user, you might want to try the more advanced companion to the Grolemund & Wickham text (“Advanced R” by Wickham, https://adv-r.hadley.nz/). It includes quite a bit on programming and on why R works the way it does.

3. Modeling Data

Interested in modeling?

4. Natural Language Processing

Interested in text analysis / natural language processing?

5. Practice doing data science

One of the most fun things you can do is to practice doing data science. Below are ideas you could work on for one afternoon or commit a few weeks to figuring out. Choose projects that seem fun and that you might be able to approach creatively.

  • Tidy Tuesday: every Tuesday a new dataset is posted, and individuals (separately and collaboratively) work to visualize the dataset. Details at https://github.com/rfordatascience/tidytuesday

  • Kaggle (https://www.kaggle.com/): an online community of data scientists who build models, working together to come up with optimal predictions. You can compete in an ongoing Kaggle competition, or you can work through an old competition where many teams have shared their work and their ideas.

  • Work through a COVID-19 analysis. It is worth noting that the currently available case data are likely to be under-reported (both cases and deaths, across most countries), which makes modeling the actual data somewhat problematic. Instead, you might try to model COVID-19 related data (e.g., flights in the US, unemployment, emissions, weather patterns, etc.).

6. Interactive graphics

Register for a (free) shiny account, and create a shiny dashboard to highlight the work you are doing!
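If you have never built a shiny app, the sketch below shows roughly what the smallest useful one looks like. It is only an illustration, not a full dashboard: it assumes the shiny package is installed, uses R's built-in `faithful` dataset as a stand-in for your own work, and the input and output names (`bins`, `waiting_hist`) are made up for this example.

```r
library(shiny)

# User interface: one slider and one plot.
ui <- fluidPage(
  titlePanel("Old Faithful waiting times"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20)
    ),
    mainPanel(plotOutput("waiting_hist"))
  )
)

# Server logic: redraw the histogram whenever the slider changes.
server <- function(input, output) {
  output$waiting_hist <- renderPlot({
    # hist() treats a single number for `breaks` as a suggestion,
    # so the number of bars may differ slightly from the slider value.
    hist(faithful$waiting,
         breaks = input$bins,
         main = "Minutes between eruptions",
         xlab = "Waiting time (minutes)")
  })
}

# Build the app object; launch it only in an interactive session.
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```

Swapping `faithful` for data from one of your own projects is a reasonable first step toward a dashboard that highlights your work.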

7. Art and R

Interested in art? Make art with data and R!

8. Watch videos and take classes

Learn some new stuff from videos & webinars!

9. Participate in the data science community

Engage on Posit Community (https://forum.posit.co/) or https://stackoverflow.com/ – platforms for asking and answering all the questions. StackOverflow is more comprehensive, but it can be aggressive and unhelpful at times. Posit Community is a great place for beginners and way less intimidating than SO.

10. Write an R package!

You’ll be surprised to learn that creating your own R package can be reasonably straightforward. Fantastic step-by-step instructions are available to help you put the package together.
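To give a feel for how little is needed to get started, here is a minimal sketch that uses only base R. It is not necessarily the workflow the step-by-step instructions follow. The function `hello()` and the package name `hellopkg` are hypothetical, and the skeleton is written to a temporary directory so nothing in your working directory is touched.

```r
# A toy function to ship in the package (hypothetical example).
hello <- function(name = "world") {
  paste0("Hello, ", name, "!")
}

# Write the skeleton into a fresh temporary directory.
pkg_parent <- tempfile("pkgdemo")
dir.create(pkg_parent)

# package.skeleton() looks up the listed objects in the global environment
# by default and writes one source file per object under R/.
utils::package.skeleton(name = "hellopkg", list = "hello", path = pkg_parent)

# Inspect what was generated.
pkg_root <- file.path(pkg_parent, "hellopkg")
list.files(pkg_root, recursive = TRUE)
```

The generated DESCRIPTION and help files are templates that still need to be filled in by hand before the package can be built and checked. Many people use the usethis and devtools packages for this workflow instead; `usethis::create_package()` is the usual starting point there.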