SDS 261: Data Science, the SQL

Data Science, the SQL is a second course in data science focused on using SQL in the data science pipeline.

The Course

Art by @allison_horst

Data Science, the SQL is a continuation of ideas learned in Foundations of Data Science. The course develops abilities for using SQL databases within the data science pipeline. The core of the course focuses on why and how to write SELECT queries in SQL. Additional topics include subqueries, indexes, keys, and regular expressions. Students will learn how to run SQL queries both from the RStudio IDE and from a relational database management system client like DBeaver or DuckDB.

Student Learning Outcomes

By the end of the term, students will:

  • Database Concepts: be able to explain basic database concepts such as tables, records, fields, and relationships.

  • Introduction to SQL: gain a fundamental understanding of Structured Query Language (SQL), including its history, purpose, and key components.

  • SQL Querying:

    • Writing SQL Queries: learn how to write basic SQL queries to retrieve data from a single table.
    • Filtering and Sorting Data: be able to use SQL to filter and sort data based on specific criteria.
    • Joining Tables: understand how to perform inner and outer joins to combine data from multiple tables.
  • Creating Tables: be able to use DuckDB to create a SQL database with multiple tables that link to one another.

  • Inserting and Updating Data: be able to use SQL to insert new records into a table, update existing records, and delete records from a table.

  • Basics of Regular Expressions: understand the fundamental concepts of regular expressions, and identify and use basic metacharacters to write simple regular expressions for text search and matching.
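
As a rough illustration of the outcomes above, here is a minimal sketch that creates two linked tables, inserts, updates, and deletes records, runs an inner and an outer join, and applies a simple regular expression. It is not taken from the course materials. It assumes Python's built-in sqlite3 module as a stand-in for DuckDB, and the table names, column names, and sample rows are all hypothetical.

```python
import re
import sqlite3

# In-memory SQLite database (an assumption; the course itself uses DuckDB).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two hypothetical tables linked by a primary key / foreign key pair.
conn.executescript("""
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE enrollments (
    student_id INTEGER REFERENCES students(student_id),
    course     TEXT NOT NULL
);
""")

# Insert new records (made-up sample data).
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Edgar")],
)
conn.executemany(
    "INSERT INTO enrollments VALUES (?, ?)",
    [(1, "SDS 261"), (2, "SDS 261"), (2, "SDS 192")],
)

# Update an existing record, then delete one.
conn.execute("UPDATE students SET name = 'Grace H.' WHERE student_id = 2")
conn.execute(
    "DELETE FROM enrollments WHERE student_id = 2 AND course = 'SDS 192'"
)

# Inner join with filtering and sorting: only students who have an enrollment.
inner_rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    INNER JOIN enrollments AS e ON s.student_id = e.student_id
    WHERE e.course = 'SDS 261'
    ORDER BY s.name
""").fetchall()

# Left outer join: every student, with NULL (None) where no enrollment exists.
outer_rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    LEFT JOIN enrollments AS e ON s.student_id = e.student_id
    ORDER BY s.student_id
""").fetchall()

# Basic regular expression: ^ anchors the start, [AE] is a character class.
names = [row[0] for row in conn.execute(
    "SELECT name FROM students ORDER BY student_id"
)]
vowel_names = [n for n in names if re.search(r"^[AE]", n)]

print(inner_rows)
print(outer_rows)
print(vowel_names)
```

The left join keeps the student with no enrollment, while the inner join drops that row. The same SQL statements should carry over to DuckDB with little or no change, though that is an expectation rather than something verified here.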

Course website

Data Science, the SQL was last taught in January 2024 at Smith College. Materials can be found on the course website.