Data Science Across Disciplines to be taught in January 2024
This J-term, midd.data will offer its fourth iteration of Data Science Across Disciplines. The course will be team-taught by Professors Alex Lyford (Mathematics and Statistics), Bert Johnson (Political Science), Pete Nelson (Geography), Katherine O’Brien (Center for Community Engagement), and two Middlebury College alumni: Conor Stinson (’06.5) and Michael Czekanski (’20). The course is expected to enroll more than 70 students.
Data Science Across Disciplines blends traditional instruction in data science with project-based learning. In the morning sessions, students will attend lectures where they learn how to wrangle, visualize, and analyze large, messy datasets using R, an open-source statistical programming language. During afternoon sessions, students will work in small groups with one of five instructors on research projects that use their newly acquired data skills. This iteration, afternoon projects include research on voting patterns, pandemic migration, restorative justice programs, local governance in Vermont, and newborn health outcomes. Students will present their findings during the last week of classes.
This class is open to any Middlebury College student who has not previously taken Introduction to Data Science (STAT 118 or 201). We particularly encourage students who are data-curious but have limited or no prior coding experience to enroll.
To find out more, look for the following courses in the Winter Term Course Catalog:
GEOG 1230: In this section, students will use data science tools to explore the ways migration systems in the United States changed during the COVID-19 pandemic. We will draw on data collected from mobile phones that record each phone’s monthly place of residence at the census tract level. The dataset includes monthly observations from January 2019 through December 2021, allowing the analysis to compare pre-pandemic migration systems with those during the pandemic.
INTD 1230A: Data is a powerful tool for improving health outcomes by making programmatic choices to support justice. In this afternoon section of Data Science Across Disciplines, students will work with Addison County Restorative Justice (ACRJ) to understand patterns in the occurrence of driving under the influence. ACRJ has over 1,000 cases and would like to better understand its data and come up with ways to access information. We will explore how identity, geography, and support impact outcomes from DUI cases. Using statistical analysis and data visualizations, along with learning about ethical data practices, we will report our findings.
INTD 1230B: Let’s dive into the minutes and reports of local towns to develop an accessible news and history resource. Could this be a tool for small newspapers to track local news more easily? Can we map this fresh data for a new look across geographies? Do you want to help volunteer town officials make decisions and better wrangle with their town’s history and data?
In this course, we will develop a focused database of documents produced by several municipal boards and commissions. We will engage in conversation with local officials, researchers, and journalists. This course aims to introduce students to creating data from real-world documents, and to the people who produce them, in order to generate useful information from records that are often open but frequently difficult to sift through.
MATH/STAT 1230: Students will explore pediatric healthcare data to better understand the risks correlated with various childhood illnesses, with an emphasis on the intuition behind statistical and machine learning techniques. We will practice making informed decisions from noisy data and work through the steps that lead from messy data to a final report. Students will become proficient in R and gain an understanding of various statistical techniques.
PSCI 1230: How do candidates for U.S. national office raise money? From whom do they raise it? In this section we will explore these questions using Federal Election Commission data on individual campaign contributions to federal candidates. Our analysis using R will help us identify geographic patterns in the data, as well as variations in funds raised across types of candidates. We will discuss what implications these patterns may have for the health and functioning of democracy in the U.S.