Davis Family Library: 7:30am - 12am

I am excited to announce the recent integration of AI-driven processes in Special Collections’ digital archiving work alongside the expertise of our very human archivists, librarians, and student workers.

Making use of OpenAI’s Whisper library for audio transcription and leveraging the ChatGPT API to transcribe handwritten documents and generate summaries has allowed me to make major improvements to our digital collections hosted by the Internet Archive

At the same time, I’ve been making use of Microsoft’s Copilot to write a suite of Python scripts and other custom tools to process archival backlogs at a faster rate than ever.


Some examples of what I’ve been able to accomplish:
 

To put these projects in perspective, it takes a human transcriber an average of four hours to transcribe one hour of audio. If each of the WRMC recordings is an hour, that’s about 3,000 hours of labor. Our students work about 10 hours per week, so a project like this would take one student 12 years! In comparison, using the Whisper library to transcribe an hour of audio takes roughly ten minutes (along with the very real electricity and environmental costs).

These new, carefully labeled AI summaries appearing across our digital collections will make searching easier for humans and machines alike, amplifying historical voices from the past. I’m hopeful that these innovations will empower researchers across our campus and the globe to more easily discover Middlebury’s rare and unique Special Collections.

Patrick Wallace is the Digital Projects & Archives Librarian and oversees the digital side of Special Collections and the College Archives. Patrick’s life outside of work is mostly dedicated to film photography, video art, electronic music production, bicycle repair, performance driving, and cats.