Virtual Middlebury

Closed to the Public

Text mining is the process of transforming unstructured texts of all kinds (literary, scholarly, journalistic, scientific, etc.) into a form where the language of the documents can be analyzed. Using tidy data principles can make these tasks easier, more efficient, and more interoperable with other tools. Fortunately, R has packages that support this whole workflow inside the R environment.

In this lesson, participants will learn:
* Some basic text mining/analysis concepts
* How to transform texts (e.g. a novel) into a structured dataset ready to use in R
* How to use tidy data packages (such as dplyr and tidyr) to manipulate text data
* How to perform basic sentiment analysis and word count tasks in R

Participants should have basic familiarity with R. If you are completely new to R, please be sure to attend the Introduction to R workshop on June 22, 2021. If possible, attendees would also benefit from being familiar with the material covered in our Data wrangling in R with dplyr and tidyr workshop on June 29, 2021, and our Creating high quality graphics in R with ggplot2 workshop on July 6, 2021.

Please register for this free workshop at http://go.middlebury.edu/summerdataworkshops/

Sponsored by:
College Libraries

Contact Organizer

Kemp, Jonathan
jkemp@middlebury.edu
(802) 443-2265