Virtual Middlebury

Closed to the Public

Text mining is the process of transforming unstructured texts of all kinds (literary, scholarly, journalistic, scientific, etc.) into a form in which their language can be analyzed. Applying tidy data principles makes these tasks easier, more efficient, and more interoperable with other tools, and R has packages that support this workflow within the R environment.

In this lesson, participants will learn:

* Some basic text mining/analysis concepts
* How to transform texts (e.g. a novel) into a structured dataset ready to use in R
* How to use tidy data packages (such as dplyr and tidyr) to manipulate text data
* How to perform basic sentiment analysis and word count tasks in R

Participants should have basic familiarity with R. If you are completely new to R, please be sure to attend the Introduction to R workshop on June 14, 2022. Where possible, attendees will also benefit from familiarity with the material covered in our Data wrangling in R with dplyr and tidyr and Creating high quality graphics in R with ggplot2 workshops.

Please click here to learn more and to register for this workshop.

Sponsored by:
College Libraries

Contact Organizer

Kemp, Jonathan
jkemp@middlebury.edu
(802) 443-2265