In this class, students will continue to work with advanced ETL and textual analysis methods, applying them to real-world email data from the Enron Email Corpus. Building on foundational ETL knowledge, the class introduces more sophisticated parsing tools along with complex regular expressions for extracting, cleaning, and restructuring data from unstructured text. Through case studies and hands-on activities, students will learn to handle more intricate data challenges, especially those that arise in large-scale email datasets. The session will deepen their understanding of how to transform messy textual data into structured formats for further analysis.
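The regular-expression extraction described above can be sketched in Python with the standard `re` module. This is a minimal illustration, not the Alteryx workflow used in class. The sample message is invented, written in the raw header/body layout of Enron corpus files, and all addresses are hypothetical. The function name `parse_email` and the simplified address pattern are also assumptions for illustration. The sketch splits headers from the body at the first blank line. It then unfolds wrapped header lines, captures each "Name: value" pair, and pulls individual recipient addresses out of the To field.

```python
import re

# Invented sample in the raw header/body layout of Enron corpus messages;
# all addresses and content are hypothetical. The To field is folded onto
# an indented continuation line, as long headers often are.
RAW_EMAIL = """Message-ID: <12345.JavaMail.example@thyme>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: jane.doe@enron.com
To: john.smith@enron.com,
\tmary.jones@enron.com
Subject: Q3 forecast
Mime-Version: 1.0

Here is our forecast for Q3.
"""

# One "Name: value" pair per header line, anchored at the start of a line.
HEADER_LINE = re.compile(r"^([\w-]+):[ \t]*(.*)$", re.MULTILINE)
# A simplified address pattern (not a full RFC 5322 implementation).
ADDRESS = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_email(raw: str) -> dict:
    """Split a raw message into selected header fields and a body."""
    # Headers end at the first blank line; everything after it is the body.
    parts = re.split(r"\n[ \t]*\n", raw, maxsplit=1)
    header_text = parts[0]
    body = parts[1].strip() if len(parts) > 1 else ""
    # Unfold continuation lines (a newline followed by spaces or tabs).
    header_text = re.sub(r"\n[ \t]+", " ", header_text)
    headers = {name: value.strip() for name, value in HEADER_LINE.findall(header_text)}
    return {
        "message_id": headers.get("Message-ID", ""),
        "date": headers.get("Date", ""),
        "from": headers.get("From", ""),
        "to": ADDRESS.findall(headers.get("To", "")),  # one entry per recipient
        "subject": headers.get("Subject", ""),
        "body": body,
    }


record = parse_email(RAW_EMAIL)
print(record["from"])  # jane.doe@enron.com
print(record["to"])    # ['john.smith@enron.com', 'mary.jones@enron.com']
```

Each message becomes one structured record, so a folder of messages can be turned into a table with one row per email. Real corpus files contain additional X- headers and messier bodies, which this sketch does not attempt to handle.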
Case: Enron Email Case Study
Slides: Will be available for download by the beginning of class in either PowerPoint or PDF format.
Data: A data update may be required for this class. To make sure your files are up to date, navigate to the ACCTG522_Labs folder and run the command git pull.
Analytics Tools: This class uses Alteryx's advanced parsing tools.
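Alteryx's RegEx tool offers four output methods: Replace, Tokenize, Parse, and Match. The sketch below maps each one to a rough equivalent in Python's `re` module, which may help when reasoning about what a pattern will do. It is an analogy under stated assumptions, not how Alteryx implements these methods. The sample header line and the patterns are hypothetical.

```python
import re

# Hypothetical header line and address pattern with two capture groups
# (the user part and the domain).
LINE = "To: john.smith@enron.com, mary.jones@enron.com"
ADDRESS = r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)"

# Replace: substitute every match; here the user part of each address is masked.
replaced = re.sub(ADDRESS, r"***@\2", LINE)

# Tokenize: emit each match as a separate value (split into columns or rows in Alteryx).
tokens = [m.group(0) for m in re.finditer(ADDRESS, LINE)]

# Parse: break the first match into its capture groups (one new column per group in Alteryx).
first = re.search(ADDRESS, LINE)
parsed = first.groups() if first else ()

# Match: flag whether the field fits the pattern (Alteryx reports 1/0; here a bool).
# A whole-field match is assumed for this illustration.
matches = re.fullmatch(r"To: [\w.+-]+@enron\.com(?:, [\w.+-]+@enron\.com)*", LINE) is not None

print(replaced)  # To: ***@enron.com, ***@enron.com
print(tokens)    # ['john.smith@enron.com', 'mary.jones@enron.com']
print(parsed)    # ('john.smith', 'enron.com')
print(matches)   # True
```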