Class 5

Advanced ETL and Textual Analytics 2

Wednesday, October 8, 2025

Class Overview

Why is this important?

Advanced ETL processes and textual analysis techniques are vital for accounting professionals who work with unstructured or semi-structured data, such as emails or transaction logs. The skills gained in this class prepare students to handle the complex data challenges commonly encountered in auditing and forensic accounting. Specifically, working with real-world datasets like the Enron Email Corpus strengthens students' ability to analyze large volumes of data, uncover patterns, and derive actionable insights, all of which are critical for risk assessment and fraud detection. These techniques also make students more efficient at managing large datasets and give them a competitive edge in data-driven roles.

What will we do?

In this class, students will apply advanced ETL and textual analysis methods to real-world email data from the Enron Email Corpus. The class introduces more sophisticated parsing tools, building on foundational ETL knowledge and using complex regular expressions to extract, clean, and restructure data from unstructured text. Through case studies and hands-on activities, students will learn to handle more intricate data challenges, especially those that arise in large-scale email datasets. The session will deepen their understanding of how to transform messy textual data into structured formats for further analysis.
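As a rough illustration of the extract, clean, and restructure steps described above, the following Python sketch parses one email-style message into a structured record. It is only a sketch under stated assumptions: the sample message is invented for illustration (it is not taken from the Enron Email Corpus), the function name parse_email and the record fields are hypothetical, and the address pattern is a deliberately simple one that will not cover every valid email address. The actual class activities may use different tools and patterns.

```python
import re
from email import message_from_string

# Invented sample in the general shape of a corpus message (NOT real Enron data).
RAW = """Message-ID: <12345.example@thyme>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: alice.smith@enron.com
To: bob.jones@enron.com, carol.lee@enron.com
Subject: Q2 forecast

Please send the Q2 numbers to dave.wu@enron.com by Friday.
Call me at 713-555-0142 if anything changes.
"""

# Simplified address pattern (an assumption; real-world addresses can be messier).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def parse_email(raw):
    """Turn one raw message string into a flat dictionary (hypothetical helper)."""
    # Extract: the standard-library parser splits headers from the body.
    msg = message_from_string(raw)
    body = msg.get_payload()
    return {
        "sender": msg["From"],
        # Restructure: the comma-separated To header becomes a list of addresses.
        "recipients": EMAIL_RE.findall(msg["To"] or ""),
        "subject": msg["Subject"],
        # Extract: addresses mentioned inside the message text itself.
        "body_addresses": EMAIL_RE.findall(body),
        # Clean: collapse line breaks and repeated spaces into single spaces.
        "body": re.sub(r"\s+", " ", body).strip(),
    }


record = parse_email(RAW)
for field, value in record.items():
    print(f"{field}: {value}")
```

Applying a function like this to every file in a corpus and collecting the resulting dictionaries yields rows that can be loaded into a table for further analysis. Using the standard-library email parser for headers and reserving regular expressions for the free-text body is one possible design choice, not the only one.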

How does this relate to other classes?

This class builds directly on the previous session, which introduced basic ETL processes and the use of regular expressions to clean raw data. While the last class focused on foundational techniques, this class moves to advanced applications, using real-world data to tackle harder parsing challenges. By working with the Enron Email Corpus, students will apply their knowledge of regular expressions to extract relevant information from a larger and messier dataset, further honing the data preparation and automation skills learned in the previous class.

Materials and Preparation

Class Materials

Required Deliverables

Deliverable | Due Date | Canvas Submission Portal
Professionalism (individual): Enron Case Deliverable | October 13th, 2025 | Upload to Canvas