Class Overview
Why is this important?
Mastering advanced ETL processes and textual analysis techniques is vital for accounting professionals who work with unstructured or semi-structured data, such as emails or transaction logs. The skills gained in this class prepare students to handle complex data challenges commonly encountered in auditing and forensic accounting. Specifically, working with real-world datasets like the Enron Email Corpus strengthens students' ability to analyze large volumes of data, uncover patterns, and derive actionable insights, all of which are critical for risk assessment and fraud detection. These techniques also make students more efficient at managing large datasets and give them a competitive edge in data-driven roles.
What will we do?
In this class, students will use advanced ETL and textual analysis methods to work with real-world email data from the Enron Email Corpus. The class introduces more sophisticated parsing tools, building on foundational ETL knowledge and using complex regular expressions to extract, clean, and restructure data from unstructured text. Through case studies and hands-on activities, students will learn how to handle more intricate data challenges, especially those that arise with large-scale email datasets. The session will deepen their understanding of how to transform messy textual data into structured formats for further analysis.
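As a rough illustration of what extracting, cleaning, and restructuring an email with regular expressions can look like, here is a minimal Python sketch. It is not the Alteryx workflow used in class. The sample message is made up (it is not taken from the corpus), and the names `parse_email`, `HEADER_RE`, and `ADDRESS_RE` are hypothetical.

```python
import re

# Made-up message in the general shape of a raw email; not taken from the corpus.
RAW_EMAIL = """Message-ID: <12345.example@example.com>
Date: Mon, 14 May 2001 16:39:00 -0700
From: alice@example.com
To: bob@example.com, carol@example.com
Subject: Q2 forecast

Please review the attached forecast   before Friday.
Thanks,
Alice
"""

# One "Name: value" header per line (assumes headers are not folded across lines).
HEADER_RE = re.compile(r"^(?P<name>[A-Za-z-]+):[ \t]*(?P<value>.*)$", re.MULTILINE)
# Simplified address pattern, good enough for a sketch but not a full validator.
ADDRESS_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def parse_email(raw):
    """Turn one raw email string into a flat record (a dict of fields)."""
    # Extract: headers and body are separated by the first blank line.
    head, _, body = raw.partition("\n\n")
    headers = {
        m.group("name").lower(): m.group("value").strip()
        for m in HEADER_RE.finditer(head)
    }
    # Clean and restructure: one field per column of the eventual table.
    return {
        "sender": headers.get("from", ""),
        "recipients": ADDRESS_RE.findall(headers.get("to", "")),
        "subject": headers.get("subject", ""),
        "body": re.sub(r"\s+", " ", body).strip(),  # collapse runs of whitespace
    }


record = parse_email(RAW_EMAIL)
print(record)
```

Each returned record corresponds to one row of a structured table, which is the form later analysis steps expect.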
How this relates to other classes:
This class builds directly on the previous session, where students were introduced to basic ETL processes and the use of regular expressions to clean raw data. While the last class focused on foundational techniques, this class moves into more advanced applications, using real-world data to tackle more complex parsing challenges. By working with the Enron Email Corpus, students will apply their knowledge of regular expressions to extract relevant information from a larger and messier dataset, further honing the skills learned in the previous class on data preparation and automation.
Materials and Preparation
Class Materials
- Case: Innovation_mindset_case_studies_Cybersecurity_Audit_Enron_Emails
 - Link: Online Regular Expressions Tool
 - Link: ChatGPT (or another LLM) for help with regular expressions
 - Link: A Template for the Enron Case Submission (Optional).
 - Slides: PowerPoint or PDF
 - Analytics Tools: Advanced Alteryx parsing tools: RegEx with Tokenize, Parse, Match.
Suggested Pre-Class Preparation
- There is no required preparation for this class; it builds on what we started in the prior class.
 - The case is provided as a reference; it is not necessary to read it before class.
 
Class Plan
- We will again work primarily in the labs to explore the Enron Email dataset.
 - The main goal for this class is to build the tools to undertake a sentiment analysis using the tokenize method with regular expressions (RegEx).
 - After completing the exercise, students will screen emails based on sentiment and discuss how the tool works to identify email sentiment using examples from the dataset.
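The idea behind the class plan above (tokenize each email with a regular expression, then score and screen it) can be sketched in Python. This is only an analogy to the Alteryx RegEx Tokenize approach used in the lab, not the lab solution. The word lists are hypothetical and deliberately tiny, and the function names and sample emails are made up for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists; a real exercise would use a fuller lexicon.
POSITIVE = {"good", "great", "pleased", "thanks"}
NEGATIVE = {"bad", "concern", "loss", "problem", "worried"}

# A token is a run of lowercase letters or apostrophes; digits and punctuation are dropped.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    """Split text into lowercase word tokens using a regular expression."""
    return TOKEN_RE.findall(text.lower())


def sentiment_score(text):
    """Count positive tokens minus negative tokens."""
    tokens = tokenize(text)
    positive_hits = sum(token in POSITIVE for token in tokens)
    negative_hits = sum(token in NEGATIVE for token in tokens)
    return positive_hits - negative_hits


def screen(emails, threshold=-1):
    """Keep only the emails whose score is at or below the threshold."""
    return [email for email in emails if sentiment_score(email) <= threshold]


# Made-up example messages, not taken from the dataset.
emails = [
    "Thanks, the results look great.",
    "I am worried about the loss; this is a serious problem.",
    "Meeting moved to 3pm.",
]
flagged = screen(emails)
print(flagged)
```

A simple word-count score like this ignores negation and context ("not a problem" still counts as negative), which is a useful point to raise when discussing how the tool classifies specific emails.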
 
 
Required Deliverables
| Deliverable | Due Date | Canvas Submission Portal | 
|---|---|---|
| Professionalism (individual): Enron Case Deliverable | October 13th, 2025 | Upload to Canvas |