Class 5

ETL and Textual Analytics

Wednesday, October 9, 2024

Class Overview

In this class, students will use Alteryx to perform an Extract, Transform, and Load (ETL) process, focusing on the use of regular expressions to clean and restructure raw data. Students will extract specific patterns from unstructured text data, such as customer or transaction records, transform the extracted values into structured formats, and load the cleansed data into analytical workflows for further processing. Through hands-on exercises, students will learn how to automate and streamline data preparation tasks so that data is ready for analysis.
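The three ETL stages described above can be sketched in a few lines of code. The class itself uses Alteryx; the sketch below swaps in Python's re and csv modules purely for illustration. The record layout, the INV- invoice prefix, and all names in it are assumptions, not part of the course cases.

```python
import csv
import io
import re

# Hypothetical raw transaction notes; the layout is assumed for illustration.
RAW_RECORDS = [
    "2024-10-01 INV-1043 paid by Acme Corp amount $1,250.00",
    "2024-10-02 INV-1044 paid by Birch LLC amount $310.50",
    "note: call the vendor back on Friday",  # does not fit the pattern
]

# Named groups make each captured field self-describing.
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2})\s+"
    r"(?P<invoice>INV-\d+)\s+paid by\s+"
    r"(?P<customer>.+?)\s+amount\s+\$(?P<amount>[\d,]+\.\d{2})"
)


def extract(lines):
    """Extract: keep only the lines that match the expected pattern."""
    return [m for m in (PATTERN.search(line) for line in lines) if m]


def transform(matches):
    """Transform: turn each match into a typed, structured row."""
    rows = []
    for m in matches:
        row = m.groupdict()
        # Remove thousands separators so the amount can be stored as a number.
        row["amount"] = float(row["amount"].replace(",", ""))
        rows.append(row)
    return rows


def load(rows):
    """Load: write the rows as CSV text (a stand-in for a real destination)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["date", "invoice", "customer", "amount"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = transform(extract(RAW_RECORDS))
csv_text = load(rows)
print(csv_text)
```

The third input line is dropped during extraction because it does not match the pattern. In a real workflow, such rejects would usually be routed to a separate output for review rather than discarded.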

Why is this important?
The ability to perform ETL processes with regular expressions is a critical skill for accounting professionals in today's data-driven environment. Accounting graduate students who master these techniques will be better equipped to handle messy or inconsistent data, which is common in professional settings. These skills enhance their analytical abilities, enabling them to derive meaningful insights from complex datasets and improve decision-making processes. Additionally, automating data preparation tasks with tools like Alteryx increases efficiency and adds value in audit, advisory, and other accounting services where accurate data management is crucial.

Class Materials and Details

Materials:

Case: ETL Case 2
Case: ETL Case 3
Slides: Available for download by the beginning of class in either PowerPoint or PDF format.
Data: A data update may be required for this class. To ensure your files are up to date, navigate to the ACCTG522_Labs folder and run the command git pull.
Analytics Tools: Alteryx RegEx tool and other Alteryx data preparation tools

Review and Extension:
Building on the previous class, which introduced the fundamentals of Alteryx, this class delves deeper into the Extract, Transform, and Load (ETL) process, emphasizing the use of regular expressions to clean and reformat raw data. Students will apply their understanding of Alteryx to extract specific patterns from unstructured data, such as employee ID records and emails, and transform them into usable formats.
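As a rough illustration of the kind of pattern extraction involved, the sketch below pulls an employee ID and an email address out of free text. It uses Python's re module rather than the Alteryx RegEx tool. The ID format (the letter E followed by five digits) and the sample text are assumptions made for this example, not the format used in the cases.

```python
import re

# Hypothetical free-text records; the "E" + 5 digits ID format is assumed.
RAW_TEXT = """
Jane Doe (E10432) can be reached at jane.doe@example.com.
Contact: Raj Patel, id E20977, RAJ.PATEL@Example.org
No identifiers on this line.
"""

# \b anchors keep the ID from matching inside a longer token.
EMPLOYEE_ID = re.compile(r"\bE\d{5}\b")

# Deliberately simple email pattern; it does not cover every valid address.
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")


def parse_line(line):
    """Return a structured record, or None if either field is missing."""
    id_match = EMPLOYEE_ID.search(line)
    email_match = EMAIL.search(line)
    if not (id_match and email_match):
        return None
    return {
        "employee_id": id_match.group(),
        # Lowercase the address so the same email always compares equal.
        "email": email_match.group().lower(),
    }


records = [r for r in map(parse_line, RAW_TEXT.splitlines()) if r]
for record in records:
    print(record)
```

Lowercasing the address is a small transform step: it keeps differently capitalized copies of the same email from being treated as distinct values later in the workflow.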

Preparation:
  1. There is no required preparation for this class. The cases will serve as a reference to guide the labs.

Class Plan:
Teams: During this class, please sit in your discussion teams.
  1. After a very brief review, we will work in the remote labs on ETL cases.
  2. We will work through ETL Cases 2 and 3 relatively quickly and consider multiple solutions to Case 2.
  3. We will also examine a database of emails that will provide more advanced regular expression challenges.