Class 2

Introduction to Extracting, Transforming and Loading Data

Monday, September 30, 2024

Class Overview

This class provides a discussion and set of practical exercises relating to the Extract, Transform, and Load (ETL) process. ETL is the first and most crucial step in data analytics, involving the extraction of raw data, its transformation into a suitable format, and the loading of that data into analytical workflows. Students will focus on preparing data for analysis, recognizing that improper preparation can lead to inaccurate or misleading results. In this session, we will continue working with real-time financial data from the SEC and other financial APIs, applying the ETL process to ensure the data is ready for in-depth analysis.

Why is this important?
Understanding and mastering the ETL process is essential for accounting graduate students because the quality of any data analysis depends on how well the data has been prepared. Poorly structured or unclean data can lead to faulty conclusions, negatively impacting decision-making in professional settings. By working with real-time financial data, students learn to handle common challenges such as inconsistent formats, missing data, and the integration of multiple data sources. This class equips students with the practical skills to ensure their data is reliable, enabling them to deliver accurate, data-driven insights critical for roles in auditing, financial reporting, and advisory services. The ability to effectively prepare and manage large datasets also provides a foundation for more advanced analytics and automation tasks.

Class Materials and Details

Materials:

Case: EDGAR Explorer
Slides: will be available for download by the beginning of class in either powerpoint or pdf formats.
Data: A data update may be required for this class. To ensure your files are the most up-to-date, navigate to ACCTG522_Labs folder and run the command git pull.
Analytics Tools: Git and GitHub, API keys using Python, Alteryx download tool and other Alteryx ETL tools.

Review and Extension:
This class extends the prior session, which introduced the analytical mindset and skillset by focusing on the practical application of these concepts through the ETL process. In the earlier class, students learned how to approach data with a critical, analytical perspective, identifying key patterns and trends. Now, by incorporating real-time data extraction from financial APIs and applying rigorous data preparation techniques, students deepen their understanding of how a properly structured dataset serves as the foundation for robust analysis. This class reinforces the analytical mindset by requiring students to actively engage with complex, unstructured data and refine it for accurate analysis, thus bridging the gap between conceptual knowledge and practical implementation. By extending these skills, students are better prepared to handle the data challenges they will encounter in professional accounting roles.

Preparation:
  1. There are no required readings for this class.

Class Plan:
Teams: during this class, please sit in your discussion teams.
  1. We will continue to work with extracting real time data using APIs.
  2. During class you will be asked to sign up for an AlphaVantage API key.
  3. We will use this API key to download historical stock market data (e.g., market returns).
  4. We will discuss a plan for:
  5. How to extract value/answer questions from these datasets.
  6. How to join or merge datasets with different characteristics.
  7. How to extract information from unstructured text files (hint: Regular Expressions).
  8. How to work with XBRL data, which is the modern way financial accounting information is provided to the public.