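| Column | Issues found | Suggested cleaning |
|---|---|---|
| raw_order_id | Inconsistent prefixes, extra words, underscores or spaces, missing prefix, duplicate indicators. | Flag any duplicate indicator first, because the digit-only step discards it. Then use `Trim()` and `Regex.Replace(rawText, "[^0-9]", "")` to keep only the digits (this also removes underscores, spaces, and old prefixes, so a separate `Replace("_", "-")` is unnecessary), and rebuild the ID as `"ORD-" & number`. |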
| raw_customer_name | Leading/trailing spaces, inconsistent capitalization, multiple internal spaces, blank names, names given only as initials. | Flag blanks with `String.IsNullOrWhiteSpace(rawText)` before cleaning. Collapse whitespace with `Regex.Replace(rawText.Trim(), "\s+", " ")`, then apply one casing rule consistently (`ToUpper()` or `ToLower()`). |
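| raw_invoice_amount | Currency symbols, commas, `USD` prefix, blanks, negative values, amounts in parentheses, letter O typed instead of zero, too many decimal places. | Flag blanks first. Replace the letter O with `0` before stripping, because the regex would otherwise silently delete it. `Regex.Replace(rawText, "[^0-9.()-]", "")` removes `$`, commas, and `USD` in one step, so separate `Replace("$","")`, `Replace(",","")`, and `Replace("USD","")` calls are optional. Rewrite `(123.45)` as `-123.45`, then convert with `CDec(cleanAmountText)`, which avoids the binary rounding of `CDbl`. Flag negatives and amounts with more than two decimal places. |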
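| raw_order_date | Multiple formats, blanks, `TBD`/`N/A` placeholders, ambiguous or invalid dates. | Use `Trim()`; flag `TBD`, `N/A`, and blanks. Convert only against an explicit list of expected formats (for example with `DateTime.TryParseExact`), and flag values that fail every format or that are valid both month-first and day-first, such as `03/04/2024`. |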
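| raw_quantity | Units written as text, numbers written as words, blanks, negatives, commas, decimals for items counted in whole units. | Flag blanks and word numbers such as `five` before stripping, because `Regex.Replace(rawText, "[^0-9.-]", "")` turns `five` into an empty string. Then strip, convert, and flag negatives and non-whole quantities. |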
| raw_email | Uppercase letters, double `@`, missing domain suffix, internal spaces, placeholder text. | Use `Trim().ToLower()` and `Replace(" ", "")`, then flag addresses without exactly one `@`, addresses whose domain has no dot (no likely suffix such as `.com`), and placeholder values. |
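| raw_state | Inconsistent abbreviations and names: `wa`, `Washington`, `W.A.`, `Wash.`, `OR`. | Use `Trim().ToUpper()` and `Replace(".", "")`, then map `WASHINGTON`, `WASH`, and `WA` to `WA`. Keep other valid two-letter codes such as `OR` (Oregon) and flag anything unmapped. |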
| raw_status | Inconsistent casing and labels: `paid`, `PAID`, `paid in full`, `complete`, `refund`, `chargeback`, `unknown`. | Use `Trim().ToLower()`, then map related labels to standard categories such as paid, pending, cancelled, refund, and review (for example, `paid in full` and `complete` to paid, `unknown` to review). |
| Notes (drop me) | Extra whitespace and free-text indicators of exceptions. | Use `Regex.Replace(rawText.Trim(), "\s+", " ")`, then search for terms such as missing, duplicate, negative, invalid, and review. |
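For the raw_order_id row, a minimal VB.NET sketch, assuming duplicate indicators appear as the words "dup" or "duplicate" (an assumption; adjust to the real data). The module and function names are hypothetical. The duplicate check runs on the raw text, because the digit-only step removes the marker.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper module for raw_order_id; names are illustrative.
Module OrderIdCleaning

    ' True when the raw ID carries a duplicate marker ("dup" or "duplicate", assumed).
    Public Function HasDuplicateMarker(rawText As String) As Boolean
        If String.IsNullOrWhiteSpace(rawText) Then Return False
        Return Regex.IsMatch(rawText, "\bdup(licate)?\b", RegexOptions.IgnoreCase)
    End Function

    ' Keeps only the digits and rebuilds the ID as "ORD-<digits>".
    ' Returns "" when no digits remain, so the caller can flag the row.
    Public Function CleanOrderId(rawText As String) As String
        If String.IsNullOrWhiteSpace(rawText) Then Return ""
        Dim number As String = Regex.Replace(rawText.Trim(), "[^0-9]", "")
        If number.Length = 0 Then Return ""
        Return "ORD-" & number
    End Function

End Module
```

Call `HasDuplicateMarker` before `CleanOrderId`; for example, `" ord_00123 "` becomes `"ORD-00123"`.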
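For the raw_customer_name row, a sketch that trims, collapses internal whitespace, and applies title case as its single casing rule. Title case is an assumed choice here; `ToUpper()` or `ToLower()` slot in the same way. Blank names return `Nothing` so the row can be flagged. Initials-only names are left for manual review. The names are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

' Hypothetical helper module for raw_customer_name.
Module CustomerNameCleaning

    ' Returns Nothing for blank input so the row is flagged, not silently kept.
    Public Function CleanCustomerName(rawText As String) As String
        If String.IsNullOrWhiteSpace(rawText) Then Return Nothing
        ' Trim the ends and collapse runs of internal whitespace to one space.
        Dim collapsed As String = Regex.Replace(rawText.Trim(), "\s+", " ")
        ' Lower-case first: ToTitleCase leaves all-uppercase words unchanged.
        Return CultureInfo.InvariantCulture.TextInfo.ToTitleCase(collapsed.ToLower())
    End Function

End Module
```

Lower-casing before `ToTitleCase` matters: `ToTitleCase` treats an all-caps word such as `DOE` as an acronym and leaves it as is.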
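For the raw_invoice_amount row, a sketch that follows the suggested order: flag blanks, fix the letter O, strip everything except digits, dot, parentheses, and minus, turn accounting parentheses into a minus sign, then parse as `Decimal`. It assumes the field holds only an amount and currency markers, so every O is a mistyped zero. `Decimal.TryParse` stands in for `CDec` so that bad values return `Nothing` instead of throwing. The names are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

' Hypothetical helper module for raw_invoice_amount.
Module InvoiceAmountCleaning

    ' Returns the amount, or Nothing when the value is blank or cannot be parsed.
    Public Function CleanInvoiceAmount(rawText As String) As Decimal?
        If String.IsNullOrWhiteSpace(rawText) Then Return Nothing
        ' Assumption: any letter O in this field is a mistyped zero.
        Dim text As String = rawText.Trim().Replace("O", "0").Replace("o", "0")
        ' Remove $, USD, commas, and spaces; keep digits, dot, parentheses, minus.
        text = Regex.Replace(text, "[^0-9.()-]", "")
        ' Accounting style: "(123.45)" means -123.45.
        If text.StartsWith("(") AndAlso text.EndsWith(")") Then
            text = "-" & text.Substring(1, text.Length - 2)
        End If
        Dim amount As Decimal
        Dim styles As NumberStyles = NumberStyles.AllowLeadingSign Or NumberStyles.AllowDecimalPoint
        If Decimal.TryParse(text, styles, CultureInfo.InvariantCulture, amount) Then
            Return amount
        End If
        Return Nothing
    End Function

    ' Flags negative amounts and amounts with more than two decimal places.
    Public Function NeedsReview(amount As Decimal) As Boolean
        Return amount < 0D OrElse Decimal.Round(amount, 2) <> amount
    End Function

End Module
```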
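For the raw_order_date row, a sketch of "convert only after standardizing expected formats": placeholders are rejected first, and parsing uses `Date.TryParseExact` with an explicit format list. The format list and placeholder list are assumptions to replace with what the data actually contains. A separate check flags values such as `03/04/2024` that are valid both month-first and day-first. The names are hypothetical.

```vbnet
Imports System
Imports System.Globalization

' Hypothetical helper module for raw_order_date.
Module OrderDateCleaning

    ' Assumed expected formats; replace with the formats seen in the data.
    Private ReadOnly ExpectedFormats As String() = {"yyyy-MM-dd", "MM/dd/yyyy", "M/d/yyyy", "dd-MMM-yyyy"}

    ' Placeholder values (compared after ToUpper) that are flagged, not parsed.
    Private ReadOnly Placeholders As String() = {"", "TBD", "N/A", "NA"}

    ' Returns the date, or Nothing for placeholders and values matching no format.
    Public Function CleanOrderDate(rawText As String) As Date?
        Dim text As String = If(rawText, "").Trim()
        If Array.IndexOf(Placeholders, text.ToUpper()) >= 0 Then Return Nothing
        Dim parsed As Date
        If Date.TryParseExact(text, ExpectedFormats, CultureInfo.InvariantCulture,
                              DateTimeStyles.None, parsed) Then
            Return parsed
        End If
        Return Nothing
    End Function

    ' True when the value parses as both MM/dd/yyyy and dd/MM/yyyy to different dates.
    Public Function IsAmbiguous(rawText As String) As Boolean
        Dim text As String = If(rawText, "").Trim()
        Dim monthFirst, dayFirst As Date
        Dim okMonthFirst As Boolean = Date.TryParseExact(text, "MM/dd/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, monthFirst)
        Dim okDayFirst As Boolean = Date.TryParseExact(text, "dd/MM/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, dayFirst)
        Return okMonthFirst AndAlso okDayFirst AndAlso monthFirst <> dayFirst
    End Function

End Module
```

Check `IsAmbiguous` before trusting `CleanOrderDate`, since the format list above reads `03/04/2024` month-first.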
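For the raw_quantity row, a sketch that checks for digits before stripping (so `five` is flagged rather than turned into an empty string), then strips units and commas with the table's regex and rejects negatives and non-whole values. For brevity, every flagged case returns `Nothing`; a real pipeline might record the reason. The names are hypothetical.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

' Hypothetical helper module for raw_quantity.
Module QuantityCleaning

    ' Returns the quantity, or Nothing when the row needs review:
    ' blank, written as a word, unparseable, negative, or not a whole number.
    Public Function CleanQuantity(rawText As String) As Integer?
        If String.IsNullOrWhiteSpace(rawText) Then Return Nothing
        ' Check before stripping: the regex below would turn "five" into "".
        If Not Regex.IsMatch(rawText, "[0-9]") Then Return Nothing
        Dim text As String = Regex.Replace(rawText, "[^0-9.-]", "")
        Dim value As Decimal
        Dim styles As NumberStyles = NumberStyles.AllowLeadingSign Or NumberStyles.AllowDecimalPoint
        If Not Decimal.TryParse(text, styles, CultureInfo.InvariantCulture, value) Then Return Nothing
        If value < 0D OrElse value <> Decimal.Truncate(value) Then Return Nothing
        Return CInt(value)
    End Function

End Module
```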
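For the raw_email row, a sketch of the normalization plus a heuristic validation flag: exactly one `@`, a non-empty local part, and a domain with a dot that is neither first nor last. This is deliberately not full RFC 5322 validation, and the placeholder list is an assumption. The names are hypothetical.

```vbnet
Imports System

' Hypothetical helper module for raw_email.
Module EmailCleaning

    ' Assumed placeholder values; extend with those that appear in the data.
    Private ReadOnly Placeholders As String() = {"", "n/a", "na", "none", "unknown"}

    ' Trim, lower-case, and remove internal spaces.
    Public Function NormalizeEmail(rawText As String) As String
        Return If(rawText, "").Trim().ToLower().Replace(" ", "")
    End Function

    ' Heuristic check on a normalized address; False means "flag for review".
    Public Function LooksValid(email As String) As Boolean
        If String.IsNullOrEmpty(email) OrElse Array.IndexOf(Placeholders, email) >= 0 Then Return False
        Dim parts As String() = email.Split("@"c)
        ' Exactly one "@" yields two parts; the local part must not be empty.
        If parts.Length <> 2 OrElse parts(0).Length = 0 Then Return False
        Dim domain As String = parts(1)
        Dim dot As Integer = domain.LastIndexOf("."c)
        Return dot > 0 AndAlso dot < domain.Length - 1
    End Function

End Module
```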
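For the raw_state row, a sketch that normalizes with `Trim().ToUpper()` and `Replace(".", "")`, then looks the result up in a dictionary. Only the Washington variants come from the table; the Oregon entries are illustrative assumptions. Unmapped values return `Nothing` so they can be flagged. The names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic

' Hypothetical helper module for raw_state.
Module StateCleaning

    ' Normalized spelling -> two-letter code. Oregon entries are assumed examples.
    Private ReadOnly StateMap As New Dictionary(Of String, String) From {
        {"WA", "WA"}, {"WASH", "WA"}, {"WASHINGTON", "WA"},
        {"OR", "OR"}, {"OREGON", "OR"}
    }

    ' Returns the code, or Nothing when the value is not mapped and needs review.
    Public Function CleanState(rawText As String) As String
        Dim key As String = If(rawText, "").Trim().ToUpper().Replace(".", "")
        Dim code As String = Nothing
        If StateMap.TryGetValue(key, code) Then Return code
        Return Nothing
    End Function

End Module
```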
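For the raw_status row, a sketch of the label-to-category mapping. The specific mappings are assumptions: in particular, sending `chargeback` to review rather than refund is a judgment call. Any label not in the map falls back to review, so nothing is silently accepted. The names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical helper module for raw_status.
Module StatusCleaning

    ' Assumed label -> category map; adjust to the business rules.
    Private ReadOnly StatusMap As New Dictionary(Of String, String) From {
        {"paid", "paid"}, {"paid in full", "paid"}, {"complete", "paid"},
        {"pending", "pending"},
        {"cancelled", "cancelled"}, {"canceled", "cancelled"},
        {"refund", "refund"}, {"refunded", "refund"},
        {"chargeback", "review"}, {"unknown", "review"}
    }

    ' Trims, lower-cases, collapses spaces, then maps; unmapped labels go to "review".
    Public Function CleanStatus(rawText As String) As String
        Dim key As String = Regex.Replace(If(rawText, "").Trim().ToLower(), "\s+", " ")
        Dim category As String = Nothing
        If StatusMap.TryGetValue(key, category) Then Return category
        Return "review"
    End Function

End Module
```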
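For the Notes (drop me) row, a sketch that collapses whitespace and then reports which exception terms from the table appear. Matching is case-insensitive and on whole words, an assumed choice under which `reviewed` does not count as `review`. Running this before the column is dropped keeps the exception signals. The names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical helper module for the Notes column.
Module NotesScanning

    ' Terms listed in the table.
    Private ReadOnly Keywords As String() = {"missing", "duplicate", "negative", "invalid", "review"}

    ' Trim and collapse runs of whitespace to one space.
    Public Function CleanNote(rawText As String) As String
        Return Regex.Replace(If(rawText, "").Trim(), "\s+", " ")
    End Function

    ' Returns the terms found in the note, in the order listed above.
    Public Function FindExceptionTerms(note As String) As List(Of String)
        Dim found As New List(Of String)
        For Each term As String In Keywords
            If Regex.IsMatch(If(note, ""), "\b" & Regex.Escape(term) & "\b", RegexOptions.IgnoreCase) Then
                found.Add(term)
            End If
        Next
        Return found
    End Function

End Module
```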