Data Cleaning: Fix Dirty Data and Boost ROI
The Guest
Susan Walsh, known as the Classification Guru, has spent over a decade cleaning, classifying, and normalizing data. A fixer of dirty data, Susan works between the spreadsheets, turning chaos into clarity. From her involvement with the Data Collaboration Alliance to authoring a book on data hygiene and sharing her humor and expertise through creative content, Susan is a true data-cleaning evangelist.
To demystify the data cleaning process, Susan created the “COAT” methodology, a practical framework to ensure your data is ready for action:
- Consistent – Keep inputting names and addresses in the same format—decide if you’re abbreviating street names or states, and stick to it.
- Organized – Categorize it by region, department, or product, so you can quickly pull it out when someone asks, ‘How much did marketing spend last month?’
- Accurate – Define accuracy for your dataset. There can be more than one right answer sometimes, but strive for consistency.
- Trustworthy – If it’s consistent, organized, and accurate, it’s trustworthy, meaning people can rely on it to make decisions. That’s when data becomes truly valuable.
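The "Consistent" step can be sketched in a few lines of Python. This is only an illustration, not Susan's own tooling: the abbreviation table and the `normalize_street` helper are hypothetical, and the choice to abbreviate (rather than spell out) street types is an assumed house rule.

```python
# Hypothetical house rule: street types are always abbreviated the same way.
STREET_TYPES = {
    "street": "St", "st": "St",
    "avenue": "Ave", "ave": "Ave",
    "road": "Rd", "rd": "Rd",
}

def normalize_street(address: str) -> str:
    """Collapse extra whitespace and apply one abbreviation convention."""
    cleaned = []
    for word in address.split():
        # Compare without trailing punctuation or case, so "St." matches "st".
        key = word.rstrip(".,").lower()
        cleaned.append(STREET_TYPES.get(key, word))
    return " ".join(cleaned)

for raw in ["12 Main Street", "12 Main St.", "  12 Main   street "]:
    print(repr(raw), "->", normalize_street(raw))
```

The specific convention matters less than applying the same one every time, which is why the rule lives in a single table rather than in each person's head.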
What Is Dirty Data—and How Does It Happen?
Susan believes most dirty data sneaks in at the point of entry. Sometimes it’s innocent: a website form filled out as “Dad Walsh” or “Santa Claus,” to use examples from Susan’s own experience. Other times it’s nefarious: duplicate invoices, altered invoice numbers, or personal bank accounts masquerading as vendor accounts.
After stressing the importance of data quality, Susan describes the role technology plays in straightening out tangled data, with a few important caveats.
“There is technology and LLM-based solutions available to help with data taxonomy at scale, but not every company can afford it,” states Susan. “And if it’s trained on flawed or incomplete data, you’re not going to get accurate results. You need to standardize data if you want to build reliable tools with it. And establishing standards in data requires interpretive expertise and context.”
That led the conversation to a compelling thought: the “dirtiness” of data can be in the eye of the beholder.
“Data cleaning and classification can be so subjective,” explains Susan. “In the marketing world, same as it is in procurement, there’s often more than one right answer as to the way something should be labeled. It’s important to have context and to constantly aim for consistency.”
Why Data Cleaning Matters
“How could you not get excited about data cleaning?” asks Susan. “I get to peek behind the curtain and take my discoveries and show clients all the opportunities, the risks, the waste. It’s fascinating.”
Susan details how she helped a manufacturing client in the UK turn dirty data into something transformative.
“This client was getting sales data from Nielsen, Kantar, Boots, Tesco, and more,” explains Susan. “They were selling one product, and in each of these data sets the product was named something slightly different in all those systems. So we used the United Nations Standard Products and Services Code (UNSPSC) to standardize the name and categorized it all so they had true visibility on the where/how/when of sales.”
In agencies and brands, data is flowing out of consumer research companies, data brokers, ad platforms, and into CRM systems, internal platforms, or analytics dashboards. This makes clear visibility into your pipeline, customers, and/or leads challenging on its face, regardless of the cleanliness of the data.
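To make the product-naming example concrete, here is a minimal sketch of mapping each source's spelling to one agreed name before totaling sales. The product name, source labels, and helper functions are all hypothetical, and the sketch stops at a standard name; in a project like the one Susan describes, the agreed name would also be tied to a UNSPSC category.

```python
import re
from collections import defaultdict

def match_key(name: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Hypothetical alias table: known spellings mapped to one agreed name.
STANDARD_NAME = "Acme Cola 330ml"
ALIASES = {match_key(v): STANDARD_NAME for v in [
    "Acme Cola 330ml",
    "ACME COLA 330 ML",
    "Acme-Cola (330ml)",
]}

def standardize(name: str):
    """Return the agreed name, or None so a person can review the row."""
    return ALIASES.get(match_key(name))

def total_units(rows):
    """Sum units per standard name; collect rows that could not be matched."""
    totals, unmatched = defaultdict(int), []
    for source, raw_name, units in rows:
        name = standardize(raw_name)
        if name is None:
            unmatched.append((source, raw_name))
        else:
            totals[name] += units
    return dict(totals), unmatched

rows = [
    ("source_a", "Acme Cola 330ml", 120),
    ("source_b", "ACME COLA 330 ML", 80),
    ("source_c", "Acme-Cola (330ml)", 50),
    ("source_d", "Mystery Item", 5),
]
totals, unmatched = total_units(rows)
print(totals)     # one line per standard product
print(unmatched)  # rows needing human judgment
```

Unmatched rows are returned rather than guessed at, since deciding what a label means is the interpretive step Susan says still needs context.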
The ROI of Data Cleaning: Why It Pays Off
Beyond saving time (and money) by preventing duplication or rework, Susan says clean data can unlock insights and reveal opportunities you never knew existed.
“One client we worked with discovered they were spending 40% more on Legal Services than they thought—and we’re talking millions. Now, that’s visibility you can act on.”
Measuring ROI on preventing disasters is tough, but time-savings and risk-avoidance can be massive. Clean data can also boost your business’s reputation.
“The negative impacts aren’t just monetary; there’s also reputational damage,” says Susan. “Having unclean data can impact the reputation of your business, particularly if the dirty data is aired in public.”
Quick Data Cleaning Tips: Small Changes for Big Wins
When asked if she had any quick data cleaning advice for the audience, Susan didn’t disappoint.
“Take a chunk of your data, throw it into Excel, and build a pivot table. Look for multiple versions of the same company name, differences in conventions, labels applied to services or vendors, or high-value items that seem off.”
Susan admits that going line by line isn’t glamorous, but a quick consistency check is easy to implement and can pay immediate dividends.
“If you notice something big like ‘Microsoft’ showing up under ‘Cleaning Services,’ you’ve just uncovered a glaring problem.”
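For teams that prefer a script to a spreadsheet, the same pivot-style check can be approximated in Python. This is a rough sketch under stated assumptions: the sample rows, the suffix list, and the `vendor_key` and `consistency_report` helpers are made up for illustration, and the grouping rule is deliberately crude.

```python
import re
from collections import defaultdict

SUFFIXES = {"inc", "ltd", "llc", "corp", "corporation", "limited"}

def vendor_key(name: str) -> str:
    """Crude grouping key: lowercase, no punctuation, no company suffixes."""
    words = re.sub(r"[^a-z0-9 ]", "", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def consistency_report(rows):
    """Pivot spend by vendor and category, and flag vendors that look off."""
    spend = defaultdict(float)
    spellings, categories = defaultdict(set), defaultdict(set)
    for vendor, category, amount in rows:
        key = vendor_key(vendor)
        spend[(key, category)] += amount
        spellings[key].add(vendor)
        categories[key].add(category)
    # Vendors entered under more than one spelling.
    many_names = {k: sorted(v) for k, v in spellings.items() if len(v) > 1}
    # Vendors filed under more than one category.
    many_cats = {k: sorted(v) for k, v in categories.items() if len(v) > 1}
    return dict(spend), many_names, many_cats

rows = [
    ("Microsoft", "Software", 1200.0),
    ("Microsoft Corp.", "Software", 800.0),
    ("MICROSOFT", "Cleaning Services", 300.0),
    ("Acme Ltd", "Cleaning Services", 150.0),
]
spend, many_names, many_cats = consistency_report(rows)
print(many_names)  # same vendor, several spellings
print(many_cats)   # same vendor, several categories
```

A vendor filed under several categories is not always an error, so the output is a list of rows to review rather than a list of mistakes.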
Team Collaboration for Better Data Cleaning Outcomes
A big obstacle for marketers is getting multiple teams aligned on a consistent data-entry process. Susan’s tip? Shadow or job-swap with other departments.
“If your data team can spend a half-day with sales, or go on a ride-along, you’ll see that data isn’t just numbers—these are real customers or prospects.”
Susan also advocates for creating a “what’s in it for me” mindset around data, which can improve communication and foster collaboration, ensuring everyone does their part to keep data clean and understandable.
Data cleaning might not be the most glamorous topic in marketing, but it underpins everything from effective ad targeting to accurate ROI reporting. Susan’s proven track record with clients and the data expertise she shares in this interview show that even the smallest cleanup can unlock the biggest value.
The Links
United Nations Standard Products and Services Code
LISTEN TO THE FULL SHOW -> Stay tuned, stay curious and subscribe to What Gets Measured on Apple Podcasts, Spotify, YouTube or add it as a Favorite on your podcast player of choice.