
(Rawf8/Shutterstock)
Bad data has been around since cavemen made the first errant marks on cave walls. Fast forward to our big data age, and the scale of the data quality problem has grown exponentially. While AI-powered automation has soared, many organizations remain stuck in the data dark ages. To help guide them toward the light, Anomalo today revealed its six pillars of data quality.
Anomalo was founded in 2021 by two engineers from Instacart who saw the impact that bad data can have on a company. Through automation, CEO Elliot Shmukler and CTO Jeremy Stanley hoped to help enterprises on the path to good data by automatically detecting issues in their structured and unstructured data, and drilling down to address their root causes before they impact downstream applications or AI models.
Anomalo developed its product to handle a range of observability needs. It utilizes unsupervised machine learning to automatically detect issues with data, and then alerts administrators when a problem has been found. It provides a ticketing system for tracking the issues, as well as tools to help automate root cause analysis. The company says its approach can scale to databases with millions of tables, and has been adopted by companies like Discover Financial Services, CollegeBoard, and Block.
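Anomalo has not published the internals of its detection engine, but the general pattern it describes, learning what “normal” looks like from a table’s history and flagging departures from it, can be sketched in a few lines. The example below is a hypothetical illustration using scikit-learn’s IsolationForest on made-up daily table metrics; the metrics, data, and thresholds are ours, not Anomalo’s.

```python
# Illustrative only: a minimal unsupervised anomaly check over daily table
# metrics (row count, null rate). Metric names, data, and thresholds are
# hypothetical, not Anomalo's.
import numpy as np
from sklearn.ensemble import IsolationForest

# Pretend history: one row per day of (row_count, null_rate) for a table.
rng = np.random.default_rng(42)
history = np.column_stack([
    rng.normal(1_000_000, 25_000, 90),   # typical daily row counts
    rng.normal(0.01, 0.002, 90),         # typical null rate, around 1%
])

# Today's load: the row count looks fine, but the null rate has quietly tripled.
today = np.array([[1_010_000, 0.033]])

# Fit on history, then score today's metrics; -1 means "looks anomalous".
model = IsolationForest(contamination=0.05, random_state=0).fit(history)
if model.predict(today)[0] == -1:
    print("Anomaly detected in today's load -- alert the data team")
else:
    print("Today's load looks consistent with history")
```

The appeal of this kind of approach, and presumably of Anomalo’s pitch, is that nobody had to write a rule saying “null rate must stay below 2%”; the model infers the normal range from history.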
Today the Palo Alto, California-based company rolled out its Six Pillars of Data Quality. The pillars, according to Anomalo, are: enterprise-grade security; depth of data understanding; comprehensive data coverage; automated anomaly detection; ease of use; and customization and control.
CEO Shmukler elaborated on the Six Pillars in a blog post.
- Enterprise-grade security: This is a baseline requirement that is non-negotiable, according to Anomalo. To meet this requirement, an observability tool must be deployed in an organization’s own environment, use only LLMs that are approved by the organization and meet strict compliance mandates, and operate at real-time volumes. “A data quality solution that cannot scale or meet security and compliance standards is a non-starter for the enterprise,” Shmukler wrote. “Large organizations typically have strict requirements for auditability, data residency, and regulatory compliance.”
- Depth of data understanding: A good data quality solution will look below surface metadata and analyze the actual data values, Anomalo says. The company dismisses metadata-only “observability” checks as insufficient and an enabler of the data quality problem, which, it says, costs the average enterprise nearly $13 million annually (a rough illustration of the distinction follows this list). “Some vendors…rely on metadata checks to find hints of issues in your data,” Shmukler wrote. “This shortcut, known as observability, comes at a steep cost: surface-level checks miss abnormal values, hidden correlations, and subtle distribution shifts that quietly distort dashboards, analytics, and AI models.”
- Comprehensive data coverage: It’s not uncommon for an organization to have tens of thousands of tables, with billions of rows across multiple databases. In these situations, covering only a few high-profile tables isn’t enough, Anomalo says. “And with more than 80% of enterprise data now unstructured, a figure growing at a rate of 40-60% per year, most vendors leave critical blind spots by just focusing on structured data, just as organizations prepare for AI.”
- Automated anomaly detection: The size and complexity of the modern data stack make manual or rules-based monitoring unsustainable, the company says. The problem with rules-based approaches, the vendor says, is that they can only catch anticipated issues, while enterprises need ways to detect unexpected issues that emerge at scale. “Legacy vendors…rely on rules-based approaches to data quality, which place the burden on enterprises to configure, manage, and update complex rule sets,” Shmukler wrote. “Comprehensive coverage at enterprise scale is impossible to manage with rules alone. Tens of thousands of tables and billions of rows generate too much complexity for manual checks to keep up.”
- Ease of use: It’s great to get insight into data quality problems, but organizations must be able to act on them, Anomalo says. Democratizing access to data quality insight can help make the entire exercise worthwhile. “Monitoring, no matter how thorough, is only useful if people can adapt it to their needs,” Shmukler wrote. “Users such as business analysts, operations managers, and ML engineers all need to know they can trust the data in front of them or understand what’s wrong with it, without having to bug someone on the data team.”
- Customization and control: Every company is unique, which means prepackaged data quality solutions are likely to fail, Anomalo says. What’s needed is an extensible framework that integrates with existing tools and workflows. “A solution can check all the boxes, but if it lacks the flexibility to tailor to a company’s unique business rules, regulatory requirements, or operational priorities, it will fail,” Shmukler wrote. “Without that adaptability, even the most powerful platform will create noise, trigger alert fatigue and water-cooler grumbles, and ultimately erode trust.”
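To make the second pillar concrete, here is a rough, hypothetical sketch of why a metadata-only “observability” check can pass while the underlying values have drifted. The column name, synthetic data, and thresholds are invented for illustration; this is not Anomalo’s implementation, just a common value-level technique (a two-sample Kolmogorov-Smirnov test) applied to fake data.

```python
# Hypothetical example of the metadata-vs-values distinction discussed above.
# Column name, data, and thresholds are invented for illustration.
import numpy as np
from scipy import stats

# Two days of an "order_amount" column; today's values have silently shifted 15%.
yesterday = np.random.default_rng(1).normal(100.0, 10.0, 50_000)
today = np.random.default_rng(2).normal(100.0, 10.0, 50_000) * 1.15

# Metadata-style ("observability") checks: row counts and nulls look fine.
print("row counts match:", len(yesterday) == len(today))
print("no new nulls:", not np.isnan(today).any())

# Value-level check: comparing the actual distributions catches the shift
# that the metadata checks miss.
stat, p_value = stats.ks_2samp(yesterday, today)
print(f"KS statistic={stat:.3f}, p={p_value:.3g}")
if p_value < 0.01:
    print("Distribution shift detected in order_amount")
```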
Clearly, Anomalo had its own product in mind when it wrote the Six Pillars. In any case, the company still provided some useful guidance for organizations that are looking to get a handle on their own peculiar relationship with data.
Related Items:
Data Quality Is A Mess, But GenAI Can Help
Data Quality Getting Worse, Report Says
Anomalo Expands Data Quality Platform for Enhanced Unstructured Data Monitoring