A Dimensional Approach to Data Quality Principles

courtesy: Gemini, ChatGPT, DeepSeek, Grok

Data quality checks often overlap—rather than forcing each principle into its own silo, it’s more practical to group them into logical “dimensions.” This approach helps teams understand where issues live, how they relate, and how to fix them.

I. Intrinsic Data Quality (Characteristics of the Data)

1. Accuracy

What it is: How well data reflects the real world.
Why it matters: Bad accuracy means bad decisions.
Example: Customer address matches actual GPS coordinates.

2. Completeness

What it is: All required data is present.
- Schema completeness: All expected columns exist.
- Column completeness (density): % of non-null values in a column.
- Population completeness: All expected rows exist.
Example: Every customer record has an email_address.

3. Consistency

What it is: No contradictions within or across datasets.
- Intra-record: Fields inside one record don’t clash (e.g., age vs. date_of_birth).
- Inter-record: Same entity looks the same over time (e.g., name spelling).
- Cross-dataset: Shared fields use the same units and formats (e.g., revenue in USD).
Example: Product codes use the same format in sales, inventory, and marketing.

4. Timeliness

What it is: Data is fresh enough for its purpose.
- Currency: How recent it is.
- Frequency: How often it’s updated.
- Punctuality: It arrives when you expect it.
Example: Stock prices updated every second for trading dashboards.

5. Uniqueness

What it is: No duplicates of the same real-world entity.
Example: Each customer has exactly one customer_id.

6. Validity

What it is: Data follows rules, formats, and logical constraints.
- Syntactic: Correct type/format (e.g., dates as YYYY‑MM‑DD).
- Semantic: Meets business logic (e.g., order_date ≤ ship_date).
- Referential integrity: Foreign keys point to real parent records.
Example: Emails match a regex pattern; order_status is one of “Pending,” “Shipped,” “Delivered.”

II. Data Management & Operational Excellence

1. Transformation Integrity

What it is: ETL/ELT logic is correct and preserves quality.
Example: profit = revenue − cost is implemented correctly downstream.

2. Metadata & Lineage

What it is: You know what each field means and where data came from.
- Schemas match docs.
- Business rules documented.
- Data dictionaries/glossaries exist.
- Lineage tracking end-to-end.
Example: A data catalog shows original source files, transformations, and load targets.

3. Performance & Scalability

What it is: Systems handle current workloads and growth.
Example: Indexed tables for fast queries; pipelines auto-scale on spikes.

4. Resilience & Error Handling

What it is: Systems detect, retry, and recover from failures.
Example: ETL jobs retry on transient errors; backups restore lost data.

5. Security & Privacy

What it is: Data is protected and handled per regulations.
Example: PII is encrypted in transit and at rest; RBAC enforces least privilege.

III. Governance, Risk & Compliance

1. Stewardship & Ownership

What it is: Clear accountability for each data domain.
Example: Finance owns transaction data; Marketing owns campaign metrics.

2. Regulatory Compliance

What it is: Processes meet GDPR, HIPAA, SOX, etc.
Example: Customer opt‑outs are enforced in all systems.

3. Auditability & Reproducibility

What it is: You can trace every change and rerun analyses.
Example: Versioned datasets and code; audit logs for data changes.

4. Lifecycle Management

What it is: Data is created, stored, archived, and deleted per policy.
Example: Transaction logs archived after 7 years, purged at 10.

IV. User-Centric & Value Realization

1. Relevance

What it is: Data fits the users’ needs and use cases.
Example: Sales dashboard shows KPIs the team actually uses.

2. Usability & Interpretability

What it is: Data is easy to find, understand, and use.
- Clarity: Human‑readable names and definitions.
- Accessibility: Delivered in formats users can work with.
- Context: Code tables, descriptions, and examples provided.
Example: Dashboards with clear labels and linked data definitions.

V. Continuous Improvement & Monitoring

1. Monitoring & Measurement

What it is: Automated checks track data quality over time.
Example: Daily jobs that flag spikes in null rates or invalid formats.

2. Issue Management & Remediation

What it is: Process to log, prioritize, and fix data issues.
Example: Tickets in a bug tracker for data errors, complete with severity levels and SLAs.

Putting It All Together: A Sample Workflow

Define requirements & metrics

Talk to users: What does “fit for purpose” look like?
Set thresholds for accuracy, completeness, timeliness, etc.

Profile & assess data

Run accuracy checks, null counts, format validators, duplication scans.

Evaluate operations

Audit ETL logic, metadata, system health, security controls.

Review governance

Confirm ownership, compliance controls, audit trails, retention policies.

Establish monitoring & fixes

Schedule automated checks and ticketing for quick resolution.

By viewing data quality through these dimensions, you get a holistic framework that’s both comprehensive and practical—nothing slips through the cracks, and every team knows where they fit in.