DQS System Decomposition Overview
What this is
This page summarizes the Data Quality Service (DQS) system decomposition and maps it to how users work with the product.
Why this decomposition exists
DQS is intentionally split into three parts so teams can evolve each of the following independently, without breaking the product contract:
- rule logic
- runtime/deployment behavior
- output/consumer workflows
The three parts
1) Rule Definition Layer
This layer defines what quality means and how checks are expressed.
- SHACL Core for structural constraints
- SHACL-AF and sh:sparql for advanced/contextual rules
- Namespace functions (cdf_sdk, cdf_indsl) for platform and time-series logic
- Conditional and chained conditional logic for context-aware checks
User impact:
- domain experts and data engineers can encode business semantics explicitly
- rules stay declarative and versionable
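The idea of conditional and chained conditional checks can be sketched in plain Python. This is a stand-in for illustration only, not SHACL and not the DQS rule syntax: the Rule class, its when/check fields, and the sample record fields (type, status, max_pressure) are hypothetical. A rule applies its check only when every condition in its chain holds; otherwise the record is out of scope and produces no violation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Predicate = Callable[[Record], bool]


@dataclass(frozen=True)
class Rule:
    """Hypothetical declarative rule: conditions first, then one constraint."""
    name: str
    when: Tuple[Predicate, ...]  # chained conditions; all must hold
    check: Predicate             # constraint applied only when in scope


def violations(rules: List[Rule], record: Record) -> List[str]:
    """Return the names of rules the record is in scope for and fails."""
    failed = []
    for rule in rules:
        in_scope = all(cond(record) for cond in rule.when)
        if in_scope and not rule.check(record):
            failed.append(rule.name)
    return failed


# Hypothetical rule: a pump that is active must declare a max pressure.
pump_rule = Rule(
    name="active-pump-has-max-pressure",
    when=(
        lambda r: r.get("type") == "pump",
        lambda r: r.get("status") == "active",
    ),
    check=lambda r: r.get("max_pressure") is not None,
)
```

Because the rule is plain data plus predicates, it can be stored and versioned like any other artifact, which is the property the declarative layer is meant to preserve.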
2) Infrastructure and Orchestration Layer
This layer defines how rules are published, deployed, triggered, and executed.
- Workflows, Functions, and triggers for scheduled and event-driven execution
- DataProduct + RuleSet APIs for versioned lifecycle management
- YAML compatibility path for transitional and fallback operation
Runtime guarantees:
- idempotent publish/deploy behavior
- payload-aware version reuse
- semver-stable latest selection
- preflight risk checks for version-cap scenarios
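Two of these guarantees can be illustrated with a minimal Python sketch. It is not the DQS implementation: the publish and latest helpers, the in-memory registry dictionary, the starting version, and the patch-bump policy are all assumptions made for illustration. The sketch shows payload-aware reuse (publishing an identical payload returns the existing version, which also makes repeated publishes idempotent) and semver-stable latest selection (versions compare numerically, so 1.10.0 ranks above 1.9.0).

```python
import hashlib
import json


def parse_semver(version: str) -> tuple:
    """Split 'MAJOR.MINOR.PATCH' into integers so comparison is numeric."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def latest(versions):
    """Pick the highest version by numeric semver order, not string order."""
    return max(versions, key=parse_semver)


def payload_digest(payload: dict) -> str:
    """Hash a canonical JSON form so key order does not change the digest."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def publish(registry: dict, payload: dict) -> str:
    """Return the version for a payload, reusing one if content is unchanged.

    registry maps version string -> payload digest (hypothetical structure).
    """
    digest = payload_digest(payload)
    for version, known in registry.items():
        if known == digest:
            return version  # identical payload: reuse, create nothing new
    if registry:
        major, minor, patch = parse_semver(latest(registry))
        version = f"{major}.{minor}.{patch + 1}"  # assumed bump policy
    else:
        version = "1.0.0"  # assumed starting version
    registry[version] = digest
    return version
```

Calling publish twice with the same payload leaves the registry unchanged on the second call, so retries and re-deploys do not consume new version numbers.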
3) Output and Consumption Layer
This layer defines what users and systems consume after execution.
- DataQualityValidationRecord for validation outcomes
- RuleEngineResult for conditional/inference outputs
- deterministic typing for grouped high-volume outcomes (group_violation)
User impact:
- auditable, queryable records for triage and analytics
- stable contracts for dashboards, automations, and agent workflows
End-to-end flow
In practice, DQS follows four steps in order:
- define rules,
- deploy and run,
- persist outputs,
- triage and remediate.
This keeps quality governance, runtime operations, and consumer UX aligned.
How this maps to personas
- Data Product Owner / Steward: governs contract and release quality
- Data Engineer / Domain Owner: authors rules and operates runtime pipelines
- Data Consumer / Operator: consumes records, triages issues, tracks health