# Cognite Data Quality
cognite-data-quality is a Python package for SHACL-based data quality validation in Cognite Data Fusion (CDF). It validates DMS instances (structured data models) and time series data against declarative rules, stores results in the CDF Records API, and provides tools to deploy and run validation in CDF Functions and Workflows.
## Purpose
Industrial data in CDF must meet quality standards before it can be trusted for analytics, dashboards, or automation. Raw data often has gaps, outliers, missing required fields, or rule violations. Manual checks do not scale.
This package provides:
- Declarative validation using SHACL (W3C standard). You define what good data looks like; the package checks whether your data conforms.
- Automated validation in CDF (Functions, Workflows, Triggers) and locally in notebooks and scripts.
- Traceable results in the Records API for monitoring, dashboards, and drilling into failures.
## How It Works
- Define rules in TTL (Turtle), JSON rule set, or YAML view config. Rules target views or time series.
- Run validation, either locally with `run_validation()` (no Function required) or in CDF via deployed Functions triggered by Workflows.
- Post results to the Records API (pass/fail, violation details, severity).
The package includes inlined SHACL validation with CDF SPARQL functions and integrates with CDF Data Modeling (DMS). It supports both CDF SDK and INDSL functions for time series quality checks.
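As a concrete illustration of the TTL rule format, a SHACL shape targeting a view might look like the sketch below. The `ex:` namespace, the `Pump` class, and the property names are illustrative assumptions, not conventions prescribed by the package.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/> .   # illustrative namespace

# Hypothetical rule: every Pump instance must have a non-empty name
# and a numeric designPressure.
ex:PumpShape a sh:NodeShape ;
    sh:targetClass ex:Pump ;
    sh:property [
        sh:path ex:name ;
        sh:datatype xsd:string ;
        sh:minCount 1 ;
        sh:minLength 1 ;
    ] ;
    sh:property [
        sh:path ex:designPressure ;
        sh:datatype xsd:double ;
        sh:minCount 1 ;
    ] .
```

Instances that violate a constraint (for example, a `Pump` with no `designPressure`) would appear as failures in the validation results.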
## Workflows

### Local validation and testing
- Create a TOML file (e.g. `config.toml`) with CDF credentials.
- Write SHACL rules (TTL) and run `run_validation()` against live data.
- Iterate on the rules; optionally set `post_to_records=True` with a `RecordsConfig` to test the full pipeline.
- Use `DataModelConfig` and `RecordsConfig` to configure the data model, instance space, stream, and rule set.
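Put together, a local iteration loop might look like the following sketch. The function and class names come from this document, but the exact keyword arguments (including how the rule file is passed) are assumptions about the signature; check the API reference before relying on them.

```python
# Sketch of a local rule-iteration loop. load_cognite_client_from_toml,
# run_validation, and RecordsConfig are named in this package's docs; the
# keyword arguments shown here are assumptions, not a verified signature.
from cognite_data_quality import (
    RecordsConfig,
    load_cognite_client_from_toml,
    run_validation,
)

client = load_cognite_client_from_toml("config.toml")

# First pass: validate a small sample of live data without writing anything.
run_validation(
    client,
    rules="rules/pump_rules.ttl",  # hypothetical TTL file and parameter name
    datamodel="my_datamodel",      # hypothetical data model external ID
    instance_space="my_space",     # hypothetical instance space
    limit=100,                     # small sample while iterating on rules
    print_output=True,
)

# Once the rules look right, test the full pipeline by posting results
# to the Records API.
records_config = RecordsConfig(stream="dq-stream", rule_set="pump-rules")  # assumed fields
run_validation(
    client,
    rules="rules/pump_rules.ttl",
    datamodel="my_datamodel",
    instance_space="my_space",
    records_config=records_config,
    post_to_records=True,
)
```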
### Production deployment
- Deploy infrastructure with `deploy_validation_infrastructure(client, settings_path=..., views_dir=..., function_secrets=...)`. This ensures the containers exist (Records, OrchestrationState, FunctionValidationState) and deploys the unified validation function, workflows, and triggers.
- Optionally run the validation pipeline for a view with `deploy_validation_pipeline(client, settings_path=..., view_external_id=..., wait=True)` to process historic data and set up sync and monitor schedules.
- Results are written to the Records API automatically.
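The two deployment steps above can be combined into a short rollout script, sketched below. The function names and keyword arguments are taken from this document; the concrete file paths, the secrets mapping, and the view external ID are placeholders.

```python
# Sketch of a production rollout. deploy_validation_infrastructure and
# deploy_validation_pipeline are named in this document; the paths and
# secret names below are placeholders.
from cognite_data_quality import (
    deploy_validation_infrastructure,
    deploy_validation_pipeline,
    load_cognite_client_from_toml,
)

client = load_cognite_client_from_toml("config.toml")

# One-time setup: containers, the unified validation function, workflows, triggers.
deploy_validation_infrastructure(
    client,
    settings_path="settings.yaml",              # assumed settings file name
    views_dir="views/",                         # assumed directory of view configs
    function_secrets={"client-secret": "..."},  # placeholder secret mapping
)

# Per-view rollout: backfill historic partitions, then set up the sync
# trigger and monitor schedule.
deploy_validation_pipeline(
    client,
    settings_path="settings.yaml",
    view_external_id="Pump",  # hypothetical view external ID
    wait=True,                # block until the historic backfill finishes
)
```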
### Invoking deployed CDF Functions from Python
Use the invoke helpers to call already-deployed Cognite Functions from notebooks or scripts. These send the payload to the function running in CDF; they do not run validation locally.
```python
from cognite_data_quality import call_validate_instances_shacl

result = call_validate_instances_shacl(client, data)
```
To run validation locally (no Function required), use `run_validation()` instead.
## Main Capabilities
| Capability | Description |
|---|---|
| `run_validation` | Run validation from DMS directly (no workflows). Supports TTL, JSON, or YAML rules. Key parameters: `datamodel`, `instance_space`, `records_config`, `limit`, `print_output`. |
| `deploy_validation_infrastructure` | Deploy all validation infrastructure (containers, function, workflows, triggers) from `settings_path` and `views_dir`. |
| `deploy_validation_pipeline` | Deploy and run the full validation pipeline for a view (historic partitions, sync trigger, monitor schedule). |
| `call_validation` | Invoke the deployed CDF Function with type-based dispatch (`instance`, `instance_sync_cursor`, `timeseries`, `orchestrator`, `partitioned`, `test`). Runs in CDF, not locally. |
| Invoke helpers | `call_validate_instances_shacl`, `call_validate_timeseries_shacl`, and similar: thin wrappers around `call_validation()` for specific validation types. All invoke deployed CDF Functions. |
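The type-based dispatch of `call_validation` might be used as in the sketch below. The `"type"` values come from the capabilities table; the remaining payload keys are assumptions about the function's data contract, not a documented schema.

```python
# Sketch of type-based dispatch via call_validation. Only the "type" values
# are documented; the other payload fields here are assumptions.
from cognite_data_quality import call_validation

result = call_validation(
    client,  # an authenticated CogniteClient
    {
        "type": "timeseries",            # dispatch key from the table above
        "external_ids": ["ts-pump-01"],  # assumed payload field
    },
)

# Equivalent, using the thin wrapper for the same validation type:
# result = call_validate_timeseries_shacl(client, {...})
```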
## Other Features
- Credentials: load from TOML (e.g. `config.toml`) with `load_cognite_client_from_toml()`.
- Rule sources: TTL, JSON rule set, or YAML view config.
- Records output: default container `dataQuality:DataQualityValidationRecord`; the stream and rule set are configurable in `RecordsConfig`.
- Auto-loading: references between instances are loaded automatically when validating DMS data (configurable depth).
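A minimal `config.toml` might look like the fragment below. The section and key names are assumptions modeled on common CDF OAuth client-credentials configuration; the package's expected schema may differ, so check the installation docs for the exact keys.

```toml
# Hypothetical layout -- the keys expected by load_cognite_client_from_toml
# are assumptions here, not a documented schema.
[cognite]
project       = "my-project"
base_url      = "https://my-cluster.cognitedata.com"
client_id     = "<idp-client-id>"
client_secret = "<idp-client-secret>"
token_url     = "https://login.example.com/<tenant-id>/oauth2/v2.0/token"
```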
## Next Steps
- Installation: Set up the package and credentials.
- Where to Start: Run your first validation and deploy.
- Usage: Run validation, deploy, invoke, and configure rules.