Where to Start
Use this package to run SHACL validation locally, deploy validation infrastructure to CDF, and run the validation pipeline.
This quickstart follows the model described in the Usage docs:
- Data Product + RuleSet is the primary rule management and persistence model.
- YAML is the CDF Toolkit representation of Data Product/RuleSet bindings and runtime settings.
- Direct TTL usage is a legacy transition path.
Quick Start Guide
1. Installation
After installing, restart your kernel or interpreter so that imports pick up the package.
2. Setup credentials
Create a TOML file with your CDF credentials (e.g. config.toml):
[cognite]
project = "your-project"
client_id = "your-client-id"
client_secret = "your-client-secret"
Load the client:
import cognite_data_quality
client = cognite_data_quality.load_cognite_client_from_toml("config.toml")
client.config.timeout = 300 # Optional: for large function deployments
3. Run validation locally
Use run_validation() to validate instances from DMS without deploying any Cognite Functions or workflows. This runs entirely in your local Python environment.
import cognite_data_quality
result = cognite_data_quality.run_validation(
    client=client,
    rules_path="views/my_view.yaml",
    rules_format="yaml",
    post_to_records=True,
    records_config=cognite_data_quality.RecordsConfig(
        stream_id="dq_validation_stream",
        rule_set_id="MyViewSHACLv1",
        rule_set_version="1.0",
    ),
    limit=10,
    print_output=True,
)
print("Instance count:", result.instance_count)
print("Violations:", len([v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]))
Recommended: use a YAML view config with RuleSet references so that local validation and deployment share the same contract. The direct TTL path remains supported as a legacy transition option.
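The violation count printed above filters on the suffix of `resultSeverity`. If you want a breakdown of every severity rather than violations only, a small grouping helper is one option. This is a sketch under assumptions: `count_by_severity` is a hypothetical helper, the sample objects are stand-ins for the entries in `result.violations`, and the severity strings are assumed to be SHACL IRIs or prefixed names ending in the severity label.

```python
import re
from collections import Counter
from types import SimpleNamespace


def count_by_severity(violations):
    """Count results by the trailing label of resultSeverity (e.g. 'Violation')."""
    counts = Counter()
    for v in violations:
        severity = getattr(v, "resultSeverity", None) or ""
        # Take the text after the last '#', ':' or '/' so that both
        # 'http://www.w3.org/ns/shacl#Violation' and 'sh:Violation' map to 'Violation'.
        label = re.split(r"[#:/]", severity)[-1] or "Unknown"
        counts[label] += 1
    return counts


# Stand-in objects for illustration; real entries come from result.violations.
sample_violations = [
    SimpleNamespace(resultSeverity="http://www.w3.org/ns/shacl#Violation"),
    SimpleNamespace(resultSeverity="http://www.w3.org/ns/shacl#Warning"),
    SimpleNamespace(resultSeverity="sh:Violation"),
    SimpleNamespace(resultSeverity=None),
]
severity_counts = count_by_severity(sample_violations)
print(dict(severity_counts))
```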
4. Production: deploy infrastructure
Deploy all validation infrastructure (containers, function, workflows, triggers):
from pathlib import Path
import cognite_data_quality
config_root = Path(".") # Directory containing settings.yaml
settings_path = config_root / "settings.yaml"
views_dir = config_root / "views"
# Credentials for function (orchestrator needs them for triggers)
creds = client.config.credentials
function_secrets = None
if hasattr(creds, "client_id") and hasattr(creds, "client_secret"):
    function_secrets = {
        "client-id": creds.client_id,
        "client-secret": creds.client_secret,
    }
result = cognite_data_quality.deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_secrets=function_secrets,
    dry_run=False,
    force=False,
)
This call ensures that the Records, OrchestrationState, and FunctionValidationState containers exist; deploys the monitoring data model and views; and deploys the unified validation function together with the instance workflows and triggers, plus the optional time series ones.
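The credential extraction in the snippet above can be wrapped in a small function if you deploy from more than one script. The sketch below is illustrative only: `build_function_secrets` is a hypothetical helper, and unlike the `hasattr` check above it also returns `None` when either value is empty.

```python
from types import SimpleNamespace


def build_function_secrets(credentials):
    """Return the secrets dict for the function, or None if credentials are incomplete."""
    client_id = getattr(credentials, "client_id", None)
    client_secret = getattr(credentials, "client_secret", None)
    if not (client_id and client_secret):
        return None
    # Same key names as in the deployment snippet above.
    return {"client-id": client_id, "client-secret": client_secret}


# Stand-in credentials object; in practice pass client.config.credentials.
demo_creds = SimpleNamespace(client_id="your-client-id", client_secret="your-client-secret")
demo_secrets = build_function_secrets(demo_creds)
```

With this helper, the deployment call would receive `function_secrets=build_function_secrets(client.config.credentials)`.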
5. Deploy validation pipeline (historic + incremental)
Run the full validation pipeline for a view (historic partitions, sync trigger, monitor schedule):
result = cognite_data_quality.deploy_validation_pipeline(
    client,
    settings_path=str(settings_path),
    view_external_id="LargeBoat",  # or "SmallBoat", "NavigationAid", etc.
    wait=True,
)
print("Orchestration ID:", result.get("orchestration_id"))
print("Partitions triggered:", result.get("partitions_triggered"), "/", result.get("partition_count"))
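If you run the pipeline from a script, you may want to check that every partition was triggered instead of reading the printed numbers. The following sketch assumes the result behaves like a dict with the three keys printed above and that the partition values are integers; `summarize_pipeline_result` is a hypothetical helper, and the sample values are made up for illustration.

```python
def summarize_pipeline_result(result):
    """Condense the pipeline result into a small status dict."""
    triggered = result.get("partitions_triggered") or 0
    total = result.get("partition_count") or 0
    return {
        "orchestration_id": result.get("orchestration_id"),
        "triggered": triggered,
        "total": total,
        # Complete only when there is at least one partition and all were triggered.
        "complete": total > 0 and triggered == total,
    }


# Illustrative stand-in for the value returned by deploy_validation_pipeline.
demo_result = {"orchestration_id": "demo-id", "partitions_triggered": 3, "partition_count": 4}
summary = summarize_pipeline_result(demo_result)
print(summary["complete"])  # False
```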
6. Invoke deployed CDF Functions
Once the infrastructure is deployed, you can invoke the validation function running in CDF from Python. These helpers send the payload to the deployed Cognite Function; they do not run validation locally.
from cognite_data_quality import call_validate_instances_shacl
data = {
    "instances": {"items": {"n": [...]}},
    "ruleset_references": [{"externalId": "my-ruleset", "version": "1.0.0"}],
    "datamodel_space": "my_space",
    "datamodel_external_id": "MyDataModel",
    "datamodel_version": "v1",
    "records_config": {
        "stream_id": "dq_validation_stream",
        "rule_set_id": "MyViewSHACLv1",
        "rule_set_version": "1.0",
    },
}
result = call_validate_instances_shacl(client, data)
Or use call_validation(client, validation_type="instance", data=data) for the unified entry point. See Invoke for all invoke helpers and payload details.
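Because the payload is a plain dict, a typo in a key only surfaces once the remote function runs. A local pre-flight check can catch that earlier. This is a sketch, not package behavior: `missing_payload_keys` is a hypothetical helper, and the key lists simply mirror the example payload above; whether the function treats each key as mandatory is an assumption, so see Invoke for the authoritative payload details.

```python
# Keys mirrored from the example payload above (assumed, not confirmed, to be required).
TOP_LEVEL_KEYS = (
    "instances",
    "ruleset_references",
    "datamodel_space",
    "datamodel_external_id",
    "datamodel_version",
    "records_config",
)
RECORDS_CONFIG_KEYS = ("stream_id", "rule_set_id", "rule_set_version")


def missing_payload_keys(data):
    """List expected keys that are absent from the payload dict."""
    missing = [key for key in TOP_LEVEL_KEYS if key not in data]
    records = data.get("records_config")
    if isinstance(records, dict):
        # Only inspect nested keys when records_config itself is present.
        missing += [f"records_config.{key}" for key in RECORDS_CONFIG_KEYS if key not in records]
    return missing


# Deliberately incomplete payload for illustration.
demo_payload = {
    "instances": {"items": {}},
    "ruleset_references": [{"externalId": "my-ruleset", "version": "1.0.0"}],
    "datamodel_space": "my_space",
    "datamodel_external_id": "MyDataModel",
    "records_config": {"stream_id": "dq_validation_stream", "rule_set_id": "MyViewSHACLv1"},
}
print(missing_payload_keys(demo_payload))
# ['datamodel_version', 'records_config.rule_set_version']
```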
Optional: global uniqueness
To validate that a property value (e.g. workOrderNumber) appears only once across a view, declare dqs:uniquenessConstraint in your SHACL/RuleSet and redeploy. A scheduled dq-{view}-uniqueness workflow is created automatically when constraints are present. See Uniqueness.
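To make the idea concrete, the check amounts to grouping instances by the property value and reporting any value that appears more than once. The sketch below illustrates that logic locally on plain dicts; it is not how the scheduled workflow is implemented, and `find_duplicate_values` along with the instance shape (`externalId`, `properties`) are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_values(instances, property_name):
    """Map each value that occurs more than once to the external IDs that share it."""
    seen = defaultdict(list)
    for instance in instances:
        value = instance.get("properties", {}).get(property_name)
        if value is not None:  # instances without the property are ignored
            seen[value].append(instance["externalId"])
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


# Hypothetical instances; the shape is illustrative, not the DMS response format.
demo_instances = [
    {"externalId": "wo-1", "properties": {"workOrderNumber": "A100"}},
    {"externalId": "wo-2", "properties": {"workOrderNumber": "A200"}},
    {"externalId": "wo-3", "properties": {"workOrderNumber": "A100"}},
    {"externalId": "wo-4", "properties": {}},
]
duplicates = find_duplicate_values(demo_instances, "workOrderNumber")
print(duplicates)  # {'A100': ['wo-1', 'wo-3']}
```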
Next steps
- Usage Guide: Full context, journey, and supporting operations.
- Validation journey: Start from single-instance and follow the end-to-end sequence.
- Config: TOML and view config
- Notebook: Example notebook location and snippets
- Deploy: Deployment options in detail
- Run validation: Parameters and Records
- Invoke: Invoke helpers and payloads
- Uniqueness: SHACL-native global uniqueness
- API Reference: Full API