
Where to Start

Use this package to run SHACL validation locally, deploy validation infrastructure to CDF, and run the validation pipeline.

Quick Start Guide

1. Installation

pip install cognite-data-quality

After install, restart your kernel or interpreter so imports pick up the package.

2. Setup credentials

Create a TOML file with your CDF credentials (e.g. config.toml):

[cognite]
project = "your-project"
client_id = "your-client-id"
client_secret = "your-client-secret"
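To keep secrets out of source control, the file can be generated from environment variables instead of written by hand. The following is a minimal sketch: the helper names render_config and write_config_from_env and the environment variable names CDF_PROJECT, CDF_CLIENT_ID, and CDF_CLIENT_SECRET are hypothetical, not part of the package.

```python
import json
import os
from pathlib import Path
from typing import Mapping


def render_config(project: str, client_id: str, client_secret: str) -> str:
    """Render the [cognite] table as TOML text."""
    # json.dumps produces a double-quoted, escaped string, which is also a
    # valid TOML basic string for ordinary credential values.
    lines = [
        "[cognite]",
        f"project = {json.dumps(project)}",
        f"client_id = {json.dumps(client_id)}",
        f"client_secret = {json.dumps(client_secret)}",
    ]
    return "\n".join(lines) + "\n"


def write_config_from_env(
    path: str = "config.toml", env: Mapping[str, str] = os.environ
) -> Path:
    """Write the config file from (hypothetical) environment variables."""
    text = render_config(
        env["CDF_PROJECT"], env["CDF_CLIENT_ID"], env["CDF_CLIENT_SECRET"]
    )
    target = Path(path)
    target.write_text(text, encoding="utf-8")
    return target
```

Calling write_config_from_env() once before loading the client produces a config.toml with the same three keys shown above; a missing variable raises KeyError rather than writing an incomplete file.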

Load the client:

import cognite_data_quality

client = cognite_data_quality.load_cognite_client_from_toml("config.toml")
client.config.timeout = 300  # Optional: for large function deployments

3. Run validation locally

Use run_validation() to validate instances from DMS without deploying any Cognite Functions or workflows. This runs entirely in your local Python environment.

import cognite_data_quality

result = cognite_data_quality.run_validation(
    client=client,
    rules_path="shacl_rules/my_view_shacl.ttl",
    rules_format="ttl",
    datamodel=cognite_data_quality.DataModelConfig(
        space="my_space", external_id="MyDataModel", version="v1"
    ),
    instance_space="my_instances",
    post_to_records=True,
    records_config=cognite_data_quality.RecordsConfig(
        stream_id="dq_validation_stream",
        rule_set_id="MyViewSHACLv1",
        rule_set_version="1.0",
    ),
    limit=10,
    print_output=True,
)

print("Instance count:", result.instance_count)
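# Keep only results whose severity is a SHACL Violation (not Warning or Info)
violations = [
    v for v in result.violations
    if (v.resultSeverity or "").endswith("Violation")
]
print("Violations:", len(violations))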
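The count above covers only Violation-level results. To see how results are spread across severities, they can be tallied by the resultSeverity attribute. This is a sketch under assumptions: FakeViolation is a hypothetical stand-in for the entries of result.violations, and the exact severity string format (full IRI or prefixed name) is assumed rather than taken from the package.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FakeViolation:
    """Hypothetical stand-in for one entry of result.violations."""

    resultSeverity: Optional[str]


def count_by_severity(violations: Iterable) -> Counter:
    """Count entries by the local name of their resultSeverity."""
    counts: Counter = Counter()
    for v in violations:
        severity = getattr(v, "resultSeverity", None) or "Unknown"
        # Severities may look like "http://www.w3.org/ns/shacl#Violation"
        # or "sh:Violation"; keep only the part after the last '#' or ':'.
        local = severity.replace("#", ":").rsplit(":", 1)[-1]
        counts[local] += 1
    return counts


sample = [
    FakeViolation("http://www.w3.org/ns/shacl#Violation"),
    FakeViolation("sh:Warning"),
    FakeViolation("sh:Violation"),
    FakeViolation(None),
]
print(count_by_severity(sample))
```

With a real run, count_by_severity(result.violations) would replace the sample list.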

You can also use a YAML view config for rules_path with rules_format="yaml" so datamodel and instance space come from the config.

4. Production: deploy infrastructure

Deploy all validation infrastructure (containers, function, workflows, triggers):

from pathlib import Path
import cognite_data_quality

config_root = Path(".")  # Directory containing settings.yaml
settings_path = config_root / "settings.yaml"
views_dir = config_root / "views"

# Credentials for function (orchestrator needs them for triggers)
creds = client.config.credentials
function_secrets = None
if hasattr(creds, "client_id") and hasattr(creds, "client_secret"):
    function_secrets = {
        "client-id": creds.client_id,
        "client-secret": creds.client_secret,
    }

result = cognite_data_quality.deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_secrets=function_secrets,
    dry_run=False,
    force=False,
)

This call ensures that the Records, OrchestrationState, and FunctionValidationState containers exist; deploys the monitoring data model and views; and deploys the unified validation function together with the instance (and, optionally, time series) workflows and triggers.

5. Deploy validation pipeline (historic + incremental)

Run the full validation pipeline for a view (historic partitions, sync trigger, monitor schedule):

result = cognite_data_quality.deploy_validation_pipeline(
    client,
    settings_path=str(settings_path),
    view_external_id="LargeBoat",  # or "SmallBoat", "NavigationAid", etc.
    wait=True,
)
print("Orchestration ID:", result.get("orchestration_id"))
print("Partitions triggered:", result.get("partitions_triggered"), "/", result.get("partition_count"))
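A script that runs the pipeline unattended may want to check that every partition was triggered rather than only printing the counts. The sketch below uses the two result keys shown above and assumes their values are integers; all_partitions_triggered is a hypothetical helper, not part of the package.

```python
from typing import Any, Mapping


def all_partitions_triggered(result: Mapping[str, Any]) -> bool:
    """Return True only when both counts are present and equal."""
    triggered = result.get("partitions_triggered")
    total = result.get("partition_count")
    if triggered is None or total is None:
        # Missing keys are treated as "not confirmed", not as success.
        return False
    return triggered == total
```

For example, `if not all_partitions_triggered(result): raise RuntimeError("Some partitions were not triggered")` turns a partial run into a hard failure.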

6. Invoke deployed CDF Functions

Once the infrastructure is deployed, you can invoke the validation function running in CDF from Python. These helpers send the payload to the deployed Cognite Function; they do not run validation locally.

from cognite_data_quality import call_validate_instances_shacl

data = {
    "instances": {"items": {"n": [...]}},
    "shacl_rules_file_external_id": "my_shacl_rules",
    "datamodel_space": "my_space",
    "datamodel_external_id": "MyDataModel",
    "datamodel_version": "v1",
    "records_config": {
        "stream_id": "dq_validation_stream",
        "rule_set_id": "MyViewSHACLv1",
        "rule_set_version": "1.0",
    },
}
result = call_validate_instances_shacl(client, data)

Or use call_validation(client, validation_type="instance", data=data) for the unified entry point. See Invoke for all invoke helpers and payload details.
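When the same payload is built in several places, a small builder keeps the key names consistent. The sketch below reproduces only the keys from the example payload above; build_instance_payload is a hypothetical helper, not part of the package, and the structure of the instances argument is left to the caller.

```python
from typing import Any, Dict


def build_instance_payload(
    instances: Any,
    *,
    rules_file_external_id: str,
    space: str,
    datamodel_external_id: str,
    version: str,
    stream_id: str,
    rule_set_id: str,
    rule_set_version: str,
) -> Dict[str, Any]:
    """Assemble the payload dict with the keys used in the example above."""
    return {
        "instances": instances,
        "shacl_rules_file_external_id": rules_file_external_id,
        "datamodel_space": space,
        "datamodel_external_id": datamodel_external_id,
        "datamodel_version": version,
        "records_config": {
            "stream_id": stream_id,
            "rule_set_id": rule_set_id,
            "rule_set_version": rule_set_version,
        },
    }
```

The returned dict can be passed as data to call_validate_instances_shacl(client, data) or to call_validation(client, validation_type="instance", data=data).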

Next steps