Where to Start

Use this package to run SHACL validation locally, deploy validation infrastructure to CDF, and run the validation pipeline.

This quickstart follows the same model as the Usage docs:

  • Data Product + RuleSet is the primary rule management and persistence model.
  • YAML is the CDF Toolkit representation of Data Product/RuleSet bindings and runtime settings.
  • Direct TTL usage is a legacy transition path.

Quick Start Guide

1. Installation

pip install cognite-data-quality

After installing, restart your kernel or interpreter so that imports pick up the package.

2. Setup credentials

Create a TOML file with your CDF credentials (e.g. config.toml):

[cognite]
project = "your-project"
client_id = "your-client-id"
client_secret = "your-client-secret"

Load the client:

import cognite_data_quality

client = cognite_data_quality.load_cognite_client_from_toml("config.toml")
client.config.timeout = 300  # Optional: for large function deployments

3. Run validation locally

Use run_validation() to validate instances from DMS without deploying any Cognite Functions or workflows. This runs entirely in your local Python environment.

import cognite_data_quality

result = cognite_data_quality.run_validation(
    client=client,
    rules_path="views/my_view.yaml",
    rules_format="yaml",
    post_to_records=True,
    records_config=cognite_data_quality.RecordsConfig(
        stream_id="dq_validation_stream",
        rule_set_id="MyViewSHACLv1",
        rule_set_version="1.0",
    ),
    limit=10,
    print_output=True,
)

print("Instance count:", result.instance_count)
print("Violations:", len([v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]))
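
The count above only covers results whose severity ends in "Violation". SHACL also defines Warning and Info severities, so a per-severity breakdown can be useful. The sketch below assumes only what the snippet above shows: each result has a resultSeverity attribute that may be None. The helper count_by_severity is hypothetical and not part of the package.

```python
import re
from collections import Counter


def count_by_severity(violations) -> Counter:
    """Count results by the local name of their severity (hypothetical helper)."""
    counts = Counter()
    for v in violations:
        severity = getattr(v, "resultSeverity", None) or "Unknown"
        # Keep the part after the last '#', '/' or ':', e.g. "Violation"
        # from "http://www.w3.org/ns/shacl#Violation" or "sh:Violation".
        local_name = re.split(r"[#/:]", str(severity))[-1]
        counts[local_name] += 1
    return counts
```

With a result from run_validation(), this would be called as count_by_severity(result.violations).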

Recommended: use a YAML view config with RuleSet references so that local validation and deployment share the same contract. The direct TTL path remains supported as a legacy transition option.

4. Production: deploy infrastructure

Deploy all validation infrastructure (containers, function, workflows, triggers):

from pathlib import Path
import cognite_data_quality

config_root = Path(".")  # Directory containing settings.yaml
settings_path = config_root / "settings.yaml"
views_dir = config_root / "views"

# Credentials for function (orchestrator needs them for triggers)
creds = client.config.credentials
function_secrets = None
if hasattr(creds, "client_id") and hasattr(creds, "client_secret"):
    function_secrets = {
        "client-id": creds.client_id,
        "client-secret": creds.client_secret,
    }

result = cognite_data_quality.deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_secrets=function_secrets,
    dry_run=False,
    force=False,
)

This call does the following:

  • Ensures that the Records, OrchestrationState, and FunctionValidationState containers exist.
  • Deploys the monitoring data model and views.
  • Deploys the unified validation function, plus the instance (and optional time series) workflows and triggers.
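
Because the deployment reads settings.yaml and the views directory from config_root, a quick layout check before deploying can catch a wrong working directory. This is a minimal sketch. The helper check_config_layout is hypothetical, and the assumption that view configs are .yaml files directly under views/ comes from the views/my_view.yaml path used earlier.

```python
from pathlib import Path


def check_config_layout(config_root: Path) -> list[str]:
    """Return a list of problems found in the config directory (hypothetical helper)."""
    problems = []
    if not (config_root / "settings.yaml").is_file():
        problems.append("settings.yaml not found")
    views_dir = config_root / "views"
    if not views_dir.is_dir():
        problems.append("views/ directory not found")
    elif not any(views_dir.glob("*.yaml")):
        # Assumption: view configs are .yaml files directly under views/.
        problems.append("views/ contains no .yaml files")
    return problems


if __name__ == "__main__":
    for problem in check_config_layout(Path(".")):
        print("Config problem:", problem)
```

An empty list means the layout matches what the deployment example expects. It says nothing about whether the file contents are valid.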

5. Deploy validation pipeline (historic + incremental)

Run the full validation pipeline for a view (historic partitions, sync trigger, monitor schedule):

result = cognite_data_quality.deploy_validation_pipeline(
    client,
    settings_path=str(settings_path),
    view_external_id="LargeBoat",  # or "SmallBoat", "NavigationAid", etc.
    wait=True,
)
print("Orchestration ID:", result.get("orchestration_id"))
print("Partitions triggered:", result.get("partitions_triggered"), "/", result.get("partition_count"))
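
The two partition fields printed above can also be compared programmatically, for example in a CI job. The sketch below assumes the result is dict-like (as the result.get calls suggest) and that both fields are integers when present. The helper partitions_summary is hypothetical.

```python
def partitions_summary(result: dict) -> str:
    """Summarise partition progress from a pipeline result (hypothetical helper)."""
    # Treat missing or None values as zero; this is an assumption.
    triggered = result.get("partitions_triggered") or 0
    total = result.get("partition_count") or 0
    if total == 0:
        return "no partitions reported"
    if triggered < total:
        return f"incomplete: {triggered}/{total} partitions triggered"
    return f"complete: {triggered}/{total} partitions triggered"
```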

6. Invoke deployed CDF Functions

Once infrastructure is deployed, you can invoke the validation function running in CDF from Python. These helpers send the payload to the deployed Cognite Function; they do not run validation locally.

from cognite_data_quality import call_validate_instances_shacl

data = {
    "instances": {"items": {"n": [...]}},
    "ruleset_references": [{"externalId": "my-ruleset", "version": "1.0.0"}],
    "datamodel_space": "my_space",
    "datamodel_external_id": "MyDataModel",
    "datamodel_version": "v1",
    "records_config": {
        "stream_id": "dq_validation_stream",
        "rule_set_id": "MyViewSHACLv1",
        "rule_set_version": "1.0",
    },
}
result = call_validate_instances_shacl(client, data)

Or use call_validation(client, validation_type="instance", data=data) for the unified entry point. See Invoke for all invoke helpers and payload details.
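
When the same payload shape is built in several places, a small builder keeps the keys consistent. The sketch below only assembles the dictionary shown above. The helper build_instance_payload and its parameter names are hypothetical, and the instances value is passed through unchanged because its full structure is not shown here.

```python
def build_instance_payload(
    instances: dict,
    ruleset_refs: list[tuple[str, str]],
    space: str,
    model_external_id: str,
    model_version: str,
    records_config: dict,
) -> dict:
    """Assemble a payload with the keys from the example above (hypothetical helper)."""
    return {
        "instances": instances,
        # Each reference is an (externalId, version) pair.
        "ruleset_references": [
            {"externalId": ext_id, "version": version}
            for ext_id, version in ruleset_refs
        ],
        "datamodel_space": space,
        "datamodel_external_id": model_external_id,
        "datamodel_version": model_version,
        "records_config": dict(records_config),
    }
```

The returned dictionary can then be passed as data to call_validate_instances_shacl(client, data).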

Optional: global uniqueness

To validate that a property value (e.g. workOrderNumber) appears only once across a view, declare dqs:uniquenessConstraint in your SHACL/RuleSet and redeploy. A scheduled dq-{view}-uniqueness workflow is created automatically when constraints are present. See Uniqueness.
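
To illustrate what a global uniqueness check looks for, the sketch below finds duplicate property values in a list of plain dictionaries. It is a local illustration of the idea only, not how the dq-{view}-uniqueness workflow is implemented. The helper find_duplicates and the flat instance shape with an "externalId" key are assumptions.

```python
from collections import defaultdict


def find_duplicates(instances: list[dict], property_name: str) -> dict:
    """Map each duplicated property value to the externalIds that share it."""
    seen = defaultdict(list)
    for instance in instances:
        value = instance.get(property_name)
        if value is not None:  # instances without the property are ignored
            seen[value].append(instance.get("externalId"))
    return {value: ids for value, ids in seen.items() if len(ids) > 1}


if __name__ == "__main__":
    sample = [
        {"externalId": "wo-1", "workOrderNumber": "A100"},
        {"externalId": "wo-2", "workOrderNumber": "A200"},
        {"externalId": "wo-3", "workOrderNumber": "A100"},
    ]
    print(find_duplicates(sample, "workOrderNumber"))
```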

Next steps