Run validation

What this is

run_validation() reads instances from CDF DMS and validates them against SHACL rules. Use it for development, quick checks, or headless runs. No workflows or functions are required.

Note: run_validation() runs per-instance pyshacl on a limited sample. Global uniqueness (dqs:uniquenessConstraint in SHACL) requires the aggregate executor via deployed CDF Functions — see Uniqueness.

When to use it

Use run_validation() when you need fast iteration on rules and immediate feedback in Python.

Use deployed workflows when you need scheduled production execution, global uniqueness handling, and long-running orchestration.

User mental model

  • Input: SHACL rules + model context (datamodel, instance_space)
  • Execution: validate selected instances in-process
  • Output: ValidationResult (and optionally Records API writes)

Minimal happy path

Use RuleSet references (Data Product + RuleSet model) via YAML view config:

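from cognite_data_quality import run_validation

# `client` is assumed to be an already-configured Cognite client.
result = run_validation(
    client=client,
    rules_path="views/my_view.yaml",
    rules_format="yaml",
)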

When the YAML contains shacl_rules.ruleset_references, the runtime loads the referenced immutable RuleSet versions directly.

TTL rules with explicit config (legacy/transition)

Use a TTL file and pass datamodel, instance space, and optional records config:

from cognite_data_quality import (
    run_validation,
    DataModelConfig,
    RecordsConfig,
)

# `client` is assumed to be an already-configured Cognite client.

result = run_validation(
    client=client,
    rules_path="shacl_rules/my_view_shacl.ttl",
    rules_format="ttl",
    datamodel=DataModelConfig(space="my_space", external_id="MyDataModel", version="v1"),
    instance_space="my_instances",
    records_config=RecordsConfig(
        stream_id="dq_validation_stream",
        rule_set_id="MyViewSHACLv1",
        rule_set_version="1.0",
    ),
    limit=10,
    print_output=True,
    post_to_records=False,
)

print("Conforms:", result.conforms)
print("Instance count:", result.instance_count)
violations = [v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]
warnings = [v for v in result.violations if (v.resultSeverity or "").endswith("Warning")]
print("Violations:", len(violations))
print("Warnings:", len(warnings))

When post_to_records=True, you must provide records_config with at least stream_id. The package then ensures that the Records stream and container exist and posts each validation result as a record.
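The requirement above can be checked before a run starts. The following is a minimal sketch, not part of the package: check_records_args is a hypothetical helper, and it only assumes that the config object exposes a stream_id attribute, as RecordsConfig does in the example above.

```python
from typing import Any, Optional


def check_records_args(post_to_records: bool, records_config: Optional[Any]) -> None:
    """Fail early if posting is requested without a usable stream_id.

    Hypothetical pre-flight check; the package may perform its own validation.
    """
    if not post_to_records:
        # Nothing is written to Records, so no config is needed.
        return
    # Works for any object with a `stream_id` attribute (assumed for RecordsConfig).
    stream_id = getattr(records_config, "stream_id", None)
    if not stream_id:
        raise ValueError(
            "post_to_records=True requires records_config with a stream_id"
        )
```

Calling check_records_args(post_to_records, records_config) just before run_validation() surfaces a missing stream_id as an immediate error rather than as missing record writes later.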

YAML view config (CDF Toolkit deployment contract)

Use the same YAML view config as in deployment so that the data model and instance space come from the file. In the recommended model, the YAML references RuleSet versions and carries the deployment and runtime settings.

result = run_validation(
    client=client,
    rules_path="views/my_view.yaml",
    rules_format="yaml",
)

rules_format can be omitted when inferrable from the file extension (.ttl, .yaml, .json).
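To illustrate what inference from the extension could look like, here is a small sketch. It is an assumption about the behavior, not the package's actual code: infer_rules_format is a hypothetical name, and only the three extensions listed above are mapped.

```python
from pathlib import Path

# Extensions named in this page; anything else is treated as not inferrable.
_EXTENSION_TO_FORMAT = {".ttl": "ttl", ".yaml": "yaml", ".json": "json"}


def infer_rules_format(rules_path: str) -> str:
    """Return "ttl", "yaml", or "json" based on the file extension.

    Hypothetical helper: the real inference logic in the package may differ.
    """
    suffix = Path(rules_path).suffix.lower()
    if suffix not in _EXTENSION_TO_FORMAT:
        raise ValueError(
            f"Cannot infer rules_format from {rules_path!r}; pass it explicitly"
        )
    return _EXTENSION_TO_FORMAT[suffix]
```

If a path has an extension outside this list, passing rules_format explicitly avoids relying on inference at all.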

Runtime behavior

run_validation() returns a ValidationResult with:

  • conforms – True if no violations
  • violations – List of violation objects (each has resultSeverity, resultMessage, focusNode, etc.)
  • report_text – Text report
  • instance_count – Number of instances validated
  • records – Optional list of record payloads when posting to Records

Listing violations and warnings

violations = [v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]
warnings = [v for v in result.violations if (v.resultSeverity or "").endswith("Warning")]

for v in violations[:10]:
    print(v.resultMessage, "—", v.focusNode)
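For a quick overview, the same filtering idea can be generalized to a count per severity. This is a sketch under assumptions: severity_counts is a hypothetical helper, and the sample objects are stand-ins that only mimic the resultSeverity attribute of real violation objects.

```python
import re
from collections import Counter
from types import SimpleNamespace


def severity_counts(violations):
    """Count violation objects by the local name of their resultSeverity."""
    counts = Counter()
    for v in violations:
        severity = getattr(v, "resultSeverity", None) or ""
        # Keep the last segment, so both a full IRI ending in "#Violation"
        # and a prefixed name such as "sh:Violation" map to "Violation".
        label = re.split(r"[#:/]", severity)[-1] or "Unknown"
        counts[label] += 1
    return counts


# Stand-in objects for illustration only; real items come from result.violations.
sample = [
    SimpleNamespace(resultSeverity="http://www.w3.org/ns/shacl#Violation"),
    SimpleNamespace(resultSeverity="http://www.w3.org/ns/shacl#Warning"),
    SimpleNamespace(resultSeverity="sh:Violation"),
    SimpleNamespace(resultSeverity=None),
]
counts = severity_counts(sample)
```

With a real run, severity_counts(result.violations) gives the totals without building a separate list per severity.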

Parameters summary

Parameter        Description
client           Cognite client
rules_path       Path to TTL, JSON, or YAML rules/config
rules_format     "ttl", "json", or "yaml" (optional if inferrable from path)
datamodel        DataModelConfig(space, external_id, version); required if not in YAML
instance_space   Instance space in CDF; required if not in YAML
records_config   RecordsConfig(stream_id, rule_set_id, rule_set_version) for Records output
post_to_records  If True, post results to Records (requires records_config)
limit            Max instances to validate; None = no limit
print_output     If True, print per-instance validation output
verbose          Verbose logging (default True)

Best practices

  • Keep one canonical YAML view config and use it in both local validation and deployment.
  • Keep limit small during iteration, then increase for broader checks.
  • Use post_to_records=True only when you want persistent run outputs.
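The advice about keeping limit small and then increasing it can be automated with a small loop. The sketch below is one possible approach, not a package feature: validate_with_growing_limit is a hypothetical helper that takes any callable accepting a limit and returning an object with a conforms attribute, and the default limits are arbitrary.

```python
from typing import Any, Callable, Iterable, Optional, Tuple


def validate_with_growing_limit(
    validate: Callable[[int], Any],
    limits: Iterable[int] = (10, 100, 1000),
) -> Tuple[Optional[int], Any]:
    """Run `validate` with increasing limits and stop at the first failure.

    Returns the last limit tried and its result, or (None, None) if
    `limits` is empty.
    """
    last_limit, last_result = None, None
    for limit in limits:
        last_limit, last_result = limit, validate(limit)
        if not last_result.conforms:
            # Stop early: fix the rules or data before validating more instances.
            break
    return last_limit, last_result


# Intended usage (requires the package and a configured client):
# limit, result = validate_with_growing_limit(
#     lambda n: run_validation(client=client, rules_path="views/my_view.yaml", limit=n)
# )
```

Stopping at the first non-conforming run keeps the feedback loop short, since larger samples are only read once the smaller ones pass.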

Troubleshooting

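  • No violations but expected failures: confirm that the target class has instances in the selected instance space and data model, and that limit is not cutting the sample off before the failing instances.
  • Missing record writes: verify that post_to_records=True and that records_config.stream_id is set.
  • Uniqueness not detected: local per-instance validation cannot check global uniqueness; run the deployed aggregate executor (CDF Functions) instead.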
