Run validation
What this is
run_validation() reads instances from CDF DMS and validates them against SHACL rules. Use it for development, quick checks, or headless runs. No workflows or functions are required.
Note:
run_validation() runs per-instance pyshacl on a limited sample. Global uniqueness (dqs:uniquenessConstraint in SHACL) requires the aggregate executor via deployed CDF Functions; see Uniqueness.
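To see why a check on a limited sample cannot establish global uniqueness, consider the plain-Python illustration below. It does not use the package or pyshacl; the tag values and the helper name find_duplicates are made up for the example.

```python
from collections import Counter


def find_duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value in counts if counts[value] > 1]


# Hypothetical property values across all instances in a space.
all_tags = ["P-101", "P-102", "P-103", "P-104", "P-101"]

# A limited sample (comparable to limit=3) only sees the first three instances.
sample = all_tags[:3]

sample_duplicates = find_duplicates(sample)
global_duplicates = find_duplicates(all_tags)

print("Duplicates in sample:", sample_duplicates)
print("Duplicates overall:", global_duplicates)
```

The duplicate only shows up when every instance is considered together, which is the job of the aggregate executor rather than a local sampled run.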
When to use it
Use run_validation() when you need fast iteration on rules and immediate feedback in Python.
Use deployed workflows when you need scheduled production execution, global uniqueness handling, and long-running orchestration.
User mental model
- Input: SHACL rules + model context (datamodel, instance_space)
- Execution: validate selected instances in-process
- Output: ValidationResult (and optionally Records API writes)
Minimal happy path
RuleSet references (recommended)
Use RuleSet references (Data Product + RuleSet model) via YAML view config:
When the YAML contains shacl_rules.ruleset_references, the runtime loads the referenced immutable RuleSet versions directly.
TTL rules with explicit config (legacy/transition)
Use a TTL file and pass datamodel, instance space, and optional records config:
from cognite_data_quality import (
    run_validation,
    DataModelConfig,
    RecordsConfig,
)

# `client` is assumed to be an authenticated CogniteClient created elsewhere.
result = run_validation(
    client=client,
    rules_path="shacl_rules/my_view_shacl.ttl",
    rules_format="ttl",
    datamodel=DataModelConfig(space="my_space", external_id="MyDataModel", version="v1"),
    instance_space="my_instances",
    records_config=RecordsConfig(
        stream_id="dq_validation_stream",
        rule_set_id="MyViewSHACLv1",
        rule_set_version="1.0",
    ),
    limit=10,  # validate at most 10 instances
    print_output=True,
    post_to_records=False,  # keep results local; nothing is written to Records
)

print("Conforms:", result.conforms)
print("Instance count:", result.instance_count)

# result.violations holds results of every severity; split them by suffix.
violations = [v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]
warnings = [v for v in result.violations if (v.resultSeverity or "").endswith("Warning")]
print("Violations:", len(violations))
print("Warnings:", len(warnings))
When post_to_records=True, you must provide records_config with at least stream_id. The package then ensures that the Records stream/container exists and posts each validation result as a record.
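This requirement can be checked before calling run_validation(). The following is a minimal sketch and not part of the package: check_records_args is a hypothetical helper, and it takes the stream ID as a plain string rather than a RecordsConfig so that it runs without the package installed.

```python
from typing import Optional


def check_records_args(post_to_records: bool, stream_id: Optional[str]) -> None:
    """Raise ValueError if posting is requested without a stream ID."""
    if post_to_records and not stream_id:
        raise ValueError(
            "post_to_records=True requires records_config with a stream_id"
        )


# Posting disabled: no stream ID is needed.
check_records_args(post_to_records=False, stream_id=None)

# Posting enabled with a stream ID: passes.
check_records_args(post_to_records=True, stream_id="dq_validation_stream")

# Posting enabled without a stream ID: rejected.
try:
    check_records_args(post_to_records=True, stream_id=None)
    rejected = False
except ValueError:
    rejected = True

print("Rejected missing stream_id:", rejected)
```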
YAML view config (CDF Toolkit deployment contract)
Use the same YAML view config as deployment so datamodel and instance space come from the file. In the recommended model, YAML references RuleSet versions and carries deployment/runtime settings.
rules_format can be omitted when inferrable from the file extension (.ttl, .yaml, .json).
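Inference from the extension might work along the following lines. This is an illustrative sketch only: infer_rules_format is a hypothetical helper, it covers just the three extensions named above, and the package's actual logic may differ.

```python
from pathlib import Path

# Extensions named in this guide, mapped to rules_format values.
FORMAT_BY_SUFFIX = {".ttl": "ttl", ".yaml": "yaml", ".json": "json"}


def infer_rules_format(rules_path: str) -> str:
    """Map a rules file path to a rules_format value, or raise ValueError."""
    suffix = Path(rules_path).suffix
    if suffix not in FORMAT_BY_SUFFIX:
        raise ValueError(
            f"Cannot infer rules_format from {rules_path!r}; pass it explicitly"
        )
    return FORMAT_BY_SUFFIX[suffix]


inferred = infer_rules_format("shacl_rules/my_view_shacl.ttl")
print(inferred)
```

If the extension is not one of the three, passing rules_format explicitly avoids any ambiguity.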
Runtime behavior
run_validation() returns a ValidationResult with:
- conforms: True if no violations
- violations: List of violation objects (each has resultSeverity, resultMessage, focusNode, etc.)
- report_text: Text report
- instance_count: Number of instances validated
- records: Optional list of record payloads when posting to Records
Listing violations and warnings
violations = [v for v in result.violations if (v.resultSeverity or "").endswith("Violation")]
warnings = [v for v in result.violations if (v.resultSeverity or "").endswith("Warning")]
# Show the first ten violations with the node each one applies to.
for v in violations[:10]:
    print(v.resultMessage, "-", v.focusNode)
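For a larger result it can help to tally entries by severity and by focus node before reading individual messages. The sketch below uses a stand-in dataclass, FakeResult, with the three attributes used above so that it runs without the package; with a real ValidationResult you would iterate over result.violations instead. The severity strings are the standard SHACL IRIs, and whether the package returns them in exactly this form is an assumption, which is why the code only inspects the suffix.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeResult:
    """Stand-in for one entry of result.violations (hypothetical)."""
    resultSeverity: Optional[str]
    resultMessage: str
    focusNode: str


def severity_label(item) -> str:
    """Reduce a severity string to Violation, Warning, Info, or Other."""
    severity = item.resultSeverity or ""
    for label in ("Violation", "Warning", "Info"):
        if severity.endswith(label):
            return label
    return "Other"


SH = "http://www.w3.org/ns/shacl#"
sample_results = [
    FakeResult(SH + "Violation", "name is required", "my_instances:pump_1"),
    FakeResult(SH + "Warning", "description is short", "my_instances:pump_1"),
    FakeResult(SH + "Violation", "name is required", "my_instances:pump_2"),
    FakeResult(None, "unclassified result", "my_instances:pump_3"),
]

by_severity = Counter(severity_label(r) for r in sample_results)
by_node = Counter(r.focusNode for r in sample_results)

print("By severity:", dict(by_severity))
print("Most affected nodes:", by_node.most_common(2))
```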
Parameters summary
| Parameter | Description |
|---|---|
| client | Cognite client |
| rules_path | Path to TTL, JSON, or YAML rules/config |
| rules_format | "ttl", "json", or "yaml" (optional if inferrable from path) |
| datamodel | DataModelConfig(space, external_id, version); required if not in YAML |
| instance_space | Instance space in CDF; required if not in YAML |
| records_config | RecordsConfig(stream_id, rule_set_id, rule_set_version) for Records output |
| post_to_records | If True, post results to Records (requires records_config) |
| limit | Max instances to validate; None = no limit |
| print_output | If True, print per-instance validation output |
| verbose | Verbose logging (default True) |
Best practices
- Keep one canonical YAML view config and use it in both local validation and deployment.
- Keep limit small during iteration, then increase for broader checks.
- Use post_to_records=True only when you want persistent run outputs.
Troubleshooting
- No violations but expected failures: confirm that the target class has instances in the selected space/model.
- Missing record writes: verify that post_to_records=True and that records_config.stream_id is set.
- Uniqueness not detected: local per-instance validation does not check global uniqueness; run the scheduled uniqueness path (deployed CDF Functions) instead.