Rule sources
What this is
- Data Product + RuleSet (recommended): Manage and persist rules in the CDF RuleSet API, and bind them to Data Product versions for production lifecycle control.
- YAML (CDF Toolkit representation): Use YAML as the CDF Toolkit representation of Data Product and RuleSet bindings, plus deployment/runtime configuration (views, schedules, records, partitioning).
- TTL (legacy/transition): Direct path to a .ttl file (e.g. shacl_rules/my_view_shacl.ttl) while migrating to RuleSet-backed management.
- JSON: Data contract rule set with quality[].ruleDefinitions (JSON-LD).
See SHACL rules and CDF data models for how interfaces and sh:targetClass work with the loaded RDF graph, and what to do if a rule targets a class with zero instances (including the validation-time warning).
When to use each source
Choose source type by lifecycle maturity:
- Data Product + RuleSet as the default production model for persistence, versioning, and rollout.
- YAML as the CDF Toolkit representation that points to RuleSet references and carries runtime settings.
- TTL only as a legacy bridge until Data Product + RuleSet is rolled out everywhere.
- JSON when consuming external contract payloads already expressed as JSON-LD.
User mental model
- Author rules in one source form.
- Resolve to executable SHACL at runtime.
- Execute validation and/or inference in one engine pass.
Minimal happy path
Data Product + RuleSet (recommended)
Persist SHACL in the CDF RuleSet API and reference immutable versions from YAML:
shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"
    - externalId: "shared-base-rules"
      version: "2.1.0"
When config_source: "dataproduct" is enabled, deployment automatically publishes/reuses RuleSet versions and rewrites view/task payloads to use ruleset_references.
YAML (CDF Toolkit representation)
Use YAML for deployment/runtime contract (datamodel, instance scope, schedules, partitioning, records, and dataproduct bindings):
shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"
datamodel:
  space: "my_space"
  external_id: "MyDataModel"
  version: "v1"
instance_space: "my_instances"
TTL (legacy/transition)
Use a Turtle file directly only when RuleSet-backed management is not yet available in your environment:
Pass datamodel and instance_space (and optional records_config) to run_validation() when using TTL.
JSON rule set
Use a JSON ruleset that contains:
ruleset_references is set automatically when you deploy with config_source: "dataproduct" or reference external_dataproducts. The deploy function rewrites each view's shacl_rules to use ruleset_references pointing to the published RuleSet versions. The same applies to time series configs under timeseries/ when dataproducts: is set. You can also set ruleset_references manually if you have pre-published rule sets.
When both ruleset_references and file/external_id are present, ruleset_references takes precedence at runtime and deploy.
Time series configs
Timeseries YAML uses the same shacl_rules shape. In Data Product mode the handler receives ruleset_references in the workflow task payload; in YAML mode it uses shacl_rules_file_external_id.
# timeseries/my_sensors.yaml (Data Product mode, after publish)
shacl_rules:
  ruleset_references:
    - externalId: "my-sensors-shacl"
      version: "1.0.0"
See Deploy — Time series datapoint rules in Data Product mode.
When both ruleset_references and file/external_id are absent, the handler falls back to inline Turtle (shacl_rules string field in the function payload).
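The precedence described in this section can be summarised as a small resolver. This is a minimal sketch, not the handler's actual code: `resolve_rule_source` is a hypothetical name. It assumes the payload carries `shacl_rules` either as a mapping (with `ruleset_references`, `file`, or `external_id` keys) or as an inline Turtle string.

```python
from typing import Any, Dict, Tuple


def resolve_rule_source(payload: Dict[str, Any]) -> Tuple[str, Any]:
    """Return (source_kind, value) following the documented precedence.

    Hypothetical helper for illustration; the real handler may differ.
    """
    rules = payload.get("shacl_rules")

    # Fallback case: shacl_rules is an inline Turtle string, so neither
    # ruleset_references nor file/external_id can be present.
    if isinstance(rules, str):
        return ("inline_turtle", rules)

    rules = rules or {}

    # 1. ruleset_references wins whenever it is present and non-empty.
    refs = rules.get("ruleset_references")
    if refs:
        return ("ruleset_references", refs)

    # 2. Otherwise use a file path or a file external ID (assumed key names).
    for key in ("file", "external_id"):
        if rules.get(key):
            return (key, rules[key])

    raise ValueError("No SHACL rule source found in payload")
```

Returning the source kind alongside the value makes it easy to log which branch was taken, which helps when rules appear to be missing at runtime.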
Runtime behavior
Inference rules (SHACL-AF)
Quality shapes (sh:property, sh:sparql) and inference shapes (sh:rule) can coexist in the same TTL file. The engine runs both in a single pyshacl.validate(advanced=True) pass — no separate config is needed.
How it works
Each sh:SPARQLRule block uses sh:construct to materialise new RDF triples into the data graph. The engine then collects any subject typed dqs:RuleEngineResult and writes one RuleEngineResult record per result.
Mandatory predicates on a dqs:RuleEngineResult node
| Predicate | Required | Description |
|---|---|---|
| dqs:focusNode | yes | The focus node IRI ($this) |
| dqs:ruleId | yes | Short identifier used as ruleId in the record |
| dqs:resultType | yes | Typically "Inference" |
| dqs:resultValue | no | Scalar audit value (e.g. "Critical", "Overdue") |
| dqs:resultPayload | no | Stringified JSON body |
| dqs:causedBy | no | IRI of an upstream RuleEngineResult (lineage) |
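The required/optional split in the table could be checked before a record is written, roughly as follows. This is a sketch under assumptions: `to_record` is a hypothetical helper, the input is a plain dict keyed by predicate name, and the record field names are simply the predicate names without the `dqs:` prefix.

```python
from typing import Any, Dict

REQUIRED = ("dqs:focusNode", "dqs:ruleId", "dqs:resultType")
OPTIONAL = ("dqs:resultValue", "dqs:resultPayload", "dqs:causedBy")


def to_record(node: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a dqs:RuleEngineResult node and flatten it into a record dict."""
    missing = [p for p in REQUIRED if not node.get(p)]
    if missing:
        raise ValueError(f"RuleEngineResult is missing required predicates: {missing}")

    # Strip the "dqs:" prefix so "dqs:ruleId" becomes the field "ruleId".
    record = {p.split(":", 1)[1]: node[p] for p in REQUIRED}

    # Optional predicates are copied only when the rule emitted them.
    for p in OPTIONAL:
        if p in node:
            record[p.split(":", 1)[1]] = node[p]
    return record
```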
Example: criticality + overdue chained rules
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dqs: <http://purl.org/cognite/dqs#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sp: <http://purl.org/cognite/my_space/WorkOrder/> .

sp:WorkOrderInferenceRules
    a sh:NodeShape ;
    sh:targetClass sp:WorkOrder ;

    # Rule 1: classify priority into a criticality label
    sh:rule [
        a sh:SPARQLRule ;
        sh:order 1 ;
        dqs:ruleId "WO_Criticality" ;
        sh:construct """
            PREFIX sp: <http://purl.org/cognite/my_space/WorkOrder/>
            PREFIX dqs: <http://purl.org/cognite/dqs#>
            CONSTRUCT {
                ?result a dqs:RuleEngineResult ;
                    dqs:focusNode $this ;
                    dqs:ruleId "WO_Criticality" ;
                    dqs:resultType "Inference" ;
                    dqs:resultValue ?label .
            }
            WHERE {
                $this sp:priority ?p .
                BIND(IF(?p <= 2, "Critical", IF(?p = 3, "High", "Normal")) AS ?label)
                BIND(IRI(CONCAT("urn:dqs:wo:criticality:", STR($this))) AS ?result)
            }
        """ ;
    ] ;

    # Rule 2: flag open orders past their end time; link back to criticality result
    sh:rule [
        a sh:SPARQLRule ;
        sh:order 2 ;
        dqs:ruleId "WO_Overdue" ;
        dqs:dependsOn "WO_Criticality" ;
        sh:construct """
            PREFIX sp: <http://purl.org/cognite/my_space/WorkOrder/>
            PREFIX dqs: <http://purl.org/cognite/dqs#>
            PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
            CONSTRUCT {
                ?result a dqs:RuleEngineResult ;
                    dqs:focusNode $this ;
                    dqs:ruleId "WO_Overdue" ;
                    dqs:resultType "Inference" ;
                    dqs:resultValue "Overdue" ;
                    dqs:causedBy ?critResult .
            }
            WHERE {
                $this sp:status ?status ;
                    sp:scheduledEndTime ?endTime .
                FILTER(?status IN ("CREATED", "IN_PROGRESS"))
                FILTER(xsd:dateTime(?endTime) < NOW())
                OPTIONAL {
                    ?critResult a dqs:RuleEngineResult ;
                        dqs:focusNode $this ;
                        dqs:ruleId "WO_Criticality" .
                }
                BIND(IRI(CONCAT("urn:dqs:wo:overdue:", STR($this))) AS ?result)
            }
        """ ;
    ] .
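When writing test fixtures for these two rules, it can help to have the expected outcomes in plain code. The sketch below mirrors only the `BIND(IF(...))` and `FILTER` logic from the example. The function names are hypothetical, timestamps are assumed to be timezone-aware, and nothing here is part of the engine.

```python
from datetime import datetime, timezone
from typing import Optional


def criticality(priority: int) -> str:
    """Mirror of the nested IF in WO_Criticality."""
    if priority <= 2:
        return "Critical"
    return "High" if priority == 3 else "Normal"


def is_overdue(
    status: str, scheduled_end: datetime, now: Optional[datetime] = None
) -> bool:
    """Mirror of the two FILTER clauses in WO_Overdue."""
    # NOW() in SPARQL corresponds to the current time; allow overriding it in tests.
    now = now or datetime.now(timezone.utc)
    return status in ("CREATED", "IN_PROGRESS") and scheduled_end < now
```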
dqs:dependsOn and sh:order
Use dqs:dependsOn "<ruleId>" to express that a rule depends on the output of a prior rule within the same sh:NodeShape. The deploy-time validator (deploy_validation_infrastructure) enforces:
- The referenced ruleId exists in the same shape.
- sh:order of the downstream rule is strictly greater than the upstream rule.
- No cycles.
This is checked at deploy time only — at runtime pyshacl uses sh:order natively to sequence execution.
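The three checks can be expressed compactly. The following is a sketch of the idea only: it is not the code of deploy_validation_infrastructure, and `Rule` and `check_dependencies` are hypothetical names operating on rules already extracted from one sh:NodeShape.

```python
from typing import Dict, List, Optional, TypedDict


class Rule(TypedDict):
    rule_id: str
    order: int
    depends_on: Optional[str]


def check_dependencies(rules: List[Rule]) -> List[str]:
    """Return a list of problems; an empty list means the shape passes."""
    by_id: Dict[str, Rule] = {r["rule_id"]: r for r in rules}
    problems: List[str] = []

    for rule in rules:
        upstream_id = rule["depends_on"]
        if upstream_id is None:
            continue
        upstream = by_id.get(upstream_id)
        if upstream is None:
            # Check 1: the referenced ruleId must exist in the same shape.
            problems.append(f"{rule['rule_id']}: unknown dependency {upstream_id}")
        elif rule["order"] <= upstream["order"]:
            # Check 2: downstream sh:order must be strictly greater.
            problems.append(
                f"{rule['rule_id']}: sh:order must be greater than {upstream_id}"
            )

    # Check 3: follow each dependsOn chain and stop if a rule repeats.
    for rule in rules:
        seen = {rule["rule_id"]}
        current = rule
        while current["depends_on"] in by_id:
            current = by_id[current["depends_on"]]
            if current["rule_id"] in seen:
                problems.append(f"{rule['rule_id']}: dependency cycle")
                break
            seen.add(current["rule_id"])
    return problems
```

Because the order check is strict, any cycle also produces at least one sh:order violation. The separate cycle message just makes the root cause easier to spot.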
Global uniqueness rules (SHACL-native)
Declare global uniqueness in SHACL with dqs:uniquenessConstraint or dqs:unique. The deploy/runtime flow discovers these blocks and runs aggregate-first uniqueness as a dedicated scheduled workflow (not after each sync run).
Hard requirement: Uniqueness can only be used on properties that have an index defined in CDF Data Modeling. See CDF Data Modeling - Indexes.
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix dqs: <http://purl.org/cognite/dqs#> .
@prefix sp: <http://purl.org/cognite/my_space/WorkOrder/> .

sp:WorkOrderUniqueness
    a sh:NodeShape ;
    sh:targetClass sp:WorkOrder ;
    dqs:uniquenessConstraint [
        dqs:property "workOrderNumber" ;
    ] .
Guidelines:
- Keep uniqueness semantics in SHACL/RuleSet, not in view YAML fields.
- dqs:property is required.
- dqs:property must be indexed in CDF Data Modeling (hard requirement).
- Runtime ignores aggregate buckets where grouped property value is null.
- Runtime processes at most 100 duplicate values per run; fix reported duplicates and rerun to surface additional values.
- Runtime suppresses repeated failure writes for unchanged focusNodes (based on record vs instance lastUpdatedTime).
- Use SHACL/RuleSet as the single source of truth so DataProduct mode works without local YAML rule definitions.
- Do not model uniqueness as per-instance sh:sparql over $this; this risks incorrect semantics or O(n) aggregate calls.
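To illustrate the aggregate-first semantics in the guidelines (group by value, skip null buckets, cap at 100 duplicate values per run), here is an in-memory sketch. It stands in for the aggregate call against CDF, which is not shown in this document; `find_duplicate_values` and the list-of-dicts input are assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

MAX_DUPLICATE_VALUES = 100  # per-run cap stated in the guidelines


def find_duplicate_values(
    instances: Iterable[Dict[str, Any]], prop: str
) -> List[Any]:
    """Return up to MAX_DUPLICATE_VALUES property values that occur more than once."""
    # Count occurrences per value, ignoring instances where the value is null.
    counts = Counter(
        inst[prop] for inst in instances if inst.get(prop) is not None
    )
    duplicates = [value for value, n in counts.items() if n > 1]
    # Anything beyond the cap surfaces on a later run, after earlier ones are fixed.
    return duplicates[:MAX_DUPLICATE_VALUES]
```

One grouping pass over the property finds every duplicate value at once, whereas a per-instance sh:sparql check would need a separate aggregate lookup for each focus node.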
Interfaces and target class
When writing SHACL rules for CDF data models, avoid using sh:targetClass with an interface name (e.g. MMSINumber). The RDF graph is built from views, so nodes have concrete view types (e.g. NavigationAid, ShoreStation), not interface types. A rule that targets the interface will match no nodes and never report violations.
When you run validation, the runner warns if any rule targets a class that has zero instances in the graph. See SHACL rules and CDF data models for details and how to fix it (target concrete views or use a SPARQL-based sh:target).
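The zero-instance warning amounts to comparing each rule's target class with the rdf:type values actually present in the graph. A minimal sketch, assuming the graph is available as (subject, predicate, object) string triples; `targets_without_instances` is a hypothetical name, not the runner's API.

```python
from collections import Counter
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def targets_without_instances(
    triples: Iterable[Tuple[str, str, str]], target_classes: Iterable[str]
) -> List[str]:
    """Return target classes that no subject in the graph is typed as."""
    # Count how many subjects carry each rdf:type value.
    type_counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    # Counter returns 0 for classes that never appear as a type.
    return [cls for cls in target_classes if type_counts[cls] == 0]
```

With nodes typed only as NavigationAid and ShoreStation, a rule targeting the MMSINumber interface would be reported here, matching the behaviour described above.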
Best practices
- Keep one source of truth per deployment mode (file-based or RuleSet API).
- Prefer ruleset_references in DataProduct mode for immutable versioned execution.
- Keep dqs:ruleId stable across versions for analytics continuity.
Troubleshooting
- Rules not found at runtime: confirm ruleset_references/file precedence and payload fields.
- Missing inference output: ensure sh:rule constructs dqs:RuleEngineResult.
sh:ruleconstructsdqs:RuleEngineResult. - Missing uniqueness execution: ensure indexed property + uniqueness shape exists in active rules.