Rule sources

What this is

  • Data Product + RuleSet (recommended): Manage and persist rules in the CDF RuleSet API, and bind them to Data Product versions for production lifecycle control.
  • YAML (CDF Toolkit representation): Use YAML as the CDF Toolkit representation of Data Product and RuleSet bindings, plus deployment/runtime configuration (views, schedules, records, partitioning).
  • TTL (legacy/transition): Direct path to a .ttl file (e.g. shacl_rules/my_view_shacl.ttl) while migrating to RuleSet-backed management.
  • JSON: Data contract rule set with quality[].ruleDefinitions (JSON-LD).

See SHACL rules and CDF data models for how interfaces and sh:targetClass work with the loaded RDF graph, and what to do if a rule targets a class with zero instances (including the validation-time warning).

When to use each source

Choose source type by lifecycle maturity:

  • Data Product + RuleSet as the default production model for persistence, versioning, and rollout.
  • YAML as the CDF Toolkit representation that points to RuleSet references and carries runtime settings.
  • TTL only as a legacy bridge until Data Product + RuleSet is rolled out everywhere.
  • JSON when consuming external contract payloads already expressed as JSON-LD.

User mental model

  • Author rules in one source form.
  • Resolve to executable SHACL at runtime.
  • Execute validation and/or inference in one engine pass.

Minimal happy path

Persist SHACL in the CDF RuleSet API and reference immutable versions from YAML:

shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"
    - externalId: "shared-base-rules"
      version: "2.1.0"

When config_source: "dataproduct" is enabled, deployment automatically publishes/reuses RuleSet versions and rewrites view/task payloads to use ruleset_references.

YAML (CDF Toolkit representation)

Use YAML for the deployment/runtime contract (data model, instance scope, schedules, partitioning, records, and Data Product bindings):

shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"
datamodel:
  space: "my_space"
  external_id: "MyDataModel"
  version: "v1"
instance_space: "my_instances"

TTL (legacy/transition)

Use a Turtle file directly only when RuleSet-backed management is not yet available in your environment:

rules_path = "shacl_rules/my_view_shacl.ttl"

Pass datamodel and instance_space (and optional records_config) to run_validation() when using TTL.

JSON rule set

Use a JSON ruleset that contains:

{
  "quality": [
    {
      "ruleSetExternalId": "...",
      "ruleSetVersion": "1.0",
      "ruleDefinitions": []
    }
  ]
}
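To make the shape above concrete, here is a minimal sketch of reading such a payload. The helper name list_rule_sets is hypothetical and not part of any library; only the field names quality, ruleSetExternalId, ruleSetVersion, and ruleDefinitions come from the snippet above.

```python
import json
from typing import List, Tuple


def list_rule_sets(contract_json: str) -> List[Tuple[str, str, int]]:
    """Return (externalId, version, number of rule definitions) per quality entry.

    Hypothetical helper: it only inspects the envelope and does not
    interpret the JSON-LD inside ruleDefinitions.
    """
    contract = json.loads(contract_json)
    return [
        (
            entry["ruleSetExternalId"],
            entry["ruleSetVersion"],
            len(entry.get("ruleDefinitions", [])),
        )
        for entry in contract.get("quality", [])
    ]
```

The sketch deliberately stops at the envelope; turning the JSON-LD definitions into executable SHACL is left to the runtime.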

ruleset_references is set automatically when you deploy with config_source: "dataproduct" or reference external_dataproducts: the deploy function rewrites each view's shacl_rules to use ruleset_references pointing to the published RuleSet versions. The same applies to time series configs under timeseries/ when dataproducts: is set. You can also set ruleset_references manually if you have pre-published rule sets.

When both ruleset_references and file/external_id are present, ruleset_references takes precedence at runtime and deploy.

Time series configs

Timeseries YAML uses the same shacl_rules shape. In Data Product mode the handler receives ruleset_references in the workflow task payload; in YAML mode it uses shacl_rules_file_external_id.

# timeseries/my_sensors.yaml (Data Product mode — after publish)
shacl_rules:
  ruleset_references:
    - externalId: "my-sensors-shacl"
      version: "1.0.0"

See Deploy — Time series datapoint rules in Data Product mode.

When both ruleset_references and file/external_id are absent, the handler falls back to inline Turtle (shacl_rules string field in the function payload).
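The precedence and fallback described above can be sketched as a small resolver. This is an illustration under assumptions, not the handler's actual code: the function name resolve_rule_source and the dictionary keys file and external_id are hypothetical stand-ins for whatever the payload really uses.

```python
from typing import Any, Tuple, Union


def resolve_rule_source(shacl_rules: Union[dict, str]) -> Tuple[str, Any]:
    """Pick a rule source: ruleset_references, then file reference, then inline Turtle.

    Hypothetical helper; the key names "file" and "external_id" are assumptions.
    """
    if isinstance(shacl_rules, dict):
        refs = shacl_rules.get("ruleset_references") or []
        if refs:
            # ruleset_references wins even when a file reference is also present.
            return "ruleset_references", [(r["externalId"], r["version"]) for r in refs]
        file_ref = shacl_rules.get("file") or shacl_rules.get("external_id")
        if file_ref:
            return "file", file_ref
    elif isinstance(shacl_rules, str) and shacl_rules.strip():
        # Neither reference form is present: treat the string as inline Turtle.
        return "inline", shacl_rules
    raise ValueError("no SHACL rule source configured")
```

Returning a tagged tuple keeps the decision visible to callers, which helps when debugging a "rules not found at runtime" report.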

Runtime behavior

Inference rules (SHACL-AF)

Quality shapes (sh:property, sh:sparql) and inference shapes (sh:rule) can coexist in the same TTL file. The engine runs both in a single pyshacl.validate(advanced=True) pass — no separate config is needed.

How it works

Each sh:SPARQLRule block uses sh:construct to materialise new RDF triples into the data graph. The engine then collects any subject typed dqs:RuleEngineResult and writes one RuleEngineResult record per result.

Predicates on a dqs:RuleEngineResult node

Predicate          Required  Description
dqs:focusNode      yes       The focus node IRI ($this)
dqs:ruleId         yes       Short identifier used as ruleId in the record
dqs:resultType     yes       Typically "Inference"
dqs:resultValue    no        Scalar audit value (e.g. "Critical", "Overdue")
dqs:resultPayload  no        Stringified JSON body
dqs:causedBy       no        IRI of an upstream RuleEngineResult (lineage)
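As a rough illustration of the collection step, the sketch below gathers nodes typed dqs:RuleEngineResult and builds one record per node. It works on plain (subject, predicate, object) string tuples rather than a real RDF graph, and both the function name collect_results and the choice to raise on a missing required predicate are assumptions, not the engine's documented behavior.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

DQS = "http://purl.org/cognite/dqs#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
REQUIRED = ("focusNode", "ruleId", "resultType")
OPTIONAL = ("resultValue", "resultPayload", "causedBy")

Triple = Tuple[str, str, str]


def collect_results(triples: Iterable[Triple]) -> List[Dict[str, str]]:
    """Build one record per dqs:RuleEngineResult node (hypothetical helper)."""
    triples = list(triples)
    result_nodes = {
        s for s, p, o in triples if p == RDF_TYPE and o == DQS + "RuleEngineResult"
    }

    # Group the dqs: predicates of each result node by their local name.
    props: Dict[str, Dict[str, str]] = defaultdict(dict)
    for s, p, o in triples:
        if s in result_nodes and p.startswith(DQS):
            props[s][p[len(DQS):]] = o

    records = []
    for node in sorted(result_nodes):
        fields = props[node]
        missing = [name for name in REQUIRED if name not in fields]
        if missing:
            # Assumption: a node without the required predicates is an error.
            raise ValueError(f"{node} is missing required predicates: {missing}")
        records.append({n: fields[n] for n in REQUIRED + OPTIONAL if n in fields})
    return records
```

Optional predicates are copied only when present, so a record for a simple inference carries just the three required fields.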

Example: criticality + overdue chained rules

@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dqs: <http://purl.org/cognite/dqs#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix sp:  <http://purl.org/cognite/my_space/WorkOrder/> .

sp:WorkOrderInferenceRules
    a sh:NodeShape ;
    sh:targetClass sp:WorkOrder ;

    # Rule 1 — classify priority into a criticality label
    sh:rule [
        a sh:SPARQLRule ;
        sh:order 1 ;
        dqs:ruleId "WO_Criticality" ;
        sh:construct """
            PREFIX sp:  <http://purl.org/cognite/my_space/WorkOrder/>
            PREFIX dqs: <http://purl.org/cognite/dqs#>
            CONSTRUCT {
                ?result a dqs:RuleEngineResult ;
                        dqs:focusNode   $this ;
                        dqs:ruleId      "WO_Criticality" ;
                        dqs:resultType  "Inference" ;
                        dqs:resultValue ?label .
            }
            WHERE {
                $this sp:priority ?p .
                BIND(IF(?p <= 2, "Critical", IF(?p = 3, "High", "Normal")) AS ?label)
                BIND(IRI(CONCAT("urn:dqs:wo:criticality:", STR($this))) AS ?result)
            }
        """ ;
    ] ;

    # Rule 2 — flag open orders past their end time; link back to criticality result
    sh:rule [
        a sh:SPARQLRule ;
        sh:order 2 ;
        dqs:ruleId    "WO_Overdue" ;
        dqs:dependsOn "WO_Criticality" ;
        sh:construct """
            PREFIX sp:  <http://purl.org/cognite/my_space/WorkOrder/>
            PREFIX dqs: <http://purl.org/cognite/dqs#>
            PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
            CONSTRUCT {
                ?result a dqs:RuleEngineResult ;
                        dqs:focusNode   $this ;
                        dqs:ruleId      "WO_Overdue" ;
                        dqs:resultType  "Inference" ;
                        dqs:resultValue "Overdue" ;
                        dqs:causedBy    ?critResult .
            }
            WHERE {
                $this sp:status ?status ;
                      sp:scheduledEndTime ?endTime .
                FILTER(?status IN ("CREATED", "IN_PROGRESS"))
                FILTER(xsd:dateTime(?endTime) < NOW())
                OPTIONAL {
                    ?critResult a dqs:RuleEngineResult ;
                                dqs:focusNode $this ;
                                dqs:ruleId "WO_Criticality" .
                }
                BIND(IRI(CONCAT("urn:dqs:wo:overdue:", STR($this))) AS ?result)
            }
        """ ;
    ] .

dqs:dependsOn and sh:order

Use dqs:dependsOn "<ruleId>" to express that a rule depends on the output of a prior rule within the same sh:NodeShape. The deploy-time validator (deploy_validation_infrastructure) enforces:

  • The referenced ruleId exists in the same shape.
  • sh:order of the downstream rule is strictly greater than that of the upstream rule.
  • No cycles.

This is checked at deploy time only — at runtime pyshacl uses sh:order natively to sequence execution.
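The three checks can be sketched in a few lines. This is not the code of deploy_validation_infrastructure; the Rule tuple and check_depends_on are hypothetical names, and the sketch assumes each rule declares at most one dqs:dependsOn value.

```python
from typing import Dict, List, NamedTuple, Optional


class Rule(NamedTuple):
    rule_id: str
    order: int
    depends_on: Optional[str] = None


def check_depends_on(rules: List[Rule]) -> List[str]:
    """Return human-readable problems; an empty list means the shape passes."""
    by_id: Dict[str, Rule] = {r.rule_id: r for r in rules}
    problems: List[str] = []

    for rule in rules:
        if rule.depends_on is None:
            continue
        upstream = by_id.get(rule.depends_on)
        if upstream is None:
            problems.append(f"{rule.rule_id}: unknown dependency {rule.depends_on!r}")
        elif rule.order <= upstream.order:
            problems.append(
                f"{rule.rule_id}: sh:order {rule.order} must be greater than "
                f"{upstream.order} ({upstream.rule_id})"
            )

    # Cycle check: follow each dependsOn chain and stop if a rule repeats.
    for rule in rules:
        seen = {rule.rule_id}
        current = rule
        while current.depends_on in by_id:
            current = by_id[current.depends_on]
            if current.rule_id in seen:
                problems.append(f"{rule.rule_id}: dependency cycle")
                break
            seen.add(current.rule_id)
    return problems
```

Because sh:order must strictly increase along a dependency chain, any cycle also produces at least one ordering problem; the explicit walk only makes the message easier to read.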

Global uniqueness rules (SHACL-native)

Declare global uniqueness in SHACL with dqs:uniquenessConstraint or dqs:unique. The deploy/runtime flow discovers these blocks and runs aggregate-first uniqueness as a dedicated scheduled workflow (not after each sync run).

Hard requirement: Uniqueness can only be used on properties that have an index defined in CDF Data Modeling. See CDF Data Modeling - Indexes.

@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix dqs: <http://purl.org/cognite/dqs#> .
@prefix sp:  <http://purl.org/cognite/my_space/WorkOrder/> .

sp:WorkOrderUniqueness
    a sh:NodeShape ;
    sh:targetClass sp:WorkOrder ;
    dqs:uniquenessConstraint [
        dqs:property "workOrderNumber" ;
    ] .

Guidelines:

  • Keep uniqueness semantics in SHACL/RuleSet, not in view YAML fields.
  • dqs:property is required.
  • dqs:property must be indexed in CDF Data Modeling (hard requirement).
  • Runtime ignores aggregate buckets where the grouped property value is null.
  • Runtime processes at most 100 duplicate values per run; fix reported duplicates and rerun to surface additional values.
  • Runtime suppresses repeated failure writes for unchanged focus nodes (based on record vs instance lastUpdatedTime).
  • Use SHACL/RuleSet as the single source of truth so Data Product mode works without local YAML rule definitions.
  • Do not model uniqueness as per-instance sh:sparql over $this; this risks incorrect semantics or O(n) aggregate calls.
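Two of the runtime guidelines above (null buckets are ignored, and at most 100 duplicate values are handled per run) can be illustrated with an in-memory stand-in. The real flow uses aggregates over an indexed property; the function find_duplicate_values and the use of collections.Counter here are assumptions made purely for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Optional

MAX_DUPLICATE_VALUES_PER_RUN = 100  # limit stated in the guidelines


def find_duplicate_values(
    values: Iterable[Optional[Hashable]],
    limit: int = MAX_DUPLICATE_VALUES_PER_RUN,
) -> Dict[Hashable, int]:
    """Group by value, skip nulls, and report at most `limit` duplicated values."""
    # Null values do not form a bucket, mirroring the "ignore null" guideline.
    counts = Counter(v for v in values if v is not None)
    duplicates = {value: n for value, n in counts.items() if n > 1}
    # Cap the number of reported values; a later run surfaces the rest.
    return dict(list(duplicates.items())[:limit])
```

Grouping once over all values is what keeps this "aggregate-first": the cost does not grow with one query per instance, which is the problem with a per-instance sh:sparql check.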

Interfaces and target class

When writing SHACL rules for CDF data models, avoid using sh:targetClass with an interface name (e.g. MMSINumber). The RDF graph is built from views, so nodes have concrete view types (e.g. NavigationAid, ShoreStation), not interface types. A rule that targets the interface will match no nodes and never report violations.

When you run validation, the runner warns if any rule targets a class that has zero instances in the graph. See SHACL rules and CDF data models for details and how to fix it (target concrete views or use a SPARQL-based sh:target).
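The zero-instance check can be pictured as a set difference between the classes that shapes target and the classes that actually have instances. The sketch below uses plain string triples instead of an RDF library, and empty_target_classes is a hypothetical name, not the runner's API.

```python
from typing import Iterable, List, Tuple

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SH_TARGET_CLASS = "http://www.w3.org/ns/shacl#targetClass"

Triple = Tuple[str, str, str]


def empty_target_classes(shapes: Iterable[Triple], data: Iterable[Triple]) -> List[str]:
    """Return target classes that have no instances in the data graph."""
    instantiated = {o for _, p, o in data if p == RDF_TYPE}
    targets = {o for _, p, o in shapes if p == SH_TARGET_CLASS}
    return sorted(targets - instantiated)
```

A shape that targets an interface such as MMSINumber would show up in the returned list, because nodes in the graph are typed with concrete views like NavigationAid.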

Best practices

  • Keep one source of truth per deployment mode (file-based or RuleSet API).
  • Prefer ruleset_references in Data Product mode for immutable versioned execution.
  • Keep dqs:ruleId stable across versions for analytics continuity.

Troubleshooting

  • Rules not found at runtime: confirm ruleset_references/file precedence and payload fields.
  • Missing inference output: ensure sh:rule constructs dqs:RuleEngineResult.
  • Missing uniqueness execution: ensure the property is indexed and a uniqueness shape exists in the active rules.
