
Config (TOML and view config)

TOML credentials

Create a TOML file in your working directory (e.g. config.toml):

[cognite]
project = "<cdf-project>"
client_id = "<client-id>"
client_secret = "<client-secret>"

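Optional fields (depending on your login flow) go in the same [cognite] table, because TOML does not allow the table header to appear twice in one file: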

[cognite]
tenant_id = "<tenant-id>"
cdf_cluster = "<cdf-cluster>"   # e.g. "api", "westeurope-1"
login_flow = "client_credentials"

Load the client:

from cognite_data_quality import load_cognite_client_from_toml

client = load_cognite_client_from_toml("config.toml")

Use one TOML file per environment or project.

View config (YAML)

View configs are YAML files in the CDF Toolkit representation: they hold the Data Product and RuleSet bindings together with the deployment and runtime contract details. View configs live next to your rule artifacts (e.g. under config/environments/<env>/views/). Each YAML file describes one view: its data model, rule references, partition settings, schedules, and optional records config. A typical environment directory contains:

  • settings.yaml – environment settings (records stream, workflow prefix, etc.)
  • views/my_view.yaml – view-specific config (datamodel, SHACL rules, partition field)

When you call deploy_validation_infrastructure(settings_path=..., views_dir=...), the package reads all view YAMLs from views_dir and deploys workflows and triggers accordingly. When you call run_validation(rules_path="path/to/view.yaml", rules_format="yaml"), datamodel and instance space are taken from that YAML.
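The "one view per file" scan over views_dir can be pictured with the sketch below. It is an illustration under assumptions, not the package's code: find_view_configs is hypothetical, and treating both .yaml and .yml as view files while skipping subdirectories is a guess.

```python
import tempfile
from pathlib import Path


def find_view_configs(views_dir) -> list[Path]:
    """Return the view config files directly inside views_dir, sorted by name."""
    root = Path(views_dir)
    # Assumption: .yaml and .yml both count as view configs; subdirectories are skipped.
    return sorted(
        p for p in root.iterdir() if p.is_file() and p.suffix in (".yaml", ".yml")
    )


# Demo on a throwaway directory instead of a real config/environments/<env>/views/.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("my_view.yaml", "other_view.yml", "README.txt"):
        (Path(tmp) / name).write_text("")
    print([p.name for p in find_view_configs(tmp)])  # ['my_view.yaml', 'other_view.yml']
```

Each returned file would then be parsed as one view config; a stable sort keeps deploy order reproducible.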

Recommended persistence model:

  • Persist rules in Data Product + RuleSet (ruleset_references).
  • Use YAML for deployment/runtime behavior.
  • Use direct TTL file references only as a legacy migration path.

Uniqueness rules are declared in the SHACL/RuleSet TTL (dqs:uniquenessConstraint or dqs:unique), not in the view YAML. At deploy time, the SHACL is scanned and a scheduled uniqueness workflow is created only when such constraints exist. The optional YAML field uniqueness_cron can override or disable the schedule; see Uniqueness.
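The deploy-time decision ("create a uniqueness workflow only when constraints exist") can be approximated with a plain-text check. This is a rough sketch, not how the package scans SHACL: has_uniqueness_constraints is hypothetical, and a regex is not an RDF parser, so a comment or string literal mentioning a predicate would also count.

```python
import re

# Either predicate named in the docs. The longer name is tried first, and the
# trailing \b keeps names such as "dqs:uniqueKey" from matching "dqs:unique".
UNIQUENESS_PREDICATE = re.compile(r"\bdqs:(?:uniquenessConstraint|unique)\b")


def has_uniqueness_constraints(ttl_text: str) -> bool:
    """Rough check: does the TTL text mention a uniqueness predicate?"""
    return bool(UNIQUENESS_PREDICATE.search(ttl_text))


# Made-up TTL fragments; real constraint shapes may look different.
print(has_uniqueness_constraints("ex:AssetShape dqs:unique true ."))  # True
print(has_uniqueness_constraints("ex:AssetShape sh:minCount 1 ."))   # False
```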

Standard view config

view:
  space: "my_space"
  external_id: "MyView"
  version: "v1"

instance_space: "my_instances"

shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"

datamodel:
  space: "my_space"
  external_id: "MyDataModel"
  version: "v1"

validation:
  auto_load_depth: 2
  verbose: true

partition_count: 10
partition_field: lastUpdatedTime
chunk_size: 200

records:
  rule_set_id: "MyViewSHACLv1"
  rule_set_version: "1.0"
  # Optional: write records into the validated instance's space instead of the default records space
  use_instance_space: false
  # Optional: override the space used to write records (overrides settings.records.space)
  records_space: null

# Optional: declare which DataProduct(s) this view belongs to.
# Used when config_source: "dataproduct" in settings.yaml — the deploy function
# publishes SHACL rules to the RuleSet API and this view to the DataProduct API.
# dataproducts:
#   - external_id: "my-data-product"
#     version: "1.0.0"
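The two optional records fields interact with settings.records.space. The sketch below shows one way to resolve the target space. resolve_records_space and the "dq_records" space name are hypothetical, and the precedence when both use_instance_space and records_space are set is an assumption the docs do not state.

```python
def resolve_records_space(
    records_cfg: dict,
    settings_records_space: str,
    validated_instance_space: str,
) -> str:
    """Pick the space validation records are written to (illustrative precedence)."""
    # Assumption: use_instance_space wins over records_space when both are set.
    if records_cfg.get("use_instance_space", False):
        return validated_instance_space
    # A non-null records_space overrides settings.records.space.
    if records_cfg.get("records_space"):
        return records_cfg["records_space"]
    return settings_records_space


records = {
    "rule_set_id": "MyViewSHACLv1",
    "rule_set_version": "1.0",
    "use_instance_space": False,
    "records_space": None,
}
print(resolve_records_space(records, "dq_records", "my_instances"))  # dq_records
```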

Legacy file-based rule reference (transition only)

Use this only while migrating environments that do not yet persist rules via Data Product + RuleSet:

shacl_rules:
  file: "my_view_shacl.ttl"
  external_id: "my_view_shacl"
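During a migration it helps to list which views still point at a TTL file. This hypothetical helper works on view configs already loaded into dicts (for example with a YAML parser); preferring ruleset_references when both forms are present is an assumption.

```python
def rule_source_kind(shacl_rules: dict) -> str:
    """Classify a shacl_rules block as RuleSet-backed or legacy file-based."""
    # Assumption: ruleset_references takes priority if both forms are present.
    if shacl_rules.get("ruleset_references"):
        return "ruleset"
    if "file" in shacl_rules:
        return "legacy_file"
    raise ValueError("shacl_rules needs ruleset_references or a legacy file reference")


views = {
    "my_view": {"shacl_rules": {"ruleset_references": [
        {"externalId": "my-view-shacl-rules", "version": "1.0.0"}]}},
    "old_view": {"shacl_rules": {"file": "my_view_shacl.ttl", "external_id": "my_view_shacl"}},
}
to_migrate = [
    name for name, cfg in views.items()
    if rule_source_kind(cfg["shacl_rules"]) == "legacy_file"
]
print(to_migrate)  # ['old_view']
```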

DataProduct mode

When config_source: "dataproduct" is set in settings.yaml, the view YAML uses the same fields as the standard config above, with one difference: omit the records: block, because rule_set_id and rule_set_version are derived automatically from the DataProduct external_id and version. Add a dataproducts: block to declare which DataProduct(s) the view belongs to:

view:
  space: "my_space"
  external_id: "MyView"
  version: "v1"

instance_space: "my_instances"

shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"

validation:
  auto_load_depth: 2
  verbose: true

partition_count: 10
partition_field: lastUpdatedTime
chunk_size: 200
sync_batch_size: 1000
use_sync_cursor_mode: true
max_concurrent_executions: 1

dataproducts:
  - external_id: "my-data-product"
    version: "1.0.0"
    schema_space: "my_schema_space"
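The DataProduct-mode rules above (no records block, a dataproducts block, RuleSet references) can be checked before deploying. check_dataproduct_view is a hypothetical lint sketch over an already-parsed view config; it does not call the DataProduct or RuleSet APIs and does not treat schema_space as required.

```python
def check_dataproduct_view(view_cfg: dict) -> list[str]:
    """List problems with a view config meant for config_source: "dataproduct"."""
    problems = []
    # In this mode rule_set_id/rule_set_version are derived, so records must be absent.
    if "records" in view_cfg:
        problems.append("remove the records block")
    dataproducts = view_cfg.get("dataproducts") or []
    if not dataproducts:
        problems.append("add a dataproducts block naming at least one DataProduct")
    for i, dp in enumerate(dataproducts):
        for key in ("external_id", "version"):
            if not dp.get(key):
                problems.append(f"dataproducts[{i}] is missing {key}")
    if not view_cfg.get("shacl_rules", {}).get("ruleset_references"):
        problems.append("shacl_rules should use ruleset_references")
    return problems


view_cfg = {
    "view": {"space": "my_space", "external_id": "MyView", "version": "v1"},
    "instance_space": "my_instances",
    "shacl_rules": {"ruleset_references": [
        {"externalId": "my-view-shacl-rules", "version": "1.0.0"}]},
    "dataproducts": [{"external_id": "my-data-product", "version": "1.0.0",
                      "schema_space": "my_schema_space"}],
}
print(check_dataproduct_view(view_cfg))  # []
```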

When external_dataproducts is set in settings.yaml, view configs are built automatically from the CDF DataProduct API — no local YAML needed. See Deploy — externally-owned DataProducts.

When using the RuleSet API (DataProduct mode), shacl_rules should use ruleset_references:

shacl_rules:
  ruleset_references:
    - externalId: "my-view-shacl-rules"
      version: "1.0.0"

See Rule sources for details.

Sync cursor mode

For views with frequent updates, enable cursor-based incremental validation. The workflow trigger fires as a lightweight signal whose payload carries no instance data; the function then fetches and validates all changes since the last saved cursor.

# The view, instance_space, shacl_rules, datamodel, validation, and records
# sections are the same as in the standard config above. partition_count and
# partition_field are not used in sync cursor mode. Add the following fields:

chunk_size: 1000
sync_batch_size: 1000   # Trigger batch size (signal only, no instance payload)
use_sync_cursor_mode: true
max_concurrent_executions: 1  # Required — prevents cursor race conditions

initial_sync_cutoff is not set in the YAML; the orchestrator injects it automatically when creating the sync trigger to align with the historic validation cutoff date.
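The cursor loop described above can be simulated in memory to show why a single concurrent execution matters. Everything here is a hypothetical stand-in: Change, CursorStore, and run_sync_validation are not package APIs, an integer seq replaces both the opaque CDF sync cursor and the initial_sync_cutoff date, and using chunk_size to bound each validation call is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Change:
    seq: int  # position in the change stream; stands in for an opaque sync cursor
    external_id: str


@dataclass
class CursorStore:
    """Hypothetical persisted cursor store: one saved position per view."""
    cursors: dict = field(default_factory=dict)


def run_sync_validation(
    view: str,
    changes: list,
    store: CursorStore,
    chunk_size: int,
    initial_cutoff: int,
    validate: Callable[[list], None],
) -> int:
    """Handle one trigger signal: validate every change after the saved cursor."""
    # With no saved cursor yet, start from the injected initial cutoff.
    start = store.cursors.get(view, initial_cutoff)
    pending = sorted((c for c in changes if c.seq > start), key=lambda c: c.seq)
    # Assumption: chunk_size bounds how many changes go into one validation call.
    for i in range(0, len(pending), chunk_size):
        validate(pending[i : i + chunk_size])
    if pending:
        # Read, validate, then save. Two concurrent runs would read the same start
        # cursor and validate the same changes, hence max_concurrent_executions: 1.
        store.cursors[view] = pending[-1].seq
    return len(pending)


store = CursorStore()
stream = [Change(seq=s, external_id=f"asset-{s}") for s in range(1, 8)]
batches = []
print(run_sync_validation("MyView", stream, store, 3, 2, batches.append))  # 5
print([len(b) for b in batches])  # [3, 2]
print(run_sync_validation("MyView", stream, store, 3, 2, batches.append))  # 0
```

The second call finds nothing after the saved cursor, which is the property that makes signal-only triggers safe to fire repeatedly.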

See Rule sources and SHACL and CDF data models for how rules and views interact.

Environment variable interpolation in YAML

Deployment YAML files support environment variable interpolation in string values:

  • ${VAR}: required environment variable
  • ${VAR:-default}: optional variable with fallback default

Example:

runtime_credentials:
  workflows:
    client_id: ${DQ_WORKFLOW_CLIENT_ID}
    client_secret: ${DQ_WORKFLOW_CLIENT_SECRET}
  schedules:
    client_id: ${DQ_SCHEDULE_CLIENT_ID:-fallback-client}
    client_secret: ${DQ_SCHEDULE_CLIENT_SECRET:-fallback-secret}

If a required ${VAR} is missing, loading the YAML raises a validation error.
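The two interpolation forms can be sketched with a regular expression and a replacement callback. This is an illustration, not the package's loader: interpolate is hypothetical, ValueError stands in for the package's validation error, expanding placeholders inside longer strings is assumed, and treating an empty variable like an unset one for ${VAR:-default} follows POSIX shell behavior as an assumption.

```python
import os
import re
from collections.abc import Mapping

# ${NAME} or ${NAME:-default}; the default may be empty but may not contain "}".
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def interpolate(value: str, env: Mapping = os.environ) -> str:
    """Expand ${VAR} and ${VAR:-default} placeholders in one string value."""

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        current = env.get(name)
        if default is not None:
            # Shell-style ":-": fall back when the variable is unset or empty.
            return current if current else default
        if current is None:
            raise ValueError(f"required environment variable {name} is not set")
        return current

    return _PLACEHOLDER.sub(replace, value)


env = {"DQ_WORKFLOW_CLIENT_ID": "wf-client"}
print(interpolate("${DQ_WORKFLOW_CLIENT_ID}", env))                   # wf-client
print(interpolate("${DQ_SCHEDULE_CLIENT_ID:-fallback-client}", env))  # fallback-client
```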