Deploy
Quick flow
- Develop rules – Use `run_validation()` to validate against DMS. Iterate on rules until results are acceptable.
- Deploy – Use `deploy_validation_infrastructure()` to deploy the function, workflows, triggers, and data quality containers.
- Pipeline (optional) – Use `deploy_validation_pipeline()` to run historic validation for a view and set up sync and monitor schedules.
- Invoke – Use the invoke helpers to call deployed functions from Python when needed.
Use the same view config (YAML) and SHACL rules for both `run_validation()` and deployment so behavior stays consistent.
Deploy validation infrastructure (recommended)
Deploy all validation infrastructure for an environment in one call:
from pathlib import Path
from cognite_data_quality import deploy_validation_infrastructure, load_cognite_client_from_toml
client = load_cognite_client_from_toml("config.toml")
config_root = Path("config/environments/my_env")
settings_path = config_root / "settings.yaml"
views_dir = config_root / "views"
# Orchestrator needs credentials to create triggers
function_secrets = {
    "client-id": "your-client-id",
    "client-secret": "your-client-secret",
}

deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_secrets=function_secrets,
    force=False,
    dry_run=False,
)
This call:
- Ensures the Records API container (`dataQuality/DataQualityValidationRecord`)
- Ensures state containers (`OrchestrationState`, `FunctionValidationState`) and deploys the monitoring data model and views
- Deploys the unified validation function (with embedded handler code)
- Deploys instance validation workflows and triggers
- Deploys time series validation workflows when `timeseries_dir` is provided
- Uploads SHACL rules to CDF Files as needed
- Uploads each view config YAML to CDF Files (`{view_external_id}_view_config`, e.g. `Pump_view_config`)
- Uploads the environment settings to CDF Files (`{function_external_id}_settings`, e.g. `data-quality-validation_settings`)
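Since the uploaded file external IDs follow the two naming patterns above, you can compute them in code when fetching the files later. The helper names below are hypothetical (not part of the library); only the naming patterns themselves come from this guide:

```python
def view_config_file_id(view_external_id: str) -> str:
    """External ID of the view config YAML uploaded to CDF Files."""
    return f"{view_external_id}_view_config"


def settings_file_id(function_external_id: str) -> str:
    """External ID of the environment settings uploaded to CDF Files."""
    return f"{function_external_id}_settings"


print(view_config_file_id("Pump"))                  # Pump_view_config
print(settings_file_id("data-quality-validation"))  # data-quality-validation_settings
```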
You can extract credentials from the client when available:
creds = client.config.credentials
if hasattr(creds, "client_id") and hasattr(creds, "client_secret"):
    function_secrets = {"client-id": creds.client_id, "client-secret": creds.client_secret}
else:
    function_secrets = None  # Orchestrator triggers will fail without secrets
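The snippet above can be combined with the environment-variable fallback (`COGNITE_CLIENT_ID`, `COGNITE_CLIENT_SECRET`, described under Advanced parameters) into a single resolver. This is a sketch, not part of the library; `resolve_function_secrets` is a hypothetical name:

```python
import os


def resolve_function_secrets(credentials=None):
    """Build the function_secrets dict from client credentials when present,
    else fall back to environment variables. Returns None when no secrets
    are available (orchestrator triggers will then fail)."""
    if (
        credentials is not None
        and hasattr(credentials, "client_id")
        and hasattr(credentials, "client_secret")
    ):
        return {"client-id": credentials.client_id, "client-secret": credentials.client_secret}
    client_id = os.environ.get("COGNITE_CLIENT_ID")
    client_secret = os.environ.get("COGNITE_CLIENT_SECRET")
    if client_id and client_secret:
        return {"client-id": client_id, "client-secret": client_secret}
    return None
```

Pass the result straight to `deploy_validation_infrastructure(..., function_secrets=...)`.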
Deploy validation pipeline (historic + incremental)
After infrastructure is deployed, run the full validation pipeline for a view (historic partitions, sync trigger, monitor schedule).
From CDF Files (recommended — no local files needed)
Because deployment uploads view configs and settings to CDF Files, you can invoke the pipeline from any environment without needing the local config directory:
from cognite_data_quality import deploy_validation_pipeline
result = deploy_validation_pipeline(
    client,
    view_config_external_id="Pump_view_config",  # uploaded during deploy
)
Settings are fetched automatically from `data-quality-validation_settings` in CDF Files. Override with `settings_external_id` if your function uses a non-standard external ID:
result = deploy_validation_pipeline(
    client,
    view_config_external_id="Pump_view_config",
    settings_external_id="my-function_settings",
)
From local YAML files
result = deploy_validation_pipeline(
    client,
    settings_path=str(settings_path),
    view_external_id="MyView",
    wait=True,
)
# result: orchestration_id, status, partitions_triggered, partition_count,
# sync_trigger_external_id, monitor_schedule_name, distribution, ...
Settings and the view config are read from `settings_path` and the corresponding view YAML under the same environment. The pipeline triggers partitioned validation for historic data, then sets up the sync trigger and monitor schedule for incremental runs.
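The result fields listed in the comment above can be inspected like any dict, e.g. to log a one-line summary after triggering. The dict below only illustrates the shape; the field names come from the comment above, but every value is made up for the example:

```python
# Illustrative result shape only -- values are invented for the example.
result = {
    "orchestration_id": "abc123",
    "status": "triggered",
    "partitions_triggered": 8,
    "partition_count": 8,
    "sync_trigger_external_id": "MyView_sync_trigger",
    "monitor_schedule_name": "MyView_monitor",
}

summary = (
    f"{result['status']}: {result['partitions_triggered']}/{result['partition_count']} "
    f"partitions triggered (orchestration {result['orchestration_id']})"
)
print(summary)  # triggered: 8/8 partitions triggered (orchestration abc123)
```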
Advanced parameters
Function secrets
Required for the orchestrator to create triggers. Pass them explicitly or via the `COGNITE_CLIENT_ID` and `COGNITE_CLIENT_SECRET` environment variables:
deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_secrets={"client-id": "...", "client-secret": "..."},
)
Custom function external ID
Override the default function external ID (e.g. when multiple environments share a project):
deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    function_external_id="data-quality-validation-myenv",
)
Time series and SHACL dirs
Optional directories for time series configs and SHACL rules:
deploy_validation_infrastructure(
    client=client,
    settings_path=settings_path,
    views_dir=views_dir,
    timeseries_dir=config_root / "timeseries",
    shacl_rules_dir=config_root / "shacl_rules",
)
Dry run and force
- `dry_run=True` – Log what would be deployed without making changes.
- `force=True` – Redeploy even when content hashes match (useful after dependency or config changes).
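The idea behind the content-hash check that `force=True` overrides can be sketched with plain `hashlib`. This mirrors the concept only, not the library's actual implementation:

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """Stable digest of deployable content; equal digests mean 'skip redeploy'."""
    return hashlib.sha256(payload).hexdigest()


deployed = content_hash(b"rules-v1")  # digest recorded at last deploy
local = content_hash(b"rules-v1")     # digest of current local content

force = False
needs_deploy = force or (local != deployed)
print(needs_deploy)  # False: hashes match and force is off
```

With `force=True`, `needs_deploy` is true regardless of the hash comparison, which is why it helps after dependency changes that the hashed content does not capture.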
Incremental deployment (single view)
To deploy only the function and workflow for a single view config, see the API reference for `deploy_incremental()`. It takes a view config path and an optional `config_env`; the recommended path remains `deploy_validation_infrastructure()` plus `deploy_validation_pipeline()` as above.