Skip to main content

Advanced Configuration

All DataOps Runners including SOLE support Jinja templating to render YAML files.
Vault Credentials usage is another use-case of Jinja in DataOps.

Jinja can be used to compute and generate values at.
Jinja documentation can be referred for advanced configuration and use-cases as per one's need.

Below are few examples of Jinja template engine to generate definitions at run-time.
Refer to Jinja documentation for list of available filters, tests, etc.

Condition Specific Configuration#

Branch Specific Configuration#

Jinja Allows user to add configuration which is only enabled in specific environment.

In DataOps there are 4 types of environment:

  • PROD:
    This environment is created by pipelines run on master branch.

  • QA:
    This environment is created by pipelines run on qa branch.

  • DEV:
    This environment is created by pipelines run on dev branch.

  • FB:
    • This environment is created by pipelines run on any branch other than master, qa or dev.
    • The feature environment names are in format FB_BRANCH_NAME.
    • The branch name is in upper-case.

The following examples illustrate the usage of some branch-specific clauses, but the conditions can be utilized as per requirement.

PROD Only#

In the following example the namespacing parameter is set to prefix if the environment is PROD else both is used.

roles:  INGESTION_ROLE:    {% if env.DATAOPS_ENV_NAME == 'PROD' %}    namespacing: prefix    {% else %}    namespacing: both    {% endif %}

QA Only#

In the following example role INGESTION_ROLE is granted USAGE on database TEST_DATABASE only when environment is QA.

databases:  TEST_DATABASE:    grants:      USAGE:        - ACCOUNTADMIN        - TRANSFORMATION_ROLE        {% if env.DATAOPS_ENV_NAME == 'QA' %}        - INGESTION_ROLE        {% endif %}

Dev Only#

In the following example manage_mode is set to grants on table TEST_TABLE only when environment is DEV.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE:            {% if env.DATAOPS_ENV_NAME == "DEV" %}            manage_mode: grants            {% endif %}            grants:              SELECT:                - INGESTION_ROLE            columns:              TEST_COLUMN:                type: NUMBER(38,0)

FB Only#

In the following example role comment is added to schema TEST_SCHEMA only when environment is FB.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        {% if env.DATAOPS_ENV_NAME.startswith('FB') %}        comment: "TEST SCHEMA"        {% endif %}

PROD and QA Only#

In the following example, table TEST_TABLE_1 is only created when environment is PROD or QA.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          {% if env.DATAOPS_ENV_NAME == "PROD" or env.DATAOPS_ENV_NAME == "QA" %}          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)          {% endif %}          TEST_TABLE_2:            columns:              COLUMN_1:                type: NUMBER(38, 0)

Every Env except QA#

In the following example, column COLUMN_2 is only created when environment is not QA.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)              {% if env.DATAOPS_ENV_NAME != "QA" %}              COLUMN_2:                type: NUMBER(38, 0)              {% endif %}

Every Env except FB#

In the following example, column COLUMN_2 is only created when environment is not FB.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)              {% if not env.DATAOPS_ENV_NAME.startswith('FB') %}              COLUMN_2:                type: NUMBER(38, 0)              {% endif %}

Environment Specific value#

In the following example, the value of namespacing depends on the environment:

  • If environment is PROD, then namespacing is set to none.
  • If environment is QA, then namespacing is set to prefix.
  • If environment is DEV, then namespacing is set to suffix.
  • For all other environments, namespacing is set to both.
    roles:  INGESTION_ROLE:    {% if env.DATAOPS_ENV_NAME == 'PROD' %}    namespacing: none    {% elif env.DATAOPS_ENV_NAME == 'QA' %}    namespacing: prefix    {% elif env.DATAOPS_ENV_NAME == 'DEV' %}    namespacing: suffix    {% else %}    namespacing: both    {% endif %}

Using Configuration to Generate Values#

info

This functionality is only usable in SOLE

Variables can be defined in YAML files as configuration. These variables can be used at run-time to generate definitions for SOLE.

The directory path in which the such files are defined must be set in variable DATAOPS_CONFIGURATION_DIR.

info

YAML files in this directory do not support Jinja. These files are used variable declaration for Jinja

Object Parameter Calculation#

Defined variables can be used to generate value for a parameter of an Object.

In the following example we have the values.yaml defined to be used as variables

warehouse_sizes:    - "X-SMALL"    - "SMALL"    - "MEDIUM"    - "LARGE"    - "X-LARGE"    - "2X-LARGE"    - "3X-LARGE"    - "4X-LARGE"cluster_count:    "X-SMALL":       max_cluster_count: 1      min_cluster_count: 1    "SMALL":       max_cluster_count: 2      min_cluster_count: 1    "MEDIUM":       max_cluster_count: 3      min_cluster_count: 1    "LARGE":       max_cluster_count: 8      min_cluster_count: 1    "X-LARGE":       max_cluster_count: 10      min_cluster_count: 1    "2X-LARGE":       max_cluster_count: 10      min_cluster_count: 2    "3X-LARGE":       max_cluster_count: 10      min_cluster_count: 3    "4X-LARGE":       max_cluster_count: 10      min_cluster_count: 4

The variables defined in the YAML file would be used as input when rendering object definitions with Jinja.

Following is file with resource monitor definition which uses the input variables

{% if env.MAX_CLUSTER_COUNT is defined and env.MAX_CLUSTER_COUNT|int != 0 and env.MAX_CLUSTER_COUNT|int >= 1 and env.MAX_CLUSTER_COUNT|int <= 10 %}  {% set max_cluster_count = env.MAX_CLUSTER_COUNT %}{% else %}  {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}    {% set max_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['max_cluster_count'] %}  {% else %}    {% set max_cluster_count = 1 %}  {% endif %}{% endif %}
{% if env.MIN_CLUSTER_COUNT is defined and env.MIN_CLUSTER_COUNT|int != 0 and env.MIN_CLUSTER_COUNT|int >= 1 and env.MIN_CLUSTER_COUNT|int <= 10 %}  {% set min_cluster_count = env.MIN_CLUSTER_COUNT and env.MIN_CLUSTER_COUNT|int != 0 %}{% else %}  {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}    {% set min_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['min_cluster_count'] %}  {% else %}    {% set min_cluster_count = 1 %}  {% endif %}{% endif %}
warehouses:  WAREHOUSE_1:    max_cluster_count: {{ max_cluster_count }}    min_cluster_count: {{ min_cluster_count }}

Depending of the value of MAX_CLUSTER_COUNT, MIN_CLUSTER_COUNT, WAREHOUSE_SIZE(which are defined as environment variable), the value of max_cluster_count and min_cluster_count are calculated.
The calculated values are then utilized as parameters max_cluster_count and min_cluster_count respectively in a warehouse.

The value of max_cluster_count(and similarly min_cluster_count) would be calculated as:

  • If MAX_CLUSTER_COUNT is set and it is a valid integer between 1 and 10(inclusive), then that value is set as max_cluster_count.
  • If MAX_CLUSTER_COUNT is invalid or missing, and valid value of WAREHOUSE_SIZE is defined, then value is set to integer set in map cluster_count.
  • If MAX_CLUSTER_COUNT and WAREHOUSE_SIZE are invalid or missing, then value is set to 1.

If both MAX_CLUSTER_COUNT and MIN_CLUSTER_COUNT are not set and WAREHOUSE_SIZE is set to 2X-LARGE, then value of max_cluster_count and min_cluster_count would be 10 and 2 respectively.

Configurations can be set as per requirements.

Last updated on