Advanced Configuration

All DataOps Runners including SOLE support Jinja templating to render YAML files.
Vault Credentials usage is another use-case of Jinja in DataOps.

Jinja can be used to compute and generate values at.
Jinja documentation can be referred for advanced configuration and use-cases as per one's need.

Below are few examples of Jinja template engine to generate definitions at run-time.
Refer to Jinja documentation for list of available filters, tests, etc.

Condition Specific Configuration#

Branch Specific Configuration#

Jinja Allows user to add configuration which is only enabled in specific environment.

In DataOps there are 4 types of environment:

PROD:
This environment is created by pipelines run on master branch.
QA:
This environment is created by pipelines run on qa branch.
DEV:
This environment is created by pipelines run on dev branch.
FB:
- This environment is created by pipelines run on any branch other than master, qa or dev.
- The feature environment names are in format FB_BRANCH_NAME.
- The branch name is in upper-case.

The following examples illustrate the usage of some branch-specific clauses, but the conditions can be utilized as per requirement.

PROD Only#

In the following example the namespacing parameter is set to prefix if the environment is PROD else both is used.

roles:  INGESTION_ROLE:    {% if env.DATAOPS_ENV_NAME == 'PROD' %}    namespacing: prefix    {% else %}    namespacing: both    {% endif %}

QA Only#

In the following example role INGESTION_ROLE is granted USAGE on database TEST_DATABASE only when environment is QA.

databases:  TEST_DATABASE:    grants:      USAGE:        - ACCOUNTADMIN        - TRANSFORMATION_ROLE        {% if env.DATAOPS_ENV_NAME == 'QA' %}        - INGESTION_ROLE        {% endif %}

Dev Only#

In the following example manage_mode is set to grants on table TEST_TABLE only when environment is DEV.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE:            {% if env.DATAOPS_ENV_NAME == "DEV" %}            manage_mode: grants            {% endif %}            grants:              SELECT:                - INGESTION_ROLE            columns:              TEST_COLUMN:                type: NUMBER(38,0)

FB Only#

In the following example role comment is added to schema TEST_SCHEMA only when environment is FB.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        {% if env.DATAOPS_ENV_NAME.startswith('FB') %}        comment: "TEST SCHEMA"        {% endif %}

PROD and QA Only#

In the following example, table TEST_TABLE_1 is only created when environment is PROD or QA.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          {% if env.DATAOPS_ENV_NAME == "PROD" or env.DATAOPS_ENV_NAME == "QA" %}          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)          {% endif %}          TEST_TABLE_2:            columns:              COLUMN_1:                type: NUMBER(38, 0)

Every Env except QA#

In the following example, column COLUMN_2 is only created when environment is not QA.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)              {% if env.DATAOPS_ENV_NAME != "QA" %}              COLUMN_2:                type: NUMBER(38, 0)              {% endif %}

Every Env except FB#

In the following example, column COLUMN_2 is only created when environment is not FB.

databases:  TEST_DATABASE:    schemas:      TEST_SCHEMA:        tables:          TEST_TABLE_1:            columns:              COLUMN_1:                type: NUMBER(38, 0)              {% if not env.DATAOPS_ENV_NAME.startswith('FB') %}              COLUMN_2:                type: NUMBER(38, 0)              {% endif %}

Environment Specific value#

In the following example, the value of namespacing depends on the environment:

If environment is PROD, then namespacing is set to none.
If environment is QA, then namespacing is set to prefix.
If environment is DEV, then namespacing is set to suffix.

For all other environments, namespacing is set to both.

roles:  INGESTION_ROLE:    {% if env.DATAOPS_ENV_NAME == 'PROD' %}    namespacing: none    {% elif env.DATAOPS_ENV_NAME == 'QA' %}    namespacing: prefix    {% elif env.DATAOPS_ENV_NAME == 'DEV' %}    namespacing: suffix    {% else %}    namespacing: both    {% endif %}

Using Configuration to Generate Values#

info

This functionality is only usable in SOLE

Variables can be defined in YAML files as configuration. These variables can be used at run-time to generate definitions for SOLE.

The directory path in which the such files are defined must be set in variable DATAOPS_CONFIGURATION_DIR.

info

YAML files in this directory do not support Jinja. These files are used variable declaration for Jinja

Object Parameter Calculation#

Defined variables can be used to generate value for a parameter of an Object.

In the following example we have the values.yaml defined to be used as variables

warehouse_sizes:    - "X-SMALL"    - "SMALL"    - "MEDIUM"    - "LARGE"    - "X-LARGE"    - "2X-LARGE"    - "3X-LARGE"    - "4X-LARGE"cluster_count:    "X-SMALL":       max_cluster_count: 1      min_cluster_count: 1    "SMALL":       max_cluster_count: 2      min_cluster_count: 1    "MEDIUM":       max_cluster_count: 3      min_cluster_count: 1    "LARGE":       max_cluster_count: 8      min_cluster_count: 1    "X-LARGE":       max_cluster_count: 10      min_cluster_count: 1    "2X-LARGE":       max_cluster_count: 10      min_cluster_count: 2    "3X-LARGE":       max_cluster_count: 10      min_cluster_count: 3    "4X-LARGE":       max_cluster_count: 10      min_cluster_count: 4

The variables defined in the YAML file would be used as input when rendering object definitions with Jinja.

Following is file with resource monitor definition which uses the input variables

{% if env.MAX_CLUSTER_COUNT is defined and env.MAX_CLUSTER_COUNT|int != 0 and env.MAX_CLUSTER_COUNT|int >= 1 and env.MAX_CLUSTER_COUNT|int <= 10 %}  {% set max_cluster_count = env.MAX_CLUSTER_COUNT %}{% else %}  {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}    {% set max_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['max_cluster_count'] %}  {% else %}    {% set max_cluster_count = 1 %}  {% endif %}{% endif %}
{% if env.MIN_CLUSTER_COUNT is defined and env.MIN_CLUSTER_COUNT|int != 0 and env.MIN_CLUSTER_COUNT|int >= 1 and env.MIN_CLUSTER_COUNT|int <= 10 %}  {% set min_cluster_count = env.MIN_CLUSTER_COUNT and env.MIN_CLUSTER_COUNT|int != 0 %}{% else %}  {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %}    {% set min_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['min_cluster_count'] %}  {% else %}    {% set min_cluster_count = 1 %}  {% endif %}{% endif %}
warehouses:  WAREHOUSE_1:    max_cluster_count: {{ max_cluster_count }}    min_cluster_count: {{ min_cluster_count }}

Depending of the value of MAX_CLUSTER_COUNT, MIN_CLUSTER_COUNT, WAREHOUSE_SIZE(which are defined as environment variable), the value of max_cluster_count and min_cluster_count are calculated.
The calculated values are then utilized as parameters max_cluster_count and min_cluster_count respectively in a warehouse.

The value of max_cluster_count(and similarly min_cluster_count) would be calculated as:

If MAX_CLUSTER_COUNT is set and it is a valid integer between 1 and 10(inclusive), then that value is set as max_cluster_count.
If MAX_CLUSTER_COUNT is invalid or missing, and valid value of WAREHOUSE_SIZE is defined, then value is set to integer set in map cluster_count.
If MAX_CLUSTER_COUNT and WAREHOUSE_SIZE are invalid or missing, then value is set to 1.

If both MAX_CLUSTER_COUNT and MIN_CLUSTER_COUNT are not set and WAREHOUSE_SIZE is set to 2X-LARGE, then value of max_cluster_count and min_cluster_count would be 10 and 2 respectively.

Configurations can be set as per requirements.