Advanced Configuration
All DataOps Runners including SOLE support Jinja templating to render YAML files.
Vault Credentials usage is another use-case of Jinja in DataOps.
Jinja can be used to compute and generate values at.
Jinja documentation can be referred for advanced configuration and use-cases as per one's need.
Below are few examples of Jinja template engine to generate definitions at run-time.
Refer to Jinja documentation for list of available filters, tests, etc.
#
Condition Specific Configuration#
Branch Specific ConfigurationJinja Allows user to add configuration which is only enabled in specific environment.
In DataOps there are 4 types of environment:
- PROD:
This environment is created by pipelines run onmaster
branch. - QA:
This environment is created by pipelines run onqa
branch. - DEV:
This environment is created by pipelines run ondev
branch. - FB:
- This environment is created by pipelines run on any branch other than
master
,qa
ordev
. - The feature environment names are in format FB_BRANCH_NAME.
- The branch name is in upper-case.
- This environment is created by pipelines run on any branch other than
The following examples illustrate the usage of some branch-specific clauses, but the conditions can be utilized as per requirement.
#
PROD OnlyIn the following example the namespacing
parameter is set to prefix if the environment is PROD else both is used.
roles: INGESTION_ROLE: {% if env.DATAOPS_ENV_NAME == 'PROD' %} namespacing: prefix {% else %} namespacing: both {% endif %}
#
QA OnlyIn the following example role INGESTION_ROLE is granted USAGE
on database TEST_DATABASE only when environment is QA.
databases: TEST_DATABASE: grants: USAGE: - ACCOUNTADMIN - TRANSFORMATION_ROLE {% if env.DATAOPS_ENV_NAME == 'QA' %} - INGESTION_ROLE {% endif %}
#
Dev OnlyIn the following example manage_mode
is set to grants
on table TEST_TABLE only when environment is DEV.
databases: TEST_DATABASE: schemas: TEST_SCHEMA: tables: TEST_TABLE: {% if env.DATAOPS_ENV_NAME == "DEV" %} manage_mode: grants {% endif %} grants: SELECT: - INGESTION_ROLE columns: TEST_COLUMN: type: NUMBER(38,0)
#
FB OnlyIn the following example role comment
is added to schema TEST_SCHEMA only when environment is FB.
databases: TEST_DATABASE: schemas: TEST_SCHEMA: {% if env.DATAOPS_ENV_NAME.startswith('FB') %} comment: "TEST SCHEMA" {% endif %}
#
PROD and QA OnlyIn the following example, table TEST_TABLE_1 is only created when environment is PROD or QA.
databases: TEST_DATABASE: schemas: TEST_SCHEMA: tables: {% if env.DATAOPS_ENV_NAME == "PROD" or env.DATAOPS_ENV_NAME == "QA" %} TEST_TABLE_1: columns: COLUMN_1: type: NUMBER(38, 0) {% endif %} TEST_TABLE_2: columns: COLUMN_1: type: NUMBER(38, 0)
#
Every Env except QAIn the following example, column COLUMN_2 is only created when environment is not QA.
databases: TEST_DATABASE: schemas: TEST_SCHEMA: tables: TEST_TABLE_1: columns: COLUMN_1: type: NUMBER(38, 0) {% if env.DATAOPS_ENV_NAME != "QA" %} COLUMN_2: type: NUMBER(38, 0) {% endif %}
#
Every Env except FBIn the following example, column COLUMN_2 is only created when environment is not FB.
databases: TEST_DATABASE: schemas: TEST_SCHEMA: tables: TEST_TABLE_1: columns: COLUMN_1: type: NUMBER(38, 0) {% if not env.DATAOPS_ENV_NAME.startswith('FB') %} COLUMN_2: type: NUMBER(38, 0) {% endif %}
#
Environment Specific valueIn the following example, the value of namespacing depends on the environment:
- If environment is PROD, then
namespacing
is set to none. - If environment is QA, then
namespacing
is set to prefix. - If environment is DEV, then
namespacing
is set to suffix. - For all other environments,
namespacing
is set to both.roles: INGESTION_ROLE: {% if env.DATAOPS_ENV_NAME == 'PROD' %} namespacing: none {% elif env.DATAOPS_ENV_NAME == 'QA' %} namespacing: prefix {% elif env.DATAOPS_ENV_NAME == 'DEV' %} namespacing: suffix {% else %} namespacing: both {% endif %}
#
Using Configuration to Generate Valuesinfo
This functionality is only usable in SOLE
Variables can be defined in YAML files as configuration. These variables can be used at run-time to generate definitions for SOLE.
The directory path in which the such files are defined must be set in variable DATAOPS_CONFIGURATION_DIR
.
info
YAML files in this directory do not support Jinja. These files are used variable declaration for Jinja
#
Object Parameter CalculationDefined variables can be used to generate value for a parameter of an Object.
In the following example we have the values.yaml
defined to be used as variables
warehouse_sizes: - "X-SMALL" - "SMALL" - "MEDIUM" - "LARGE" - "X-LARGE" - "2X-LARGE" - "3X-LARGE" - "4X-LARGE"cluster_count: "X-SMALL": max_cluster_count: 1 min_cluster_count: 1 "SMALL": max_cluster_count: 2 min_cluster_count: 1 "MEDIUM": max_cluster_count: 3 min_cluster_count: 1 "LARGE": max_cluster_count: 8 min_cluster_count: 1 "X-LARGE": max_cluster_count: 10 min_cluster_count: 1 "2X-LARGE": max_cluster_count: 10 min_cluster_count: 2 "3X-LARGE": max_cluster_count: 10 min_cluster_count: 3 "4X-LARGE": max_cluster_count: 10 min_cluster_count: 4
The variables defined in the YAML file would be used as input when rendering object definitions with Jinja.
Following is file with resource monitor definition which uses the input variables
{% if env.MAX_CLUSTER_COUNT is defined and env.MAX_CLUSTER_COUNT|int != 0 and env.MAX_CLUSTER_COUNT|int >= 1 and env.MAX_CLUSTER_COUNT|int <= 10 %} {% set max_cluster_count = env.MAX_CLUSTER_COUNT %}{% else %} {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %} {% set max_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['max_cluster_count'] %} {% else %} {% set max_cluster_count = 1 %} {% endif %}{% endif %}
{% if env.MIN_CLUSTER_COUNT is defined and env.MIN_CLUSTER_COUNT|int != 0 and env.MIN_CLUSTER_COUNT|int >= 1 and env.MIN_CLUSTER_COUNT|int <= 10 %} {% set min_cluster_count = env.MIN_CLUSTER_COUNT and env.MIN_CLUSTER_COUNT|int != 0 %}{% else %} {% if env.WAREHOUSE_SIZE is defined and env.WAREHOUSE_SIZE in warehouse_sizes %} {% set min_cluster_count = cluster_count[env.WAREHOUSE_SIZE]['min_cluster_count'] %} {% else %} {% set min_cluster_count = 1 %} {% endif %}{% endif %}
warehouses: WAREHOUSE_1: max_cluster_count: {{ max_cluster_count }} min_cluster_count: {{ min_cluster_count }}
Depending of the value of MAX_CLUSTER_COUNT
, MIN_CLUSTER_COUNT
, WAREHOUSE_SIZE
(which are defined as environment variable), the value of max_cluster_count
and min_cluster_count
are calculated.
The calculated values are then utilized as parameters max_cluster_count
and min_cluster_count
respectively in a warehouse.
The value of max_cluster_count
(and similarly min_cluster_count
) would be calculated as:
- If
MAX_CLUSTER_COUNT
is set and it is a valid integer between 1 and 10(inclusive), then that value is set asmax_cluster_count
. - If
MAX_CLUSTER_COUNT
is invalid or missing, and valid value ofWAREHOUSE_SIZE
is defined, then value is set to integer set in mapcluster_count
. - If
MAX_CLUSTER_COUNT
andWAREHOUSE_SIZE
are invalid or missing, then value is set to 1.
If both MAX_CLUSTER_COUNT
and MIN_CLUSTER_COUNT
are not set and WAREHOUSE_SIZE
is set to 2X-LARGE, then value of max_cluster_count
and min_cluster_count
would be 10 and 2 respectively.
Configurations can be set as per requirements.