File Format
Configuration can be provided to Snowflake Object Lifecycle Engine for the following operation with File Format:
- Manage Lifecycle of new and existing File Format
- Manage Grants of File Format
Supported Parameters#
The engine supports the parameters listed below.
- FORMAT_TYPE: Specifies the format of the data files to load.
- REQUIRED
- Configuration Key:
format_type - Data Type: String
- Possible Values:
- CSV
- JSON
- AVRO
- ORC
- PARQUET
- XML
- COMPRESSION: Specifies the current compression algorithm for the data file.
- REQUIRED: Conditional*
- Configuration Key:
compression - Data Type: String
- Possible Values:
- AUTO
- GZIP
- BZ2
- BROTLI
- ZSTD
- DEFLATE
- RAW_DEFLATE
- NONE
- LZO
- SNAPPY
- RECORD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate records in the data file.
- REQUIRED: Conditional*
- Configuration Key:
record_delimiter - Data Type: String
- FIELD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate fields in the data file.
- REQUIRED: Conditional*
- Configuration Key:
field_delimiter - Data Type: String
- BINARY_FORMAT: Defines the encoding format for binary input or output.
- REQUIRED: Conditional*
- Configuration Key:
binary_format - Data Type: String
- Possible Values:
- HEX
- BASE64
- UTF8
- ESCAPE: Single character string used as the escape character for field values.
- REQUIRED: Conditional*
- Configuration Key:
escape - Data Type: String
- ESCAPE_UNENCLOSED_FIELD: Single character string used as the escape character for unenclosed field values only.
- REQUIRED: Conditional*
- Configuration Key:
escape_unenclosed_field - Data Type: String
- FIELD_OPTIONALLY_ENCLOSED_BY: Character used to enclose strings.
- REQUIRED: Conditional*
- Configuration Key:
field_optionally_enclosed_by - Data Type: String
- ENCODING: String (constant) that specifies the character set of the source data when loading data into a table.
- REQUIRED: Conditional*
- Configuration Key:
encoding - Data Type: String
- COMMENT: Specifies a comment for the file format.
- Configuration Key:
comment - Data Type: String
- Configuration Key:
- SKIP_HEADER: Number of lines at the start of the file to skip.
- Configuration Key:
skip_header - Data Type: Integer
- Configuration Key:
- FILE_EXTENSION: Specifies the extension for files unloaded to a stage.
- Configuration Key:
file_extension - Data Type: String
- Configuration Key:
- TIME_FORMAT: The format of time values in the data files.
- Configuration Key:
time_format - Data Type: String
- Configuration Key:
- DATE_FORMAT: Defines the format of date values in the data files.
- Configuration Key:
date_format - Data Type: String
- Configuration Key:
- TIMESTAMP_FORMAT: Defines the format of timestamp values in the data files.
- Configuration Key:
timestamp_format - Data Type: String
- Configuration Key:
- SKIP_BLANK_LINES: Boolean that specifies to skip any blank lines encountered in the data files.
- Configuration Key:
skip_blank_lines - Data Type: Boolean
- Configuration Key:
- TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
- Configuration Key:
trim_space - Data Type: Boolean
- Configuration Key:
- NULL_IF: String used to convert to and from SQL NULL.
- Configuration Key:
null_if - Data Type: List/array of string
- Configuration Key:
- ERROR_ON_COLUMN_COUNT_MISMATCH: Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.
- Configuration Key:
error_on_column_count_mismatch - Data Type: Boolean
- Configuration Key:
- REPLACE_INVALID_CHARACTERS: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
- Configuration Key:
replace_invalid_characters - Data Type: Boolean
- Configuration Key:
- VALIDATE_UTF8: Boolean that specifies whether to validate UTF-8 character encoding in string column data.
- Configuration Key:
validate_utf8 - Data Type: Boolean
- Configuration Key:
- EMPTY_FIELD_AS_NULL: Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters.
- Configuration Key:
empty_field_as_null - Data Type: Boolean
- Configuration Key:
- SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file.
- Configuration Key:
skip_byte_order_mark - Data Type: Boolean
- Configuration Key:
- ENABLE_OCTAL: Boolean that enables parsing of octal numbers.
- Configuration Key:
enable_octal - Data Type: Boolean
- Configuration Key:
- ALLOW_DUPLICATE: Boolean that specifies to allow duplicate object field names
- Configuration Key:
allow_duplicate - Data Type: Boolean
- Configuration Key:
- STRIP_OUTER_ARRAY: Boolean that instructs the JSON parser to remove outer brackets.
- Configuration Key:
strip_outer_array - Data Type: Boolean
- Configuration Key:
- STRIP_NULL_VALUES: Boolean that instructs the JSON parser to remove object fields or array elements containing null values.
- Configuration Key:
strip_null_values - Data Type: Boolean
- Configuration Key:
- IGNORE_UTF8_ERRORS: Boolean that specifies whether UTF-8 encoding errors produce error conditions.
- Configuration Key:
ignore_utf8_errors - Data Type: Boolean
- Configuration Key:
- BINARY_AS_TEXT: Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text.
- Configuration Key:
binary_as_text - Data Type: Boolean
- Configuration Key:
- PRESERVE_SPACE: Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
- Configuration Key:
preserve_space - Data Type: Boolean
- Configuration Key:
- STRIP_OUTER_ELEMENT: Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
- Configuration Key:
strip_outer_element - Data Type: Boolean
- Configuration Key:
- DISABLE_SNOWFLAKE_DATA: Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.
- Configuration Key:
disable_snowflake_data - Data Type: Boolean
- Configuration Key:
- DISABLE_AUTO_CONVERT: Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
- Configuration Key:
disable_auto_convert - Data Type: Boolean
- Configuration Key:
- MANAGE_MODE: Configures what properties to manage for the file format.
- Configuration key:
manage_mode - Data Type: String
- Possible Values:
nonegrantsall(Default)
- Configuration key:
- GRANTS: List of Privileges and Roles to which privileges are granted to on the current File Format.
- Configuration key:
grants - Data Type: Map
- Configuration key:
Basic syntax#
databases: <database-name>: schemas: <schema-name>: file_formats: <format-name>: <configuration-key>: <value> grants: <privilege>: - <role-name> - <role-name>Parameters required for already managed file format#
Below table contains additional parameters that is required for already managed file format based on value of format_type.
format_type value | Required configuration key |
|---|---|
| CSV | compression |
binary_format | |
record_delimiter | |
field_delimiter | |
escape | |
escape_unenclosed_field | |
field_optionally_enclosed_by | |
encoding | |
| JSON | compression |
binary_format | |
| AVRO | compression |
| PARQUET | compression |
| XML | compression |
Default value#
Below table list parameters, and their default value
| Parameter configuration key | Default value |
|---|---|
compression | AUTO |
binary_format | HEX |
encoding | UTF8 |
escape | NONE |
escape_unenclosed_field | \ |
record_delimiter | \n |
field_delimiter | , |
field_optionally_enclosed_by | NONE |
time_format | AUTO |
timestamp_format | AUTO |
date_format | AUTO |
Supported file_format Grants to Roles#
Following is the list of Privileges Grant to Roles that can be specified in the file_format definition
- ALL PRIVILEGES
- USAGE
- OWNERSHIP
Examples#
- File Format example with
format_typeasCSV
databases: DATABASE_1: comment: "Database one" schemas: SCHEMA_1: comment: "Schema one" file_formats: FILE_FORMAT_1: # Required parameter format_type: CSV compression: AUTO binary_format: HEX record_delimiter: "\n" field_delimiter: "," escape: NONE escape_unenclosed_field: "\\" field_optionally_enclosed_by: NONE encoding: UTF8
# Optional parameter comment: "File format for type csv" skip_header: 0 trim_space: false validate_utf8: true file_extension: ".csv" skip_blank_lines: false date_format: "AUTO" time_format: "AUTO" timestamp_format: "AUTO" error_on_column_count_mismatch: true empty_field_as_null: true null_if: - "\\N"- File Format example with
format_typeasJSON
databases: DATABASE_1: comment: "Database one" schemas: SCHEMA_1: comment: "Schema one" file_formats: FILE_FORMAT_1: # Required parameters format_type: JSON compression: "AUTO" binary_format: "HEX"
# Optional parameters comment: "File format for type JSON" trim_space: false enable_octal: false strip_outer_array: false strip_null_values: false replace_invalid_characters: false ignore_utf8_errors: true skip_byte_order_mark: false allow_duplicate: true date_format: "AUTO" time_format: "AUTO" timestamp_format: "AUTO" file_extension: ".json" null_if: - "NULL" - "null" grants: USAGE: - ANALYST - MARKETING