File Format
Configuration can be provided to Snowflake Object Lifecycle Engine for the following operation with File Format:
- Manage Lifecycle of new and existing File Format
- Manage Grants of File Format
#
Supported ParametersThe engine supports the parameters listed below.
- FORMAT_TYPE: Specifies the format of the data files to load.
- REQUIRED
- Configuration Key:
format_type
- Data Type: String
- Possible Values:
- CSV
- JSON
- AVRO
- ORC
- PARQUET
- XML
- COMPRESSION: Specifies the current compression algorithm for the data file.
- REQUIRED: Conditional*
- Configuration Key:
compression
- Data Type: String
- Possible Values:
- AUTO
- GZIP
- BZ2
- BROTLI
- ZSTD
- DEFLATE
- RAW_DEFLATE
- NONE
- LZO
- SNAPPY
- RECORD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate records in the data file.
- REQUIRED: Conditional*
- Configuration Key:
record_delimiter
- Data Type: String
- FIELD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate fields in the data file.
- REQUIRED: Conditional*
- Configuration Key:
field_delimiter
- Data Type: String
- BINARY_FORMAT: Defines the encoding format for binary input or output.
- REQUIRED: Conditional*
- Configuration Key:
binary_format
- Data Type: String
- Possible Values:
- HEX
- BASE64
- UTF8
- ESCAPE: Single character string used as the escape character for field values.
- REQUIRED: Conditional*
- Configuration Key:
escape
- Data Type: String
- ESCAPE_UNENCLOSED_FIELD: Single character string used as the escape character for unenclosed field values only.
- REQUIRED: Conditional*
- Configuration Key:
escape_unenclosed_field
- Data Type: String
- FIELD_OPTIONALLY_ENCLOSED_BY: Character used to enclose strings.
- REQUIRED: Conditional*
- Configuration Key:
field_optionally_enclosed_by
- Data Type: String
- ENCODING: String (constant) that specifies the character set of the source data when loading data into a table.
- REQUIRED: Conditional*
- Configuration Key:
encoding
- Data Type: String
- COMMENT: Specifies a comment for the file format.
- Configuration Key:
comment
- Data Type: String
- Configuration Key:
- SKIP_HEADER: Number of lines at the start of the file to skip.
- Configuration Key:
skip_header
- Data Type: Integer
- Configuration Key:
- FILE_EXTENSION: Specifies the extension for files unloaded to a stage.
- Configuration Key:
file_extension
- Data Type: String
- Configuration Key:
- TIME_FORMAT: The format of time values in the data files.
- Configuration Key:
time_format
- Data Type: String
- Configuration Key:
- DATE_FORMAT: Defines the format of date values in the data files.
- Configuration Key:
date_format
- Data Type: String
- Configuration Key:
- TIMESTAMP_FORMAT: Defines the format of timestamp values in the data files.
- Configuration Key:
timestamp_format
- Data Type: String
- Configuration Key:
- SKIP_BLANK_LINES: Boolean that specifies to skip any blank lines encountered in the data files.
- Configuration Key:
skip_blank_lines
- Data Type: Boolean
- Configuration Key:
- TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
- Configuration Key:
trim_space
- Data Type: Boolean
- Configuration Key:
- NULL_IF: String used to convert to and from SQL NULL.
- Configuration Key:
null_if
- Data Type: List/array of string
- Configuration Key:
- ERROR_ON_COLUMN_COUNT_MISMATCH: Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.
- Configuration Key:
error_on_column_count_mismatch
- Data Type: Boolean
- Configuration Key:
- REPLACE_INVALID_CHARACTERS: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
- Configuration Key:
replace_invalid_characters
- Data Type: Boolean
- Configuration Key:
- VALIDATE_UTF8: Boolean that specifies whether to validate UTF-8 character encoding in string column data.
- Configuration Key:
validate_utf8
- Data Type: Boolean
- Configuration Key:
- EMPTY_FIELD_AS_NULL: Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters.
- Configuration Key:
empty_field_as_null
- Data Type: Boolean
- Configuration Key:
- SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file.
- Configuration Key:
skip_byte_order_mark
- Data Type: Boolean
- Configuration Key:
- ENABLE_OCTAL: Boolean that enables parsing of octal numbers.
- Configuration Key:
enable_octal
- Data Type: Boolean
- Configuration Key:
- ALLOW_DUPLICATE: Boolean that specifies to allow duplicate object field names
- Configuration Key:
allow_duplicate
- Data Type: Boolean
- Configuration Key:
- STRIP_OUTER_ARRAY: Boolean that instructs the JSON parser to remove outer brackets.
- Configuration Key:
strip_outer_array
- Data Type: Boolean
- Configuration Key:
- STRIP_NULL_VALUES: Boolean that instructs the JSON parser to remove object fields or array elements containing null values.
- Configuration Key:
strip_null_values
- Data Type: Boolean
- Configuration Key:
- IGNORE_UTF8_ERRORS: Boolean that specifies whether UTF-8 encoding errors produce error conditions.
- Configuration Key:
ignore_utf8_errors
- Data Type: Boolean
- Configuration Key:
- BINARY_AS_TEXT: Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text.
- Configuration Key:
binary_as_text
- Data Type: Boolean
- Configuration Key:
- PRESERVE_SPACE: Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
- Configuration Key:
preserve_space
- Data Type: Boolean
- Configuration Key:
- STRIP_OUTER_ELEMENT: Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
- Configuration Key:
strip_outer_element
- Data Type: Boolean
- Configuration Key:
- DISABLE_SNOWFLAKE_DATA: Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.
- Configuration Key:
disable_snowflake_data
- Data Type: Boolean
- Configuration Key:
- DISABLE_AUTO_CONVERT: Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
- Configuration Key:
disable_auto_convert
- Data Type: Boolean
- Configuration Key:
- MANAGE_MODE: Configures what properties to manage for the file format.
- Configuration key:
manage_mode
- Data Type: String
- Possible Values:
none
grants
all
(Default)
- Configuration key:
- GRANTS: List of Privileges and Roles to which privileges are granted to on the current File Format.
- Configuration key:
grants
- Data Type: Map
- Configuration key:
#
Basic syntaxdatabases: <database-name>: schemas: <schema-name>: file_formats: <format-name>: <configuration-key>: <value> grants: <privilege>: - <role-name> - <role-name>
#
Parameters required for already managed file formatBelow table contains additional parameters that is required for already managed file format based on value of format_type
.
format_type value | Required configuration key |
---|---|
CSV | compression |
binary_format | |
record_delimiter | |
field_delimiter | |
escape | |
escape_unenclosed_field | |
field_optionally_enclosed_by | |
encoding | |
JSON | compression |
binary_format | |
AVRO | compression |
PARQUET | compression |
XML | compression |
#
Default valueBelow table list parameters, and their default value
Parameter configuration key | Default value |
---|---|
compression | AUTO |
binary_format | HEX |
encoding | UTF8 |
escape | NONE |
escape_unenclosed_field | \ |
record_delimiter | \n |
field_delimiter | , |
field_optionally_enclosed_by | NONE |
time_format | AUTO |
timestamp_format | AUTO |
date_format | AUTO |
#
Supported file_format Grants to RolesFollowing is the list of Privileges Grant to Roles that can be specified in the file_format definition
- ALL PRIVILEGES
- USAGE
- OWNERSHIP
#
Examples- File Format example with
format_type
asCSV
databases: DATABASE_1: comment: "Database one" schemas: SCHEMA_1: comment: "Schema one" file_formats: FILE_FORMAT_1: # Required parameter format_type: CSV compression: AUTO binary_format: HEX record_delimiter: "\n" field_delimiter: "," escape: NONE escape_unenclosed_field: "\\" field_optionally_enclosed_by: NONE encoding: UTF8
# Optional parameter comment: "File format for type csv" skip_header: 0 trim_space: false validate_utf8: true file_extension: ".csv" skip_blank_lines: false date_format: "AUTO" time_format: "AUTO" timestamp_format: "AUTO" error_on_column_count_mismatch: true empty_field_as_null: true null_if: - "\\N"
- File Format example with
format_type
asJSON
databases: DATABASE_1: comment: "Database one" schemas: SCHEMA_1: comment: "Schema one" file_formats: FILE_FORMAT_1: # Required parameters format_type: JSON compression: "AUTO" binary_format: "HEX"
# Optional parameters comment: "File format for type JSON" trim_space: false enable_octal: false strip_outer_array: false strip_null_values: false replace_invalid_characters: false ignore_utf8_errors: true skip_byte_order_mark: false allow_duplicate: true date_format: "AUTO" time_format: "AUTO" timestamp_format: "AUTO" file_extension: ".json" null_if: - "NULL" - "null" grants: USAGE: - ANALYST - MARKETING