Skip to main content

File Format

Configuration can be provided to Snowflake Object Lifecycle Engine for the following operation with File Format:

  • Manage Lifecycle of new and existing File Format
  • Manage Grants of File Format

Supported Parameters#

The engine supports the parameters listed below.

  • FORMAT_TYPE: Specifies the format of the data files to load.
    • REQUIRED
    • Configuration Key: format_type
    • Data Type: String
    • Possible Values:
      • CSV
      • JSON
      • AVRO
      • ORC
      • PARQUET
      • XML
  • COMPRESSION: Specifies the current compression algorithm for the data file.
    • REQUIRED: Conditional*
    • Configuration Key: compression
    • Data Type: String
    • Possible Values:
      • AUTO
      • GZIP
      • BZ2
      • BROTLI
      • ZSTD
      • DEFLATE
      • RAW_DEFLATE
      • NONE
      • LZO
      • SNAPPY
  • RECORD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate records in the data file.
    • REQUIRED: Conditional*
    • Configuration Key: record_delimiter
    • Data Type: String
  • FIELD_DELIMITER: Specifies one or more singlebyte or multibyte characters that separate fields in the data file.
    • REQUIRED: Conditional*
    • Configuration Key: field_delimiter
    • Data Type: String
  • BINARY_FORMAT: Defines the encoding format for binary input or output.
    • REQUIRED: Conditional*
    • Configuration Key: binary_format
    • Data Type: String
    • Possible Values:
      • HEX
      • BASE64
      • UTF8
  • ESCAPE: Single character string used as the escape character for field values.
    • REQUIRED: Conditional*
    • Configuration Key: escape
    • Data Type: String
  • ESCAPE_UNENCLOSED_FIELD: Single character string used as the escape character for unenclosed field values only.
    • REQUIRED: Conditional*
    • Configuration Key: escape_unenclosed_field
    • Data Type: String
  • FIELD_OPTIONALLY_ENCLOSED_BY: Character used to enclose strings.
    • REQUIRED: Conditional*
    • Configuration Key: field_optionally_enclosed_by
    • Data Type: String
  • ENCODING: String (constant) that specifies the character set of the source data when loading data into a table.
    • REQUIRED: Conditional*
    • Configuration Key: encoding
    • Data Type: String
  • COMMENT: Specifies a comment for the file format.
    • Configuration Key: comment
    • Data Type: String
  • SKIP_HEADER: Number of lines at the start of the file to skip.
    • Configuration Key: skip_header
    • Data Type: Integer
  • FILE_EXTENSION: Specifies the extension for files unloaded to a stage.
    • Configuration Key: file_extension
    • Data Type: String
  • TIME_FORMAT: The format of time values in the data files.
    • Configuration Key: time_format
    • Data Type: String
  • DATE_FORMAT: Defines the format of date values in the data files.
    • Configuration Key: date_format
    • Data Type: String
  • TIMESTAMP_FORMAT: Defines the format of timestamp values in the data files.
    • Configuration Key: timestamp_format
    • Data Type: String
  • SKIP_BLANK_LINES: Boolean that specifies to skip any blank lines encountered in the data files.
    • Configuration Key: skip_blank_lines
    • Data Type: Boolean
  • TRIM_SPACE: Boolean that specifies whether to remove white space from fields.
    • Configuration Key: trim_space
    • Data Type: Boolean
  • NULL_IF: String used to convert to and from SQL NULL.
    • Configuration Key: null_if
    • Data Type: List/array of string
  • ERROR_ON_COLUMN_COUNT_MISMATCH: Boolean that specifies whether to generate a parsing error if the number of delimited columns (i.e. fields) in an input file does not match the number of columns in the corresponding table.
    • Configuration Key: error_on_column_count_mismatch
    • Data Type: Boolean
  • REPLACE_INVALID_CHARACTERS: Boolean that specifies whether to replace invalid UTF-8 characters with the Unicode replacement character.
    • Configuration Key: replace_invalid_characters
    • Data Type: Boolean
  • VALIDATE_UTF8: Boolean that specifies whether to validate UTF-8 character encoding in string column data.
    • Configuration Key: validate_utf8
    • Data Type: Boolean
  • EMPTY_FIELD_AS_NULL: Specifies whether to insert SQL NULL for empty fields in an input file, which are represented by two successive delimiters.
    • Configuration Key: empty_field_as_null
    • Data Type: Boolean
  • SKIP_BYTE_ORDER_MARK: Boolean that specifies whether to skip the BOM (byte order mark), if present in a data file.
    • Configuration Key: skip_byte_order_mark
    • Data Type: Boolean
  • ENABLE_OCTAL: Boolean that enables parsing of octal numbers.
    • Configuration Key: enable_octal
    • Data Type: Boolean
  • ALLOW_DUPLICATE: Boolean that specifies to allow duplicate object field names
    • Configuration Key: allow_duplicate
    • Data Type: Boolean
  • STRIP_OUTER_ARRAY: Boolean that instructs the JSON parser to remove outer brackets.
    • Configuration Key: strip_outer_array
    • Data Type: Boolean
  • STRIP_NULL_VALUES: Boolean that instructs the JSON parser to remove object fields or array elements containing null values.
    • Configuration Key: strip_null_values
    • Data Type: Boolean
  • IGNORE_UTF8_ERRORS: Boolean that specifies whether UTF-8 encoding errors produce error conditions.
    • Configuration Key: ignore_utf8_errors
    • Data Type: Boolean
  • BINARY_AS_TEXT: Boolean that specifies whether to interpret columns with no defined logical data type as UTF-8 text.
    • Configuration Key: binary_as_text
    • Data Type: Boolean
  • PRESERVE_SPACE: Boolean that specifies whether the XML parser preserves leading and trailing spaces in element content.
    • Configuration Key: preserve_space
    • Data Type: Boolean
  • STRIP_OUTER_ELEMENT: Boolean that specifies whether the XML parser strips out the outer XML element, exposing 2nd level elements as separate documents.
    • Configuration Key: strip_outer_element
    • Data Type: Boolean
  • DISABLE_SNOWFLAKE_DATA: Boolean that specifies whether the XML parser disables recognition of Snowflake semi-structured data tags.
    • Configuration Key: disable_snowflake_data
    • Data Type: Boolean
  • DISABLE_AUTO_CONVERT: Boolean that specifies whether the XML parser disables automatic conversion of numeric and Boolean values from text to native representation.
    • Configuration Key: disable_auto_convert
    • Data Type: Boolean
  • MANAGE_MODE: Configures what properties to manage for the file format.
    • Configuration key: manage_mode
    • Data Type: String
    • Possible Values:
      • none
      • grants
      • all(Default)
  • GRANTS: List of Privileges and Roles to which privileges are granted to on the current File Format.
    • Configuration key: grants
    • Data Type: Map

Basic syntax#

databases:  <database-name>:    schemas:      <schema-name>:        file_formats:          <format-name>:            <configuration-key>: <value>            grants:              <privilege>:                - <role-name>                - <role-name>

Parameters required for already managed file format#

Below table contains additional parameters that is required for already managed file format based on value of format_type.

format_type valueRequired configuration key
CSVcompression
binary_format
record_delimiter
field_delimiter
escape
escape_unenclosed_field
field_optionally_enclosed_by
encoding
JSONcompression
binary_format
AVROcompression
PARQUETcompression
XMLcompression

Default value#

Below table list parameters, and their default value

Parameter configuration keyDefault value
compressionAUTO
binary_formatHEX
encodingUTF8
escapeNONE
escape_unenclosed_field\
record_delimiter\n
field_delimiter,
field_optionally_enclosed_byNONE
time_formatAUTO
timestamp_formatAUTO
date_formatAUTO

Supported file_format Grants to Roles#

Following is the list of Privileges Grant to Roles that can be specified in the file_format definition

  • ALL PRIVILEGES
  • USAGE
  • OWNERSHIP

Examples#

  • File Format example with format_type as CSV
databases:  DATABASE_1:    comment: "Database one"    schemas:      SCHEMA_1:        comment: "Schema one"        file_formats:          FILE_FORMAT_1:            # Required parameter            format_type: CSV            compression: AUTO            binary_format: HEX            record_delimiter: "\n"            field_delimiter: ","            escape: NONE            escape_unenclosed_field: "\\"            field_optionally_enclosed_by: NONE            encoding: UTF8
            # Optional parameter            comment: "File format for type csv"            skip_header: 0            trim_space: false            validate_utf8: true            file_extension: ".csv"            skip_blank_lines: false            date_format: "AUTO"            time_format: "AUTO"            timestamp_format: "AUTO"            error_on_column_count_mismatch: true            empty_field_as_null: true            null_if:              - "\\N"
  • File Format example with format_type as JSON
databases:  DATABASE_1:    comment: "Database one"    schemas:      SCHEMA_1:        comment: "Schema one"        file_formats:          FILE_FORMAT_1:            # Required parameters            format_type: JSON            compression: "AUTO"            binary_format: "HEX"
            # Optional parameters            comment: "File format for type JSON"            trim_space: false            enable_octal: false            strip_outer_array: false            strip_null_values: false            replace_invalid_characters: false            ignore_utf8_errors: true            skip_byte_order_mark: false            allow_duplicate: true            date_format: "AUTO"            time_format: "AUTO"            timestamp_format: "AUTO"            file_extension: ".json"            null_if:              - "NULL"              - "null"            grants:              USAGE:                - ANALYST                - MARKETING
Last updated on