What is Snowflake?
Snowflake is a Data Solution provided as Software-as-a-Service (SaaS). It’s not available on-premise. It combines a new SQL query engine with an innovative architecture natively designed for the cloud. |
Use cases
Snowflake is optimal as Data Warehouse, Data Lake, Data Exchange, Data Apps, Data Science, and Data Engineering. |
Capacity Options
On-Demand |
Fixed rate for the consumed services |
Pre-paid |
Cheaper, but commitment to Snowflake |
Micro-partitions
All data in Snowflake tables are automatically divided into micro-partitions, contiguous units of storage between 50 and 500MB of uncompressed data, organized in a columnar way.
They are immutable, meaning they cannot be changed once created. |
Pruning process
Technique to analyze the smallest number of micro-partitions to solve a query. It retrieves all the necessary data to give a solution without looking at all the micro-partitions, saving a lot of time to return the result. You can find a real example here |
Load Data
Bulk Load |
Loading batches of data from files already available at any stage into Snowflake tables |
Continuous Load |
Load small volumes of data (micro-batches) and incrementally make them available for analysis. |
Other Concepts
Parnet Connect |
Technology & Solution partners |
Compliance |
HITRUST / HIPAA, ISO/IEC 27001, FedRAMP Moderate, PCI-DSS, etc |
Data Marketplace |
For providers to buy or sell their datasets. Free, Personalized, and Paid Listings |
Column Level Security |
Dynamic Data Masking & External Tokenization |
Sequences
Use case |
Generate unique numbers across sessions and statements |
nextval |
Function to generate a set of distinct values |
File Formats
Structured Data |
CSV. Fastest way to load data. |
Semi-structured Data |
JSON, Parquet, XML, Avro, ORC |
FLATTEN |
Convert semi-structured data to a relational representation |
Access Management Approaches
Discretionary Access Control (DAC) |
Each object has an owner who can, in turn, grant access to that object |
Role-Based Access Control (RBAC) |
Access privileges are assigned to roles, which are, in turn, given to users |
|
|
Editions
Standard |
Enterprise |
Business Critical |
Virtual Private Snowflake |
Cloud Providers
Amazon Web Services |
Azure |
Google Cloud Platform |
Connect to Snowflake
Web Interface |
SnowSQL (CLI Client) |
ODBC |
JDBC |
SDK for Node, Python, Kafka, Go, and more! |
Snowflake Objects
Account |
Must be unique. |
Warehouse |
Virutal Machine to execute queries. Compute Part. |
Database |
Logical Collection of Schemas. |
Schema |
Logical Collection of Objects. The Public schema and the Information_Schema are created when creating a Database. |
Types of tables
Permanent |
Transient |
Temporary |
External |
Types of views
Regular |
Materialized |
Secure |
Stage Metadata
METADATA$FILENAME |
Name of the staged data file the current row belongs to. |
METADATA$FILE_ROW_NUMBER |
Row number for each record in the container staged data file. |
Fail-Safe
Use Cases |
It ensures historical data is protected in the event of a system failure or other catastrophic event |
Retention Period |
NON-CONFIGURABLE 7-day period |
Example |
No, you cannot recover this data alone; you MUST ask Snowflake support |
Note: Fail-Safe requires additional storage, which will be reflected in your monthly storage charges
|