Snowflake Cloud Data Platform Architecture & Storage Layers

What is Snowflake?

Snowflake is a SaaS platform that provides data warehousing, data lakes, data engineering, data application development, and consumption of real-time and shared data. It is built on the infrastructure of the three major cloud providers (Amazon Web Services, Microsoft Azure, and Google Cloud Platform), so there is no hardware or software to select, install, configure, or manage.

Why Should Organizations Use Snowflake?

Snowflake describes itself as the only data platform built IN and FOR the cloud that can be used as both a data lake and a data warehouse. Because it is cloud-native, it can scale up and down with demand while still meeting performance requirements. As a result, organizations no longer need to maintain a separate data lake and data warehouse.

Over the decades, data-driven organizations have used many tools and techniques to collect, process, store, and protect proprietary data. When storing and processing data, data engineering teams typically keep it either in a data warehouse (processed data) or in a data lake (raw data). Because the warehouse and the lake are separate systems, retrieving and combining the data is difficult. Snowflake was built to address this problem and ease the workload of data engineering teams.

What are the three layers of Snowflake Architecture?

Snowflake has a unique architecture that consists of three layers:

i) Storage Layer

ii) Compute or Processing Layer

iii) Cloud Services Layer

These layers are physically separated but logically integrated, so each layer can be scaled independently.

Snowflake Architecture Diagram

Source: docs.snowflake.com

One of the main reasons for Snowflake's popularity is its fast query processing and data retrieval, whether the data sits in a warehouse or a data lake.

Now let us understand the architecture of Snowflake’s storage layer.

When data is ingested into Snowflake, Snowflake reorganizes it into multiple micro-partitions that are internally optimized, compressed, and stored in columnar format. The data is stored in the cloud and follows a shared-disk model (data is accessible to all compute clusters), which simplifies data management.

The data objects stored by Snowflake are neither directly visible nor directly accessible to users; they can be accessed only through SQL query operations.

A few significant concepts underpin Snowflake's table structure and its fast data retrieval:

i) Micro-partitions

ii) Data Clustering

iii) Columnar Format

Let’s understand each of the above concepts in detail.

In contrast to traditional static partitioning (where a partitioning column must be specified manually), all data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage.

A micro-partition is a physical structure in Snowflake. Each micro-partition contains between 50 MB and 500 MB of uncompressed data (the stored size is smaller because data is always compressed). Snowflake automatically determines the most efficient compression algorithm for the columns in each micro-partition. Groups of rows in a table are mapped into individual micro-partitions, organized in a columnar fashion. Tables are partitioned transparently as data is inserted or loaded, using the order in which the data arrives.
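To make the partitioning concrete, here is a toy Python model. It is a simplified sketch under stated assumptions, not Snowflake's implementation: the fixed `ROWS_PER_PARTITION` row count is a hypothetical stand-in for Snowflake's size-based limits, and `MicroPartition` and `load_table` are illustrative names. It splits rows into partitions in the order they arrive, stores each partition column by column, and records per-column min/max values as simple partition metadata.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical: real micro-partitions are sized by data volume
# (50 MB to 500 MB uncompressed), not by a fixed row count.
ROWS_PER_PARTITION = 3


@dataclass
class MicroPartition:
    # Column name -> values for that column (columnar layout).
    columns: dict[str, list[Any]]
    # Column name -> (min, max), a simple stand-in for partition metadata.
    min_max: dict[str, tuple[Any, Any]] = field(default_factory=dict)


def load_table(rows: list[dict[str, Any]]) -> list[MicroPartition]:
    """Split rows into micro-partitions in arrival order, pivoting each to columns."""
    partitions = []
    for start in range(0, len(rows), ROWS_PER_PARTITION):
        chunk = rows[start:start + ROWS_PER_PARTITION]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        min_max = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        partitions.append(MicroPartition(columns, min_max))
    return partitions


rows = [
    {"id": 1, "city": "Paris"},
    {"id": 2, "city": "Oslo"},
    {"id": 3, "city": "Lima"},
    {"id": 4, "city": "Rome"},
    {"id": 5, "city": "Kyiv"},
]
parts = load_table(rows)
print(len(parts))                # 2
print(parts[0].columns["city"])  # ['Paris', 'Oslo', 'Lima']
print(parts[1].min_max["id"])    # (4, 5)
```

No partitioning column is chosen anywhere: the split follows load order only, which mirrors the "transparent" partitioning described above.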

All DML operations (e.g., DELETE, UPDATE) use the underlying micro-partition metadata to facilitate and simplify table maintenance. For example, some operations, such as deleting all rows from a table, are metadata-only.
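A metadata-only operation such as deleting all rows from a table can be sketched as follows. This is a hypothetical model for illustration only: the `Table` class simply tracks which micro-partitions it references and their row counts, so emptying the table means clearing those references rather than reading or rewriting any partition data.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    # Hypothetical metadata: IDs of the micro-partitions the table references.
    partition_ids: list[int] = field(default_factory=list)
    # Hypothetical metadata: number of rows stored in each micro-partition.
    row_counts: dict[int, int] = field(default_factory=dict)

    def row_count(self) -> int:
        # Computed from metadata only; no partition contents are touched.
        return sum(self.row_counts[pid] for pid in self.partition_ids)

    def delete_all(self) -> None:
        # Deleting every row just drops the partition references from metadata.
        # The partition contents are never read or rewritten.
        self.partition_ids = []
        self.row_counts = {}


table = Table(partition_ids=[1, 2, 3], row_counts={1: 6, 2: 6, 3: 6})
print(table.row_count())  # 18
table.delete_all()
print(table.row_count())  # 0
```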

Data clustering is a critical factor in query performance: table data that is unsorted or only partially sorted can slow queries down, especially on very large tables.

In Snowflake, when data is inserted or loaded into a table, clustering metadata is collected and recorded for each micro-partition created during the process. Snowflake then uses this clustering information to avoid unnecessary scanning of micro-partitions during querying, which accelerates queries that filter on those columns.
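The pruning idea can be sketched with per-partition min/max values for a single column. This is a simplified model, assuming each micro-partition records only the minimum and maximum of the filtered column; `PartitionStats` and `prune` are hypothetical names. The two examples also show why clustering matters: when each partition covers a narrow range, a range filter skips most partitions, but when every partition spans nearly the whole range, nothing can be skipped.

```python
from dataclasses import dataclass


@dataclass
class PartitionStats:
    # Hypothetical per-partition metadata for one column.
    partition_id: int
    min_value: int
    max_value: int


def prune(stats: list[PartitionStats], low: int, high: int) -> list[int]:
    """Return IDs of partitions that may hold values in [low, high].

    A partition is skipped when its [min, max] range cannot overlap the
    filter range, so its data never has to be scanned.
    """
    return [s.partition_id for s in stats if s.max_value >= low and s.min_value <= high]


# Well clustered: each partition covers a narrow, non-overlapping range.
clustered = [PartitionStats(1, 1, 10), PartitionStats(2, 11, 20), PartitionStats(3, 21, 30)]
# Poorly clustered: every partition spans almost the whole range.
unclustered = [PartitionStats(1, 1, 29), PartitionStats(2, 2, 30), PartitionStats(3, 1, 30)]

print(prune(clustered, 12, 15))    # [2]
print(prune(unclustered, 12, 15))  # [1, 2, 3]
```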

Data stored in columnar format has significant advantages over a row-based format:

  • Data security, since data is not stored in human-readable form.
  • Lower storage consumption.
  • Faster reads and lower latency, because a query reads only the columns it needs.
  • Support for advanced nested data structures.
  • Optimization for queries that process large volumes of data.
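Two of these advantages, lower storage consumption and faster reads, follow from keeping each column's values together. The sketch below pivots hypothetical rows into columns, then shows that a query on one column reads only that column's list, and that repeated values in a column collapse under run-length encoding. Run-length encoding is used here only as a simple example; it is not a claim about which compression algorithms Snowflake chooses, and `to_columns` and `run_length_encode` are illustrative names.

```python
from itertools import groupby
from typing import Any


def to_columns(rows: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values: list[Any]) -> list[tuple[Any, int]]:
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


rows = [
    {"country": "UK", "amount": 10},
    {"country": "UK", "amount": 25},
    {"country": "UK", "amount": 7},
    {"country": "FR", "amount": 12},
]
cols = to_columns(rows)
# A query on "amount" reads one list and never touches "country".
print(sum(cols["amount"]))                 # 54
# Values of one column sit together, so repeats compress well.
print(run_length_encode(cols["country"]))  # [('UK', 3), ('FR', 1)]
```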

Physical Structure of the Storage Layer

Building on the above concepts, let's take a closer look at the physical structure of the storage layer.

Source: docs.snowflake.com

The table in the figure above consists of 24 rows stored across 4 micro-partitions, with the rows divided equally among them (6 per micro-partition). Within each micro-partition, the data is sorted and stored in columnar format, which enables Snowflake to perform the following actions for queries on the table:

  1. Prune micro-partitions that are not needed for the query.
  2. Prune by column within the remaining micro-partitions, reading only the columns the query references.
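The two pruning steps can be sketched end to end on a table shaped like the one in the figure: 24 rows in 4 micro-partitions of 6 rows each. The data here is hypothetical (the figure's actual values are not reproduced), with a `date` column that happens to be sorted by load order, and `build_partitions` and `query` are illustrative functions, not a Snowflake API. The query evaluates an equality filter by first skipping micro-partitions whose min/max range excludes the value, then reading only the filter and selected columns from the partitions that remain.

```python
from typing import Any

# Hypothetical data: 24 rows whose "date" values rise with load order,
# so each 6-row micro-partition holds a single date.
COUNTRIES = ["UK", "SP", "DE", "FR"]
ROWS = [
    {"type": i % 3, "name": f"N{i}", "country": COUNTRIES[i % 4], "date": i // 6 + 1}
    for i in range(24)
]


def build_partitions(rows: list[dict[str, Any]], size: int = 6) -> list[tuple[dict, dict]]:
    """Split rows into fixed-size partitions stored by column, with min/max stats."""
    partitions = []
    for start in range(0, len(rows), size):
        chunk = rows[start:start + size]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        partitions.append((columns, stats))
    return partitions


def query(partitions: list[tuple[dict, dict]], select: list[str], where_col: str, value: Any):
    """Evaluate SELECT <select> WHERE <where_col> = <value> using both pruning steps.

    Returns the matching rows, the number of partitions scanned, and the columns read.
    """
    result, scanned, columns_read = [], 0, set()
    for columns, stats in partitions:
        low, high = stats[where_col]
        # Step 1: skip micro-partitions whose value range excludes the filter value.
        if not low <= value <= high:
            continue
        scanned += 1
        # Step 2: within this partition, read only the columns the query needs.
        columns_read |= {where_col, *select}
        for i, v in enumerate(columns[where_col]):
            if v == value:
                result.append({c: columns[c][i] for c in select})
    return result, scanned, columns_read


PARTITIONS = build_partitions(ROWS)
matches, scanned, cols = query(PARTITIONS, ["name", "country"], "date", 2)
print(scanned)       # 1
print(sorted(cols))  # ['country', 'date', 'name']
print(len(matches))  # 6
```

The `type` column is never read, and three of the four partitions are skipped using metadata alone.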

Final Thoughts

Any idea why this revolutionary data platform is named Snowflake?

The name was chosen as a tribute to the founders' (Benoit Dageville, Thierry Cruanes, and Marcin Żukowski) shared love of snow sports, particularly skiing.

Snowflake Cloud Services from Payoda

For any business that uses big data, cloud data platforms have become cost-effective, high-performance calling cards. Snowflake's Data Cloud combines a number of cloud services for data management and access that are both secure and simple. We use cloud services, cloud software, and infrastructure solutions to help businesses modernise their infrastructure and applications. We help you align business challenges with solutions tailored to your needs, objectives, and resources.

Payoda helps you achieve higher levels of agility and resiliency at scale while minimising the risk of moving business-critical processes using our proven experience.

References:

  1. Understanding Snowflake Table Structures, Snowflake Documentation
  2. Key Concepts & Architecture, Snowflake Documentation

Authored by: Akash Balakrishnan

Your Digital Transformation partner. We are here to share knowledge on varied technologies, updates; and to stay in touch with the tech-space.

Payoda Technology Inc