Amazon Redshift is a column-oriented, fully managed, petabyte-scale data warehouse that makes it easy and cost-effective to analyze all your data. Amazon Redshift achieves efficient query performance through a combination of massively parallel processing, columnar data storage, efficient data compression, and ML-powered system optimizations. It enables customers to run and scale analytics on all their data in seconds without having to manage data warehouse infrastructure. Amazon Redshift can query petabytes of structured and semi-structured data stored natively in Amazon Redshift, in Amazon S3 data lakes, and in managed OLTP databases such as Amazon RDS and Amazon Aurora PostgreSQL, all through a standard SQL interface. Native integration with AWS services such as AWS Lake Formation, AWS Glue, Amazon Kinesis, Amazon QuickSight, and Amazon SageMaker makes it easier to handle complex analytics workflows. Capabilities such as Data Sharing, AWS Data Exchange, and Redshift ML let you analyze your data at scale while benefiting from Redshift's leading price-performance.
Amazon Redshift can be set up either with provisioned clusters or as Amazon Redshift Serverless. This lets enterprises choose the Amazon Redshift architecture that fits their data warehousing needs.
With provisioned clusters, customers can choose from a range of node types:
- DS2 Instances – Dense Storage (legacy)
- DC2 Instances – Dense Compute
- RA3 Instances
Amazon Redshift also offers Redshift Serverless, which is fully managed by AWS. It automatically provisions and scales data warehouse capacity to deliver high performance for demanding and unpredictable workloads, and customers pay only for the resources they use.
Amazon Redshift and the Low/Zero-ETL narrative:
Amazon Redshift makes loading, distributing, and consuming data easy. Its native integration with other services in the AWS ecosystem reduces, and in some cases eliminates, the need to write code or build complex ETL pipelines. This in turn lowers the cost of data operations, processing, storage, and maintenance, and improves productivity and time to market.
Here are some of the key features of Amazon Redshift that support the Low/Zero-ETL narrative:
- Amazon Redshift Spectrum lets you query and analyze data stored in Amazon S3 directly. This eliminates the need to move data from Amazon S3 into Amazon Redshift.
- The auto-copy feature of Amazon Redshift provides simple, low-code data ingestion from Amazon S3.
- Amazon Redshift provides zero-ETL integration with Amazon Aurora, so that you can easily and automatically replicate data from multiple Aurora databases to Amazon Redshift in near real time.
- Amazon Redshift federated queries let you query, analyze, and integrate data stored in transactional databases, the data warehouse, and the data lake in the same query. Federated queries can join data from transactional databases such as Amazon RDS for MySQL, Amazon Aurora PostgreSQL, Amazon RDS for PostgreSQL, and Amazon Aurora MySQL with data in the Amazon Redshift data warehouse and the Amazon S3 data lake. This helps eliminate ETL and data movement between transactional databases, data warehouses, and data lakes.
- The integration of Amazon Redshift with AWS Data Exchange (ADX) lets Amazon Redshift queries access external third-party data without data copies or ETL.
- This integration with AWS Data Exchange (ADX) also helps with seamless live data sharing with external parties.
- The integration of Amazon Redshift with Amazon AppFlow enables data to be loaded directly from SaaS platforms into Amazon Redshift, without the need for ETL.
- Amazon Redshift is integrated with real-time data ingestion services such as Amazon Kinesis Data Firehose and Amazon Managed Streaming for Apache Kafka (Amazon MSK), which can write data directly to Amazon Redshift.
- Amazon SageMaker natively integrates with Amazon Redshift to read data required for building ML models.
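To make the Redshift Spectrum and federated query features above more concrete, the following is a minimal Python sketch that only assembles the kind of SQL involved; it does not connect to AWS. All schema, table, and column names (`lake`, `live`, `customers`, `orders_history`, `orders`), the host name, and the ARNs are hypothetical placeholders, and the exact options you need may differ in your environment.

```python
"""Illustrative only: build Redshift SQL for querying S3 and Aurora PostgreSQL data in place.

All identifiers, hosts, and ARNs passed in are hypothetical placeholders.
The values are interpolated directly, so they must be trusted constants.
"""


def spectrum_schema_ddl(schema: str, catalog_db: str, iam_role: str) -> str:
    # External schema over a data catalog database that describes files in Amazon S3.
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}';"
    )


def federated_schema_ddl(schema: str, database: str, host: str,
                         iam_role: str, secret_arn: str, port: int = 5432) -> str:
    # External schema over a live PostgreSQL-compatible database (federated query).
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM POSTGRES DATABASE '{database}' SCHEMA 'public'\n"
        f"URI '{host}' PORT {port}\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


def combined_orders_query(lake_schema: str, live_schema: str) -> str:
    # One query touching a local warehouse table (customers), historical
    # orders in the S3 data lake, and current orders in the OLTP database.
    return (
        "SELECT c.customer_name, o.order_id, o.amount\n"
        "FROM customers AS c\n"
        "JOIN (\n"
        f"    SELECT customer_id, order_id, amount FROM {lake_schema}.orders_history\n"
        "    UNION ALL\n"
        f"    SELECT customer_id, order_id, amount FROM {live_schema}.orders\n"
        ") AS o ON o.customer_id = c.customer_id;"
    )


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder
    secret = "arn:aws:secretsmanager:us-east-1:123456789012:secret:example"  # placeholder
    print(spectrum_schema_ddl("lake", "sales_catalog", role))
    print(federated_schema_ddl("live", "salesdb", "example-host.example.com", role, secret))
    print(combined_orders_query("lake", "live"))
```

The generated statements could then be run through any SQL client connected to Amazon Redshift. The point of the sketch is that, once the two external schemas exist, data in Amazon S3 and in the transactional database is addressed with ordinary SQL alongside local tables, with no separate pipeline copying it into the warehouse first.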
Amazon Redshift Capabilities:
Amazon Redshift's core capabilities can be classified into five categories that demonstrate its differentiation and use: Cost Optimization, Operational Excellence, Security, Reliability, and Performance Efficiency.
Date: May 03, 2023 (Originally Posted)
Updated Date: Sep 14, 2023
Author:
Narendra V Joshi
Principal Technology Architect, DNA
Reviewers:
Sudhir Gupta
Principal Partner Solutions Architect,
Amazon Web Services
Ashutosh Dubey
Senior Partner Solution Architect,
Amazon Web Services