Amazon Redshift is a column-oriented, fully managed, petabyte-scale data warehouse that makes it easy and cost-effective to analyze all your data. Amazon Redshift achieves efficient query performance through a combination of massively parallel processing, columnar data storage, efficient data compression, and ML-powered system optimizations. It enables customers to run and scale analytics on all their data in seconds without having to manage data warehouse infrastructure. Amazon Redshift can query petabytes of structured and semi-structured data stored natively in Amazon Redshift, in Amazon S3 data lakes, and in managed OLTP databases such as Amazon RDS and Amazon Aurora PostgreSQL, all through a standard SQL interface. Native integration with AWS services such as AWS Lake Formation, AWS Glue, Amazon Kinesis, Amazon QuickSight, and Amazon SageMaker makes it easier to handle complex analytics workflows without friction. Capabilities such as Data Sharing, AWS Data Exchange, and Redshift ML provide a holistic experience for analyzing your data at scale, while benefiting from Redshift’s leading price-performance.
Amazon Redshift and the Low/Zero-ETL narrative:
Amazon Redshift makes loading, distributing, and consuming data easy. Its native integration with other services in the AWS ecosystem reduces, and in some cases eliminates, the need to write code or build complex ETL pipelines. This in turn lowers data operations, processing, storage, and maintenance costs, and greatly improves productivity and time to market.
Here are some of the key features of Amazon Redshift that support the Low/Zero-ETL narrative:
- Amazon Redshift Spectrum lets you query and analyze data stored on Amazon S3 in place, in real time. This eliminates the need to move data from Amazon S3 into Amazon Redshift.
- Amazon Redshift federated queries let you query, analyze, and integrate data stored in transactional databases, the data warehouse, and the data lake in the same query. Federated queries can join data from transactional databases such as Amazon RDS for MySQL, Amazon Aurora MySQL, Amazon RDS for PostgreSQL, and Amazon Aurora PostgreSQL with data in the Amazon Redshift data warehouse and the Amazon S3 data lake. This helps eliminate ETL and data movement between transactional databases, data warehouses, and data lakes.
- The integration of Amazon Redshift with AWS Data Exchange (ADX) lets Amazon Redshift queries access external third-party data directly, without data copies or ETL.
- The integration with AWS Data Exchange (ADX) also supports live data sharing with external parties.
- The integration of Amazon Redshift with Amazon AppFlow enables data to be loaded directly from SaaS platforms into Amazon Redshift, without the need for ETL.
- Amazon Redshift is integrated with real-time data ingestion services such as Amazon Kinesis Data Firehose and Amazon Managed Streaming for Apache Kafka (Amazon MSK), which can write data directly to Amazon Redshift.
- Amazon SageMaker natively integrates with Amazon Redshift to read data required for building ML models.
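To make the Spectrum and federated query items above more concrete, here is a minimal sketch in Python that only composes the SQL one might run. It assumes an AWS Glue Data Catalog database for the S3 data and an Aurora PostgreSQL source whose credentials are held in AWS Secrets Manager. Every schema, table, host, role, and secret name is a hypothetical placeholder, and the statements would still need to be submitted through a SQL client of your choice.

```python
# Illustrative sketch only: all identifiers passed to these helpers
# (schemas, databases, hosts, IAM roles, secrets) are hypothetical placeholders.


def spectrum_schema_sql(schema: str, catalog_db: str, iam_role: str) -> str:
    """DDL that exposes a Glue Data Catalog database (S3 data) as a Redshift schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role}';"
    )


def federated_schema_sql(schema: str, source_db: str, source_schema: str,
                         host: str, iam_role: str, secret_arn: str) -> str:
    """DDL that exposes a PostgreSQL-compatible OLTP database as a Redshift schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM POSTGRES\n"
        f"DATABASE '{source_db}' SCHEMA '{source_schema}'\n"
        f"URI '{host}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        f"SECRET_ARN '{secret_arn}';"
    )


# One query across all three stores, with no pipeline copying data between them.
# Table and column names are assumed for illustration.
JOIN_QUERY = """
SELECT o.order_id, o.status, c.segment, s.shipped_at
FROM oltp.orders AS o                                        -- federated (Aurora PostgreSQL)
JOIN public.customers AS c ON c.customer_id = o.customer_id  -- local Redshift table
LEFT JOIN lake.shipments AS s ON s.order_id = o.order_id     -- Amazon S3 via Spectrum
""".strip() + ";"


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder
    print(spectrum_schema_sql("lake", "example_catalog_db", role))
    print(federated_schema_sql(
        "oltp", "exampledb", "public",
        "example-cluster.cluster-abc.us-east-1.rds.amazonaws.com",  # placeholder host
        role,
        "arn:aws:secretsmanager:us-east-1:123456789012:secret:example",  # placeholder
    ))
    print(JOIN_QUERY)
```

Once both external schemas exist, the join reads the transactional rows, the warehouse table, and the S3 data in a single statement, which is the sense in which these features reduce ETL.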
Amazon Redshift Capabilities:
Amazon Redshift’s core capabilities can be grouped into five categories that show where it is differentiated and how it is used: Cost Optimization, Operational Excellence, Security, Reliability, and Performance Efficiency.
Amazon Redshift features in preview:
Amazon Redshift is continuously evolving to meet customers' current and anticipated analytics needs. Some important enhancements currently available for customer preview are:
- Streaming ingestion for Kinesis Data Streams: connect to multiple Kinesis data streams simultaneously and ingest their data directly into Amazon Redshift for powerful real-time analytics.
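As a rough illustration of streaming ingestion from several streams, the Python sketch below composes one external schema over Kinesis plus one materialized view per stream. The SQL keywords and column names (`FROM KINESIS`, `kinesis_data`, `approximate_arrival_timestamp`) are assumptions based on the streaming ingestion syntax as later documented and may differ in the preview release. The stream names and IAM role are hypothetical.

```python
# Illustrative sketch only: stream names, schema name, and IAM role are
# hypothetical; the SQL follows later documented syntax and is an assumption
# with respect to the preview release described in the text.
from typing import List


def kinesis_schema_sql(schema: str, iam_role: str) -> str:
    """DDL for an external schema that maps Kinesis Data Streams into Redshift."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM KINESIS\n"
        f"IAM_ROLE '{iam_role}';"
    )


def stream_view_sql(schema: str, stream: str) -> str:
    """DDL for a materialized view that lands one stream's records in Redshift."""
    view = stream.replace("-", "_") + "_mv"  # e.g. orders-stream -> orders_stream_mv
    return (
        f"CREATE MATERIALIZED VIEW {view} AS\n"
        f"SELECT approximate_arrival_timestamp,\n"
        f"       partition_key,\n"
        f"       shard_id,\n"
        f"       sequence_number,\n"
        f"       JSON_PARSE(kinesis_data) AS payload\n"
        f'FROM {schema}."{stream}";'
    )


def streaming_statements(schema: str, iam_role: str, streams: List[str]) -> List[str]:
    """One schema statement followed by one materialized view per stream."""
    return [kinesis_schema_sql(schema, iam_role)] + [
        stream_view_sql(schema, s) for s in streams
    ]


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleStreamingRole"  # placeholder
    for stmt in streaming_statements("kds", role, ["orders-stream", "clicks-stream"]):
        print(stmt, end="\n\n")
```

Each view would then be refreshed (for example with `REFRESH MATERIALIZED VIEW`) to pull new records, so several streams can be ingested side by side without an intermediate staging step.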
Date: August 01, 2022
Narendra V Joshi
Principal Technology Architect, DNA
Senior Partner Solution Architect,
Amazon Web Services
Principal Partner Solutions Architect,
Amazon Web Services