AWS Database Services

A relational database is a collection of data items with pre-defined relationships. These items are organized as a set of tables with columns and rows. Tables hold information about the objects represented in the database. Each column in a table holds a certain kind of data, and a field stores the actual value of an attribute. Each row in a table represents a collection of related values for one object or entity. Each row can be marked with a unique identifier called a primary key, and rows in different tables can be related using foreign keys. This data can be accessed in many different ways without reorganizing the tables.
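The ideas above (tables, primary keys, foreign keys, and querying across tables) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `customers` and `orders` tables and their contents are hypothetical, and SQLite stands in for whichever relational engine you would actually run.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id),"
        " total REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO customers (id, name) VALUES (?, ?)",
        [(1, "Ana"), (2, "Raj")],
    )
    conn.executemany(
        "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
        [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5)],
    )
    conn.commit()
    return conn


def totals_by_customer(conn):
    """Join the two tables through the foreign key and sum each customer's orders."""
    cursor = conn.execute(
        "SELECT c.name, SUM(o.total) FROM customers c"
        " JOIN orders o ON o.customer_id = c.id"
        " GROUP BY c.id ORDER BY c.id"
    )
    return cursor.fetchall()
```

Calling `totals_by_customer(build_demo_db())` returns one row per customer. Inserting an order whose `customer_id` does not exist in `customers` raises `sqlite3.IntegrityError`, which is the foreign key doing its job.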


Amazon Relational Database Service (Amazon RDS) is a collection of managed services that makes it simple to set up, operate, and scale databases in the cloud. When a workload runs in Amazon RDS, installing the database engine is an AWS responsibility, and AWS also manages the underlying operating system. A company that wants to host its MySQL databases on AWS while keeping full control over the operating system, database installation, and configuration should run them on Amazon EC2 instead.

• RDS stands for Relational Database Service
• It’s a managed database service for databases that use SQL as a query language.
• It allows you to create databases in the cloud that are managed by AWS:
• PostgreSQL – when a company needs to deploy a PostgreSQL database into Amazon RDS, a Multi-AZ deployment makes the database highly available and fault tolerant.
• MySQL
• MariaDB
• Oracle
• Microsoft SQL Server
• Aurora (AWS Proprietary database)

• Automated provisioning, OS patching
• Continuous backups and restore to a specific timestamp (Point in Time Restore)
• Monitoring dashboards
• Read replicas for improved read performance
• Multi-AZ setup for DR (Disaster Recovery)
• Maintenance windows for upgrades
• Scaling capability (vertical and horizontal)
• Storage backed by EBS (gp2 or io1)
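Several of the features listed above map onto settings you choose when creating an instance. The sketch below builds a dictionary whose keys follow the parameter names of boto3's RDS `create_db_instance` call; the helper name `rds_instance_params` and every value in it are assumptions chosen for illustration, and credentials and networking settings are deliberately left out.

```python
def rds_instance_params(identifier, engine="postgres", multi_az=True, backup_days=7):
    """Build keyword arguments shaped like boto3's RDS create_db_instance call.

    Hypothetical helper: all values are illustrative, not recommendations.
    """
    if not 0 <= backup_days <= 35:
        raise ValueError("backup retention must be between 0 and 35 days")
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,
        "DBInstanceClass": "db.t3.micro",      # vertical scaling means choosing a larger class
        "AllocatedStorage": 20,                # storage size in GiB
        "StorageType": "gp2",                  # EBS-backed storage
        "MultiAZ": multi_az,                   # keep a standby in another Availability Zone
        "BackupRetentionPeriod": backup_days,  # a value above 0 enables automated backups
        "PreferredMaintenanceWindow": "sun:05:00-sun:06:00",
        "StorageEncrypted": True,              # encryption at rest is the customer's choice
    }
```

With boto3 installed and AWS credentials configured, a dictionary like this (plus master user settings) could be passed as `boto3.client("rds").create_db_instance(**params)`. That call creates billable resources, so it is not part of the sketch.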

Under the AWS shared responsibility model, the customer remains responsible for managing connections to the database and designing encryption-at-rest strategies when using Amazon RDS to host a database.

Amazon Aurora (SQL)

Amazon Aurora is a relational database management system (RDBMS) built for the cloud with full MySQL and PostgreSQL compatibility. Aurora gives you the performance and availability of commercial-grade databases at one-tenth the cost.

• Aurora is a proprietary technology from AWS (not open sourced)
• PostgreSQL and MySQL are both supported as Aurora DB
• Aurora is “AWS cloud-optimized” and claims 5x performance improvement over MySQL on RDS, over 3x the performance of Postgres on RDS
• Aurora storage automatically grows in increments of 10GB, up to 64 TB.
• Aurora costs more than RDS (20% more) – but is more efficient
• Not in the free tier

Multi-Availability Zones, Read Replicas, and Multi-Region

Amazon Aurora and Amazon DynamoDB both replicate data across multiple Availability Zones, which makes either one a good fit for a retail company that needs to build a highly available architecture for a new ecommerce platform using only AWS services.

Amazon ElastiCache (in-memory database)

Amazon ElastiCache is a fully managed in-memory data store and cache service by Amazon Web Services (AWS). The service improves the performance of web applications by retrieving information from managed in-memory caches, instead of relying entirely on slower disk-based databases.

• Just as RDS provides managed relational databases, ElastiCache provides managed Redis or Memcached
• Caches are in-memory databases with high performance, low latency
• Helps take load off databases for read-intensive workloads
• AWS takes care of OS maintenance/patching, optimizations, setup, configuration, monitoring, failure recovery, and backups
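A common way to take read load off a database is the cache-aside pattern: look in the cache first and only query the database on a miss. The sketch below is a minimal illustration in which a plain dictionary stands in for Redis or Memcached; the `CacheAside` class and the `load_from_db` callback are hypothetical names, not part of any ElastiCache API.

```python
import time


class CacheAside:
    """Cache-aside reads with a TTL; a dict stands in for Redis or Memcached."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db      # function that reads from the slower database
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}             # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            self.hits += 1             # served from memory, database untouched
            return entry[0]
        self.misses += 1
        value = self._load(key)        # fall back to the database on a miss or expiry
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a cached entry, for example after the database row changes."""
        self._entries.pop(key, None)
```

The application owns the invalidation logic here: when a row changes in the database, something has to call `invalidate` (or wait for the TTL to expire), otherwise readers may see stale data.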

DynamoDB (serverless)

DynamoDB is a serverless service that automatically scales up and down to adjust for capacity and maintain performance. It also has built-in high availability and fault tolerance. AWS automatically encrypts data that is stored in Amazon DynamoDB. Amazon DynamoDB is an AWS key-value database offering consistent single-digit millisecond performance at any scale.

• Fully managed, highly available with replication across 3 AZs
• NoSQL database – not a relational database
• Scales to massive workloads, distributed “serverless” database
• Millions of requests per second, trillions of rows, 100s of TB of storage
• Fast and consistent in performance

A company developing a new Node.js application that needs a scalable NoSQL database to meet increasing demand as the application grows in popularity can use Amazon DynamoDB to meet its requirements.

• Single-digit millisecond latency – low latency retrieval
• Integrated with IAM for security, authorization, and administration
• Low cost and auto-scaling capabilities
• Standard & Infrequent Access (IA) Table Class
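As a key-value store, DynamoDB holds items made of typed attributes. Its low-level API represents each attribute as a small JSON object tagged with a type code such as `S` (string), `N` (number), or `BOOL`. The sketch below converts an ordinary Python dictionary into that shape; the helper names and the sample time-tracking record are hypothetical, and only a subset of the type codes is covered.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Convert a Python value to DynamoDB's typed JSON representation (subset)."""
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if value is None:
        return {"NULL": True}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, Decimal)):
        return {"N": str(value)}       # numbers are sent as strings
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(record):
    """Convert a whole record; one attribute would act as the table's key."""
    return {name: to_attribute_value(v) for name, v in record.items()}
```

For example, `to_item({"user_id": "u-1", "minutes": 30})` yields `{"user_id": {"S": "u-1"}, "minutes": {"N": "30"}}`. Assuming a table whose partition key is `user_id`, an item in this form is what the low-level boto3 client expects in `put_item(TableName=..., Item=...)`.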

A company developing a mobile app that needs a high-performance NoSQL database could use Amazon DocumentDB (with MongoDB compatibility) or Amazon DynamoDB. Amazon DynamoDB global tables are a good solution for a global company building a simple time-tracking mobile app that must operate globally and store collected data so that it is accessible from the AWS Region closest to the user.

DAX (cache for DynamoDB)

DynamoDB Accelerator (DAX) is suitable for heavy workloads, especially read-intensive ones. It is the in-memory cache designed for use with DynamoDB. ElastiCache, by contrast, supports both Redis and Memcached, but compared with DAX it leaves more work to you, including managing cache invalidation.

• Fully Managed in-memory cache for DynamoDB
• 10x performance improvement – single-digit millisecond latency to microseconds latency – when accessing your DynamoDB tables
• Secure, highly scalable & highly available
• Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases
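One reason a cache that sits in front of the table needs less application-side invalidation is that writes pass through it. The sketch below is a simplified model of a read-through, write-through cache, assuming that behaviour; it is not DAX's actual implementation, and the `WriteThroughCache` class with its dict-backed "table" is purely hypothetical.

```python
class WriteThroughCache:
    """Read-through / write-through cache in front of a table-like store."""

    def __init__(self, table):
        self._table = table        # any dict-like store; stands in for a DynamoDB table
        self._cache = {}
        self.table_reads = 0       # how many reads actually reached the table

    def get(self, key):
        if key in self._cache:
            return self._cache[key]        # served from memory
        self.table_reads += 1
        value = self._table.get(key)       # miss: read through to the table
        if value is not None:
            self._cache[key] = value
        return value

    def put(self, key, value):
        self._table[key] = value           # write to the table first
        self._cache[key] = value           # then refresh the cached copy
```

Because every write made through the cache updates both places, readers going through the same cache do not see a stale value and the application has no separate invalidation step to manage.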

Amazon Redshift (SQL)

Amazon Redshift is built around industry-standard SQL, with added functionality to manage very large datasets and support high-performance analysis and reporting on that data.

• Redshift is based on PostgreSQL, but it’s not used for OLTP
• A company that needs to generate reports for business intelligence and operational analytics on petabytes of semistructured and structured data can use Amazon Redshift when these reports are produced from standard SQL queries on data in an Amazon S3 data lake
• It’s OLAP – online analytical processing (analytics and data warehousing)
• Load data once every hour, not every second
• 10x better performance than other data warehouses, scales to PBs of data
• Columnar storage of data (instead of row-based)
• Massively Parallel Query Execution (MPP), highly available
• Pay as you go based on the instances provisioned
• Has a SQL interface for performing the queries
• BI tools such as Amazon QuickSight or Tableau integrate with it
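To see why columnar storage suits analytics, compare how many values an aggregate query has to touch in each layout. The sketch below is a toy model in plain Python with hypothetical order data; it says nothing about Redshift's real storage format, only about the general idea.

```python
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 50},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def sum_row_store(records, field):
    """Sum one field while counting every value a row store would read."""
    values_read = 0
    total = 0
    for record in records:
        values_read += len(record)   # the whole row is read to reach one field
        total += record[field]
    return total, values_read


def sum_column_store(columns, field):
    """Sum one field reading only that column."""
    values = columns[field]
    return sum(values), len(values)
```

Both functions return the same total, but the row-oriented version reads every field of every row while the column-oriented version reads only the `amount` column. On wide tables with billions of rows that difference dominates query time.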

If a company needs to set up a petabyte-scale data warehouse in the AWS Cloud, Amazon Redshift is a good choice.

Amazon EMR (Hadoop clusters)

Amazon EMR is a cloud-based big data platform running large-scale distributed data processing jobs, interactive SQL queries, and machine learning applications.

• EMR stands for “Elastic MapReduce.”
• EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
• The clusters can be made of hundreds of EC2 instances
• Also supports Apache Spark, HBase, Presto, and Flink.
• EMR takes care of all the provisioning and configuration
• Auto-scaling and integrated with Spot instances
• Use cases: data processing, machine learning, web indexing, big data
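The "MapReduce" in the name refers to a programming model: a map step emits key-value pairs, a shuffle step groups them by key, and a reduce step combines each group. The sketch below runs the classic word-count example in a single Python process. It only illustrates the model; on a real cluster, a framework such as Hadoop or Spark distributes these phases across many instances.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    pairs = []
    for document in documents:   # on a cluster, each document could be mapped on a different node
        pairs.extend(map_phase(document))
    return reduce_phase(shuffle(pairs))
```

Because each document is mapped independently and each key is reduced independently, the work parallelizes naturally, which is what lets clusters of hundreds of instances process very large datasets.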

Amazon Athena

Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning.

• Serverless query service to analyze data stored in Amazon S3
• Uses standard SQL language to query the files
• Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
• Pricing: $5.00 per TB of data scanned
• Use compressed or columnar data for cost savings (less scan)
• Use cases: business intelligence, analytics, reporting, and analyzing and querying VPC Flow Logs, ELB logs, CloudTrail trails, etc.
• Exam Tip: analyze data in S3 using serverless SQL, use Athena
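Because pricing is based on data scanned, the savings from compressed or columnar data are easy to estimate. The sketch below uses the $5.00 per TB figure quoted above and assumes decimal terabytes, equally sized columns, and no per-query minimum; those simplifications and the function names are assumptions for illustration, so check current pricing before relying on the numbers.

```python
PRICE_PER_TB_USD = 5.00    # figure quoted in the list above
BYTES_PER_TB = 10 ** 12    # assumption: decimal terabytes


def scan_cost_usd(bytes_scanned):
    """Cost of one query given the number of bytes it scans."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


def columnar_scan_bytes(total_bytes, columns_read, total_columns, compression_ratio=1.0):
    """Estimate bytes scanned when only some equally sized columns are read."""
    return total_bytes * columns_read / total_columns / compression_ratio
```

As a worked example, scanning a full 1 TB CSV dataset costs about $5.00 under these assumptions. If the same data were stored in a columnar format compressed 4:1 and the query read 2 of 10 columns, the scan would shrink to roughly 0.05 TB, or about $0.25.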

Amazon QuickSight

Amazon QuickSight allows everyone in your organization to understand your data by asking questions in natural language, exploring through interactive dashboards, or automatically looking for patterns and outliers powered by machine learning.

• Serverless machine learning-powered business intelligence service to create interactive dashboards
• Fast, automatically scalable, embeddable, with per-session pricing
• Use cases: business analytics, building visualizations, performing ad-hoc analysis, getting business insights using data
• Integrated with RDS, Aurora, Athena, Redshift, and S3.

Amazon QuickSight supports the creation of visual reports from AWS Cost and Usage Report data.

Amazon DocumentDB

Amazon DocumentDB is a scalable, highly durable, and fully managed database service for operating mission-critical MongoDB workloads. Document databases are a practical solution to online profiles in which different users provide different types of information. Using a document database, you can store each user’s profile efficiently by storing only the attributes that are specific to each user.

• Just as Aurora is an “AWS implementation” of PostgreSQL / MySQL, DocumentDB is the same for MongoDB (which is a NoSQL database)
• MongoDB is used to store, query, and index JSON data
• Similar “deployment concepts” as Aurora
• Fully managed, highly available with replication across 3 AZs
• DocumentDB storage automatically grows in increments of 10 GB, up to 64 TB
• Automatically scales to workloads with millions of requests per second
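The user-profile scenario described above is easy to picture with plain Python dictionaries, which map directly onto JSON documents. In the sketch below, the sample profiles and the `find` helper are hypothetical; a real application would use a MongoDB-compatible driver rather than a list in memory.

```python
profiles = [
    {"_id": "u1", "name": "Ana", "email": "ana@example.com"},
    {"_id": "u2", "name": "Raj", "languages": ["hi", "en"], "address": {"city": "Pune"}},
]


def find(documents, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [
        doc for doc in documents
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


def field_names(documents):
    """List every field that appears in at least one document."""
    return sorted({key for doc in documents for key in doc})
```

Each document carries only the attributes that apply to that user: the first profile has no `languages` or `address` field at all, rather than empty columns as a fixed relational schema would require.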

Amazon QLDB

Amazon Quantum Ledger Database (QLDB) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.

• QLDB stands for “Quantum Ledger Database”
• A ledger is a book recording financial transactions
• Fully managed, serverless, highly available, replication across 3 AZs
• Used to review the history of all the changes made to your application data over time
• Immutable system: no entry can be removed or modified; cryptographically verifiable
• 2-3x better performance than common ledger blockchain frameworks; data is manipulated using SQL
• Difference with Amazon Managed Blockchain: QLDB has no decentralization component, in accordance with financial regulation rules
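"Immutable and cryptographically verifiable" can be illustrated with a hash chain: each entry's hash covers both its own data and the previous entry's hash, so changing any past entry breaks every hash after it. The sketch below is a generic illustration of that idea using SHA-256; it is an assumption-laden toy, not a description of how QLDB is implemented internally, and the `Ledger` class is hypothetical.

```python
import hashlib
import json


class Ledger:
    """Append-only journal in which each entry's hash covers the previous hash."""

    GENESIS = "0" * 64   # placeholder "previous hash" for the first entry

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(data, prev_hash):
        payload = json.dumps({"data": data, "prev_hash": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, data):
        """Add an entry and return its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else self.GENESIS
        entry = {"data": data, "prev_hash": prev_hash, "hash": self._digest(data, prev_hash)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute the whole chain; any edited entry makes this return False."""
        prev_hash = self.GENESIS
        for entry in self._entries:
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(entry["data"], prev_hash):
                return False
            prev_hash = entry["hash"]
        return True
```

After appending a few transactions, `verify()` returns True. If someone alters the data of an earlier entry, the recomputed hash no longer matches the stored one and `verify()` returns False, which is what makes the history auditable.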

AWS Glue

AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. It is a managed AWS service used specifically to extract, transform, and load (ETL) data.

• Managed extract, transform, and load (ETL) service
• Useful for preparing and transforming data for analytics
• Fully serverless service
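The three ETL stages can be sketched in a few lines of plain Python. This is only a conceptual illustration with hypothetical sample data and function names; it does not use the AWS Glue API, where jobs are typically written as Spark or Python shell scripts that AWS runs for you.

```python
import csv
import io
import json

RAW_CSV = "order_id,amount,currency\n1, 19.99 ,usd\n2,,usd\n3,5.00,eur\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop rows with no amount, normalize types and casing."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue                     # skip incomplete rows
        cleaned.append({
            "order_id": int(record["order_id"]),
            "amount_cents": round(float(amount) * 100),
            "currency": record["currency"].strip().upper(),
        })
    return cleaned


def load(records):
    """Load: serialize as JSON Lines, ready to write to an analytics store."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)
```

Running `load(transform(extract(RAW_CSV)))` yields two JSON lines: the row with the missing amount is dropped, amounts become integer cents, and currency codes are upper-cased.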

AWS Database Migration Service (AWS DMS)

AWS Database Migration Service (AWS DMS) is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud or between combinations of cloud and on-premises setups.

• Quickly and securely migrate databases to AWS, resilient, self-healing
• The source database remains available during the migration

• Homogeneous migrations: e.g., Oracle to Oracle
• Heterogeneous migrations: e.g., Microsoft SQL Server to Aurora

AWS Consulting Partners can help a company that wants to migrate its workloads to AWS but lacks expertise in AWS Cloud computing. To migrate a company’s on-premises MySQL database to Amazon RDS, use AWS Database Migration Service (AWS DMS).

Amazon Neptune

Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. SQL queries for highly connected data are complex and challenging to tune for performance.

• Fully managed graph database
• A popular graph dataset would be a social network: users have friends, posts have comments, comments have likes from users, and users share and like posts
• Highly available with replication across 3 AZs, with up to 15 read replicas
• Build and run applications working with highly connected datasets – optimized for these complex and challenging queries
• Can store billions of relations and query the graph with millisecond latency
• Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
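Queries over highly connected data are mostly traversals: start at one node and follow relationships outward. The sketch below models a tiny, hypothetical social network as an adjacency map in plain Python and answers two typical graph questions. It illustrates the shape of the problem only; Neptune itself is queried with graph query languages rather than code like this.

```python
from collections import deque

friends = {
    "ana": {"raj", "li"},
    "raj": {"ana", "sam"},
    "li": {"ana"},
    "sam": {"raj", "kim"},
    "kim": {"sam"},
}


def friends_of_friends(graph, user):
    """People two hops away who are not already direct friends."""
    direct = graph.get(user, set())
    result = set()
    for friend in direct:
        result |= graph.get(friend, set())
    return result - direct - {user}


def degrees_of_separation(graph, start, goal):
    """Breadth-first search: edges on the shortest path, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None
```

In a relational database, each extra hop in these questions usually means another self-join, which is why such queries become complex and hard to tune. A graph database stores the relationships directly so that traversals like these stay fast.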

Amazon Neptune and Amazon DocumentDB (with MongoDB compatibility) use cloud-native storage that provides replication across multiple Availability Zones by default.

Learn More About Amazon Database Services

If an IT team currently has to patch the database and take backup snapshots of the data in its MySQL database server clusters, moving the workload to Amazon RDS with a MySQL database lets these tasks be completed automatically.