Amazon S3 Tutorial

Amazon S3 Overview: Amazon S3 is object storage built to store and retrieve any amount of data from anywhere. It’s a simple storage service that offers industry-leading durability, availability, performance, security, and virtually unlimited scalability at very low cost.

• Amazon S3 is one of the main building blocks of AWS
• It’s advertised as “infinitely scaling” storage
• Many websites use Amazon S3 as a backbone
• Many AWS services use Amazon S3 as an integration as well

AWS S3 Use cases

• Backup and storage
• Disaster Recovery
• Archive
• Hybrid Cloud storage
• Application hosting
• Media hosting
• Data lakes & big data analytics
• Software delivery
• Static website

Amazon S3 Buckets

A bucket is a container for objects stored in Amazon S3, and every object is contained in a bucket; for example, an object might be named photos/puppy. You can store any number of objects in a bucket and, by default, have up to 100 buckets in your account. To request an increase, visit the Service Quotas console, where a company can centrally request and track service limit increases.


• Amazon S3 allows people to store objects (files) in “buckets” (directories). Buckets must have a globally unique name (across all Regions and all accounts), even though each bucket is defined at the Region level.
• S3 looks like a global service, but buckets are created in a region
• Naming convention
• No uppercase
• No underscore
• 3-63 characters long
• Not an IP
• Must start with a lowercase letter or number
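The naming rules above can be checked mechanically. The following is a minimal sketch, assuming a hypothetical helper named is_valid_bucket_name; it covers only the rules listed here, plus the assumption that names consist of lowercase letters, digits, hyphens, and dots. The full Amazon S3 rules contain further restrictions that this sketch does not cover.

```python
import ipaddress
import re

# First character: lowercase letter or digit. Then 2 to 62 more characters,
# which gives a total length of 3 to 63. Uppercase and "_" never match.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{2,62}")


def is_valid_bucket_name(name: str) -> bool:
    """Check a candidate name against the rules listed above (hypothetical helper)."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    try:
        # A name that parses as an IPv4 address is rejected.
        ipaddress.IPv4Address(name)
    except ValueError:
        return True
    return False
```

For instance, "my-bucket" passes, while "My-Bucket", "my_bucket", "ab", and "192.168.1.1" are all rejected.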

S3 Objects

An object is a file and any metadata that describes the file. A bucket is a container for objects. To store your data in Amazon S3, you first create a bucket and specify a bucket name and AWS Region. Then, you upload your data to that bucket as objects in Amazon S3.

• Objects (files) have a key
• The key is the FULL path of the object within the bucket:
• s3://my-bucket/my_file.txt (key: my_file.txt)
• s3://my-bucket/my_folder1/another_folder/my_file.txt (key: my_folder1/another_folder/my_file.txt)
• The key is composed of prefix + object name
• s3://my-bucket/my_folder1/another_folder/my_file.txt (prefix: my_folder1/another_folder/, object name: my_file.txt)
• There’s no concept of “directories” within buckets (although the UI will trick you into thinking otherwise)
• Just keys with very long names that contain slashes (“/”)
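To make the "prefix + object name" idea concrete, here is a small sketch that splits an s3:// URI into its bucket, prefix, and object name using plain string operations. The function name split_s3_uri is hypothetical and is not part of any AWS SDK.

```python
def split_s3_uri(uri: str) -> tuple[str, str, str]:
    """Split an s3:// URI into (bucket, prefix, object_name)."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an S3 URI: {uri!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = uri[len(scheme):].partition("/")
    # The prefix is everything up to and including the last "/" in the key.
    prefix, slash, name = key.rpartition("/")
    return bucket, prefix + slash, name


print(split_s3_uri("s3://my-bucket/my_folder1/another_folder/my_file.txt"))
# → ('my-bucket', 'my_folder1/another_folder/', 'my_file.txt')
```

A key with no slash, such as my_file.txt, simply has an empty prefix; no directory is created or needed.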

• Object values are the content of the body:
• Max object size is 5 TB (about 5,000 GB)
• If uploading more than 5 GB, you must use “multipart upload”
• Metadata (list of text key/value pairs – system or user metadata)
• Tags (Unicode key/value pair – up to 10) – useful for security/lifecycle

• Version ID (if versioning is enabled)
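The 5 GB threshold above lends itself to a quick calculation. The sketch below decides whether an upload needs multipart and how many parts it would take. The function name plan_upload and the 100 MiB default part size are illustrative assumptions, and the 10,000-part cap is the documented S3 multipart limit as best understood here; verify it against the current S3 documentation before relying on it.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
SINGLE_UPLOAD_LIMIT = 5 * GIB   # threshold from the list above
MAX_PARTS = 10_000              # assumption: S3 multipart part-count limit


def plan_upload(size_bytes: int, part_size: int = 100 * MIB) -> dict:
    """Return a simple upload plan for an object of the given size."""
    if size_bytes <= SINGLE_UPLOAD_LIMIT:
        return {"multipart": False, "parts": 1}
    # Grow the part size if needed so the part count stays within MAX_PARTS.
    min_part_size = -(-size_bytes // MAX_PARTS)  # ceiling division
    part_size = max(part_size, min_part_size)
    parts = -(-size_bytes // part_size)
    return {"multipart": True, "parts": parts, "part_size": part_size}


print(plan_upload(10 * GIB)["parts"])  # → 103
```

In practice the AWS SDKs and CLI handle this splitting automatically; the sketch only shows the arithmetic behind it.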

AWS S3 Security and Bucket Policy

A bucket policy is a resource-based AWS Identity and Access Management (IAM) policy. You add a bucket policy to a bucket to grant other AWS accounts or IAM users access permissions for the bucket and the objects in it. Object permissions apply only to the objects that the bucket owner creates.

There are a few ways to handle security with Amazon S3. You can use IAM policies, which specify the API calls a given user is allowed to make. In addition, there is resource-based security: bucket policies, object access control lists, and bucket access control lists. An IAM principal can access an object in S3 if the user’s IAM permissions allow it or if the resource policy allows it, provided there is no explicit deny. Amazon S3 objects can also be encrypted using encryption keys if desired.
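The access rule in the paragraph above (allowed by IAM or by the resource policy, and no explicit deny) can be sketched as a few lines of logic. This is a deliberately simplified model for a single account; real policy evaluation involves more inputs (for example cross-account rules and organization-level controls) that are ignored here. The function name is_allowed is hypothetical.

```python
def is_allowed(iam_effects: list[str], resource_effects: list[str]) -> bool:
    """Decide access from the effects of all statements matching a request.

    Each list holds "Allow" or "Deny" strings taken from the IAM policy
    statements and the resource (bucket) policy statements that apply.
    """
    effects = iam_effects + resource_effects
    if "Deny" in effects:
        # An explicit deny always wins.
        return False
    # Otherwise at least one policy must explicitly allow the request.
    return "Allow" in effects


print(is_allowed(["Allow"], []))        # → True
print(is_allowed([], ["Allow"]))        # → True
print(is_allowed(["Allow"], ["Deny"]))  # → False
print(is_allowed([], []))               # → False
```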

Example Bucket Policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AddCannedAcl",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::111122223333:root",
                    "arn:aws:iam::444455556666:root"
                ]
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl"
            ],
            "Resource": "arn:aws:s3:::DOC-EXAMPLE-BUCKET/*",
            "Condition": {
                "StringEquals": {
                    "s3:x-amz-acl": [
                        "public-read"
                    ]
                }
            }
        }
    ]
}

• JSON-based policies
• Resources: buckets and objects
• Actions: the set of API operations to allow or deny
• Effect: Allow / Deny
• Principal: The account or user to apply the policy to
• Use an S3 bucket policy to:
• Grant public access to the bucket
• Force objects to be encrypted at upload
• Grant access to another account (Cross Account)
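As an illustration of the "force objects to be encrypted at upload" use case above, the sketch below builds a policy that denies any PutObject request arriving without a server-side encryption header. It follows a commonly documented pattern, but treat it as an assumption to verify against current AWS guidance; the function name and the Sid are made up for this example, and DOC-EXAMPLE-BUCKET is the same placeholder used in the sample policy.

```python
import json


def deny_unencrypted_uploads_policy(bucket: str) -> dict:
    """Build a bucket policy that rejects uploads lacking an encryption header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",  # hypothetical statement ID
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                # "Null": "true" matches requests where the header is absent.
                "Condition": {
                    "Null": {"s3:x-amz-server-side-encryption": "true"}
                },
            }
        ],
    }


print(json.dumps(deny_unencrypted_uploads_policy("DOC-EXAMPLE-BUCKET"), indent=4))
```

The printed JSON has the same shape as the example policy above and could be pasted into the bucket policy editor after review.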

If you need to restrict access to objects to meet compliance obligations when storing them in Amazon S3, you can tag the objects in the bucket and use network ACLs.

Amazon S3 Websites

You can configure an Amazon S3 bucket to function like a website and host static content directly from it.

• The website URL will be:
• bucket-name.s3-website-AWS-region.amazonaws.com OR
• bucket-name.s3-website.AWS-region.amazonaws.com
• If you get a 403 (Forbidden) error, ensure the bucket policy allows public reads!
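The two URL shapes and the 403 hint above can be sketched as follows. The helper names are hypothetical. Which separator a Region uses (dash or dot) has to be looked up in the AWS documentation, and a public-read policy only takes effect if the bucket's public access block settings permit it.

```python
import json


def website_endpoint(bucket: str, region: str, dotted: bool = False) -> str:
    """Build the website URL; `dotted` picks the s3-website.REGION form."""
    separator = "." if dotted else "-"
    return f"http://{bucket}.s3-website{separator}{region}.amazonaws.com"


def public_read_policy(bucket: str) -> str:
    """Return a bucket policy (as JSON) that lets anyone read every object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",  # illustrative statement ID
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=4)


print(website_endpoint("bucket-name", "us-east-1"))
# → http://bucket-name.s3-website-us-east-1.amazonaws.com
```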

AWS S3 Versioning

Versioning in Amazon S3 is a means of keeping multiple variants of an object in the same bucket. You can use the S3 Versioning feature to preserve, retrieve, and restore every version of every object stored in your buckets.

• You can version your files in Amazon S3
• It is enabled at the bucket level
• Overwriting the same key creates a new “version”: 1, 2, 3, and so on
• It is best practice to version your buckets
• Protects against unintended deletes (ability to restore a version)
• Easy rollback to the previous version

S3 Versioning matters whenever a company must be able to recover data that is accidentally overwritten or deleted, for example when launching an application in the AWS Cloud that uses Amazon S3 storage.
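To show why versioning protects against overwrites and deletes, here is a toy in-memory model of a versioned bucket. It is not the AWS API: the class name is invented, version IDs are simple integers for readability (real S3 version IDs are opaque strings), and a delete is modeled as adding a marker rather than erasing data.

```python
import itertools


class VersionedBucket:
    """Toy model of a bucket with versioning enabled (illustration only)."""

    _DELETE_MARKER = object()

    def __init__(self):
        self._versions = {}              # key -> list of (version_id, body)
        self._ids = itertools.count(1)   # 1, 2, 3, ...

    def put(self, key, body):
        """Store a new version; older versions are kept."""
        version_id = next(self._ids)
        self._versions.setdefault(key, []).append((version_id, body))
        return version_id

    def delete(self, key):
        """A plain delete only adds a delete marker on top of the history."""
        return self.put(key, self._DELETE_MARKER)

    def get(self, key, version_id=None):
        """Return the latest version, or a specific one if version_id is given."""
        versions = self._versions.get(key, [])
        if version_id is None:
            if not versions or versions[-1][1] is self._DELETE_MARKER:
                raise KeyError(key)
            return versions[-1][1]
        for vid, body in versions:
            if vid == version_id and body is not self._DELETE_MARKER:
                return body
        raise KeyError((key, version_id))


bucket = VersionedBucket()
first = bucket.put("report.txt", "draft")
bucket.put("report.txt", "final")
bucket.delete("report.txt")
print(bucket.get("report.txt", version_id=first))  # → draft
```

Even after the overwrite and the delete, the first version is still retrievable by its ID, which is the property the bullets above describe.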

Amazon S3 Server Access Logging

Server access logging provides detailed records for requests made to an Amazon S3 bucket. Server access logs are helpful for many applications. For example, access log information can benefit security and access audits.

• For audit purposes, you may want to log all access to S3 buckets
• Any request made to S3 from any account, authorized or denied, will be logged into another S3 bucket
• That data can be analyzed using data analysis tools
• Very helpful for identifying an issue’s root cause, auditing usage, spotting suspicious patterns, etc.

AWS S3 Replication

Amazon Simple Storage Service (S3) Replication is an elastic, fully managed, low-cost feature that replicates objects between buckets. S3 Replication gives you the controls you need to meet your data sovereignty and other business needs.

• Must enable versioning on both the source and destination buckets
• Cross Region Replication (CRR)
• Same Region Replication (SRR)
• Buckets can be in different accounts
• Copying is asynchronous
• Must give proper IAM permissions to S3

Amazon S3 Storage Classes

S3 storage classes are purpose-built to provide the lowest cost storage for different access patterns. S3 storage classes are ideal for any use case, including those with demanding performance needs, data residency requirements, unknown or changing access patterns, or archival storage.

• Amazon S3 Standard – General Purpose
• Amazon S3 Standard-Infrequent Access (IA)
• Amazon S3 One Zone-Infrequent Access
• Amazon S3 Glacier Instant Retrieval
• Amazon S3 Glacier Flexible Retrieval
• Amazon S3 Glacier Deep Archive
• Amazon S3 Intelligent-Tiering

S3 Standard-Infrequent Access (S3 Standard-IA) is a cost-effective fit for a company that wants to store legacy data that is rarely accessed, is critical and cannot be recreated, and must still be available for retrieval within seconds.

A bank that needs to store recordings of calls made to its contact center for 6 years but be able to access them within 48 hours from the time they are requested can use Amazon S3 Glacier.

AWS S3 Glacier Vault Lock & S3 Object Lock

S3 Glacier Vault Lock helps you to quickly deploy and enforce compliance controls for individual S3 Glacier vaults with a Vault Lock policy. You can specify rules such as “write once read many” (WORM) in a Vault Lock policy and lock the policy against future edits. Amazon S3 Object Lock is an S3 feature that blocks object version deletion during a customer-defined retention period. You can enforce retention policies as an added layer of data protection or for regulatory compliance.

Glacier Vault Lock

• Adopt a WORM (Write Once Read Many) model
• Lock the policy against future edits (it can no longer be changed)
• Helpful for compliance and data retention

Object Lock

• Adopt a WORM (Write Once Read Many) model
• Block an object version deletion for a specified amount of time
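The retention idea behind Object Lock can be expressed as a simple date check. The sketch below is a conceptual model only, with hypothetical function names; it does not call S3 and ignores the different retention modes the real feature offers.

```python
from datetime import datetime, timedelta, timezone


def retain_until(created: datetime, retention_days: int) -> datetime:
    """Date until which the object version may not be deleted."""
    return created + timedelta(days=retention_days)


def can_delete_version(created: datetime, retention_days: int, now: datetime) -> bool:
    """A version becomes deletable only once its retention period has ended."""
    return now >= retain_until(created, retention_days)


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(can_delete_version(created, 30, datetime(2024, 1, 15, tzinfo=timezone.utc)))  # → False
print(can_delete_version(created, 30, datetime(2024, 2, 1, tzinfo=timezone.utc)))   # → True
```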

AWS Snow Family

The AWS Snow Family is a collection of physical devices that help migrate large amounts of data into and out of the cloud without depending on networks. This enables you to apply various AWS services for analytics, file systems, and archives to your data.

Snowball Edge Data Transfer

AWS Snowball Edge is a type of Snowball device with onboard storage and compute power for select AWS capabilities. Snowball Edge can do local processing and edge-computing workloads in addition to transferring data between your local environment and the AWS Cloud.

• Physical data transport solution: move TBs or PBs of data in or out of AWS
• Alternative to moving data over the network (and paying network fees)
• Pay per data transfer job
• Provide block storage and Amazon S3-compatible object storage
• Snowball Edge Storage Optimized
• 80 TB of HDD capacity for block volume and S3 compatible object storage
• Snowball Edge Compute Optimized
• 42 TB of HDD capacity for block volume and S3 compatible object storage
• Use cases: large data cloud migrations, data center decommissioning, disaster recovery

AWS Snowball Edge suits a scenario where a company has cargo ships at sea with sensors that collect data and intermittent or no internet connectivity: the company can collect, format, and process the data at sea and move it to AWS later. AWS Snowball can also be used to transfer 60 TB of data from an on-premises data center to AWS within 10 days when a company is migrating to Amazon S3.
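A rough calculation shows why a physical device can beat the network for the 60 TB case above. The sketch estimates how long the same transfer would take over a network link; the link speeds are assumptions chosen for illustration, and the function name is hypothetical. It uses decimal units (1 TB = 10^12 bytes) and ignores protocol overhead.

```python
def network_transfer_days(terabytes: float, megabits_per_second: float,
                          utilization: float = 1.0) -> float:
    """Estimate days needed to move `terabytes` over a link of the given speed."""
    bits = terabytes * 1e12 * 8
    usable_bits_per_second = megabits_per_second * 1e6 * utilization
    seconds = bits / usable_bits_per_second
    return seconds / 86_400  # seconds per day


print(round(network_transfer_days(60, 100), 1))    # 100 Mbps link → 55.6
print(round(network_transfer_days(60, 1_000), 1))  # 1 Gbps link → 5.6
```

On a fully dedicated 100 Mbps link the transfer would take nearly two months, far beyond a 10-day window, whereas a much faster link might make the network option viable again.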

AWS Storage Gateway

AWS Storage Gateway is a hybrid cloud storage service that gives you on-premises access to unlimited cloud storage. Storage Gateway provides a standard set of storage protocols such as iSCSI, SMB, and NFS, which allow you to use AWS storage without rewriting your existing applications.

• AWS is pushing for a “hybrid cloud”
• Part of your infrastructure is on-premises, and part of your infrastructure is in the cloud
• The push for hybrid cloud comes from:
• Long cloud migrations
• Security requirements
• Compliance requirements and IT strategy
• S3 is a proprietary storage technology (unlike EFS / NFS), so how do you expose the S3 data on-premises? You do this with AWS Storage Gateway.

AWS Storage Gateway can be used to extend backup capacity to the AWS Cloud when a physical tape library used for data backups is running out of space.

Learn More About AWS S3

Amazon S3 is an object storage service that provides high-level performance, security, scalability, and data availability. This tutorial has described some of the core functionality of Amazon S3.