AWS Simple Storage Service From 10,000 Feet


The Simple Storage Service (S3) is a secure, durable, highly scalable, and highly available object store from AWS. Object storage means we can store files (pictures, documents, videos, or any other type), but we cannot use it to install software or operating systems, which require block-based storage. S3 provides an easy-to-use web service interface to store and retrieve any amount of data from anywhere.

S3 is a cloud-based service, and AWS stores the data in geographically distributed data centers. S3 does not impose a limit on available space, but it charges per GB of storage used. A single uploaded file can be at most 5 TB. Files are stored in a Bucket, which acts as a kind of "root directory" in a cloud drive: it can contain sub-directories and has a unique DNS record, i.e. every bucket has a URL associated with it. The URL looks like this: https://s3-(region_name).amazonaws.com/(bucket_name)/ . When a file is uploaded via its URL, AWS returns HTTP code 200 if the action succeeds. S3 was designed to provide 99.99% availability, but Amazon actually guarantees 99.9% availability in its SLA. Similarly, S3 is designed for 99.999999999% (eleven 9's) of data durability.
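The path-style URL format above can be sketched as a small helper. This is an illustration only; the region and bucket names below are made-up examples, not real AWS resources.

```python
# Toy helper illustrating the path-style bucket URL format described above.
def bucket_url(region: str, bucket: str) -> str:
    """Return the path-style S3 endpoint URL for a bucket."""
    return f"https://s3-{region}.amazonaws.com/{bucket}/"

print(bucket_url("eu-west-1", "my-example-bucket"))
# https://s3-eu-west-1.amazonaws.com/my-example-bucket/
```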

[Image: S3 from 10,000 feet overview]

Data Consistency Model: AWS guarantees read-after-write consistency for new objects, but provides eventual consistency for updates and deletes. This means that a read issued after a new object is written is guaranteed to return the data correctly. An update to an existing object, however, may take some time to propagate across availability zones (AZs). If a read is performed immediately after an update, the content returned may be stale, because the read may execute in an AZ where the content has not yet been updated.
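The two behaviours can be illustrated with a toy in-memory model. This is a simulation for intuition only, not AWS code: two "zones" hold replicas of a bucket; new writes land in all zones at once, while updates propagate lazily.

```python
# Toy model of the consistency behaviour described above (illustration only).
class ToyStore:
    def __init__(self):
        self.zones = [{}, {}]   # two availability-zone replicas
        self.pending = []       # updates not yet propagated to zone 1

    def put_new(self, key, value):
        # New-object writes are read-after-write consistent: visible everywhere.
        for zone in self.zones:
            zone[key] = value

    def update(self, key, value):
        # Updates are eventually consistent: applied to zone 0 immediately,
        # queued for zone 1 until propagation runs.
        self.zones[0][key] = value
        self.pending.append((key, value))

    def propagate(self):
        for key, value in self.pending:
            self.zones[1][key] = value
        self.pending.clear()

    def read(self, key, zone=1):
        return self.zones[zone].get(key)

store = ToyStore()
store.put_new("photo.jpg", "v1")
assert store.read("photo.jpg") == "v1"   # new write: immediately visible
store.update("photo.jpg", "v2")
print(store.read("photo.jpg"))           # still "v1" here: a stale read
store.propagate()
assert store.read("photo.jpg") == "v2"   # consistent once propagated
```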

S3 Object Mechanics: Data in an S3 bucket is stored as a key/value store. The key is the name of the file and the value is the content of the file. Along with the key/value pair, the Version ID, metadata, and ACLs are also stored. When versioning is enabled on a bucket, each object can be overwritten multiple times and we can still access the older versions. The metadata stores data about the content of the file, e.g. tags. The security features of an S3 bucket include configuration for privacy, permissions, policies, and data encryption. The Access Control Lists store permissions about the objects we store and who can access them.
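These pieces can be sketched as the keyword arguments a boto3 `put_object` call would take. The bucket and key names below are hypothetical, and the actual API call is commented out since it requires AWS credentials.

```python
# A sketch of what is stored with each S3 object, expressed as put_object
# parameters. Bucket/key/metadata values are made-up examples.
put_kwargs = {
    "Bucket": "my-example-bucket",
    "Key": "photos/cat.jpg",           # the key: the object's name
    "Body": b"...image bytes...",      # the value: the file's content
    "Metadata": {"camera": "dslr"},    # user-defined metadata
    "Tagging": "project=demo",         # tags, as URL-encoded key=value pairs
    "ACL": "private",                  # access control shorthand
}

# import boto3
# boto3.client("s3").put_object(**put_kwargs)
```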

Versioning of objects stored in a bucket allows S3 to keep copies of objects as they change. Versioning can only be configured at the bucket level, not at the object level. Once versioning is enabled, it cannot be disabled, but it can be suspended (paused). When suspended, a change to an object will not generate a new version, but previous versions will be kept and not deleted. When versioning is enabled, all changes, including delete operations, generate a new version of the object (in the case of delete, a new "delete marker" object becomes the latest version). Every version of an object has its own owner and permissions. Delete operations on versioned buckets can be protected by MFA validation, in which case a unique code generated by an MFA device must be provided before the deletion can actually happen. This article discusses versioning in greater detail.
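Because versioning is a bucket-level switch, it is toggled with a small configuration payload. The two payloads below match the only two states the S3 `PutBucketVersioning` API accepts; the bucket name is a placeholder and the call itself is commented out since it needs credentials.

```python
# The two possible versioning states, as PutBucketVersioning payloads.
# Note there is no "Disabled" payload: once enabled, versioning can only
# be suspended, never fully turned off again.
enable_versioning = {"Status": "Enabled"}
suspend_versioning = {"Status": "Suspended"}

# import boto3
# boto3.client("s3").put_bucket_versioning(
#     Bucket="my-example-bucket",            # placeholder bucket name
#     VersioningConfiguration=enable_versioning,
# )
```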

Cross Region Replication (CRR) is an S3 feature that is activated at the bucket level by adding a replication configuration to the source bucket. Every bucket belongs to a specific region; buckets are not global. Within a region, objects are automatically replicated across three or more availability zones. CRR allows users to replicate an entire source bucket to a destination bucket in another region, even between different AWS accounts. Users can select a subset of objects (rather than all objects) for replication, e.g. a key-name prefix of "Documents/" configures replication of only the objects in the Documents folder within the bucket. If CRR is configured after objects are already created in the source bucket, then only new objects are replicated, not the existing ones. Along with the objects themselves, CRR also replicates object metadata, tags, and ACLs. When an object's latest version is deleted from the source and a delete marker is added, CRR replicates the delete marker to the destination bucket. However, an undelete (removal of the delete marker) is not replicated. To set up CRR, versioning must be enabled on both the source and destination buckets. For more detailed reading on CRR, check out this article.
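A replication configuration matching the "Documents/" example above can be sketched as follows. The bucket names and IAM role ARN are placeholders; the `put_bucket_replication` call is commented out since it needs real buckets with versioning already enabled.

```python
# A sketch of a CRR configuration: replicate only keys under "Documents/"
# to a destination bucket. All names/ARNs below are made-up placeholders.
replication_config = {
    "Role": "arn:aws:iam::123456789012:role/replication-role",  # placeholder
    "Rules": [
        {
            "ID": "replicate-documents",
            "Status": "Enabled",
            "Prefix": "Documents/",   # only this key prefix is replicated
            "Destination": {
                "Bucket": "arn:aws:s3:::my-destination-bucket",
            },
        }
    ],
}

# import boto3
# boto3.client("s3").put_bucket_replication(
#     Bucket="my-source-bucket",
#     ReplicationConfiguration=replication_config,
# )
```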

Life Cycle Management (LCM) of S3 objects is a way to define a set of rules that manage the storage classes of objects in a cost-effective manner and/or automatically expire them when they reach a certain age. As discussed in this article, S3 offers different storage class options (with different cost models and different SLAs) for the objects we store in S3. These storage classes are Standard, IA (Infrequent Access), One Zone-IA, and Glacier (with Glacier promoted as a separate product line). Standard, which offers the strongest durability and availability SLAs, is also the most expensive.

With LCM, we can define rules for when to transition an object from the Standard storage class to IA to Glacier, and eventually when to expire (and delete) the object. The 'Transition' action of LCM sets rules to move objects from Standard to IA at a given object age, e.g. 30 days, and then from IA to Glacier when the age increases further, e.g. 61 days. The 'Expiration' action works similarly, but instead allows the user to configure automatic deletion of the object based on its age.
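The transition and expiration rules just described can be sketched as an S3 lifecycle configuration. The bucket name, rule ID, and key prefix are placeholders; the `put_bucket_lifecycle_configuration` call is commented out since it needs credentials.

```python
# A sketch of an LCM rule: move objects to Standard-IA at 30 days,
# to Glacier at 61 days, and expire (delete) them at 365 days.
# Names below are made-up examples.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},   # rule targets this key prefix
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 61, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ],
}

# import boto3
# boto3.client("s3").put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )
```

Note the 31-day gap between the two transitions: as described below, an object must sit in a storage class for at least 30 days before the next transition can fire.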

Transition and expiration rules trigger based on the age of an object (or of its versions) or on a specific date. When a date is specified, it must be in ISO 8601 format and in UTC. The following transitions of objects within a bucket are possible:

Image courtesy: Amazon S3 Documentation

Objects that are smaller than 128 KB, or younger than 30 days, cannot be transitioned to the next storage class. When a single rule defines transitions across more than one storage class, the object must stay in each storage class for a minimum of 30 days before the next transition can happen. Transition rules for current and non-current versions can be specified separately.

S3 objects that are transitioned to the Glacier storage class are still S3 objects and are accessible only through S3, not through the Amazon Glacier service. Expiration rules can expire objects before their minimum storage duration (e.g. 30 days), but AWS will still charge for the minimum storage days. LCM configuration on MFA-protected buckets is not currently supported. An LCM configuration can have up to 1,000 rules, enabled and disabled combined. Rules can target specific objects by filtering on tags or key prefixes. For transition rules on non-current versions of an object, the number of days the object has been non-current must meet the minimum-days requirement.
