Cost-effective Backups with AWS S3
AWS S3
The Amazon Simple Storage Service (AWS S3) is a cloud object storage service that can store any amount of data in a scalable and secure way. We use AWS S3 in almost every cloud project, for example to store application data, keep backups, or serve front-end applications via AWS CloudFront. S3 is easy to integrate, even into non-AWS systems.
Storage Classes
In simple terms, you create so-called “buckets” in which objects (files) are stored. One of the following storage classes can be selected per object (a minimal upload sketch follows the table):
| | S3 Standard | S3 Standard-IA | S3 One Zone-IA | S3 Intelligent-Tiering | S3 Glacier | S3 Glacier Deep Archive |
|---|---|---|---|---|---|---|
| Latency | low | low | low | low | minutes to hours | 12 h (Standard), 48 h (Bulk) |
| Accesses | many | few | few | variable | none | none |
| Redundancy | min. 3 AZs | min. 3 AZs | 1 AZ | min. 3 AZs | min. 3 AZs | min. 3 AZs |
| Notes | | cheaper than Standard, but retrieval fees | 20 % cheaper than Standard-IA | automatically moves objects between Standard and Standard-IA | minimum storage period: 90 days | minimum storage period: 180 days |
| Use cases | application data, analytics, and hosting | backups | secondary backups | data with unknown or unpredictable access patterns, parts of which are infrequently requested | archives, backups | long-term archiving |
AZ = Availability Zone, an isolated location within a region, e.g. Frankfurt
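To illustrate, here is a minimal upload sketch using the boto3 library for Python; the bucket name, key, and file name are hypothetical:

```python
import boto3

s3 = boto3.client("s3")

# Upload a backup and pick the storage class per object.
# Bucket, key, and file name are hypothetical.
s3.upload_file(
    Filename="db-dump.sql.gz",
    Bucket="my-backup-bucket",
    Key="backups/2024-01-01/db-dump.sql.gz",
    ExtraArgs={"StorageClass": "STANDARD_IA"},
)
```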
The special feature of Glacier and Glacier Deep Archive is that the data cannot be retrieved immediately. Before objects can be downloaded, a restore must first be requested, which makes a temporary copy of each object available for a defined period of time. The retrieval speed, and with it the cost of restoring, depends on the chosen urgency.
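A sketch of such a restore request with boto3; bucket, key, and retention period are again assumptions:

```python
import boto3

s3 = boto3.client("s3")

# Request a temporary copy of an archived object before downloading it.
s3.restore_object(
    Bucket="my-backup-bucket",
    Key="backups/2023-01-01/db-dump.sql.gz",
    RestoreRequest={
        "Days": 7,  # how long the restored copy stays available
        "GlacierJobParameters": {"Tier": "Standard"},  # or "Expedited"/"Bulk"
    },
)
```

Once the restore job has completed, the object can be downloaded as usual; the chosen tier determines how long the restore takes and what it costs.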
Lifecycles
Lifecycle rules can define automatic transitions between storage classes as well as automatic expiry (deletion). For example, backups could be uploaded to the Standard storage class by default, where they are available immediately at all times. A lifecycle rule can then move them automatically to Glacier after 30 days. In Glacier, the monthly storage costs are significantly lower, but the objects are no longer immediately accessible and must first be restored.
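A sketch of such a lifecycle configuration with boto3, assuming a hypothetical bucket and a "backups/" prefix; the durations are the examples from the text:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-backup-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "backup-lifecycle",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                # Move backups to Glacier after 30 days...
                "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
                # ...and delete them after one year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```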
With an automatic expiry, objects are removed after the specified duration; for example, backups could be deleted after one year. If you define such automatic deletion, you should also monitor your backups automatically: S3 itself does not check whether new backups are uploaded regularly or whether they can be restored at all.
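Since S3 does not check backup freshness, a small scheduled check can help. A sketch assuming the same hypothetical bucket and prefix, e.g. run daily via cron:

```python
import datetime
import boto3

s3 = boto3.client("s3")

# List the backups under the prefix (pagination omitted for brevity).
response = s3.list_objects_v2(Bucket="my-backup-bucket", Prefix="backups/")
objects = response.get("Contents", [])
if not objects:
    raise SystemExit("No backups found - check the backup job!")

# Warn if the newest backup is older than expected for a daily job.
newest = max(obj["LastModified"] for obj in objects)
age = datetime.datetime.now(datetime.timezone.utc) - newest
if age > datetime.timedelta(hours=25):
    raise SystemExit(f"Newest backup is {age} old - check the backup job!")
```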
Access to AWS S3
Access to AWS S3 is convenient: the AWS CLI can be used in shell scripts to upload or synchronize files and directories from various operating systems, and SDK libraries for many programming languages make it easy to use AWS services from applications.
Access requires permissions, which can be defined very granularly. For example, a backup application may be allowed to store objects only in limited directories (bucket and object prefix), without any read or delete permissions. Additional conditions, such as the client's IP address, can be attached to the permissions.
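A sketch of such a write-only permission, attached to a hypothetical IAM user with boto3; bucket, prefix, and IP range are assumptions:

```python
import json
import boto3

iam = boto3.client("iam")

# Write-only: the user may create objects under one prefix, and only
# from one IP range; no read or delete permissions are granted.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::my-backup-bucket/backups/*",
            "Condition": {"IpAddress": {"aws:SourceIp": "203.0.113.0/24"}},
        }
    ],
}

iam.put_user_policy(
    UserName="backup-user",
    PolicyName="backup-write-only",
    PolicyDocument=json.dumps(policy),
)
```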
For AWS services and machines such as EC2 instances, permissions can often be assigned directly via roles. For everything else, IAM users can be created whose credentials are stored as an access key and secret key in the configuration, e.g. of the AWS CLI.
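A sketch of both variants with boto3; the profile name is hypothetical:

```python
import boto3

# On AWS (e.g. an EC2 instance with an attached role), the default client
# picks up the role's temporary credentials automatically:
s3 = boto3.client("s3")

# Elsewhere, the access key and secret key can come from a named profile
# in the AWS CLI configuration (~/.aws/credentials):
session = boto3.Session(profile_name="backup")
s3_from_profile = session.client("s3")
```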
Advantages
One of the biggest advantages of AWS S3 for backups is that objects can be stored in a scalable and redundant way with little effort. Elsewhere, this (geographical) redundancy is often expensive or neglected entirely. You only pay for what you actually use: if you keep a few hundred gigabytes of backups, the monthly bill is in the range of a few cents to a few euros. Automation such as lifecycle rules or event-based programming can make your backups smarter.
Disadvantages
As with many AWS services, the pricing model of AWS S3 is complex due to the many usage options. When estimating your operating costs, you need to consider not only the storage space used but also, for example, the number of operations and data transfers. AWS provides a Pricing Calculator for more accurate estimates. In our experience, storing and uploading backups is very inexpensive. However, if you regularly download several terabytes of backups, you should estimate the outbound traffic from AWS beforehand using the Pricing Calculator.
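A back-of-the-envelope sketch with purely illustrative prices; actual prices vary by region and change over time, so check the Pricing Calculator for current numbers:

```python
# Illustrative prices only - actual prices vary by region and change over time.
storage_gb = 300                  # backups kept in Glacier
glacier_price = 0.004             # ~USD per GB-month (illustrative)
traffic_price = 0.09              # ~USD per GB outbound traffic (illustrative)

monthly_storage = storage_gb * glacier_price     # ~1.20 USD/month
restore_download = 5 * 1024 * traffic_price      # 5 TB download: ~460 USD

print(f"Storage: {monthly_storage:.2f} USD/month")
print(f"5 TB restore download: {restore_download:.2f} USD")
```

As the sketch suggests, storage itself is cheap; it is large, regular downloads that dominate the bill.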
Conclusion
AWS S3 can be used in many scenarios to store files in a scalable and secure way with minimal development effort. The service is billed per use, so even small use cases can be operated for a few cents in operating costs. Depending on the use case, objects can be stored in different storage classes, which differ, for example, in access speed and operating costs. In addition, automation can move objects to other storage classes, remove them, or trigger event-based programs. Permissions can be granted at a fine-grained level, e.g. to applications or AWS services.
Because of the versatile usage options, operating costs should be estimated in advance, e.g. with the Pricing Calculator. This complexity makes it harder for beginners to get started, but S3 can also be used easily in non-AWS environments, e.g. for backing up on-premises applications.
How do you store backups in the company and do you use cloud services for this?