Hugo on AWS with Terraform

Background

Statically generated blogs have seen a surge in popularity over the past few years as a more cost-effective and scalable alternative to traditional publishing platforms such as WordPress. A big part of why it took me so long to actually start writing online was uncertainty about which publishing medium to use.

Ghost appealed to me for a while, but it felt like overkill for what I wanted to achieve. I spent most of the latter half of 2016 diving into the ever-expanding React ecosystem, and it was during one of those dives that I was first exposed to the idea of statically generated blogs by Gatsby, a static site generator built on the React component model. I am definitely a proponent of React, having used it extensively both in personal projects and in the professional work that has provided my income for most of the past six months. However, it didn’t take long for me to conclude that Gatsby, too, was overkill for what I was looking to achieve.

I am nonetheless thankful for my brief interlude with Gatsby, as it was then that I came across StaticGen, an incredibly useful website which lists and lets you compare almost all of the open source static site generators available today. Over the course of a month I put a number of the listed generators through their paces before ultimately settling on Hugo.

The primary factor in choosing Hugo over other static site generators was the quantity and quality of user-written tutorials, covering both starting from scratch and migrating from other static site generators or traditional publishing platforms. This mattered because Hugo’s own documentation, while copious, could stand to be significantly improved; it can feel impenetrable even once you have been using Hugo for a while. Had it not been for the various user-written blog posts scattered across the internet, I may well have written off Hugo entirely based on the quality of its documentation.

What This Post Is and Is Not

This post is not intended to show how to get up and running with Hugo; there is already a wealth of online tutorials addressing that topic. This post instead looks at the stage that comes after you have a ./public folder containing a statically generated website: getting it online and in front of people.

One of the potential advantages of a statically generated website (if a static website fits your use-case) is that it does not require running a web server. If you want a server-backed website to be available 24/7, that means paying for that server to be running 24/7. By contrast, using an object store such as Amazon’s S3 service to host your static content means the costs incurred are relative to the resources used: you only pay for what you use. This is clearly to the advantage of personal websites such as this one, which only bring in a modest amount of traffic at the best of times.

When I first set up this website on AWS, I set up everything manually through the AWS console. It was a painful and uncertain experience which required stitching together steps from a variety of online tutorials that had been published over the span of half a decade.

Since that original setup, I have refined the process of getting a static generated website online, and I now have a simple, version controlled, repeatable and reliable way of spinning up all the required AWS infrastructure for a public-facing website behind https.

This post is about using HashiCorp’s Terraform to safely and predictably automate the creation of the AWS infrastructure required to get any static website online and behind https in minutes.

Terraform

Both the website and the documentation for Terraform do a fantastic job of explaining what Terraform does and outlining some of its common use-cases. In a nutshell, Terraform lets you describe the different pieces of your cloud infrastructure, and the relationships between them, as versionable code.

For this use-case, in practical terms this means that instead of going through the AWS console to set up all the different pieces, copying and pasting ARNs between them to ensure that they are all talking to each other as they should, you describe the infrastructure you want as code, and link the pieces together through variables holding the values generated as each piece of infrastructure is created.

Dependencies

If you are running OS X or macOS, you can use Homebrew to install the required dependencies:

brew install awscli
brew install terraform
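To confirm both tools are installed and, if you have not already done so, set up the credentials file that Terraform will read later:

terraform version
aws --version
aws configure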

Prerequisites

  • You should have an AWS account, and aws-cli should be configured on your machine
  • You should have a domain name registered (this doesn’t necessarily have to be with Amazon’s Route 53)
  • You should request and receive a certificate for your domain from AWS Certificate Manager (for use with CloudFront, the certificate must be requested in the us-east-1 region); an example request is sketched below
  • Make sure that in addition to yourdomain.com, *.yourdomain.com and www.yourdomain.com are also covered by the certificate
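If you prefer the command line, a request along the following lines should work (DNS validation requires a reasonably recent awscli; older releases only supported email validation):

aws acm request-certificate \
  --domain-name yourdomain.com \
  --subject-alternative-names "*.yourdomain.com" "www.yourdomain.com" \
  --validation-method DNS \
  --region us-east-1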

Required Infrastructure

At the bare minimum, we will need to create the following:

  • An S3 bucket named after the bare url of your domain (without www. prepended)
  • An S3 bucket named after the url of your domain with www. prepended
  • A CloudFront distribution for each of the S3 buckets
  • A Route 53 Hosted Zone for your domain
  • An Alias Record in the Route 53 Hosted Zone for each of the CloudFront distributions

Suggested Infrastructure

I would personally also suggest creating a specific IAM user to deploy the website through a continuous integration and deployment platform such as Wercker. This is a particularly popular deployment method among Hugo users and has a well-established setup procedure.

In order to deploy from a platform such as Wercker, which essentially listens for git push events to a linked repository and then runs a set of actions whenever those events occur (building the website, uploading it to S3), that platform needs to be able to access resources linked to your AWS account.

It is generally considered sensible to create a specific IAM user with limited access to only the resources required to deploy your website, and to use that user to run actions on your AWS infrastructure rather than a main AWS user account with access to everything.

Terraform Variables

In order to get started, we need an understanding of the main top-level variables to be used across our Terraform template: the domain name, the ARN of the certificate for that domain, the AWS region to create the infrastructure in and, if you are going to be using Wercker, the name of the IAM user that will deploy the website to AWS.

Additionally, because of Google’s duplicate content penalty, and because placing a CloudFront distribution in front of an S3 bucket can allow bots to index both the bucket and the distribution, it makes sense to protect direct access to the bucket. My preferred approach is to use a custom header sent from the distribution to the bucket, so a random string value is also defined as a variable and used in that header to grant the distribution access.

Variables in Terraform are conventionally declared in a variables.tf file, and once declared, you can set them using values stored in a terraform.tfvars file.

variables.tf

variable "certificate" {}
variable "deployer" {}
variable "domain" {}
variable "duplicate_content_penalty_secret" {}
variable "region" {}

terraform.tfvars

certificate = "ARN of your certificate"
deployer = "Name of your deploying IAM user"
domain = "yourdomain.com"
duplicate_content_penalty_secret = "Some random value"
region = "AWS region code"

Terraform Modules

With the top-level variables declared and a file to read their values from created, we can start describing at a high level what our infrastructure will look like, using modules in a top-level main.tf file.

In this file we first define a cloud provider, which in this case is AWS, and set the region that we want to use. As your ~/.aws/credentials file will be correctly configured by this point, there is no need to commit your AWS credentials to this file: Terraform will read them from the credentials file written by awscli, or from the standard AWS environment variables if set.

From there, we define the outline for:

  • The site as accessed from the bare domain url
  • The site as accessed from the www domain url, which will redirect to the bare domain url
  • The Route 53 Hosted Zone for the domain and Alias Records for both the bare and www urls
  • The IAM user which can be used to deploy the generated website to AWS from Wercker

main.tf

provider "aws" {
  region = "${var.region}"
}

module "deployment_user" {
  source = "./iam"

  deployment_user_name = "${var.deployment_user_name}"
  domain = "${var.domain}"
}

module "site_bare_url" {
  source = "./site_bare_url"

  acm_certificate_arn = "${var.certificate_arn}"
  domain = "${var.domain}"
  duplicate_content_penalty_secret = "${var.duplicate_content_penalty_secret}"
  not_found_response_path = "/404.html"
}

module "site_www_url" {
  source = "./site_www_url"

  acm_certificate_arn = "${var.certificate_arn}"
  deployer = "${var.deployment_user_name}"
  domain = "www.${var.domain}"
  duplicate_content_penalty_secret = "${var.duplicate_content_penalty_secret}"
  target = "${var.domain}"
}

module "route_53" {
  source = "./route_53"

  bare_url_domain = "${var.domain}"
  bare_url_target = "${module.site_bare_url.website_cdn_hostname}"
  bare_url_cdn_hosted_zone_id = "${module.site_bare_url.website_cdn_zone_id}"

  www_url_domain = "www.${var.domain}"
  www_url_target = "${module.site_www_url.website_cdn_hostname}"
  www_url_cdn_hosted_zone_id = "${module.site_www_url.website_cdn_zone_id}"
}

Each of these modules has its own main.tf file in the specified source subdirectory. While each of these module definitions passes in data, either from the top-level variables or from the generated outputs of other modules, the files in the source subdirectories contain the configuration details describing how our infrastructure should look and behave.
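For reference, the overall project layout used throughout the rest of this post looks like this:

.
├── main.tf
├── terraform.tfvars
├── variables.tf
├── iam
│   ├── deployment_policy.json
│   ├── main.tf
│   └── variables.tf
├── site_bare_url
│   ├── main.tf
│   ├── outputs.tf
│   ├── variables.tf
│   └── website_bucket_policy.json
├── site_www_url
│   ├── main.tf
│   ├── outputs.tf
│   ├── variables.tf
│   └── website_redirect_bucket_policy.json
└── route_53
    ├── main.tf
    └── variables.tf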

Although not shown for every module below, variables passed in to each module will of course also need to be declared in the module’s source directory, usually in a separate variables.tf file, or in the module’s main.tf itself if the module is relatively small.
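For example, the declarations for the site_bare_url module, matching the values passed in from the top-level main.tf above, would look like this:

./site_bare_url/variables.tf

variable "acm_certificate_arn" {}
variable "domain" {}
variable "duplicate_content_penalty_secret" {}
variable "not_found_response_path" {}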

IAM Deployment User

To start with, we want to ensure that our deployment user can only perform the required actions on a specific bucket and nothing else. Terraform’s template files provide a simple way to handle this in a sensible and reusable way: the bucket to allow access to is specified using variable interpolation.

./iam/deployment_policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::${bucket_name}"
    },
    {
      "Action": [
        "s3:DeleteObject",
        "s3:GetObject",
        "s3:GetObjectAcl",
        "s3:ListBucket",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::${bucket_name}/*"
    }
  ]
}

Using this template JSON file, the main.tf file for this module supplies a specific value to replace the ${bucket_name} variable, which will typically come from the top-level domain variable defined earlier.

The module as a whole essentially goes through four steps:

  1. Create a new IAM user
  2. Create an access key for the new IAM user
  3. Create a policy that only allows access to the bare url website S3 bucket
  4. Attach the new policy to the new IAM user

./iam/main.tf

resource "aws_iam_user" "deployment_user" {
  name = "${var.deployment_user_name}"
}

resource "aws_iam_access_key" "deployment_user_access_key" {
  user = "${aws_iam_user.deployment_user.name}"
}

data "template_file" "deployment_policy_template_file" {
  template = "${file("${path.module}/deployment_policy.json")}"
  vars {
    bucket_name = "${var.domain}"
  }
}

resource "aws_iam_policy" "deployment_policy" {
  name = "${replace("${var.domain}",".","-")}-deployment-policy"
  path = "/"
  description = "Policy allowing to publish a new version of the website to the S3 bucket"
  policy = "${data.template_file.deployment_policy_template_file.rendered}"
}

resource "aws_iam_policy_attachment" "deployment_policy_attachment" {
  name = "${replace("${var.domain}",".","-")}-deployment-policy-attachment"
  users = ["${aws_iam_user.deployment_user.name}"]
  policy_arn = "${aws_iam_policy.deployment_policy.arn}"
}

One thing to note with this approach is that the secret access key is written to the Terraform state file, and that is where you’ll have to go to retrieve it if you want to deploy your website using this user in Wercker.
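Rather than digging through the state file by hand, one option is to surface the key pair through outputs; the output names below are my own, not part of any established convention:

./iam/outputs.tf

output "access_key_id" {
  value = "${aws_iam_access_key.deployment_user_access_key.id}"
}

output "secret_access_key" {
  value = "${aws_iam_access_key.deployment_user_access_key.secret}"
}

Re-exported from the top-level main.tf as, for example, "${module.deployment_user.secret_access_key}", these can then be read with terraform output. The secret still lives in the state file either way, so treat that file as sensitive.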

S3 and CloudFront (Main Site)

S3 Bucket

Terraform’s template files can again be used to construct a policy in a reusable way for an S3 bucket. With statically generated websites on S3, documents need to be publicly readable. Also, if using the custom header approach to avoid Google’s duplicate content penalty, the duplicate content penalty secret defined earlier can be used here directly.

./site_bare_url/website_bucket_policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadAccess",
      "Principal": {
        "AWS": "*"
      },
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::${bucket_name}/*",
      "Condition": {
        "StringEquals": {
          "aws:UserAgent": "${secret}"
        }
      }
    }
  ]
}

When it comes to the bucket itself, all that’s really needed is to ensure that it is set up correctly to serve a static website by specifying an index document and an error document.

./site_bare_url/main.tf

data "template_file" "bucket_policy_template_file" {
  template = "${file("${path.module}/website_bucket_policy.json")}"
  vars {
     bucket_name = "${var.domain}"
    secret = "${var.duplicate_content_penalty_secret}"
  }
}

resource "aws_s3_bucket" "bare_url_bucket" {
  bucket = "${var.domain}"
  policy = "${data.template_file.bucket_policy_template_file.rendered}"
  force_destroy = true

  website {
    index_document = "index.html"
    error_document = "404.html"
  }

  tags {
    Name = "Bare URL bucket for static site ${var.domain}"
  }
}

CloudFront Distribution

The setup of the CloudFront distribution which sits in front of the bucket does a few key things:

  • Sets the S3 bucket hosting the static website pages as the origin
  • Makes use of the duplicate content penalty secret in a custom header when accessing the S3 bucket
  • Sets the previously requested AWS Certificate Manager SSL certificate to be used for https

./site_bare_url/main.tf

resource "aws_cloudfront_distribution" "bare_url_cdn" {
  enabled = true
  price_class = "PriceClass_200"
  http_version = "http1.1"

  "origin" {
    origin_id = "S3-${aws_s3_bucket.bare_url_bucket.id}"
    domain_name = "${aws_s3_bucket.bare_url_bucket.website_endpoint}"
    custom_origin_config {
      origin_protocol_policy = "http-only"
      http_port = "80"
      https_port = "443"
      origin_ssl_protocols = [
        "TLSv1"]
    }
    custom_header {
      name = "User-Agent"
      value = "${var.duplicate_content_penalty_secret}"
    }
  }
  default_root_object = "index.html"
  custom_error_response {
    error_code = "404"
    error_caching_min_ttl = "360"
    response_code = "200"
    response_page_path = "/404.html"
  }
  "default_cache_behavior" {
    allowed_methods = [
      "GET",
      "HEAD",
      "DELETE",
      "OPTIONS",
      "PATCH",
      "POST",
      "PUT"]
    cached_methods = [
      "GET",
      "HEAD"]
    "forwarded_values" {
      query_string = false
      cookies {
        forward = "none"
      }
    }
    min_ttl = "0"
    default_ttl = "300"
    max_ttl = "1200"
    target_origin_id = "S3-${aws_s3_bucket.bare_url_bucket.id}"
    viewer_protocol_policy = "redirect-to-https"
    compress = true
  }
  "restrictions" {
    "geo_restriction" {
      restriction_type = "none"
    }
  }
  "viewer_certificate" {
    acm_certificate_arn = "${var.acm_certificate_arn}"
    ssl_support_method = "sni-only"
    minimum_protocol_version = "TLSv1"
  }
  aliases = [
    "${var.domain}"]
}

Outputs

There are a few outputs from the CloudFront distribution that will be needed a little later, when using Route 53 to point the domain at the distribution. These can be kept either in the main.tf file or in a separate outputs.tf file.

./site_bare_url/outputs.tf

output "website_cdn_hostname" {
  value = "${aws_cloudfront_distribution.bare_url_cdn.domain_name}"
}

output "website_cdn_zone_id" {
  value = "${aws_cloudfront_distribution.bare_url_cdn.hosted_zone_id}"
}

S3 and CloudFront (Redirect Site)

S3 Bucket

The S3 bucket for the www-prefixed url will not actually be storing anything; it will just be redirecting users to the primary bare url of the domain behind https.

./site_www_url/website_redirect_bucket_policy.json

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "PublicReadAccess",
      "Principal": {
        "AWS": "*"
      },
      "Effect": "Allow",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::${bucket_name}/*"
    }
  ]
}

./site_www_url/main.tf

data "template_file" "bucket_policy_template_file" {
  template = "${file("${path.module}/website_redirect_bucket_policy.json")}"
  vars {
    bucket_name = "${var.domain}"
  }
}

resource "aws_s3_bucket" "www_url_bucket" {
  bucket = "${var.domain}"
  policy = "${data.template_file.bucket_policy_template_file.rendered}"
  force_destroy = true

  website {
    redirect_all_requests_to = "https://${var.target}"
  }
}

CloudFront Distribution

Setting up the CloudFront distribution for the redirect site is almost exactly the same as the corresponding step for the main site.

./site_www_url/main.tf

resource "aws_cloudfront_distribution" "www_url_cdn" {
  enabled = true
  price_class = "PriceClass_200"
  http_version = "http1.1"

  "origin" {
    origin_id = "S3-${aws_s3_bucket.www_url_bucket.id}"
    domain_name = "${aws_s3_bucket.www_url_bucket.website_endpoint}"
    custom_origin_config {
      origin_protocol_policy = "http-only"
      http_port = "80"
      https_port = "443"
      origin_ssl_protocols = [
        "TLSv1"]
    }
    custom_header {
      name = "User-Agent"
      value = "${var.duplicate_content_penalty_secret}"
    }
  }
  default_root_object = "index.html"
  custom_error_response {
    error_code = "404"
    error_caching_min_ttl = "360"
    response_code = "200"
    response_page_path = "/404.html"
  }
  "default_cache_behavior" {
    allowed_methods = [
      "GET",
      "HEAD",
      "DELETE",
      "OPTIONS",
      "PATCH",
      "POST",
      "PUT"]
    cached_methods = [
      "GET",
      "HEAD"]
    "forwarded_values" {
      query_string = false
      cookies {
        forward = "none"
      }
    }
    min_ttl = "0"
    default_ttl = "300"
    max_ttl = "1200"
    target_origin_id = "S3-${aws_s3_bucket.www_url_bucket.id}"
    viewer_protocol_policy = "redirect-to-https"
    compress = true
  }
  "restrictions" {
    "geo_restriction" {
      restriction_type = "none"
    }
  }
  "viewer_certificate" {
    acm_certificate_arn = "${var.acm_certificate_arn}"
    ssl_support_method = "sni-only"
    minimum_protocol_version = "TLSv1"
  }
  aliases = [
    "${var.domain}"]
}

Outputs

There are again a few outputs from the CloudFront distribution that will be needed for Route 53.

./site_www_url/outputs.tf

output "website_cdn_hostname" {
  value = "${aws_cloudfront_distribution.www_url_cdn.domain_name}"
}

output "website_cdn_zone_id" {
  value = "${aws_cloudfront_distribution.www_url_cdn.hosted_zone_id}"
}

Route 53

Finally, all that is left is to create a Route 53 Hosted Zone for the domain and two A records, one pointing at each of the CloudFront distributions.

With this in place, every time somebody browses to https://www.yourdomain.com, http://yourdomain.com or http://www.yourdomain.com, they will end up at https://yourdomain.com, which will serve the statically generated website stored in the first S3 bucket.

./route_53/main.tf

resource "aws_route53_zone" "main_zone" {
  name = "${var.bare_url_domain}"
}

resource "aws_route53_record" "bare_url_cdn_alias_record" {
  zone_id = "${aws_route53_zone.main_zone.zone_id}"
  name = "${var.bare_url_domain}"
  type = "A"

  alias {
    name = "${var.bare_url_target}"
    zone_id = "${var.bare_url_cdn_hosted_zone_id}"
    evaluate_target_health = false
  }
}

resource "aws_route53_record" "www_url_cdn_alias_record" {
  zone_id = "${aws_route53_zone.main_zone.zone_id}"
  name = "${var.www_url_domain}"
  type = "A"

  alias {
    name = "${var.www_url_target}"
    zone_id = "${var.www_url_cdn_hosted_zone_id}"
    evaluate_target_health = false
  }
}
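One extra step if your domain is registered outside Route 53: you will need to point your registrar’s name server records at the new hosted zone. An output along these lines (my own addition, not strictly required) surfaces the assigned name servers after the zone is created:

output "name_servers" {
  value = "${aws_route53_zone.main_zone.name_servers}"
}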

Infrastructure Creation

All that’s left is to run terraform apply and watch the output in the console as the required infrastructure is created. A bit of patience is required, particularly for the complete initialisation of the CloudFront distributions, which I have on occasion seen take up to an hour.
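Depending on your Terraform version, a full run looks something like the following (terraform init appeared in Terraform 0.10; older releases used terraform get to fetch modules):

terraform init    # set up the AWS provider and wire up the local modules
terraform plan    # review exactly what will be created before applying
terraform apply   # create the infrastructure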

Once everything is created, the Hugo documentation has a tutorial for setting up automated deployments of Hugo-generated websites to an S3 bucket. Alternatively, it is also possible to upload the generated pages directly from your computer.
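For a one-off manual upload, something along these lines works, assuming your bucket is named after your bare domain as in the setup above:

hugo
aws s3 sync ./public s3://yourdomain.com --delete

Once DNS has propagated, browsing to http://www.yourdomain.com (or checking with curl -I) should show the redirect chain ending at https://yourdomain.com, served from the first S3 bucket.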
