Infrastructure as code for Cyber Ranges

This is the first part of my IaC overview, based on a personal experiment: building a cyber range using the IaC paradigm. Here are the second and third parts.

During my Twitch sessions, I usually offer attendees a practical lab. The labs are created automatically on AWS using Terraform and Ansible.

Scenario

My scenario is pretty simple: I need to create a set of VMs in AWS and configure them with some additional software and services. Those VMs expose vulnerable services that attendees can exploit and defend.

Before each Twitch session, I manually run Terraform and Ansible to create and configure the VMs. To be specific, I use:

Terraform, to create VMs in AWS;
Terraform, to create a client-to-site VPN gateway;
OpenVPN, to connect to the internal side of the lab;
Ansible, via OpenVPN, to configure VMs.

The building process takes less than 10 minutes.

Planning and standardization

IaC is 80% planning and standardization. This means that we need to identify requirements, scenarios, and corner cases. Then we need to decide how to define our infrastructure (I refer to this phase as “modeling”), and finally we should write prototypes to verify our idea and approach.

In my case, I need to create cyber range scenarios that can include multiple VMs:

with various operating systems and applications;
attached to different networks;
possibly protected by additional appliances (firewalls).

All VMs must be reachable from a specific host so that they can be configured.

Last requirement: every scenario should be created completely from scratch and destroyed after the session. No permanent data is expected.

In my case, I decided to:

write a custom Terraform module to create the basic infrastructure (Internet access, basic networks, client-to-site VPN gateway, or bastion hosts);
manually define the additional components (additional VMs, networks, and how they are connected);
configure each VM using roles (multiple roles can be applied to the same VM).

Creating infrastructure

The most basic scenario requires creating a bastion host and at least one Ubuntu Linux VM. The Terraform documentation is pretty clear, and I had no issues with this part.

My only suggestion is: plan carefully what you need now and in the near future, and be ready to adapt.

In my case, I decided to use tags to track the OS, installed applications, purpose, and administrative users… Those tags will be useful in Ansible.

Configuring the infrastructure

Here come the problems: Terraform and Ansible are two different universes, and I need to make them communicate. Using AWS and carefully planning my infrastructure, I found the AWS EC2 Ansible inventory pretty good, even if it has some limitations.
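As an illustration of how such tags can be consumed later, here is a minimal sketch of an inventory fragment that maps a tag to a connection variable. The `AdminUser` tag key and the `ubuntu` fallback are assumptions made for this example, not the tags I actually use.

```yaml
# Fragment of an aws_ec2 dynamic inventory file
# (the file name must end in aws_ec2.yml or aws_ec2.yaml).
plugin: aws_ec2
compose:
  # Use the value of the hypothetical AdminUser tag as the SSH user,
  # falling back to "ubuntu" when the tag is missing.
  ansible_user: tags.AdminUser | default('ubuntu')
```

The point is that anything recorded as a tag at creation time is available to Ansible as data, so the naming scheme deserves some thought up front.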

In short:

Terraform creates the infrastructure and applies tags to it;
The AWS EC2 Ansible inventory fetches the AWS EC2 instances and prepares an Ansible-compatible inventory, preserving the tags;
After the OpenVPN connection is established, Ansible can configure the internal VMs.

The Ansible AWS EC2 inventory addresses every internal VM by its private IP address:

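# Dynamic inventory built from the EC2 instances in one region
plugin: aws_ec2
regions:
  - eu-central-1
filters:
  # Ignore stopped or terminated instances
  instance-state-name: running
keyed_groups:
  # One group per tag key/value pair, named tag_<key>_<value>
  - key: tags
    prefix: tag
hostnames:
  # Use the Name tag as the inventory hostname
  - tag:Name
compose:
  # Connect through the VPN using the private address
  ansible_host: private_ip_address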
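
With an inventory like this one, every tag key/value pair becomes a group named `tag_<key>_<value>`, so a playbook can target machines by tag. The following is only a sketch: the `os` and `app` tag keys, their values, and the role names are hypothetical.

```yaml
# site.yml: hypothetical playbook that selects hosts by tag-derived groups.
- name: Apply the baseline configuration to Ubuntu lab machines
  hosts: tag_os_ubuntu          # instances tagged os=ubuntu (assumed tag)
  become: true
  roles:
    - common                    # hypothetical role

- name: Deploy the vulnerable web application
  hosts: tag_app_webapp         # instances tagged app=webapp (assumed tag)
  become: true
  roles:
    - vulnerable_webapp         # hypothetical role
```

Assuming the inventory is saved as `lab.aws_ec2.yml`, this would be run with `ansible-playbook -i lab.aws_ec2.yml site.yml` once the VPN connection is up. A VM carrying both tags simply receives both sets of roles.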

At this point, we have an excellent way to build and configure the infrastructure, whether or not the VMs are reachable from the Internet. The only downside is that client-to-site VPN connections weigh heavily on my AWS bill.

Other stuff (certificates)

The above approach requires building and maintaining a CA (Certification Authority):

the AWS client-to-site concentrator requires a server certificate;
VPN clients need the CA public certificate to validate the concentrator certificate;
VPN clients need a valid certificate to be accepted by the VPN concentrator;
the AWS client-to-site concentrator needs the CA public certificate to validate the client certificates;
additional servers (e.g. web servers) may need valid certificates.

Even though the AWS and Terraform documentation is very good, it took me hours to find the right approach for this task.
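
For readers looking for a starting point, here is one possible way to build a throwaway lab CA and a server certificate with Ansible. It is not necessarily the approach I ended up with: it assumes the `community.crypto` collection and the Python `cryptography` library are installed, and the directory, names, and validity period are placeholders.

```yaml
# pki.yml: sketch of a disposable CA plus one server certificate.
- name: Build a lab CA and a VPN server certificate
  hosts: localhost
  connection: local
  gather_facts: false
  vars:
    pki_dir: ./pki                 # assumption: local output directory
    server_cn: vpn.lab.example     # assumption: server name
  tasks:
    - name: Create the PKI directory
      ansible.builtin.file:
        path: "{{ pki_dir }}"
        state: directory
        mode: "0700"

    - name: Generate the CA private key
      community.crypto.openssl_privatekey:
        path: "{{ pki_dir }}/ca.key"

    - name: Create a CSR describing the CA certificate
      community.crypto.openssl_csr_pipe:
        privatekey_path: "{{ pki_dir }}/ca.key"
        common_name: Lab CA
        use_common_name_for_san: false
        basic_constraints:
          - "CA:TRUE"
        basic_constraints_critical: true
        key_usage:
          - keyCertSign
        key_usage_critical: true
      register: ca_csr

    - name: Self-sign the CA certificate
      community.crypto.x509_certificate:
        path: "{{ pki_dir }}/ca.crt"
        csr_content: "{{ ca_csr.csr }}"
        privatekey_path: "{{ pki_dir }}/ca.key"
        provider: selfsigned

    - name: Generate the server private key
      community.crypto.openssl_privatekey:
        path: "{{ pki_dir }}/server.key"

    - name: Create a CSR for the server certificate
      community.crypto.openssl_csr_pipe:
        privatekey_path: "{{ pki_dir }}/server.key"
        common_name: "{{ server_cn }}"
        subject_alt_name:
          - "DNS:{{ server_cn }}"
        extended_key_usage:
          - serverAuth
      register: server_csr

    - name: Sign the server certificate with the lab CA
      community.crypto.x509_certificate:
        path: "{{ pki_dir }}/server.crt"
        csr_content: "{{ server_csr.csr }}"
        provider: ownca
        ownca_path: "{{ pki_dir }}/ca.crt"
        ownca_privatekey_path: "{{ pki_dir }}/ca.key"
        ownca_not_after: "+30d"    # short-lived, the lab is disposable
```

Client certificates would follow the same pattern with `clientAuth` instead of `serverAuth`. The resulting files would still have to be handed to AWS and to the VPN clients, which this sketch does not cover.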

Conclusions

As I wrote before, IaC is 80% planning and standardization. For enterprises, there is no “standard approach”: everything should be designed around a specific context with its particular needs. In a real-world scenario, many teams are involved: operations (of course), consumers (e.g. developers), audit and compliance, and security… IaC requires a “holistic approach”: a team cannot adopt IaC without involving the other stakeholders. Going alone means failure: limitations, high costs, and a lot of unplanned exceptions…