YBN CTF 2023 - Hosting The Infrastructure
The YBN Infra Team's experience in hosting the infrastructure for YBN's first CTF.
YesButNo CTF 2023 was the first CTF hosted by YesButNo (YBN), running for 24 hours from the 18th to the 19th of November 2023.
Here is the final Top 10 scoreboard.
Setting up CTFd was a simple process thanks to the platform's good documentation. We hosted it on a single Oracle server with 4 NGINX workers. This server was dedicated to CTFd only, as we had seen other CTFs run into connection issues with their CTFd platform right at the start of the event.
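The setup follows the usual pattern of CTFd behind an NGINX reverse proxy. The Compose sketch below is only illustrative (image tags, credentials and file paths are placeholders), with the 4 workers coming from worker_processes in the mounted NGINX config:

```yaml
# Illustrative sketch only - image tags, credentials and paths are placeholders.
services:
  ctfd:
    image: ctfd/ctfd:latest
    restart: always
    environment:
      - DATABASE_URL=mysql+pymysql://ctfd:ctfd@db/ctfd   # placeholder credentials
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache

  nginx:
    image: nginx:stable-alpine
    restart: always
    ports:
      - "80:80"
    volumes:
      # nginx.conf sets `worker_processes 4;` and proxies to ctfd:8000
      - ./conf/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - ctfd

  db:
    image: mariadb:10.11
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=changeme      # placeholder
      - MYSQL_USER=ctfd
      - MYSQL_PASSWORD=ctfd
      - MYSQL_DATABASE=ctfd
    volumes:
      - ./data/mysql:/var/lib/mysql

  cache:
    image: redis:7-alpine
    restart: always
    volumes:
      - ./data/redis:/data
```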
After researching and trying out the different Kubernetes variants, we ended up going with k3s, a lightweight Kubernetes distribution. We also hosted Rancher on it for easy control of the deployments, and used Rancher Fleet for Continuous Deployment: it would update the Kubernetes configs whenever we updated the GitHub repo and auto-pull images from our self-hosted Docker Registry.
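For illustration, a Fleet GitRepo resource pointing at the challenge repo looks roughly like this (the repo URL, branch and paths below are placeholders):

```yaml
# Illustrative Fleet GitRepo - repo URL, branch and paths are placeholders.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: ctf-challenges
  namespace: fleet-default        # or fleet-local when targeting the local cluster
spec:
  repo: https://github.com/example-org/ctf-infra   # hypothetical repo
  branch: main
  paths:
    - deployments/                # directory containing the Kubernetes manifests
```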
Our main server was the Oracle server with 4 vCPUs and 24 GB of RAM. We also had an AWS account that could spin up a maximum of 9 instances, and we had set up an EC2 autoscaler with a k3s node agent template as the image. However, we had networking issues with it during the CTF, so we did not end up using those instances. As you can see in the monitoring section, our Oracle challenge server barely reached 25% CPU load and 40% RAM usage.
Here is what a CTF-Architect chall.yaml looks like:
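The fields below are only illustrative of the kind of metadata it captures; the exact schema is defined by CTF-Architect.

```yaml
# Illustrative only - field names are representative of the challenge metadata
# CTF-Architect records, not the tool's exact schema.
name: example-challenge
author: example-author
category: web
difficulty: medium
description: |
  Short challenge description shown to players.
flags:
  - flag: YBN{example_flag}   # placeholder flag
extras:
  port: 1337                  # port exposed by the challenge container, if hosted
```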
We also used a custom Python deploy script to upload the challenges onto CTFd using the information generated by CTF-Architect.
Each challenge was deployed separately, with a minimum of 2 pods throughout the CTF. All our web challenges had their own subdomain, while our nc challenges were on nc.yes-but-no.org, each with its own port. We set up Horizontal Pod Autoscalers (HPA) to autoscale the number of pods depending on load, and all nc challenges had their own Traefik load balancer behind the main Traefik load balancer for the k3s cluster.
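As a simplified sketch, a per-challenge Deployment and its HPA looked roughly like the following (names, image and thresholds here are illustrative):

```yaml
# Illustrative per-challenge Deployment + HPA - names, image and thresholds are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web-chall
spec:
  replicas: 2                      # minimum of 2 pods throughout the CTF
  selector:
    matchLabels:
      app: example-web-chall
  template:
    metadata:
      labels:
        app: example-web-chall
    spec:
      containers:
        - name: chall
          image: registry.example.org/example-web-chall:latest   # pulled from our registry
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m            # HPA scales on utilisation relative to this request
            limits:
              cpu: 500m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-web-chall
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-web-chall
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```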
Some challenges in the Pwn category (Duckcraft Story Mode Alpha and Tongue Twister) required x86_64 machines to run on, as their exploits would not work on ARM64. Hence, we launched x86_64 challenge servers (t2.small) on AWS for those specific challenges, running alongside our main challenge infrastructure.
Our infrastructure was mostly hosted on 64-bit ARM machines across Oracle Cloud and AWS. While this was driven by the resource constraints we had, it was also interesting to deploy infrastructure at this scale on ARM64 machines, especially for a CTF, where x86_64 is more conventional. On Linux servers, ARM already has support for most of the software we needed (Docker, CTFd, Kubernetes), so we were able to operate without facing many architectural issues (there were still a few, but they were resolved quickly).
We also hosted part of our infrastructure on Amazon Web Services. Our server deployment was built for auto-scaling based on load: when load increased, it would scale out horizontally, and when load decreased, it would scale back in. Our infrastructure on AWS consisted of EC2 instances in a VPC in Oregon, USA.
We first created the VPC and subnets for our deployment, with the relevant subnets spread across 2 Availability Zones. The VPC itself was given a private IP range, and each subnet used part of that range.
An example of a subnet, which had automatic IPv4 assignment:
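Expressed as CloudFormation for illustration (the CIDR blocks below are placeholders), the VPC and such a subnet map to something like:

```yaml
# Illustrative CloudFormation snippet - CIDR blocks are placeholders.
Resources:
  ChallengeVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16          # private IP range for the VPC

  PublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref ChallengeVPC
      AvailabilityZone: us-west-2a
      CidrBlock: 10.0.1.0/24          # part of the VPC range
      MapPublicIpOnLaunch: true       # automatic public IPv4 assignment
```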
We used t4g.medium EC2 instances as part of the challenge infrastructure, as they run on an ARM CPU (AWS Graviton2), ensuring architectural compatibility with the rest of our machines in Oracle Cloud.
The instances were loaded with Debian 12, after which k3s was installed and connected to the Control Plane (via the public Internet). An Amazon Machine Image (AMI) was then made from the challenge instance, containing a snapshot of it for replication.
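The join step itself is just k3s's install script pointed at the control plane; sketched as cloud-init user data (the server URL and token below are placeholders, and we baked the result into the AMI rather than re-running it on every boot):

```yaml
#cloud-config
# Illustrative cloud-init for a k3s agent node - URL and token are placeholders.
runcmd:
  # Install k3s in agent mode and join it to the control plane over the public Internet
  - curl -sfL https://get.k3s.io | K3S_URL=https://control-plane.example.org:6443 K3S_TOKEN=<node-token> sh -
```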
We also used auto-scaling for our infrastructure on AWS, which deployed new instances as nodes when CPU utilisation reached a set threshold. This let us avoid over-provisioning for the CTF while ensuring it could keep running even under high load, horizontally scaled across multiple machines.
For autoscaling, we used EC2 Launch Templates to specify how the instances should be launched, including the instance type (t4g.medium), storage, and other parameters.
The launch template was used by an Auto Scaling Group in AWS, set to deploy to the availability zones us-west-2a and us-west-2d.
A target tracking policy was created to monitor the instances (with CloudWatch enabled) for CPU usage.
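Expressed as CloudFormation for illustration, the launch template, Auto Scaling Group and target tracking policy fit together roughly like this (the AMI ID, subnet IDs, sizes and target value are placeholders):

```yaml
# Illustrative CloudFormation sketch - AMI ID, subnet IDs, sizes and threshold are placeholders.
Resources:
  ChallengeNodeTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0123456789abcdef0   # AMI built from the prepared k3s agent instance
        InstanceType: t4g.medium         # ARM (Graviton2) instance

  ChallengeNodeASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "4"
      VPCZoneIdentifier:                 # subnets in us-west-2a and us-west-2d
        - subnet-aaaaaaaa
        - subnet-dddddddd
      LaunchTemplate:
        LaunchTemplateId: !Ref ChallengeNodeTemplate
        Version: !GetAtt ChallengeNodeTemplate.LatestVersionNumber

  CpuTargetTrackingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref ChallengeNodeASG
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 60                  # scale out when average CPU crosses this
```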
Additionally, we set up activity notifications on the Auto Scaling Group to notify us of any changes in the deployment via email. This kept us aware of any instances being launched or terminated, so we could take action if necessary.
We set up a simple monitoring dashboard with Grafana and Prometheus, monitoring only our CTFd server and main challenge server. We can see that there wasn't actually a very high spike when the CTF started at noon.
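On the Prometheus side, this only needed static scrape targets for the two servers, assuming a node exporter on each host (job names and addresses below are placeholders):

```yaml
# Illustrative prometheus.yml - job names and target addresses are placeholders.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: ctfd-server
    static_configs:
      - targets: ["ctfd.example.internal:9100"]        # node_exporter on the CTFd server
  - job_name: challenge-server
    static_configs:
      - targets: ["challenges.example.internal:9100"]  # node_exporter on the main challenge server
```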
Since most challenges were only approved a week before the CTF, with the Infra team still stabilising and deploying them, we were only ready to email credentials a day before the CTF. We used Zoho Mail at first, but were limited to 50 emails a day, so we switched to SendGrid, which allowed 100 emails a day. Ngee Ann Polytechnic had also blocked our emails, so those participants had to open support tickets.
Everything was proxied through Cloudflare for DDoS protection, and we set an IP rate limit of 60 requests per 10 seconds to prevent brute-force attacks.
Starting in September, we looked at various blogs from others who had hosted CTFs, which gave us a good look under the hood of a CTF's infrastructure. We experimented with Docker Swarm, Kubernetes, k3s and more. Since we had absolutely no budget for our CTF, we had to pool together free plans from various services to host our infrastructure, mainly Oracle Cloud Free Tier and AWS Academy accounts.
All challenges were also packaged with CTF-Architect, a CLI tool that JusCodin made. This helped us organize our challenges and get an overview of them in our GitHub repo.
For each hosted challenge, a Dockerfile was provided by the challenge authors. To reduce the workload on our servers, our challenges used Alpine Linux for the container OS as much as possible, with static sites being hosted with NGINX on Alpine (nginx:alpine). However, this meant we had to target musl libc instead of glibc. This was not a problem for challenges running on higher-level languages (e.g. Python, JS), but pwn challenges (i.e. Flag Shop and Locked Out) had to be compiled against musl and cross-compiled for ARM64.