YBN CTF 2023 - Hosting The Infrastructure
The YBN Infra Team's experience in hosting the infrastructure for YBN's first CTF.
YesButNo CTF 2023 was the first CTF hosted by YesButNo (YBN), running for 24 hours from the 18th to the 19th of November 2023.
Here is the final Top 10 scoreboard.
Setting up CTFd was a simple process thanks to the platform's good documentation. We hosted it on a single Oracle server with 4 NGINX workers. This server was dedicated to CTFd only, as we had seen other CTFs run into connection issues with their CTFd platform right at the start of the event.
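The setup follows the usual pattern of CTFd behind an NGINX reverse proxy. The Compose sketch below is only illustrative (image tags, credentials and file paths are placeholders), with the 4 workers coming from worker_processes in the mounted NGINX config:

```yaml
# Illustrative sketch only - image tags, credentials and paths are placeholders.
services:
  ctfd:
    image: ctfd/ctfd:latest
    restart: always
    environment:
      - DATABASE_URL=mysql+pymysql://ctfd:ctfd@db/ctfd   # placeholder credentials
      - REDIS_URL=redis://cache:6379
    depends_on:
      - db
      - cache

  nginx:
    image: nginx:stable-alpine
    restart: always
    ports:
      - "80:80"
    volumes:
      # nginx.conf sets `worker_processes 4;` and proxies to ctfd:8000
      - ./conf/nginx.conf:/etc/nginx/nginx.conf:ro
    depends_on:
      - ctfd

  db:
    image: mariadb:10.11
    restart: always
    environment:
      - MYSQL_ROOT_PASSWORD=changeme      # placeholder
      - MYSQL_USER=ctfd
      - MYSQL_PASSWORD=ctfd
      - MYSQL_DATABASE=ctfd
    volumes:
      - ./data/mysql:/var/lib/mysql

  cache:
    image: redis:7-alpine
    restart: always
    volumes:
      - ./data/redis:/data
```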
After researching and trying out the different Kubernetes variants, we ended up going with k3s, a lightweight Kubernetes distribution. We also hosted Rancher on it for easy control of the deployments, and used Rancher Fleet for Continuous Deployment: it would update the Kubernetes configs whenever we updated the GitHub repo and auto-pull images from our self-hosted Docker Registry.
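For illustration, a Fleet GitRepo resource pointing at the challenge repo looks roughly like this (the repo URL, branch and paths below are placeholders):

```yaml
# Illustrative Fleet GitRepo - repo URL, branch and paths are placeholders.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: ctf-challenges
  namespace: fleet-default        # or fleet-local when targeting the local cluster
spec:
  repo: https://github.com/example-org/ctf-infra   # hypothetical repo
  branch: main
  paths:
    - deployments/                # directory containing the Kubernetes manifests
```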
Our main server was the Oracle server with 4 vCPUs and 24 GB of RAM. We also had an AWS account that could spin up a maximum of 9 instances, and we had set up an EC2 autoscaler with a k3s node agent template as the image. However, we had networking issues with it during the CTF, so we did not end up using those instances. As you can see in the monitoring section, our Oracle challenge server barely reached 25% CPU load and 40% RAM usage.
Here is what a CTF-Architect chall.yaml looks like:
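The fields below are only illustrative of the kind of metadata it captures; the exact schema is defined by CTF-Architect.

```yaml
# Illustrative only - field names are representative of the challenge metadata
# CTF-Architect records, not the tool's exact schema.
name: example-challenge
author: example-author
category: web
difficulty: medium
description: |
  Short challenge description shown to players.
flags:
  - flag: YBN{example_flag}   # placeholder flag
extras:
  port: 1337                  # port exposed by the challenge container, if hosted
```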
We also used a custom Python deploy script to upload the challenges onto CTFd using the information generated by CTF-Architect.
Each challenge was deployed separately, with a minimum of 2 pods throughout the CTF. All our web challenges had their own subdomain, while our nc challenges were on nc.yes-but-no.org, each with its own port. We set up Horizontal Pod Autoscalers (HPA) to autoscale the number of pods depending on load, and all nc challenges had their own Traefik load balancer behind the main Traefik load balancer for the k3s cluster.
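As a simplified sketch, a per-challenge Deployment and its HPA looked roughly like the following (names, image and thresholds here are illustrative):

```yaml
# Illustrative per-challenge Deployment + HPA - names, image and thresholds are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-web-chall
spec:
  replicas: 2                      # minimum of 2 pods throughout the CTF
  selector:
    matchLabels:
      app: example-web-chall
  template:
    metadata:
      labels:
        app: example-web-chall
    spec:
      containers:
        - name: chall
          image: registry.example.org/example-web-chall:latest   # pulled from our registry
          ports:
            - containerPort: 8000
          resources:
            requests:
              cpu: 100m            # HPA scales on utilisation relative to this request
            limits:
              cpu: 500m
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-web-chall
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-web-chall
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```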
Some challenges in the Pwn category (Duckcraft Story Mode Alpha and Tongue Twister) required x86_64 machines to run on, as their exploits would not work on ARM64. Hence, we launched x86_64 challenge servers (t2.small) on AWS for those specific challenges, running alongside our main challenge infrastructure.
Our infrastructure was mostly hosted on 64-bit ARM machines across Oracle Cloud and AWS. While this was driven by the resource constraints we had, it was also interesting to deploy infrastructure at this scale on ARM64 machines, especially for a CTF, where x86_64 is more conventional. On Linux servers, ARM already has support for most of the software we needed (Docker, CTFd, Kubernetes), so we were able to operate without facing many architectural issues (there were still a few, but they were resolved quickly).
We also hosted part of our infrastructure on Amazon Web Services. Our server deployment was built for auto-scaling based on load: when load increased, it would scale out horizontally, and when load decreased, it would scale back in. Our infrastructure on AWS consisted of EC2 instances in a VPC in Oregon, USA.
We first created the VPC and subnets for our deployment, with the relevant subnets spread across 2 Availability Zones. The VPC itself was given a private IP range, and each subnet used part of that range.
An example of a subnet, which had automatic IPv4 assignment:
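Expressed as CloudFormation for illustration (the CIDR blocks below are placeholders), the VPC and such a subnet map to something like:

```yaml
# Illustrative CloudFormation snippet - CIDR blocks are placeholders.
Resources:
  ChallengeVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16          # private IP range for the VPC

  PublicSubnetA:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref ChallengeVPC
      AvailabilityZone: us-west-2a
      CidrBlock: 10.0.1.0/24          # part of the VPC range
      MapPublicIpOnLaunch: true       # automatic public IPv4 assignment
```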
We used t4g.medium EC2 instances as part of the challenge infrastructure, as they run on an ARM CPU (AWS Graviton2), ensuring architectural compatibility with the rest of our machines in Oracle Cloud.
The instances were loaded with Debian 12, after which k3s was installed and connected to the Control Plane (via the public Internet). An Amazon Machine Image (AMI) was then made from the challenge instance, containing a snapshot of it for replication.
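The join step itself is just k3s's install script pointed at the control plane; sketched as cloud-init user data (the server URL and token below are placeholders, and we baked the result into the AMI rather than re-running it on every boot):

```yaml
#cloud-config
# Illustrative cloud-init for a k3s agent node - URL and token are placeholders.
runcmd:
  # Install k3s in agent mode and join it to the control plane over the public Internet
  - curl -sfL https://get.k3s.io | K3S_URL=https://control-plane.example.org:6443 K3S_TOKEN=<node-token> sh -
```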
We also used auto-scaling for our infrastructure on AWS, which deployed new instances as nodes when CPU utilisation reached a set threshold. This let us avoid over-provisioning for the CTF while ensuring it could keep running even under high load, horizontally scaled across multiple machines.
For autoscaling, we used EC2 Launch Templates to specify how the instances should be launched, including the instance type (t4g.medium), storage, and other parameters.
The launch template was used by an Auto Scaling Group in AWS, set to deploy to the availability zones us-west-2a and us-west-2d.
A target tracking policy was created to monitor the instances (with CloudWatch enabled) for CPU usage.
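Expressed as CloudFormation for illustration, the launch template, Auto Scaling Group and target tracking policy fit together roughly like this (the AMI ID, subnet IDs, sizes and target value are placeholders):

```yaml
# Illustrative CloudFormation sketch - AMI ID, subnet IDs, sizes and threshold are placeholders.
Resources:
  ChallengeNodeTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-0123456789abcdef0   # AMI built from the prepared k3s agent instance
        InstanceType: t4g.medium         # ARM (Graviton2) instance

  ChallengeNodeASG:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "4"
      VPCZoneIdentifier:                 # subnets in us-west-2a and us-west-2d
        - subnet-aaaaaaaa
        - subnet-dddddddd
      LaunchTemplate:
        LaunchTemplateId: !Ref ChallengeNodeTemplate
        Version: !GetAtt ChallengeNodeTemplate.LatestVersionNumber

  CpuTargetTrackingPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref ChallengeNodeASG
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 60                  # scale out when average CPU crosses this
```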
Additionally, we set up activity notifications on the Auto Scaling Group to notify us of any changes in the deployment via email. This kept us aware of any instances being launched or terminated, so we could take action if necessary.
We set up a simple monitoring dashboard with Grafana and Prometheus, monitoring only our CTFd server and main challenge server. We can see that there wasn't actually a very high spike when the CTF started at noon.
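On the Prometheus side, this only needed static scrape targets for the two servers, assuming a node exporter on each host (job names and addresses below are placeholders):

```yaml
# Illustrative prometheus.yml - job names and target addresses are placeholders.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: ctfd-server
    static_configs:
      - targets: ["ctfd.example.internal:9100"]        # node_exporter on the CTFd server
  - job_name: challenge-server
    static_configs:
      - targets: ["challenges.example.internal:9100"]  # node_exporter on the main challenge server
```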
Since most challenges were only approved a week before the CTF, with the Infra team still stabilising and deploying them, we were only ready to email credentials a day before the CTF. We used Zoho Mail at first, but were limited to 50 emails a day, so we switched to SendGrid, which allowed 100 emails a day. Ngee Ann Polytechnic had also blocked our emails, so those participants had to open support tickets.
Everything was proxied through Cloudflare for DDoS protection, and we set an IP rate limit of 60 requests per 10 seconds to prevent brute-force attacks.
Starting in September, we looked at various blogs from others who had hosted CTFs, which gave us a good look under the hood of a CTF's infrastructure. We experimented with Docker Swarm, Kubernetes, k3s and more. Since we had absolutely no budget for our CTF, we had to pool together free plans from various services to host our infrastructure, mainly Oracle Cloud Free Tier and AWS Academy accounts.
All challenges were also packaged with CTF-Architect, a CLI tool that JusCodin made. This helped us organize our challenges and get an overview of them in our GitHub repo.
For each hosted challenge, a Dockerfile was provided by the challenge authors. To reduce the workload on our servers, our challenges used Alpine Linux for the container OS as much as possible, with static sites being hosted with NGINX on Alpine (nginx:alpine). However, this meant we had to target musl libc instead of glibc. This was not a problem for challenges running on higher-level languages (e.g. Python, JS), but pwn challenges (i.e. Flag Shop and Locked Out) had to be compiled against musl and cross-compiled for ARM64.