# YBN CTF 2023 - Hosting The Infrastructure

## Introduction

YesButNo CTF 2023 was the first CTF hosted by [Yes But No](https://ctftime.org/team/217796). It ran for 24 hours, from the 18th to the 19th of November.

Here is the final Top 10 scoreboard.

<figure><img src="/files/3ON74rYK0RCiZwQ68myS" alt=""><figcaption></figcaption></figure>

## Our Infrastructure

Starting in September, we read various blog posts by others who have hosted CTFs.\
[SEETF's blog](https://infosec.zeyu2001.com/2022/hosting-a-ctf-seetf-2022-organizational-and-infrastructure-review) gave us a good look under the hood of a CTF infrastructure.\
We experimented with Docker Swarm, Kubernetes, k3s and more.\
Since we had absolutely **no budget** for our CTF, we had to pool together free plans from various services to host our infrastructure, mainly Oracle Free Tier and AWS Academy accounts.

### CTFd

Setting up CTFd was simple thanks to the platform's good documentation. We hosted it on 1 Oracle server with 4 NGINX workers. This server was set aside for CTFd only, as we have seen other CTFs run into connection issues with their CTFd platform at the start of the event.

### Kubernetes

After researching and trying out the different Kubernetes variants, we went with k3s, a lightweight Kubernetes distribution. We also hosted Rancher on it for easy control of the deployments and used Rancher Fleet for Continuous Deployment. Fleet updated the Kubernetes configs whenever we updated the GitHub repo and auto-pulled images from our self-hosted Docker Registry.

### **Our k3s Nodes / Servers**

Our main server was the Oracle server with 4 vCPU and 24 GB RAM.\
\
We also had an AWS account that could spin up a maximum of 9 instances. We set up an EC2 Auto Scaling Group with a k3s node agent template as the image.\
However, we had networking issues during the CTF, so we did not end up using these instances. You can see in the monitoring section that our Oracle challenge server barely reached 25% CPU load and 40% RAM usage.

<figure><img src="/files/pVMDOlITt8WKn7b6A8iH" alt=""><figcaption></figcaption></figure>

### Challenges

All challenges were also packaged with a CLI tool that JusCodin made.\
<https://github.com/Jus-Codin/CTF-Architect>\
This helped us organize our challenges and get an overview of them in our GitHub repo.

![](/files/cvb38nSXkDr5fYk3485U)

Here is what a CTF-Architect chall.yaml looks like:

```yaml
challenge:
  author: Daksh
  category: Web
  description: I am interested to purchase stock options in YesButYes Inc. Help me
    find the best CODE to use.
  difficulty: Medium
  discord: dakshthapar
  files: null
  flags:
  - flag: YBN{0h_n0_you_f0und_m3_@2093@!2}
    regex: false
  hints: null
  name: YesButYes Inc.
  requirements: null
services:
  YesButYes Inc:
    name: YesButYes Inc
    path: service/YesButYes Inc
    port: '80'
```

We also used a custom Python deploy script for uploading the challenges onto CTFd with the information made by CTF-Architect.
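Our deploy script is not reproduced here. As a rough sketch of the idea only, the snippet below turns the metadata from a `chall.yaml` into request bodies shaped like the ones CTFd's REST API expects for challenges and flags. The function name `build_payloads`, the `POINTS_BY_DIFFICULTY` mapping and its point values are hypothetical, and the field names are assumptions about the CTFd API rather than a copy of what our script sent. No network calls are made.

```python
# Sketch only: maps CTF-Architect metadata to CTFd-style request bodies.
# POINTS_BY_DIFFICULTY and build_payloads are hypothetical names and values.

POINTS_BY_DIFFICULTY = {"Easy": 100, "Medium": 300, "Hard": 500}  # assumed scoring


def build_payloads(config: dict) -> tuple[dict, list[dict]]:
    """Return (challenge_body, flag_bodies) for one parsed chall.yaml."""
    chall = config["challenge"]
    challenge_body = {
        "name": chall["name"],
        "category": chall["category"],
        "description": chall["description"],
        "value": POINTS_BY_DIFFICULTY[chall["difficulty"]],
        "type": "standard",
        "state": "hidden",  # assumed: keep hidden until the CTF starts
    }
    # The challenge ID returned by CTFd would still need to be attached
    # to each flag body before uploading it.
    flag_bodies = [
        {"content": f["flag"], "type": "regex" if f["regex"] else "static"}
        for f in (chall.get("flags") or [])
    ]
    return challenge_body, flag_bodies


# Same data as the chall.yaml above, written as a dict so the sketch
# runs without a YAML parser.
example = {
    "challenge": {
        "author": "Daksh",
        "category": "Web",
        "description": "I am interested to purchase stock options in "
        "YesButYes Inc. Help me find the best CODE to use.",
        "difficulty": "Medium",
        "flags": [{"flag": "YBN{0h_n0_you_f0und_m3_@2093@!2}", "regex": False}],
        "name": "YesButYes Inc.",
    }
}

challenge_body, flag_bodies = build_payloads(example)
print(challenge_body["name"], challenge_body["value"], flag_bodies[0]["type"])
```

Keeping the payload construction separate from the HTTP calls makes it easy to dry-run the whole repo before anything is uploaded.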

### Challenge Containerisation

For each hosted challenge, the challenge author provided a `Dockerfile`. To reduce the workload on our servers, our challenges used [Alpine Linux](https://hub.docker.com/_/alpine) as the container OS as much as possible, with static sites hosted with NGINX on Alpine (`nginx:alpine`). However, this meant we had to target `musl` libc instead of `glibc`. This was not a problem for challenges written in higher-level languages (e.g. Python, JS), but the pwn challenges (i.e. Flag Shop and Locked Out) had to be cross-compiled for ARM64 with the [musl libc compiler](https://musl.cc/).

### **Challenge Deployments**

Each challenge was deployed separately, with a minimum of 2 pods throughout the CTF.\
All our web challenges had their own subdomain, while our nc challenges were on nc.yes-but-no.org, each with its own port.\
We set up Horizontal Pod Autoscalers (HPA) to autoscale the number of pods depending on load.\
\
All nc challenges had their own Traefik load balancer behind the main Traefik load balancer for the k3s cluster.

<figure><img src="/files/hP3iUxTLeTGhne0n5Omf" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/LPQn1pIXMgV5wSGtHg3z" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/szBc76RpGu5EeMkeqwDK" alt=""><figcaption></figcaption></figure>
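To illustrate how an HPA decides on a pod count, here is a minimal sketch of the scaling rule described in the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of the observed metric to the target, rounded up, with no change when the ratio is within a tolerance (10% by default). The minimum of 2 pods matches our deployments. The 50% CPU target and the maximum of 10 pods are assumptions for the example, not our actual settings.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA rule: ceil(current * observed / target), clamped."""
    ratio = current_cpu / target_cpu
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below the minimum or above the maximum pod count.
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 90% CPU against an assumed 50% target scale out.
print(desired_replicas(2, current_cpu=90, target_cpu=50))
# 4 pods averaging 10% CPU scale in, but not below the 2-pod minimum.
print(desired_replicas(4, current_cpu=10, target_cpu=50))
```

The real controller also accounts for pods that are not yet ready and for stabilisation windows, which this sketch leaves out.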

### x86\_64 Dependent Challenges

Some challenges in the Pwn category, namely Duckcraft Story Mode Alpha and Tongue Twister, required x86\_64 machines to run on, as their exploits would not work on ARM64.

Hence, we launched x86\_64 challenge servers (t2.small) on AWS for those specific challenges, running alongside our main challenge infrastructure.

<figure><img src="/files/XNsV1I1PlXY4i0tG6tYT" alt=""><figcaption></figcaption></figure>

### ARM Architecture

Most of our infrastructure was hosted on 64-bit ARM machines, across Oracle Cloud and AWS. While this was due to our resource constraints, it was also interesting to deploy infrastructure at this scale on ARM64 machines, especially for a CTF, where x86\_64 would be more conventional. On Linux servers, most software (Docker, CTFd, Kubernetes) already supports ARM, so we were able to operate without facing many architectural issues. (There were still a few, but they were resolved quickly.)

### AWS

We also hosted part of our infrastructure on Amazon Web Services (AWS). Our server deployment was built to auto-scale based on load: when load increased, it would scale out horizontally, and when load decreased, it would scale back in.\
\
Our infrastructure on AWS consisted of EC2 instances on a VPC in Oregon, USA.

### AWS VPC

We first created the VPC and subnets for our deployment, with subnets spread across 2 Availability Zones. The VPC itself was given a private IP range, and each of the subnets used part of that range.

<figure><img src="/files/pOJImwz1k8mzNLvmCRpF" alt=""><figcaption></figcaption></figure>

An example of a subnet, which had automatic IPv4 assignment:

<figure><img src="/files/XX1pcIabWkK2DZed7K5V" alt=""><figcaption></figcaption></figure>
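As a small illustration of how a VPC range can be divided between subnets, the sketch below carves one /24 per Availability Zone out of a private /16 using Python's `ipaddress` module. The `10.0.0.0/16` range and the /24 subnet size are assumptions for the example and not our actual addressing. The zone names are the two zones our Auto Scaling Group deployed to.

```python
import ipaddress
from itertools import islice

vpc = ipaddress.ip_network("10.0.0.0/16")  # hypothetical private VPC range
availability_zones = ["us-west-2a", "us-west-2d"]

# Take the first /24 blocks of the VPC range, one per Availability Zone.
subnets = list(islice(vpc.subnets(new_prefix=24), len(availability_zones)))
plan = dict(zip(availability_zones, subnets))

for az, subnet in plan.items():
    # AWS reserves 5 addresses in every subnet, so fewer are usable.
    print(az, subnet, subnet.num_addresses - 5)
```

Because every subnet is taken from the VPC's own range, the subnets cannot overlap with each other or fall outside the VPC.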

### EC2 Instances & AMIs

We used t4g.medium EC2 instances as part of the challenge infrastructure, as they also have an ARM CPU (AWS Graviton2), which ensured architectural compatibility with the rest of our machines in Oracle Cloud.

The instances were loaded with Debian 12, after which k3s was installed and connected to the Control Plane (via the public Internet). An Amazon Machine Image (AMI), which contains a snapshot of the instance for replication, was made for the challenge instances.

<figure><img src="/files/XNoa82Wm1XXLtXaySa6f" alt=""><figcaption></figcaption></figure>

<figure><img src="/files/q5w2w6GoHIwdS7T5dXv7" alt=""><figcaption></figcaption></figure>

### EC2 Launch Templates & Auto Scaling Groups

We also used auto-scaling for our infrastructure on AWS, which deployed new instances as nodes when CPU utilisation reached a set threshold. This let us avoid over-provisioning for the CTF while ensuring that it could continue under high load, horizontally scaled across multiple machines.

For autoscaling, we used EC2 Launch Templates to specify how the instances should be launched. This includes instance size (t4g.medium), storage, and other parameters of the instance.

<figure><img src="/files/ylisOtuOV5OQwE5GSCVb" alt=""><figcaption></figcaption></figure>

The launch template was used for an Auto Scaling Group in AWS, set to deploy to the Availability Zones us-west-2a and us-west-2d.

<figure><img src="/files/XKqejPh9Fn2PdPyKnels" alt=""><figcaption></figcaption></figure>

A target tracking policy was created to monitor the instances (with CloudWatch enabled) for CPU usage.

<figure><img src="/files/YstuvDuTIiRESAaS28Hn" alt=""><figcaption></figcaption></figure>
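For readers who prefer to script this instead of using the console, the sketch below builds the parameters for a CPU-based target tracking policy. The keys are intended to match the keyword arguments of boto3's `put_scaling_policy` call for EC2 Auto Scaling, but treat that as an assumption to check against the AWS documentation. The helper name, the group name `ybn-challenge-nodes` and the 50% target are hypothetical, since we configured our policy through the console.

```python
def target_tracking_policy(group_name: str, target_cpu_percent: float) -> dict:
    """Build parameters for a CPU target tracking policy (nothing is sent)."""
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target CPU must be a percentage in (0, 100]")
    return {
        "AutoScalingGroupName": group_name,
        "PolicyName": f"{group_name}-cpu-target",  # hypothetical naming scheme
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # Track the average CPU utilisation across the whole group.
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ASGAverageCPUUtilization"
            },
            "TargetValue": float(target_cpu_percent),
        },
    }


# Hypothetical group name and target value, for illustration only.
policy = target_tracking_policy("ybn-challenge-nodes", 50)
print(policy["PolicyName"], policy["TargetTrackingConfiguration"]["TargetValue"])
```

With a client created by `boto3.client("autoscaling")`, the dictionary could then be passed as `client.put_scaling_policy(**policy)`.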

Additionally, we set activity notifications on the Auto Scaling Group to notify us of any changes in the deployment via email. This helped us stay aware of any instances being launched or terminated and, if necessary, take action.

<figure><img src="/files/ndzQmKPG8uBUYdjOE4Om" alt=""><figcaption></figcaption></figure>

### Monitoring

We set up a simple monitoring dashboard with Grafana and Prometheus.\
This monitored our CTFd server and main challenge server only.\
We can see that there was no major spike in load when the CTF started at noon.

<figure><img src="/files/xxrdbJmqlYwMSt6ChQ9S" alt=""><figcaption></figcaption></figure>

### Emailing Credentials

Most challenges were only approved a week before the CTF, and the Infra team was still stabilising and deploying them, so we were only ready to email credentials a day before the CTF. We used Zoho Mail at first, but were limited to 50 emails a day. We then switched to SendGrid, which allowed 100 emails a day. Ngee Ann Polytechnic had also blocked our emails, so those participants had to open support tickets.
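To show why these daily limits matter when credentials go out only a day before the event, here is a small sketch that splits a recipient list into per-day batches. The helper `plan_batches` and the 230 placeholder addresses are made up for the example and are not our participant count.

```python
def plan_batches(recipients: list[str], daily_limit: int) -> list[list[str]]:
    """Split recipients into consecutive batches of at most daily_limit."""
    if daily_limit <= 0:
        raise ValueError("daily_limit must be positive")
    return [
        recipients[start:start + daily_limit]
        for start in range(0, len(recipients), daily_limit)
    ]


# Hypothetical recipient list, for illustration only.
recipients = [f"player{n}@example.com" for n in range(230)]

zoho_days = len(plan_batches(recipients, daily_limit=50))
sendgrid_days = len(plan_batches(recipients, daily_limit=100))
print(zoho_days, sendgrid_days)
```

Each batch corresponds to one day of sending, so a lower daily limit directly stretches out how long it takes to reach everyone.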

### Cloudflare

Everything was proxied through Cloudflare for DDoS protection.\
We set a per-IP rate limit of 60 requests per 10 seconds to prevent brute-force attacks.

<figure><img src="/files/WVt4hQpO4ZQHwrd2AqqK" alt=""><figcaption></figcaption></figure>
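To make the effect of a "60 requests per 10 seconds" rule concrete, here is a minimal sliding-window limiter keyed by client IP. This only illustrates the behaviour of such a rule and is not how Cloudflare implements it. The class name and the explicit `now` timestamps are choices made for the example.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests: int = 60, window_seconds: float = 10.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client IP -> recent request times

    def allow(self, ip: str, now: float) -> bool:
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False  # over the limit: the request would be blocked
        hits.append(now)
        return True


limiter = SlidingWindowLimiter()
# One client sends 60 requests in under 6 seconds: all are allowed.
burst = [limiter.allow("203.0.113.7", now=i * 0.1) for i in range(60)]
# The 61st request inside the same window is rejected.
blocked = limiter.allow("203.0.113.7", now=6.0)
print(all(burst), blocked)
```

A limit like this is loose enough for normal browsing but quickly stops a script that tries to brute-force a login or a flag submission from one IP.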


