Hi,

I'm Rafael Copstein

Computer Science master's student based in Halifax, NS, Canada. Passionate about software architecture, clean architecture enthusiast, and fan of DevOps. Wannabe minimalist.

29 May, 2020

Retiring Machines on HackTheBox.eu

Lately I have been looking for a new project in order to build my programming portfolio. You know, the kind of small project that we complete in a weekend in order to show a particular skill or use of a certain programming language. My problem is that I want to show skills related to software architecture which usually shine in bigger and more complex projects.

That's why I decided to try something different. I want to develop a "weekend project" while also applying many of the architecture concepts that I know and want to practice more. I hope to write this post as I am developing so I don't miss interesting thoughts and/or questions that come up along the way.

A close friend of mine, who goes by the name of r0kit, is very into penetration testing and introduced me to a website called HackTheBox.eu. HackTheBox provides access to machines designed to be hacked into. If you manage to complete the challenges you are awarded a number of points and will climb the ranks.

In order to keep itself up-to-date and competitive, HackTheBox has a concept of retiring machines, that is, machines that were once very relevant to the field of penetration testing but are now obsolete and no longer present an interesting challenge. When a machine gets retired, all the points acquired with that machine are removed from the users, essentially re-balancing the ranks.

r0kit has a YouTube channel where he goes over retired machines on HackTheBox. You can check his channel out here. (Shameless promotion of a friend)

The whole point of talking about HackTheBox and retiring machines is to give some context on an idea that r0kit gave me a couple of days ago: what if there was a program that would alert users when machines were about to get retired? One of the things to keep in mind is that HackTheBox does not, at the time of writing, offer an API.

I decided to tackle this project because I can see some interesting points to be explored:

  • Requires non-standard methods for data retrieval
  • Requires processing of this data (is not limited to presenting retrieved data)
  • Makes sense as a user application or as a managed service

This set of characteristics caught my attention because I can see this growing into a well-structured project whereas the "hacky" way of writing it as a giant Python script (we have all been there) can quickly get overwhelming and out of control.

With the whole contextualization out of the way, it's time to have a look at the problem a little more closely.

Problem Exploration

I think all programs start with understanding the problem. To be fair, I am not an active user of HackTheBox so maybe I do not feel the struggles of a more active user but, as in many professional scenarios, you are solving other people's problems, not yours (with some exceptions, as always).

When I spoke to r0kit he walked me through the website and showed me some of its pages. The first (interesting) one is the machines page:

List of machines on HackTheBox.eu

The second (more interesting) page is the unreleased machines page, also known as the page that shows which machines are being retired and when:

Unreleased machines on HackTheBox.eu

These are likely the pages where we will be able to extract most of the information we will be interested in. Now that we know what kind of information we have at our disposal, we can start defining the project.

Project Definition

In a nutshell, our problem is to identify retiring machines and we know where to get that information so... let's get coding? Well, not yet. As I said before I want to be able to apply the concepts of architecture that I am interested in and, before coding, we must define some things.

The first thing to define is: what will our program do exactly? What is the output of a successful execution?

In this case, I want the program to (somehow) list all retiring machines and order them by date. In other words, I want to display exactly what the website already does. That may sound useless, but it's a good place to start. If we can pull this off we know that we can successfully retrieve, parse and process the data that we are interested in.

To keep things simple, my output will be text-based (no fancy UI for now). Right now, I see no need for user input (other than executing the program, obviously).
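A minimal sketch of what that first text output could look like, assuming the data has already been retrieved and parsed. The sample records, the function name `format_retirements` and the line format are all made up for illustration; none of them come from HackTheBox.

```python
from datetime import date

# Hypothetical sample records: (retiring machine name, retirement date).
# Real data would have to come from the unreleased machines page.
SAMPLE = [
    ("machine-b", date(2020, 6, 13)),
    ("machine-a", date(2020, 6, 6)),
    ("machine-c", date(2020, 6, 20)),
]


def format_retirements(records):
    """Return one text line per record, soonest retirement first."""
    ordered = sorted(records, key=lambda record: record[1])
    return [f"{when.isoformat()}  {name}" for name, when in ordered]


if __name__ == "__main__":
    for line in format_retirements(SAMPLE):
        print(line)
```

Returning the lines instead of printing them directly keeps the ordering logic separate from the presentation, which should make it easier to swap the text output for something else later.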

Domain

One of the aspects of software development that is often overlooked is the proper definition of a domain. We are all guilty (myself included) of assuming we know our domain very well only to later find out that we missed important aspects of it.

It can get quite tricky to properly define a domain. How much detail you will include depends a lot on what your program (or programs!) do. For example, while both a supermarket and a bakery will likely have information about food and food-related products, it is less likely that the bakery will keep track of the brands of each of these items.

In our problem, a machine is clearly part of our domain. But we are only interested in some of its attributes:

  • "Name" is definitely required to identify it
  • "Retirement Date" is the whole point of the application
  • "Replacement" is the machine that will replace this one. Not required, but cool to have

This seems pretty reasonable and, honestly, pretty basic. But while I was writing the list above I caught myself thinking "Is Replacement Machine just a name or is it another machine altogether?".

The truth is: I don't know.

For the sake of our current project definition there really is no need to have that attribute, but just by including it (for fun, really) we sparked a new discussion and a second look at our domain. Along the same lines: does Retirement Date really belong to a machine? To be fair, maybe the entire concept of a "Retirement" needs to be its own thing. With that, we can treat machines, both retiring and replacing, as the same kind of object. I like this.
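One possible reading of that idea, sketched in Python. The class and field names (`Machine`, `Retirement`, `retiring`, `retires_on`, `replacement`) are assumptions made for this example and may not match the final domain.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Machine:
    """A machine, whether it is being retired or replacing another one."""
    name: str


@dataclass(frozen=True)
class Retirement:
    """The retirement itself: which machine leaves, when, and what replaces it."""
    retiring: Machine
    retires_on: date
    # Optional because the replacement is "not required, but cool to have".
    replacement: Optional[Machine] = None


# Hypothetical usage with made-up names and dates.
example = Retirement(
    retiring=Machine("old-box"),
    retires_on=date(2020, 6, 6),
    replacement=Machine("new-box"),
)
```

With this shape, the retirement date lives on the `Retirement` and a machine only needs to know its own name, so the retiring and the replacing machine are both plain `Machine` values.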

There are many other questions about the domain that can come up and I believe this is the time to admit that I do not have enough knowledge of it to do much better. This is one of those moments where your experience as an architect comes into play. The more experience you have, the more likely you are to get the domain right sooner. I have little practical experience with software architecture, so expect this domain to change at least a couple more times.

For now, I'll settle for this:

The first version of our domain

Up Next

I think this post summarizes well what I will be doing with this project. Even though I hoped to write it all in one post, I believe it is more reasonable to split it into multiple parts. So, here is part one.

I hope you learned something that will be useful in your next projects.

See you in part two.