My Past Screw-up

Where I share my past screwups
Published on Saturday, 22 July 2023

My screwups of the past

Right, I more or less promised to get to this, so here goes 😉

At my first job out of school, I was in charge of managing the company's server infrastructure. It was a great place to get my hands dirty and gain some experience. The one problem was that I was working with very little supervision, and being very green, that naturally led to some interesting screw-ups.

Two different episodes come to mind:

AWS Problems

We were backing up files to S3 on AWS. It was supposed to be easy: just install the s3cmd utility, run the login command once, and then put the sync command in a cronjob.

We were having issues though: the command would stall for hours, and we could not figure out why. So we reached out to the Stack Overflow community to try to get help. After the usual moderation hell with people trying to close the question for arbitrary reasons, one helpful soul asked me to share the config file for the s3cmd utility. Now, I didn't even know it had a config file, but I managed to find it and copy it verbatim to the Stack Overflow question. After an hour or so, a comment came in on the question saying something along the lines of: "Hey buddy, your API key is right there in the question!".

So I panic and edit it out of the question. I get a bit annoyed with the creators of the s3cmd tool. Sure, I should have read the config before posting it, of course, but it seems a bit odd to have secrets in the same file as config. Should have probably been kept in environment variables instead.
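In hindsight, a tiny script could have scrubbed the config before I pasted it anywhere. Here is a minimal sketch of that idea, assuming an INI-style file with `key = value` lines like the one s3cmd uses. The function name `redact_config`, the `SECRET_KEYS` set, and the sample values are my own illustrations, not anything that ships with s3cmd.

```python
import re

# Assumption: names of settings that hold secrets in an s3cmd-style config.
# Extend this set to match whatever tool you are sharing a config for.
SECRET_KEYS = {"access_key", "secret_key", "access_token", "gpg_passphrase"}


def redact_config(text: str) -> str:
    """Return the config text with the values of secret settings masked."""
    redacted = []
    for line in text.splitlines():
        # Split "name = value" into indentation, name, separator and value.
        match = re.match(r"^(\s*)([\w.-]+)(\s*=\s*)(.*)$", line)
        if match and match.group(2).lower() in SECRET_KEYS and match.group(4):
            line = f"{match.group(1)}{match.group(2)}{match.group(3)}<REDACTED>"
        redacted.append(line)
    return "\n".join(redacted)


# Made-up sample values, for illustration only.
sample = """[default]
access_key = AKIAEXAMPLEKEY
secret_key = example/secret/value
use_https = True"""

print(redact_config(sample))
```

This only helps before posting. Once a key has been public, even for an hour, masking it afterwards is not enough.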

Anyway, disaster averted, and I crack on looking for solutions to my problem. Then the end of the month comes along, and it is invoicing time. Being a small company, our AWS bill was usually around a couple hundred dollars, so imagine my face when the boss pops up at my desk and asks: "Why does our Amazon bill say $25,000?" 😮

I had removed the leaked API key from my question, but didn't think to actually reset the key in the AWS portal too, so that it couldn't be exploited. That day, I learned that people have scrapers and bots set up to trawl through Stack Overflow, GitHub, etc. for keys to exploit. The same day I leaked the key, 12 of the highest tier Virtual Machines had been provisioned on our account, using the API key, to do God-knows-what, which had exploded our bill!

Long story short, Amazon was nice enough to waive the fee entirely, I got my heart rate down, and I learned about secret management, and that having a documented and transparent approach to dealing with incidents such as leaks is a good thing. I should of course have immediately reset the API key and informed my boss.

Linux CLI speed run

Then there was this other occasion, where I managed to bring my heart rate into full panic again.

As part of the job was to manage a host of Linux servers, I had to get comfortable with the Command-Line Interface of the Terminal. We had two physical servers set up to host a number of virtual machines. One of these physical hosts was running out of disk space, so I set out to jump around the file system and delete some old log files, doing a combination of:

  • cd <folder>
  • rm *
  • cd ..
  • ↑ (up arrow key, to scroll through recent commands)

You can probably already see where this is going. After clearing out a bunch of folders, I ended up jumping out one level too far, to the folder that stored the data files of all of the VMs. That is, the folder that stored all VM data... rm *... Poof...

The two physical servers were broken up so that one of them kept a bunch of throw-away servers for testing and building our software products, while the other ran all of our vital infrastructure. User logins, firewall, bastion server, DNS, etc. Now, guess which of the servers I was messing with!

I stopped for a second... Ran ls in the folder. Yup, all gone... I went to check the URL of one of the impacted services... It was still running... "Weird", I thought. I checked each one, and they were all still running.

Luckily, I had the other server to compare with, so I went and looked in the matching folder path on the other server, and what I found was my salvation: The folder held only symlinks to each VM folder; not the actual files. So the only thing I had deleted were symlinks. Easily restored, and I was in the clear once again.
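A less exciting alternative to the cd / rm * / up-arrow loop is a cleanup that names its target folder explicitly, only matches log files, and shows what it would delete before deleting anything. A minimal sketch of that idea follows. The function name `delete_old_logs`, the `*.log` pattern, and the 30-day cutoff are my own assumptions, not what we actually ran back then.

```python
import time
from pathlib import Path


def delete_old_logs(folder, pattern="*.log", max_age_days=30, dry_run=True):
    """Find files in `folder` matching `pattern` that are older than `max_age_days`.

    Returns the list of matched paths. With dry_run=True (the default),
    nothing is deleted; pass dry_run=False to actually remove the files.
    """
    root = Path(folder).resolve()
    cutoff = time.time() - max_age_days * 86400
    matched = []
    for path in sorted(root.glob(pattern)):
        # Only plain files are candidates; directories and symlinks are skipped.
        if path.is_symlink() or not path.is_file():
            continue
        if path.stat().st_mtime < cutoff:
            matched.append(path)
            if not dry_run:
                path.unlink()
    return matched
```

Calling `delete_old_logs("/var/log/myapp")` (a hypothetical path) only lists the candidates. The files are removed only when the same call is repeated with `dry_run=False`, which leaves a moment to notice that the folder is one level too high.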

Phew

So those have been my grandest screw-ups so far. I was able to recover from both of them with no lasting damage, but mostly due to luck 🍀

Still looking forward to the day where I run a SQL query with DELETE FROM <table> and forget to add the WHERE <condition>. A classic!
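For what it's worth, one way to take the edge off that particular classic is to run the DELETE inside a transaction and roll back if it touches more rows than expected. Here is a minimal sketch using Python's built-in sqlite3 module. The helper `delete_with_guard`, the `users` table, and the `max_rows` limit are all made up for illustration.

```python
import sqlite3


def delete_with_guard(conn, sql, params=(), max_rows=10):
    """Run a DELETE and roll it back if it affects more than `max_rows` rows."""
    cur = conn.execute(sql, params)  # sqlite3 opens a transaction implicitly
    if cur.rowcount > max_rows:
        conn.rollback()
        raise RuntimeError(
            f"refusing to delete {cur.rowcount} rows (limit {max_rows})"
        )
    conn.commit()
    return cur.rowcount


# Throwaway in-memory database with 100 made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i}",) for i in range(100)],
)
conn.commit()

try:
    delete_with_guard(conn, "DELETE FROM users")  # forgot the WHERE
except RuntimeError as err:
    print(err)

# A targeted delete stays under the limit and is committed.
print(delete_with_guard(conn, "DELETE FROM users WHERE id = ?", (1,)))
```

The forgotten WHERE gets rolled back and the table keeps all its rows, while the targeted delete goes through.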

What about your stories? Please feel free to share your stories in the comment sections below, if you dare 🥳




Hi! Thanks for dropping in to pick my brain

I write on this blog for my own benefit. I write because I like to, and because it helps me process topics. It is also my own little home on the web and a place for me to experiment.

Since you're here though, I'd love to hear your thoughts on what you've read here, so please leave a comment below. Also, if you like what you read and want to give a small tip, feel free to:

Buy Me A Coffee