Tales of the systems administrator newbie - Part 1 - SSH / Networking

Today I learned a lesson the hard way: never stop networking when your only access to a server is via SSH.

What happened

One of the websites we were hosting stopped responding on its dedicated IP address. When we began to investigate, we found nothing wrong with the server load and nothing wrong with the Apache configuration. We were using Aegir, so we re-verified the site, and the verification succeeded. Next, we typed the dedicated IP into the browser, and the browser just spun. When we browsed to another dedicated IP on the same server, Apache returned a "page not found" error. That was expected, so why was one of our IP addresses not working?

We checked the /etc/network/interfaces file, and everything seemed normal. We tried restarting networking, and every IP address responded except that one. Suspecting the problem was isolated to that single address, we figured we could stop networking to kill any processes that might have been stuck. So we ran /etc/init.d/networking stop...

I knew as soon as I hit Enter that something was wrong. SSH seemed to have stopped, and then it dawned on me: "I just f*cking locked myself out of the server!" SSH was down, obviously, because I had stopped networking! Instead of just the one problem child, all of our websites were down. Naturally, it was after hours, so our data center administrators were unreachable, and we had no way to reboot the server from the console.

Lesson learned:

  • Never stop networking while using SSH
  • Never stop networking without access to console or a way to reboot the VM

How it should have been handled

After a bit of research, I came across a few posts on the subject.

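It seems I should have run both the stop and start commands on one line: /etc/init.d/networking stop; /etc/init.d/networking start. That way, the shell on the server runs the start command right after the stop, even though the SSH connection has gone quiet in the meantime. Alternatively, I should have started a screen session and run the commands inside it, so they would not depend on my SSH session at all.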
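Here is a minimal sketch of both approaches, assuming a Debian-style /etc/init.d/networking script like the one in this post. The function names, the NET_INIT variable and the screen session name netfix are hypothetical, chosen only for illustration. This is not a guaranteed-safe procedure, and you should still have console access or a way to reboot before touching networking.

```shell
#!/bin/sh
# Sketch only. Assumption: a SysV-style init script at /etc/init.d/networking.
# NET_INIT (hypothetical) can be overridden, e.g. to point at a dummy script.
NET_INIT="${NET_INIT:-/etc/init.d/networking}"

# Option 1: stop and start on ONE command line. The server's shell has the
# whole line already, so it runs "start" as soon as "stop" returns, even
# though the SSH connection carrying our keystrokes has stalled.
restart_on_one_line() {
    "$NET_INIT" stop; "$NET_INIT" start
}

# Option 2: run the same pair inside a detached screen session.
# -d -m starts screen without attaching to it, -S names the session.
# The commands no longer depend on the SSH session, and the screen
# session exits on its own once both commands finish.
restart_in_screen() {
    screen -dmS netfix sh -c "\"$NET_INIT\" stop; \"$NET_INIT\" start"
}
```

Typing restart_on_one_line (or the raw one-liner) should bring networking back within seconds. If your SSH session does not recover on its own, you should be able to reconnect once the interfaces are up again.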