The server that was awesome – at one hour intervals

The DNS should have propagated for all of you by now, which means you can all see Driver Database on the new server. I got it working last night and it was blindingly quick! Googlebot (the spider Google uses to index webpages) and Yahoo’s crawler both went crazy with happiness because now they could index the site a lot quicker. Googlebot was downloading a page every 0.15 seconds and the server wasn’t even breaking a sweat. Happy happy joy joy!

But then it just broke down. All of a sudden I couldn’t connect. I could still log in to the system, and saw that it wasn’t running at 100% like I thought it would be; it was idle. The CPUs were at 0% and it just sat there twiddling its virtual thumbs with nothing to do. Odd, to say the least. I rebooted the machine and it came back as if nothing had ever happened. I started looking at the system logs and found worrying things. Both httpd (the web server) and mysql (the database server) were being blocked according to the kernel. I started googling for an answer, only to find that it was either A) a bug in the kernel, or B) a hard drive going corrupt. I wasn’t satisfied with either of these answers.

After an hour or so the server went down again. Same procedure as last hour. Rebooted, pulled my hair and cursed a bit.

Then I suddenly noticed that I was getting strange PHP errors on the site, suggesting that I had forgotten to import one of the tables. I found that odd, as I was pretty sure it had worked before. I had a look in the database, only to find that the table responsible for adding pageviews to the profile pages (for the popularity and most buzz lists) was corrupt. I repaired it and the errors went away. After an hour the server went down again.

I rebooted, looked in the database and saw that the very same table was corrupt again. I had a look in the database server’s error log and found that the table goes corrupt some ten minutes before the server stops responding. The server then tries to write to it a few times for each page viewed on the site (which, at Googlebot’s and Yahoo’s crawl rates, was more than ten pages per second), spitting out error messages all over the logs. And then the shit goes down. A coincidence? I was hoping it wasn’t.
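For anyone wanting to do the same kind of digging, here is a rough Python sketch of how the error log could be tallied to see which table is producing the noise. It is not what I actually ran. The message pattern is an assumption modelled on MySQL's "is marked as crashed" error, the sample lines are invented, and the table name `pageviews` and the function `count_crashed_tables` are hypothetical.

```python
import re
from collections import Counter

# Assumed message shape, modelled on MySQL's "marked as crashed" error.
# Adjust the pattern to whatever your own error log actually contains.
CRASHED = re.compile(r"Table '([^']+)' is marked as crashed")


def count_crashed_tables(lines):
    """Count 'marked as crashed' errors per table name."""
    counts = Counter()
    for line in lines:
        match = CRASHED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Invented sample lines; the table name is hypothetical.
sample_log = [
    "03:41:07 [ERROR] Table './driverdb/pageviews' is marked as crashed and should be repaired",
    "03:41:07 [ERROR] Table './driverdb/pageviews' is marked as crashed and should be repaired",
    "03:41:08 [Note] some unrelated message",
]

# Print the tables with the most errors first.
for table, hits in count_crashed_tables(sample_log).most_common():
    print(table, hits)
```

If one table dominates the count, as it did for me, that is a decent hint about where to look first.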

I removed the code that calls that table and decided to head for bed (it was 4 in the morning by then).

It looks like the server’s been running for over five hours now. I’m hoping the problem’s fixed, even though I won’t be betting on it.

The small price to pay to keep it running might have to be that the most buzz and DriverDB popularity lists will be a thing of the past. Worse things have happened.
