TL;DR: Read logs, delete useless shit, keep the DB clean.
* Warning: A long read! *
Oh well, I feel like a n00b while typing this but that’s how things roll. I am not sure if I have any regular readers left, as to have regular readers one needs to be a regular author, which I am clearly not. (Yes, I wish to change.)
Still, if you go back and stalk this blog, you will notice that post frequency for 2018 has been abysmal at it’s best. But yes, there is a reason an interesting story behind the same. A story which might teach you a thing or two about hosting WordPress on your own hosting. It has certainly taught me much.
So, it all started back in January when I started noticing poor loading performance of the site. Being a self-shared hosted instance, I chalked it up to poor bandwidth or bad optimisation by Team GoDaddy (not much positive about them). Soon I started getting 503 error randomly on the home page, so it was time to investigate stuff…
I logged into GoDaddy account and went into my hosting status page. Immediately I was greeted with an orange banner(link here) stating I am reaching the resource limit and I need to upgrade my hosting plan soon to keep up the smooth operations. I scoffed mildly to their marketing tactics and opened the boot to look under the hood.
I opened the CPanel and took a glance over at the system panel on the left. To my amazement, almost all of the parameters were either terminal red or warning orange. I looked up the labels to understand the meaning of these indicators.
Well, clearly, I was a bit surprised as I have an experience of running WordPress since 2006-ish and I have had run pretty complexly themed blogs on my potato local computer (2006 PC, yeah!) using XAMMP on Windows.
If you are a backend guy and read this line above now (in 2018), you will probably cringe harder than you do on Nicki Minaz songs. Everything about that line is WRONG (2006, PC, XAMMP).
Anyway, I had a fond memory of WordPress stack being super efficient and decent at handling a mere 100+ blog posts with ~15 plugins. Especially when I was not even posting posts regularly and traffic was on a decline.
Something was wrong here.
I called up GoDaddy
tech sales support and patiently explained my problem to him only to get his sales pitch – “Sar, I can see the upgrade banner in your account, so can you. Please give money, we give you moar resourcez. Oaky?“. Hmm, probably not that brash but you get the gist. I (mildly irritated) requested him to escalate my call to his supervisor or somebody from *real* tech support.
Well, they kinda did. A girl (I am NOT a sexist) picked up the call and I swear to the odin that she was not able to understand anything about wp-config and the 503 error and asked me if I have cleared my browser cache. I politely asked her to transfer the call to her supervisor.
This time a rather mature sounding guy picked up the call and ask my problem. Folks, I was already 3 levels deep and 20 minutes on the call. I still explained to him my problem. He opened his admin console, got to my box and disabled all plugins (important) and my custom theme.
The site seemed to breathe for a while and we were able to access the same. He told me plainly that this is a classic case of resource overutilization and I need to upgrade my hosting from the basic starter plan to at least their Delux combo something plan. I made the rookie’s mistake of asking the cost for the same as he immediately said he will just transfer my call to his sales representative. *facepalm* I held up the connection before they could plug me another level deep.
I deep down knew that I need to investigate this myself before throwing moolah on the table.
Lazy boi excuses; Part – I
This was February 2018. I kept my weekend free and planned to drill down into my GoDaddy shared hosting server to find the resource problem. I was sure about some bug leaking memory or some infinite loop sucking out my CPU burst cycles. I planned to replicate the setup on AWS t2.micro free instance and made an account on the same. It does require a credit card on the file before letting to fire up ec2 instances. My AMEX CC had some problem as it debits the verification money but still said pending for 48 hours. Fair enough, I thought I will start in 2 days…
But suddenly (a software engineer way of shedding happy tears!), I got a huge project to work on from scratch at my last job @ Shuttl. (Yeah, I have switched career, yet again). The project name rhymed with XMS. I was pretty excited to build a Python Django project from scratch along with my 2 talented senior teammates. I was happy that I will get to learn a ton and will deploy a whole project live AND……both of my 2 talented senior teammates left before even the completion of the first milestone of the project. Yep, just left. And I was struck with a lot of legacy code to work on, with a very little idea about the framework. I had a good experience with Flask framework but Django had some things implemented differently.
I slogged at work and the good part was that I was able to understand most of the code and got pretty good at Django, implemented a ton of APIs and built a basic dashboard UI. Anyway, that sucked next 2 months of my life and I completely forgot about this blog, the 503 issue and meanwhile it kept getting worse as it started opening occasionally and kept throwing 503 errors for the most part.
Lazy boi excuses; Part – II
Nah, let’s move ahead. I have shared too much personal stuff anyway. 😛
Let’s start fresh? Scrap all shit.
It was around May 2018 and I got some interns and a junior to help me with new projects that our product team was pumping out PRDs out at an amazing rate. I was working non-stop on same but still a window of personal time opened up. Meanwhile, we migrated our code repositories to Github from Gitlab and I got to know about the concept of gh-pages.
Github pages – A neat nifty project by Github which allows you to host stuff from your repo as simple websites or blogs. For free!
This sounded like a sweet chime to my ears as I was tired of the non-existent support from GoDaddy and their incompetent tech team (free-tier at least). I started formulating a plan to nuke bitsnapper altogether and start from scratch and make a simple Martin Fowler-esque blog.
So, I created a simple site blog on jatinkrmalik.github.io and even posted some posts (maybe 1). But due to lack of much formatting options and ability to customise stuff, was a bummer.
I lost interest in Github pages faster than America did in Trump.
AND soon I resigned from Shuttl and left in July due to [redacted] reasons.
A new beginning, AWS way?!
In late July, I joined a very early stage startup called Synaptic.io after what felt like a swyamwar of offer letters. (okay, no bragging). I was impressed by the product and size of the team which you could count on one hand. It felt lucrative to get into core team, build something great and have a chance to witness growth from the inside.
Anyway, Synaptic being a data-heavy company, we use a lot of third party services and tools for automated deployment to staging, prod etc. Naturally, AWS is the backbone of our deployment infra. I got a new AWS account both for staging and prod, so I started reading about the same and got to know about Bitnami WordPress AMI which comes preloaded with the WordPress stack goodies and one can deploy with a click. It was time to reactivate my AWS account and fire this up.
A few weeks ago.
At the start of August 2018, I was finally able to authenticate my AWS account by punching a new credit card. I fired up a bitnami WordPress instance and did a setup for a standard WordPress installation. Now all I had to do was just back up stuff from GoDaddy servers and restore here.
Sounds easy right?
I logged into my good old CPanel, got the FTP creds, loaded FileZilla and started the transfer. The ETA was in the north of double-digit hours as the website’s public_html folder was somewhere around 1.5 GB which is understandable as I have a lot of media files and videos. Fair enough. But this again was going to take a lot of time as the problem with transferring a folder is that each micro-size file (<100 kb) takes mere milliseconds to download but takes it’s own sweet time to write on the local disk when downloading from the internet. The obvious way was to pack the public_html folder into a zip file and then transfer.
I did an SSH into the box and ran the command
zip -rv public_html.zip public_html/ to zip the directory, but one thing which I forgot was that even while zipping a directory, I will hit the same problem of the zip program manually iterating over all the files (including microsized one) and will take a lot of time to try to compress each one. I left it for 20 minutes only to find it only 10% through my all files. Improvement? Sure but I am not a very patient man.
I looked into the log (thanks to -v…verbose), and found out that I had a lot of files in my
public_html folder in my xcloner plugin directory due to some failed attempts to take site backup from a plugin. I found more such folders of some plugins which have no active role in powering up this blog.
So, I deleted those folders in
public_html/wp-content/plugins and tried running the
zip command again. It was still slow and I gave up in a few minutes.
I google about wrapping files in a zip without compressing much and got to learn about levels of compression in zip utility which goes from 1-9 with 1 being least compression and 9 being the highest degree of compression while it defaults to 6. So, I tried again this time with but
zip -1rv public_html.zip public_html/ soon realized the iteration over gazzilion files take more time than compression logic for the CPU.
I read more and found out that creating a tarball w/o compression is faster than zip utility, so it was time to try that and maybe let it complete in its own sweet time. So, I fired up the command:
tar -caf public.tar public_html and left it running.
Not sure if it ever completed…
Then I logged into phpMyAdmin (a web app to manage MySQL instance) to take a backup of my bitsnapper WordPress DB. I simply clicked on export and the downloaded file was of size 48 MBs which was odd as in UI it was showing a DB size of 1.2 GB. I knew SQL backup can compress some data but of this magnitude? WTF. I opened the SQL file in VS code and obviously, the file was incomplete and had some HTML gibberish at the end which on inspection was the HTML for phpMyAdmin. Weird?
I tried exporting the DB once again from the UI and this time the size of the
backup.sql file was 256 MBs. I felt this was appropriate but my instinct did a right click and opened in my editor once again. Surely enough the file was still incomplete with that gibberish. Fair to say, the backup from phpMyAdmin was corrupted.
I did an ssh into my hosting box using the creds in my GoDaddy account and tried everything from checking the output of system commands like:
top, ps -ef, free but the box is nicely sandboxed by GoDaddy to avoid any unauthorised access. I even tried to do a privilege escalation with intention of gaining more control over my hosting account and maybe restart
mysqld but all in vain.
I knew about taking direct DB backups from the shell using
mysqldump -h<host> -u<user> -p<password> <database_name> > db_backup.sql so it was time to try that. I ran the command and tailed the backup SQL file with
tail -f db_backup.sql to look into its content as it populated. It started exporting DB nicely and as I started feeling badass and went to grab a cup of coffee, the terminal presented me with the error message:
I googled about the problem and it had something to do with the max_alllowed_packet variable of MySQL. The only two ways to change that was either my editing
/etc/my.cnf file (which I was sure I didn’t have
sudo access to) or run
SET GLOBAL max_allowed_packet=1073741824; query in the MySQL console.
Yeah, both of them didn’t work. Obviously. You need respective system admin user access for both.
The roadblock was getting stupidly irritating, and I had to get the backup. I googled more and someone suggested to pass the max_allowed_packet variable with the
mysqldump command as.
--max-allowed-packet=1073741824 Tried that too, didn’t work.
I was tired and had to sleep, so I terminated my ec2 instance and slept.
Today I was feeling motivated and planned to look into the problem from another angle.
Instead of using the WordPress AMI, I decided to create the whole setup from scratch. I launched an instance of ec2 with Amazon Linux AMI. The aim was to understand if this is really GoDaddy messing with me or is it some fault in my database which is leading to the whole shebang.
I used this post as guidance to set up everything from grounds up.
I logged in again to my GoDaddy account to be greeted by the orange banner urging me to upgrade. I felt weak and was just about to click upgrade and throw some dough to get the easy way out. But no, that’s against the hacker mentality I work with.
So, I opened the CPanel, phpMyAdmin and tried taking a backup again. It again downloaded a 250-something MB file with gibberish at the end. I manually removed the last part of the file and uploaded it to my ec2 instance via
scp and imported it into my remote MySQL instance.
After uploading the public_html files, importing SQL backup and configuring wp-config.php file with DB host and creds, I restarted both httpd (Apache server) and MySQL (DB server) and opened http://ec2-instance-url:80 and to my partial euphoria, it did load up my header and footer for bitsnapper but no posts were visible.
Hmm… something was missing.
I looked into the tables on phpMyAdmin and my MySQL server on the ec2 instance and duh, my wp_xxx_posts table and wp_xxx_postsmeta was missing. Yeah! So, the problem was that my DB size has such large that Godaddy shared hosting limited bandwidth was not allowing me to take a backup of the whole DB. Clearly, I had to fix this.
I wrote a custom python script to take a backup of the bitsnapper DB table-by-table to avoid hitting the max_allowed_packet limit as observed last week but the same error
mysqldump: Error 2013: Lost connection to MySQL server during query when dumping table wp_xxx_postmeta kept popping up.
I started my intense googling session once again and queried along the line of ‘how to backup DB from GoDaddy shared hosting‘ and ‘GoDaddy + shared hosting + mysqldump + error 2013‘ and somehow by a fluke, I landed on the Backup section of my ….drumroll?….. CPanel! *facepalm*
It had everything I was trying to do above with a flick of a click. I stopped all my previous efforts and download the full website backup which had both public_html folder and DB backup SQL. The whole archive was still 2.5 GB which was huge for a small blog like this.
Anyway, I did a
scp to my ec2 instance and tried to replace the files in
public_html/ and restored mysql backup via
mysql -u<user> -p<password> <database_name> < db_backup.sql and this time it worked without an error. I restarted my mysql and apache http server by
service mysql restart; serivce httpd restart; and tried to load
http://<ec2-public-ip-addr>:80 and presto! the whole site loads up.
The euphoria this time also lasted for a short burst as within 2-3 reloads the replicated site again started throwing up the same 503 error and my shell session chocked up.
I fired up a new terminal and tried SSHing into the box but the box just became unresponsive. I went to the AWS admin console to check my ec2 instance monitoring for the machine parameters and observed a similar pattern as with the CPanel (left panel) parameter console. It was all red and orange once again.
Clearly, GoDaddy’s hosting wasn’t the only culprit.
Real IT support == Self-troubleshooting!
It was then when I decided to shed my fear of peeping into the DB tables as it was a gone cause anyway and I still had a partial backup from last year when I migrated the blog from WordPress managed hosting to conventional shared hosting box.
Boys, it was time to run some queries. The first thing I did was to login to the phpMyAdmin and look up the tables, schema and properties. I was assured that the problem is with the DB size and that’s the reason for slow queries which is chocking up the CPU burst time.
I looked into tables and found table wp_xxx_postmeta to be around 950 MB in size with just 11000 records. This immediately fired up alarms in my head as I have worked with multi-million row DBs during my stint in Adobe & Shuttl and the table size was mostly in the range of few MBs only. A rough back of the note calculation stated a median size of 100 kb per record in this table which was weird because when I looked up the schema for the same, it just was storing 4 records i.e.
meta_id, post_id, meta_key, meta_value.
It was time to prod this table and understand the data inside it. I fired up a simple query:
SELECT meta_key, count(*) FROM 'wp_s4w671g0kp_postmeta' GROUP by meta_key order by count(*) desc;
Lo behold, the result was a bit surprising as till now I was thinking that this table might contain post revisions or metadata only but the query result was something like this:
Do you see it? There are some 1800+ records for _view_ip_list, _total_views, _jetpack_related_posts_cache which is basically nothing but records originated from WordPress own homegrown popular plugin – Jetpack. I googled a bit about safety delete for these records, didn’t find anything, took a leap of faith and executed:
Delete from FROM 'wp_xxx_postmeta' WHERE meta_key = '_view_ip_list';
Delete from FROM 'wp_xxx_postmeta' WHERE meta_key = '_total_views';
Delete from FROM 'wp_xxx_postmeta' WHERE meta_key = '_jetpack_related_posts_cache';
It deleted some 4,000 records out of 11,000 records it had and look what happened when I refreshed phpMyAdmin?
Yus! My wp_xxx_postmeta table size dropped from 900-something MBs to 6.3 MBs by just deleting ~4,000 records. What sick joke is that? My total DB size dropped to 25 MBs from ~1.2 GBs, probably due to the cascade effect of foreign key constraints of the records I deleted.
My website was a breeze once again. The load time went down considerably, probably because of faster DB queries and even the system monitoring parameters on the CPanel went down from Red/Orange to Green. I did some load testing by executing multiple curl requests to my home page via terminal, and the server was not breaking
Look at that RAM usage! WHAT? Remember when I talked about running WordPress stack on my 2006 PC with some 128 MB of memory? Yeah!
- Troubleshooting Godaddy is a long-term thing. You can either get into the shit and fix it yourself or you can start throwing cash on the screen until you escalate to their core tech team which I guess is probably never for shared hosting plans. They might have a great support for dedicated servers though.
- AWS is a f##king amazing piece of tech. If you know how to harden servers, by all means, just drop these legacy hosting providers and go for your own setup. It’s probably cheaper, faster and more VFM. A simple t2.micro instance will cost < ₹700/month. (Maybe more, sigh world economy!).
- Sometimes, being a smartass is not good. Most of the times, however, it keeps you safe.
- Always check logs. -_-