It’s been a few weeks and the upgrades continue across all our servers, with some surprises along the way. As of this post we’ve upgraded 4 servers: 3 cPanel servers and one VPS node. Here’s a quick breakdown of the big changes on some of the machines, pulled right from our Munin system:
Here’s Saturn after the upgrade:
This is a monthly graph, since Saturn was upgraded a while ago. As you can see, the i/o wait is almost gone after the switch. Thanks to moving from 7.2K SATA to 15K SAS drives and adding a ton more memory, there is less waiting on the slowest component of the system, which is the drives. Saturn was not lagging before, but the new machine can handle many more users, much larger traffic spikes, and users abusing services for a minute with poorly made scripts.
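For anyone curious where an i/o wait figure like this comes from: on Linux it can be derived from the aggregate `cpu` line in `/proc/stat`. The sketch below is only an illustration of that calculation, not our actual Munin plugin. The function name and the sample counter values are made up, and it normalizes to 0-100% across all CPUs, whereas the Munin graph stacks 100% per CPU.

```python
def iowait_percent(before: str, after: str) -> float:
    """Percent of CPU time spent in iowait between two samples.

    Each argument is the aggregate 'cpu' line from /proc/stat, whose
    counters are: user nice system idle iowait irq softirq steal ...
    """
    def counters(line: str) -> list:
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        return [int(value) for value in parts[1:]]

    deltas = [b - a for a, b in zip(counters(before), counters(after))]
    # Only the first eight counters are summed; guest time is already
    # included in user time, so adding it would double count.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


# Hypothetical samples taken some seconds apart (values are invented).
sample_before = "cpu 1000 0 500 8000 400 0 100 0"
sample_after = "cpu 1100 0 550 8800 450 0 100 0"
print(iowait_percent(sample_before, sample_after))
```

On a real machine you would read the first line of `/proc/stat` twice, a few seconds apart, and pass both lines in.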
Yet another huge increase over the old server. There is now twice as much cache as there was total memory before. This makes a huge difference in application performance. It also helps with bursts of processes, say a web server restart or a big increase in traffic for one user causing a lot of PHP processes to spawn. Right now we’re having a hard time even coming close to using the 12GB the server has.
Yoda went from a Dual Xeon 5430 with 8GB RAM and 4x250GB SATA RAID-10 to a Dual Xeon 5520 with 24GB RAM and 4x300GB 15K SAS RAID-10. Here are some of its graphs for comparison:
There isn’t a huge difference in CPU. You’ll notice a bit of i/o wait while we were migrating to the new node, then a big white gap because we forgot to fix Munin on the new system for a bit. This graph should actually go up to 1600%, but it does not fit. It’s still the same number of CPU cores; the graph just shows more thanks to Hyper-Threading. It’s still a CPU upgrade going from a 5430 to a 5520, even with the same number of cores.
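To spell out where the 1600% comes from: a Munin-style CPU graph tops out at 100% per logical CPU the OS sees. Here is a rough sketch of the arithmetic, assuming both Xeons are quad-core parts and that only the 5520 has Hyper-Threading (two threads per core). The function name is just for illustration.

```python
def cpu_graph_ceiling(sockets: int, cores_per_socket: int, threads_per_core: int) -> int:
    """Top of a stacked CPU graph in percent: 100% per logical CPU."""
    return sockets * cores_per_socket * threads_per_core * 100


# Assumption: quad-core parts; Hyper-Threading only on the 5520.
old_ceiling = cpu_graph_ceiling(sockets=2, cores_per_socket=4, threads_per_core=1)
new_ceiling = cpu_graph_ceiling(sockets=2, cores_per_socket=4, threads_per_core=2)
print(old_ceiling, new_ceiling)  # 800 1600
```

Same 8 physical cores either way; the doubled ceiling is purely the extra logical CPUs.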
The old server had not been graphing memory correctly recently, so I don’t have a comparison. I just figured I’d show the amount of cache in use on the VPS node, since it has gone from an 8GB server to 24GB, three times as much memory as before.
Mars is probably the neat one in the upgrade bunch, as we now have a Dual Xeon 5520 on our Dallas VLAN, which I had said was not possible. Due to some availability issues, SoftLayer ended up racking a 5520 especially for us that would normally not run on our current VLAN; we needed the machine and could not wait for a Dual Xeon 5430 to become available. Here are the graphs for it:
You’ll notice that when the graph jumps up, the i/o wait goes away entirely. Just like Saturn, the new drives plus a lot more memory mean the CPU is basically never waiting for the drives. It was not overloaded before, but now it has far more capacity and can handle much more burst or abuse than previously. As for the CPU usage, the machine was doing its backup seed when I grabbed the graph. It’s using nearly an entire CPU for that, and you’d think it would cause i/o wait, but not with the SAS drives, even while copying every single piece of data to our CDP server.
Just like Saturn, there is now tons of cache in use, so it will handle web server restarts very well, and even a big increase in one user’s traffic in terms of PHP processes.
So this just gives you an idea of how these new machines are performing. Skyline, which was the first one, is running almost idle now with all that extra CPU, RAM and faster drives. As for the other upgrades, Mercury will be done soon (Hardware Migration [10/30/2009] – [11/03/2009]), then Jupiter a week after that. Although we had planned on doing Jupiter last since it is a newer machine, we noticed a misconfiguration in its partition setup which could become a problem, so we want to get it moved over and not have to address the issue at all. Finally Neptune, the last cPanel machine, will receive its upgrade, by the looks of it 2 weeks after Jupiter. As for the VPS nodes, we’re still playing that by ear, as those migrations are very quick. VPS is also not as popular as our shared hosting, so we can take our time on when we actually do that upgrade.
Excellent way of showing the differences between the old and new servers. Fantastic improvements!
1) Can we see graphs for Skyline?
2) Now that 5520s are possible on Dallas VLAN, does that mean future new servers for Dallas can be 5520 or higher?
1) At this point they’re too old to mean anything. These ones were newer so it was easy to see. For Skyline we’re talking about 2 months ago.
2) This was a special case. The VLAN we have our Dallas servers on is an old one, on an FCR at SoftLayer whose racks are not equipped for a third network connection. Basically, the old servers needed 2 NICs (one public, one private). These newer machines require one public and two private, as the IPMI device has its own port. These racks do not have that, so that was the issue. We got a 5520 as a special request that went through their operations manager, because there were no 5400 series CPUs available.
Once we upgrade, the Mercury system will be using a new VLAN on a newer FCR, meaning the racks will support our extra connection. So I suppose technically new servers will be 5520 or higher. But as I said, CPU is generally not the limiting factor; memory and the hard drives are. We’ll use up all the RAM before we max out the CPUs.
love the visuals tony, what program are you using to generate them? i would be interested in running something like that to see statistics on a vps!
We’re using Munin to generate those graphs. Keep in mind it can be pretty intensive which is why we have a server dedicated to just handling our Munin graphs. All the servers themselves do is send the data to the master which generates all the graphs.
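If you want to poke at this on a VPS: in a stock Munin setup each server runs the `munin-node` daemon, which speaks a simple plain-text protocol (TCP port 4949 by default) that the master uses to collect values. Below is a minimal sketch of a client for that protocol, just to show how light the node side is. The function names and the hostname in the usage note are hypothetical, and this is not how our own master is set up.

```python
import socket


def parse_fetch(lines):
    """Turn munin-node 'fetch' output into a {field: value} dict.

    Lines look like 'iowait.value 1234'; a lone '.' ends the reply and
    'U' means the plugin could not read that value.
    """
    values = {}
    for line in lines:
        line = line.strip()
        if line == ".":
            break
        if not line or line.startswith("#"):
            continue
        key, _, raw = line.partition(" ")
        if key.endswith(".value"):
            field = key[: -len(".value")]
            values[field] = None if raw == "U" else float(raw)
    return values


def fetch_plugin(host, plugin, port=4949, timeout=5.0):
    """Ask a munin-node for one plugin's current values (assumes default port)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="ascii", errors="replace")
        reader.readline()  # skip the '# munin node at ...' banner
        sock.sendall(f"fetch {plugin}\n".encode("ascii"))
        lines = []
        for line in reader:
            lines.append(line)
            if line.strip() == ".":
                break
        sock.sendall(b"quit\n")
    return parse_fetch(lines)
```

Something like `fetch_plugin("vps.example.com", "cpu")` would return the raw counters, provided the node’s `munin-node.conf` allows connections from your address. The expensive part is drawing the graphs on the master, not answering these requests.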