DreamHost Status » Blog Archive » The Network is Still Bad
Aside from early indications that the router hardware swap was successful last night, it seems today that things are depressingly close to the way they were before. While core01 is running on all new hardware (except power cables and power supply), it was not upgraded. Unfortunately, while configuring the new chassis with the Sup720 supervisor, improved blades and extra “high speed” fan tray, we ran into a problem.
The router complained that the power supplies we have (1300W, 27amp) are not “compatible” with the new fan tray. This isn’t to say that they can’t power it, just that somewhere a version or model number is incompatible.
We already knew that we’d need 2500W to power the new supervisor and fan tray, and we’d spoken to Cisco techs about running our two 1300W power supplies in cumulative mode for now, which basically makes them output 2600W worth of power. They said that should work fine. We spoke to our Cisco network engineer yesterday and he said it should work fine. Cisco’s documentation clearly says:
With insertion of equal-wattage power supplies in cumulative mode:
The system power is the combined power capability of both supplies.
So, it definitely seems that it’s a false limitation somewhere that would be solved by getting two 2500W power supplies. These are already on their way and scheduled to arrive tomorrow.
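For anyone who wants to check the arithmetic behind that, here is a rough sketch of the power budget. It assumes, as the quoted documentation suggests, that cumulative mode simply adds the supply ratings together, and that redundant mode leaves only one supply's worth available. The function names and the `mode` strings are made up for illustration; this is not anything the router itself runs.

```python
def combined_capacity_watts(supply_watts, mode="cumulative"):
    """Return the usable wattage for a list of power supply ratings.

    Assumption: cumulative mode is a plain sum of the ratings, and
    redundant mode is limited to the smallest single supply.
    """
    if not supply_watts:
        raise ValueError("at least one power supply is required")
    if mode == "cumulative":
        return sum(supply_watts)
    if mode == "redundant":
        return min(supply_watts)
    raise ValueError(f"unknown mode: {mode!r}")


def has_headroom(supply_watts, required_watts, mode="cumulative"):
    """True if the supplies can cover the required load in the given mode."""
    return combined_capacity_watts(supply_watts, mode) >= required_watts


# Figures from this post: two 1300W supplies, 2500W needed.
current_supplies = [1300, 1300]
required = 2500

print(combined_capacity_watts(current_supplies))          # cumulative total
print(has_headroom(current_supplies, required))           # cumulative mode
print(has_headroom(current_supplies, required, "redundant"))
```

On paper the two 1300W supplies clear the 2500W requirement by 100W in cumulative mode, which is why the complaint from the router looks like a model or version check and not a real shortage of power.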
The upshot of this is we’ll be having another network maintenance window tomorrow, September 13th, at 10pm PDT.
(Just to get everybody on the same page: essentially ALL performance and stability problems experienced over the last few months have been due to these internal network problems. High web and mail server loads are due to the lag they’re getting when NFS mounting the file system from our file servers. This causes slow/flaky performance across the board. It’s actually not a matter of overloaded servers or anything like that; it’s all this very weird network problem that has had us and Cisco stumped. We’ve now completely replaced all the related physical networking gear and it’s still happening. Cisco support and network consultants have been unable to find any problems with our configuration or hardware and have no explanation as to why things are still behaving like they are. So we’re now upgrading the core routers to be 8 times more powerful.)
Also, we’ve decided to start allowing comments on this blog. Remember the note at the top, though: Posting in the comments here WILL NOT help you get your problem resolved faster.
We hope it will at least help foster some better communication related to events we post here.
OK, I’ll give them this, they are upfront about their problems. And so far they have been making good on getting things back up and trying to keep it to a minimum. But the slow-ass crawl is back and hopefully will be fixed soon. That said, GAH!
PS – Now to add to that, I’ve been getting Server 500s (which I assume you are too) as the server starts puking over the internal network load.




