Public Announcements ~ Server4 Outage
|
Posted:
Fri Aug 19, 2005 8:27 pm
Server4 had a short outage at approximately 5:00 am CDT, caused by high load on the server. The server required a reboot and an fsck.
We are still determining the cause of the high load. We suspect that the daily backups had a resource leak and caused it. The server normally runs at about 20-35% capacity.
Posted:
Sat Aug 20, 2005 4:01 pm
We were not able to find the exact cause of the outage, but upon investigation we did find 5600+ emails in the mail queue, which may have caused the load to spike and made the server inaccessible. We also noticed cpanellog consuming a large amount of resources. We are monitoring the server to check for further outages.
Posted:
Sat Aug 20, 2005 4:01 pm
Server4 had another small Apache outage today. We are currently investigating the cause.
Posted:
Sun Aug 21, 2005 11:54 pm
It appears that server4 had another outage today; we suspect it was again caused by high load. This is the time of day when all the server crons run, so we will be working our way through the daily crons to try to find the one causing the server to crash.
We apologise for the inconvenience caused.
Regards,
Aaron
Posted:
Tue Aug 23, 2005 3:28 pm
There seem to be ongoing stability problems with the server, and we are not yet able to pinpoint the exact cause of its random lockups. We will be performing a few tests and upgrades on the server, which will result in a few short outages.
We suspect one of the following could be the cause of the massive load increases that crash the server:
1. An iowait bug with Linux kernel 2.4+Dual Xeon+RHEL, which was previously prevalent (6+ months ago). This was resolved some time ago, but the problem appears to have resurfaced.
2. A DoS attack of some sort, although we have no evidence of this.
3. A script (client- or server-run) consuming a large amount of server resources.
4. Faulty hardware. There are no signs of this, but it is possible. After some upgrades and tests, we may replace hardware to rule out this option.
If you have any questions regarding this, please feel free to contact us.