Network Downtime Technical Report

Impact/Effect, Causes, Preventive and Corrective Measures

Duration

Start time: 9:30 am on 10/08/2023

End time: 12:00 pm on 10/08/2023 (total downtime: 2 hours 30 minutes)

Impact/Effect

Complete downtime of the mobile payment app, the web app and other payment platforms: the e-channel platforms became inaccessible, so users were unable to make payments, check balances or perform other transactions on the web and mobile platforms.

Branches became crowded with frustrated customers, and contact centers were flooded with emails and phone calls from customers across the nation. Negative reviews about the downtime also appeared on all social media channels.

Root Cause: The downtime was caused by a power outage at the Internet Service Provider (ISP), together with a network glitch and server overload. The problem was first noticed when the technical team pinged the servers several times and received no response. A phone call to the ISP revealed that it was experiencing a power outage and that its backup power supply had not been engaged after preventive maintenance carried out the previous day. Further diagnosis showed that the servers were not powered up: the main switch was off and the changeover switch was set to neutral.

Unfortunately, by the time the servers at both ends were fully up and running, considerable time had elapsed, and the flood of user requests that arrived as soon as the servers came back up overloaded them, causing a second outage.

Corrective and Preventive Measures

Uninterruptible Power Supplies (UPS) should be provided for computer systems and servers. Because power outages are frequent, solar power inverters should also be explored as a source of electrical power. The photovoltaic panels of a good solar power system can last around 25 years (batteries usually need replacing sooner); the main drawback is the initial cost of purchase and installation. Beyond that initial cost, a solar system has low maintenance costs, and monitoring units and breakers can easily be incorporated to keep it operating seamlessly.
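When sizing a UPS or solar battery bank, a common rule of thumb estimates backup runtime as usable battery energy divided by load: runtime (h) ≈ capacity (Wh) × depth of discharge × inverter efficiency ÷ load (W). The sketch below applies that rule under stated assumptions: the 0.8 depth-of-discharge and 0.9 inverter-efficiency defaults, the 2 kW load and the 10 kWh bank are hypothetical placeholders, not figures from this incident, and real sizing should use the manufacturer's ratings.

```python
def backup_runtime_hours(battery_wh: float, load_w: float,
                         depth_of_discharge: float = 0.8,
                         inverter_efficiency: float = 0.9) -> float:
    """Estimate how many hours a battery bank can carry a constant load.

    Assumed rule of thumb: usable energy = capacity * depth of discharge
    * inverter efficiency. The 0.8 and 0.9 defaults are placeholders.
    """
    if battery_wh <= 0 or load_w <= 0:
        raise ValueError("battery_wh and load_w must be positive")
    if not (0 < depth_of_discharge <= 1 and 0 < inverter_efficiency <= 1):
        raise ValueError("depth_of_discharge and inverter_efficiency must be in (0, 1]")
    usable_wh = battery_wh * depth_of_discharge * inverter_efficiency
    return usable_wh / load_w


def battery_wh_needed(load_w: float, hours: float,
                      depth_of_discharge: float = 0.8,
                      inverter_efficiency: float = 0.9) -> float:
    """Invert the same rule of thumb: nominal capacity (Wh) needed to
    carry load_w watts for the given number of hours."""
    if load_w <= 0 or hours <= 0:
        raise ValueError("load_w and hours must be positive")
    return load_w * hours / (depth_of_discharge * inverter_efficiency)


# Hypothetical example: a 2 kW server load bridged through an outage
# as long as this incident (2.5 hours), and a 10 kWh bank on the same load.
print(f"{battery_wh_needed(2000, 2.5):.0f} Wh")    # 6944 Wh
print(f"{backup_runtime_hours(10_000, 2000):.1f} h")  # 3.6 h
```

With these assumed figures, a 2 kW load needs roughly 6.9 kWh of nominal battery capacity to ride out a 2.5-hour outage, and a 10 kWh bank would carry it for about 3.6 hours.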

In addition, multiple servers should be deployed behind a load balancer such as nginx, so that network traffic is distributed effectively and efficiently across them.
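nginx spreads requests across the servers listed in an `upstream` block, using round-robin order by default. The Python sketch below only illustrates that idea and is not how nginx is implemented: requests are handed to each server in turn, and a server marked down is skipped so the remaining servers absorb its share. The class name and the server names `app1` to `app3` are hypothetical.

```python
from typing import List, Sequence


class RoundRobinBalancer:
    """Hand out servers in round-robin order, skipping any marked down."""

    def __init__(self, servers: Sequence[str]) -> None:
        if not servers:
            raise ValueError("at least one server is required")
        self._servers: List[str] = list(servers)
        self._healthy = set(self._servers)
        self._next = 0  # index of the next server to try

    def mark_down(self, server: str) -> None:
        self._healthy.discard(server)

    def mark_up(self, server: str) -> None:
        if server in self._servers:
            self._healthy.add(server)

    def pick(self) -> str:
        # Try each server at most once per call, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._next]
            self._next = (self._next + 1) % len(self._servers)
            if server in self._healthy:
                return server
        raise RuntimeError("no healthy servers available")


# Hypothetical pool of identical app servers.
lb = RoundRobinBalancer(["app1", "app2", "app3"])
print([lb.pick() for _ in range(4)])  # ['app1', 'app2', 'app3', 'app1']
lb.mark_down("app2")
print([lb.pick() for _ in range(3)])  # ['app3', 'app1', 'app3']
```

Because each server receives only a fraction of the requests, a surge like the one that followed the restart here is spread out instead of landing on a single machine.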

Multiple internet service providers (ISPs) should also be used, so that if one is down another can take over. Since various plans are now available (such as pay-as-you-go instead of monthly, quarterly or yearly subscriptions), a backup ISP can be kept at low cost.
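The failover decision between ISPs can be sketched as follows, assuming some external health check (not shown) reports each link as up or down on every round. The sketch switches to a backup link as soon as the active one fails, but returns to a higher-priority link only after it has passed several consecutive checks, so a flapping connection does not cause constant switching. The class, the `recover_after` parameter and the names `isp_a`/`isp_b` are hypothetical.

```python
from typing import Mapping, Optional, Sequence


class UplinkFailover:
    """Choose an active internet uplink from a priority-ordered list.

    Fails over immediately when the active link goes down, but returns to a
    higher-priority link only after it has passed `recover_after` consecutive
    health checks.
    """

    def __init__(self, uplinks: Sequence[str], recover_after: int = 3) -> None:
        if not uplinks:
            raise ValueError("at least one uplink is required")
        self.uplinks = list(uplinks)  # primary first
        self.recover_after = recover_after
        self.active: Optional[str] = self.uplinks[0]
        self._streak = {link: 0 for link in self.uplinks}  # consecutive passes

    def update(self, status: Mapping[str, bool]) -> Optional[str]:
        """Feed one round of health-check results; return the active uplink."""
        for link in self.uplinks:
            self._streak[link] = self._streak[link] + 1 if status.get(link, False) else 0

        active_ok = self.active is not None and self._streak[self.active] > 0
        for link in self.uplinks:
            if link == self.active and active_ok:
                break  # no higher-priority link is ready to take over
            needed = self.recover_after if active_ok else 1
            if self._streak[link] >= needed:
                self.active = link
                return self.active
        if not active_ok:
            self.active = None  # every uplink is down
        return self.active


# Hypothetical ISPs; real status would come from probing each link.
fo = UplinkFailover(["isp_a", "isp_b"], recover_after=3)
print(fo.update({"isp_a": False, "isp_b": True}))  # isp_b
```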

Summary

Power outages have a great impact on businesses, not only in manufacturing but across all sectors. Financial institutions and other organizations that provide services through web or mobile applications need continuous electrical power for their servers, around the clock, every day. When the primary power system needs maintenance or replacement, an alternative power source should be switched in to avoid downtime and outages.

Servers can fail for various reasons, but running multiple identical servers, balancing the traffic load across them and keeping a constant backup in place can help. Monitoring tools should also be used so that any abnormality or inconsistency in output is identified and troubleshot quickly.
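In this incident the problem was found by repeatedly pinging unresponsive servers by hand. A minimal automated check could look like the sketch below. It assumes a TCP connection attempt to a service port is an acceptable liveness test (ICMP ping typically needs raw sockets and elevated privileges), and it raises an alert only after several consecutive failures to avoid false alarms. The function names, the 30-second interval and the three-failure threshold are hypothetical choices, and the `print` call stands in for a real alerting channel.

```python
import socket
import time
from typing import Dict, List, Optional, Tuple

Target = Tuple[str, int]  # (host, TCP port)


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False


def monitor(targets: List[Target], interval: float = 30.0,
            failures_before_alert: int = 3,
            rounds: Optional[int] = None) -> Dict[Target, int]:
    """Probe every target each round and alert once a target has failed
    `failures_before_alert` consecutive checks. Runs forever if rounds is None;
    returns the current consecutive-failure counts when it stops."""
    failures = {t: 0 for t in targets}
    completed = 0
    while rounds is None or completed < rounds:
        for target in targets:
            if is_reachable(*target):
                failures[target] = 0
            else:
                failures[target] += 1
                if failures[target] == failures_before_alert:
                    host, port = target
                    print(f"ALERT: {host}:{port} failed "
                          f"{failures_before_alert} consecutive checks")
        completed += 1
        if rounds is None or completed < rounds:
            time.sleep(interval)
    return failures
```

A call such as `monitor([("10.0.0.11", 443), ("10.0.0.12", 443)])` (hypothetical addresses) would run indefinitely, checking both servers every 30 seconds.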

