The system repeatedly crashes, and the developers are baffled by what is causing it… Developers are pointing fingers at the Infrastructure Operators (IOs), and the IOs are defending their work and standing their ground. Because of the instability, the whole company is shaking from its foundation. It is a nightmare every company wishes to avoid, yet a typical scenario in most web-enabled businesses. Since the Sony incident, no company feels secure, and if fingers are pointed in one direction, you bet everyone will gaze that way with a scrutinizing eye.
Blame was put on the IOs in this situation. The workings of the bonded networks, the load-balanced fail-over system and the replication system that were built to create redundancy for the company’s data got scrutinized to the point that everything was reset to a single point of failure. “To simplify the overview,” they say…
The Development department simply said, “No… it can’t be our code,” “There are connection issues, we are losing sessions with the database,” “It’s the network,” and “The replication system is probably creating the lag,” etc… etc… etc…
So the infrastructure gets stripped down. Management’s call. Everyone in the company starts to think the crashes are the IOs’ fault. It is a hurtful moment for the IOs, seeing months of their work destroyed in front of their eyes by their own hands. Down to a direct cross-over connection to the database, with all clusters turned off.
The first crash happens, then the second, back-to-back. Logs are pulled and examined: same problem. Oh… mighty…
After so many crashes, with so many people involved in finding a resolution, the Development team now finds all the references online about issues similar to ours. “It’s the Alfresco platform,” they say.
They could have saved all the trouble if they had listened to the IOs when the platform was chosen. “It’s not built for the business we are running.”
This platform has been a headache for the IOs. Its overcomplicated nature is a big turn-off. When the platform was chosen as the base system, the IOs were against it.
“We are sailing off to war in a cruise liner. It’s too fat. It’s a resource hog with services that we don’t need. We need a lean machine that can utilize database clusters in a fast, controlled manner. We do not know how efficient this platform is behind the curtain.”
on IOs upon hearing this.
Decisions were made while casting out the IOs.
Years went by. The application got developed on top of this platform, and functions and services were built around this system’s core.
Now it’s too late to revert. We are neck deep in this shit platform that can’t even sustain a featherweight load of users. Development finds a critical bug in the platform: it collapses when it exceeds 20 concurrent users. A resource-intensive Java process, they say, is the cause.
To the IOs, it sucks when it brings a sixteen-core, 64 GB RAM system to its knees.
to be continued…