You may remember that in January 2017, Delta Air Lines had a major IT outage. It had to cancel nearly 300 flights over two days because “essential IT systems” went down. This wasn’t an isolated incident, either: United Airlines also grounded flights for two and a half hours in January due to IT issues.
The United Airlines incident got the attention of the U.S. Senate, especially as a couple of senators had already been investigating other IT problems at Delta and Southwest in 2016. It’s ironic that an industry so dependent on the latest technology for passenger safety and airplane effectiveness can so easily be crippled by legacy IT systems. It’s estimated that the Delta IT outage caused a loss of around $150 million.
The Implications of IT Service & Systems Failures
The issue here is that the impact of an IT systems failure goes far beyond the application or service itself. There are all sorts of secondary problems:
- Loss of revenue due to customers not being able to purchase
- Compensation for delays and other issues
- Political fallout due to the high visibility of these types of incidents
- Damaged trust with customers and the public at large
- Tarnishing the brand of a business
That’s why effective IT service delivery is so vital. No one notices these services when they are working perfectly, but as soon as there’s a failure, everyone notices!
This has always been the strongest argument for investing in IT service assurance. Although businesses love innovation and new products and services, it’s the “bread and butter” operational applications and processes that keep a business running. That might not be as exciting as new product development, but it’s even more vital.
Service Assurance—Reducing Risk & Protecting Your Business
Service assurance should be part of the IT DNA of every mature business. Your IT ecosystem needs checks, balances, tracking, and other disciplines to keep your systems up and running. It’s not about reactive incident management or break/fix; it’s about knowing you are unlikely to reach that point in the first place. Service assurance gives you that confidence:
- Demand and capacity planning so business growth doesn’t inhibit production IT services
- Active network monitoring to keep applications responsive for employees and users
- Availability management so that systems are there when people need them
- Effective change and release management so any changes to the IT environment don’t cause outages
- Asset and configuration management so you understand exactly what your IT assets are doing, where they are, and what they support
When you invest in these areas, you dramatically reduce the risk of IT outages and the embarrassment and loss of revenue that they can cause.
Improving Your Service Assurance Processes
There are several ways you can enhance service assurance across your organization.
- Get your CIO to take it seriously
  - Service assurance is critical. A CIO needs to focus as much (if not more) on existing services as they do on new projects and development.
- Put the right policies and processes in place
  - Service assurance should be owned by everyone in IT. That means putting the right guidelines, policies, and processes in place to support it.
- Put the right control points in place
  - Service assurance relies on having thresholds and accurate reporting. You need to understand when services are being put under strain and get early warning so you can take proactive action.
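To make the idea of control points concrete, here is a minimal Python sketch of a two-level threshold check: a "warning" level that gives early notice and a "critical" level that signals real strain. It is an illustration only, not how any particular monitoring product works. The `Threshold` class, the `classify` and `early_warnings` functions, the metric names, and all of the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one metric (hypothetical values)."""
    warning: float
    critical: float


def classify(value: float, threshold: Threshold) -> str:
    """Return 'ok', 'warning', or 'critical' for a single reading."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


def early_warnings(readings: dict[str, float],
                   thresholds: dict[str, Threshold]) -> dict[str, str]:
    """Report every metric that has crossed its warning or critical limit."""
    alerts = {}
    for name, value in readings.items():
        limit = thresholds.get(name)
        if limit is None:
            continue  # no threshold defined for this metric, so skip it
        status = classify(value, limit)
        if status != "ok":
            alerts[name] = status
    return alerts


# Illustrative thresholds and readings, in percent utilization (assumed).
THRESHOLDS = {
    "cpu": Threshold(warning=75.0, critical=90.0),
    "disk": Threshold(warning=80.0, critical=95.0),
}
READINGS = {"cpu": 82.0, "disk": 40.0, "memory": 99.0}

print(early_warnings(READINGS, THRESHOLDS))  # {'cpu': 'warning'}
```

Note that the memory reading is ignored even though it is very high, because no threshold was defined for it. That is the practical argument for reviewing control points regularly: a metric without a threshold cannot give you early warning.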
How ScienceLogic Can Enhance Service Assurance
ScienceLogic can help with all of the above. Our solution is built around service assurance—giving you the insight and control you need to stop IT outages before they start. Here’s how:
- Automatically map every part of your IT ecosystem so you understand where everything is, what it does, and the services it delivers
- Query all your assets for their configuration details and automatically populate your configuration management database
- Augment popular IT service management tools like ServiceNow
- Actively monitor every IT asset and alert if anything is approaching an unacceptable threshold or is likely to create an issue
- Create automation that can “self-heal” IT infrastructure when performance issues arise
- Reduce your risks by alerting you about IT issues early so you can resolve them before they have a major impact