The client expressed the following needs:
- Creation of a console shared among the operators and administrators of the IT Architecture and Operations Group, for managing alarms coming both from tools managed directly by the Automation Systems group and from tools managed by other work groups.
- An integration between the event management system and the Corporate Service Desk that supports the ITIL Incident Management and Change Management processes.
- The integration will be used by all control room operators as a working tool to start the process management workflow.
- An IT Service Management architecture that maximizes the benefit/cost ratio and allows all the monitoring systems currently in use within IT Architecture and Operations to be integrated without particular technological constraints.
To ensure the continuity and reliability of the previous Event Management product while minimizing costs, Omninecs proposed a solution based on:
- Op5 Monitor, monitoring software derived from the open source Nagios platform, for monitoring systems and applications;
- SolarWinds products for network monitoring and configuration management;
- Kriu for IT Operations, a proprietary Omninecs solution that receives the alarms from both products and provides proper escalation and integration with the Atlassian Jira tracking platform for Change and Incident Management.
- Op5 Monitor is the tool for monitoring systems, middleware, application software and databases.
- SolarWinds NPM (Network Performance Monitor) and NCM (Network Configuration Manager): market-leading network monitoring tools that let the network administration team build custom dashboards for immediate, real-time analysis of any alarm, for troubleshooting, and for configuration management.
- Kriu for IT Operations Platform: the proprietary Omninecs “IT Event & Correlation Analysis” platform, the hub of the entire IT Service Management architecture.
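How Kriu hands alarms over to Jira is not documented here. Purely as an illustration of what an escalation step into Jira could send, the following is a minimal sketch that turns a monitoring alarm into a request body for Jira's REST endpoint `POST /rest/api/2/issue`. The severity-to-priority table, the source-to-group routing table, the `OPS` project key and the `Incident` issue type are all hypothetical assumptions, not the customer's configuration; it is not Kriu's implementation.

```python
import json
from dataclasses import dataclass

# Assumed mapping from monitoring severity to Jira priority names.
# "Highest", "High", "Medium" and "Low" are Jira's default priorities;
# a customized Jira instance may use different names.
SEVERITY_TO_PRIORITY = {
    "critical": "Highest",
    "major": "High",
    "warning": "Medium",
    "minor": "Low",
}

# Hypothetical routing table from event source to competency group.
SOURCE_TO_GROUP = {
    "op5": "Systems",
    "solarwinds": "Network",
    "vmware": "Virtualization",
}


@dataclass
class Alarm:
    source: str    # monitoring tool that raised the alarm, e.g. "op5"
    host: str      # affected host or network device
    check: str     # name of the failing check
    severity: str  # "critical", "major", "warning" or "minor"
    message: str   # plugin or trap output


def build_jira_issue(alarm: Alarm, project_key: str = "OPS") -> dict:
    """Return a JSON-serializable body for Jira's POST /rest/api/2/issue."""
    group = SOURCE_TO_GROUP.get(alarm.source, "Control Room")
    priority = SEVERITY_TO_PRIORITY.get(alarm.severity, "Medium")
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": "Incident"},  # assumes this issue type exists
            "summary": f"[{alarm.severity.upper()}] {alarm.host}: {alarm.check}",
            "priority": {"name": priority},
            # Jira labels cannot contain spaces, so the group name is slugified.
            "labels": [alarm.source, group.lower().replace(" ", "-")],
            "description": (
                f"Source: {alarm.source}\n"
                f"Host: {alarm.host}\n"
                f"Check: {alarm.check}\n"
                f"Competency group: {group}\n\n"
                f"{alarm.message}"
            ),
        }
    }


if __name__ == "__main__":
    alarm = Alarm("solarwinds", "core-sw-01", "Interface Gi0/1 status",
                  "critical", "Interface Gi0/1 is down")
    print(json.dumps(build_jira_issue(alarm), indent=2))
```

Sending the body is an authenticated HTTP POST and is left out. In a real instance the competency group would more likely be carried by a component, an assignee or a custom field, all of which are instance-specific.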
The solution included:
- Installation and configuration of Op5. The main driver of the customer’s choice was the ability to reuse the large base of Nagios agents (for both Linux and Windows) already installed for the previous version of the monitoring system. Because Op5 is derived from the open Nagios platform, it can run custom scripts and tools, both those from the large user community and those developed ad hoc by the Omninecs team. The implementation also uses agentless checks wherever installing an agent is unnecessary or impractical, for example for hardware appliances or for monitoring communication flows on specific TCP ports. For load balancing and redundancy, the platform is distributed across five separate peer servers running Red Hat Linux. To reduce the load on the target servers (especially those supporting the business), the implementation also uses bridge machines that collect the check results, which are then compared against predefined thresholds.
- On the network monitoring side, the monitored equipment and infrastructure include both the corporate network and the network assigned to business activities. The architecture, based on Windows systems and Microsoft SQL Server, provides front-end redundancy through a dedicated server (Additional Web Server) as well as SNMP and ICMP polling. An additional server stores the equipment configurations for backup and restore purposes.
- Configuration of Kriu for IT Operations in order to:
  - collect and filter events from Op5, from the SolarWinds network monitoring software, from the VMware infrastructure and from any third-party monitoring source;
  - correlate these events for a better and quicker understanding of inefficiencies;
  - automatically create in Atlassian Jira every incident, escalated to the appropriate competency group with the correct severity and supporting information, so that control room operators can track the progress of incident and change management activities.
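Op5 inherits the Nagios plugin interface: a check is any executable that prints one status line (optionally followed by `|` and performance data) and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As an illustration of the agentless TCP port checks and threshold comparison described for the Op5 installation, here is a minimal sketch assuming Python 3.9+ on the polling or bridge host. The script, its option names and its default thresholds are hypothetical, not the checks Omninecs actually deployed.

```python
import argparse
import socket
import time

# Standard Nagios plugin exit codes, which Op5 Monitor also uses.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_tcp(host: str, port: int, warn_s: float, crit_s: float,
              timeout_s: float = 10.0) -> tuple[int, str]:
    """Time a TCP connection and compare it with warning/critical thresholds."""
    start = time.monotonic()
    try:
        # Agentless: only a TCP connection is opened; nothing runs on the target.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
    except OSError as exc:  # connection refused, unreachable or timed out
        return CRITICAL, f"TCP CRITICAL - {host}:{port} not reachable ({exc})"
    elapsed = time.monotonic() - start
    # Performance data format: label=value[UOM];warn;crit;min;max
    perf = f"time={elapsed:.3f}s;{warn_s};{crit_s};0;{timeout_s}"
    if elapsed >= crit_s:
        state, label = CRITICAL, "CRITICAL"
    elif elapsed >= warn_s:
        state, label = WARNING, "WARNING"
    else:
        state, label = OK, "OK"
    return state, f"TCP {label} - {elapsed:.3f}s response on {host}:{port} | {perf}"


def main(argv: list[str]) -> int:
    """Parse plugin arguments, run the check, print one line, return the exit code."""
    parser = argparse.ArgumentParser(description="Agentless TCP port check")
    parser.add_argument("-H", "--host", required=True)
    parser.add_argument("-p", "--port", type=int, required=True)
    parser.add_argument("-w", "--warning", type=float, default=1.0,
                        help="warning threshold in seconds")
    parser.add_argument("-c", "--critical", type=float, default=5.0,
                        help="critical threshold in seconds")
    parser.add_argument("-t", "--timeout", type=float, default=10.0,
                        help="connection timeout in seconds")
    args = parser.parse_args(argv)
    if args.warning > args.critical:
        print("TCP UNKNOWN - warning threshold is greater than critical threshold")
        return UNKNOWN
    state, output = check_tcp(args.host, args.port, args.warning,
                              args.critical, args.timeout)
    print(output)
    return state
```

When deployed as a plugin, the script would end with `sys.exit(main(sys.argv[1:]))` under the usual `if __name__ == "__main__":` guard, so that Op5 receives the exit code as the service state.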
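Kriu’s correlation logic is proprietary and is not described in this document. Only to illustrate the collect, filter and correlate steps listed for Kriu, the following is a minimal sketch assuming a hypothetical event shape, a five-minute window and a hand-written topology map (`UPSTREAM`) that records which hosts sit behind which network device. It is a simple topology-based grouping, not Kriu’s algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    ts: float       # epoch seconds
    source: str     # "op5", "solarwinds", "vmware", ...
    host: str
    check: str
    severity: str   # "ok", "warning" or "critical"


@dataclass
class Incident:
    root: Event                      # event that opened the incident
    last_seen: float                 # timestamp of the latest folded-in event
    related: list[Event] = field(default_factory=list)  # correlated downstream events
    count: int = 1                   # raw events folded into this incident


# Hypothetical topology: host -> upstream network device it depends on.
UPSTREAM = {"app-01": "sw-01", "db-01": "sw-01"}


def correlate(events: list[Event], window_s: float = 300.0) -> list[Incident]:
    """Filter, deduplicate and correlate events into candidate incidents."""
    by_key: dict[tuple[str, str], Incident] = {}
    incidents: list[Incident] = []
    for ev in sorted(events, key=lambda e: e.ts):
        # 1. Filter: recoveries and informational states never open incidents.
        if ev.severity not in ("warning", "critical"):
            continue
        key = (ev.host, ev.check)
        # 2. Deduplicate: a repeat of a known host/check within the window
        #    only bumps the counter of the incident it already belongs to.
        known = by_key.get(key)
        if known is not None and ev.ts - known.last_seen <= window_s:
            known.count += 1
            known.last_seen = ev.ts
            continue
        # 3. Correlate: if the host's upstream device already has an open
        #    incident, attach this event to it instead of opening a new one.
        parent = UPSTREAM.get(ev.host)
        root = next((i for i in incidents
                     if parent is not None and i.root.host == parent
                     and ev.ts - i.last_seen <= window_s), None)
        if root is not None:
            root.related.append(ev)
            root.count += 1
            root.last_seen = ev.ts
            by_key[key] = root
            continue
        incident = Incident(root=ev, last_seen=ev.ts)
        by_key[key] = incident
        incidents.append(incident)
    return incidents
```

In this sketch a switch outage followed by alarms from the servers behind it yields a single incident, not one per server. It assumes events arrive roughly in order and does not close incidents on recovery; a production correlator would buffer events briefly and track state transitions.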