The AI market is projected to reach the $3 trillion mark by 2024, and machine learning, a big part of AI, is the key driver of that growth.
AI is a general field with a broad scope, and machine learning is the branch of AI that covers its statistical side. Deep learning is a specialized subfield of machine learning that uses multi-layered neural networks and so involves a deeper level of computation. Machine learning (ML) is the field of computer science that allows machines to make decisions without being explicitly programmed to do so. ML detects and learns patterns and relationships from data, then applies those inferences to future datasets. Its performance improves over time as it sees more data. ML helps automate decisions at scale that would otherwise have to be made by humans, who can now divert resources and attention to other high-value activities.
Identifying and clustering symptoms from millions of alerts and events
Process historical alert and event data to identify logical groups, or clusters, of problem symptoms by applying unsupervised clustering. Iterate and experiment to find the right cluster sizing. Make sure the real symptoms are extracted first: these messages are embedded with many actors, variables and identities, and without that step the clustering will be skewed toward identities such as timestamps and proper nouns.
Benefit: Clear identification of real issues lurking in the environment
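As a minimal sketch of the idea above, the snippet below masks identities before grouping alerts. Everything in it is an assumption made for illustration: the sample alerts, the masking patterns, the Jaccard similarity measure and the 0.6 threshold are hypothetical, and a greedy single-pass grouping stands in for whatever unsupervised clustering algorithm is actually used.

```python
import re

# Hypothetical masking rules; real alert formats need their own patterns.
MASKS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ t]\d{2}:\d{2}:\d{2}"), "<TIME>"),
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"), "<IP>"),
    (re.compile(r"\b[a-z]+-[a-z]+-\d+\b"), "<HOST>"),
    (re.compile(r"\b\d+(?:\.\d+)?%?"), "<NUM>"),
]


def mask(message):
    """Lower-case the alert and replace identities with placeholders."""
    text = message.lower()
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text


def jaccard(a, b):
    """Share of tokens two alerts have in common (1.0 means identical sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster(messages, threshold=0.6):
    """Greedy grouping: join the first cluster that is similar enough."""
    clusters = []
    for raw in messages:
        template = mask(raw)
        tokens = set(template.split())
        for c in clusters:
            if jaccard(tokens, c["tokens"]) >= threshold:
                c["members"].append(raw)
                break
        else:
            clusters.append({"template": template, "tokens": tokens, "members": [raw]})
    return clusters


# Made-up sample alerts.
ALERTS = [
    "2023-01-05 10:00:01 CPU usage 95% on web-prod-01",
    "2023-01-05 10:03:00 Disk /var full on db-prod-02",
    "2023-01-05 10:04:17 CPU usage 91% on web-prod-07",
    "2023-01-05 10:06:10 Ping timeout from 10.0.0.12",
    "2023-01-05 10:07:44 Disk /var full on db-prod-03",
    "2023-01-05 10:09:00 CPU usage at 97% on web-prod-03",
]

clusters = cluster(ALERTS)
for c in clusters:
    print(len(c["members"]), c["template"])
```

Without the masking step, every timestamp and hostname would count as a distinct token and pull otherwise identical symptoms apart, which is the skew described above.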
Identifying and clustering symptoms from millions of log messages
Identify different groups or clusters of log messages, mask the asset identifiers with placeholders, and view the resulting clusters in a time-boxed view to identify hot spots. For example, with this approach a dataset containing 50,000+ log messages can be reduced to 20 clusters, each identifying a unique log symptom.
Benefit: Identify problems easily and get early signals from logs
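The time-boxed view can be sketched as follows. The log format ("timestamp host message"), the sample lines and the one-hour bucket size are assumptions for illustration only; the host field is treated as the asset identifier and left out of the template.

```python
import re
from collections import Counter
from datetime import datetime

# Made-up sample logs in an assumed "timestamp host message" layout.
LOGS = [
    "2023-03-01T09:05:11 app-node-1 Connection to 10.1.2.3 timed out after 30s",
    "2023-03-01T09:17:42 app-node-2 Connection to 10.1.2.9 timed out after 30s",
    "2023-03-01T09:48:03 app-node-1 Connection to 10.1.2.3 timed out after 45s",
    "2023-03-01T10:02:30 app-node-3 Volume /data not mounted",
    "2023-03-01T10:40:00 app-node-2 Connection to 10.1.2.9 timed out after 30s",
]


def to_template(message):
    """Replace IP addresses and numeric values with placeholders."""
    message = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", message)
    return re.sub(r"\b\d+\w*", "<NUM>", message)


def time_boxed_counts(lines):
    """Count occurrences of each log template per one-hour bucket."""
    counts = Counter()
    for line in lines:
        stamp, _host, message = line.split(" ", 2)
        bucket = datetime.fromisoformat(stamp).strftime("%Y-%m-%d %H:00")
        counts[(bucket, to_template(message))] += 1
    return counts


counts = time_boxed_counts(LOGS)
hot_spot = counts.most_common(1)[0]  # busiest (hour, template) pair
print(hot_spot)
```

Here five raw lines collapse into two templates, and the busiest bucket points at the connection timeouts in the 09:00 hour.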
Deriving application or server health based on multiple sensors or telemetry data
Pretrain a classification model with labeled examples of when applications or servers were healthy or unhealthy. For new telemetry data, the system can then classify an application or server as healthy or unhealthy. A simple example for a server is to train the model on CPU, memory, I/O, disk and system load metrics.
Benefit: Efficient determination of IT assets' health
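To make the server example concrete, here is a minimal sketch that uses a nearest-centroid classifier as a simple stand-in for the classification model. The training samples, feature names and units are made up, and the features are left unscaled for brevity; real telemetry would normally be standardized first.

```python
from math import dist

# Assumed feature order for every sample.
FEATURES = ("cpu_pct", "memory_pct", "io_wait_pct", "disk_pct", "system_load")

# Made-up labeled telemetry snapshots.
TRAINING = [
    ((20, 35, 2, 40, 0.5), "healthy"),
    ((35, 50, 5, 55, 1.0), "healthy"),
    ((25, 40, 3, 45, 0.7), "healthy"),
    ((95, 90, 40, 85, 8.0), "unhealthy"),
    ((88, 97, 35, 92, 6.5), "unhealthy"),
    ((99, 85, 50, 80, 9.0), "unhealthy"),
]


def fit_centroids(samples):
    """Average the feature vectors of each label into one centroid."""
    sums, counts = {}, {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest to the new snapshot."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = fit_centroids(TRAINING)
print(classify(centroids, (30, 45, 4, 50, 0.9)))
print(classify(centroids, (92, 88, 38, 90, 7.0)))
```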
Prediction of alert volume or specific alert conditions for better operational readiness
Get prediction insights by running regression jobs on the alert rate or volume generated by monitoring tools. These give clues about the time periods or days during which the ops team can expect high volumes of alerts and events. Regression jobs can also be run at a more granular level.
Benefit: Opportunity to reduce unforeseen outages/degradations
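One simple way to picture such a regression job is to regress daily alert volume on day-of-week indicator features. With only indicator features, the least-squares coefficients are just the per-weekday means, which is what the sketch below computes. The two weeks of history and the 1.5x "busy day" threshold are made up for illustration.

```python
from collections import defaultdict

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

# Made-up history: (weekday index, alerts raised that day) for two weeks.
history = [
    (0, 120), (1, 95), (2, 100), (3, 110), (4, 180), (5, 40), (6, 35),
    (0, 130), (1, 105), (2, 90), (3, 100), (4, 200), (5, 50), (6, 45),
]


def fit_weekday_regression(samples):
    """Least-squares fit on weekday indicators (equals the mean per weekday)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for weekday, volume in samples:
        totals[weekday] += volume
        counts[weekday] += 1
    return {day: totals[day] / counts[day] for day in totals}


coefficients = fit_weekday_regression(history)
overall = sum(volume for _, volume in history) / len(history)

# Days whose predicted volume is well above the overall average.
busy_days = [DAYS[d] for d in sorted(coefficients) if coefficients[d] > 1.5 * overall]
print(busy_days)
```

The same fit could be run per monitoring tool or per alert type by keying the samples on that extra field.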
Detecting seasonality and anomalies in alert volume to spot abnormal behavior
Perform regression analysis on alert volume time-series data to establish a baseline, identify seasonality and detect anomalies.
Benefit: Little or no dependency on managing health rules
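The sketch below illustrates the baseline-and-anomaly idea with a deliberately simple stand-in for a full regression model: the baseline is the median volume for each slot of the day, and a point is flagged when its deviation exceeds a multiple of the median absolute deviation. The series, the four-slot daily period and the factor of 5 are all assumptions.

```python
from statistics import median

PERIOD = 4  # assumed: four 6-hour slots per day

# Made-up alert volumes for five days, one value per slot.
volumes = [
    10, 50, 80, 20,
    12, 48, 82, 22,
    11, 52, 79, 19,
    9, 51, 150, 21,
    10, 49, 81, 20,
]


def seasonal_baseline(series, period):
    """Typical volume for each slot of the day (median across days)."""
    return [median(series[slot::period]) for slot in range(period)]


def find_anomalies(series, period, k=5.0):
    """Flag points whose deviation from the baseline exceeds k times the MAD."""
    baseline = seasonal_baseline(series, period)
    residuals = [v - baseline[i % period] for i, v in enumerate(series)]
    scale = median(abs(r) for r in residuals) or 1.0  # avoid a zero scale
    return [
        (i, v, baseline[i % period])
        for i, (v, r) in enumerate(zip(series, residuals))
        if abs(r) > k * scale
    ]


baseline = seasonal_baseline(volumes, PERIOD)
anomalies = find_anomalies(volumes, PERIOD)
print(baseline)
print(anomalies)  # (index, observed volume, expected volume)
```

Because the threshold is derived from the data itself, no static health rule has to be maintained for each slot.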
Identifying correlated time series metrics or symptoms for faster root cause inference
Correlation is a proven method for grouping data series that have a high degree of association. Using this method, the system can group all the relevant data in a particular context into a smaller number of correlation groups. This allows users to focus on a subset of the data and accelerate their analysis.
Benefit: Correlated symptom groups of alerts or log messages
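A minimal sketch of correlation grouping, assuming Pearson correlation as the association measure and a 0.9 cutoff: any two series whose absolute correlation reaches the cutoff end up in the same group. The metric names and values are made up.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y) if spread_x and spread_y else 0.0


def correlation_groups(series, threshold=0.9):
    """Merge series into groups linked by |correlation| >= threshold."""
    parent = {name: name for name in series}

    def find(name):
        while parent[name] != name:
            name = parent[name]
        return name

    for a, b in combinations(series, 2):
        if abs(pearson(series[a], series[b])) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in series:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Made-up metric samples taken at the same six points in time.
METRICS = {
    "cpu_pct": [10, 20, 30, 40, 50, 60],
    "latency_ms": [105, 118, 135, 139, 160, 171],
    "queue_depth": [1, 2, 3, 5, 5, 6],
    "disk_free_pct": [70, 69, 71, 70, 69, 71],
    "inode_free_pct": [50, 48, 52, 50, 48, 52],
}

groups = correlation_groups(METRICS)
print(groups)
```

Five series reduce to two groups here, so an analyst can start from the CPU, latency and queue group rather than inspecting every metric separately.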
Incident classification based on incident description and metadata using neural networks
Use neural networks to classify each incident into a category based on its description and metadata, for example Application Performance Issue, Device Performance Issue or Network Issue.
Benefit: Efficient incident processing, accurate automation and faster resolution times
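As a rough illustration only, the sketch below trains the smallest possible neural network, a single softmax layer with no hidden layer and no bias term, on bag-of-words features from incident descriptions. The training incidents are made up, metadata is ignored, and a production classifier would use a deeper network and far more data.

```python
import math

# Made-up training incidents: (description, category).
TRAINING = [
    ("checkout page response slow", "Application Performance Issue"),
    ("api latency high after deploy", "Application Performance Issue"),
    ("server cpu utilization high", "Device Performance Issue"),
    ("host memory exhausted swap", "Device Performance Issue"),
    ("switch port flapping packet loss", "Network Issue"),
    ("vpn tunnel down packet drops", "Network Issue"),
]


def tokenize(text):
    return text.lower().split()


vocab = sorted({word for text, _ in TRAINING for word in tokenize(text)})
labels = sorted({label for _, label in TRAINING})


def featurize(text):
    """Bag-of-words counts over the training vocabulary."""
    words = tokenize(text)
    return [float(words.count(word)) for word in vocab]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scores(weights, x):
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def train(samples, epochs=200, lr=0.5):
    """Stochastic gradient descent on the cross-entropy loss."""
    weights = [[0.0] * len(vocab) for _ in labels]
    data = [(featurize(text), labels.index(label)) for text, label in samples]
    for _ in range(epochs):
        for x, target in data:
            probs = softmax(scores(weights, x))
            for c, row in enumerate(weights):
                error = probs[c] - (1.0 if c == target else 0.0)
                for i, value in enumerate(x):
                    row[i] -= lr * error * value
    return weights


def classify(weights, text):
    logits = scores(weights, featurize(text))
    return labels[logits.index(max(logits))]


weights = train(TRAINING)
print(classify(weights, "api response slow"))
print(classify(weights, "packet loss on vpn"))
```

Words the network has never seen, such as "on" above, contribute nothing to the score, so the prediction rests entirely on vocabulary learned from past incidents.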
Keyword extraction to identify dominant phrases or topics in incidents, alerts or log messages
Perform keyword extraction to identify phrases or topics and their relative frequency, highlighting the dominant ones. With this approach, frequency graphs or a word cloud might show keywords like "Timeout", "Unreachable", "not mounted" or "connection failure" or, with named entity recognition, key identities like hostnames or IP addresses.
Benefit: Identify dominant keywords or topics, enabling remedial activities
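A minimal sketch of the frequency side of this, under clear assumptions: the messages and stopword list are made up, only single words are counted (not multi-word phrases), and two regular expressions stand in for a real named entity recognition model when picking out IP addresses and hostnames.

```python
import re
from collections import Counter

# Hypothetical stopword list; tune it to the vocabulary of your own messages.
STOPWORDS = {"after", "while", "the", "and", "for", "host", "reaching"}

IP_PATTERN = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
HOSTNAME_PATTERN = re.compile(r"\b[a-z0-9-]+(?:\.[a-z0-9-]+)+\b")
WORD_PATTERN = re.compile(r"\b[a-z]{3,}\b")

# Made-up incident, alert and log messages.
MESSAGES = [
    "Connection failure to db01.example.com: Timeout after 30s",
    "Timeout while reaching 10.0.0.5, host Unreachable",
    "Volume /data not mounted on db01.example.com",
    "Timeout on health check, 10.0.0.5 Unreachable",
]


def extract(messages):
    """Return (keyword counts, entity counts) for a batch of messages."""
    keywords, entities = Counter(), Counter()
    for message in messages:
        text = message.lower()
        # Pull out IPs first so the hostname pattern does not match them too.
        for pattern in (IP_PATTERN, HOSTNAME_PATTERN):
            entities.update(pattern.findall(text))
            text = pattern.sub(" ", text)
        keywords.update(w for w in WORD_PATTERN.findall(text) if w not in STOPWORDS)
    return keywords, entities


keywords, entities = extract(MESSAGES)
print(keywords.most_common(3))
print(entities.most_common())
```

The two counters map directly onto a frequency graph or word cloud: one for dominant symptoms, one for the assets that keep appearing in them.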