AI in Support & Operations: From reactive IT to proactive business engine
Support and operations are undergoing a transformation: artificial intelligence is increasingly turning operational tasks into strategic tools. Martin Weber, Director Service Management at diconium, explains how AI is positioning support and operations (S&O) as a true business enabler and significantly increasing efficiency in the process. In this interview, you will learn how AI is already adding value in practice today, from intelligent search and analysis assistants to fully automated incident response. The interview also addresses data protection hurdles for cloud AI, security through SIEM tool stacks, and the SRE (Site Reliability Engineering) principles that enable the transition from reactive to proactive, AI-supported processes.
Hello Martin. How is AI changing support & operations—particularly in terms of strategies, team structures, processes, and company-wide collaboration?
Martin Weber: AI is driving fundamental change in support and operations. It enables us to anticipate incidents and reduce downtime by shifting from reactive ticket handling to proactive system management. Continuous optimization is automated, and manual effort is massively reduced. This allows S&O to scale without proportionally increasing headcount—and positions it as a true business enabler.
New roles such as AI operations analysts and automation strategists are emerging. AI takes over simple and medium-level tickets, allowing teams to focus on optimization and complex cases. This requires further training at the SRE level. The classic tiered model is being replaced by AI-supported SRE teams.
In terms of processes, AI replaces manual triage with root cause analysis and classification. It becomes the core of value creation – with industrial, AI-based process chains. Knowledge management improves significantly, and relevant information is easier to access. This reduces solution times and increases quality.
Collaboration also benefits: AI provides insights into operational weaknesses and improves coordination with development teams and stakeholders. Real-time metrics help quantify service impacts and shorten innovation cycles.
Where has AI already proven its value in support and operations—and what specific benefits do companies see in practice?
AI is already proving its value in several key areas. Here are six examples:
- Simple search: AI enhancements in ticket tools or local RAG solutions dramatically improve the quality of search results. The hidden knowledge in our systems is made usable and different data sources can be integrated, which massively increases relevance for engineers and other stakeholders.
- AI assistant for analysis: Such an assistant not only structures relevant data, but also makes suggestions for problem analysis. It is an enormous help for technicians and engineers who do not have specific expertise and enables iterative refinement of the analysis.
- AI assistant for resolution: Building on the analysis function, it acts as a sparring partner that suggests solution steps. The suggestions must be reliable, which is why agent-based RAG systems are crucial here: they check and improve the quality of the AI responses.
- Real-time monitoring and predictive alarming: AI enables better anomaly detection and forecasting than ever before. Integrating these solutions with LLM-based assistants provides deeper insights into system operations and alerts S&O teams early on.
- Automated incident response: An AI-based system can prevent and resolve standardized issues without human interaction. Although there are still guardrails for complex cases, this is a rapidly evolving area that is continuously increasing the number of cases that can be handled automatically.
- Fully autonomous AI service management: The vision is a fully autonomous agent that takes over the entire service management lifecycle. It handles service requests, resolves incidents, and even performs standard changes, updating ITSM records and learning continuously as it goes. This greatly reduces the burden on operational teams and leads to a proactive, self-improving service management model.
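To make the anomaly detection behind predictive alarming more concrete, here is a minimal sketch in Python. It is an illustration only, not the approach used at diconium: it flags a metric sample when it deviates strongly from a sliding window of recent samples (a rolling z-score). The function name, window size, threshold, and sample latencies are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return the indices of samples that deviate strongly from recent history.

    Hypothetical helper: `window` and `threshold` are illustrative defaults.
    """
    recent = deque(maxlen=window)  # sliding window of the last `window` samples
    anomalies = []
    for index, value in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = stdev(recent)
            # Flag the sample if it lies more than `threshold` standard
            # deviations from the window mean. Flat windows (sigma == 0)
            # are skipped, which is a known limitation of this sketch.
            if sigma > 0 and abs(value - mu) / sigma > threshold:
                anomalies.append(index)
        recent.append(value)
    return anomalies


# Made-up response times in milliseconds with one obvious spike.
latencies_ms = [100, 102, 98, 101, 99, 100, 250, 101, 100]
print(detect_anomalies(latencies_ms))  # prints [6]
```

A production setup would typically use seasonal baselines or learned models instead of a fixed window, but the principle of comparing new samples against expected behavior is the same.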
What data protection hurdles do companies have to overcome with AI in the cloud, and when does a local installation make more sense?
Support in particular requires access to highly sensitive data, ranging from infrastructure details to personal customer data. Cloud-based AI solutions pose significant data protection hurdles. US providers are subject to the CLOUD Act, which means that data transfers outside the EU cannot be ruled out. Security breaches and non-transparent data use for training purposes are real risks, even with the strictest settings.
My recommendation: Local installations offer clear advantages in terms of security and data protection, as they can be fully controlled. In addition, different AI models can be flexibly combined instead of being tied to a single provider. At diconium, we are pioneers in this field and offer services based on local LLMs in highly secure environments, as well as the implementation and operation of such LLMs in our customers' private clouds.
What types of threats do companies have to reckon with when operating AI-supported support systems, and how can these risks be minimized?
When operating AI-supported support systems, we have to expect typical threats from software vulnerabilities, imperfect operating processes, and human negligence or even intent. Added to this are previously unknown threats.
Even the best risk management measures cannot rule out a successful attack on an IT system; some residual risk always remains. The key to managing these risks is a complete SIEM (Security Information and Event Management) tool stack. Static code analysis tools cover only a small part of the production environment; a SIEM stack, on the other hand, monitors the entire production environment without gaps. It helps to actively detect malicious activities as soon as they occur, even if attackers exploit unknown vulnerabilities.
Specifically, we rely on tools such as:
- Dependency-Track provides timely reports on software vulnerabilities across the whole production environment so that new issues can be addressed immediately.
- Wazuh combines SIEM and XDR (Extended Detection and Response) capabilities to monitor and respond to security threats in real time across all endpoints, cloud workloads, and infrastructures. It alerts us immediately to any malicious activity.
- Suricata complements Wazuh with real-time detection and prevention of network-level intrusions (NIDS/NIPS). It performs deep packet inspection, blocks malicious traffic (IPS), and detects anomalies at the system perimeter.
These components, combined with strict access controls and data separation, are essential for detecting an attack and minimizing its impact if one does occur. They form the basis for a high level of trust and provide 24/7 global security.
Why is it important in service operations to think like an owner – and how do ITIL and DevOps work hand in hand for proactive, AI-powered support?
The mindset of operating the system as if it were your own is absolutely critical. When availability, reliability, and security become personal priorities, problems can be fixed at their root before they escalate. This strengthens personal responsibility, continuous risk assessment, and the reduction of technical debt—for stability and trust.
For proactive, AI-supported support, we combine ITIL and DevOps: ITIL provides structure and governance, while DevOps delivers speed through automation. This creates stability without compromising speed – and IT becomes a true business partner.
We rely on comprehensive monitoring, machine learning, and predictive analytics to identify risks early on and take automated countermeasures. Maintenance and vulnerability scans run automatically, minimizing human error. SRE principles ensure the quality of AI automation with specific SLOs, confidence thresholds, full transparency, and feedback loops. Error budgets, verifiability, and bias testing ensure that control and trust are maintained at all times.
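How SLOs, error budgets, and confidence thresholds can work together as a guardrail for AI automation may be easier to see in a small sketch. The Python below is an assumption-laden illustration, not diconium's implementation: it computes the unspent share of an error budget from an availability SLO and allows an automated remediation only if the AI's confidence and the remaining budget both clear a minimum. All function names, parameters, and threshold values are hypothetical.

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the unspent fraction of the error budget (negative if overspent).

    `slo_target` is the availability SLO as a fraction, for example 0.999.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures


def may_auto_remediate(confidence, budget_remaining,
                       min_confidence=0.9, min_budget=0.25):
    """Gate an automated action on model confidence and remaining budget.

    Both minimums are illustrative values, not recommendations.
    """
    return confidence >= min_confidence and budget_remaining >= min_budget


# Example: 99.9 % SLO, one million requests, 400 failures in the period.
budget = error_budget_remaining(0.999, 1_000_000, 400)
print(round(budget, 3))                   # prints 0.6
print(may_auto_remediate(0.95, budget))   # prints True
print(may_auto_remediate(0.80, budget))   # prints False
```

When the gate returns False, the case would be handed to a human engineer. That handover is where the feedback loop comes in: the outcome can be used to adjust the thresholds over time.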
What does a mature support and operations team actually look like? How does AI help move from reactive to proactive operations?
A mature support and operations team, as described in diconium's SRE maturity model, moves from a reactive stance to an optimized, strategic partner.
- In Level 1 (Reactive), teams often operate in silos, respond to problems on an ad hoc basis, and learn more by chance than by design.
- In Level 2 (Managed), we establish initial structures, identify skills, begin debriefings, and introduce SLAs.
- Level 3 (Proactive) is our goal, where cross-functional S&O teams work together. Here, employees are SRE-trained, there is a strong learning culture, and we use SLOs and error budgets, with a focus on prevention. Problems are identified early and root causes are eliminated.
- In Level 4 (Optimized), operations and development are already aligned in the design phase, and we have predictive, self-healing systems and AI observability. Incidents are rare or prevented.
AI plays a crucial role in the transition from reactive to proactive processes: From Level 2 to 3, we use comprehensive observability tools to identify problems in advance. For the leap from Level 3 to 4, the integration of AI/ML is essential: It enables anomaly detection, automatic ticket triage, and the reduction of correlated alerts. We develop our team and our measures by continuously updating scorecards, setting concrete goals, and consistently expanding automation and the use of higher-quality, predictive AI as soon as trust and data maturity are established. We prioritize systemic changes and always align ourselves with the business to justify investments and manage trade-offs between reliability and feature speed.
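The reduction of correlated alerts mentioned above can be illustrated with a deliberately simple, rule-based sketch in Python. It is an assumption, not the AI/ML approach described in the interview: alerts for the same service that arrive within a fixed time window of the first alert in a group are merged into one group, so that engineers see one incident instead of many notifications. The `Alert` fields, the function name, and the five-minute window are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds since an arbitrary start time
    service: str
    message: str


def correlate_alerts(alerts, window_seconds=300):
    """Group alerts per service that fall within `window_seconds`
    of the first alert in the currently open group for that service."""
    groups = []
    open_group = {}  # service name -> most recently opened group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        group = open_group.get(alert.service)
        if group is not None and alert.timestamp - group[0].timestamp <= window_seconds:
            group.append(alert)  # same incident, extend the group
        else:
            group = [alert]      # start a new incident group
            groups.append(group)
            open_group[alert.service] = group
    return groups


# Made-up alerts: two related database alerts, one web alert, one later database alert.
alerts = [
    Alert(0, "db", "high latency"),
    Alert(30, "db", "connection errors"),
    Alert(60, "web", "5xx spike"),
    Alert(400, "db", "high latency"),
]
groups = correlate_alerts(alerts)
print([len(group) for group in groups])  # prints [2, 1, 1]
```

An ML-based correlation would replace the fixed service-and-time rule with learned similarity between alerts, but the output has the same shape: fewer, richer incident groups.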
How are models such as Follow-the-Sun evolving with AI, and what does this mean for the distribution of operational expertise?
AI is fundamentally changing the distribution of operational expertise. Centralized AI agents reduce the need for specialist knowledge in regional teams, which become smaller but more specialized and solve complex problems together with AI.
The classic Tier 1 model will be completely replaced by AI in the future. Handovers between shift and global teams will become more efficient thanks to AI-curated summaries, diagnoses, and proposed solutions. Real-time translations and the AI's contextual memory will further improve collaboration.
We see a clear shift from reactive operations to AI monitoring and system governance. Operational excellence is increasingly defined by data quality, model performance, and collaboration—not location. Self-healing scripts further reduce the need for reactive ops teams. At diconium, we rely on a hybrid support model with a centralized 24/7 help desk and specialized experts onshore and offshore to optimize the efficiency gains achieved through the use of local AI assistance systems.