Five Ways to Improve Reliability and Get Better Business Results

CEO and Co-Founder of Catchpoint, a digital experience intelligence company.

Prior to founding Catchpoint, I was Vice President of Operations and Service Quality at DoubleClick. There, I was responsible for reliability and created the Quality of Service unit to improve reliability, performance, availability and resiliency across the organization.

Every morning, I reported to the executive team on our data from a reliability standpoint. What was the response time? What was the availability? What was the data processing time? I did this for every function we cared about.

At 10 a.m., representatives from operations, network engineering, customer support and product management reviewed our report and determined next steps based on our data. This provided the basis for the integrity of the work, and it drove better business outcomes, two elements that I believe are integral to overall success.
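
As a minimal sketch of the kind of morning report described above, assume each monitoring check is recorded with a function name, a success flag and a response time. The `Check` record and the `daily_summary` helper are hypothetical names chosen for illustration, not part of any real tooling mentioned here. Under those assumptions, per-function availability and median response time could be computed like this:

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class Check:
    """One monitoring check result (hypothetical record layout)."""
    function: str        # business function being checked, e.g. "login"
    ok: bool             # True if the check succeeded
    response_ms: float   # measured response time in milliseconds


def daily_summary(checks: list[Check]) -> dict[str, dict[str, Optional[float]]]:
    """Group checks by function and report availability and median response time."""
    by_function: dict[str, list[Check]] = {}
    for check in checks:
        by_function.setdefault(check.function, []).append(check)

    summary: dict[str, dict[str, Optional[float]]] = {}
    for name, items in by_function.items():
        # Availability: share of checks that succeeded, as a percentage.
        availability = 100.0 * sum(1 for c in items if c.ok) / len(items)
        # Response time is only meaningful for successful checks.
        ok_times = [c.response_ms for c in items if c.ok]
        summary[name] = {
            "availability_pct": availability,
            "median_response_ms": median(ok_times) if ok_times else None,
        }
    return summary


# Illustrative data only; these numbers are made up for the example.
sample = [
    Check("login", True, 100.0),
    Check("login", True, 200.0),
    Check("login", True, 300.0),
    Check("login", False, 0.0),
    Check("search", True, 50.0),
    Check("search", True, 150.0),
]
report = daily_summary(sample)
```

A summary like this, produced per function each morning, gives the review meeting a shared set of numbers to decide next steps from.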

Focus on Customer Experience Reliability Generates Business Results

Elite organizations are 260% more likely to focus on customer experience reliability than underperforming organizations. I see clients confirming this every day.

Good enough is no longer good enough, especially in the competitive, volatile environment we find ourselves in at the end of 2022. Striving for excellence is now table stakes. Elite organizations prioritize the reliability of the customer experience because having a site that is available, accessible, and performing reliably is critical.

Five Ways to Improve Reliability

So how can organizations improve reliability and achieve Internet resilience? Here are five suggestions.

1. Introduce SRE at the right time.

Hire a site reliability engineer (SRE) for your organization when the time is right. Remember, SRE is not just ITOps under a new name; rather, it’s a transformative role. SREs are agents of internal and external change. They can bring a new way of thinking to the job, providing an objective perspective on day-to-day operations. They can improve infrastructure and automation, improve reliability, enable different levels of incident management and, if you let them, change the culture. Give them interesting work and let them get on with it.

Reliability is about size, but it’s also about power. By collecting data through monitoring and telemetry, we gain the potential of knowledge. SRE is about connecting the dots between different telemetry data as quickly as possible. Systems will only get more complex as we move forward, so the sooner we embrace this role to solve the next set of problems, the better.

2. Invest in a Chief Reliability Officer.

Ultimately, reliability is a business-level metric. Individual practitioners and executives are divided on various key DevOps concepts, such as tool sprawl. We need to find ways to bridge the gap between individual contributors and executives so that each side understands the other's perspective and business outcomes are not lost in the process.

One effective approach is to introduce a new executive role: the Chief Reliability Officer (CRO). As with security, one of the most effective ways to achieve reliability and resilience is to make it a board-level conversation. A CRO can help determine your reliability and resilience posture and ensure you monitor every inch of your company to understand exactly how you work today and what may need to change. This can help close the alignment gap.

3. Create opportunities for better communication.

To further demonstrate this disparity, 59 percent of executives derive medium or high value from AIOps, compared to just 20 percent of individual practitioners. Both sides have their own reasons for feeling this way. Perhaps executives have more of a bird’s-eye view through which they can see the value as part of a larger picture (and perhaps solution) of the challenges facing SRE.

Besides creating a CRO, what else can companies do to ensure better communication between executives and everyday practitioners? Find new ways to communicate and collaborate. Consider how to give and receive feedback. Set up alignment opportunities to identify shared goals and drive accountability for data-driven decision-making.

4. Create the right culture.

SRE changes the culture, but businesses must also enable this to happen even in the most stressful of situations. SREs play a role in high-intensity incident management and must quickly determine root cause under intense pressure.

Having a just culture in these critical moments can support reliability practitioners and significantly impact business outcomes. Organizations that operate with a just culture are more likely to be meritocratic.

5. Empower SREs to do their jobs.

According to Steve McGhee, reliability advocate and SRE at Google Cloud, SREs “thrive best when they feel truly empowered: when their organization trusts them to do the right thing, and they’re given the resources and freedom they need.” He added that it’s critical for leaders to listen and support “rather than inserting preconceived notions or interpretations.”

We must work hard to close the gap. Find ways to have more agile and consistent conversations, listen to SREs and act on their findings. By doing this, you can effect change, not just once or every once in a while, but on the ongoing basis that business requires today.


The Forbes Technology Council is an invite-only community for world-class CIOs, CTOs, and technology executives.

