Head of Site Reliability Engineering

Cisco Systems Inc.
Texas City, TX 77590

Have you ever had a poor customer experience and thought of a million ways in which it could be improved? Have you ever empathized with those whose job it is to deliver customer experience and imagined how, if properly empowered and motivated, the overall service delivery experience could be better? If so, we want to talk.

In the Cisco Contact Center R&D group, we are passionate about delivering amazing experiences to both customers and employees. We believe software can truly change the world and how it gives and receives customer service, and it is our mission to be the leader in this space. To enable this vision, we are looking for passionate, smart, and motivated Engineering leaders to help up-level our game and deliver reliable software to our customers.

As the Head of Site Reliability Engineering you will:

  • Build a global team of passionate and dedicated system engineers to monitor, manage and automate our globally-distributed cloud platform.
  • Design and implement best practices and processes for delivering industry-leading cloud services reliability and resiliency for mission-critical customer use cases.
  • Establish ambitious Service Level Objectives for platform services and drive instrumentation efforts to deliver the Service Level Indicators that report against those objectives, giving customers and partners transparency into availability and reliability.
  • Own and manage end-to-end incident management practices and processes that enable us to fail and recover fast and continuously learn and adjust.
  • Promote and evangelize the engineering discipline of Site Reliability within the division and throughout Cisco. Be recognized as a thought-leader in identifying and sharing best practices for running reliable and resilient global services at cloud-scale.

In this role, you will work with:

  • Software Engineers, Architects, and Managers within our Product Development Engineering teams to ensure the software they produce meets the reliability, serviceability, and resiliency standards our customers deserve.
  • Technical Product Managers, to help them understand and appreciate the criticality of investing in the non-functional, systemic qualities (the "-ilities") of the products we produce. This working relationship will also be critical for Service Level Objective identification and definition.
  • Customer-facing Support and Solution Assurance teams to establish and manage service level expectations and participate in communications related to service status and reliability.
  • Peer Site Reliability Engineers and Leaders across the division and throughout Cisco, to drive best practices and patterns that will contribute to Cisco's reputation as an industry leader for running cloud services.

To get this job, you will need:

  • A passion for solving the hard problems of running large-scale cloud services at the highest levels of reliability and resiliency.
  • A vision for applying software engineering skills and experiences to automate all aspects of the software delivery and management process from build/test/deploy, monitoring and alerting, service level reporting, to automatic failover and capacity management.
  • 7 or more years of experience as a Site Reliability Engineer and a Site Reliability Engineering Manager in a cloud/SaaS-based environment where reliability and resiliency are critical factors in business continuity.
  • A consistent record of building SRE teams that have taken existing systems and improved them to the next level or two of reliability and resiliency. This includes having established ambitious Service Level Objectives and Indicators to drive continuous improvement through measurement.
  • Detailed experience and knowledge of industry-leading best practices, patterns, and toolsets for Site Reliability Engineering.
  • Experience driving incident management processes and delivering improvements in response and recovery times.
  • Experience with public cloud service providers such as AWS, Google Cloud, and Azure.
  • Familiarity and experience using industry-standard toolsets for provisioning and configuration management automation and operational monitoring.
  • Previous (preferably current) experience in one or more programming languages used in system automation and management.
  • Ability to make rational, data-driven decisions under pressure while maintaining a calm and confidence-inspiring demeanor.
  • Excellent project management and communication skills.

We Are Cisco

#WeAreCisco, where each person is unique, but we bring our talents to work as a team and make a difference. Here's how we do it.

We embrace digital, and help our customers implement change in their digital businesses. Some may think we're old (30 years strong!) and only about hardware, but we're also a software company. And a security company. An AI/Machine Learning company. We even invented an intuitive network that adapts, predicts, learns and protects. No other company can do what we do; you can't put us in a box!

But "Digital Transformation" is an empty buzz phrase without a culture that allows for innovation, creativity, and yes, even failure (if you learn from it).

Day to day, we focus on the give and take. We give our best, we give our egos a break, and we give of ourselves (because giving back is built into our DNA). We take accountability, we take bold steps, and we take difference to heart. Because without diversity of thought and a commitment to equality for all, there is no moving forward.

So, you have colorful hair? Don't care. Tattoos? Show off your ink. Like polka dots? That's cool.

#LI-JS1

#LI-PRIORITY

Posted: 2020-08-21 Expires: 2020-11-15
