
Site Reliability Engineer (SRE)

Cisco Systems Inc.
San Jose, CA 95113


What You'll Do 

You are a deeply motivated Site Reliability Engineer with a background in DevOps/SRE software development and operations. The ideal candidate must have experience building, shipping, and operating a software-as-a-service (SaaS) product, and would have managed such products using cloud-native principles and worked with cloud technologies. This position will enable continuous monitoring and management of infrastructure while providing timely response, within designated SLA times, to service-affecting faults and performance issues. As an SRE, you will work closely with our Managed Services Team to diagnose and characterize issues, drive continuous improvement, and develop infrastructure best practices. You will be driven to build highly scalable, fault-tolerant, and easy-to-administer infrastructure. You must be proactive and organized, diligent about documentation, and passionate about monitoring and automating everything.

This can only be accomplished by a candidate with substantial real-world experience actually building, deploying and operating distributed systems using cloud technologies. 


Who You'll Work With 

Cisco is transforming the networking industry. To make this happen, we are heavily investing in the team responsible for The Network. Intuitive. We are disrupting the industry by building a new networking platform that can learn, adapt, and secure itself at the speed of today's businesses. This Digital Network Architecture platform automates network management and provides our customers with state-of-the-art analytics and insights. This team's innovations span artificial intelligence, machine learning, analytics, IoT, security, automation, and more.


Who You Are 

This role is primarily to apply your SRE skills to create a complete, self-serve Software Delivery Machine. The target platform will support a vast number of cloud and hybrid customers. The candidate is expected to have strong hands-on skills and will guide and contribute technically to the infrastructure engineering.

       Develop full-fledged software tooling to deliver programmable infrastructure (infrastructure as code)

       Develop tooling to drive end-to-end microservices monitoring and management

       Implement Kubernetes compliance and best practices for security, audits, network policies, and reporting

       Develop a self-service console to provide infrastructure visibility



 Responsibilities

       Manage the availability, scalability, and performance of the infrastructure platforms.

       Create the tools and infrastructure leveraged by the rest of the engineering teams

       Diagnose and repair network, application, and hardware bottlenecks

       Test and tune network, hardware, and software configurations to maximize performance

       Deploy and manage monitoring and diagnostic tools

       Monitor systems, databases, and networks for proper operation and performance.

       Provide 7x24 on-call support for the operations infrastructure.

       Create and maintain continuous integration (CI) and continuous deployment (CD) environments to facilitate an agile development process.

       Work is generally expected to take place during normal working hours; however, the Platform Operations Team provides Tier 2 and Tier 3 7x24x365 on-call escalation, and candidates should be flexible with schedules to meet the needs and demands of the business.



Qualifications

       Strong knowledge of core Enterprise Linux (Red Hat/CentOS) with a focus on building, maintaining, securing, and performance tuning systems.

       Proven experience in capacity planning, performance tuning, and infrastructure architecture. Experience scaling web, application, and data systems horizontally and vertically.

       Experience with Kubernetes (K8s) and other virtual infrastructure platforms.

       High-level shell fluency plus one or more scripting languages (Python, Go, Perl, or similar).

       Experience with system automation using Ansible.

       Experience with monitoring, alerting, and pipeline analysis tools

       Experience with queuing and data pipelines.

       Experience with SQL/NoSQL systems such as PostgreSQL, MySQL, Cassandra, or Redis.

       Experience in the development of operational procedures, processes, and scripts 

The candidate is expected to have strong hands-on skills and will guide and contribute technically to the product.

      BS/MS in Computer Science or related area

      Four or more years of relevant work experience

      Hands on experience working with Kubernetes infrastructure

      Kubernetes Certification is highly preferred

      Expert understanding of Kubernetes internals (clustering, scheduling, controllers, API server, etc.)

      Very good understanding of container networking

      Very good software programming skills using Go/Python/YAML

      Excellent understanding of microservices architecture

      Experience with Kubernetes monitoring tools (Prometheus)


 Why Cisco

At Cisco, each person brings their unique talents to work as a team and make a difference.

 

Yes, our technology changes the way the world works, lives, plays and learns, but our edge comes from our people.

 

o   We connect everything (people, process, data, and things), and we use those connections to change our world for the better.

o   We innovate everywhere - from launching a new era of networking that adapts, learns, and protects, to building Cisco Services that accelerate businesses and business results. Our technology powers entertainment, retail, healthcare, education, and more, from smart cities to your everyday devices.

o   We benefit everyone - We do all of this while striving for a culture that empowers every person to be the difference, at work and in our communities.

 

Colorful hair? Don't care. Tattoos? Show off your ink. Like polka dots? That's cool. Pop culture geek? Many of us are. Be you, with us! #WeAreCisco

 

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.


Posted: 2020-04-03 Expires: 2020-08-02