Site Reliability Engineer x2
Cooperative Computing (CC): We accelerate growth-minded companies into the automated economy.
The business market is changing rapidly, with consumer behavior placing high expectations on businesses at every phase of the client experience. Each day, we see companies dramatically shifting “industry norms” and, in many cases, displacing historical market leaders from their leading positions. These shifts create extraordinary opportunities for our team members to excel.
CC delivers superior client experiences as the premier digital enabler of growth-minded enterprises, enabling their rapid growth and ensuring their sustainable, smooth transition into the Automated Economy.
Our team is passionate about delivering client value and fanatical about delivering extraordinary business results for our clients. We are committed to growing as individuals first, becoming the best version of who we have been created to be. We take responsibility for our thoughts and actions, know our purpose and our end in mind, and put these first in our lives.
- Be Fanatical and Passionate Delivering Superior Client Experiences - It’s who we are!
- Growth is Contagious - I grow, You grow, We all grow!
- Be Innovative - Looking at tomorrow today. We live outside our comfort zone; we ask difficult questions of ourselves; we take risks; and we are fearless in experimenting and leading the way forward.
- Show Empathy & Be Honest - Every word spoken or action performed for our Customers, Team Members, Partners & Stakeholders will be filled with kindness, candor, and honesty.
- High Performance - It’s not for everyone - Our culture is our team members. We make the lives of our fellow team members better by recognizing that “I” am a team member first. We measure our progress constantly to become a better version of ourselves with every new day.
As a Site Reliability Engineer, you will be responsible for designing, implementing, and maintaining the reliability of our client software systems and infrastructure. You will collaborate with cross-functional teams to ensure the availability, performance, and scalability of our applications, with a focus on automation and continuous improvement.
Capabilities (Key Behaviors):
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Proven experience as a Site Reliability Engineer or in a similar role.
- Strong programming and scripting skills in Python, Bash, or Go.
- Experience with containerization and orchestration tools.
- Proficiency in cloud platforms (AWS, Azure, GCP).
- Collaborate with software engineers to design scalable and reliable systems.
- Evaluate system architecture and recommend improvements for performance and reliability.
- Develop and maintain automation tools for deployment, monitoring, and scaling.
- Implement infrastructure as code (IaC) practices to ensure consistency and repeatability.
- Implement and maintain monitoring solutions to proactively identify and address performance issues.
- Participate in on-call rotations; respond to, troubleshoot, and resolve incidents in a timely manner.
- Conduct capacity planning to ensure systems can handle current and future loads.
- Work closely with teams to forecast resource requirements and scale infrastructure accordingly.
- Identify and address performance bottlenecks in software and infrastructure.
- Experience with CI/CD pipelines and automation tools (Jenkins, AWS/Azure pipelines).
- Define and track key reliability metrics.
- Generate regular reports on system reliability and performance.
- Drive initiatives to improve system reliability through process enhancements and technology upgrades.
- Participate in post-incident reviews and implement preventive measures.
- Create and maintain documentation for system configurations, procedures, and best practices.
- Strong communication and collaboration skills.
- Certification in relevant technologies is a plus (e.g., AWS DevOps Engineer, Azure Cloud Architect).
- Successfully designed, implemented, and maintained the reliability of client software systems and infrastructure.
- Collaborated with cross-functional teams to ensure the availability, performance, and scalability of applications, emphasizing automation and continuous improvement.
- Demonstrated strong programming and scripting skills in Python, Bash, and Go.
- Utilized experience with containerization and orchestration tools to optimize system performance.
- Showcased proficiency in cloud platforms such as AWS, Azure, and GCP.
- Collaborated with software engineers to design scalable and reliable systems, evaluating architecture for performance and reliability improvements.
- Developed and maintained automation tools for deployment, monitoring, and scaling.
- Implemented Infrastructure as Code (IaC) practices to ensure consistency and repeatability.
- Implemented and maintained monitoring solutions to proactively identify and address performance issues.
- Participated in on-call rotations; responded to, troubleshot, and resolved incidents in a timely manner.
- Conducted capacity planning to ensure systems can handle current and future loads.
- Worked closely with teams to forecast resource requirements and scaled infrastructure accordingly.
- Identified and addressed performance bottlenecks in software and infrastructure.
- Demonstrated experience with CI/CD pipelines and automation tools, including Jenkins, AWS, and Azure pipelines.
- Defined and tracked key reliability metrics for continuous system improvement.
- Generated regular reports on system reliability and performance.
- Drove initiatives to improve system reliability through process enhancements and technology upgrades.
- Participated in post-incident reviews and implemented preventive measures.
- Created and maintained documentation for system configurations, procedures, and best practices, showcasing strong communication and collaboration skills.