Site Reliability Engineer with Splunk

Guadalajara, JAL, MX

Capgemini

A global leader in consulting, technology services and digital transformation, we offer an array of integrated services combining technology with deep sector expertise.


At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world’s most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology.

About the Role:
We are seeking a highly skilled and motivated Site Reliability Engineer (SRE) with expertise in Splunk/App Dashboard management to join our dynamic team. The ideal candidate will be responsible for ensuring the high availability, performance, and security of our applications and systems. This role involves a deep technical understanding of application monitoring, log management, and operational intelligence to proactively identify and resolve issues, including memory leaks and other performance bottlenecks.

Your collaboration:

• Splunk/AppDynamics/Grafana dashboards:
◦ Design, develop and manage various application dashboards, reports, and alerts to monitor system health and performance.
◦ Build on-demand dashboards to support production releases and product launches.
◦ Build dashboards that capture key performance metrics and generate summaries after product launches and production releases.


• Performance Analysis & Optimization:
◦ Conduct regular system performance analysis and capacity planning to proactively address potential production issues.
◦ Utilize Splunk and other monitoring tools to identify memory leaks, CPU bottlenecks, and other resource constraints.
◦ Analyze heap and thread dumps to investigate performance issues.
 

• Reliability and Availability:
◦ Design and implement strategies to ensure the reliability and availability of services, including disaster recovery planning and capacity forecasting.
◦ Participate in incident response and resolution, working to minimize downtime and impact on users.
◦ Contribute to root cause analysis while working on critical production issues.
 

• Automation:
◦ Identify areas of improvement in systems and processes and contribute to ongoing optimization efforts.
◦ Build JMeter/Python/SQL scripts to process large amounts of data while dealing with critical production incidents.
 

• Collaboration and Documentation:
◦ Work closely with development, operations, and security teams to foster a culture of continuous improvement.
◦ Document systems architecture, processes, and procedures.
◦ Build data flow diagrams (DFDs), use cases, and sequence diagrams for critical business flows.
◦ Collaborate with the operations and development teams during production release support and monitoring.
 

Your Profile:
• Proven experience with Splunk or similar log management and analytics platforms.
• Strong background in SRE practices, including expertise in analyzing and resolving memory leaks, performance tuning, and automation.
• Experience with scripting languages (e.g., Python, Bash) for automation.
• Knowledge of AWS cloud services and containerization technologies (Docker, Kubernetes).
• Excellent problem-solving skills and ability to work under pressure.
• Strong communication and collaboration skills.

Preferred Qualifications:
• Splunk Certification (e.g., Splunk Certified Administrator, Splunk Certified Architect).
• AWS Certified Cloud Practitioner or an associate-level AWS certification.
• Experience with Java and microservices.

 

WHAT YOU’LL LOVE ABOUT WORKING HERE

  • Capgemini Employer Promise: Learning + Flexibility + Team Spirit + Inclusion + Innovation.
  • Work from home: fully remote position.
  • Get competitive benefits above those required by law.
  • Build your future within a worldwide leader in ER&D projects.
  • Feel free to grow within different industries and choose your career path.
  • Be part of a great family of engineers and people all over Mexico and the world.
     

At Capgemini Mexico, we aim to attract the best talent and are committed to creating a diverse and inclusive work environment, so there is no discrimination based on race, sex, sexual orientation, gender identity or expression, or any other characteristic of a person. All applications are welcome and will be considered on merit against the job requirements and/or experience for the position.

Capgemini is a global leader in partnering with companies to transform and manage their business by harnessing the power of technology. The Group is guided every day by its purpose of unleashing human energy through technology for an inclusive and sustainable future. It is a responsible and diverse organization of over 300,000 team members in nearly 50 countries. With its strong 50-year heritage and deep industry expertise, Capgemini is trusted by its clients to address the entire breadth of their business needs, from strategy and design to operations, fueled by the fast-evolving and innovative world of cloud, data, AI, connectivity, software, digital engineering and platforms.


Tags: Analytics Automation AWS Bash Cloud Docker Grafana Incident response Java Kubernetes Microservices Monitoring Python Scripting Splunk SQL Strategy

Perks/benefits: Career development Health care

Regions: Remote/Anywhere North America
Country: Mexico