Senior Engineer, Production Engineering & Incident Response

Plantation, FL

Applications have closed

Magic Leap

Explore Magic Leap AR for business. Improve your organization's training, 3D visualization, collaboration, and remote assistance workflows.

Posted 3 years ago

Magic Leap is looking for a senior engineer to focus on live site operations and incident response management.

Job Description

In this role, you will be responsible for the day-to-day operations of our production live site systems, coordinate the response to outages, and build incident management engineering systems based on industry standards and ITSM principles.

The ideal candidate is highly knowledgeable about ITSM, experienced in IT incident management engineering and process improvement, and has a proven track record of resolving critical incidents affecting engineering services built on a microservice architecture.

Responsibilities

Oversight of 24x7 Major Incident Response
Continually improve the engineering, efficiency and effectiveness of the Incident Response program
Develop, measure, and report process performance and functional metrics in order to identify opportunities, measure success, or validate expected outcomes
Tightly integrate incident management tools & processes with monitoring & observability platforms, production engineering dashboards, and other ITSM tools
Define SLO & SLA metrics with engineering service owners & work with monitoring team to
Bring continuous improvement to support and operational practices
Handle escalations and communicate clearly and effectively with all stakeholders, including senior company leaders

Qualifications

10+ years of incident management in a fast-paced technology company
Track record of managing complex incidents
8+ years of experience managing production systems for build & release tools and large-scale, public cloud-based microservices with 100K+ concurrent users
Prior experience working in production engineering with regional NOC & SOC
Prior experience instrumenting mission-critical services at a globally distributed scale, using cloud hosting providers such as AWS, GCP, and others
Prior experience integrating event management systems such as PagerDuty and other production engineering systems
Prior experience with CloudWatch, Stackdriver, Prometheus, Datadog, Sumo Logic

Education

BA/BS in Computer Science or related field and equivalent experience

Additional Information

All your information will be kept confidential according to Equal Employment Opportunities guidelines.

Tags: AWS Cloud Computer Science GCP Incident response Monitoring Prometheus

Region: North America
