Software Development Manager, ICC Incident Response
Austin, Texas, USA
Job summary
Amazon strives to be the world’s most customer-centric company. To succeed, our products and services must be available to our customers at all times. The Consumer Incident Response (IR) team is responsible for improving the availability of our shopping experiences (website, mobile experience, in-store) worldwide, across 21 marketplaces.
As a Software Development Manager on the IR team, your contributions will directly impact and reduce the total Mean-Time-to-Repair (MTTR) for retail customer-impacting outages. You will lead the development of new, greenfield systems which classify and pinpoint software outages in a service graph of tens of thousands of services and automatically facilitate engagement of precisely the right teams within seconds to triage and repair those outages. Machine learning will be used to identify patterns in similar types of problems and to recommend the next-best team to engage if we need to keep searching. You’ll determine which initiatives are the most important at any given time based on data and how the retail software ecosystem is currently evolving. Your contributions will have a force multiplier effect on the immediate team and larger organization by automating solutions, instituting operational excellence and leading through others.
This role is a perfect fit for an experienced engineer who is passionate about availability, alerting, metrics, machine learning, site reliability engineering (SRE) and automation. You thrive in a fast-paced, startup-like environment, build new systems end-to-end from the ground up, communicate effectively to all stakeholders (tech, non-tech) and ship complex software in quick iterations at scale. You quickly learn new technologies/concepts and are able to make the right technical decisions for the products and business you support to provide the best experience for Amazon customers. You will shape the future of how Amazon Retail (and beyond) detects and responds to issues proactively.
Reasons to join our team. You will have the opportunity to:
• Deliver high-impact, high-visibility projects that are used by thousands of Amazon Retail services, driving career advancement.
• Invent processes, tools, and technology to force multiply the effect of your contributions across many organizations.
• Be responsible for owning, scoping, leading and delivering projects and experiments end-to-end, leveraging statistical evaluation, pattern recognition, and machine learning.
• Have the ability to dive deep into a wide variety of problems and technologies to guide the right decisions for the products and the customers you will support.
As an ideal candidate, you will:
• Bring multiple years of engineering experience (DevOps or SRE specialization preferred) in both owning and operating solutions at massive scale, and leverage your unique experience and fresh perspective to drive innovation, simplification and new experiments.
• Be a strong communicator, with proven abilities in both architectural and software solutioning, and a proven track record of shipping complex software solutions through an agile methodology.
• Act as a vocal leader and mentor for other engineers. Raise the bar when it comes to best practices, processes, and technical quality.
• Interact with developers across the company to understand their challenges, and work with leaders on the team to develop a roadmap for our portfolio and platform.
Basic Qualifications
• 7+ years of experience working directly within engineering teams
• Experience partnering with product or program management teams
• 3+ years of people management experience, managing engineers
• 3+ years of experience architecting and designing new and current systems (architecture, design patterns, reliability and scaling)
• Experience building and maintaining large-scale, high-availability distributed systems
• Excellent oral and written communication skills with both technical and non-technical stakeholders
• Experience identifying and prioritizing the most important software initiatives for the business/customer, taking them from scoping through launch and into daily operation.
• Ability to identify and solve ambiguous problems in short timeframes with limited oversight/direction.
• Experience influencing software engineers, infrastructure engineers and operators on best practices (full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations)
• Computer Science fundamentals in object-oriented design, data structures, algorithms, problem solving and complexity analysis
• Understanding of CI/CD, test automation and robust system health monitoring (metrics, monitors, alarms)
• Experience with Unix/Linux environments
• Understanding and ability to apply Agile (or similar) software development practices to improve software development reliability and velocity
• Bachelor's degree in Computer Science, a related technical field OR equivalent training and industry experience
Preferred Qualifications
• 8+ years of software development experience, preferably in building large-scale end-to-end distributed systems
• Specific production experience with AWS services & tools (IaaS, PaaS, APIs, tools)
• Experience with Site Reliability Engineering (SRE) concepts and practices
• Experience with statistical analysis and machine learning
• Experience with full-stack development
Amazon is committed to a diverse and inclusive workplace. Amazon is an equal opportunity employer and does not discriminate on the basis of race, national origin, gender, gender identity, sexual orientation, protected veteran status, disability, age, or other legally protected status. For individuals with disabilities who would like to request an accommodation, please visit https://www.amazon.jobs/en/disability/us.
Tags: Automation AWS DevOps IaaS Incident response Linux Machine Learning PaaS UNIX
Perks/benefits: Career development Startup environment
Region: North America
Country: United States
Categories: Incident Response Jobs, Leadership Jobs