InComm Operations Engineer I in St. Louis, Missouri
When you think of InComm Payments, think of Innovative Payments Technology. We were founded 25 years ago and continue to be a pioneer in the payment (FinTech) industry. Since our inception, we have grown to be a team of over 2,500 employees in 30 countries around the world. We own over 386 global technical patents and operate a network of over 500,000 points of retail distribution, both of which reflect our industry expertise.
InComm Payments works with the most recognized and valued brands in the world, and we are partnered with most of the world’s leading merchants. InComm Payments is highly focused on our people and their growth, and we work hard to make a career at InComm Payments meaningful and rewarding. We value innovation, quality, passion, integrity and responsibility in all that we do, and we are looking for great people to join our team as we move forward towards a very bright future.
You can learn more about careers at InComm Payments here: www.incomm.com or connect with us on Twitter (http://twitter.com/incomm), Facebook (http://facebook.com/incomm), LinkedIn (http://www.linkedin.com/company/incomm), or Our Blog (http://www.incomm.com/blog).
Inside InComm (https://vimeo.com/185012736) from InComm (https://vimeo.com/incomm) on Vimeo (https://vimeo.com).
About This Opportunity
The Operations Engineer is a key contributor to the success of all Enterprise Support Software operations efforts. Responsibilities range from general support to Tier 1 support of production applications developed and deployed within a continuous delivery life cycle. This includes managing production Incidents and Requests so that they adhere to defined service-level agreements (SLAs) and operational-level agreements (OLAs), reporting on progress, following up with customers, and escalating to Tier 2 (development) support.
The Operations Engineer is responsible not only for production operations, which include server maintenance, compliance, and security, but also for contributing to the overall DevOps effort, where automation is created to make operations more efficient while supporting development. This is an engineering position in which a focused, self-driven attitude is paramount to success, leveraging automation and integrations between different platforms, tools, and services. The aim is to build a suite of automated routine tasks so that more effort can be spent on innovating and improving the application's success and the user's experience.
Root cause analysis and deep dives into the application's architecture and processes are required to properly escalate and track problems to resolution. Log Splunking and analysis through different tools to identify causes is a required skill. The role also involves monitoring APM data, building and tracking dashboards and reports to respond proactively to issues before they become incidents, participating in on-call schedules, and responding to automated alerting. First and foremost, the Operations Engineer is responsible for ensuring the production operations of the applications while working with developers to improve the delivery pipeline and the support experience for both users and the development team.
Respond to Application Support emails, incidents, and requests (all reported incidents and issues must be logged and managed in SNOW, not in email)
Investigate errors, failures, or other reported issues at a Tier 1 level of support
Collaborate with the support teams of the systems involved
Capture all pertinent information and data related to the reported incident
User's environment details (OS, Browser, Application, Versions, Console Logs, Steps to reproduce, error messages)
Application environment details (full stream app logs, version, environment, servers, involved integrated systems)
Determine the cause of the incident (defect or user error) and resolve accordingly
Perform Root Cause Analysis (RCA) on each reported Incident
Create/Update necessary Knowledge Base Articles to support the incident's findings
Reassign Incident if dependent system is the cause and no action is required or needed on the front-line platform; otherwise create a new Incident for the dependent system linked to the original for traceability and follow up with customer
Escalate Incidents whose cause or resolution cannot be determined through Tier 1 efforts to Tier 2 support through Development On-Call engineers
Follow-up with escalation towards resolution
Ensure resulting defects or enhancements are appropriately linked
Follow-up with incidents pending defects and enhancement requests for customer closure
Create and maintain Application Runbooks, and perform integrations with dynamic and interactive tools for real-time runbooks (e.g., Canary APIs, Dashboards, Ansible Playbooks)
Participate in On-Call Schedules for 24/7 support and response to reported issues
Manage user access control for those applications without self-service features, in collaboration with the Application's Ownership teams
Manage the application's knowledge base repositories
Maintain Incident reports and dashboards for regular reporting and metric analysis to management and team members
Participate in, support, and create interactive alerting on available application logs and data for proactive response to potential or active issues
Collaborate, communicate and support disaster recovery planning and efforts
Participate in Database monitoring
Participate in and contribute to automation efforts toward improved processes and reduction of manual routine procedures
Notify end users of scheduled, current, or otherwise known application incidents, deployments, and changes that may impact their ability to use the application
Maintain Server Manifests
Track server vulnerabilities for resolution and patching
Perform routine server reboots as vulnerabilities and patching are conducted
Ensure all Applications have appropriate logging and monitoring integrations
Routinely monitor server performance and integrity (CPU, RAM, HDD) through telemetry metrics
Automate Infrastructure management through Ansible Playbooks and Scripting
Maintain infrastructure reports and dashboards for regular reporting and metric analysis to management and team members
Participate in, support, and create interactive alerting on available infrastructure logs and data for proactive response to potential or active issues
Collaborate, communicate and support disaster recovery efforts
Support the applications in scope through infrastructure and network migrations as architectural and network topologies evolve
Perform deployments of new application artifacts (push-button or manual)
Ensure and maintain appropriate Change Control procedures according to the defined delivery pipeline for the application
Assist in troubleshooting deployment issues and improving the pipeline in relation to production deployments with the Delivery and Development Engineers
Monitor and prepare for upcoming releases through delivery pipeline dashboards and readiness meetings
Contribute to an overall Continuous Delivery Maturity transformation through automation and integration of tools
Bachelor’s degree in Computer Science, Engineering, or related field required
3 years of production support and monitoring of web applications and services
Expertise in troubleshooting and root cause analysis
Demonstrated experience in Agile/Lean development, application design, software development, and testing
Experience writing and managing knowledge bases and incidents
Demonstrated experience with high-level communications with consumers, users, and leaders of the platforms supported
Understands data modeling and general SQL database querying
Willingness and ability to automate routine operational tasks, with assistance from development engineers
Development experience in any or all of the following: Python, Ruby, SQL, and other industry-standard languages for developing operational tooling and automation
Experience working in a DevOps oriented environment
Experience supporting complex ETL and applications in production
Experience with Git version control
Proficient with Linux & Windows platforms
InComm provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity or national origin, citizenship, veteran’s status, age, disability status, genetics or any other category protected by federal, state, or local law.
*This position is eligible for the Employee Referral Bonus Program - Tier II
Job Location: US-MO-St. Louis | US-Remote | US-MN-Minneapolis