Senior Storage & HPC Administrator - R&D HPC team

NXP Semiconductors • Austin, TX, US • 3m ago

Job Description

Senior Storage & HPC Administrator - R&D HPC team - Austin, US

This is what you will do as an HPC DevOps engineer at NXP

You will work closely with your global colleagues within R&D IT to help deliver the Storage & HPC services (High Performance Computing and Virtual Desktop Infrastructure) to our engineering and R&D customers. Your AMEC team has operational responsibility for all designated services within the region and provides backup support to our other two regions (EMEA and APAC) when necessary, to ensure good coverage and fast response for any issues requiring attention. Your work will be a combination of development and operations.

What’s in it for me?

The team keeps a healthy mix of 50% development and 50% operational activities. This requires a constant eye on innovation. You can contribute in every development phase, i.e., from problem analysis and design to implementation, testing, and bug fixing.

You will receive support from your direct colleagues, a Scrum team of data engineers and data scientists
You will get training and personal development opportunities from NXP in alignment with your personal and professional needs
The Product Owner, representatives from the R&D business, and other stakeholders will give you guidance on the solution direction
The opportunity to work with cutting-edge technologies in an AWS cloud-based serverless data platform architecture
The greatest workplace in Austin and a safe working environment that lives up to its company values

Primary responsibility:

In this role, you will play a pivotal part in supporting the architecture and design of large-scale data storage related services on-prem and cloud
You will be responsible for resolving highly technical issues, providing Level 3 support, and serving as a senior resource for IT initiatives and operations
You will also participate in design reviews and drive specific goals around performance, scalability, and availability
You will fill a key technical role in our team with equally dedicated engineers supporting our local client’s data storage infrastructure and services
Perform Level 3 support functions, including installation, configuration, upgrades, and support of AWS FSxN, NetApp ONTAP, and StorageGRID environments
Identify and determine root causes of performance issues in an EDA environment from end users to storage subsystems
Create dashboards for end users to enhance visibility and usability
Identify opportunities for performance improvement while providing multi-level technical support in defining, developing, and evaluating storage products
Provide recommendations to improve the storage infrastructure, and lead the analysis and resolution of critical issues
Work with application and infra teams to ensure that appropriate storage provisioning levels are maintained. Monitor the server/storage infrastructure and any processes related to these systems
Use automation to deliver highly optimized infrastructure
Strong automation skills, preferably using Ansible and Python
Experience as a DevOps engineer with infrastructure as code (IaC), configuration management tools, and CI/CD
Keep yourself updated on different storage technologies, protocols, and file systems (e.g., NAS, SAN, NFS, CIFS, object storage)

Secondary responsibility:

As a Linux Systems Administrator working with Red Hat, you will ensure the stability, integrity, and efficient operation of our infrastructure by monitoring, maintaining, supporting, and optimizing our development and production environments
Provide installation/configuration, operation, and maintenance of systems software and related infrastructure
Perform ongoing performance tuning, hardware upgrades, and resource optimization as required
Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems, and key processes
Review system and application logs, and verify completion of scheduled jobs such as backups, both internally and externally
Implement automated approaches for system administration tasks
As a Senior Linux Systems Administrator, you will provide technical support for the software, hardware, wired network infrastructure, and system security of a Linux High Performance Computing (HPC) environment. Knowledge of and hands-on experience with grid computing applications such as LSF, Univa Grid, or Slurm would be a plus.
Test operating system and software upgrades, patches, and hot fixes before they are applied to the operational systems
Create and test backups of data, provide data cleansing services, verify data integrity, and implement access controls

Typically, you…

You can describe yourself as follows:

Minimum qualifications: Bachelor’s degree or equivalent practical experience in Software Engineering or Computer Science
10-12 years of hands-on experience with NetApp storage technologies
NetApp Certified System Administrator certification
AWS system architect certification
Five years of experience working on large DevOps implementations and operations
Solid DevOps mindset and the ability to evaluate and optimize DevOps processes and practices using Lean or Agile methods
Proficiency in maintaining automation and configuration management using tools such as Jenkins, GitLab, and AWS CodeDeploy
Deep understanding of Linux operating systems, networking, and server management, and experience with tools such as Bash, Ansible, and PowerShell
Strong problem-solving and analytical skills to troubleshoot and resolve complex issues, and experience using tools such as Splunk, Grafana, or Prometheus to monitor and analyze the performance and health of the systems that you support
Understanding of identity and access management relating to provisioning, access control, and certifications