System Administrator
NATIONAL HEART, LUNG, AND BLOOD INSTITUTE
The Laboratory of Computational Biophysics (LCB) is a group of researchers who use computational simulation methods to investigate problems in biophysics and chemistry using the Linux-based LoBoS high-performance computing (HPC) cluster within the National Heart, Lung, and Blood Institute (NHLBI) at the National Institutes of Health (https://www.lobos.nih.gov/LoBoS.shtml). LoBoS consists of several hundred CPU/GPU computational nodes, three tiers of storage (home directories, scratch space, and archive), associated network infrastructure (both InfiniBand and Ethernet), and Linux desktops for users.
This position is for the day-to-day management of the LoBoS HPC compute nodes, storage systems, and desktops. The position involves working as part of a small team (at least two people) whose primary responsibilities are to keep the cluster running in good order and to ensure that it follows security best practices as determined by the NIH and the Department of Health and Human Services. It also involves maintaining the usability of the LoBoS cluster via yearly purchase and installation of hardware to replace aging components.
ABOUT THE POSITION
• Ensure that the various components of the LoBoS cluster stay in good working order, including network configuration, firewall management (Palo Alto), file system management (ZFS, VAST), security, batch queuing systems (SLURM), database administration, distributed computing, file transfer services, web servers, and electronic mailing lists.
• Occasionally work outside normal 9-5 hours to address emergency situations with the cluster (e.g. significant numbers of down nodes or storage outages) or cybersecurity incidents (FISMA).
• Ensure that the LoBoS cluster has sufficient capabilities to run the scientific software needed by the LCB scientists. Evaluate the existing system to determine when updates/upgrades to hardware and/or software are necessary. Manage the budget used to procure new hardware/software for LoBoS. Oversee configuration and installation of virtual and physical servers and manage upgrades to existing hardware.
• Ensure that patches, security updates, and configuration changes to software systems are applied to enhance reliability and to meet security needs. Collaborate with OCIO, CIT, and NHLBI security teams to ensure adherence to compliance policies.
• Assist in maintaining the LoBoS Assessment & Authorization package based on National Institute of Standards and Technology SP 800-53 security controls under guidance from NHLBI's Information System Security Officers.
• Serve as a technical resource for HPC, LCB, NHLBI, and other NIH personnel in areas such as the Linux operating system, networking, database system administration, and distributed computing. May serve on technical evaluation panels for institute-wide initiatives.
• Technology tracking: Stay informed regarding new developments in hardware/software, and evaluate their potential usability for LoBoS/LCB. Participate in conferences and meetings of professional groups concerned with the application of HPC, AI/machine learning, and other emerging computer technologies.
• Prepare software documentation and technical reports related to assigned projects.
ABOUT YOUR BACKGROUND
• 5+ years of experience in Linux HPC systems administration is preferred. However, less experienced candidates with outstanding qualifications will also be considered.
• Comprehensive knowledge of shell scripting. Broad knowledge of systems administration tools (e.g. Puppet, Ansible) along with detailed knowledge of tools used in a particular area such as file system management, usage accounting, mail configuration, database system administration, file transfer, or security.
• Experience with government computer security rules and standards is desirable.
• Extensive knowledge of at least two high-level computer languages such as C, C++, FORTRAN, Ruby, Perl, or Python is desirable.
• Experience implementing and managing SLURM batch queuing software is preferred.
• Solid interpersonal, leadership, and critical thinking skills.
• Excellent written and oral communication skills.
ADDITIONAL INFORMATION
• Location: 9000 Rockville Pike, Bethesda, Maryland, which is accessible via bus/bicycle/Metro (Red Line: Medical Center).
• Some travel to professional meetings (e.g. the Supercomputing Conference) may occasionally be required.
• Some remote work is acceptable (up to 3 days per week).
• Employment type: full-time government contractor.
• Salary range: $100,000 to $180,000/year, commensurate with education and experience.
• A selection of health and wellness benefits will be offered.
HOW TO APPLY
To be considered, please submit your resume and cover letter to Dr. Daniel R. Roe at daniel.roe@nih.gov with the subject line "System Administrator". Appointees must be U.S. citizens. Applications should be submitted by November 4, 2024.
We are an equal opportunity employer, and we actively prohibit discrimination and harassment of any kind. We strongly encourage people of color, LGBTQ+ people, immigrants, women, and people who are differently-abled to apply.