HPC Systems Engineer (W/M) – Open Position
Our infrastructure includes:
- 2k+ compute nodes (CPU + GPU),
- Large storage systems,
- Deployment and configuration management tools,
- A centralized platform for storing, visualizing and monitoring metrics and events.
And these ongoing developments:
- 100+ nodes for remote 3D visualization,
- Elastic compute solutions in the cloud,
- Long-term archive systems.
To strengthen our team, we are seeking an HPC Systems Engineer. The successful candidate will join the Systems Team, which is responsible for the deployment, operation, and evolution of the HPC services.
Working at EPFL means being part of a prestigious school that consistently ranks among the top 20 universities worldwide.
Main duties and responsibilities include:
More specifically, you will:
- Design, build and deploy highly scalable scientific compute environments that can be brought up in less than one hour and can grow with demand.
- Design and manage a variety of storage solutions and the tools to monitor them.
- Investigate and troubleshoot issues with hardware, operating systems, networking and scientific applications.
- Develop automated tests to ensure that changes cause no regressions.
- Participate in the SCITAS selection process for the acquisition of next-generation HPC systems.
- Support and train the user community.
- Take a leading role in one or more of the activities described above.
Your profile:
Applicants must have:
- Extensive experience in managing distributed GNU/Linux computing systems (Beowulf clusters), including services, low-latency networks and large-scale upgrade operations.
- Comprehensive knowledge of distributed file systems used for large-scale computing clusters (GPFS, Lustre or BeeGFS).
- Deep knowledge of Configuration Management tools, such as Puppet, Ansible or similar.
- Experience with container technologies, including Docker and Kubernetes.
- Experience with Infrastructure as Code and other development tools, including Terraform, Vault, Git and Jenkins.
- Solid Python and shell scripting experience.
- Ability to clearly document procedures with a focus on sharing knowledge.
Applicants should have:
- Master’s or Bachelor’s degree in an applicable field.
- Experience with workload management systems such as Slurm at large scales.
- Experience with monitoring and alerting systems.
We offer:
What you can expect from us
- We offer competitive salaries that take into account job profiles, skills and years of experience.
- Employees are affiliated with EPFL’s advantageous pension system.
- Based on a five-day week, the work week is 41 hours for full-time employees. Work schedules are mutually agreed upon by supervisors and staff members based on service requirements. Some flexibility is allowed.
- In addition to public holidays, staff members are eligible for five to six weeks of vacation per year based on age.
- EPFL offers family allowances. Employees receive a cantonal allowance plus a supplement from EPFL.
- On-campus day cares give priority to the children of EPFL employees.
- More than 100 sports activities are available at the Sports Centre at very attractive rates.
What you need to know before applying
- At SCITAS, we speak French and English fluently. We accept non-bilingual applicants who are willing to learn the other language.
- Only candidates who apply through the EPFL website or our partner Jobup’s website will be considered.
- Promoting equality between women and men in scientific careers as well as within the administrative and technical staff is an integral part of the policy of continued excellence implemented by EPFL.
- You will work remotely from any location within Switzerland during the periods when EPFL imposes telework.
- No relocation assistance will be provided.
- In response to the COVID-19 pandemic, all interviews will be conducted virtually.
- Only selected candidates will be contacted. We appreciate the time spent on the application.