ML Infrastructure Engineer
About Neuralfinity
We are a fast-growing deep tech startup on a mission to unlock the true potential of generative AI for startups, enterprises, and the public sector. Our research-driven platform gives customers the tools to develop custom AI models that meet their specific needs, built on high-quality data and adhering to the highest standards of data privacy.
Details about the role
As an ML Infrastructure Engineer, you’ll help build next-generation AI training platforms. You will architect and implement the infrastructure that powers our large-scale language and vision model training operations, working across bare-metal GPU clusters and cloud environments. You’ll work at the intersection of high-performance computing and modern DevOps, designing automated solutions for cluster deployment, optimization, and management. This is a unique opportunity to tackle complex challenges in distributed systems while working with cutting-edge AI technologies. You’ll play a crucial role in developing the infrastructure that enables our ML teams to train and deploy large-scale models efficiently, from designing high-performance GPU clusters to implementing sophisticated orchestration systems for distributed training workloads.
Responsibilities:
- Design and implement scalable infrastructure for large GPU clusters
- Develop and maintain automated deployment systems for standing up GPU clusters
- Create and maintain Infrastructure as Code (IaC) templates
- Design and implement job scheduling and workload management systems
- Collaborate with ML engineers to optimize training infrastructure
- Develop automation tools and scripts for cluster management
- Implement security best practices and compliance measures
Requirements:
- Strong expertise in Infrastructure as Code tools
- Proven experience with container orchestration platforms
- Deep understanding of networking concepts
- Experience with CI/CD tools and practices
- Proficiency in programming languages used for infrastructure automation
- Demonstrated experience with GPU computing and related technologies
Benefits:
- Opportunity to work in a fast-growing startup environment
- Competitive salary & equity compensation
- Networking opportunities within the industry
- Remote work from anywhere
- Flexible working hours
- Rapid career growth opportunities
A Final Note
You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.