ML Infrastructure Engineer

    Remote (Europe)
    Full Time

    About Neuralfinity

    We are a fast-growing deep tech startup on a mission to unlock the true potential of generative AI for startups, enterprises, and the public sector. Our research-driven platform gives customers the tools to develop custom AI models tailored to their specific needs, leveraging high-quality data and adhering to the highest standards of data privacy.

    Details about the role

    As an ML Infrastructure Engineer, you’ll help build the next generation of AI training platforms. You will architect and implement the infrastructure that powers our large-scale language and vision model training, working across bare-metal GPU clusters and cloud environments. You’ll work at the intersection of high-performance computing and modern DevOps, designing automated solutions for cluster deployment, optimization, and management. This is an opportunity to tackle complex distributed-systems challenges while working with cutting-edge AI technologies. The infrastructure you develop will enable our ML teams to train and deploy large-scale models efficiently, from high-performance GPU clusters to orchestration systems for distributed training workloads.

    Responsibilities:

    • Design and implement scalable infrastructure for large GPU clusters
    • Develop and maintain automated deployment systems for standing up GPU clusters
    • Create and maintain Infrastructure as Code (IaC) templates
    • Design and implement job scheduling and workload management systems
    • Collaborate with ML engineers to optimize training infrastructure
    • Develop automation tools and scripts for cluster management
    • Implement security best practices and compliance measures

    Requirements:

    • Strong expertise in Infrastructure as Code tools
    • Proven experience with container orchestration platforms
    • Deep understanding of networking concepts
    • Experience with CI/CD tools and practices
    • Proficiency in programming languages used for infrastructure automation
    • Demonstrated experience with GPU computing and related technologies

    Benefits:

    • Opportunity to work in a fast-growing startup environment
    • Competitive salary & equity compensation
    • Networking opportunities within the industry
    • Remote work from anywhere
    • Flexible working hours
    • Rapid career growth opportunities

    A Final Note

    You do not need to match all of the listed expectations to apply for this position. We are committed to building a team with a variety of backgrounds, experiences, and skills.