Lead MLOps Engineer

2 weeks ago


Kyiv, Kyiv, Ukraine Capgemini Full-time

At Capgemini Engineering, the world leader in engineering services, we bring together a global team of engineers, scientists, and architects to help the world's most innovative companies unleash their potential. From autonomous cars to life-saving robots, our digital and software technology experts think outside the box as they provide unique R&D and engineering services across all industries. Join us for a career full of opportunities, where you can make a difference and where no two days are the same.

Your Client

Our client is at the forefront of revolutionizing AI computing by re-engineering infrastructure at the system level. Its architecture, combined with sophisticated software intelligence, abstraction, and an orchestration layer, enables developers to leverage a diverse array of compute resources, achieving efficient and reliable computing at a fraction of the cost. Founded by industry veterans from Nvidia, Apple, Tesla, Intel, and Zoox, it's shaping the future of AI.

As the Lead/Staff AI Runtime Engineer, you'll play a pivotal role in the design, development, and optimization of the core runtime infrastructure powering distributed training and deployment of large AI models. This is a hands-on leadership role - ideal for a systems-minded software engineer who thrives at the intersection of AI workloads, runtimes, and performance-critical infrastructure.

Your Role
  • Own the core runtime architecture supporting AI training and inference at scale.
  • Design resilient and elastic runtime features (for example, dynamic node scaling and job recovery) within the custom PyTorch-based stack.
  • Optimize distributed training reliability, orchestration, and job-level fault tolerance.
  • Profile and enhance low-level system performance across training and inference pipelines.
  • Improve packaging, deployment, and integration of customer models in production environments.
  • Design and maintain libraries and services that support the full model lifecycle: training, checkpointing, fault recovery, packaging, and deployment.
  • Implement observability hooks, diagnostics, and resilience mechanisms for deep-learning workloads.
  • Champion best practices in CI/CD, testing, and software quality across the AI Runtime stack.
  • Work cross-functionally with Research, Infrastructure, and Product teams to align runtime development with customer and platform needs.
  • Guide technical discussions, mentor junior engineers, and help scale the AI Runtime team's capabilities.

Your Profile
  • PyTorch, TensorFlow, JAX (Advanced)
  • Python, C++ (Go/Rust optional)
  • Distributed training frameworks
  • Multi-GPU, multi-node optimization
  • Container orchestration (Kubernetes, Docker)
  • CI/CD, fault recovery, job scheduling
  • TorchElastic, Ray, custom orchestrators
  • Runtime architecture and systems performance tuning

Nice to Have

  • Contributions to PyTorch internals or open-source deep learning infrastructure projects.
  • Intel OpenVINO
  • Familiarity with LLM training pipelines, checkpointing, or elastic training orchestration.
  • Experience with Kubernetes, Ray, TorchElastic, or custom AI job orchestrators.
  • Background in systems research, compilers, or runtime architecture for high-performance computing (HPC) or machine learning.
  • Start-up experience.
  • Ability to travel to the EU.

What You Will Love About Working Here
  • We care about all our employees and want them to feel as comfortable as possible. That's why we offer health insurance from the first days of employment, regardless of the probationary period.
  • A gift from the company: Christmas holidays from 25 December to 31 December.
  • Cooperation with the Superhumans Center and Veteran Hub. Capgemini Engineering has supported the launch of the psychological rehabilitation department at Superhumans. Our team also donated funds toward prosthetics for three Ukrainian defenders. Currently, we support psychological counseling provided by Veteran Hub, and with the Hub's assistance we have implemented an internal policy that makes the company friendly to military personnel and veterans.

Capgemini is a global business and technology transformation partner, helping organizations accelerate their dual transition to a digital and sustainable world while creating tangible impact for enterprises and society. It is a responsible and diverse group of 340,000 team members in more than 50 countries. With a strong heritage of more than 55 years, Capgemini is trusted by its clients to unlock the value of technology to address the entire breadth of their business needs. It delivers end-to-end services and solutions, leveraging strengths from strategy and design to engineering, all fueled by its market-leading capabilities in AI, generative AI, cloud, and data, combined with its deep industry expertise and partner ecosystem.

