AI QA Engineer

2 weeks ago


Ukraine, Kyivstar, Full-time, $90,000 - $120,000 per year

We are seeking an AI QA Engineer specializing in LLM/NLP model quality assurance to ensure our language models and NLP applications meet the highest standards of accuracy, reliability, and safety. In this role, you will develop rigorous testing strategies for our AI models – including large language models – and lead efforts to detect issues such as factual errors, biases, and instability in model outputs. You will work closely with data scientists and engineers to integrate testing into the model development lifecycle, from early prototyping to post-deployment monitoring. This position is ideal for someone with a strong quality assurance background and a passion for AI, who can bridge the gap between traditional software QA and the unique challenges of evaluating AI systems (chatbots, NLP APIs, etc.) in the context of our Ukrainian LLM project and other products.

Responsibilities:

  • Develop and execute comprehensive AI model evaluation strategies to assess the performance of our NLP and LLM systems. Define testing methodologies that cover correctness (e.g., accuracy of responses, compliance with requirements), consistency, and fairness of model outputs.
  • Analyze benchmarking datasets, identify gaps, and develop the first SOTA benchmarking framework for the Ukrainian language.
  • Analyze training datasets and collaborate with data engineers on improving processing pipelines. Implement a training data testing framework.
  • Implement both automated and manual testing for applications powered by large language models. This includes creating automation scripts or test harnesses that can systematically query models with test cases (prompts/questions) and verify responses, as well as performing hands-on review of outputs for subjective evaluation.
  • Build and curate high-quality test datasets for model evaluation. Manage a repository of test inputs (e.g., sample user queries, edge-case scenarios, conversational dialogues) along with expected or reference outputs when applicable. Ensure these datasets are diverse, balanced, and representative of real-world use cases, including Ukrainian language content and culturally relevant scenarios.
  • Develop pipelines for synthetic data generation and adversarial example creation to challenge the model's robustness. Use techniques such as paraphrasing, noise injection, or adversarial prompting to produce test cases that can reveal model weaknesses.
  • Design and maintain testing frameworks to detect hallucinations, biases, and other failure modes in LLM outputs.
  • Define and track key AI performance metrics. Monitor metrics like factual accuracy, coherence/fluency, relevancy to prompt, response diversity, latency of response, and user satisfaction ratings if available. Establish baseline metrics for each new model version and ensure subsequent iterations meet or exceed these benchmarks.
  • Work closely with the AI development team to integrate QA in the development process. Collaborate with data scientists to test models at early stages (e.g., evaluating prototypes before full deployment), and with ML engineers to include automated tests in CI/CD pipelines for model updates.
  • Debug and analyze AI model failures. When tests uncover issues (e.g., a model consistently gives incorrect information in a certain domain or shows a bias), investigate and identify root causes by analyzing model outputs and underlying data. Provide clear, detailed reports on issues with steps to reproduce and potential causes.
  • Provide feedback and recommendations for model improvement. Work with prompt engineers or NLP scientists to refine prompts and instructions that guide the model towards better performance.
  • Implement continuous monitoring in production to catch regressions or new issues. Set up mechanisms to regularly evaluate live model outputs (via sampling or user feedback analysis) and alert the team if any quality metrics degrade over time (indicative of model drift or unforeseen use cases).
  • Maintain comprehensive test documentation and reports. Document test plans, test case suites, and summarize the results of evaluations for each model version (including graphs/metrics and qualitative findings). Communicate findings to both technical teams and stakeholders in a clear, actionable manner.
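
As an illustration of the test harness and baseline tracking described in the responsibilities above, here is a minimal sketch in Python. It is not the team's actual tooling: the test cases, the `stub_model` function, the containment-based scoring rule, and the baseline threshold are all hypothetical stand-ins chosen for the example.

```python
from typing import Callable

# Hypothetical test cases: each prompt is paired with a reference answer.
TEST_CASES = [
    {"prompt": "What is the capital of Ukraine?", "reference": "Kyiv"},
    {"prompt": "How many days are in a week?", "reference": "7"},
]


def reference_match_accuracy(model: Callable[[str], str], cases: list[dict]) -> float:
    """Fraction of cases whose response contains the reference answer.

    Containment is checked case-insensitively. This is a deliberately crude
    metric; real evaluations usually need more robust scoring.
    """
    hits = sum(
        case["reference"].lower() in model(case["prompt"]).lower()
        for case in cases
    )
    return hits / len(cases)


def meets_baseline(score: float, baseline: float, tolerance: float = 0.0) -> bool:
    """Return True if the new score is no worse than baseline minus tolerance."""
    return score >= baseline - tolerance


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call (assumption: the model maps str to str)."""
    canned = {
        "What is the capital of Ukraine?": "The capital of Ukraine is Kyiv.",
        "How many days are in a week?": "There are 7 days in a week.",
    }
    return canned.get(prompt, "I don't know.")


accuracy = reference_match_accuracy(stub_model, TEST_CASES)
gate_passed = meets_baseline(accuracy, baseline=0.9)
```

In a CI pipeline, a check like `meets_baseline` could be wrapped in a PyTest test so that a model update falling below the recorded baseline fails the build.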

Required Qualifications:

  • QA Experience:
  • 3+ years in a Quality Assurance or Testing role, with at least part of that time focused on testing AI, ML, or complex data-driven systems, plus 2+ years in data analysis.
  • Strong foundation in QA methodologies, test planning, and test case design.
  • Experience writing test plans and handling bug tracking for software projects.
  • AI/ML Knowledge:
  • Familiarity with machine learning concepts and specific challenges of testing AI models.
  • Experience with AI/ML testing frameworks and LLM evaluation methodologies – for example, measuring model accuracy on benchmarks, running A/B tests on model versions, or using frameworks such as Hugging Face's evaluation tools or custom Python-based testing.
  • NLP Domain Skills:
  • Solid understanding of Natural Language Processing tasks and common failure modes of language models.
  • Awareness of issues like model hallucination (making up facts), bias in AI (and methods to test for bias), and the importance of context in language understanding.
  • Ideally, hands-on experience testing chatbots, virtual assistants, or language generation systems.
  • Programming & Tools:
  • Proficiency in Python for developing test automation and evaluation scripts.
  • Familiarity with testing frameworks (PyTest, unittest) and libraries commonly used in ML/NLP (pandas, numpy for data handling; possibly Hugging Face transformers for model interfacing).
  • Experience with tools for dataset handling and annotation; ability to write simple scripts to manipulate and evaluate text data.
  • Data Management:
  • Experience creating and managing test datasets, including annotation and labeling processes.
  • Comfortable with basic data engineering to gather logs or outputs from models and analyze them.
  • Knowledge of using version control for test scripts and maintaining a repository of test cases.
  • Analytical Skills:
  • Strong problem-solving and debugging skills specifically applied to AI outputs.
  • Ability to notice patterns in model errors and analytically determine what they have in common.
  • Capacity to interpret model evaluation metrics and translate them into actionable improvements.
  • Communication:
  • Excellent written and verbal communication skills.
  • Able to clearly document bugs, write detailed QA reports, and discuss issues with developers and researchers.
  • Fluent Ukrainian is a must, as our LLM is oriented towards Ukrainian – you should be able to evaluate outputs in Ukrainian for correctness and nuance.
  • Attention to Detail:
  • A keen eye for spotting subtle errors or oddities in AI behavior.
  • Patience and thoroughness in performing manual testing when needed, and creativity in thinking of edge cases or tricky scenarios to test the model's limits.

Preferred Qualifications:

  • AI Testing Tools:
  • Experience with specialized tools or frameworks for AI testing, such as model evaluation harnesses, adversarial testing platforms, or crowdsourced evaluation methods.
  • Familiarity with techniques like prompt engineering and how prompt changes affect model output quality.
  • Statistical Analysis:
  • Ability to perform statistical analyses on model performance results (significance testing for A/B comparisons, etc.) to determine if changes are improvements.
  • Understanding of experiment design in AI (e.g., proper control groups for new model versions).
  • Continuous Integration:
  • Experience integrating tests into CI/CD pipelines for ML – for example, automatically evaluating a model on a validation set every time it's updated, and blocking deployment if it fails certain criteria.
  • Familiarity with ML model versioning and deployment workflows.
  • Security & Compliance Testing:
  • Knowledge of testing AI models for security and compliance issues – e.g. prompt injection attacks on LLMs, data privacy in outputs, or ensuring no disallowed content is generated according to usage policies.
  • UX Perspective:
  • Some experience or understanding of user experience as it relates to AI products.
  • Being able to anticipate how end-users might interact with the AI (for instance, phrasing questions in unexpected ways) and ensuring the model handles such interactions gracefully.
  • Testing Certifications:
  • Any certifications or formal training in Quality Assurance, software testing (such as ISTQB) or in AI/ML could be a plus, demonstrating a commitment to the discipline.
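
To make the significance testing for A/B comparisons mentioned above concrete, the following is a minimal sketch of one possible approach: an exact McNemar (sign) test on paired per-example correctness for two model versions. The choice of this test, the function name, and the sample data are assumptions made for illustration; the posting does not prescribe a specific method.

```python
from math import comb


def mcnemar_exact_p(correct_a: list[bool], correct_b: list[bool]) -> float:
    """Two-sided exact McNemar p-value for paired correctness results.

    Only discordant pairs (one model right, the other wrong) carry
    information; under the null hypothesis each such pair is equally
    likely to favor either model.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("Both models must be scored on the same examples.")

    a_only = sum(a and not b for a, b in zip(correct_a, correct_b))
    b_only = sum(b and not a for a, b in zip(correct_a, correct_b))
    n = a_only + b_only
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference

    k = min(a_only, b_only)
    # One-sided binomial tail P(X <= k) with X ~ Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical results on 20 shared test prompts.
old_model = [True] * 10 + [False] * 10
new_model = [True] * 20

p_value = mcnemar_exact_p(old_model, new_model)
```

A small p-value here suggests the difference between versions is unlikely to be due to chance on this test set; it says nothing about whether the test set itself is representative.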

What we offer:

  • Office or remote — it's up to you. You can work from anywhere, and we will arrange your workplace.
  • Remote onboarding.
  • Performance bonuses for everyone (annual or quarterly — depends on the role).
  • Employee training: opportunities to learn through the company's library, internal resources, and partner programs.
  • Health and life insurance.
  • Wellbeing program and corporate psychologist.
  • Reimbursement of expenses for Kyivstar mobile communication.
