Data Engineer
5 hours ago
About us
Kyivstar.Tech is a Ukrainian hybrid IT company and a resident of Diia.City. We are a subsidiary of Kyivstar, one of Ukraine's largest telecom operators.
Our mission is to change lives in Ukraine and around the world by creating technological solutions and products that unleash the potential of businesses and meet users' needs.
More than 600 KS.Tech specialists work daily in various areas: mobile and web solutions, as well as design, development, support, and technical maintenance of high-performance systems and services.
We believe in innovations that bring real, qualitative change, and we constantly challenge conventional approaches and solutions. Each of us embraces an entrepreneurial culture, which lets us keep moving, evolving, and creating something new.
What you will do
- Design, develop, and maintain ETL/ELT pipelines for gathering, transforming, and storing large volumes of text data and related information. Ensure pipelines are efficient and can handle data from diverse sources (e.g., web crawls, public datasets, internal databases) while maintaining data integrity.
- Implement web scraping and data collection services to automate the ingestion of text and linguistic data from the web and other external sources. This includes writing crawlers or using APIs to continuously collect data relevant to our language modeling efforts.
- Implement NLP/LLM-specific data processing: clean and normalize text, including filtering toxic content, de-duplication, de-noising, and detecting and deleting personal data.
- Build task-specific SFT/RLHF datasets from existing data, including data augmentation and labeling with an LLM as teacher.
- Set up and manage cloud-based data infrastructure for the project. Configure and maintain data storage solutions (data lakes, warehouses) and processing frameworks (e.g., distributed compute on AWS/GCP/Azure) that can scale with growing data needs.
- Automate data processing workflows and ensure their scalability and reliability. Use workflow orchestration tools like Apache Airflow to schedule and monitor data pipelines, enabling continuous and repeatable model training and evaluation cycles.
- Maintain and optimize analytical databases and data access layers for both ad-hoc analysis and model training needs. Work with relational databases (e.g., PostgreSQL) and other storage systems to ensure fast query performance and well-structured data schemas.
- Collaborate with Data Scientists and NLP Engineers to build data features and datasets for machine learning models. Provide data subsets, aggregations, or preprocessing as needed for tasks such as language model training, embedding generation, and evaluation.
- Implement data quality checks, monitoring, and alerting. Develop scripts or use tools to validate data completeness and correctness (e.g., ensuring no critical data gaps or anomalies in the text corpora), and promptly address any pipeline failures or data issues. Implement data version control.
- Manage data security, access, and compliance. Control permissions to datasets and ensure adherence to data privacy policies and security standards, especially when dealing with user data or proprietary text sources.
Requirements
- Education & Experience: 3+ years of experience as a Data Engineer or in a similar role, building data-intensive pipelines or platforms. A Bachelor's or Master's degree in Computer Science, Engineering, or a related field is preferred. Experience supporting machine learning or analytics teams with data pipelines is a strong advantage.
- Data Pipeline Expertise: Hands-on experience designing ETL/ELT processes, including extracting data from various sources, using transformation tools, and loading into storage systems. Proficiency with orchestration frameworks like Apache Airflow for scheduling workflows. Familiarity with building pipelines for unstructured data (text, logs) as well as structured data.
- Programming & Scripting: Strong programming skills in Python for data manipulation and pipeline development. Experience with NLP packages (spaCy, NLTK, langdetect, fasttext, etc.). Experience with SQL for querying and transforming data in relational databases. Knowledge of Bash or other scripting for automation tasks. Ability to write clean, maintainable code and use version control (Git) for collaborative development.
- Databases & Storage: Experience working with relational databases (e.g., PostgreSQL, MySQL), including schema design and query optimization. Familiarity with NoSQL or document stores (e.g., MongoDB) and big data technologies (HDFS, Hive, Spark) for large-scale data is a plus. Understanding of or experience with vector databases and similarity-search libraries (e.g., Pinecone, FAISS) is beneficial, as our NLP applications may require embedding storage and fast similarity search.
- Cloud Infrastructure: Practical experience with cloud platforms (AWS, GCP, or Azure) for data storage and processing. Ability to set up services such as S3/Cloud Storage, data warehouses (e.g., BigQuery, Redshift), and use cloud-based ETL tools or serverless functions. Understanding of infrastructure-as-code (Terraform, CloudFormation) to manage resources is a plus.
- Data Quality & Monitoring: Knowledge of data quality assurance practices. Experience implementing monitoring for data pipelines (logs, alerts) and using CI/CD tools to automate pipeline deployment and testing. An analytical mindset to troubleshoot data discrepancies and optimize performance bottlenecks.
- Collaboration & Domain Knowledge: Ability to work closely with data scientists and understand the requirements of machine learning projects. Basic understanding of NLP concepts and the data needs for training language models, so you can anticipate and accommodate the specific forms of text data and preprocessing they require. Good communication skills to document data workflows and to coordinate with team members across different functions.
Nice to have
- NLP Domain Experience: Prior experience handling linguistic data or supporting NLP projects (e.g., text normalization, handling different encodings, tokenization strategies). Knowledge of Ukrainian text sources and datasets, or experience with multilingual data processing, is an advantage given our project's focus. Understanding of the FineWeb2 processing pipeline or a similar approach.
- Advanced Tools & Frameworks: Experience with distributed data processing frameworks (such as Apache Spark or Databricks) for large-scale data transformation, and with message streaming systems (Kafka, Pub/Sub) for real-time data pipelines. Familiarity with data serialization formats (JSON, Parquet) and handling of large text corpora.
- Web Scraping Expertise: Deep experience in web scraping, using tools like Scrapy, Selenium, or Beautiful Soup, and handling anti-scraping challenges (rotating proxies, rate limiting). Ability to parse and clean raw text data from HTML, PDFs, or scanned documents.
- CI/CD & DevOps: Knowledge of setting up CI/CD pipelines for data engineering (using GitHub Actions, Jenkins, or GitLab CI) to test and deploy changes to data workflows. Experience with containerization (Docker) to package data jobs and with Kubernetes for scaling them is a plus.
- Big Data & Analytics: Experience with analytics platforms and BI tools (e.g., Tableau, Looker) used to examine the data prepared by the pipelines. Understanding of how to create and manage data warehouses or data marts for analytical consumption.
- Problem-Solving: Demonstrated ability to work independently on complex data engineering problems, optimizing existing pipelines and implementing new ones under time constraints. A proactive attitude toward exploring new data tools or techniques that could improve our workflows.
What we offer
- Office or remote – it's up to you. You can work from anywhere, and we will arrange your workplace.
- Remote onboarding.
- Performance bonuses.
- Employee training, with opportunities to learn through the company's library, internal resources, and partner programs.
- Health and life insurance.
- Wellbeing program and corporate psychologist.
- Reimbursement of expenses for Kyivstar mobile communication.