Job Title: Data Engineer
Location: Remote (across India)
Job Type: Permanent
Responsibilities:
Participate in requirements clarification and sprint planning sessions.
Design and implement technical solutions, including:
ETL Pipelines – Build robust data pipelines in PySpark to extract, transform, and load data.
Optimize ETL Processes – Enhance and tune existing ETL processes for better performance, scalability, and reliability.
Write unit and integration tests.
Support QA teammates in the acceptance process.
Resolve production (PROD) incidents as a third-line support engineer.
Required Skills:
Minimum of 5 years of experience in IT/data roles.
Programming: Proficiency in PySpark for distributed computing and Python for ETL development.
SQL: Strong expertise in writing and optimizing complex SQL queries, preferably with experience in databases such as PostgreSQL, MySQL, Oracle, or Snowflake.
Data Warehousing: Experience working with data warehousing concepts and platforms, ideally Databricks.
ETL Tools: Familiarity with ETL tools and processes.
Data Modelling: Experience with dimensional modelling, normalization/denormalization, and schema design.
Version Control: Proficiency with version control tools like Git to manage codebases and collaborate on development.
Data Pipeline Monitoring: Familiarity with monitoring tools (e.g., Prometheus, Grafana, or custom monitoring scripts) to track pipeline performance.
Data Quality Tools: Experience implementing data validation, cleansing, and quality frameworks, ideally Monte Carlo.
Agile Methodologies: Comfort working within an Agile team, including sprint planning, stand-ups, and retrospectives.
Collaboration: Ability to work in a team environment with tools like Azure DevOps, Jira, and/or Confluence.
Testing and Debugging: Strong debugging and problem-solving skills to troubleshoot complex data engineering issues.