Data Reliability Engineer


Job Description:

We are undertaking a Data Transformation, and as data becomes fundamental to what we do, the ability to have the right data in the right place at the right time, accurate and consistent, is essential.

You will play a vital role within this evolving landscape, working alongside the Data teams to orchestrate and facilitate reliable delivery of data to the business. You will collaborate with the Data Engineering, Infrastructure, and Data Analytics & Science teams to ensure the data lake landscape is reliable and available for business intelligence, and to implement proactive data observability processes within the Technology area.

You will be responsible for helping deliver high data availability and quality through the entire data life cycle, from ingestion to end products: dashboards, ML models, and production datasets. You will help the Data Engineering team deliver an outstanding service to its users, making the business more data-driven.

Responsibilities

  • Your responsibilities span both proactive and reactive data reliability engineering.
  • Designing a data reliability strategy focused on tackling end-to-end reliability improvements.
  • Monitoring the performance and reliability of our big data systems and recommending opportunities for improvement.
  • Applying DataOps and Site Reliability Engineering best practices, such as setting data SLAs and implementing data observability within the data lake platform.
  • Defining and setting SLAs and SLOs for all datasets and processes running in production, in collaboration with engineering heads, architects, and business stakeholders.
  • Owning and governing the data reliability lifecycle of detect, resolve, and prevent.
  • Owning the incident management process to ensure that incidents are resolved quickly and that root cause analysis is performed and understood to prevent repeat occurrences.
  • Maintaining infrastructure reliability for data pipelines, CI/CD deployments, and the availability of these systems.
  • Driving the most vital big data platform KPIs, such as Mean Time to Detect (MTTD) and Mean Time to Recover (MTTR), to green RAG status.

Requirements

  • 4+ years' experience developing, monitoring, and optimising cloud-based data solutions on platforms such as Azure, Databricks, AWS, or GCP.
  • Strong understanding of data warehousing and applications, from design to deployment, using SQL, Python, and other big data processing tools.
  • Proficiency in an IaC technology such as Microsoft ARM templates or Terraform.
  • Experience with data streaming technologies such as Kafka or Apache NiFi.
  • Experience with ETL orchestration, workflow management, and cluster management using the Databricks API.
  • Experience with CI/CD tools such as Azure DevOps, Jenkins, etc.
  • Experience working with Azure Synapse, Redshift, or other DBMS platforms.
  • Experience working with incident management, JIRA, monitoring/alerting, and data quality tools.
  • Knowledge of reporting tools such as Power BI, Tableau, etc.
  • Experience working in an agile environment and the ability to work effectively in a fast-paced organisation.

APPLY HERE



DETAILS

Salary: Not specified (PKR)

Experience: 4+ years

Job Type: Full Time

Location: Lahore

Published: August 02, 2023

Updated: August 02, 2023
