Context
SII Group USA is recruiting a Senior Data Engineer to start a new project.
The Data Platform is implemented on Google Cloud Platform, with Databricks used as the data processing hub.
Targeted improvements:
- Better data administration
- Data lineage and access to additional Databricks features
- Implementation of the Databricks Unity Catalog feature
- Migration of data from a single default cloud storage bucket to data-source-specific buckets in Google Cloud Storage
The organization is global, with product engineering teams split between the USA and EMEA.
Main tasks:
Working in full autonomy as a Data Engineering expert, you will:
- Review the current design and propose design updates if required
- Create buckets for each component and environment in Terraform
- Create the metastore and catalogs in Terraform and assign buckets to them
- Create users / groups / service principals in Terraform and assign access rights
- Create a new DDL and table library for the Unity Catalog structure
- Migrate to shared clusters and refactor how notebook context information is retrieved
- Update all notebooks to use the new DDL and table library
- Complete the validation of Unity Catalog on Dev: update existing unit tests, add unit tests for new notebooks, and run integration tests
- Build a data migration workflow with a result report: it will migrate existing data from the existing Delta tables (saved in Databricks system buckets) to Delta tables saved in external locations (data-source-specific buckets)
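
To give candidates a feel for the last task, here is a minimal sketch of what a migration step with a result report could look like. It is an illustration under stated assumptions, not the project's actual design: the class and function names, the example table names, and the bucket path are all hypothetical, and the use of a Delta DEEP CLONE statement is one possible approach among several.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class TableMigration:
    """One table to move (all names here are hypothetical)."""
    source_table: str     # existing Delta table stored in a Databricks system bucket
    target_table: str     # three-level Unity Catalog name: catalog.schema.table
    target_location: str  # external location in a data-source-specific bucket


@dataclass(frozen=True)
class MigrationResult:
    migration: TableMigration
    source_rows: int
    target_rows: int

    @property
    def ok(self) -> bool:
        # Assumption: matching row counts is the acceptance check for the report.
        return self.source_rows == self.target_rows


def clone_statement(m: TableMigration) -> str:
    """Build a DEEP CLONE statement that copies data to the external location."""
    return (
        f"CREATE TABLE IF NOT EXISTS {m.target_table} "
        f"DEEP CLONE {m.source_table} "
        f"LOCATION '{m.target_location}'"
    )


def migrate(
    migrations: Iterable[TableMigration],
    run_sql: Callable[[str], object],
    count_rows: Callable[[str], int],
) -> List[MigrationResult]:
    """Run each clone, then record source and target row counts."""
    results = []
    for m in migrations:
        run_sql(clone_statement(m))
        results.append(
            MigrationResult(m, count_rows(m.source_table), count_rows(m.target_table))
        )
    return results


def format_report(results: Iterable[MigrationResult]) -> str:
    """Render a plain-text result report, one line per table."""
    lines = ["status    source_rows  target_rows  table"]
    for r in results:
        status = "OK" if r.ok else "MISMATCH"
        lines.append(
            f"{status:<10}{r.source_rows:<13}{r.target_rows:<13}{r.migration.target_table}"
        )
    return "\n".join(lines)
```

In a Databricks notebook, `run_sql` could be `spark.sql` and `count_rows` could be `lambda t: spark.table(t).count()`. Passing them in as arguments keeps the logic testable without a cluster.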