Role: AWS Data Architect
Duration: Long-Term Contract
Location: Woodland Hills, CA (Hybrid)
Must-haves: Step Functions, EMR, and Iceberg (2 of 3)
Responsibilities:
• Designing and Implementing Data Solutions: Create blueprints for data storage, processing, and access using AWS services such as S3, Glue, EMR, and Kinesis, weighing performance, scalability, security, and cost-effectiveness.
• Lakehouse and ETL: Design and build the components of a Lakehouse solution on AWS, using Lambda, EMR, and EKS for compute and Iceberg and S3 for storage. Design and implement ETL (Extract, Transform, Load) pipelines that move and transform data from various sources into the Lakehouse across the Bronze, Silver, and Gold layers.
• Data Delivery for Consumption: Design data access for querying with Athena, and integrate Iceberg tables in the Glue Catalog with Snowflake for analytics and reporting.
• Big Data Solutions: Work with large datasets and distributed computing, using services such as EMR (with Spark) for processing and Kinesis and Kafka for streaming solutions.
• Data Governance and Security: Ensure data quality and compliance with regulations (such as GDPR), and implement security measures to protect sensitive information.
• Collaboration and Communication: Work with stakeholders (business analysts, data scientists, developers) to understand their needs and translate them into technical solutions, explaining complex concepts clearly.
Technical Skills:
• AWS Cloud Expertise: Deep understanding of core AWS services (S3, EC2, EKS, EMR, VPC, IAM, Glue, Athena).
• AWS certifications (e.g., Solutions Architect, Big Data Specialty) are highly valued.
• Data Warehousing and Modeling: Strong knowledge of dimensional modeling, schema design, and data warehousing principles.
• ETL and Data Pipelines: Experience with tools and techniques for data extraction, transformation, and loading.
• Big Data Technologies: Familiarity with Hadoop, Spark, Hive, and other big data frameworks.
• Databases: Proficiency in SQL and experience with relational databases are required; experience with NoSQL databases (such as DynamoDB) is nice to have.
• Programming: Hands-on coding in Python with PySpark for automation and data processing tasks.
• Data Governance and Security: Understanding of data security best practices, access control, and compliance requirements.
Qualifications:
• Education: A bachelor's degree in computer science, engineering, or a related field is usually expected.
• Experience: 10+ years of experience