Data Engineering Projects

Welcome to my data engineering projects portfolio. Here you’ll find projects focused on building scalable data pipelines, designing efficient data models, and using technologies such as Apache Spark, Apache Airflow, PostgreSQL, Apache Cassandra, and AWS Redshift to process and analyze large datasets. Each project shows how I approach a complex data challenge, from schema design to pipeline orchestration.

Data Pipelines with Airflow

Automating ETL processes for Sparkify’s data warehouse using Apache Airflow.

Published on July 17, 2021

Building a Data Lake with Spark

Creating an ETL pipeline with Apache Spark that transforms data and loads it into a data lake for a music streaming startup.

Published on July 07, 2021

Data Warehousing with AWS

Building an ETL pipeline that loads data into AWS Redshift for a music streaming startup, enabling efficient analysis of user activity data.

Published on June 30, 2021

Data Modeling with Apache Cassandra

Modeling data in Apache Cassandra and building an ETL pipeline to analyze song play data for a music streaming startup.

Published on June 21, 2021

Data Modeling with Postgres

Designing a Postgres database schema and ETL pipeline for Sparkify, a music streaming startup, to enable efficient analysis of user activity data with optimized SQL queries.

Published on June 06, 2021