View / download Resume (last updated on 06-June-2025)

About

As an Azure Data Engineer at Capgemini, I develop end-to-end data engineering solutions such as Enterprise Data Hubs (EDH), Operational Data Stores (ODS), Data Marts and Data Lakes. I am experienced in creating big data ETL pipelines with medallion architecture in Azure using Python, SQL and PySpark. I have migrated on-prem processes to the cloud, implemented a CDC process with SCD Type 2, improved pipeline reusability by developing a metadata-driven architecture, performed data cleaning and transformations, and created PII-masked views and extracts as per business requirements. I have also automated manual and repetitive tasks such as creating SQL queries and directory structures, optimized pipelines and Python programs to reduce average runtime, and worked on dynamic real-time status and metadata tracking of ETL job extracts using Python.

I hold active “Microsoft Certified: Azure Data Engineer Associate” (DP-203) and “AWS Certified Cloud Practitioner” (CLF-C02) certifications, and have a strong background in Data Science and Machine Learning. I completed my M.Tech. (CSE) at BIT, Mesra, with a thesis focused on this domain.

I am passionate about finding solutions for individual and organizational growth, and focus on continuously improving and utilizing my skills. I collaborate with my team and clients to deliver high-quality results and value. I am always eager to learn new technologies and tools, and to apply them to solve real-world problems.

Skills

Programming Languages: SQL, Python, PySpark
Azure Services: Synapse Analytics, Databricks, Data Factory, SQL Database, Data Lake Storage, Logic Apps
Big Data Engineering: Apache Spark, ETL Development
Databases: Azure SQL Database, Microsoft SQL Server, MySQL, Oracle, PostgreSQL, SQLite

Certifications

Passed the “Microsoft Certified: Azure Data Engineer Associate” (DP-203) certification exam from Microsoft (View Certificate)
Passed the “AWS Certified Cloud Practitioner” (CLF-C02) certification exam from Amazon Web Services (View Certificate)

Professional Experience

Current Role: Azure Data Engineer
Current Designation: Associate Consultant
Organization: Capgemini
Duration: March 2022 - Present

Data Engineering

End-to-end development of Enterprise Data Hub, Operational Data Store, Data Marts and Data Lakes, including generation of views and extracts

Migrated on-prem big data ETL processes to the cloud by creating storage-event-triggered and schedule-triggered pipelines with medallion architecture in Azure using Python, SQL and PySpark
Implemented a change data capture (CDC) process that stores transformed data using SCD Type 2
Improved reusability by developing a metadata-driven architecture to create dynamic pipelines, which selectively fetch data by joining the required source tables and applying transformations, to generate PII-masked views and extracts as per business requirements
Optimized pipelines by applying conditional activity executions to reduce average runtime by 38%
Identified and automated manual and repetitive tasks by developing dynamic Python scripts to generate SQL queries and create directory structures
Implemented pre-load, data quality and data control checks
Performed data cleaning and applied transformations on Parquet and CSV big data feeds
Implemented status email notification functionality in pipelines using Azure Logic Apps and Web Activities
Improved fault tolerance by identifying and covering multiple edge cases
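
The SCD Type 2 handling mentioned in the list above can be illustrated with a minimal sketch in plain Python. It is not the production pipeline code: the function name apply_scd2, the column names (valid_from, valid_to, is_current), the open-ended date sentinel and the sample customer rows are all assumptions made for illustration. A changed record closes the current version and appends a new one, so history is preserved.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # assumed sentinel meaning "still current"


def apply_scd2(dimension, incoming, key, tracked, load_date):
    """Return a new dimension list with incoming records applied as SCD Type 2.

    dimension: list of dicts holding history rows (hypothetical layout)
    incoming:  list of dicts with the latest source values
    key:       business key column name
    tracked:   columns whose changes should create a new version
    """
    result = [dict(row) for row in dimension]  # copy so the input is not mutated
    current = {row[key]: row for row in result if row["is_current"]}

    for record in incoming:
        existing = current.get(record[key])
        if existing is not None and all(existing[c] == record[c] for c in tracked):
            continue  # no change detected, keep the current version as is
        if existing is not None:
            # Close the old version instead of overwriting it
            existing["valid_to"] = load_date
            existing["is_current"] = False
        new_row = {key: record[key]}
        new_row.update({c: record[c] for c in tracked})
        new_row.update({"valid_from": load_date, "valid_to": OPEN_END, "is_current": True})
        result.append(new_row)
        current[record[key]] = new_row
    return result


# Hypothetical sample data: customer 1 moves city, customer 2 is new
dimension = [
    {"customer_id": 1, "city": "Pune", "valid_from": date(2024, 1, 1),
     "valid_to": OPEN_END, "is_current": True},
]
incoming = [
    {"customer_id": 1, "city": "Mumbai"},
    {"customer_id": 2, "city": "Delhi"},
]
updated = apply_scd2(dimension, incoming, key="customer_id",
                     tracked=["city"], load_date=date(2025, 6, 1))
for row in updated:
    print(row)
```

In a Spark-based pipeline the same close-and-append logic would typically be expressed as a merge against the target table rather than a Python loop.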

Software Engineering

Status and Metadata Reports Generation

Dynamic real-time status and metadata tracking of ETL job extracts using Python
Extracted metadata properties and row counts dynamically from DAT and TXT files

Miscellaneous

Optimized Python programs to reduce average runtime by 23%
Automated Excel macro runs by creating Python scripts that email daily consolidated status reports

Education

Birla Institute of Technology, Mesra

Degree: Master of Technology
Branch: Computer Science and Engineering
CGPA: 8.06
Duration: July 2018 to July 2020

Thesis Work

Title: Diabetes Prediction using Machine Learning (View on GitHub)

Languages: Python (NumPy, Pandas, Matplotlib, Seaborn, scikit-learn / sklearn), Markdown

Software: Jupyter Notebook (Anaconda)

Achieved up to 81.6% accuracy in diabetes prediction on the Pima Indians Diabetes Database with a Random Forest classifier
Applied K-Nearest Neighbors, Support Vector Machine, Decision Tree and Random Forest classification algorithms for diabetes prediction and analyzed their accuracies
Achieved up to 7.04% improvement in the accuracy of the Decision Tree classification algorithm for diabetes prediction
Predicted missing values in the dataset using a set of Linear Regression, Support Vector Regression, Decision Tree and Random Forest regression algorithms
Balanced the dataset using the SMOTE algorithm and then applied feature scaling
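
The balancing and scaling step above can be sketched with the standard library alone. This is not the thesis code: the function names smote and standardize, the toy two-feature data, the neighbour count and the choice of standardization as the scaling method are assumptions made for illustration. SMOTE creates each synthetic minority sample by interpolating between a minority point and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def standardize(rows):
    """Scale each feature to zero mean and unit variance (assumed scaling method)."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0  # avoid divide by zero
        for c, m in zip(columns, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical toy data: 6 majority samples, 3 minority samples
majority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.1], [1.8, 1.9], [2.2, 2.4]]
minority = [[5.0, 6.0], [5.5, 6.5], [6.0, 5.8]]

synthetic = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced = majority + minority + synthetic  # both classes now have 6 samples
scaled = standardize(balanced)
print(len(balanced), len(scaled))
```

In practice a library implementation would normally be used for this step; the sketch only shows the order of operations, balancing first and scaling second.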

Project Work

Title: CoWIN Vaccine Notifier (View on GitHub)

Languages: Python, Markdown