Projects

A showcase of my key projects and analyses.

Multilingual Transliteration Model

(Python, mBART, OpenNMT, BLEU, CHRF, TER)

Fine-tuned Facebook's mBART model for Indian language transliteration, achieving BLEU: 66.29, CHRF: 82.18, TER: 23.95.
Processed and cleaned 300K+ entries, applied character-level modeling, and optimized inference latency.
Integrated preprocessing, validation, and evaluation pipelines with OpenNMT-py for reproducibility and deployment.
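
To illustrate what a character n-gram score such as CHRF measures, here is a minimal sketch. It is a simplified variant written for illustration, not the evaluation code used in this project and not a reference implementation; the function names, the choice of `max_n=3`, and the example strings are all assumptions.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams of length n, ignoring spaces."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis, reference, max_n=3, beta=2.0):
    """Simplified character n-gram F-score on a 0-100 scale.

    Precision and recall are averaged over n-gram orders 1..max_n,
    then combined into an F-beta score (beta=2 weights recall higher).
    """
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(overlap / sum(hyp.values()))
        recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# Hypothetical transliteration outputs compared against a reference.
exact = simple_chrf("namaste", "namaste")
partial = simple_chrf("namste", "namaste")
```

An exact match scores 100, a string with no shared characters scores 0, and near misses such as a dropped vowel land in between, which is why character-level scores suit transliteration better than word-level ones.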

Toxic Comment Classifier

(Python, TF-IDF, CNN) | Link

Built a toxic comment classifier using TF-IDF features and a 1D CNN, achieving 95% accuracy.
Designed for real-time deployment with low-latency inference, suitable for chat systems and forums.
Surpassed several baselines in accuracy and F1 score through extensive tuning and error analysis.
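
The TF-IDF feature step can be sketched in plain Python as follows. This covers only the weighting, not the CNN, and it is not the project's actual code: the whitespace tokenizer, the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1`, and the sample comments are assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = term count / document length
    idf = ln((1 + N) / (1 + df)) + 1   (a common smoothed variant)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({term: (count / total) * idf[term]
                        for term, count in counts.items()})
    return vectors


# Hypothetical sample comments.
comments = ["you are great", "you are awful", "great work"]
vectors = tfidf(comments)
```

In the second comment, "awful" appears in only one document, so it receives a higher weight than "you", which appears in two. Rare, distinctive terms standing out in this way is the property that makes TF-IDF useful as classifier input.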

COVID-19 Vaccine Analysis

(Matplotlib, Pandas, NumPy, Seaborn, Tableau, Excel) | Link

Analyzed 7.6K+ rows across 24 attributes to uncover COVID-19 vaccine distribution inequalities in Indian states.
Used heatmaps, bar graphs, and box plots to detect state-wise gaps in dose administration and age-wise prioritization.
Recommended 3 regional policy changes based on data-backed insights to ensure equitable distribution.
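
As a rough illustration of how a box plot surfaces state-wise gaps, the sketch below normalizes doses by population and flags states that fall below the lower whisker (Q1 minus 1.5 times the interquartile range). The state names and figures are made-up placeholders, not values from the dataset, and the function names are hypothetical.

```python
import statistics


def doses_per_thousand(records):
    """Map each state to doses administered per 1,000 residents."""
    return {state: 1000 * doses / population
            for state, doses, population in records}


def low_outliers(rates):
    """Return states below the box-plot lower fence, Q1 - 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(rates.values(), n=4)
    lower_fence = q1 - 1.5 * (q3 - q1)
    return sorted(state for state, rate in rates.items()
                  if rate < lower_fence)


# Placeholder records: (state, doses administered, population).
records = [
    ("State A", 620_000, 1_000_000),
    ("State B", 1_160_000, 2_000_000),
    ("State C", 300_000, 500_000),
    ("State D", 1_830_000, 3_000_000),
    ("State E", 450_000, 1_500_000),
    ("State F", 590_000, 1_000_000),
]

rates = doses_per_thousand(records)
lagging = low_outliers(rates)
```

Normalizing by population first matters here: raw dose counts would mostly rank states by size rather than by coverage.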

Uber Dataset Analysis

(Python, Pandas, NumPy, Matplotlib, Seaborn) | Link

Processed 29K+ Uber ride records across NYC to extract patterns in hourly demand and borough-wise usage.
Analyzed weather-related features like temperature, wind speed, and precipitation to find correlations with pickup rates.
Optimized peak-hour prediction models, achieving a 40% faster runtime than baseline approaches.
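
The two core steps of this analysis, counting pickups per hour and correlating a weather feature with pickup volume, can be sketched with the standard library alone. This is an illustrative sketch rather than the project's Pandas code; the timestamp format, function names, and sample values are assumptions.

```python
import math
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count pickups per hour of day from 'YYYY-MM-DD HH:MM:SS' strings."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
             for ts in timestamps)
    return Counter(hours)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up sample values for illustration only.
pickups_by_hour = hourly_counts([
    "2015-01-01 08:15:00",
    "2015-01-01 08:45:00",
    "2015-01-01 17:05:00",
])
temperature = [2.0, 5.0, 9.0, 14.0]
pickups = [120, 150, 190, 240]
r = pearson(temperature, pickups)
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0 indicates little linear association. It does not by itself show that weather causes the change in demand.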