Multilingual Transliteration Model
(Python, mBART, OpenNMT, BLEU, chrF, TER)
- Fine-tuned Facebook's mBART model for transliteration across Indian languages, achieving BLEU 66.29, chrF 82.18, and TER 23.95.
- Processed and cleaned 300K+ entries for character-level modeling and optimized inference latency.
- Integrated preprocessing, validation, and evaluation pipelines with OpenNMT-py for reproducibility and deployment.
Toxic Comment Classifier
(Python, TF-IDF, CNN) | Link
- Built a toxic comment classifier combining TF-IDF features with a 1D CNN architecture, achieving 95% accuracy.
- Designed for low-latency, real-time inference, making it suitable for chat systems and forums.
- Surpassed several baselines in accuracy and F1 score through extensive tuning and error analysis.
COVID-19 Vaccine Analysis
(Matplotlib, Pandas, NumPy, Seaborn, Tableau, Excel) | Link
- Analyzed 7.6K+ rows across 24 attributes to uncover inequalities in COVID-19 vaccine distribution across Indian states.
- Used heatmaps, bar graphs, and box plots to identify state-wise gaps in dose administration and age-group prioritization.
- Recommended 3 regional policy changes, grounded in the analysis, to promote more equitable distribution.
Uber Dataset Analysis
(Python, Pandas, NumPy, Matplotlib, Seaborn) | Link
- Processed 29K+ Uber ride records across NYC to identify patterns in hourly demand and borough-level usage.
- Analyzed weather features such as temperature, wind speed, and precipitation for correlations with pickup rates.
- Optimized peak-hour prediction models, achieving 40% faster runtime than baseline approaches.