Available for opportunities

Data Science Explorer & ML Engineer

Specializing in end-to-end machine learning systems — from data preprocessing and imbalance handling to optimized model deployment via FastAPI and Docker. Turning complex data into business impact.

Shreevarsha S
● Open to Work · Chennai, TN
0.849
ROC-AUC Score
16% → 61%
Negative Recall ↑
100K+
Records Processed
20%
Model Performance ↑
Tindivanam, Tamil Nadu
BE · Electronics & Communication
Data Science Intern · VCodez
About Me

Turning Data into Meaningful Insights

Hi! I'm Shreevarsha S, a Data Science fresher with internship experience who is passionate about solving real-world problems through data. I enjoy finding patterns in complex datasets and turning them into decisions that create measurable business impact.

With experience across the full ML lifecycle — from EDA and feature engineering to model evaluation and deployment — I've built projects spanning NLP sentiment analysis, customer churn prediction, and KPI dashboards. I'm proficient in Python, FastAPI, Docker, Power BI, and Streamlit, constantly pushing to grow at the intersection of data and product.

Machine Learning NLP FastAPI Docker Power BI ETL Pipelines Data Visualization
Download CV
What I Know

Technical Skills

Programming & Querying

Python (Pandas, NumPy), SQL (MS SQL Server, SQLite)

Machine Learning

Scikit-learn, Logistic Regression, Decision Tree, Random Forest, XGBoost, NLP, TF-IDF, Predictive Modeling, Feature Engineering, SHAP, Cross-Validation

Data Engineering

ETL Pipelines, Data Cleaning, Data Validation, Data Quality Checks, Class Imbalance Handling

Tools & Deployment

FastAPI, Docker, Joblib, Git, GitHub, Azure (basic), PyTorch/TensorFlow (basic)

Work History

Work Experience

VCodez
Chennai, Tamil Nadu
Sep 2025 – Jan 2026
Python ML Streamlit Power BI
Data Science Intern
Full-time internship · On-site
  • Processed and engineered structured datasets of more than 100,000 records using Python-based preprocessing and feature engineering, supporting end-to-end machine learning pipelines and predictive modeling workflows.
  • Conducted exploratory data analysis (EDA) to identify behavioral trends and operational risk indicators, translating findings into actionable business insights.
  • Developed and evaluated machine learning models using Scikit-learn (Logistic Regression, Decision Tree, Random Forest), improving model performance by 15–20% through hyperparameter tuning and cross-validation.
  • Developed an interactive analytics application using Streamlit to monitor model outcomes and 6 KPIs, improving reporting reliability and efficiency by 20%.
What I've Built

Featured Projects

Oct 2025 – Nov 2025

Sentiment Analysis using NLP

Developed an NLP sentiment analysis pipeline for 4,900+ Amazon product reviews using TF-IDF vectorization with bigram (n-gram) features and Logistic Regression. Implemented text preprocessing (regex cleaning, tokenization, lemmatization) and handled severe class imbalance (90% vs 10%) using class-weighted training, improving negative-sentiment recall from 22% to 72%. Conducted sentiment keyword analysis to identify major drivers of customer dissatisfaction.

82% Accuracy 0.95 F1-Score Recall 22%→72%
NLP TF-IDF Logistic Regression Class Imbalance N-gram 5-Fold CV
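A minimal sketch of the class-weighting idea described above, written in plain Python rather than taken from the project code. It assumes the common "balanced" heuristic (weight = n_samples / (n_classes * class_count)); the toy labels and the helper names `balanced_class_weights` and `recall` are hypothetical and only mirror the 90% vs 10% split.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def recall(y_true, y_pred, target_label):
    """Fraction of samples with `target_label` that were predicted as such."""
    hits = sum(1 for t, p in zip(y_true, y_pred)
               if t == target_label and p == target_label)
    actual = sum(1 for t in y_true if t == target_label)
    return hits / actual if actual else 0.0


# Hypothetical toy labels with a 90% / 10% split (not the project's data).
labels = ["positive"] * 90 + ["negative"] * 10
weights = balanced_class_weights(labels)
print(weights)
```

With these weights, an error on a minority-class review costs more during training than an error on a majority-class review, which is the usual reason recall on the rare class improves.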
Dec 2025 – Jan 2026

Adaptive Intelligence Engine for Predicting Human Skill Evolution

🏢 Internship Final Project

Developed a full ML pipeline to predict student skill evolution on digital platforms using academic and behavioral data. Compared Ridge Regression (R² 0.86) with Random Forest Regression (R² 0.91), indicating that the Random Forest captures non-linear learning patterns better. Saved the model with Joblib and built a Power BI dashboard for insights.

R² 0.9133 MAE 7.20 Power BI
Random Forest Ridge Regression Feature Engineering EDA Joblib Power BI
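For readers unfamiliar with the two metrics reported above, here is a small sketch of how R² and MAE are computed. It is illustrative only: the numbers are hypothetical, not the project's predictions, and the function names are placeholders.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """1 - SS_res / SS_tot; assumes y_true is not constant."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical skill scores and model predictions.
y_true = [50.0, 60.0, 70.0, 80.0]
y_pred = [52.0, 58.0, 73.0, 79.0]

mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(mae, r2)
```

Computing both metrics for each candidate model on the same held-out data is one straightforward way to compare a linear model with a tree ensemble.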
Nov 2025

Customer Churn Prediction & Deployment

Developed a telecom attrition prediction system for 7,043 customers, addressing a 26.6% class imbalance. Implemented an ETL pipeline, compared Logistic Regression, Random Forest, and XGBoost, applied SHAP explainability analysis to identify key churn drivers, and deployed a FastAPI inference service with Docker containerization and Joblib model persistence.

ROC-AUC 0.836 FastAPI + Docker
ETL Pipeline Random Forest XGBoost SHAP FastAPI Docker Power BI DAX
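The ROC-AUC figure above can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below computes it that way in plain Python; the labels and scores are hypothetical, and the pairwise loop is chosen for clarity, not speed.

```python
def roc_auc(y_true, scores):
    """Pairwise ROC-AUC: share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes labels are 0/1 and that
    both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels (1 = churned) and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)
```

Because it depends only on ranking, this metric stays informative when the classes are imbalanced, which is presumably why it is reported for the churn model.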
Nov 2025

Customer Segmentation & Behavioral Analysis using K-Means

Segmented customers into 4 distinct groups using K-Means clustering to support targeted marketing and personalized business strategies. Discovered income- and age-based customer patterns, improving campaign targeting efficiency by ~20%.

4 Clusters ~20% Efficiency ↑
Python EDA K-Means Clustering Power BI
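As an illustration of the clustering step, here is a bare-bones version of Lloyd's algorithm, the standard K-Means procedure. It is a sketch under stated assumptions: the (age, income) points are hypothetical, the starting centroids are fixed by hand, and only 2 clusters are used to keep the example small, whereas the project used 4.

```python
import math


def kmeans(points, centroids, n_iter=10):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# Hypothetical (age, income in thousands) pairs; real features would
# normally be scaled before clustering.
points = [(25, 30), (27, 32), (45, 80), (47, 82)]
labels, centroids = kmeans(points, centroids=[(25, 30), (47, 82)])
print(labels, centroids)
```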
Oct 2025

Breast Cancer Wisconsin Dataset using Logistic Regression

Built a binary classification model using the Wisconsin Breast Cancer dataset to distinguish malignant from benign tumors. Applied StandardScaler and an 80/20 train-test split, then evaluated with a confusion matrix and classification report, achieving ~97% accuracy.

~97% Accuracy High Precision & Recall
Python Logistic Regression StandardScaler Confusion Matrix Classification Report
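The preprocessing described above can be sketched in plain Python as follows. This is an illustration of the general pattern (split first, then fit the scaling statistics on the training portion only), not the project's code; the single feature column and the helper names are hypothetical.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle with a fixed seed and hold out `test_fraction` of the rows."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]


def fit_scaler(values):
    """Mean and population standard deviation of the training values."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize values to zero mean and unit variance."""
    return [(v - mean) / std for v in values]


# Hypothetical single feature column with 10 samples.
values = [float(v) for v in range(1, 11)]

train, test = train_test_split(values)  # 80/20 split
mean, std = fit_scaler(train)           # statistics from training data only
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
print(len(train), len(test))
```

Fitting the scaler on the training split alone keeps information from the test rows out of the model, so the reported accuracy reflects unseen data.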
Credentials

Certificates

Data Science Intern Certificate

VCodez
January 2026
Data Science · Python · Pandas · NumPy
View Certificate

GenAI Powered Data Analytics Job Simulation

Forage (Tata)
August 2025
GenAI · EDA · Business Report
View Certificate

Python 101 for Data Science

IBM · Cognitive Class
February 2026
Python · Pandas · NumPy · Matplotlib
View Certificate

TCS MasterCraft DataPlus Overview Course

TCSiON
December 2022
Data Privacy · Data Quality · Data Modeling
View Certificate

Excel Dashboard for Beginners

SkillUP by Simplilearn
September 2022
Data Analysis · Reporting · Excel Dashboards
View Certificate
Academic Background

Education

BE · Electronics & Communication Engineering
Sri Sivasubramaniya Nadar College of Engineering, Chennai
Nov 2020 – Apr 2025 · CGPA: 7.45 / 10
Get In Touch

Let's Connect

I'm open to full-time Data Science roles and ML engineering opportunities. Whether you have a role in mind or just want to say hi — feel free to reach out!