Specializing in end-to-end machine learning systems — from data preprocessing and imbalance handling to optimized model deployment via FastAPI and Docker. Turning complex data into business impact.
Hi! I'm Shreevarsha S, a Data Science fresher with internship experience and a passion for solving real-world problems through data. I enjoy finding patterns in complex datasets and turning them into decisions that create measurable business impact.
With experience across the full ML lifecycle, from EDA and feature engineering to model evaluation and deployment, I've built projects spanning NLP sentiment analysis, customer churn prediction, and KPI dashboards. I'm proficient in Python, FastAPI, Docker, Power BI, and Streamlit, and I'm always looking to grow at the intersection of data and product.
Python (Pandas, NumPy), SQL (MS SQL Server, SQLite)
EDA, Data Cleaning, Statistical Analysis, Matplotlib, Seaborn, Power BI (DAX, KPI Reporting), Streamlit
Scikit-learn, Logistic Regression, Decision Tree, Random Forest, XGBoost, NLP, TF-IDF, Predictive Modeling, Feature Engineering, SHAP, Cross-Validation
ETL Pipelines, Data Cleaning, Data Validation, Data Quality Checks, Class Imbalance Handling
FastAPI, Docker, Joblib, Git, GitHub, Azure (basic), PyTorch/TensorFlow (basic)
Developed an NLP sentiment analysis pipeline analyzing 4,900+ Amazon product reviews using TF-IDF vectorization with bigram features and Logistic Regression. Implemented text preprocessing (regex cleaning, tokenization, lemmatization) and handled severe class imbalance (90% vs 10%) using class-weighted training, improving negative sentiment recall from 22% to 72%. Conducted sentiment keyword analysis to identify major drivers of customer dissatisfaction.
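The class-weighted training mentioned above can be illustrated with a minimal sketch of the common "balanced" weighting heuristic, where each class gets weight n_samples / (n_classes * class_count). The label counts below are hypothetical round numbers chosen to match a 90/10 split; they are not the project's real data, and the project may have used different weights.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label list with a 90/10 split (illustrative only).
labels = ["positive"] * 4410 + ["negative"] * 490
weights = balanced_class_weights(labels)
print(weights)
```

Under this heuristic the minority class is weighted nine times more heavily than the majority class, so misclassified negative reviews cost more during training. scikit-learn applies the same formula when `class_weight="balanced"` is passed to `LogisticRegression`.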
Developed a full ML pipeline to predict student skill evolution on digital platforms using academic and behavioral data. Compared Ridge Regression (R² 0.86) with Random Forest Regression (R² 0.91); the higher score suggests Random Forest captured non-linear learning patterns better. Saved the model with Joblib and built a Power BI dashboard for insights.
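For readers unfamiliar with the R² metric used in that comparison, here is a small sketch of how it is computed: one minus the ratio of the residual sum of squares to the total sum of squares. The function name `r_squared` and the toy values are illustrative assumptions, not taken from the project.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy targets (hypothetical values).
y_true = [2.0, 4.0, 6.0, 8.0]

perfect = r_squared(y_true, y_true)      # predictions match the targets exactly
baseline = r_squared(y_true, [5.0] * 4)  # always predicting the mean of y_true
print(perfect, baseline)
```

A model that always predicts the mean scores 0 and a perfect model scores 1, so values of 0.86 and 0.91 both indicate that most of the variance in the target is explained.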
Developed a telecom attrition prediction system for 7,043 customers, addressing a 26.6% class imbalance. Implemented an ETL pipeline and compared Logistic Regression, Random Forest, and XGBoost. Applied SHAP explainability analysis to identify key churn drivers, and deployed a FastAPI inference service with Docker containerization and Joblib model persistence.
Segmented customers into 4 distinct groups using K-Means clustering to support targeted marketing and personalized business strategies. Discovered income- and age-based customer patterns, improving campaign targeting efficiency by ~20%.
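The segmentation approach described above can be sketched roughly as follows, assuming scikit-learn and NumPy are available. The age and income values are synthetic, generated around four hypothetical centers purely for illustration; the project's actual features and preprocessing may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic (age, income) points around four hypothetical customer profiles.
centers = np.array([[25, 30_000], [30, 90_000], [55, 40_000], [60, 110_000]])
X = np.vstack([c + rng.normal(scale=[3, 5_000], size=(50, 2)) for c in centers])

# Scale first so income (large values) does not dominate the distance metric.
X_scaled = StandardScaler().fit_transform(X)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Report the size and average profile of each segment in original units.
for k in range(4):
    members = X[labels == k]
    print(k, len(members), members.mean(axis=0).round(1))
```

Scaling before clustering matters here because K-Means relies on Euclidean distance, and unscaled income would otherwise outweigh age by several orders of magnitude.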
Built a binary classification model using the Wisconsin Breast Cancer dataset to distinguish malignant from benign tumors. Applied StandardScaler and an 80/20 train-test split, then evaluated with a confusion matrix and classification report, achieving ~97% accuracy.
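A workflow like the one above might look like the following sketch. It assumes the diagnostic variant of the dataset bundled with scikit-learn and uses Logistic Regression as a stand-in classifier, since the model type is not stated here; the random seed and stratified split are also assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Bundled Wisconsin diagnostic data: label 0 = malignant, 1 = benign.
X, y = load_breast_cancer(return_X_y=True)

# 80/20 split; stratify keeps the class ratio the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The pipeline fits the scaler on training data only, avoiding leakage.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

Wrapping the scaler and classifier in one pipeline keeps test-set statistics out of the scaling step, which makes the reported accuracy a fairer estimate.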
I'm open to full-time Data Science roles and ML engineering opportunities. Whether you have a role in mind or just want to say hi — feel free to reach out!