Coursework & Exploratory Projects

  • Benchmarked unsupervised ML algorithms such as K-Means and Hierarchical Agglomerative Clustering on GenAI-generated datasets, removing outliers with the interquartile range (IQR) rule to improve pattern discovery.
  • Achieved top silhouette scores with K-Means on Dataset 1 (0.359, 3 clusters) and Dataset 2 (0.625, 2 clusters), and with Agglomerative Clustering on Dataset 3 (0.984, 2 clusters).
  • Performed data engineering, wrangling, and cleaning on a dataset with 56 columns and 202,760 tuples, appending suitable columns and applying appropriate imputation strategies, followed by exploratory data analysis.
  • Designed research-informed experiments for two supervised machine learning approaches: multivariate linear regression on daily mortality rates, achieving an R² score of 0.974 and an MSE of 0.0003, and a decision tree classifier for daily policy effectiveness, achieving an F1 score of 0.984.
  • Conducted binary sentiment classification on the Large Movie Review Dataset, developed by the Stanford NLP research group, which provides 25,000 movie reviews each for training and testing, along with additional unlabeled data.
  • For preprocessing, converted each review text to lowercase and extracted word tokens, removed stopwords and kept only alphabetic tokens for the language of the text (English), then reduced the tokens to their root forms using lemmatization. Built a trigram Bag of Words model from the resulting tokens on the training set and used it to transform the test set.
  • Compared the classification reports of an MLP classifier and an SGD model, finding that the MLP outperformed SGD on all metrics except recall for positive reviews.
  • Implemented a healthcare management system in Django and MySQL with CRUD operations for each entity, dedicated management and reporting views, and a user guide for clinical workflows.
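
The preprocessing steps listed above for the sentiment classification project can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the project's actual code: the stopword list and the function names (tokenize, trigrams, fit_vocabulary, transform) are hypothetical, "trigram model" is read here as sequences of three consecutive tokens, and lemmatization is left out because it needs an external NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the real list is not given).
STOPWORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "it", "this"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def trigrams(tokens):
    """Return every run of three consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]


def fit_vocabulary(train_texts):
    """Map each trigram seen in the training texts to a column index."""
    vocab = {}
    for text in train_texts:
        for gram in trigrams(tokenize(text)):
            vocab.setdefault(gram, len(vocab))
    return vocab


def transform(texts, vocab):
    """Turn texts into count vectors; trigrams unseen in training are ignored."""
    rows = []
    for text in texts:
        counts = Counter(g for g in trigrams(tokenize(text)) if g in vocab)
        row = [0] * len(vocab)
        for gram, count in counts.items():
            row[vocab[gram]] = count
        rows.append(row)
    return rows


train = ["The movie was really very good", "This film was not good at all"]
vocab = fit_vocabulary(train)                 # vocabulary comes from train only
test_matrix = transform(["A really very good movie"], vocab)
```

Fitting the vocabulary on the training set only, and reusing it unchanged for the test set, keeps test-set vocabulary from leaking into the features.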
Open Source Contribution PR
  • Added opt-in selected-count display to react-native-multiple-select (500k+ downloads), keeping backward compatibility for existing apps.
DataCamp Data Scientist Code
  • Completed 6 projects, 23 courses, and 3 assessments; archived notebooks and exercises in Python.
Competitive Programming Code
  • Solved LeetCode competitive programming coursework in Python with pattern-focused implementations.
  • Conducted data wrangling and exploratory analysis across four ML mini-projects; documented results and code.
Waabec Hybrid App Report
  • Feasibility study for a hybrid language-training app serving sector-specific curricula across Spain.
Covid-19 Dashboard Code
  • A dashboard showing the number of active Covid-19 cases, deaths, and recovered patients, fetching data through a REST API from https://ncov2019-admin.firebaseapp.com, with offline support via shared preferences and structured error handling.
Minor in Computational Mathematics Code
  • Implemented time-series smoothing and ANOVA experiments for coursework in applied statistics and experimental design.