Technical Projects

Below is a selection of data science projects demonstrating my end-to-end capabilities, from raw data engineering to statistical modelling and stakeholder presentation.

1. Credit Card Customer Segmentation

Figure 1: Customer Segments Visualisation

Tools: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, Kneed
Status: Ongoing Research

The Challenge

Credit card companies struggle to personalise marketing or identify risk within diverse, unlabelled datasets. The challenge was to uncover natural customer groupings with unsupervised learning, revealing hidden behavioural patterns that can inform retention strategies.

The Solution

A full unsupervised learning pipeline was developed to discover meaningful customer segments. The approach included:

Applying multiple clustering algorithms
Identifying representative customer profiles for each segment

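Comparing several algorithms on the same data tests whether the segments hold up across methods. The representative profiles then translate each segment into recognisable customer behaviour.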
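One way to build a representative profile for each segment is to summarise every cluster by its feature medians and pick the customer closest to the cluster centroid. The sketch below assumes a DataFrame of features and an array of cluster labels from any fitted model. The function names `segment_profiles` and `representative_customers` are hypothetical, and using medians is a choice made here, not the project's documented method.

```python
import numpy as np
import pandas as pd


def segment_profiles(features: pd.DataFrame, labels: np.ndarray) -> pd.DataFrame:
    """Summarise each cluster by the median of every feature, plus its size."""
    grouped = features.groupby(labels)
    profiles = grouped.median()
    profiles["N_CUSTOMERS"] = grouped.size()  # aligned on the cluster label index
    return profiles


def representative_customers(scaled: np.ndarray, labels: np.ndarray) -> dict:
    """Map each cluster label to the row index of the customer nearest its centroid.

    Distances are Euclidean in the (already scaled) feature space.
    """
    reps = {}
    for cluster in np.unique(labels):
        idx = np.flatnonzero(labels == cluster)   # rows belonging to this cluster
        members = scaled[idx]
        centroid = members.mean(axis=0)
        dists = np.linalg.norm(members - centroid, axis=1)
        reps[int(cluster)] = int(idx[np.argmin(dists)])
    return reps
```

Medians are less sensitive than means to the heavy right skew typical of balance and payment columns. If the labels come from DBSCAN, the noise label `-1` is treated as its own group, so filter it out first if that is not wanted.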

Key Technical Achievements

Built a full clustering pipeline (K-Means, DBSCAN, Gaussian Mixture Models (GMM) and hierarchical clustering (HCA)) with robust preprocessing and scaling.
Identified K-Means as the best-performing model (silhouette score: 0.32); DBSCAN and GMM failed to form distinct clusters on this dataset.
Evaluated models using silhouette scores, inertia, PCA visualisations, and the Davies-Bouldin Index.
Produced clear customer segments with actionable business insights.
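The model comparison can be sketched as fitting all four algorithms on the same scaled data and scoring each with the silhouette score and the Davies-Bouldin index. This is a minimal sketch: the function name `compare_clusterings` and the hyperparameters (`k`, `eps`, `min_samples=5`) are illustrative assumptions, not the values used in the project. DBSCAN noise points (label `-1`) are excluded before scoring. A model that yields fewer than two clusters gets `NaN` scores, which is one way to flag the "failed to form distinct clusters" case.

```python
import numpy as np
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score
from sklearn.mixture import GaussianMixture


def compare_clusterings(X: np.ndarray, k: int = 4, eps: float = 0.5, seed: int = 42) -> dict:
    """Fit several clustering models on the same scaled data and score each one.

    Hyperparameters here are placeholders, not tuned values.
    """
    models = {
        "kmeans": KMeans(n_clusters=k, n_init=10, random_state=seed),
        "gmm": GaussianMixture(n_components=k, random_state=seed),
        "hca": AgglomerativeClustering(n_clusters=k, linkage="ward"),
        "dbscan": DBSCAN(eps=eps, min_samples=5),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        mask = labels != -1  # DBSCAN marks noise as -1; score clustered points only
        n_clusters = len(np.unique(labels[mask]))
        if n_clusters < 2 or n_clusters >= mask.sum():
            # Both metrics are undefined for a single cluster (or one point per cluster)
            results[name] = {"n_clusters": n_clusters,
                             "silhouette": np.nan, "davies_bouldin": np.nan}
            continue
        results[name] = {
            "n_clusters": n_clusters,
            "silhouette": silhouette_score(X[mask], labels[mask]),
            "davies_bouldin": davies_bouldin_score(X[mask], labels[mask]),
        }
    return results
```

Higher silhouette scores and lower Davies-Bouldin values indicate better-separated clusters. Reading the two together guards against a model that looks good on only one metric.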

Code Snapshot

A snippet demonstrating the hybrid dimensionality-reduction step: critical features are kept as they are, and PCA is applied only to the remaining features:


from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import numpy as np
import pandas as pd

# df3: the cleaned, numeric credit card DataFrame from earlier preprocessing steps

# 1. Retain critical features that are essential for risk profiling (ensures interpretability)
direct_features = ['ONEOFF_PURCHASES', 'BALANCE', 'PAYMENTS', 'CREDIT_LIMIT', 'CASH_ADVANCE']
remaining_features = [col for col in df3.columns if col not in direct_features]

# Standardise both feature groups so no column dominates because of its scale
scaled_direct_features = StandardScaler().fit_transform(df3[direct_features])
scaled_remaining_features = StandardScaler().fit_transform(df3[remaining_features])

# 2. Apply PCA only to the remaining (redundant) features
n_components = 6  # retained ~95% of the variance of these features in this analysis
pca = PCA(n_components=n_components)
principal_components = pca.fit_transform(scaled_remaining_features)

# 3. Combine critical features + PCA components
final_data = np.hstack([scaled_direct_features, principal_components])
final_df = pd.DataFrame(
    final_data,
    columns=direct_features + [f'PC{i+1}' for i in range(n_components)],
    index=df3.index,
)
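The claim that six components retain 95% of the variance can be checked against a PCA fitted with all components. The helper below (a hypothetical name, `components_for_variance`) returns the smallest number of leading components whose cumulative explained variance ratio reaches a given threshold. It is a sketch, assuming the ratios come from scikit-learn's `explained_variance_ratio_` attribute.

```python
import numpy as np


def components_for_variance(explained_variance_ratio, threshold: float = 0.95) -> int:
    """Smallest number of leading components whose cumulative explained variance >= threshold."""
    cumulative = np.cumsum(explained_variance_ratio)
    if cumulative[-1] < threshold:
        # Floating-point rounding can leave the total just under 1.0; keep every component
        return len(cumulative)
    # First index where the cumulative sum reaches the threshold, converted to a count
    return int(np.searchsorted(cumulative, threshold) + 1)
```

In use, this would look like `components_for_variance(PCA().fit(scaled_remaining_features).explained_variance_ratio_)`. As an alternative, scikit-learn's `PCA` also accepts a float such as `n_components=0.95` and selects the component count itself.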

2. Global Rehabilitation Analysis

Tools: Power BI, SQL, DAX
Status: Ongoing Research

The Challenge

Global health systems lack clear, accessible insight into rising rehabilitation needs. Disease burden varies widely across regions and conditions, yet policymakers struggle to interpret complex disability-adjusted life year (DALY) data and to identify where services are most needed. The challenge was to turn high-dimensional global health metrics into clear trends and to expose gaps in service provision through interactive Power BI dashboards.
