Machine Learning · FIB-UPC

Hypothyroid Classification

End-to-end binary classification pipeline — predict hypothyroidism from clinical data using 9 model families in sklearn. The interactive demos below run real trained models directly in your browser.

ML Pipeline
FIB-UPC · Machine Learning · with Marta Granero
ARFF data: hypothyroid.arff
Preprocess: impute, scale, encode
EDA: correlation, distributions
Models: 9 sklearn families
Evaluate: CV, ROC, confusion matrices

Logistic Regression · LDA / QDA · Naive Bayes · k-NN · SVM (linear + kernel) · MLP (neural nets) · Random Forest · Ridge / Lasso · Bayesian hyperparameter search
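
The preprocess, model, and evaluate stages above can be sketched as a single scikit-learn pipeline. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (age, TSH, sex) and the labeling rule are assumptions made for the example, and median imputation is just one reasonable choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (assumption): NOT the real hypothyroid.arff.
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "age": rng.normal(50, 15, n),
    "TSH": rng.lognormal(0.5, 1.0, n),
    "sex": rng.choice(["F", "M"], n),
})
# Hypothetical labeling rule, used only so the example has a signal to learn.
y = (X["TSH"] > 4.0).astype(int).to_numpy()
# Knock out some TSH values to mimic missing lab results.
X.loc[rng.choice(n, 30, replace=False), "TSH"] = np.nan

# Numeric columns: impute with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical columns: one-hot encode.
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "TSH"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("ROC-AUC per fold:", np.round(scores, 3))
```

Keeping imputation and scaling inside the pipeline means they are refit on each training fold, so no statistics leak from the validation fold. Swapping the "clf" step for another estimator would reuse the same preprocessing.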

Hypothyroid predictor

Real LogReg model · runs in browser

Age: 50 years
TSH: 2 µU/mL
TT4: 109 nmol/L
T3: 2 nmol/L
Prediction: Negative (healthy)
Logistic Regression · 97.3% confidence

k-NN on PCA projection

200 real test-set points · click to classify

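
As a rough sketch of what the demo describes, the snippet below projects standardized features onto two principal components and classifies a new 2D point with k-NN. The data is synthetic, and the clicked coordinates and k = 5 are hypothetical values chosen for illustration; the page's own implementation may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for 200 test-set points with 4 features (assumption).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(150, 4)),  # class 0
    rng.normal(2.5, 1.0, size=(50, 4)),   # class 1
])
y = np.array([0] * 150 + [1] * 50)

# Standardize, then project onto the first two principal components.
projector = make_pipeline(StandardScaler(), PCA(n_components=2))
Z = projector.fit_transform(X)

# Fit k-NN in the 2D projected space, which is the space a plot would show.
k = 5
knn = KNeighborsClassifier(n_neighbors=k).fit(Z, y)

# A hypothetical click, already expressed in PCA coordinates.
clicked = np.array([[0.0, 0.0]])
label = knn.predict(clicked)[0]
print("Predicted class for clicked point:", label)
```

Because the classifier is fit on the projected coordinates, a clicked point needs no inverse transform. The trade-off is that any information outside the first two components is ignored.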

Feature importance

Age: 0.11
TSH: 5.42
TT4: 0.76
T3: 0.12

Values are absolute coefficients (|coeff|) from Logistic Regression fitted on standardized features; TSH dominates the prediction.
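
One way to obtain importances of this kind is to standardize the features, fit a logistic regression, and take the absolute value of each coefficient. The sketch below does this on synthetic data; the generating weights are assumptions chosen so that TSH carries most of the signal, and the resulting numbers will not match the values listed above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

features = ["Age", "TSH", "TT4", "T3"]

# Synthetic data (assumption): per-feature scales and means are made up.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4)) * np.array([15.0, 3.0, 30.0, 0.8]) \
    + np.array([50.0, 3.0, 110.0, 2.0])
# Hypothetical generating model: TSH has the strongest effect, TT4 a weaker one.
logit = 1.5 * (X[:, 1] - 3.0) - 0.03 * (X[:, 2] - 110.0)
y = (logit + rng.logistic(size=n) > 0).astype(int)

# Standardizing puts all coefficients on a comparable scale.
X_std = StandardScaler().fit_transform(X)
clf = LogisticRegression(max_iter=1000).fit(X_std, y)

# Absolute coefficient per feature, sorted from largest to smallest.
importance = np.abs(clf.coef_[0])
for name, value in sorted(zip(features, importance), key=lambda p: -p[1]):
    print(f"{name}: {value:.2f}")
```

Without standardization the coefficients would depend on each feature's units, so their magnitudes could not be compared directly.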

Dataset summary

hypothyroid.arff: a UCI-style dataset with numeric attributes (age, TSH, T3, TT4, FTI, T4U) and categorical attributes. Target: binaryClass (P/N).
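
For readers unfamiliar with ARFF, the sketch below parses a tiny in-memory ARFF snippet with scipy.io.arff.loadarff and converts it to a pandas DataFrame. The three rows are invented for illustration, and using SciPy here is an assumption; the notebook may load the file differently.

```python
from io import StringIO

import pandas as pd
from scipy.io import arff

# Invented miniature ARFF file (assumption): not rows from hypothyroid.arff.
text = """@relation demo
@attribute age numeric
@attribute TSH numeric
@attribute sex {F,M}
@attribute binaryClass {P,N}
@data
41,1.3,F,P
?,4.1,M,N
70,?,F,P
"""

# loadarff returns a structured array plus metadata about each attribute.
data, meta = arff.loadarff(StringIO(text))
df = pd.DataFrame(data)

# Nominal attributes come back as bytes; decode them to regular strings.
nominal = [name for name, kind in zip(meta.names(), meta.types())
           if kind == "nominal"]
for col in nominal:
    df[col] = df[col].str.decode("utf-8")

# Missing numeric values ('?' in ARFF) are read as NaN.
print(df)
print(df.isna().sum())
```

To read the real file, pass its path to loadarff instead of the StringIO object.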

Challenges: many missing values (NaN), outliers in age, the dropped TBG attribute, and class imbalance. These are handled via imputation and metrics suited to imbalanced classes (F1, ROC-AUC).
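
To illustrate why F1 and ROC-AUC are preferred over plain accuracy under class imbalance, the sketch below compares a trivial majority-class baseline with a logistic regression on synthetic imbalanced data. The 92/8 class split is an assumption made for the example, not the real dataset's ratio.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (assumption): roughly 92% negatives, 8% positives.
X, y = make_classification(
    n_samples=2000, n_features=6, n_informative=3,
    weights=[0.92, 0.08], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# Baseline that always predicts the majority (negative) class.
baseline_pred = np.zeros_like(y_test)
baseline_acc = accuracy_score(y_test, baseline_pred)
baseline_f1 = f1_score(y_test, baseline_pred, zero_division=0)

# A real model, scored with metrics that account for the minority class.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
model_f1 = f1_score(y_test, clf.predict(X_test), zero_division=0)
model_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print(f"baseline accuracy={baseline_acc:.3f}, baseline F1={baseline_f1:.3f}")
print(f"model F1={model_f1:.3f}, model ROC-AUC={model_auc:.3f}")
```

The baseline scores high on accuracy while never detecting a positive case, which is why its F1 is zero. F1 and ROC-AUC expose that failure where accuracy hides it.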

Jupyter notebook preview

Run the notebooks locally
git clone https://github.com/cuberhaus/APA_Practica.git
cd APA_Practica
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
jupyter lab PracticaAPA-Hipotiroidismo-PolCasacubertaMartaGranero.ipynb