A minimal example for the ADNI application
DtypeWarning: Columns (11,12,13,14,15,16,17,18,19,20) have mixed types. Specify dtype option on import or set low_memory=False.
<class 'pandas.core.frame.DataFrame'>
Index: 821 entries, 94 to 9695
Columns: 348 entries, PHASE to MMSCORE
dtypes: float64(326), int64(2), object(20)
memory usage: 2.2+ MB
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.634e+03, tolerance: 5.668e+01
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.834e+03, tolerance: 5.668e+01
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.837e+03, tolerance: 5.668e+01
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.634e+03, tolerance: 5.668e+01
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.834e+03, tolerance: 5.668e+01
ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 3.837e+03, tolerance: 5.668e+01
np.int64(7)
import os
import subprocess
import pandas as pd
from io import StringIO
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import EarlyStopping as es
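# The passphrase for the encrypted data set is read from the environment.
password = os.environ.get("PASSWORD")
if password is None:
    raise RuntimeError("Set the PASSWORD environment variable to decrypt the data.")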
result = subprocess.run(
    [
        "gpg",
        "--batch",
        "--yes",
        "--pinentry-mode", "loopback",  # <-- important in CI
        "--passphrase",
        password,
        "--decrypt",
        "data/ADNI_data.gpg",
    ],
    capture_output=True,
    check=True,
    text=True,  # so result.stdout is already str
)
merged_data = pd.read_csv(StringIO(result.stdout))
# Boosting example for PHASE = ADNI1
min_example_data = merged_data[merged_data["PHASE"] == "ADNI1"]
# Keep only the first record per subject (RID).
min_example_data = min_example_data.drop_duplicates(subset=["RID"], keep="first")
min_example_data.info()
min_example_data.to_csv("min_example_data.csv", index=False)
response = min_example_data["MMSCORE"].to_numpy()
np.isnan(response).any()
first_covariate_location = min_example_data.columns.get_loc("ST101SV")
last_covariate_location = min_example_data.columns.get_loc("ST155SV")
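# Note: iloc slicing excludes the end position, so the ST155SV column itself is not part of the design.
design = min_example_data.iloc[:, first_covariate_location:last_covariate_location].to_numpy()
# Replace missing covariate values by 0.
design = np.nan_to_num(design, nan=0)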
# Confirm that no missing values remain in the design matrix.
np.isnan(design).any()
alg = es.L2_boost(design, response)
alg.iterate(300)
# Discrepancy stop
noise_estimate = alg.get_noise_estimate(K=1)
stopping_time = alg.get_discrepancy_stop(critical_value=noise_estimate, max_iteration=300)
stopping_time
# Early stopping via residual ratios
stopping_time = alg.get_residual_ratio_stop(max_iteration=200, K=1.2)
stopping_time
stopping_time = alg.get_residual_ratio_stop(max_iteration=200, K=0.2)
stopping_time
stopping_time = alg.get_residual_ratio_stop(max_iteration=200, K=0.1)
stopping_time
# Classical model selection via AIC
aic_minimizer = alg.get_aic_iteration(K=2)
aic_minimizer
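
The discrepancy stop used in the script can be read as a first-crossing rule: stop at the first iteration whose residual falls to or below the critical value. The following is a minimal sketch of that idea in plain Python, not the EarlyStopping implementation. The function name `first_crossing_stop`, the toy residual values, and the convention of returning `None` when the threshold is never reached are all assumptions made for illustration.

```python
def first_crossing_stop(residuals, critical_value, max_iteration):
    """Return the first iteration m <= max_iteration with residuals[m] <= critical_value.

    Hypothetical helper for illustration only; returns None if the threshold
    is not reached within the allowed number of iterations.
    """
    last = min(max_iteration, len(residuals) - 1)
    for m in range(last + 1):
        if residuals[m] <= critical_value:
            return m
    return None


# Toy residual sequence (made-up numbers): entry m is the residual after m iterations.
toy_residuals = [10.0, 6.0, 3.5, 2.0, 1.2]

print(first_crossing_stop(toy_residuals, critical_value=3.0, max_iteration=4))
print(first_crossing_stop(toy_residuals, critical_value=1.0, max_iteration=4))
```

With these toy values the first call prints 3 and the second prints None, because the residuals never drop to 1.0. A larger critical value, for example a larger noise estimate, therefore leads to an earlier stop.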
Total running time of the script: (0 minutes 8.842 seconds)