libuplift.datasets.pbc

The pbc dataset from the R survival package.

Functions

fetch_pbc([data_home, download_if_missing, ...])

Load the pbc dataset from the R survival package (uplift survival).

Module Contents

libuplift.datasets.pbc.fetch_pbc(data_home=None, download_if_missing=True, random_state=None, shuffle=False, categ_as_strings=False, return_X_y=False, as_frame=False)

Load the pbc dataset from the R survival package (uplift survival).

Download it if necessary.

Only the first 312 records, which have an assigned treatment, are kept.

Following the original dataset, the edema variable is numerical, but it can also be treated as categorical: 0 = no edema, 0.5 = untreated or successfully treated, 1 = edema despite diuretic therapy.
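A minimal sketch of treating edema as categorical, assuming the three codes listed above are the only values that occur. The names `EDEMA_LABELS` and `edema_to_category` are illustrative and not part of libuplift.

```python
# Labels copied from the description above; the dict name is hypothetical.
EDEMA_LABELS = {
    0.0: "no edema",
    0.5: "untreated or successfully treated",
    1.0: "edema despite diuretic therapy",
}


def edema_to_category(values):
    """Map numerical edema codes to their categorical labels.

    Raises KeyError for a code outside {0, 0.5, 1}.
    """
    return [EDEMA_LABELS[float(v)] for v in values]


# Toy codes for illustration, not taken from the dataset.
print(edema_to_category([0, 0.5, 1]))
```

Whether the `categ_as_strings` option affects edema is not stated here, so an explicit mapping like this one keeps the encoding under the caller's control.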

Variables

chol, copper, trig and platelet contain missing data.
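Before fitting a model it can help to check how many entries are missing in each of these columns. The sketch below assumes missing entries are encoded as NaN, which this page does not confirm. The helper `count_missing` and the toy values are illustrative only.

```python
import math


def count_missing(columns):
    """Return the number of NaN entries for each column name."""
    return {
        name: sum(1 for v in values if isinstance(v, float) and math.isnan(v))
        for name, values in columns.items()
    }


# Made-up values for illustration; these are not real pbc records.
toy_columns = {
    "chol": [261.0, float("nan"), 176.0],
    "copper": [156.0, 54.0, float("nan")],
    "trig": [float("nan"), float("nan"), 55.0],
    "platelet": [190.0, 221.0, 151.0],
}

missing = count_missing(toy_columns)
print(missing)
```

With `as_frame=True`, the same check would normally be done directly on the returned pandas DataFrame.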

Parameters:

data_home (string, optional): Specify another download and cache folder for the datasets. By default all scikit-learn data is stored in ‘~/scikit_learn_data’ subfolders.
download_if_missing (boolean, default=True): If False, raise an IOError if the data is not locally available instead of trying to download the data from the source site.
random_state (int, RandomState instance or None, default=None): Determines random number generation for dataset shuffling. Pass an int for reproducible output across multiple function calls.
shuffle (bool, default=False): Whether to shuffle the dataset.
categ_as_strings (bool, default=False): Whether to return categorical variables as strings.
return_X_y (boolean, default=False): If True, returns (data, target_time, target_status) instead of a Bunch object.
as_frame (boolean, default=False): If True, features are returned as a pandas DataFrame. If False, features are returned as an object or float array. A float array is returned if all features are floats.
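A typical call might look like the sketch below. It uses only the keyword arguments documented above and assumes the tuple order given in the Returns section. The wrapper name `load_pbc_arrays` is hypothetical.

```python
def load_pbc_arrays(seed=0):
    """Fetch pbc as a (data, target_time, target_status) tuple."""
    # Imported lazily so this sketch can be read without libuplift installed.
    from libuplift.datasets.pbc import fetch_pbc

    data, target_time, target_status = fetch_pbc(
        shuffle=True,       # shuffle the records
        random_state=seed,  # fixed seed for a reproducible order
        return_X_y=True,    # return a tuple instead of a Bunch
    )
    return data, target_time, target_status
```

Passing an int as `random_state` keeps the shuffled order identical across calls, which is useful when the data is later split into training and test sets.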

Returns:

dataset (dict-like object) with the following attributes:
dataset.data (numpy array): Each row holds the feature values of one record.
dataset.target_status (numpy array): Censoring status: 0=censored, 1=transplant, 2=dead.
dataset.target_time (numpy array): Censoring, transplant or death time.
dataset.DESCR (string): Description of the dataset.
(data, target_time, target_status) (tuple): Returned instead of dataset if return_X_y is True.
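Many survival estimators expect a binary event indicator rather than a three-level status. The sketch below shows one possible convention, counting only death (status 2) as the event and treating transplant as censored. That is a modelling choice and not something this function prescribes. The constant and helper names are illustrative.

```python
# Status codes as documented for target_status.
STATUS_CENSORED, STATUS_TRANSPLANT, STATUS_DEAD = 0, 1, 2


def death_event_indicator(target_status):
    """Return True where the record ended in death, False otherwise.

    Transplant (1) is treated like censoring (0); this is an assumption
    of the sketch, and other analyses may handle transplant differently.
    """
    return [int(s) == STATUS_DEAD for s in target_status]


# Toy status codes for illustration, not taken from the dataset.
print(death_event_indicator([0, 1, 2, 2]))
```

The resulting list can be paired with `target_time` to form the (event, time) input most survival libraries expect.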
