libuplift.datasets.IST
======================

.. py:module:: libuplift.datasets.IST

.. autoapi-nested-parse::

   The International Stroke Trial dataset.

   This is a randomized clinical trial of heparin and aspirin treatment
   for stroke patients.

   This dataset is derived from the corrected dataset available here:
   https://datashare.ed.ac.uk/handle/10283/128
   The webpage contains detailed descriptions.

   This version only includes pre-randomization variables, two targets,
   and several additional targets related to side effects.

   ..
       !! processed by numpydoc !!


Functions
---------

.. autoapisummary::

   libuplift.datasets.IST.fetch_IST


Module Contents
---------------

.. py:function:: fetch_IST(include_pilot=True, include_location_vars=True, include_prediction_model_vars=True, data_home=None, download_if_missing=True, random_state=None, shuffle=False, categ_as_strings=False, return_X_y=False, as_frame=False)

   
   Load the International Stroke Trial (IST) dataset.

   Download it if necessary.

   This is a randomized clinical trial of heparin and aspirin treatment
   for stroke patients.

   This dataset is derived from the corrected dataset available here:
   https://datashare.ed.ac.uk/handle/10283/128
   The webpage contains detailed descriptions.

   This version only includes pre-randomization variables, two main
   targets, and several additional targets related to side effects.

   The two main targets are:

   - target_ID14 - death after 14 days
   - target_OCCODE - outcome after 6 months.  The original study used
     ("dead" or "dependent") as the outcome of interest

   Additionally, there are 9 targets describing side effects at 14
   days: target_H14, target_ISC14, target_NK14, target_STRK14,
   target_HTI14, target_PE14, target_DVT14, target_TRAN14,
   target_NCB14
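
   The original study's outcome of interest can be derived from
   target_OCCODE by grouping its levels.  The following is a minimal
   sketch, not part of libuplift: it assumes the recoded levels include
   the strings "dead" and "dependent" and that missing status is coded
   as "NA".  The helper ``is_poor_outcome`` and the level name
   "recovered" are hypothetical and should be checked against the
   loaded data.

   .. code-block:: python

       # Levels treated as the outcome of interest in the original study.
       POOR_OUTCOME_LEVELS = {"dead", "dependent"}

       def is_poor_outcome(occode):
           """Return 1 for dead/dependent, 0 for other levels, None if missing."""
           if occode is None or occode == "NA":
               return None
           return 1 if occode in POOR_OUTCOME_LEVELS else 0

       # Hypothetical level names, for illustration only.
       example_occode = ["dead", "dependent", "recovered", "NA"]
       binary_outcome = [is_poor_outcome(v) for v in example_occode]

   Keeping missing status as ``None`` rather than 0 avoids counting
   patients with unknown outcome as favourable cases.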

   **Variables**

   See https://datashare.ed.ac.uk/handle/10283/128

   **Changes to the original dataset**

   - Only pretreatment variables, variables describing outcomes at 14
     days, and the 6-month outcome code are included
   - All N/Y variables are changed to 0/1
   - Level H of RXHEP is recoded as M for pilot study cases
   - Variable IS_PILOT is added to indicate pilot study cases, which are
     identified by testing whether RHEP24 is NaN.  The variable is only
     added if include_pilot is True.
   - The RDATE variable is split into RYEAR and RMONTH, and month names
     are translated to English
   - OCCODE is recoded to descriptive values, and the two "missing
     status" categories are merged into "NA"
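
   Some of the recoding steps above can be sketched on plain
   dictionaries.  This is an illustration under stated assumptions, not
   the code used by ``fetch_IST``: the helpers ``is_missing`` and
   ``recode_record`` are hypothetical, and RATRIAL and RASP3 are assumed
   to be N/Y variables.

   .. code-block:: python

       import math

       def is_missing(value):
           """True for None or a float NaN."""
           return value is None or (isinstance(value, float) and math.isnan(value))

       def recode_record(record, yes_no_vars=("RATRIAL", "RASP3")):
           """Return a recoded copy of one patient record (a plain dict)."""
           out = dict(record)
           # Pilot cases are identified by a missing RHEP24 value.
           out["IS_PILOT"] = 1 if is_missing(record.get("RHEP24")) else 0
           # Level H of RXHEP becomes M, for pilot cases only.
           if out["IS_PILOT"] and out.get("RXHEP") == "H":
               out["RXHEP"] = "M"
           # N/Y variables become 0/1; missing values are left untouched.
           for name in yes_no_vars:
               if out.get(name) in ("Y", "N"):
                   out[name] = 1 if out[name] == "Y" else 0
           return out

       # Toy records with made-up values, for illustration only.
       pilot = recode_record(
           {"RHEP24": float("nan"), "RXHEP": "H", "RATRIAL": None, "RASP3": None}
       )
       main = recode_record(
           {"RHEP24": "N", "RXHEP": "H", "RATRIAL": "Y", "RASP3": "N"}
       )

   In this sketch the pilot record gets ``IS_PILOT`` set to 1 and its
   RXHEP level changed to M, while the main-trial record keeps level H.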

   :Parameters:

       **include_pilot** : boolean, default=True
           Whether to include records from a pilot study with 984
           patients.  Some values (RATRIAL and RASP3) are missing in the
           pilot.

       **include_location_vars** : boolean, default=True
           Whether to include variables describing hospitals and their
           locations.  These are categorical variables with a large number
           of levels.  The variables are: HOSPNUM, COUNTRY.

       **data_home** : string, optional
           Specify another download and cache folder for the datasets. By default
           all scikit-learn data is stored in '~/scikit_learn_data' subfolders.

       **download_if_missing** : boolean, default=True
           If False, raise an IOError if the data is not locally available
           instead of trying to download the data from the source site.

       **random_state** : int, RandomState instance or None (default)
           Determines random number generation for dataset shuffling. Pass an int
           for reproducible output across multiple function calls.

       **shuffle** : bool, default=False
           Whether to shuffle the dataset.

       **categ_as_strings** : bool, default=False
           Whether to return categorical variables as strings.

       **return_X_y** : boolean, default=False
           If True, returns ``(data.data, data.target)`` instead of a Bunch
           object.

       **as_frame** : boolean, default=False
           If True, features are returned as a pandas DataFrame.  If False,
           features are returned as an object or float array.  A float
           array is returned if all features are floats.


   :Returns:

       **dataset** : dict-like object with the following attributes:
           ..

       **dataset.data** : numpy array
           Each row corresponds to the features in the dataset.

       **dataset.target** : numpy array
           Target values for each record (see the targets listed above).

       **dataset.DESCR** : string
           Description of the dataset.

       **(data, target)** : tuple if ``return_X_y`` is True
           ..


   ..
       !! processed by numpydoc !!