
Platinum

Tutorial 6: Step 3 Discretize the Data

 

The first step in our analysis of this dataset is to use SLAM™ to look for associations between multiple genes and the tumor type.

SLAM™ finds associations between genes based on identical patterns of gene expression. For example, if Gene A is HIGH whenever Gene B is LOW, SLAM™ identifies an association between Gene A and Gene B. Because the number of possible patterns is enormous, particularly when looking for patterns among five or ten genes rather than just two, we need a fast, simple means of comparing expression levels. Discretizing the data makes it possible to compare expression levels in terms of a small number of discrete categories (e.g., HIGH/MEDIUM/LOW) rather than continuous values. This speeds up the comparison process by many orders of magnitude.
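To illustrate why discrete categories make patterns easy to compare, here is a minimal Python sketch. It is not SLAM™'s actual algorithm, which this tutorial does not describe; the gene values and the `pattern_counts` helper are hypothetical and only show that, once levels are discrete, finding recurring patterns reduces to counting exact matches.

```python
from collections import Counter

# Hypothetical discretized expression levels for two genes across six samples.
gene_a = ["HIGH", "HIGH", "LOW", "HIGH", "LOW", "MEDIUM"]
gene_b = ["LOW", "LOW", "HIGH", "LOW", "HIGH", "MEDIUM"]


def pattern_counts(*genes):
    """Count how often each combination of discrete levels occurs across samples."""
    # zip(*genes) yields one tuple of levels per sample, e.g. ("HIGH", "LOW").
    return Counter(zip(*genes))


counts = pattern_counts(gene_a, gene_b)

# Gene A is HIGH in three samples, and Gene B is LOW in every one of them.
print(counts[("HIGH", "LOW")])
```

With continuous values, two samples would almost never match exactly; with three categories, the same comparison is a simple equality check on short tuples.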

 

Discretize the Data

1. Click the Khan_training_data dataset in the Experiments navigator. The item is highlighted.

2. Click the Discretize toolbar icon, or select Discretize Data from the Predict menu, or right-click the item and select Discretize Data from the shortcut menu. The Discretization parameters dialog is displayed.

Operation Type

Number of Bins

Choosing the number of bins is a balancing act. The more bins you use, the less information is discarded by the discretization. However, the more bins there are, the fewer associations SLAM™ will find, because fewer samples share exactly the same pattern of bin values.

Accept the default parameters (Quantile discretization, Per Gene, and 3 bins).

3. Click OK. The Experiment Progress dialog is displayed. It is dynamically updated as the discretization operation is performed.

Upon successful completion, a new Discretized: 3 bins/gene | quantile dataset is added under Khan_training_data in the Experiments navigator.
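For readers curious about what quantile discretization per gene with 3 bins does conceptually, the following Python sketch may help. It is an assumption-laden illustration, not the application's implementation: the function name `quantile_discretize`, the sample values, and the handling of ties (broken by sample order) are all hypothetical. Each gene is binned independently, so every gene ends up with roughly the same number of samples in each bin regardless of its absolute expression scale.

```python
def quantile_discretize(values, n_bins=3):
    """Assign each value to one of n_bins equal-frequency bins (0 = lowest)."""
    n = len(values)
    # Indices of the samples, ordered from lowest to highest value.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # Map the rank (0 .. n-1) onto a bin index (0 .. n_bins-1).
        bins[i] = rank * n_bins // n
    return bins


# Hypothetical expression values: one list per gene, one entry per sample.
expression = {
    "gene_1": [5.0, 1.0, 3.0, 9.0, 7.0, 2.0],
    "gene_2": [100.0, 400.0, 200.0, 600.0, 300.0, 500.0],
}

# "Per gene" means each gene's values are binned on their own scale.
discretized = {gene: quantile_discretize(v) for gene, v in expression.items()}

print(discretized["gene_1"])  # [1, 0, 1, 2, 2, 0]
print(discretized["gene_2"])  # [0, 1, 0, 2, 1, 2]
```

Here bins 0, 1, and 2 play the role of LOW, MEDIUM, and HIGH. Raising `n_bins` keeps more detail about each value but leaves fewer samples in each bin, which reflects the trade-off described in step 2.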