
 

Nearest Neighbors Missing Value Estimation

 

Overview

Handling missing values is a two-step process: first, genes whose number of missing values reaches a user-set threshold are removed; second, the missing values that remain are estimated using Nearest Neighbors estimation.

Nearest Neighbors estimation is a process by which missing values in a dataset are filled in with estimated values based on similarity between genes.

To estimate a missing value in a gene, the k genes whose profiles are closest (smallest distance) to that of the gene containing the missing value are determined. The missing value is then computed as a weighted average of the k neighbors' values in the same sample. Note: the k nearest neighbors can be computed only on a complete dataset, so missing values must first be filled in with an initial approximation. The distance between two genes is computed using either Euclidean distance or Pearson Correlation.

The input to this function is an incomplete dataset; the output is a complete dataset. The parameter k is an integer giving the number of nearest neighbors to take into consideration.
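The estimation described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the application's actual implementation: the function names are hypothetical, the initial approximation is taken to be each gene's own mean, the weights are taken to be inverse distances, and only the Euclidean metric is shown. The source does not specify the initial approximation or the weighting scheme.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length gene profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_estimate(data, k):
    """Return a copy of `data` with each None replaced by a k-nearest-neighbor estimate.

    data: list of gene rows, one value per sample; None marks a missing value.
    """
    # Initial approximation (assumption): fill each gap with the gene's own mean.
    filled = []
    for row in data:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else 0.0
        filled.append([mean if v is None else v for v in row])

    result = [list(row) for row in filled]
    for g, row in enumerate(data):
        for s, value in enumerate(row):
            if value is not None:
                continue
            # Rank all other genes by distance, computed on the initially filled data.
            ranked = sorted(
                (euclidean(filled[g], filled[h]), h)
                for h in range(len(data)) if h != g
            )
            neighbors = ranked[:k]
            if not neighbors:
                continue  # no other genes: keep the initial approximation
            # Inverse-distance weights (assumption); epsilon avoids division by zero.
            weights = [1.0 / (d + 1e-9) for d, _ in neighbors]
            values = [filled[h][s] for _, h in neighbors]
            result[g][s] = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return result

genes = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [10.0, 20.0, 30.0],
]
complete = knn_estimate(genes, k=1)
```

In this example the second gene's nearest neighbor is the first gene, so with k = 1 the missing third value is estimated from that neighbor's third value. With larger k, more distant genes also contribute, but with smaller weights.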

On the Estimate Missing Values dialog, when the Remove Genes That Have Missing Values slider is set to 1, the rest of the dialog is grayed out. This is because all genes that have at least one missing value will be removed, leaving no missing values to be estimated.
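The removal step can be illustrated with a short Python sketch. It assumes the slider value is a count and that a gene is removed once its number of missing values reaches that count; the function name and data layout are hypothetical.

```python
def remove_genes(data, threshold):
    """Keep only genes with fewer than `threshold` missing values (None)."""
    return [row for row in data if sum(v is None for v in row) < threshold]

genes = [
    [1.0, 2.0, 3.0],
    [1.0, None, 3.0],
    [None, None, 3.0],
]
kept_strict = remove_genes(genes, threshold=1)  # only complete genes survive
kept_loose = remove_genes(genes, threshold=2)   # genes with one gap survive too
```

With a threshold of 1, every gene that has a missing value is dropped, so nothing is left to estimate. With a threshold of 2, genes with a single gap are kept and their gaps would then go to the estimation step.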

 

Process Outline

 

Actions

1. Click an incomplete dataset in the Experiments navigator. The item is highlighted.

2. Click the Estimate Missing Values toolbar icon, or select Estimate Missing Values from the Data menu, or right-click the item and select Estimate Missing Values from the shortcut menu. The Estimate Missing Values dialog is displayed.

3. Set the parameters.

Parameter: Description

Remove Genes That Have Missing Values: Set the threshold for culling genes prior to missing value estimation (1 = remove all genes with missing values).

Replacement Technique: Select Nearest Neighbors Estimation.

Options: Set the Distance Metric to Euclidean or Pearson Correlation. Set the Number of Nearest Neighbors.

 

4. Click OK. The Experiment Progress dialog is displayed. It is dynamically updated as the Estimate Missing Values operation is performed. To cancel the Estimate Missing Values operation, click the Cancel button.

Upon successful completion, a new complete dataset is added under the original dataset in the Experiments navigator.

 

Related Topics:

Overview of Estimating Missing Values

Removing or Estimating Missing Values