Filtering Overview

 

Overview

Filtering provides several options for prioritizing genes. Each process takes a large set of genes and applies selection criteria so that the output contains fewer genes.

Some methods remove all of the genes that do not meet specified criteria, while others allow you to specify the number of genes that will be left after the filtering.
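The two styles of filtering described above can be illustrated with a minimal Python sketch. It is not GeneLinker code: the dataset layout (a dictionary of gene name to expression values), the function names, and the use of the expression range (maximum minus minimum) as the selection score are all assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical layout: gene name -> expression values across samples.
Dataset = Dict[str, List[float]]


def expression_range(values: List[float]) -> float:
    """Illustrative score: spread between the largest and smallest value."""
    return max(values) - min(values)


def filter_by_criterion(data: Dataset, min_range: float) -> Dataset:
    """Style 1: remove every gene that does not meet the criterion."""
    return {g: v for g, v in data.items() if expression_range(v) >= min_range}


def filter_keep_top(data: Dataset, n_keep: int) -> Dataset:
    """Style 2: keep a specified number of genes, ranked by the same score."""
    ranked = sorted(data, key=lambda g: expression_range(data[g]), reverse=True)
    return {g: data[g] for g in ranked[:n_keep]}


data = {
    "geneA": [100.0, 900.0, 400.0],  # range 800
    "geneB": [200.0, 210.0, 205.0],  # range 10
    "geneC": [50.0, 300.0, 120.0],   # range 250
}

by_criterion = filter_by_criterion(data, min_range=200.0)
top_one = filter_keep_top(data, n_keep=1)
```

With the first style the number of surviving genes depends on the data; with the second it is fixed in advance and the cut-off value depends on the data.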

Filtering and normalization processes can be applied one or more times to a dataset.

Note that for Affymetrix® data, it is recommended that genes with a high signal-to-noise ratio be used, since some experts believe that Affymetrix® values below 150 tend to be unreliable.

 

Complete and Incomplete Datasets

The only filtering operation that can be applied directly to an incomplete dataset is gene list filtering. If you do not have a gene list that contains one or more genes in the incomplete dataset, the gene list filtering option is disabled on the Filter Genes dialog. To resolve this, close the Filter Genes dialog, create an appropriate gene list, and then perform the gene list filtering operation.

To apply other filtering techniques to an incomplete dataset, the missing values first need to be estimated or eliminated (resulting in a complete dataset). All filtering techniques can be applied to complete datasets.
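As a rough illustration of the distinction, the sketch below checks whether a dataset is complete and shows one simple way of eliminating missing values (dropping any gene that has one). The representation of a missing value as None and both function names are assumptions for this example, not part of GeneLinker.

```python
from typing import Dict, List, Optional

# Hypothetical layout: gene name -> values; None marks a missing value.
Dataset = Dict[str, List[Optional[float]]]


def is_complete(data: Dataset) -> bool:
    """A dataset is complete when no gene has a missing value."""
    return all(v is not None for values in data.values() for v in values)


def drop_incomplete_genes(data: Dataset) -> Dataset:
    """Eliminate missing values by removing every gene that contains one."""
    return {
        gene: values
        for gene, values in data.items()
        if all(v is not None for v in values)
    }


data = {
    "geneA": [1.0, None, 3.0],
    "geneB": [4.0, 5.0, 6.0],
}

complete_data = drop_incomplete_genes(data)
```

Estimating the missing values instead of dropping genes keeps more of the data; which approach is appropriate depends on how many values are missing.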

 

Note on N-Fold Culling

N-Fold Culling cannot complete if the minimum value for any gene is 0.0, and displays the message: 'The experiment could not be completed. Check that the operation and its parameters are appropriate to the data.' If the dataset contains negative values (but no zeroes), no error message is displayed, but N-Fold Culling may remove highly changing genes.

Both problems can be avoided as follows:

Before applying N-Fold Culling, display a Summary Statistics chart of the dataset to see what its minimum value is. If it is zero or negative, then:

1. Use Remove Values to remove values less than some small threshold (e.g. the smallest positive value your equipment can meaningfully detect).

2. Use Missing Value Estimation to replace the removed values with some small positive constant (e.g. the same number used as a removal threshold).
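The effect of the two steps above can be sketched in Python. This is an illustration only, not how GeneLinker implements Remove Values or Missing Value Estimation: the function names, the dictionary layout, and the use of None for a removed value are assumptions. The sketch also assumes the dataset is complete beforehand, since the fill step replaces every missing value it finds.

```python
from typing import Dict, List, Optional

# Hypothetical layout: gene name -> values; None marks a removed (missing) value.
Dataset = Dict[str, List[Optional[float]]]


def dataset_minimum(data: Dataset) -> float:
    """Smallest non-missing value in the dataset."""
    return min(v for values in data.values() for v in values if v is not None)


def remove_below(data: Dataset, threshold: float) -> Dataset:
    """Step 1: mark every value below the threshold as missing."""
    return {
        gene: [None if v is not None and v < threshold else v for v in values]
        for gene, values in data.items()
    }


def fill_missing(data: Dataset, constant: float) -> Dataset:
    """Step 2: replace every missing value with a small positive constant."""
    return {
        gene: [constant if v is None else v for v in values]
        for gene, values in data.items()
    }


def floor_for_culling(data: Dataset, threshold: float) -> Dataset:
    """Apply both steps only when the minimum is zero or negative."""
    if threshold <= 0:
        raise ValueError("threshold must be a positive number")
    if dataset_minimum(data) > 0:
        return data  # minimum is already positive; nothing to fix
    return fill_missing(remove_below(data, threshold), threshold)


raw = {
    "geneA": [0.0, 250.0, 900.0],   # contains a zero
    "geneB": [-5.0, 40.0, 120.0],   # contains a negative value
}

cleaned = floor_for_culling(raw, threshold=10.0)
```

Here the same number (10.0) serves as both the removal threshold and the replacement constant, as suggested in step 2, so the minimum of the cleaned dataset equals the threshold.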

 

Filtering Techniques Available in GeneLinker™:

Maximum Culling

Range Culling

N-Fold Culling with N

N-Fold Culling with a Specified Number of Genes

Spotted Array N-Fold Culling

Gene List Filtering

F-Test

 

Related Topic:

Overview of Estimating Missing Values