Importing Data from Tabular Files

Overview

A tabular file is a single file containing expression values for multiple samples or chips. This is a generic format, not specific to any particular microarray software. If your data is not in one of the other formats described in Selecting a Template for Data Import, use the tabular format.

You can transform your data into tabular format in a number of ways, but the simplest is to use a spreadsheet program (such as Microsoft Excel®). Cut and paste your expression measurements into a simple table, and then export the table to an intermediate file. GeneLinker™ now supports import of tabular data from binary Excel-format (XLS) files. To use this capability, select the "Convert XLS" checkbox in the import dialog box. Be aware that XLS conversion is currently quite slow and may take several minutes for large files. This will be improved in future versions of GeneLinker™.

In order for it to import properly into GeneLinker™, the intermediate file should have the following characteristics:

The data must all be in one text or XLS binary file (DOS®/Windows®, UNIX, or Macintosh).
The data must be in a table. That is, every row must have the same number of fields, and every column must have the same number of entries.
By default GeneLinker™ expects the rows of the file to represent samples and the columns genes, but this is not required. If the data file represents genes as rows and samples as columns, then you can orient it properly by ensuring the Transpose box is checked during the verification step of the data import process.
The first row should contain column names. The first column should contain row names. Absent column or row names may cause parts of your data to be misinterpreted.
A single character must delimit fields. Example delimiter characters are the comma or the tab character. Comma-delimited is recommended over tab-delimited. For best results ensure your data is in a .csv file before importing. In a Comma Separated Values (.csv) file, each record (row) is stored as text with a comma delimiter separating each field and a carriage return/line feed character pair marking the end of each record (row).
At least one row and one column of data must be present. These are in addition to the row and column names.
Missing values are signified by leaving blank space or no space between a consecutive pair of column delimiters. Alternatively, missing values may be signified by the string 'NA'.
Anything preceding the first column separator in the first row will be ignored. That is, the upper left cell may contain anything, or nothing.
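
The structural requirements above can be checked before import with a short script. The following is a minimal sketch in Python and is not part of GeneLinker™: the function name check_tabular and the sample text are hypothetical, and a comma delimiter is assumed by default.

```python
import csv
import io


def check_tabular(text, delimiter=","):
    """Return a list of problems found in a delimited table (empty if none).

    Hypothetical helper, not part of GeneLinker. It checks only the
    structural rules listed above and skips completely blank lines.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    problems = []
    if len(rows) < 2:
        problems.append("need a header row plus at least one data row")
        return problems
    width = len(rows[0])
    if width < 2:
        problems.append("need a row-name column plus at least one data column")
    # Row numbers count non-blank rows, starting at 1 for the header row.
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} fields, expected {width}")
        elif not row[0].strip():
            problems.append(f"row {number} has no row name")
    # The upper left cell is ignored, so only the remaining names are checked.
    if any(not name.strip() for name in rows[0][1:]):
        problems.append("one or more column names are blank")
    return problems


good = ",G1,G2\nS1,1.1,1.2\nS2,2.1,2.2\n"
bad = ",G1,G2\nS1,1.1\n"
print(check_tabular(good))  # []
print(check_tabular(bad))   # ['row 2 has 2 fields, expected 3']
```

An empty list means that no structural problem was detected. It does not guarantee that the import will succeed.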

Example of a CSV data file with 4 genes and 3 samples:

,G1,G2,G3,G4
S1,1.1,1.2,1.3,1.4
S2,2.1,2.2,2.3,2.4
S3,3.1,3.2,3.3,3.4
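
The example above has samples as rows, which is the default orientation. If your file has genes as rows, the Transpose box handles this during import. As an alternative, the file can be reoriented beforehand. The following minimal Python sketch illustrates the idea. The function name transpose_csv is hypothetical, and a rectangular, comma-delimited table is assumed.

```python
import csv
import io


def transpose_csv(text):
    """Swap the rows and columns of a rectangular comma-delimited table.

    Hypothetical helper, not part of GeneLinker. zip() stops at the
    shortest row, so ragged input would be silently truncated.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(zip(*rows))
    return out.getvalue()


genes_as_rows = ",S1,S2,S3\nG1,1.1,2.1,3.1\nG2,1.2,2.2,3.2\n"
print(transpose_csv(genes_as_rows))
```

The output has one line per sample, with the gene names in the first row.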

Example of a CSV data file with missing values:

,G1,G2,G3,G4
S1,1.1,1.2,1.3,1.4
S2,2.1,,2.3,
S3,,NA,3.3,3.4
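
To illustrate how the missing-value conventions read in practice, the following Python sketch parses the example above and treats empty fields, fields containing only spaces, and the string 'NA' as missing. It shows the convention only and does not reproduce GeneLinker™'s own parser. The name parse_cell is hypothetical.

```python
import csv
import io


def parse_cell(cell):
    """Return a float, or None for a missing value (empty, spaces, or 'NA')."""
    value = cell.strip()
    if value == "" or value == "NA":
        return None
    return float(value)


text = ",G1,G2,G3,G4\nS1,1.1,1.2,1.3,1.4\nS2,2.1,,2.3,\nS3,,NA,3.3,3.4\n"
rows = list(csv.reader(io.StringIO(text)))
# Map each row name to its list of parsed values, skipping the header row.
table = {row[0]: [parse_cell(cell) for cell in row[1:]] for row in rows[1:]}
print(table["S2"])  # [2.1, None, 2.3, None]
print(table["S3"])  # [None, None, 3.3, 3.4]
```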

Merging replicate genes:

If you have replicate spots (genes) on each chip, you may choose to have GeneLinker™ merge these into a single average measurement. The spread between the replicates will be converted into a reliability measure. For more background on this process, read Merging Within-Chip Replicate Measurements.

To do this, select the template that matches the organization of your data. If each column of your table represents a gene and each row a sample, use the Tabular Merge Replicate Columns template. If each row represents a gene and each column a sample, use the Tabular Merge Replicate Rows template.
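
As a rough illustration of what merging replicate columns means, the Python sketch below averages the values in one sample row that share a gene name and reports their spread as a sample standard deviation. Two points are assumptions made for the example: that replicates are identified by identical column names, and that standard deviation is a suitable measure of spread. How GeneLinker™ converts the spread into a reliability measure is described in Merging Within-Chip Replicate Measurements and is not reproduced here. The function name merge_replicate_columns is hypothetical.

```python
import statistics
from collections import defaultdict


def merge_replicate_columns(header, row):
    """Average the values that share a gene name within one sample row.

    Returns {gene: (mean, spread)}. The spread is the sample standard
    deviation, or 0.0 when a gene has a single measurement. This is an
    illustration only, not GeneLinker's reliability calculation.
    """
    groups = defaultdict(list)
    for gene, value in zip(header, row):
        groups[gene].append(value)
    return {
        gene: (
            statistics.mean(values),
            statistics.stdev(values) if len(values) > 1 else 0.0,
        )
        for gene, values in groups.items()
    }


header = ["G1", "G1", "G2"]  # G1 was spotted twice on the chip
sample = [1.0, 3.0, 5.0]
merged = merge_replicate_columns(header, sample)
print(merged["G1"])  # mean of the two G1 replicates and their spread
print(merged["G2"])  # single measurement, spread 0.0
```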

Reliability Measures:

If you have some other source for reliability measures, you can import them into GeneLinker™ along with your expression data. Use the Tabular with Reliability Measures template.

The reliability measures must be in a tabular file of identical shape to your gene expression data file. If your gene expression data file is named FileName.ext then your reliability measures must be in a file named FileName_rm.ext in the same folder. GeneLinker™ expects that reliability measures will be between 0 and 1 inclusive, and that values close to 0 will indicate highly reliable data.
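
The naming rule and value range above can be sketched in Python as follows. The helper names reliability_path and in_range and the path data/expression.csv are hypothetical, chosen only to show how FileName.ext maps to FileName_rm.ext.

```python
from pathlib import Path


def reliability_path(data_path):
    """Return the FileName_rm.ext path that pairs with FileName.ext."""
    path = Path(data_path)
    # Same folder, same extension, with "_rm" appended to the base name.
    return path.with_name(f"{path.stem}_rm{path.suffix}")


def in_range(values):
    """True if every reliability measure lies between 0 and 1 inclusive."""
    return all(0.0 <= value <= 1.0 for value in values)


print(reliability_path("data/expression.csv").name)  # expression_rm.csv
print(in_range([0.0, 0.25, 1.0]))  # True
print(in_range([0.5, 1.2]))        # False
```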

See Reliability Measures for more information.