unc.edu_BRCA_IlluminaHiSeq_RNASeq.geneExp

Created By Kyle Ellrott kellrott
Note: This file needs to be updated (2011-08-17)

UNC RNA-Seq Workflow - BWA Alignment to Transcriptome
Date: 20101108
Authors: Sara Grimm, Brian O'Connor

Versions: This analysis was carried out using the SeqWare Pipeline project, version 0.7.0. The workflow was "RNASeqAlignmentBWA" version 0.7.4 (UNCID:23324). UNC provides all of our analysis software through this open source project. Users can download this software to run the identical RNA-Seq analysis described in the steps below. See the project website at http://seqware.sf.net for more information. The UNCIDs provided in file names are identifiers unique to UNC and can be used for data/analysis provenance tracking.

Annotations: The Generic Annotation File (GAF) that provides all of our annotations for genes, exons, etc. can be found at https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/other/GAF/GAF_bundle/outputs/TCGA.Sept2010.09202010.gaf

Conventions: Please note that our spljxn.quantification.txt, exon.quantification.txt, and the GAF file above use the convention of chr:smaller_int-larger_int:+ for plus strand features and chr:larger_int-smaller_int:- for negative strand features. Carefully examine future versions of these annotation and quantification files, since this convention is subject to change.

Column Headers: These are brief descriptions of the column headers you will find in the various level 3 files. See the DESCRIPTION.txt file in the mage-tab bundle for more detailed methods on how each of these files was created.

File: *.trimmed.annotated.gene.quantification.txt
  gene: The Entrez/LocusLink gene symbol followed by the Entrez/LocusLink gene ID.
  raw_counts: The number of reads mapping to this gene.
  median_length_normalized: The total aligned bases to all transcript models associated with this gene divided by the mean transcript length.
  RPKM: See the DESCRIPTION.txt file in the mage-tab bundle for information on how this is calculated.

File: *.trimmed.annotated.exon.quantification.txt
  exon: The location of the exon in hg19 (GRCh37) based on the UCSC Gene standard track (December 2009 version).
  raw_counts: The number of reads mapping to this exon.
  median_length_normalized: The total aligned bases to this exon divided by the exon length.
  RPKM: See the DESCRIPTION.txt file in the mage-tab bundle for information on how this is calculated.

File: *.trimmed.annotated.spljxn.quantification.txt
  This file does not include normalized counts since splice junctions are a fixed size.
  junction: The location of the splice junction in hg19 (GRCh37) based on the UCSC Gene standard track (December 2009 version).
  raw_counts: The number of reads mapping to this splice junction.

File: *.wig
  A WIG-format file that represents coverage; see http://genome.ucsc.edu/FAQ/FAQformat.html#format6 for more information.
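Example: The strand-dependent coordinate convention described above (chr:smaller_int-larger_int:+ and chr:larger_int-smaller_int:-) is easy to misread when loading the exon file, so a small parsing sketch may help. This is not part of the UNC workflow; the names Feature and parse_location and the sample coordinates are hypothetical, and the sketch assumes each location is a single chr:int-int:strand string exactly as written in the convention. It normalizes both strands to start <= end and rejects strings whose integer order does not match the strand.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """A location normalized so that start <= end on either strand."""
    chrom: str
    start: int
    end: int
    strand: str


def parse_location(text: str) -> Feature:
    """Parse 'chr:int-int:strand' under the convention stated in this file.

    Assumption: exactly three colon-separated fields, with the two
    coordinates joined by a single hyphen.
    """
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected chr:int-int:strand, got {text!r}")
    chrom, span, strand = parts

    coords = span.split("-")
    if len(coords) != 2:
        raise ValueError(f"expected two coordinates in {text!r}")
    first, second = int(coords[0]), int(coords[1])

    if strand == "+":
        # Plus strand: smaller integer is written first.
        if first > second:
            raise ValueError(f"plus-strand coordinates out of order: {text!r}")
    elif strand == "-":
        # Minus strand: larger integer is written first.
        if first < second:
            raise ValueError(f"minus-strand coordinates out of order: {text!r}")
    else:
        raise ValueError(f"unknown strand {strand!r} in {text!r}")

    return Feature(chrom, min(first, second), max(first, second), strand)


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    print(parse_location("chr1:100-200:+"))
    print(parse_location("chr2:500-300:-"))
```

Because the convention is noted as subject to change, failing loudly on an unexpected coordinate order is probably safer than silently swapping the values.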

acronym: BRCA
disease: cancer
species: Homo sapiens
fileType: genomicMatrix
platform: IlluminaHiSeq_RNASeq
dataCenter: unc.edu
lastUpdate: 2012-10-12
tissueType: breast
dataSubType: geneExp
lastUpdated: 2012-10-12