I tried to align the samples in syn8612213 with STAR, but a number of them give "unexpected end of file" errors. To rule out a problem with my initial download, I redownloaded syn8620828; the problem replicates, and when I try to read the whole file I get the same error:
```
zcat 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz | tail
gzip: 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz: unexpected end of file
```
Can it be that some fastq files are corrupt? I get the error with these fastq files as well:
```
732_CER.FCC7BMUACXX_L8IGAGTGG.r1.fastq.gz
751_CER.FCC7KUNACXX_L8IACTGAT.r1.fastq.gz
775_CER.FCC7DRJACXX_L5IACTTGA.r1.fastq.gz
777_CER.FCC7L2YACXX_L7ITAGCTT.r1.fastq.gz
785_CER.FCC7KRCACXX_L7ICGTACG.r1.fastq.gz
786_CER.FCC7L0WACXX_L6IGGCTAC.r1.fastq.gz
791_CER.FCC7L0WACXX_L1ITTAGGC.r1.fastq.gz
797_CER.FCC7BMUACXX_L8IATTCCT.r1.fastq.gz
7015_CER.FCC7KUNACXX_L6ITAGCTT.r1.fastq.gz
725_CER.FCC7L0WACXX_L1IACTTGA.r1.fastq.gz
991_CER.FCC7KUNACXX_L3ICGTACG.r1.fastq.gz
813_CER.FCC7L0WACXX_L5IACTTGA.r1.fastq.gz
816_CER.FCC7L0WACXX_L7IGTGGCC.r1.fastq.gz
1058_CER.FCC7L0WACXX_L4IATTCCT.r1.fastq.gz
1085_CER.FCC79ETACXX_L1IACTTGA.r2.fastq.gz
1103_CER.FCC7KRCACXX_L5IATCACG.r2.fastq.gz
1104_CER.FCC7KUNACXX_L4IACTGAT.r2.fastq.gz
1029_CER.FCC7L0WACXX_L2IGGCTAC.r2.fastq.gz
1036_CER.FCC7KUNACXX_L6IGGCTAC.r1.fastq.gz
11285_CER.FCC7L0WACXX_L3ICGTACG.r1.fastq.gz
11288_CER.FCC7KUNACXX_L2IGATCAG.r1.fastq.gz
1129_CER.FCC7L2YACXX_L8IGGCTAC.r2.fastq.gz
11300_CER.FCC7L0WACXX_L7IGTTTCG.r1.fastq.gz
11303_CER.FCC7L0WACXX_L2ITAGCTT.r1.fastq.gz
11374_CER.FCC7KRCACXX_L4IACTGAT.r1.fastq.gz
11396_CER.FCC7KUNACXX_L7IGTTTCG.r1.fastq.gz
11474_CER.FCC7DRJACXX_L4IATTCCT.r1.fastq.gz
11456_CER.FCC7KRCACXX_L6ITAGCTT.r2.fastq.gz
1146_CER.FCC7KRCACXX_L7IGTTTCG.r1.fastq.gz
11460_CER.FCC79ALACXX_L6ITTAGGC.r1.fastq.gz
11505_CER.FCC7L0WACXX_L5ITTAGGC.r1.fastq.gz
11507_CER.FCC7L0WACXX_L2IGATCAG.r1.fastq.gz
11491_CER.FCC7KKVACXX_L2ITAGCTT.r2.fastq.gz
1214_CER.FCC7KRCACXX_L4IATTCCT.r1.fastq.gz
11494_CER.FCC7KUNACXX_L1IACTTGA.r1.fastq.gz
11497_CER.FCC7L0WACXX_L1IATCACG.r1.fastq.gz
11500_CER.FCC7KUNACXX_L4IATTCCT.r1.fastq.gz
1933_CER.FCC7KUNACXX_L1ITTAGGC.r2.fastq.gz
1934_CER.FCC7E15ACXX_L4IACTGAT.r1.fastq.gz
142_CER.FCC7L0WACXX_L8IACTGAT.r2.fastq.gz
1963_CER.FCC7KRCACXX_L7IGTGGCC.r2.fastq.gz
6821_CER.FCC7KKVACXX_L8IATTCCT.r1.fastq.gz
6880_CER.FCC7L0WACXX_L4IACTGAT.r1.fastq.gz
```
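For anyone who wants to reproduce the list above, here is a minimal sketch of a batch check. It assumes the files are in the current directory and relies on `gzip -t`, which tests the whole stream and exits non-zero on truncation or corruption. The function name `check_gz` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which files pass or fail a gzip integrity test.
# Prints one line per file and returns 1 if any file failed.
check_gz() {
    local f status=0
    for f in "$@"; do
        if gzip -t -- "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "BROKEN  $f"
            status=1
        fi
    done
    return "$status"
}

# Example call (uncomment to run on the downloaded files):
# check_gz *.fastq.gz
```

This reads every file to the end, so on multi-gigabyte FASTQ files it will take a while, but it avoids piping the full decompressed output through `tail`.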
Created by Niek de Klein (NiekdeKlein)

Hi @NiekdeKlein,
head, tail, and more do produce the output you describe, but for some reason the less and wc -l commands don't see the file as gzipped. If you cp the file to .fastq, it all appears to be gzipped. For now you're more than welcome to use the BAM files from which the reads were extracted, located in syn5049322. You will just need to convert them to FASTQ first.
```
# Positional arguments: $1 = BAM file name ending in .snap.bam, $2 = region
module load star/2.5.1b picard
sample=$(basename "$1" .snap.bam)
region=$2
# Define paths
rootdir="/sc/orga/projects/AMP_AD/reprocess"
indir="${rootdir}/inputs/Mayo/Mayo${region}-BAM-from-synapse"
fastqdir="${rootdir}/inputs/Mayo/Mayo${region}-fastq-from-synBam"
# Reference files
index='/sc/orga/projects/PBG/REFERENCES/GRCh38/star/Gencode24'
# Sort the aligned BAM by read name so mates are adjacent, then convert to FASTQ
java -Xmx8G -jar "$PICARD" SortSam \
  INPUT="${indir}/${1}" \
  OUTPUT=/dev/stdout \
  SORT_ORDER=queryname \
  QUIET=true \
  VALIDATION_STRINGENCY=SILENT \
  COMPRESSION_LEVEL=0 \
  | java -Xmx4G -jar "$PICARD" SamToFastq \
  INPUT=/dev/stdin \
  FASTQ="${fastqdir}/${sample}.r1.fastq" \
  SECOND_END_FASTQ="${fastqdir}/${sample}.r2.fastq" \
  VALIDATION_STRINGENCY=SILENT
# Compress the FASTQ files
gzip "${fastqdir}/${sample}.r1.fastq"
gzip "${fastqdir}/${sample}.r2.fastq"
```
It looks gzipped to me:
```
tail 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
??????Fh?t?J?t??B?T9?=?w;??????j@????M?
????C??^>T?P.?r?J?j?|g?a?{/?J????????J??S???1@J?)??:?Sl???K?Y???
Ptt??
\??x????????W>@?\????GG?y??? 7NP?jy?n?I
```
When I rename it to .fastq and then gzip it, the file ends up double-gzipped (the output is still binary after zcat):
```
mv 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
gzip 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
zcat 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz | tail
??????Fh?t?J?t??B?T9?=?w;??????j@????M?
```
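One way to settle whether a given copy is plain text, gzipped once, or gzipped twice is to look at the first two bytes, which are `1f 8b` for any gzip stream. The sketch below is only an illustration; `gzip_layers` and `first_two_hex` are hypothetical names, and it only distinguishes zero, one, or two-or-more layers.

```shell
#!/usr/bin/env bash
# Print the first two bytes of stdin as lowercase hex, e.g. "1f8b".
first_two_hex() {
    head -c 2 | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: print 0 if the file is not gzip,
# 1 if it is gzipped once, 2 if its decompressed content is itself gzip.
gzip_layers() {
    local f=$1
    if [ "$(first_two_hex < "$f")" != "1f8b" ]; then
        echo 0
    elif [ "$(gzip -dc -- "$f" 2>/dev/null | first_two_hex)" = "1f8b" ]; then
        echo 2
    else
        echo 1
    fi
}

# Example call (uncomment to run):
# gzip_layers 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
```

Because only the first bytes are read, this also works on truncated files, where a full `zcat` would stop with "unexpected end of file".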
Hi @NiekdeKlein,
Those files do not appear to be gzip'd despite having a .gz extension
```
ls -ltr 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
-rw-rw-r-- 1 jgockley jgockley 7072366592 Dec 16 22:13 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
less 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
@R0230412:381:C7KUNACXX:5:1101:10000:13114/1
+
CCCFFFFFFHAHHJJJIHJFIJJJEHHHGHJEGGIGHHJJIIEHHFFFDBC>A??@@;?B?BDDBDCDDCDDDDDDCDECCD@C>CCC@B@DC@@C@BBDB
mv 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
gzip 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq
less 1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq.gz
^_<8B>^H^Hx^RX^@^C1000_CER.FCC7KUNACXX_L5IATCACG.r1.fastq^@<92>6$_f
<9A>j$3<99>$@<90>^DYl^?\5^?<92><8B>^H^@<99>U=?QA^B<81>^H^Ow^O^^Nxz<9C>#<9F>^?8^^^O<90>?^^
```
Apologies for the inconvenience!