Gfas_Syms_CellRanger_Analysis
Goal: Re-analyze the 10x single-cell data against a combined genomic reference of Galaxea fascicularis and its potential symbionts
Reference Genomes:
- Galaxea fascicularis v1
- From Reef Genomics
- Download site
- Durusdinium trenchii (SCF082)
- Breviolum (Symbiodinium minutum)
- Cladocopium
- Download site
- alternate Download site
- Reference: Liu H, Stephens TG, González-Pech RA, Beltran VH, Lapeyre B, Bongaerts P, Cooke I, Aranda M, Bourne DG, Forêt S, Miller DJ, van Oppen MJH, Voolstra CR, Ragan MA, Chan CX. Symbiodinium genomes reveal adaptive evolution of functions related to coral-dinoflagellate symbiosis. Commun Biol. 2018;1:95. doi: 10.1038/s42003-018-0098-3
- Symbiodinium microadriaticum
- Download site
- alternate Download site
- Reference: Aranda M, Li Y, Liew YJ, Baumgarten S, Simakov O, Wilson MC, Piel J, Ashoor H, Bougouffa S, Bajic VB, Ryu T, Ravasi T, Bayer T, Micklem G, Kim H, Bhak J, LaJeunesse TC, Voolstra CR. Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle. Sci Rep. 2016 Dec 22;6:39734. doi: 10.1038/srep39734
File upload and organization
mkdir sym_genomes
cd sym_genomes/
scp -r symA3_symb.gff.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symA3_symb.gff.gz
scp -r symC_40.gff.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symC_40.gff.gz
scp -r symbB.v1.2.augustus.gff3.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symbB.v1.2.augustus.gff3.gz
scp -r symA3_37.fasta.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symA3_37.fasta.gz
scp -r symbB.v1.0.genome.fa.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symbB.v1.0.genome.fa.gz
scp -r symC_scaffold_40.fasta.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/symC_scaffold_40.fasta.gz
The symC and symA GFF files did not contain exon features, so Cell Ranger could not build the reference genome. I will try the following files instead to see whether they work:
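A quick way to see whether an annotation has exon features before handing it to cellranger mkref is to tally column 3 (the feature type) of the GFF. The sketch below is a minimal helper, assuming tab-separated GFF/GTF input; the function names `feature_types` and `has_exons` are hypothetical, not part of Cell Ranger.

```bash
#!/bin/bash
# Tally the feature types (GFF/GTF column 3) in an annotation file.
# Comment lines (starting with #) and lines with fewer than 9 tab-separated
# columns are skipped. Accepts plain or gzip-compressed (.gz) files.
feature_types() {
  local f="$1"
  if [[ "$f" == *.gz ]]; then
    gzip -dc "$f"
  else
    cat "$f"
  fi | awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' | sort
}

# Succeed (exit status 0) only if the file has at least one "exon" feature.
has_exons() {
  feature_types "$1" | awk -F'\t' '$1 == "exon" { found = 1 } END { exit (found ? 0 : 1) }'
}

# Example (file names from this notebook):
# feature_types symC_40.gff3
# has_exons symA3_symb.gff3 || echo "no exon features in symA3_symb.gff3"
```

The same tally also shows whether a file holds full gene models (gene, mRNA, exon, CDS) or only CDS records.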
scp -r Clago1_AssemblyScaffolds_Repeatmasked.fasta.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/Clago1_AssemblyScaffolds_Repeatmasked.fasta.gz
scp -r Clago1_GeneCatalog_genes_20200812.gff3.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/Clago1_GeneCatalog_genes_20200812.gff3.gz
scp -r Symmic1_AssemblyScaffolds_Repeatmasked.fasta.gz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/Symmic1_AssemblyScaffolds_Repeatmasked.fasta.gz
scp -r Symmic1_all_genes_20180603.gff3.tgz kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/sym_genomes/Symmic1_all_genes_20180603.gff3.tgz
Unzip and rename to .gff3 (renaming changes only the file extension, not the file format)
gunzip symA3_symb.gff.gz
mv symA3_symb.gff symA3_symb.gff3
gunzip symbB.v1.2.augustus.gff3.gz
gunzip symC_40.gff.gz
mv symC_40.gff symC_40.gff3
gunzip Dtrenchii_SCF082_ANNOT_gff.gz
mv Dtrenchii_SCF082_ANNOT_gff Dtrenchii_SCF082_ANNOT.gff3
gunzip symA3_37.fasta.gz
gunzip symbB.v1.0.genome.fa.gz
gunzip symC_scaffold_40.fasta.gz
gunzip Dtrenchii_SCF082_ASSEMBLY_fasta.gz
mv Dtrenchii_SCF082_ASSEMBLY_fasta Dtrenchii_SCF082_ASSEMBLY.fasta
gunzip Clago1_AssemblyScaffolds_Repeatmasked.fasta.gz
gunzip Clago1_GeneCatalog_genes_20200812.gff3.gz
tar -xvzf Symmic1_all_genes_20180603.gff3.tgz
cd /nethome/kxw755/sym_genomes/global/projectb/sandbox/fungal/data/Symbiodinium_microadriaticum/Symmic1/download
cp Symmic1.ExternalModels.gff3 /nethome/kxw755/sym_genomes/
cd /nethome/kxw755/sym_genomes
gunzip Symmic1_AssemblyScaffolds_Repeatmasked.fasta.gz
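Because `mv x.gff x.gff3` only changes the name, it can be worth checking which GFF version a file actually declares. This is a minimal sketch, assuming the version pragma (if present) is on the first line as `##gff-version N`; the helper name `gff_version` is hypothetical.

```bash
#!/bin/bash
# Print the version declared by a "##gff-version" pragma on the first line
# of an annotation file, or "none" if the first line is not such a pragma.
gff_version() {
  awk 'NR == 1 {
         if ($1 == "##gff-version" && $2 != "") print $2; else print "none"
         exit
       }' "$1"
}

# Example (file names from this notebook):
# for f in symA3_symb.gff3 symC_40.gff3 Dtrenchii_SCF082_ANNOT.gff3; do
#   echo "$f: $(gff_version "$f")"
# done
```

A file reporting "none" may still parse, but it is a hint to inspect the attribute syntax before conversion.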
These GFF3 files appear to contain full gene models rather than only CDS features.
Convert the GFF3 files to GTF with gffread (from the cufflinks/2.2.1 module):
nano convert.job
#!/bin/bash
#BSUB -J scSeq_convertGTF
#BSUB -q general
#BSUB -P dark_genes
#BSUB -n 6
#BSUB -W 12:00
#BSUB -u kxw755@earth.miami.edu
#BSUB -o dtre_convertGTF.out
#BSUB -e dtre_convertGTF.err
###################################################################
module load cufflinks/2.2.1
gffread Dtrenchii_SCF082_ANNOT.gff3 -T -o Dtrenchii_SCF082_ANNOT.gtf
bsub < convert.job
nano convert2.job
#!/bin/bash
#BSUB -J scSeq_convertGTF
#BSUB -q general
#BSUB -P dark_genes
#BSUB -n 6
#BSUB -W 12:00
#BSUB -u kxw755@earth.miami.edu
#BSUB -o syms_convertGTF.out
#BSUB -e syms_convertGTF.err
###################################################################
module load cufflinks/2.2.1
gffread symA3_symb.gff3 -T -o symA3_symb.gtf
gffread symbB.v1.2.augustus.gff3 -T -o symbB.v1.2.augustus.gtf
gffread symC_40.gff3 -T -o symC_40.gtf
bsub < convert2.job
nano convert3.job
#!/bin/bash
#BSUB -J scSeq_convertGTF
#BSUB -q general
#BSUB -P dark_genes
#BSUB -n 6
#BSUB -W 12:00
#BSUB -u kxw755@earth.miami.edu
#BSUB -o syms_convertGTF3.out
#BSUB -e syms_convertGTF3.err
###################################################################
module load cufflinks/2.2.1
gffread Clago1_GeneCatalog_genes_20200812.gff3 -T -o Clago1_GeneCatalog_genes_20200812.gtf
gffread Symmic1.ExternalModels.gff3 -T -o Symmic1.ExternalModels.gtf
bsub < convert3.job
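Before running mkref, the converted GTFs can be checked for what, as far as I know, Cell Ranger relies on: exon lines carrying both `gene_id` and `transcript_id` attributes. The sketch below assumes gffread-style attributes (`transcript_id "x"; gene_id "y";`); the function name `check_gtf` is hypothetical.

```bash
#!/bin/bash
# Count exon lines in a GTF and how many of them lack gene_id or transcript_id.
# Exit status is non-zero if there are no exons or any exon lacks either ID.
check_gtf() {
  awk -F'\t' '
    !/^#/ && NF >= 9 && $3 == "exon" {
      exons++
      attrs = " " $9   # leading space so the first attribute also matches [; ]
      if (attrs !~ /[; ]gene_id "/)       no_gene++
      if (attrs !~ /[; ]transcript_id "/) no_tx++
    }
    END {
      printf "exons=%d missing_gene_id=%d missing_transcript_id=%d\n", exons, no_gene, no_tx
      exit ((exons == 0 || no_gene > 0 || no_tx > 0) ? 1 : 0)
    }' "$1"
}

# Example (file names from this notebook):
# for gtf in symbB.v1.2.augustus.gtf Clago1_GeneCatalog_genes_20200812.gtf \
#            Symmic1.ExternalModels.gtf Dtrenchii_SCF082_ANNOT.gtf; do
#   echo "$gtf"; check_gtf "$gtf"
# done
```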
Making combined genome reference
nano mkgtf_syms.job
#!/bin/bash
#BSUB -J scSeq_mkgtf_syms
#BSUB -q bigmem
#BSUB -P dark_genes
#BSUB -n 16
#BSUB -W 120:00
#BSUB -R "rusage[mem=15000]"
#BSUB -u kxw755@earth.miami.edu
#BSUB -o gfas_syms_mkgtf_v2.out
#BSUB -e gfas_syms_mkgtf_v2.err
###################################################################
cellranger mkref \
--genome=gfas_1.0 --fasta=/nethome/kxw755/Gfas_v1/gfas_final_1.0.fasta --genes=/nethome/kxw755/Gfas_v1/gfas_1.0.filtered.gtf \
--genome=symA --fasta=/nethome/kxw755/sym_genomes/Symmic1_AssemblyScaffolds_Repeatmasked.fasta --genes=/nethome/kxw755/sym_genomes/Symmic1.ExternalModels.gtf \
--genome=symB --fasta=/nethome/kxw755/sym_genomes/symbB.v1.0.genome.fa --genes=/nethome/kxw755/sym_genomes/symbB.v1.2.augustus.gtf \
--genome=symC --fasta=/nethome/kxw755/sym_genomes/Clago1_AssemblyScaffolds_Repeatmasked.fasta --genes=/nethome/kxw755/sym_genomes/Clago1_GeneCatalog_genes_20200812.gtf \
--genome=symD --fasta=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ASSEMBLY.fasta --genes=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ANNOT.gtf \
--memgb 20
bsub < mkgtf_syms.job
error:
Fatal LIMIT error: the number of junctions to be inserted on the fly =2833619 is larger than the limitSjdbInsertNsj=1000000
Fatal LIMIT error: the number of junctions to be inserted on the fly =2833619 is larger than the limitSjdbInsertNsj=1000000
SOLUTION: re-run with at least --limitSjdbInsertNsj 2833619
Mar 26 16:12:54 ...... FATAL ERROR, exiting
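The error is about the number of annotated splice junctions STAR builds from the GTFs, not the FASTA size. To see which annotation contributes most, one can estimate junctions per GTF: group exons by transcript, sort by start, and record each gap between consecutive exons as an intron. This is a minimal sketch under that assumption; STAR's own count may not match it exactly (for example if it counts before collapsing duplicates). The helper name `count_junctions` is hypothetical.

```bash
#!/bin/bash
# Estimate the number of distinct annotated splice junctions (introns) in a GTF.
# Each junction is keyed as chrom:strand:intron_start-intron_end and de-duplicated.
count_junctions() {
  awk -F'\t' '!/^#/ && NF >= 9 && $3 == "exon" {
      if (match($9, /transcript_id "[^"]+"/)) {
        tx = substr($9, RSTART + 15, RLENGTH - 16)   # value between the quotes
        print tx "\t" $1 "\t" $7 "\t" $4 "\t" $5
      }
    }' "$1" \
  | sort -t $'\t' -k1,1 -k4,4n \
  | awk -F'\t' '{
      # consecutive exons of the same transcript bound one intron
      if ($1 == prev_tx) print $2 ":" $3 ":" (prev_end + 1) "-" ($4 - 1)
      prev_tx = $1; prev_end = $5
    }' \
  | sort -u | wc -l | tr -d ' '
}

# Example (file names from this notebook):
# for gtf in /nethome/kxw755/Gfas_v1/gfas_1.0.filtered.gtf Symmic1.ExternalModels.gtf; do
#   echo "$gtf: $(count_junctions "$gtf")"
# done
```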
Maybe let's try this without symA and see whether that brings the number of junctions (and the memory/RAM usage) down.
nano mkgtf_syms2.job
#!/bin/bash
#BSUB -J scSeq_mkgtf_syms2
#BSUB -q bigmem
#BSUB -P dark_genes
#BSUB -n 16
#BSUB -W 120:00
#BSUB -R "rusage[mem=15000]"
#BSUB -u kxw755@earth.miami.edu
#BSUB -o gfas_syms_mkgtf_v2.out
#BSUB -e gfas_syms_mkgtf_v2.err
###################################################################
cellranger mkref \
--genome=gfas_1.0 --fasta=/nethome/kxw755/Gfas_v1/gfas_final_1.0.fasta --genes=/nethome/kxw755/Gfas_v1/gfas_1.0.filtered.gtf \
--genome=symB --fasta=/nethome/kxw755/sym_genomes/symbB.v1.0.genome.fa --genes=/nethome/kxw755/sym_genomes/symbB.v1.2.augustus.gtf \
--genome=symC --fasta=/nethome/kxw755/sym_genomes/Clago1_AssemblyScaffolds_Repeatmasked.fasta --genes=/nethome/kxw755/sym_genomes/Clago1_GeneCatalog_genes_20200812.gtf \
--genome=symD --fasta=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ASSEMBLY.fasta --genes=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ANNOT.gtf
bsub < mkgtf_syms2.job
Fatal LIMIT error: the number of junctions to be inserted on the fly =1992803 is larger than the limitSjdbInsertNsj=1000000
Fatal LIMIT error: the number of junctions to be inserted on the fly =1992803 is larger than the limitSjdbInsertNsj=1000000
SOLUTION: re-run with at least --limitSjdbInsertNsj 1992803
Mar 27 11:04:18 ...... FATAL ERROR, exiting
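The two STAR messages give a rough sense of how much symA contributed, assuming the per-genome junction counts simply add up. The arithmetic, using only the numbers from the logs above:

```bash
#!/bin/bash
# Junction counts reported by STAR in the two failed mkref runs.
with_symA=2833619     # gfas_1.0 + symA + symB + symC + symD
without_symA=1992803  # gfas_1.0 + symB + symC + symD
limit=1000000         # limitSjdbInsertNsj value reported in the log

echo "junctions attributable to symA (approx.): $((with_symA - without_symA))"
echo "still over the limit by: $((without_symA - limit))"
```

Dropping symA removed roughly 840,816 junctions, but the remaining 1,992,803 are still about twice the limit, so removing one symbiont is not enough on its own.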
Okay, according to this link I need to adjust a STAR parameter (limitSjdbInsertNsj), but I don't think I have access to it through Cell Ranger. I am going to map against the symbionts alone, without the Galaxea genome, and see what hits.
nano mkgtf_syms_only.job
#!/bin/bash
#BSUB -J scSeq_mkgtf_symsonly
#BSUB -q bigmem
#BSUB -P dark_genes
#BSUB -n 16
#BSUB -W 120:00
#BSUB -R "rusage[mem=15000]"
#BSUB -u kxw755@earth.miami.edu
#BSUB -o syms_mkgtf_only.out
#BSUB -e syms_mkgtf_only.err
###################################################################
cellranger mkref \
--genome=symA --fasta=/nethome/kxw755/sym_genomes/Symmic1_AssemblyScaffolds_Repeatmasked.fasta --genes=/nethome/kxw755/sym_genomes/Symmic1.ExternalModels.gtf \
--genome=symB --fasta=/nethome/kxw755/sym_genomes/symbB.v1.0.genome.fa --genes=/nethome/kxw755/sym_genomes/symbB.v1.2.augustus.gtf \
--genome=symC --fasta=/nethome/kxw755/sym_genomes/Clago1_AssemblyScaffolds_Repeatmasked.fasta --genes=/nethome/kxw755/sym_genomes/Clago1_GeneCatalog_genes_20200812.gtf \
--genome=symD --fasta=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ASSEMBLY.fasta --genes=/nethome/kxw755/sym_genomes/Dtrenchii_SCF082_ANNOT.gtf
bsub < mkgtf_syms_only.job
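Once the job finishes, a quick look at the output directory can confirm that mkref completed before count is pointed at it. This sketch assumes the layout of recent Cell Ranger references (`reference.json`, `fasta/genome.fa`, `genes/`, `star/`); older versions may differ. The helper name `check_ref_dir` and the example path are hypothetical.

```bash
#!/bin/bash
# Report any expected piece of a cellranger mkref output directory that is missing.
# Returns 0 if everything checked is present, 1 otherwise.
check_ref_dir() {
  local ref="$1" missing=0 item
  for item in reference.json fasta/genome.fa; do
    [ -f "$ref/$item" ] || { echo "missing file: $ref/$item"; missing=1; }
  done
  for item in genes star; do
    [ -d "$ref/$item" ] || { echo "missing directory: $ref/$item"; missing=1; }
  done
  return "$missing"
}

# Example (hypothetical path; assumes mkref joins genome names with "_and_"
# and wrote its output into the directory the job was submitted from):
# check_ref_dir /nethome/kxw755/sym_genomes/symA_and_symB_and_symC_and_symD
```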
Rerunning cellranger count with the combined genome
Run 1
cd /nethome/kxw755/20231004_SingleCell_DG/
nano count_W045_deep_syms.job
#!/bin/bash
#BSUB -J count_W045_deep_syms
#BSUB -q general
#BSUB -P dark_genes
#BSUB -n 6
#BSUB -W 120:00
#BSUB -u kxw755@earth.miami.edu
#BSUB -o count.out
#BSUB -e count.err
#BSUB -B
#BSUB -N
###################################################################
cellranger count \
--id=W-045_1_deep_syms \
--transcriptome=/nethome/kxw755/sym_genomes/gfas_1.0_and_symA3_and_symB_and_symC_and_symD \
--fastqs=/nethome/kxw755/20231004_SingleCell_DG \
--sample=AndradeRodriguez-15275-001_GEX3
bsub < count_W045_deep_syms.job
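The `--transcriptome` directory name has to match what mkref actually produced. Note that the count job spells the symbiont as symA3, while mkgtf_syms.job passed `--genome=symA`. A minimal sketch that builds the expected path from the genome names and checks it exists before `bsub`, assuming mkref names multi-genome references by joining genome names with `_and_`; the helper `ref_path` and the base directory in the example are hypothetical.

```bash
#!/bin/bash
# Build the expected multi-genome reference path: <base>/<g1>_and_<g2>_and_...
ref_path() {
  local base="$1"; shift
  local name="$1"; shift
  local g
  for g in "$@"; do
    name="${name}_and_${g}"
  done
  echo "${base%/}/${name}"
}

# Example with the genome names used in mkgtf_syms.job:
# ref=$(ref_path /nethome/kxw755/sym_genomes gfas_1.0 symA symB symC symD)
# [ -d "$ref" ] && bsub < count_W045_deep_syms.job || echo "reference not found: $ref"
```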
Export:
scp -r kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/20231004_SingleCell_DG/W-045_1_deep_combgenome/outs/filtered_feature_bc_matrix.h5 /Users/kevinwong/MyProjects/DarkGenes_Bleaching_Comparison/output/CellRanger/20240326_W045_combgeno_filt_feature_bc_matrix.h5
scp -r kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/20231004_SingleCell_DG/W-045_1_deep_combgenome/outs/web_summary.html /Users/kevinwong/MyProjects/DarkGenes_Bleaching_Comparison/output/CellRanger/20240326_W045_combgeno_web_summary.html
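The export paths above point at W-045_1_deep_combgenome, while count_W045_deep_syms.job used `--id=W-045_1_deep_syms`. If the export is meant to pick up that run, deriving every remote path from one run ID keeps them in sync. This is a dry-run sketch that only prints the commands; the function name `export_cmds` is hypothetical, and the hosts, directories, and file-name tag are taken from this notebook.

```bash
#!/bin/bash
# Print scp commands for a cellranger count run's outputs, all derived from one run ID.
export_cmds() {
  local run_id="$1" tag="$2"
  local remote="kxw755@pegasus.ccs.miami.edu:/nethome/kxw755/20231004_SingleCell_DG/${run_id}/outs"
  local local_dir="/Users/kevinwong/MyProjects/DarkGenes_Bleaching_Comparison/output/CellRanger"
  echo "scp ${remote}/filtered_feature_bc_matrix.h5 ${local_dir}/${tag}_filt_feature_bc_matrix.h5"
  echo "scp ${remote}/web_summary.html ${local_dir}/${tag}_web_summary.html"
}

# export_cmds W-045_1_deep_syms 20240326_W045_combgeno
# Pipe the output to bash to run the copies once the printed paths look right.
```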