20210104 BUSCO on P. astreoides transcriptome assembly

Project: Porites Genome Assembly

Goal

Assess the completeness of our Trinity transcriptome assembly using BUSCO. The pipeline used for running BUSCO on Bluewaves can be found here.

The job had to be re-run with a new shell script that specifies transcriptome mode for BUSCO (-m transcriptome).

Making the new shell script in transcriptome mode
nano run-busco-transcriptome.sh

#!/bin/bash

#SBATCH --job-name="busco"
#SBATCH --time="100:00:00"
#SBATCH --nodes 1 --ntasks-per-node=20
#SBATCH --mem=250G
##SBATCH --output="busco-%u-%x-%j"
##SBATCH --account=putnamlab
##SBATCH --export=NONE

echo "START" $(date)

labbase=/data/putnamlab
busco_shared="${labbase}/shared/busco"
[ -z "$query" ] && query="${labbase}/REFS/Past/Past_genome_filtered_v1_Genewiz.fasta" # set this to the query (genome/transcriptome) you are running
[ -z "$db_to_compare" ] && db_to_compare="${busco_shared}/downloads/lineages/metazoa_odb10"

source "${busco_shared}/scripts/busco_init.sh"  # sets up the modules required for this in the right order

# BUSCO requires the agustus_config/ directory to be copied to a writable location,
# with AUGUSTUS_CONFIG_PATH set to that location

if [ ! -d "${labbase}/${USER}/agustus_config" ] ; then
    echo -e "Copying agustus_config/ to ${labbase}/${USER} .. "
    tar -C "${labbase}/${USER}" -xzf "${busco_shared}/agustus_config.tgz"
    echo done
fi

export AUGUSTUS_CONFIG_PATH="${labbase}/${USER}/agustus_config"
# Output goes to busco_output/ under the current directory (${labbase}/${USER}),
# unless the config file sets a different output path
cd "${labbase}/${USER}"
busco --config "${busco_shared}/scripts/busco-config.ini"  -f -c 20 --long -i "${query}" -l "${db_to_compare}" -o busco_output -m transcriptome

echo "STOP" $(date)

Running BUSCO on assembled reference transcriptome

sbatch -o ~/%u-%x.%j.out -e ~/%u-%x.%j.err \
       --export query=/data/putnamlab/kevin_wong1/20201221_P.astreoides_Ref_Transcriptome/trinity_5/trinity_out_dir.Trinity.fasta  \
       /data/putnamlab/kevin_wong1/scripts/run-busco-transcriptome.sh

Submitted batch job 1816763
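
The sbatch call above can swap in the transcriptome because the script only assigns its default query when $query is empty. Here is a minimal sketch of that pattern. resolve_query and both example paths are hypothetical names used for illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch of the override pattern used in run-busco-transcriptome.sh.
# resolve_query is a hypothetical helper, not part of the original script.
resolve_query() {
    local default_query="$1"
    # Same test as in the script: fall back only when $query is empty or unset
    [ -z "$query" ] && query="$default_query"
    echo "$query"
}

# No query in the environment: the default path is used
( unset query; resolve_query "/path/to/default_genome.fasta" )

# query set in the environment (as sbatch --export query=... does): it takes precedence
( export query="/path/to/my_transcriptome.fasta"; resolve_query "/path/to/default_genome.fasta" )
```

The same test guards db_to_compare in the script, so the lineage dataset could presumably be overridden the same way.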

Results (20210106)

# BUSCO version is: 4.0.6
# The lineage dataset is: metazoa_odb10 (Creation date: 2019-11-20, number of species: 65, number of BUSCOs: 954)
# Summarized benchmarking in BUSCO notation for file /data/putnamlab/kevin_wong1/20201221_P.astreoides_Ref_Transcriptome/trinity_5/trinity_$
# BUSCO was run in mode: transcriptome

        ***** Results: *****

        C:21.5%[S:14.6%,D:6.9%],F:36.1%,M:42.4%,n:954
        205     Complete BUSCOs (C)
        139     Complete and single-copy BUSCOs (S)
        66      Complete and duplicated BUSCOs (D)
        344     Fragmented BUSCOs (F)
        405     Missing BUSCOs (M)
        954     Total BUSCO groups searched
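
As a sanity check on the summary line, the percentages can be recomputed from the counts above. This is a minimal sketch. busco_pct is a hypothetical helper and not part of BUSCO.

```shell
#!/bin/bash
# busco_pct is a hypothetical helper: prints count/total as a percentage with one decimal place
busco_pct() {
    awk -v count="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * count / total }'
}

total=954   # total BUSCO groups searched (metazoa_odb10)

echo "C: $(busco_pct 205 "$total")%"   # complete
echo "S: $(busco_pct 139 "$total")%"   # complete and single-copy
echo "D: $(busco_pct 66 "$total")%"    # complete and duplicated
echo "F: $(busco_pct 344 "$total")%"   # fragmented
echo "M: $(busco_pct 405 "$total")%"   # missing
```

C, S, D and F reproduce the reported values. Rounding 405/954 on its own gives 42.5, while the report shows 42.4, which equals 100 - 14.6 - 6.9 - 36.1. This suggests the missing percentage is reported as the remainder of the other categories rather than rounded independently.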

Written on January 4, 2021