KofamScan Workflow
This post is inspired by E. Chille’s and D. Becker’s posts on genomic functional annotations.
Map KEGG terms to genome
KofamScan is the command-line version of the popular KofamKOALA web-based tool, used to map Kegg terms (containing pathway information) to a genes. KofamScan and KofamKoala work by using HMMER/HMMSEARCH to search against KOfam (a customized HMM database of KEGG Orthologs (KOs). Mappings are considered robust because each Kegg term has an individual pre-defined threshold that a score has to exceed in order to map to a gene. While all mappings are outputted, high scoring (significant) assignments are highlighted with an asterisk.
The commands that I used are below. In order to run KofamScan, you will need a fasta file of predicted protein sequences (preferably the same one used to run InterProScan).
General Protocol:
1. Download and inflate the Kofam database
To get the most up-to-date Kofam database, download it just before running KofamScan. You will also need to download the profiles associated with the Kofam database containing threshold information.
cd /nethome/kxw755/opt
curl -O ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz #download and unzip KO database
curl -O ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz #download and inflate profiles
gunzip ko_list.gz
tar xf profiles.tar.gz
Downloaded 20230323
Install Anaconda on pegasus (only if you have not done this before):
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
bash Anaconda3-2021.05-Linux-x86_64.sh
source /nethome/kxw755/anaconda3/bin/activate
Installed 20230323
Install KofamScan:
cd /nethome/kxw755/opt/
conda install -c bioconda kofamscan
something did not install correctly here. Need to revisit
I am going to excute this on andromeda instead
mkdir kofamscan
cd ../../data/putnamlab/kevin_wong1/kofamscan
curl -O ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz #download and unzip KO database
curl -O ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz #download and inflate profiles
gunzip ko_list.gz
tar xf profiles.tar.gz
Downloaded 20230323
Installed 20230323
2. Run KofamScan
Download Mnemi protein file:
wget https://research.nhgri.nih.gov/mnemiopsis/download/proteome/ML2.2.aa.gz
gunzip ML2.2.aa.gz
nano kofamscan_ML.sh
#!/bin/bash
#SBATCH --job-name="KofamScan"
#SBATCH -t 30-00:00:00
#SBATCH --export=NONE
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --nodes=1 --ntasks-per-node=20
#SBATCH --mem=100GB
#SBATCH -D /data/putnamlab/kevin_wong1/kofamscan
echo "Loading modules" $(date)
module load kofam_scan/1.3.0-foss-2019b
module load libyaml/0.1.5
module unload HMMER/3.3.1-foss-2019b
module load HMMER/3.3.2-gompi-2019b
module list
#echo "Starting analysis... downloading KO database" $(date)
#wget ftp://ftp.genome.jp/pub/db/kofam/ko_list.gz #download KO database
#wget ftp://ftp.genome.jp/pub/db/kofam/profiles.tar.gz
#gunzip ko_list.gz
#tar xf profiles.tar.gz
echo "Beginning mapping" $(date)
/opt/software/kofam_scan/1.3.0-foss-2019b/exec_annotation \
-o Mnemi_KO_annot.txt \
-k ./ko_list \
-p ./profiles/eukaryote.hal \
-E 0.00001 \
-f detail-tsv \
--report-unannotated ./ML2.2.aa
echo "Analysis complete!" $(date)
scp -r kevin_wong1@ssh3.hac.uri.edu:/data/putnamlab/kevin_wong1/kofamscan/Mnemi_KO_annot.txt /Users/kevinwong/MyProjects/Mnemi_Phagocyte/output/KofamScan/Mnemi_KO_annot.txt