EpiDiverse SNP Analysis
Documentation
- https://github.com/EpiDiverse/snp
- https://github.com/EpiDiverse/snp/blob/master/docs/usage.md (usage for options while running)
- Emma Strand’s Notebook Post
Set up
Make new directory:
[kevin_wong1@ssh3 Past_WGBS]$ pwd
/data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS
[kevin_wong1@ssh3 Past_WGBS]$ mkdir EpiDiverse
[kevin_wong1@ssh3 Past_WGBS]$ pwd
/data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS
[kevin_wong1@ssh3 Past_WGBS]$ cd EpiDiverse/
[kevin_wong1@ssh3 EpiDiverse]$ pwd
/data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
Download the freebayes `fasta_generate_regions.py` script (use the raw URL; the blob URL returns the GitHub HTML page rather than the script itself):
[kevin_wong1@ssh3 EpiDiverse]$ wget https://raw.githubusercontent.com/freebayes/freebayes/master/scripts/fasta_generate_regions.py
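To double-check that the download is the script itself and not an HTML page, peek at the first line (it should be a Python shebang, not <!DOCTYPE html>):
[kevin_wong1@ssh3 EpiDiverse]$ head -n 1 fasta_generate_regions.py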
Clone the EpiDiverse GitHub:
[kevin_wong1@ssh3 EpiDiverse]$ git clone https://github.com/EpiDiverse/snp.git
Create the conda environment with mamba. This step takes ~2 hours.
[kevin_wong1@ssh3 EpiDiverse]$ interactive
[kevin_wong1@n063 EpiDiverse]$ module load Mamba/22.11.1-4
[kevin_wong1@n063 EpiDiverse]$ mamba env create -f /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/snp/env/environment.yml
To view any environments: conda info --envs
You may be prompted to exit and re-enter Andromeda; that's OK. Log back in and re-enter the interactive node.
conda activate snps
Now run `sbatch episnp.sh` to submit the desired script.
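For reference, submitting and monitoring the job uses standard SLURM commands (a sketch; episnp.sh is written in the attempts below):
sbatch episnp.sh # submit the job script
squeue -u $USER # check that the job is queued or running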
Run jobs in an interactive node while being able to switch WiFi
- ssh into HPC.
- Run the following command:
screen -S <name-of-my-session>
(Replace <name-of-my-session>, including the < and >, with a descriptive name for your job; don't use spaces.)
- Run your job(s).
- Close your computer, go home, switch WiFi networks, whatever. The job will continue running!
- To get back to that session, ssh back into the HPC.
- Then, resume the screen session:
screen -r <name-of-my-session>
If you can't remember the name, you can run screen -list to list any running screen sessions.
I named my session epi_TTM.
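A quick reference for working with the session (standard screen keybindings):
# inside the session, press Ctrl-a then d to detach without killing it
screen -r epi_TTM # reattach from a new login
screen -list # show all running sessions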
Run EpiDiverse
Run the following commands:
[kevin_wong1@n063 EpiDiverse]$ screen -S epi_TTM
[kevin_wong1@n063 EpiDiverse]$ interactive
[kevin_wong1@n063 EpiDiverse]$ module load Mamba/22.11.1-4
[kevin_wong1@n063 EpiDiverse]$ conda info --envs #double check `snps` is there
[kevin_wong1@n063 EpiDiverse]$ conda activate snps
20230315 Attempt
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error=output_messages/"%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output=output_messages/"%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py (bash assignments take no spaces around = and no dots in variable names)
FASTA_GENERATE_REGIONS=/data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/bismark_deduplicated/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
This did not work… trying to resolve what looks like a container issue:
(base) [kevin_wong1@n063 EpiDiverse]$ conda init
no change /opt/software/Mamba/22.11.1-4/condabin/conda
no change /opt/software/Mamba/22.11.1-4/bin/conda
no change /opt/software/Mamba/22.11.1-4/bin/conda-env
no change /opt/software/Mamba/22.11.1-4/bin/activate
no change /opt/software/Mamba/22.11.1-4/bin/deactivate
no change /opt/software/Mamba/22.11.1-4/etc/profile.d/conda.sh
no change /opt/software/Mamba/22.11.1-4/etc/fish/conf.d/conda.fish
no change /opt/software/Mamba/22.11.1-4/shell/condabin/Conda.psm1
no change /opt/software/Mamba/22.11.1-4/shell/condabin/conda-hook.ps1
no change /opt/software/Mamba/22.11.1-4/lib/python3.10/site-packages/xontrib/conda.xsh
no change /opt/software/Mamba/22.11.1-4/etc/profile.d/conda.csh
no change /home/kevin_wong1/.bashrc
No action taken.
20230808 Attempt
We are going to try this without activating conda before running the script.
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile conda \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/bismark_deduplicated/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
(base) [kevin_wong1@n063 EpiDiverse]$ less episnp.sh_error.273737
Exception in thread "Thread-3" groovy.lang.GroovyRuntimeException: exception while reading process stream
at org.codehaus.groovy.runtime.ProcessGroovyMethods$TextDumper.run(ProcessGroovyMethods.java:496)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: java.io.IOException: Stream closed
at java.base/java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:176)
at java.base/java.io.BufferedInputStream.read1(BufferedInputStream.java:289)
at java.base/java.io.BufferedInputStream.read(BufferedInputStream.java:351)
at java.base/sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at java.base/sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at java.base/sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.base/java.io.InputStreamReader.read(InputStreamReader.java:185)
at java.base/java.io.BufferedReader.fill(BufferedReader.java:161)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:326)
at java.base/java.io.BufferedReader.readLine(BufferedReader.java:392)
at org.codehaus.groovy.runtime.ProcessGroovyMethods$TextDumper.run(ProcessGroovyMethods.java:489)
... 1 more
[kevin_wong1@ssh3 EpiDiverse]$ less episnp.sh_output.273737
Error executing process > 'SNPS:preprocessing (18-227_S170_L004_R1_001_val_1_bismark_bt2_pe.deduplicated)'
Caused by:
Failed to create Conda environment
command: conda env create --prefix /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/conda/snps-8882ee7ea1a0aa0094bec65b6ca3edc3 --file /home/kevin_wong1/.nextflow/assets/epidiverse/snp/env/environment.yml
status : 120
message:
==> WARNING: A newer version of conda exists. <==
current version: 22.11.1
latest version: 23.7.2
Please update conda by running
$ conda update -n base -c conda-forge conda
Or to minimize the number of packages updated during conda update use
conda install conda=23.7.2
It looks like we need to update or reference a different conda version. Check which conda versions are available on Andromeda:
[kevin_wong1@ssh3 EpiDiverse]$ module av -t |& grep -i conda
all/Anaconda3/2020.11
all/Anaconda3/2021.11
all/Anaconda3/2022.05
all/Anaconda3/4.2.0
all/Anaconda3/5.3.0
all/Anaconda3/default
all/Miniconda3/22.11.1-1 # I think it is using this one
all/Miniconda3/4.6.14
all/Miniconda3/4.7.10
all/Miniconda3/4.9.2
lang/Anaconda3/2020.11
lang/Anaconda3/2021.11
lang/Anaconda3/2022.05
lang/Anaconda3/4.2.0
lang/Anaconda3/5.3.0
lang/Miniconda3/22.11.1-1
lang/Miniconda3/4.6.14
lang/Miniconda3/4.7.10
lang/Miniconda3/4.9.2
Anaconda3/2020.11
Anaconda3/2021.11
Anaconda3/2022.05
Anaconda3/4.2.0
Anaconda3/5.3.0
Anaconda3/default
Miniconda3/22.11.1-1
Miniconda3/4.6.14
Miniconda3/4.7.10
Miniconda3/4.9.2
20230811 Attempt
Running the same script again but removing all previous auto-generated files.
Still got the same issue.
Error executing process > 'SNPS:preprocessing (18-227_S170_L004_R1_001_val_1_bismark_bt2_pe.deduplicated)'
Caused by:
Failed to create Conda environment
command: conda env create --prefix /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/conda/snps-8882ee7ea1a0aa0094bec65b6ca3edc3 --file /home/kevin_wong1/.nextflow/assets/epidiverse/snp/env/environment.yml
status : 120
message:
==> WARNING: A newer version of conda exists. <==
current version: 22.11.1
latest version: 23.7.2
Please update conda by running
$ conda update -n base -c conda-forge conda
Or to minimize the number of packages updated during conda update use
conda install conda=23.7.2
I am going to try this with `-profile singularity` instead.
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/bismark_deduplicated/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
Error in output file:
Error executing process > 'SNPS:preprocessing (18-346_S193_L004_R1_001_val_1_bismark_bt2_pe.deduplicated)'
Caused by:
Process `SNPS:preprocessing (18-346_S193_L004_R1_001_val_1_bismark_bt2_pe.deduplicated)` terminated with an error exit status (1)
Command executed:
samtools sort -T deleteme -m 966367642 -@ 4 \
-o sorted.bam 18-346_S193_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.bam || exit $?
samtools calmd -b sorted.bam past_filtered_assembly.fasta 1> calmd.bam 2> /dev/null && rm sorted.bam
samtools index calmd.bam
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMP is set, but APPTAINERENV_TMP is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
[E::hts_open_format] Failed to open file "18-346_S193_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.bam" : No such file or directory
samtools sort: can't open "18-346_S193_L004_R1_001_val_1_bismark_bt2_pe.deduplicated.bam": No such file or directory
Work dir:
/glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/bd/1bc1be4895ef173a94c9bbad215226
Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`
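Following that tip, the task can be inspected and replayed by hand (a sketch using the work dir from the error above):
cd /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/bd/1bc1be4895ef173a94c9bbad215226
ls -l # staged inputs are usually symlinks; check where they point
bash .command.run # re-run the task exactly as Nextflow did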
20230814 Attempt
As per the EpiDiverse instructions, I need to create a folder for each sample's .bam file.
First, I will copy the sorted bam files:
mkdir test_epi
cd ../bismark_deduplicated/
cp *.deduplicated_sorted.bam ../test_epi
Second, I will loop over the files to make a folder per sample and move each .bam into its own folder (the expected layout is sketched after the loop):
for x in ./*.bam; do
mkdir "${x%.*}" && mv "$x" "${x%.*}"
done
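After the loop, each BAM should sit in a folder of the same base name, roughly like this (sample names taken from the errors elsewhere in this post; remaining samples elided):
test_epi/
├── 18-67_S176.deduplicated_sorted/
│   └── 18-67_S176.deduplicated_sorted.bam
├── 18-202_S188.deduplicated_sorted/
│   └── 18-202_S188.deduplicated_sorted.bam
└── ...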
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
This produced an immediate error. I will try this on the non-sorted bam files.
20230815 Attempt
cd /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq
mkdir test_epidiverse2
cd bismark_deduplicated
nano epidiverse_prep
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/bismark_deduplicated
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# copy files
cp *R1_001_val_1_bismark_bt2_pe.deduplicated.bam ../test_epidiverse2
Now make folders for each file:
cd ../test_epidiverse2
for x in ./*.bam; do
mkdir "${x%.*}" && mv "$x" "${x%.*}"
done
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse2/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
Immediately stops with this error:
ERROR: cannot find valid *.bam files in dir: /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse2/
Maybe they don't have to be in separate folders? I will try again with all the sample files in one folder.
20230815 Attempt 2
cd /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq
mkdir test_epidiverse
cd bismark_deduplicated
nano epidiverse_prep.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/bismark_deduplicated
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# copy files
cp *.deduplicated_sorted.bam ../test_epidiverse
cd
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
Got this error again:
Error executing process > 'SNPS:preprocessing (18-67_S176.deduplicated_sorted)'
Caused by:
Process `SNPS:preprocessing (18-67_S176.deduplicated_sorted)` terminated with an error exit status (1)
Command executed:
samtools sort -T deleteme -m 966367642 -@ 4 \
-o sorted.bam 18-67_S176.deduplicated_sorted.bam || exit $?
samtools calmd -b sorted.bam past_filtered_assembly.fasta 1> calmd.bam 2> /dev/null && rm sorted.bam
samtools index calmd.bam
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMP is set, but APPTAINERENV_TMP is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
[E::hts_open_format] Failed to open file "18-67_S176.deduplicated_sorted.bam" : No such file or directory
samtools sort: can't open "18-67_S176.deduplicated_sorted.bam": No such file or directory
Work dir:
/glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/5d/e6bdcee831ab4508170d1317d6025f
20230816 Attempt 1
Kevin Bryan suggested this:
Can you try putting this in your batch file before calling nextflow?
`export APPTAINER_BINDPATH=/data,/glfs`
If that doesn't work, try referring to all of your paths as starting with /glfs/brick01/gv0 instead of /data. You can see that the "files" it said it couldn't find are symlinks to the /glfs path. Because the singularity container only binds what it thinks the working directory is, it binds it as /data instead of /glfs, but Nextflow created the symlinks using the canonical path (/glfs) before launching the container, so the links aren't valid.
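One way to confirm the symlink diagnosis (a sketch, run inside the failing task's work directory, using the sample name from the error above):
ls -l 18-67_S176.deduplicated_sorted.bam # staged input appears as a symlink
readlink -f 18-67_S176.deduplicated_sorted.bam # resolves to /glfs/..., a path the container has not bound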
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
#Forcing the glfs path
export APPTAINER_BINDPATH=/data,/glfs
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
This produced the same error. Let me try putting the full /glfs path:
20230816 Attempt 2
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
#Forcing the glfs path
export APPTAINER_BINDPATH=/data,/glfs/brick01/gv0
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /data/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
This also did not work. Let me try replacing all of the paths as Kevin suggested:
20230816 Attempt 3
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /glfs/brick01/gv0/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
Still getting the same error…
Error executing process > 'SNPS:preprocessing (18-67_S176.deduplicated_sorted)'
Caused by:
Process `SNPS:preprocessing (18-67_S176.deduplicated_sorted)` terminated with an error exit status (1)
Command executed:
samtools sort -T deleteme -m 966367642 -@ 4 \
-o sorted.bam 18-67_S176.deduplicated_sorted.bam || exit $?
samtools calmd -b sorted.bam past_filtered_assembly.fasta 1> calmd.bam 2> /dev/null && rm sorted.bam
samtools index calmd.bam
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMP is set, but APPTAINERENV_TMP is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
[E::hts_open_format] Failed to open file "18-67_S176.deduplicated_sorted.bam" : No such file or directory
samtools sort: can't open "18-67_S176.deduplicated_sorted.bam": No such file or directory
Work dir:
/glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/cf/d934a34af1e0ccf6bf52ac672daffa
20230816 Attempt 4
Next I will try Kevin’s other suggestion of modifying the config file:
nano /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/snp/assets/custom.config
process {
    executor = 'pbspro'

    // with conda
    module = ['Miniconda3']
    conda = "${baseDir}/env/environment.yml"

    // with docker/singularity
    container = "epidiverse/dmr"
    containerOptions = '--volume /glfs:/glfs' // adding this line here
}
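One caveat (an observation, not something verified here): `nextflow run epidiverse/snp` executes the pipeline copy cached under ~/.nextflow/assets (as the environment.yml path in the earlier error shows), so edits to this local clone's custom.config may only take effect if the file is passed explicitly with Nextflow's -c option, e.g.:
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-c /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/snp/assets/custom.config \
... # remaining options as in the script below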
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Nextflow/20.07.1 #this pipeline requires this version
module load SAMtools/1.9-foss-2018b
module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile singularity \
--input /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /glfs/brick01/gv0/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message
Got the same error :(
Error executing process > 'SNPS:preprocessing (18-202_S188.deduplicated_sorted)'
Caused by:
Process `SNPS:preprocessing (18-202_S188.deduplicated_sorted)` terminated with an error exit status (1)
Command executed:
samtools sort -T deleteme -m 966367642 -@ 4 \
-o sorted.bam 18-202_S188.deduplicated_sorted.bam || exit $?
samtools calmd -b sorted.bam past_filtered_assembly.fasta 1> calmd.bam 2> /dev/null && rm sorted.bam
samtools index calmd.bam
Command exit status:
1
Command output:
(empty)
Command error:
INFO: Environment variable SINGULARITYENV_TMP is set, but APPTAINERENV_TMP is preferred
INFO: Environment variable SINGULARITYENV_TMPDIR is set, but APPTAINERENV_TMPDIR is preferred
INFO: Environment variable SINGULARITYENV_NXF_DEBUG is set, but APPTAINERENV_NXF_DEBUG is preferred
[E::hts_open_format] Failed to open file "18-202_S188.deduplicated_sorted.bam" : No such file or directory
samtools sort: can't open "18-202_S188.deduplicated_sorted.bam": No such file or directory
Work dir:
/glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/c8/e2a9691fd0b2b35e2b03bbf311a5dd
Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line
20230821 Attempt
nano episnp.sh
#!/bin/bash
#SBATCH -t 200:00:00
#SBATCH --nodes=1 --ntasks=1 --cpus-per-task=18
#SBATCH --export=NONE
#SBATCH --account=putnamlab
#SBATCH -D /data/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=kevin_wong1@uri.edu
#SBATCH --error="%x_error.%j" #if your job fails, the error report will be put in this file
#SBATCH --output="%x_output.%j" #once your job is completed, any final job report comments will be put in this file
# load modules needed (specific need for my computer)
#source /usr/share/Modules/init/sh # load the module function
# load modules needed
echo "START" $(date)
module load Anaconda3/2022.05
module load Nextflow/20.07.1 #this pipeline requires this version
#module load SAMtools/1.9-foss-2018b
#module load Pysam/0.15.1-foss-2018b-Python-3.6.6
# define location for fasta_generate_regions.py
#fasta_generate_regions.py = ./fasta_generate_regions.py
#make conda env
conda env create --prefix /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/work/conda/snps-8882ee7ea1a0aa0094bec65b6ca3edc2 --file /home/kevin_wong1/.nextflow/assets/epidiverse/snp/env/environment.yml --force
conda activate snps
# only need to point to the input folder, not the *.bam files
NXF_VER=20.07.1 nextflow run epidiverse/snp -resume \
-profile conda \
--input /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/methylseq_trim3/WGBS_methylseq/test_epidiverse/ \
--reference /glfs/brick01/gv0/putnamlab/kevin_wong1/Past_Genome/past_filtered_assembly.fasta \
--output /glfs/brick01/gv0/putnamlab/kevin_wong1/Thermal_Transplant_WGBS/Past_WGBS/EpiDiverse/ \
--clusters \
--variants \
--coverage 5 \
--take 47 # Number of samples
echo "STOP" $(date) # this will output the time it takes to run within the output message