Geneextend on scRNAseq data

GeneExt

GeneExt is a program from the Sebé-Pedrós Lab which has a great manual that explains this probem with 3’ biased sequencing approaches in non-model organism in great detail and with helpful nuance. The program allows the user to create a modified GTF/GFF file based on a BAM file of mapping data. It appends 3’ UTRs to the genes in a way that is informed by the mapping of reads, so the length of the 3’ UTRs added can change dynamically for each gene based on where reads are acutally mapping.

See manual here

The code from this post was inspired by Zoe Dellaert.

Installing GeneExt on Pegasus

cd nethome/kxw755/
source anaconda3/bin/activate 

# Copy repo
git clone https://github.com/sebepedroslab/GeneExt.git

# create environment
conda env create --prefix geneext -f GeneExt/environment.yaml

#activate conda
conda activate /projectnb/pegasus/nethome/kxw755/geneext

# Test run 
cd GeneExt
python geneext.py -g test_data/annotation.gtf -b test_data/alignments.bam -o result.gtf --peak_perc 0

output:



      ____                 _____      _
     / ___| ___ _ __   ___| ____|_  _| |_
    | |  _ / _ \ '_ \ / _ \  _| \ \/ / __|
    | |_| |  __/ | | |  __/ |___ >  <| |_
     \____|\___|_| |_|\___|_____/_/\_\__|

          ______    ___    ______
    -----[______]==[___]==[______]===>----

    Gene model adjustment for improved single-cell RNA-seq data counting


╭──────────────────╮
│ Preflight checks │
╰──────────────────╯
Genome annotation warning: Could not find "gene" features in test_data/annotation.gtf! Trying to fix ...
Running macs2 ... 
########## macs2 FAILED ##############
return code:  1 
Output:  Traceback (most recent call last):
  File "/nethome/kxw755/conda/envs/GeneExt/geneext/bin/macs2", line 653, in <module>
    main()
  File "/nethome/kxw755/conda/envs/GeneExt/geneext/bin/macs2", line 49, in main
    from MACS2.callpeak_cmd import run
  File "/nethome/kxw755/conda/envs/GeneExt/geneext/lib/python3.9/site-packages/MACS2/callpeak_cmd.py", line 23, in <module>
    from MACS2.OptValidator import opt_validate
  File "/nethome/kxw755/conda/envs/GeneExt/geneext/lib/python3.9/site-packages/MACS2/OptValidator.py", line 20, in <module>
    from MACS2.IO.Parser import BEDParser, ELANDResultParser, ELANDMultiParser, \
  File "__init__.pxd", line 206, in init MACS2.IO.Parser
ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

Traceback (most recent call last):
  File "/projectnb/pegasus/nethome/kxw755/conda/envs/GeneExt/geneext.py", line 703, in <module>
    helper.run_macs2(tempdir+'/' + 'plus.bam','plus',tempdir,verbose = verbose)
  File "/projectnb/pegasus/nethome/kxw755/conda/envs/GeneExt/geneext/helper.py", line 79, in run_macs2
    ps.check_returncode()
  File "/nethome/kxw755/conda/envs/GeneExt/geneext/lib/python3.9/subprocess.py", line 460, in check_returncode
    raise CalledProcessError(self.returncode, self.args, self.stdout,
subprocess.CalledProcessError: Command '('macs2', 'callpeak', '-t', 'tmp/plus.bam', '-f', 'BAM', '--keep-dup', '20', '-q', '0.01', '--shift', '1', '--extsize', '100', '--broad', '--nomodel', '--min-length', '30', '-n', 'plus', '--outdir', 'tmp')' returned non-zero exit status 1.

Okay so Jill was getting this error too… lets try to fix this error.

# Reinstall numpy with a Compatible Version:
conda update numpy

# Verify version of numpy
python -c "import numpy; print(numpy.__version__)" #1.26.4

#Reinstall macs2 to Ensure Compatibility:
conda install -c bioconda macs2=2.2.7.1

# Update Pandas
conda update pandas

# check to see if they are compatable
python -c "import pandas as pd; import numpy as np; print(pd.__version__, np.__version__)"

Lets try running it again

rm -r tmp
python geneext.py -g test_data/annotation.gtf -b test_data/alignments.bam -o result.gtf --peak_perc 0

Output


      ____                 _____      _   
     / ___| ___ _ __   ___| ____|_  _| |_ 
    | |  _ / _ \ '_ \ / _ \  _| \ \/ / __|
    | |_| |  __/ | | |  __/ |___ >  <| |_ 
     \____|\___|_| |_|\___|_____/_/\_\__|
     
          ______    ___    ______    
    -----[______]==[___]==[______]===>----

    Gene model adjustment for improved single-cell RNA-seq data counting


╭──────────────────╮
│ Preflight checks │
╰──────────────────╯
Genome annotation warning: Could not find "gene" features in test_data/annotation.gtf! Trying to fix ...
╭───────────╮
│ Execution │
╰───────────╯
Running macs2 ... done
Filtering macs2 peaks ... done
Extending genes ... done
╭───────────╮
│ All done! │
╰───────────╯
Extended 23/35 genes
Median extension length: 931.0 bp

Yay! A successful installation! Now I can run this on the merged BAM output files from CellRanger.

Written on September 10, 2024