GOanna on the ARS Ceres HPC

About Ceres/Scinet

Accessing the databases

GOanna requires access to some public databases that are already available on Ceres. These need to be in your working directory when you run the program. The best way to set this up is to create symbolic links to the databases from your working directory.

  1. agbase_database: species subset to run BLAST against
ln -s /reference/data/iplant/2019-09-16/agbase_database
  1. go_info: Uniprot GO annotations
ln -s /reference/data/iplant/2019-09-16/go_info

Running GOanna on Ceres

Running programs on Ceres/Scinet

  • You’ll need to run GOanna either in interactive mode or batch mode.
  • For interactive mode, use the salloc command.
  • For batch mode, you’ll need to write a batch job submission bash script.

Running GOanna in interactive mode

Loading the module

The Scinet VRSC has installed the GOanna module. To load the module in interactive mode, run the command

module load agbase

Getting the Help and Usage Statement

goanna -h

See Help and Usage Statement

GOanna has three required parameters:

-a BLAST database basename (acceptable options are listed in the help/usage)
-c peptide FASTA file to BLAST
-o output file basename

Example Command

goanna -c AROS_10.faa -a invertebrates -o goanna_output -p -g 70 -s 900 -d RefSeq -u Monica -x 37344

Running GOanna in batch mode

Running programs on Ceres/Scinet in batch mode

Example batch job submission bash script (e.g. goanna-job.sh):

#! /bin/bash
module load agbase
goanna -c AROS_10.faa -a invertebrates -o goanna_output -p -g 70 -s 900 -d RefSeq -u Monica -x 37344

Submitting the batch job:

sbatch goanna-job.sh

GOanna Commands Explained

-a invertebrates: GOanna BLAST database to use–first of three required options.

-c AROS_10.faa: input file (peptide FASTA)–second of three required options

-o goanna_output: output file basename–last of three required options

-p: our input file has NCBI deflines. This specifies how to parse them.

-g 70: tells GOanna to keep only those matches with at least 70% identity

-s 900: tells GOanna to keep only those matches with a bitscore above 900

-d RefSeq: database of query ID. This will appear in column 1 of the GAF output file.

-u “Monica”: name to appear in column 15 of the GAF output file

-x 37344: NCBI taxon ID of input file species will appear in column 13 of the GAF output file

Understanding Your Results

If all goes well, you should get 4 output files:

<basename>.asn: This is standard BLAST output format that allows for conversion to other formats. You probably won’t need to look at this output.

<basename>.html: This output displays in your web browser so that you can view pairwise alignments to determine BLAST parameters.

<basename>.tsv: This is the tab-delimited BLAST output that can be opened and sorted in Excel to determine BLAST parameter values. The file contains the following columns:

  • Query ID
  • query length
  • query start<
  • query end
  • subject ID
  • subject length
  • subject start
  • subject end
  • e-value
  • percent ID
  • query coverage
  • percent positive ID
  • gap openings
  • total gaps
  • bitscore
  • raw score

For more information on the BLAST output parameters see the NCBI BLAST documentation.

<basename>_goanna_gaf.tsv: This is the standard tab-separated GO annotation file format that is used by the GO Consortium and by software tools that accept GO annotation files to do GO enrichment.

If you see more files in your output folder there may have been an error in the analysis or there may have been no GO to transfer. Contact us.