InterProScan on the ARS Ceres HPC

About Ceres/Scinet

Running InterProScan on Ceres/Scinet

Important

Running InterProScan with Data

Getting the Help and Usage Statement

agbase_interproscan -h

See the usage statement (iprsusage) for the full list of options.

Example batch job submission bash script (e.g. agbase_interproscan-job.sh):

#!/bin/bash
# Load the AgBase tools, then run InterProScan with GO and pathway annotation.
module load agbase
agbase_interproscan -i AROS_10.faa -d outdir -f tsv,json,xml,html,gff3,svg -g -p -n Monica -x 109069 -D i5k

Submitting the batch job:

sbatch agbase_interproscan-job.sh

Command Explained

-i AROS_10.faa: local path to input FASTA file.

-d outdir: output directory name

-f tsv,json,xml,html,gff3,svg: desired output file formats

-g: tells the tool to perform GO annotation

-p: tells the tool to perform pathway annotation

-n Monica: name of biocurator to include in column 15 of GAF output file

-x 109069: taxon ID of query species to be used in column 13 of GAF output file

-D i5k: database of query accession to be used in column 1 of GAF output file
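To check how the -n, -x and -D options ended up in the results, the three GAF columns named above can be tallied once the job has finished. The following is a minimal sketch, assuming <basename>_gaf.txt is tab-delimited with comment lines starting with "!", as in the standard GAF layout. The function name gaf_provenance and the example path are illustrative and not part of the AgBase tools.

```shell
#!/bin/bash
# Sketch: tally the distinct values of GAF column 1 (database, from -D),
# column 13 (taxon, from -x) and column 15 (assigned by, from -n).
# Assumes a tab-delimited GAF file; lines starting with "!" are skipped.
gaf_provenance() {
    awk -F'\t' '
        !/^!/ && NF >= 15 { n[$1 "\t" $13 "\t" $15]++ }
        END { for (k in n) print n[k] "\t" k }
    ' "$1" | sort
}

# Hypothetical usage (the real path depends on your output directory and basename):
#   gaf_provenance outdir/AROS_10_gaf.txt
```

Each output line is a row count followed by the database, taxon and curator values. More than one line suggests the file mixes annotations from runs with different settings.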

Understanding Your Results

InterProScan outputs: https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

Default:
- <basename>.gff3
- <basename>.tsv
- <basename>.xml

Optional:
- <basename>.json
- <basename>.html.tar.gz
- <basename>.svg.tar.gz
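A quick way to confirm that a run produced the files listed above is to test for each one by name. This is a sketch only: it assumes the files sit directly in the output directory given with -d, and it takes the basename as an argument because how the basename is derived is not stated here. The function name check_iprs_outputs is hypothetical.

```shell
#!/bin/bash
# Sketch: report which InterProScan output files exist and are non-empty.
# Usage: check_iprs_outputs <outdir> <basename> [extension ...]
# With no extensions given, the three default formats are checked.
check_iprs_outputs() {
    local outdir="$1" base="$2" missing=0 ext
    shift 2
    if [ "$#" -eq 0 ]; then
        set -- gff3 tsv xml
    fi
    for ext in "$@"; do
        if [ -s "$outdir/$base.$ext" ]; then
            echo "found: $base.$ext"
        else
            echo "missing: $base.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Hypothetical usage:
#   check_iprs_outputs outdir AROS_10
#   check_iprs_outputs outdir AROS_10 json html.tar.gz svg.tar.gz
```

The function returns a non-zero status when any file is missing or empty, so it can gate later steps in a script.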

Parser Outputs

<basename>_gaf.txt: This table follows the gene association file (GAF) format and can be used in GO enrichment analyses.

<basename>_acc_go_counts.txt: This table lists each input accession, the number of GO IDs assigned to it and the GO ID names. GO IDs are split into BP (Biological Process), MF (Molecular Function) and CC (Cellular Component).

<basename>_go_counts.txt: This table counts the number of sequences assigned to each GO ID so that the user can quickly identify all genes assigned to a particular function.

<basename>_acc_interpro_counts.txt: This table lists each input accession, the number of InterPro IDs assigned to it, the InterPro IDs themselves and the InterPro ID names.

<basename>_interpro_counts.txt: This table counts the number of sequences assigned to each InterPro ID so that the user can quickly identify all genes with a particular motif.

<basename>_acc_pathway_counts.txt: This table lists each input accession, the number of pathway IDs assigned to it and the pathway names. Multiple values are separated by a semicolon.

<basename>_pathway_counts.txt: This table counts the number of sequences assigned to each pathway ID so that the user can quickly identify all genes assigned to a pathway.

<basename>.err: This file lists any sequences that InterProScan could not analyze. For example, a sequence containing a long run of Xs will cause an error.
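Because long runs of X are named as one cause of entries in <basename>.err, it can help to screen the input FASTA before submitting the job. A rough sketch follows. The function name find_x_runs and the default threshold of 10 are assumptions chosen for illustration, since the run length at which InterProScan fails is not stated here.

```shell
#!/bin/bash
# Sketch: list FASTA records whose sequence contains a run of X residues
# at least <min_run> long. Output: sequence ID, a tab, the longest run found.
# Usage: find_x_runs <fasta> [min_run]   (min_run defaults to 10, an arbitrary choice)
find_x_runs() {
    local fasta="$1" min_run="${2:-10}"
    awk -v min="$min_run" '
        function report(   s, i, n, run, longest) {
            if (id == "") return
            s = toupper(seq); n = length(s); run = 0; longest = 0
            for (i = 1; i <= n; i++) {
                if (substr(s, i, 1) == "X") {
                    run++
                    if (run > longest) longest = run
                } else {
                    run = 0
                }
            }
            if (longest >= min) print id "\t" longest
        }
        /^>/ { report(); id = substr($1, 2); seq = ""; next }
             { seq = seq $0 }
        END  { report() }
    ' "$fasta"
}

# Hypothetical usage:
#   find_x_runs AROS_10.faa 20
```

Sequences reported here are candidates to mask, trim or remove before the run. Whether they actually fail can only be confirmed from the .err file.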

If you see additional files in your output folder, there may have been an error in the analysis, or there may have been no GO annotations to transfer. Contact us.