GOanna on CyVerse¶
Accessing GOanna in the Discovery Environment¶
- Create an account on CyVerse (free)
- Open the CyVerse Discovery Environment (DE) and login with your CyVerse credentials.
- If you are new to the Discovery Environment (DE) the user guide can be found here.
- Click on the ‘Data’ button at the left side of the screen to access your files/folders. Upload your data to the DE.
- To access the GOanna 2.0 app click on the ‘Apps’ button at the left side of the DE.
- Search for ‘GOanna” in the search bar at the top of the ‘apps’ window (see below). The contents of the folder will appear in the main pane of the window. The GOanna app is called ‘GOanna 2.0’; click on the name to open the app.
Find Apps Easily with ‘Communities’
The GOanna 2.0 app belongs to the ‘i5k’ and ‘AgBase’ communties. You can join either of these communities and they will appear in the left-hand pane of your ‘Apps’ window (see above).
To join a community click on the person icon in the top-right corner of the Discovery Environment window and select ‘Communities’. In the ‘Communities’ window choose ‘all communities’ from the drop-down list. A list of communities will appear in the main pane of this window. Select the one you wish to join by clicking on it and then clicking on the ‘join’ button.
Using the GOanna App¶
Launching the App¶
Analysis Name:GOanna_2.0_analysis1: This menu is used to name the job you will run so that you can find it later. Analysis Name: The default name is “GOanna_2.0_analysis1”. We recommend changing the ‘analysis1’ portion of this to reflect the data you are running.
Comments: (Optional) You can add additional information in the comments section to distinguish your analyses further.
Select output folder: This is where your results will be placed. The default (recommended) is your ‘analyses’ folder.
Retain Inputs: Enabling this flag will copy all the input files into the analysis result folder.
Selecting this option will rapidly consume your allocated space. It is not recommended. Your inputs will always remain available in the folder in which you stored them.
This menu is used to select the BLAST database and your input file.
BLAST database basename: BLAST databases are created by AgBase based upon proteins that have GO available and subsetted by phyla. We recommend selecting the database most closely related to the sequence used as input.
Peptide FASTA file: Use the Browse button on the right hand side to navigate to your Data folder and select your protein sequence file.
Use this menu to select your BLAST parameters.
Transfer GO with experimental evidence only: We strongly recommend selecting the “yes” option from the dropdown menu so that only GO annotations based experimental evidence codes will be transferred . This will ensure the best quality annotations for your data.
The remaining parameters are standard BLAST parameters, and their defaults can be seen by hovering your cursor over the blue i.
Determining BLAST Parameters to Use¶
BLAST parameters are contingent on the BLAST database used and the composition of the input file, and so will change for each analysis.
Make a subset of 100 randomly selected sequences from your larger dataset and use this as the input for GOanna to test for parameters that give good alignments. The Split FASTA file app in the CyVerse Discovery Environment can be used to make a subsetted file.
- To test for good parameters use GOanna (either in CyVerse or online at AgBase) by selecting the same database you will use and setting relaxed parameters (e.g., in the CyVerse instance of GOanna, use defaults).
- Once you have run your subsetted file, use the html file to view alignments, select good alignments and note the parameters for these.
Parse query and subject bar delimited sequence identifiers: This option should be selected if you are using a protein fasta file from NCBI. These fasta files have headers with pipes that will not format correctly otherwise.
This tool uses stand alone BLAST and interprets FASTA deflines accordingly. NCBI ‘bar’-delimited deflines can be interpreted correctly using the ‘parse-deflines’ option. If the ‘parse-deflines’ option is not checked then BLAST will interpret the ID to be everything before the first space.
This menu is used to format your GO annotation results into a standard gene association file format.
Output File basename: This will be the prefix for your output files. A good name choice is to use fasta file name.
Database of query ID: Use the database that sequences were obtained from (Genbank), or a recognizable project name if these sequences are not in a database (e.g., i5k project or Smith Lab).The default is ‘user_input_db’.
‘Assigned by’ field of your GAF output file: Enter your name. This field is used to track who made the annotations. The default is ‘user’.
Taxon ID of the query species: Enter the NCBI taxon number for your species. This can be found by searching for your species name (common or scientific) in the NCBI taxon database. The default is “0000”.
Understanding Your Results¶
If all goes well, you should get 4 output files and a ‘logs’ folder.
<basename>.asn: This is standard BLAST output format that allows for conversion to other formats. You probably won’t need to look at this output.
<basename>.html: This output displays in your web browser so that you can view pairwise alignments to determine BLAST parameters.
<basename>.tsv: This is the tab-delimited BLAST output that can be opened and sorted in Excel to determine BLAST parameter values. The file contains the following columns:
- query ID
- query length
- query start
- query end
- subject ID
- subject length
- subject start
- subject end
- percent ID
- query coverage
- percent positive ID
- gap openings
- total gaps
- raw score
For more information on the BLAST output parameters see the NCBI BLAST documentation.
<basename>_goanna_gaf.tsv: This is the standard tab-separated GO annotation file format that is used by the GO Consortium and by software tools that accept GO annotation files to do GO enrichment.
If you see more files in your output folder there may have been an error in the analysis or there may have been no GO to transfer. Check the ‘condor_stderr’ file in the analysis output ‘logs’ folder.