SOBA - Sequence Ontology Bioinformatics Analysis

Introduction

The GFF3 standard is widely used for genomic data and was developed and is maintained by the GMOD community. It is based on GFF, which was originally developed during the human genome sequencing project to compare genome annotations. Unlike GFF, however, GFF3 uses an ontology, the Sequence Ontology (SO), to type the features.

Many genomic tools either consume or produce GFF3 files. For example, an automated genome annotation tool such as MAKER outputs GFF3, which can then be browsed using another GMOD tool such as GBrowse. The aim of SOBA is to provide an easy-to-use tool that summarizes the contents of a GFF3 file with regard to both the sequence and the ontology.

SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi

Who is the audience for SOBA?

SOBA is intended for people who work with genomic sequence annotation and want a summary of the contents of their annotation files. For example, SOBA would be useful at an annotation jamboree for a newly sequenced organism, where the annotations are being created. It helps those developing annotation tools to quickly evaluate updates to their tool. It also gives those doing comparative genomics analysis a high-level overview of the genomes of two organisms. SOBA complements genome browsers in that it provides a summary of all the features annotated in the genome.

Getting started

A demo

A demo is provided on the SOBA site using GFF3 from two established model organism databases: FlyBase and SGD. The demo is accessed from the main SOBA page.

Uploading a file

One or more files may be uploaded into SOBA for a single analysis. There is a limit of 1.5 GB per analysis. Select files from your local machine using the 'select files' button. The 'remove file' option deletes unneeded files from the analysis. When the list of files is complete, press the 'upload' button.
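Before uploading, it can be handy to check locally whether a set of files is likely to fit under the limit. The following is only a sketch: it assumes the limit means 1.5 GiB of combined file size, which is a guess at how SOBA counts, and the names MAX_BYTES, total_size and fits_limit are made up for this example.

```python
import os

# Assumption: the per-analysis limit is read as 1.5 GiB of combined file size.
MAX_BYTES = int(1.5 * 1024 ** 3)

def total_size(paths):
    """Return the combined size in bytes of the files at the given paths."""
    return sum(os.path.getsize(path) for path in paths)

def fits_limit(sizes, limit=MAX_BYTES):
    """True if the given file sizes (in bytes) together stay within the limit."""
    return sum(sizes) <= limit
```

For files on disk, fits_limit([total_size(paths)]) gives the answer for the whole selection.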

Viewing the analysis – step by step

SOBA provides a list of all of the genomic features in the GFF3 file and all of the sources of these features. You may choose all features and all sources, or select only those you are interested in. SOBA will display results for your selection.
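The selection step can be pictured as a filter over columns 2 (source) and 3 (type) of each GFF3 line. The sketch below is not SOBA's code; the Feature tuple, the function names and the example rows (with the invented sources toolA and toolB) are assumptions made for illustration.

```python
from collections import namedtuple

# Hypothetical record holding the columns needed for the summaries.
Feature = namedtuple("Feature", "seqid source type start end")

def parse_gff3(lines):
    """Yield a Feature per annotation line, skipping comments and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a nine-column feature line
        yield Feature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]))

def select(features, types=None, sources=None):
    """Keep features whose type and source were chosen (None means all)."""
    return [f for f in features
            if (types is None or f.type in types)
            and (sources is None or f.source in sources)]

# Made-up example rows, not taken from any real database.
example = [
    "##gff-version 3",
    "chr1\ttoolA\tgene\t335\t649\t.\t+\t.\tID=gene1",
    "chr1\ttoolA\tCDS\t335\t649\t.\t+\t0\tParent=gene1",
    "chr1\ttoolB\tgene\t1000\t1500\t.\t-\t.\tID=gene2",
]
features = list(parse_gff3(example))
toola_genes = select(features, types={"gene"}, sources={"toolA"})
```

Here toola_genes holds only the first gene, since the second gene comes from a different source.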

Feature Counts

This is the first summary and provides simple counts for each of the features in your analysis.


http://malachite.genetics.utah.edu/img/feature_counts.png
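A count like this can be reproduced locally by tallying the third column (the feature type) of the GFF3 file. This is a minimal sketch rather than SOBA's implementation; feature_counts is a name chosen for the example.

```python
from collections import Counter

def feature_counts(lines):
    """Count GFF3 feature lines per feature type (column 3)."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            counts[cols[2]] += 1
    return counts

# Made-up example rows for illustration.
rows = [
    "##gff-version 3",
    "chr1\ttoolA\tgene\t335\t649\t.\t+\t.\tID=gene1",
    "chr1\ttoolA\tCDS\t335\t649\t.\t+\t0\tParent=gene1",
    "chr1\ttoolA\tgene\t1000\t1500\t.\t-\t.\tID=gene2",
]
counts = feature_counts(rows)
```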



Feature Lengths

SOBA provides simple statistics on the length of each kind of feature: minimum length, maximum length, mean length, median length and footprint. The footprint is the percentage of the genome sequence annotated as that feature.



http://malachite.genetics.utah.edu/img/feature_lengths.png
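The statistics above can be sketched as follows. GFF3 coordinates are 1-based and inclusive, so a feature's length is end - start + 1. Two assumptions are made here that the text does not confirm: all intervals lie on one sequence of known length, and overlapping features are counted once when computing the footprint. The function names are invented for the example.

```python
from statistics import mean, median

def feature_length(start, end):
    """Length of a feature with 1-based, inclusive coordinates."""
    return end - start + 1

def merged_length(intervals):
    """Total bases covered by the intervals, counting overlaps only once."""
    covered = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered

def length_summary(intervals, genome_length):
    """Summarize one feature type; intervals is a non-empty list of (start, end)."""
    lengths = [feature_length(start, end) for start, end in intervals]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "footprint_pct": 100.0 * merged_length(intervals) / genome_length,
    }
```

For example, the intervals (1, 100), (51, 150) and (201, 250) on a 1000-base sequence cover 200 distinct bases, so the footprint comes out at 20 percent.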


Feature Distributions

SOBA provides a histogram for each kind of feature, showing the distribution of its lengths. When you select a feature, a slideshow pops up displaying the graph. The slideshow is navigable using 'Previous' and 'next'.

List of feature distributions


http://malachite.genetics.utah.edu/img/feature_distributions.png

Slideshow of a distribution


http://malachite.genetics.utah.edu/img/feature_distribution_slideshow.png
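The idea behind such a histogram is to group feature lengths into bins and count how many fall into each. The sketch below uses fixed-width bins; the bin width is an assumption, since the binning SOBA uses is not described here, and length_histogram is a name made up for the example.

```python
from collections import Counter

def length_histogram(lengths, bin_width):
    """Count lengths per fixed-width bin, keyed by the lower edge of each bin."""
    bins = Counter((length // bin_width) * bin_width for length in lengths)
    return dict(sorted(bins.items()))

# Made-up lengths, binned in steps of 500 bases.
hist = length_histogram([120, 180, 450, 999, 1000], 500)

# Print a simple text rendering, one row per bin.
for lower, count in hist.items():
    print(f"{lower}-{lower + 499}: {'#' * count}")
```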


Ontology Usage Graphs

SOBA provides a graph of the terms from the Sequence Ontology used by each source in the GFF3 file. Terms used in the file are shown in red, and their parent terms are shown in black. The relationships between the terms are also shown.

SO terms used in an annotation


http://malachite.genetics.utah.edu/img/SGD_graph.gif
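Conceptually, such a graph is built by taking the terms used in the file and walking up the ontology to collect their parents. The sketch below illustrates this with a small hand-written table of is_a relationships; the table is a toy fragment for illustration and may not match the current Sequence Ontology release, and the function names are assumptions.

```python
# Toy is_a fragment, written by hand for illustration only; it is not
# loaded from the Sequence Ontology and may differ from the real release.
IS_A = {
    "mRNA": ["mature_transcript"],
    "mature_transcript": ["transcript"],
    "transcript": ["gene_member_region"],
    "gene_member_region": ["biological_region"],
    "gene": ["biological_region"],
    "biological_region": ["region"],
    "region": ["sequence_feature"],
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable from term by following is_a links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def usage_graph(used_terms, is_a=IS_A):
    """Split the graph nodes into used terms and parent-only terms, plus edges."""
    used = set(used_terms)
    parents = set()
    for term in used:
        parents |= ancestors(term, is_a)
    parents -= used
    edges = sorted((child, parent)
                   for child in used | parents
                   for parent in is_a.get(child, []))
    return used, parents, edges

used, parents, edges = usage_graph(["gene", "mRNA"])
```

In the terms of the figure, the members of used would be drawn in red and the members of parents in black.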


Density Graphs

SOBA provides information about intron density, a measure of the number of introns per protein, as described by Yandell et al., Large-scale trends in the evolution of gene structures within 11 animal genomes. PLoS Comput Biol. 2006 Mar;2(3):e15. Epub 2006 Mar 3. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1386723/pdf/pcbi.0020015.pdf It is calculated as the number of coding introns in a gene divided by the length of the protein the gene encodes.


http://malachite.genetics.utah.edu/img/SGD_density.gif
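The calculation can be sketched as below. The first function follows the definition in the text directly. The second is an assumption-laden convenience: it treats the gaps between consecutive CDS segments of one transcript as the coding introns, and approximates the protein length in amino acids as the total CDS bases divided by three. Neither the units nor this derivation are confirmed by SOBA's documentation, and the function names are made up.

```python
def intron_density(num_coding_introns, protein_length):
    """Number of coding introns divided by the length of the encoded protein."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    return num_coding_introns / protein_length

def density_from_cds(cds_segments):
    """Estimate intron density from the (start, end) CDS segments of one transcript.

    Assumptions: each gap between consecutive CDS segments is one coding
    intron, and protein length is approximated as total CDS bases // 3.
    """
    total_bases = sum(end - start + 1 for start, end in cds_segments)
    return intron_density(len(cds_segments) - 1, total_bases // 3)

# Three made-up CDS segments: 300 coding bases, two coding introns.
density = density_from_cds([(1, 90), (201, 290), (401, 520)])
```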


Saving the analysis

The results of SOBA may be saved in the following formats:

  • pdf - allows you to save a tabular report complete with graphical figures in a formatted document.
  • text - allows you to save a tab delimited data file of the analysis.
  • html - allows you to produce a webpage of results.
  • gif - allows you to produce a downloadable image for each analysis, accessed via a tabular webpage.

Select the radio button for the format you want and press 'generate report'.
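A tab delimited report is convenient for further processing in scripts. The sketch below shows how a tab delimited table can be written and read back with Python's csv module. The column names "feature" and "count" are hypothetical, since the exact layout of SOBA's text report is not described here.

```python
import csv
import io

def write_tsv(rows, header, handle):
    """Write a header and rows as tab delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

def read_tsv(handle):
    """Read tab delimited text into a list of dicts keyed by the header."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Round trip through an in-memory buffer; the columns are hypothetical.
buffer = io.StringIO()
write_tsv([("gene", 2), ("CDS", 1)], ("feature", "count"), buffer)
buffer.seek(0)
table = read_tsv(buffer)
```

Values read back this way are strings, so numeric columns need converting before use.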

Troubleshooting

If you are certain you are using correctly formatted GFF3 and you have problems with this tool, please submit a BUG REPORT to the tracker explaining the bug and the conditions under which it occurred. To check for inconsistencies in your GFF3 file, please use a GFF3 validator such as that provided by WormBase.
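A quick local check can catch the most basic problems before a file is sent to a full validator. The sketch below is not a replacement for one: it only looks for the version header, nine tab-separated columns, and integer start and end coordinates with start not greater than end. The function name check_gff3 is made up for the example.

```python
def check_gff3(lines):
    """Return a list of (line_number, message) for basic problems found."""
    lines = list(lines)
    problems = []
    if not lines or not lines[0].startswith("##gff-version 3"):
        problems.append((1, "missing '##gff-version 3' header"))
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # the rest of the file is sequence data
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            problems.append((number, f"expected 9 columns, found {len(cols)}"))
            continue
        if not (cols[3].isdigit() and cols[4].isdigit()):
            problems.append((number, "start and end must be integers"))
        elif int(cols[3]) > int(cols[4]):
            problems.append((number, "start is greater than end"))
    return problems
```

An empty list means none of these basic checks failed; it does not mean the file is valid GFF3.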

If you would like to see SOBA do more cool stuff, please submit a FEATURE REQUEST to the tracker.

Using the Sequence Ontology tracker system

The SO tracker is hosted at SourceForge. Artifacts can be added to the tracker without logging in. Logging in to SourceForge adds the benefit of communication with the developers via the tracker, and notification of the progress of your request. The SO trackers include a feature request tracker and a bug tracker, where you can leave comments about SOBA. To add an artifact to the tracker, choose the link ‘add new’, which will take you to a form page.

Screen shot of adding a new artifact


http://malachite.genetics.utah.edu/img/add_new_bug.png



Choose the category ‘soba’, then provide a concise summary of the bug or feature request and a longer description. If it is a bug, include details such as the circumstances that caused the bug, and the system and browser you used.

Screen shot of choosing the soba category in a tracker request


http://malachite.genetics.utah.edu/img/choose_category.png


How to cite SOBA

To cite SOBA, please refer to this paper: Barry Moore, Guozhen Fan and Karen Eilbeck (2010) SOBA: sequence ontology bioinformatics analysis. Nucleic Acids Research, Advance Access published online on May 21, 2010.
