Sequence Ontology Style Guide
This guide explains the conventions used in the development and maintenance of SO and SOFA.
What is a SO term?
The purpose of SO is to standardize genome annotation. A SO term is a concept that represents a feature on a nucleotide or protein sequence. The features include both raw data, such as 'read' and 'contig', and interpreted data such as gene models: 'gene', 'transcript', 'exon' and 'splice_acceptor'. SO also includes terms describing chromosome variation and the consequences of mutation. SOFA, which stands for Sequence Ontology Feature Annotation, is a subset of SO. It contains the features that can be directly located on biological sequence and are therefore the most likely to be used in a fully or partially automated annotation effort.
Terms
The following rules apply to the naming of terms in SO and SOFA. Term names must be computer friendly, as they may be used to generate class names, variable names and the like in programming languages. Currently SOFA abides by all of the naming rules, but SO has yet to catch up.
Spelling conventions:
Where the accepted spelling differs between British and US usage, the US form is used.
Phrase spacing:
Terms do not include white space. The words in a phrase are separated by underscores, e.g. binding_site.
Case:
Terms are always in lowercase except where demanded by context, e.g. mRNA.
Numbers:
Numbers are spelled out in full where appropriate, e.g. five_prime_UTR. Terms do not start with a number. Sometimes the Arabic number is part of the accepted name, and in these cases the number is allowed. Where a term would normally start with a number, e.g. 28S RNA, the stem is placed first: RNA_28S.
Abbreviations:
If there is a common abbreviation, it is used for the name of the term, e.g. UTR.
Symbols:
Symbols are generally spelled out in full: ' = prime, + = plus, - = minus. Greek letters should be spelled out, e.g. gamma. Periods/points, slashes and hyphens are not allowed; underscores are used instead. Brackets (){}[] are not allowed.
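The mechanical parts of the naming rules above can be checked automatically. The following is a minimal sketch in Python, not part of the SO tooling; the function name check_term_name and the wording of its messages are assumptions made for illustration. It covers only the rules about white space, forbidden symbols and leading numbers. The case, spelling and abbreviation rules depend on context and still need a human reviewer.

```python
import re

# Characters the guide does not allow in term names: white space, periods,
# slashes, hyphens, brackets, and the symbols that should be spelled out
# (' as prime, + as plus).
FORBIDDEN = re.compile(r"[\s./\\\-(){}\[\]'+]")


def check_term_name(name):
    """Return a list of rule violations for a candidate term name.

    An empty list means no mechanical rule was broken. The function name
    and message texts are hypothetical, chosen for this example only.
    """
    if not name:
        return ["name is empty"]
    problems = []
    if name[0].isdigit():
        problems.append("starts with a number")
    bad = sorted(set(FORBIDDEN.findall(name)))
    if bad:
        problems.append(
            "contains forbidden characters: " + " ".join(repr(c) for c in bad)
        )
    return problems


if __name__ == "__main__":
    for candidate in ["binding_site", "five_prime_UTR", "RNA_28S",
                      "28S_RNA", "binding site", "5'_UTR"]:
        print(candidate, check_term_name(candidate))
```

Names such as binding_site, five_prime_UTR and RNA_28S pass, while 28S_RNA is reported for starting with a number and "binding site" for containing white space.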
Synonyms
Synonyms allow us to record the variant term names that have the same meaning. They facilitate searching the ontology.
Types of synonym:
- The long form of an abbreviated phrase, spelled out in full.
- Different words that mean the same thing.
Synonym rules:
- There is no limit on synonym number.
- One synonym can be used more than once.
- Synonyms do not have to be computer friendly. They can begin with numbers and include punctuation such as hyphens.
- Synonyms are always in lowercase except where demanded by context, e.g. messenger RNA.
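To illustrate how synonyms facilitate searching, here is a small sketch in Python of a lookup table that maps both term names and synonyms to terms. The function names build_synonym_index and lookup are hypothetical, and apart from "messenger RNA" the synonyms in the sample data are made up for illustration rather than taken from the ontology. Each synonym maps to a set of terms, on the assumption that "used more than once" means the same synonym may be attached to several terms.

```python
from collections import defaultdict


def build_synonym_index(terms):
    """Map each term name and synonym, lowercased, to the matching term names.

    `terms` is a dict of term name -> list of synonyms. Values in the index
    are sets because one synonym may belong to more than one term.
    """
    index = defaultdict(set)
    for name, synonyms in terms.items():
        index[name.lower()].add(name)
        for synonym in synonyms:
            index[synonym.lower()].add(name)
    return index


def lookup(index, query):
    """Return the sorted term names that match a query by name or synonym."""
    return sorted(index.get(query.lower(), set()))


if __name__ == "__main__":
    # Illustrative data only; not copied from SO.
    sample_terms = {
        "mRNA": ["messenger RNA"],
        "five_prime_UTR": ["five prime untranslated region", "5' UTR"],
    }
    index = build_synonym_index(sample_terms)
    print(lookup(index, "Messenger RNA"))
    print(lookup(index, "5' UTR"))
```

Because synonyms do not have to be computer friendly, the index accepts them as written, including spaces and punctuation, and only lowercases them for matching.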
Cross-referencing other databases
General database cross references (general dbxrefs) are used whenever a SO term has an identical meaning to an object in another database. SOFA is a subset of SO. The identical terms in SO and SOFA are marked with the dbxref SOFA:SOFA.
Definitions
The following rules and styles apply to definitions within SO.
- Each term should have a definition.
- A definition must have a dbxref.
- Definitions are agreed by the song-devel group.
- If someone volunteers a definition it is cross referenced with the lower case initials of the volunteer (e.g. a definition by Michael Ashburner has the dbxref SO:ma).
- If the definition comes from a book, the ISBN number is used as a cross reference (e.g. a dbxref to the Oxford Dictionary of Molecular Biology would be ISBN:0198506732).
- If the definition comes from a paper, the PubMed ID is used (e.g. PMID:119108).
- Occasionally the best definition is found on a website. If this is the case, the dbxref is the URL (e.g. http://aptamer.icmb.utexas.edu).
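The dbxref conventions above can be told apart by their prefix. The sketch below, in Python, shows one way a script might classify a definition dbxref. The function name classify_dbxref and the category labels it returns are assumptions made for this example, and the URL in the usage lines is a placeholder.

```python
def classify_dbxref(dbxref):
    """Guess what kind of source a definition dbxref points to.

    Returns one of the hypothetical labels "website", "book", "paper",
    "volunteer" or "unknown", based only on the prefix of the dbxref.
    """
    # A website dbxref is simply the URL.
    if dbxref.startswith(("http://", "https://")):
        return "website"
    prefix, sep, local_id = dbxref.partition(":")
    if not sep or not local_id:
        return "unknown"
    if prefix == "ISBN":
        return "book"        # e.g. ISBN:0198506732
    if prefix == "PMID":
        return "paper"       # e.g. PMID:119108
    if prefix == "SO":
        return "volunteer"   # e.g. SO:ma, the volunteer's initials
    return "unknown"


if __name__ == "__main__":
    for xref in ["SO:ma", "ISBN:0198506732", "PMID:119108",
                 "http://example.org/definition"]:  # placeholder URL
        print(xref, "->", classify_dbxref(xref))
```

The check looks only at the shape of the dbxref; it does not confirm that the ISBN, PubMed ID or URL actually exists.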
Understanding relationships in SO
Currently there are three types of relationship in SO: isa, part_of and derived_from.