Request for feedback
As a prelude to a more official release, we invite the OBO community to examine and provide feedback on the refactored Sequence Ontology (SO) and its sibling Molecular Sequence Ontology (MSO), to which we have devoted a great deal of time and effort over several years. Because we wish to gain the support of community members who might be affected by an official transition, we are open to comments and suggestions on the representational content. There are certainly remaining errors and more work to do, but we believe the work has reached a stage at which inspection by the broader OBO community is a prudent next step.
One of the main motivations for this work was to deconflate sequence entities as independent continuants (ICs) from sequence entities as generically dependent continuants (GDCs). Although it’s been asserted in several papers that SO classes are GDCs, the definitions of most of the classes refer to ICs, and attributes/qualities are defined for many of the classes (which implies that they’re ICs). On the other hand, we assert that some of the classes don’t really make sense (or at least are unintuitive) as ICs (e.g., types of reads, contigs, assemblies). Our answer has been to create a Molecular Sequence Ontology of sequence entities as ICs and a largely parallel Sequence Ontology whose sequence entities are GDCs that are formally defined as being generically dependent on their corresponding MSO classes; the SO thus largely gets its structure via automated reasoning over these logical definitions. (The refactored SO additionally contains some of the classes that are unintuitive as ICs.) We envision the MSO being used especially by ontologists and curators in need of IC sequence entities in their representations, and the SO continuing to be the term source for sequence annotators, as we hope to minimize disruption to that important community.
A second main motivation for this work has been to align the SO with, and integrate it into, the OBO ecosystem. The current public SO is semantically rich in terms of its many class logical definitions, but it’s siloed from other OBO ontologies, with its own top-level structure. With this refactoring work, we have integrated the classes of both the MSO and SO into the IC, DC, and occurrent categories of the Basic Formal Ontology (BFO). Additionally, the sequence entity classes of the MSO have mostly been integrated into high-level classes of the Chemical Entities of Biological Interest (ChEBI) ontology. Thus, although the overwhelming majority of current sequence entity classes have been retained, the upper-level structuring of the MSO and SO is significantly different from that of the current public SO (which is why we’ve been referring to this work as refactoring).
Beyond the work described above, we have edited names and textual definitions to better conform to OBO recommendations, added logical definitions, and created new classes. We have also done substantial work on the methodology for automatically generating the SO from the MSO, so that the two ontologies don’t have to be independently maintained (a laborious and error-prone approach that we sought to avoid). Throughout, we’ve attempted to minimize substantive semantic changes to classes so as to avoid disruption.
The project’s GitHub repository is at https://github.com/The-Sequence-Ontology/MSO. We recommend starting inspection with the MSO, using the MSO.owl file. (The SO along with the MSO can be examined in MSO-SO_merged.owl.) These ontologies are pre-reasoned, so running a reasoner isn’t necessary at this stage; however, for anyone wishing to try reasoning, we’ve found that the HermiT reasoner performs well with these ontologies.
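For anyone who would rather skim the class list from a script than open an ontology editor, here is a minimal sketch using only the Python standard library. It assumes the file is serialized as RDF/XML (we have not confirmed this for MSO.owl here), and the sample document and its example.org IRIs are made up purely for illustration.

```python
import xml.etree.ElementTree as ET

# Standard namespace IRIs for OWL, RDF, and RDFS.
NS = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def class_labels(rdf_xml):
    """Map each named owl:Class IRI to its rdfs:label (None if unlabeled).

    Assumes RDF/XML input; other OWL serializations will not parse here.
    """
    root = ET.fromstring(rdf_xml)
    labels = {}
    for cls in root.iter(f"{{{NS['owl']}}}Class"):
        iri = cls.get(f"{{{NS['rdf']}}}about")
        if iri is None:
            continue  # anonymous class expression, not a named class
        label = cls.find("rdfs:label", NS)
        text = label.text if label is not None else None
        # Keep the first label found if the class is mentioned more than once.
        if labels.get(iri) is None:
            labels[iri] = text
    return labels


# Hypothetical two-class document; the IRIs are NOT real MSO identifiers.
SAMPLE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/demo_1">
    <rdfs:label>biological sequence entity</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/demo_2">
    <rdfs:label>sequence unit</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://example.org/demo_1"/>
  </owl:Class>
</rdf:RDF>"""

for iri, label in class_labels(SAMPLE).items():
    print(iri, "->", label)
```

To try it on a local copy, pass the file's text instead of SAMPLE, for example class_labels(open("MSO.owl", encoding="utf-8").read()), again assuming an RDF/XML serialization.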
A good place to start is at the top-level sequence entity, ‘biological sequence entity’ (which was so named to correspond to the top-level ChEBI class ‘chemical entity’). One axis of classification is among three primary types of sequence entities: sequence boundaries (for zero-length entities such as breakpoints, junctions, termini, and deletions), sequence units (for one-length sequence residues), and sequence unit collections (each of which is a collection of sequence units either from the same molecular entity or from different ones). Two primary categories of sequence unit collections are sequence molecular entities (an extension of ChEBI molecular entities) and sequence molecular entity extents, which are continuous stretches of two or more sequence units, either as complete sequence molecular entities or as proper parts of them; sequence molecular entity extents in turn can be sequence molecular entity regions (which are proper parts of other extents) and sequence molecular entity chains (which are not). Another primary axis of classification is by type of sequence unit, divided into amino acid sequence entities and nucleotide sequence entities, the latter of which are further divided. Finally, in addition to all of the types of sequence entities (which are the focus of the ontologies), there are sequence-entity-related qualities, dispositions, and occurrents, categorized within the corresponding BFO classes.
We hope this brief overview is helpful. Please let us know if you have any questions; we look forward to your feedback.
Cheers,
Mike Bada
Mike Sinclair
Karen Eilbeck