sequence feature location
genomic feature location
In practice, GENO advocates describing biology at the level of genomic features - i.e. defining specific terms for genes as genomic features, without duplicating them with representations of the loci where each gene resides. So we might define a class representing the human Shh gene as a 'genomic feature', but not parallel this with a 'human Shh gene locus' class. The utility of the 'genomic locus' class in the ontology is primarily to make the distinction explicit; we would use it in modeling data only if absolutely needed.
For example, we would define an 'HLA gene block' as a subclass of 'genomic feature', and assert that HLA-A, HLA-B, and HLA-C genes are part/subsequences of this HLA gene block (as opposed to modeling this as an 'HLA locus' and asserting that the HLA-A, HLA-B, and HLA-C genes occupy this locus).
genomic location
1. A genomic location (aka locus) is defined by its begin and end coordinates on a reference genome, independent of a particular sequence that may reside there. In GENO, we say that a genomic location is occupied_by a 'sequence feature' - where the identity of this feature depends on both its sequence and its location in the genome (i.e. the locus it occupies). For example, the 'ATG' sequence beginning the ORF of the human SHH gene shares the *same sequence* as the 'ATG' beginning the ORF of the human AKT gene. But these are *distinct sequence features* because they occupy different genomic locations.
2. A given genomic location (e.g. the human SHH gene locus) may be occupied by different alleles (e.g. different alleles of the SHH gene). Within the genome of a single diploid organism, there is potential for two alleles to exist at such a locus (i.e. two different versions of the SHH gene). And across genomes of all members of a species, many more alleles of the SHH gene may exist and occupy this same locus.
3. The notion of a genomic location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be occupied_by physical objects, while a genomic location is occupied_by sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic locus is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic location and the feature that resides there.
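The distinction drawn in the three notes above can be sketched in Python. This is only an illustration, not part of GENO: the class names, the `reference` field, the `occupied_by` dictionary, and every coordinate and allele string are hypothetical placeholders. Storing a single reference label on the location is a simplification made for this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomicLocation:
    """A locus: coordinates only, with no sequence attached."""
    reference: str  # hypothetical reference label (simplification)
    start: int
    end: int


@dataclass(frozen=True)
class SequenceFeature:
    """Identity depends on both the sequence and the locus occupied."""
    sequence: str
    location: GenomicLocation


# Placeholder coordinates, not real positions of SHH or AKT.
shh_start_codon = SequenceFeature("ATG", GenomicLocation("ref_A", 100, 102))
akt_start_codon = SequenceFeature("ATG", GenomicLocation("ref_B", 500, 502))

# Same sequence, different location: two distinct sequence features.
same_sequence = shh_start_codon.sequence == akt_start_codon.sequence
same_feature = shh_start_codon == akt_start_codon

# One locus may be occupied by several alleles (placeholder sequences).
gene_locus = GenomicLocation("ref_A", 100, 109)
occupied_by = defaultdict(set)
occupied_by[gene_locus].add(SequenceFeature("ATGAAACCCG", gene_locus))
occupied_by[gene_locus].add(SequenceFeature("ATGAAATCCG", gene_locus))
allele_count = len(occupied_by[gene_locus])
```

Here the location plays the role of the street address and each feature the role of a building at that address: the dictionary key stays the same while the set of occupants can grow.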
genomic locus
The location of a sequence feature in a genome, defined by its start and end position on some reference genomic coordinate system
In GENO, the notion of a Genomic Location (aka Genomic Locus) plays the same role as that of a FALDO:Region in the design pattern for describing the location of a feature of interest. We define this specific GENO class because the ontological nature of the FALDO:Region class is not clear in the context of the BFO and SO-based GENO model. We will work to resolve these questions and ideally converge these concepts in the future.
We don't link a Genomic Location to a specific reference sequence because the FALDO model (which GENO adopts, with the exception of swapping GENO:Genomic Locus for FALDO:Region) allows the start and end positions of a region to be defined on separate reference sequences. So while a given Location is conceptually associated with a single reference, in practice it can be pragmatic to define start and stop on different reference sequences.
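As a rough sketch of why the reference is attached to each position rather than to the location itself, the pattern might look like the following in Python. The names `Position`, `GenomicLocus`, and `references`, and the reference labels and coordinates, are assumptions made for illustration; they are not GENO or FALDO identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    """A single coordinate that carries its own reference sequence."""
    reference: str  # hypothetical reference label
    coordinate: int


@dataclass(frozen=True)
class GenomicLocus:
    """A locus bounded by two positions; it holds no reference itself."""
    start: Position
    end: Position

    def references(self) -> set:
        # The set of references the locus touches (one or two).
        return {self.start.reference, self.end.reference}


# The usual case: both ends are defined on one reference.
simple = GenomicLocus(Position("ref_A", 100), Position("ref_A", 250))

# The pragmatic case: start and end are defined on different references.
split = GenomicLocus(Position("ref_A", 100), Position("ref_B", 40))

simple_refs = simple.references()
split_refs = split.references()
```

Keeping the reference on the position means the second case needs no special handling, at the cost of having to inspect both ends to learn which reference a locus belongs to.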
VMC:Location