GENO is an OWL model of genotypes, their more fundamental sequence components, and links to related biological and experimental entities. At present many parts of the model are exploratory and set to undergo refactoring. In addition, many classes and properties have GENO URIs but are placeholders for classes that will be imported from an external ontology (e.g. SO, ChEBI, OBI, etc). Furthermore, ongoing work will implement a model of genotype-to-phenotype associations. This will support description of asserted and inferred relationships between genotypes, phenotypes, and environments, and the evidence/provenance behind these associations. Documentation is under development as well, and for now a slide deck is available at http://www.slideshare.net/mhb120/brush-icbo-2013 GENO ontology 2022-03-05 Used to annotate axioms that define identity criteria for instances of a class. is_identity_criteria proabalistic_quantifier Used to flag terms that are created for organizational purposes, e.g. to support groupings useful for defining GENO-based data models. mixin gene symbol editor preferred term The concise, meaningful, and human-friendly name for a class or property preferred by the ontology developers. (US-English) PERSON:Daniel Schober GROUP:OBI:<http://purl.obolibrary.org/obo/obi> editor preferred term example of usage A phrase describing how a term should be used and/or a citation to a work which uses it. May also include other kinds of examples that facilitate immediate understanding, such as widely known prototypes or instances of a class, or cases where a relation is said to hold. PERSON:Daniel Schober GROUP:OBI:<http://purl.obolibrary.org/obo/obi> example of usage in branch An annotation property indicating which module the terms belong to. This is currently experimental and not implemented yet.
GROUP:OBI OBI_0000277 in branch has curation status PERSON:Alan Ruttenberg PERSON:Bill Bug PERSON:Melanie Courtot has curation status definition The official definition, explaining the meaning of a class or property. Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions. 2012-04-05: Barry Smith The official OBI definition, explaining the meaning of a class or property: 'Shall be Aristotelian, formalized and normalized. Can be augmented with colloquial definitions' is terrible. Can you fix to something like: A statement of necessary and sufficient conditions explaining the meaning of an expression referring to a class or property. Alan Ruttenberg Your proposed definition is a reasonable candidate, except that it is very common that necessary and sufficient conditions are not given. Mostly they are necessary, occasionally they are necessary and sufficient or just sufficient. Often they use terms that are not themselves defined and so they effectively can't be evaluated by those criteria. On the specifics of the proposed definition: We don't have definitions of 'meaning' or 'expression' or 'property'. For 'reference' in the intended sense I think we use the term 'denotation'. For 'expression', I think you mean symbol, or identifier. For 'meaning' it differs for class and property. For class we want documentation that lets the intended reader determine whether an entity is an instance of the class, or not. For property we want documentation that lets the intended reader determine, given a pair of potential relata, whether the assertion that the relation holds is true. The 'intended reader' part suggests that we also specify who, we expect, would be able to understand the definition, and also generalizes over human and computer reader to include textual and logical definition. Personally, I am more comfortable weakening definition to documentation, with instructions as to what is desirable.
We also have the outstanding issue of how to aim different definitions to different audiences. A clinical audience reading chebi wants a different sort of definition documentation/definition from a chemistry trained audience, and similarly there is a need for a definition that is adequate for an ontologist to work with. PERSON:Daniel Schober GROUP:OBI:<http://purl.obolibrary.org/obo/obi> definition editor note An administrative note intended for its editor. It may not be included in the publication version of the ontology, so it should contain nothing necessary for end users to understand the ontology. PERSON:Daniel Schober GROUP:OBI:<http://purl.obofoundry.org/obo/obi> editor note term editor Name of editor entering the term in the file. The term editor is a point of contact for information regarding the term. The term editor may be, but is not always, the author of the definition, which may have been worked upon by several people 20110707, MC: label update to term editor and definition modified accordingly. See https://github.com/information-artifact-ontology/IAO/issues/115. PERSON:Daniel Schober GROUP:OBI:<http://purl.obolibrary.org/obo/obi> term editor alternative term An alternative name for a class or property which means the same thing as the preferred name (semantically equivalent) PERSON:Daniel Schober GROUP:OBI:<http://purl.obolibrary.org/obo/obi> alternative term definition source Formal citation, e.g. identifier in external database to indicate / attribute source(s) for the definition. Free text indicate / attribute source(s) for the definition. EXAMPLE: Author Name, URI, MeSH Term C04, PUBMED ID, Wiki uri on 31.01.2007 PERSON:Daniel Schober Discussion on obo-discuss mailing-list, see http://bit.ly/hgm99w GROUP:OBI:<http://purl.obolibrary.org/obo/obi> definition source has obsolescence reason Relates an annotation property to an obsolescence reason. 
The values of obsolescence reasons come from a list of predefined terms, instances of the class obsolescence reason specification. PERSON:Alan Ruttenberg PERSON:Melanie Courtot has obsolescence reason curator note An administrative note of use for a curator but of no use for a user PERSON:Alan Ruttenberg curator note term tracker item the URI for an OBI Terms ticket at sourceforge, such as https://sourceforge.net/p/obi/obi-terms/772/ An IRI or similar locator for a request or discussion of an ontology term. Person: Jie Zheng, Chris Stoeckert, Alan Ruttenberg Person: Jie Zheng, Chris Stoeckert, Alan Ruttenberg The 'tracker item' can associate a tracker with a specific ontology term. term tracker item ontology term requester The name of the person, project, or organization that motivated inclusion of an ontology term by requesting its addition. Person: Jie Zheng, Chris Stoeckert, Alan Ruttenberg Person: Jie Zheng, Chris Stoeckert, Alan Ruttenberg The 'term requester' can credit the person, organization or project who requested the ontology term. ontology term requester is denotator type Relates a class defined in an ontology to the type of its denotator In OWL 2 add AnnotationPropertyRange('is denotator type' 'denotator type') Alan Ruttenberg is denotator type imported from For external terms/classes, the ontology from which the term was imported PERSON:Alan Ruttenberg PERSON:Melanie Courtot GROUP:OBI:<http://purl.obolibrary.org/obo/obi> imported from expand expression to ObjectProperty: RO_0002104 Label: has plasma membrane part Annotations: IAO_0000424 "http://purl.obolibrary.org/obo/BFO_0000051 some (http://purl.org/obo/owl/GO#GO_0005886 and http://purl.obolibrary.org/obo/BFO_0000051 some ?Y)" A macro expansion tag applied to an object property (or possibly a data property) which can be used by a macro-expansion engine to generate more complex expressions from simpler ones Chris Mungall expand expression to expand assertion to ObjectProperty: RO???
Label: spatially disjoint from Annotations: expand_assertion_to "DisjointClasses: (http://purl.obolibrary.org/obo/BFO_0000051 some ?X) (http://purl.obolibrary.org/obo/BFO_0000051 some ?Y)" A macro expansion tag applied to an annotation property which can be expanded into a more detailed axiom. Chris Mungall expand assertion to first order logic expression PERSON:Alan Ruttenberg first order logic expression antisymmetric property part_of antisymmetric property xsd:true Use boolean value xsd:true to indicate that the property is an antisymmetric property Alan Ruttenberg antisymmetric property OBO foundry unique label An alternative name for a class or property which is unique across the OBO Foundry. The intended usage of that property is as follows: OBO foundry unique labels are automatically generated based on regular expressions provided by each ontology, so that SO could specify unique label = 'sequence ' + [label], etc., MA could specify 'mouse + [label]' etc. Upon importing terms, ontology developers can choose to use the 'OBO foundry unique label' for an imported term or not. The same applies to tools. PERSON:Alan Ruttenberg PERSON:Bjoern Peters PERSON:Chris Mungall PERSON:Melanie Courtot GROUP:OBO Foundry <http://obofoundry.org/> OBO foundry unique label has ID digit count Ontology: <http://purl.obolibrary.org/obo/ro/idrange/> Annotations: 'has ID prefix': "http://purl.obolibrary.org/obo/RO_" 'has ID digit count' : 7, rdfs:label "RO id policy" 'has ID policy for': "RO" Relates an ontology used to record id policy to the number of digits in the URI.
The URI is: the 'has ID prefix" annotation property value concatenated with an integer in the id range (left padded with "0"s to make this many digits) Person:Alan Ruttenberg has ID digit count has ID range allocated Datatype: idrange:1 Annotations: 'has ID range allocated to': "Chris Mungall" EquivalentTo: xsd:integer[> 2151 , <= 2300] Relates a datatype that encodes a range of integers to the name of the person or organization who can use those ids constructed in that range to define new terms Person:Alan Ruttenberg has ID range allocated to has ID policy for Ontology: <http://purl.obolibrary.org/obo/ro/idrange/> Annotations: 'has ID prefix': "http://purl.obolibrary.org/obo/RO_" 'has ID digit count' : 7, rdfs:label "RO id policy" 'has ID policy for': "RO" Relating an ontology used to record id policy to the ontology namespace whose policy it manages Person:Alan Ruttenberg has ID policy for has ID prefix Ontology: <http://purl.obolibrary.org/obo/ro/idrange/> Annotations: 'has ID prefix': "http://purl.obolibrary.org/obo/RO_" 'has ID digit count' : 7, rdfs:label "RO id policy" 'has ID policy for': "RO" Relates an ontology used to record id policy to a prefix concatenated with an integer in the id range (left padded with "0"s to make this many digits) to construct an ID for a term being created. Person:Alan Ruttenberg has ID prefix elucidation person:Alan Ruttenberg Person:Barry Smith Primitive terms in a highest-level ontology such as BFO are terms which are so basic to our understanding of reality that there is no way of defining them in a non-circular fashion. 
For these, therefore, we can provide only elucidations, supplemented by examples and by axioms elucidation has associated axiom(nl) Person:Alan Ruttenberg Person:Alan Ruttenberg An axiom associated with a term expressed using natural language has associated axiom(nl) has associated axiom(fol) Person:Alan Ruttenberg Person:Alan Ruttenberg An axiom expressed in first order logic using CLIF syntax has associated axiom(fol) is allocated id range Relates an ontology IRI to an (inclusive) range of IRIs in an OBO name space. The range is given as, e.g. "IAO_0020000-IAO_0020999" PERSON:Alan Ruttenberg Add as annotation triples in the granting ontology is allocated id range has ontology root term Ontology annotation property. Relates an ontology to a term that is a designated root term of the ontology. Display tools like OLS can use terms annotated with this property as the starting point for rendering the ontology class hierarchy. There can be more than one root. Nicolas Matentzoglu has ontology root term may be identical to An annotation relationship between two terms in an ontology that may refer to the same (natural) type but where more evidence is required before terms are merged. David Osumi-Sutherland #40 VFB Edges asserting this should be annotated with to record evidence supporting the assertion and its provenance. may be identical to scheduled for obsoletion on or after Used when the class or object is scheduled for obsoletion/deprecation on or after a particular date. Chris Mungall, Jie Zheng https://github.com/geneontology/go-ontology/issues/15532 https://github.com/information-artifact-ontology/ontology-metadata/issues/32 GO ontology scheduled for obsoletion on or after has axiom id Person:Alan Ruttenberg Person:Alan Ruttenberg A URI that is intended to be a unique label for an axiom used for tracking change to the ontology.
For an axiom expressed in different languages, each expression is given the same URI has axiom label term replaced by Use on obsolete terms, relating the term to another term that can be used as a substitute Person:Alan Ruttenberg Person:Alan Ruttenberg Add as annotation triples in the granting ontology term replaced by begin end location The reference is the resource that the position value is anchored to. For example, a contig or chromosome in a genome assembly. reference is part of has part A relation used to link sequence entities (sequences, features, qualified features, and collections thereof) to their 'attributes'. Used in lieu of RO/BFO has_quality as this relation is defined to apply to independent continuant bearers, whereas sequence entities are generically dependent continuants. http://purl.obolibrary.org/obo/so_has_quality has_sequence_attribute A relation between a material information bearer or material genetic sequence bearer and a generically dependent continuant that carries information or sequence content that the bearer encodes materializes Shortcut relation expanding to bearer_of some (concretizes some . . . ), linking a material information bearer or sequence macromolecule to some ICE or GDC sequence. bears_concretization_of is_genotype_of A relationship that holds between a biological entity and some level of genetic variation present in its genome. This relation aims to be equally as broad/inclusive as RO:0002200 ! has_phenotype. The biological entity can be an organism, a group of organisms that share a common genotype, or organism-derived entities such as cell lines or biospecimens. The genotype can be any of the various flavors of genotypes/allelotypes defined in GENO (intrinsic genotype, extrinsic genotype, effective genotype), or any genetic variation component of a genotype including variant alleles or sequence alterations.
has_genotype An antisymmetric, irreflexive (normally transitive) relation between a whole and a distinct part (source: SIO) No proper part relation anymore in RO/BFO? http://semanticscience.org/resource/SIO_000053 has_proper_part A relationship between an entity that carries a sequence (e.g. a sequence feature or collection), and the sequence it bears. has_sequence_component has_state VMC:state 'Sequence' in the context of GENO is an abstract entity representing an ordered collection of monomeric units as carried in a biological macromolecule. has_sequence A geno:intrinsic genotype 'specifies' a SO:genome. A geno:karyotype 'specifies' a geno:karyotype feature collection. A relationship between an information content entity representing a specification, and the entity it specifies. obsolete_specifies Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not. obsolete_approximates_sequence Created subproperties 'approximates_sequence' and 'resolves to sequence'. Genotypes and other sequence variant artifacts are not always expected to completely specify a sequence, but rather provide some approximation based on available knowledge. The 'resolves_to_sequence' property can be used when the sequence variant artifact is able to completely resolve a sequence, and the 'approximates_sequence' property can be used when it does not. obsolete_resolves_to_sequence An asymmetric, irreflexive (normally transitive) relation between a part and its distinct whole.
http://semanticscience.org/resource/SIO_000093 is_proper_part_of is_sequence_of is_subject_of obsolete_is_specified_by shortcut relation used to link a phenotype directly to a genotype of an organism is_phenotype_of_organism_with_genotype is_phenotype_with_genotype phenotype_has_genotype Might expand to something like: phenotype and (is_phenotype_of some (organism and (has_part some ('material genome' and (is_subject_of some (genome and (is_specified_by some genotype))))))) obsolete_is_phenotype_of_genotype A relation to link variant loci, phenotypes, or disease to the type of inheritance process they are involved in, based on how the genetic interactions between alleles at the causative locus determine the pattern of inheritance of a specific phenotype/disease from one generation to the next. Exploratory/temporary property, as we formalize our phenotypic inheritance model. obsolete_participates_in_inheritance_process A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is variant in terms of its sequence, position, or expression. has_variant_part is_variant_part_of A relation between a sequence entity (i.e. a sequence, feature, or qualified feature) and a part of this entity that is not variant. has_reference_sequence_part has_reference_part is_reference_part_of <fgf8a^ti282a> is_allele_of the 'danio rerio fgf8a' gene locus. A relation linking an instance of a variable feature (aka an allele) to a genomic location/locus it occupies. This is typically a gene locus, but a feature may be an allele of other types of named loci such as QTLs, or alleles of some unnamed locus of arbitrary size. Domain = allele Range = genomic locus (but in practice it is common to use a punned gene class IRI as the subject of this relation). 
Note that the allele <fgf8a^ti282a> is not necessarily an instance of the danio rerio fgf8a gene class, given that we adopt the SO definition of genes as 'producing a functional product'. If the <fgf8a^ti282a> allele is nonfunctional or null, it is an allele_of the danio rerio fgf8a gene class, but not an instance (rdf:type) of this class. It would, however, be an instance of a 'danio rerio fgf8a gene allele' class - because being a 'gene allele' as defined in GENO requires only occupying the genomic position of a gene, but not necessarily producing a functional product. is_sequence_variant_of To allow users to make important distinctions in discourse and modeling, GENO clearly separates the notions/levels of 'biological sequence', 'sequence feature', and 'sequence location' ('genomic locus' when found in a genome). This sets up an important terminological nuance when it comes to alleles, where we believe it correct to say that a particular genomic feature is an allele_of some genomic locus (as opposed to an allele_of some sequence or some feature). This is typically a gene locus, but even insertions falling outside of genes are considered alleles_of the locus they alter (e.g. alleles of other types of named loci such as QTLs, or alleles of some unnamed locus of arbitrary size). While conceptually it is most correct to say features are alleles_of some genomic locus, it is common practice to say that they are alleles of the class of feature defined to reside at that locus (typically a gene). Accordingly, we may write things like "fgf8a<ti282a> is an allele of the Danio rerio fgf8a gene", and we may create data where fgf8a<ti282a> is asserted as an allele_of the fgf8a gene class IRI. But here we mean more precisely that it is an allele of the locus at which the fgf8a gene resides. Allowing for this means that we don't have to create 'feature-based location/locus' terms mirroring all feature class terms already in existence (e.g. for every gene).
It is important to be clear that the location/locus that a feature is an allele_of is defined exclusively by its genomic position, and not by the sequence it may contain. This is particularly relevant when considering transgenic insertions. For example, this means that the insertion of the S. cerevisiae GAL4 gene sequence within the D. melanogaster Bx gene locus would create an allele of the D. melanogaster Bx gene, but not an allele of the S. cerevisiae GAL4 gene. The transgene that results from such an insertion, while expressing S. cerevisiae GAL4 gene sequence, is not an allele of this gene because it does not reside at the S. cerevisiae GAL4 locus. This departs from how some databases use the term 'allele' - where transgenes expressing an exogenous gene are considered to be alleles of the exogenous genes they carry. In the example above, Flybase describes the S. cerevisiae GAL4 transgene as an allele_of the S. cerevisiae GAL4 gene (and gives it the allele identifier FBal0040476). A GENO representation on the other hand would say that the S. cerevisiae GAL4 transgene derives_sequence_from the S. cerevisiae GAL4 gene, but is not an allele_of this gene. In a GENO model, FBal0040476 would be typed as a transgene insertion, but not considered an allele_of the Scer\GAL4 gene. At the end of the day, it's just semantics, but worth clarifying given the ubiquity and variable use of the term 'allele'. The GENO model attempts to define and adhere to the principled notion of positionally-defined 'alleles', and functionally-defined 'transgenes'. is_allele_of A relation used to link a variant locus instance to the gene class it is a variant of (in terms of its sequence or expression level).
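
The positional reading of is_allele_of discussed above can be illustrated with a small sketch. The `Feature` class, its field names, the locus strings, and the two helper functions are hypothetical and are not part of GENO; only the relation names is_allele_of and sequence_derives_from, and the GAL4/Bx example, come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical record for a genomic feature (not a GENO class)."""
    label: str
    occupies_locus: str                    # locus, defined purely by genomic position
    sequence_source: Optional[str] = None  # gene the sequence was taken from, if any


def is_allele_of(feature: Feature, locus: str) -> bool:
    # Allele status depends only on position, never on sequence content.
    return feature.occupies_locus == locus


def sequence_derives_from(feature: Feature, gene: str) -> bool:
    # Derivation tracks where the sequence came from, independent of position.
    return feature.sequence_source == gene


# S. cerevisiae GAL4 sequence inserted at the D. melanogaster Bx locus.
gal4_insertion = Feature(
    label="Scer GAL4 transgene insertion",
    occupies_locus="Dmel Bx",
    sequence_source="Scer GAL4",
)

print(is_allele_of(gal4_insertion, "Dmel Bx"))             # True
print(is_allele_of(gal4_insertion, "Scer GAL4"))           # False
print(sequence_derives_from(gal4_insertion, "Scer GAL4"))  # True
```

Under these assumptions the insertion is an allele of the Bx locus and derives its sequence from GAL4, but is never an allele of GAL4.
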
is_variant_instance_of formerly grouped is_allele_of and is_expression_variant_of properties under feature to class property (now renamed has_affected_locus) Domain = genomic feature instance Range = punned gene class IRI obsolete_is_genetic_variant_of A relation linking a gene class to a sequence-variant or expression-variant of the gene. has_variant_instance formerly grouped has_allele and has_expression_variant properties under class to feature property (now renamed locus_affected_by) Domain = punned gene class Range = genomic feature obsolete_has_genetic_variant A relation linking a gene class to one of its sequence-variant alleles. Domain = punned gene class Range = allele has_sequence_variant has_allele A relation between a gene targeting reagent (e.g. a morpholino or RNAi) and the class of gene it targets. This is intended to be used as an instance-class relation, used for linking an instance of a gene targeting reagent to the class of gene whose instances it targets. targets_gene A relation that holds between an instance of a genetic variation and a genomic feature (typically a gene class) that is affected in its sequence or expression. This class organizes all relations used to link genetic variation instances of any type to genomic feature classes they affect. For example, is_allele_of links a gene allele instance to its gene class (genes are represented as classes in our OWL model). Such links support phenotype propagation from alleles to genes for Monarch Initiative use cases. Use of these properties effectively puns gene class IRIs into owl:individuals in a given rdf dataset. has_affected_feature A relation between an expression-variant gene (i.e. integrated transgenes or knockdown reagent targeted genes), and the class of gene it represents. Domain = expression variant feature. Range = punned gene class This relation links an expression-variant gene instance (targeted or transgenic) to the class of gene that it represents.
For transient transgenes, this is the gene, the coding sequence need only contain as part an expressed region from a given gene to stand in an is_expression_variant_of relation to the gene class. is_expression_variant_of A relation between a genomic feature class (typically a gene class) and an instance of a sequence feature or qualified sequence feature that represents or affects some change in the sequence or expression of the genomic feature. class_to_feature_relation This is an organizational grouping class to collect all relations used to link genomic feature classes (typically genes) to an instance of a genomic sequence feature or qualified sequence feature. For example, linking a gene class IRI to an instance of an allele of that gene class. Such links support phenotype propagation from features/variants to genes (e.g. for Monarch Initiative use cases) is_feature_affected_by A relation between a gene class and a gene targeting reagent that targets it. is_target_of Domain = punned gene class Range = gene knockdown reagent is_gene_target_of A relation linking a gene class to an expression-variant of that gene. Domain = punned gene class Range = expression variant feature has_expression_variant_instance has_expression_variant A relation between two sequence features at a given genomic locus that vary in their sequence or level of expression. Decided there was no need for a contrasting is_expression_variant_with property, so removed it and this parent grouping property. This property is most commonly used to relate two different alleles of a given gene. It is not a relation between an allele and the gene it is a variant of. obsolete_is_variant_with A relation between two instances of a given gene that vary in their level of expression as a result of external factors influencing expression (e.g. gene-knockdown reagents, epigenetic modification, alteration of endogenous gene-regulation pathways).
obsolete_is_expression_variant_with A relation used to describe a context or conditions that define and/or identify an entity. Used in Monarch Data to link associations to qualifying contexts (e.g. environments or developmental stages) where the association applies. For example, a qualifying environment represents a context where genotype-phenotype associations apply - where the environment is an identity criterion for the association. Used in GENO to describe physical context of materialized sequence features that represent identifying criteria for instances of qualified sequence features. has_qualifying_context has_qualifier a relation to link a single locus complement to its zygosity. has_zygosity A relationship between a reference locus/allele and the gene class it is an allele of. is_reference_allele_of Consider obsoleting - it is likely sufficient to use the parent has_sequence_attribute property - a separate property to link to the staining intensity attribute is not really needed. has_color_value Used to link a gross chromosomal sequence feature (chromosome part) to a color value quality that inheres in the sequence feature in virtue of the staining pattern of the chromosomal DNA in which the sequence is materialized. has_staining_intensity Used to link a gene targeting reagent such as a morpholino, to an instance of a reagent targeted gene variant. relation between a molecular agent and its molecular target is_targeted_by 1. Used to specify derivation of transgene components from a gene class, or an engineered construct instance. 2. Used to specify the genetic background/strain of origin of an allele (i.e. that an allele was originally isolated from a specific background strain, and propagated into new genetic backgrounds). 3.
Used to indicate derivation of a variant mouse genotype from an ES cell line used in generating the modified mice (IMPC) Relationship between a sequence feature and a distinct, non-overlapping feature from which it derives part or all of its sequence. sequence_derives_from A relationship between a variant allele and the gene class it is an allele of. is_variant_allele_of Relationship between a sex-qualified genotype and intrinsic genotype, created specifically to support propagation of phenotypes asserted on the former to the latter for Monarch Initiative use cases. has_sex_agnostic_part A relation between a mutant allele (i.e. a rare variant present in less than 1% of a population, or an experimentally-altered variant such as a knocked-out gene in a model organism), and the gene it is a variant of. is_mutant_allele_of A relationship between a polymorphic allele and the gene class it is an allele of. is_polymorphic_allele_of A relationship between a wild-type allele and the gene class it is an allele of. is_wild_type_allele_of An organizational class to hold relations of parthood between sequences/features. has_sequence_part is_sequence_part_of Relationship between an intrinsic genotype and a sex-qualified genotype, created specifically to support propagation of phenotypes asserted on the latter to the former for Monarch Initiative use cases. is_sex_agnostic_part_of A relation that holds between two sequence features at a particular genomic location that vary in their sequence. These features will have the same position when mapped onto a reference sequence, but vary in their sequence (in whole or in part). This property is most commonly used to relate two different alleles of a given gene (e.g. a wt and mutant instance of the BRCA2 gene). It is not a relation between an allele and the class-level gene it is a variant of (for this use is_allele_of) varies_with organizational property to hold imports from faldo.
faldo properties A relation linking a qualified sequence feature to its component sequence feature. has_sequence_feature_component In GENO we define three levels of sequence artifacts: (1) biological sequences, (2) sequence features, and (3) qualified sequence features. The identity criteria for a 'biological sequence' include only its inherent sequence (the ordered string of units that comprise it). The identity criteria for a 'sequence feature' include its sequence and position (where it resides - i.e. its location based on how it maps to a reference or standard). The identity criteria for a 'qualified sequence feature' include its component sequence feature (defined by its sequence and position), and the material context of its bearer in a cell or organism. This context can include direct epigenetic modification, or being targeted by gene knockdown reagents such as morpholinos or RNAi, or being transiently overexpressed from a transgenic construct in a cell or organism. has_sequence_feature has_inferred_phenotype Property chain to propagate inferred phenotype associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype. Property chain to propagate inferred phenotype associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. Property chain to propagate inferred phenotype associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration. Property chain to propagate inferred phenotype associations from an intrinsic genotype component (e.g. a (sequence-)variant locus instance) to a gene class. Property chain to propagate inferred phenotype associations from a (sequence-)variant locus instance to a gene class (to support cases where the phenotype association is made at the level of the variant gene locus).
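
The three levels of sequence artifact and their identity criteria, described earlier under has_sequence_feature, can be sketched with value-equality records. This is only an illustration: the class names, field names, and example values below are hypothetical stand-ins, and positions and contexts are reduced to plain strings rather than FALDO regions or real context descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    residues: str  # identity: the ordered string of units only


@dataclass(frozen=True)
class SequenceFeature:
    sequence: BiologicalSequence
    position: str  # identity adds where the sequence maps on a reference


@dataclass(frozen=True)
class QualifiedSequenceFeature:
    feature: SequenceFeature
    context: str   # identity adds the material context of the bearer


seq = BiologicalSequence("ACGT")

# Same sequence at two positions: one biological sequence, two features.
f1 = SequenceFeature(seq, "chr1:100-103")
f2 = SequenceFeature(seq, "chr2:500-503")
print(f1.sequence == f2.sequence, f1 == f2)  # True False

# Same feature in two contexts: two qualified sequence features.
q1 = QualifiedSequenceFeature(f1, "untreated")
q2 = QualifiedSequenceFeature(f1, "morpholino-targeted")
print(q1.feature == q2.feature, q1 == q2)    # True False
```

Frozen dataclasses compare by field values, which mirrors the idea that each level adds one more criterion to what makes two instances the same.
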
Property chain to propagate inferred phenotype associations from an extrinsic genotype component (e.g. an expression-variant gene instance) to a gene class. Property chain to propagate inferred phenotype associations from an expression-variant gene instance to a gene class (to support cases where the phenotype association is made at the level of the expression-variant gene). Property chain to propagate inferred phenotype associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components) Proposal for a property linking variants to smaller components that are regulatory, and therefore should not inherit phenotypes. obsolete_has_regulatory_part A relation linking a sequence_alteration to the gene it alters. is_within_allele_of obsolete_is_alteration_within has_asserted_phenotype Proposal for a property linking regulatory elements to larger features of which they are a part. is_regulatory_part_of A relation linking a sequence feature to its component Position that represents an identifying criterion for sequence feature instances. For representing positional data, we advocate use of the FALDO model, which links to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. The positional_component property in GENO is meant primarily to formalize the identity criteria of sequence features and qualified sequence features, to illustrate the distinction between them. obsolete_has_position_component A relation between a nucleic acid or amino acid sequence or sequence feature, and one of its monomeric units (nucleotide or amino acid residues) has_sequence_unit A relation between two sequences or features that are considered variant with each other along their entire extents.
completely_varies_with related_condition Note that we currently do not have a property chain to propagate phenotypes to genes across sequence_derives_from relation (e.g. in cases where a Tg insertion derives expressed sequence from some gene) The property chains below are defined as explicitly as possible, but many could be shortened if we used the inferred_to_cause_condition property to construct the property chains. Where this is the case, it is noted in the annotations on the property chains. Below are the different kinds/paths of propagation we desire: 1. Propagation 'down' a genotype (from larger components to smaller ones) 2. Propagation 'up' a genotype (from smaller components to larger ones) 3. From sex-qualified genotypes down to the sex-agnostic genotype and its components (but not 'up' to a sex-qualified genotype). 4. From an effective genotype to its intrinsic and extrinsic components. 5. From genotype components to genes (note here that a separate chain is needed to propagate conditions asserted on a sequence alteration to the gene, because of the fact that the link to the gene is from the variant locus/allele). 6. (Exploratory). There are cases where we may also want inter-genotype propagation (i.e. propagation that extends beyond moving up or down a single genotype). For example, if a phenotype is asserted on a sex-qualified intrinsic genotype, we want it to infer down through its component sex-agnostic intrinsic genotype and then up to any effective genotypes of which this sex-agnostic intrinsic genotype is a part. Given the data in hand, however, the conditions for this will likely never occur, so probably ok not to implement a chain to support this. 
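The 'down' direction of propagation listed above can be illustrated with a small forward-chaining sketch in Python. This is only an illustration of how a property chain of the form `p1 o p2 -> q` behaves, not how GENO or an OWL reasoner implements it. The instance names (`genotype1`, `locus1`, `alteration1`, `conditionA`), the `apply_chains` helper, and the underscore spelling `causes_condition` are assumptions made for the example; the second chain is the shorter form `is_variant_part_of o inferred_to_cause_condition` mentioned in the annotations.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, property, object)
Chain = Tuple[Tuple[str, ...], str]    # (properties to follow in order, inferred property)


def apply_chains(triples: Iterable[Triple], chains: Iterable[Chain]) -> Set[Triple]:
    """Return the asserted triples plus all triples entailed by the chains."""
    facts: Set[Triple] = set(triples)
    chain_list: List[Chain] = list(chains)
    changed = True
    while changed:  # repeat until no chain produces a new triple
        changed = False
        for path, inferred_prop in chain_list:
            # Pair every subject with itself, then follow each property in the path.
            frontier = {(s, s) for s, _, _ in facts}
            for prop in path:
                frontier = {
                    (start, o)
                    for start, mid in frontier
                    for s, p, o in facts
                    if p == prop and s == mid
                }
            for start, end in frontier:
                new_triple = (start, inferred_prop, end)
                if new_triple not in facts:
                    facts.add(new_triple)
                    changed = True
    return facts


# Hypothetical data: a genotype with a variant locus that contains a sequence alteration.
asserted = {
    ("locus1", "is_variant_part_of", "genotype1"),
    ("alteration1", "is_variant_part_of", "locus1"),
    ("genotype1", "causes_condition", "conditionA"),
}

# Illustrative 'down' chains (assumed for this sketch).
down_chains = [
    (("is_variant_part_of", "causes_condition"), "inferred_to_cause_condition"),
    (("is_variant_part_of", "inferred_to_cause_condition"), "inferred_to_cause_condition"),
]

inferred = apply_chains(asserted, down_chains)
for triple in sorted(inferred - asserted):
    print(triple)
```

In this sketch the condition asserted on the genotype reaches the locus through the first chain and then reaches the sequence alteration through the second, which is why a chain built on `inferred_to_cause_condition` can stand in for several longer, explicit chains.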
Note that we do not want to propagate phenotypes up from sex-agnostic genotypes to sex-qualified ones (e.g. from shha<tbx392>/shha<tbx392> [AB] to shha<tbx392>/shha<tbx392> [AB](male)) - because it may not be the case that a phenotype assessed without consideration to sex will apply on a sex-specific background. So we would not create a property chain to propagate inferred condition associations from sex-agnostic intrinsic genotypes and their parts to sex-qualified intrinsic genotypes and effective genotypes that contain them (such as: has_variant_part o has_sex_agnostic_part o has_variant_part o 'causes condition') inferred_to_cause_condition This is a case of inter-genotype phenotype propagation, requiring propagation down one genotype and then up another. Given the data in hand, however, the conditions for this will likely never occur, so probably ok not to have this chain. This property chain propagates a phenotype asserted on a sex-qualified intrinsic genotype, down to its sex-agnostic genotype part, and then up to a parent effective genotype that has it as a variant part. I think this is OK in all cases, so we can implement this as the one case where we can have inter-genotype pheno propagation. But as noted, there will likely be no data that actually meets criteria to use this chain, so we can probably leave it out. Property chain to propagate inferred condition associations 'up' a genotype partonomy in the direction of sequence alteration -> VL -> VSLC -> GVC -> genotype. Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, to the component variant parts of this sex-agnostic genotype. Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. 
This chain in particular is needed to get the conditions to move past the sex-agnostic genotype and down to its parts. The following shorter chain would also suffice here: is_variant_part_of o inferred_to_cause_condition Property chain to propagate inferred condition associations 'down' a genotype partonomy in the direction of genotype -> GVC -> VSLC -> VL -> sequence alteration. Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype, through a sex-agnostic intrinsic genotype, through the component variant parts of this sex-agnostic genotype, and to the affected gene. Property chain to propagate inferred condition associations 'down' a genotype partonomy from a sex-qualified intrinsic genotype to the components of a sex-agnostic intrinsic genotype. This chain in particular is needed to get the conditions to propagate to genes. The shorter chain below would also suffice for this propagation: has_allele o inferred_to_cause_condition Property chain to propagate inferred condition associations from a sequence alteration through the variant locus to a gene class. (separate chains are needed to propagate from the variant locus to the gene class, and another to propagate from a genotype, GVC, or VSLC to the gene class). NOTE that I don't need this property chain if I have a property chain to infer a has_affected_locus link from a sequence alteration to a gene when the link is asserted from the variant locus to the gene: is_variant_part_of o has_affected_locus --> has_affected_locus Obsolete comment: Property chain to propagate inferred condition associations from an intrinsic genotype, GC, or VSLC to a gene class. (a separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class). 
The following, shorter chain, would also suffice here: has_allele o inferred_to_cause_condition -> inferred_to_cause_condition Property chain to propagate inferred condition associations from an intrinsic genotype, GVC, or VSLC to an affected gene class, or from an extrinsic genotype or component to an affected gene class. The following, shorter chain, would also suffice here: has_affected_locus o inferred_to_cause_condition -> inferred_to_cause_condition Note that a separate chain is needed to propagate from the variant locus to the gene class, and another to propagate from a sequence alteration to the gene class in cases where the link to gene is through the variant locus rather than the seq alteration. Property chain to propagate inferred condition associations from a variant locus instance to a gene class (to support cases where the phenotype association is made directly at the level of the variant locus/allele). Property chain to propagate inferred condition associations from an effective genotype through a sex-qualified intrinsic genotype to a sex-agnostic intrinsic genotype. Property chain to propagate inferred condition associations 'down' a genotype partonomy just from a sex-qualified intrinsic genotype to the immediate sex-agnostic intrinsic genotype. (An additional property chain is needed to then propagate to the intrinsic genotype components.) inferred_to_contribute_to_condition inferred_to_correlate_with_condition LOINC:LA6668-3 pathogenic_for_condition LOINC:LA26332-9 likely_pathogenic_for_condition Relation between an entity and a condition (disease, phenotype) which it does not cause or contribute to. non-causal_for_condition LOINC:LA6675-8 benign_for_condition LOINC:LA26334-5 likely_benign_for_condition LOINC:LA26333-7 has_uncertain_significance_for_condition A relation used to describe a process contextualizing the identity of an entity. has_qualifying_process A relation used to describe an environment contextualizing the identity of an entity. 
has_qualifying_environment is_candidate_variant_for A relation linking a sequence feature to the location it occupies on some reference sequence. occupies has_location Can be used to link a genomic feature to the chromosomal strand it resides on in the genome (+ or - strand, or both strands). Commonly used to link a gene to the strand it is transcribed from. on strand Holds between a copy number complement or functional copy number complement, and a genomic location that serves as a proxy for the sequence or functional element that defines the complement. Copy number complements represent sets of all copies of a particular biological sequence present in a particular genome. Their "identity" is based on their defining sequence, and the count of this sequence in the genome. The has_defining_location property is used to specify the sequence defining a copy number complement - by using a 'sequence location' as a proxy for a specific sequence that is found at this location. For copy number complements, it is the sequence at this location on some reference that defines sequences in a genome of interest that qualify for membership in the complement. For functional copy number complements (aka genetic dosage), it is the canonical function(s) performed by the sequence at this location (typically that of a gene) that helps to define sequences in a genome of interest that qualify for membership in the complement. has_defining_location Holds between a copy number complement or functional copy number complement, and the biological sequence that defines the complement. Copy number complements represent sets of all copies of a particular biological sequence present in a particular genome. Their "identity" is based on their defining sequence, and the count of this sequence in the genome. The has_defining_sequence property is used to specify the sequence defining a copy number complement. 
has_defining_sequence Holds between a copy number complement or functional copy number complement and a genomic feature that serves as a proxy for the sequence that defines the complement. Copy number complements represent sets of all copies of a particular biological sequence present in a particular genome. Their "identity" is based on their defining sequence, and the count of this sequence in the genome. The has_defining_feature property is used to specify the sequence defining a copy number complement - by using a 'sequence feature' as a proxy for the specific sequence of this feature on some reference. For copy number complements, it is the sequence of this proxy feature on some reference that defines sequences in a genome of interest that qualify for membership in the complement. For functional copy number complements (aka genetic dosage), it is the canonical function(s) performed by the sequence of this proxy feature (typically a gene) that helps to define sequences in a genome of interest that qualify for membership in the complement. has_defining_feature Relates a sequence feature location to an interval that defines its start and end position. Can be used when Interval objects are employed in representing sequence location. But start and end positions can also be directly attached to a location, avoiding the use of Interval objects. has_interval Relates a 'sequence feature location' to a sequence that it is anchored to. has_reference_sequence A role assigned to a sequence feature, collection, or genotype, e.g. serving as a 'reference' against which other sequences are compared. The RO:0000087 (has role) property cannot be used here because its domain is explicitly constrained to independent continuants, and sequence features in GENO are generically dependent continuants. sequence role is_about is a (currently) primitive relation that relates an information artifact to an entity. 
is about Denotes is a primitive, instance-level, relation obtaining between an information content entity and some portion of reality. Denotation is what happens when someone creates an information content entity E in order to specifically refer to something. The only relation between E and the thing is that E can be used to 'pick out' the thing. This relation connects those two together. Freedictionary.com sense 3: To signify directly; refer to specifically. Consider if this is the best relation for linking genotypes to the genomic entities they specify. We could use the more generic 'is about', or define a new 'specifies' relation that holds between ICEs and something it specifies the nature or creation of. denotes A relation between a planned process and a continuant participating in that process that is not created during the process. The presence of the continuant during the process is explicitly specified in the plan specification which the process realizes the concretization of. has_specified_input A relation between a planned process and a continuant participating in that process. The presence of the continuant at the end of the process is explicitly specified in the objective specification which the process realizes the concretization of. 
has_specified_output a relation between a specifically dependent continuant (the dependent) and an independent continuant (the bearer), in which the dependent specifically depends on the bearer for its existence inheres_in a relation between an independent continuant (the bearer) and a specifically dependent continuant (the dependent), in which the dependent specifically depends on the bearer for its existence bearer of a relation between a continuant and a process, in which the continuant is somehow involved in the process participates in a relation between a process and a continuant, in which the continuant is somehow involved in the process has participant A journal article is an information artifact that inheres in some number of printed journals. For each copy of the printed journal there is some quality that carries the journal article, such as a pattern of ink. The quality (a specifically dependent continuant) concretizes the journal article (a generically dependent continuant), and both depend on that copy of the printed journal (an independent continuant). A relationship between a specifically dependent continuant and a generically dependent continuant, in which the generically dependent continuant depends on some independent continuant in virtue of the fact that the specifically dependent continuant also depends on that same independent continuant. Multiple specifically dependent continuants can concretize the same generically dependent continuant. 
concretizes a relation between an independent continuant (the bearer) and a quality, in which the quality specifically depends on the bearer for its existence has quality has role a relation between an independent continuant (the bearer) and a disposition, in which the disposition specifically depends on the bearer for its existence has disposition derives from starts during ends during x overlaps y if and only if there exists some z such that x has part z and z part of y overlaps x is in taxon y if and only if y is an organism, and the relationship between x and y is one of: part of (reflexive), developmentally preceded by, derives from, secreted by, expressed. in taxon A relationship that holds between a biological entity and a phenotype. Here a phenotype is construed broadly as any kind of quality of an organism part, a collection of these qualities, or a change in quality or qualities (e.g. abnormally increased temperature). The subject of this relationship can be an organism (where the organism has the phenotype, i.e. the qualities inhere in parts of this organism), a genomic entity such as a gene or genotype (if modifications of the gene or the genotype causes the phenotype), or a condition such as a disease (such that if the condition inheres in an organism, then the organism has the phenotype). has phenotype phenotype of temporally related to p has direct input c iff c is a participant in p, c is present at the start of p, and the state of c is modified during p. has input p has output c iff c is a participant in p, c is present at the end of p, and c is not present at the beginning of p. has output is member of Example 1: a collection of sequences such as a genome being comprised of separate sequences of chromosomes Example 2: a collection of information entities such as a genotype being comprised of a background component and a variant component has member is a mereological relation between a collection and an item. 
has member input of output of obsolete_formed as result of Holds between molecular entities a and b when the execution of a activates or inhibits the activity of b molecularly controls x bounds the sequence of y iff the upstream-most part of x is upstream of or coincident with the upstream-most part of y, and the downstream-most part of x is downstream of or coincident with the downstream-most part of y Chris Mungall bounds sequence of x has subsequence y iff all of the sequence parts of y are sequence parts of x has subsequence is subsequence of x overlaps the sequence of y if and only if x has a subsequence z and z is a subsequence of y. http://biorxiv.org/content/early/2014/06/27/006650.abstract overlaps sequence of inverse of downstream of sequence of is upstream of sequence of x is downstream of the sequence of y iff either (1) x and y have sequence units, and all units of x are downstream of all units of y, or (2) x and y are sequence units, and x is either immediately downstream of y, or transitively downstream of y. is downstream of sequence of Relation between a research artifact and an entity it is used to study, in virtue of its replicating or approximating features of the studied entity. To Do: decide on scope of this relation - inclusive of computational models in domain, or only physical models? Restricted to linking biological systems and phenomena? Inclusive of only diseases in range, or broader? Matthew Brush The driving use case for this relation was to link a biological model system such as a cell line or model organism to a disease it is used to investigate, in virtue of the model system exhibiting features similar to that of the disease of interest. is model of The genetic variant 'NM_007294.3(BRCA1):c.110C>A (p.Thr37Lys)' causes or contributes to the disease 'familial breast-ovarian cancer'. An environment of exposure to arsenic causes or contributes to the phenotype of patchy skin hyperpigmentation, and the disease 'skin cancer'. 
A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some causal or contributing role that influences the condition. Note that relationships of phenotypes to organisms/strains that bear them, or diseases they are manifest in, should continue to use RO:0002200 ! 'has phenotype' and RO:0002201 ! 'phenotype of'. Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to. Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions. causes or contributes to condition A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has a causal role for the condition. causes condition A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity has some contributing role in the manifestation of the condition. contributes to condition A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the severity with which a condition manifests in an individual. 
contributes to expressivity of condition contributes to severity of condition A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity influences the frequency of the condition in a population. contributes to penetrance of condition contributes to frequency of condition A relationship between an entity (a genotype, genetic variation or environment) and a condition (a phenotype or disease) where the entity prevents or reduces the severity of a condition. Genetic variations can span any level of granularity from a full genome or genotype to an individual gene or sequence alteration. These variations can be represented at the physical level (DNA/RNA macromolecules or their parts, as in the ChEBI ontology and Molecular Sequence Ontology) or at the abstract level (generically dependent continuant sequence features that are carried by these macromolecules, as in the Sequence Ontology and Genotype Ontology). The causal relations in this hierarchy can be used in linking either physical or abstract genetic variations to phenotypes or diseases they cause or contribute to. Environments include natural environments or exposures, experimentally applied conditions, or clinical interventions. is preventative for condition A relationship between an entity and a condition (phenotype or disease) with which it exhibits a statistical dependence relationship. correlated with condition association has object association has predicate association has subject The position value is the offset along the reference where this position is found. Thus only the position value in combination with the reference determines where a position is. position Property linking a sequence or sequence feature to an integer representing its length in terms of the number of units in the sequence. has_extent Shortcut relation linking a sequence feature directly to a string representing the 'state' of its sequence - i.e. 
the ordering of units that comprise it (e.g. 'atgcagctagctaccgtcgatcg'). has_sequence_string ObsoleteDataProperty The 'rank' quantifier in Bgee gene-anatomy associations, that indicates the importance/specificity of a gene expression in a given anatomy relative to expression in other anatomies for the same gene. Property to link an assertion or association with some value quantifying its relevance or ranking. has_quantifier The starting position of a sequence feature or interval. start_position The ending position of a sequence feature or interval. end_position Property linking a biological sequence to a string representing the ordered units that comprise the sequence (e.g. 'atgcagctagctaccgtcgatcg'). has_string Describes the number of members in some set. has_count In GENO, this is used to describe things like the number of sequence features comprising a 'sequence feature set', the number of sequences in a 'biological sequence set', or the number of functional sequences defining a particular 'functional copy number complement'. has_member_count Both strands A position that is exactly known. Exact position Positive strand Superclass for the general concept of a position on a sequence. The sequence is designated with the reference predicate. We place the FALDO:Position class under GENO:genomic location, as it represents a type of genomic location with an extent of 1 (i.e. has the same start and end coordinates - representing a single position as opposed to a location spanning a longer region). FALDO Position 1 1 A region describes a length of sequence with a start position and end position that represents a feature on a sequence, e.g. a gene. From what I can tell, feature instances in data whose position is to be defined using FALDO are always mapped to a Region, and then the position of this Region is defined according to its location within some larger reference sequence. 
The exception may be feature instances that are explicitly part of the reference sequence on which its location is being defined (such that no 'mapping' to a reference is required). This suggests that, conceptually, we can think of a FALDO:Region as a subregion of a reference sequence that is mapped to from a feature of interest, in order to define its position with respect to that reference sequence. Region Negative strand Part of the coordinate system denoting on which strand the feature can be found. If you do not yet know which strand the feature is on, you should tag the position with just this class. If you know more you should use one of the subclasses. This means a region described with a '.' in GFF3. A GFF3 unstranded position does not have this type in FALDO -- those are just a 'position'. Stranded position Julius Caesar Verdi’s Requiem the Second World War your body mass index BFO 2 Reference: In all areas of empirical inquiry we encounter general terms of two sorts. First are general terms which refer to universals or types: animal; tuberculosis; surgical procedure; disease. Second, are general terms used to refer to groups of entities which instantiate a given universal but do not correspond to the extension of any subuniversal of that universal because there is nothing intrinsic to the entities in question by virtue of which they – and only they – are counted as belonging to the given group. Examples are: animal purchased by the Emperor; tuberculosis diagnosed on a Wednesday; surgical procedure performed on a patient from Stockholm; person identified as candidate for clinical trial #2056-555; person who is signatory of Form 656-PPV; painting by Leonardo da Vinci. Such terms, which represent what are called ‘specializations’ in [81 Entity doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example Werner Ceusters 'portions of reality' include 4 sorts, entities (as BFO construes them), universals, configurations, and relations. 
It is an open question as to whether entities as construed in BFO will at some point also include these other portions of reality. See, for example, 'How to track absolutely everything' at http://www.referent-tracking.com/_RTU/papers/CeustersICbookRevised.pdf An entity is anything that exists or has existed or will exist. (axiom label in BFO2 Reference: [001-001]) entity BFO 2 Reference: Continuant entities are entities which can be sliced to yield parts only along the spatial dimension, yielding for example the parts of your table which we call its legs, its top, its nails. ‘My desk stretches from the window to the door. It has spatial parts, and can be sliced (in space) in two. With respect to time, however, a thing is a continuant.’ [60, p. 240 Continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. For example, in an expansion involving bringing in some of Ceusters' other portions of reality, questions are raised as to whether universals are continuants A continuant is an entity that persists, endures, or continues to exist through time while maintaining its identity. (axiom label in BFO2 Reference: [008-002]) continuant continuant BFO 2 Reference: every occurrent that is not a temporal or spatiotemporal region is s-dependent on some independent continuant that is not a spatial region BFO 2 Reference: s-dependence obtains between every process and its participants in the sense that, as a matter of necessity, this process could not have existed unless these or those participants existed also. A process may have a succession of participants at different phases of its unfolding. Thus there may be different players on the field at different times during the course of a football game; but the process which is the entire game s-depends_on all of these players nonetheless. Some temporal parts of this process will s-depend_on on only some of the players. 
Occurrent doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. An example would be the sum of a process and the process boundary of another process. Simons uses different terminology for relations of occurrents to regions: Denote the spatio-temporal location of a given occurrent e by 'spn[e]' and call this region its span. We may say an occurrent is at its span, in any larger region, and covers any smaller region. Now suppose we have fixed a frame of reference so that we can speak not merely of spatio-temporal but also of spatial regions (places) and temporal regions (times). The spread of an occurrent, (relative to a frame of reference) is the space it exactly occupies, and its spell is likewise the time it exactly occupies. We write 'spr[e]' and 'spl[e]' respectively for the spread and spell of e, omitting mention of the frame. An occurrent is an entity that unfolds itself in time or it is the instantaneous boundary of such an entity (for example a beginning or an ending) or it is a temporal or spatiotemporal region which such an entity occupies_temporal_region or occupies_spatiotemporal_region. (axiom label in BFO2 Reference: [077-002]) occurrent occurrent a chair a heart a leg a molecule a spatial region an atom an orchestra. an organism the bottom right portion of a human torso the interior of your mouth b is an independent continuant = Def. b is a continuant which is such that there is no c and no t such that b s-depends_on c at t. (axiom label in BFO2 Reference: [017-002]) independent continuant independent continuant a process of cell-division, a beating of the heart a process of meiosis a process of sleeping the course of a disease the flight of a bird the life of an organism your process of aging. p is a process = Def. p is an occurrent that has temporal proper parts and for some time t, p s-depends_on some material entity at t. 
(axiom label in BFO2 Reference: [083-003]) BFO 2 Reference: The realm of occurrents is less pervasively marked by the presence of natural units than is the case in the realm of independent continuants. Thus there is here no counterpart of ‘object’. In BFO 1.0 ‘process’ served as such a counterpart. In BFO 2.0 ‘process’ is, rather, the occurrent counterpart of ‘material entity’. Those natural – as contrasted with engineered, which here means: deliberately executed – units which do exist in the realm of occurrents are typically either parasitic on the existence of natural units on the continuant side, or they are fiat in nature. Thus we can count lives; we can count football games; we can count chemical reactions performed in experiments or in chemical manufacturing. We cannot count the processes taking place, for instance, in an episode of insect mating behavior. Even where natural units are identifiable, for example cycles in a cyclical process such as the beating of a heart or an organism’s sleep/wake cycle, the processes in question form a sequence with no discontinuities (temporal gaps) of the sort that we find for instance where billiard balls or zebrafish or planets are separated by clear spatial gaps. Lives of organisms are process units, but they too unfold in a continuous series from other, prior processes such as fertilization, and they unfold in turn in continuous series of post-life processes such as post-mortem decay. Clear examples of boundaries of processes are almost always of the fiat sort (midnight, a time of death as declared in an operating theater or on a death certificate, the initiation of a state of war) process process an atom of element X has the disposition to decay to an atom of element Y certain people have a predisposition to colon cancer children are innately disposed to categorize objects in certain ways. 
the cell wall is disposed to filter chemicals in endocytosis and exocytosis BFO 2 Reference: Dispositions exist along a strength continuum. Weaker forms of disposition are realized in only a fraction of triggering cases. These forms occur in a significant number of cases of a similar type [89 b is a disposition means: b is a realizable entity & b’s bearer is some material entity & b is such that if it ceases to exist, then its bearer is physically changed, & b’s realization occurs when and because this bearer is in some special physical circumstances, & this realization occurs in virtue of the bearer’s physical make-up. (axiom label in BFO2 Reference: [062-002]) disposition disposition the disposition of this piece of metal to conduct electricity. the disposition of your blood to coagulate the function of your reproductive organs the role of being a doctor the role of this boundary to delineate where Utah and Colorado meet To say that b is a realizable entity is to say that b is a specifically dependent continuant that inheres in some independent continuant which is not a spatial region and is of a type instances of which are realized in processes of a correlated type. (axiom label in BFO2 Reference: [058-002]) realizable entity realizable entity the ambient temperature of this portion of air the color of a tomato the length of the circumference of your waist the mass of this piece of gold. the shape of your nose the shape of your nostril a quality is a specifically dependent continuant that, in contrast to roles and dispositions, does not require any further process in order to be realized. 
(axiom label in BFO2 Reference: [055-001]) quality quality Reciprocal specifically dependent continuants: the function of this key to open this lock and the mutually dependent disposition of this lock: to be opened by this key of one-sided specifically dependent continuants: the mass of this tomato of relational dependent continuants (multiple bearers): John’s love for Mary, the ownership relation between John and this statue, the relation of authority between John and his subordinates. the disposition of this fish to decay the function of this heart: to pump blood the mutual dependence of proton donors and acceptors in chemical reactions [79 the mutual dependence of the role predator and the role prey as played by two organisms in a given interaction the pink color of a medium rare piece of grilled filet mignon at its center the role of being a doctor the shape of this hole. the smell of this portion of mozzarella b is a relational specifically dependent continuant = Def. b is a specifically dependent continuant and there are n > 1 independent continuants c1, … cn which are not spatial regions are such that for all 1 ≤ i < j ≤ n, ci and cj share no common parts, are such that for each 1 ≤ i ≤ n, b s-depends_on ci at every time t during the course of b’s existence (axiom label in BFO2 Reference: [131-004]) b is a specifically dependent continuant = Def. b is a continuant & there is some independent continuant c which is not a spatial region and which is such that b s-depends_on c at every time t during the course of b’s existence. (axiom label in BFO2 Reference: [050-003]) Specifically dependent continuant doesn't have a closure axiom because the subclasses don't necessarily exhaust all possibilities. We're not sure what else will develop here, but for example there are questions such as what are promises, obligation, etc. 
specifically dependent continuant specifically dependent continuant John’s role of husband to Mary is dependent on Mary’s role of wife to John, and both are dependent on the object aggregate comprising John and Mary as member parts joined together through the relational quality of being married. the priest role the role of a boundary to demarcate two neighboring administrative territories the role of a building in serving as a military target the role of a stone in marking a property boundary the role of subject in a clinical trial the student role BFO 2 Reference: One major family of examples of non-rigid universals involves roles, and ontologies developed for corresponding administrative purposes may consist entirely of representatives of entities of this sort. Thus ‘professor’, defined as follows, b instance_of professor at t =Def. there is some c, c instance_of professor role & c inheres_in b at t. denotes a non-rigid universal and so also do ‘nurse’, ‘student’, ‘colonel’, ‘taxpayer’, and so forth. (These terms are all, in the jargon of philosophy, phase sortals.) By using role terms in definitions, we can create a BFO conformant treatment of such entities drawing on the fact that, while an instance of professor may be simultaneously an instance of trade union member, no instance of the type professor role is also (at any time) an instance of the type trade union member role (any more than any instance of the type color is at any time an instance of the type length). If an ontology of employment positions should be defined in terms of roles following the above pattern, this enables the ontology to do justice to the fact that individuals instantiate the corresponding universals – professor, sergeant, nurse – only during certain phases in their lives. 
b is a role means: b is a realizable entity and b exists because there is some single bearer that is in some special physical, social, or institutional set of circumstances in which this bearer does not have to be and b is not such that, if it ceases to exist, then the physical make-up of the bearer is thereby changed. (axiom label in BFO2 Reference: [061-001]) role role The entries in your database are patterns instantiated as quality instances in your hard drive. The database itself is an aggregate of such patterns. When you create the database you create a particular instance of the generically dependent continuant type database. Each entry in the database is an instance of the generically dependent continuant type IAO: information content entity. the pdf file on your laptop, the pdf file that is a copy thereof on my laptop the sequence of this protein molecule; the sequence that is a copy thereof in that protein molecule. b is a generically dependent continuant = Def. b is a continuant that g-depends_on one or more other entities. (axiom label in BFO2 Reference: [074-001]) generically dependent continuant generically dependent continuant a flame a forest fire a human being a hurricane a photon a puff of smoke a sea wave a tornado an aggregate of human beings. an energy wave an epidemic the undetached arm of a human being BFO 2 Reference: Material entities (continuants) can preserve their identity even while gaining and losing material parts. Continuants are contrasted with occurrents, which unfold themselves in successive temporal parts or phases [60 BFO 2 Reference: Object, Fiat Object Part and Object Aggregate are not intended to be exhaustive of Material Entity. Users are invited to propose new subcategories of Material Entity. BFO 2 Reference: ‘Matter’ is intended to encompass both mass and energy (we will address the ontological treatment of portions of energy in a later version of BFO). 
A portion of matter is anything that includes elementary particles among its proper or improper parts: quarks and leptons, including electrons, as the smallest particles thus far discovered; baryons (including protons and neutrons) at a higher level of granularity; atoms and molecules at still higher levels, forming the cells, organs, organisms and other material entities studied by biologists, the portions of rock studied by geologists, the fossils studied by paleontologists, and so on. Material entities are three-dimensional entities (entities extended in three spatial dimensions), as contrasted with the processes in which they participate, which are four-dimensional entities (entities extended also along the dimension of time). According to the FMA, material entities may have immaterial entities as parts – including the entities identified below as sites; for example the interior (or ‘lumen’) of your small intestine is a part of your body. BFO 2.0 embodies a decision to follow the FMA here. A material entity is an independent continuant that has some portion of matter as proper or improper continuant part. (axiom label in BFO2 Reference: [019-002]) material entity material entity Stub class to serve as root of hierarchy for imports of molecular entities from ChEBI ontology. molecular entity nucleic acid A cultured cell population that represents a genetically stable and homogenous population of cultured cells that shares a common propagation history (i.e. has been successively passaged together in culture). cell line Stub class to serve as root of hierarchy for imports of cell types from CL or other cell terminologies. cell 1. Stub class to serve as root of hierarchy for imports from an ontology of environment and experimental conditions. 2. Need to consider how to model environments in a way that covers ENVO and XCO content in a consistent and coherent way. A couple classes under Exploratory Class are relevant here. 
Consider how we might approach environments/conditions using an EQ approach analogous to how phenotypes are defined (i.e. consider environments/conditions as qualities inhering in some entity). In ENVO's alignment with the Basic Formal Ontology, this class is being considered as a subclass of a proposed BFO class "system". The relation "environed_by" is also under development. Roughly, a system which includes a material entity (at least partially) within its site and causally influences that entity may be considered to environ it. Following the completion of this alignment, this class' definition and the definitions of its subclasses will be revised. environmental system Example zebrafish intrinsic genotype: Genotype = fgf8a<ti282a/+>; shha<tb392/tb392> (AB) reference component (genomic background) = AB variant component ('genomic variation complement') = fgf8a<ti282a/+>; shha<tb392/tb392> . . . and within this variant component, there are two 'variant single locus complements' represented: allele complement 1 = fgf8a<ti282a/+> allele complement 2 = shha<tb392/tb392> and within each of these 'variant single locus complements' there is one or more variant gene locus member: in complement 1: fgf8a<ti282a> in complement 2: shha<tb392> A genomic genotype that does not specify the sex determining chromosomal features of its bearer (i.e. does not indicate the background sex chromosome complement) This modeling approach allows us to create separate genotype instances for data sources that report sex-specific phenotypes to ensure that sex-specific G2P differences are accurately described. These sex-qualified genotypes can be linked to the more general sex-agnostic intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI). 
Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome. Thus, in the genotype partonomy, a sex-qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype. genotype organismal genotype sex-agnostic intrinsic genotype In practice, most genotype instances are classified as sex-agnostic genotypes because they are not sex-specific. When a genotype is indicated to be that of a male or female, it implies a known sex chromosome complement in the genomic background. This requires us to distinguish separate 'sex-qualified' genotype instances for males and females that share a common 'sex-agnostic' genotype. For example, male and female mice of the same strain/background and containing the same set of genetic variations will have the same sex-agnostic intrinsic genotype, but different sex-qualified intrinsic genotypes (which take into account background sex chromosome sequence as identifying criteria for genotype instances). genomic genotype (sex-agnostic) An allele that varies in its sequence from what is considered the reference or canonical sequence at that location. The use of the descriptor 'variant' here is consistent with naming recommendations from the ACMG Guidelines paper here: PMID:25741868. Generally, the descriptive labels chosen for subtypes of variant allele conform to these recommendations as well, where 'variant' is used to cover mutant and polymorphic alleles. alternate allele sequence-variant feature variant feature Note that what is considered the 'reference' vs. 'variant' sequence at a given locus may be context-dependent - so being 'variant' is more a role played in a particular situation. 
A 'variant allele' contains a 'sequence alteration', or is itself a 'sequence alteration', that makes it vary_with some other allele to which it is being compared. But in any comparison of alternative sequences at a particular genomic location, the choice of a 'reference' vs the 'variant' is context-dependent - as comparisons in other contexts might consider a different feature to be the reference. So being 'variant' is more a role played in a particular situation - as an allele that is variant in one context/analysis may be considered reference in another. A variant allele can be variant along its entire extent, in which case it is considered a 'sequence alteration', or it can span a broader extent of sequence that contains sequence alteration(s) as part. An example of the former is a SNP, and an example of the latter is a variant gene allele that contains one or more point mutations in its sequence. variant allele A genomic feature set representing all 'variant single locus complements' in a single genome, which together constitute the 'variant' component of a genomic genotype. Note that even a reference feature (e.g. a wild-type gene) that is a member of a single locus complement that contains a variant allele is included in this 'genomic variation complement'. Thus, the members of this 'genomic variation complement' (which is a sequence collection) are 'single locus variant complements'. Our axiom below uses has_part rather than has_member, however, to account for the fact that many 'genomic variation complements' have only one 'single locus variant complement' as members. So because has_member is not reflexive, it is not appropriate for these cases. A 'complement' refers to an exhaustive collection of *all* objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one members. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. 
Here, a 'genomic variation complement' is the set of all 'single locus complements' in a particular genome that harbor some known variation. In model organisms, the majority of genotypes describe variation at a single location in the genome (ie only one 'single-locus variant complement') that is variant relative to some reference background. For example, the genotype instance 'fgf8a<ti282a/+>(AB)' exhibits a mutation at only one locus. But some genotypes describe variation at more than one location (e.g. a double mutant that has alterations in the fgf8a gene and the shh gene). genomic variation complement The ZFIN background 'AB' that serves as a reference as part of the genotype fgf8a^ti282a/+ (AB) A reference genome that represents the sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations). Here, a 'genomic background' would differ from a 'reference genome' in that 'background' implies a derivation of the variant from the background (which is the case for most MOD strains), whereas a reference is simply meant as a target for comparison. But in a sense all background genomes are by default reference, in that the derived variant genome is compared against it. genomic background OBI:genetic population background information background genome The reference/wild-type cd99l2 danio rerio gene allele spans bases 27,004,426-27,021,059 on Chromosome 7. The "mn004Gt" represents an experimentally-created allele of this gene, in which sequence from a gene trap construct containing an RFP marker has been inserted at the cd99l2 gene locus. The resulting gene allele includes sequence from this construct that makes it longer than the reference gene sequence, and also alters its sequence in a way that prevents it from producing a functional product. The sequence extent of this cd99l2 gene allele is determined based on how its sequence aligns with that of the canonical gene and surrounding sequence in a reference genome. 
http://useast.ensembl.org/Danio_rerio/Gene/Summary?g=ENSDARG00000056722 http://zfin.org/action/feature/feature-detail?zdbID=ZDB-ALT-111117-8 A genomic feature that represents one of a set of versions of a gene (i.e. a haplotype whose extent is that of a gene) Regarding the distinction between a 'gene' and a 'gene allele': Every zebrafish genome contains a 'gene allele' for every zebrafish gene. Many will be 'wild-type' or at least functional gene alleles. But some may be alleles that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But the 'gene allele' does exist - and its extent is that of the remaining/altered sequence based on alignment with a reference gene. Even for completely deleted genes, an allele of the gene exists (and here is equivalent to the junction corresponding to where the gene would live based on a reference alignment). This design allows us to classify genes and any variants of those genes (be they functional or not) as the same type of thing (ie a 'gene allele'), since classification is based on genomic position rather than functional capacity. This is practical for representation of variant genotypes which often carry non-functional versions of a gene at a particular locus. What is important here is specifying what is present at a locus associated with a particular gene, whether or not it is a functional gene. http://purl.obolibrary.org/obo/SO_0001023 ! allele In SO, the concept of a 'gene' is functionally defined, in that a gene necessarily produces a functional product. By contrast, the concept of a 'gene allele' here is positionally defined - representing the sequence present at the location a gene resides in a reference genome (based on sequence alignment). 
An Shh gene allele, for example, may be a fully functional wild-type version of the gene, a non-functional version carrying a deleterious point mutation, a truncated version of the gene, or even a complete deletion. In all these cases, an 'Shh gene allele' exists at the position where the canonical gene resides in the reference genome - even if the extent of this allele is different from the wild-type, or even zero in the case of the complete deletion. A genomic feature being an allele_of a gene is based on its location in a host genome - not on its sequence. This means, for example, that the insertion of the human SMN2 gene into the genome of a mouse (see http://www.informatics.jax.org/allele/MGI:3056903) DOES NOT represent an allele_of the human SMN2 gene according to the GENO model - because it is located in a mouse genome, not a human one. Rather, this is a transgenic insertion that derives_sequence_from the human SMN2 gene. If this human SMN2 gene is inserted within the mouse SMN2 gene locus (e.g. used to replace mouse SMN2 gene), the feature it creates is an allele_of the mouse SMN2 gene (one that happens to match the sequence of the human ortholog of the gene). But again, it is not an allele_of the human SMN2 gene. gene allele A sequence that serves as a standard against which other sequences at the same location are compared. The notion of a 'reference' in GENO is implemented at the level of 'biological sequence' rather than at the level of a sequence feature - i.e. we define a class for 'reference sequence' rather than 'reference sequence feature'. This is because it is at the *sequence* level that features of interest are determined to be variant or not. It is taken for granted that the *location* of the feature of interest is the same as that of the reference sequence to which it is compared, because an alignment process establishing common location always precedes the sequence comparison that determines if the feature is variant. 
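The two-step logic described above (alignment establishes a common location first, then a sequence comparison decides whether the feature is variant) can be illustrated with a minimal Python sketch. The `LocatedSequence` type, its field names, and the `is_variant` function are hypothetical names introduced only for this illustration; they are not GENO classes or properties, and the example sequences are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocatedSequence:
    """Hypothetical container: a sequence plus the location it was aligned to."""
    genome: str      # host genome the location belongs to (assumed identifier)
    location: str    # location label produced by a prior alignment step (assumed)
    sequence: str    # the biological sequence found at that location


def is_variant(feature: LocatedSequence, reference: LocatedSequence) -> bool:
    """Return True if the feature's sequence differs from the reference sequence.

    The comparison is only meaningful when both sequences were aligned to the
    same location in the same genome, so that is checked first.
    """
    if (feature.genome, feature.location) != (reference.genome, reference.location):
        raise ValueError("feature and reference must share a location")
    return feature.sequence != reference.sequence


# Illustrative (made-up) sequences at one locus.
ref = LocatedSequence("zebrafish", "locus-1", "ACGTACGT")
same = LocatedSequence("zebrafish", "locus-1", "ACGTACGT")
altered = LocatedSequence("zebrafish", "locus-1", "ACGTTCGT")
deleted = LocatedSequence("zebrafish", "locus-1", "")  # zero-extent allele

print(is_variant(same, ref), is_variant(altered, ref), is_variant(deleted, ref))
```

In this sketch a complete deletion is still a feature at the locus (with an empty sequence), mirroring the positional rather than functional notion of a gene allele.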
reference sequence A reference sequence is one that serves as a standard against which 'variant' versions of the feature are compared, or against which located sequence features within the reference region are aligned in order to assign position information. Being 'reference' does not imply anything about the frequency or function of features bearing the sequence. Only that some agent has used it to serve a reference role in defining a variant or locating a sequence. reference sequence a collection of more than one sequence feature (ie a collection of discontinuous sequence features) perhaps not same as SO:sequence collection, as here we explicitly include features that can have an extent of zero (and SO:sequence collection is a collection of regions that have an extent of at least one) 1. Note that members of this class can be features with extents of zero (e.g. junctions). This is likely different than the SO:sequence feature class which has members that are regions. obsolete sequence feature collection true A sequence feature collection comprised of discontiguous sequences from a single genome Previously called 'genetic locus collection'. Difference between 'genetic' and 'genomic', as used here, is that 'genomic' implies a feature is a heritable part of some genome, while 'genetic' implies that it is part of some feature that is capable of contributing to gene expression in a cell or other biological system. genomic feature collection Conceptually, members of this collection are meant to be about the sum total genetic material in a single cell or organism. But these members need not be associated with an actual material in a real cell or organism individual. For example, things like a 'reference genome' may not actually represent the material genome of any individual cell or organism in reality. Here, there may be no genomic material referents of the sequences in such a collection because the genome is tied to an idealized, hypothetical cell or organism instance. 
The key is that conceptually, they are still tied to the idea of being contained in a single genome. In the case of a genotype, the individual sequence members are not all about the genetic material of a single cell or organism. Rather, it is the resolved sequence contained in the genotype that is meant to be about the total genomic sequence content of a genome - which we deem acceptable for classifying as a genetic locus collection. obsolete genomic feature collection true A single locus complement that serves as a standard against which 'variant' sequences are compared reference allelic complement reference single locus feature complement Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity. Eq Class axiom: 'single locus complement' and (has_sequence_attribute some reference) SC axioms: 'has member' exactly 0 'variant allele' 'has member' only 'reference genomic feature' 'has member' some 'reference genomic feature' obsolete reference single locus complement true A single locus complement in which at least one member allele is considered variant, and/or the total number of features in the complement deviates from the normal ploidy of the reference genome (e.g. trisomy 13). variant allelic complement Instances of this class are sets comprised of all alleles at a specified genomic location where at least one allele is variant (non-reference). In diploid genomes this complement typically has two members. Note that this class also covers cases where deviant numbers of genes or chromosomes are present in a genome (e.g. trisomy of chromosome 21), even if their sequence is not variant. variant single locus complement A genome that varies at one or more loci from the sequence of some reference genome. http://purl.obolibrary.org/obo/SO_0001506 ! 
variant_genome (definition of SO term here is too vague to know if has same meaning as GENO class here) variant genome An allele whose sequence matches what is considered to be the reference sequence at that location in the genome. Being a 'reference allele' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common in a population, being an ancestral allele, or being identified first as a prototypical example of some feature or gene. For example, 'reference alleles' in characterizing SNPs often represent the allele first characterized in a reference genome, or the most common allele in a population. In model organism datasets, 'reference' alleles are typically (but not always) the 'wild-type' variant at a given locus, representing a functional and unaltered version of the feature that is part of a defined genomic background, and against which natural or experimentally-induced alterations are compared. reference allele A genomic feature known to exist, but remaining uncharacterized with respect to its identity (e.g. which allele exists at a given gene locus). Used as a term of convenience for describing data reporting unspecified alleles in a genotype (i.e. in cases where zygosity for a given locus is not known). Typically recorded in genotype syntaxes as a ' /? '. Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity. Eq Class def: 'genomic feature' and (has_sequence_attribute some unspecified) An unspecified feature is known to exist as the partner of a characterized allele when the zygosity at that locus is not known. Its specific sequence/identity, however, is unknown (ie whether it is a reference or variant allele). 
obsolete unspecified feature true A junction found at a chromosomal position where an insertion has occurred on the homologous chromosome, such that the junction represents the reference feature paired with the hemizygously inserted feature. hemizygous reference junction Eliminating unnecessary defined/organizational classes. Former logical def: junction and (has_sequence_attribute some reference) Subclass axiom: is_variant_with some insertion In the case of a transgenic insertion that creates a hemizygous locus, the reference locus that this insertion is variant_with is the junction on the homologous chromosome at the same position where the insertion occurred. This is the 'hemizygous reference' junction. The junction-insertion pair represents the allelic complement at that locus, which is considered to be hemizygous. Most genotype syntaxes represent this hemizygous state with a ' /0' notation. obsolete reference junction true A gene that originates from the genome of a danio rerio. danio rerio gene A gene that originates from the genome of a homo sapiens. homo sapiens gene A gene that originates from the genome of a mus musculus. mus musculus gene A reference human sonic hedgehog (shh) gene spans bases 155,592,680-155,604,967 on Chromosome 7, according to genome build GRCh37, and produces a primary functional transcript that is 4454 bp in length and produces a 462 amino acid protein involved in cell signaling events behind various aspects of cell differentiation and development. http://useast.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000164690 Note that this may be slightly different than the extent described in other gene databases, such as Entrez Gene:http://www.ncbi.nlm.nih.gov/gene/6469 A version/allele of a gene that serves as a standard against which variant genes are compared. reference gene Not required at present for any specific use case, so marking as exploratory and obsoleting for simplicity. 
Eq Class axiom: 'gene allele' and (has_sequence_attribute some reference) SC axioms: is_variant_with some 'gene allele' is_reference_allele_of some gene Being a 'reference gene' is a role or status assigned in the context of a specific dataset or analysis. In human variation datasets, 'reference' status is typically assigned based on factors such as being the most common version/allele in a population, being an ancestral allele, or being identified first as a prototypical example of a gene. In model organism datasets, 'reference' genes are typically the 'wild-type' allele for a given gene, representing a functional and unaltered version of the gene that is part of a defined genomic background, and against which natural or experimentally-induced versions are compared. obsolete reference gene allele true obsolete experimental insertion true gene trap insertion A transgene that has been integrated into a chromosome in the host genome. An integrated transgene differs from a transgenic insertion in that a transgenic insertion may contain a single transgene, a partial transgene that needs endogenous sequences from the host genome to become functional (e.g. an enhancer trap), or multiple transgenes (i.e. be polycistronic). Furthermore, the transgenic insertion may contain sequences in addition to its transgene(s) - e.g. sequences flanking the transgene required for integration or replication/maintenance in the host genome. The term 'integrated transgene' covers individual transgenes that were delivered in whole or in part by a transgenic insertion. An 'integrated transgene' differs from its parent 'transgene' in that transgenes can include genes introduced into a cell/organism on an extra-chromosomal plasmid that is never integrated into the host genome. 
integrated transgene A nucleic acid macromolecule that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or is capable of being replicated and inherited through successive generations of progeny. 1. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genomic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome. 2. A challenging edge case is experimentally delivered DNA into a terminally differentiated cell that will never divide. Such material does technically meet our definition - since we are careful to say that the material must be *capable of* being stably inherited through subsequent generations. Thus, we would say that *if* the cell were to resume replication, the material would be heritable in this way. 1. Genomic material here is considered as a DNA or RNA molecule that is found in a cell or virus, and capable of being replicated and inherited by progeny cells or virus. As such, this nucleic acid is either chromosomal DNA, or some replicative epi-chromosomal plasmid or transposon. Genetic material is necessarily part of some 'material genome', and both are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some genome. 2. Genomic material need not be inherited from an immediate ancestor cell or organism (e.g. a replicative plasmid or transposon acquired through some experimental modification), but such cases must be capable of being inherited by progeny cells or organisms. genomic material A material entity that represents all genetic material in a cell or virion. 
The material genome is typically a molecular aggregate of all the chromosomal DNA and epi-chromosomal DNA that represents all sequences that are heritable by progeny of a cell or virion. physical genome A genome is the collection of all nucleic acids in a cell or virus, representing all of an organism's hereditary information. It is typically DNA, but many viruses have RNA genomes. The genome includes both nuclear chromosomes (ie nuclear and micronucleus chromosomes) and cytoplasmic chromosomes stored in various organelles (e.g. mitochondrial or chloroplast chromosomes), and can in addition contain non-chromosomal elements such as replicative viruses, plasmids, and transposable elements. Note that at present, a material genome and genetic material are necessarily part of some cell or virion. So a genomic library is not considered a material genome/genetic material - rather, we could say that this genomic library is a 'genomic material sample' that bears the concretization of some SO:genome. material genome a population of homo sapiens grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role) Consider http://semanticscience.org/resource/SIO_001062 ! human population ("A human population refers to a collection of human beings"). homo sapiens population human population A maximal collection of organisms of a single species that have been bred or experimentally manipulated with the goal of being genetically identical. organism strain or breed Two mouse colonies with the same genotype information, but maintained in different labs, are different strains (many examples of this in MGI/IMSR) strain or breed A group comprised of organisms from a single taxonomic group (e.g. 
family, order, genus, species, or a strain or breed within a given taxon) taxonomic group mus musculus strain danio rerio strain sequence attribute that can inhere only in a collection of more than one sequence feature obsolete sequence feature collection attribute true A quality inhering in a collection of discontinuous sequence features in a single genome that reside on the same macromolecule (e.g. the same chromosome). in cis A quality inhering in a collection of discontinuous sequence features in a single genome that reside on different macromolecules (e.g. different chromosomes). in trans An allelic state that describes the degree of similarity of features at a particular location in the genome (i.e. whether the alleles or haplotypes are the same or different). allelic state derived from https://en.wikipedia.org/wiki/Zygosity http://semanticscience.org/resource/SIO_001263 zygosity hemizygous heterozygous homozygous indeterminite zygosity no-call zygosity unknown zygosity unspecified zygosity indeterminite zygosity MGI uses this term when zygosity is not known. no-call zygosity (this is how the GVF10 format/standard refers to loci without enough data to make an accurate call; see http://www.sequenceontology.org/resources/gvf.html#quick_gvf_examples) The disposition of an entity to be transmitted to subsequent generations following a genetic replication or organismal reproduction event. We can use these terms to describe the heritability of genetic material or sequence features - e.g. chromosomal DNA or genes are heritable in that they are passed on to child cells/organisms. Such genetic material has a heritable disposition in a cell or virion, in virtue of its being replicated in its cellular host and inherited by progeny cells (such that the sequence content it encodes is stably propagated in the genetic material of subsequent generations of cells). We can also use these terms to describe the heritability of phenotypes/conditions - e.g.
the passage of a particular trait or disease across generations of reproducing cells/organisms. heritabililty heritable non-heritable The pattern in which a genetic trait or condition is passed from one generation to the next, as determined by genetic interactions between alleles of the causal gene, and interactions between these alleles and the environment. The subtypes of inheritance pattern in this hierarchy are largely distinguished based on the underlying genetic mechanism, which will manifest in a characteristic pattern of traits in affected and unaffected family members. For example, 'autosomal dominant inheritance' defines an inheritance pattern that is caused by the interaction of alleles on non-sex chromosomes wherein the trait manifests even in heterozygotes - resulting in a characteristic pattern of 'dominant' inheritance across generations of individuals in a family. mode of inheritance phenotypic inheritance pattern http://purl.obolibrary.org/obo/HP_0000005 http://purl.obolibrary.org/obo/NCIT_C45827 An inheritance pattern results from the disposition of a genetic variant to cause a particular trait or phenotype when it is present in a particular genetic and environmental context. Here, "genetic context" refers to the allelic state of the variant, which depends on what other alleles exist at the same location/locus in the genome. Zygosities such as heterozygous and homozygous are simple, common examples of 'states' of an allele. These genetic and environmental "interactions" of alleles play out at the level of the gene products produced by the causal alleles, and are observable in the pattern with which the trait caused by an allele is inherited across generations of individuals. Thus, an inheritance pattern such as dominance is not inherent to a single allele or its phenotype, but rather a result of the relationship between two alleles of a gene and the phenotype that results in a given environment. 
This also means that the 'dominance' of an allele is context dependent - Allele 1 can be dominant over Allele 2 in the context of Phenotype X, but recessive to Allele 3 in the context of Phenotype Y. inheritance pattern disposition inhering in a genetic locus variant that is realized in its inheritance by some offspring such that at least a partial variant-associated phenotype is apparent in heterozygotes Triage until decide if want to define this as grouping class that would result in multiple-inheritance. obsolete dominant inheritance true An autosomal dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus. co-dominant autosomal inheritance An autosomal dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus. pure dominant inheritance complete autosomal dominant inheritance An autosomal dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus. intermediate dominant autosomal inheritance semi-dominant autosomal inheritance incomplete autosomal dominant inheritance An X-linked inheritance pattern wherein the trait manifests in heterozygotes. http://purl.obolibrary.org/obo/HP_0001423 X-linked dominant inheritance An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in heterozygotes. vertical inheritance http://purl.obolibrary.org/obo/HP_0000006 autosomal dominant inheritance An inheritance pattern wherein a trait caused by alleles of an autosomal gene manifests in homozygous but not heterozygote individuals. 
autosomal recessive inheritance An X-linked inheritance pattern wherein a trait caused by alleles of a gene on the X-chromosome manifests in homozygous but not heterozygous individuals. http://purl.obolibrary.org/obo/HP_0001419 X-linked recessive inheritance duplicate term, use GENO:0000148 obsolete autosomal recessive inheritance true An attribute inhering in a feature that is designated to serve as a standard against which 'variant' versions of the same location are compared. Being 'reference' is a role or status assigned in the context of a data set or analysis framework. A given allele can be reference in one context and variant in another. reference unspecified life cycle stage obsolete genetic insertion technique true obsolete mutagen treatment technique true obsolete targeted gene mutation technique true obsolete random genetic insertion technique true obsolete targeted genetic insertion technique true obsolete enhancer trapping technique true obsolete gene trapping technique true obsolete promoter trapping technique true obsolete targeted knock-in technique true obsolete random transgene insertion technique true A single locus complement that represents the collection of all chromosome sequences for a given chromosome in a single genome obsolete chromosome complement true A complete chromosome that has been abnormally duplicated in a genome, typically as the result of a meiotic non-disjunction event or unbalanced translocation duplicate chromosome This 'gained' chromosome is conceptually an 'insertion' in a genome that received two copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration, and as an 'extra' chromosome. gained aneusomic chromosome 0 A 'deletion' resulting from the loss of a complete chromosome, typically as the result of a meiotic non-disjunction event or unbalanced translocation.
This 'lost' chromosome is conceptually a 'deletion' in a genome that received zero copies of a chromosome in a cell division following a non-disjunction event. As such, it qualifies as a type of sequence_alteration. But it doesn't classify under SO:deletion because this class is defined as "the point at which one or more contiguous nucleotides were excised". absent aneusomic chromosome lost aneusomic chromosome A large deletion or terminal addition of part of some non-homologous chromosome, as the result of an unbalanced translocation. Novel sequence features gained in a genome are considered to be sequence alterations, including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. aneuploid chromosomal segment aneusomic chromosomal subregion/segment partial aneusomic chromosomal element Aneusomic chromosomal parts are examples of "partial aneuploidy" as described in http://en.wikipedia.org/wiki/Aneuploidy: "The terms "partial monosomy" and "partial trisomy" are used to describe an imbalance of genetic material caused by loss or gain of part of a chromosome. In particular, these terms would be used in the situation of an unbalanced translocation, where an individual carries a derivative chromosome formed through the breakage and fusion of two different chromosomes. In this situation, the individual would have three copies of part of one chromosome (two normal copies and the portion that exists on the derivative chromosome) and only one copy of part of the other chromosome involved in the derivative chromosome." aneusomic chromosomal part A part of some non-homologous chromosome that has been gained as the result of an unbalanced translocation event.
duplicate partial aneuploid chromosomal element translocated duplicate chromosomal element translocated duplicate chromosomal segment Such additions of translocated chromosomal parts confer a trisomic condition to the duplicated region of the chromosome, and are thus considered to be 'variant single locus complements' in virtue of an abnormal number of features at a particular genomic location, rather than abnormal sequence within the location. gained aneusomic chromosomal segment 0 A deletion of a terminal portion of a chromosome resulting from an unbalanced translocation to another chromosome. In our model, we consider this chromosomal region to be monosomic, and thus a variant single locus complement dropped partial anneuploid chromosomal element translocated absent chromosomal segment truncated chromosome terminus This is not a deletion in the sense defined by the Sequence Ontology in that it is not the result of an 'excision' of nucleotides, but an unbalanced translocation event. The allelic complement that results is comprised of the terminus or junction represented by this lost chromosomal segment, and the remaining normal segment in the homologous chromosome. The lost aneusomic chromosomal segment is typically accompanied by a gained aneusomic chromosomal segment from another chromosome. Loss of translocated chromosomal parts can confer a monosomic condition to a region of the chromosome. This results in a 'variant single locus complement' - in virtue of an abnormal number of features at a particular locus, rather than abnormal sequence within the locus.
lost aneusomic chromosomal segment A complete chromosome that has been abnormally duplicated, or the absence of a chromosome that has been lost, typically as the result of a non-disjunction event or unbalanced translocation complete aneusomic chromosome Large sequence features gained in a genome are considered to be sequence alterations (akin to insertions), including aneusomic chromosome segments gained through unbalanced translocation events, entire aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. Similarly, large sequence features lost from a genome are akin to deletions and therefore also considered sequence alterations. This includes the loss of chromosomal segments through unbalanced translocation events, and the loss of entire chromosomes through a non-disjunction event during replication. aneusomic chromosome Stub class to serve as root of hierarchy for imports of biological processes from GO-BP. biological process disomic zygosity aneusomic zygosity trisomic homozygous trisomic heterozygous A heterozygous quality inhering in a single locus complement comprised of two different variant alleles and no wild type locus. (e.g. fgf8a<ti282a>/fgf8a<x15>) trans-heterozygous compound heterozygous A sequence feature that references some biological macromolecule applied as a reagent in an experiment or technique (e.g. a morpholino expression plasmid, or oligonucleotide probe) replaced with SO:engineered_region extra-genomic sequence obsolete reagent sequence feature true A heterozygous quality inhering in a single locus complement comprised of one variant allele and one wild-type/reference allele (e.g. fgf8a<ti282a/+>) simple heterozygous A structurally or functionally defined component of a transgene (e.g.
a promoter, a region coding for a fluorescent protein tag, etc) transgene part An attribute inhering in a sequence feature that varies from some designated reference in virtue of alterations in its sequence or expression level variant An attribute inhering in a sequence feature for which there is more than one version fixed in a population at some significant percentage (typically 1% or greater), where the locus is not considered to be either reference or a variant. polymorphic An attribute inhering in a feature bearing a sequence alteration that is present at very low levels in a given population (typically less than 1%), or that has been experimentally generated to alter the feature with respect to some reference sequence. mutant A sequence feature (continuous extent of biological sequence) that is of genomic origin (i.e. carries sequence from the genome of a cell or organism) This class was created largely as a modeling convenience to support organizing data for schema definitions. We may consider obsoleting it if it ends up causing confusion or complicating classification of terms in the ontology. 1. A feature being 'of genomic origin' here means only that its sequence has been located to the genome of some organism by alignment with some reference genome. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism's genome. 2. The location of a genomic feature is defined by start and end coordinates based on alignment with a reference genome. Genomic features can span any size from a complete chromosome, to a chromosomal band or region, to a gene, to a single base pair or even junction between base pairs (this would be a sequence feature with an extent of zero). 3. As sequence features, instances of genomic features are identified by both their inherent *sequence* and their *position* in a genome - as determined by an alignment with some reference sequence.
Accordingly, the 'ATG' start codon in the coding DNA sequence of the human AKT gene and the 'ATG' start codon in the human SHH gene represent two distinct genomic features despite having the same sequence, in virtue of their different positions in the genome. genomic feature A nucleic acid molecule that contains one or more sequences serving as a template for gene expression in a biological system (i.e. a cell or virion). This class is different from genomic material in that genomic material is necessarily heritable, while genetic material includes genomic material, as well as any additional nucleic acids that participate in gene expression resulting in a cellular or organismal phenotype. So things like transiently transfected expression constructs would qualify as 'genetic material' but not 'genomic material'. Things like siRNAs and morpholinos affect gene expression indirectly (i.e. are not templates for gene expression), and therefore do not qualify as genetic material. genetic material An allele that is variant with respect to some wild-type allele, in virtue of its being very rare in a population (typically <1%), or being an experimentally-induced alteration that derives from a wild-type feature in a given strain. Based on use of 'mutant' as described in PMID: 25741868 ACMG Guidelines Not required for any specific use case at this point so removed for simplicity. Formerly asserted as allele and inferred as variant allele. Eq class definition: allele and (mutation or ('has subsequence' some mutation)) 'Mutant' is typically contrasted with 'wild-type', where 'mutant' indicates a natural but very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type background locus for a given strain, which can be selected for in establishing a mutant line.
obsolete mutant allele true A sequence alteration that is a very rare allele in a population (typically <1%), or an experimentally-induced variation that derives from a wild-type feature in a given strain. mutation A genetic feature that is not part of the chromosomal genome of a cell or virion, but rather a stable and heritable element that is replicated and passed on to progeny (e.g. a replicative plasmid or transposon) Consider replacing with SO_0001038 ! extrachromosomal_mobile_genetic_element episomal replicon Extrachromosomal replicons are replicated and passed on to descendants, and thus part of the heritable genome of a cell or organism. In cases where the presence of such a replicon is novel or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'sequence alteration'. extrachromosomal replicon expression construct feature expression construct An allele that is fixed in a population at some stable level, typically > 1%. Polymorphic alleles reside at loci where more than one version exists at some significant frequency in a population. PMID: 25741868 ACMG Guidelines Polymorphic alleles are contrasted with mutant alleles (extremely rare variants that exist in <1% of a population), and 'wild-type alleles' (extremely common variants present in >99% of a population). Polymorphic alleles exist in equilibrium in a given population somewhere between these two extremes (i.e. >1% and <99%). polymorphic allele A polymorphic allele that is present at the highest frequency relative to other polymorphic variants at the same genomic location. major allele major polymorphic allele A polymorphic allele that is not present at the highest frequency among all fixed variants at the locus (i.e. not the major polymorphic allele at a given location). minor allele minor polymorphic allele A polymorphic allele that is determined from the sequence of a recent ancestor in a phylogenetic tree.
ancestral allele ancestral polymorphic allele An allele representing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are often compared. wild-type allele 'Wild-type' is typically contrasted with 'mutant', where 'wild-type' indicates a highly prevalent allele in a population (typically >99%), and/or some prototypical allele in a background genome that serves as a basis for some experimental alteration to generate a mutant allele, which can be selected for in establishing a mutant strain. The notion of wild-type alleles is more common in model organism databases, where specific mutations are generated against a wild-type reference feature. Wild-type alleles are typically but not always used as reference alleles in sequence comparison/analysis applications. More than one wild-type sequence can exist for a given feature, but typically only one allele is deemed wild-type in the context of a single dataset or analysis. wild-type allele wild-type gene allele A gene allele representing the most common variant in a population (typically >99% frequency), that exhibits canonical function, and against which rare and/or non-functional mutant gene alleles are compared in characterizing the phenotypic consequences of genetic variation. wild-type gene A gene altered in its expression level in the context of some experiment as a result of being targeted by gene-knockdown reagent(s) such as a morpholino or RNAi. The identity of a given instance of a reagent-targeted gene is dependent on the experimental context of its knock-down - specifically what reagent was used and at what level. For example, the wild-type shha zebrafish gene targeted in experiment 1 by morpholino1 and in experiment 2 by morpholino 2 represent two distinct instances of a 'reagent-targeted gene', despite sharing the same sequence and position.
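The identity criteria just described for a reagent-targeted gene can be sketched in code. This is only an illustration under stated assumptions: the class and field names are hypothetical (they are not GENO terms), `sequence_id` stands in for the gene's sequence and position, and the reagent levels are invented values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReagentTargetedGene:
    """Hypothetical record; field names are illustrative, not GENO terms."""
    gene_symbol: str    # e.g. "shha"
    sequence_id: str    # stands in for the gene's sequence and position
    reagent: str        # knockdown reagent used in the experiment
    reagent_level: str  # amount applied (invented value for illustration)


# The same wild-type shha gene targeted in two separate experiments.
exp1 = ReagentTargetedGene("shha", "shha-wt", "morpholino1", "1 ng/ul")
exp2 = ReagentTargetedGene("shha", "shha-wt", "morpholino2", "1 ng/ul")

# Same underlying sequence feature (sequence and position agree)...
same_feature = (exp1.gene_symbol, exp1.sequence_id) == (exp2.gene_symbol, exp2.sequence_id)
# ...but distinct reagent-targeted genes, because the reagent is part of identity.
same_instance = exp1 == exp2

print(same_feature, same_instance)  # True False
```

Putting the reagent and its level into the equality comparison mirrors the statement that experimental context, not just sequence and position, distinguishes instances.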
reagent targeted gene A transgene that is delivered as part of a DNA expression construct into a cell or organism in order to transiently express a specified product (i.e. it has not integrated into the host genome). experimentally-expressed transgene extrinsic transgene transiently-expressed transgene An allele attribute describing a highly common variant (typically >99% in a population), that typically exhibits canonical function, and against which rare and/or non-functional mutant alleles are compared. wild-type One of a set of sequence features known to exist at a particular genomic location. A landscape review found mostly gene-centric definitions of 'allele' that represented a particular version of a gene, or variation within a gene sequence [1][2][3][4][5][6][6a]. But we also found 'allele' used to refer to other types and extents of variation - including single nucleotide polymorphisms, repeat regions, and copy number variations [7][8][9][10][11], where such variations don't necessarily impact a gene. To be maximally accommodating of how this term is used across research communities, GENO defines 'allele' broadly and allows alleles to span any locus or extent of sequence. While 'alleles' encountered in public databases typically overlap a gene, many do not. But GENO does define the 'gene allele' class as a subtype of 'allele' to refer more specifically to a specific version of an entire gene.
[1] https://isogg.org/wiki/Allele (retrieved 2018-03-17) [2] http://semanticscience.org/resource/allele (retrieved 2018-03-17) [3] https://en.wikipedia.org/wiki/Allele (retrieved 2018-03-17) [4] https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/allele (retrieved 2018-03-17) [5] http://purl.obolibrary.org/obo/SO_0001023 (retrieved 2018-03-17) [6] http://purl.obolibrary.org/obo/NCIT_C16277 (retrieved 2018-03-17) [6a] https://www.ncbi.nlm.nih.gov/mesh/68000483 [7] https://www.snpedia.com/index.php/Allele (retrieved 2018-03-17) [8] https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism (retrieved 2018-03-17) [9] http://purl.obolibrary.org/obo/OGI_0000008 (retrieved 2018-03-17) [10] http://purl.obolibrary.org/obo/OBI_0001352 (retrieved 2018-03-17) [11] http://purl.phyloviz.net/ontology/typon#Allele (retrieved 2018-03-17) variable feature An allele is a sequence feature at a genomic location where variation occurs (i.e. where >1 different sequence is known to exist). An allele can span only the extent of sequence known to vary (e.g. a single base SNP, or short insertion), or it can span a larger extent that includes one or more variable features as proper parts (e.g. a 'gene allele' that spans the extent of an entire gene which contains several sequence alterations). Alleles can carry 'reference' or 'variant' sequence - depending on whether its 'state' matches that considered to be the reference at that location. Alleles whose state differs from the reference are called 'variant alleles', and those that match the reference are called 'reference alleles'. What is considered the 'reference' state at a particular location may vary, depending on the context/goal of a particular analysis. A 'sequence alteration' is a 'variant allele' that varies along its entire extent (i.e. every position varies from that of some defined reference sequence).
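The context dependence of 'reference' versus 'variant' described above can be shown with a small sketch. The function name, the single-base states, and the two datasets are hypothetical and chosen only for illustration; an allele's state is reduced here to a plain string.

```python
def classify_allele(allele_state: str, reference_state: str) -> str:
    """Label an allele by comparing its state to the designated reference state."""
    if allele_state == reference_state:
        return "reference allele"
    return "variant allele"


# A hypothetical single-base location observed as "A" in some genome.
observed = "A"

# Two hypothetical analysis contexts that designate different reference states.
in_dataset_1 = classify_allele(observed, reference_state="A")
in_dataset_2 = classify_allele(observed, reference_state="G")

print(in_dataset_1, "|", in_dataset_2)  # reference allele | variant allele
```

The same observed state receives a different label in each context, which is the point made in the comment: the label is a property of the comparison, not of the allele alone.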
allele A sequence attribute of a chromosome or chromosomal region that has been abnormally duplicated or lost, as the result of a non-disjunction event or unbalanced translocation. aneusomic An allele of a gene that contains some sequence alteration. A gene allele is 'variant' in virtue of its containing a sequence alteration that varies from some reference gene standard. But note that a gene allele that is variant in one context/dataset can be considered a reference in another context/dataset. variant gene allele The set of both shha gene alleles in a diploid zebrafish genome, e.g. fgf8a<ti282a/+>. The collection of the individual base-pairs present at the position 24126737 in both copies of chromosome 5 in a diploid human genome. A set representing the complement of all sequence features occupying a particular genomic location across all homologous chromosomes in the genome of a single organism. TO DO: show a VCF representation of this example. Consider making 'allelic complement' the primary label. allelic complement homologous allele complement single locus feature complement A 'complement' refers to an exhaustive collection of *all* objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. Here, a 'single locus complement' is the set of all alleles at a specified location in a particular genome. This complement is typically a pair of two features in a diploid genome (with two copies of each chromosome). E.g. a gene pair, a QTL pair, a nucleotide pair for a SNP, or a pair of entire chromosomes. The fact that we are counting how many copies of the same *sequence* exist in a genome, as opposed to how many of the same *feature*, is what sets feature-level concepts like 'single locus complement' apart from sequence-level concepts like 'copy number complement'.
To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The 'copy number complement' for sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the 'single locus complement' at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus. single locus complement In an experiment where shha is targeted by MO1 and shhb is overexpressed from a transgenic expression construct, the extrinsic genotype captures the altered expression status of these two genes. A notation for representing such a genotype might describe this scenario as: shha<MO1-1ng/ul>; shhb<pFLAG-mmusShhb> This notation parallels those used for more traditional 'intrinsic' genotypes, where the affected gene is presented with its alteration in angled brackets < >. In the extrinsic genotype shown here, the variation in shha is affected by a specific concentration of an shha-targeting morpholino (instead of a mutation in the shha gene). And the variation in shhb is affected by its overexpression from a pFLAG Shhb expression construct. A specification of the known state of gene expression across a genome, and how it varies from some baseline/reference state. We acknowledge that this is not a 'genotype' in the traditional sense, but this terminological choice highlights similarities that play out in parallel modeling of intrinsic and extrinsic genotype partonomies, and parallel syntactic formats for labeling instances of these genotypes. 
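The angled-bracket notation shown above for an extrinsic genotype can be read mechanically. The following is a minimal sketch, assuming components are separated by semicolons and each has the form gene<alteration>; the source presents this only as a notation that might be used, so the grammar, the regular expression, and the function name are assumptions for illustration.

```python
import re
from typing import Dict

# One "gene<alteration>" component, e.g. shha<MO1-1ng/ul> (assumed grammar).
COMPONENT = re.compile(r"^\s*([^<>\s]+)<([^<>]+)>\s*$")


def parse_extrinsic_genotype(label: str) -> Dict[str, str]:
    """Map each affected gene to the extrinsic alteration applied to it."""
    parsed = {}
    for part in label.split(";"):
        match = COMPONENT.match(part)
        if match is None:
            raise ValueError(f"unrecognized component: {part!r}")
        gene, alteration = match.groups()
        parsed[gene] = alteration
    return parsed


parsed = parse_extrinsic_genotype("shha<MO1-1ng/ul>; shhb<pFLAG-mmusShhb>")
print(parsed)  # {'shha': 'MO1-1ng/ul', 'shhb': 'pFLAG-mmusShhb'}
```

Because the same gene<alteration> shape is used for intrinsic genotypes, a parser along these lines would in principle accept both, which reflects the parallel labeling the comment argues for.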
Our rationale here is that what we care about from perspective of G2P associations is identifying genomic features that impact phenotype - where experimental approaches include permanent introduction of intrinsic modifications to genomic sequence, and transient introduction of extrinsic factors that modify expression of specific genes. As the former is described by the traditional notion of a genotype, it seems a rational leap to consider the latter akin to an 'extrinsic genotype' wherein the alterations are externally applied rather than inherent to the genome. Finally, there is some precedent to thinking about such extrinsic modifications in terms of a genotype, in the EFO:0000513 ! genotype: "The total sum of the genetic information of an organism that is known and relevant to the experiment being performed, including chromosomal, plasmid, viral or other genetic material which has been introduced into the organism either prior to or during the experiment." experimental genotype expression genotype An extrinsic genotype describes variation in the 'expression level' of genes in a cell or organism, as mediated by transient, gene-specific experimental interventions such as RNAi, morpholinos, TALENS CRISPR, or construct overexpression. This concept is relevant primarily for model organisms and systems that are subjected to such interventions to determine how altered expression of specific genes may impact organismal or cellular phenotypes in the context of a particular experiment. The 'extrinsic genotype' concept is contrasted with the more familiar notion of an 'intrinsic genotype', describing variation in the inherent genomic sequence (i.e. 'allelic state'). In G2P research, interventions affecting both genomic sequence and gene expression are commonly applied in order to assess the impact specific genomic features can have on phenotype and disease. 
It is in this context that we chose to model 'extrinsic' alterations in expression as genotypes - to support parallel conceptualization and representation of these different types of genetic variation that inform the discovery of G2P associations. extrinsic genotype A genotype that describes the total intrinsic and extrinsic variation across a genome at the time of a phenotypic assessment (where 'intrinsic' refers to variation in genomic sequence, as mediated by sequence alterations, and 'extrinsic' refers to variation in gene expression, as mediated through transient gene-specific interventions such as gene knockdown reagents or overexpression constructs). Closest concept/definition we could find for this concept was for EFO:0000513 ! genotype: "The total sum of the genetic information of an organism that is known and relevant to the experiment being performed, including chromosomal, plasmid, viral or other genetic material which has been introduced into the organism either prior to or during the experiment." An effective genotype is meant to summarize all factors related to genes and their expression that influence an observed phenotype - including 'intrinsic' alterations in genomic sequence, and gene-specific 'extrinsic' alterations in expression transiently introduced at the time of the phenotypic assessment. effective genotype A set comprised of *all* reagent-targeted genes in a single genome in the context of a given experiment (e.g. the zebrafish shha and shhb genes in a zebrafish exposed to morpholinos targeting both of these genes). A 'complement' refers to an exhaustive collection of *all* objects that make up some well-defined set. Such a complement may contain 0, 1, or more than one members. The notion of a complement is useful for defining many biologically-relevant sets of sequence features. 
For example, a 'reagent-targeted gene complement' is the set of all genes in a particular genome that are targeted by reagents in the context of a particular experiment. reagent-targeted gene complement The set of all transgenes transiently expressed in a biological system in the context of a given experiment. experimental transgene complement transiently-expressed transgene complement Consider wild-type zebrafish shha gene in the context of being targeted by morpholino1 vs morpholino 2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of 'expression-variant genes' because of their different external context. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants of the same gene can have potentially different associated phenotypes). A gene altered in its expression level relative to some baseline of normal expression in the system under investigation (e.g. a cell line or model organism). See SO classes under 'silenced gene' (e.g. 'gene silenced by RNA interference'). These seem to represent the concept of a qualified feature as I define it here, in that they are defined by alterations extrinsic to the sequence and position of the gene itself. expression allele Expression-variant genes are altered in their expression level through some modification or intervention external to their sequence and position. These may include endogenous mechanisms (e.g. direct epigenetic modifications that impact expression level, or altered regulatory networks controlling gene expression), or experimental interventions (e.g. targeting by a gene-knockdown reagent, or being transiently expressed as part of a transgenic construct in a host cell or organism). The identity of a given instance of an expression-variant gene is dependent on how its level of expression is manipulated in a biological system (i.e.
via targeting by gene-knockdown reagents, or being transiently overexpressed). So expression-variant genes have the additional identity criterion of a genetic context of their material bearer (external to their sequence and position) that impacts their level of expression in a biological system. expression-variant gene gene targeting reagent sequence targeting reagent gene knockdown reagent A region within a gene that is specifically targeted by a gene knockdown reagent, typically in virtue of bearing sequence complementary to the reagent. targeted gene segment reagent-targeted gene subregion A specification of the genetic state of an organism, whether complete (defined over the whole genome) or incomplete (defined over a subset of the genome). Genotypes typically describe this genetic state as a diff between some variant component and a canonical reference. As information artifacts, genotypes specify the state of a genome by defining a diff between some canonical reference and a variant or alternate sequence that replaces the corresponding portion of the reference. We can consider a genotype then as a collection of these reference and variant features, along with some rule for operating on them to resolve a final single sequence. This is valid ontologically because we commit only to sequence features being GDCs - which allows for their concretization in either biological or informational patterns. Accordingly, a particular gene allele, such as shh<tbx292>, can be part of a genome in a biological sense and part of a genotype in an informational sense. This idea underpins the 'genotype partonomy' at the core of the GENO model that decomposes a complete genotype into its more fundamental parts, including alleles and allele complements, as described in the comment above.
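The idea that a genotype is a reference plus variant features, together with a rule that resolves them into a single sequence, can be illustrated with a minimal Python sketch. This is not part of GENO; the `VariantFeature` record, the 0-based half-open coordinates, and the `resolve_sequence` function are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class VariantFeature:
    """Hypothetical record: a variant sequence that replaces reference[start:end]."""
    start: int      # 0-based, inclusive (assumed convention)
    end: int        # 0-based, exclusive (assumed convention)
    sequence: str   # sequence that replaces the corresponding reference portion


def resolve_sequence(reference: str, variants: Iterable[VariantFeature]) -> str:
    """Substitute each variant for its portion of the reference, left to right."""
    pieces = []
    cursor = 0
    for variant in sorted(variants, key=lambda v: (v.start, v.end)):
        # Reject variants that overlap an earlier one or fall outside the reference.
        if (variant.start < cursor
                or variant.end < variant.start
                or variant.end > len(reference)):
            raise ValueError(f"invalid or overlapping variant: {variant}")
        pieces.append(reference[cursor:variant.start])  # unchanged reference
        pieces.append(variant.sequence)                 # variant replaces reference
        cursor = variant.end
    pieces.append(reference[cursor:])                   # trailing reference
    return "".join(pieces)
```

For example, `resolve_sequence("ACGTACGT", [VariantFeature(2, 3, "T")])` substitutes a single base, while a variant with `start == end` behaves as an insertion. The overlap check reflects the assumption that each portion of the reference is replaced by at most one variant.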
Core definition above adapted from the GA4GH VMC data model definition here: https://docs.google.com/document/d/12E8WbQlvfZWk5NrxwLytmympPby6vsv60RxCeD5wc1E/edit#heading=h.4e32jj4jtmsl (retrieved 2018-04-09). Note however that the VMC genotype concept likely is not intended to cover 'effective' and 'extrinsic' genotype concepts defined in GENO. 1. Scope of 'Genetic State': 'Genetic state' is considered quite broadly in GENO to describe two general kinds of 'states'. First is the traditional notion of 'allelic state' - defined as the complement of alleles present at a particular location or locations in a genome (i.e. across all homologous chromosomes containing this location). Here, a genotype can describe allelic state at a specific locus in a genome (an 'allelic genotype'), or describe the allelic state across the entire genome ('genomic genotype'). Second, this concept can also describe states of genomic features 'extrinsic' to their intrinsic sequence, such as the expression status of a gene as a result of being specifically targeted by experimental interventions such as RNAi, morpholinos, or CRISPRs. 2. Genotype Subtypes: In GENO, we use the term 'intrinsic' for genotypes describing variation in genomic sequence, and 'extrinsic' for genotypes describing variation in gene expression (e.g. resulting from the targeted experimental knock-down or over-expression of endogenous genes). We use the term 'effective genotype' to describe the total intrinsic and extrinsic variation in a cell or organism at the time a phenotypic assessment is performed. Two more precise concepts are subsumed by the notion of an 'intrinsic genotype': (1) 'allelic genotypes', which specify allelic state at a single genomic location; and (2) 'genomic genotypes', which specify allelic state across an entire genome. In both cases, allelic state is typically specified in terms of a differential between a reference and a set of 1 or more known variant features. 3.
The Genotype Partonomy: 'Genomic genotypes' describing sequence variation across an entire genome are 'decomposed' in GENO into a partonomy of more granular levels of variation. These levels are defined to be meaningful to biologists in their attempts to relate genetic variation to phenotypic features. They include 'genomic variation complement' (GVC), 'variant single locus complement' (VSLC), 'allele', 'haplotype', 'sequence alteration', and 'genomic background' classes. For example, the components of the zebrafish genotype "fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]", described at zfin.org/ZDB-FISH-150901-9362, include the following elements: - GVC: fgf8a<ti282a/ti282a>; fgf3<t24149/+> (total intrinsic variation in the genome) - Genomic Background: AB (the reference against which the GVC is variant) - VSLC1: fgf8a<ti282a/ti282a> (homozygous complement of gene alleles at one known variant locus) - VSLC2: fgf3<t24149/+> (heterozygous complement of gene alleles at another known variant locus) - Allele 1: fgf8a<ti282a> (variant version of the fgf8a gene, present in two copies) - Allele 2: fgf3<t24149> (variant version of the fgf3 gene, present in one copy) - Allele 3: fgf3<+> (wild-type version of the fgf3 gene, present in one copy) - Sequence Alteration1: <ti282a> (the specific mutation within the fgf8a gene that makes it variant) - Sequence Alteration2: <t24149> (the specific mutation within the fgf3 gene that makes it variant) A graphical representation of this decomposition that maps each element to a visual depiction of the portion of a genome it denotes can be found here: https://github.com/monarch-initiative/GENO-ontology/blob/develop/README.md One reason that explicit representation of these levels is important is because it is at these levels that phenotypic features are annotated to genetic variations in different clinical and model organism databases For example, ZFIN typically annotates phenotypes to effective genotypes, MGI to intrinsic genotypes, Wormbase 
to variant alleles, and ClinVar to haplotypes and sequence alterations. The ability to decompose a genotype into representations at these levels allows us to "propagate phenotypes" up or down the partonomy (e.g. infer associations of phenotypes annotated to a genotype to its more granular levels of variation and the gene(s) affected). This helps support integrated analysis of G2P data. genotype ZFIN does not annotate with a pre-composed phenotype ontology - all annotations compose phenotypes on-the-fly using a combination of PATO, ZFA, GO and other ontologies. So while there is no manually curated zebrafish phenotype ontology, the Upheno pipeline generates one automatically here: http://purl.obolibrary.org/obo/upheno/zp.owl This ontology does not have a root 'phenotype' class, however, and so we generate our own in GENO as a stub placeholder for import of needed zebrafish phenotype classes. zebrafish phenotype an allelic state where a single allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism. homoplasmic an allelic state where more than one type of allele exists at a particular location in the organellar genome (mitochondrial or plastid) of a cell/organism. heteroplasmic hemizygous X-linked hemizygous Y-linked hemizygous insertion-linked A genomic genotype that specifies the baseline sequence of a genome from which a variant genome is derived (through the introduction of sequence alterations). Being a 'genomic background' implies that a variant genotype was derived from this background (which is the case for most model organism database genotypes/strains). This is a subtly different notion than being a 'reference genotype', which can be any genotype that serves as a basis for comparison. But in a sense all background genotypes are by default reference genotypes, in that the derived variant genotype is compared against it.
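The decomposition of the zebrafish genotype label in the genotype partonomy comment above can be sketched in Python. This is only an illustration under stated assumptions: the function name `decompose_genotype`, the returned dictionary keys, and the regular expressions are hypothetical, and the parser handles only labels shaped like `gene<allele/allele>; ...[background]`, not ZFIN nomenclature in general.

```python
import re


def decompose_genotype(label: str) -> dict:
    """Split a genotype label into GVC, background, VSLCs, alleles, and alterations."""
    match = re.fullmatch(r"\s*(?P<gvc>.*?)\s*\[(?P<background>[^\]]+)\]\s*", label)
    if match is None:
        raise ValueError(f"unrecognized genotype label: {label!r}")

    # Each ';'-separated part of the GVC is one variant single locus complement.
    vslcs = [part.strip() for part in match["gvc"].split(";") if part.strip()]
    parsed_vslcs, alleles, alterations = [], [], []
    for vslc in vslcs:
        m = re.fullmatch(r"(?P<gene>[^<]+)<(?P<a1>[^/>]+)/(?P<a2>[^/>]+)>", vslc)
        if m is None:
            raise ValueError(f"unrecognized locus complement: {vslc!r}")
        zygosity = "homozygous" if m["a1"] == m["a2"] else "heterozygous"
        parsed_vslcs.append({"label": vslc, "zygosity": zygosity})
        for alteration in (m["a1"], m["a2"]):
            allele = f"{m['gene']}<{alteration}>"
            if allele not in alleles:
                alleles.append(allele)
            # '+' denotes the wild-type allele, which carries no sequence alteration.
            if alteration != "+" and f"<{alteration}>" not in alterations:
                alterations.append(f"<{alteration}>")

    return {
        "gvc": match["gvc"],
        "genomic_background": match["background"],
        "vslcs": parsed_vslcs,
        "alleles": alleles,
        "sequence_alterations": alterations,
    }
```

Calling `decompose_genotype("fgf8a<ti282a/ti282a>; fgf3<t24149/+>[AB]")` yields the same elements listed in the comment: one GVC, the AB background, two VSLCs, three alleles, and two sequence alterations.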
background genotype genomic background The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three". An extended part of a chromosome representing a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band. New term request for SO. http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here: chromosome > arm > region > band > sub-band Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html): chromosome > arm > band > sub-band > sub-sub-band chromosomal region The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three". http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here: chromosome > arm > region > band > sub-band Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html): chromosome > arm > band > sub-band > sub-sub-band chromosome sub-band chromosomal band brightness chromosomal band intensity gpos gneg gvar gpos100 gpos75 gpos50 gpos25 A chromosome arm that is the shorter of the two arms of a given chromosome. p-arm stalk short chromosome arm A chromosome arm that is the longer of the two arms of a given chromosome. q-arm long chromosome arm gpos66 gpos33 A transgene part whose sequence regulates the synthesis of a functional product, but which is not itself transcribed.
regulatory transgene region A transgene part whose sequence is expressed in a gene product through transcription and/or translation. coding transgene feature expressed transgene region reporter region A transgene whose product is used as a selectable marker. selectable marker transgene A genotype that describes what is known about variation in a genome at a gross structural level, in terms of the number and appearance of chromosomes in the nucleus of a eukaryotic cell. Derived from http://en.wikipedia.org/wiki/Karyotype (accessed 2017-03-28) Karyotypes describe structural variation across a genome at the level of chromosomal morphology and banding patterns detectable in stained chromosomal spreads. This coarser level does not capture more granular levels of variation commonly represented in other forms of genotypes (e.g. specific alleles and sequence alterations). A base karyotype representing a genome with no known structural variation can be as simple as '46XY', but karyotypes typically contain some gross variant component (such as a chromosome duplication or translocation). karyotype A genomic genotype where the genomic background specifies a male or female sex chromosome complement. This modeling approach enables creation of separate genotype instances for data sources that report sex-specific phenotypes to ensure that sex-specific G2P differences are accurately described. These sex specific genotypes can be linked to the broader intrinsic genotype that is shared by male and female mice of the same strain, to aggregate associated phenotypes at this level, and allow aggregation with G2P association data about the same strains from sources that distinguish sex-specific phenotypes (e.g. IMPC) and those that do not (e.g. MGI). In the genotype partonomy, a sex qualified genotype has as part a sex-agnostic genotype. This allows for the propagation of phenotypes associated with a sex-qualified genotype to the intrinsic genotype.
Ontologically, this parthood is based on the fact that the background component of a sex-qualified genotype specifies the sex chromosomes while that of the sex-agnostic genotype does not. Thus, the sequence content of the sex-qualified genotype is a superset of that of the intrinsic genotype, with the latter being a proper part of the former. intrinsic genotype (sex-specific) sex-qualified genotype sex-qualified intrinsic genotype We distinguish the notion of a sex-agnostic intrinsic genotype, which does not specify whether the portion of the genome defining organismal sex is male or female, from the notion of a sex-qualified intrinsic genotype, which does. Male and female mice that contain the same background and genetic variation complement will have the same 'sex-agnostic intrinsic genotype', despite their genomes varying in their sex-chromosome complement. By contrast, these two mice would have different 'sex-qualified intrinsic genotypes', as this class takes background sex chromosome sequences into account in the identity criteria for its instances. Conceptually, a sex-qualified genotype represents a superset of sequence features relative to a sex-agnostic intrinsic genotype, in that it specifies the background sex-chromosome complement of the genome. genomic genotype (sex-qualified) A genomic genotype where the genomic background specifies a male sex chromosome complement. male intrinsic genotype A genomic genotype where the genomic background specifies a female sex chromosome complement. female intrinsic genotype A background genotype whose sequence or identity is not known or specified. unspecified background genotype unspecified genomic background 1. The set of all alleles at a particular location in a genome (a 'single locus complement') - e.g. {APOE-epsilon2 / APOE-epsilon4} at the APOE locus 2. The set of all alleles that comprise a haplotype - e.g. the SNPs {rs7412-T, rs429358-T} in the APOEɛ2 allele. 3. The set of all chromosomes in a genome - e.g.
{human Chr1, 2, 3, . . . 22, X, Y} A set of sequence features. 'Sets' are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an 'empty' set) or 1 member (a 'singleton' or 'unit' set), consistent with the concept of 'mathematical sets'. Sets may also include duplicates (i.e. contain more than one member representing the same feature). The notion of a 'complement' is a special case of a set, where the members necessarily comprise an exhaustive collection of all objects that make up some well-defined set. It is useful for defining many biologically-relevant sets of sequence features. For example, a 'haplotype' is the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele [1]. And a 'single locus complement' is the set of all alleles at a specified location in a particular genome - e.g. the APOEɛ4 and APOEɛ2 gene alleles ([1], [2]) that make up the 'Gs270' APOE genotype [3]. [1] https://www.snpedia.com/index.php/APOE-%CE%B54 [2] https://www.snpedia.com/index.php/APOE-%CE%B52 [3] https://www.snpedia.com/index.php/Gs270 sequence feature set A set of genomic features (i.e. sequence features that are of genomic origin). In some cases there may be zero or only one member of such a complement, which is why this class is not defined to necessarily have some 'genomic feature' as a member. genomic locus complement A genomic feature is any located sequence feature in the genome, from a single nucleotide to a gene to an entire chromosome. 'Sets' are used to represent entities that are typically collections of more than one member - e.g. the set of chromosomes that make up the human genome. But we allow for sets that contain 0 members (an 'empty' set) or 1 member (a 'singleton' or 'unit' set), consistent with the concept of 'mathematical sets'.
For example, a 'single locus complement' at an X-linked locus in an XY male will consist of only one allele, as there is only one X-chromosome in the genome. Note also that sets may contain duplicates (i.e. more than one member representing the same feature). For example, a homozygous 'single locus complement' is a set comprised of two of the same feature. The notion of a 'genomic feature set' differs from that of a 'genomic sequence set' in that we are counting how many copies of the same *sequence feature* exist in a genome, as opposed to how many of the same *sequence*. 'Genomic feature sets' are useful for representing things like 'single locus complements', where members are sequence features whose identity is dependent on their location. By contrast, 'genomic sequence sets' are useful for describing things like 'copy number complements', which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside. genomic feature set A genomic feature that is part of a gene, and delineated by some functional or structural role it serves (e.g. a promoter element, coding region, etc). defined gene part SO:0000831 (gene member region) gene part A transgene that codes for a product used as a reporter of gene expression or activity. reporter transgene A junction between bases, a deletion variant, a terminus at the end of a chromosome. A genomic feature that has an extent of zero. Former logical def: 'genomic feature' and (has_extent value 0) obsolete null feature true An extrachromosomal replicon that is variant in a genome in virtue of its being a novel addition to the genome - i.e. it is not present in the reference for the genome in which it is found. aberrant extrachromosomal replicon exogenous extrachromosomal replicon transgenic extrachromosomal replicon Extrachromosomal replicons are replicated and passed on to descendants, and thus part of the heritable genome of a cell or organism.
In cases where the presence of such a replicon is exogenous or aberrant (i.e. not included in the reference for that genome), the replicon is considered a 'sequence alteration'. novel extrachromosomal replicon A genomic feature that represents an entirely new replicon in the genome, e.g. an extrachromosomal replicon or an extra copy of a chromosome. This class is defined so as to support classification of things like novel extrachromosomal replicons and aneusomic chromosomes as being variant alleles in a genome. These represent entirely new features in the genome - not variants of an existing feature. Novel replicons are considered as an 'insertion' in a genome, and as such, qualify as types of sequence_alterations and variant alleles. There is no pre-existing locus that it modifies, however, and thus it is not really an 'allele of' a named locus. But conceptually, we still consider these to represent genetic variants and classify them as variant alleles. novel replicon An attribute of a genomic feature that represents a feature not previously found in a given genome, e.g. an extrachromosomal replicon or aneusomic third copy of a chromosome. novel A sequence feature representing the end of a sequence that is bounded only on one side (e.g. at the end of a chromosome or oligonucleotide). terminus A sequence feature or a set of such features. sequence feature or collection GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence. 2. 'Sequence feature' identity is dependent on its sequence and the genomic position of the sequence (aligns with definition of 'sequence feature' in the Sequence Ontology). 3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material bearing the feature, extrinsic to its sequence and its genomic position.
For example, its being targeted by gene knockdown reagents, its being transgenically expressed in a foreign cell from a recombinant expression construct, its having been epigenetically modified in a way that alters its expression level or pattern, or its being located in a specific cellular or anatomical location. sequence feature or set A linear ordering of units representing monomers of a biological macromolecule (e.g. nucleotides in DNA and RNA, amino acids in polypeptides). GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence. 2. 'Sequence feature' identity is dependent on its sequence and the genomic location of the sequence (this is consistent with the definition of 'sequence feature' in the Sequence Ontology). 3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material in which the feature is concretized. This third criteria is extrinsic to its sequence and its genomic location. For example, the feature's physical concretization being targeted by a gene knockdown reagent in a cell (e.g. the zebrafish Shha gene as targeted by the morpholino 'Shha-MO1'), or its being transiently expressed from a recombinant expression construct (e.g. the human SHH gene as expressed in a mouse Shh knock-out cell line), or its having been epigenetically modified in a way that alters its expression level or pattern (e.g. the human SHH gene with a specific methylation pattern). biomacromolecular sequence state VMC:State 'Sequences' differ from 'sequence features' in that instances are distinguished only by their inherent ordering of units, and not by any positional aspect related to alignment with some reference sequence. 
Accordingly, the 'ATG' translational start codon of the human AKT gene is the same *sequence* as the 'ATG' start codon of the human SHH gene, but these represent two distinct *sequence features* in virtue of their different positions in the genome. biological sequence true state In the VMC model, the notion of a GENO:biological sequence is called the 'state' of an allele. A sequence feature (or collection of features) whose identity is dependent on the context or state of its material bearer (in addition to its sequence and position). This context/state describes factors external to its inherent sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification. qualified sequence feature or collection Consider the wild-type zebrafish shha gene in the context of being targeted by morpholino MO-1 vs morpholino MO-2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of a 'qualified sequence feature' because of their different external contexts. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants (alleles) of the same gene can have potentially different associated phenotypes). A qualified sequence feature that carries sequence derived from the genome of a cell or organism. qualified genomic feature true This axiom is an initial attempt to formalize the identity criteria of an extrinsic context that separates qualified sequence features from sequence features (i.e. the context of its material bearer). As we further develop our efforts here this will get refined and more precise. true Formalizes one identity criterion of the sequence feature component of a qualified sequence feature (which itself is identified by its sequence and its genomic position). A set of qualified sequence features that carry genomic sequence.
Because there are cases where there may be zero or only one member of such a set, this class is not asserted to necessarily have some 'qualified genomic feature' as a member. A 'complement' refers to an exhaustive collection of all objects that make up some well-defined set. This notion is useful for defining biologically-relevant sets of sequence features. For example, a haplotype is defined as the set of all genetically-linked alleles on a single chromosomal strand at a defined location - e.g. the SNP alleles {rs7412-C, rs429358-C} comprise the haplotype defining the APOEɛ4 gene allele. A complement may contain 0, 1, or more than one member. For example, the complement of alleles at a defined locus across homologous chromosomes in an individual's genome will consist of two members for autosomal locations, and one member for non-homologous locations on the X and Y chromosome. qualified genomic feature set A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. Genotype vs Genome in GENO: An (intrinsic) genotype is an information artifact representing an indirect syntax for specifying a genome sequence. This syntax has reference and variant components - a 'background genotype' and 'genomic variation complement' - that must be operated on to resolve a specific genome sequence. Specifically, the genome sequence is resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'. So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence. It is this end-point that we consider when holding that a genotype 'specifies' a genome. 1.
A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome ('genomic background'), and all specific variants from this reference (the 'genomic variation complement'). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'genomic background' sequence. 2. 'Heritable' genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons. intrinsic genotype DNA sequence RNA sequence amino acid sequence obsolete biological sequence or collection true obsolete biological sequence collection true A sequence feature whose identity is additionally dependent on the cellular or anatomical location of the genetic material bearing the feature. As a qualified sequence feature, the BRCA1c.5096G>A variant as materialized in a somatic breast epithelial cell could be distinguished as a separate entity from a BRCA1c.5096G>A variant in a different cell type or location (e.g. germline BRCA1 variant in a sperm cell). location-qualified sequence feature A sequence feature whose identity is additionally dependent on factors specifically influencing its level of expression in the context of a biological system (e.g. being targeted by gene-knockdown reagents, or driven from an exogenous expression system such as a recombinant construct) expression-qualified sequence feature A sequence feature position based on a genomic coordinate system, where the position specifies start and end coordinates based on its alignment with some reference genomic sequence.
This 'genomic position' concept differs from the faldo:Position concept in that the former describes the start AND end points/coordinates of a feature, while the latter describes a single point/coordinate at the beginning OR end of a feature. genomic coordinates remodeling notion of sequence feature position around the idea of a 'genomic locus' obsolete genomic position true phenotypic inheritance process A sequence attribute inhering in a feature whose identity is not specified. obsolete unspecified true An attribute describing a type of variation inhering in a sequence feature or collection. allele attribute variation attribute An intrinsic genotype that specifies variation from a defined reference genome. variant genomic genotype An information entity that is intended to represent some biological sequence, sequence feature, qualified sequence feature, or a collection of one or more of these entities. eliminating classes that are not necessary or add unneeded complexity. obsolete sequence information entity true 1 biological sequence residue monomeric residue biological sequence unit deoxyribonucleic acid residue DNA residue ribonucleic acid residue RNA residue amino acid residue An attribute, quality, or state of a sequence feature or collection. http://purl.obolibrary.org/obo/SO_0000400 Sequence feature attributes can be 'intrinsic' - reflecting feature-level characteristics that depend only on the sequence, location, or genomic context of a feature or collection, or 'extrinsic' - reflecting characteristics of the physical molecule in which the feature is concretized (e.g. its cellular context, source of origin, physical appearance, etc.). Intrinsic attributes include things like allelic state, allelic phase. Extrinsic attributes include things like its cellular distribution and chromosomal band intensity. sequence feature attribute The location of a sequence feature as defined by its start and end position on some reference coordinate system. 1.
A sequence feature location is defined by its begin and end coordinates on a reference sequence, but it is not identified by a particular sequence that may reside there. The same location, as defined on a particular reference, may be occupied by different sequences in the genome of organism 1 vs that of organism 2 (e.g. if a SNV exists within this location in only one of the organisms). 2. The notion of a sequence feature location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be 'occupied by' physical objects, while a genomic location is 'occupied by' sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic location is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic locus and the sequence feature that resides there. sequence feature location A sequence feature whose identity is additionally dependent on a chemical modification made to the genetic material bearing the feature (e.g. binding of transcriptional regulators, or epigenetic modifications including direct DNA methylation, or modification of histones associated with a feature) modification-qualified sequence feature 1. The zebrafish "fgf8a<ti282a>/fgf8a<+>" allelic genotype describes the combination of gene alleles present at a specific gene locus (the fgf8a locus - which here has a heterozygous state). 2. The human allelic genotypes in the VCF records below describe the set of SNPs present at specific positions on Chromosome 20 in the human genome. The first record describes a heterozygous C/T allelic genotype at Chr20:2300608, and the second describes a homozygous G/G allelic genotype at Chr20:2301308.
##fileformat=VCFv4.2 ##FORMAT=<ID=GT, Description="Genotype, 0=REF, 1=ALT"> #CHROM POS REF ALT FILTER FORMAT SAMP001 20 2300608 C T PASS GT 0/1 20 2301308 T G PASS GT 1/1 (derived from https://faculty.washington.edu/browning/beagle/intro-to-vcf.html) 3. Some allelic genotype formats encode the genotype as a single string - e.g. "GRCh38 Chr12:258635(A;T)" describes a heterozygous A/T allelic genotype of SNPs present at a specific position 258635 on human chromosome 12. A genotype that specifies the 'allelic state' at a particular location in the genome - i.e. the set of alleles present at this locus across all homologous chromosomes. single locus genotype An 'allelic genotype' describes the set of alleles present at a particular location in the genome. This use of the term 'genotype' reflects its use in clinical genetics where variation has historically been assessed at a specific locus, and a genotype describes the allelic state at that particular location. This contrasts to the use of the term 'genotype' in model organism communities where it commonly describes the allelic state at all loci in a genome known to vary from an established reference or background. allelic genotype Exploratory class looking at creating more specific subtypes of associations, and defining identity criteria for each. genotype-phenotype association true true true true knockdown reagent targeted gene complement A sequence alteration within the coding sequence of a gene. Not required at this point, so marked exploratory and obsoleted. Asserted under sequence_alteration. obsolete coding sequence alteration true A construct that contains a mobile P-element, holding sequences to be delivered to a target cell or genome. P-element construct An engineered region that is used to transfer foreign genetic material into a host cell.
engineered_genetic_vector Constructs can be engineered to carry inserts of DNA from external sources, for purposes of cloning and propagation or gene expression in host cells. Constructs are typically packaged as part of delivery systems such as plasmids or viral vectors. engineered genetic construct A transgene that is not chromosomally integrated in the host genome, but instead exists as part of an extra-chromosomal construct. non-integrated transgene extra-chromosomal transgene A collection of more than one sequence feature. http://purl.obolibrary.org/obo/SO_0001260 ! sequence_collection obsolete sequence feature collection true A set of discrete, genetically-linked sequence alterations that reside on the same chromosomal strand and are typically co-inherited within a haplotype block. Consider whether we want to define this as a 'complement', as it implies a complete set of members of a defined type. But many haplotypes will be incomplete, due to lack of knowledge of other variation bound by the haplotype block. Instead, we can create an 'allele set' class as the haplotype parent? Informed by https://isogg.org/wiki/Haplotype and https://en.wikipedia.org/wiki/Haplotype. A haplotype is a set of non-overlapping alleles that reside in close proximity on the same DNA strand. We model them as 'complements' because they include all known/relevant alleles within a defined region in the genome (e.g. a 'gene', or a 'haplotype block') - where this set may consist of 0, 1, or more alterations from some reference. Because they are genetically linked, the alleles comprising a haplotype are likely to be co-inherited and survive descent across many generations of reproduction. As highlighted in https://en.wikipedia.org/wiki/Haplotype, the term 'haplotype' is most commonly used to describe the following scenarios of genetic linkage between 'alleles': 1. 
The 'alleles' comprising the haplotype are 'single nucleotide polymorphisms' (SNPs) or other small alterations, which collectively tend to occur together on a chromosomal strand. This use of 'haplotype' is commonly seen in phasing of patient WGS or WES data, to describe a state where two or more alterations are believed to occur 'in cis' on the same chromosomal strand. 2. The 'alleles' comprising the haplotype are SNPs or other short alterations, which collectively define a specific version of a gene. In this case, the location bounding the haplotype corresponds to a gene locus, and the haplotype defines a specific allele of that gene (i.e. a 'gene allele'). "Star alleles" of PGx genes are examples of this category of haplotype (e.g. https://www.ebi.ac.uk/cgi-bin/ipd/imgt/hla/get_allele_hgvs.cgi?A*33:01:01, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4724253/). 3. Each of the 'alleles' comprising the haplotype is itself a 'gene allele' (i.e. a specific version of an entire gene), such that the haplotype contains multiple complete 'gene alleles' that are co-inherited because they reside in tightly linked clusters on a single chromosome. Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case. The GENO definition of 'haplotype' is broadly inclusive of these and any other scenarios where distinct 'alleles' of any kind on the same chromosomal strand are genetically linked, and thus tend to be co-inherited across successive generations. haplotype A set of genomic sequences (a biological sequence that is of genomic origin). copy number complement A 'genomic *sequence* set' differs from a 'genomic *feature* set' in that we are counting how many copies of the same *sequence* exist in a genome, as opposed to how many of the same *sequence feature*. 
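The counting difference between a 'genomic sequence set' and a 'genomic feature set' can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy sequences, the coordinate strings, and the variable names are hypothetical and are not part of GENO.

```python
from collections import Counter

# Hypothetical toy genome: each entry is a (sequence, location) pair.
genome_features = [
    ("ACGTACGT", "chr1:100-107"),
    ("ACGTACGT", "chr5:900-907"),  # same sequence duplicated at a new location
    ("TTGACCA", "chr2:50-56"),
]

# Sequence-level counting: location is ignored, so both copies are tallied together.
sequence_counts = Counter(seq for seq, _loc in genome_features)

# Feature-level counting: identity depends on sequence AND location,
# so each (sequence, location) pair is its own entry.
feature_counts = Counter(genome_features)

print(sequence_counts["ACGTACGT"])                   # copies of the sequence
print(feature_counts[("ACGTACGT", "chr1:100-107")])  # copies of that one feature
```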
'Genomic sequence sets' are useful for describing things like 'copy number complements', which are concerned only with how many copies of a sequence exist in a genome, regardless of the location where these reside. By contrast, 'genomic feature sets' are useful for representing things like 'single locus complements', where members are sequence features whose identity is dependent on their location. genomic sequence set A relation used to describe an environment contextualizing the identity of an entity. microsatellite alteration A relation used to describe a process contextualizing the identity of an entity. repeat region alteration A quality inhering in an 'allelic complement' (aka a 'single locus complement') that describes the allelic variability found at a particular locus in the genome of a single cell/organism allelic state allelic dosage An attribute inhering in a feature based on the total number or relative stoichiometry of functional copies present in a particular genome. gene dosage Remodeled this concept as a 'genetic dosage complement' - a sequence-level class, as opposed to a sequence feature attribute. Genetic dosage reflects how many 'functional' copies of a sequence are present in a genome. In diploid organisms, the normal dosage is 2 for autosomal genes/regions. Dosage increases if there is a duplication of the gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This sets it apart from the notion of 'copy number', which reflects how many actual copies of a sequence exist in a genome. Addition of a non-functional allele of a gene will increase its copy number, but not increase its dosage. Duplications of a sequence can occur at new locations in the genome, such that the resulting sequence represents a distinct sequence feature from the copy at its native locus. 
For example, duplication of a region containing the human APOE gene on a different chromosome creates a sequence feature that shares sequence with the original gene, but not location, and therefore represents a different sequence feature. The notions of dosage and copy number are therefore concerned with sequence-level entities (how many copies of a 'sequence' exist), as opposed to sequence feature-level entities. The notion of a single-locus complement would be used to describe how many of a particular feature are present in a genome - and describe which alleles of this feature are found. obsolete genetic dosage true A quality inhering in an allele that describes its genetic origin (how it came to be part of a cell's genome), i.e. whether it occurred de novo through some spontaneous mutation event, or was inherited from a parent. genetic origin variant origin allele origin Describes an allele that is inherited from a female parent in virtue of the allele being present in the mother's egg. maternally inherited maternal allele origin Describes an allele that is inherited from a male parent in virtue of the allele being present in the father's sperm. paternally inherited paternal allele origin Describes an allele that originated through a mutation event in a germ cell of one of the parents, or in the fertilized egg itself during early embryogenesis. De novo alleles are *heritable* but *not inherited*. Derived from https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/de-novo-mutation and https://ghr.nlm.nih.gov/primer/mutationsanddisorders/genemutation We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is *inherited* from a parent, and whether it is *heritable* by offspring. De novo variants are *heritable* but not *inherited* - as they are not observed constitutively in either parent, but can be passed to offspring in virtue of their being present in the individual's germ cells. 
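The two criteria named above (whether an allele is inherited, and whether it is heritable) can be read as a small decision table covering the germline, somatic, and de novo categories. The Python sketch below is illustrative only: the function name is hypothetical, and the 'unknown' fallback for the inherited-but-not-heritable combination is an assumption, since the source text does not address that case.

```python
def allele_origin(inherited: bool, heritable: bool) -> str:
    """Classify allele origin from the two criteria (hypothetical helper)."""
    if inherited and heritable:
        return "germline"   # passed down from a parent, passable to offspring
    if not inherited and heritable:
        return "de novo"    # new in this individual, but present in germ cells
    if not inherited and not heritable:
        return "somatic"    # arose in a non-germ cell
    # Assumption: inherited-but-not-heritable is not described in the text.
    return "unknown"

for flags in [(True, True), (False, True), (False, False)]:
    print(flags, allele_origin(*flags))
```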
By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring), and somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. De novo variants appear for the first time in one family member. They often explain genetic disorders in which an affected child has a mutation in every cell in the body but the parents do not, and there is no family history of the disorder. de novo allele origin Describes an allele whose origin is not known. unknown allele origin Describes an allele that results from some spontaneous mutation event in a somatic cell after fertilization, and thus is not present in every cell in the body. acquired Derived from https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/somatic-variant and https://ghr.nlm.nih.gov/primer/mutationsanddisorders/genemutation We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is *inherited* from a parent, and whether it is *heritable* by offspring. Somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. By contrast, germline variants are both inherited (passed down from a parent) and heritable (passable down to offspring). De novo mutations are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in germ cells. These acquired mutations are called 'somatic' because they typically affect somatic (non-germ) cells. But when spontaneous mutations do occur in the germ cells of an organism, these can be passed on to offspring in whom they will be considered de novo mutations. somatic allele origin a quality inhering in a feature in virtue of its presence only in the genome of gametes (germ cells). germ-line replaced by GENO:0000900 ! 
'germline' obsolete gametic true 2 2 An allelic genotype specifying the set of two alleles present at a particular location in a diploid genome (i.e., a diploid 'single locus complement') Alt: A sequence feature complement comprised of two haplotypes at a particular location on paired homologous chromosomes in a diploid genome. "Humans are diploid organisms; they have paired homologous chromosomes in their somatic cells, which contain two copies of each gene. An allele is one member of a pair of genes occupying a specific spot on a chromosome (called locus). Two alleles at the same locus on homologous chromosomes make up the individual’s genotype. A haplotype (a contraction of the term ‘haploid genotype’) is a combination of alleles at multiple loci that are transmitted together on the same chromosome. Haplotype may refer to as few as two loci or to an entire chromosome depending on the number of recombination events that have occurred between a given set of loci. Genewise haplotypes are established with markers within a gene; familywise haplotypes are established with markers within members of a gene family; and regionwise haplotypes are established within different genes in a region at the same chromosome. Finally, a diplotype is a matched pair of haplotypes on homologous chromosomes." From https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/ https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4118015/figure/sap-26-03-165-g002/ diplotype A quality inhering in a collection of discontinuous sequence features in a single genome in virtue of their relative position on the same or separate chromosomes. allelic phase oryzias latipes strain Describes an allele that is inherited from a parent in virtue of the allele being present in the germline of one of the parents. 
hereditary parental origin parentally inherited Derived from https://www.cancer.gov/publications/dictionaries/genetics-dictionary/def/germline-variant and https://ghr.nlm.nih.gov/primer/mutationsanddisorders/genemutation We distinguish germline, somatic, and de novo allele origin based on a combination of two key criteria - whether the allele is *inherited* from a parent, and whether it is *heritable* by offspring. Germline variants are both *inherited* (present constitutively in a parent and passed down to offspring) and *heritable* (passable down to future offspring). By contrast, somatic variants are neither inherited nor heritable - having originated via a spontaneous mutation in a non-germ cell. Traits caused by de novo mutations in germ cells are not inherited but are typically heritable, as they originated through a spontaneous mutation that made them present in germ cells. germline allele origin An inheritance pattern that is not determined or not known. unknown inheritance undetermined inheritance The canonical allele that represents a single nucleotide variation in the BRCA2 gene, which can be described by various contextual alleles such as “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A”. One of a set of sequence features or haplotypes that exist at a particular genetic locus. <see ClinGen Allele Model> The notion of a 'canonical allele' is taken from the ClinGen Allele model (http://dataexchange.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model. http://dataexchange.clinicalgenome.org/allele/resource/canonical_allele/ No longer needed by ClinGen for their interpretation models, and will likely be replaced in ClinGen and elsewhere by VMC/GA4GH modeling constructs. 
ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/) As a 'sequence feature or collection' (sensu SO), a 'canonical allele' is considered here as an extent of biological sequence encoded in nucleic acid molecules of a cell or organism (as opposed to an information artifact that is about such a sequence). Canonical alleles can include haplotypes that contain more than one discontinuous sequence alteration that exist in cis on the same chromosomal strand. In the ClinGen allele model, 'canonical alleles' are contrasted with 'contextual alleles'. Contextual alleles are informational representations that describe a canonical allele using a particular reference sequence. A single canonical allele can be described by many contextual alleles that each use a different reference sequence in their representation (e.g. different chromosomal or transcript references) obsolete canonical allele true An informational artifact that describes a canonical allele by defining its sequence and position relative to a particular reference sequence. The notion of a 'contextual allele' is taken from the ClinGen Allele model (http://dataexchange.clinicalgenome.org/allele/). It is implemented in GENO to provide an ontological representation of this concept that will support data integration efforts, but may be replaced should an IRI become available from the ClinGen model. http://dataexchange.clinicalgenome.org/allele/resource/contextual_allele/ No longer needed by ClinGen for their interpretation models, and will likely be replaced in ClinGen and elsewhere by VMC/GA4GH modeling constructs. Former axiom: denotes some 'obsolete_canonical allele' ClinGen Allele Model (http://dataexchange.clinicalgenome.org/allele/) The notion of a 'contextual allele' derives from the ClinGen Allele model. 
Here, each genetic allele in a patient corresponds to a single 'canonical allele', which in turn may aggregate any number of 'contextual allele' representations that may be defined against different reference sequences. Accordingly, many contextual alleles can describe a single canonical allele. For example, the contextual alleles “NC_000013.11:g.32319070T>A” and “NG_012772.3:g.8591T>A” both describe the same underlying canonical allele, a single nucleotide variation, in the BRCA2 gene. obsolete contextual allele true A mitochondrial inheritance pattern whereby manifestation of a trait is observed when some inherited mitochondria contain the causative allele and some do not. heteroplasmic mitochondrial inheritance A mitochondrial inheritance pattern whereby manifestation of a trait occurs when only mitochondria containing the causative allele are inherited. homoplasmic mitochondrial inheritance true A generically dependent continuant that carries biological sequence that is part of or derived from a genome. An abstract/organizational class to support data modeling, that includes genomic features, genomic feature complements, qualified genomic features and their complements, as well as genotypes that denote such entities. genomic entity A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequence alterations it contains are typically co-inherited across generations. Consider whether we might better model a 'haplotype block' at the level of a sequence location, rather than a sequence region - e.g. as "A genomic location over which there is little evidence for historical recombination, such that sequence alterations it contains are typically co-inherited across generations." Look at how the concept is used in research, and if people think of each version of sequence in a haplotype block to be an instance. 
I think we would just call these versions 'alleles', and then could define haplotype block as a location. Current definition is based on http://purl.obolibrary.org/obo/SO_0000355 ! haplotype_block (def = A region of the genome which is co-inherited as the result of the lack of historic recombination within it). If we stick with a region-level treatment, consider whether, as a defined region of genomic sequence where variation is known to occur, a haplotype block should be classified as a subtype of allele. Informed by http://purl.obolibrary.org/obo/SO_0000355 ! haplotype_block, and DOI: 10.1126/science.1069424. A particular haplotype block is defined by the set of sequence alterations it is known to contain, which collectively represent a 'haplotype'. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span any number of sequence alterations, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype. haplotype block A genotype that describes the total variation in heritable genomic sequence of a cell or organism, typically in terms of alterations from some reference or background genotype. 'Genomic Genotype' vs 'Genome' in GENO: A genomic genotype is an information artifact with a representational syntax that can specify what is known about the complete sequence of a genome. This syntax describes 'reference' and 'variant' components - namely a 'background genotype' and 'genomic variation complement' - that must be operated on to resolve the genome sequence. Specifically, the genome sequence is determined by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'background genotype'. 
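The substitution step just described can be sketched as a short Python function. This is a minimal illustration, not a GENO implementation: the function name, the representation of the variation complement as a mapping from 0-based half-open intervals to replacement sequences, and the toy sequences are all assumptions made for the example.

```python
def resolve_genome(background: str, variation_complement: dict) -> str:
    """Substitute variant sequences into a reference background sequence.

    variation_complement maps (start, end) intervals on the background
    (0-based, half-open; a hypothetical convention) to replacement sequences.
    """
    pieces = []
    cursor = 0
    for (start, end), alt in sorted(variation_complement.items()):
        if start < cursor:
            raise ValueError("overlapping variants are not supported in this sketch")
        pieces.append(background[cursor:start])  # unchanged reference sequence
        pieces.append(alt)                       # variant sequence replaces the interval
        cursor = end
    pieces.append(background[cursor:])           # remainder of the reference
    return "".join(pieces)

background = "AAACCCGGGTTT"
variants = {(3, 6): "T", (9, 10): "GA"}
resolved = resolve_genome(background, variants)
print(resolved)
```

With an empty variation complement the function returns the background unchanged, which mirrors the case where a genome is identical to its reference.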
So, while the total sequence content described in a genotype may exceed that of a single genome (in that it includes a reference genome and variation complement), the intended resolution of these sequences is to arrive at a single genome sequence. It is this end-point that we consider when asserting that a genotype 'specifies' a genome. complete genotype 1. A genomic genotype is a short-hand specification of a genome that uses a representational syntax comprised of information about a reference genome ('genomic background'), and all specific variants from this reference (the 'genomic variation complement'). Conceptually, this variant genome sequence can be resolved by substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the reference 'genomic background' sequence. 2. 'Heritable' genomic sequence is that which is passed on to subsequent generations of cells/organisms, and includes all chromosomal sequences, the mitochondrial genome, and any transmissible extrachromosomal replicons. genomic genotype A quality inhering in a particular allele in virtue of its presence only in a particular type of cell in an organism (e.g. somatic vs germ cells) decided this attribute is not needed, and moved its child 'germline' and 'somatic' concepts under allele origin Cellular context of an allele is typically defined in the context of evaluating an individual organism, as alleles that are somatic in one organism can be germline in others. obsolete allele cellular context true The location of a sequence feature in a genome, defined by its start and end position on some reference genomic coordinate system In GENO, the notion of a Genomic Location (aka Genomic Locus) plays the same role as that of a FALDO:Region in the design pattern for describing the location of a feature of interest. 
We define this specific GENO class because the ontological nature of the FALDO:Region class is not clear in the context of the BFO and SO-based GENO model. We will work to resolve these questions and ideally converge these concepts in the future. We don't link a Genomic Location to a specific reference sequence because the FALDO model (which GENO adopts with the exception of swapping GENO:Genomic Locus for FALDO:Region) allows the start and end positions of a region to be defined on separate reference sequences. So while a given Location is conceptually associated with a single reference, in practice it can be pragmatic to define start and stop on different reference sequences. In practice, GENO advocates describing biology at the level of genomic features - i.e. define specific terms for genes as genomic features, and not duplicate representation of the loci where each gene resides. So we might define a class representing the human Shh gene as a 'genomic feature', but not parallel this with a 'human Shh gene locus' class. The utility of the 'genomic locus' class in the ontology is primarily to be clear about the distinction, but we would only use it in modeling data if absolutely needed. For example, we would define an 'HLA gene block' as a subclass of 'genomic feature', and assert that HLA-A, HLA-B, and HLA-C genes are part/subsequences of this HLA gene block (as opposed to modeling this as an 'HLA locus' and asserting that the HLA-A, HLA-B, and HLA-C genes occupy this locus). genomic location genomic locus VMC:Location 1. A genomic location (aka locus) is defined by its begin and end coordinates on a reference genome, independent of a particular sequence that may reside there. In GENO, we say that a genomic location is occupied_by a 'sequence feature' - where the identity of this feature depends on both its sequence, and its location in the genome (i.e. the locus it occupies). 
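The identity rule just stated (a sequence feature is identified by its sequence together with the location it occupies) can be sketched with Python dataclasses, whose generated equality compares all fields. The class names, field names, and coordinates below are hypothetical placeholders chosen for illustration; they are not GENO or FALDO identifiers and not real genomic positions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GenomicLocation:
    reference: str  # hypothetical reference sequence name
    start: int
    end: int

@dataclass(frozen=True)
class SequenceFeature:
    sequence: str
    location: GenomicLocation  # the locus this feature occupies

# Two features carrying the same sequence at different (made-up) locations.
atg_a = SequenceFeature("ATG", GenomicLocation("refA", 100, 103))
atg_b = SequenceFeature("ATG", GenomicLocation("refB", 500, 503))

same_sequence = atg_a.sequence == atg_b.sequence  # compares sequence only
same_feature = atg_a == atg_b                     # compares sequence and location
print(same_sequence, same_feature)
```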
For example, the 'ATG' sequence beginning the ORF of the human SHH gene shares the *same sequence* as the 'ATG' beginning the ORF of the human AKT gene. But these are *distinct sequence features* because they occupy different genomic locations. 2. A given genomic location (e.g. the human SHH gene locus) may be occupied by different alleles (e.g. different alleles of the SHH gene). Within the genome of a single diploid organism, there is potential for two alleles to exist at such a locus (i.e. two different versions of the SHH gene). And across genomes of all members of a species, many more alleles of the SHH gene may exist and occupy this same locus. 3. The notion of a genomic location in the realm of biological sequences is analogous to a BFO:spatiotemporal region in the realm of physical entities. A spatiotemporal region can be occupied_by physical objects, while a genomic location is occupied_by sequence features. Just as a spatiotemporal region is distinct from an object that occupies it, so too a genomic locus is distinct from a sequence feature that occupies it. As a more concrete example, consider the distinction between a street address and the building that occupies it as analogous to the relationship between a genomic location and the feature that resides there. genomic feature location true A material entity that is an organism, derived from an organism, or composed of organisms (e.g. a cell line, biosample, tissue culture, population, etc). useful organizational term to collect entities that have genomes/genotypes. organismal entity The molecular product resulting from transcription of a single gene (either a protein or RNA molecule) gene product obsolete reporter role reporter obsolete selectable marker role selectable marker selectable marker region A genome whose sequence is identical to that of a genome sequence considered to be the reference. 
reference genome A haplotype is an allele that represents one of many possible versions of a 'haplotype block', which defines a region of genomic sequence that is typically 'co-inherited' across generations due to a lack of historically observed recombination within it. Former comment: "Each of these more specific definitions serves a purpose for a particular type of genetic analysis or use case - e.g. 'SNP allele' haplotypes are identified and analysed in studies to uncover the genetic basis of common disease by efforts like the International HapMap Project." Informed by https://isogg.org/wiki/Haplotype and https://en.wikipedia.org/wiki/Haplotype and http://purl.obolibrary.org/obo/SO_0001024 ! haplotype. Decided to represent haplotypes as collections of discrete alleles, rather than continuous features defined by such sets. Former SC axioms: - is_allele_of some 'haplotype block' - 'has part' some sequence_alteration 1. The relationship between 'haplotype' and 'haplotype block' is analogous to the relationship between 'gene allele' and 'gene': a 'gene allele' is one of many possible instances of a 'gene', while a 'haplotype' is one of many possible instances of a 'haplotype block'. In this sense, a gene allele can be considered to be a haplotype whose extent is that of a gene (as it is generally true that there is a low probability of recombination within any given gene). 2. Haplotypes typically contain more than one 'genetically-linked' locus where sequence alterations are known to exist, such that a set of alterations will be co-inherited across many generations of reproduction. A common use of 'haplotype' is in phasing of patient WGS or WES data, where this term refers to sequence containing two or more alterations that are believed to occur 'in cis' on the same chromosomal strand. 
GENO's definition is consistent with but more inclusive than this view, allowing for haplotypes with one or zero established alterations as long as there is a low probability of recombination within the region it spans (such that alterations found in cis are likely to remain in cis across successive generations). As a result, GENO considers any allele that spans an extent greater than that of a single sequence alteration to be a haplotype - as long as there is an expectation of low recombination frequency within the haplotype block occupied by the allele. For example, a 'gene allele' is a haplotype representing a particular version of a gene that contains one or more sequence alterations - as a 'gene' is a region of sequence with a low probability of recombination that is generally expected to be inherited as a unit. 3. As highlighted in https://en.wikipedia.org/wiki/Haplotype, the term 'haplotype' is most commonly used to describe the following scenarios of genetic linkage between 'alleles': a. The first is regions containing multiple linked 'gene alleles' - i.e. specific versions of entire genes that are co-inherited because they reside in tightly linked clusters on a single chromosome. b. The second is a region containing multiple linked single nucleotide polymorphisms (SNPs) that tend to occur together on a chromosomal strand (i.e. be statistically associated). This use of 'haplotype' is commonly seen in phasing of patient WGS or WES data, to describe a state where two or more alterations are believed to occur 'in cis' on the same chromosomal strand. c. A third, which is related to the previous case, occurs when the extent of the region containing linked SNPs is that of a single gene. In this case, the haplotype represents a 'gene allele' - a version of an entire gene defined by the set of sequence alterations it contains. 
We may consider this a haplotype as most genes are small enough that there is little chance of recombination events moving cis alterations onto separate chromosomes. The GENO definition of 'haplotype' is broadly inclusive of these and any other scenarios where distinct 'alleles' of any kind on the same chromosomal strand are genetically linked, and thus tend to be co-inherited across successive generations. obsolete haplotype true A sequence feature representing a region of the genome over which there is little evidence for historical recombination, such that sequences it contains are typically co-inherited/transmitted across generations. Derived from DOI: 10.1126/science.1069424 and http://purl.obolibrary.org/obo/SO_0000355 ! haplotype_block. Decided to represent haplotypes as collections of discrete alleles, rather than continuous features defined by such sets. A haplotype block is a class of genomic sequence defined by a lack of evidence for historical recombination, such that sequence alterations within it tend to be co-inherited across successive generations. A haplotype is considered to be one of many possible versions of a 'haplotype block' - defined by the set of co-inherited alterations it contains. In this sense, the relationship between 'haplotype' and 'haplotype block' is analogous to the relationship between 'gene allele' and 'gene'* - a 'gene allele' is one of many possible instances of a 'gene', while a 'haplotype' is one of many possible instances of a 'haplotype block'. The boundaries of haplotype blocks are defined in efforts to identify haplotypes that exist in organisms or populations. A haplotype block may span any number of sequence alterations, and may cover small or large chromosomal regions - depending on the number of recombination events that have occurred between the alterations defining the haplotype. 
----------------------- * One difference however is that gene instances are necessarily 'functional' - so non-functional alleles of a gene locus won't qualify as gene instances. No such requirement exists for haplotype block instances. obsolete haplotype block true An allelic state that describes the number of different alleles of a gene from an organellar genome (i.e. mitochondrial, plastid) that may exist in a cell. Cells with a population of organelles from a single origin that all share the same organellar genome will contain only one allele of each organellar gene, while cells with populations of organelles of different origins may contain more than one allele of a given organellar gene. organellar plasmy Consider wild-type zebrafish shha gene in the context of being targeted by morpholino MO-1 vs morpholino MO-2 in separate experiments. These shha genes share identical sequence and position, but represent distinct instances of a 'qualified sequence feature' because of their different external contexts. This is important because these qualified features could have distinct phenotypes associated with them (just as two different sequence variants (alleles) of the same gene can have potentially different associated phenotypes). A sequence feature whose identity is additionally dependent on the context or state of the material sequence molecule in which the feature is concretized. This context/state describes factors external to the feature's intrinsic sequence and position that can influence its expression, such as being targeted by gene-knockdown reagents, or an epigenetic modification. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence. 2. 
'Sequence feature' identity is dependent on its sequence and the genomic location of the sequence (this is consistent with the definition of 'sequence feature' in the Sequence Ontology). 3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical state or context of the genetic material in which the feature is concretized. This third criterion is extrinsic to its sequence and its genomic location. For example, the feature's physical concretization being targeted by a gene knockdown reagent in a cell (e.g. the zebrafish Shha gene as targeted by the morpholino 'Shha-MO1'), or its being transiently expressed from a recombinant expression construct (e.g. the human SHH gene as expressed in a mouse Shh knock-out cell line), or its having been epigenetically modified in a way that alters its expression level or pattern (e.g. the human SHH gene with a specific methylation pattern). Modeling sequence entities at this 'qualified' level is useful for treating features with identical sequence and position as separate instances - based on their material bearers being found in different contexts. For example, consider a situation where the zebrafish shha gene (a sequence feature) is targeted in two experimental groups of fish by two different morpholinos, and phenotypes are assessed for each. We want to be able to represent two 'variants' of the shha gene in this scenario as separate 'qualified sequence feature' instances so we can capture data about the phenotypes resulting from each - just as we would separately represent two different sequence variants (alleles) of the shha gene at the sequence feature level so that we can track their associated phenotypes. qualified sequence feature A set of qualified sequence features. 'Sets' are used to represent entities that are typically collections of more than one member. 
But we allow for sets that contain 0 members (an 'empty' set) or 1 member (a 'singleton' or 'unit' set), consistent with the concept of 'mathematical sets'. qualified sequence feature set A biological sequence, or set of such sequences. biological sequence or collection biological sequence or set A set of biological sequences. 'Sets' are used to represent entities that are typically collections of more than one member. But we allow for sets that contain 0 members (an 'empty' set) or 1 member (a 'singleton' or 'unit' set), consistent with the concept of 'mathematical sets'. A set may also include multiple copies of the same sequence. For example, in a 'copy number complement', members are all copies of this same biological sequence. biological sequence set A set of all features representing *functional* versions of a specified sequence (typically that of a gene) in a particular genome. Formerly considered modeling this as an informational entity, defined as "An information entity that describes the total number of functional copies of a gene or region of sequence in a particular genome." functional feature complement genetic dosage Decided to implement copy number related classes at the sequence level, rather than the sequence feature level. Replaced by GENO:0000963. As for copy number complements, the defining 'sequence' here is specified in terms of a location on a reference sequence - typically the location where a gene or set of genes resides. But the criteria for membership in a functional copy number complement require only that the feature can perform the functions associated with the gene or genes at the defining location. A gene allele that varies by only one nucleotide from the wild-type gene may not qualify if that alteration eliminates the function of the allele. This represents an important distinction between 'copy number' and 'functional copy number'.
The former is not concerned with the functionality of sequence copies - only that there is a duplication of sequence in the genome. Thus, the addition of a non-functional allele of a gene will increase its copy number, but not increase its 'functional copy number' (aka its dosage). The notion of 'functional copy number' (aka 'genetic dosage') describes how many 'functional' copies of a sequence are present in a genome - i.e. sequences that retain their normal activity and/or produce gene products that retain their normal activity. In diploid organisms, the normal dosage is 2 for autosomal genes/regions. Dosage increases if there is a duplication of the gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This latter condition sets it apart from the notion of a 'copy number complement', which reflects how many actual copies of a sequence exist in a genome. Addition of a non-functional allele of a gene will increase its genomic sequence complement count (i.e. its copy number), but not increase its dosage. obsolete functional copy number complement true A sequence feature attribute that reflects feature-level characteristics that depend only on the sequence, location, or genomic context of a feature or collection, but are independent of how it may be concretized in physical form. obsolete intrinsic sequence feature attribute true A sequence feature attribute that reflects characteristics of the physical molecule in which the feature is concretized (e.g. its cellular context, source of origin, etc.) obsolete extrinsic sequence feature attribute true A quality inhering in an allele reflecting whether it is found in all cells of an organism's body, or just some clonal subset (e.g. in mosaicism). allelic cellular distribution A cellular distribution in which an allele is found in all cells of an organism's body, typically in virtue of its germline origin.
constitutional A cellular distribution in which an allele is found only in some clonal subset of cells in an organism, typically in virtue of its somatic origin. clonal An inheritance pattern that depends on a mixture of major and minor genetic determinants (i.e. alleles of more than one contributing gene), possibly together with environmental factors. complex inheritance multi-factorial inheritance multi-genic inheritance multi-locus inheritance multigenic inheritance http://purl.obolibrary.org/obo/HP_0001426 Diseases inherited in this manner are termed 'complex diseases'. multifactorial inheritance A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in two genes. http://purl.obolibrary.org/obo/HP_0010984 digenic inheritance A multifactorial inheritance pattern that is determined by the simultaneous action of alleles in a few genes. http://purl.obolibrary.org/obo/HP_0010983 It is recommended this term be used for traits governed by three gene loci, although it is noted that usage of this term in the literature is not uniform. oligogenic inheritance A multifactorial inheritance pattern that is determined by the simultaneous action of alleles of a large number of genes. http://purl.obolibrary.org/obo/HP_0010982 Typically used for traits/conditions governed by more than three gene loci. polygenic inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene, possibly together with environmental factors. single-gene inheritance monogenic inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a non-sex chromosome. autosomal inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a sex chromosome. gonosomal inheritance http://purl.obolibrary.org/obo/HP_0010985 allosomal inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on an X-chromosome.
http://purl.obolibrary.org/obo/HP_0001417 X-linked inheritance An X-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus. complete X-linked dominant inheritance An X-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus. semi-dominant X-linked inheritance incomplete X-linked dominant inheritance An X-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus. co-dominant X-linked inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Y-chromosome. holandric inheritance http://purl.obolibrary.org/obo/HP_0001450 Y-linked inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a Z-chromosome. Z-linked inheritance A Z-linked inheritance pattern wherein the trait manifests in heterozygotes. Z-linked dominant inheritance A Z-linked dominant inheritance pattern wherein the trait associated with one allele completely masks the trait associated with a different allele found at that locus. complete Z-linked dominant inheritance A Z-linked dominant inheritance pattern wherein the trait expressed in a heterozygous individual is intermediate between the trait expressed in individuals homozygous for either allele in the heterozygous locus. semi-dominant Z-linked inheritance incomplete Z-linked dominant inheritance A Z-linked dominant inheritance pattern wherein a heterozygous individual simultaneously expresses the distinct traits associated with each allele in the heterozygous locus.
co-dominant Z-linked inheritance A Z-linked inheritance pattern wherein a trait caused by alleles of a gene on the Z-chromosome manifests in homozygous but not heterozygous individuals. Z-linked recessive inheritance An inheritance pattern wherein the trait is determined by alleles of a single causal gene on a W-chromosome. W-linked inheritance An inheritance pattern observed for traits related to a gene encoded on the mitochondrial genome. http://purl.obolibrary.org/obo/HP_0001427 Because the mitochondrial genome is essentially always maternally inherited, a mitochondrial condition can only be transmitted by females, although the condition can affect both sexes. The proportion of mutant mitochondria can vary (heteroplasmy). mitochondrial inheritance An autosomal dominant inheritance pattern wherein the trait manifests in heterozygotes in a sex-specific manner (i.e. only in males or only in females). http://purl.obolibrary.org/obo/HP_0001470 sex-limited autosomal dominant inheritance An autosomal recessive inheritance pattern wherein the trait manifests only in homozygotes, and in a sex-specific manner (i.e. only in males or only in females). http://purl.obolibrary.org/obo/HP_0031362 sex-limited autosomal recessive inheritance A set of discrete alleles within a particular genome. 'Sets' are used to model entities that can be comprised of multiple discrete elements - but which can also contain zero or a single member. An 'Allele Set' represents any collection of 0 or more discrete alleles found within a particular genome. The alleles in such a set can be located at distant or close locations in the genome, and if on the same chromosome can be in trans, in cis, or even overlapping. When the members of such a set are found 'in cis' on the same chromosome, they may constitute a 'haplotype'. When found 'in trans' at the same location on homologous chromosomes, they may constitute a 'single locus complement'.
allele set A 'copy number complement' that has an abnormal number of members (e.g. more or less than two for an autosomal sequence in a diploid genome), as a result of deletion or duplication event(s). copy number variation Decided to implement copy number related classes at the sequence level, rather than the sequence feature level. In a 'normal' diploid genome, the copy number complement for any feature (on a non-Y chromosome) contains two members. A copy number variation occurs when a complement contains more or less than two members - as the result of deletion or duplication event(s). Note that the 'copy number variation' class in GENO is related to but ontologically distinct from the SO 'copy_number_variation' class. The GENO class refers to a *set* of all copies of a sequence in a genome, where the number of members in the set is in conflict with the genome's normal ploidy (e.g. not two for a diploid genome). The SO class, which is defined as a sequence feature level concept and therefore represents a single continuous extent of sequence, refers to a single copy of duplicated (or deleted) sequence that comprises the set defined by the GENO CNV class. obsolete variant copy number complement true A set of all features in a particular genome whose sequence aligns with a particular location on a reference genome. Such features are typically on the scale of complete genes or larger. Decided to implement copy number related classes at the sequence level, rather than the sequence feature level. Replaced by GENO:0000961. 1. Features described by 'copy number' are larger regions of sequence spanning one or more complete genes, or large chromosomal segments. Copies of these regions often become distributed across a genome at unknown locations. By contrast, short repeats, such as tri-nucleotide 'CAG' repeats in the Huntingtin gene, occur at defined locations (adjacent to the originating 'CAG' sequence), and can therefore be modeled as proper alleles. 2.
A copy number complement, like any sequence feature complement, is a set of features in a particular genome that meet some criterion. The criterion in this case is that their sequence maps to that of a particular location in a reference sequence. So a copy number complement is the set of all features that share or align with a specified sequence defined on some reference. The sequence of member sequences need not exactly match that of the reference, as copies may accrue some alterations. What is important is that conceptually they represent exact or inexact copies of the reference sequence at a defining location. 3. In a 'normal' diploid genome, the copy number complement for any feature (on a non-Y chromosome) contains two members. A copy number variation occurs when a complement contains more or less than two members - as the result of deletion or duplication event(s). In GENO, a 'copy number variation' refers to a 'copy number complement' that has an abnormal number of members. obsolete copy number complement true A biological sequence that is of genomic origin (i.e. carries sequence from the genome of a cell or organism). A sequence being 'of genomic origin' here means only that it has been located to the genome of some organism by alignment with some reference genomic sequence. This is because the sequence was originally identified in, or artificially created to replicate, sequence from an organism's genome. genomic sequence A set representing the complement of all copies of a particular biological sequence (typically at the scale of complete genes or larger) present in a particular genome. The identity of a 'copy number complement' instance is determined by the sequence defining its members, and their count (the number of times this sequence appears in a particular genome). In reality the sequence of each copy may not be identical, given the tendency of large regions to accumulate subtle variations.
What matters is that they share a common origin/alignment with a defining location in a reference genome. We represent the notion of copy number at the "sequence level" (as opposed to the "sequence feature level") because we are concerned only with the number of copies of a sequence in a genome, and not the location of the features bearing this sequence. Consider a copy number complement comprised of three copies of the sequence defined by the location Chr8 100000-200000 on a GRCh38.2 reference genome. In one person's genome, this sequence may appear at its normal location on Chromosome 8, as well as in duplications on chromosomes 5 and 12. In another genome the sequence might appear three times as well, but on chromosomes 8, 9, and 15. When representing causal associations linking copy number to disease, it is important that these are considered to be *the same* copy number complement - because what a curator associates with a disease is the presence of three copies of some sequence in a genome, independent of their location. The "sequence level" representation here supports this use case. By contrast, a "feature level" representation (where identity of a copy number complement would be based on the identity of member *features*) does not - because we have two sets comprised of entirely different features (based on location being tied to their identity). The count of how many copies of a particular sequence are found in a genome is the sequence's 'copy number'. In diploid organisms, the normal copy number for sequences at most locations is 2 (a notable exception being those on the X-chromosome where normal copy number is 1). Variations in copy number occur if this count increases due to a duplication of the gene/region, or decreases due to a deletion of a gene/region. A driving use case for representing copy number is to support associations between variation in copy number of a particular sequence, and phenotypes or diseases that can result.
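The Chr8 example above can be sketched in Python to show why a sequence-level identity treats the two genomes as carrying the same copy number complement while a feature-level identity does not. This is only an illustrative sketch, not part of GENO: the class `FeatureCopy`, the two key functions, and the string label used for the defining sequence are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureCopy:
    """One located copy of a sequence in an individual's genome (hypothetical model)."""
    defining_sequence: str  # label for the defining location on the reference (assumed format)
    chromosome: str         # where this particular copy sits in the individual's genome


def sequence_level_key(copies):
    """Identity based only on the defining sequence and how many copies exist."""
    sequences = {c.defining_sequence for c in copies}
    if len(sequences) != 1:
        raise ValueError("all members must be copies of one defining sequence")
    return (next(iter(sequences)), len(copies))


def feature_level_key(copies):
    """Identity based on the member features themselves, including their location."""
    return frozenset(copies)


SEQ = "GRCh38.2:Chr8:100000-200000"  # hypothetical label for the defining sequence

genome_a = [FeatureCopy(SEQ, "8"), FeatureCopy(SEQ, "5"), FeatureCopy(SEQ, "12")]
genome_b = [FeatureCopy(SEQ, "8"), FeatureCopy(SEQ, "9"), FeatureCopy(SEQ, "15")]

# Same defining sequence and same count, so the sequence-level keys match.
same_at_sequence_level = sequence_level_key(genome_a) == sequence_level_key(genome_b)

# The member features sit at different locations, so the feature-level keys differ.
same_at_feature_level = feature_level_key(genome_a) == feature_level_key(genome_b)
```

Under these assumptions a curator's assertion about "three copies of this sequence" attaches to the sequence-level key, which is shared by both genomes regardless of where the duplicated copies landed.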
A 'complement' refers to an exhaustive collection of *all* objects that make up some well-defined set. Such a set may contain 0, 1, or more than one member. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as 'copy number complements' representing the set of all copies of a particular sequence in a genome. The fact that we are counting how many copies of the same *sequence* exist in a genome here, as opposed to how many of the same *feature*, is what sets sequence-level concepts like 'copy number complement' apart from feature-level concepts like 'single locus complement'. To illustrate the difference, consider a duplication event that creates a new copy of the human APOE gene on a different chromosome. This creates an entirely new sequence feature at a distinct locus from that of the original APOE gene. The 'copy number complement' for sequence defined by the APOE gene locus would have a count of three, as this sequence is present three times in the genome. But the 'single locus complement' at the APOE gene locus would still have a count of two - because the duplicated copy is at a different location in the genome, and therefore does not represent a copy of the APOE locus. The notion of a 'complement' is useful as a special case of a set, where the members necessarily comprise an exhaustive collection of *all* objects that make up some well-defined set. Here, a 'copy number complement' represents the set of *all* copies of a specified sequence in a particular genome. Note that sequences can be duplicated in a set (i.e. contain more than one member representing the same sequence). In the 'copy number complement' example, each set member is a copy of this same biological sequence. copy number complement A 'copy number complement' that has an abnormal number of members, as the result of deletion or duplication event(s).
Note that this 'variant copy number complement' class in GENO is related to but ontologically distinct from the SO 'copy number variation' class. The GENO class refers to a *set* of all copies of a sequence in a genome, where the number of members in the set departs from the genome's normal ploidy of sequences at that location. The SO class, which is defined as a "sequence feature level" concept (and therefore represents a single continuous extent of sequence), refers to a sequence alteration such as a deletion or duplication that changes the copy number of the affected sequence, and would result in the presence of a 'variant copy number complement'. The presence of an SO 'copy number variation' suggests, but does not guarantee, the existence of a GENO 'variant copy number complement' (e.g. if a second balancing event has occurred). For example, the deletion variant reported in the ClinVar record here (https://www.ncbi.nlm.nih.gov/clinvar/variation/21009/) is a copy number variation in the SO sense - a deletion that likely results in a GENO 'variant copy number complement'. Databases like ClinVar and dbVar type such alterations as 'copy number variants'. But ClinVar also describes 'variant copy number complements' that may result from the presence of one or more SO 'copy number variations' in a given genome, e.g. here ( https://www.ncbi.nlm.nih.gov/clinvar/variation/221691/). In this case, the submitter is asserting that a state in which only one copy of the defined sequence (Chr2: 73601366 - 73673202) exists in a genome is pathogenic for 'Premature ovarian failure'. This requires more knowledge of the complete genomic state than an assertion that a specific SO 'copy number variation' (here, a deletion variant) is pathogenic for the condition - as here we know that not only is one copy deleted, but also that only one copy remains. 
'Abnormal' is typically more or less than two members for an autosomal sequence in a diploid genome, and more or less than one member for a sequence in a non-homologous region of a sex-chromosome. variant copy number complement A set representing the complement of all functional versions of a specified sequence (typically that of a gene) in a particular genome. functional genetic dosage A 'complement' refers to an exhaustive collection of *all* objects that make up some well-defined set. Such a set may contain 0, 1, or more than one members. The notion of a complement is useful for defining many biologically-relevant sets of sequence features, such as the set of all functional copies of a particular sequence in a genome. This is known as the 'functional copy number' or 'genetic dosage' of the sequence. 'Functional copies' of a sequence are those that exhibit normal activity and/or produce gene products that exhibit normal activity associated with the sequence. The count of functional copies of a gene is often referred to as its 'dosage'. In diploid organisms, the normal 'dosage' is 2 for autosomal genes/regions. Dosage increases if there is a duplication of a functional gene/region. Dosage decreases if there is either a deletion of a gene/region, or an inactivating mutation that eliminates gene function. This sets it apart from the notion of a 'copy number complement', which reflects how many copies of a sequence exist in a genome, regardless of their functionality. Addition of a non-functional allele of a gene will increase its copy number, but not increase its dosage. As we saw for 'copy number complement', the defining sequence here is specified in terms of a location on a reference sequence - typically the location where a gene or set of genes resides. But the criteria for membership in a 'functional' copy number complement require only that the feature can perform the functions associated with the gene or genes at the defining location. 
A gene allele that varies by only one nucleotide from the wild-type gene may not qualify as functional if that alteration eliminates the activity of the allele. functional copy complement A clonal distribution in which an allele arose during embryogenesis and is present in a subset of tissues derived from some common developmental cell or tissue type. mosaic A pair of integers representing start and end position of a location on a sequence coordinate system. sequence interval An inheritance pattern wherein the trait is determined by inheritance of extra, missing, or re-arranged chromosomes possibly together with environmental factors. The Alliance of Genomic Resources chromosomal inheritance An inheritance pattern wherein the trait is determined by inheritance of missing sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors. Alliance of Genomic Resources chromosomal deletion inheritance An inheritance pattern wherein the trait is determined by inheritance of duplicated sections of one or more chromosomes, encompassing either 0 or multiple genes, possibly together with environmental factors. Alliance of Genomic Resources chromosomal duplication inheritance An inheritance pattern wherein the trait is determined by inheritance of translocation or inversion of sections of one or more chromosomes, possibly together with environmental factors. Alliance of Genomic Resources chromosomal rearrangement inheritance exploratory Describes an allele that is inherited from a parent. Need to consider if/how this is different than 'germline allele origin'. One scenario that potentially distinguishes them is the case where a de novo mutation occurs in the germ cells of a parent, and is passed to offspring. This does not qualify as 'germline allele origin', as currently defined. 
But it would qualify as 'inherited' inherited allele origin exploratory Describes an allele that is part of an allelic complement where both alleles are inherited from the same parent. From Wikipedia: Uniparental inheritance is a non-mendelian form of inheritance that consists of the transmission of genotypes from one parental type to all progeny. That is, all the genes in offspring will originate from only the mother or only the father. This phenomenon is most commonly observed in eukaryotic organelles such as mitochondria and chloroplasts. https://en.wikipedia.org/wiki/Uniparental_inheritance uniparental allele origin exploratory Describes an allele that is part of an allelic complement where one allele is maternally inherited and the other paternally inherited. Biparental inheritance of alleles is typical of normal mendelian inheritance, where offspring inherit a maternal and a paternal copy of a given gene. biparental allele origin A biological process whose specific outcome is the progression of an integrated living unit: an anatomical structure (which may be a subcellular structure, cell, tissue, or organ), or organism over time from an initial condition to a later condition. [database_cross_reference: GOC:isa_complete] developmental process pulling in HP 'phenotypic abnormality' root here human phenotypic abnormality Stub class to serve as root of hierarchy for imports of human developmental stages from the Human Developmental Stages Ontology. A spatiotemporal region encompassing some part of the life cycle of an organism. human life cycle stage data item data item information content entity Examples of information content entities include journal articles, data, graphical layouts, and graphs. an information content entity is an entity that is generically dependent on some artifact and stands in relation of aboutness to some entity information_content_entity 'is_encoded_in' some digital_entity in obi before split (040907).
information_content_entity 'is_encoded_in' some physical_document in obi before split (040907). Previous. An information content entity is a non-realizable information entity that 'is encoded in' some digital or physical entity. PERSON: Chris Stoeckert OBI_0000142 information content entity information content entity curation status specification The curation status of the term. The allowed values come from an enumerated list of predefined terms. See the specification of these instances for more detailed definitions of each enumerated value. Better to represent curation as a process with parts and then relate labels to that process (in IAO meeting) PERSON:Bill Bug GROUP:OBI:<http://purl.obolibrary.org/obo/obi> OBI_0000266 curation status specification data about an ontology part Data about an ontology part is a data item about a part of an ontology, for example a term Person:Alan Ruttenberg ontology metadata data about an ontology part data about an ontology part obsolescence reason specification The reason for which a term has been deprecated. The allowed values come from an enumerated list of predefined terms. See the specification of these instances for more detailed definitions of each enumerated value. The creation of this class has been inspired in part by Werner Ceusters' paper, Applying evolutionary terminology auditing to the Gene Ontology. PERSON: Alan Ruttenberg PERSON: Melanie Courtot obsolescence reason specification denotator type The Basic Formal Ontology makes a distinction between Universals and defined classes, where the former are "natural kinds" and the latter arbitrary collections of entities. A denotator type indicates how a term should be interpreted from an ontological perspective. Alan Ruttenberg Barry Smith, Werner Ceusters denotator type ontology module I have placed this under 'data about an ontology part', but this can be discussed.
I think this is OK if 'part' is interpreted reflexively, as an ontology module is the whole ontology rather than part of it. ontology file This class and its subclasses are applied to OWL ontologies. Using an rdf:type triple will result in problems with OWL-DL. I propose that dcterms:type is instead used to connect an ontology URI with a class from this hierarchy. The class hierarchy is not disjoint, so multiple assertions can be made about a single ontology. ontology module base ontology module An ontology module that comprises only asserted axioms local to the ontology, excludes import directives, and excludes axioms or declarations from external ontologies. base ontology module editors ontology module An ontology module that is intended to be directly edited, typically managed in source control, and typically not intended for direct consumption by end-users. source ontology module editors ontology module main release ontology module An ontology module that is intended to be the primary release product and the one consumed by the majority of tools. TODO: Add logical axioms that state that a main release ontology module is derived from (directly or indirectly) an editors module main release ontology module bridge ontology module An ontology module that consists entirely of axioms that connect or bridge two distinct ontology modules. For example, the Uberon-to-ZFA bridge module. bridge ontology module import ontology module A subset ontology module that is intended to be imported from another ontology. TODO: add axioms that indicate this is the output of a module extraction process. import file import ontology module subset ontology module An ontology module that is extracted from a main ontology module and includes only a subset of entities or axioms. ontology slim subset ontology subset ontology module curation subset ontology module A subset ontology that is intended as a whitelist for curators using the ontology.
Such a subset will exclude classes that curators should not use for curation. curation subset ontology module analysis ontology module An ontology module that is intended for usage in analysis or discovery applications. analysis subset ontology module single layer ontology module A subset ontology that is largely comprised of a single layer or strata in an ontology class hierarchy. The purpose is typically for rolling up for visualization. The classes in the layer need not be disjoint. ribbon subset single layer subset ontology module exclusion subset ontology module A subset of an ontology that is intended to be excluded for some purpose. For example, a blacklist of classes. antislim exclusion subset ontology module external import ontology module An imported ontology module that is derived from an external ontology. Derivation methods include the OWLAPI SLME approach. external import external import ontology module species subset ontology module A subset ontology that is crafted to either include or exclude a taxonomic grouping of species. taxon subset species subset ontology module reasoned ontology module An ontology module that contains axioms generated by a reasoner. The generated axioms are typically direct SubClassOf axioms, but other possibilities are available. reasoned ontology module generated ontology module An ontology module that is automatically generated, for example via a SPARQL query or via template and a CSV. TODO: Add axioms (using PROV-O?) that indicate this is the output-of some reasoning process generated ontology module template generated ontology module An ontology module that is automatically generated from a template specification and fillers for slots in that template. 
template generated ontology module taxonomic bridge ontology module taxonomic bridge ontology module ontology module subsetted by expressivity ontology module subsetted by expressivity obo basic subset ontology module A subset ontology that is designed for basic applications to continue to make certain simplifying assumptions; many of these simplifying assumptions were based on the initial version of the Gene Ontology, and have become enshrined in many popular and useful tools such as term enrichment tools. Examples of such assumptions include: traversing the ontology graph ignoring relationship types using a naive algorithm will not lead to cycles (i.e. the ontology is a DAG); every referenced term is declared in the ontology (i.e. there are no dangling clauses). An ontology is OBO Basic if and only if it has the following characteristics: DAG Unidirectional No Dangling Clauses Fully Asserted Fully Labeled No equivalence axioms Singly labeled edges No qualifier lists No disjointness axioms No owl-axioms header No imports obo basic subset ontology module ontology module subsetted by OWL profile ontology module subsetted by OWL profile EL++ ontology module EL++ ontology module where to place this depends on if we take the organismal view or the quality centric view. mammalian phenotype Mus musculus Stub class to serve as root of hierarchy for imports of virus types from relevant ontologies or terminologies. Viruses Danio rerio Oryzias latipes Homo sapiens A processual entity that realizes a plan which is the concretization of a plan specification. Stub class to serve as root of hierarchy for experimental techniques and processes, defined in GENO or imported from ontologies such as OBI and ERO. planned process reagent role a population is a collection of individuals from the same taxonomic class living, counted or sampled at a particular site or in a particular area population Stub class to serve as root of hierarchy for imports from NCBI Taxonomy. 
organism 'Value' label chosen here according to http://www.uwgb.edu/heuerc/2D/ColorTerms.html Was parent of chromosomal band intensity before moving this class to live as a sequence feature attribute. color value obsolete color brightness true female male phenotypic sex A material entity that consists of two or more organisms, viruses, or viroids. A group of organisms of the same taxonomic group grouped together in virtue of their sharing some commonality (either an inherent attribute or an externally assigned role). collection of organisms A domestic group, or a number of domestic groups linked through descent (demonstrated or stipulated) from a common ancestor, marriage, or adoption. family Morpholino oligos are synthesized from four different Morpholino subunits, each of which contains one of the four genetic bases (A, C, G, T) linked to a 6-membered morpholine ring. Eighteen to 25 subunits of these four subunit types are joined in a specific order by non-ionic phosphorodiamidate intersubunit linkages to give a Morpholino. morpholino_oligo The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three". A region of the chromosome between the centromere and the telomere. Human chromosomes have two arms, the p arm (short) and the q arm (long) which are separated from each other by the centromere. Formerly http://purl.obolibrary.org/obo/GENO_0000613, replaced by SO term.
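The band descriptor above (1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3) follows a fixed pattern, so it can be split into the levels of the chromosome > arm > region > band > sub-band hierarchy. The following is a minimal sketch, assuming human-style descriptors with a single-digit region and band; the class name, function name, and regular expression are hypothetical and are not defined by GENO.

```python
import re
from typing import NamedTuple, Optional


class BandDescriptor(NamedTuple):
    chromosome: str
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: Optional[str]
    band: Optional[str]
    sub_band: Optional[str]


# Assumption: chromosome is 1-2 digits or X/Y; region and band are one digit each.
_DESCRIPTOR = re.compile(
    r"^(?P<chromosome>[0-9]{1,2}|[XY])"
    r"(?P<arm>[pq])"
    r"(?:(?P<region>[0-9])"
    r"(?:(?P<band>[0-9])"
    r"(?:\.(?P<sub_band>[0-9]+))?)?)?$"
)

ARM_NAMES = {"p": "short arm", "q": "long arm"}


def parse_band(descriptor: str) -> BandDescriptor:
    """Split a descriptor such as '1p22.3' into its hierarchy levels."""
    match = _DESCRIPTOR.match(descriptor)
    if match is None:
        raise ValueError(f"not a recognised band descriptor: {descriptor!r}")
    return BandDescriptor(**match.groupdict())


parsed = parse_band("1p22.3")
print(parsed, ARM_NAMES[parsed.arm])
```

Levels that are absent from a shorter descriptor (for example a bare arm such as 1p) come back as None rather than raising an error.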
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here: chromosome > arm > region > band > sub-band Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html): chromosome > arm > band > sub-band > sub-sub-band chromosome arm Any extent of continuous biological sequence. GENO defines three levels of sequence-related artifacts, which are distinguished by their identity criteria. 1. 'Biological sequence' identity is dependent only on the ordering of units that comprise the sequence. 2. 'Sequence feature' identity is dependent on its sequence and the genomic location of the sequence (this is consistent with the definition of 'sequence feature' in the Sequence Ontology). 3. 'Qualified sequence feature' identity is additionally dependent on some aspect of the physical context of the genetic material in which the feature is concretized. This third criterion is extrinsic to its sequence and its genomic location. For example, the feature's physical concretization being targeted by a gene knockdown reagent in a cell (e.g. the zebrafish Shha gene as targeted by the morpholino 'Shha-MO1'), or its being transiently expressed from a recombinant expression construct (e.g. the human SHH gene as expressed in a mouse Shh knock-out cell line), or its having been epigenetically modified in a way that alters its expression level or pattern (e.g. the human SHH gene with a specific methylation pattern). A sequence feature is an extent of 'located' biological sequence, whose identity is determined by both its inherent sequence (ordering of monomeric units) and its position (start and end coordinates based on alignment with some reference).
By contrast, 'biological sequences' are identified and distinguished only by their inherent sequence, and not their position. Accordingly, the 'ATG' start codon in the coding DNA sequence of the human AKT gene is the same 'sequence' as the 'ATG' start codon in the human SHH gene, but these represent two distinct 'sequence features' in virtue of their different positions in the genome. sequence_feature true Formalizes the first identity criterion for a sequence feature: its sequence. true Formalizes the second identity criterion for a sequence feature: its genomic position. We use the FALDO model to represent positional information, which links features to positional information through an instance of a Region class that represents the mapping of the feature onto some reference sequence. (But features can also be linked to Positions directly through the location property). A region of known length which may be used to manufacture a longer region. obsolete assembly_component true A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases. obsolete contig true 0 The point at which one or more contiguous nucleotides were excised. deleted_sequence nucleotide deletion nucleotide_deletion SO:1000033 SO:0000159 SOFA http://en.wikipedia.org/wiki/Nucleotide_deletion deletion enhancer A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the basal transcription machinery. promoter A region of nucleotide sequence that has translocated to a new position. transchr translocated sequence SO:0000199 DBVAR translocation SSLP simple sequence length polymorphism simple sequence length variation SO:0000207 simple_sequence_length_variation sequence length variation SO:0000248 sequence_length_variation See here for a list of engineered regions in ZFIN: http://zfin.org/cgi-bin/webdriver?MIval=aa-markerselect.apg&marker_type=REGION&query_results=t&compare=contains&WINSIZE=25.
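The identity criteria for 'biological sequence' and 'sequence feature' given earlier in this entry can be illustrated with two small value types. This is only a sketch of the distinction and not the GENO or FALDO data model: the class names, the reference names, and the coordinates are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BiologicalSequence:
    """Identity depends only on the ordering of units."""
    residues: str


@dataclass(frozen=True)
class SequenceFeature:
    """Identity depends on the sequence and on its position on a reference."""
    sequence: BiologicalSequence
    reference: str  # hypothetical reference name
    start: int      # hypothetical start coordinate
    end: int        # hypothetical end coordinate


# Placeholder coordinates; these are NOT real genomic positions.
akt_start_codon = SequenceFeature(BiologicalSequence("ATG"), "hypothetical_ref_A", 100, 102)
shh_start_codon = SequenceFeature(BiologicalSequence("ATG"), "hypothetical_ref_B", 500, 502)

# Same biological sequence, but two distinct sequence features.
print(akt_start_codon.sequence == shh_start_codon.sequence)
print(akt_start_codon == shh_start_codon)
```

A 'qualified sequence feature' would add a further field for the physical context of the genetic material, which this sketch leaves out.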
Includes things like loxP sites, inducible promoters, ires elements, etc. engineered_foreign_gene A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. http://en.wikipedia.org/wiki/Microsatellite_%28genetics%29 A defined feature that includes any type of VNTR or SSLP locus. microsatellite RNAi_reagent Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. A complete chromosome sequence. chromosome The descriptor 1p22.3 = chromosome 1, short arm, region 2, band 2, sub-band 3. This is read as "one p two-two point three", not "one p twenty-two point three". A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation and http://people.rit.edu/rhrsbi/GeneticsPages/Handouts/ChromosomeNomenclature.pdf, both of which define the nomenclature for the banding hierarchy we use here: chromosome > arm > region > band > sub-band Note that an alternate nomenclature for this hierarchy is here (http://www.ncbi.nlm.nih.gov/Class/MLACourse/Original8Hour/Genetics/chrombanding.html): chromosome > arm > band > sub-band > sub-sub-band 'Band' is a term of convenience in order to hierarchically organize morphologically defined chromosome features: chromosome > arm > region > band > sub-band. chromosome band centromere Obsoleted as we did not want to commit to constructs being plasmids - but rather wanted a classification of more general types of engineered regions used to replicate and deliver sequence to target cells/genomes. Replaced by GENO:0000856 ! engineered genetic construct. obsolete engineered_plasmid true The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence.
insertion nucleotide insertion nucleotide_insertion SO:1000034 SO:0000667 DBVAR SOFA insertion SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater. single nucleotide polymorphism SO:0000694 SOFA SNP A junction is a boundary between regions. A boundary has an extent of zero. junction A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. Regarding the distinction between a 'gene' and a 'gene allele': Every zebrafish genome contains a 'gene allele' for every zebrafish gene. Many will be 'wild-type' or at least functional gene alleles. But some may be alleles that are mutated or truncated so as to lack functionality. According to current SO criteria defining genes, a 'gene' no longer exists in the case of a non-functional or deleted variant. But the 'gene allele' does exist - and its extent is that of the remaining/altered sequence based on alignment with a reference gene. Even for completely deleted genes, an allele of the gene exists (and here is equivalent to the junction corresponding to where the gene would live based on a reference alignment). A gene is any 'gene allele' that produces a functional transcript (i.e. one capable of translation into a protein, or independent functioning as an RNA), when encoded in the genome of some cell or virion. gene A quantitative trait locus (QTL) is a polymorphic locus which contains alleles that differentially affect the expression of a continuously distributed phenotypic trait. Usually it is a marker described by statistical association to quantitative variation in the particular phenotypic trait that is thought to be controlled by the cumulative action of alleles at multiple loci.
quantitative trait locus QTL An attribute to describe a region that was modified in vitro. engineered construct engineered_region An extended region of sequence corresponding to a defined feature that is a proper part of a chromosome, e.g. a chromosomal 'arm', 'region', or 'band'. chromosomal feature gross chromosomal part chromosome part A gene that has been transferred naturally or by any of a number of genetic engineering techniques into a cell or organism where it is foreign (i.e. does not belong to the host genome). On the relationship between 'transgenic insertions', 'transgenes', and 'alleles': Transgenic insertions are sequence alterations comprised of foreign/exogenous sequence. This sequence can be from the same or different species as the host cell or genome - it is exogenous in virtue of it being additional sequence inserted into the original host genome. A given transgenic insertion may create one or more transgenes when introduced into a host genome. The extent of a transgene spans all features needed to drive its expression in the host genome. In most cases a transgenic insertion completely contains one or more transgenes that are fully competent to drive expression in the host genome. But in some cases, a transgenic insertion may carry only part of the final transgene it creates - which requires additional endogenous sequences in the vicinity of its insertion site to complete a functional gene (e.g. this is the case for enhancer traps or gene traps). In addition to the transgenes they create upon genomic integration, transgenic insertions can create variant alleles by disrupting a known endogenous gene/locus. Variant alleles are versions of particular genomic features (typically genes) that are altered in their sequence relative to some reference. An insertion that disrupts an endogenous gene would be considered a 'sequence alteration' (sensu SO) which creates a 'variant gene allele'.
From the perspective of this disrupted gene, the origin or transgenic nature of this insertion is irrelevant - what matters here is that the gene's sequence has been altered to create an allele. For the purposes of modeling, any transgene(s) created when an endogenous gene is interrupted by an insertion is considered/modeled separately from the allele of the endogenous gene that is created by the insertion. The transgenic insertion, which is simply a sequence alteration in the host genome, is then linked to any transgenes that it contributes to or overlaps with or contains. The model of the Flybase example HERE illustrates this approach. Transgenes can exist as integrated into the host genome, or extra-chromosomally on replicons or transiently carried/expressed vectors. What matters is that they are active in the context of a foreign biological system (typically a cell or organism). Note that transgenes as defined here are not necessarily from a different taxon than that of the host genome. For example, a Mus musculus gene over-expressed from a chromosomally-integrated expression construct in a Mus musculus genome qualifies as a transgene because it is exogenous to the host genome. transgene A multiple nucleotide polymorphism with alleles of common length > 1, for example AAA/TTT. multiple nucleotide polymorphism SO:0001013 MNP A variation that increases or decreases the copy number of a given region. 
CNP CNV copy number polymorphism copy number variation SO:0001019 SOFA http://en.wikipedia.org/wiki/Copy_number_variation copy_number_variation A collection of sequence features (typically a collection of chromosomes) that covers the sum genetic material within a cell or virion (where 'genetic material' refers to any nucleic acid that is part of a cell or virion and has been inherited from an ancestor cell or virion, and/or can be replicated and inherited by its progeny) Genotype vs Genome in GENO: A (genomic) genotype is an information artifact representing a shorthand syntax for specifying what is known about variation in a genome sequence. This syntax has reference and variant components - a 'reference genome' and 'genomic variation complement' - that must be operated on to resolve a final genome sequence (i.e. substituting all sequences specified by the 'genomic variation complement' for the corresponding sequences in the 'reference genome'). So, while the total sequence content represented in a genotype may be greater than that in a genome, the intended resolution of these sequences is to arrive at a single genome sequence. 'genome sequence' A genome is considered the complement of all heritable sequence features in a given cell or organism (chromosomal or extrachromosomal). This is typically a collection of >1 sequence molecules (e.g. chromosomes), but in some organisms (e.g. bacteria) it may be a single sequence macromolecule (e.g. a circular plasmid). For this reason 'genome' classifies under 'sequence feature complement'. genome A few examples highlighting the distinction of 'sequence alterations' from their parent 'variant allele': 1. Consider NM_000059.3(BRCA2):c.631G>A variation in the BRCA2 gene. This mutation of a single nucleotide creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the extent of sequence spanning just the single altered base is a 'sequence alteration'.
See https://www.ncbi.nlm.nih.gov/snp/80358871. 2. Consider the NM_000059.3(BRCA2):c.132_133ins8 variation in the BRCA2 gene. This 8 bp insertion creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the extent of sequence spanning just the 8 bp insertion is a 'sequence alteration'. See https://www.ncbi.nlm.nih.gov/snp/483353112. 3. Consider the NM_000059.3(BRCA2):c.22_23delAG variation in the BRCA2 gene. This 2 bp deletion creates a gene allele whose extent is that of the entire BRCA2 gene. This version of the full BRCA2 gene is a 'variant allele', while the junction where the deletion occurred is a 'sequence alteration' with an extent of zero. See https://www.ncbi.nlm.nih.gov/snp/483353112. A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence. sequence variation SO:1000004 SO:1000007 SO:0001059 SOFA 1. A 'sequence alteration' is an allele whose sequence deviates in its entirety from that of other features found at the same genomic location (i.e. it deviates along its entire extent). In this sense, 'sequence alterations' represent the minimal extent an allele can take (i.e. that which is variable with some other feature along its entire sequence). An example is a SNP or insertion. Alleles whose extent goes beyond the specific sequence that is known to be variable are not sequence alterations. These are alleles that represent alternate versions of some larger, named feature. The classic example here is a 'gene allele', which spans the extent of an entire gene, and contains one or more sequence alterations (regions known to vary) as part. 2. Sequence alterations are not necessarily 'variant' in the sense defined in GENO (i.e. being 'variant with' some reference sequence).
In any comparison of alleles at a particular location, the choice of a 'reference' is context-dependent - as comparisons in other contexts might consider a different allele to be the reference. So while sequence alterations are usually considered 'variant' in the context in which they are considered, this variant status may not hold at all times. For this reason, the 'sequence alteration' class is not made an rdfs:subClassOf 'variant allele'. For a particular instance of a sequence alteration, however, we may in some cases be able to rdf:type it as a 'variant allele' and a 'sequence alteration', in situations where we can be confident that the feature will *never* be considered a reference. For example, experimentally generated mutations in model organism genes that are created expressly to vary from an established reference. 3. Note that we consider novel features gained in a genome to be sequence alterations, including aneusomic chromosomes gained through a non-disjunction event during replication, or extrachromosomal replicons that become part of the heritable genome of a cell or organism. sequence_alteration An insertion that derives from another organism, via the use of recombinant DNA technology. transgenic insertion SO:0001218 transgenic_insertion A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer. not currently needed to support modeling use cases. can re-introduce if becomes necessary. obsolete experimental_feature true A construct which is designed to integrate into a genome and produce a fusion transcript between exons of the gene into which it inserts and a reporter element in the construct. Gene traps contain a splice acceptor, do not contain promoter elements for the reporter, and are mutagenic. Gene traps may be bicistronic with the second cassette containing a promoter driving a selectable marker.
gene_trap_construct A construct which is designed to integrate into a genome and express a reporter when inserted in close proximity to a promoter element. Promoter traps typically do not contain promoter elements and are mutagenic. promoter_trap_construct A construct which is designed to integrate into a genome and express a reporter when the expression from a basic minimal promoter is enhanced by genomic enhancer elements. Enhancer traps contain promoter elements and are not usually mutagenic. enhancer_trap_construct SNVs are single base pair positions in genomic DNA at which different sequence alternatives exist. single nucleotide variant kareneilbeck Thu Oct 08 11:37:49 PDT 2009 SO:0001483 SOFA SNV A biological_region characterized as a single heritable trait in a phenotype screen. The heritable phenotype may be mapped to a chromosome but generally has not been characterized to a specific gene locus. heritable_phenotypic_marker 'GRCh37.p10' (a human reference genome build) A genome sequence that is used as a standard against which other genome sequences are compared, or into which alterations are intentionally introduced. reference genome sequence A sequence alteration whereby the copy number of a given region is greater than the reference sequence. copy number gain gain kareneilbeck Mon Feb 28 01:54:09 PST 2011 SO:0001742 DBVAR copy_number_gain A sequence alteration whereby the copy number of a given region is less than the reference sequence. copy number loss loss kareneilbeck Mon Feb 28 01:55:02 PST 2011 SO:0001743 DBVAR copy_number_loss Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from one parent and no copies of the same chromosome or region from the other parent.
UPD uniparental disomy kareneilbeck Mon Feb 28 02:01:05 PST 2011 SO:0001744 DBVAR http://en.wikipedia.org/wiki/Uniparental_disomy UPD Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the mother and no copies of the same chromosome or region from the father. maternal uniparental disomy kareneilbeck Mon Feb 28 02:03:01 PST 2011 SO:0001745 maternal_uniparental_disomy Uniparental disomy is a sequence_alteration where a diploid individual receives two copies for all or part of a chromosome from the father and no copies of the same chromosome or region from the mother. paternal uniparental disomy kareneilbeck Mon Feb 28 02:03:30 PST 2011 SO:0001746 paternal_uniparental_disomy A structural sequence alteration where there are multiple equally plausible explanations for the change. complex kareneilbeck Wed Mar 23 03:21:19 PDT 2011 SO:0001784 DBVAR complex_structural_alteration kareneilbeck Fri Mar 25 02:27:41 PDT 2011 SO:0001785 DBVAR structural_alteration Formerly http://purl.obolibrary.org/obo/GENO_0000067, replaced with SO term. regulatory element regulatory gene region regulatory_region Any change in genomic DNA caused by a single event. SO:1000002 SOFA substitution When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change. complex substitution SO:1000005 SOFA complex_substitution A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence. point mutation SO:1000008 SOFA http://en.wikipedia.org/wiki/Point_mutation point_mutation Change of a pyrimidine nucleotide, C or T, into another pyrimidine nucleotide, or change of a purine nucleotide, A or G, into another purine nucleotide. SO:1000009 transition A substitution of a pyrimidine, C or T, for another pyrimidine.
pyrimidine transition SO:1000010 pyrimidine_transition A transition of a cytidine to a thymine. C to T transition SO:1000011 C_to_T_transition The transition of cytidine to thymine occurring at a pCpG site as a consequence of the spontaneous deamination of 5'-methylcytidine. C to T transition at pCpG site SO:1000012 C_to_T_transition_at_pCpG_site T to C transition SO:1000013 T_to_C_transition A substitution of a purine, A or G, for another purine. purine transition SO:1000014 purine_transition A transition of an adenine to a guanine. A to G transition SO:1000015 A_to_G_transition A transition of a guanine to an adenine. G to A transition SO:1000016 G_to_A_transition Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G, or vice versa. SO:1000017 http://en.wikipedia.org/wiki/Transversion transversion Change of a pyrimidine nucleotide, C or T, into a purine nucleotide, A or G. pyrimidine to purine transversion SO:1000018 pyrimidine_to_purine_transversion A transversion from cytidine to adenine. C to A transversion SO:1000019 C_to_A_transversion C to G transversion SO:1000020 C_to_G_transversion A transversion from T to A. T to A transversion SO:1000021 T_to_A_transversion A transversion from T to G. T to G transversion SO:1000022 T_to_G_transversion Change of a purine nucleotide, A or G , into a pyrimidine nucleotide C or T. purine to pyrimidine transversion SO:1000023 purine_to_pyrimidine_transversion A transversion from adenine to cytidine. A to C transversion SO:1000024 A_to_C_transversion A transversion from adenine to thymine. A to T transversion SO:1000025 A_to_T_transversion A transversion from guanine to cytidine. G to C transversion SO:1000026 G_to_C_transversion A transversion from guanine to thymine. G to T transversion SO:1000027 G_to_T_transversion A sequence alteration which included an insertion and a deletion, affecting 2 or more bases. 
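The transition and transversion classes defined above partition single-base substitutions according to whether the reference and altered bases fall in the same chemical class (purine A/G or pyrimidine C/T). The following is a minimal sketch of that partition; the function name and the returned tuple are hypothetical, and the label strings simply mirror the SO-style names listed above.

```python
from typing import Tuple

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref: str, alt: str) -> Tuple[str, str]:
    """Return (parent class label, specific label) for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; no substitution")

    same_class = (ref in PURINES) == (alt in PURINES)
    if same_class:
        # Purine to purine, or pyrimidine to pyrimidine.
        parent = "purine_transition" if ref in PURINES else "pyrimidine_transition"
        specific = f"{ref}_to_{alt}_transition"
    else:
        # Change between the purine and pyrimidine classes.
        parent = (
            "purine_to_pyrimidine_transversion"
            if ref in PURINES
            else "pyrimidine_to_purine_transversion"
        )
        specific = f"{ref}_to_{alt}_transversion"
    return parent, specific


print(classify_substitution("C", "T"))
print(classify_substitution("A", "T"))
```

Context-dependent subclasses such as C_to_T_transition_at_pCpG_site need the neighbouring bases and methylation state, so they are outside what this function can decide.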
SO:1000032 http://en.wikipedia.org/wiki/Indel Indels can have a different number of bases than the corresponding reference sequence. indel One or more nucleotides are added between two adjacent nucleotides in the sequence; the inserted sequence derives from, or is identical in sequence to, nucleotides adjacent to insertion point. nucleotide duplication nucleotide_duplication SO:1000035 duplication A continuous nucleotide sequence is inverted in the same position. inversion SO:1000036 DBVAR SOFA inversion A tandem duplication where the individual regions are in the same orientation. direct tandem duplication SO:1000039 direct_tandem_duplication A tandem duplication where the individual regions are not in the same orientation. inverted tandem duplication mirror duplication SO:1000040 inverted_tandem_duplication A duplication consisting of 2 identical adjacent regions. erverted tandem duplication SO:1000173 DBVAR tandem_duplication Stub class to serve as root of hierarchy for imports of developmental stages from Uberon or taxon-specific vocabularies such as ZFIN stage terms. life cycle stage Stub class to serve as root of hierarchy for imports of anatomical entities from UBERON, CARO, or taxon-specific anatomy ontologies. http://purl.obolibrary.org/obo/CARO_0000000 anatomical entity Stub node that gathers root classes from various taxon-specific phenotype ontologies, as connectors for bringing classes from these ontologies into the GENO framework. 1. From OGMS: A (combination of) quality(ies) of an organism determined by the interaction of its genetic make-up and environment that differentiates specific instances of a species from other instances of the same species (from OGMS, and used in OBI, but treatment as a quality is at odds with previous OBI discussions and their treatment of 'comparative phenotype assessment', where a phenotype is described as a quality or disposition) 2.
From OBI calls: quality or disposition inheres in organism or part of an organism towards some growth environment Phenotype Animals exhibit variations compared to a given control. 'Variant' is the given label of the root class in the Worm Phenotype ontology. Renaming it here to be consistent with our hierarchy of phenotype classes. Variant c. elegans phenotype worm phenotype abnormal(ly) malformed endocardium cell abnormal(ly) absent dorso-rostral cluster abnormal(ly) disrupted diencephalon development abnormal(ly) disrupted neutrophil aggregation abnormal(ly) absent adaxial cell association Equivalent to: http://www.informatics.jax.org/marker/MGI:98297 mus musculus shh gene http://zfin.org/ZDB-GENE-980526-166 danio rerio shha gene http://zfin.org/ZDB-GENE-040123-1 danio rerio cdkn1ca gene Equivalent to: http://www.ensembl.org/Gene/Summary?g=ENSG00000164690 Codes for: http://www.uniprot.org/uniprot/Q15465 homo sapiens SHH gene exploratory term exemplar term example to be eventually removed example to be eventually removed failed exploratory term The term was used in an attempt to structure part of the ontology but in retrospect failed to do a good job Person:Alan Ruttenberg failed exploratory term metadata complete Class has all its metadata, but is either not guaranteed to be in its final location in the asserted IS_A hierarchy or refers to another class that is not complete. metadata complete organizational term Term created to ease viewing/sort terms for development purpose, and will not be included in a release organizational term ready for release Class has undergone final review, is ready for use, and will be included in the next release. Any class lacking "ready_for_release" should be considered likely to change place in hierarchy, have its definition refined, or be obsoleted in the next release. Those classes deemed "ready_for_release" will also be derived from a chain of ancestor classes that are also "ready_for_release."
ready for release metadata incomplete Class is being worked on; however, the metadata (including definition) are not complete or sufficiently clear to the branch editors. metadata incomplete uncurated Nothing done yet beyond assigning a unique class ID and proposing a preferred term. uncurated pending final vetting All definitions, placement in the asserted IS_A hierarchy and required minimal metadata are complete. The class is awaiting a final review by someone other than the term editor. pending final vetting Core is an instance of a grouping of terms from an ontology or ontologies. It is used by the ontology to identify main classes. PERSON: Alan Ruttenberg PERSON: Melanie Courtot obsolete_core true placeholder removed placeholder removed terms merged An editor note should explain what were the merged terms and the reason for the merge. terms merged term imported This is to be used when the original term has been replaced by a term imported from an other ontology. An editor note should indicate what is the URI of the new term to use. term imported term split This is to be used when a term has been split in two or more new terms. An editor note should indicate the reason for the split and indicate the URIs of the new terms created. term split universal Hard to give a definition for. Intuitively a "natural kind" rather than a collection of any old things, which a class is able to be, formally. At the meta level, universals are defined as positives, are disjoint with their siblings, have single asserted parents. Alan Ruttenberg A Formal Theory of Substances, Qualities, and Universals, http://ontology.buffalo.edu/bfo/SQU.pdf universal defined class A defined class is a class that is defined by a set of logically necessary and sufficient conditions but is not a universal "definitions", in some readings, always are given by necessary and sufficient conditions. 
So one must be careful (and this is difficult sometimes) to distinguish between defined classes and universal. Alan Ruttenberg defined class named class expression A named class expression is a logical expression that is given a name. The name can be used in place of the expression. named class expressions are used in order to have more concise logical definitions but their extensions may not be interesting classes on their own. In languages such as OWL, with no provisions for macros, these show up as actual classes. Tools may wish to not show them as such, and to replace uses of the macros with their expansions. Alan Ruttenberg named class expression to be replaced with external ontology term Terms with this status should eventually be replaced with a term from another ontology. Alan Ruttenberg group:OBI to be replaced with external ontology term requires discussion A term that is metadata complete, has been reviewed, and problems have been identified that require discussion before release. Such a term requires editor note(s) to identify the outstanding issues. Alan Ruttenberg group:OBI requires discussion Initially created such that integrated transgene infers as child of sequence_alteration.