definition term replaced by amino acid modification Alliance of Genome Resources Alliance of Genome Resources Gene Biotype Slim biosapiens database of genomic structural variation RNA modification SO feature annotation variant annotation term amino acid 1 letter code amino acid 3 letter code biosapiens protein feature ontology dbsnp variant terms DBVAR ensembl variant terms subset_property synonym_type_property consider has_alternative_id has_broad_synonym database_cross_reference has_exact_synonym has_narrow_synonym has_obo_format_version has_obo_namespace has_related_synonym has_scope has_synonym_type in_subset A geometric operator, specified in Egenhofer 1989. Two features meet if they share a junction on the sequence. X adjacent_to Y iff X and Y share a boundary but do not overlap. sequence adjacent_to adjacent_to A geometric operator, specified in Egenhofer 1989. Two features meet if they share a junction on the sequence. X adjacent_to Y iff X and Y share a boundary but do not overlap. PMID:20226267 SO:ke sequence associated_with This relationship is vague and up for discussion. associated_with B is complete_evidence_for_feature A if the extent (5' and 3' boundaries) and internal boundaries of B fully support the extent and internal boundaries of A. sequence complete_evidence_for_feature If A is a feature with multiple regions such as a multi exon transcript, the supporting EST evidence is complete if each of the regions is supported by an equivalent region in B. Also there must be no extra regions in B that are not represented in A. This relationship was requested by jeltje on the SO term tracker. The thread for the discussion can be accessed via tracker ID:1917222. complete_evidence_for_feature B is complete_evidence_for_feature A if the extent (5' and 3' boundaries) and internal boundaries of B fully support the extent and internal boundaries of A. SO:ke X connects_on Y, Z, R iff whenever Z is on a R, X is adjacent to a Y and adjacent to a Z. 
kareneilbeck 2010-10-14T01:38:51Z sequence connects_on Example: A splice_junction connects_on exon, exon, mature_transcript. connects_on X connects_on Y, Z, R iff whenever Z is on a R, X is adjacent to a Y and adjacent to a Z. PMID:20226267 X contained_by Y iff X starts after start of Y and X ends before end of Y. kareneilbeck 2010-10-14T01:26:16Z sequence contained_by The inverse is contains. Example: intein contained_by immature_peptide_region. contained_by X contained_by Y iff X starts after start of Y and X ends before end of Y. PMID:20226267 The inverse of contained_by. kareneilbeck 2010-10-14T01:32:15Z sequence contains Example: pre_miRNA contains miRNA_loop. contains The inverse of contained_by. PMID:20226267 sequence derives_from derives_from X is disconnected_from Y iff it is not the case that X overlaps Y. kareneilbeck 2010-10-14T01:42:10Z sequence disconnected_from disconnected_from X is disconnected_from Y iff it is not the case that X overlaps Y. PMID:20226267 kareneilbeck 2009-08-19T02:19:45Z sequence edited_from edited_from kareneilbeck 2009-08-19T02:19:11Z sequence edited_to edited_to B is evidence_for_feature A, if an instance of B supports the existence of A. sequence evidence_for_feature This relationship was requested by nlw on the SO term tracker. The thread for the discussion can be accessed via tracker ID:1917222. evidence_for_feature B is evidence_for_feature A, if an instance of B supports the existence of A. SO:ke X is exemplar of Y if X is the best evidence for Y. sequence exemplar_of Tracker id: 2594157. exemplar_of X is exemplar of Y if X is the best evidence for Y. SO:ke X is finished_by Y if Y is part_of X, and X and Y share a 3' boundary. kareneilbeck 2010-10-14T01:45:45Z sequence finished_by Example: CDS finished_by stop_codon. finished_by X is finished_by Y if Y is part_of X, and X and Y share a 3' boundary. PMID:20226267 X finishes Y if X is part_of Y and X and Y share a 3' or C terminal boundary. 
kareneilbeck 2010-10-14T02:17:53Z sequence finishes Example: stop_codon finishes CDS. finishes X finishes Y if X is part_of Y and X and Y share a 3' or C terminal boundary. PMID:20226267 X gained Y if X is a variant_of X' and Y part of X but not X'. kareneilbeck 2011-06-28T12:51:10Z sequence gained A relation with which to annotate the changes in a variant sequence with respect to a reference. For example a variant transcript may gain a stop codon not present in the reference sequence. gained X gained Y if X is a variant_of X' and Y part of X but not X'. SO:ke sequence genome_of genome_of kareneilbeck 2009-08-19T02:27:04Z sequence guided_by guided_by kareneilbeck 2009-08-19T02:27:24Z sequence guides guides X has_integral_part Y if and only if: X has_part Y and Y part_of X. kareneilbeck 2009-08-19T12:01:46Z sequence has_integral_part Example: mRNA has_integral_part CDS. has_integral_part X has_integral_part Y if and only if: X has_part Y and Y part_of X. http://precedings.nature.com/documents/3495/version/1 sequence has_origin has_origin Inverse of part_of. sequence has_part Example: operon has_part gene. has_part Inverse of part_of. http://precedings.nature.com/documents/3495/version/1 sequence has_quality The relationship between a feature and an attribute. has_quality sequence homologous_to homologous_to X integral_part_of Y if and only if: X part_of Y and Y has_part X. kareneilbeck 2009-08-19T12:03:28Z sequence integral_part_of Example: exon integral_part_of transcript. integral_part_of X integral_part_of Y if and only if: X part_of Y and Y has_part X. http://precedings.nature.com/documents/3495/version/1 R is_consecutive_sequence_of R iff every instance of R is equivalent to a collection of instances of U:u1, u2, un, such that no pair of ux uy is overlapping and for all ux, it is adjacent to ux-1 and ux+1, with the exception of the initial and terminal u1,and un (which may be identical). 
kareneilbeck 2010-10-14T02:19:48Z sequence is_consecutive_sequence_of Example: region is consecutive_sequence of base. is_consecutive_sequence_of R is_consecutive_sequence_of R iff every instance of R is equivalent to a collection of instances of U:u1, u2, un, such that no pair of ux uy is overlapping and for all ux, it is adjacent to ux-1 and ux+1, with the exception of the initial and terminal u1,and un (which may be identical). PMID:20226267 X lost Y if X is a variant_of X' and Y part of X' but not X. kareneilbeck 2011-06-28T12:53:16Z sequence lost A relation with which to annotate the changes in a variant sequence with respect to a reference. For example a variant transcript may have lost a stop codon present in the reference sequence. lost X lost Y if X is a variant_of X' and Y part of X' but not X. SO:ke A maximally_overlaps X iff all parts of A (including A itself) overlap both A and Y. kareneilbeck 2010-10-14T01:34:48Z sequence maximally_overlaps Example: non_coding_region_of_exon maximally_overlaps the intersections of exon and UTR. maximally_overlaps A maximally_overlaps X iff all parts of A (including A itself) overlap both A and Y. PMID:20226267 sequence member_of A subtype of part_of. Inverse is collection_of. Winston, M, Chaffin, R, Herrmann: A taxonomy of part-whole relations. Cognitive Science 1987, 11:417-444. member_of A relationship between a pseudogenic feature and its functional ancestor. sequence non_functional_homolog_of non_functional_homolog_of A relationship between a pseudogenic feature and its functional ancestor. SO:ke sequence orthologous_to orthologous_to X overlaps Y iff there exists some Z such that Z contained_by X and Z contained_by Y. kareneilbeck 2010-10-14T01:33:15Z sequence overlaps Example: coding_exon overlaps CDS. overlaps X overlaps Y iff there exists some Z such that Z contained_by X and Z contained_by Y. PMID:20226267 sequence paralogous_to paralogous_to X part_of Y if X is a subregion of Y. 
sequence part_of Example: amino_acid part_of polypeptide. part_of X part_of Y if X is a subregion of Y. http://precedings.nature.com/documents/3495/version/1 B is partial_evidence_for_feature A if the extent of B supports part of but not all of A. sequence partial_evidence_for_feature partial_evidence_for_feature B is partial_evidence_for_feature A if the extent of B supports part of but not all of A. SO:ke sequence position_of position_of Inverse of processed_into. kareneilbeck 2009-08-19T12:14:00Z sequence processed_from Example: miRNA processed_from miRNA_primary_transcript. processed_from Inverse of processed_into. http://precedings.nature.com/documents/3495/version/1 X is processed_into Y if a region X is modified to create Y. kareneilbeck 2009-08-19T12:15:02Z sequence processed_into Example: miRNA_primary_transcript processed_into miRNA. processed_into X is processed_into Y if a region X is modified to create Y. http://precedings.nature.com/documents/3495/version/1 kareneilbeck 2009-08-19T02:21:03Z sequence recombined_from recombined_from kareneilbeck 2009-08-19T02:20:07Z sequence recombined_to recombined_to sequence sequence_of sequence_of sequence similar_to similar_to X is started_by Y if Y is part_of X and X and Y share a 5' boundary. kareneilbeck 2010-10-14T01:43:55Z sequence started_by Example: CDS started_by start_codon. started_by X is started_by Y if Y is part_of X and X and Y share a 5' boundary. PMID:20226267 X starts Y if X is part_of Y, and X and Y share a 5' or N-terminal boundary. kareneilbeck 2010-10-14T01:47:53Z sequence starts Example: start_codon starts CDS. starts X starts Y if X is part_of Y, and X and Y share a 5' or N-terminal boundary. PMID:20226267 kareneilbeck 2009-08-19T02:22:14Z sequence trans_spliced_from trans_spliced_from kareneilbeck 2009-08-19T02:22:00Z sequence trans_spliced_to trans_spliced_to X is transcribed_from Y if X is synthesized from template Y. 
kareneilbeck 2009-08-19T12:05:39Z sequence transcribed_from Example: primary_transcript transcribed_from gene. transcribed_from X is transcribed_from Y if X is synthesized from template Y. http://precedings.nature.com/documents/3495/version/1 Inverse of transcribed_from. kareneilbeck 2009-08-19T12:08:24Z sequence transcribed_to Example: gene transcribed_to primary_transcript. transcribed_to Inverse of transcribed_from. http://precedings.nature.com/documents/3495/version/1 Inverse of translation_of. kareneilbeck 2009-08-19T12:11:53Z sequence translates_to Example: codon translates_to amino_acid. translates_to Inverse of translation_of. http://precedings.nature.com/documents/3495/version/1 X is translation_of Y if Y is translated by ribosome to create X. kareneilbeck 2009-08-19T12:09:59Z sequence translation_of Example: Polypeptide translation_of CDS. translation_of X is translation_of Y if Y is translated by ribosome to create X. http://precedings.nature.com/documents/3495/version/1 A' is a variant (mutation) of A = definition every instance of A' is either an immediate mutation of some instance of A, or there is a chain of immediate mutation processes linking A' to some instance of A. sequence variant_of Added to SO during the immunology workshop, June 2007. This relationship was approved by Barry Smith. variant_of A' is a variant (mutation) of A = definition every instance of A' is either an immediate mutation of some instance of A, or there is a chain of immediate mutation processes linking A' to some instance of A. SO:immuno_workshop sequence SO:0000000 Sequence_Ontology true A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids. sequence sequence SO:0000001 region A sequence_feature with an extent greater than zero. A nucleotide region is composed of bases and a polypeptide region is composed of amino acids. 
SO:ke A coding exon that is not the most 3-prime or the most 5-prime in a given transcript. interior coding exon sequence SO:0000004 interior_coding_exon The many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Satellite_DNA INSDC_qualifier:satellite satellite DNA sequence SO:0000005 satellite_DNA The many tandem repeats (identical or related) of a short basic repeating unit; many have a base composition or other property different from the genome average that allows them to be separated from the bulk (main band) genomic DNA. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Satellite_DNA wiki A region amplified by a PCR reaction. http://en.wikipedia.org/wiki/RAPD PCR product sequence amplicon SO:0000006 This term is mapped to MGED. This term is now located in OBI, with the following ID OBI_0000406. PCR_product A region amplified by a PCR reaction. SO:ke http://en.wikipedia.org/wiki/RAPD wiki One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert. mate pair read-pair sequence SO:0000007 read_pair One of a pair of sequencing reads in which the two members of the pair are related by originating at either end of a clone insert. SO:ls A small non coding RNA sequence, present in the cytoplasm. INSDC_feature:ncRNA INSDC_qualifier:scRNA small cytoplasmic RNA sequence SO:0000013 scRNA A small non coding RNA sequence, present in the cytoplasm. SO:ke A collection of match parts. sequence SO:0000038 match_set true A collection of match parts. SO:ke A part of a match, for example an hsp from blast is a match_part. match part sequence SO:0000039 match_part A part of a match, for example an hsp from blast is a match_part. 
SO:ke A part of a gene, that has no other route in the ontology back to region. This concept is necessary for logical inference as these parts must have the properties of region. It also allows us to associate all the parts of genes with a gene. sequence SO:0000050 gene_part true A part of a gene, that has no other route in the ontology back to region. This concept is necessary for logical inference as these parts must have the properties of region. It also allows us to associate all the parts of genes with a gene. SO:ke A regulatory element of an operon to which activators or repressors bind thereby effecting translation of genes in that operon. http://en.wikipedia.org/wiki/Operator_(biology)#Operator operator segment sequence SO:0000057 Moved to transcriptional_cis_regulatory_region (SO:0001055) from gene_group_regulatory_region (SO:0000752) on 11 Feb 2021 when SO:0000752 was merged into SO:0001055. See GitHub Issue #529. operator A regulatory element of an operon to which activators or repressors bind thereby effecting translation of genes in that operon. SO:ma http://en.wikipedia.org/wiki/Operator_(biology)#Operator wiki A binding site of a nucleotide molecule that interacts selectively and non-covalently with polypeptide residues of a nuclease. nuclease binding site sequence SO:0000059 nuclease_binding_site A binding site of a nucleotide molecule that interacts selectively and non-covalently with polypeptide residues of a nuclease. SO:cb A transposon or insertion sequence. An element that can insert in a variety of DNA sequences. http://en.wikipedia.org/wiki/Transposable_element transposable element transposon sequence SO:0000101 transposable_element A transposon or insertion sequence. An element that can insert in a variety of DNA sequences. http://www.sci.sdsu.edu/~smaloy/Glossary/T.html http://en.wikipedia.org/wiki/Transposable_element wiki A match to an EST or cDNA sequence. 
expressed sequence match sequence SO:0000102 expressed_sequence_match A match to an EST or cDNA sequence. SO:ke The end of the clone insert. clone insert end sequence SO:0000103 clone_insert_end The end of the clone insert. SO:ke A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. SO:0000358 http://en.wikipedia.org/wiki/Polypeptide protein sequence SO:0000104 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. The term 'protein' was merged with 'polypeptide'. Although 'protein' was a sequence_attribute and therefore meant to describe the quality rather than an actual feature, it was being used erroneously. It is replaced by 'peptidyl' as the polymer attribute. polypeptide A sequence of amino acids linked by peptide bonds which may lack appreciable tertiary structure and may not be liable to irreversible denaturation. SO:ma http://en.wikipedia.org/wiki/Polypeptide wiki A sequence_variant is a non exact copy of a sequence_feature or genome exhibiting one or more sequence_alteration. sequence mutation SO:0000109 sequence_variant_obs true A sequence_variant is a non exact copy of a sequence_feature or genome exhibiting one or more sequence_alteration. SO:ke Any extent of continuous biological sequence. INSDC_feature:misc_feature INSDC_note:other INSDC_note:sequence_feature located_sequence_feature sequence feature sequence located sequence feature SO:0000110 sequence_feature Any extent of continuous biological sequence. LAMHDI:mb SO:ke An oligo to which new deoxyribonucleotides can be added by DNA polymerase. http://en.wikipedia.org/wiki/Primer_(molecular_biology) DNA primer primer oligonucleotide primer polynucleotide primer sequence sequence SO:0000112 primer An oligo to which new deoxyribonucleotides can be added by DNA polymerase. 
SO:ke http://en.wikipedia.org/wiki/Primer_(molecular_biology) wiki A viral sequence which has integrated into a host genome. proviral region sequence proviral sequence SO:0000113 proviral_region A viral sequence which has integrated into a host genome. SO:ke A methylated deoxy-cytosine. methylated C methylated cytosine methylated cytosine base methylated cytosine residue methylated_C sequence SO:0000114 methylated_cytosine A methylated deoxy-cytosine. SO:ke A primary transcript that, at least in part, encodes one or more proteins. protein coding primary transcript sequence pre mRNA SO:0000120 May contain introns. protein_coding_primary_transcript A primary transcript that, at least in part, encodes one or more proteins. SO:ke Region in mRNA where ribosome assembles. INSDC_feature:regulatory INSDC_qualifier:ribosome_binding_site ribosome entry site sequence SO:0000139 ribosome_entry_site Region in mRNA where ribosome assembles. SO:ke A sequence segment located within the five prime end of an mRNA that causes premature termination of translation. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Attenuator INSDC_qualifier:attenuator attenuator sequence sequence SO:0000140 attenuator A sequence segment located within the five prime end of an mRNA that causes premature termination of translation. SO:as http://en.wikipedia.org/wiki/Attenuator wiki The sequence of DNA located either at the end of the transcript that causes RNA polymerase to terminate transcription. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Terminator_(genetics) INSDC_qualifier:terminator terminator sequence sequence SO:0000141 Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. 
terminator The sequence of DNA located either at the end of the transcript that causes RNA polymerase to terminate transcription. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Terminator_(genetics) wiki A region of known length which may be used to manufacture a longer region. assembly component sequence SO:0000143 assembly_component A region of known length which may be used to manufacture a longer region. SO:ke A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing. http://en.wikipedia.org/wiki/Exon INSDC_feature:exon sequence SO:0000147 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. exon A region of the transcript sequence within a gene which is not removed from the primary RNA transcript by RNA splicing. SO:ke http://en.wikipedia.org/wiki/Exon wiki One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N's. sequence scaffold SO:0000148 supercontig One or more contigs that have been ordered and oriented using end-read information. Contains gaps that are filled with N's. SO:ls A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases. http://en.wikipedia.org/wiki/Contig sequence SO:0000149 contig A contiguous sequence derived from sequence assembly. Has no gaps, but may contain N's from unavailable bases. SO:ls http://en.wikipedia.org/wiki/Contig wiki A sequence obtained from a single sequencing experiment. Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine. sequence SO:0000150 read A sequence obtained from a single sequencing experiment. Typically a read is produced when a base calling program interprets information from a chromatogram trace file produced from a sequencing machine. 
SO:rd A piece of DNA that has been inserted in a vector so that it can be propagated in a host bacterium or some other organism. http://en.wikipedia.org/wiki/Clone_(genetics) sequence SO:0000151 clone A piece of DNA that has been inserted in a vector so that it can be propagated in a host bacterium or some other organism. SO:ke http://en.wikipedia.org/wiki/Clone_(genetics) wiki The point at which one or more contiguous nucleotides were excised. SO:1000033 http://en.wikipedia.org/wiki/Nucleotide_deletion loinc:LA6692-3 deleted_sequence nucleotide deletion nucleotide_deletion sequence SO:0000159 deletion The point at which one or more contiguous nucleotides were excised. SO:ke http://en.wikipedia.org/wiki/Nucleotide_deletion wiki loinc:LA6692-3 Deletion A modified base in which adenine has been methylated. methylated A methylated adenine methylated adenine base methylated adenine residue methylated_A sequence SO:0000161 methylated_adenine A modified base in which adenine has been methylated. SO:ke Consensus region of primary transcript bordering junction of splicing. A region that overlaps exactly 2 bases and is adjacent_to splice_junction. http://en.wikipedia.org/wiki/Splice_site splice site sequence SO:0000162 With spliceosomal introns, the splice sites bind the spliceosomal machinery. splice_site Consensus region of primary transcript bordering junction of splicing. A region that overlaps exactly 2 bases and is adjacent_to splice_junction. SO:cjm SO:ke http://en.wikipedia.org/wiki/Splice_site wiki Intronic 2 bp region bordering the exon, at the 5' edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron. 5' splice site donor splice site five prime splice site splice donor site sequence donor SO:0000163 five_prime_cis_splice_site Intronic 2 bp region bordering the exon, at the 5' edge of the intron. A splice_site that is downstream_adjacent_to exon and starts intron. 
SO:cjm SO:ke http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html Intronic 2 bp region bordering the exon, at the 3' edge of the intron. A splice_site that is upstream_adjacent_to exon and finishes intron. acceptor splice site splice acceptor site three prime splice site sequence 3' splice site acceptor SO:0000164 three_prime_cis_splice_site Intronic 2 bp region bordering the exon, at the 3' edge of the intron. A splice_site that is upstream_adjacent_to exon and finishes intron. SO:cjm SO:ke http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Enhancer_(genetics) INSDC_qualifier:enhancer sequence SO:0000165 An enhancer may participate in an enhanceosome GO:0034206. A protein-DNA complex formed by the association of a distinct set of general and specific transcription factors with a region of enhancer DNA. The cooperative assembly of an enhanceosome confers specificity of transcriptional regulation. This comment is a place holder should we start to make cross products with GO. enhancer A cis-acting sequence that increases the utilization of (some) eukaryotic promoters, and can function in either orientation and in any location (upstream or downstream) relative to the promoter. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Enhancer_(genetics) wiki A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the core transcription machinery. A region (DNA) to which RNA polymerase binds, to begin transcription. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Promoter INSDC_qualifier:promoter promoter sequence sequence SO:0000167 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. 
The region on a DNA molecule involved in RNA polymerase binding to initiate transcription. Moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020. Merged with RNA_polymerase_promoter (SO:0001203) Aug 2020. Moved up one level from is_a CRM (SO:0000727) to is_a transcriptional_cis_regulatory_region (SO:0001055) as part of the GREEKC work January 2021. Pascale Gaudet from Gene Ontology pointed out that CRM can be located upstream of the promoter and therefore cannot include the promoter. promoter A regulatory_region composed of the TSS(s) and binding sites for TF_complexes of the core transcription machinery. A region (DNA) to which RNA polymerase binds, to begin transcription. SO:regcreative http://en.wikipedia.org/wiki/Promoter wiki A nucleotide match against a sequence from another organism. cross genome match sequence SO:0000177 cross_genome_match A nucleotide match against a sequence from another organism. SO:ma The DNA region of a group of adjacent genes whose transcription is coordinated on one or several mutually overlapping transcription units transcribed in the same direction and sharing at least one gene. http://en.wikipedia.org/wiki/Operon INSDC_feature:operon sequence SO:0000178 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. Definition updated per Mejia-Almonte et al., Redefining fundamental concepts of transcription initiation in prokaryotes, Aug 5 2020. operon The DNA region of a group of adjacent genes whose transcription is coordinated on one or several mutually overlapping transcription units transcribed in the same direction and sharing at least one gene. SO:ma http://en.wikipedia.org/wiki/Operon wiki The start of the clone insert. clone insert start sequence SO:0000179 clone_insert_start The start of the clone insert. SO:ke A match against a translated sequence. 
translated nucleotide match sequence SO:0000181 translated_nucleotide_match A match against a translated sequence. SO:ke A region of the gene which is not transcribed. non transcribed region non-transcribed sequence nontranscribed region nontranscribed sequence sequence SO:0000183 non_transcribed_region A region of the gene which is not transcribed. SO:ke A transcript that in its initial state requires modification to be functional. http://en.wikipedia.org/wiki/Primary_transcript INSDC_feature:precursor_RNA INSDC_feature:prim_transcript precursor RNA primary transcript sequence SO:0000185 primary_transcript A transcript that in its initial state requires modification to be functional. SO:ma http://en.wikipedia.org/wiki/Primary_transcript wiki A group of characterized repeat sequences. sequence SO:0000187 repeat_family true A group of characterized repeat sequences. SO:ke A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. http://en.wikipedia.org/wiki/Intron INSDC_feature:intron sequence SO:0000188 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. intron A region of a primary transcript that is transcribed, but removed from within the transcript by splicing together the sequences (exons) on either side of it. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Intron wiki A DNA fragment used as a reagent to detect the polymorphic genomic loci by hybridizing against the genomic DNA digested with a given restriction enzyme. http://en.wikipedia.org/wiki/Restriction_fragment_length_polymorphism RFLP RFLP fragment restriction fragment length polymorphism sequence SO:0000193 RFLP_fragment A DNA fragment used as a reagent to detect the polymorphic genomic loci by hybridizing against the genomic DNA digested with a given restriction enzyme. 
GOC:pj http://en.wikipedia.org/wiki/Restriction_fragment_length_polymorphism wiki An exon whereby at least one base is part of a codon (here, 'codon' is inclusive of the stop_codon). coding exon sequence SO:0000195 coding_exon An exon whereby at least one base is part of a codon (here, 'codon' is inclusive of the stop_codon). SO:ke The sequence of the five_prime_coding_exon that codes for protein. five prime exon coding region sequence SO:0000196 five_prime_coding_exon_coding_region The sequence of the five_prime_coding_exon that codes for protein. SO:cjm The sequence of the three_prime_coding_exon that codes for protein. three prime exon coding region sequence SO:0000197 three_prime_coding_exon_coding_region The sequence of the three_prime_coding_exon that codes for protein. SO:cjm An exon that does not contain any codons. noncoding exon sequence SO:0000198 noncoding_exon An exon that does not contain any codons. SO:ke The 5' most coding exon. 5' coding exon five prime coding exon sequence SO:0000200 five_prime_coding_exon The 5' most coding exon. SO:ke Messenger RNA sequences that are untranslated and lie five prime or three prime to sequences which are translated. untranslated region sequence SO:0000203 UTR Messenger RNA sequences that are untranslated and lie five prime or three prime to sequences which are translated. SO:ke A region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. http://en.wikipedia.org/wiki/5'_UTR 5' UTR INSDC_feature:5'UTR five prime UTR five_prime_untranslated_region sequence SO:0000204 five_prime_UTR A region at the 5' end of a mature transcript (preceding the initiation codon) that is not translated into a protein. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/5'_UTR wiki A region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein. 
http://en.wikipedia.org/wiki/Three_prime_untranslated_region INSDC_feature:3'UTR three prime UTR three prime untranslated region sequence SO:0000205 three_prime_UTR A region at the 3' end of a mature transcript (following the stop codon) that is not translated into a protein. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Three_prime_untranslated_region wiki A primary transcript encoding a ribosomal RNA. rRNA primary transcript ribosomal RNA primary transcript sequence SO:0000209 rRNA_primary_transcript A primary transcript encoding a ribosomal RNA. SO:ke A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified. http://en.wikipedia.org/wiki/Mature_transcript mature transcript sequence SO:0000233 A processed transcript cannot contain introns. mature_transcript A transcript which has undergone the necessary modifications, if any, for its function. In eukaryotes this includes, for example, processing of introns, cleavage, base modification, and modifications to the 5' and/or the 3' ends, other than addition of bases. In bacteria functional mRNAs are usually not modified. SO:ke http://en.wikipedia.org/wiki/Mature_transcript wiki Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns. http://en.wikipedia.org/wiki/MRNA http://www.gencodegenes.org/gencode_biotypes.html INSDC_feature:mRNA messenger RNA protein_coding_transcript sequence SO:0000234 An mRNA does not contain introns as it is a processed_transcript. The equivalent kind of primary_transcript is protein_coding_primary_transcript (SO:0000120) which may contain introns. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. 
mRNA Messenger RNA is the intermediate molecule between DNA and protein. It includes UTR and coding sequences. It does not contain introns. SO:ma http://en.wikipedia.org/wiki/MRNA wiki http://www.gencodegenes.org/gencode_biotypes.html GENCODE A DNA site where a transcription factor binds. TF binding site transcription factor binding site sequence SO:0000235 Definition updated along with definitions in Mejia-Almonte et.al PMID:32665585. Added relationship part_of SO:0000727 CRM in place of previous CRM relationship has_part TF_binding_site August 2020 in response to requests from GREEKC initiative. Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. TF_binding_site A DNA site where a transcription factor binds. SO:ke The in-frame interval between the stop codons of a reading frame which when read as sequential triplets, has the potential of encoding a sequential string of amino acids. TER(NNN)nTER. open reading frame sequence SO:0000236 The definition was modified by Rama. ORF is defined by the sequence, whereas the CDS is defined according to whether a polypeptide is made. This term is mapped to MGED. Do not obsolete without consulting MGED ontology. ORF The in-frame interval between the stop codons of a reading frame which when read as sequential triplets, has the potential of encoding a sequential string of amino acids. TER(NNN)nTER. SGD:rb SO:ma The sequences extending on either side of a specific region. flanking region sequence SO:0000239 flanking_region The sequences extending on either side of a specific region. SO:ke rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity. 
INSDC_qualifier:unknown http://en.wikipedia.org/wiki/RRNA INSDC_feature:rRNA ribosomal RNA ribosomal ribonucleic acid sequence SO:0000252 Definition updated 10 June 2021 as part of restructuring rRNA terms and reforming definitions to have similar structures. Request from EBI. See GitHub Issue #493 rRNA rRNA is an RNA component of a ribosome that can provide both structural scaffolding and catalytic activity. ISBN:0198506732 http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/RRNA wiki Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3' end, to which the tRNA's corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two manners: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and 'wobble' base-pairing, i.e. permitting non-standard base-pairing at the 3rd anti-codon position. INSDC_qualifier:unknown http://en.wikipedia.org/wiki/TRNA INSDC_feature:tRNA sequence transfer RNA transfer ribonucleic acid SO:0000253 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. tRNA Transfer RNA (tRNA) molecules are approximately 80 nucleotides in length. Their secondary structure includes four short double-helical elements and three loops (D, anti-codon, and T loops). Further hydrogen bonds mediate the characteristic L-shaped molecular structure. 
Transfer RNAs have two regions of fundamental functional importance: the anti-codon, which is responsible for specific mRNA codon recognition, and the 3' end, to which the tRNA's corresponding amino acid is attached (by aminoacyl-tRNA synthetases). Transfer RNAs cope with the degeneracy of the genetic code in two manners: having more than one tRNA (with a specific anti-codon) for a particular amino acid; and 'wobble' base-pairing, i.e. permitting non-standard base-pairing at the 3rd anti-codon position. ISBN:0198506732 http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00005 http://en.wikipedia.org/wiki/TRNA wiki A small nuclear RNA molecule involved in pre-mRNA splicing and processing. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/SnRNA INSDC_qualifier:snRNA small nuclear RNA sequence SO:0000274 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. snRNA A small nuclear RNA molecule involved in pre-mRNA splicing and processing. PMID:11733745 WB:ems http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/SnRNA wiki Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing. INSDC_feature:ncRNA INSDC_qualifier:snoRNA small nucleolar RNA sequence SO:0000275 Updated the definition of snoRNA (SO:0000275) from "A snoRNA (small nucleolar RNA) is any one of a class of small RNAs that are associated with the eukaryotic nucleus as components of small nucleolar ribonucleoproteins. 
They participate in the processing or modifications of many RNAs, mostly ribosomal RNAs (rRNAs) though snoRNAs are also known to target other classes of RNA, including spliceosomal RNAs, tRNAs, and mRNAs via a stretch of sequence that is complementary to a sequence in the targeted RNA." to "Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing." to acknowledge that some snoRNAs functionally localize to other compartments (cytoplasm or even secreted). See GitHub Issue #578. snoRNA Small nucleolar RNAs (snoRNAs) are short non-coding RNAs enriched in the nucleolus as components of small nucleolar ribonucleoproteins. They guide ribose methylation and pseudouridylation of rRNAs and snRNAs, and a subgroup regulate excision of rRNAs from rRNA precursor transcripts. snoRNAs may also guide rRNA acetylation and tRNA methylation, and regulate mRNA abundance and alternative splicing. GOC:kgc PMID:31828325 Small, ~22-nt, RNA molecule that is the endogenous transcript of a miRNA gene (or the product of other non-coding RNA genes). Micro RNAs are produced from precursor molecules (SO:0001244) that can form local hairpin structures, which ordinarily are processed (usually via the Dicer pathway) such that a single miRNA molecule accumulates from one arm of a hairpin precursor molecule. Micro RNAs may trigger the cleavage of their target molecules or act as translational repressors.
SO:0000649 INSDC_feature:ncRNA http://en.wikipedia.org/wiki/MiRNA http://en.wikipedia.org/wiki/StRNA INSDC_qualifier:miRNA micro RNA microRNA small temporal RNA stRNA sequence SO:0000276 miRNA Small, ~22-nt, RNA molecule that is the endogenous transcript of a miRNA gene (or the product of other non-coding RNA genes). Micro RNAs are produced from precursor molecules (SO:0001244) that can form local hairpin structures, which ordinarily are processed (usually via the Dicer pathway) such that a single miRNA molecule accumulates from one arm of a hairpin precursor molecule. Micro RNAs may trigger the cleavage of their target molecules or act as translational repressors. PMID:11081512 PMID:12592000 http://en.wikipedia.org/wiki/MiRNA wiki http://en.wikipedia.org/wiki/StRNA wiki A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Microsatellite INSDC_qualifier:microsatellite STR microsatellite locus microsatellite marker short tandem repeat sequence SO:0000289 microsatellite A repeat_region containing repeat_units of 2 to 10 bp repeated in tandem. NCBI:th http://www.informatics.jax.org/silver/glossary.shtml http://en.wikipedia.org/wiki/Microsatellite wiki STR http://www.ncbi.nlm.nih.gov/books/NBK21126/def-item/A9651/ The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may or may not be hyphenated. Examples: GCTGATCAGC, or GCTGA-----TCAGC. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Inverted_repeat INSDC_qualifier:inverted inverted repeat inverted repeat sequence sequence SO:0000294 inverted_repeat The sequence is complementarily repeated on the opposite strand. It is a palindrome, and it may or may not be hyphenated. Examples: GCTGATCAGC, or GCTGA-----TCAGC.
SO:ke http://en.wikipedia.org/wiki/Inverted_repeat wiki A region of nucleic acid from which replication initiates; includes sequences that are recognized by replication proteins, the site from which the first separation of complementary strands occurs, and specific replication start sites. http://en.wikipedia.org/wiki/Origin_of_replication INSDC_feature:rep_origin ori origin of replication sequence SO:0000296 origin_of_replication A region of nucleic acid from which replication initiates; includes sequences that are recognized by replication proteins, the site from which the first separation of complementary strands occurs, and specific replication start sites. NCBI:cf http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Origin_of_replication wiki Part of the primary transcript that is clipped off during processing. sequence SO:0000303 clip Part of the primary transcript that is clipped off during processing. SO:ke A modified nucleotide, i.e. a nucleotide other than A, T, C, or G. INSDC_feature:modified_base modified base site sequence SO:0000305 Modified base:<modified_base>. modified_DNA_base A modified nucleotide, i.e. a nucleotide other than A, T, C, or G. http://www.insdc.org/files/feature_table.html A nucleotide modified by methylation. methylated base feature sequence SO:0000306 methylated_DNA_base_feature A nucleotide modified by methylation. SO:ke Regions of a few hundred to a few thousand bases in vertebrate genomes that are relatively GC and CpG rich; they are typically unmethylated and often found near the 5' ends of genes. http://en.wikipedia.org/wiki/CpG_island CG island CpG island sequence SO:0000307 CpG_island Regions of a few hundred to a few thousand bases in vertebrate genomes that are relatively GC and CpG rich; they are typically unmethylated and often found near the 5' ends of genes. SO:rd http://en.wikipedia.org/wiki/CpG_island wiki A repeat where the same sequence is repeated in the same direction.
Example: GCTGA-followed by-GCTGA. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Direct_repeat INSDC_qualifier:direct direct repeat sequence SO:0000314 direct_repeat A repeat where the same sequence is repeated in the same direction. Example: GCTGA-followed by-GCTGA. SO:ke http://en.wikipedia.org/wiki/Direct_repeat wiki The first base where RNA polymerase begins to synthesize the RNA transcript. INSDC_feature:misc_feature INSDC_note:transcription_start_site transcription start site transcription_start_site sequence SO:0000315 Added relationship is_a SO:0002309 core_promoter_element with the creation of core_promoter_element as part of GREEKC initiative August 2020 - Dave Sant. TSS The first base where RNA polymerase begins to synthesize the RNA transcript. SO:ke A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. INSDC_feature:CDS coding sequence coding_sequence sequence SO:0000316 CDS A contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon. SO:ma First codon to be translated by a ribosome. http://en.wikipedia.org/wiki/Start_codon initiation codon start codon sequence SO:0000318 start_codon First codon to be translated by a ribosome. SO:ke http://en.wikipedia.org/wiki/Start_codon wiki In mRNA, a set of three nucleotides that indicates the end of information for protein synthesis. http://en.wikipedia.org/wiki/Stop_codon stop codon sequence SO:0000319 stop_codon In mRNA, a set of three nucleotides that indicates the end of information for protein synthesis. SO:ke http://en.wikipedia.org/wiki/Stop_codon wiki A nucleotide sequence that may be used to identify a larger sequence. sequence SO:0000324 tag A nucleotide sequence that may be used to identify a larger sequence. SO:ke A primary transcript encoding a large ribosomal subunit RNA. 
35S rRNA primary transcript rRNA large subunit primary transcript sequence SO:0000325 rRNA_large_subunit_primary_transcript A primary transcript encoding a large ribosomal subunit RNA. SO:ke A short diagnostic sequence tag, serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts. SAGE tag sequence SO:0000326 SAGE_tag A short diagnostic sequence tag, serial analysis of gene expression (SAGE), that allows the quantitative and simultaneous analysis of a large number of transcripts. http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=7570003&dopt=Abstract Region of sequence similarity by descent from a common ancestor. INSDC_feature:misc_feature http://en.wikipedia.org/wiki/Conserved_region INSDC_note:conserved_region conserved region sequence SO:0000330 conserved_region Region of sequence similarity by descent from a common ancestor. SO:ke http://en.wikipedia.org/wiki/Conserved_region wiki Short (typically a few hundred base pairs) DNA sequence that has a single occurrence in a genome and whose location and base sequence are known. INSDC_feature:STS sequence tag site sequence SO:0000331 STS Short (typically a few hundred base pairs) DNA sequence that has a single occurrence in a genome and whose location and base sequence are known. http://www.biospace.com Coding region of sequence similarity by descent from a common ancestor. coding conserved region sequence SO:0000332 coding_conserved_region Coding region of sequence similarity by descent from a common ancestor. SO:ke The boundary between two exons in a processed transcript. exon junction sequence SO:0000333 exon_junction The boundary between two exons in a processed transcript. SO:ke Non-coding region of sequence similarity by descent from a common ancestor. 
conserved non-coding element conserved non-coding sequence nc conserved region noncoding conserved region sequence SO:0000334 nc_conserved_region Non-coding region of sequence similarity by descent from a common ancestor. SO:ke A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog). INSDC_feature:gene http://en.wikipedia.org/wiki/Pseudogene INSDC_qualifier:pseudo INSDC_qualifier:unknown sequence SO:0000336 pseudogene A sequence that closely resembles a known functional gene, at another locus within a genome, that is non-functional as a consequence of (usually several) mutations that prevent either its transcription or translation (or both). In general, pseudogenes result from either reverse transcription of a transcript of their "normal" paralog (SO:0000043) (in which case the pseudogene typically lacks introns and includes a poly(A) tail) or from recombination (SO:0000044) (in which case the pseudogene is typically a tandem duplication of its "normal" paralog). http://www.ucl.ac.uk/~ucbhjow/b241/glossary.html http://en.wikipedia.org/wiki/Pseudogene wiki A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference. RNAi reagent sequence SO:0000337 RNAi_reagent A double stranded RNA duplex, at least 20bp long, used experimentally to inhibit gene function by RNA interference. 
SO:rd Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. http://en.wikipedia.org/wiki/Chromosome sequence SO:0000340 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. chromosome Structural unit composed of a nucleic acid molecule which controls its own replication through the interaction of specific proteins at one or more origins of replication. SO:ma http://en.wikipedia.org/wiki/Chromosome wiki A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. http://en.wikipedia.org/wiki/Cytological_band chromosome band cytoband cytological band sequence SO:0000341 chromosome_band A cytologically distinguishable feature of a chromosome, often made visible by staining, and usually alternating light and dark. SO:ma http://en.wikipedia.org/wiki/Cytological_band wiki A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4. sequence SO:0000343 match A region of sequence, aligned to another sequence with some statistical significance, using an algorithm such as BLAST or SIM4. SO:ke Region of a transcript that regulates splicing. splice enhancer sequence SO:0000344 splice_enhancer Region of a transcript that regulates splicing. SO:ke A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. expressed sequence tag sequence SO:0000345 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. EST A tag produced from a single sequencing read from a cDNA clone or PCR product; typically a few hundred base pairs long. SO:ke A match against a nucleotide sequence. nucleotide match sequence SO:0000347 nucleotide_match A match against a nucleotide sequence. SO:ke A match against a protein sequence. 
protein match sequence SO:0000349 protein_match A match against a protein sequence. SO:ke A sequence of nucleotides that has been algorithmically derived from an alignment of two or more different sequences. http://en.wikipedia.org/wiki/Sequence_assembly sequence assembly sequence SO:0000353 sequence_assembly A sequence of nucleotides that has been algorithmically derived from an alignment of two or more different sequences. SO:ma http://en.wikipedia.org/wiki/Sequence_assembly wiki A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS. http://en.wikipedia.org/wiki/Codon sequence SO:0000360 codon A set of (usually) three nucleotide bases in a DNA or RNA sequence, which together code for a unique amino acid or the termination of translation and are contained within the CDS. SO:ke http://en.wikipedia.org/wiki/Codon wiki The junction where an insertion occurred. insertion site sequence SO:0000366 insertion_site The junction where an insertion occurred. SO:ke The junction in a genome where a transposable_element has inserted. transposable element insertion site sequence SO:0000368 transposable_element_insertion_site The junction in a genome where a transposable_element has inserted. SO:ke A non-coding RNA less than 200 nucleotides long, usually with a specific secondary structure, that acts to regulate gene expression. These include short ncRNAs such as piRNA, miRNA and siRNAs (among others). small regulatory ncRNA sequence SO:0000370 small_regulatory_ncRNA A non-coding RNA less than 200 nucleotides long, usually with a specific secondary structure, that acts to regulate gene expression. These include short ncRNAs such as piRNA, miRNA and siRNAs (among others). PMID:28541282 PomBase:al SO:ma An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. 
enzymatic RNA sequence SO:0000372 This was moved to be a child of transcript (SO:0000673) because some enzymatic RNA regions are part of primary transcripts and some are part of processed transcripts. Moved under ncRNA on 18 Nov 2021. See GitHub Issue #533. enzymatic_RNA An RNA sequence that has catalytic activity with or without an associated ribonucleoprotein. RSC:cb An RNA with catalytic activity. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Ribozyme INSDC_qualifier:ribozyme sequence SO:0000374 ribozyme An RNA with catalytic activity. SO:ma http://en.wikipedia.org/wiki/Ribozyme wiki Cytosolic 5.8S rRNA is an RNA component of the large subunit of cytosolic ribosomes in eukaryotes. http://en.wikipedia.org/wiki/5.8S_ribosomal_RNA cytosolic 5.8S LSU rRNA cytosolic 5.8S rRNA cytosolic 5.8S ribosomal RNA cytosolic rRNA 5 8S sequence SO:0000375 Dave Sant removed '5_8S rRNA is also found in archaea.' from definition due to lack of references mentioning this on 1 Feb 2021. See GitHub Issue #505. Renamed from rRNA_5_8S to cytosolic_5_8S_rRNA on 10 June 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. cytosolic_5_8S_rRNA Cytosolic 5.8S rRNA is an RNA component of the large subunit of cytosolic ribosomes in eukaryotes. https://rfam.xfam.org/family/RF00002 http://en.wikipedia.org/wiki/5.8S_ribosomal_RNA wiki A small catalytic RNA motif that catalyzes self-cleavage reaction. Its name comes from its secondary structure which resembles a carpenter's hammer. The hammerhead ribozyme is involved in the replication of some viroid and some satellite RNAs. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Hammerhead_ribozyme INSDC_qualifier:hammerhead_ribozyme hammerhead ribozyme sequence SO:0000380 hammerhead_ribozyme A small catalytic RNA motif that catalyzes self-cleavage reaction. Its name comes from its secondary structure which resembles a carpenter's hammer. 
The hammerhead ribozyme is involved in the replication of some viroid and some satellite RNAs. PMID:2436805 http://en.wikipedia.org/wiki/Hammerhead_ribozyme wiki The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. INSDC_feature:ncRNA INSDC_qualifier:RNase_MRP_RNA RNase MRP RNA sequence SO:0000385 Moved under enzymatic_RNA on 18 Nov 2021. See GitHub Issue #533. RNase_MRP_RNA The RNA molecule essential for the catalytic activity of RNase MRP, an enzymatically active ribonucleoprotein with two distinct roles in eukaryotes. In mitochondria it plays a direct role in the initiation of mitochondrial DNA replication. In the nucleus it is involved in precursor rRNA processing, where it cleaves the internal transcribed spacer 1 between 18S and 5.8S rRNAs. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00030 The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. 
Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs. INSDC_feature:ncRNA INSDC_qualifier:RNase_P_RNA RNase P RNA sequence SO:0000386 Moved under enzymatic_RNA on 18 Nov 2021. See GitHub Issue #533. RNase_P_RNA The RNA component of Ribonuclease P (RNase P), a ubiquitous endoribonuclease, found in archaea, bacteria and eukarya as well as chloroplasts and mitochondria. Its best characterized activity is the generation of mature 5 prime ends of tRNAs by cleaving the 5 prime leader elements of precursor-tRNAs. Cellular RNase Ps are ribonucleoproteins. RNA from bacterial RNase Ps retains its catalytic activity in the absence of the protein subunit, i.e. it is a ribozyme. Isolated eukaryotic and archaeal RNase P RNA has not been shown to retain its catalytic function, but is still essential for the catalytic activity of the holoenzyme. Although the archaeal and eukaryotic holoenzymes have a much greater protein content than the bacterial ones, the RNA cores from all the three lineages are homologous. Helices corresponding to P1, P2, P3, P4, and P10/11 are common to all cellular RNase P RNAs. Yet, there is considerable sequence variation, particularly among the eukaryotic RNAs. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00010 The RNA component of telomerase, a reverse transcriptase that synthesizes telomeric DNA. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Telomerase_RNA INSDC_qualifier:telomerase_RNA telomerase RNA sequence SO:0000390 telomerase_RNA The RNA component of telomerase, a reverse transcriptase that synthesizes telomeric DNA. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00025 http://en.wikipedia.org/wiki/Telomerase_RNA wiki U1 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). 
Its 5' end forms complementary base pairs with the 5' splice junction, thus defining the 5' donor site of an intron. There are significant differences in sequence and secondary structure between metazoan and yeast U1 snRNAs, the latter being much longer (568 nucleotides as compared to 164 nucleotides in human). Nevertheless, secondary structure predictions suggest that all U1 snRNAs share a 'common core' consisting of helices I, II, the proximal region of III, and IV. http://en.wikipedia.org/wiki/U1_snRNA U1 small nuclear RNA U1 snRNA small nuclear RNA U1 snRNA U1 sequence SO:0000391 U1_snRNA U1 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Its 5' end forms complementary base pairs with the 5' splice junction, thus defining the 5' donor site of an intron. There are significant differences in sequence and secondary structure between metazoan and yeast U1 snRNAs, the latter being much longer (568 nucleotides as compared to 164 nucleotides in human). Nevertheless, secondary structure predictions suggest that all U1 snRNAs share a 'common core' consisting of helices I, II, the proximal region of III, and IV. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00003 http://en.wikipedia.org/wiki/U1_snRNA wiki U1 small nuclear RNA RSC:cb small nuclear RNA U1 RSC:cb snRNA U1 RSC:cb U2 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Complementary binding between U2 snRNA (in an area lying towards the 5' end but 3' to hairpin I) and the branchpoint sequence (BPS) of the intron results in the bulging out of an unpaired adenine, on the BPS, which initiates a nucleophilic attack at the intronic 5' splice site, thus starting the first of two transesterification reactions that mediate splicing. 
http://en.wikipedia.org/wiki/U2_snRNA U2 small nuclear RNA U2 snRNA small nuclear RNA U2 snRNA U2 sequence SO:0000392 U2_snRNA U2 is a small nuclear RNA (snRNA) component of the spliceosome (involved in pre-mRNA splicing). Complementary binding between U2 snRNA (in an area lying towards the 5' end but 3' to hairpin I) and the branchpoint sequence (BPS) of the intron results in the bulging out of an unpaired adenine, on the BPS, which initiates a nucleophilic attack at the intronic 5' splice site, thus starting the first of two transesterification reactions that mediate splicing. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00004 http://en.wikipedia.org/wiki/U2_snRNA wiki U2 small nuclear RNA RSC:CB small nuclear RNA U2 RSC:CB snRNA U2 RSC:CB U4 small nuclear RNA (U4 snRNA) is a component of the major U2-dependent spliceosome. It forms a duplex with U6, and with each splicing round, it is displaced from U6 (and the spliceosome) in an ATP-dependent manner, allowing U6 to refold and create the active site for splicing catalysis. A recycling process involving protein Prp24 re-anneals U4 and U6. http://en.wikipedia.org/wiki/U4_snRNA U4 small nuclear RNA U4 snRNA small nuclear RNA U4 snRNA U4 sequence SO:0000393 U4_snRNA U4 small nuclear RNA (U4 snRNA) is a component of the major U2-dependent spliceosome. It forms a duplex with U6, and with each splicing round, it is displaced from U6 (and the spliceosome) in an ATP-dependent manner, allowing U6 to refold and create the active site for splicing catalysis. A recycling process involving protein Prp24 re-anneals U4 and U6. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00015 http://en.wikipedia.org/wiki/U4_snRNA wiki U4 small nuclear RNA RSC:cb small nuclear RNA U4 RSC:cb snRNA U4 RSC:cb An snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U6atac_snRNA (SO:0000397). 
U4atac small nuclear RNA U4atac snRNA small nuclear RNA U4atac snRNA U4atac sequence SO:0000394 U4atac_snRNA An snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U6atac_snRNA (SO:0000397). PMID:12409455 U4atac small nuclear RNA RSC:cb small nuclear RNA U4atac RSC:cb snRNA U4atac RSC:cb U5 RNA is a component of both types of known spliceosome. The precise function of this molecule is unknown, though it is known that the 5' loop is required for splice site selection and p220 binding, and that both the 3' stem-loop and the Sm site are important for Sm protein binding and cap methylation. http://en.wikipedia.org/wiki/U5_snRNA U5 small nuclear RNA U5 snRNA small nuclear RNA U5 snRNA U5 sequence SO:0000395 U5_snRNA U5 RNA is a component of both types of known spliceosome. The precise function of this molecule is unknown, though it is known that the 5' loop is required for splice site selection and p220 binding, and that both the 3' stem-loop and the Sm site are important for Sm protein binding and cap methylation. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00020 http://en.wikipedia.org/wiki/U5_snRNA wiki U5 small nuclear RNA RSC:cb small nuclear RNA U5 RSC:cb snRNA U5 RSC:cb U6 snRNA is a component of the spliceosome which is involved in splicing pre-mRNA. The putative secondary structure consensus base pairing is confined to a short 5' stem loop, but U6 snRNA is thought to form extensive base-pair interactions with U4 snRNA. http://en.wikipedia.org/wiki/U6_snRNA U6 small nuclear RNA U6 snRNA small nuclear RNA U6 snRNA U6 sequence SO:0000396 U6_snRNA U6 snRNA is a component of the spliceosome which is involved in splicing pre-mRNA. The putative secondary structure consensus base pairing is confined to a short 5' stem loop, but U6 snRNA is thought to form extensive base-pair interactions with U4 snRNA. 
http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00015 http://en.wikipedia.org/wiki/U6_snRNA wiki U6 small nuclear RNA RSC:cb small nuclear RNA U6 RSC:cb snRNA U6 RSC:cb U6atac_snRNA is an snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U4atac_snRNA (SO:0000394). U6atac small nuclear RNA U6atac snRNA snRNA U6atac sequence SO:0000397 U6atac_snRNA U6atac_snRNA is an snRNA required for the splicing of the minor U12-dependent class of eukaryotic nuclear introns. It forms a base paired complex with U4atac_snRNA (SO:0000394). http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve&db=pubmed&list_uids=12409455&dopt=Abstract U6atac small nuclear RNA RSC:cb U6atac snRNA RSC:cb snRNA U6atac RSC:cb U11 snRNA plays a role in splicing of the minor U12-dependent class of eukaryotic nuclear introns. Similar to U1 snRNA in the major class spliceosome, it base pairs to the conserved 5' splice site sequence. http://en.wikipedia.org/wiki/U11_snRNA U11 small nuclear RNA U11 snRNA small nuclear RNA U11 snRNA U11 sequence SO:0000398 U11_snRNA U11 snRNA plays a role in splicing of the minor U12-dependent class of eukaryotic nuclear introns. Similar to U1 snRNA in the major class spliceosome, it base pairs to the conserved 5' splice site sequence. PMID:9622129 http://en.wikipedia.org/wiki/U11_snRNA wiki U11 small nuclear RNA RSC:cb small nuclear RNA U11 RSC:cb snRNA U11 RSC:cb The U12 small nuclear RNA (snRNA), together with U4atac/U6atac, U5, and U11 snRNAs and associated proteins, forms a spliceosome that cleaves a divergent class of low-abundance pre-mRNA introns. http://en.wikipedia.org/wiki/U12_snRNA U12 small nuclear RNA U12 snRNA small nuclear RNA U12 snRNA U12 sequence SO:0000399 U12_snRNA The U12 small nuclear RNA (snRNA), together with U4atac/U6atac, U5, and U11 snRNAs and associated proteins, forms a spliceosome that cleaves a divergent class of low-abundance pre-mRNA introns.
http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00007 http://en.wikipedia.org/wiki/U12_snRNA wiki U12 small nuclear RNA RSC:cb small nuclear RNA U12 RSC:cb snRNA U12 RSC:cb U14 small nucleolar RNA (U14 snoRNA) is required for early cleavages of eukaryotic precursor rRNAs. In yeasts, this molecule possesses a stem-loop region (known as the Y-domain) which is essential for function. A similar structure, but with a different consensus sequence, is found in plants, but is absent in vertebrates. SO:0005839 U14 small nucleolar RNA U14 snoRNA small nucleolar RNA U14 snoRNA U14 sequence SO:0000403 An evolutionarily conserved eukaryotic low molecular weight RNA capable of intermolecular hybridization with both homologous and heterologous 18S rRNA. U14_snoRNA U14 small nucleolar RNA (U14 snoRNA) is required for early cleavages of eukaryotic precursor rRNAs. In yeasts, this molecule possesses a stem-loop region (known as the Y-domain) which is essential for function. A similar structure, but with a different consensus sequence, is found in plants, but is absent in vertebrates. PMID:2551119 http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00016 A family of RNAs is found as part of the enigmatic vault ribonucleoprotein complex. The complex consists of a major vault protein (MVP), two minor vault proteins (VPARP and TEP1), and several small untranslated RNA molecules. It has been suggested that the vault complex is involved in drug resistance. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Vault_RNA INSDC_qualifier:vault_RNA vault RNA sequence SO:0000404 vault_RNA A family of RNAs is found as part of the enigmatic vault ribonucleoprotein complex. The complex consists of a major vault protein (MVP), two minor vault proteins (VPARP and TEP1), and several small untranslated RNA molecules. It has been suggested that the vault complex is involved in drug resistance.
http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00006 http://en.wikipedia.org/wiki/Vault_RNA wiki Y RNAs are components of the Ro ribonucleoprotein particle (Ro RNP), in association with Ro60 and La proteins. The Y RNAs and Ro60 and La proteins are well conserved, but the function of the Ro RNP is not known. In humans the RNA component can be one of four small RNAs: hY1, hY3, hY4 and hY5. These small RNAs are predicted to fold into a conserved secondary structure containing three stem structures. The largest of the four, hY1, contains an additional hairpin. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Y_RNA INSDC_qualifier:Y_RNA Y RNA sequence SO:0000405 Y_RNA Y RNAs are components of the Ro ribonucleoprotein particle (Ro RNP), in association with Ro60 and La proteins. The Y RNAs and Ro60 and La proteins are well conserved, but the function of the Ro RNP is not known. In humans the RNA component can be one of four small RNAs: hY1, hY3, hY4 and hY5. These small RNAs are predicted to fold into a conserved secondary structure containing three stem structures. The largest of the four, hY1, contains an additional hairpin. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00019 http://en.wikipedia.org/wiki/Y_RNA wiki Cytosolic 18S rRNA is an RNA component of the small subunit of cytosolic ribosomes in eukaryotes. http://en.wikipedia.org/wiki/18S_ribosomal_RNA cytosolic 18S rRNA cytosolic 18S ribosomal RNA cytosolic rRNA 18S sequence SO:0000407 Renamed to cytosolic_18S_rRNA from rRNA_18S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493. cytosolic_18S_rRNA Cytosolic 18S rRNA is an RNA component of the small subunit of cytosolic ribosomes in eukaryotes. SO:ke http://en.wikipedia.org/wiki/18S_ribosomal_RNA wiki A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. 
A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. BS:00033 http://en.wikipedia.org/wiki/Binding_site INSDC_feature:misc_binding binding site binding_or_interaction_site sequence site SO:0000409 See GO:0005488 : binding. binding_site A biological_region of sequence that, in the molecule, interacts selectively and non-covalently with other molecules. A region on the surface of a molecule that may interact with another molecule. When applied to polypeptides: Amino acids involved in binding or interactions. It can also apply to an amino acid bond which is represented by the positions of the two flanking amino acids. EBIBS:GAR SO:ke http://en.wikipedia.org/wiki/Binding_site wiki A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. INSDC_feature:protein_bind protein binding site sequence SO:0000410 See GO:0042277 : peptide binding. protein_binding_site A binding site that, in the molecule, interacts selectively and non-covalently with polypeptide molecules. SO:ke A region of polynucleotide sequence produced by digestion with a restriction endonuclease. http://en.wikipedia.org/wiki/Restriction_fragment restriction fragment sequence SO:0000412 restriction_fragment A region of polynucleotide sequence produced by digestion with a restriction endonuclease. SO:ke http://en.wikipedia.org/wiki/Restriction_fragment wiki A region where the sequence differs from that of a specified sequence. INSDC_feature:misc_difference sequence difference sequence SO:0000413 sequence_difference A region where the sequence differs from that of a specified sequence. SO:ke The signal_peptide is a short region of the peptide located at the N-terminus that directs the protein to be secreted or part of membrane components. 
BS:00159 http://en.wikipedia.org/wiki/Signal_peptide INSDC_feature:sig_peptide signal peptide signal peptide coding sequence sequence signal SO:0000418 Old def before biosapiens:The sequence for an N-terminal domain of a secreted protein; this domain is involved in attaching nascent polypeptide to the membrane leader sequence. signal_peptide The signal_peptide is a short region of the peptide located at the N-terminus that directs the protein to be secreted or part of membrane components. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Signal_peptide wiki signal uniprot:feature_type The polypeptide sequence that remains when the cleaved peptide regions have been cleaved from the immature peptide. BS:00149 INSDC_feature:mat_peptide mature protein region sequence chain mature peptide SO:0000419 This term mature peptide, merged with the biosapiens term mature protein region and took that to be the new name. Old def: The coding sequence for the mature or final peptide or protein product following post-translational modification. mature_protein_region The polypeptide sequence that remains when the cleaved peptide regions have been cleaved from the immature peptide. EBIBS:GAR SO:cb http://www.insdc.org/files/feature_table.html chain uniprot:feature_type A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host. autonomously replicating sequence sequence SO:0000436 ARS A sequence that can autonomously replicate, as a plasmid, when transformed into a bacterial host. SO:ma A single stranded oligonucleotide. single strand oligo single strand oligonucleotide single stranded oligonucleotide ss oligo ss oligonucleotide sequence SO:0000441 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. ss_oligo A single stranded oligonucleotide. SO:ke A double stranded oligonucleotide. double stranded oligonucleotide ds oligo ds-oligonucleotide sequence SO:0000442 This term is mapped to MGED. 
Do not obsolete without consulting MGED ontology. ds_oligo A double stranded oligonucleotide. SO:ke A 17-28-nt, small interfering RNA derived from transcripts of repetitive elements. INSDC_feature:ncRNA INSDC_qualifier:rasiRNA repeat associated small interfering RNA sequence SO:0000454 Changed parent term from ncRNA (SO:0000655) to piRNA (SO:0001035). See GitHub Issue #573. rasiRNA A 17-28-nt, small interfering RNA derived from transcripts of repetitive elements. PMID:18032451 http://www.developmentalcell.com/content/article/abstract?uid=PIIS1534580703002284 A non-functional descendant of a functional entity. pseudogenic region sequence SO:0000462 pseudogenic_region A non-functional descendant of a functional entity. SO:cjm A non-functional descendant of an exon. decayed exon sequence SO:0000464 Does not have to be part of a pseudogene. decayed_exon A non-functional descendant of an exon. SO:ke One of the pieces of sequence that make up a golden path. golden path fragment sequence SO:0000468 golden_path_fragment One of the pieces of sequence that make up a golden path. SO:rd A set of regions which overlap with minimal polymorphism to form a linear sequence. tiling path sequence SO:0000472 tiling_path A set of regions which overlap with minimal polymorphism to form a linear sequence. SO:cjm A piece of sequence that makes up a tiling_path (SO:0000472). tiling path fragment sequence SO:0000474 tiling_path_fragment A piece of sequence that makes up a tiling_path (SO:0000472). SO:ke A primary transcript that is never translated into a protein. nc primary transcript noncoding primary transcript sequence SO:0000483 nc_primary_transcript A primary transcript that is never translated into a protein. SO:ke The sequence of the 3' exon that is not coding. three prime coding exon noncoding region three_prime_exon_noncoding_region sequence SO:0000484 three_prime_coding_exon_noncoding_region The sequence of the 3' exon that is not coding. 
SO:ke The sequence of the 5' exon preceding the start codon. five prime coding exon noncoding region five_prime_exon_noncoding_region sequence SO:0000486 five_prime_coding_exon_noncoding_region The sequence of the 5' exon preceding the start codon. SO:ke A continuous piece of sequence similar to the 'virtual contig' concept of the Ensembl database. virtual sequence sequence SO:0000499 virtual_sequence A continuous piece of sequence similar to the 'virtual contig' concept of the Ensembl database. SO:ke A region of sequence that is transcribed. This region may cover the transcript of a gene, it may encompass the sequence covered by all of the transcripts of an alternatively spliced gene, or it may cover the region transcribed by a polycistronic transcript. A gene may have 1 or more transcribed regions and a transcribed_region may belong to one or more genes. sequence SO:0000502 This concept came about as a direct result of the SO meeting August 2004. The exact nature of the relationship between transcribed_region and gene is still up for discussion. We are going with 'associated_with' for the time being. transcribed_region true A region of sequence that is transcribed. This region may cover the transcript of a gene, it may encompass the sequence covered by all of the transcripts of an alternatively spliced gene, or it may cover the region transcribed by a polycistronic transcript. A gene may have 1 or more transcribed regions and a transcribed_region may belong to one or more genes. SO:ke The recognition sequence necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA.
INSDC_feature:regulatory INSDC_qualifier:polyA_signal_sequence poly(A) signal polyA signal sequence polyadenylation termination signal sequence SO:0000551 Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. polyA_signal_sequence The recognition sequence necessary for endonuclease cleavage of an RNA transcript that is followed by polyadenylation; consensus=AATAAA. http://www.insdc.org/files/feature_table.html The site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation. The boundary between the UTR and the polyA sequence. SO:0001430 INSDC_feature:polyA_site polyA cleavage site polyA junction polyA site polyA_junction sequence polyadenylation site SO:0000553 polyA_site The site on an RNA transcript to which will be added adenine residues by post-transcriptional polyadenylation. The boundary between the UTR and the polyA sequence. http://www.insdc.org/files/feature_table.html A region of chromosome where the spindle fibers attach during mitosis and meiosis. http://en.wikipedia.org/wiki/Centromere INSDC_feature:centromere sequence SO:0000577 centromere A region of chromosome where the spindle fibers attach during mitosis and meiosis. SO:ke http://en.wikipedia.org/wiki/Centromere wiki A structure consisting of a 7-methylguanosine in 5'-5' triphosphate linkage with the first nucleotide of an mRNA. It is added post-transcriptionally, and is not encoded in the DNA. http://en.wikipedia.org/wiki/5%27_cap sequence SO:0000581 cap A structure consisting of a 7-methylguanosine in 5'-5' triphosphate linkage with the first nucleotide of an mRNA. It is added post-transcriptionally, and is not encoded in the DNA. 
http://seqcore.brcf.med.umich.edu/doc/educ/dnapr/mbglossary/mbgloss.html http://en.wikipedia.org/wiki/5%27_cap wiki Group I catalytic introns are large self-splicing ribozymes. They catalyze their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions. http://en.wikipedia.org/wiki/Group_I_intron group I intron sequence SO:0000587 GO:0000372. group_I_intron Group I catalytic introns are large self-splicing ribozymes. They catalyze their own excision from mRNA, tRNA and rRNA precursors in a wide range of organisms. The core secondary structure consists of 9 paired regions (P1-P9). These fold to essentially two domains, the P4-P6 domain (formed from the stacking of P5, P4, P6 and P6a helices) and the P3-P9 domain (formed from the P8, P3, P7 and P9 helices). Group I catalytic introns often have long ORFs inserted in loop regions. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00028 http://en.wikipedia.org/wiki/Group_I_intron wiki A self spliced intron. INSDC_feature:ncRNA INSDC_qualifier:autocatalytically_spliced_intron autocatalytically spliced intron sequence SO:0000588 autocatalytically_spliced_intron A self spliced intron. SO:ke The signal recognition particle (SRP) is a universally conserved ribonucleoprotein. It is involved in the co-translational targeting of proteins to membranes. The eukaryotic SRP consists of a 300-nucleotide 7S RNA and six proteins: SRPs 72, 68, 54, 19, 14, and 9. Archaeal SRP consists of a 7S RNA and homologues of the eukaryotic SRP19 and SRP54 proteins. In most eubacteria, the SRP consists of a 4.5S RNA and the Ffh protein (a homologue of the eukaryotic SRP54 protein). 
Eukaryotic and archaeal 7S RNAs have very similar secondary structures, with eight helical elements. These fold into the Alu and S domains, separated by a long linker region. Eubacterial SRP is generally a simpler structure, with the M domain of Ffh bound to a region of the 4.5S RNA that corresponds to helix 8 of the eukaryotic and archaeal SRP S domain. Some Gram-positive bacteria (e.g. Bacillus subtilis), however, have a larger SRP RNA that also has an Alu domain. The Alu domain is thought to mediate the peptide chain elongation retardation function of the SRP. The universally conserved helix which interacts with the SRP54/Ffh M domain mediates signal sequence recognition. In eukaryotes and archaea, the SRP19-helix 6 complex is thought to be involved in SRP assembly and stabilizes helix 8 for SRP54 binding. INSDC_feature:ncRNA INSDC_qualifier:SRP_RNA SRP RNA sequence 7S RNA signal recognition particle RNA SO:0000590 SRP_RNA The signal recognition particle (SRP) is a universally conserved ribonucleoprotein. It is involved in the co-translational targeting of proteins to membranes. The eukaryotic SRP consists of a 300-nucleotide 7S RNA and six proteins: SRPs 72, 68, 54, 19, 14, and 9. Archaeal SRP consists of a 7S RNA and homologues of the eukaryotic SRP19 and SRP54 proteins. In most eubacteria, the SRP consists of a 4.5S RNA and the Ffh protein (a homologue of the eukaryotic SRP54 protein). Eukaryotic and archaeal 7S RNAs have very similar secondary structures, with eight helical elements. These fold into the Alu and S domains, separated by a long linker region. Eubacterial SRP is generally a simpler structure, with the M domain of Ffh bound to a region of the 4.5S RNA that corresponds to helix 8 of the eukaryotic and archaeal SRP S domain. Some Gram-positive bacteria (e.g. Bacillus subtilis), however, have a larger SRP RNA that also has an Alu domain. The Alu domain is thought to mediate the peptide chain elongation retardation function of the SRP. 
The universally conserved helix which interacts with the SRP54/Ffh M domain mediates signal sequence recognition. In eukaryotes and archaea, the SRP19-helix 6 complex is thought to be involved in SRP assembly and stabilizes helix 8 for SRP54 binding. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00017 Most box C/D snoRNAs also contain long (>10 nt) sequences complementary to rRNA. Boxes C and D, as well as boxes C' and D', are usually located in close proximity, and form a structure known as the box C/D motif. This motif is important for snoRNA stability, processing, nucleolar targeting and function. A small number of box C/D snoRNAs are involved in rRNA processing; most, however, are known or predicted to serve as guide RNAs in ribose methylation of rRNA. Targeting involves direct base pairing of the snoRNA at the rRNA site to be modified and selection of a rRNA nucleotide a fixed distance from box D or D'. C D box snoRNA C/D box snoRNA SNORD box C/D snoRNA sequence SO:0000593 Added 'SNORD' as a synonym of C_D_box_snoRNA (SO:0000593) and 'SNORA' as a synonym of H_ACA_box_snoRNA (SO:0000594). See GitHub Issue #577. C_D_box_snoRNA Most box C/D snoRNAs also contain long (>10 nt) sequences complementary to rRNA. Boxes C and D, as well as boxes C' and D', are usually located in close proximity, and form a structure known as the box C/D motif. This motif is important for snoRNA stability, processing, nucleolar targeting and function. A small number of box C/D snoRNAs are involved in rRNA processing; most, however, are known or predicted to serve as guide RNAs in ribose methylation of rRNA. Targeting involves direct base pairing of the snoRNA at the rRNA site to be modified and selection of a rRNA nucleotide a fixed distance from box D or D'. 
http://www.bio.umass.edu/biochem/rna-sequence/Yeast_snoRNA_Database/snoRNA_DataBase.html SNORD PMID:31828325 A short 3'-uridylated RNA that can form a duplex (except for its post-transcriptionally added oligo_U tail (SO:0000609)) with a stretch of mature edited mRNA. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Guide_RNA INSDC_qualifier:guide_RNA gRNA guide RNA sequence SO:0000602 guide_RNA A short 3'-uridylated RNA that can form a duplex (except for its post-transcriptionally added oligo_U tail (SO:0000609)) with a stretch of mature edited mRNA. http://www.rna.ucla.edu/index.html http://en.wikipedia.org/wiki/Guide_RNA wiki Group II introns are found in rRNA, tRNA and mRNA of organelles in fungi, plants and protists, and also in mRNA in bacteria. They are large self-splicing ribozymes and have 6 structural domains (usually designated dI to dVI). A subset of group II introns also encode essential splicing proteins in intronic ORFs. The length of these introns can therefore be up to 3kb. Splicing occurs in almost identical fashion to nuclear pre-mRNA splicing with two transesterification steps. The 2' hydroxyl of a bulged adenosine in domain VI attacks the 5' splice site, followed by nucleophilic attack on the 3' splice site by the 3' OH of the upstream exon. Protein machinery is required for splicing in vivo, and long range intron to intron and intron-exon interactions are important for splice site positioning. Group II introns are further sub-classified into groups IIA and IIB which differ in splice site consensus, distance of bulged A from 3' splice site, some tertiary interactions, and intronic ORF phylogeny. http://en.wikipedia.org/wiki/Group_II_intron group II intron sequence SO:0000603 GO:0000373. group_II_intron Group II introns are found in rRNA, tRNA and mRNA of organelles in fungi, plants and protists, and also in mRNA in bacteria. They are large self-splicing ribozymes and have 6 structural domains (usually designated dI to dVI). 
A subset of group II introns also encode essential splicing proteins in intronic ORFs. The length of these introns can therefore be up to 3kb. Splicing occurs in almost identical fashion to nuclear pre-mRNA splicing with two transesterification steps. The 2' hydroxyl of a bulged adenosine in domain VI attacks the 5' splice site, followed by nucleophilic attack on the 3' splice site by the 3' OH of the upstream exon. Protein machinery is required for splicing in vivo, and long range intron to intron and intron-exon interactions are important for splice site positioning. Group II introns are further sub-classified into groups IIA and IIB which differ in splice site consensus, distance of bulged A from 3' splice site, some tertiary interactions, and intronic ORF phylogeny. http://www.sanger.ac.uk/Software/Rfam/browse/index.shtml http://en.wikipedia.org/wiki/Group_II_intron wiki A region containing or overlapping no genes that is bounded on either side by a gene, or bounded by a gene and the end of the chromosome. http://en.wikipedia.org/wiki/Intergenic_region intergenic region sequence SO:0000605 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. intergenic_region A region containing or overlapping no genes that is bounded on either side by a gene, or bounded by a gene and the end of the chromosome. SO:cjm http://en.wikipedia.org/wiki/Intergenic_region wiki Sequence of about 100 nucleotides of A added to the 3' end of most eukaryotic mRNAs. polyA sequence sequence SO:0000610 polyA_sequence Sequence of about 100 nucleotides of A added to the 3' end of most eukaryotic mRNAs. SO:ke A pyrimidine rich sequence near the 3' end of an intron to which the 5'end becomes covalently bound during nuclear splicing. The resulting structure resembles a lariat. branch point branch site branch_point sequence SO:0000611 branch_site A pyrimidine rich sequence near the 3' end of an intron to which the 5'end becomes covalently bound during nuclear splicing. 
The resulting structure resembles a lariat. SO:ke The polypyrimidine tract is one of the cis-acting sequence elements directing intron removal in pre-mRNA splicing. http://en.wikipedia.org/wiki/Polypyrimidine_tract polypyrimidine tract sequence SO:0000612 polypyrimidine_tract The polypyrimidine tract is one of the cis-acting sequence elements directing intron removal in pre-mRNA splicing. http://nar.oupjournals.org/cgi/content/full/25/4/888 http://en.wikipedia.org/wiki/Polypyrimidine_tract wiki The base where transcription ends. transcription end site sequence SO:0000616 transcription_end_site The base where transcription ends. SO:ke A specific structure at the end of a linear chromosome, required for the integrity and maintenance of the end. http://en.wikipedia.org/wiki/Telomere INSDC_feature:telomere telomeric DNA telomeric sequence sequence SO:0000624 telomere A specific structure at the end of a linear chromosome, required for the integrity and maintenance of the end. SO:ma http://en.wikipedia.org/wiki/Telomere wiki A regulatory region which, upon binding of transcription factors, suppresses the transcription of the gene or genes it controls. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Silencer_(DNA) INSDC_qualifier:silencer sequence SO:0000625 silencer A regulatory region which, upon binding of transcription factors, suppresses the transcription of the gene or genes it controls. SO:ke http://en.wikipedia.org/wiki/Silencer_(DNA) wiki A regulatory region that 1) when located between a CRM and a gene's promoter prevents the CRM from modulating that gene's expression and 2) acts as a chromatin boundary element or barrier that can block the encroachment of condensed chromatin from an adjacent region. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Insulator_(genetics) INSDC_qualifier:insulator insulator element sequence SO:0000627 moved from is_a: SO:0001055 transcriptional_cis_regulatory_region as per request from GREEKC initiative in August 2020.
insulator A regulatory region that 1) when located between a CRM and a gene's promoter prevents the CRM from modulating that gene's expression and 2) acts as a chromatin boundary element or barrier that can block the encroachment of condensed chromatin from an adjacent region. NCBI:cf PMID:12154228 SO:regcreative http://en.wikipedia.org/wiki/Insulator_(genetics) wiki Regions of the chromosome that are important for structural elements. chromosomal structural element sequence SO:0000628 chromosomal_structural_element A repeat region containing tandemly repeated sequences having a unit length of 10 to 40 bp. INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Minisatellite INSDC_qualifier:minisatellite VNTR sequence SO:0000643 minisatellite A repeat region containing tandemly repeated sequences having a unit length of 10 to 40 bp. http://www.informatics.jax.org/silver/glossary.shtml http://en.wikipedia.org/wiki/Minisatellite wiki VNTR http://www.ncbi.nlm.nih.gov/books/NBK21126/def-item/A9655/ Antisense RNA is RNA that is transcribed from the coding, rather than the template, strand of DNA. It is therefore complementary to mRNA. INSDC_feature:ncRNA http://en.wikipedia.org/wiki/Antisense_RNA INSDC_qualifier:antisense_RNA antisense RNA sequence SO:0000644 antisense_RNA Antisense RNA is RNA that is transcribed from the coding, rather than the template, strand of DNA. It is therefore complementary to mRNA. SO:ke http://en.wikipedia.org/wiki/Antisense_RNA wiki The reverse complement of the primary transcript. antisense primary transcript sequence SO:0000645 antisense_primary_transcript The reverse complement of the primary transcript. SO:ke A small RNA molecule that is the product of a longer exogenous or endogenous dsRNA, which is either a bimolecular duplex or very long hairpin, processed (via the Dicer pathway) such that numerous siRNAs accumulate from both strands of the dsRNA. siRNAs trigger the cleavage of their target molecules.
INSDC_feature:ncRNA http://en.wikipedia.org/wiki/SiRNA INSDC_qualifier:siRNA small interfering RNA sequence SO:0000646 siRNA A small RNA molecule that is the product of a longer exogenous or endogenous dsRNA, which is either a bimolecular duplex or very long hairpin, processed (via the Dicer pathway) such that numerous siRNAs accumulate from both strands of the dsRNA. siRNAs trigger the cleavage of their target molecules. PMID:12592000 http://en.wikipedia.org/wiki/SiRNA wiki Cytosolic SSU rRNA is an RNA component of the small subunit of cytosolic ribosomes. cytosolic SSU rRNA cytosolic SSU ribosomal RNA cytosolic small subunit rRNA sequence SO:0000650 Renamed to cytosolic_SSU_rRNA from small_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493. cytosolic_SSU_rRNA Cytosolic SSU rRNA is an RNA component of the small subunit of cytosolic ribosomes. SO:ke Cytosolic LSU rRNA is an RNA component of the large subunit of cytosolic ribosomes. cytosolic LSU RNA cytosolic LSU rRNA cytosolic large subunit rRNA sequence SO:0000651 Renamed to cytosolic_LSU_rRNA from large_subunit_rRNA on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493. cytosolic_LSU_rRNA Cytosolic LSU rRNA is an RNA component of the large subunit of cytosolic ribosomes. SO:ke Cytosolic 5S rRNA is an RNA component of the large subunit of cytosolic ribosomes in both prokaryotes and eukaryotes. http://en.wikipedia.org/wiki/5S_ribosomal_RNA cytosolic 5S LSU rRNA cytosolic 5S rRNA cytosolic 5S ribosomal RNA cytosolic rRNA 5S sequence SO:0000652 Renamed from rRNA_5S to cytosolic_5S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. 
cytosolic_5S_rRNA Cytosolic 5S rRNA is an RNA component of the large subunit of cytosolic ribosomes in both prokaryotes and eukaryotes. http://www.sanger.ac.uk/cgi-bin/Rfam/getacc?RF00001 http://en.wikipedia.org/wiki/5S_ribosomal_RNA wiki Cytosolic 28S rRNA is an RNA component of the large subunit of cytosolic ribosomes in metazoan eukaryotes. http://en.wikipedia.org/wiki/28S_ribosomal_RNA cytosolic 28S LSU rRNA cytosolic 28S rRNA cytosolic 28S ribosomal RNA cytosolic rRNA 28S sequence SO:0000653 Renamed from rRNA_28S to cytosolic_28S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. cytosolic_28S_rRNA Cytosolic 28S rRNA is an RNA component of the large subunit of cytosolic ribosomes in metazoan eukaryotes. SO:ke http://en.wikipedia.org/wiki/28S_ribosomal_RNA wiki An RNA transcript that does not encode a protein; rather, the RNA molecule is the gene product. INSDC_qualifier:other http://en.wikipedia.org/wiki/NcRNA http://www.gencodegenes.org/gencode_biotypes.html known_ncrna noncoding RNA sequence SO:0000655 A ncRNA is a processed_transcript, so it may not contain parts such as transcribed_spacer_regions that are removed in the act of processing. For the corresponding primary_transcripts, please see term SO:0000483 nc_primary_transcript. ncRNA An RNA transcript that does not encode a protein; rather, the RNA molecule is the gene product. SO:ke http://en.wikipedia.org/wiki/NcRNA wiki http://www.gencodegenes.org/gencode_biotypes.html GENCODE A region of sequence containing one or more repeat units. INSDC_feature:repeat_region INSDC_qualifier:other repeat region sequence SO:0000657 repeat_region A region of sequence containing one or more repeat units. SO:ke A repeat that is located at dispersed sites in the genome.
INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Interspersed_repeat INSDC_qualifier:dispersed dispersed repeat interspersed repeat sequence SO:0000658 dispersed_repeat A repeat that is located at dispersed sites in the genome. SO:ke http://en.wikipedia.org/wiki/Interspersed_repeat wiki An intron which is spliced by the spliceosome. spliceosomal intron sequence SO:0000662 GO:0000398. spliceosomal_intron An intron which is spliced by the spliceosome. SO:ke The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence. SO:1000034 loinc:LA6687-3 insertion nucleotide insertion nucleotide_insertion sequence SO:0000667 insertion The sequence of one or more nucleotides added between two adjacent nucleotides in the sequence. SO:ke loinc:LA6687-3 Insertion insertion http://www.ncbi.nlm.nih.gov/dbvar/ A match against an EST sequence. EST match sequence SO:0000668 EST_match A match against an EST sequence. SO:ke An RNA synthesized on a DNA or RNA template by an RNA polymerase. INSDC_feature:misc_RNA http://en.wikipedia.org/wiki/RNA sequence SO:0000673 Added relationship overlaps SO:0002300 unit_of_gene_expression with Mejia-Almonte et.al PMID:32665585 Aug 5, 2020. transcript An RNA synthesized on a DNA or RNA template by an RNA polymerase. SO:ma http://en.wikipedia.org/wiki/RNA wiki A region of nucleotide sequence targeted by a nuclease enzyme. nuclease sensitive site sequence SO:0000684 nuclease_sensitive_site A region of nucleotide sequence targeted by a nuclease enzyme. SO:ma The space between two bases in a sequence which marks the position where a deletion has occurred. deletion junction sequence SO:0000687 deletion_junction The space between two bases in a sequence which marks the position where a deletion has occurred. SO:ke A set of subregions selected from sequence contigs which when concatenated form a nonredundant linear sequence. 
golden path sequence SO:0000688 golden_path A set of subregions selected from sequence contigs which when concatenated form a nonredundant linear sequence. SO:ls A match against cDNA sequence. cDNA match sequence SO:0000689 cDNA_match A match against cDNA sequence. SO:ke SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater. single nucleotide polymorphism sequence SO:0000694 SNP SNPs are single base pair positions in genomic DNA at which different sequence alternatives exist in normal individuals in some population(s), wherein the least frequent variant has an abundance of 1% or greater. SO:cb A sequence used in an experiment. sequence SO:0000695 Requested by Lynn Crosby, Jan 2006. reagent A sequence used in an experiment. SO:ke A short oligonucleotide sequence, of length on the order of tens of bases; either single or double stranded. http://en.wikipedia.org/wiki/Oligonucleotide oligonucleotide sequence SO:0000696 oligo A short oligonucleotide sequence, of length on the order of tens of bases; either single or double stranded. SO:ma http://en.wikipedia.org/wiki/Oligonucleotide wiki A sequence_feature with an extent of zero. boundary breakpoint sequence SO:0000699 A junction is a boundary between regions. A boundary has an extent of zero. junction A sequence_feature with an extent of zero. SO:ke A comment about the sequence. sequence SO:0000700 remark A comment about the sequence. SO:ke A region of sequence where the validity of the base calling is questionable. possible base call error sequence SO:0000701 possible_base_call_error A region of sequence where the validity of the base calling is questionable. SO:ke A region of sequence where there may have been an error in the assembly.
possible assembly error sequence SO:0000702 possible_assembly_error A region of sequence where there may have been an error in the assembly. SO:ke A region of sequence implicated in an experimental result. experimental result region sequence SO:0000703 experimental_result_region A region of sequence implicated in an experimental result. SO:ke A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. http://en.wikipedia.org/wiki/Gene INSDC_feature:gene sequence SO:0000704 This term is mapped to MGED. Do not obsolete without consulting MGED ontology. A gene may be considered as a unit of inheritance. gene A region (or regions) that includes all of the sequence elements necessary to encode a functional transcript. A gene may include regulatory regions, transcribed regions and/or other functional sequence regions. SO:immuno_workshop http://en.wikipedia.org/wiki/Gene wiki Two or more adjacent copies of a region (of length greater than 1). INSDC_feature:repeat_region http://en.wikipedia.org/wiki/Tandem_repeat http://www.sci.sdsu.edu/~smaloy/Glossary/T.html INSDC_qualifier:tandem tandem repeat sequence SO:0000705 tandem_repeat Two or more adjacent copies of a region (of length greater than 1). SO:ke http://en.wikipedia.org/wiki/Tandem_repeat wiki The 3' splice site of the acceptor primary transcript. trans splice acceptor site sequence 3' trans splice site SO:0000706 This region contains a polypyrimidine tract and AG dinucleotide in some organisms and is UUUCAG in C. elegans. trans_splice_acceptor_site The 3' splice site of the acceptor primary transcript. SO:ke A region of nucleotide sequence corresponding to a known motif. INSDC_feature:misc_feature INSDC_note:nucleotide_motif nucleotide motif sequence SO:0000714 nucleotide_motif A region of nucleotide sequence corresponding to a known motif.
SO:ke A motif that is active in RNA sequence. RNA motif sequence SO:0000715 RNA_motif A motif that is active in RNA sequence. SO:ke A nucleic acid sequence that when read as sequential triplets, has the potential of encoding a sequential string of amino acids. It need not contain the start or stop codon. http://en.wikipedia.org/wiki/Reading_frame reading frame sequence SO:0000717 This term was added after a request by SGD. August 2004. Modified after SO meeting in Cambridge to not include start or stop. reading_frame A nucleic acid sequence that when read as sequential triplets, has the potential of encoding a sequential string of amino acids. It need not contain the start or stop codon. SGD:rb http://en.wikipedia.org/wiki/Reading_frame wiki An ordered and oriented set of scaffolds based on somewhat weaker sets of inferential evidence such as one set of mate pair reads together with supporting evidence from ESTs or location of markers from SNP or microsatellite maps, or cytogenetic localization of contained markers. pseudochromosome sequence superscaffold SO:0000719 ultracontig An ordered and oriented set of scaffolds based on somewhat weaker sets of inferential evidence such as one set of mate pair reads together with supporting evidence from ESTs or location of markers from SNP or microsatellite maps, or cytogenetic localization of contained markers. FB:WG A region of a DNA molecule where transfer is initiated during the process of conjugation or mobilization. http://en.wikipedia.org/wiki/Origin_of_transfer INSDC_feature:oriT origin of transfer sequence SO:0000724 oriT A region of a DNA molecule where transfer is initiated during the process of conjugation or mobilization. http://www.insdc.org/files/feature_table.html http://en.wikipedia.org/wiki/Origin_of_transfer wiki The transit_peptide is a short region at the N-terminus of the peptide that directs the protein to an organelle (chloroplast, mitochondrion, microbody or cyanelle). 
BS:00055 INSDC_feature:transit_peptide transit peptide sequence signal transit SO:0000725 Added to bring SO inline with the EMBL, DDBJ, GenBank feature table. Old definition before biosapiens: The coding sequence for an N-terminal domain of a nuclear-encoded organellar protein. This domain is involved in post translational import of the protein into the organelle. transit_peptide The transit_peptide is a short region at the N-terminus of the peptide that directs the protein to an organelle (chloroplast, mitochondrion, microbody or cyanelle). http://www.insdc.org/files/feature_table.html transit uniprot:feature_type A regulatory region where transcription factor binding sites are clustered to regulate various aspects of transcription activities. (CRMs can be located a few kb to hundreds of kb upstream of the core promoter, in the coding sequence, within introns, or in the untranslated regions (UTR) sequences, and even on a different chromosome). A single gene can be regulated by multiple CRMs to give precise control of its spatial and temporal expression. CRMs function as nodes in large, intertwined regulatory network. CRM DNA accessibility is subject to regulation by dbTFs and transcription co-TFs. CRM TF module cis regulatory module transcription factor module sequence SO:0000727 Requested by Stephen Grossmann Dec 2004. Changed relationship from has_part SO:0000235 TF_binding site to TF_binding_site is part_of SO:0000727 CRM in response to requests from GREEKC initiative in Aug 2020. Removed 3' from definition because 5' UTRs are included as well, notified by Colin Logie of GREEKC. Nov 9 2020. DS Updated name from 'CRM' to 'cis_regulatory_module' on 08 Feb 2021. See GitHub Issue #526. DS Added final sentence to definition as part of GREEKC Feb 16, 2021. See GitHub Issue #534. cis_regulatory_module A regulatory region where transcription factor binding sites are clustered to regulate various aspects of transcription activities. 
(CRMs can be located a few kb to hundreds of kb upstream of the core promoter, in the coding sequence, within introns, or in the untranslated regions (UTR) sequences, and even on a different chromosome). A single gene can be regulated by multiple CRMs to give precise control of its spatial and temporal expression. CRMs function as nodes in large, intertwined regulatory network. CRM DNA accessibility is subject to regulation by dbTFs and transcription co-TFs. PMID:19660565 SO:SG A gap in the sequence of known length. The unknown bases are filled in with N's. INSDC_feature:gap INSDC_feature:assembly_gap sequence SO:0000730 gap A gap in the sequence of known length. The unknown bases are filled in with N's. SO:ke A region that is involved in the regulation of transcription of a group of regulated genes. SO:0001055 gene group regulatory region sequence SO:0000752 Merged into transcriptional_cis_regulatory_region (SO:0001055) on 11 Feb 2021 as part of GREEKC reducing redundancy as we prepare to submit several terms to Ensembl. See GitHub Issue #529. gene_group_regulatory_region true The region of sequence that has been inserted and is being propagated by the clone. clone insert sequence SO:0000753 clone_insert The region of sequence that has been inserted and is being propagated by the clone. SO:ke A non functional descendant of an rRNA. INSDC_feature:rRNA INSDC_qualifier:pseudo pseudogenic rRNA sequence SO:0000777 Added Jan 2006 to allow the annotation of the pseudogenic rRNA by flybase. Non-functional is defined as its transcription is prevented due to one or more mutations. pseudogenic_rRNA A non functional descendant of an rRNA. SO:ke A non functional descendent of a tRNA. INSDC_feature:tRNA INSDC_qualifier:pseudo pseudogenic tRNA sequence SO:0000778 Added Jan 2006 to allow the annotation of the pseudogenic tRNA by flybase. Non-functional is defined as its transcription is prevented due to one or more mutations.
pseudogenic_tRNA A non functional descendent of a tRNA. SO:ke A region of a chromosome. chromosomal region chromosomal_region chromosome part sequence SO:0000830 This is a manufactured term that allows the parts of a chromosome to have an is_a path to the root. chromosome_part A region of a chromosome. SO:ke A region of a gene. gene member region sequence SO:0000831 A manufactured term used to allow the parts of a gene to have an is_a path to the root. gene_member_region A region of a gene. SO:ke A region of a transcript. transcript region sequence SO:0000833 This term was added to provide a grouping term for the region parts of transcript, thus giving them an is_a path back to the root. transcript_region A region of a transcript. SO:ke A region of a mature transcript. mature transcript region sequence SO:0000834 A manufactured term to collect together the parts of a mature transcript and give them an is_a path to the root. mature_transcript_region A region of a mature transcript. SO:ke A part of a primary transcript. primary transcript region sequence SO:0000835 This term was added to provide a grouping term for the region parts of primary_transcript, thus giving them an is_a path back to the root. primary_transcript_region A part of a primary transcript. SO:ke A region of an mRNA. mRNA region sequence SO:0000836 This term was added to provide a grouping term for the region parts of mRNA, thus giving them an is_a path back to the root. mRNA_region A region of an mRNA. SO:cb A region of UTR. UTR region sequence SO:0000837 A region of UTR. This term is a grouping term to allow the parts of UTR to have an is_a path to the root. UTR_region A region of UTR. SO:ke Biological sequence region that can be assigned to a specific subsequence of a polypeptide. BS:00124 BS:00331 region site sequence positional positional polypeptide feature region or site annotation SO:0000839 Added to allow the polypeptide regions to have is_a paths back to the root.
polypeptide_region Biological sequence region that can be assigned to a specific subsequence of a polypeptide. SO:GAR SO:ke region uniprot:feature_type site uniprot:feature_type A region within an intron. spliceosomal intron region sequence SO:0000841 A term added to allow the parts of introns to have is_a paths to the root. spliceosomal_intron_region A region within an intron. SO:ke A region of a gene that has a specific function. gene component region sequence SO:0000842 gene_component_region A region of a CDS. CDS region sequence SO:0000851 CDS_region A region of a CDS. SO:cb A region of an exon. exon region sequence SO:0000852 exon_region A region of an exon. RSC:cb Cytosolic 16S rRNA is an RNA component of the small subunit of cytosolic ribosomes in prokaryotes. http://en.wikipedia.org/wiki/16S_ribosomal_RNA cytosolic 16S SSU RNA cytosolic 16S ribosomal RNA cytosolic rRNA 16S sequence cytosolic 16S rRNA SO:0001000 Renamed to cytosolic_16S_rRNA from rRNA_16S on 10 June 2021 as per restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Request from EBI. See GitHub Issue #493. cytosolic_16S_rRNA Cytosolic 16S rRNA is an RNA component of the small subunit of cytosolic ribosomes in prokaryotes. SO:ke http://en.wikipedia.org/wiki/16S_ribosomal_RNA wiki Cytosolic 23S rRNA is an RNA component of the large subunit of cytosolic ribosomes in prokaryotes. cytosolic 23S LSU rRNA cytosolic 23S rRNA cytosolic rRNA 23S sequence cytosolic 23S ribosomal RNA SO:0001001 Renamed from rRNA_23S to cytosolic_23S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. cytosolic_23S_rRNA Cytosolic 23S rRNA is an RNA component of the large subunit of cytosolic ribosomes in prokaryotes. SO:ke Cytosolic 25S rRNA is an RNA component of the large subunit of cytosolic ribosomes in most eukaryotes.
cytosolic 25S LSU rRNA cytosolic 25S rRNA cytosolic 25S ribosomal RNA cytosolic rRNA 25S sequence SO:0001002 Renamed from rRNA_25S to cytosolic_25S_rRNA on 27 May 2021 with the restructuring of rRNA child terms. Updated definition to be consistent with format of other rRNA definitions. Requested by EBI. See GitHub Issue #493. cytosolic_25S_rRNA Cytosolic 25S rRNA is an RNA component of the large subunit of cytosolic ribosomes in most eukaryotes. PMID:15493135 PMID:2100998 RSC:cb A variation that increases or decreases the copy number of a given region. http://en.wikipedia.org/wiki/Copy_number_variation CNP CNV copy number polymorphism copy number variation sequence SO:0001019 copy_number_variation A variation that increases or decreases the copy number of a given region. SO:ke http://en.wikipedia.org/wiki/Copy_number_variation wiki A nucleotide region with either intra-genome or intracellular mobility, of varying length, which often carries the information necessary for transfer and recombination with the host genome. http://en.wikipedia.org/wiki/Mobile_genetic_element INSDC_feature:mobile_element MGE mobile genetic element sequence SO:0001037 mobile_genetic_element A nucleotide region with either intra-genome or intracellular mobility, of varying length, which often carries the information necessary for transfer and recombination with the host genome. PMID:14681355 http://en.wikipedia.org/wiki/Mobile_genetic_element wiki An MGE that is integrated into the host chromosome. integrated mobile genetic element sequence SO:0001039 integrated_mobile_genetic_element An MGE that is integrated into the host chromosome. SO:ke A regulatory_region that modulates the transcription of a gene or genes.
INSDC_feature:regulatory INSDC_qualifier:transcriptional_cis_regulatory_region transcription-control region transcriptional cis regulatory region sequence SO:0001055 Previous parent term transcription_regulatory_region (SO:0001067) has been merged with this term on 11 Feb 2021 as part of the GREEKC consortium. See GitHub Issue #527. transcriptional_cis_regulatory_region A regulatory_region that modulates the transcription of a gene or genes. PMID:9679020 SO:regcreative A regulatory_region that modulates splicing. splicing regulatory region sequence SO:0001056 Moved from transcription_regulatory_region (SO:0001679) to transcriptional_cis_regulatory_region (SO:0001055) by Dave Sant on Feb 11, 2021 when transcription_regulatory_region was merged into transcriptional_cis_regulatory_region to be consistent with GO and reduce redundancy as part of the GREEKC consortium. See GitHub Issue #527. splicing_regulatory_region A regulatory_region that modulates splicing. SO:ke A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence. SO:1000004 SO:1000007 INSDC_feature:misc_feature INSDC_feature:variation INSDC_note:sequence_alteration sequence alteration partially characterised change in DNA sequence partially_characterised_change_in_DNA_sequence uncharacterised_change_in_nucleotide_sequence sequence sequence variation SO:0001059 Merged with partially characterized change in nucleotide sequence. sequence_alteration A sequence_alteration is a sequence_feature whose extent is the deviation from another sequence. SO:ke An immature_peptide_region is the extent of the peptide after it has been translated and before any processing occurs. BS:00129 immature peptide region sequence SO:0001063 Range. immature_peptide_region An immature_peptide_region is the extent of the peptide after it has been translated and before any processing occurs. EBIBS:GAR The maximal intersection of exon and UTR. 
noncoding region of exon sequence SO:0001214 An exon either containing but not starting with a start codon or containing but not ending with a stop codon will be partially coding and partially non coding. noncoding_region_of_exon The maximal intersection of exon and UTR. SO:ke The region of an exon that encodes for protein sequence. coding region of exon sequence SO:0001215 An exon containing either a start or stop codon will be partially coding and partially non coding. coding_region_of_exon The region of an exon that encodes for protein sequence. SO:ke A region containing at least one unique origin of replication and a unique termination site. http://en.wikipedia.org/wiki/Replicon_(genetics) sequence SO:0001235 replicon A region containing at least one unique origin of replication and a unique termination site. ISBN:0716719207 http://en.wikipedia.org/wiki/Replicon_(genetics) wiki A base is a sequence feature that corresponds to a single unit of a nucleotide polymer. http://en.wikipedia.org/wiki/Nucleobase sequence SO:0001236 base A base is a sequence feature that corresponds to a single unit of a nucleotide polymer. SO:ke http://en.wikipedia.org/wiki/Nucleobase wiki A region of the genome of known length that is composed by ordering and aligning two or more different regions. http://en.wikipedia.org/wiki/Genome_assembly#Genome_assembly sequence SO:0001248 assembly A region of the genome of known length that is composed by ordering and aligning two or more different regions. SO:ke http://en.wikipedia.org/wiki/Genome_assembly#Genome_assembly wiki A region which is intended for use in an experiment. biomaterial region sequence SO:0001409 biomaterial_region A region which is intended for use in an experiment. SO:cb A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer. 
experimental output artefact experimental_output_artefact sequence analysis feature SO:0001410 experimental_feature A region which is the result of some arbitrary experimental procedure. The procedure may be carried out with biological material or inside a computer. SO:cb A region defined by its disposition to be involved in a biological process. INSDC_misc_feature INSDC_note:biological_region biological region sequence SO:0001411 biological_region A region defined by its disposition to be involved in a biological process. SO:cb A DNA region within which self-interaction occurs more often than expected by chance because of DNA-looping. topologically defined region sequence SO:0001412 topologically_defined_region A DNA region within which self-interaction occurs more often than expected by chance because of DNA-looping. PMID:32782014 SO:cb Intronic 2 bp region bordering exon. A splice_site that adjacent_to exon and overlaps intron. cis splice site sequence SO:0001419 cis_splice_site Intronic 2 bp region bordering exon. A splice_site that adjacent_to exon and overlaps intron. SO:cjm SO:ke Primary transcript region bordering trans-splice junction. trans splice site sequence SO:0001420 trans_splice_site Primary transcript region bordering trans-splice junction. SO:ke SNVs are single nucleotide positions in genomic DNA at which different sequence alternatives exist. kareneilbeck 2009-10-08T11:37:49Z single nucleotide variant sequence SO:0001483 SNV SNVs are single nucleotide positions in genomic DNA at which different sequence alternatives exist. SO:bm A region of peptide sequence used to target the polypeptide molecule to a specific organelle. kareneilbeck 2010-03-11T02:15:05Z peptide localization signal sequence localization signal SO:0001527 peptide_localization_signal A region of peptide sequence used to target the polypeptide molecule to a specific organelle. 
SO:ke A kind of ribosome entry site, specific to Eukaryotic organisms that overlaps part of both 5' UTR and CDS sequence. kareneilbeck 2010-06-07T03:12:20Z http://en.wikipedia.org/wiki/Kozak_consensus_sequence kozak consensus kozak consensus sequence kozak sequence sequence SO:0001647 kozak_sequence A kind of ribosome entry site, specific to Eukaryotic organisms that overlaps part of both 5' UTR and CDS sequence. SO:ke http://en.wikipedia.org/wiki/Kozak_consensus_sequence wikipedia A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues. kareneilbeck 2010-08-03T12:26:05Z sequence nucleotide to protein binding site SO:0001654 nucleotide_to_protein_binding_site A binding site that, in the nucleotide molecule, interacts selectively and non-covalently with polypeptide residues. SO:ke A regulatory region that is involved in the control of the process of transcription. kareneilbeck 2010-10-12T03:49:35Z transcription regulatory region sequence SO:0001679 Obsoleted by David Sant on 11 Feb 2021 when it was merged with transcriptional_cis_regulatory_region (SO:0001055) to reduce redundancy and be consistent with Gene Ontology. See GitHub Issue #527. transcription_regulatory_region true A regulatory region that is involved in the control of the process of transcription. SO:ke A sequence motif is a nucleotide or amino-acid sequence pattern that may have biological significance. kareneilbeck 2010-10-14T04:13:22Z http://en.wikipedia.org/wiki/Sequence_motif sequence sequence motif SO:0001683 sequence_motif A sequence motif is a nucleotide or amino-acid sequence pattern that may have biological significance. http://en.wikipedia.org/wiki/Sequence_motif http://en.wikipedia.org/wiki/Sequence_motif wikipedia A biological DNA region implicated in epigenomic changes caused by mechanisms other than changes in the underlying DNA sequence. 
This includes, nucleosomal histone post-translational modifications, nucleosome depletion to render DNA accessible and post-replicational base modifications such as cytosine modification. kareneilbeck 2010-03-27T12:02:29Z sequence epigenetically modified region SO:0001720 Moved from is_a biological_region (SO:0001411) to is_a regulatory_region (SO:0005836) on 11 Feb 2021. GREEKC members pointed out that this would be a more appropriate location. See GitHub Issue #530. 11 Feb 2021 updated definition along with addition of epigenomically_modified_region (SO:0002332). Epigenetically modified region is now not inherited while epigenomically modified region is not annotated as inherited. See GitHub Issue #532 and issue #534. epigenetically_modified_region A biological DNA region implicated in epigenomic changes caused by mechanisms other than changes in the underlying DNA sequence. This includes, nucleosomal histone post-translational modifications, nucleosome depletion to render DNA accessible and post-replicational base modifications such as cytosine modification. SO:ke http://en.wikipedia.org/wiki/Epigenetics An assembly region that has been sequenced from both ends resulting in a read_pair (mate_pair). kareneilbeck 2011-04-14T01:48:20Z paired end fragment sequence SO:0001790 paired_end_fragment An assembly region that has been sequenced from both ends resulting in a read_pair (mate_pair). SO:ke A region of sequence that is involved in the control of a biological process. INSDC_feature:regulatory http://en.wikipedia.org/wiki/Regulatory_region INSDC_qualifier:other regulatory region sequence SO:0005836 regulatory_region A region of sequence that is involved in the control of a biological process. SO:ke http://en.wikipedia.org/wiki/Regulatory_region wiki A collection of related genes. gene group sequence SO:0005855 gene_group A collection of related genes. SO:ma The cleaved_peptide_region is the region of a peptide sequence that is cleaved during maturation. 
cleaved peptide region sequence SO:0100011 Range. cleaved_peptide_region The cleaved_peptide_region is the region of a peptide sequence that is cleaved during maturation. EBIBS:GAR A sequence alteration where the length of the change in the variant is the same as that of the reference. loinc:LA6690-7 sequence SO:1000002 substitution A sequence alteration where the length of the change in the variant is the same as that of the reference. SO:ke loinc:LA6690-7 Substitution When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change. complex substitution sequence SO:1000005 complex_substitution When no simple or well defined DNA mutation event describes the observed DNA change, the keyword "complex" should be used. Usually there are multiple equally plausible explanations for the change. EBI:www.ebi.ac.uk/mutations/recommendations/mutevent.html A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence. http://en.wikipedia.org/wiki/Point_mutation point mutation sequence SO:1000008 point_mutation A single nucleotide change which has occurred at the same position of a corresponding nucleotide in a reference sequence. SO:immuno_workshop http://en.wikipedia.org/wiki/Point_mutation wiki A continuous nucleotide sequence is inverted in the same position. loinc:LA6689-9 inversion sequence SO:1000036 inversion A continuous nucleotide sequence is inverted in the same position. EBI:www.ebi.ac.uk/mutations/recommendations/mutevent.html loinc:LA6689-9 Inversion inversion http://www.ncbi.nlm.nih.gov/dbvar/ A set of units of gene expression directly regulated by a common set of one or more common regulatory gene products. http://en.wikipedia.org/wiki/Regulon sequence SO:1001284 Definition updated with Mejia-Almonte et.al PMID:32665585 on Aug 5, 2020. 
Added relationship has_part SO:0002300 regulon A set of units of gene expression directly regulated by a common set of one or more common regulatory gene products. ISBN:0198506732 PMID:32665585 http://en.wikipedia.org/wiki/Regulon wiki The sequence referred to by an entry in a databank such as GenBank or SwissProt. databank entry sequence accession SO:2000061 databank_entry The sequence referred to by an entry in a databank such as GenBank or SwissProt. SO:ke