:py:mod:`pyranges.genomicfeatures`
==================================

.. py:module:: pyranges.genomicfeatures


Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   pyranges.genomicfeatures.GenomicFeaturesMethods


Functions
~~~~~~~~~

.. autoapisummary::

   pyranges.genomicfeatures.genome_bounds
   pyranges.genomicfeatures.tile_genome


.. py:class:: GenomicFeaturesMethods(pr)

   Namespace for methods using feature information.

   Accessed through `gr.features`.

   .. py:attribute:: pr

   .. py:method:: tss()

      Return the transcription start sites.

      Returns the 5' end of every interval with feature "transcript".

      .. seealso::

         :obj:`pyranges.genomicfeatures.GenomicFeaturesMethods.tes`
             return the transcription end sites

      .. rubric:: Examples

      >>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
      >>> gr
      +--------------+------------+--------------+-----------+-----------+--------------+
      | Chromosome   | Source     | Feature      | Start     | End       | Strand       |
      | (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
      |--------------+------------+--------------+-----------+-----------+--------------|
      | 1            | havana     | gene         | 11868     | 14409     | +            |
      | 1            | havana     | transcript   | 11868     | 14409     | +            |
      | 1            | havana     | exon         | 11868     | 12227     | +            |
      | 1            | havana     | exon         | 12612     | 12721     | +            |
      | ...          | ...        | ...          | ...       | ...       | ...          |
      | 1            | havana     | gene         | 1173055   | 1179555   | -            |
      | 1            | havana     | transcript   | 1173055   | 1179555   | -            |
      | 1            | havana     | exon         | 1179364   | 1179555   | -            |
      | 1            | havana     | exon         | 1173055   | 1176396   | -            |
      +--------------+------------+--------------+-----------+-----------+--------------+
      Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

      >>> gr.features.tss()
      +--------------+------------+------------+-----------+-----------+--------------+
      | Chromosome   | Source     | Feature    | Start     | End       | Strand       |
      | (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
      |--------------+------------+------------+-----------+-----------+--------------|
      | 1            | havana     | tss        | 11868     | 11869     | +            |
      | 1            | havana     | tss        | 12009     | 12010     | +            |
      | 1            | havana     | tss        | 29553     | 29554     | +            |
      | 1            | havana     | tss        | 30266     | 30267     | +            |
      | ...          | ...        | ...        | ...       | ...       | ...          |
      | 1            | havana     | tss        | 1092812   | 1092813   | -            |
      | 1            | havana     | tss        | 1116086   | 1116087   | -            |
      | 1            | havana     | tss        | 1116088   | 1116089   | -            |
      | 1            | havana     | tss        | 1179554   | 1179555   | -            |
      +--------------+------------+------------+-----------+-----------+--------------+
      Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

   .. py:method:: tes(slack=0)

      Return the transcription end sites.

      Returns the 3' end of every interval with feature "transcript".

      .. seealso::

         :obj:`pyranges.genomicfeatures.GenomicFeaturesMethods.tss`
             return the transcription start sites

      .. rubric:: Examples

      >>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
      >>> gr
      +--------------+------------+--------------+-----------+-----------+--------------+
      | Chromosome   | Source     | Feature      | Start     | End       | Strand       |
      | (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
      |--------------+------------+--------------+-----------+-----------+--------------|
      | 1            | havana     | gene         | 11868     | 14409     | +            |
      | 1            | havana     | transcript   | 11868     | 14409     | +            |
      | 1            | havana     | exon         | 11868     | 12227     | +            |
      | 1            | havana     | exon         | 12612     | 12721     | +            |
      | ...          | ...        | ...          | ...       | ...       | ...          |
      | 1            | havana     | gene         | 1173055   | 1179555   | -            |
      | 1            | havana     | transcript   | 1173055   | 1179555   | -            |
      | 1            | havana     | exon         | 1179364   | 1179555   | -            |
      | 1            | havana     | exon         | 1173055   | 1176396   | -            |
      +--------------+------------+--------------+-----------+-----------+--------------+
      Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

      >>> gr.features.tes()
      +--------------+------------+------------+-----------+-----------+--------------+
      | Chromosome   | Source     | Feature    | Start     | End       | Strand       |
      | (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
      |--------------+------------+------------+-----------+-----------+--------------|
      | 1            | havana     | tes        | 14408     | 14409     | +            |
      | 1            | havana     | tes        | 13669     | 13670     | +            |
      | 1            | havana     | tes        | 31096     | 31097     | +            |
      | 1            | havana     | tes        | 31108     | 31109     | +            |
      | ...          | ...        | ...        | ...       | ...       | ...          |
      | 1            | havana     | tes        | 1090405   | 1090406   | -            |
      | 1            | havana     | tes        | 1091045   | 1091046   | -            |
      | 1            | havana     | tes        | 1091499   | 1091500   | -            |
      | 1            | havana     | tes        | 1173055   | 1173056   | -            |
      +--------------+------------+------------+-----------+-----------+--------------+
      Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

   .. py:method:: introns(by='gene', nb_cpu=1)

      Return the introns.

      :param by: Whether to find introns per gene or per transcript.
      :type by: str, {"gene", "transcript"}, default "gene"
      :param nb_cpu: Number of CPUs to use. At most one CPU can be used per chromosome or chromosome/strand tuple.
                     Only leads to speedups on large datasets.
      :type nb_cpu: int, default 1

      .. seealso::

         :obj:`pyranges.genomicfeatures.GenomicFeaturesMethods.tss`
             return the transcription start sites

      .. rubric:: Examples

      >>> gr = pr.data.ensembl_gtf()[["Feature", "gene_id", "transcript_id"]]
      >>> gr
      +--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
      | Chromosome   | Feature      | Start     | End       | Strand       | gene_id         | transcript_id   |
      | (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
      |--------------+--------------+-----------+-----------+--------------+-----------------+-----------------|
      | 1            | gene         | 11868     | 14409     | +            | ENSG00000223972 | nan             |
      | 1            | transcript   | 11868     | 14409     | +            | ENSG00000223972 | ENST00000456328 |
      | 1            | exon         | 11868     | 12227     | +            | ENSG00000223972 | ENST00000456328 |
      | 1            | exon         | 12612     | 12721     | +            | ENSG00000223972 | ENST00000456328 |
      | ...          | ...          | ...       | ...       | ...          | ...             | ...             |
      | 1            | gene         | 1173055   | 1179555   | -            | ENSG00000205231 | nan             |
      | 1            | transcript   | 1173055   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
      | 1            | exon         | 1179364   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
      | 1            | exon         | 1173055   | 1176396   | -            | ENSG00000205231 | ENST00000379317 |
      +--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
      Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

      >>> gr.features.introns(by="gene")
      +--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
      | Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
      | (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
      |--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
      | 1            | intron     | 1173926   | 1174265   | +            | ENSG00000162571 | nan             |
      | 1            | intron     | 1174321   | 1174423   | +            | ENSG00000162571 | nan             |
      | 1            | intron     | 1174489   | 1174520   | +            | ENSG00000162571 | nan             |
      | 1            | intron     | 1175034   | 1179188   | +            | ENSG00000162571 | nan             |
      | ...          | ...        | ...       | ...       | ...          | ...             | ...             |
      | 1            | intron     | 874591    | 875046    | -            | ENSG00000283040 | nan             |
      | 1            | intron     | 875155    | 875525    | -            | ENSG00000283040 | nan             |
      | 1            | intron     | 875625    | 876526    | -            | ENSG00000283040 | nan             |
      | 1            | intron     | 876611    | 876754    | -            | ENSG00000283040 | nan             |
      +--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
      Stranded PyRanges object has 311 rows and 7 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.

      >>> gr.features.introns(by="transcript")
      +--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
      | Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
      | (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
      |--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
      | 1            | intron     | 818202    | 818722    | +            | ENSG00000177757 | ENST00000326734 |
      | 1            | intron     | 960800    | 961292    | +            | ENSG00000187961 | ENST00000338591 |
      | 1            | intron     | 961552    | 961628    | +            | ENSG00000187961 | ENST00000338591 |
      | 1            | intron     | 961750    | 961825    | +            | ENSG00000187961 | ENST00000338591 |
      | ...          | ...        | ...       | ...       | ...          | ...             | ...             |
      | 1            | intron     | 732207    | 732980    | -            | ENSG00000230021 | ENST00000648019 |
      | 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000655252 |
      | 1            | intron     | 165942    | 167958    | -            | ENSG00000241860 | ENST00000662089 |
      | 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000662089 |
      +--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
      Stranded PyRanges object has 1,043 rows and 7 columns from 1 chromosomes.
      For printing, the PyRanges was sorted on Chromosome and Strand.


.. py:function:: genome_bounds(gr, chromsizes, clip=False, only_right=False)

   Remove or clip intervals outside of genome bounds.


   :param gr: Input intervals.
   :type gr: PyRanges
   :param chromsizes: Dict or PyRanges describing the lengths of the chromosomes.
                      A pyfaidx.Fasta object is also accepted, since it conveniently provides chromosome lengths.
   :type chromsizes: dict or PyRanges or pyfaidx.Fasta
   :param clip: If True, return the portions of intervals that lie within bounds, instead of dropping
                intervals entirely if they are even partially out of bounds.
   :type clip: bool, default False
   :param only_right: If True, remove or clip only intervals that are out of bounds on the right, and do not
                      alter those that are out of bounds on the left (whose Start is < 0).
   :type only_right: bool, default False

   .. rubric:: Examples

   >>> d = {"Chromosome": [1, 1, 3], "Start": [1, 249250600, 5], "End": [2, 249250640, 7]}
   >>> gr = pr.from_dict(d)
   >>> gr
   +--------------+-----------+-----------+
   | Chromosome   | Start     | End       |
   | (category)   | (int64)   | (int64)   |
   |--------------+-----------+-----------|
   | 1            | 1         | 2         |
   | 1            | 249250600 | 249250640 |
   | 3            | 5         | 7         |
   +--------------+-----------+-----------+
   Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
   For printing, the PyRanges was sorted on Chromosome.

   >>> chromsizes = {"1": 249250621, "3": 500}
   >>> chromsizes
   {'1': 249250621, '3': 500}

   >>> pr.gf.genome_bounds(gr, chromsizes)
   +--------------+-----------+-----------+
   | Chromosome   | Start     | End       |
   | (category)   | (int64)   | (int64)   |
   |--------------+-----------+-----------|
   | 1            | 1         | 2         |
   | 3            | 5         | 7         |
   +--------------+-----------+-----------+
   Unstranded PyRanges object has 2 rows and 3 columns from 2 chromosomes.
   For printing, the PyRanges was sorted on Chromosome.

   >>> pr.gf.genome_bounds(gr, chromsizes, clip=True)
   +--------------+-----------+-----------+
   | Chromosome   | Start     | End       |
   | (category)   | (int64)   | (int64)   |
   |--------------+-----------+-----------|
   | 1            | 1         | 2         |
   | 1            | 249250600 | 249250621 |
   | 3            | 5         | 7         |
   +--------------+-----------+-----------+
   Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
   For printing, the PyRanges was sorted on Chromosome.

   >>> del chromsizes['3']
   >>> chromsizes
   {'1': 249250621}

   >>> pr.gf.genome_bounds(gr, chromsizes)
   Traceback (most recent call last):
   ...
   KeyError: '3'


.. py:function:: tile_genome(genome, tile_size, tile_last=False)

   Create a tiled genome.

   :param genome: Dict or PyRanges describing the lengths of the chromosomes.
   :type genome: dict or PyRanges
   :param tile_size: Length of the tiles.
   :type tile_size: int
   :param tile_last: Use genome length as end of last tile.
   :type tile_last: bool, default False

   .. seealso::

      :obj:`pyranges.PyRanges.tile`
          split intervals into adjacent non-overlapping tiles.

   .. rubric:: Examples

   >>> chromsizes = pr.data.chromsizes()
   >>> chromsizes
   +--------------+-----------+-----------+
   | Chromosome   | Start     | End       |
   | (category)   | (int64)   | (int64)   |
   |--------------+-----------+-----------|
   | chr1         | 0         | 249250621 |
   | chr2         | 0         | 243199373 |
   | chr3         | 0         | 198022430 |
   | chr4         | 0         | 191154276 |
   | ...          | ...       | ...       |
   | chr22        | 0         | 51304566  |
   | chrM         | 0         | 16571     |
   | chrX         | 0         | 155270560 |
   | chrY         | 0         | 59373566  |
   +--------------+-----------+-----------+
   Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes.
   For printing, the PyRanges was sorted on Chromosome.

   >>> pr.gf.tile_genome(chromsizes, int(1e6))
   +--------------+-----------+-----------+
   | Chromosome   | Start     | End       |
   | (category)   | (int64)   | (int64)   |
   |--------------+-----------+-----------|
   | chr1         | 0         | 1000000   |
   | chr1         | 1000000   | 2000000   |
   | chr1         | 2000000   | 3000000   |
   | chr1         | 3000000   | 4000000   |
   | ...          | ...       | ...       |
   | chrY         | 56000000  | 57000000  |
   | chrY         | 57000000  | 58000000  |
   | chrY         | 58000000  | 59000000  |
   | chrY         | 59000000  | 59373566  |
   +--------------+-----------+-----------+
   Unstranded PyRanges object has 3,114 rows and 3 columns from 25 chromosomes.
   For printing, the PyRanges was sorted on Chromosome.
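
The tiling shown in the ``tile_genome`` example output can be sketched in plain Python. This is an illustration only, not the pyranges implementation: the helper names ``tile_chromosome`` and ``tile_lengths`` are hypothetical, the input is assumed to be a plain dict of chromosome lengths, and the sketch reproduces only the behaviour visible in the example above, where the final tile of each chromosome ends at the chromosome length.

```python
from typing import Dict, Iterator, List, Tuple


def tile_chromosome(length: int, tile_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) tiles covering [0, length).

    Hypothetical helper: the last tile is shortened so that it ends
    at the chromosome length, as in the example output above.
    """
    if tile_size <= 0:
        raise ValueError("tile_size must be a positive integer")
    for start in range(0, length, tile_size):
        yield start, min(start + tile_size, length)


def tile_lengths(chromsizes: Dict[str, int], tile_size: int) -> List[Tuple[str, int, int]]:
    """Tile every chromosome in a {name: length} dict (hypothetical helper)."""
    return [
        (chrom, start, end)
        for chrom, length in chromsizes.items()
        for start, end in tile_chromosome(length, tile_size)
    ]


# Lengths taken from the chromsizes table in the example above.
tiles = tile_lengths({"chrY": 59373566, "chrM": 16571}, int(1e6))
print(tiles[59])  # last chrY tile: ('chrY', 59000000, 59373566)
print(tiles[-1])  # chrM is shorter than one tile: ('chrM', 0, 16571)
```

For real data, ``pr.gf.tile_genome`` itself should be preferred, since it returns a PyRanges object and accepts either a dict or a PyRanges of chromosome sizes.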