pyranges.genomicfeatures
Module Contents
Classes
GenomicFeaturesMethods – Namespace for methods using feature information.
Functions
genome_bounds – Remove or clip intervals outside of genome bounds.
tile_genome – Create a tiled genome.
- class pyranges.genomicfeatures.GenomicFeaturesMethods(pr)
Namespace for methods using feature information.
Accessed through gr.features.
- pr
- tss()
Return the transcription start sites.
Returns the 5’ end of every interval with feature “transcript”.
See also
pyranges.genomicfeatures.GenomicFeaturesMethods.tes
return the transcription end sites
Examples
>>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
>>> gr
+--------------+------------+--------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature      | Start     | End       | Strand       |
| (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
|--------------+------------+--------------+-----------+-----------+--------------|
| 1            | havana     | gene         | 11868     | 14409     | +            |
| 1            | havana     | transcript   | 11868     | 14409     | +            |
| 1            | havana     | exon         | 11868     | 12227     | +            |
| 1            | havana     | exon         | 12612     | 12721     | +            |
| ...          | ...        | ...          | ...       | ...       | ...          |
| 1            | havana     | gene         | 1173055   | 1179555   | -            |
| 1            | havana     | transcript   | 1173055   | 1179555   | -            |
| 1            | havana     | exon         | 1179364   | 1179555   | -            |
| 1            | havana     | exon         | 1173055   | 1176396   | -            |
+--------------+------------+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

>>> gr.features.tss()
+--------------+------------+------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature    | Start     | End       | Strand       |
| (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
|--------------+------------+------------+-----------+-----------+--------------|
| 1            | havana     | tss        | 11868     | 11869     | +            |
| 1            | havana     | tss        | 12009     | 12010     | +            |
| 1            | havana     | tss        | 29553     | 29554     | +            |
| 1            | havana     | tss        | 30266     | 30267     | +            |
| ...          | ...        | ...        | ...       | ...       | ...          |
| 1            | havana     | tss        | 1092812   | 1092813   | -            |
| 1            | havana     | tss        | 1116086   | 1116087   | -            |
| 1            | havana     | tss        | 1116088   | 1116089   | -            |
| 1            | havana     | tss        | 1179554   | 1179555   | -            |
+--------------+------------+------------+-----------+-----------+--------------+
Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
- tes(slack=0)
Return the transcription end sites.
Returns the 3’ end of every interval with feature “transcript”.
See also
pyranges.genomicfeatures.GenomicFeaturesMethods.tss
return the transcription start sites
Examples
>>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
>>> gr
+--------------+------------+--------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature      | Start     | End       | Strand       |
| (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
|--------------+------------+--------------+-----------+-----------+--------------|
| 1            | havana     | gene         | 11868     | 14409     | +            |
| 1            | havana     | transcript   | 11868     | 14409     | +            |
| 1            | havana     | exon         | 11868     | 12227     | +            |
| 1            | havana     | exon         | 12612     | 12721     | +            |
| ...          | ...        | ...          | ...       | ...       | ...          |
| 1            | havana     | gene         | 1173055   | 1179555   | -            |
| 1            | havana     | transcript   | 1173055   | 1179555   | -            |
| 1            | havana     | exon         | 1179364   | 1179555   | -            |
| 1            | havana     | exon         | 1173055   | 1176396   | -            |
+--------------+------------+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

>>> gr.features.tes()
+--------------+------------+------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature    | Start     | End       | Strand       |
| (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
|--------------+------------+------------+-----------+-----------+--------------|
| 1            | havana     | tes        | 14408     | 14409     | +            |
| 1            | havana     | tes        | 13669     | 13670     | +            |
| 1            | havana     | tes        | 31096     | 31097     | +            |
| 1            | havana     | tes        | 31108     | 31109     | +            |
| ...          | ...        | ...        | ...       | ...       | ...          |
| 1            | havana     | tes        | 1090405   | 1090406   | -            |
| 1            | havana     | tes        | 1091045   | 1091046   | -            |
| 1            | havana     | tes        | 1091499   | 1091500   | -            |
| 1            | havana     | tes        | 1173055   | 1173056   | -            |
+--------------+------------+------------+-----------+-----------+--------------+
Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
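The coordinates in the tss() and tes() examples are consistent with taking a one-base interval at the strand-aware 5’ or 3’ end of each transcript. The following is a minimal pure-Python sketch of that rule, not the library's implementation. The helper names five_prime_site and three_prime_site are hypothetical, and half-open (Start, End) coordinates are assumed.

```python
def five_prime_site(start, end, strand):
    """One-base interval at the 5' end of a half-open (start, end) interval."""
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost base.
        return (end - 1, end)
    return (start, start + 1)


def three_prime_site(start, end, strand):
    """One-base interval at the 3' end of a half-open (start, end) interval."""
    if strand == "-":
        # On the reverse strand the 3' end is the leftmost base.
        return (start, start + 1)
    return (end - 1, end)


# Transcript rows taken from the example tables above.
plus_transcript = (11868, 14409, "+")
minus_transcript = (1173055, 1179555, "-")

print(five_prime_site(*plus_transcript))    # (11868, 11869)
print(five_prime_site(*minus_transcript))   # (1179554, 1179555)
print(three_prime_site(*plus_transcript))   # (14408, 14409)
print(three_prime_site(*minus_transcript))  # (1173055, 1173056)
```

These four results match the first "+" row and the last "-" row of the tss() and tes() outputs shown in the examples.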
- introns(by='gene', nb_cpu=1)
Return the introns.
- Parameters:
by (str, {"gene", "transcript"}, default "gene") – Whether to find introns per gene or transcript.
nb_cpu (int, default 1) – Number of CPUs to use. At most one CPU can be used per chromosome or chromosome/strand pair. Speedups are only expected on large datasets.
See also
pyranges.genomicfeatures.GenomicFeaturesMethods.tss
return the transcription start sites
Examples
>>> gr = pr.data.ensembl_gtf()[["Feature", "gene_id", "transcript_id"]]
>>> gr
+--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_id         | transcript_id   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+--------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | gene         | 11868     | 14409     | +            | ENSG00000223972 | nan             |
| 1            | transcript   | 11868     | 14409     | +            | ENSG00000223972 | ENST00000456328 |
| 1            | exon         | 11868     | 12227     | +            | ENSG00000223972 | ENST00000456328 |
| 1            | exon         | 12612     | 12721     | +            | ENSG00000223972 | ENST00000456328 |
| ...          | ...          | ...       | ...       | ...          | ...             | ...             |
| 1            | gene         | 1173055   | 1179555   | -            | ENSG00000205231 | nan             |
| 1            | transcript   | 1173055   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
| 1            | exon         | 1179364   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
| 1            | exon         | 1173055   | 1176396   | -            | ENSG00000205231 | ENST00000379317 |
+--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

>>> gr.features.introns(by="gene")
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
| (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | intron     | 1173926   | 1174265   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1174321   | 1174423   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1174489   | 1174520   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1175034   | 1179188   | +            | ENSG00000162571 | nan             |
| ...          | ...        | ...       | ...       | ...          | ...             | ...             |
| 1            | intron     | 874591    | 875046    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 875155    | 875525    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 875625    | 876526    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 876611    | 876754    | -            | ENSG00000283040 | nan             |
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 311 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

>>> gr.features.introns(by="transcript")
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
| (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | intron     | 818202    | 818722    | +            | ENSG00000177757 | ENST00000326734 |
| 1            | intron     | 960800    | 961292    | +            | ENSG00000187961 | ENST00000338591 |
| 1            | intron     | 961552    | 961628    | +            | ENSG00000187961 | ENST00000338591 |
| 1            | intron     | 961750    | 961825    | +            | ENSG00000187961 | ENST00000338591 |
| ...          | ...        | ...       | ...       | ...          | ...             | ...             |
| 1            | intron     | 732207    | 732980    | -            | ENSG00000230021 | ENST00000648019 |
| 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000655252 |
| 1            | intron     | 165942    | 167958    | -            | ENSG00000241860 | ENST00000662089 |
| 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000662089 |
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 1,043 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
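As a rough illustration of the idea behind introns(), the sketch below computes the parts of a transcript (or gene) interval that no exon covers. It assumes that this is what "intron" means here and that coordinates are half-open; it is not the library's implementation. The function name gaps_between_exons and the toy coordinates are hypothetical.

```python
def gaps_between_exons(span, exons):
    """Return the sub-intervals of `span` not covered by any exon.

    `span` is a half-open (start, end) tuple for a transcript or gene;
    `exons` is a list of half-open (start, end) tuples, possibly overlapping.
    """
    span_start, span_end = span
    introns = []
    cursor = span_start  # leftmost position not yet covered by an exon
    for exon_start, exon_end in sorted(exons):
        if exon_start > cursor:
            # Uncovered stretch between the previous exon and this one.
            introns.append((cursor, min(exon_start, span_end)))
        cursor = max(cursor, exon_end)
    if cursor < span_end:
        # Uncovered stretch after the last exon.
        introns.append((cursor, span_end))
    return introns


# Toy data (hypothetical coordinates): two of the exons overlap.
transcript = (100, 1000)
exons = [(100, 200), (400, 500), (450, 600), (900, 1000)]
print(gaps_between_exons(transcript, exons))  # [(200, 400), (600, 900)]
```

With by="gene", the same idea would be applied to the gene interval and the exons of all its transcripts, which may explain why the per-gene result above has fewer rows than the per-transcript one.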
- pyranges.genomicfeatures.genome_bounds(gr, chromsizes, clip=False, only_right=False)
Remove or clip intervals outside of genome bounds.
- Parameters:
gr (PyRanges) – Input intervals
chromsizes (dict or PyRanges or pyfaidx.Fasta) – Dict or PyRanges describing the lengths of the chromosomes. A pyfaidx.Fasta object is also accepted, since it conveniently provides the chromosome lengths.
clip (bool, default False) – If True, return the portions of intervals that lie within bounds, instead of dropping intervals that are even partially out of bounds.
only_right (bool, default False) – If True, remove or clip only intervals that are out-of-bounds on the right, and do not alter those out-of-bounds on the left (whose Start is < 0)
Examples
>>> d = {"Chromosome": [1, 1, 3], "Start": [1, 249250600, 5], "End": [2, 249250640, 7]}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| 1            | 1         | 2         |
| 1            | 249250600 | 249250640 |
| 3            | 5         | 7         |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

>>> chromsizes = {"1": 249250621, "3": 500}
>>> chromsizes
{'1': 249250621, '3': 500}

>>> pr.gf.genome_bounds(gr, chromsizes)
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| 1            | 1         | 2         |
| 3            | 5         | 7         |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

>>> pr.gf.genome_bounds(gr, chromsizes, clip=True)
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| 1            | 1         | 2         |
| 1            | 249250600 | 249250621 |
| 3            | 5         | 7         |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

>>> del chromsizes['3']
>>> chromsizes
{'1': 249250621}

>>> pr.gf.genome_bounds(gr, chromsizes)
Traceback (most recent call last):
...
KeyError: '3'
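The drop, clip, and only_right behaviours described above can be sketched in plain Python on (chromosome, start, end) tuples. This is an illustration of the documented semantics under stated assumptions, not the library's code. The name genome_bounds_sketch is hypothetical, and dropping intervals that lie entirely outside a chromosome when clipping is an assumption.

```python
def genome_bounds_sketch(intervals, chromsizes, clip=False, only_right=False):
    """Drop or clip (chrom, start, end) tuples that exceed chromosome bounds."""
    kept = []
    for chrom, start, end in intervals:
        size = chromsizes[chrom]  # unknown chromosome raises KeyError
        out_right = end > size
        out_left = start < 0 and not only_right
        if not (out_right or out_left):
            kept.append((chrom, start, end))
        elif clip:
            new_start = start if only_right else max(start, 0)
            new_end = min(end, size)
            # Assumption: nothing is kept if no in-bounds portion remains.
            if new_start < new_end:
                kept.append((chrom, new_start, new_end))
    return kept


# Same coordinates as the example above, with string chromosome names.
intervals = [("1", 1, 2), ("1", 249250600, 249250640), ("3", 5, 7)]
chromsizes = {"1": 249250621, "3": 500}

print(genome_bounds_sketch(intervals, chromsizes))
# [('1', 1, 2), ('3', 5, 7)]
print(genome_bounds_sketch(intervals, chromsizes, clip=True))
# [('1', 1, 2), ('1', 249250600, 249250621), ('3', 5, 7)]
```

As in the last doctest above, removing chromosome "3" from the sizes dict makes the lookup fail with a KeyError.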
- pyranges.genomicfeatures.tile_genome(genome, tile_size, tile_last=False)
Create a tiled genome.
- Parameters:
genome (dict or PyRanges) – Dict or PyRanges describing the lengths of the chromosomes.
tile_size (int) – Length of the tiles.
tile_last (bool, default False) – Use genome length as end of last tile.
See also
pyranges.PyRanges.tile
split intervals into adjacent non-overlapping tiles.
Examples
>>> chromsizes = pr.data.chromsizes()
>>> chromsizes
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 249250621 |
| chr2         | 0         | 243199373 |
| chr3         | 0         | 198022430 |
| chr4         | 0         | 191154276 |
| ...          | ...       | ...       |
| chr22        | 0         | 51304566  |
| chrM         | 0         | 16571     |
| chrX         | 0         | 155270560 |
| chrY         | 0         | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

>>> pr.gf.tile_genome(chromsizes, int(1e6))
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 1000000   |
| chr1         | 1000000   | 2000000   |
| chr1         | 2000000   | 3000000   |
| chr1         | 3000000   | 4000000   |
| ...          | ...       | ...       |
| chrY         | 56000000  | 57000000  |
| chrY         | 57000000  | 58000000  |
| chrY         | 58000000  | 59000000  |
| chrY         | 59000000  | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,114 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
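The tiling shown in the example output can be reproduced with a few lines of plain Python. This is a minimal sketch that mirrors the default output above, where the final tile of each chromosome ends at the chromosome length. It is not the library's implementation, the name tile_chromosomes is hypothetical, and it does not model the tile_last option.

```python
def tile_chromosomes(chromsizes, tile_size):
    """Split each chromosome into consecutive half-open tiles of `tile_size`.

    The final tile of each chromosome is shortened to end at the
    chromosome length, as in the example output above.
    """
    tiles = []
    for chrom, size in chromsizes.items():
        for start in range(0, size, tile_size):
            tiles.append((chrom, start, min(start + tile_size, size)))
    return tiles


# Two chromosome lengths taken from the chromsizes table above.
sizes = {"chrM": 16571, "chrY": 59373566}
tiles = tile_chromosomes(sizes, 1_000_000)

print(tiles[0])    # ('chrM', 0, 16571)
print(tiles[-1])   # ('chrY', 59000000, 59373566)
print(len(tiles))  # 61
```

The last chrY tile matches the final row of the tile_genome output in the example.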