pyranges.genomicfeatures

Module Contents

Classes

GenomicFeaturesMethods

Namespace for methods using feature information.

Functions

genome_bounds(gr, chromsizes[, clip, only_right])

Remove or clip intervals outside of genome bounds.

tile_genome(genome, tile_size[, tile_last])

Create a tiled genome.

class pyranges.genomicfeatures.GenomicFeaturesMethods(pr)

Namespace for methods using feature information.

Accessed through gr.features.

pr
tss()

Return the transcription start sites.

Returns the 5’ end of every interval with feature “transcript”.

See also

pyranges.genomicfeatures.GenomicFeaturesMethods.tes

return the transcription end sites

Examples

>>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
>>> gr
+--------------+------------+--------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature      | Start     | End       | Strand       |
| (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
|--------------+------------+--------------+-----------+-----------+--------------|
| 1            | havana     | gene         | 11868     | 14409     | +            |
| 1            | havana     | transcript   | 11868     | 14409     | +            |
| 1            | havana     | exon         | 11868     | 12227     | +            |
| 1            | havana     | exon         | 12612     | 12721     | +            |
| ...          | ...        | ...          | ...       | ...       | ...          |
| 1            | havana     | gene         | 1173055   | 1179555   | -            |
| 1            | havana     | transcript   | 1173055   | 1179555   | -            |
| 1            | havana     | exon         | 1179364   | 1179555   | -            |
| 1            | havana     | exon         | 1173055   | 1176396   | -            |
+--------------+------------+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.features.tss()
+--------------+------------+------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature    | Start     | End       | Strand       |
| (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
|--------------+------------+------------+-----------+-----------+--------------|
| 1            | havana     | tss        | 11868     | 11869     | +            |
| 1            | havana     | tss        | 12009     | 12010     | +            |
| 1            | havana     | tss        | 29553     | 29554     | +            |
| 1            | havana     | tss        | 30266     | 30267     | +            |
| ...          | ...        | ...        | ...       | ...       | ...          |
| 1            | havana     | tss        | 1092812   | 1092813   | -            |
| 1            | havana     | tss        | 1116086   | 1116087   | -            |
| 1            | havana     | tss        | 1116088   | 1116089   | -            |
| 1            | havana     | tss        | 1179554   | 1179555   | -            |
+--------------+------------+------------+-----------+-----------+--------------+
Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
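The coordinates in the output above are consistent with taking a 1-bp interval at the strand-aware 5’ end of each transcript. The following is a minimal pure-Python sketch of that rule, assuming half-open [Start, End) coordinates. The helper name `five_prime_end` is hypothetical and is not part of pyranges.

```python
def five_prime_end(start, end, strand):
    """Return the 1-bp interval at the 5' end of a half-open [start, end) interval.

    Hypothetical helper for illustration only; not the pyranges implementation.
    """
    if strand == "-":
        # On the reverse strand, the 5' end is the last base of the interval.
        return (end - 1, end)
    # On the forward strand, the 5' end is the first base of the interval.
    return (start, start + 1)


# Transcript rows taken from the example table above.
print(five_prime_end(11868, 14409, "+"))      # (11868, 11869)
print(five_prime_end(1173055, 1179555, "-"))  # (1179554, 1179555)
```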
tes(slack=0)

Return the transcription end sites.

Returns the 3’ end of every interval with feature “transcript”.

See also

pyranges.genomicfeatures.GenomicFeaturesMethods.tss

return the transcription start sites

Examples

>>> gr = pr.data.ensembl_gtf()[["Source", "Feature"]]
>>> gr
+--------------+------------+--------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature      | Start     | End       | Strand       |
| (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   |
|--------------+------------+--------------+-----------+-----------+--------------|
| 1            | havana     | gene         | 11868     | 14409     | +            |
| 1            | havana     | transcript   | 11868     | 14409     | +            |
| 1            | havana     | exon         | 11868     | 12227     | +            |
| 1            | havana     | exon         | 12612     | 12721     | +            |
| ...          | ...        | ...          | ...       | ...       | ...          |
| 1            | havana     | gene         | 1173055   | 1179555   | -            |
| 1            | havana     | transcript   | 1173055   | 1179555   | -            |
| 1            | havana     | exon         | 1179364   | 1179555   | -            |
| 1            | havana     | exon         | 1173055   | 1176396   | -            |
+--------------+------------+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.features.tes()
+--------------+------------+------------+-----------+-----------+--------------+
| Chromosome   | Source     | Feature    | Start     | End       | Strand       |
| (category)   | (object)   | (object)   | (int64)   | (int64)   | (category)   |
|--------------+------------+------------+-----------+-----------+--------------|
| 1            | havana     | tes        | 14408     | 14409     | +            |
| 1            | havana     | tes        | 13669     | 13670     | +            |
| 1            | havana     | tes        | 31096     | 31097     | +            |
| 1            | havana     | tes        | 31108     | 31109     | +            |
| ...          | ...        | ...        | ...       | ...       | ...          |
| 1            | havana     | tes        | 1090405   | 1090406   | -            |
| 1            | havana     | tes        | 1091045   | 1091046   | -            |
| 1            | havana     | tes        | 1091499   | 1091500   | -            |
| 1            | havana     | tes        | 1173055   | 1173056   | -            |
+--------------+------------+------------+-----------+-----------+--------------+
Stranded PyRanges object has 280 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
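The coordinates in the output above are consistent with taking a 1-bp interval at the strand-aware 3’ end of each transcript. The following is a minimal pure-Python sketch of that rule, assuming half-open [Start, End) coordinates. The helper name `three_prime_end` is hypothetical, and the sketch does not model the `slack` parameter.

```python
def three_prime_end(start, end, strand):
    """Return the 1-bp interval at the 3' end of a half-open [start, end) interval.

    Hypothetical helper for illustration only; not the pyranges implementation.
    """
    if strand == "-":
        # On the reverse strand, the 3' end is the first base of the interval.
        return (start, start + 1)
    # On the forward strand, the 3' end is the last base of the interval.
    return (end - 1, end)


# Transcript rows taken from the example table above.
print(three_prime_end(11868, 14409, "+"))      # (14408, 14409)
print(three_prime_end(1173055, 1179555, "-"))  # (1173055, 1173056)
```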
introns(by='gene', nb_cpu=1)

Return the introns.

Parameters:
  • by (str, {"gene", "transcript"}, default "gene") – Whether to find introns per gene or transcript.

  • nb_cpu (int, default 1) – Number of CPUs to use. At most one is used per chromosome or chromosome/strand pair. Only leads to speedups on large datasets.

See also

pyranges.genomicfeatures.GenomicFeaturesMethods.tss

return the transcription start sites

Examples

>>> gr = pr.data.ensembl_gtf()[["Feature", "gene_id", "transcript_id"]]
>>> gr
+--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_id         | transcript_id   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+--------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | gene         | 11868     | 14409     | +            | ENSG00000223972 | nan             |
| 1            | transcript   | 11868     | 14409     | +            | ENSG00000223972 | ENST00000456328 |
| 1            | exon         | 11868     | 12227     | +            | ENSG00000223972 | ENST00000456328 |
| 1            | exon         | 12612     | 12721     | +            | ENSG00000223972 | ENST00000456328 |
| ...          | ...          | ...       | ...       | ...          | ...             | ...             |
| 1            | gene         | 1173055   | 1179555   | -            | ENSG00000205231 | nan             |
| 1            | transcript   | 1173055   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
| 1            | exon         | 1179364   | 1179555   | -            | ENSG00000205231 | ENST00000379317 |
| 1            | exon         | 1173055   | 1176396   | -            | ENSG00000205231 | ENST00000379317 |
+--------------+--------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.features.introns(by="gene")
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
| (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | intron     | 1173926   | 1174265   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1174321   | 1174423   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1174489   | 1174520   | +            | ENSG00000162571 | nan             |
| 1            | intron     | 1175034   | 1179188   | +            | ENSG00000162571 | nan             |
| ...          | ...        | ...       | ...       | ...          | ...             | ...             |
| 1            | intron     | 874591    | 875046    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 875155    | 875525    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 875625    | 876526    | -            | ENSG00000283040 | nan             |
| 1            | intron     | 876611    | 876754    | -            | ENSG00000283040 | nan             |
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 311 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.features.introns(by="transcript")
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
| Chromosome   | Feature    | Start     | End       | Strand       | gene_id         | transcript_id   |
| (object)     | (object)   | (int64)   | (int64)   | (category)   | (object)        | (object)        |
|--------------+------------+-----------+-----------+--------------+-----------------+-----------------|
| 1            | intron     | 818202    | 818722    | +            | ENSG00000177757 | ENST00000326734 |
| 1            | intron     | 960800    | 961292    | +            | ENSG00000187961 | ENST00000338591 |
| 1            | intron     | 961552    | 961628    | +            | ENSG00000187961 | ENST00000338591 |
| 1            | intron     | 961750    | 961825    | +            | ENSG00000187961 | ENST00000338591 |
| ...          | ...        | ...       | ...       | ...          | ...             | ...             |
| 1            | intron     | 732207    | 732980    | -            | ENSG00000230021 | ENST00000648019 |
| 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000655252 |
| 1            | intron     | 165942    | 167958    | -            | ENSG00000241860 | ENST00000662089 |
| 1            | intron     | 168165    | 169048    | -            | ENSG00000241860 | ENST00000662089 |
+--------------+------------+-----------+-----------+--------------+-----------------+-----------------+
Stranded PyRanges object has 1,043 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
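As a rough illustration of the idea, the sketch below assumes that an intron is any part of a gene or transcript interval not covered by one of its exons. It is a pure-Python approximation with a hypothetical helper name (`introns_from_exons`) and toy coordinates, not the pyranges implementation.

```python
def introns_from_exons(span, exons):
    """Return the gaps inside `span` that no exon covers.

    `span` is a half-open (start, end) tuple for one gene or transcript and
    `exons` is a list of half-open (start, end) tuples belonging to it.
    Hypothetical helper for illustration only.
    """
    span_start, span_end = span
    introns = []
    cursor = span_start  # first position not yet covered by an exon
    for exon_start, exon_end in sorted(exons):
        if exon_start > cursor:
            introns.append((cursor, exon_start))
        cursor = max(cursor, exon_end)
    if cursor < span_end:
        introns.append((cursor, span_end))
    return introns


# Toy coordinates, not taken from the Ensembl example data.
print(introns_from_exons((100, 200), [(100, 120), (150, 160), (180, 200)]))
# [(120, 150), (160, 180)]
```

Under this assumption, by="gene" would pool the exons of all transcripts of a gene before taking the gaps, which would explain why the per-gene result above has fewer rows than the per-transcript one.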
pyranges.genomicfeatures.genome_bounds(gr, chromsizes, clip=False, only_right=False)

Remove or clip intervals outside of genome bounds.

Parameters:
  • gr (PyRanges) – Input intervals

  • chromsizes (dict or PyRanges or pyfaidx.Fasta) – Dict or PyRanges describing the lengths of the chromosomes. A pyfaidx.Fasta object is also accepted, since it conveniently provides the chromosome lengths.

  • clip (bool, default False) – If True, return the portions of intervals that lie within bounds, instead of dropping any interval that is even partially out of bounds.

  • only_right (bool, default False) – If True, remove or clip only intervals that are out-of-bounds on the right, and do not alter those out-of-bounds on the left (whose Start is < 0)

Examples

>>> d = {"Chromosome": [1, 1, 3], "Start": [1, 249250600, 5], "End": [2, 249250640, 7]}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         1 |         2 |
|            1 | 249250600 | 249250640 |
|            3 |         5 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> chromsizes = {"1": 249250621, "3": 500}
>>> chromsizes
{'1': 249250621, '3': 500}
>>> pr.gf.genome_bounds(gr, chromsizes)
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         1 |         2 |
|            3 |         5 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> pr.gf.genome_bounds(gr, chromsizes, clip=True)
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         1 |         2 |
|            1 | 249250600 | 249250621 |
|            3 |         5 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> del chromsizes['3']
>>> chromsizes
{'1': 249250621}
>>> pr.gf.genome_bounds(gr, chromsizes)
Traceback (most recent call last):
...
KeyError: '3'
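The drop-versus-clip behaviour described by the parameters can be sketched in plain Python. This is only an approximation under stated assumptions: intervals are half-open (chromosome, start, end) tuples, intervals that are entirely out of bounds are dropped even when clipping, and the function name `genome_bounds_sketch` is hypothetical.

```python
def genome_bounds_sketch(intervals, chromsizes, clip=False, only_right=False):
    """Drop or clip intervals that extend outside [0, chromosome length).

    Hypothetical helper for illustration only; not the pyranges implementation.
    """
    kept = []
    for chrom, start, end in intervals:
        size = chromsizes[chrom]  # raises KeyError for an unknown chromosome
        if clip:
            # Keep only the part of the interval that lies within bounds.
            new_start = start if only_right else max(start, 0)
            new_end = min(end, size)
            if new_start < new_end:
                kept.append((chrom, new_start, new_end))
        else:
            # Keep the interval only if it lies entirely within bounds.
            left_ok = only_right or start >= 0
            if left_ok and end <= size:
                kept.append((chrom, start, end))
    return kept


# Same coordinates as the example above, with string chromosome names.
intervals = [("1", 1, 2), ("1", 249250600, 249250640), ("3", 5, 7)]
chromsizes = {"1": 249250621, "3": 500}

print(genome_bounds_sketch(intervals, chromsizes))
# [('1', 1, 2), ('3', 5, 7)]
print(genome_bounds_sketch(intervals, chromsizes, clip=True))
# [('1', 1, 2), ('1', 249250600, 249250621), ('3', 5, 7)]
```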
pyranges.genomicfeatures.tile_genome(genome, tile_size, tile_last=False)

Create a tiled genome.

Parameters:
  • genome (dict or PyRanges) – Dict or PyRanges describing the lengths of the chromosomes.

  • tile_size (int) – Length of the tiles.

  • tile_last (bool, default False) – Use genome length as end of last tile.

See also

pyranges.PyRanges.tile

split intervals into adjacent non-overlapping tiles.

Examples

>>> chromsizes = pr.data.chromsizes()
>>> chromsizes
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 249250621 |
| chr2         | 0         | 243199373 |
| chr3         | 0         | 198022430 |
| chr4         | 0         | 191154276 |
| ...          | ...       | ...       |
| chr22        | 0         | 51304566  |
| chrM         | 0         | 16571     |
| chrX         | 0         | 155270560 |
| chrY         | 0         | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> pr.gf.tile_genome(chromsizes, int(1e6))
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 1000000   |
| chr1         | 1000000   | 2000000   |
| chr1         | 2000000   | 3000000   |
| chr1         | 3000000   | 4000000   |
| ...          | ...       | ...       |
| chrY         | 56000000  | 57000000  |
| chrY         | 57000000  | 58000000  |
| chrY         | 58000000  | 59000000  |
| chrY         | 59000000  | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3,114 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
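The tiling shown above can be approximated per chromosome in plain Python. This sketch reproduces the default output in the example, where the last tile ends at the chromosome length. It does not model `tile_last`, and the helper name `tile_chromosome` is hypothetical.

```python
def tile_chromosome(length, tile_size):
    """Split [0, length) into adjacent tiles of at most `tile_size` bases.

    Hypothetical helper for illustration only; not the pyranges implementation.
    """
    return [
        (start, min(start + tile_size, length))
        for start in range(0, length, tile_size)
    ]


# chrY length taken from the chromsizes table above.
tiles = tile_chromosome(59373566, int(1e6))
print(len(tiles))  # 60
print(tiles[0])    # (0, 1000000)
print(tiles[-1])   # (59000000, 59373566)
```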