pyranges

Package Contents

Classes

PyRanges

Two-dimensional representation of genomic intervals and their annotations.

Functions

count_overlaps(grs[, features, strandedness, how, nb_cpu])

Count overlaps in multiple pyranges.

read_bam(f[, sparse, as_df, mapq, required_flag, ...])

Return bam file as PyRanges.

read_bed(f[, as_df, nrows])

Return bed file as PyRanges.

read_gff3(f[, full, annotation, as_df, nrows])

Read files in the General Feature Format.

read_gtf(f[, full, as_df, nrows, duplicate_attr, ...])

Read files in the Gene Transfer Format.

from_dict(d[, int64])

Create a PyRanges from dict.

from_string(s[, int64])

Create a PyRanges from multiline string.

itergrs(prs[, strand, keys])

Iterate over multiple PyRanges at once.

random([n, length, chromsizes, strand, int64, seed])

Return PyRanges with random intervals.

to_bigwig(gr, path, chromosome_sizes)

Write df to bigwig.

version_info()

pyranges.count_overlaps(grs, features=None, strandedness=None, how=None, nb_cpu=1)

Count overlaps in multiple pyranges.

Parameters:
  • grs (dict of PyRanges) – The PyRanges to use as queries.

  • features (PyRanges, default None) – The PyRanges to use as subject in the query. If None, the PyRanges themselves are used as a query.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information.

  • how ({None, "all", "containment"}, default None, i.e. all) – What intervals to report. By default, all overlapping intervals are reported. "containment" reports intervals where the overlapping interval is contained within it.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.
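The "auto" behaviour of strandedness described above can be sketched in plain Python. This is only an illustration of the documented rule, not the library's code; resolve_strandedness and its two boolean arguments are hypothetical names introduced here.

```python
def resolve_strandedness(strandedness, self_stranded, other_stranded):
    """Hypothetical helper: turn the user-facing argument into a concrete mode."""
    valid = (None, "same", "opposite", False)
    if strandedness not in valid:
        raise ValueError(f"strandedness must be one of {valid}")
    if strandedness is None:
        # Auto mode: compare strands only when both sides are stranded,
        # otherwise ignore strand information (represented here as False).
        return "same" if (self_stranded and other_stranded) else False
    return strandedness


print(resolve_strandedness(None, True, True))    # same
print(resolve_strandedness(None, True, False))   # False
print(resolve_strandedness("opposite", True, True))  # opposite
```

An explicit value always wins; only None triggers the check on whether both inputs are stranded.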

Examples

>>> import pyranges as pr
>>> a = '''Chromosome Start End
... chr1    6    12
... chr1    10    20
... chr1    22    27
... chr1    24    30'''
>>> b = '''Chromosome Start End
... chr1    12    32
... chr1    14    30'''
>>> c = '''Chromosome Start End
... chr1    8    15
... chr1    10    14
... chr1    32    34'''
>>> grs = {n: pr.from_string(s) for n, s in zip(["a", "b", "c"], [a, b, c])}
>>> for k, v in grs.items():
...     print("Name: " + k)
...     print(v)
Name: a
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         6 |        12 |
| chr1         |        10 |        20 |
| chr1         |        22 |        27 |
| chr1         |        24 |        30 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
Name: b
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |        12 |        32 |
| chr1         |        14 |        30 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
Name: c
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         8 |        15 |
| chr1         |        10 |        14 |
| chr1         |        32 |        34 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> pr.count_overlaps(grs)
+--------------+-----------+-----------+-----------+-----------+-----------+
| Chromosome   | Start     | End       | a         | b         | c         |
| (object)     | (int64)   | (int64)   | (int64)   | (int64)   | (int64)   |
|--------------+-----------+-----------+-----------+-----------+-----------|
| chr1         | 6         | 8         | 1         | 0         | 0         |
| chr1         | 8         | 10        | 1         | 0         | 1         |
| chr1         | 10        | 12        | 2         | 0         | 2         |
| chr1         | 12        | 14        | 1         | 1         | 2         |
| ...          | ...       | ...       | ...       | ...       | ...       |
| chr1         | 24        | 27        | 2         | 2         | 0         |
| chr1         | 27        | 30        | 1         | 2         | 0         |
| chr1         | 30        | 32        | 0         | 1         | 0         |
| chr1         | 32        | 34        | 0         | 0         | 1         |
+--------------+-----------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 12 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr = pr.PyRanges(chromosomes=["chr1"] * 4, starts=[0, 10, 20, 30], ends=[10, 20, 30, 40])
>>> gr
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         0 |        10 |
| chr1         |        10 |        20 |
| chr1         |        20 |        30 |
| chr1         |        30 |        40 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> pr.count_overlaps(grs, gr)
+--------------+-----------+-----------+-----------+-----------+-----------+
| Chromosome   |     Start |       End |         a |         b |         c |
| (category)   |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------|
| chr1         |         0 |        10 |         1 |         0 |         1 |
| chr1         |        10 |        20 |         2 |         2 |         2 |
| chr1         |        20 |        30 |         2 |         2 |         0 |
| chr1         |        30 |        40 |         0 |         1 |         1 |
+--------------+-----------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
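The counts in the last table can be checked by hand with a small standard-library sketch. It assumes half-open intervals (Start included, End excluded) on a single chromosome and ignores strand; the helper names overlaps and count_per_feature are illustrative and not part of pyranges.

```python
def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b=(start, end) share a position."""
    return a[0] < b[1] and b[0] < a[1]


def count_per_feature(queries, features):
    """For each feature, count the overlapping intervals in each named query set."""
    return {
        feature: {
            name: sum(overlaps(interval, feature) for interval in intervals)
            for name, intervals in queries.items()
        }
        for feature in features
    }


# Same coordinates as the a, b, c and gr objects in the examples above.
queries = {
    "a": [(6, 12), (10, 20), (22, 27), (24, 30)],
    "b": [(12, 32), (14, 30)],
    "c": [(8, 15), (10, 14), (32, 34)],
}
features = [(0, 10), (10, 20), (20, 30), (30, 40)]

counts = count_per_feature(queries, features)
for feature, row in counts.items():
    print(feature, row)
# (0, 10) {'a': 1, 'b': 0, 'c': 1}
# (10, 20) {'a': 2, 'b': 2, 'c': 2}
# (20, 30) {'a': 2, 'b': 2, 'c': 0}
# (30, 40) {'a': 0, 'b': 1, 'c': 1}
```

The quadratic loop is only meant to make the semantics visible; it says nothing about how the library computes the result.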
class pyranges.PyRanges(df=None, chromosomes=None, starts=None, ends=None, strands=None, int64=False, copy_df=True)

Two-dimensional representation of genomic intervals and their annotations.

A PyRanges object must have the columns Chromosome, Start and End. These describe the genomic position and function as implicit row labels. A Strand column is optional and adds strand information to the intervals. Any other columns are allowed and are considered metadata.

Operations between PyRanges align intervals based on their position.

If a PyRanges is built using the arguments chromosomes, starts, ends and optionally strands, all non-scalars must be of the same length.

Parameters:
  • df (pandas.DataFrame or dict of pandas.DataFrame, default None) – The data to be stored in the PyRanges.

  • chromosomes (array-like or scalar value, default None) – The chromosome(s) in the PyRanges.

  • starts (array-like, default None) – The start positions in the PyRanges.

  • ends (array-like, default None) – The end positions in the PyRanges.

  • strands (array-like or scalar value, default None) – The strands in the PyRanges.

  • copy_df (bool, default True) – Whether to copy the input pandas.DataFrame.

See also

pyranges.read_bed

read bed-file into PyRanges

pyranges.read_bam

read bam-file into PyRanges

pyranges.read_gff3

read gff-file into PyRanges

pyranges.read_gtf

read gtf-file into PyRanges

pyranges.from_dict

create PyRanges from dict of columns

pyranges.from_string

create PyRanges from multiline string

Notes

For efficiency, a PyRanges object is represented internally as a dictionary. The keys are chromosomes or chromosome/strand tuples and the values are pandas DataFrames.
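The keyed layout described in this note can be pictured with a standard-library sketch. Plain lists of row dicts stand in for the pandas DataFrames, and partition is a hypothetical helper, so this shows the shape of the mapping rather than the actual implementation.

```python
from collections import defaultdict


def partition(rows, stranded):
    """Group row dicts by chromosome, or by (chromosome, strand) when stranded."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Chromosome"], row["Strand"]) if stranded else row["Chromosome"]
        groups[key].append(row)
    return dict(groups)


rows = [
    {"Chromosome": "chr1", "Start": 1, "End": 3, "Strand": "+"},
    {"Chromosome": "chr1", "Start": 5, "End": 149, "Strand": "-"},
    {"Chromosome": "chr2", "Start": 7, "End": 9, "Strand": "+"},
]

by_strand = partition(rows, stranded=True)
by_chrom = partition(rows, stranded=False)
print(sorted(by_strand))  # [('chr1', '+'), ('chr1', '-'), ('chr2', '+')]
print(sorted(by_chrom))   # ['chr1', 'chr2']
```

Keeping each chromosome (or chromosome/strand pair) in its own table means an operation on one key never has to scan rows that belong to another.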

Examples

>>> pr.PyRanges()
Empty PyRanges
>>> pr.PyRanges(chromosomes="chr1", starts=(1, 5), ends=[3, 149],
...             strands=("+", "-"))
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         3 | +            |
| chr1         |         5 |       149 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> df = pd.DataFrame({"Chromosome": ["chr1", "chr2"], "Start": [100, 200],
...                    "End": [150, 201]})
>>> df
  Chromosome  Start  End
0       chr1    100  150
1       chr2    200  201
>>> pr.PyRanges(df)
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |       100 |       150 |
| chr2         |       200 |       201 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr = pr.from_dict({"Chromosome": [1, 1], "Strand": ["+", "-"], "Start": [1, 4], "End": [2, 27],
...                    "TP": [0, 1], "FP": [12, 11], "TN": [10, 9], "FN": [2, 3]})
>>> gr
+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
|   Chromosome | Strand       |     Start |       End |        TP |        FP |        TN |        FN |
|   (category) | (category)   |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------|
|            1 | +            |         1 |         2 |         0 |        12 |        10 |         2 |
|            1 | -            |         4 |        27 |         1 |        11 |         9 |         3 |
+--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
Stranded PyRanges object has 2 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
property chromosomes

Return chromosomes in natsorted order.
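Natural ordering means that numeric parts of a name are compared as numbers, so chr2 comes before chr10. The sketch below imitates that ordering with the standard library only; natural_key is an illustrative helper and is not claimed to match the natsort package in every corner case.

```python
import re


def natural_key(name):
    """Split a name into text and integer chunks so 'chr2' sorts before 'chr10'."""
    # re.split with a capturing group alternates text, digits, text, ...
    # so chunks at the same position always have comparable types.
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


chromosomes = ["chr10", "chrX", "chr2", "chr1"]
print(sorted(chromosomes))                   # ['chr1', 'chr10', 'chr2', 'chrX']
print(sorted(chromosomes, key=natural_key))  # ['chr1', 'chr2', 'chr10', 'chrX']
```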

property columns

Return the column labels of the PyRanges.

Return type:

pandas.Index

See also

PyRanges.chromosomes

return the chromosomes in the PyRanges

Examples

>>> f2 = pr.data.f2()
>>> f2
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         1 |         2 | a          |         0 | +            |
| chr1         |         6 |         7 | b          |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f2.columns
Index(['Chromosome', 'Start', 'End', 'Name', 'Score', 'Strand'], dtype='object')
>>> f2.columns = f2.columns.str.replace("Sco|re", "NYAN", regex=True)
>>> f2
+--------------+-----------+-----------+------------+------------+--------------+
| Chromosome   |     Start |       End | Name       |   NYANNYAN | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |    (int64) | (category)   |
|--------------+-----------+-----------+------------+------------+--------------|
| chr1         |         1 |         2 | a          |          0 | +            |
| chr1         |         6 |         7 | b          |          0 | -            |
+--------------+-----------+-----------+------------+------------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
property df

Return PyRanges as DataFrame.

See also

PyRanges.as_df

return PyRanges as DataFrame.

property dtypes

Return the dtypes of the PyRanges.

Examples

>>> gr = pr.data.chipseq()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         | 0         | +            |
| chr1         | 169887529 | 169887554 | U0         | 0         | +            |
| chr1         | 216711011 | 216711036 | U0         | 0         | +            |
| chr1         | 144227079 | 144227104 | U0         | 0         | +            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 13517892  | 13517917  | U0         | 0         | -            |
| chrY         | 8010951   | 8010976   | U0         | 0         | -            |
| chrY         | 7405376   | 7405401   | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.dtypes
Chromosome    category
Start            int64
End              int64
Name            object
Score            int64
Strand        category
dtype: object
property empty

Indicate whether PyRanges is empty.

property length

Return the total length of the intervals.

See also

PyRanges.lengths

return the lengths of the intervals

Examples

>>> gr = pr.data.f1()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.length
6

To find the length of the genome covered by the intervals, use merge first:

>>> gr.merge(strand=False).length
5
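The difference between the two numbers above comes from overlapping intervals being counted twice in the plain sum. A standard-library sketch with the same three intervals makes this explicit; it treats all intervals as being on one chromosome and ignores strand, and total_length and merge are illustrative helpers rather than the library's methods.

```python
def total_length(intervals):
    """Sum of End - Start over all intervals; overlaps are counted repeatedly."""
    return sum(end - start for start, end in intervals)


def merge(intervals):
    """Fuse overlapping intervals into non-overlapping ones.

    Touching intervals are fused as well here, which does not change the
    covered length.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(interval) for interval in merged]


intervals = [(3, 6), (8, 9), (5, 7)]      # same coordinates as f1 above
print(total_length(intervals))            # 6
print(merge(intervals))                   # [(3, 7), (8, 9)]
print(total_length(merge(intervals)))     # 5
```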
property stranded

Whether PyRanges has (valid) strand info.

Note

A PyRanges can have invalid values in the Strand column, in which case it is not considered stranded.

See also

PyRanges.strands

return the strands

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...       'End': [5, 8], 'Strand': ['+', '.']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         5 | +            |
| chr1         |         6 |         8 | .            |
+--------------+-----------+-----------+--------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
Considered unstranded due to these Strand values: '.'
>>> gr.stranded
False
>>> "Strand" in gr.columns
True
property strands

Return strands.

Notes

If the strand-column contains an invalid value, [] is returned.

See also

PyRanges.stranded

whether the PyRanges has valid strand info

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...       'End': [5, 8], 'Strand': ['+', '.']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         5 | +            |
| chr1         |         6 |         8 | .            |
+--------------+-----------+-----------+--------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
Considered unstranded due to these Strand values: '.'
>>> gr.strands
[]
>>> gr.Strand.drop_duplicates().to_list()
['+', '.']
>>> gr.Strand = ["+", "-"]
>>> gr.strands
['+', '-']
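The relationship between the stranded and strands properties can be summarised in a few lines of plain Python. The sketch assumes that "+" and "-" are the only valid strand values, which is what the examples above suggest; is_stranded and strands are illustrative functions, not the properties themselves.

```python
VALID_STRANDS = {"+", "-"}  # assumption inferred from the examples above


def is_stranded(strand_values):
    """True only if there is strand data and every value is a valid strand."""
    values = set(strand_values)
    return bool(values) and values <= VALID_STRANDS


def strands(strand_values):
    """Return the distinct strands, or [] if any value is invalid."""
    if not is_stranded(strand_values):
        return []
    return sorted(set(strand_values))


print(is_stranded(["+", "."]))  # False
print(strands(["+", "."]))      # []
print(strands(["+", "-"]))      # ['+', '-']
```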
dfs

Dict mapping chromosomes or chromosome/strand pairs to pandas DataFrames.

features

Namespace for genomic-features methods.

See also

pyranges.genomicfeatures

namespace for feature-functionality

pyranges.genomicfeatures.GenomicFeaturesMethods

namespace for feature-functionality

stats

Namespace for statistical methods.

See also

pyranges.statistics

namespace for statistics

pyranges.statistics.StatisticsMethods

namespace for statistics

__array_ufunc__(*args, **kwargs)

Apply unary numpy-function.

Apply the function to all columns that are not index columns, i.e. all columns other than Chromosome, Start, End and Strand.

Notes

Function must produce a vector of equal length.

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 2, 3], "Start": [1, 2, 3],
... "End": [2, 3, 4], "Score": [9, 16, 25], "Score2": [121, 144, 169],
... "Name": ["n1", "n2", "n3"]})
>>> gr
+--------------+-----------+-----------+-----------+-----------+------------+
|   Chromosome |     Start |       End |     Score |    Score2 | Name       |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+-----------+-----------+------------|
|            1 |         1 |         2 |         9 |       121 | n1         |
|            2 |         2 |         3 |        16 |       144 | n2         |
|            3 |         3 |         4 |        25 |       169 | n3         |
+--------------+-----------+-----------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 6 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> np.sqrt(gr)
+--------------+-----------+-----------+-------------+-------------+------------+
|   Chromosome |     Start |       End |       Score |      Score2 | Name       |
|   (category) |   (int64) |   (int64) |   (float64) |   (float64) | (object)   |
|--------------+-----------+-----------+-------------+-------------+------------|
|            1 |         1 |         2 |           3 |          11 | n1         |
|            2 |         2 |         3 |           4 |          12 | n2         |
|            3 |         3 |         4 |           5 |          13 | n3         |
+--------------+-----------+-----------+-------------+-------------+------------+
Unstranded PyRanges object has 3 rows and 6 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
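The effect shown above can be mimicked without numpy. The sketch below applies a unary function to every column that is neither an index column nor non-numeric; leaving the Name column untouched is an assumption read off the example output, and apply_unary is a hypothetical helper.

```python
import math

INDEX_COLUMNS = {"Chromosome", "Start", "End", "Strand"}


def apply_unary(table, func):
    """Apply func element-wise to numeric, non-index columns of a dict of lists."""
    result = {}
    for name, values in table.items():
        numeric = all(isinstance(value, (int, float)) for value in values)
        if name in INDEX_COLUMNS or not numeric:
            result[name] = list(values)  # index and text columns pass through
        else:
            result[name] = [func(value) for value in values]
    return result


table = {
    "Chromosome": [1, 2, 3], "Start": [1, 2, 3], "End": [2, 3, 4],
    "Score": [9, 16, 25], "Score2": [121, 144, 169], "Name": ["n1", "n2", "n3"],
}
out = apply_unary(table, math.sqrt)
print(out["Score"])   # [3.0, 4.0, 5.0]
print(out["Score2"])  # [11.0, 12.0, 13.0]
print(out["Start"])   # [1, 2, 3]
```

Each output column has the same length as its input, which matches the requirement in the Notes.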
__getattr__(name)

Return column.

Parameters:

name (str) – Column to return

Return type:

pandas.Series

Example

>>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250], "End": [10, 125, 251]})
>>> gr.Start
0      0
1    100
2    250
Name: Start, dtype: int64
__setattr__(column_name, column)

Insert or update column.

Parameters:
  • column_name (str) – Name of column to update or insert.

  • column (list, np.array or pd.Series) – Data to insert.

Example

>>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250], "End": [10, 125, 251]})
>>> gr.Start = np.array([1, 1, 2], dtype=np.int64)
>>> gr
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         1 |        10 |
|            1 |         1 |       125 |
|            1 |         2 |       251 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
__getitem__(val)

Fetch columns or subset on position.

If a list is provided, the columns in the list are returned. This subsets on columns.

If a numpy array is provided, it must be of type bool and the same length as the PyRanges.

Otherwise, a subset of the rows is returned with the location info provided.

Parameters:

val (bool array/Series, tuple, list, str or slice) – Data to fetch.

Examples

>>> gr = pr.data.ensembl_gtf()
>>> list(gr.columns)
['Chromosome', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'gene_biotype', 'gene_id', 'gene_name', 'gene_source', 'gene_version', 'tag', 'transcript_biotype', 'transcript_id', 'transcript_name', 'transcript_source', 'transcript_support_level', 'transcript_version', 'exon_id', 'exon_number', 'exon_version', '(assigned', 'previous', 'protein_id', 'protein_version', 'ccds_id']
>>> gr = gr[["Source", "Feature", "gene_id"]]
>>> gr
+--------------+------------+--------------+-----------+-----------+--------------+-----------------+
| Chromosome   | Source     | Feature      | Start     | End       | Strand       | gene_id         |
| (category)   | (object)   | (category)   | (int64)   | (int64)   | (category)   | (object)        |
|--------------+------------+--------------+-----------+-----------+--------------+-----------------|
| 1            | havana     | gene         | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | havana     | transcript   | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | havana     | exon         | 11868     | 12227     | +            | ENSG00000223972 |
| 1            | havana     | exon         | 12612     | 12721     | +            | ENSG00000223972 |
| ...          | ...        | ...          | ...       | ...       | ...          | ...             |
| 1            | havana     | gene         | 1173055   | 1179555   | -            | ENSG00000205231 |
| 1            | havana     | transcript   | 1173055   | 1179555   | -            | ENSG00000205231 |
| 1            | havana     | exon         | 1179364   | 1179555   | -            | ENSG00000205231 |
| 1            | havana     | exon         | 1173055   | 1176396   | -            | ENSG00000205231 |
+--------------+------------+--------------+-----------+-----------+--------------+-----------------+
Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Create boolean Series and use it to subset:

>>> s = (gr.Feature == "gene") | (gr.gene_id == "ENSG00000223972")
>>> gr[s]
+--------------+----------------+--------------+-----------+-----------+--------------+-----------------+
| Chromosome   | Source         | Feature      | Start     | End       | Strand       | gene_id         |
| (category)   | (object)       | (category)   | (int64)   | (int64)   | (category)   | (object)        |
|--------------+----------------+--------------+-----------+-----------+--------------+-----------------|
| 1            | havana         | gene         | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | havana         | transcript   | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | havana         | exon         | 11868     | 12227     | +            | ENSG00000223972 |
| 1            | havana         | exon         | 12612     | 12721     | +            | ENSG00000223972 |
| ...          | ...            | ...          | ...       | ...       | ...          | ...             |
| 1            | havana         | gene         | 1062207   | 1063288   | -            | ENSG00000273443 |
| 1            | ensembl_havana | gene         | 1070966   | 1074306   | -            | ENSG00000237330 |
| 1            | ensembl_havana | gene         | 1081817   | 1116361   | -            | ENSG00000131591 |
| 1            | havana         | gene         | 1173055   | 1179555   | -            | ENSG00000205231 |
+--------------+----------------+--------------+-----------+-----------+--------------+-----------------+
Stranded PyRanges object has 95 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> cs = pr.data.chipseq()
>>> cs[10000:100000]
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr2         |     33241 |     33266 | U0         |         0 | +            |
| chr2         |     13611 |     13636 | U0         |         0 | -            |
| chr2         |     32620 |     32645 | U0         |         0 | -            |
| chr3         |     87179 |     87204 | U0         |         0 | +            |
| chr4         |     45413 |     45438 | U0         |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 5 rows and 6 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> cs["chr1", "-"]
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 100079649 | 100079674 | U0         | 0         | -            |
| chr1         | 223587418 | 223587443 | U0         | 0         | -            |
| chr1         | 202450161 | 202450186 | U0         | 0         | -            |
| chr1         | 156338310 | 156338335 | U0         | 0         | -            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chr1         | 203557775 | 203557800 | U0         | 0         | -            |
| chr1         | 28114107  | 28114132  | U0         | 0         | -            |
| chr1         | 21622765  | 21622790  | U0         | 0         | -            |
| chr1         | 80668132  | 80668157  | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 437 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> cs["chr5", "-", 90000:]
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr5         | 399682    | 399707    | U0         | 0         | -            |
| chr5         | 1847502   | 1847527   | U0         | 0         | -            |
| chr5         | 5247533   | 5247558   | U0         | 0         | -            |
| chr5         | 5300394   | 5300419   | U0         | 0         | -            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chr5         | 178786234 | 178786259 | U0         | 0         | -            |
| chr5         | 179268931 | 179268956 | U0         | 0         | -            |
| chr5         | 179289594 | 179289619 | U0         | 0         | -            |
| chr5         | 180513795 | 180513820 | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 285 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> cs["chrM"]
Empty PyRanges
__iter__()

Iterate over the keys and values.

See also

pyranges.itergrs

iterate over multiple PyRanges

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250],
...                   "End": [10, 125, 251], "Strand": ["+", "+", "-"]})
>>> for k, v in gr:
...     print(k)
...     print(v)
('1', '+')
  Chromosome  Start  End Strand
0          1      0   10      +
1          1    100  125      +
('1', '-')
  Chromosome  Start  End Strand
2          1    250  251      -
__len__()

Return the number of intervals in the PyRanges.

__str__()

Return string representation.

__repr__()

Return REPL representation.

_repr_html_()

Return REPL HTML representation for Jupyter Notebooks.

apply(f, strand=None, as_pyranges=True, nb_cpu=1, **kwargs)

Apply a function to the PyRanges.

Parameters:
  • f (function) – Function to apply on each DataFrame in a PyRanges

  • strand (bool, default None, i.e. auto) – Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded.

  • as_pyranges (bool, default True) – Whether to return as a PyRanges or dict. If f does not return a DataFrame valid for PyRanges, as_pyranges must be False.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

Result of applying f to each DataFrame in the PyRanges

Return type:

PyRanges or dict

See also

pyranges.PyRanges.apply_pair

apply a function to a pair of PyRanges

pyranges.PyRanges.apply_chunks

apply a row-based function to a PyRanges in parallel

Note

This is the function used internally to carry out almost all unary PyRanges methods.

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1, 2, 2], "Strand": ["+", "+", "-", "+"],
...                    "Start": [1, 4, 2, 9], "End": [2, 27, 13, 10]})
>>> gr
+--------------+--------------+-----------+-----------+
|   Chromosome | Strand       |     Start |       End |
|   (category) | (category)   |   (int64) |   (int64) |
|--------------+--------------+-----------+-----------|
|            1 | +            |         1 |         2 |
|            1 | +            |         4 |        27 |
|            2 | +            |         9 |        10 |
|            2 | -            |         2 |        13 |
+--------------+--------------+-----------+-----------+
Stranded PyRanges object has 4 rows and 4 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.apply(lambda df: len(df), as_pyranges=False)
{('1', '+'): 2, ('2', '+'): 1, ('2', '-'): 1}
>>> gr.apply(lambda df: len(df), as_pyranges=False, strand=False)
{'1': 2, '2': 2}
>>> def add_to_ends(df, **kwargs):
...     df.loc[:, "End"] = kwargs["slack"] + df.End
...     return df
>>> gr.apply(add_to_ends, slack=500)
+--------------+--------------+-----------+-----------+
|   Chromosome | Strand       |     Start |       End |
|   (category) | (category)   |   (int64) |   (int64) |
|--------------+--------------+-----------+-----------|
|            1 | +            |         1 |       502 |
|            1 | +            |         4 |       527 |
|            2 | +            |         9 |       510 |
|            2 | -            |         2 |       513 |
+--------------+--------------+-----------+-----------+
Stranded PyRanges object has 4 rows and 4 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
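The grouping behaviour of apply, including the effect of strand=False, can be reproduced in plain Python. Lists of row dicts stand in for the per-key DataFrames; apply_per_group and this version of add_to_ends are illustrative stand-ins, not the library's code.

```python
from collections import defaultdict


def apply_per_group(rows, f, strand=True, **kwargs):
    """Group rows by (chromosome, strand) or chromosome, then call f on each group."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Chromosome"], row["Strand"]) if strand else row["Chromosome"]
        groups[key].append(row)
    # Extra keyword arguments are forwarded to f unchanged.
    return {key: f(groups[key], **kwargs) for key in sorted(groups)}


def add_to_ends(group, slack):
    """Return copies of the rows with End shifted by slack."""
    return [dict(row, End=row["End"] + slack) for row in group]


rows = [
    {"Chromosome": "1", "Strand": "+", "Start": 1, "End": 2},
    {"Chromosome": "1", "Strand": "+", "Start": 4, "End": 27},
    {"Chromosome": "2", "Strand": "-", "Start": 2, "End": 13},
    {"Chromosome": "2", "Strand": "+", "Start": 9, "End": 10},
]

print(apply_per_group(rows, len))
# {('1', '+'): 2, ('2', '+'): 1, ('2', '-'): 1}
print(apply_per_group(rows, len, strand=False))
# {'1': 2, '2': 2}
shifted = apply_per_group(rows, add_to_ends, slack=500)
print([row["End"] for row in shifted[("1", "+")]])
# [502, 527]
```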
apply_chunks(f, as_pyranges=False, nb_cpu=1, **kwargs)

Apply a row-based function to arbitrary partitions of the PyRanges.

apply_chunks speeds up the application of functions where the result is not affected by applying the function to ordered, non-overlapping splits of the data.

Parameters:
  • f (function) – Row-based or associative function to apply on the partitions.

  • as_pyranges (bool, default False) – Whether to return as a PyRanges or dict.

  • nb_cpu (int, default 1) – How many cpus to use. The data is split into nb_cpu partitions.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

Result of applying f to each partition of the DataFrames in the PyRanges.

Return type:

dict of lists

See also

pyranges.PyRanges.apply_pair

apply a function to a pair of PyRanges

pyranges.PyRanges.apply_chunks

apply a row-based function to a PyRanges in parallel

Note

apply_chunks will only lead to speedups on large datasets or slow-running functions. Using it with nb_cpu=1 is pointless; use apply instead.

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [2, 3, 5], "End": [9, 4, 6]})
>>> gr
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         2 |         9 |
|            1 |         3 |         4 |
|            1 |         5 |         6 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.apply_chunks(
... lambda df, **kwargs: list(df.End + kwargs["add"]), nb_cpu=1, add=1000)
{'1': [[1009, 1004, 1006]]}
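
The nested list in the result reflects one inner list per partition. The sketch below shows that shape in plain Python, assuming the data is cut into contiguous, ordered partitions of near-equal size. The helper names split_ordered and apply_chunks_sketch are hypothetical, the exact partition sizes used by the library may differ, and the sketch runs sequentially instead of dispatching partitions to worker processes.

```python
def split_ordered(items, n):
    """Split items into n contiguous, ordered partitions of near-equal size."""
    size, extra = divmod(len(items), n)
    parts, start = [], 0
    for i in range(n):
        stop = start + size + (1 if i < extra else 0)
        parts.append(items[start:stop])
        start = stop
    return parts


def apply_chunks_sketch(values, f, nb_cpu=1, **kwargs):
    """Hypothetical helper: apply f to each partition and collect the results."""
    return [f(part, **kwargs) for part in split_ordered(values, nb_cpu)]


def add_to_each(part, add):
    # Row-based function: each output depends only on its own row.
    return [value + add for value in part]


ends = [9, 4, 6]  # End column of chromosome 1 in the example above

one_chunk = apply_chunks_sketch(ends, add_to_each, nb_cpu=1, add=1000)
two_chunks = apply_chunks_sketch(ends, add_to_each, nb_cpu=2, add=1000)
print(one_chunk)   # [[1009, 1004, 1006]]
print(two_chunks)  # [[1009, 1004], [1006]]
```

Because the function is row-based, concatenating the partitions gives the same values regardless of how many partitions are used.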
apply_pair(other, f, strandedness=None, as_pyranges=True, **kwargs)

Apply a function to a pair of PyRanges.

The function is applied to each chromosome or chromosome/strand pair found in at least one of the PyRanges.

Parameters:
  • f (function) – Row-based or associative function to apply on the DataFrames.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • as_pyranges (bool, default True) – Whether to return a PyRanges or a dict. If f does not return a DataFrame valid for PyRanges, as_pyranges must be False.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

Result of applying f to each partition of the DataFrames in the PyRanges.

Return type:

dict of lists

See also

pyranges.PyRanges.apply_pair

apply a function to a pair of PyRanges

pyranges.PyRanges.apply_chunks

apply a row-based function to a PyRanges in parallel

pyranges.iter

iterate over two or more PyRanges

Note

This is the function used internally to carry out almost all comparison functions in PyRanges.

Examples

>>> gr = pr.data.chipseq()
>>> gr2 = pr.data.chipseq_background()
>>> gr.apply_pair(gr2, pr.methods.intersection._intersection) # same as gr.intersect(gr2)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 226987603 | 226987617 | U0         |         0 | +            |
| chr8         |  38747236 |  38747251 | U0         |         0 | -            |
| chr15        |  26105515 |  26105518 | U0         |         0 | +            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1 = pr.data.f1()
>>> f1
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f2 = pr.data.f2()
>>> f2
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         1 |         2 | a          |         0 | +            |
| chr1         |         6 |         7 | b          |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.apply_pair(f2, lambda df, df2: (len(df), len(df2)), as_pyranges=False)
{('chr1', '+'): (2, 2), ('chr1', '-'): (1, 2)}
as_df()

Return PyRanges as DataFrame.

Returns:

A DataFrame natural sorted on Chromosome and Strand. The ordering of rows within chromosomes and strands is preserved.

Return type:

DataFrame

See also

PyRanges.df

Return PyRanges as DataFrame.

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1, 2, 2], "Start": [1, 2, 3, 9],
...                    "End": [3, 3, 10, 12], "Gene": ["A", "B", "C", "D"]})
>>> gr
+--------------+-----------+-----------+------------+
|   Chromosome |     Start |       End | Gene       |
|   (category) |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
|            1 |         1 |         3 | A          |
|            1 |         2 |         3 | B          |
|            2 |         3 |        10 | C          |
|            2 |         9 |        12 | D          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 4 rows and 4 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.as_df()
  Chromosome  Start  End Gene
0          1      1    3    A
1          1      2    3    B
2          2      3   10    C
3          2      9   12    D
assign(col, f, strand=None, nb_cpu=1, **kwargs)

Add or replace a column.

Does not change the original PyRanges.

Parameters:
  • col (str) – Name of column.

  • f (function) – Function to create new column.

  • strand (bool, default None, i.e. auto) – Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

A copy of the PyRanges with the column inserted.

Return type:

PyRanges

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 2], "End": [3, 5],
... "Name": ["a", "b"]})
>>> gr
+--------------+-----------+-----------+------------+
|   Chromosome |     Start |       End | Name       |
|   (category) |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
|            1 |         1 |         3 | a          |
|            1 |         2 |         5 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.assign("Blabla", lambda df: df.Chromosome.astype(str) + "_yadayada")
+--------------+-----------+-----------+------------+------------+
|   Chromosome |     Start |       End | Name       | Blabla     |
|   (category) |   (int64) |   (int64) | (object)   | (object)   |
|--------------+-----------+-----------+------------+------------|
|            1 |         1 |         3 | a          | 1_yadayada |
|            1 |         2 |         5 | b          | 1_yadayada |
+--------------+-----------+-----------+------------+------------+
Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

Note that assigning to an existing name replaces the column:

>>> gr.assign("Name",
... lambda df, **kwargs: df.Start.astype(str) + kwargs["sep"] +
... df.Name.str.capitalize(), sep="_")
+--------------+-----------+-----------+------------+
|   Chromosome |     Start |       End | Name       |
|   (category) |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
|            1 |         1 |         3 | 1_A        |
|            1 |         2 |         5 | 2_B        |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
boundaries(group_by, agg=None)

Return the boundaries of groups of intervals (e.g. transcripts).

Parameters:
  • group_by (str or list of str) – Name(s) of column(s) to group intervals

  • agg (dict or None) – Defines how to aggregate metadata columns. Provided as a dictionary mapping column names to functions, function names, or lists of such, as accepted by the pandas.DataFrame.agg method.

Returns:

One interval per group, with the min(Start) and max(End) of the group

Return type:

PyRanges

Examples

>>> d = {"Chromosome": [1, 1, 1], "Start": [1, 60, 110], "End": [40, 68, 130], "transcript_id": ["tr1", "tr1", "tr2"], "meta": ["a", "b", "c"]}
>>> gr = pr.from_dict(d)
>>> gr.length=gr.lengths()
>>> gr
+--------------+-----------+-----------+-----------------+------------+-----------+
|   Chromosome |     Start |       End | transcript_id   | meta       |    length |
|   (category) |   (int64) |   (int64) | (object)        | (object)   |   (int64) |
|--------------+-----------+-----------+-----------------+------------+-----------|
|            1 |         1 |        40 | tr1             | a          |        39 |
|            1 |        60 |        68 | tr1             | b          |         8 |
|            1 |       110 |       130 | tr2             | c          |        20 |
+--------------+-----------+-----------+-----------------+------------+-----------+
Unstranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.boundaries("transcript_id")
+--------------+-----------+-----------+-----------------+
|   Chromosome |     Start |       End | transcript_id   |
|   (category) |   (int64) |   (int64) | (object)        |
|--------------+-----------+-----------+-----------------|
|            1 |         1 |        68 | tr1             |
|            1 |       110 |       130 | tr2             |
+--------------+-----------+-----------+-----------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.boundaries("transcript_id", agg={"length":"sum", "meta": ",".join})
+--------------+-----------+-----------+-----------------+------------+-----------+
|   Chromosome |     Start |       End | transcript_id   | meta       |    length |
|   (category) |   (int64) |   (int64) | (object)        | (object)   |   (int64) |
|--------------+-----------+-----------+-----------------+------------+-----------|
|            1 |         1 |        68 | tr1             | a,b        |        47 |
|            1 |       110 |       130 | tr2             | c          |        20 |
+--------------+-----------+-----------+-----------------+------------+-----------+
Unstranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
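
The min(Start)/max(End) rule can be sketched in plain Python. The snippet assumes rows are dictionaries and that groups never span chromosomes. The helper name boundaries_sketch is hypothetical and metadata aggregation is left out.

```python
rows = [
    {"Chromosome": "1", "Start": 1, "End": 40, "transcript_id": "tr1"},
    {"Chromosome": "1", "Start": 60, "End": 68, "transcript_id": "tr1"},
    {"Chromosome": "1", "Start": 110, "End": 130, "transcript_id": "tr2"},
]


def boundaries_sketch(rows, group_by):
    """Hypothetical helper: one interval per group, spanning all its members."""
    spans = {}
    for row in rows:
        key = (row["Chromosome"], row[group_by])
        start, end = spans.get(key, (row["Start"], row["End"]))
        # Widen the span so it covers every interval seen for this group.
        spans[key] = (min(start, row["Start"]), max(end, row["End"]))
    return [
        {"Chromosome": chrom, "Start": start, "End": end, group_by: group}
        for (chrom, group), (start, end) in spans.items()
    ]


result = boundaries_sketch(rows, "transcript_id")
for row in result:
    print(row)
```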
calculate_frame(by)

Calculate the frame of each genomic interval, assuming all are coding sequences (CDS), and add it as a column inplace.

After this, the input PyRanges contains an added “Frame” column, which identifies the base of the CDS that is the first base of a codon. Values range from 0 to 2 inclusive: 0 indicates that the first base of the CDS is the first base of a codon, 1 indicates the second base, and 2 indicates the third base of the CDS. The 5’-most interval of each transcript always has frame 0, while the following intervals may have any of these values.

Parameters:

by (str or list of str) – Column(s) by which to group the intervals: coding exons belonging to the same transcript have the same values in this/these column(s).

Returns:

The “Frame” column is added inplace.

Return type:

None

Examples

>>> p= pr.from_dict({"Chromosome": [1,1,1,2,2],
...                  "Strand": ["+","+","+","-","-"],
...                  "Start": [1,31,52,101,201],
...                  "End": [10,45,90,130,218],
...                  "transcript_id": ["t1","t1","t1","t2","t2"] })
>>> p
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        10 | t1              |
|            1 | +            |        31 |        45 | t1              |
|            1 | +            |        52 |        90 | t1              |
|            2 | -            |       101 |       130 | t2              |
|            2 | -            |       201 |       218 | t2              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> p.calculate_frame(by=['transcript_id'])
>>> p
+--------------+--------------+-----------+-----------+-----------------+-----------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |     Frame |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |   (int64) |
|--------------+--------------+-----------+-----------+-----------------+-----------|
|            1 | +            |         1 |        10 | t1              |         0 |
|            1 | +            |        31 |        45 | t1              |         9 |
|            1 | +            |        52 |        90 | t1              |        23 |
|            2 | -            |       101 |       130 | t2              |        17 |
|            2 | -            |       201 |       218 | t2              |         0 |
+--------------+--------------+-----------+-----------+-----------------+-----------+
Stranded PyRanges object has 5 rows and 6 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
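
The Frame values printed in the example above equal the combined length of the coding intervals that precede each interval in its transcript, read in 5’ to 3’ order. The plain-Python sketch below reproduces those numbers under that reading. The helper name preceding_cds_length is hypothetical, interval length is assumed to be End - Start, and nothing here is taken from the library's implementation.

```python
from collections import defaultdict

# Rows as (chromosome, strand, start, end, transcript_id), as in the example above.
rows = [
    ("1", "+", 1, 10, "t1"),
    ("1", "+", 31, 45, "t1"),
    ("1", "+", 52, 90, "t1"),
    ("2", "-", 101, 130, "t2"),
    ("2", "-", 201, 218, "t2"),
]


def preceding_cds_length(rows):
    """Hypothetical helper: total CDS length that precedes each interval (5' to 3')."""
    by_transcript = defaultdict(list)
    for index, (_chrom, strand, start, end, transcript) in enumerate(rows):
        by_transcript[transcript].append((index, strand, start, end))

    result = [0] * len(rows)
    for members in by_transcript.values():
        # On the reverse strand the 5'-most interval has the largest Start.
        reverse = members[0][1] == "-"
        ordered = sorted(members, key=lambda m: m[2], reverse=reverse)
        total = 0
        for index, _strand, start, end in ordered:
            result[index] = total
            total += end - start
    return result


offsets = preceding_cds_length(rows)
carried_over = [offset % 3 for offset in offsets]
print(offsets)       # [0, 9, 23, 17, 0]
print(carried_over)  # [0, 0, 2, 2, 0]
```

Assuming translation starts at the first base of the 5’-most interval, reducing an offset modulo 3 gives the number of bases of an unfinished codon carried into the interval.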
cluster(strand=None, by=None, slack=0, count=False, nb_cpu=1)

Give overlapping intervals a common id.

Parameters:
  • strand (bool, default None, i.e. auto) – Whether to ignore strand information if PyRanges is stranded.

  • by (str or list, default None) – Only intervals with an equal value in column(s) by are clustered.

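  • slack (int, default 0) – Consider intervals separated by less than slack to be in the same cluster. If slack is negative, intervals overlapping less than slack are not considered to be in the same cluster.

  • count (bool, default False) – Whether to add a column “Count” with the number of intervals in each cluster.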

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

PyRanges with an ID-column “Cluster” added.

Return type:

PyRanges

Warning

Bookended intervals (i.e. the End of a PyRanges interval is the Start of another one) are by default considered to overlap. Avoid this with slack=-1.

See also

PyRanges.merge

combine overlapping intervals into one

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1, 1, 1], "Start": [1, 2, 3, 9],
...                    "End": [3, 3, 10, 12], "Gene": [1, 2, 3, 3]})
>>> gr
+--------------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |      Gene |
|   (category) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------|
|            1 |         1 |         3 |         1 |
|            1 |         2 |         3 |         2 |
|            1 |         3 |        10 |         3 |
|            1 |         9 |        12 |         3 |
+--------------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.cluster()
+--------------+-----------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |      Gene |   Cluster |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------|
|            1 |         1 |         3 |         1 |         1 |
|            1 |         2 |         3 |         2 |         1 |
|            1 |         3 |        10 |         3 |         1 |
|            1 |         9 |        12 |         3 |         1 |
+--------------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.cluster(by="Gene", count=True)
+--------------+-----------+-----------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |      Gene |   Cluster |     Count |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------|
|            1 |         1 |         3 |         1 |         1 |         1 |
|            1 |         2 |         3 |         2 |         2 |         1 |
|            1 |         3 |        10 |         3 |         3 |         2 |
|            1 |         9 |        12 |         3 |         3 |         2 |
+--------------+-----------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

Avoid clustering bookended intervals with slack=-1:

>>> gr.cluster(slack=-1)
+--------------+-----------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |      Gene |   Cluster |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------|
|            1 |         1 |         3 |         1 |         1 |
|            1 |         2 |         3 |         2 |         1 |
|            1 |         3 |        10 |         3 |         2 |
|            1 |         9 |        12 |         3 |         2 |
+--------------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.data.ensembl_gtf()[["Feature", "Source"]]
>>> gr2.cluster(by=["Feature", "Source"])
+--------------+--------------+---------------+-----------+-----------+--------------+-----------+
| Chromosome   | Feature      | Source        | Start     | End       | Strand       | Cluster   |
| (category)   | (category)   | (object)      | (int64)   | (int64)   | (category)   | (int64)   |
|--------------+--------------+---------------+-----------+-----------+--------------+-----------|
| 1            | CDS          | ensembl       | 69090     | 70005     | +            | 1         |
| 1            | CDS          | ensembl       | 925941    | 926013    | +            | 2         |
| 1            | CDS          | ensembl       | 925941    | 926013    | +            | 2         |
| 1            | CDS          | ensembl       | 925941    | 926013    | +            | 2         |
| ...          | ...          | ...           | ...       | ...       | ...          | ...       |
| 1            | transcript   | havana_tagene | 167128    | 169240    | -            | 1142      |
| 1            | transcript   | mirbase       | 17368     | 17436     | -            | 1143      |
| 1            | transcript   | mirbase       | 187890    | 187958    | -            | 1144      |
| 1            | transcript   | mirbase       | 632324    | 632413    | -            | 1145      |
+--------------+--------------+---------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
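
The effect of slack on bookended intervals can be sketched with a single sweep over sorted intervals. This is an illustration on one chromosome only, not the library's algorithm. The helper name cluster_sketch is hypothetical, and the boundary rule (a new cluster starts when Start exceeds the running maximum End plus slack) is an assumption chosen to match the outputs shown above.

```python
def cluster_sketch(intervals, slack=0):
    """Hypothetical helper: return a 1-based cluster id for each (start, end)."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    ids = [0] * len(intervals)
    cluster_id, max_end = 0, None
    for i in order:
        start, end = intervals[i]
        if max_end is None or start > max_end + slack:
            # Too far from everything seen so far: open a new cluster.
            cluster_id += 1
            max_end = end
        else:
            max_end = max(max_end, end)
        ids[i] = cluster_id
    return ids


intervals = [(1, 3), (2, 3), (3, 10), (9, 12)]

default_ids = cluster_sketch(intervals)           # bookended intervals join
strict_ids = cluster_sketch(intervals, slack=-1)  # bookended intervals split
print(default_ids)  # [1, 1, 1, 1]
print(strict_ids)   # [1, 1, 2, 2]
```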
copy()

Make a deep copy of the PyRanges.

Notes

See the pandas docs for deep-copying caveats.

count_overlaps(other, strandedness=None, keep_nonoverlapping=True, overlap_col='NumberOverlaps')

Count number of overlaps per interval.

Count how many intervals in self overlap with those in other.

Parameters:
  • strandedness ({"same", "opposite", None, False}, default None, i.e. auto) – Whether to perform the operation on the same, opposite or no strand. Use False to ignore the strand. None means use “same” if both PyRanges are stranded, otherwise ignore.

  • keep_nonoverlapping (bool, default True) – Keep intervals without overlaps.

  • overlap_col (str, default "NumberOverlaps") – Name of column with overlap counts.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

PyRanges with a column of overlaps added.

Return type:

PyRanges

See also

PyRanges.coverage

find coverage of PyRanges

pyranges.count_overlaps

count overlaps from multiple PyRanges

Examples

>>> f1 = pr.data.f1().drop()
>>> f1
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         6 | +            |
| chr1         |         8 |         9 | +            |
| chr1         |         5 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f2 = pr.data.f2().drop()
>>> f2
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         2 | +            |
| chr1         |         6 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.count_overlaps(f2, overlap_col="Count")
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Strand       |     Count |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
| chr1         |         3 |         6 | +            |         0 |
| chr1         |         8 |         9 | +            |         0 |
| chr1         |         5 |         7 | -            |         1 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
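
The counts above can be reproduced with a brute-force sketch in plain Python. It assumes half-open intervals (Start included, End excluded), so two intervals overlap when each starts before the other ends. The helper name count_overlaps_sketch and its same_strand flag are hypothetical stand-ins for the strandedness options.

```python
# Rows as (chromosome, start, end, strand), as in the example above.
f1 = [("chr1", 3, 6, "+"), ("chr1", 8, 9, "+"), ("chr1", 5, 7, "-")]
f2 = [("chr1", 1, 2, "+"), ("chr1", 6, 7, "-")]


def count_overlaps_sketch(queries, subjects, same_strand=True):
    """Hypothetical helper: number of subject intervals overlapping each query."""
    counts = []
    for chrom, start, end, strand in queries:
        n = 0
        for s_chrom, s_start, s_end, s_strand in subjects:
            if chrom != s_chrom:
                continue
            if same_strand and strand != s_strand:
                continue
            # Half-open overlap test: each interval starts before the other ends.
            if start < s_end and s_start < end:
                n += 1
        counts.append(n)
    return counts


stranded = count_overlaps_sketch(f1, f2)
unstranded = count_overlaps_sketch(f1, f2, same_strand=False)
print(stranded)    # [0, 0, 1]
print(unstranded)  # [0, 0, 1]
```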
coverage(other, strandedness=None, keep_nonoverlapping=True, overlap_col='NumberOverlaps', fraction_col='FractionOverlaps', nb_cpu=1)

Count number of overlaps and their fraction per interval.

Count how many intervals in self overlap with those in other.

Parameters:
  • strandedness ({"same", "opposite", None, False}, default None, i.e. auto) – Whether to perform the operation on the same, opposite or no strand. Use False to ignore the strand. None means use “same” if both PyRanges are stranded, otherwise ignore.

  • keep_nonoverlapping (bool, default True) – Keep intervals without overlaps.

  • overlap_col (str, default "NumberOverlaps") – Name of column with overlap counts.

  • fraction_col (str, default "FractionOverlaps") – Name of column with the fraction of each interval that is covered by overlaps.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

PyRanges with a column of overlaps added.

Return type:

PyRanges

See also

pyranges.count_overlaps

count overlaps from multiple PyRanges

Examples

>>> f1 = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [3, 8, 5],
...                    "End": [6,  9, 7]})
>>> f1
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         3 |         6 |
|            1 |         8 |         9 |
|            1 |         5 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f2 = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 6],
...                    "End": [2, 7]})
>>> f2
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         1 |         2 |
|            1 |         6 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f1.coverage(f2, overlap_col="C", fraction_col="F")
+--------------+-----------+-----------+-----------+-------------+
|   Chromosome |     Start |       End |         C |           F |
|   (category) |   (int64) |   (int64) |   (int64) |   (float64) |
|--------------+-----------+-----------+-----------+-------------|
|            1 |         3 |         6 |         0 |         0   |
|            1 |         8 |         9 |         0 |         0   |
|            1 |         5 |         7 |         1 |         0.5 |
+--------------+-----------+-----------+-----------+-------------+
Unstranded PyRanges object has 3 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
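
The two added columns can be sketched for a single chromosome in plain Python. The snippet assumes half-open intervals of non-zero length and counts each covered base once even when several intervals in other overlap it. That second point is an assumption about how the fraction is defined, and the helper name coverage_sketch is hypothetical.

```python
def coverage_sketch(query, subjects):
    """Hypothetical helper: (number of overlaps, fraction of query covered)."""
    start, end = query
    hits = sorted((s, e) for s, e in subjects if s < end and start < e)
    covered, cursor = 0, start
    for s, e in hits:
        # Clip to the query and skip bases already counted for earlier hits.
        lo, hi = max(s, cursor), min(e, end)
        if hi > lo:
            covered += hi - lo
            cursor = hi
    return len(hits), covered / (end - start)


f1 = [(3, 6), (8, 9), (5, 7)]
f2 = [(1, 2), (6, 7)]

results = [coverage_sketch(query, f2) for query in f1]
print(results)  # [(0, 0.0), (0, 0.0), (1, 0.5)]
```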
drop(drop=None, like=None)

Drop column(s).

If no arguments are given, all the columns except Chromosome, Start, End and Strand are dropped.

Parameters:
  • drop (str or list, default None) – Columns to drop.

  • like (str, default None) – Regex-string matching columns to drop. Matches with Chromosome, Start, End or Strand are ignored.

See also

PyRanges.unstrand

drop strand information

Examples

>>> gr = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 4], "End": [5, 6],
...                    "Strand": ["+", "-"], "Count": [1, 2],
...                    "Type": ["exon", "exon"]})
>>> gr
+--------------+-----------+-----------+--------------+-----------+------------+
|   Chromosome |     Start |       End | Strand       |     Count | Type       |
|   (category) |   (int64) |   (int64) | (category)   |   (int64) | (object)   |
|--------------+-----------+-----------+--------------+-----------+------------|
|            1 |         1 |         5 | +            |         1 | exon       |
|            1 |         4 |         6 | -            |         2 | exon       |
+--------------+-----------+-----------+--------------+-----------+------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.drop()
+--------------+-----------+-----------+--------------+
|   Chromosome |     Start |       End | Strand       |
|   (category) |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
|            1 |         1 |         5 | +            |
|            1 |         4 |         6 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Matches with position-columns are ignored:

>>> gr.drop(like="Chromosome|Strand")
+--------------+-----------+-----------+--------------+-----------+------------+
|   Chromosome |     Start |       End | Strand       |     Count | Type       |
|   (category) |   (int64) |   (int64) | (category)   |   (int64) | (object)   |
|--------------+-----------+-----------+--------------+-----------+------------|
|            1 |         1 |         5 | +            |         1 | exon       |
|            1 |         4 |         6 | -            |         2 | exon       |
+--------------+-----------+-----------+--------------+-----------+------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.drop(like="e$")
+--------------+-----------+-----------+--------------+-----------+
|   Chromosome |     Start |       End | Strand       |     Count |
|   (category) |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
|            1 |         1 |         5 | +            |         1 |
|            1 |         4 |         6 | -            |         2 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
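
The way like interacts with the protected position columns can be sketched on column names alone. The snippet assumes the pattern is matched anywhere in the name, as re.search does. The helper name columns_after_like is hypothetical and the explicit drop argument is not modelled.

```python
import re

POSITION_COLUMNS = {"Chromosome", "Start", "End", "Strand"}


def columns_after_like(columns, like=None):
    """Hypothetical helper: column names left after dropping regex matches."""
    if like is None:
        # No arguments: keep only the position columns.
        return [c for c in columns if c in POSITION_COLUMNS]
    pattern = re.compile(like)
    return [
        c for c in columns
        if c in POSITION_COLUMNS or pattern.search(c) is None
    ]


columns = ["Chromosome", "Start", "End", "Strand", "Count", "Type"]

print(columns_after_like(columns))
print(columns_after_like(columns, like="Chromosome|Strand"))
print(columns_after_like(columns, like="e$"))
```

With like="e$" both Chromosome and Type match, but only Type is removed because Chromosome is a position column.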
drop_duplicate_positions(strand=None, keep='first')

Return PyRanges with duplicate position rows removed.

Parameters:
  • strand (bool, default None, i.e. auto) – Whether to take strand-information into account when considering duplicates.

  • keep ({"first", "last", False}) – Whether to keep first, last or drop all duplicates.

Examples

>>> gr = pr.from_string('''Chromosome Start End Strand Name
... 1 1 2 + A
... 1 1 2 - B
... 1 1 2 + Z''')
>>> gr
+--------------+-----------+-----------+--------------+------------+
|   Chromosome |     Start |       End | Strand       | Name       |
|   (category) |   (int64) |   (int64) | (category)   | (object)   |
|--------------+-----------+-----------+--------------+------------|
|            1 |         1 |         2 | +            | A          |
|            1 |         1 |         2 | +            | Z          |
|            1 |         1 |         2 | -            | B          |
+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.drop_duplicate_positions()
+--------------+-----------+-----------+--------------+------------+
|   Chromosome |     Start |       End | Strand       | Name       |
|   (category) |   (int64) |   (int64) | (category)   | (object)   |
|--------------+-----------+-----------+--------------+------------|
|            1 |         1 |         2 | +            | A          |
|            1 |         1 |         2 | -            | B          |
+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.drop_duplicate_positions(keep="last")
+--------------+-----------+-----------+--------------+------------+
|   Chromosome |     Start |       End | Strand       | Name       |
|   (category) |   (int64) |   (int64) | (category)   | (object)   |
|--------------+-----------+-----------+--------------+------------|
|            1 |         1 |         2 | +            | Z          |
|            1 |         1 |         2 | -            | B          |
+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Note that the reverse strand is considered to be behind the forward strand:

>>> gr.drop_duplicate_positions(keep="last", strand=False)
+--------------+-----------+-----------+--------------+------------+
|   Chromosome |     Start |       End | Strand       | Name       |
|   (category) |   (int64) |   (int64) | (category)   | (object)   |
|--------------+-----------+-----------+--------------+------------|
|            1 |         1 |         2 | -            | B          |
+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.drop_duplicate_positions(keep=False, strand=False)
Empty PyRanges
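
The interplay of keep and strand can be pictured with a plain-Python sketch. This is not the pyranges implementation: drop_duplicate_positions_sketch and the tuple layout are hypothetical, rows are assumed to arrive in the printed order (forward strand before reverse strand), and strand defaults to True here rather than to auto-detection.

```python
from collections import defaultdict

def drop_duplicate_positions_sketch(rows, strand=True, keep="first"):
    """rows: list of (chromosome, start, end, strand, name) tuples (hypothetical layout)."""
    groups = defaultdict(list)
    for row in rows:
        chrom, start, end, strand_value, _name = row
        # With strand=False, rows on different strands count as the same position.
        key = (chrom, start, end, strand_value) if strand else (chrom, start, end)
        groups[key].append(row)

    kept = []
    for members in groups.values():
        if len(members) == 1 or keep == "first":
            kept.append(members[0])
        elif keep == "last":
            kept.append(members[-1])
        # keep=False: every row of a duplicated position is dropped
    return kept

rows = [("1", 1, 2, "+", "A"), ("1", 1, 2, "+", "Z"), ("1", 1, 2, "-", "B")]
print([r[4] for r in drop_duplicate_positions_sketch(rows)])
print([r[4] for r in drop_duplicate_positions_sketch(rows, keep="last", strand=False)])
```

With strand=False all three rows share one position, so keep="last" retains only the reverse-strand row, matching the example above.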
extend(ext, group_by=None)

Extend the intervals from the ends.

Parameters:
  • ext (int or dict of ints with "3" and/or "5" as keys) – The number of nucleotides by which to extend the ends. If an int is provided, the same extension is applied to both the start and the end of each interval, while a dict allows the two ends to be extended by different amounts. The 5’ and 3’ extensions take the strand into account if the intervals are stranded.

  • group_by (str or list of str, default None) – Group intervals by these column name(s), so that the extension is applied only to the left-most and/or right-most interval of each group.

See also

PyRanges.subsequence

obtain subsequences of intervals

PyRanges.spliced_subsequence

obtain subsequences of intervals, providing transcript-level coordinates

Examples

>>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], 'End': [6, 9, 7],
...      'Strand': ['+', '+', '-']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         6 | +            |
| chr1         |         8 |         9 | +            |
| chr1         |         5 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.extend(4)
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         0 |        10 | +            |
| chr1         |         4 |        13 | +            |
| chr1         |         1 |        11 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.extend({"3": 1})
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         7 | +            |
| chr1         |         8 |        10 | +            |
| chr1         |         4 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.extend({"3": 1, "5": 2})
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         7 | +            |
| chr1         |         6 |        10 | +            |
| chr1         |         4 |         9 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.extend(-1)
Traceback (most recent call last):
...
AssertionError: Some intervals are negative or zero length after applying extend!
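
The strand-aware arithmetic behind these outputs can be sketched in plain Python. This is an illustration inferred from the examples above, not the pyranges code: extend_sketch is a hypothetical name, group_by is not modelled, clipping of starts at 0 is read off the gr.extend(4) output, and a ValueError stands in for the AssertionError that pyranges raises.

```python
def extend_sketch(intervals, ext):
    """intervals: list of (start, end, strand); ext: int or dict with "5"/"3" keys."""
    if isinstance(ext, int):
        five = three = ext
    else:
        five, three = ext.get("5", 0), ext.get("3", 0)

    out = []
    for start, end, strand in intervals:
        if strand == "-":
            # On the reverse strand the 5' end is End and the 3' end is Start.
            new_start, new_end = start - three, end + five
        else:
            new_start, new_end = start - five, end + three
        new_start = max(new_start, 0)  # assumption: starts are clipped at 0
        if new_end <= new_start:
            raise ValueError("interval has zero or negative length after extension")
        out.append((new_start, new_end, strand))
    return out

intervals = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
print(extend_sketch(intervals, {"3": 1, "5": 2}))
# [(1, 7, '+'), (6, 10, '+'), (4, 9, '-')]
```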
five_end()

Return the five prime end of intervals.

The five prime end is the start of an interval on the forward strand or the end of an interval on the reverse strand.

Returns:

PyRanges with the five prime ends

Return type:

PyRanges

Notes

Requires the PyRanges to be stranded.

See also

PyRanges.three_end

return the 3’ end

Examples

>>> gr = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [3, 5], 'End': [9, 7],
...                    'Strand': ["+", "-"]})
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         9 | +            |
| chr1         |         5 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.five_end()
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         4 | +            |
| chr1         |         6 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
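
As a minimal plain-Python sketch of the rule above (five_end_sketch is a hypothetical helper, not part of pyranges), each interval is reduced to the single position at its 5’ end, using half-open coordinates:

```python
def five_end_sketch(intervals):
    """intervals: list of (start, end, strand) with half-open [start, end) coordinates."""
    out = []
    for start, end, strand in intervals:
        if strand == "+":
            out.append((start, start + 1, strand))  # first position of the interval
        elif strand == "-":
            out.append((end - 1, end, strand))      # last position of the interval
        else:
            raise ValueError("five_end requires stranded intervals")
    return out

print(five_end_sketch([(3, 9, "+"), (5, 7, "-")]))
# [(3, 4, '+'), (6, 7, '-')]
```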
head(n=8)

Return the first n rows.

Parameters:

n (int, default 8) – Return n rows.

Returns:

PyRanges with the first n rows.

Return type:

PyRanges

See also

PyRanges.tail

return the last rows

PyRanges.sample

return random rows

Examples

>>> gr = pr.data.chipseq()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         | 0         | +            |
| chr1         | 169887529 | 169887554 | U0         | 0         | +            |
| chr1         | 216711011 | 216711036 | U0         | 0         | +            |
| chr1         | 144227079 | 144227104 | U0         | 0         | +            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 13517892  | 13517917  | U0         | 0         | -            |
| chrY         | 8010951   | 8010976   | U0         | 0         | -            |
| chrY         | 7405376   | 7405401   | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.head(3)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         |         0 | +            |
| chr1         | 169887529 | 169887554 | U0         |         0 | +            |
| chr1         | 216711011 | 216711036 | U0         |         0 | +            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
insert(other, loc=None)

Add one or more columns to the PyRanges.

Parameters:
  • other (Series, DataFrame or dict) – Data to insert into the PyRanges. other must have the same number of rows as the PyRanges.

  • loc (int, default None, i.e. after last column of PyRanges.) – Insertion index.

Returns:

A copy of the PyRanges with the column(s) inserted starting at loc.

Return type:

PyRanges

Note

If a Series, or a dict of Series is used, the Series must have a name.

Examples

>>> gr = pr.from_dict({"Chromosome": ["L", "E", "E", "T"], "Start": [1, 1, 2, 3], "End": [5, 8, 13, 21]})
>>> gr
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| E            |         1 |         8 |
| E            |         2 |        13 |
| L            |         1 |         5 |
| T            |         3 |        21 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 3 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> s = pd.Series(data = [1, 3, 3, 7], name="Column")
>>> gr.insert(s)
+--------------+-----------+-----------+-----------+
| Chromosome   |     Start |       End |    Column |
| (category)   |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------|
| E            |         1 |         8 |         1 |
| E            |         2 |        13 |         3 |
| L            |         1 |         5 |         3 |
| T            |         3 |        21 |         7 |
+--------------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 4 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> df = pd.DataFrame({"NY": s, "AN": s})
>>> df
   NY  AN
0   1   1
1   3   3
2   3   3
3   7   7

Note that the original PyRanges was not affected by previously inserting Column:

>>> gr.insert(df, 1)
+--------------+-----------+-----------+-----------+-----------+
| Chromosome   |        NY |        AN |     Start |       End |
| (category)   |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------|
| E            |         1 |         1 |         1 |         8 |
| E            |         3 |         3 |         2 |        13 |
| L            |         3 |         3 |         1 |         5 |
| T            |         7 |         7 |         3 |        21 |
+--------------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> arbitrary_result = gr.apply(
... lambda df: pd.Series(df.Start + df.End, name="Hi!"), as_pyranges=False)
>>> arbitrary_result
{'E': 1     9
2    15
Name: Hi!, dtype: int64, 'L': 0    6
Name: Hi!, dtype: int64, 'T': 3    24
Name: Hi!, dtype: int64}
>>> gr.insert(arbitrary_result)
+--------------+-----------+-----------+-----------+
| Chromosome   |     Start |       End |       Hi! |
| (category)   |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------|
| E            |         1 |         8 |         9 |
| E            |         2 |        13 |        15 |
| L            |         1 |         5 |         6 |
| T            |         3 |        21 |        24 |
+--------------+-----------+-----------+-----------+
Unstranded PyRanges object has 4 rows and 4 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
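
The copy-and-insert behaviour of loc can be illustrated without pyranges. The sketch below uses a plain dict of column lists as a stand-in for a table; insert_columns_sketch and this layout are assumptions made for illustration only.

```python
def insert_columns_sketch(table, new_columns, loc=None):
    """table and new_columns map column names to equal-length lists (hypothetical layout)."""
    names = list(table)
    n_rows = len(table[names[0]])
    for name, values in new_columns.items():
        if len(values) != n_rows:
            raise ValueError(f"column {name!r} has {len(values)} rows, expected {n_rows}")
    if loc is None:
        loc = len(names)  # default: append after the last column
    ordered = names[:loc] + list(new_columns) + names[loc:]
    merged = {**table, **new_columns}
    # Build a new dict with copied lists so the input table is left untouched.
    return {name: list(merged[name]) for name in ordered}

table = {"Chromosome": ["E", "E", "L", "T"], "Start": [1, 2, 1, 3], "End": [8, 13, 5, 21]}
result = insert_columns_sketch(table, {"NY": [1, 3, 3, 7], "AN": [1, 3, 3, 7]}, loc=1)
print(list(result))
# ['Chromosome', 'NY', 'AN', 'Start', 'End']
```

The input table still has its original three columns afterwards, mirroring the fact that insert returns a copy.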
intersect(other, strandedness=None, how=None, invert=False, nb_cpu=1)

Return overlapping subintervals.

Returns the segments of the intervals in self which overlap with those in other.

Parameters:
  • other (PyRanges) – PyRanges to intersect.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • how ({None, "first", "last", "containment"}, default None, i.e. all) – What intervals to report. By default reports all overlapping intervals. “containment” reports only intervals in self that are entirely contained within an interval in other.

  • invert (bool, default False) – Whether to return the intervals without overlaps.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with overlapping subintervals.

Return type:

PyRanges

See also

PyRanges.set_intersect

set-intersect PyRanges

PyRanges.overlap

report overlapping intervals

Examples

>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.intersect(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         2 |         3 | a          |
| chr1         |         2 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.intersect(gr2, how="first")
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         2 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.intersect(gr2, how="containment")
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
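
The outputs above follow from simple interval arithmetic on half-open coordinates. The sketch below is a plain-Python illustration for a single chromosome, not the pyranges algorithm; intersect_sketch is a hypothetical name, and treating “first” as the first match in the order of other is an assumption.

```python
def intersect_sketch(self_rows, other_rows, how=None):
    """Rows are (start, end, *extra) on one chromosome, half-open [start, end)."""
    out = []
    for start, end, *extra in self_rows:
        for o_start, o_end, *_ in other_rows:
            if how == "containment":
                # Only report intervals of self lying entirely inside an interval of other.
                hit = o_start <= start and end <= o_end
            else:
                hit = start < o_end and o_start < end
            if not hit:
                continue
            # The reported subinterval is the shared segment; extra columns come from self.
            out.append((max(start, o_start), min(end, o_end), *extra))
            if how == "first":
                break  # assumption: stop at the first match in the order of other_rows
    return out

gr = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2 = [(2, 3), (2, 9), (9, 10)]
print(intersect_sketch(gr, gr2))
# [(2, 3, 'a'), (2, 3, 'a'), (4, 9, 'b')]
```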
items()

Return the pairs of keys and DataFrames.

Returns:

A list of (key, DataFrame) pairs, one for each chromosome or chromosome/strand key in the PyRanges.

Return type:

list of tuples

See also

PyRanges.chromosomes

return the chromosomes

PyRanges.keys

return the keys

PyRanges.values

return the DataFrames in the PyRanges

Examples

>>> gr = pr.data.f1()
>>> gr.items()
[(('chr1', '+'),   Chromosome  Start  End       Name  Score Strand
0       chr1      3    6  interval1      0      +
2       chr1      8    9  interval3      0      +), (('chr1', '-'),   Chromosome  Start  End       Name  Score Strand
1       chr1      5    7  interval2      0      -)]
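
The shape of this return value, a sorted list of (key, rows) pairs, can be mimicked in plain Python. In the sketch below, items_sketch is hypothetical, rows are plain dicts rather than DataFrames, and the built-in sorted stands in for whatever ordering pyranges applies.

```python
from collections import defaultdict

def items_sketch(rows, stranded=True):
    """Group row dicts by chromosome (or chromosome/strand) and return sorted pairs."""
    groups = defaultdict(list)
    for row in rows:
        key = (row["Chromosome"], row["Strand"]) if stranded else row["Chromosome"]
        groups[key].append(row)
    # Keys are unique, so sorting the pairs only ever compares the keys.
    return sorted(groups.items())

rows = [
    {"Chromosome": "chr1", "Start": 3, "End": 6, "Name": "interval1", "Strand": "+"},
    {"Chromosome": "chr1", "Start": 5, "End": 7, "Name": "interval2", "Strand": "-"},
    {"Chromosome": "chr1", "Start": 8, "End": 9, "Name": "interval3", "Strand": "+"},
]
for key, group in items_sketch(rows):
    print(key, [r["Name"] for r in group])
```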
join(other, strandedness=None, how=None, report_overlap=False, slack=0, suffix='_b', nb_cpu=1, apply_strand_suffix=None, preserve_order=False)

Join PyRanges on genomic location.

Parameters:
  • other (PyRanges) – PyRanges to join.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • how ({None, "left", "right"}, default None, i.e. "inner") – How to handle intervals without overlap. None means only keep overlapping intervals. “left” keeps all intervals in self, “right” keeps all intervals in other.

  • report_overlap (bool, default False) – Report amount of overlap in base pairs.

  • slack (int, default 0) – Lengthen intervals in self before joining.

  • suffix (str or tuple, default "_b") – Suffix to give columns in other whose names also occur in self.

  • apply_strand_suffix (bool, default None) – If the first PyRanges is unstranded but the second is not, the first will be given a strand column. apply_strand_suffix makes the added strand column a regular data column instead by adding a suffix.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • preserve_order (bool, default False) – If True, preserves the order after performing the join (only relevant in “outer”, “left” and “right” joins).

Returns:

A PyRanges appended with columns of another.

Return type:

PyRanges

Notes

The chromosome from other will never be reported as it is always the same as in self.

As pandas did not have NaN for non-float datatypes until recently, “left” and “right” join give non-overlapping rows the value -1 to avoid promoting columns to object. This will change to NaN in a future version as general NaN becomes stable in pandas.

See also

PyRanges.new_position

give joined PyRanges new coordinates

Examples

>>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5],
...                    'End': [6, 9, 7], 'Name': ['interval1', 'interval3', 'interval2']})
>>> f1
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         3 |         6 | interval1  |
| chr1         |         8 |         9 | interval3  |
| chr1         |         5 |         7 | interval2  |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...                    'End': [2, 7], 'Name': ['a', 'b']})
>>> f2
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         2 | a          |
| chr1         |         6 |         7 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f1.join(f2)
+--------------+-----------+-----------+------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |   Start_b |     End_b | Name_b     |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------+-----------+-----------+------------|
| chr1         |         5 |         7 | interval2  |         6 |         7 | b          |
+--------------+-----------+-----------+------------+-----------+-----------+------------+
Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f1.join(f2, how="right")
+--------------+-----------+-----------+------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |   Start_b |     End_b | Name_b     |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------+-----------+-----------+------------|
| chr1         |         5 |         7 | interval2  |         6 |         7 | b          |
| chr1         |        -1 |        -1 | -1         |         1 |         2 | a          |
+--------------+-----------+-----------+------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

With slack 1, bookended features are joined (see row 1):

>>> f1.join(f2, slack=1)
+--------------+-----------+-----------+------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |   Start_b |     End_b | Name_b     |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------+-----------+-----------+------------|
| chr1         |         3 |         6 | interval1  |         6 |         7 | b          |
| chr1         |         5 |         7 | interval2  |         6 |         7 | b          |
+--------------+-----------+-----------+------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> f1.join(f2, how="right", preserve_order=True)
+--------------+-----------+-----------+------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Name       |   Start_b |     End_b | Name_b     |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------+-----------+-----------+------------|
| chr1         |        -1 |        -1 | -1         |         1 |         2 | a          |
| chr1         |         5 |         7 | interval2  |         6 |         7 | b          |
+--------------+-----------+-----------+------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
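
The effect of slack and of the -1 fill values can be reproduced with a small plain-Python sketch for one chromosome. It is illustrative only: join_sketch and overlaps are hypothetical helpers, slack is assumed to lengthen both ends of the intervals in self, and row order is not guaranteed to match pyranges.

```python
def overlaps(a, b, slack=0):
    """Half-open overlap test after lengthening interval a by slack on both ends."""
    return a[0] - slack < b[1] and b[0] < a[1] + slack

def join_sketch(self_rows, other_rows, slack=0, how=None):
    """Rows are (start, end, name); the result concatenates the matching rows."""
    missing = (-1, -1, -1)  # placeholder for the side without a match
    out, matched_other = [], set()
    for a in self_rows:
        hits = [j for j, b in enumerate(other_rows) if overlaps(a, b, slack)]
        matched_other.update(hits)
        out.extend(a + other_rows[j] for j in hits)  # original coordinates are reported
        if how == "left" and not hits:
            out.append(a + missing)
    if how == "right":
        out.extend(missing + b for j, b in enumerate(other_rows) if j not in matched_other)
    return out

f1 = [(3, 6, "interval1"), (8, 9, "interval3"), (5, 7, "interval2")]
f2 = [(1, 2, "a"), (6, 7, "b")]
print(join_sketch(f1, f2, slack=1))
# [(3, 6, 'interval1', 6, 7, 'b'), (5, 7, 'interval2', 6, 7, 'b')]
```

Rows carrying the -1 placeholder can be filtered out afterwards by checking the start coordinate of the relevant side.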
keys()

Return the keys.

Returns:

The keys (chromosomes or chromosome/strand pairs) as strings or tuples of strings, in natsorted order.

See also

PyRanges.chromosomes

return the chromosomes

Examples

>>> gr = pr.data.chipseq()
>>> gr.keys()
[('chr1', '+'), ('chr1', '-'), ('chr2', '+'), ('chr2', '-'), ('chr3', '+'), ('chr3', '-'), ('chr4', '+'), ('chr4', '-'), ('chr5', '+'), ('chr5', '-'), ('chr6', '+'), ('chr6', '-'), ('chr7', '+'), ('chr7', '-'), ('chr8', '+'), ('chr8', '-'), ('chr9', '+'), ('chr9', '-'), ('chr10', '+'), ('chr10', '-'), ('chr11', '+'), ('chr11', '-'), ('chr12', '+'), ('chr12', '-'), ('chr13', '+'), ('chr13', '-'), ('chr14', '+'), ('chr14', '-'), ('chr15', '+'), ('chr15', '-'), ('chr16', '+'), ('chr16', '-'), ('chr17', '+'), ('chr17', '-'), ('chr18', '+'), ('chr18', '-'), ('chr19', '+'), ('chr19', '-'), ('chr20', '+'), ('chr20', '-'), ('chr21', '+'), ('chr21', '-'), ('chr22', '+'), ('chr22', '-'), ('chrX', '+'), ('chrX', '-'), ('chrY', '+'), ('chrY', '-')]
>>> gr.unstrand().keys()
['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY']
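
Natural ordering is what places chr2 before chr10 in the lists above, whereas plain string sorting would not. The sketch below shows one common way to build such a sort key with the standard library; natural_key is a hypothetical helper and only approximates the behaviour of a dedicated natural-sorting library.

```python
import re

def natural_key(name):
    """Split a name into text and number parts so that numbers compare numerically."""
    parts = re.split(r"([0-9]+)", name)
    # re.split with a capturing group puts the digit runs at the odd indices.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]

chromosomes = ["chr10", "chr2", "chrX", "chr1", "chrY"]
print(sorted(chromosomes))                   # plain string order
print(sorted(chromosomes, key=natural_key))  # natural order

# Stranded keys are (chromosome, strand) tuples; sort on the chromosome first.
pairs = [("chr10", "-"), ("chr2", "+"), ("chr10", "+"), ("chr2", "-")]
print(sorted(pairs, key=lambda k: (natural_key(k[0]), k[1])))
```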
k_nearest(other, k=1, ties=None, strandedness=None, overlap=True, how=None, suffix='_b', nb_cpu=1, apply_strand_suffix=None)

Find k nearest intervals.

Parameters:
  • other (PyRanges) – PyRanges to find nearest interval in.

  • k (int or list/array/Series of int) – Number of closest intervals to return. If an iterable, it must have the same length as the PyRanges.

  • ties ({None, "first", "last", "different"}, default None) – How to resolve ties, i.e. closest intervals with equal distance. None means that the k nearest intervals are kept. “first” means that the first tie is kept, “last” means that the last is kept. “different” means that all nearest intervals with the k unique nearest distances are kept.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • overlap (bool, default True) – Whether to include overlaps.

  • how ({None, "upstream", "downstream"}, default None, i.e. both directions) – Whether to only look for nearest in one direction. Always with respect to the PyRanges it is called on.

  • suffix (str, default "_b") – Suffix to give columns with shared name in other.

  • apply_strand_suffix (bool, default None) – If the first PyRanges is unstranded but the second is not, the first will be given a strand column. apply_strand_suffix makes the added strand column a regular data column instead by adding a suffix.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with columns of nearest interval horizontally appended.

Return type:

PyRanges

Notes

nearest also exists, and is more performant.

See also

PyRanges.new_position

give joined PyRanges new coordinates

PyRanges.nearest

find nearest intervals

Examples

>>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5],
...                    'End': [6, 9, 7], 'Strand': ['+', '+', '-']})
>>> f1
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         6 | +            |
| chr1         |         8 |         9 | +            |
| chr1         |         5 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...                    'End': [2, 7], 'Strand': ['+', '-']})
>>> f2
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         2 | +            |
| chr1         |         6 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.k_nearest(f2, k=2)
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |   Distance |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |   (int64) | (category)   |    (int64) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------|
| chr1         |         3 |         6 | +            |         6 |         7 | -            |          1 |
| chr1         |         3 |         6 | +            |         1 |         2 | +            |         -2 |
| chr1         |         8 |         9 | +            |         6 |         7 | -            |         -2 |
| chr1         |         8 |         9 | +            |         1 |         2 | +            |         -7 |
| chr1         |         5 |         7 | -            |         6 |         7 | -            |          0 |
| chr1         |         5 |         7 | -            |         1 |         2 | +            |          4 |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 6 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.k_nearest(f2, how="upstream", k=2)
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |   Distance |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |   (int64) | (category)   |    (int64) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------|
| chr1         |         3 |         6 | +            |         1 |         2 | +            |         -2 |
| chr1         |         8 |         9 | +            |         6 |         7 | -            |         -2 |
| chr1         |         8 |         9 | +            |         1 |         2 | +            |         -7 |
| chr1         |         5 |         7 | -            |         6 |         7 | -            |          0 |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 4 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.k_nearest(f2, k=[1, 2, 1])
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |   Distance |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |   (int64) | (category)   |    (int64) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------|
| chr1         |         3 |         6 | +            |         6 |         7 | -            |          1 |
| chr1         |         8 |         9 | +            |         6 |         7 | -            |         -2 |
| chr1         |         8 |         9 | +            |         1 |         2 | +            |         -7 |
| chr1         |         5 |         7 | -            |         6 |         7 | -            |          0 |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 4 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
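
The Distance column in these outputs appears to follow a signed convention: 0 for overlaps, 1 for bookended intervals, and a negative sign for intervals upstream of the query with respect to its strand. The plain-Python sketch below reproduces the f1/f2 outputs under that reading. It is inferred from the examples only; signed_distance and k_nearest_sketch are hypothetical, strand is ignored when pairing intervals (as in the outputs above), and ties, overlap and per-row k are not modelled.

```python
def signed_distance(query, subject):
    """query: (start, end, strand); subject: (start, end, ...). Half-open coordinates."""
    q_start, q_end, q_strand = query
    s_start, s_end = subject[:2]
    if s_start < q_end and q_start < s_end:
        return 0  # overlapping intervals
    if s_end <= q_start:
        gap, subject_is_left = q_start - s_end + 1, True
    else:
        gap, subject_is_left = s_start - q_end + 1, False
    # Upstream means to the left on the forward strand, to the right on the reverse strand.
    upstream = subject_is_left if q_strand != "-" else not subject_is_left
    return -gap if upstream else gap

def k_nearest_sketch(queries, subjects, k=1, how=None):
    out = []
    for query in queries:
        scored = [(signed_distance(query, s), s) for s in subjects]
        if how == "upstream":
            scored = [x for x in scored if x[0] <= 0]
        elif how == "downstream":
            scored = [x for x in scored if x[0] >= 0]
        scored.sort(key=lambda x: abs(x[0]))  # nearest first; stable for equal distances
        out.extend((query, s, d) for d, s in scored[:k])
    return out

f1 = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
f2 = [(1, 2, "+"), (6, 7, "-")]
print([d for _, _, d in k_nearest_sketch(f1, f2, k=2)])
# [1, -2, -2, -7, 0, 4]
```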
>>> d1 = {"Chromosome": [1], "Start": [5], "End": [6]}
>>> d2 = {"Chromosome": 1, "Start": [1] * 2 + [5] * 2 + [9] * 2,
...       "End": [3] * 2 + [7] * 2 + [11] * 2, "ID": range(6)}
>>> gr, gr2 = pr.from_dict(d1), pr.from_dict(d2)
>>> gr
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |         5 |         6 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2
+--------------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |        ID |
|   (category) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------|
|            1 |         1 |         3 |         0 |
|            1 |         1 |         3 |         1 |
|            1 |         5 |         7 |         2 |
|            1 |         5 |         7 |         3 |
|            1 |         9 |        11 |         4 |
|            1 |         9 |        11 |         5 |
+--------------+-----------+-----------+-----------+
Unstranded PyRanges object has 6 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.k_nearest(gr2, k=2)
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
|   Chromosome |     Start |       End |   Start_b |     End_b |        ID |   Distance |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |    (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+------------|
|            1 |         5 |         6 |         5 |         7 |         2 |          0 |
|            1 |         5 |         6 |         5 |         7 |         3 |          0 |
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.k_nearest(gr2, k=2, ties="different")
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
|   Chromosome |     Start |       End |   Start_b |     End_b |        ID |   Distance |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |    (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+------------|
|            1 |         5 |         6 |         5 |         7 |         2 |          0 |
|            1 |         5 |         6 |         5 |         7 |         3 |          0 |
|            1 |         5 |         6 |         1 |         3 |         1 |         -3 |
|            1 |         5 |         6 |         1 |         3 |         0 |         -3 |
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
Unstranded PyRanges object has 4 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.k_nearest(gr2, k=3, ties="first")
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
|   Chromosome |     Start |       End |   Start_b |     End_b |        ID |   Distance |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |    (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+------------|
|            1 |         5 |         6 |         5 |         7 |         2 |          0 |
|            1 |         5 |         6 |         1 |         3 |         1 |         -3 |
|            1 |         5 |         6 |         9 |        11 |         4 |          4 |
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.k_nearest(gr2, k=1, overlap=False)
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
|   Chromosome |     Start |       End |   Start_b |     End_b |        ID |   Distance |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |    (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+------------|
|            1 |         5 |         6 |         1 |         3 |         1 |         -3 |
+--------------+-----------+-----------+-----------+-----------+-----------+------------+
Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
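
The Distance values in the tables above follow a simple pattern, which can be sketched in plain Python. This is not the pyranges implementation; it is a minimal reconstruction inferred from the example output, assuming half-open (start, end) intervals and ignoring strand. The helper name signed_distance is hypothetical.

```python
def signed_distance(query, subject):
    """Signed distance between two half-open (start, end) intervals.

    Convention inferred from the Distance column in the examples:
    0 for overlaps, otherwise gap + 1, negative when the subject
    lies before the query.
    """
    q_start, q_end = query
    s_start, s_end = subject
    if s_end <= q_start:
        # Subject lies entirely to the left of the query.
        return -(q_start - s_end + 1)
    if s_start >= q_end:
        # Subject lies entirely to the right of the query.
        return s_start - q_end + 1
    # The intervals overlap.
    return 0


query = (5, 6)  # the single interval in gr
subjects = [(1, 3), (1, 3), (5, 7), (5, 7), (9, 11), (9, 11)]  # gr2
distances = [signed_distance(query, s) for s in subjects]
print(distances)  # [-3, -3, 0, 0, 4, 4]
```

The printed values match the Distance column shown for gr.k_nearest(gr2, ...) above; how ties are resolved is left out of this sketch.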
lengths(as_dict=False)

Return the length of each interval.

Parameters:

as_dict (bool, default False) – Whether to return lengths as Series or dict of Series per key.

Return type:

Series or dict of Series with the lengths of each interval.

See also

PyRanges.lengths

return the interval lengths

Examples

>>> gr = pr.data.f1()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.lengths()
0    3
1    1
2    2
dtype: int64

To find the length of the genome covered by the intervals, merge them first (see PyRanges.merge). The per-interval lengths can also be stored as a column:

>>> gr.Length = gr.lengths()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |    Length |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+------------+-----------+--------------+-----------|
| chr1         |         3 |         6 | interval1  |         0 | +            |         3 |
| chr1         |         8 |         9 | interval3  |         0 | +            |         1 |
| chr1         |         5 |         7 | interval2  |         0 | -            |         2 |
+--------------+-----------+-----------+------------+-----------+--------------+-----------+
Stranded PyRanges object has 3 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
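
The difference between summing interval lengths and measuring the covered genome can be illustrated without pyranges. The sketch below is an assumption-labelled stand-in: covered_length is a hypothetical helper that merges half-open (start, end) intervals on one chromosome, ignoring strand, before summing. With pyranges itself, something along the lines of gr.merge(strand=False).lengths().sum() would presumably give the same number.

```python
def covered_length(intervals):
    """Total number of positions covered by half-open (start, end) intervals."""
    total = 0
    current_start, current_end = None, None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            # Disjoint from the running block: close it and open a new one.
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            # Overlapping or touching: extend the running block.
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


intervals = [(3, 6), (8, 9), (5, 7)]  # coordinates of pr.data.f1()
summed = sum(end - start for start, end in intervals)
covered = covered_length(intervals)
print(summed, covered)  # 6 5
```

The summed lengths count position 5 twice because (3, 6) and (5, 7) overlap; merging first removes the double counting.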
max_disjoint(strand=None, slack=0, **kwargs)

Find the maximal disjoint set of intervals.

Parameters:
  • strand (bool, default None, i.e. auto) – Find the max disjoint set separately for each strand.

  • slack (int, default 0) – Consider intervals within a distance of slack to be overlapping.

Returns:

PyRanges with maximal disjoint set of intervals.

Return type:

PyRanges

Examples

>>> gr = pr.data.f1()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.max_disjoint(strand=False)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
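
One classic way to obtain a maximal disjoint set is the greedy "earliest end first" rule. The sketch below shows that rule on the coordinates from the example; it is an illustration only, and whether pyranges uses this exact procedure internally is an assumption. The slack parameter is not modelled.

```python
def max_disjoint(intervals):
    """Greedily pick a largest set of non-overlapping half-open intervals."""
    chosen = []
    last_end = None
    # Taking the interval that ends earliest leaves the most room for the rest.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


intervals = [(3, 6), (8, 9), (5, 7)]  # coordinates of pr.data.f1()
selected = max_disjoint(intervals)
print(selected)  # [(3, 6), (8, 9)]
```

The interval (5, 7) is dropped because it overlaps (3, 6), which agrees with the output of gr.max_disjoint(strand=False) above.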
merge(strand=None, count=False, count_col='Count', by=None, slack=0)

Merge overlapping intervals into one.

Parameters:
  • strand (bool, default None, i.e. auto) – Only merge intervals on same strand.

  • count (bool, default False) – Count intervals in each superinterval.

  • count_col (str, default "Count") – Name of column with counts.

  • by (str or list of str, default None) – Only merge intervals with equal values in these columns.

  • slack (int, default 0) – Allow this many nucleotides between each interval to merge.

Returns:

PyRanges with superintervals.

Return type:

PyRanges

Notes

To avoid losing metadata, use cluster instead. To apply a reduction function to the metadata, use pandas groupby.

See also

PyRanges.cluster

annotate overlapping intervals with common ID

Examples

>>> gr = pr.data.ensembl_gtf()[["Feature", "gene_name"]]
>>> gr
+--------------+--------------+-----------+-----------+--------------+-------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    |
|--------------+--------------+-----------+-----------+--------------+-------------|
| 1            | gene         | 11868     | 14409     | +            | DDX11L1     |
| 1            | transcript   | 11868     | 14409     | +            | DDX11L1     |
| 1            | exon         | 11868     | 12227     | +            | DDX11L1     |
| 1            | exon         | 12612     | 12721     | +            | DDX11L1     |
| ...          | ...          | ...       | ...       | ...          | ...         |
| 1            | gene         | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | transcript   | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1179364   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1173055   | 1176396   | -            | TTLL10-AS1  |
+--------------+--------------+-----------+-----------+--------------+-------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.merge(count=True, count_col="Count")
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   | Start     | End       | Strand       | Count     |
| (category)   | (int64)   | (int64)   | (category)   | (int64)   |
|--------------+-----------+-----------+--------------+-----------|
| 1            | 11868     | 14409     | +            | 12        |
| 1            | 29553     | 31109     | +            | 11        |
| 1            | 52472     | 53312     | +            | 3         |
| 1            | 57597     | 64116     | +            | 7         |
| ...          | ...       | ...       | ...          | ...       |
| 1            | 1062207   | 1063288   | -            | 4         |
| 1            | 1070966   | 1074306   | -            | 10        |
| 1            | 1081817   | 1116361   | -            | 319       |
| 1            | 1173055   | 1179555   | -            | 4         |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 62 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.merge(by="Feature", count=True)
+--------------+-----------+-----------+--------------+--------------+-----------+
| Chromosome   | Start     | End       | Strand       | Feature      | Count     |
| (category)   | (int64)   | (int64)   | (category)   | (category)   | (int64)   |
|--------------+-----------+-----------+--------------+--------------+-----------|
| 1            | 65564     | 65573     | +            | CDS          | 1         |
| 1            | 69036     | 70005     | +            | CDS          | 2         |
| 1            | 924431    | 924948    | +            | CDS          | 1         |
| 1            | 925921    | 926013    | +            | CDS          | 11        |
| ...          | ...       | ...       | ...          | ...          | ...       |
| 1            | 1062207   | 1063288   | -            | transcript   | 1         |
| 1            | 1070966   | 1074306   | -            | transcript   | 1         |
| 1            | 1081817   | 1116361   | -            | transcript   | 19        |
| 1            | 1173055   | 1179555   | -            | transcript   | 1         |
+--------------+-----------+-----------+--------------+--------------+-----------+
Stranded PyRanges object has 748 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.merge(by=["Feature", "gene_name"], count=True)
+--------------+-----------+-----------+--------------+--------------+-------------+-----------+
| Chromosome   | Start     | End       | Strand       | Feature      | gene_name   | Count     |
| (category)   | (int64)   | (int64)   | (category)   | (category)   | (object)    | (int64)   |
|--------------+-----------+-----------+--------------+--------------+-------------+-----------|
| 1            | 1020172   | 1020373   | +            | CDS          | AGRN        | 1         |
| 1            | 1022200   | 1022462   | +            | CDS          | AGRN        | 2         |
| 1            | 1034555   | 1034703   | +            | CDS          | AGRN        | 2         |
| 1            | 1035276   | 1035324   | +            | CDS          | AGRN        | 4         |
| ...          | ...       | ...       | ...          | ...          | ...         | ...       |
| 1            | 347981    | 348366    | -            | transcript   | RPL23AP24   | 1         |
| 1            | 1173055   | 1179555   | -            | transcript   | TTLL10-AS1  | 1         |
| 1            | 14403     | 29570     | -            | transcript   | WASH7P      | 1         |
| 1            | 185216    | 195411    | -            | transcript   | WASH9P      | 1         |
+--------------+-----------+-----------+--------------+--------------+-------------+-----------+
Stranded PyRanges object has 807 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
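
The effect of count and by can be sketched in plain Python. The code below is not the pyranges implementation: the helper names and the toy rows are hypothetical, strand is ignored, and it assumes that touching intervals merge when slack is 0, which the source does not confirm.

```python
from collections import defaultdict


def merge_with_count(intervals, slack=0):
    """Merge half-open (start, end) intervals into (start, end, count) tuples.

    Assumption: an interval joins the running superinterval when its start
    is at most `slack` past the running end.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + slack:
            prev_start, prev_end, count = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), count + 1)
        else:
            merged.append((start, end, 1))
    return merged


def merge_by(rows, slack=0):
    """Merge separately within each key, mimicking the idea behind `by`."""
    groups = defaultdict(list)
    for key, start, end in rows:
        groups[key].append((start, end))
    return {key: merge_with_count(ivs, slack) for key, ivs in groups.items()}


# Hypothetical (Feature, Start, End) rows, not taken from the Ensembl data.
rows = [("gene", 10, 50), ("exon", 10, 20), ("exon", 15, 30), ("exon", 40, 50)]
all_merged = merge_with_count([(s, e) for _, s, e in rows])
per_feature = merge_by(rows)
print(all_merged)   # [(10, 50, 4)]
print(per_feature)  # {'gene': [(10, 50, 1)], 'exon': [(10, 30, 2), (40, 50, 1)]}
```

Grouping by a column first yields more, smaller superintervals, which is why the row count grows from 62 to 748 in the examples above when by="Feature" is passed.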
mp(n=8, formatting=None)

Merge location and print.

See also

PyRanges.print

print PyRanges.

mpc(n=8, formatting=None)

Merge location, print and return self.

See also

PyRanges.print

print PyRanges.

msp(n=30, formatting=None)

Sort on location, merge location info and print.

See also

PyRanges.print

print PyRanges.

mspc(n=30, formatting=None)

Sort on location, merge location, print and return self.

See also

PyRanges.print

print PyRanges.

nearest(other, strandedness=None, overlap=True, how=None, suffix='_b', nb_cpu=1, apply_strand_suffix=None)

Find closest interval.

Parameters:
  • other (PyRanges) – PyRanges to find nearest interval in.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, on the opposite strand, or to ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • overlap (bool, default True) – Whether to include overlaps.

  • how ({None, "upstream", "downstream"}, default None, i.e. both directions) – Whether to only look for nearest in one direction. Always with respect to the PyRanges it is called on.

  • suffix (str, default "_b") – Suffix to give columns with shared name in other.

  • apply_strand_suffix (bool, default None) – If the first PyRanges is unstranded but the second is not, the first will be given the strand column of the second. apply_strand_suffix instead makes the added strand column a regular data column by adding a suffix.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with columns representing nearest interval horizontally appended.

Return type:

PyRanges

Notes

PyRanges.k_nearest also exists, but it is less performant.

See also

PyRanges.new_position

give joined PyRanges new coordinates

PyRanges.k_nearest

find k nearest intervals

Examples

>>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5],
...                    'End': [6, 9, 7], 'Strand': ['+', '+', '-']})
>>> f1
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         6 | +            |
| chr1         |         8 |         9 | +            |
| chr1         |         5 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...                    'End': [2, 7], 'Strand': ['+', '-']})
>>> f2
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         2 | +            |
| chr1         |         6 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.nearest(f2)
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |   Distance |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |   (int64) | (category)   |    (int64) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------|
| chr1         |         3 |         6 | +            |         6 |         7 | -            |          1 |
| chr1         |         8 |         9 | +            |         6 |         7 | -            |          2 |
| chr1         |         5 |         7 | -            |         6 |         7 | -            |          0 |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 3 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> f1.nearest(f2, how="upstream")
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
| Chromosome   |     Start |       End | Strand       |   Start_b |     End_b | Strand_b     |   Distance |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |   (int64) | (category)   |    (int64) |
|--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------|
| chr1         |         3 |         6 | +            |         1 |         2 | +            |          2 |
| chr1         |         8 |         9 | +            |         6 |         7 | -            |          2 |
| chr1         |         5 |         7 | -            |         6 |         7 | -            |          0 |
+--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 3 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
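
The selection rule behind nearest, including the overlap flag, can be sketched in plain Python. This is a reconstruction from the example output and not the pyranges implementation: it assumes half-open (start, end) intervals, ignores strand and direction (how), and uses hypothetical helper names.

```python
def distance(a, b):
    """Unsigned distance between half-open intervals.

    Convention inferred from the examples: 0 if overlapping, else gap + 1.
    """
    gap = max(a[0], b[0]) - min(a[1], b[1])
    return 0 if gap < 0 else gap + 1


def nearest(query, subjects, overlap=True):
    """Return (distance, subject) for the closest subject, or None."""
    candidates = [(distance(query, s), s) for s in subjects]
    if not overlap:
        # Drop overlapping subjects, as overlap=False would.
        candidates = [(d, s) for d, s in candidates if d > 0]
    return min(candidates, default=None)


f1 = [(3, 6), (8, 9), (5, 7)]
f2 = [(1, 2), (6, 7)]
results = [nearest(q, f2) for q in f1]
print(results)  # [(1, (6, 7)), (2, (6, 7)), (0, (6, 7))]
print(nearest((5, 7), f2, overlap=False))  # (4, (1, 2))
```

The first printed list matches the Start_b, End_b and Distance columns of f1.nearest(f2) above.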
new_position(new_pos, columns=None)

Give new position.

The join operation produces a PyRanges with two start columns and two end columns. This operation uses them to give the PyRanges a new position.

Parameters:
  • new_pos ({"union", "intersection", "swap"}) – Change of coordinates.

  • columns (tuple of str, default None, i.e. auto) – The name of the coordinate columns. By default uses the two first columns containing “Start” and the two first columns containing “End”.

See also

PyRanges.join

combine two PyRanges horizontally with SQL-style joins.

Returns:

PyRanges with new coordinates.

Return type:

PyRanges

Examples

>>> gr = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'],
...                    'Start': [3, 8, 5], 'End': [6, 9, 7]})
>>> gr
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         3 |         6 |
| chr1         |         8 |         9 |
| chr1         |         5 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...                     'End': [4, 7]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         1 |         4 |
| chr1         |         6 |         7 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j = gr.join(gr2)
>>> j
+--------------+-----------+-----------+-----------+-----------+
| Chromosome   |     Start |       End |   Start_b |     End_b |
| (category)   |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------|
| chr1         |         3 |         6 |         1 |         4 |
| chr1         |         5 |         7 |         6 |         7 |
+--------------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j.new_position("swap")
+--------------+-----------+-----------+-----------+-----------+
| Chromosome   |     Start |       End |   Start_b |     End_b |
| (category)   |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------|
| chr1         |         1 |         4 |         3 |         6 |
| chr1         |         6 |         7 |         5 |         7 |
+--------------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j.new_position("union").mp()
+--------------------+-----------+-----------+
| - Position -       |   Start_b |     End_b |
| (Multiple types)   |   (int64) |   (int64) |
|--------------------+-----------+-----------|
| chr1 1-6           |         1 |         4 |
| chr1 5-7           |         6 |         7 |
+--------------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j.new_position("intersection").mp()
+--------------------+-----------+-----------+
| - Position -       |   Start_b |     End_b |
| (Multiple types)   |   (int64) |   (int64) |
|--------------------+-----------+-----------|
| chr1 1-4           |         1 |         4 |
| chr1 6-7           |         6 |         7 |
+--------------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j2 = pr.from_dict({"Chromosome": [1], "Start": [3],
...                   "End": [4], "A": [1], "B": [3], "C": [2], "D": [5]})
>>> j2
+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |         A |         B |         C |         D |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+-----------|
|            1 |         3 |         4 |         1 |         3 |         2 |         5 |
+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> j2.new_position("intersection", ("A", "B", "C", "D"))
+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
|   Chromosome |     Start |       End |         A |         B |         C |         D |
|   (category) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |   (int64) |
|--------------+-----------+-----------+-----------+-----------+-----------+-----------|
|            1 |         2 |         3 |         1 |         3 |         2 |         5 |
+--------------+-----------+-----------+-----------+-----------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
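
The three coordinate changes can be sketched per row in plain Python. This is an illustration under stated assumptions, not the pyranges implementation: each row is taken to be (start, end, start_b, end_b), and the function below is a hypothetical stand-in that only mirrors the method name.

```python
def new_position(row, how):
    """Return the row (start, end, start_b, end_b) with new first coordinates."""
    start, end, start_b, end_b = row
    if how == "swap":
        # Exchange the two coordinate pairs.
        return (start_b, end_b, start, end)
    if how == "union":
        # Smallest span that contains both intervals.
        return (min(start, start_b), max(end, end_b), start_b, end_b)
    if how == "intersection":
        # Part shared by both intervals.
        return (max(start, start_b), min(end, end_b), start_b, end_b)
    raise ValueError(f"unknown new_pos: {how!r}")


joined = [(3, 6, 1, 4), (5, 7, 6, 7)]  # rows of j in the examples
swapped = [new_position(r, "swap") for r in joined]
unioned = [new_position(r, "union") for r in joined]
# Columns A, B, C, D of j2 read as (start, end, start_b, end_b).
intersected = new_position((1, 3, 2, 5), "intersection")
print(swapped)      # [(1, 4, 3, 6), (6, 7, 5, 7)]
print(unioned)      # [(1, 6, 1, 4), (5, 7, 6, 7)]
print(intersected)  # (2, 3, 2, 5)
```

The swapped and unioned rows agree with the "swap" and "union" outputs above, and the intersected row agrees with the Start and End values of the j2 example.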
overlap(other, strandedness=None, how='first', invert=False, nb_cpu=1)

Return overlapping intervals.

Returns the intervals in self which overlap with those in other.

Parameters:
  • other (PyRanges) – PyRanges to find overlaps with.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, on the opposite strand, or to ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • how ({"first", "containment", False, None}, default "first") – What intervals to report. By default, reports every interval in self that has an overlap once. None reports an interval once for each overlap. “containment” reports only the intervals in self that are contained within an interval in other.

  • invert (bool, default False) – Whether to return the intervals without overlaps.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with overlapping intervals.

Return type:

PyRanges

See also

PyRanges.intersect

report overlapping subintervals

PyRanges.set_intersect

set-intersect PyRanges

Examples

>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.overlap(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.overlap(gr2, how=None)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.overlap(gr2, how="containment")
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         4 |         9 | b          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.overlap(gr2, invert=True)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
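
The reporting modes shown above can be reproduced with a small plain-Python sketch. It is inferred from the example output and is not the pyranges implementation: intervals are half-open (start, end) tuples, strand is ignored, only how="first", how=None and how="containment" are modelled, and the helper names are hypothetical.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def contained(a, b):
    """True if interval a lies entirely within interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def overlap(self_ivs, other_ivs, how="first", invert=False):
    """Report intervals of self_ivs according to their hits in other_ivs."""
    match = contained if how == "containment" else overlaps
    result = []
    for iv in self_ivs:
        hits = sum(match(iv, other) for other in other_ivs)
        if invert:
            if hits == 0:
                result.append(iv)       # keep only intervals without hits
        elif how is None:
            result.extend([iv] * hits)  # one copy per overlapping interval
        elif hits:
            result.append(iv)           # report each hit interval once
    return result


gr = [(1, 3), (4, 9), (10, 11)]   # a, b, c
gr2 = [(2, 3), (2, 9), (9, 10)]
print(overlap(gr, gr2))                     # [(1, 3), (4, 9)]
print(overlap(gr, gr2, how=None))           # [(1, 3), (1, 3), (4, 9)]
print(overlap(gr, gr2, how="containment"))  # [(4, 9)]
print(overlap(gr, gr2, invert=True))        # [(10, 11)]
```

Note that (4, 9) and (9, 10) do not overlap under the half-open convention, which is why interval b is reported only once with how=None.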
pc(n=8, formatting=None)

Print and return self.

See also

PyRanges.print

print PyRanges.

print(n=8, merge_position=False, sort=False, formatting=None, chain=False)

Print the PyRanges.

Parameters:
  • n (int, default 8) – The number of rows to print.

  • merge_position (bool, default False) – Print location in same column to save screen space.

  • sort (bool or str, default False) – Sort the PyRanges before printing. Will print chromosomes or strands interleaved on sort columns.

  • formatting (dict, default None) – Formatting options per column.

  • chain (bool, default False) – Return the PyRanges. Useful to print intermediate results in call chains.

See also

PyRanges.pc

print chain

PyRanges.sp

sort print

PyRanges.mp

merge print

PyRanges.spc

sort print chain

PyRanges.mpc

merge print chain

PyRanges.msp

merge sort print

PyRanges.mspc

merge sort print chain

PyRanges.rp

raw print dictionary of DataFrames

Examples

>>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5000],
...      'End': [6, 9, 7000], 'Name': ['i1', 'i3', 'i2'],
...      'Score': [1.1, 2.3987, 5.9999995], 'Strand': ['+', '+', '-']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+------------+-------------+--------------+
| Chromosome   |     Start |       End | Name       |       Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (float64) | (category)   |
|--------------+-----------+-----------+------------+-------------+--------------|
| chr1         |         3 |         6 | i1         |      1.1    | +            |
| chr1         |         8 |         9 | i3         |      2.3987 | +            |
| chr1         |      5000 |      7000 | i2         |      6      | -            |
+--------------+-----------+-----------+------------+-------------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.print(formatting={"Start": "{:,}", "Score": "{:.2f}"})
+--------------+-----------+-----------+------------+-------------+--------------+
| Chromosome   | Start     |       End | Name       |       Score | Strand       |
| (category)   | (int64)   |   (int64) | (object)   |   (float64) | (category)   |
|--------------+-----------+-----------+------------+-------------+--------------|
| chr1         | 3         |         6 | i1         |         1.1 | +            |
| chr1         | 8         |         9 | i3         |         2.4 | +            |
| chr1         | 5,000     |      7000 | i2         |         6   | -            |
+--------------+-----------+-----------+------------+-------------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.print(merge_position=True) # gr.mp()
+--------------------+------------+-------------+
| - Position -       | Name       |       Score |
| (Multiple types)   | (object)   |   (float64) |
|--------------------+------------+-------------|
| chr1 3-6 +         | i1         |      1.1    |
| chr1 8-9 +         | i3         |      2.3987 |
| chr1 5000-7000 -   | i2         |      6      |
+--------------------+------------+-------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> chipseq = pr.data.chipseq()
>>> chipseq
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         | 0         | +            |
| chr1         | 169887529 | 169887554 | U0         | 0         | +            |
| chr1         | 216711011 | 216711036 | U0         | 0         | +            |
| chr1         | 144227079 | 144227104 | U0         | 0         | +            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 13517892  | 13517917  | U0         | 0         | -            |
| chrY         | 8010951   | 8010976   | U0         | 0         | -            |
| chrY         | 7405376   | 7405401   | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

To interleave strands in output, use print with sort=True:

>>> chipseq.print(sort=True, n=20) # chipseq.sp()
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 1325303   | 1325328   | U0         | 0         | -            |
| chr1         | 1541598   | 1541623   | U0         | 0         | +            |
| chr1         | 1599121   | 1599146   | U0         | 0         | +            |
| chr1         | 1820285   | 1820310   | U0         | 0         | -            |
| chr1         | 2448322   | 2448347   | U0         | 0         | -            |
| chr1         | 3046141   | 3046166   | U0         | 0         | -            |
| chr1         | 3437168   | 3437193   | U0         | 0         | -            |
| chr1         | 3504032   | 3504057   | U0         | 0         | +            |
| chr1         | 3637087   | 3637112   | U0         | 0         | -            |
| chr1         | 3681903   | 3681928   | U0         | 0         | -            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 15548022  | 15548047  | U0         | 0         | +            |
| chrY         | 16045242  | 16045267  | U0         | 0         | -            |
| chrY         | 16495497  | 16495522  | U0         | 0         | -            |
| chrY         | 21559181  | 21559206  | U0         | 0         | +            |
| chrY         | 21707662  | 21707687  | U0         | 0         | -            |
| chrY         | 21751211  | 21751236  | U0         | 0         | -            |
| chrY         | 21910706  | 21910731  | U0         | 0         | -            |
| chrY         | 22054002  | 22054027  | U0         | 0         | -            |
| chrY         | 22210637  | 22210662  | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome, Start, End and Strand.
>>> pr.data.chromsizes().print()
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 249250621 |
| chr2         | 0         | 243199373 |
| chr3         | 0         | 198022430 |
| chr4         | 0         | 191154276 |
| ...          | ...       | ...       |
| chr22        | 0         | 51304566  |
| chrM         | 0         | 16571     |
| chrX         | 0         | 155270560 |
| chrY         | 0         | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
rp()

Print dict of DataFrames.

See also

PyRanges.print

print PyRanges.

rpc()

Print dict of DataFrames and return self.

See also

PyRanges.print

print PyRanges.

sample(n=8, replace=False)

Subsample arbitrary rows of PyRanges.

If n is larger than length of PyRanges, replace must be True.

Parameters:
  • n (int, default 8) – Number of rows to return

  • replace (bool, default False) – Sample with replacement, so that rows may be reused.

Examples

>>> gr = pr.data.chipseq()
>>> np.random.seed(0)
>>> gr.sample(n=3)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr2         |  76564764 |  76564789 | U0         |         0 | +            |
| chr3         | 185739979 | 185740004 | U0         |         0 | -            |
| chr20        |  40373657 |  40373682 | U0         |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.sample(10001)
Traceback (most recent call last):
...
ValueError: Cannot take a larger sample than population when 'replace=False'
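
The relationship between n and replace can be illustrated without PyRanges. The following is a minimal sketch using only the standard library; sample_rows and its seed argument are hypothetical names chosen for illustration, and this is not how PyRanges implements sampling.

```python
import random


def sample_rows(rows, n=8, replace=False, seed=None):
    """Draw n rows; rows may repeat only when replace is True."""
    rng = random.Random(seed)  # private generator, so the global state is untouched
    if replace:
        # With replacement, n may exceed the number of rows.
        return rng.choices(rows, k=n)
    if n > len(rows):
        raise ValueError(
            "Cannot take a larger sample than population when 'replace=False'"
        )
    # Without replacement, every drawn row is distinct.
    return rng.sample(rows, n)


rows = [("chr1", 3, 6), ("chr1", 8, 9), ("chr1", 5000, 7000)]
without = sample_rows(rows, n=2, seed=0)
with_repl = sample_rows(rows, n=5, replace=True, seed=0)
```

Requesting more rows than exist succeeds only with replace=True, which mirrors the error shown in the example above.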
set_intersect(other, strandedness=None, how=None, new_pos=False, nb_cpu=1)

Return set-theoretical intersection.

Like intersect, but both PyRanges are merged first.

Parameters:
  • other (PyRanges) – PyRanges to set-intersect.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • how ({None, "first", "last", "containment"}, default None, i.e. all) – What intervals to report. By default, reports all overlapping intervals. “containment” reports only intervals that are fully contained within an interval of other.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with overlapping subintervals.

Return type:

PyRanges

See also

PyRanges.intersect

find overlapping subintervals

PyRanges.overlap

report overlapping intervals

Examples

>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.set_intersect(gr2)
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         4 |         9 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

In this simple unstranded case, this is the same as the below:

>>> gr.merge().intersect(gr2.merge())
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         4 |         9 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.set_intersect(gr2, how="containment")
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         4 |         9 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
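
The "merge first, then intersect" idea can be sketched on plain (start, end) tuples for a single chromosome. This is an illustration under stated assumptions, not the PyRanges implementation: merge, intersect_merged and set_intersect are hypothetical helpers, intervals are treated as half-open, and touching intervals are merged (an assumption that does not change the result for this data).

```python
def merge(intervals):
    """Collapse overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the current run instead of opening a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersect_merged(a, b):
    """Two-pointer sweep over two sorted, non-overlapping interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:  # half-open intervals share at least one position
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def set_intersect(a, b):
    return intersect_merged(merge(a), merge(b))


gr = [(1, 3), (4, 9), (10, 11)]
gr2 = [(2, 3), (2, 9), (9, 10)]
result = set_intersect(gr, gr2)  # [(2, 3), (4, 9)]
```

The sketch yields the same two intervals as gr.set_intersect(gr2) in the example above.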
set_union(other, strandedness=None, nb_cpu=1)

Return set-theoretical union.

Parameters:
  • other (PyRanges) – PyRanges to do union with.

  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

A PyRanges with the union of intervals.

Return type:

PyRanges

See also

PyRanges.set_intersect

set-theoretical intersection

PyRanges.overlap

report overlapping intervals

Examples

>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.set_union(gr2)
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         1 |        11 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
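
A set-theoretical union can be thought of as concatenating both interval lists and merging the result. The sketch below works on plain (start, end) tuples for one chromosome; set_union here is a hypothetical standalone function, not the PyRanges method, and it assumes touching intervals are joined, which is consistent with the single chr1 1-11 row in the output above.

```python
def set_union(a, b):
    """Union of two lists of half-open intervals on one chromosome."""
    merged = []
    for start, end in sorted(list(a) + list(b)):
        if merged and start <= merged[-1][1]:
            # Overlapping or touching: grow the current interval.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


gr = [(1, 3), (4, 9), (10, 11)]
gr2 = [(2, 3), (2, 9), (9, 10)]
result = set_union(gr, gr2)  # [(1, 11)]
```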
sort(by=None, nb_cpu=1)

Sort by position or columns.

Parameters:
  • by (str or list of str, default None) – Column(s) to sort by. Default is Start and End. Special value “5” can be provided to sort by 5’: intervals on + strand are sorted in ascending order, while those on - strand are sorted in descending order.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Note

Since a PyRanges contains multiple DataFrames, the sorting only happens within dataframes.

Returns:

Sorted PyRanges

Return type:

PyRanges

See also

pyranges.multioverlap

find overlaps with multiple PyRanges

Examples

>>> p  = pr.from_dict({"Chromosome": [1, 1, 1, 1, 1, 1],
...                    "Strand": ["+", "+", "-", "-", "+", "+"],
...                    "Start": [40, 1, 10, 70, 140, 160],
...                    "End": [60, 11, 25, 80, 152, 190],
...                    "transcript_id":["t3", "t3", "t2", "t2", "t1", "t1"] })

By default, intervals are sorted by position:

>>> p.sort()
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t3              |
|            1 | +            |        40 |        60 | t3              |
|            1 | +            |       140 |       152 | t1              |
|            1 | +            |       160 |       190 | t1              |
|            1 | -            |        10 |        25 | t2              |
|            1 | -            |        70 |        80 | t2              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

(Note how sorting takes place within Chromosome-Strand pairs.)

To sort according to a specified column:

>>> p.sort(by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |       140 |       152 | t1              |
|            1 | +            |       160 |       190 | t1              |
|            1 | +            |        40 |        60 | t3              |
|            1 | +            |         1 |        11 | t3              |
|            1 | -            |        10 |        25 | t2              |
|            1 | -            |        70 |        80 | t2              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

If the special value “5” is provided, intervals are sorted according to their five-prime end:

>>> p.sort("5")
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t3              |
|            1 | +            |        40 |        60 | t3              |
|            1 | +            |       140 |       152 | t1              |
|            1 | +            |       160 |       190 | t1              |
|            1 | -            |        70 |        80 | t2              |
|            1 | -            |        10 |        25 | t2              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
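
The five-prime ordering can be sketched on plain tuples. This is only an illustration: sort_five_prime is a hypothetical helper, rows are assumed to be (chromosome, strand, start, end, ...) tuples, and sorting - strand rows by descending End is an assumption that happens to reproduce the output above.

```python
from collections import defaultdict


def sort_five_prime(rows):
    """Sort rows 5' to 3' within each (chromosome, strand) group."""
    groups = defaultdict(list)
    for row in rows:
        chrom, strand = row[0], row[1]
        groups[(chrom, strand)].append(row)

    out = []
    for key in sorted(groups):
        strand = key[1]
        if strand == "-":
            # On the - strand the 5' end is the End coordinate.
            ordered = sorted(groups[key], key=lambda r: r[3], reverse=True)
        else:
            # On the + strand the 5' end is the Start coordinate.
            ordered = sorted(groups[key], key=lambda r: r[2])
        out.extend(ordered)
    return out


rows = [
    (1, "+", 40, 60, "t3"), (1, "+", 1, 11, "t3"),
    (1, "-", 10, 25, "t2"), (1, "-", 70, 80, "t2"),
    (1, "+", 140, 152, "t1"), (1, "+", 160, 190, "t1"),
]
ordered = sort_five_prime(rows)
```

As in the note above, ordering only happens within each chromosome/strand group; the groups themselves are simply emitted in key order.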
sp(n=30, formatting=None)

Sort on location and print.

See also

PyRanges.print

print PyRanges.

spc(n=30, formatting=None)

Sort on location, print and return self.

See also

PyRanges.print

print PyRanges.

slack(slack)

Deprecated: this function has been moved to PyRanges.extend.

spliced_subsequence(start=0, end=None, by=None, strand=None, **kwargs)

Get subsequences of the intervals, using coordinates mapping to spliced transcripts (without introns)

The returned intervals are subregions of self, cut according to specifications. Start and end are relative to the 5’ end: 0 means the leftmost nucleotide for + strand intervals, while it means the rightmost one for - strand. This method can also operate on groups of intervals (e.g. exons belonging to the same transcript) through the ‘by’ argument. When using it, start and end refer to the spliced transcript coordinates, meaning that introns are ignored in the count.

Parameters:
  • start (int) – Start of subregion, 0-based and included, counting from the 5’ end. Use a negative int to count from the 3’ (e.g. -1 is the last nucleotide)

  • end (int, default None) – End of subregion, 0-based and excluded, counting from the 5’ end. Use a negative int to count from the 3’ (e.g. -1 is the last nucleotide). If None, the existing 3’ end is returned.

  • by (list of str, default None) – intervals are grouped by this/these ID column(s) beforehand, e.g. exons belonging to same transcripts

  • strand (bool, default None, i.e. auto) – Whether strand is considered when interpreting the start and end arguments of this function. If True, counting is from the 5’ end, which is the leftmost coordinate for + strand and the rightmost for - strand. If False, all intervals are processed like they reside on the + strand. If None (default), strand is considered if the PyRanges is stranded.

Returns:

Subregion of self, subsequenced as specified by arguments

Return type:

PyRanges

Note

If the request goes out of bounds (e.g. requesting 100 nt from a 90 nt region), only the existing portion is returned.

See also

subsequence

analogous to this method, but input coordinates refer to the unspliced transcript

Examples

>>> p  = pr.from_dict({"Chromosome": [1, 1, 2, 2, 3],
...                   "Strand": ["+", "+", "-", "-", "+"],
...                   "Start": [1, 40, 10, 70, 140],
...                   "End": [11, 60, 25, 80, 152],
...                   "transcript_id":["t1", "t1", "t2", "t2", "t3"] })
>>> p
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t1              |
|            1 | +            |        40 |        60 | t1              |
|            2 | -            |        10 |        25 | t2              |
|            2 | -            |        70 |        80 | t2              |
|            3 | +            |       140 |       152 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

# Get the first 15 nucleotides of each spliced transcript, grouping exons by transcript_id:

>>> p.spliced_subsequence(0, 15, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t1              |
|            1 | +            |        40 |        45 | t1              |
|            2 | -            |        70 |        80 | t2              |
|            2 | -            |        20 |        25 | t2              |
|            3 | +            |       140 |       152 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

# Get the last 20 nucleotides of each spliced transcript:

>>> p.spliced_subsequence(-20, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |        40 |        60 | t1              |
|            2 | -            |        70 |        75 | t2              |
|            2 | -            |        10 |        25 | t2              |
|            3 | +            |       140 |       152 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 4 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

# Get region from 25 to 60 of each spliced transcript, or their existing subportion:

>>> p.spliced_subsequence(25, 60, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |        55 |        60 | t1              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

# Get region of each spliced transcript which excludes their first and last 3 nucleotides:

>>> p.spliced_subsequence(3, -3, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         4 |        11 | t1              |
|            1 | +            |        40 |        57 | t1              |
|            2 | -            |        70 |        77 | t2              |
|            2 | -            |        13 |        25 | t2              |
|            3 | +            |       143 |       149 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
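
The mapping from spliced-transcript coordinates back to genomic coordinates can be sketched for a single transcript. This is a simplified illustration, not the PyRanges implementation: the standalone spliced_subsequence function below is hypothetical, takes the exons of one transcript as (start, end) tuples plus its strand, and assumes the exons do not overlap.

```python
def spliced_subsequence(exons, strand, start=0, end=None):
    """Cut [start, end) in spliced coordinates out of one transcript's exons."""
    forward = strand != "-"
    # Order exons from 5' to 3': ascending on +, descending on -.
    ordered = sorted(exons, reverse=not forward)
    total = sum(e - s for s, e in ordered)

    # Negative values count from the 3' end; None means "to the 3' end".
    if start < 0:
        start += total
    if end is None:
        end = total
    elif end < 0:
        end += total
    # Out-of-bounds requests are clipped to the existing portion.
    start = max(0, min(start, total))
    end = max(0, min(end, total))

    out = []
    offset = 0  # spliced coordinate of the current exon's 5' end
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:
            if forward:
                out.append((ex_start + lo - offset, ex_start + hi - offset))
            else:
                out.append((ex_end - (hi - offset), ex_end - (lo - offset)))
        offset += length
    return out


t1 = spliced_subsequence([(1, 11), (40, 60)], "+", 0, 15)   # [(1, 11), (40, 45)]
t2 = spliced_subsequence([(10, 25), (70, 80)], "-", 0, 15)  # [(70, 80), (20, 25)]
```

Applied per transcript to the data above, the sketch gives the same coordinates as the four documented calls, including the cases where a transcript drops out because the requested region lies beyond its spliced length.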
split(strand=None, between=False, nb_cpu=1)

Split into non-overlapping intervals.

Parameters:
  • strand (bool, default None, i.e. auto) – Whether to ignore strand information if PyRanges is stranded.

  • between (bool, default False) – Also include the gaps between intervals as intervals of their own.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

PyRanges with intervals split at overlap points.

Return type:

PyRanges

See also

pyranges.multioverlap

find overlaps with multiple PyRanges

Examples

>>> d = {'Chromosome': ['chr1', 'chr1', 'chr1', 'chr1'], 'Start': [3, 5, 5, 11],
...       'End': [6, 9, 7, 12], 'Strand': ['+', '+', '-', '-']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         3 |         6 | +            |
| chr1         |         5 |         9 | +            |
| chr1         |         5 |         7 | -            |
| chr1         |        11 |        12 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 4 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.split()
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Strand     |
| (object)     |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         3 |         5 | +          |
| chr1         |         5 |         6 | +          |
| chr1         |         6 |         9 | +          |
| chr1         |         5 |         7 | -          |
| chr1         |        11 |        12 | -          |
+--------------+-----------+-----------+------------+
Stranded PyRanges object has 5 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.split(between=True)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | Strand     |
| (object)     |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         3 |         5 | +          |
| chr1         |         5 |         6 | +          |
| chr1         |         6 |         9 | +          |
| chr1         |         5 |         7 | -          |
| chr1         |         7 |        11 | -          |
| chr1         |        11 |        12 | -          |
+--------------+-----------+-----------+------------+
Stranded PyRanges object has 6 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.split(strand=False)
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (object)     |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         3 |         5 |
| chr1         |         5 |         6 |
| chr1         |         6 |         7 |
| chr1         |         7 |         9 |
| chr1         |        11 |        12 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 5 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.split(strand=False, between=True)
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (object)     |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         3 |         5 |
| chr1         |         5 |         6 |
| chr1         |         6 |         7 |
| chr1         |         7 |         9 |
| chr1         |         9 |        11 |
| chr1         |        11 |        12 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 6 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
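
One way to picture the splitting is to cut at every start and end coordinate and keep the resulting segments. The following is a minimal sketch on (start, end) tuples for a single chromosome (or chromosome/strand group); the standalone split function is hypothetical and is not how PyRanges computes the result.

```python
def split(intervals, between=False):
    """Split half-open intervals at every start/end coordinate."""
    # Every start and end is a potential cut point.
    points = sorted({p for iv in intervals for p in iv})
    segments = list(zip(points, points[1:]))
    if between:
        # Keep the gaps between intervals as segments too.
        return segments
    # Otherwise keep only segments covered by at least one input interval.
    return [
        (s, e) for s, e in segments
        if any(a <= s and e <= b for a, b in intervals)
    ]


unstranded = [(3, 6), (5, 9), (5, 7), (11, 12)]
no_gaps = split(unstranded)
with_gaps = split(unstranded, between=True)
```

Running the same function separately on the + and - intervals reproduces the stranded outputs above.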
subset(f, strand=None, **kwargs)

Return a subset of the rows.

Parameters:
  • f (function) – Function which returns boolean Series equal to length of df.

  • strand (bool, default None, i.e. auto) – Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Notes

PyRanges can also be subsetted directly with a boolean Series. This function is slightly faster, but more cumbersome.

Returns:

PyRanges subset on rows.

Return type:

PyRanges

Examples

>>> gr = pr.data.f1()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         3 |         6 | interval1  |         0 | +            |
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.subset(lambda df: df.Start > 4)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Also possible:

>>> gr[gr.Start > 4]
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |         8 |         9 | interval3  |         0 | +            |
| chr1         |         5 |         7 | interval2  |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
subsequence(start=0, end=None, by=None, strand=None, **kwargs)

Get subsequences of the intervals.

The returned intervals are subregions of self, cut according to the specifications. Start and end are relative to the 5’ end: 0 means the leftmost nucleotide for + strand intervals and the rightmost one for - strand intervals. This method can also manipulate groups of intervals (e.g. exons belonging to the same transcript) through the ‘by’ argument. When it is used, start and end refer to unspliced transcript coordinates, meaning that introns are included in the count.

Parameters:
  • start (int) – Start of subregion, 0-based and included, counting from the 5’ end. Use a negative int to count from the 3’ (e.g. -1 is the last nucleotide)

  • end (int, default None) –

    End of subregion, 0-based and excluded, counting from the 5’ end. Use a negative int to count from the 3’ (e.g. -1 is the last nucleotide)

    If None, the existing 3’ end is returned.

  • by (list of str, default None) – intervals are grouped by this/these ID column(s) beforehand, e.g. exons belonging to same transcripts

  • strand (bool, default None, i.e. auto) – Whether strand is considered when interpreting the start and end arguments of this function. If True, counting is from the 5’ end, which is the leftmost coordinate for + strand and the rightmost for - strand. If False, all intervals are processed like they reside on the + strand. If None (default), strand is considered if the PyRanges is stranded.

Returns:

Subregion of self, subsequenced as specified by arguments

Return type:

PyRanges

Note

If the request goes out of bounds (e.g. requesting 100 nts for a 90nt region), only the existing portion is returned

See also

spliced_subsequence

analogous to this method, but intronic regions are not counted, so that input coordinates refer to the spliced transcript

Examples

>>> p  = pr.from_dict({"Chromosome": [1, 1, 2, 2, 3],
...                   "Strand": ["+", "+", "-", "-", "+"],
...                   "Start": [1, 40, 2, 30, 140],
...                   "End": [20, 60, 13, 45, 155],
...                   "transcript_id":["t1", "t1", "t2", "t2", "t3"] })
>>> p
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        20 | t1              |
|            1 | +            |        40 |        60 | t1              |
|            2 | -            |         2 |        13 | t2              |
|            2 | -            |        30 |        45 | t2              |
|            3 | +            |       140 |       155 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Get the first 10 nucleotides (at the 5’ end) of each interval (each row of the dataframe):

>>> p.subsequence(0, 10)
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t1              |
|            1 | +            |        40 |        50 | t1              |
|            2 | -            |         3 |        13 | t2              |
|            2 | -            |        35 |        45 | t2              |
|            3 | +            |       140 |       150 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Get the first 10 nucleotides of each transcript, grouping exons by transcript_id:

>>> p.subsequence(0, 10, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |         1 |        11 | t1              |
|            2 | -            |        35 |        45 | t2              |
|            3 | +            |       140 |       150 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 3 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Get the last 20 nucleotides of each transcript:

>>> p.subsequence(-20, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |        40 |        60 | t1              |
|            2 | -            |         2 |        13 | t2              |
|            3 | +            |       140 |       155 | t3              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 3 rows and 5 columns from 3 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.

Get the region from 30 to 300 of each transcript, or the portion of it that exists:

>>> p.subsequence(30, 300, by='transcript_id')
+--------------+--------------+-----------+-----------+-----------------+
|   Chromosome | Strand       |     Start |       End | transcript_id   |
|   (category) | (category)   |   (int64) |   (int64) | (object)        |
|--------------+--------------+-----------+-----------+-----------------|
|            1 | +            |        40 |        60 | t1              |
|            2 | -            |         2 |        13 | t2              |
+--------------+--------------+-----------+-----------+-----------------+
Stranded PyRanges object has 2 rows and 5 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
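
The coordinate arithmetic for a single interval (no ‘by’ grouping) can be sketched in plain Python. This is only an illustration of the rules described above, not the library's implementation; the function name subsequence_one and its argument names are hypothetical.

```python
def subsequence_one(start, end, strand, sub_start=0, sub_end=None, use_strand=True):
    """Return (start, end) of the requested subregion, or None if it is empty.

    Hypothetical helper: offsets are 0-based, sub_end is excluded, and
    negative offsets count from the 3' end, as described in the text.
    """
    length = end - start
    # Turn negative offsets into offsets from the 5' end.
    s = sub_start if sub_start >= 0 else length + sub_start
    if sub_end is None:
        e = length
    else:
        e = sub_end if sub_end >= 0 else length + sub_end
    # Out-of-bounds requests are clipped to the existing portion.
    s = max(0, min(s, length))
    e = max(0, min(e, length))
    if s >= e:
        return None
    if use_strand and strand == "-":
        # On the reverse strand the 5' end is the rightmost coordinate.
        return (end - e, end - s)
    return (start + s, start + e)


print(subsequence_one(1, 20, "+", 0, 10))   # (1, 11)
print(subsequence_one(2, 13, "-", 0, 10))   # (3, 13)
print(subsequence_one(140, 155, "+", -20))  # (140, 155)
```

The first two calls reproduce rows of the p.subsequence(0, 10) output above; the third shows a request longer than the interval being clipped to what exists.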

subtract(other, strandedness=None, nb_cpu=1)

Subtract intervals.

Parameters:
  • strandedness ({None, "same", "opposite", False}, default None, i.e. auto) – Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use “same” if both PyRanges are stranded, otherwise ignore the strand information.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

See also

pyranges.PyRanges.overlap

use with invert=True to return all intervals without overlap

Examples

>>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10],
...                    "End": [3, 9, 11], "ID": ["a", "b", "c"]})
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]})
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         3 | a          |
| chr1         |         4 |         9 | b          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         2 |         3 |
| chr1         |         2 |         9 |
| chr1         |         9 |        10 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.subtract(gr2)
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | ID         |
| (category)   |   (int64) |   (int64) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         1 |         2 | a          |
| chr1         |        10 |        11 | c          |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
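
What subtraction does to the coordinates of one chromosome can be sketched without the library. The helper below is a hypothetical illustration (subtract_intervals is not a pyranges function) and ignores strand and metadata columns.

```python
def subtract_intervals(intervals, others):
    """Remove every position covered by `others` from each (start, end) in `intervals`.

    Hypothetical sketch for a single chromosome; intervals are 0-based, end excluded.
    """
    others = sorted(others)
    result = []
    for start, end in intervals:
        pos = start  # leftmost position not yet known to be removed
        for o_start, o_end in others:
            if o_end <= pos or o_start >= end:
                continue  # no overlap with the remaining part
            if o_start > pos:
                result.append((pos, o_start))  # piece that survives before the overlap
            pos = max(pos, o_end)
            if pos >= end:
                break
        if pos < end:
            result.append((pos, end))
    return result


gr = [(1, 3), (4, 9), (10, 11)]
gr2 = [(2, 3), (2, 9), (9, 10)]
print(subtract_intervals(gr, gr2))  # [(1, 2), (10, 11)]
```

The result matches the Start and End columns of gr.subtract(gr2) above: interval b is removed entirely, a is trimmed, and c is untouched because it is only bookended with an interval in gr2.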
summary(to_stdout=True, return_df=False)

Return summary statistics for the intervals.

Count refers to the number of intervals, the rest to the lengths.

The column “pyrange” describes the data as is. “coverage_forward” and “coverage_reverse” describe the data after strand-specific merging of overlapping intervals. “coverage_unstranded” describes the data after merging, without considering the strands.

The row “count” is the number of intervals and “sum” is their total length. The rest describe the lengths of the intervals.

Parameters:
  • to_stdout (bool, default True) – Print summary.

  • return_df (bool, default False) – Return df with summary.

Return type:

None or DataFrame with summary.

Examples

>>> gr = pr.data.ensembl_gtf()[["Feature", "gene_id"]]
>>> gr
+--------------+--------------+-----------+-----------+--------------+-----------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_id         |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)        |
|--------------+--------------+-----------+-----------+--------------+-----------------|
| 1            | gene         | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | transcript   | 11868     | 14409     | +            | ENSG00000223972 |
| 1            | exon         | 11868     | 12227     | +            | ENSG00000223972 |
| 1            | exon         | 12612     | 12721     | +            | ENSG00000223972 |
| ...          | ...          | ...       | ...       | ...          | ...             |
| 1            | gene         | 1173055   | 1179555   | -            | ENSG00000205231 |
| 1            | transcript   | 1173055   | 1179555   | -            | ENSG00000205231 |
| 1            | exon         | 1179364   | 1179555   | -            | ENSG00000205231 |
| 1            | exon         | 1173055   | 1176396   | -            | ENSG00000205231 |
+--------------+--------------+-----------+-----------+--------------+-----------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.summary()
+-------+------------------+--------------------+--------------------+-----------------------+
|       |          pyrange |   coverage_forward |   coverage_reverse |   coverage_unstranded |
|-------+------------------+--------------------+--------------------+-----------------------|
| count |   2446           |               39   |               23   |                  32   |
| mean  |   2291.92        |             7058.1 |            30078.6 |               27704.2 |
| std   |  11906.9         |            10322.3 |            59467.7 |               67026.9 |
| min   |      1           |               83   |              154   |                  83   |
| 25%   |     90           |             1051   |             1204   |                1155   |
| 50%   |    138           |             2541   |             6500   |                6343   |
| 75%   |    382.25        |             7168   |            23778   |               20650.8 |
| max   | 241726           |            43065   |           241726   |              291164   |
| sum   |      5.60603e+06 |           275266   |           691807   |              886534   |
+-------+------------------+--------------------+--------------------+-----------------------+
>>> gr.summary(return_df=True, to_stdout=False)
            pyrange  coverage_forward  coverage_reverse  coverage_unstranded
count  2.446000e+03         39.000000         23.000000            32.000000
mean   2.291918e+03       7058.102564      30078.565217         27704.187500
std    1.190685e+04      10322.309347      59467.695265         67026.868647
min    1.000000e+00         83.000000        154.000000            83.000000
25%    9.000000e+01       1051.000000       1204.000000          1155.000000
50%    1.380000e+02       2541.000000       6500.000000          6343.000000
75%    3.822500e+02       7168.000000      23778.000000         20650.750000
max    2.417260e+05      43065.000000     241726.000000        291164.000000
sum    5.606031e+06     275266.000000     691807.000000        886534.000000
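
The difference between the “pyrange” column and the coverage columns can be sketched as follows. The helpers merge_intervals and length_stats are hypothetical, and the sketch assumes that only strictly overlapping intervals are merged; how the library treats bookended intervals is not stated here.

```python
def merge_intervals(intervals):
    """Merge strictly overlapping (start, end) intervals (assumption: bookended ones stay separate)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def length_stats(intervals):
    """Count the intervals and summarize their lengths."""
    lengths = [end - start for start, end in intervals]
    return {
        "count": len(lengths),
        "sum": sum(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


intervals = [(1, 5), (3, 8), (10, 12)]
print(length_stats(intervals))                   # data as is: 3 intervals, total length 11
print(length_stats(merge_intervals(intervals)))  # after merging: 2 intervals, total length 9
```

Merging lowers the count and the sum because overlapping positions are counted once, which is why the coverage columns in the table above have far smaller counts than the “pyrange” column.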
tail(n=8)

Return the n last rows.

Parameters:

n (int, default 8) – Return n rows.

Returns:

PyRanges with the n last rows.

Return type:

PyRanges

See also

PyRanges.head

return the first rows

PyRanges.sample

return random rows

Examples

>>> gr = pr.data.chipseq()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         | 0         | +            |
| chr1         | 169887529 | 169887554 | U0         | 0         | +            |
| chr1         | 216711011 | 216711036 | U0         | 0         | +            |
| chr1         | 144227079 | 144227104 | U0         | 0         | +            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 13517892  | 13517917  | U0         | 0         | -            |
| chrY         | 8010951   | 8010976   | U0         | 0         | -            |
| chrY         | 7405376   | 7405401   | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.tail(3)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chrY         |  13517892 |  13517917 | U0         |         0 | -            |
| chrY         |   8010951 |   8010976 | U0         |         0 | -            |
| chrY         |   7405376 |   7405401 | U0         |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
tile(tile_size, overlap=False, strand=None, nb_cpu=1)

Return overlapping genomic tiles.

The genome is divided into bookended tiles of length tile_size and one is returned per overlapping interval.

Parameters:
  • tile_size (int) – Length of the tiles.

  • overlap (bool, default False) – Add column of nucleotide overlap to each tile.

  • strand (bool, default None, i.e. auto) – Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

Tiled PyRanges.

Return type:

PyRanges

See also

pyranges.PyRanges.window

divide intervals into windows

Examples

>>> gr = pr.data.ensembl_gtf()[["Feature", "gene_name"]]
>>> gr
+--------------+--------------+-----------+-----------+--------------+-------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    |
|--------------+--------------+-----------+-----------+--------------+-------------|
| 1            | gene         | 11868     | 14409     | +            | DDX11L1     |
| 1            | transcript   | 11868     | 14409     | +            | DDX11L1     |
| 1            | exon         | 11868     | 12227     | +            | DDX11L1     |
| 1            | exon         | 12612     | 12721     | +            | DDX11L1     |
| ...          | ...          | ...       | ...       | ...          | ...         |
| 1            | gene         | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | transcript   | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1179364   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1173055   | 1176396   | -            | TTLL10-AS1  |
+--------------+--------------+-----------+-----------+--------------+-------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.tile(200)
+--------------+--------------+-----------+-----------+--------------+-------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    |
|--------------+--------------+-----------+-----------+--------------+-------------|
| 1            | gene         | 11800     | 12000     | +            | DDX11L1     |
| 1            | gene         | 12000     | 12200     | +            | DDX11L1     |
| 1            | gene         | 12200     | 12400     | +            | DDX11L1     |
| 1            | gene         | 12400     | 12600     | +            | DDX11L1     |
| ...          | ...          | ...       | ...       | ...          | ...         |
| 1            | exon         | 1175600   | 1175800   | -            | TTLL10-AS1  |
| 1            | exon         | 1175800   | 1176000   | -            | TTLL10-AS1  |
| 1            | exon         | 1176000   | 1176200   | -            | TTLL10-AS1  |
| 1            | exon         | 1176200   | 1176400   | -            | TTLL10-AS1  |
+--------------+--------------+-----------+-----------+--------------+-------------+
Stranded PyRanges object has 30,538 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.tile(100, overlap=True)
+--------------+--------------+-----------+-----------+--------------+-------------+---------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   | TileOverlap   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    | (int64)       |
|--------------+--------------+-----------+-----------+--------------+-------------+---------------|
| 1            | gene         | 11800     | 11900     | +            | DDX11L1     | 32            |
| 1            | gene         | 11900     | 12000     | +            | DDX11L1     | 100           |
| 1            | gene         | 12000     | 12100     | +            | DDX11L1     | 100           |
| 1            | gene         | 12100     | 12200     | +            | DDX11L1     | 100           |
| ...          | ...          | ...       | ...       | ...          | ...         | ...           |
| 1            | exon         | 1176000   | 1176100   | -            | TTLL10-AS1  | 100           |
| 1            | exon         | 1176100   | 1176200   | -            | TTLL10-AS1  | 100           |
| 1            | exon         | 1176200   | 1176300   | -            | TTLL10-AS1  | 100           |
| 1            | exon         | 1176300   | 1176400   | -            | TTLL10-AS1  | 96            |
+--------------+--------------+-----------+-----------+--------------+-------------+---------------+
Stranded PyRanges object has 58,516 rows and 7 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
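
The tiling arithmetic for one interval can be sketched in plain Python. The function tile_interval is a hypothetical illustration of the description above (bookended tiles starting at position 0), not the library's code.

```python
def tile_interval(start, end, tile_size):
    """Return (tile_start, tile_end, overlap) for every genome tile touching [start, end)."""
    first = (start // tile_size) * tile_size  # start of the tile containing `start`
    tiles = []
    for tile_start in range(first, end, tile_size):
        tile_end = tile_start + tile_size
        # Number of nucleotides the interval shares with this tile.
        overlap = min(end, tile_end) - max(start, tile_start)
        tiles.append((tile_start, tile_end, overlap))
    return tiles


print(tile_interval(11868, 14409, 100)[0])       # (11800, 11900, 32)
print(tile_interval(1173055, 1176396, 100)[-1])  # (1176300, 1176400, 96)
```

Both results agree with the first and last rows of gr.tile(100, overlap=True) above.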
to_example(n=10)

Return as dict.

Used for easily creating examples for copy and pasting.

Parameters:

n (int, default 10) – Number of rows. Half is taken from the start, the other half from the end.

See also

PyRanges.from_dict

create PyRanges from dict

Examples

>>> gr = pr.data.chipseq()
>>> gr
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   | Start     | End       | Name       | Score     | Strand       |
| (category)   | (int64)   | (int64)   | (object)   | (int64)   | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         | 0         | +            |
| chr1         | 169887529 | 169887554 | U0         | 0         | +            |
| chr1         | 216711011 | 216711036 | U0         | 0         | +            |
| chr1         | 144227079 | 144227104 | U0         | 0         | +            |
| ...          | ...       | ...       | ...        | ...       | ...          |
| chrY         | 15224235  | 15224260  | U0         | 0         | -            |
| chrY         | 13517892  | 13517917  | U0         | 0         | -            |
| chrY         | 8010951   | 8010976   | U0         | 0         | -            |
| chrY         | 7405376   | 7405401   | U0         | 0         | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> d = gr.to_example(n=4)
>>> d
{'Chromosome': ['chr1', 'chr1', 'chrY', 'chrY'], 'Start': [212609534, 169887529, 8010951, 7405376], 'End': [212609559, 169887554, 8010976, 7405401], 'Name': ['U0', 'U0', 'U0', 'U0'], 'Score': [0, 0, 0, 0], 'Strand': ['+', '+', '-', '-']}
>>> pr.from_dict(d)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         | 212609534 | 212609559 | U0         |         0 | +            |
| chr1         | 169887529 | 169887554 | U0         |         0 | +            |
| chrY         |   8010951 |   8010976 | U0         |         0 | -            |
| chrY         |   7405376 |   7405401 | U0         |         0 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 4 rows and 6 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
three_end()

Return the 3’-end.

The 3’-end is the start of intervals on the reverse strand and the end of intervals on the forward strand.

Returns:

PyRanges with the 3’-ends.

Return type:

PyRanges

See also

PyRanges.five_end

return the five prime end

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...       'End': [5, 8], 'Strand': ['+', '-']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         5 | +            |
| chr1         |         6 |         8 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.three_end()
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         4 |         5 | +            |
| chr1         |         6 |         7 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
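
The rule above reduces to a one-nucleotide interval at one side of the original. A minimal sketch, using a hypothetical helper named three_end_of:

```python
def three_end_of(start, end, strand):
    """Return the single-nucleotide (start, end) interval at the 3' end."""
    if strand == "-":
        # Reverse strand: the 3' end is the leftmost nucleotide.
        return (start, start + 1)
    # Forward strand: the 3' end is the rightmost nucleotide.
    return (end - 1, end)


print(three_end_of(1, 5, "+"))  # (4, 5)
print(three_end_of(6, 8, "-"))  # (6, 7)
```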
to_bed(path=None, keep=True, compression='infer', chain=False)

Write to bed.

Parameters:
  • path (str, default None) – Where to write. If None, returns string representation.

  • keep (bool, default True) – Whether to keep all columns, not just Chromosome, Start, End, Name, Score, Strand when writing.

  • compression (str, default "infer") – Compression type to use. By default it is inferred from the file extension. See pandas.DataFrame.to_csv for more info.

  • chain (bool, default False) – Whether to return the PyRanges after writing.

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...       'End': [5, 8], 'Strand': ['+', '-'], "Gene": [1, 2]}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Strand       |      Gene |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
| chr1         |         1 |         5 | +            |         1 |
| chr1         |         6 |         8 | -            |         2 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.to_bed()
'chr1\t1\t5\t.\t.\t+\t1\nchr1\t6\t8\t.\t.\t-\t2\n'

# File contents:
chr1 1 5 . . + 1
chr1 6 8 . . - 2

Does not include noncanonical bed-column Gene:

>>> gr.to_bed(keep=False)
'chr1\t1\t5\t.\t.\t+\nchr1\t6\t8\t.\t.\t-\n'

# File contents:
chr1 1 5 . . +
chr1 6 8 . . -

>>> gr.to_bed("test.bed", chain=True)
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Strand       |      Gene |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
| chr1         |         1 |         5 | +            |         1 |
| chr1         |         6 |         8 | -            |         2 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> open("test.bed").readlines()
['chr1\t1\t5\t.\t.\t+\t1\n', 'chr1\t6\t8\t.\t.\t-\t2\n']
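
How the output lines are assembled can be sketched with a dict of columns. This is a hypothetical illustration (rows_to_bed is not a pyranges function) that assumes, based on the outputs above, that missing canonical columns are written as "." and that extra columns follow the six canonical ones when keep is True.

```python
BED_COLUMNS = ["Chromosome", "Start", "End", "Name", "Score", "Strand"]


def rows_to_bed(columns, keep=True):
    """Build a BED-formatted string from a dict mapping column names to lists."""
    n_rows = len(columns["Chromosome"])
    extra = [c for c in columns if c not in BED_COLUMNS] if keep else []
    lines = []
    for i in range(n_rows):
        # Canonical columns first; "." stands in for any that are absent.
        fields = [str(columns[c][i]) if c in columns else "." for c in BED_COLUMNS]
        fields += [str(columns[c][i]) for c in extra]
        lines.append("\t".join(fields) + "\n")
    return "".join(lines)


d = {"Chromosome": ["chr1", "chr1"], "Start": [1, 6],
     "End": [5, 8], "Strand": ["+", "-"], "Gene": [1, 2]}
print(repr(rows_to_bed(d)))
print(repr(rows_to_bed(d, keep=False)))
```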
to_bigwig(path=None, chromosome_sizes=None, rpm=True, divide=None, value_col=None, dryrun=False, chain=False)

Write regular or value coverage to bigwig.

Note

To create one bigwig per strand, subset the PyRanges first.

Parameters:
  • path (str) – Where to write bigwig.

  • chromosome_sizes (PyRanges or dict) – If dict: map of chromosome names to chromosome length.

  • rpm (bool, default True) – Whether to normalize data by dividing by the total number of intervals and multiplying by 1e6.

  • divide (bool, default False) – (Only useful with value_col) Divide value coverage by regular coverage and take log2.

  • value_col (str, default None) – Name of column to compute coverage of.

  • dryrun (bool, default False) – Return data that would be written without writing bigwigs.

  • chain (bool, default False) – Whether to return the PyRanges after writing.

Note

Requires pybigwig to be installed.

If you require more control over the normalization process, use pyranges.to_bigwig()

See also

pyranges.to_bigwig

write pandas DataFrame to bigwig.

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [1, 4, 6],
...       'End': [7, 8, 10], 'Strand': ['+', '-', '-'],
...       'Value': [10, 20, 30]}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Strand       |     Value |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
| chr1         |         1 |         7 | +            |        10 |
| chr1         |         4 |         8 | -            |        20 |
| chr1         |         6 |        10 | -            |        30 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.to_bigwig(dryrun=True, rpm=False)
+--------------+-----------+-----------+-------------+
| Chromosome   |     Start |       End |       Score |
| (category)   |   (int64) |   (int64) |   (float64) |
|--------------+-----------+-----------+-------------|
| chr1         |         1 |         4 |           1 |
| chr1         |         4 |         6 |           2 |
| chr1         |         6 |         7 |           3 |
| chr1         |         7 |         8 |           2 |
| chr1         |         8 |        10 |           1 |
+--------------+-----------+-----------+-------------+
Unstranded PyRanges object has 5 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.to_bigwig(dryrun=True, rpm=False, value_col="Value")
+--------------+-----------+-----------+-------------+
| Chromosome   |     Start |       End |       Score |
| (category)   |   (int64) |   (int64) |   (float64) |
|--------------+-----------+-----------+-------------|
| chr1         |         1 |         4 |          10 |
| chr1         |         4 |         6 |          30 |
| chr1         |         6 |         7 |          60 |
| chr1         |         7 |         8 |          50 |
| chr1         |         8 |        10 |          30 |
+--------------+-----------+-----------+-------------+
Unstranded PyRanges object has 5 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.to_bigwig(dryrun=True, rpm=False, value_col="Value", divide=True)
+--------------+-----------+-----------+-------------+
| Chromosome   |     Start |       End |       Score |
| (category)   |   (int64) |   (int64) |   (float64) |
|--------------+-----------+-----------+-------------|
| chr1         |         0 |         1 |   nan       |
| chr1         |         1 |         4 |     3.32193 |
| chr1         |         4 |         6 |     3.90689 |
| chr1         |         6 |         7 |     4.32193 |
| chr1         |         7 |         8 |     4.64386 |
| chr1         |         8 |        10 |     4.90689 |
+--------------+-----------+-----------+-------------+
Unstranded PyRanges object has 6 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
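
The three kinds of score shown above (regular coverage, value coverage, and their log2 ratio) can be sketched for one chromosome in plain Python. The helpers coverage_runs, log2_ratio and rpm_scale are hypothetical; unlike the library output, the sketch omits uncovered stretches and does not join adjacent runs with equal scores.

```python
import math


def coverage_runs(intervals, values=None):
    """Return [(start, end, score)] runs; score sums `values` (or counts) of covering intervals."""
    if values is None:
        values = [1] * len(intervals)
    events = {}
    for (start, end), value in zip(intervals, values):
        events[start] = events.get(start, 0) + value
        events[end] = events.get(end, 0) - value
    runs, current, prev = [], 0, None
    for pos in sorted(events):
        if prev is not None and current != 0:
            runs.append((prev, pos, current))
        current += events[pos]
        prev = pos
    return runs


def log2_ratio(value_runs, count_runs):
    """log2(value coverage / regular coverage); both inputs must come from the same intervals."""
    return [(s, e, math.log2(v / c)) for (s, e, v), (_, _, c) in zip(value_runs, count_runs)]


def rpm_scale(runs, n_intervals):
    """Divide by the total number of intervals and multiply by 1e6."""
    return [(s, e, score * 1e6 / n_intervals) for s, e, score in runs]


intervals = [(1, 7), (4, 8), (6, 10)]
counts = coverage_runs(intervals)
values = coverage_runs(intervals, [10, 20, 30])
print(counts)  # [(1, 4, 1), (4, 6, 2), (6, 7, 3), (7, 8, 2), (8, 10, 1)]
print(values)  # [(1, 4, 10), (4, 6, 30), (6, 7, 60), (7, 8, 50), (8, 10, 30)]
print(log2_ratio(values, counts)[0])  # first run: log2(10 / 1)
```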
to_csv(path=None, sep=',', header=True, compression='infer', chain=False)

Write to comma- or other value-separated file.

Parameters:
  • path (str, default None, i.e. return string representation.) – Where to write file.

  • sep (str, default ",") – String of length 1. Field delimiter for the output file.

  • header (bool, default True) – Write out the column names.

  • compression ({‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer") – Which compression to use. Uses file extension to infer by default.

  • chain (bool, default False) – Whether to return the PyRanges after writing.

Note

The output encodes intervals just like PyRanges: 0-based, Start included and End excluded.

Examples

>>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]}
>>> gr = pr.from_dict(d)
>>> gr.to_csv(sep="\t")
'Chromosome\tStart\tEnd\tFeature\n1\t1\t4\tgene\n1\t3\t6\texon\n1\t5\t9\texon\n'

# The file contents:
Chromosome Start End Feature
1 1 4 gene
1 3 6 exon
1 5 9 exon

to_gff3(path=None, compression='infer', chain=False, map_cols=None)

Write to General Feature Format 3.

The GFF format consists of a tab-separated file without a header. GFF contains a fixed number of columns, indicated below (names before “:”). For each of these, PyRanges will use the corresponding column (names after “:”).

seqname: Chromosome
source: Source
type: Feature
start: Start
end: End
score: Score
strand: Strand
phase: Frame
attribute: autofilled

Columns which are not mapped to GFF columns are appended as a field in the attribute string (i.e. the last field).

Parameters:
  • path (str, default None, i.e. return string representation.) – Where to write file.

  • compression ({‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer") – Which compression to use. Uses file extension to infer by default.

  • chain (bool, default False) – Whether to return the PyRanges after writing.

  • map_cols (dict, default None) – Override the mapping between GFF and PyRanges fields for any number of columns. Format: {gff_column: pyranges_column}. If a mapping is given for the “attribute” column, it is not auto-filled.

Notes

Columns that do not exist are filled with ‘.’ to represent missing values.
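
As a rough illustration of the column mapping described above, the following pure-Python sketch builds one GFF3 line from a dict. It is not how pyranges implements to_gff3: the names gff3_line and GFF3_DEFAULT_MAP are hypothetical, and the sketch assumes that Start is shifted by one to give a 1-based start, which is what the outputs in the Examples suggest.

```python
# Default GFF3 column -> PyRanges column mapping, following the list above.
GFF3_DEFAULT_MAP = {
    "seqname": "Chromosome", "source": "Source", "type": "Feature",
    "start": "Start", "end": "End", "score": "Score",
    "strand": "Strand", "phase": "Frame",
}


def gff3_line(row):
    """Hypothetical helper: format one interval (a dict) as a GFF3 line."""
    fields = []
    for gff_col, pr_col in GFF3_DEFAULT_MAP.items():
        value = row.get(pr_col, ".")  # missing columns become '.'
        if gff_col == "start":
            value = value + 1  # assumed: 0-based Start -> 1-based start
        fields.append(str(value))
    mapped = set(GFF3_DEFAULT_MAP.values())
    # Columns without a GFF3 counterpart go into the attribute field.
    attribute = ";".join(f"{k}={v}" for k, v in row.items() if k not in mapped)
    fields.append(attribute)
    return "\t".join(fields)


row = {"Chromosome": 1, "Start": 1, "End": 4, "Feature": "gene",
       "Gene": 1, "function": "a b"}
print(repr(gff3_line(row)))
# '1\t.\tgene\t2\t4\t.\t.\t.\tGene=1;function=a b'
```

The sketch covers only the default mapping; an override in the spirit of map_cols would amount to editing the dictionary before formatting.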

See also

pyranges.read_gff3

read GFF3 files

pyranges.to_gtf

write to GTF format

Examples

>>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]}
>>> gr = pr.from_dict(d)
>>> gr.to_gff3()
'1\t.\tgene\t2\t4\t.\t.\t.\t\n1\t.\texon\t4\t6\t.\t.\t.\t\n1\t.\texon\t6\t9\t.\t.\t.\t\n'

# How the file would look
1    .    gene    2    4    .    .    .
1    .    exon    4    6    .    .    .
1    .    exon    6    9    .    .    .

>>> gr.Gene = [1, 2, 3]
>>> gr.function = ["a b", "c", "def"]
>>> gr.to_gff3()
'1\t.\tgene\t2\t4\t.\t.\t.\tGene=1;function=a b\n1\t.\texon\t4\t6\t.\t.\t.\tGene=2;function=c\n1\t.\texon\t6\t9\t.\t.\t.\tGene=3;function=def\n'

# How the file would look
1    .    gene    2    4    .    .    .    Gene=1;function=a b
1    .    exon    4    6    .    .    .    Gene=2;function=c
1    .    exon    6    9    .    .    .    Gene=3;function=def

>>> gr.the_frame = [0, 2, 1]
>>> gr.tag = ['mRNA', 'CDS', 'CDS']
>>> gr
+--------------+-----------+-----------+------------+-----------+------------+-------------+------------+
|   Chromosome |     Start |       End | Feature    |      Gene | function   |   the_frame | tag        |
|   (category) |   (int64) |   (int64) | (object)   |   (int64) | (object)   |     (int64) | (object)   |
|--------------+-----------+-----------+------------+-----------+------------+-------------+------------|
|            1 |         1 |         4 | gene       |         1 | a b        |           0 | mRNA       |
|            1 |         3 |         6 | exon       |         2 | c          |           2 | CDS        |
|            1 |         5 |         9 | exon       |         3 | def        |           1 | CDS        |
+--------------+-----------+-----------+------------+-----------+------------+-------------+------------+
Unstranded PyRanges object has 3 rows and 8 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.to_gff3(map_cols={'phase':'the_frame', 'feature':'tag'})
'1\t.\tmRNA\t2\t4\t.\t.\t0\tFeature=gene;Gene=1;function=a b\n1\t.\tCDS\t4\t6\t.\t.\t2\tFeature=exon;Gene=2;function=c\n1\t.\tCDS\t6\t9\t.\t.\t1\tFeature=exon;Gene=3;function=def\n'

# How the file would look
1    .    mRNA    2    4    .    .    0    Feature=gene;Gene=1;function=a b
1    .    CDS     4    6    .    .    2    Feature=exon;Gene=2;function=c
1    .    CDS     6    9    .    .    1    Feature=exon;Gene=3;function=def

>>> gr.to_gff3(map_cols={'attribute':'Gene'})
'1\t.\tgene\t2\t4\t.\t.\t.\tGene=1\n1\t.\texon\t4\t6\t.\t.\t.\tGene=1\n1\t.\texon\t6\t9\t.\t.\t.\tGene=1\n'

# How the file would look
1    .    gene    2    4    .    .    .    Gene=1
1    .    exon    4    6    .    .    .    Gene=1
1    .    exon    6    9    .    .    .    Gene=1

to_gtf(path=None, compression='infer', chain=False, map_cols=None)

Write to Gene Transfer Format.

The GTF format is a tab-separated file without a header. It contains a fixed number of columns, listed below (names before “:”). For each of these, PyRanges uses the corresponding column (names after “:”).

  • seqname: Chromosome
  • source: Source
  • type: Feature
  • start: Start
  • end: End
  • score: Score
  • strand: Strand
  • frame: Frame
  • attribute: auto-filled

Columns which are not mapped to GTF columns are appended as a field in the attribute string (i.e. the last field).

Parameters:
  • path (str, default None, i.e. return string representation.) – Where to write file.

  • compression ({‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer") – Which compression to use. Uses file extension to infer by default.

  • chain (bool, default False) – Whether to return the PyRanges after writing.

  • map_cols (dict, default None) – Override the mapping between GTF and PyRanges fields for any number of columns. Format: {gtf_column: pyranges_column}. If a mapping is given for the “attribute” column, it is not auto-filled.

Notes

Columns that do not exist are filled with ‘.’ to represent missing values.
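
The attribute field is where GTF and GFF3 output differ most visibly. The sketch below contrasts the two styles as they appear in the example outputs of to_gtf and to_gff3; the helper names gtf_attributes and gff3_attributes are hypothetical and are not part of pyranges.

```python
def gtf_attributes(extra):
    """Hypothetical helper: GTF style, key "value"; pairs separated by spaces."""
    return " ".join(f'{key} "{value}";' for key, value in extra.items())


def gff3_attributes(extra):
    """Hypothetical helper: GFF3 style, key=value pairs separated by ';'."""
    return ";".join(f"{key}={value}" for key, value in extra.items())


extra = {"name": "Tim", "prices": "Cheap"}
print(gtf_attributes(extra))   # name "Tim"; prices "Cheap";
print(gff3_attributes(extra))  # name=Tim;prices=Cheap
```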

See also

pyranges.read_gtf

read GTF files

pyranges.to_gff3

write to GFF3 format

Examples

>>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]}
>>> gr = pr.from_dict(d)
>>> gr.to_gtf()  # the raw string output
'1\t.\tgene\t2\t4\t.\t.\t.\t\n1\t.\texon\t4\t6\t.\t.\t.\t\n1\t.\texon\t6\t9\t.\t.\t.\t\n'

# What the file contents look like:
1    .    gene    2    4    .    .    .
1    .    exon    4    6    .    .    .
1    .    exon    6    9    .    .    .

>>> gr.name = ["Tim", "Eric", "Endre"]
>>> gr.prices = ["Cheap", "Premium", "Fine European"]
>>> gr.to_gtf()  # the raw string output
'1\t.\tgene\t2\t4\t.\t.\t.\tname "Tim"; prices "Cheap";\n1\t.\texon\t4\t6\t.\t.\t.\tname "Eric"; prices "Premium";\n1\t.\texon\t6\t9\t.\t.\t.\tname "Endre"; prices "Fine European";\n'

# What the file contents look like:
1    .    gene    2    4    .    .    .    name "Tim"; prices "Cheap";
1    .    exon    4    6    .    .    .    name "Eric"; prices "Premium";
1    .    exon    6    9    .    .    .    name "Endre"; prices "Fine European";

>>> gr.to_gtf(map_cols={"feature":"name", "attribute":"prices"})  # the raw string output
'1\t.\tTim\t2\t4\t.\t.\t.\tprices "Cheap";\n1\t.\tEric\t4\t6\t.\t.\t.\tprices "Premium";\n1\t.\tEndre\t6\t9\t.\t.\t.\tprices "Fine European";\n'

# What the file contents look like:
1    .    Tim      2    4    .    .    .    prices "Cheap";
1    .    Eric     4    6    .    .    .    prices "Premium";
1    .    Endre    6    9    .    .    .    prices "Fine European";

to_rle(value_col=None, strand=None, rpm=False, nb_cpu=1)

Return as RleDict.

Create collection of Rles representing the coverage or other numerical value.

Parameters:
  • value_col (str, default None) – Numerical column to create RleDict from.

  • strand (bool, default None, i.e. auto) – Whether to treat strands separately.

  • rpm (bool, default False) – Normalize by multiplying with 1e6/(number_intervals).

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

Returns:

Rle with coverage or other info from the PyRanges.

Return type:

pyrle.RleDict
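
To make the run-length encoding concrete, here is a small pure-Python sketch that computes the runs and values for a single chromosome/strand group. It is not the pyrle implementation (the helper coverage_rle is hypothetical, and a dense per-base list is used only for clarity), but it reproduces the runs and values shown for the + strand in the Examples.

```python
def coverage_rle(intervals, values=None):
    """Hypothetical helper: run-length encode summed values over half-open intervals."""
    if values is None:
        values = [1.0] * len(intervals)  # plain coverage: each interval counts as 1
    length = max(end for _, end in intervals)
    per_base = [0.0] * length
    for (start, end), value in zip(intervals, values):
        for pos in range(start, end):  # End is excluded
            per_base[pos] += value
    runs, vals = [], []
    for v in per_base:
        if vals and vals[-1] == v:
            runs[-1] += 1  # extend the current run
        else:
            runs.append(1)  # start a new run
            vals.append(v)
    return runs, vals


plus = [(3, 6), (8, 9)]  # the '+' intervals from the Examples
print(coverage_rle(plus))            # ([3, 3, 2, 1], [0.0, 1.0, 0.0, 1.0])
print(coverage_rle(plus, [0.1, 5]))  # ([3, 3, 2, 1], [0.0, 0.1, 0.0, 5.0])
```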

Examples

>>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5],
...      'End': [6, 9, 7], 'Score': [0.1, 5, 3.14], 'Strand': ['+', '+', '-']}
>>> gr = pr.from_dict(d)
>>> gr.to_rle()
chr1 +
--
+--------+-----+-----+-----+-----+
| Runs   | 3   | 3   | 2   | 1   |
|--------+-----+-----+-----+-----|
| Values | 0.0 | 1.0 | 0.0 | 1.0 |
+--------+-----+-----+-----+-----+
Rle of length 9 containing 4 elements (avg. length 2.25)

chr1 -
--
+--------+-----+-----+
| Runs   | 5   | 2   |
|--------+-----+-----|
| Values | 0.0 | 1.0 |
+--------+-----+-----+
Rle of length 7 containing 2 elements (avg. length 3.5)
RleDict object with 2 chromosomes/strand pairs.
>>> gr.to_rle(value_col="Score")
chr1 +
--
+--------+-----+-----+-----+-----+
| Runs   | 3   | 3   | 2   | 1   |
|--------+-----+-----+-----+-----|
| Values | 0.0 | 0.1 | 0.0 | 5.0 |
+--------+-----+-----+-----+-----+
Rle of length 9 containing 4 elements (avg. length 2.25)

chr1 -
--
+--------+-----+------+
| Runs   | 5   | 2    |
|--------+-----+------|
| Values | 0.0 | 3.14 |
+--------+-----+------+
Rle of length 7 containing 2 elements (avg. length 3.5)
RleDict object with 2 chromosomes/strand pairs.
>>> gr.to_rle(value_col="Score", strand=False)
chr1
+--------+-----+-----+------+------+-----+-----+
| Runs   | 3   | 2   | 1    | 1    | 1   | 1   |
|--------+-----+-----+------+------+-----+-----|
| Values | 0.0 | 0.1 | 3.24 | 3.14 | 0.0 | 5.0 |
+--------+-----+-----+------+------+-----+-----+
Rle of length 9 containing 6 elements (avg. length 1.5)
Unstranded RleDict object with 1 chromosome.
>>> gr.to_rle(rpm=True)
chr1 +
--
+--------+-----+-------------------+-----+-------------------+
| Runs   | 3   | 3                 | 2   | 1                 |
|--------+-----+-------------------+-----+-------------------|
| Values | 0.0 | 333333.3333333333 | 0.0 | 333333.3333333333 |
+--------+-----+-------------------+-----+-------------------+
Rle of length 9 containing 4 elements (avg. length 2.25)

chr1 -
--
+--------+-----+-------------------+
| Runs   | 5   | 2                 |
|--------+-----+-------------------|
| Values | 0.0 | 333333.3333333333 |
+--------+-----+-------------------+
Rle of length 7 containing 2 elements (avg. length 3.5)
RleDict object with 2 chromosomes/strand pairs.
unstrand()

Remove strand.

Note

Removes Strand column even if PyRanges is not stranded.

See also

PyRanges.stranded

whether PyRanges contains valid strand info.

Examples

>>> d =  {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6],
...       'End': [5, 8], 'Strand': ['+', '-']}
>>> gr = pr.from_dict(d)
>>> gr
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         |         1 |         5 | +            |
| chr1         |         6 |         8 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.unstrand()
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int64) |   (int64) |
|--------------+-----------+-----------|
| chr1         |         1 |         5 |
| chr1         |         6 |         8 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
values()

Return the underlying DataFrames.

window(window_size, strand=None)

Return overlapping genomic windows.

Each interval is split into consecutive windows of length window_size, starting from its Start; the last window of an interval may be shorter.

Parameters:
  • window_size (int) – Length of the windows.

  • strand (bool, default None, i.e. auto) – Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded.

  • nb_cpu (int, default 1) – How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.

  • **kwargs – Additional keyword arguments to pass as keyword arguments to f

Returns:

Tiled PyRanges.

Return type:

PyRanges

See also

pyranges.PyRanges.tile

divide intervals into adjacent tiles.
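
The windowing of a single interval can be sketched in plain Python as follows. The helper split_into_windows is hypothetical and only mirrors the behaviour visible in the Examples; it is not the library's code.

```python
def split_into_windows(start, end, window_size):
    """Hypothetical helper: cut the half-open interval [start, end) into windows."""
    return [(s, min(s + window_size, end)) for s in range(start, end, window_size)]


print(split_into_windows(895, 1259, 200))
# [(895, 1095), (1095, 1259)]
print(split_into_windows(11868, 14409, 1000))
# [(11868, 12868), (12868, 13868), (13868, 14409)]
```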

Examples

>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": [1], "Start": [895], "End": [1259]})
>>> gr
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |       895 |      1259 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr.window(200)
+--------------+-----------+-----------+
|   Chromosome |     Start |       End |
|   (category) |   (int64) |   (int64) |
|--------------+-----------+-----------|
|            1 |       895 |      1095 |
|            1 |      1095 |      1259 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> gr2 = pr.data.ensembl_gtf()[["Feature", "gene_name"]]
>>> gr2
+--------------+--------------+-----------+-----------+--------------+-------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    |
|--------------+--------------+-----------+-----------+--------------+-------------|
| 1            | gene         | 11868     | 14409     | +            | DDX11L1     |
| 1            | transcript   | 11868     | 14409     | +            | DDX11L1     |
| 1            | exon         | 11868     | 12227     | +            | DDX11L1     |
| 1            | exon         | 12612     | 12721     | +            | DDX11L1     |
| ...          | ...          | ...       | ...       | ...          | ...         |
| 1            | gene         | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | transcript   | 1173055   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1179364   | 1179555   | -            | TTLL10-AS1  |
| 1            | exon         | 1173055   | 1176396   | -            | TTLL10-AS1  |
+--------------+--------------+-----------+-----------+--------------+-------------+
Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr2 = pr.data.ensembl_gtf()[["Feature", "gene_name"]]
>>> gr2.window(1000)
+--------------+--------------+-----------+-----------+--------------+-------------+
| Chromosome   | Feature      | Start     | End       | Strand       | gene_name   |
| (category)   | (category)   | (int64)   | (int64)   | (category)   | (object)    |
|--------------+--------------+-----------+-----------+--------------+-------------|
| 1            | gene         | 11868     | 12868     | +            | DDX11L1     |
| 1            | gene         | 12868     | 13868     | +            | DDX11L1     |
| 1            | gene         | 13868     | 14409     | +            | DDX11L1     |
| 1            | transcript   | 11868     | 12868     | +            | DDX11L1     |
| ...          | ...          | ...       | ...       | ...          | ...         |
| 1            | exon         | 1173055   | 1174055   | -            | TTLL10-AS1  |
| 1            | exon         | 1174055   | 1175055   | -            | TTLL10-AS1  |
| 1            | exon         | 1175055   | 1176055   | -            | TTLL10-AS1  |
| 1            | exon         | 1176055   | 1176396   | -            | TTLL10-AS1  |
+--------------+--------------+-----------+-----------+--------------+-------------+
Stranded PyRanges object has 7,516 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
__getstate__()

Helper for pickle.

__setstate__(d)
pyranges.read_bam(f, sparse=True, as_df=False, mapq=0, required_flag=0, filter_flag=1540)

Return bam file as PyRanges.

Parameters:
  • f (str) – Path to bam file

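  • sparse (bool, default True) – Whether to return only the core columns (Chromosome, Start, End, Strand and Flag, as in the example below) rather than all available read information.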

  • as_df (bool, default False) – Whether to return as pandas DataFrame instead of PyRanges.

  • mapq (int, default 0) – Minimum mapping quality score.

  • required_flag (int, default 0) – Flags which must be present for the interval to be read.

  • filter_flag (int, default 1540) – Ignore reads with these flags. The default, 1540, excludes reads that are unmapped, that failed vendor or platform quality checks, or that are PCR or optical duplicates.
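
The default of 1540 is the sum of three SAM flag bits. The sketch below shows the arithmetic and one plausible way the two flag parameters could be combined; the helper keep_read is hypothetical and assumes the usual samtools -f / -F semantics (all required bits present, no filtered bit present), which may differ in detail from what bamread does.

```python
# SAM flag bits relevant to the default filter (values from the SAM specification).
UNMAPPED = 0x4      # 4: read is unmapped
QC_FAIL = 0x200     # 512: read failed vendor or platform quality checks
DUPLICATE = 0x400   # 1024: read is a PCR or optical duplicate

DEFAULT_FILTER = UNMAPPED | QC_FAIL | DUPLICATE


def keep_read(flag, required_flag=0, filter_flag=DEFAULT_FILTER):
    """Hypothetical helper: decide whether a read with this flag is kept."""
    has_required = (flag & required_flag) == required_flag
    has_filtered = (flag & filter_flag) != 0
    return has_required and not has_filtered


print(DEFAULT_FILTER)   # 1540
print(keep_read(16))    # True: reverse-strand read, no filtered bit set
print(keep_read(1040))  # False: 16 + 1024, marked as duplicate
```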

Notes

This functionality requires the library bamread. It can be installed with pip install bamread or conda install -c bioconda bamread.

Examples

>>> path = pr.get_example_path("control.bam")
>>> pr.read_bam(path).sort()
+--------------+-----------+-----------+--------------+------------+
| Chromosome   | Start     | End       | Strand       | Flag       |
| (category)   | (int64)   | (int64)   | (category)   | (uint16)   |
|--------------+-----------+-----------+--------------+------------|
| chr1         | 1041102   | 1041127   | +            | 0          |
| chr1         | 2129359   | 2129384   | +            | 0          |
| chr1         | 2239108   | 2239133   | +            | 0          |
| chr1         | 2318805   | 2318830   | +            | 0          |
| ...          | ...       | ...       | ...          | ...        |
| chrY         | 10632456  | 10632481  | -            | 16         |
| chrY         | 11918814  | 11918839  | -            | 16         |
| chrY         | 11936866  | 11936891  | -            | 16         |
| chrY         | 57402214  | 57402239  | -            | 16         |
+--------------+-----------+-----------+--------------+------------+
Stranded PyRanges object has 10,000 rows and 5 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
pyranges.read_bed(f, as_df=False, nrows=None)

Return bed file as PyRanges.

This is a reader for files that follow the bed format. They can have from 3 to 12 columns, which will be named as follows:

Chromosome Start End Name Score Strand ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockStarts
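
Because the names are assigned by position, a file with fewer than 12 columns simply uses the first few names. A minimal sketch, assuming tab-separated input (the helper bed_column_names is hypothetical and not part of pyranges):

```python
BED_COLUMNS = [
    "Chromosome", "Start", "End", "Name", "Score", "Strand",
    "ThickStart", "ThickEnd", "ItemRGB", "BlockCount", "BlockSizes", "BlockStarts",
]


def bed_column_names(line):
    """Hypothetical helper: pick column names for one tab-separated bed line."""
    n = len(line.rstrip("\n").split("\t"))
    if not 3 <= n <= 12:
        raise ValueError(f"expected 3 to 12 columns, got {n}")
    return BED_COLUMNS[:n]


print(bed_column_names("chr1\t9939\t10138\tH3K27me3\t7\t+\n"))
# ['Chromosome', 'Start', 'End', 'Name', 'Score', 'Strand']
```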

Parameters:
  • f (str) – Path to bed file

  • as_df (bool, default False) – Whether to return as pandas DataFrame instead of PyRanges.

  • nrows (int, default None) – Number of rows to return.

Notes

If you just want to create a PyRanges from a tab-delimited bed-like file, use pr.PyRanges(pandas.read_table(f)) instead.

Examples

>>> path = pr.get_example_path("aorta.bed")
>>> pr.read_bed(path, nrows=5)
+--------------+-----------+-----------+------------+-----------+--------------+
| Chromosome   |     Start |       End | Name       |     Score | Strand       |
| (category)   |   (int64) |   (int64) | (object)   |   (int64) | (category)   |
|--------------+-----------+-----------+------------+-----------+--------------|
| chr1         |      9939 |     10138 | H3K27me3   |         7 | +            |
| chr1         |      9953 |     10152 | H3K27me3   |         5 | +            |
| chr1         |      9916 |     10115 | H3K27me3   |         5 | -            |
| chr1         |      9951 |     10150 | H3K27me3   |         8 | -            |
| chr1         |      9978 |     10177 | H3K27me3   |         7 | -            |
+--------------+-----------+-----------+------------+-----------+--------------+
Stranded PyRanges object has 5 rows and 6 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> pr.read_bed(path, as_df=True, nrows=5)
  Chromosome  Start    End      Name  Score Strand
0       chr1   9916  10115  H3K27me3      5      -
1       chr1   9939  10138  H3K27me3      7      +
2       chr1   9951  10150  H3K27me3      8      -
3       chr1   9953  10152  H3K27me3      5      +
4       chr1   9978  10177  H3K27me3      7      -
pyranges.read_gff3(f, full=True, annotation=None, as_df=False, nrows=None)

Read files in the General Feature Format.

Parameters:
  • f (str) – Path to GFF file.

  • full (bool, default True) – Whether to read and interpret the annotation column.

  • as_df (bool, default False) – Whether to return as pandas DataFrame instead of PyRanges.

  • nrows (int, default None) – Number of rows to read. Default None, i.e. all.

Notes

The GFF3 format encodes both Start and End as 1-based and inclusive. PyRanges (and the DataFrame returned by this function if as_df=True) instead encodes intervals as 0-based, with Start included and End excluded.
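
In practice the two conventions differ only in the start coordinate. The following sketch shows the conversion in both directions; the function names are hypothetical and chosen for illustration.

```python
def gff_to_pyranges(start, end):
    """1-based, both ends included -> 0-based, Start included, End excluded."""
    return start - 1, end


def pyranges_to_gff(start, end):
    """0-based, Start included, End excluded -> 1-based, both ends included."""
    return start + 1, end


# A feature covering bases 2, 3 and 4 of a sequence (1-based numbering):
print(gff_to_pyranges(2, 4))  # (1, 4); length is End - Start = 3
print(pyranges_to_gff(1, 4))  # (2, 4); length is end - start + 1 = 3
```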

See also

pyranges.read_gtf

read files in the Gene Transfer Format

pyranges.read_gtf(f, full=True, as_df=False, nrows=None, duplicate_attr=False, rename_attr=False, ignore_bad: bool = False)

Read files in the Gene Transfer Format.

Parameters:
  • f (str) – Path to GTF file.

  • full (bool, default True) – Whether to read and interpret the annotation column.

  • as_df (bool, default False) – Whether to return as pandas DataFrame instead of PyRanges.

  • nrows (int, default None) – Number of rows to read. Default None, i.e. all.

  • duplicate_attr (bool, default False) – Whether to handle (potential) duplicate attributes or just keep the last one.

  • rename_attr (bool, default False) – Whether to rename attributes that clash with reserved column names by adding the suffix ‘_attr’, or to raise an error instead (the default).

  • ignore_bad (bool, default False) – Whether to ignore bad lines or raise an error.

Note

The GTF format encodes both Start and End as 1-based and inclusive. PyRanges (and the DataFrame returned by this function if as_df=True) instead encodes intervals as 0-based, with Start included and End excluded.

See also

pyranges.read_gff3

read files in the General Feature Format

Examples

>>> path = pr.get_example_path("ensembl.gtf")
>>> gr = pr.read_gtf(path)
>>> # +--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------+
>>> # | Chromosome   | Source     | Feature      | Start     | End       | Score      | Strand       | Frame      | gene_id         | gene_version   | +18   |
>>> # | (category)   | (object)   | (category)   | (int64)   | (int64)   | (object)   | (category)   | (object)   | (object)        | (object)       | ...   |
>>> # |--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------|
>>> # | 1            | havana     | gene         | 11868     | 14409     | .          | +            | .          | ENSG00000223972 | 5              | ...   |
>>> # | 1            | havana     | transcript   | 11868     | 14409     | .          | +            | .          | ENSG00000223972 | 5              | ...   |
>>> # | 1            | havana     | exon         | 11868     | 12227     | .          | +            | .          | ENSG00000223972 | 5              | ...   |
>>> # | 1            | havana     | exon         | 12612     | 12721     | .          | +            | .          | ENSG00000223972 | 5              | ...   |
>>> # | ...          | ...        | ...          | ...       | ...       | ...        | ...          | ...        | ...             | ...            | ...   |
>>> # | 1            | ensembl    | transcript   | 120724    | 133723    | .          | -            | .          | ENSG00000238009 | 6              | ...   |
>>> # | 1            | ensembl    | exon         | 133373    | 133723    | .          | -            | .          | ENSG00000238009 | 6              | ...   |
>>> # | 1            | ensembl    | exon         | 129054    | 129223    | .          | -            | .          | ENSG00000238009 | 6              | ...   |
>>> # | 1            | ensembl    | exon         | 120873    | 120932    | .          | -            | .          | ENSG00000238009 | 6              | ...   |
>>> # +--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------+
>>> # Stranded PyRanges object has 95 rows and 28 columns from 1 chromosomes.
>>> # For printing, the PyRanges was sorted on Chromosome and Strand.
>>> # 18 hidden columns: gene_name, gene_source, gene_biotype, transcript_id, transcript_version, transcript_name, transcript_source, transcript_biotype, tag, transcript_support_level, ... (+ 8 more.)
pyranges.from_dict(d, int64=False)

Create a PyRanges from dict.

Parameters:
  • d (dict of array-like) – Dict with data.

  • int64 (bool, default False) – Whether to use 64-bit integers for starts and ends.

Warning

On versions of Python prior to 3.6, this function returns a PyRanges with the columns in arbitrary order.

See also

pyranges.from_string

create a PyRanges from a multiline string.

Examples

>>> d = {"Chromosome": [1, 1, 2], "Start": [1, 2, 3], "End": [4, 9, 12], "Strand": ["+", "+", "-"], "ArbitraryValue": ["a", "b", "c"]}
>>> pr.from_dict(d)
+--------------+-----------+-----------+--------------+------------------+
|   Chromosome |     Start |       End | Strand       | ArbitraryValue   |
|   (category) |   (int64) |   (int64) | (category)   | (object)         |
|--------------+-----------+-----------+--------------+------------------|
|            1 |         1 |         4 | +            | a                |
|            1 |         2 |         9 | +            | b                |
|            2 |         3 |        12 | -            | c                |
+--------------+-----------+-----------+--------------+------------------+
Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
pyranges.from_string(s, int64=False)

Create a PyRanges from multiline string.

Parameters:
  • s (str) – String with data.

  • int64 (bool, default False) – Whether to use 64-bit integers for starts and ends.

See also

pyranges.from_dict

create a PyRanges from a dictionary.
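
The relation between the two constructors can be sketched with a small parser that turns a whitespace-delimited multiline string into the kind of dict that from_dict accepts. This is an illustration only: the helper parse_table is hypothetical, it assumes a header line followed by whitespace-separated rows, and it is not how from_string is implemented.

```python
def parse_table(s):
    """Hypothetical helper: parse a whitespace-delimited table into a dict of lists."""
    lines = s.strip().splitlines()
    header = lines[0].split()
    columns = {name: [] for name in header}
    for line in lines[1:]:
        for name, value in zip(header, line.split()):
            # Keep coordinates as integers, everything else as strings.
            columns[name].append(int(value) if value.isdigit() else value)
    return columns


s = '''Chromosome      Start        End Strand
chr1  246719402  246719502      +
chr5   15400908   15401008      +'''
print(parse_table(s))
# {'Chromosome': ['chr1', 'chr5'], 'Start': [246719402, 15400908], 'End': [246719502, 15401008], 'Strand': ['+', '+']}
```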

Examples

>>> s = '''Chromosome      Start        End Strand
... chr1  246719402  246719502      +
... chr5   15400908   15401008      +
... chr9   68366534   68366634      +
... chr14   79220091   79220191      +
... chr14  103456471  103456571      -'''
>>> pr.from_string(s)
+--------------+-----------+-----------+--------------+
| Chromosome   |     Start |       End | Strand       |
| (category)   |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
| chr1         | 246719402 | 246719502 | +            |
| chr5         |  15400908 |  15401008 | +            |
| chr9         |  68366534 |  68366634 | +            |
| chr14        |  79220091 |  79220191 | +            |
| chr14        | 103456471 | 103456571 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 5 rows and 4 columns from 4 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
pyranges.itergrs(prs, strand=None, keys=False)

Iterate over multiple PyRanges at once.

Parameters:
  • prs (list of PyRanges) – PyRanges to iterate over.

  • strand (bool, default None, i.e. auto) – Whether to iterate over strands. If True, all PyRanges must be stranded.

  • keys (bool, default False) – Return tuple with key and value from iterator.
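
Conceptually, the iteration walks over the union of all chromosome/strand keys and yields one entry per input for each key, with an empty entry where an input has no data. A pure-Python sketch of that idea, using plain dicts in place of PyRanges (the helper iter_groups is hypothetical):

```python
def iter_groups(group_dicts):
    """Hypothetical helper: yield (key, [group, ...]) over the union of all keys."""
    keys = sorted(set().union(*group_dicts))
    for key in keys:
        # An input without data for this key contributes an empty list.
        yield key, [d.get(key, []) for d in group_dicts]


gr1 = {("1", "+"): [(1, 4), (2, 9)], ("2", "-"): [(3, 12)]}
gr2 = {("2", "-"): [(5, 81)], ("3", "+"): [(9, 42)], ("3", "-"): [(21, 25)]}
for key, groups in iter_groups([gr1, gr2]):
    print(key, groups)
# ('1', '+') [[(1, 4), (2, 9)], []]
# ('2', '-') [[(3, 12)], [(5, 81)]]
# ('3', '+') [[], [(9, 42)]]
# ('3', '-') [[], [(21, 25)]]
```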

Examples

>>> d1 = {"Chromosome": [1, 1, 2], "Start": [1, 2, 3], "End": [4, 9, 12], "Strand": ["+", "+", "-"]}
>>> d2 = {"Chromosome": [2, 3, 3], "Start": [5, 9, 21], "End": [81, 42, 25], "Strand": ["-", "+", "-"]}
>>> gr1, gr2 = pr.from_dict(d1), pr.from_dict(d2)
>>> gr1
+--------------+-----------+-----------+--------------+
|   Chromosome |     Start |       End | Strand       |
|   (category) |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
|            1 |         1 |         4 | +            |
|            1 |         2 |         9 | +            |
|            2 |         3 |        12 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr2
+--------------+-----------+-----------+--------------+
|   Chromosome |     Start |       End | Strand       |
|   (category) |   (int64) |   (int64) | (category)   |
|--------------+-----------+-----------+--------------|
|            2 |         5 |        81 | -            |
|            3 |         9 |        42 | +            |
|            3 |        21 |        25 | -            |
+--------------+-----------+-----------+--------------+
Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> ranges = [gr1, gr2]
>>> for key, dfs in pr.itergrs(ranges, keys=True):
...     print("-----------\n" + str(key) + "\n-----------")
...     for df in dfs:
...         print(df)
-----------
('1', '+')
-----------
  Chromosome  Start  End Strand
0          1      1    4      +
1          1      2    9      +
Empty DataFrame
Columns: [Chromosome, Start, End, Strand]
Index: []
-----------
('2', '-')
-----------
  Chromosome  Start  End Strand
2          2      3   12      -
  Chromosome  Start  End Strand
0          2      5   81      -
-----------
('3', '+')
-----------
Empty DataFrame
Columns: [Chromosome, Start, End, Strand]
Index: []
  Chromosome  Start  End Strand
1          3      9   42      +
-----------
('3', '-')
-----------
Empty DataFrame
Columns: [Chromosome, Start, End, Strand]
Index: []
  Chromosome  Start  End Strand
2          3     21   25      -
pyranges.random(n=1000, length=100, chromsizes=None, strand=True, int64=False, seed=None)

Return PyRanges with random intervals.

Parameters:
  • n (int, default 1000) – Number of intervals.

  • length (int, default 100) – Length of intervals.

  • chromsizes (dict or DataFrame, default None, i.e. use "hg19") – Draw intervals from within these bounds.

  • strand (bool, default True) – Data should have strand.

  • int64 (bool, default False) – Use int64 to represent Start and End.
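
For intuition, fixed-length random intervals within chromosome bounds can be drawn with the standard library alone. The sketch below is not the pyranges implementation: the helper random_intervals is hypothetical, and sampling chromosomes in proportion to their size is an assumption made for this illustration. The two chromosome lengths are taken from the hg19 table shown under to_bigwig.

```python
import random


def random_intervals(n, length, chromsizes, seed=None):
    """Hypothetical helper: draw n stranded intervals of a fixed length."""
    rng = random.Random(seed)
    chroms = list(chromsizes)
    weights = [chromsizes[c] for c in chroms]  # assumed: proportional to size
    intervals = []
    for _ in range(n):
        chrom = rng.choices(chroms, weights=weights)[0]
        # Choose Start so that End = Start + length stays within the chromosome.
        start = rng.randrange(0, chromsizes[chrom] - length + 1)
        intervals.append((chrom, start, start + length, rng.choice("+-")))
    return intervals


chromsizes = {"chr1": 249250621, "chrY": 59373566}
for interval in random_intervals(3, 100, chromsizes, seed=0):
    print(interval)
```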

Examples

# >>> pr.random()
# +--------------+-----------+-----------+--------------+
# | Chromosome   | Start     | End       | Strand       |
# | (category)   | (int64)   | (int64)   | (category)   |
# |--------------+-----------+-----------+--------------|
# | chr1         | 216128004 | 216128104 | +            |
# | chr1         | 114387955 | 114388055 | +            |
# | chr1         | 67597551  | 67597651  | +            |
# | chr1         | 26306616  | 26306716  | +            |
# | ...          | ...       | ...       | ...          |
# | chrY         | 20811459  | 20811559  | -            |
# | chrY         | 12221362  | 12221462  | -            |
# | chrY         | 8578041   | 8578141   | -            |
# | chrY         | 43259695  | 43259795  | -            |
# +--------------+-----------+-----------+--------------+
# Stranded PyRanges object has 1,000 rows and 4 columns from 24 chromosomes.
# For printing, the PyRanges was sorted on Chromosome and Strand.

To have random interval lengths:

# >>> gr = pr.random(length=1)
# >>> gr.End += np.random.randint(int(1e5), size=len(gr))
# >>> gr.Length = gr.lengths()
# >>> gr
# +--------------+-----------+-----------+--------------+-----------+
# | Chromosome   | Start     | End       | Strand       | Length    |
# | (category)   | (int64)   | (int64)   | (category)   | (int64)   |
# |--------------+-----------+-----------+--------------+-----------|
# | chr1         | 203654331 | 203695380 | +            | 41049     |
# | chr1         | 46918271  | 46978908  | +            | 60637     |
# | chr1         | 97355021  | 97391587  | +            | 36566     |
# | chr1         | 57284999  | 57323542  | +            | 38543     |
# | ...          | ...       | ...       | ...          | ...       |
# | chrY         | 31665821  | 31692660  | -            | 26839     |
# | chrY         | 20236607  | 20253473  | -            | 16866     |
# | chrY         | 33255377  | 33315933  | -            | 60556     |
# | chrY         | 31182964  | 31205467  | -            | 22503     |
# +--------------+-----------+-----------+--------------+-----------+
# Stranded PyRanges object has 1,000 rows and 5 columns from 24 chromosomes.
# For printing, the PyRanges was sorted on Chromosome and Strand.

pyranges.to_bigwig(gr, path, chromosome_sizes)

Write df to bigwig.

Must contain the columns Chromosome, Start, End and Score. All others are ignored.

Parameters:
  • path (str) – Where to write bigwig.

  • chromosome_sizes (PyRanges or dict) – If dict: map of chromosome names to chromosome length.

Examples

Extended example with how to prepare your data for writing bigwigs:

>>> d =  {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [1, 4, 6],
...       'End': [7, 8, 10], 'Strand': ['+', '-', '-'],
...       'Value': [10, 20, 30]}
>>> import pyranges as pr
>>> gr = pr.from_dict(d)
>>> hg19 = pr.data.chromsizes()
>>> print(hg19)
+--------------+-----------+-----------+
| Chromosome   | Start     | End       |
| (category)   | (int64)   | (int64)   |
|--------------+-----------+-----------|
| chr1         | 0         | 249250621 |
| chr2         | 0         | 243199373 |
| chr3         | 0         | 198022430 |
| chr4         | 0         | 191154276 |
| ...          | ...       | ...       |
| chr22        | 0         | 51304566  |
| chrM         | 0         | 16571     |
| chrX         | 0         | 155270560 |
| chrY         | 0         | 59373566  |
+--------------+-----------+-----------+
Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes.
For printing, the PyRanges was sorted on Chromosome.

Only one strand can be written at a time, and overlapping intervals are invalid in bigwigs:

>>> pr.to_bigwig(gr, "outpath.bw", hg19)
Traceback (most recent call last):
...
AssertionError: Can only write one strand at a time. Use an unstranded PyRanges or subset on strand first.
>>> pr.to_bigwig(gr["-"], "outpath.bw", hg19)
Traceback (most recent call last):
...
AssertionError: Intervals must not overlap.
>>> gr
+--------------+-----------+-----------+--------------+-----------+
| Chromosome   |     Start |       End | Strand       |     Value |
| (category)   |   (int64) |   (int64) | (category)   |   (int64) |
|--------------+-----------+-----------+--------------+-----------|
| chr1         |         1 |         7 | +            |        10 |
| chr1         |         4 |         8 | -            |        20 |
| chr1         |         6 |        10 | -            |        30 |
+--------------+-----------+-----------+--------------+-----------+
Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> value = gr.to_rle(rpm=False, value_col="Value")
>>> value
chr1 +
--
+--------+-----+------+
| Runs   | 1   | 6    |
|--------+-----+------|
| Values | 0.0 | 10.0 |
+--------+-----+------+
Rle of length 7 containing 2 elements (avg. length 3.5)

chr1 -
--
+--------+-----+------+------+------+
| Runs   | 4   | 2    | 2    | 2    |
|--------+-----+------+------+------|
| Values | 0.0 | 20.0 | 50.0 | 30.0 |
+--------+-----+------+------+------+
Rle of length 10 containing 4 elements (avg. length 2.5)
RleDict object with 2 chromosomes/strand pairs.
>>> raw = gr.to_rle(rpm=False)
>>> raw
chr1 +
--
+--------+-----+-----+
| Runs   | 1   | 6   |
|--------+-----+-----|
| Values | 0.0 | 1.0 |
+--------+-----+-----+
Rle of length 7 containing 2 elements (avg. length 3.5)

chr1 -
--
+--------+-----+-----+-----+-----+
| Runs   | 4   | 2   | 2   | 2   |
|--------+-----+-----+-----+-----|
| Values | 0.0 | 1.0 | 2.0 | 1.0 |
+--------+-----+-----+-----+-----+
Rle of length 10 containing 4 elements (avg. length 2.5)
RleDict object with 2 chromosomes/strand pairs.
>>> result = (value / raw).apply_values(np.log10)
>>> result
chr1 +
--
+--------+-----+-----+
| Runs   | 1   | 6   |
|--------+-----+-----|
| Values | nan | 1.0 |
+--------+-----+-----+
Rle of length 7 containing 2 elements (avg. length 3.5)

chr1 -
--
+--------+-----+--------------------+--------------------+--------------------+
| Runs   | 4   | 2                  | 2                  | 2                  |
|--------+-----+--------------------+--------------------+--------------------|
| Values | nan | 1.3010300397872925 | 1.3979400396347046 | 1.4771212339401245 |
+--------+-----+--------------------+--------------------+--------------------+
Rle of length 10 containing 4 elements (avg. length 2.5)
RleDict object with 2 chromosomes/strand pairs.
>>> out = result.numbers_only().to_ranges()
>>> out
+--------------+-----------+-----------+-------------+--------------+
| Chromosome   |     Start |       End |       Score | Strand       |
| (category)   |   (int64) |   (int64) |   (float64) | (category)   |
|--------------+-----------+-----------+-------------+--------------|
| chr1         |         1 |         7 |     1       | +            |
| chr1         |         4 |         6 |     1.30103 | -            |
| chr1         |         6 |         8 |     1.39794 | -            |
| chr1         |         8 |        10 |     1.47712 | -            |
+--------------+-----------+-----------+-------------+--------------+
Stranded PyRanges object has 4 rows and 5 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome and Strand.
>>> pr.to_bigwig(out["-"], "deleteme_reverse.bw", hg19)
>>> pr.to_bigwig(out["+"], "deleteme_forward.bw", hg19)
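
The arithmetic behind the example above can be reproduced without pyranges. The sketch below is an illustration only, not how pyranges or its run-length encoding is implemented: it assumes half-open intervals on a single chromosome and strand with strictly positive values, and it walks every position, which is only practical for tiny inputs. log_mean_runs is a hypothetical name.

```python
import math
from itertools import groupby

def log_mean_runs(intervals):
    """Collapse overlapping (start, end, value) intervals into disjoint runs.

    Each run's score is log10(summed value / number of covering intervals).
    Assumes half-open intervals and strictly positive values.
    """
    end_max = max(end for _, end, _ in intervals)
    total = [0.0] * end_max  # summed value at each position
    depth = [0] * end_max    # number of intervals covering each position
    for start, end, value in intervals:
        for pos in range(start, end):
            total[pos] += value
            depth[pos] += 1

    # Score per covered position; uncovered positions become None.
    scores = [round(math.log10(t / d), 5) if d else None
              for t, d in zip(total, depth)]

    # Merge consecutive positions with equal scores into (start, end, score).
    runs, pos = [], 0
    for score, group in groupby(scores):
        n = len(list(group))
        if score is not None:
            runs.append((pos, pos + n, score))
        pos += n
    return runs

# The two "-" strand intervals from the example above.
minus = [(4, 8, 20), (6, 10, 30)]
print(log_mean_runs(minus))
# [(4, 6, 1.30103), (6, 8, 1.39794), (8, 10, 1.47712)]
```

The runs match the "-" rows of the out table: where the two intervals overlap (positions 6 to 8), the summed value 50 is divided by a coverage of 2 before taking the logarithm. Because the runs are disjoint, they satisfy the no-overlap requirement for writing a bigwig.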

pyranges.version_info()