Rows operations
Indexing with iloc, loc
PyRanges inherits all the indexing and slicing capabilities of pandas, e.g. boolean Series indexing,
iloc, loc, at, iat.
Note that these methods return a view, not a copy, with the caveats that it implies.
See the pandas documentation for details.
Briefly, to avoid ambiguity it is best to explicitly call copy if you want an object to not be linked
to the original object from which it was extracted. For example:
>>> import pyranges1 as pr
>>> gr = pr.example_data.aorta
>>> gr
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 5 -
1 | chr1 9939 10138 H3K27me3 7 +
2 | chr1 9951 10150 H3K27me3 8 -
3 | chr1 9953 10152 H3K27me3 5 +
... | ... ... ... ... ... ...
7 | chr1 10127 10326 H3K27me3 1 -
8 | chr1 10241 10440 H3K27me3 6 -
9 | chr1 10246 10445 H3K27me3 4 +
10 | chr1 110246 110445 H3K27me3 1 +
PyRanges with 11 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
>>> sgr = gr.iloc[0:3].copy()
>>> sgr
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 5 -
1 | chr1 9939 10138 H3K27me3 7 +
2 | chr1 9951 10150 H3K27me3 8 -
PyRanges with 3 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
>>> sgr['Score'] = 100 # does not modify gr
>>> gr.head(3)
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 5 -
1 | chr1 9939 10138 H3K27me3 7 +
2 | chr1 9951 10150 H3K27me3 8 -
PyRanges with 3 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
On the other hand, to modify even a few rows of a PyRanges object,
use loc (label-based indexing) or iloc (positional indexing) on the whole object, not on a view:
>>> gr.loc[0:2, 'Score'] = 100 # modifies gr
>>> gr.head(5)
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 100 -
1 | chr1 9939 10138 H3K27me3 100 +
2 | chr1 9951 10150 H3K27me3 100 -
3 | chr1 9953 10152 H3K27me3 5 +
4 | chr1 9978 10177 H3K27me3 7 -
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
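Note that gr.iloc[0:3] and gr.loc[0:2], both used above, select the same three rows even though their end points differ: in pandas, iloc slices exclude the stop position, whereas loc slices include the stop label. A minimal sketch in plain pandas follows; the toy DataFrame is made up for illustration, and a PyRanges object is expected to behave the same way since it inherits these indexers.

```python
import pandas as pd

# Toy data, made up for illustration; the index labels are 0..4
df = pd.DataFrame({"Start": [10, 20, 30, 40, 50], "Score": [5, 7, 8, 5, 7]})

by_position = df.iloc[0:3]  # positions 0, 1, 2 (stop excluded)
by_label = df.loc[0:2]      # labels 0, 1, 2 (stop included)

print(list(by_position.index))  # [0, 1, 2]
print(list(by_label.index))     # [0, 1, 2]
```

The two coincide here only because the index labels match the row positions; after sorting or filtering, labels and positions generally differ.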
Using boolean indexers
In PyRanges, boolean indexers work as they do in pandas, as already seen in the tutorial:
>>> sel = (gr['Score'] == 100)
>>> gr[sel]
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 100 -
1 | chr1 9939 10138 H3K27me3 100 +
2 | chr1 9951 10150 H3K27me3 100 -
PyRanges with 3 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
>>> gr[gr.Score==5]
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
3 | chr1 9953 10152 H3K27me3 5 +
5 | chr1 10001 10200 H3K27me3 5 -
PyRanges with 2 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
Combined with loc, boolean indexers can be used to add or update column values:
>>> gr.loc[gr['Score'] < 6, 'Score'] = -10 # modifies gr
>>> gr.head(5)
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 100 -
1 | chr1 9939 10138 H3K27me3 100 +
2 | chr1 9951 10150 H3K27me3 100 -
3 | chr1 9953 10152 H3K27me3 -10 +
4 | chr1 9978 10177 H3K27me3 7 -
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
In pandas, these logical operators can be employed with boolean Series:
“&” = element-wise AND operator
“|” = element-wise OR operator
“~” = NOT operator, inverts the values of the Series on its right
When using logical operators, make sure to parenthesize properly.
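To see why the parentheses matter, recall that in Python "&" binds more tightly than comparison operators such as "<". The sketch below uses two toy pandas Series (made up for illustration, not taken from the example data) to compare the parenthesized and unparenthesized forms.

```python
import pandas as pd

# Toy columns, made up for illustration
score = pd.Series([5, 7, 100])
start = pd.Series([9953, 10024, 9916])

# Correct: each comparison is parenthesized before combining with &
mask = (score < 8) & (start < 10000)
print(mask.tolist())  # [True, False, False]

# Without parentheses, Python evaluates `8 & start` first and then chains
# the comparisons, which requires the truth value of a whole Series
try:
    score < 8 & start < 10000
    unparenthesized_ok = True
except ValueError:
    unparenthesized_ok = False
print(unparenthesized_ok)  # False
```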
Let’s get the + intervals with Score<8 starting before 10,000 or ending after 100,000:
>>> gr[ (gr.Score<8) & (gr.Strand=='+') &
... ((gr.Start<10000) | (gr.End>100000)) ]
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
3 | chr1 9953 10152 H3K27me3 -10 +
10 | chr1 110246 110445 H3K27me3 -10 +
PyRanges with 2 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 1 strands.
Let’s invert the selection:
>>> gr[~(
... (gr.Score<8) & (gr.Strand=='+') &
... ((gr.Start<10000) | (gr.End>100000)) )]
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- -------- ------- ----------
0 | chr1 9916 10115 H3K27me3 100 -
1 | chr1 9939 10138 H3K27me3 100 +
2 | chr1 9951 10150 H3K27me3 100 -
4 | chr1 9978 10177 H3K27me3 7 -
... | ... ... ... ... ... ...
6 | chr1 10024 10223 H3K27me3 -10 +
7 | chr1 10127 10326 H3K27me3 -10 -
8 | chr1 10241 10440 H3K27me3 6 -
9 | chr1 10246 10445 H3K27me3 -10 +
PyRanges with 9 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
Using PyRanges .loci
PyRanges provides the method loci to select rows by genomic region:
>>> gr2 = pr.example_data.aorta2.sort_ranges()
>>> gr2
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 1 +
5 | chr1 10280 10479 Input 1 +
6 | chr1 16056 16255 Input 1 +
7 | chr1 16064 16263 Input 1 +
... | ... ... ... ... ... ...
4 | chr1 10149 10348 Input 1 -
3 | chr1 10082 10281 Input 1 -
2 | chr1 10079 10278 Input 1 -
0 | chr1 9988 10187 Input 1 -
PyRanges with 10 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
Various syntaxes are accepted; see its API. For example:
>>> gr2.loci['-'] # get all rows with strand '-'
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
9 | chr1 19958 20157 Input 1 -
4 | chr1 10149 10348 Input 1 -
3 | chr1 10082 10281 Input 1 -
2 | chr1 10079 10278 Input 1 -
0 | chr1 9988 10187 Input 1 -
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 1 strands.
>>> gr2.loci['chr1', '+'] # get all rows with chromosome 'chr1' and strand '+'
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 1 +
5 | chr1 10280 10479 Input 1 +
6 | chr1 16056 16255 Input 1 +
7 | chr1 16064 16263 Input 1 +
8 | chr1 16109 16308 Input 1 +
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 1 strands.
>>> gr2.loci['chr1', 10000:11000] # get all rows on 'chr1' and overlapping 10000:11000
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 1 +
5 | chr1 10280 10479 Input 1 +
4 | chr1 10149 10348 Input 1 -
3 | chr1 10082 10281 Input 1 -
2 | chr1 10079 10278 Input 1 -
0 | chr1 9988 10187 Input 1 -
PyRanges with 6 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
>>> gr2.loci['chr1', '+', 10000:11000] # get all rows on 'chr1', strand '+', and overlapping 10000:11000
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 1 +
5 | chr1 10280 10479 Input 1 +
PyRanges with 2 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 1 strands.
Loci also support assignment. In this case, you must provide a DataFrame with the same shape as the selection:
>>> gr2.loci['chr1', '+', 10000:11000] = gr2.loci['chr1', '+', 10000:11000].copy().assign(Score=100)
>>> gr2.loci['chr1', '+', 10000:11000] # see below that the Score was altered
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 100 +
5 | chr1 10280 10479 Input 100 +
PyRanges with 2 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 1 strands.
For more flexible assignment, you can use loc with the index attribute of the loci output:
>>> sindex = gr2.loci['chr1', '+', 16000:17000].index
>>> gr2.loc[sindex, "Score"] = 150
>>> gr2
index | Chromosome Start End Name Score Strand
int64 | category int64 int64 str int64 category
------- --- ------------ ------- ------- ------ ------- ----------
1 | chr1 10073 10272 Input 100 +
5 | chr1 10280 10479 Input 100 +
6 | chr1 16056 16255 Input 150 +
7 | chr1 16064 16263 Input 150 +
... | ... ... ... ... ... ...
4 | chr1 10149 10348 Input 1 -
3 | chr1 10082 10281 Input 1 -
2 | chr1 10079 10278 Input 1 -
0 | chr1 9988 10187 Input 1 -
PyRanges with 10 rows, 6 columns, and 1 index columns.
Contains 1 chromosomes and 2 strands.
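For intuition, a region selection such as gr2.loci['chr1', 10000:11000] can be approximated with ordinary pandas boolean masks. The sketch below assumes half-open intervals and treats two intervals as overlapping when each starts before the other ends. It illustrates the selection logic only and is not how loci is implemented; the names region_start and region_end are hypothetical.

```python
import pandas as pd

# A few rows copied from the gr2 example, as a plain DataFrame
df = pd.DataFrame(
    {
        "Chromosome": ["chr1"] * 5,
        "Start": [10073, 10280, 16056, 19958, 9988],
        "End": [10272, 10479, 16255, 20157, 10187],
        "Strand": ["+", "+", "+", "-", "-"],
    },
    index=[1, 5, 6, 9, 0],
)

region_start, region_end = 10000, 11000  # hypothetical names for the slice bounds

# Assumed semantics: half-open intervals overlap when each starts before the other ends
overlaps = (df["Start"] < region_end) & (df["End"] > region_start)
selected = df[(df["Chromosome"] == "chr1") & overlaps]
print(list(selected.index))  # [1, 5, 0]

# Adding the strand criterion narrows the selection further
plus_only = selected[selected["Strand"] == "+"]
print(list(plus_only.index))  # [1, 5]
```

Row 0 is selected even though it starts before 10000, because it ends inside the region.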
Sorting PyRanges
PyRanges objects can be sorted (i.e. altering the order of rows) by calling the pandas DataFrame method sort_values,
or the PyRanges method sort_ranges.
>>> import random; random.seed(1)
>>> c = pr.example_data.chipseq.remove_nonloc_columns()
>>> c['peak'] = [random.randint(0, 1000) for _ in range(len(c))] # add a column with random values
>>> c
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
0 | chr8 28510032 28510057 - 137
1 | chr7 107153363 107153388 - 582
2 | chr5 135821802 135821827 - 867
3 | chr14 19418999 19419024 - 821
... | ... ... ... ... ...
16 | chr9 120803448 120803473 + 96
17 | chr6 89296757 89296782 - 499
18 | chr1 194245558 194245583 + 29
19 | chr8 57916061 57916086 + 914
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
Pandas sort_values sorts the whole dataframe by the specified columns. See its API for details.
For example, let’s sort by column peak:
>>> c.sort_values(by='peak', ascending=False)
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
19 | chr8 57916061 57916086 + 914
2 | chr5 135821802 135821827 - 867
3 | chr14 19418999 19419024 - 821
14 | chr2 152562484 152562509 - 807
... | ... ... ... ... ...
7 | chr19 19571102 19571127 + 120
16 | chr9 120803448 120803473 + 96
5 | chr21 40099618 40099643 + 64
18 | chr1 194245558 194245583 + 29
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
PyRanges sort_ranges is designed for genomic ranges.
By default, it sorts by Chromosome, Strand, then interval coordinates. If strands are valid
(see strand_valid), then intervals on the reverse strand are sorted in reverse order:
>>> c.sort_ranges()
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
12 | chr1 38457520 38457545 + 667
18 | chr1 194245558 194245583 + 29
13 | chr1 80668132 80668157 - 388
14 | chr2 152562484 152562509 - 807
... | ... ... ... ... ...
4 | chr12 106679761 106679786 - 782
3 | chr14 19418999 19419024 - 821
7 | chr19 19571102 19571127 + 120
5 | chr21 40099618 40099643 + 64
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
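The ordering rule described above can be sketched in plain Python on the three chr8 rows of the chipseq example data. The sort_key function is hypothetical and only mimics the documented behaviour (strand '+' before '-', and descending Start on the reverse strand); it is not the implementation of sort_ranges.

```python
# Three chr8 rows from the chipseq example: (index, chromosome, start, strand)
rows = [
    (0, "chr8", 28510032, "-"),
    (6, "chr8", 22714402, "-"),
    (19, "chr8", 57916061, "+"),
]

def sort_key(row):
    """Hypothetical key: chromosome, strand, then start (descending on '-')."""
    _, chromosome, start, strand = row
    return (chromosome, strand, -start if strand == "-" else start)

ordered = [row[0] for row in sorted(rows, key=sort_key)]
print(ordered)  # [19, 0, 6]
```

Since all three rows are on the same chromosome, the sketch does not address how chromosome names themselves are ordered.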
Above, chr8 appears before chr10 because of ‘natural sorting’. We can force alphabetical sorting instead:
>>> c.sort_ranges(natsort=False)
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
12 | chr1 38457520 38457545 + 667
18 | chr1 194245558 194245583 + 29
13 | chr1 80668132 80668157 - 388
9 | chr10 35419784 35419809 - 779
... | ... ... ... ... ...
19 | chr8 57916061 57916086 + 914
0 | chr8 28510032 28510057 - 137
6 | chr8 22714402 22714427 - 261
16 | chr9 120803448 120803473 + 96
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
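The difference between the two orderings can be reproduced with the Python standard library. The natural_key function below is a hypothetical helper written for illustration, not the one PyRanges uses: it splits each name into text and digit runs so that the numeric parts compare as integers.

```python
import re

chromosomes = ["chr10", "chr8", "chr1", "chr21", "chr9"]

def natural_key(name):
    """Split digits from text so that numeric parts compare as integers."""
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", name)]

alphabetical = sorted(chromosomes)
natural = sorted(chromosomes, key=natural_key)

print(alphabetical)  # ['chr1', 'chr10', 'chr21', 'chr8', 'chr9']
print(natural)       # ['chr1', 'chr8', 'chr9', 'chr10', 'chr21']
```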
To sort by a different column, use the first argument (by). This is used after Chromosome and Strand, but before
coordinates:
>>> c.sort_ranges('peak')
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
18 | chr1 194245558 194245583 + 29
12 | chr1 38457520 38457545 + 667
13 | chr1 80668132 80668157 - 388
14 | chr2 152562484 152562509 - 807
... | ... ... ... ... ...
4 | chr12 106679761 106679786 - 782
3 | chr14 19418999 19419024 - 821
7 | chr19 19571102 19571127 + 120
5 | chr21 40099618 40099643 + 64
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
To use a different prioritization of genomic location columns, specify them in the first argument (by):
>>> c.sort_ranges(['peak', 'Chromosome', 'Strand'])
index | Chromosome Start End Strand peak
int64 | category int64 int64 category int64
------- --- ------------ --------- --------- ---------- -------
18 | chr1 194245558 194245583 + 29
12 | chr1 38457520 38457545 + 667
13 | chr1 80668132 80668157 - 388
14 | chr2 152562484 152562509 - 807
... | ... ... ... ... ...
4 | chr12 106679761 106679786 - 782
3 | chr14 19418999 19419024 - 821
7 | chr19 19571102 19571127 + 120
5 | chr21 40099618 40099643 + 64
PyRanges with 20 rows, 5 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.