Column operations

Fetching or writing a column

Most column operations are analogous to pandas. A single PyRanges column (which is a pandas Series) can be read using dot notation:

>>> import pyranges1 as pr
>>> gr = pr.example_data.chipseq
>>> gr
index    |    Chromosome    Start      End        Name    Score    Strand
int64    |    category      int64      int64      str     int64    category
-------  ---  ------------  ---------  ---------  ------  -------  ----------
0        |    chr8          28510032   28510057   U0      0        -
1        |    chr7          107153363  107153388  U0      0        -
2        |    chr5          135821802  135821827  U0      0        -
3        |    chr14         19418999   19419024   U0      0        -
...      |    ...           ...        ...        ...     ...      ...
16       |    chr9          120803448  120803473  U0      0        +
17       |    chr6          89296757   89296782   U0      0        -
18       |    chr1          194245558  194245583  U0      0        +
19       |    chr8          57916061   57916086   U0      0        +
PyRanges with 20 rows, 6 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
>>> gr.Chromosome.head()
0     chr8
1     chr7
2     chr5
3    chr14
4    chr12
Name: Chromosome, dtype: category
Categories (15, str): ['chr1', 'chr10', 'chr11', 'chr12', ..., 'chr6', 'chr7', 'chr8', 'chr9']
>>> ( (gr.End + gr.Start)/2 ).head()
0     28510044.5
1    107153375.5
2    135821814.5
3     19419011.5
4    106679773.5
dtype: float64

The gr[column_name] syntax also extracts a column from a PyRanges object:

>>> gr['Chromosome'].head()
0     chr8
1     chr7
2     chr5
3    chr14
4    chr12
Name: Chromosome, dtype: category
Categories (15, str): ['chr1', 'chr10', 'chr11', 'chr12', ..., 'chr6', 'chr7', 'chr8', 'chr9']
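
Dot notation relies on Python attribute lookup, so it only reaches columns whose names are valid identifiers and do not collide with an existing attribute or method. The sketch below shows this in plain pandas on a small hand-made dataframe (the column names `count` and `gene name` are made up for illustration). Assuming PyRanges inherits this behaviour from pandas, the bracket syntax is the safer choice for such names.

```python
import pandas as pd

# Toy dataframe: "count" clashes with a DataFrame method and
# "gene name" is not a valid Python identifier (both names are hypothetical).
df = pd.DataFrame(
    {
        "Chromosome": ["chr8", "chr7"],
        "count": [3, 5],
        "gene name": ["a", "b"],
    }
)

# Dot notation returns the column when the name is a plain identifier.
chrom = df.Chromosome

# df.count resolves to the DataFrame.count method, not to the column.
dot_count_is_method = callable(df.count)

# Bracket notation returns the column itself in every case.
count_column = df["count"]
gene_column = df["gene name"]
```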

The gr[column_name] syntax is the only one accepted for assignment (i.e. to create or edit a column):

>>> gr['newchr'] = gr['Chromosome'].str.replace('chr', '')
>>> gr
index    |    Chromosome    Start      End        Name    Score    Strand      newchr
int64    |    category      int64      int64      str     int64    category    str
-------  ---  ------------  ---------  ---------  ------  -------  ----------  --------
0        |    chr8          28510032   28510057   U0      0        -           8
1        |    chr7          107153363  107153388  U0      0        -           7
2        |    chr5          135821802  135821827  U0      0        -           5
3        |    chr14         19418999   19419024   U0      0        -           14
...      |    ...           ...        ...        ...     ...      ...         ...
16       |    chr9          120803448  120803473  U0      0        +           9
17       |    chr6          89296757   89296782   U0      0        -           6
18       |    chr1          194245558  194245583  U0      0        +           1
19       |    chr8          57916061   57916086   U0      0        +           8
PyRanges with 20 rows, 7 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
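
For comparison, in plain pandas, assigning through dot notation to a name that is not yet a column sets an ordinary Python attribute instead of adding a column, and pandas emits a UserWarning. A minimal sketch on a toy dataframe follows (the `midpoint` name is made up). It illustrates pandas behaviour only, not PyRanges internals.

```python
import warnings

import pandas as pd

df = pd.DataFrame(
    {
        "Chromosome": ["chr8", "chr7"],
        "Start": [28510032, 107153363],
        "End": [28510057, 107153388],
    }
)

# Bracket assignment creates a real column.
df["newchr"] = df["Chromosome"].str.replace("chr", "")

# Attribute assignment of a new name only sets a Python attribute;
# pandas warns about it, so the warning is silenced here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    df.midpoint = (df["Start"] + df["End"]) / 2

column_names = list(df.columns)
```

Here `column_names` contains `newchr` but not `midpoint`.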

Extracting multiple columns

As in pandas, you can extract a dataframe containing a subset of columns by indexing with a list of column names:

>>> gr[ ['Chromosome', 'Start'] ].head()
  Chromosome      Start
0       chr8   28510032
1       chr7  107153363
2       chr5  135821802
3      chr14   19418999
4      chr12  106679761

When the selection includes all required genomic location columns (Chromosome, Start, End), a PyRanges object is returned:

>>> gr[ ['Chromosome', 'Start', 'End', 'Name'] ].head()
  index  |    Chromosome        Start        End  Name
  int64  |    category          int64      int64  str
-------  ---  ------------  ---------  ---------  ------
      0  |    chr8           28510032   28510057  U0
      1  |    chr7          107153363  107153388  U0
      2  |    chr5          135821802  135821827  U0
      3  |    chr14          19418999   19419024  U0
      4  |    chr12         106679761  106679786  U0
PyRanges with 5 rows, 4 columns, and 1 index columns.
Contains 5 chromosomes.
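
The rule can be summarized as a small check. `has_loc_columns` below is a hypothetical helper written for illustration, not a PyRanges function. It only tests whether a list of column names covers the three required location columns named above.

```python
# Required genomic location columns, as listed in the text above.
REQUIRED_LOC_COLUMNS = ("Chromosome", "Start", "End")


def has_loc_columns(columns):
    """Return True if every required location column is in `columns`."""
    return all(name in columns for name in REQUIRED_LOC_COLUMNS)


# The first selection lacks End, so a plain dataframe would be expected;
# the second covers all three, so a PyRanges would be expected.
subset_a = ["Chromosome", "Start"]
subset_b = ["Chromosome", "Start", "End", "Name"]
```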

The method get_with_loc_columns is a shortcut to extract one or more columns together with the genomic location columns (Chromosome, Start, End, and Strand when present):

>>> gr.get_with_loc_columns('Name').head()
  index  |    Chromosome        Start        End  Strand      Name
  int64  |    category          int64      int64  category    str
-------  ---  ------------  ---------  ---------  ----------  ------
      0  |    chr8           28510032   28510057  -           U0
      1  |    chr7          107153363  107153388  -           U0
      2  |    chr5          135821802  135821827  -           U0
      3  |    chr14          19418999   19419024  -           U0
      4  |    chr12         106679761  106679786  -           U0
PyRanges with 5 rows, 5 columns, and 1 index columns.
Contains 5 chromosomes and 1 strands.
>>> gr.get_with_loc_columns(['Name', 'Score']).head()
  index  |    Chromosome        Start        End  Strand      Name      Score
  int64  |    category          int64      int64  category    str       int64
-------  ---  ------------  ---------  ---------  ----------  ------  -------
      0  |    chr8           28510032   28510057  -           U0            0
      1  |    chr7          107153363  107153388  -           U0            0
      2  |    chr5          135821802  135821827  -           U0            0
      3  |    chr14          19418999   19419024  -           U0            0
      4  |    chr12         106679761  106679786  -           U0            0
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 5 chromosomes and 1 strands.
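
Judging from the output above, the result places the location columns first, followed by the requested ones. A roughly equivalent selection can be sketched in plain pandas. `with_loc_columns` is a hypothetical stand-in, not the library's implementation, and it operates on an ordinary dataframe.

```python
import pandas as pd

# Location columns in the order they appear in the output above.
LOC_COLUMNS = ["Chromosome", "Start", "End", "Strand"]


def with_loc_columns(df, columns):
    """Select the location columns present in `df`, followed by `columns`."""
    if isinstance(columns, str):
        columns = [columns]
    loc = [c for c in LOC_COLUMNS if c in df.columns]
    extra = [c for c in columns if c not in loc]
    return df[loc + extra]


df = pd.DataFrame(
    {
        "Chromosome": ["chr8", "chr7"],
        "Start": [28510032, 107153363],
        "End": [28510057, 107153388],
        "Name": ["U0", "U0"],
        "Score": [0, 0],
        "Strand": ["-", "-"],
    }
)

one_column = with_loc_columns(df, "Name")
two_columns = with_loc_columns(df, ["Name", "Score"])
```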

Dropping columns

Alternatively, you can specify which columns to remove with the pandas dataframe drop method. Again, a PyRanges object is returned only if the genomic location columns are retained:

>>> gr.drop('Name', axis=1)
index    |    Chromosome    Start      End        Score    Strand      newchr
int64    |    category      int64      int64      int64    category    str
-------  ---  ------------  ---------  ---------  -------  ----------  --------
0        |    chr8          28510032   28510057   0        -           8
1        |    chr7          107153363  107153388  0        -           7
2        |    chr5          135821802  135821827  0        -           5
3        |    chr14         19418999   19419024   0        -           14
...      |    ...           ...        ...        ...      ...         ...
16       |    chr9          120803448  120803473  0        +           9
17       |    chr6          89296757   89296782   0        -           6
18       |    chr1          194245558  194245583  0        +           1
19       |    chr8          57916061   57916086   0        +           8
PyRanges with 20 rows, 6 columns, and 1 index columns.
Contains 15 chromosomes and 2 strands.
>>> gr.drop(['Name', 'Chromosome', 'newchr'], axis=1).head()
       Start        End  Score Strand
0   28510032   28510057      0      -
1  107153363  107153388      0      -
2  135821802  135821827      0      -
3   19418999   19419024      0      -
4  106679761  106679786      0      -
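
Note that drop returns a new object and leaves the original untouched, which is why the second call above can still refer to the Name column. The plain-pandas sketch below, on a toy dataframe, shows this together with the `columns=` keyword, which pandas accepts as an equivalent spelling of `axis=1`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Chromosome": ["chr8", "chr7"],
        "Start": [28510032, 107153363],
        "End": [28510057, 107153388],
        "Name": ["U0", "U0"],
    }
)

# Two equivalent ways to drop a column; both return a new dataframe.
by_axis = df.drop("Name", axis=1)
by_keyword = df.drop(columns="Name")

# The original dataframe still has the Name column.
original_columns = list(df.columns)
```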

The PyRanges method remove_strand is a shortcut to remove the Strand column:

>>> gr.remove_strand().head()
  index  |    Chromosome        Start        End  Name      Score    newchr
  int64  |    category          int64      int64  str       int64       str
-------  ---  ------------  ---------  ---------  ------  -------  --------
      0  |    chr8           28510032   28510057  U0            0         8
      1  |    chr7          107153363  107153388  U0            0         7
      2  |    chr5          135821802  135821827  U0            0         5
      3  |    chr14          19418999   19419024  U0            0        14
      4  |    chr12         106679761  106679786  U0            0        12
PyRanges with 5 rows, 6 columns, and 1 index columns.
Contains 5 chromosomes.