pyranges.get_fasta

Module Contents

Functions

get_fasta(gr, path=None, pyfaidx_fasta=None)

Get the FASTA sequence of each interval in a PyRanges.

pyranges.get_fasta.get_fasta(gr, path=None, pyfaidx_fasta=None)

Get the FASTA sequence of each interval in a PyRanges.

Parameters
  • gr (PyRanges) – Intervals whose sequences should be extracted.

  • path (str) – Path to the FASTA file. If no index is found, the file is indexed with pyfaidx.

  • pyfaidx_fasta (pyfaidx.Fasta) – Alternative way to provide the FASTA target, as an already opened pyfaidx.Fasta object.

Returns

Sequences, one per interval.

Return type

pandas.Series
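When no index exists, pyfaidx writes one next to the FASTA file (typically with a .fai suffix) in the samtools-compatible .fai layout: one row per sequence with its name, length, byte offset of the first base, bases per line and bytes per line. The stdlib-only sketch below illustrates how such a record lets an interval be read by seeking instead of scanning the whole file. It is not pyfaidx's implementation. build_fai and fetch are hypothetical helpers. The sketch assumes uniform line lengths within each sequence and takes the sequence name to be the first whitespace-delimited word of the header.

```python
import io


def build_fai(lines):
    """Build .fai-style records from an iterable of FASTA lines (bytes).

    Each record is (name, length, offset, linebases, linewidth), where offset
    is the byte position of the first base. Assumes every sequence line of a
    record except the last has the same length (not validated here).
    """
    records, current, offset = [], None, 0
    for line in lines:
        if line.startswith(b">"):
            if current is not None:
                records.append(tuple(current))
            # Sequence name = first whitespace-delimited token of the header.
            name = line[1:].split()[0].decode()
            current = [name, 0, offset + len(line), None, None]
        elif current is not None and line.strip():
            bases = len(line.rstrip(b"\r\n"))
            if current[3] is None:  # the first sequence line fixes the layout
                current[3], current[4] = bases, len(line)
            current[1] += bases
        offset += len(line)
    if current is not None:
        records.append(tuple(current))
    return records


def fetch(handle, record, start, end):
    """Read bases [start, end) of one record by seeking, not scanning."""
    _name, length, offset, linebases, linewidth = record
    start, end = max(start, 0), min(end, length)
    if start >= end:
        return ""

    def byte_pos(pos):
        # Skip whole lines (bases plus newline bytes), then bases within the line.
        return offset + (pos // linebases) * linewidth + pos % linebases

    handle.seek(byte_pos(start))
    raw = handle.read(byte_pos(end) - byte_pos(start))
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


if __name__ == "__main__":
    data = b">chr1\nATTACCAT\n"
    index = build_fai(data.splitlines(keepends=True))
    print(index)  # [('chr1', 8, 6, 8, 9)]
    print(fetch(io.BytesIO(data), index[0], 5, 8))  # CAT
```

With a real file, a handle opened in binary mode ("rb") can be iterated for build_fai and then passed to fetch, since fetch seeks to absolute byte positions.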

Note

Sorting the PyRanges beforehand is likely to speed up extraction. Sequences of intervals on the negative strand are returned reverse complemented.
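The coordinate and strand handling described in this note can be illustrated in plain Python. This is a sketch, not the pyranges code path, which reads sequences through pyfaidx. genome is a hypothetical dict mapping chromosome names to sequences, and intervals are 0-based and half-open, as in PyRanges.

```python
# Complement table for DNA bases, upper and lower case; N maps to N.
# Other IUPAC ambiguity codes are left unchanged in this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(genome, chrom, start, end, strand="+"):
    """Sequence of a 0-based, half-open interval, reverse complemented on '-'."""
    seq = genome[chrom][start:end]
    return reverse_complement(seq) if strand == "-" else seq


genome = {"chr1": "ATTACCAT"}  # same toy sequence as in the Examples section
print(extract(genome, "chr1", 5, 8))       # CAT
print(extract(genome, "chr1", 5, 8, "-"))  # ATG
```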

Warning

The sequence names in the FASTA headers must match the Chromosome names in gr exactly.
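A frequent source of mismatches is a naming-convention difference, such as chr1 in gr versus 1 in the FASTA header. The hypothetical helpers below sketch a pre-check that lists chromosome names with no matching header. They take the name to be the first whitespace-delimited word after ">", which is assumed to match pyfaidx's default key.

```python
def fasta_names(lines):
    """Set of sequence names: first whitespace-delimited token after '>'."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def missing_chromosomes(chromosomes, lines):
    """Sorted chromosome names that have no matching FASTA header."""
    return sorted(set(chromosomes) - fasta_names(lines))


header_lines = [">1 dna:chromosome\n", "ACGT\n"]
print(missing_chromosomes(["chr1"], header_lines))  # ['chr1']
```

With real inputs, lines can be an open text file handle, and the chromosome names of a PyRanges could be taken from, for example, gr.chromosomes.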

Examples

>>> import pyranges as pr
>>> gr = pr.from_dict({"Chromosome": ["chr1", "chr1"],
...                    "Start": [5, 0], "End": [8, 5]})
>>> gr
+--------------+-----------+-----------+
| Chromosome   |     Start |       End |
| (category)   |   (int32) |   (int32) |
|--------------+-----------+-----------|
| chr1         |         5 |         8 |
| chr1         |         0 |         5 |
+--------------+-----------+-----------+
Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.
>>> with open("temp.fasta", "w") as tmp_handle:
...     _ = tmp_handle.write(">chr1\n")
...     _ = tmp_handle.write("ATTACCAT\n")
>>> seq = pr.get_fasta(gr, "temp.fasta")
>>> seq
0      CAT
1    ATTAC
dtype: object
>>> gr.seq = seq
>>> gr
+--------------+-----------+-----------+------------+
| Chromosome   |     Start |       End | seq        |
| (category)   |   (int32) |   (int32) | (object)   |
|--------------+-----------+-----------+------------|
| chr1         |         5 |         8 | CAT        |
| chr1         |         0 |         5 | ATTAC      |
+--------------+-----------+-----------+------------+
Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes.
For printing, the PyRanges was sorted on Chromosome.