Writing to disk

The PyRanges can be written to several formats, namely csv, gtf, gff3 and bigwig. If no path-argument is given, the string representation of the data is returned. (It may potentially be very large.) If a path is given, it is taken as the path to the file to be written; in this case, the return value is the object itself, to allow inserting write methods into method call chains.

Writing genomic formats

Pyranges supports the most popular for genomic annotations, such as bed, gtf and gff3. You can readily write them using the correspondent methods (see to_bed, to_gtf, to_gff3).

>>> import pyranges1 as pr
>>> gr = pr.example_data.chipseq
>>> gr.to_gtf("chipseq.gtf")
>>> #file chipseq.gtf has been created

Methods to_gff3 and to_gtf have a default mapping of PyRanges columns to GFF/GTF fields. All extra (“metadata”) columns are put in the last field:

>>> gr['Label']='something'
>>> print(gr.head().to_gtf()) 
chr8  .       .       28510033        28510057        0       -       .       Name "U0"; Label "something";
chr7  .       .       107153364       107153388       0       -       .       Name "U0"; Label "something";
chr5  .       .       135821803       135821827       0       -       .       Name "U0"; Label "something";
chr14 .       .       19419000        19419024        0       -       .       Name "U0"; Label "something";
chr12 .       .       106679762       106679786       0       -       .       Name "U0"; Label "something";

Such mapping, as well as which attribute(s) are included as last field, can be altered. See the API for details.

The bigwig format differs substantially from the formats above. Bigwig is a binary format, and it is typically used for large continuous quantitative data along a genome sequence.

The pyranges1 library can also create bigwigs, but it needs the library pybigwig and pyrle which are not installed by default.

Use this to install it:

pip install pybigwig pyrle

The bigwig writer needs to know the chromosome sizes, e.g. provided as a dictionary {chromosome_name: size}. You can also derive chromosome sizes from a fasta file using pyfaidx (see above to install it).

>>> import pyfaidx
>>> chromsizes=pyfaidx.Fasta('your_genome.fa') 

Once you obtained chromosome sizes, you are ready to write your PyRanges object to a bigwig file:

>>> gr.to_bigwig("chipseq.bw", chromsizes) 
>>> # file chipseq.bw has been created

Bigwig is typically used to represent a coverage of some type. To compute it from an arbitrary value column, use the value_col argument. See the API for additional options. If you want to write one bigwig for each strand, you need to do it manually.

>>> gr.loci["+"].to_bigwig("chipseq_plus.bw", chromsizes) 
>>> gr.loci["-"].to_bigwig("chipseq_minus.bw", chromsizes) 

Writing tabular formats

The csv format is the most flexible format, as it allows for any column to be included, and any separator to be used. The method to_csv is directly inherited by pandas, so search for its API for details.

The to_csv method takes the arguments header and sep:

>>> print(gr.drop(['Label'], axis=1).head().to_csv(sep="\t", header=False, index=False)) 
chr8  28510032        28510057        U0      0       -
chr7  107153363       107153388       U0      0       -
chr5  135821802       135821827       U0      0       -
chr14 19418999        19419024        U0      0       -
chr12 106679761       106679786       U0      0       -

Remember that to_csv will not alter coordinates, so the output will have the same pythonic convention as PyRanges. Adjust accordingly if needed.