Writing to disk
A PyRanges object can be written to several formats, namely csv, bed, gtf, gff3 and bigwig. If no path argument is given, the string representation of the data is returned (it may be very large). If a path is given, the data is written to that file and the method returns the object itself, so that write methods can be inserted into method call chains.
Writing genomic formats
Pyranges supports the most popular formats for genomic annotations, such as bed, gtf and gff3.
You can write them using the corresponding methods (see to_bed, to_gtf, to_gff3).
>>> import pyranges1 as pr
>>> gr = pr.example_data.chipseq
>>> gr.to_gtf("chipseq.gtf")
>>> # file chipseq.gtf has been created
Methods to_gff3 and to_gtf have a default mapping of PyRanges columns to GFF/GTF fields. All extra (“metadata”) columns are put in the last field:
>>> gr['Label'] = 'something'
>>> print(gr.head().to_gtf())
chr8 . . 28510033 28510057 0 - . Name "U0"; Label "something";
chr7 . . 107153364 107153388 0 - . Name "U0"; Label "something";
chr5 . . 135821803 135821827 0 - . Name "U0"; Label "something";
chr14 . . 19419000 19419024 0 - . Name "U0"; Label "something";
chr12 . . 106679762 106679786 0 - . Name "U0"; Label "something";
This mapping, as well as which attributes are included in the last field, can be altered. See the API for details.
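To make the default layout more concrete, here is a minimal pure-Python sketch of how one interval could be turned into a nine-field GTF line. It is an illustration only, not the pyranges1 implementation: the helper name format_gtf_line and the CORE_COLUMNS tuple are hypothetical, and the assumption is that the core columns fill the first eight fields while every other column goes into the attribute field. Note that the start is shifted by one, because GTF coordinates are 1-based.

```python
# Illustrative sketch only, NOT the pyranges1 implementation.
# CORE_COLUMNS and format_gtf_line are hypothetical names.
CORE_COLUMNS = ("Chromosome", "Source", "Feature", "Start", "End",
                "Score", "Strand", "Frame")

def format_gtf_line(row):
    """Build one tab-separated GTF line from a dict describing an interval."""
    # Every column that is not a core column is treated as metadata
    # and written as key "value"; in the last field.
    attributes = " ".join(
        f'{key} "{value}";' for key, value in row.items()
        if key not in CORE_COLUMNS
    )
    fields = [
        str(row["Chromosome"]),
        str(row.get("Source", ".")),   # "." marks a missing value
        str(row.get("Feature", ".")),
        str(row["Start"] + 1),         # 0-based start -> 1-based start
        str(row["End"]),               # exclusive end equals 1-based inclusive end
        str(row.get("Score", ".")),
        str(row.get("Strand", ".")),
        str(row.get("Frame", ".")),
        attributes,
    ]
    return "\t".join(fields)

row = {"Chromosome": "chr8", "Start": 28510032, "End": 28510057,
       "Name": "U0", "Score": 0, "Strand": "-", "Label": "something"}
line = format_gtf_line(row)
print(line)
```

The printed line corresponds to the first row of the to_gtf output shown above, with tabs between the nine fields.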
The bigwig format differs substantially from the formats above. Bigwig is a binary format, and it is typically used for large continuous quantitative data along a genome sequence.
The pyranges1 library can also write bigwig files, but this requires the libraries pybigwig and pyrle, which are not installed by default.
Install them with:
pip install pybigwig pyrle
The bigwig writer needs to know the chromosome sizes, e.g. provided as a dictionary {chromosome_name: size}. You can also derive chromosome sizes from a fasta file using pyfaidx (see above to install it).
>>> import pyfaidx
>>> chromsizes = pyfaidx.Fasta('your_genome.fa')
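If you prefer the plain dictionary form {chromosome_name: size}, it can also be built without pyfaidx. The following is a minimal sketch, assuming an uncompressed FASTA file in which the sequence name is the first word of each header line; the function name chromsizes_from_fasta is hypothetical, and the small temporary file only serves to make the example runnable.

```python
import os
import tempfile

def chromsizes_from_fasta(path):
    """Return {sequence_name: length} for an uncompressed FASTA file."""
    sizes = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # The sequence name is the header text up to the first whitespace.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                sizes[name] = 0
            elif name is not None:
                sizes[name] += len(line)
    return sizes

# Demo on a tiny temporary FASTA file (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as tmp:
    tmp.write(">chr1 demo sequence\nACGT\nAC\n>chr2\nGGG\n")
demo_sizes = chromsizes_from_fasta(tmp.name)
os.remove(tmp.name)
print(demo_sizes)
```

This reads the whole file line by line, so on large genomes an indexed reader such as pyfaidx is likely to be faster.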
Once you have obtained the chromosome sizes, you are ready to write your PyRanges object to a bigwig file:
>>> gr.to_bigwig("chipseq.bw", chromsizes)
>>> # file chipseq.bw has been created
Bigwig is typically used to represent some kind of coverage. To compute it from an arbitrary value column, use the value_col argument. See the API for additional options. If you want to write one bigwig for each strand, you need to do it manually:
>>> gr.loci["+"].to_bigwig("chipseq_plus.bw", chromsizes)
>>> gr.loci["-"].to_bigwig("chipseq_minus.bw", chromsizes)
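To illustrate what coverage means here, the sketch below computes runs of constant coverage from a list of intervals in pure Python. It is not how pyranges1 or pyrle compute it; the function name coverage_runs is hypothetical, and the assumption is that overlapping intervals contribute the sum of their values (a value of 1 per interval gives plain coverage, other values mimic a value column).

```python
from collections import defaultdict

def coverage_runs(intervals):
    """Return (start, end, total) runs of non-zero coverage.

    intervals: iterable of (start, end, value), 0-based with exclusive end.
    """
    # Record how much the coverage changes at each boundary position.
    deltas = defaultdict(int)
    for start, end, value in intervals:
        deltas[start] += value
        deltas[end] -= value

    runs = []
    total = 0
    previous = None
    for position in sorted(deltas):
        if deltas[position] == 0:
            continue  # coverage does not change here
        if previous is not None and total != 0:
            runs.append((previous, position, total))
        total += deltas[position]
        previous = position
    return runs

plain = coverage_runs([(0, 10, 1), (5, 15, 1)])
weighted = coverage_runs([(0, 10, 3), (5, 15, 2)])
print(plain)
print(weighted)
```

In the first call each interval counts once, so the overlapping stretch from 5 to 10 has coverage 2; in the second call the same stretch sums the two values.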
Writing tabular formats
The csv format is the most flexible format, as it allows for any column to be included, and any separator to be used.
The method to_csv is inherited directly from pandas, so refer to the pandas API for details.
The to_csv method takes the arguments header and sep:
>>> print(gr.drop(['Label'], axis=1).head().to_csv(sep="\t", header=False, index=False))
chr8 28510032 28510057 U0 0 -
chr7 107153363 107153388 U0 0 -
chr5 135821802 135821827 U0 0 -
chr14 19418999 19419024 U0 0 -
chr12 106679761 106679786 U0 0 -
Remember that to_csv will not alter coordinates, so the output will follow the same pythonic convention as PyRanges (0-based start, exclusive end). Adjust accordingly if needed.
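As one way to adjust, the sketch below shifts the start coordinate of headerless, tab-separated text (such as the to_csv output above) to a 1-based convention. The helper name to_one_based and its parameters are hypothetical, and it assumes the start is in the second column. In practice you may prefer to modify the Start column before calling to_csv; post-processing the text just keeps this example free of dependencies.

```python
def to_one_based(text, start_col=1, sep="\t"):
    """Add 1 to the start column of every row of headerless delimited text."""
    shifted = []
    for line in text.splitlines():
        if not line:
            continue
        fields = line.split(sep)
        # Only the start changes: a 0-based exclusive end is numerically
        # identical to a 1-based inclusive end.
        fields[start_col] = str(int(fields[start_col]) + 1)
        shifted.append(sep.join(fields))
    return ("\n".join(shifted) + "\n") if shifted else ""

zero_based = (
    "chr8\t28510032\t28510057\tU0\t0\t-\n"
    "chr7\t107153363\t107153388\tU0\t0\t-\n"
)
one_based = to_one_based(zero_based)
print(one_based, end="")
```

The shifted starts (28510033 and 107153364) match those in the to_gtf output shown earlier, since GTF uses 1-based coordinates.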