Migration guide from v0 to v1

PyRanges v1 introduces major changes to both the data structure and the interface. This guide is intended to help you migrate your code from v0 to v1.


In v0, PyRanges objects were implemented as a dictionary of DataFrames, one for each chromosome, or one for each (chromosome, strand) pair if strand information was present and valid. As a result, intervals on different chromosomes had no inherent order, and operations on each chromosome were independent. In practice, PyRanges objects often had to be converted to a single DataFrame through the df or as_df attributes to access the full functionality of pandas, which was inefficient and cumbersome. Moreover, performance was poor on datasets comprising many “chromosomes”, such as transcript-based intervals.

In v1, PyRanges objects are implemented as a DataFrame subclass, which allows for more efficient operations and direct access to pandas methods. This required a major interface change, because some methods and attributes existed with the same name in both PyRanges and DataFrame, and the new implementation had to be consistent with DataFrame behavior. We took advantage of this necessary break to redesign the interface so that it is more consistent and easier to maintain in the future.
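A minimal, library-free sketch of this layout change follows. It does not use pyranges or pandas: plain lists of dicts stand in for DataFrames, and the helper `flatten_v0` and the example intervals are hypothetical. It only illustrates that, once Chromosome and Strand become ordinary columns of a single table, rows from different chromosomes can be ordered and processed together.

```python
from typing import Dict, List, Tuple

# Hypothetical row type: one interval as a column-name -> value mapping.
Row = Dict[str, object]

# v0-style layout (modeled): one table per (chromosome, strand) key.
# Chromosome and Strand live only in the keys, so rows stored under
# different keys share no common order.
v0_layout: Dict[Tuple[str, str], List[Row]] = {
    ("chr1", "+"): [{"Start": 10, "End": 20}, {"Start": 30, "End": 40}],
    ("chr2", "-"): [{"Start": 5, "End": 15}],
}


def flatten_v0(layout: Dict[Tuple[str, str], List[Row]]) -> List[Row]:
    """Turn a v0-style dict of per-key tables into one v1-style table.

    Chromosome and Strand become ordinary columns of every row, so the
    result is a single table that can be sorted or filtered as a whole.
    """
    rows: List[Row] = []
    for (chrom, strand), table in layout.items():
        for row in table:
            rows.append({"Chromosome": chrom, **row, "Strand": strand})
    return rows


v1_layout = flatten_v0(v0_layout)

# With one table, rows can be ordered across chromosomes, e.g. by Start.
by_start = sorted(v1_layout, key=lambda r: r["Start"])
print([(r["Chromosome"], r["Start"]) for r in by_start])
# [('chr2', 5), ('chr1', 10), ('chr1', 30)]
```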

Below, we provide a cheatsheet to help you migrate your code from v0 to v1. The most problematic changes concern the get/set item methods, because the v0 syntax is not compatible with DataFrames. See here for a discussion of the topic.

In the table below, pr refers to the pyranges1 module, and g to a PyRanges object. Most items are linked to the corresponding documentation page.

Migration cheatsheet

Module methods:

| v0 | v1 | caveats |
|---|---|---|
| pr.from_dict(d) | pr.PyRanges(d) | |
| pr.from_string | pr.from_string | string representation has changed |
| pr.genomicfeatures.genome_bounds(g, …) | g.genome_bounds | |
| pr.genomicfeatures.tile_genome | pr.tile_genome | |
| pr.get_fasta.get_sequence(g, …) | g.get_sequence | |
| pr.get_fasta(g, …) | g.get_sequence | |
| pr.get_sequence(g, …) | g.get_sequence | |
| pr.get_transcript_sequence(g, …) | g.get_transcript_sequence | |
| pr.itergrs | REMOVED | |
| pr.multioverlap.count_overlaps | pr.count_overlaps | |
| pr.statistics.fdr | pr.stats.fdr | |
| pr.statistics.fisher_exact | pr.stats.fisher_exact | |
| pr.statistics.mcc | pr.stats.mcc | |
| pr.statistics.rowbased_pearson | pr.stats.rowbased_pearson | |
| pr.statistics.rowbased_rankdata | pr.stats.rowbased_rankdata | |
| pr.statistics.rowbased_spearman | pr.stats.rowbased_spearman | |
| pr.statistics.simes | pr.stats.simes | |
| pr.to_bigwig | g.to_bigwig | |

PyRanges properties:

| v0 | v1 | caveats |
|---|---|---|
| g.as_df | g.copy() | |
| g.df | g.copy() | |
| g.dfs | REMOVED | |
| g.features | REMOVED | functions retained and moved: pr.tile_genome (same name) and pr.genome_bounds -> g.clip_ranges (now clips intervals by default instead of removing them) |
| g.stats | pr.stats | g.stats.func_name() becomes pr.stats.func_name(g) |
| g.stranded | g.strand_valid | |
| g.strands | g.Strand.drop_duplicates() | |

Get/set syntaxes:

| v0 | v1 | caveats |
|---|---|---|
| g.COLNAME | g['COLNAME'] | inherited from pandas DataFrame |
| g.COLNAME = … | g['COLNAME'] = … | inherited from pandas DataFrame |
| g[[COL1, COL2]] | g.get_with_loc_columns([COL1, COL2]) | the old syntax triggers pandas behavior and returns only the requested columns |
| g[chrom, strand, slice] | g.loci[chrom, strand, slice] | flexible syntax |
| for k in g: … | REMOVED | |
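The column-selection change can break old code silently, because g[[COL1, COL2]] still runs in v1 but, following pandas, returns only the requested columns. The stdlib-only sketch below models the difference; it does not call pyranges. The column-dict representation, the helper names `plain_select` and `select_with_loc_columns`, and the exact set of positional columns kept by get_with_loc_columns (assumed here to be Chromosome, Start, End and, when present, Strand) are assumptions for illustration.

```python
from typing import Dict, List

# Hypothetical column-oriented table: column name -> list of values.
Table = Dict[str, List[object]]

# Assumed positional ("loc") columns of a stranded v1 PyRanges.
LOC_COLUMNS = ["Chromosome", "Start", "End", "Strand"]

g: Table = {
    "Chromosome": ["chr1", "chr1"],
    "Start": [10, 30],
    "End": [20, 40],
    "Strand": ["+", "-"],
    "Name": ["a", "b"],
    "Score": [1.0, 2.5],
}


def plain_select(table: Table, cols: List[str]) -> Table:
    """Model of g[[COL1, COL2]] in v1: pandas-style, only the requested columns."""
    return {c: table[c] for c in cols}


def select_with_loc_columns(table: Table, cols: List[str]) -> Table:
    """Model of g.get_with_loc_columns(cols): positional columns come first."""
    keep = [c for c in LOC_COLUMNS if c in table]
    keep += [c for c in cols if c not in LOC_COLUMNS]
    return {c: table[c] for c in keep}


print(list(plain_select(g, ["Name"])))             # ['Name']
print(list(select_with_loc_columns(g, ["Name"])))  # ['Chromosome', 'Start', 'End', 'Strand', 'Name']
```

In the first case the result has lost its coordinates and is no longer a usable set of ranges; in the second it still is.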

PyRanges methods:

| v0 | v1 | caveats |
|---|---|---|
| g.set_intersect | g.set_intersect_overlaps | args changed |
| g.apply | g.apply_single | args changed; the old syntax triggers the pandas method |
| g.apply_chunks | REMOVED | |
| g.apply_pair | g.apply_pair | args changed |
| g.assign | g.assign | inherited from pandas DataFrame |
| g.calculate_frame | pr.orfs.calculate_frame | now returns a copy |
| g.cluster | g.cluster_overlaps | args changed; pass slack + 1 to reproduce the old behavior |
| g.count_overlaps | g.count_overlaps | args changed |
| g.coverage | g.count_overlaps(..., calculate_coverage=True) | args changed |
| g.drop_duplicate_positions | g.drop_duplicates(g.loc_columns + ['Start', 'End']) | |
| g.drop(cols) | g.drop(columns=cols) | inherited from pandas DataFrame |
| g.extend | g.extend_ranges | args changed |
| g.insert | g.insert | inherited from pandas DataFrame |
| g.intersect | g.intersect_overlaps | args changed |
| g.items | REMOVED | |
| g.join | g.join_overlaps | args changed; the old syntax triggers an unrelated pandas method |
| g.k_nearest | g.nearest_ranges | args changed; only k=1 is now supported |
| g.keys | REMOVED | |
| g.max_disjoint | g.max_disjoint_overlaps | args changed; pass slack + 1 to reproduce the old behavior |
| g.merge | g.merge_overlaps | args changed; pass slack + 1 to reproduce the old behavior |
| g.mp | REMOVED | |
| g.mpc | REMOVED | |
| g.msp | REMOVED | |
| g.mspc | REMOVED | |
| g.nearest | g.nearest_ranges | args changed |
| g.new_position | g.combine_interval_columns | args changed |
| g.overlap | g.overlap | args changed |
| g.pc | REMOVED | |
| g.print | REMOVED | |
| g.rp | REMOVED | |
| g.rpc | REMOVED | |
| g.sample | g.sample | inherited from pandas DataFrame |
| g.set_union | g.set_union_overlaps | args changed |
| g.sort | g.sort_ranges | args changed |
| g.sp | REMOVED | |
| g.spc | REMOVED | |
| g.spliced_subsequence | g.slice_ranges | args changed |
| g.split | g.split_overlaps | args changed |
| g.subsequence | g.slice_ranges | use count_introns=True |
| g.subset(fn) | g[fn] | |
| g.subtract | g.subtract_overlaps | args changed; the old syntax triggers an unrelated pandas method |
| g.tile | g.tile_ranges | args changed |
| g.to_csv | g.to_csv | inherited from pandas DataFrame |
| g.to_example | REMOVED | |
| g.unstrand | g.remove_strand | |
| g.values | REMOVED | |
| g.window | g.window_ranges | args changed |
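The “slack + 1” caveat for cluster_overlaps, merge_overlaps and max_disjoint_overlaps concerns bookended intervals, which touch but do not overlap. The stdlib-only sketch below is a model of that off-by-one, not the pyranges implementation. It assumes half-open coordinates on a single chromosome, and it assumes that v1 joins an interval to the current cluster only when its start is less than the cluster end plus slack, so that v1 slack=0 keeps bookended intervals apart while v0 slack=0 merged them. The function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge_intervals(intervals: List[Interval], slack: int) -> List[Interval]:
    """Merge intervals when the gap (start - current_end) is smaller than slack.

    Assumed v1 rule: slack=0 merges only true overlaps; slack=1 also
    merges bookended intervals such as [1, 5) and [5, 10).
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1] + slack:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def merge_v0_style(intervals: List[Interval], v0_slack: int) -> List[Interval]:
    """Reproduce a v0 slack value by adding 1, as the cheatsheet suggests."""
    return merge_intervals(intervals, slack=v0_slack + 1)


bookended = [(1, 5), (5, 10), (12, 15)]
print(merge_intervals(bookended, slack=0))  # [(1, 5), (5, 10), (12, 15)]
print(merge_v0_style(bookended, 0))         # [(1, 10), (12, 15)]
```

When porting a v0 call such as g.merge(slack=s), the same reasoning suggests g.merge_overlaps(slack=s + 1) on the v1 side; check the v1 documentation for the remaining argument changes.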