Migration guide from v0 to v1

Pyranges v1 introduced major changes in data structure and interface. This guide is intended to help you migrate your code from v0 to v1.

Useful links:

Documentation for v0

Github repository for v0

In v0, PyRanges objects were implemented as a dictionary of DataFrames, one for each chromosome, or one for each (chromosome, strand) pair if strand was present and valid. This implied that intervals on different chromosomes had no inherent order, and operations on each chromosome were independent. In practice, a PyRanges object often had to be converted to a single DataFrame to access the full functionality of pandas, using the df or as_df attributes, which was inefficient and cumbersome. Moreover, performance was poor on datasets comprising many “chromosomes”, such as transcriptome-based intervals.
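The layout described above can be sketched with plain pandas. This is only an illustration: the dictionary below stands in for the v0 internal structure, and the toy data and variable names are hypothetical, not taken from the pyranges code base.

```python
import pandas as pd

# Hypothetical toy data: one DataFrame per (chromosome, strand) key,
# mimicking the v0 dictionary-of-DataFrames layout.
v0_like = {
    ("chr1", "+"): pd.DataFrame(
        {"Chromosome": ["chr1"], "Start": [10], "End": [20], "Strand": ["+"]}
    ),
    ("chr2", "-"): pd.DataFrame(
        {"Chromosome": ["chr2", "chr2"], "Start": [5, 30], "End": [15, 40],
         "Strand": ["-", "-"]}
    ),
}

# Operations ran group by group, so each key was processed independently.
lengths_per_group = {
    key: int((df["End"] - df["Start"]).sum()) for key, df in v0_like.items()
}

# Using the full pandas API meant concatenating all groups first, which is
# the kind of work a conversion to a single DataFrame implies.
combined = pd.concat(list(v0_like.values()), ignore_index=True)
total_length = int((combined["End"] - combined["Start"]).sum())
```

With many small groups, the per-key loop and the concatenation step both add overhead, which is consistent with the performance issue mentioned above.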

In v1, PyRanges objects are implemented as a DataFrame subclass, which allows for more efficient operations and direct access to pandas methods. This required a major interface change, since some methods and attributes existed with the same name in both PyRanges and DataFrame, and the new implementation had to be consistent with DataFrame. Since a breaking change was unavoidable, we took the opportunity to redesign the interface to be more consistent and maintainable in the future.
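As a rough illustration of the subclass design, the sketch below defines a minimal DataFrame subclass. The class name RangeFrame and its lengths method are hypothetical and are not part of pyranges; the sketch only shows the general pandas subclassing mechanism and why names such as join collide.

```python
import pandas as pd


class RangeFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, used only to illustrate the design."""

    @property
    def _constructor(self):
        # Ask pandas to return RangeFrame (not plain DataFrame) from operations.
        return RangeFrame

    def lengths(self):
        # Example of a domain-specific method layered on top of pandas.
        return self["End"] - self["Start"]


rf = RangeFrame({"Chromosome": ["chr1", "chr1"], "Start": [1, 10], "End": [5, 30]})

# Ordinary pandas indexing works directly and keeps the subclass type.
subset = rf[rf["Start"] > 5]
print(type(subset).__name__)
print(subset.lengths().tolist())

# Names such as join and merge already exist on DataFrame, so a subclass
# that reused them for interval operations would shadow the pandas methods.
print(RangeFrame.join is pd.DataFrame.join)
```

This is why several v0 method names could not be kept as they were: in a subclass, the same name either refers to the pandas method or overrides it.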

Below, we provide a cheatsheet to help you migrate your code from v0 to v1. The most problematic aspects are the get/set item methods, since the v0 syntax is not compatible with DataFrames. See here for a discussion on the topic.
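The pandas get/set semantics that v1 objects follow can be seen on a plain DataFrame. The example below uses pandas alone as a stand-in for a v1 PyRanges object, which is an assumption made so that it runs without pyranges installed; the column names are illustrative.

```python
import pandas as pd

# Plain DataFrame standing in for a v1 PyRanges object (assumption).
g = pd.DataFrame(
    {
        "Chromosome": ["chr1", "chr2"],
        "Start": [1, 10],
        "End": [5, 30],
        "Name": ["a", "b"],
    }
)

names = g["Name"]          # bracket access to an existing column
g["Score"] = [0.5, 0.9]    # bracket assignment creates a new column

# A list of column names follows pandas semantics: only the requested
# columns are returned, so the coordinate columns are dropped.
only_requested = g[["Name", "Score"]]
print(list(only_requested.columns))
```

Code that relied on the v0 behavior of g[[COL1, COL2]] therefore needs the replacement listed in the cheatsheet if the coordinate columns must be kept.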

In the table below, pr refers to the pyranges1 module, and g to a PyRanges object. Most items are linked to the corresponding documentation page.

Migration cheatsheet
v0	v1	caveats
Module methods:
pr.from_dict(d)	`pr.PyRanges(d)`
pr.from_string	`pr.from_string`	string representation has changed
pr.genomicfeatures.genome_bounds(g, …)	`g.genome_bounds`
pr.genomicfeatures.tile_genome	`pr.tile_genome`
pr.get_fasta.get_sequence(g, …)	`g.get_sequence`
pr.get_fasta(g, …)	`g.get_sequence`
pr.get_sequence(g, …)	`g.get_sequence`
pr.get_transcript_sequence(g, …)	`g.get_transcript_sequence`
pr.itergrs	`REMOVED`
pr.multioverlap.count_overlaps	`pr.count_overlaps`
pr.statistics.fdr	`pr.stats.fdr`
pr.statistics.fisher_exact	`pr.stats.fisher_exact`
pr.statistics.mcc	`pr.stats.mcc`
pr.statistics.rowbased_pearson	`pr.stats.rowbased_pearson`
pr.statistics.rowbased_rankdata	`pr.stats.rowbased_rankdata`
pr.statistics.rowbased_spearman	`pr.stats.rowbased_spearman`
pr.statistics.simes	`pr.stats.simes`
pr.to_bigwig	`g.to_bigwig`
PyRanges properties:
g.as_df	`g.copy()`
g.df	`g.copy()`
g.dfs	`REMOVED`
g.features	`REMOVED`	functions retained and moved: pr.tile_genome (same name) and pr.genome_bounds –> g.clip_ranges (now it clips by default instead of removing)
g.stats	`pr.stats`	g.stats.func_name() now becomes pr.stats.func_name(g)
g.stranded	`g.strand_valid`
g.strands	`g.Strand.drop_duplicates()`
Get/set syntaxes:
g.COLNAME	`g['COLNAME']`	inherited from pandas DataFrame
g.COLNAME = …	`g['COLNAME'] = ...`	inherited from pandas DataFrame
g[ [COL1, COL2] ]	`g.get_with_loc_columns([COL1, COL2])`	old syntax triggers pandas behavior: will only return requested columns
g[chrom, strand, slice]	`g.loci[chrom, strand, slice]`	flexible syntax
for k in g: …	`REMOVED`
PyRanges methods:
g.set_intersect	`g.set_intersect_overlaps`	args changed
g.apply	`g.apply_single`	args changed; and old syntax triggers pandas method
g.apply_chunks	`REMOVED`
g.apply_pair	`g.apply_pair`	args changed
g.assign	`g.assign`	inherited from pandas DataFrame
g.calculate_frame	`pr.orfs.calculate_frame`	now a copy is returned
g.cluster	`g.cluster_overlaps`	args changed; and use slack +1 for old behavior
g.count_overlaps	`g.count_overlaps`	args changed
g.coverage	`g.count_overlaps(..., calculate_coverage=True)`	args changed
g.drop_duplicate_positions	`g.drop_duplicates(g.loc_columns + ['Start', 'End'])`
g.drop(cols)	`g.drop(columns=cols)`	inherited from pandas DataFrame
g.extend	`g.extend_ranges`	args changed
g.insert	`g.insert`	inherited from pandas DataFrame
g.intersect	`g.intersect_overlaps`	args changed
g.items	`REMOVED`
g.join	`g.join_overlaps`	args changed; and old syntax triggers unrelated pandas method
g.k_nearest	`g.nearest_ranges`	args changed; and only k=1 now supported
g.keys	`REMOVED`
g.max_disjoint	`g.max_disjoint_overlaps`	args changed; and use slack +1 for old behavior
g.merge	`g.merge_overlaps`	args changed; and use slack +1 for old behavior
g.mp	`REMOVED`
g.mpc	`REMOVED`
g.msp	`REMOVED`
g.mspc	`REMOVED`
g.nearest	`g.nearest_ranges`	args changed
g.new_position	`g.combine_interval_columns`	args changed
g.overlap	`g.overlap`	args changed
g.pc	`REMOVED`
g.print	`REMOVED`
g.rp	`REMOVED`
g.rpc	`REMOVED`
g.sample	`g.sample`	inherited from pandas DataFrame
g.set_union	`g.set_union_overlaps`	args changed
g.sort	`g.sort_ranges`	args changed
g.sp	`REMOVED`
g.spc	`REMOVED`
g.spliced_subsequence	`g.slice_ranges`	args changed
g.split	`g.split_overlaps`	args changed
g.subsequence	`g.slice_ranges`	with count_introns=True
g.subset(fn)	`g[fn]`
g.subtract	`g.subtract_overlaps`	args changed; and old syntax triggers unrelated pandas method
g.tile	`g.tile_ranges`	args changed
g.to_csv	`g.to_csv`	inherited from pandas DataFrame
g.to_example	`REMOVED`
g.unstrand	`g.remove_strand`
g.values	`REMOVED`
g.window	`g.window_ranges`	args changed
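
Several rows in the table carry the caveat "use slack +1 for old behavior". One plausible reading, which this guide does not spell out, is that v0 grouped intervals whose gap was at most slack, while v1 groups intervals whose gap is strictly less than slack, so that bookended intervals are no longer merged by default. The pure-Python sketch below illustrates that assumed difference; merge_intervals and its inclusive_gap flag are hypothetical helpers, not pyranges functions.

```python
def merge_intervals(intervals, slack=0, inclusive_gap=False):
    """Merge half-open (start, end) intervals on a single chromosome.

    inclusive_gap=True  merges when gap <= slack (assumed v0-like rule)
    inclusive_gap=False merges when gap <  slack (assumed v1-like rule)
    """
    merged = []
    for start, end in sorted(intervals):
        if merged:
            # Distance between the previous merged interval and this one;
            # negative when they overlap, zero when they are bookended.
            gap = start - merged[-1][1]
            close_enough = gap <= slack if inclusive_gap else gap < slack
            if close_enough:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


bookended = [(1, 5), (5, 10), (20, 30)]
old_style = merge_intervals(bookended, slack=0, inclusive_gap=True)
new_default = merge_intervals(bookended, slack=0)
new_plus_one = merge_intervals(bookended, slack=1)
```

Under these assumptions, old_style and new_plus_one give the same result, while new_default keeps the bookended intervals separate. Check the v1 documentation of each method for the exact definition of slack before relying on this.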