:py:mod:`pyranges` ================== .. py:module:: pyranges Submodules ---------- .. toctree:: :titlesonly: :maxdepth: 1 data/index.rst genomicfeatures/index.rst get_fasta/index.rst helpers/index.rst multioverlap/index.rst multithreaded/index.rst pyranges_main/index.rst readers/index.rst statistics/index.rst stats/index.rst Package Contents ---------------- Classes ~~~~~~~ .. autoapisummary:: pyranges.PyRanges Functions ~~~~~~~~~ .. autoapisummary:: pyranges.count_overlaps pyranges.read_bam pyranges.read_bed pyranges.read_gff3 pyranges.read_gtf pyranges.from_dict pyranges.from_string pyranges.itergrs pyranges.random pyranges.to_bigwig pyranges.version_info .. py:function:: count_overlaps(grs, features=None, strandedness=None, how=None, nb_cpu=1) Count overlaps in multiple PyRanges. :param grs: The PyRanges to use as queries. :type grs: dict of PyRanges :param features: The PyRanges to use as subject in the query. If None, the PyRanges themselves are used as a query. :type features: PyRanges, default None :param strandedness: Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param how: What intervals to report. By default reports all overlapping intervals. "containment" reports intervals where the overlapping interval is contained within it. :type how: {None, "all", "containment"}, default None, i.e. all :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 .. rubric:: Examples >>> a = '''Chromosome Start End ... chr1 6 12 ... chr1 10 20 ... chr1 22 27 ... chr1 24 30''' >>> b = '''Chromosome Start End ... chr1 12 32 ... chr1 14 30''' >>> c = '''Chromosome Start End ... chr1 8 15 ... chr1 10 14 ... 
chr1 32 34''' >>> grs = {n: pr.from_string(s) for n, s in zip(["a", "b", "c"], [a, b, c])} >>> for k, v in grs.items(): ... print("Name: " + k) ... print(v) Name: a +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 6 | 12 | | chr1 | 10 | 20 | | chr1 | 22 | 27 | | chr1 | 24 | 30 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. Name: b +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 12 | 32 | | chr1 | 14 | 30 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. Name: c +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 8 | 15 | | chr1 | 10 | 14 | | chr1 | 32 | 34 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> pr.count_overlaps(grs) +--------------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | a | b | c | | (object) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------| | chr1 | 6 | 8 | 1 | 0 | 0 | | chr1 | 8 | 10 | 1 | 0 | 1 | | chr1 | 10 | 12 | 2 | 0 | 2 | | chr1 | 12 | 14 | 1 | 1 | 2 | | ... | ... | ... | ... | ... | ... 
| | chr1 | 24 | 27 | 2 | 2 | 0 | | chr1 | 27 | 30 | 1 | 2 | 0 | | chr1 | 30 | 32 | 0 | 1 | 0 | | chr1 | 32 | 34 | 0 | 0 | 1 | +--------------+-----------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 12 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr = pr.PyRanges(chromosomes=["chr1"] * 4, starts=[0, 10, 20, 30], ends=[10, 20, 30, 40]) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 0 | 10 | | chr1 | 10 | 20 | | chr1 | 20 | 30 | | chr1 | 30 | 40 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> pr.count_overlaps(grs, gr) +--------------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | a | b | c | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------| | chr1 | 0 | 10 | 1 | 0 | 1 | | chr1 | 10 | 20 | 2 | 2 | 2 | | chr1 | 20 | 30 | 2 | 2 | 0 | | chr1 | 30 | 40 | 0 | 1 | 1 | +--------------+-----------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:class:: PyRanges(df=None, chromosomes=None, starts=None, ends=None, strands=None, int64=False, copy_df=True) Two-dimensional representation of genomic intervals and their annotations. A PyRanges object must have the columns Chromosome, Start and End. These describe the genomic position and function as implicit row labels. A Strand column is optional and adds strand information to the intervals. Any other columns are allowed and are considered metadata. Operations between PyRanges align intervals based on their position. 
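As a rough illustration of what aligning intervals by position means, the plain-Python sketch below counts, for each feature interval, how many intervals in each named set overlap it. It is not pyranges code: the helper names `overlaps` and `count_per_feature` are hypothetical, and it assumes half-open intervals (Start inclusive, End exclusive). The interval data are copied from the `pr.count_overlaps(grs, gr)` example above.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """Return True if half-open intervals [a_start, a_end) and [b_start, b_end) share a position."""
    return a_start < b_end and b_start < a_end


def count_per_feature(features, queries):
    """For every feature interval, count the overlapping intervals in each named query set.

    Intervals are (chromosome, start, end) tuples; only intervals on the
    same chromosome are compared. Assumption: half-open coordinates.
    """
    rows = []
    for chrom, f_start, f_end in features:
        counts = {
            name: sum(
                1
                for q_chrom, q_start, q_end in intervals
                if q_chrom == chrom and overlaps(f_start, f_end, q_start, q_end)
            )
            for name, intervals in queries.items()
        }
        rows.append((chrom, f_start, f_end, counts))
    return rows


# Same intervals as the strings a, b and c in the count_overlaps example.
queries = {
    "a": [("chr1", 6, 12), ("chr1", 10, 20), ("chr1", 22, 27), ("chr1", 24, 30)],
    "b": [("chr1", 12, 32), ("chr1", 14, 30)],
    "c": [("chr1", 8, 15), ("chr1", 10, 14), ("chr1", 32, 34)],
}
# Same four 10-bp bins as the features PyRanges in that example.
features = [("chr1", 0, 10), ("chr1", 10, 20), ("chr1", 20, 30), ("chr1", 30, 40)]

for chrom, start, end, counts in count_per_feature(features, queries):
    print(chrom, start, end, counts)
```

Under the half-open assumption, the printed counts agree with the a, b and c columns of the `pr.count_overlaps(grs, gr)` table above. The quadratic scan is only meant to show the rule, not an efficient way to compute it.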
If a PyRanges is built using the arguments chromosomes, starts, ends and optionally strands, all non-scalars must be of the same length. :param df: The data to be stored in the PyRanges. :type df: pandas.DataFrame or dict of pandas.DataFrame, default None :param chromosomes: The chromosome(s) in the PyRanges. :type chromosomes: array-like or scalar value, default None :param starts: The start positions in the PyRanges. :type starts: array-like, default None :param ends: The end positions in the PyRanges. :type ends: array-like, default None :param strands: The strands in the PyRanges. :type strands: array-like or scalar value, default None :param copy_df: Whether to copy the input pandas.DataFrame. :type copy_df: bool, default True .. seealso:: :obj:`pyranges.read_bed` read bed-file into PyRanges :obj:`pyranges.read_bam` read bam-file into PyRanges :obj:`pyranges.read_gff3` read gff3-file into PyRanges :obj:`pyranges.read_gtf` read gtf-file into PyRanges :obj:`pyranges.from_dict` create PyRanges from dict of columns :obj:`pyranges.from_string` create PyRanges from multiline string .. rubric:: Notes A PyRanges object is represented internally as a dictionary for efficiency. The keys are chromosomes or chromosome/strand tuples and the values are pandas DataFrames. .. rubric:: Examples >>> pr.PyRanges() Empty PyRanges >>> pr.PyRanges(chromosomes="chr1", starts=(1, 5), ends=[3, 149], ... strands=("+", "-")) +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 3 | + | | chr1 | 5 | 149 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> df = pd.DataFrame({"Chromosome": ["chr1", "chr2"], "Start": [100, 200], ... 
"End": [150, 201]}) >>> df Chromosome Start End 0 chr1 100 150 1 chr2 200 201 >>> pr.PyRanges(df) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 100 | 150 | | chr2 | 200 | 201 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr = pr.from_dict({"Chromosome": [1, 1], "Strand": ["+", "-"], "Start": [1, 4], "End": [2, 27], ... "TP": [0, 1], "FP": [12, 11], "TN": [10, 9], "FN": [2, 3]}) >>> gr +--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Strand | Start | End | TP | FP | TN | FN | | (category) | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------| | 1 | + | 1 | 2 | 0 | 12 | 10 | 2 | | 1 | - | 4 | 27 | 1 | 11 | 9 | 3 | +--------------+--------------+-----------+-----------+-----------+-----------+-----------+-----------+ Stranded PyRanges object has 2 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:property:: chromosomes Return chromosomes in natsorted order. .. py:property:: columns Return the column labels of the PyRanges. :rtype: pandas.Index .. seealso:: :obj:`PyRanges.chromosomes` return the chromosomes in the PyRanges .. 
rubric:: Examples >>> f2 = pr.data.f2() >>> f2 +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 1 | 2 | a | 0 | + | | chr1 | 6 | 7 | b | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f2.columns Index(['Chromosome', 'Start', 'End', 'Name', 'Score', 'Strand'], dtype='object') >>> f2.columns = f2.columns.str.replace("Sco|re", "NYAN", regex=True) >>> f2 +--------------+-----------+-----------+------------+------------+--------------+ | Chromosome | Start | End | Name | NYANNYAN | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+------------+--------------| | chr1 | 1 | 2 | a | 0 | + | | chr1 | 6 | 7 | b | 0 | - | +--------------+-----------+-----------+------------+------------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:property:: df Return PyRanges as DataFrame. .. seealso:: :obj:`PyRanges.as_df` return PyRanges as DataFrame. .. py:property:: dtypes Return the dtypes of the PyRanges. .. 
rubric:: Examples >>> gr = pr.data.chipseq() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | | chr1 | 144227079 | 144227104 | U0 | 0 | + | | ... | ... | ... | ... | ... | ... | | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.dtypes Chromosome category Start int64 End int64 Name object Score int64 Strand category dtype: object .. py:property:: empty Indicate whether PyRanges is empty. .. py:property:: length Return the total length of the intervals. .. seealso:: :obj:`PyRanges.lengths` return the intervals lengths .. rubric:: Examples >>> gr = pr.data.f1() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> gr.length 6 To find the length of the genome covered by the intervals, use merge first: >>> gr.merge(strand=False).length 5 .. py:property:: stranded Whether PyRanges has (valid) strand info. .. note:: A PyRanges can have invalid values in the Strand-column. It is not considered stranded. .. seealso:: :obj:`PyRanges.strands` return the strands .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [5, 8], 'Strand': ['+', '.']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 5 | + | | chr1 | 6 | 8 | . | +--------------+-----------+-----------+--------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. Considered unstranded due to these Strand values: '.' >>> gr.stranded False >>> "Strand" in gr.columns True .. py:property:: strands Return strands. .. rubric:: Notes If the strand-column contains an invalid value, [] is returned. .. seealso:: :obj:`PyRanges.stranded` whether has valid strand info .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [5, 8], 'Strand': ['+', '.']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 5 | + | | chr1 | 6 | 8 | . | +--------------+-----------+-----------+--------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. Considered unstranded due to these Strand values: '.' >>> gr.strands [] >>> gr.Strand.drop_duplicates().to_list() ['+', '.'] >>> gr.Strand = ["+", "-"] >>> gr.strands ['+', '-'] .. 
py:attribute:: dfs Dict mapping chromosomes or chromosome/strand pairs to pandas DataFrames. .. py:attribute:: features Namespace for genomic-features methods. .. seealso:: :obj:`pyranges.genomicfeatures` namespace for feature-functionality :obj:`pyranges.genomicfeatures.GenomicFeaturesMethods` namespace for feature-functionality .. py:attribute:: stats Namespace for statistical methods. .. seealso:: :obj:`pyranges.statistics` namespace for statistics :obj:`pyranges.stats.StatisticsMethods` namespace for statistics .. py:method:: __array_ufunc__(*args, **kwargs) Apply unary numpy-function. Apply the function to all columns that are not index columns, i.e. every column except Chromosome, Start, End and Strand. .. rubric:: Notes Function must produce a vector of equal length. .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 2, 3], "Start": [1, 2, 3], ... "End": [2, 3, 4], "Score": [9, 16, 25], "Score2": [121, 144, 169], ... "Name": ["n1", "n2", "n3"]}) >>> gr +--------------+-----------+-----------+-----------+-----------+------------+ | Chromosome | Start | End | Score | Score2 | Name | | (category) | (int64) | (int64) | (int64) | (int64) | (object) | |--------------+-----------+-----------+-----------+-----------+------------| | 1 | 1 | 2 | 9 | 121 | n1 | | 2 | 2 | 3 | 16 | 144 | n2 | | 3 | 3 | 4 | 25 | 169 | n3 | +--------------+-----------+-----------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 6 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> np.sqrt(gr) +--------------+-----------+-----------+-------------+-------------+------------+ | Chromosome | Start | End | Score | Score2 | Name | | (category) | (int64) | (int64) | (float64) | (float64) | (object) | |--------------+-----------+-----------+-------------+-------------+------------| | 1 | 1 | 2 | 3 | 11 | n1 | | 2 | 2 | 3 | 4 | 12 | n2 | | 3 | 3 | 4 | 5 | 13 | n3 | +--------------+-----------+-----------+-------------+-------------+------------+ Unstranded PyRanges object has 3 rows and 6 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: __getattr__(name) Return column. :param name: Column to return :type name: str :rtype: pandas.Series .. rubric:: Example >>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250], "End": [10, 125, 251]}) >>> gr.Start 0 0 1 100 2 250 Name: Start, dtype: int64 .. py:method:: __setattr__(column_name, column) Insert or update column. :param column_name: Name of column to update or insert. :type column_name: str :param column: Data to insert. :type column: list, np.array or pd.Series .. rubric:: Example >>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250], "End": [10, 125, 251]}) >>> gr.Start = np.array([1, 1, 2], dtype=np.int64) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 1 | 10 | | 1 | 1 | 125 | | 1 | 2 | 251 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: __getitem__(val) Fetch columns or subset on position. If a list is provided, the columns in the list are returned. This subsets on columns. If a numpy array is provided, it must be of type bool and the same length as the PyRanges. Otherwise, a subset of the rows is returned with the location info provided. :param val: Data to fetch. 
:type val: bool array/Series, tuple, list, str or slice .. rubric:: Examples >>> gr = pr.data.ensembl_gtf() >>> list(gr.columns) ['Chromosome', 'Source', 'Feature', 'Start', 'End', 'Score', 'Strand', 'Frame', 'gene_biotype', 'gene_id', 'gene_name', 'gene_source', 'gene_version', 'tag', 'transcript_biotype', 'transcript_id', 'transcript_name', 'transcript_source', 'transcript_support_level', 'transcript_version', 'exon_id', 'exon_number', 'exon_version', '(assigned', 'previous', 'protein_id', 'protein_version', 'ccds_id'] >>> gr = gr[["Source", "Feature", "gene_id"]] >>> gr +--------------+------------+--------------+-----------+-----------+--------------+-----------------+ | Chromosome | Source | Feature | Start | End | Strand | gene_id | | (category) | (object) | (category) | (int64) | (int64) | (category) | (object) | |--------------+------------+--------------+-----------+-----------+--------------+-----------------| | 1 | havana | gene | 11868 | 14409 | + | ENSG00000223972 | | 1 | havana | transcript | 11868 | 14409 | + | ENSG00000223972 | | 1 | havana | exon | 11868 | 12227 | + | ENSG00000223972 | | 1 | havana | exon | 12612 | 12721 | + | ENSG00000223972 | | ... | ... | ... | ... | ... | ... | ... | | 1 | havana | gene | 1173055 | 1179555 | - | ENSG00000205231 | | 1 | havana | transcript | 1173055 | 1179555 | - | ENSG00000205231 | | 1 | havana | exon | 1179364 | 1179555 | - | ENSG00000205231 | | 1 | havana | exon | 1173055 | 1176396 | - | ENSG00000205231 | +--------------+------------+--------------+-----------+-----------+--------------+-----------------+ Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
Create boolean Series and use it to subset: >>> s = (gr.Feature == "gene") | (gr.gene_id == "ENSG00000223972") >>> gr[s] +--------------+----------------+--------------+-----------+-----------+--------------+-----------------+ | Chromosome | Source | Feature | Start | End | Strand | gene_id | | (category) | (object) | (category) | (int64) | (int64) | (category) | (object) | |--------------+----------------+--------------+-----------+-----------+--------------+-----------------| | 1 | havana | gene | 11868 | 14409 | + | ENSG00000223972 | | 1 | havana | transcript | 11868 | 14409 | + | ENSG00000223972 | | 1 | havana | exon | 11868 | 12227 | + | ENSG00000223972 | | 1 | havana | exon | 12612 | 12721 | + | ENSG00000223972 | | ... | ... | ... | ... | ... | ... | ... | | 1 | havana | gene | 1062207 | 1063288 | - | ENSG00000273443 | | 1 | ensembl_havana | gene | 1070966 | 1074306 | - | ENSG00000237330 | | 1 | ensembl_havana | gene | 1081817 | 1116361 | - | ENSG00000131591 | | 1 | havana | gene | 1173055 | 1179555 | - | ENSG00000205231 | +--------------+----------------+--------------+-----------+-----------+--------------+-----------------+ Stranded PyRanges object has 95 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> cs = pr.data.chipseq() >>> cs[10000:100000] +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr2 | 33241 | 33266 | U0 | 0 | + | | chr2 | 13611 | 13636 | U0 | 0 | - | | chr2 | 32620 | 32645 | U0 | 0 | - | | chr3 | 87179 | 87204 | U0 | 0 | + | | chr4 | 45413 | 45438 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 5 rows and 6 columns from 3 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. >>> cs["chr1", "-"] +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 100079649 | 100079674 | U0 | 0 | - | | chr1 | 223587418 | 223587443 | U0 | 0 | - | | chr1 | 202450161 | 202450186 | U0 | 0 | - | | chr1 | 156338310 | 156338335 | U0 | 0 | - | | ... | ... | ... | ... | ... | ... | | chr1 | 203557775 | 203557800 | U0 | 0 | - | | chr1 | 28114107 | 28114132 | U0 | 0 | - | | chr1 | 21622765 | 21622790 | U0 | 0 | - | | chr1 | 80668132 | 80668157 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 437 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> cs["chr5", "-", 90000:] +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr5 | 399682 | 399707 | U0 | 0 | - | | chr5 | 1847502 | 1847527 | U0 | 0 | - | | chr5 | 5247533 | 5247558 | U0 | 0 | - | | chr5 | 5300394 | 5300419 | U0 | 0 | - | | ... | ... | ... | ... | ... | ... | | chr5 | 178786234 | 178786259 | U0 | 0 | - | | chr5 | 179268931 | 179268956 | U0 | 0 | - | | chr5 | 179289594 | 179289619 | U0 | 0 | - | | chr5 | 180513795 | 180513820 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 285 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> cs["chrM"] Empty PyRanges .. py:method:: __iter__() Iterate over the keys and values. .. 
seealso:: :obj:`pyranges.iter` iterate over multiple PyRanges .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [0, 100, 250], ... "End": [10, 125, 251], "Strand": ["+", "+", "-"]}) >>> for k, v in gr: ... print(k) ... print(v) ('1', '+') Chromosome Start End Strand 0 1 0 10 + 1 1 100 125 + ('1', '-') Chromosome Start End Strand 2 1 250 251 - .. py:method:: __len__() Return the number of intervals in the PyRanges. .. py:method:: __str__() Return string representation. .. py:method:: __repr__() Return REPL representation. .. py:method:: _repr_html_() Return REPL HTML representation for Jupyter Notebooks. .. py:method:: apply(f, strand=None, as_pyranges=True, nb_cpu=1, **kwargs) Apply a function to the PyRanges. :param f: Function to apply on each DataFrame in a PyRanges :type f: function :param strand: Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :param as_pyranges: Whether to return as a PyRanges or dict. If `f` does not return a DataFrame valid for PyRanges, `as_pyranges` must be False. :type as_pyranges: bool, default True :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: Result of applying f to each DataFrame in the PyRanges :rtype: PyRanges or dict .. seealso:: :obj:`pyranges.PyRanges.apply_pair` apply a function to a pair of PyRanges :obj:`pyranges.PyRanges.apply_chunks` apply a row-based function to a PyRanges in parallel .. note:: This is the function used internally to carry out almost all unary PyRanges methods. .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1, 2, 2], "Strand": ["+", "+", "-", "+"], ... 
"Start": [1, 4, 2, 9], "End": [2, 27, 13, 10]}) >>> gr +--------------+--------------+-----------+-----------+ | Chromosome | Strand | Start | End | | (category) | (category) | (int64) | (int64) | |--------------+--------------+-----------+-----------| | 1 | + | 1 | 2 | | 1 | + | 4 | 27 | | 2 | + | 9 | 10 | | 2 | - | 2 | 13 | +--------------+--------------+-----------+-----------+ Stranded PyRanges object has 4 rows and 4 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.apply(lambda df: len(df), as_pyranges=False) {('1', '+'): 2, ('2', '+'): 1, ('2', '-'): 1} >>> gr.apply(lambda df: len(df), as_pyranges=False, strand=False) {'1': 2, '2': 2} >>> def add_to_ends(df, **kwargs): ... df.loc[:, "End"] = kwargs["slack"] + df.End ... return df >>> gr.apply(add_to_ends, slack=500) +--------------+--------------+-----------+-----------+ | Chromosome | Strand | Start | End | | (category) | (category) | (int64) | (int64) | |--------------+--------------+-----------+-----------| | 1 | + | 1 | 502 | | 1 | + | 4 | 527 | | 2 | + | 9 | 510 | | 2 | - | 2 | 513 | +--------------+--------------+-----------+-----------+ Stranded PyRanges object has 4 rows and 4 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: apply_chunks(f, as_pyranges=False, nb_cpu=1, **kwargs) Apply a row-based function to arbitrary partitions of the PyRanges. apply_chunks speeds up the application of functions where the result is not affected by applying the function to ordered, non-overlapping splits of the data. :param f: Row-based or associative function to apply on the partitions. :type f: function :param as_pyranges: Whether to return as a PyRanges or dict. :type as_pyranges: bool, default False :param nb_cpu: How many cpus to use. The data is split into nb_cpu partitions. 
:type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: Result of applying f to each partition of the DataFrames in the PyRanges. :rtype: dict of lists .. seealso:: :obj:`pyranges.PyRanges.apply_pair` apply a function to a pair of PyRanges :obj:`pyranges.PyRanges.apply_chunks` apply a row-based function to a PyRanges in parallel .. note:: apply_chunks will only lead to speedups on large datasets or slow-running functions. Using it with nb_cpu=1 is pointless; use apply instead. .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [2, 3, 5], "End": [9, 4, 6]}) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 2 | 9 | | 1 | 3 | 4 | | 1 | 5 | 6 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.apply_chunks( ... lambda df, **kwargs: list(df.End + kwargs["add"]), nb_cpu=1, add=1000) {'1': [[1009, 1004, 1006]]} .. py:method:: apply_pair(other, f, strandedness=None, as_pyranges=True, **kwargs) Apply a function to a pair of PyRanges. The function is applied to each chromosome or chromosome/strand pair found in at least one of the PyRanges. :param f: Row-based or associative function to apply on the DataFrames. :type f: function :param strandedness: Whether to compare PyRanges on the same strand, the opposite strand, or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param as_pyranges: Whether to return as a PyRanges or dict. If `f` does not return a DataFrame valid for PyRanges, `as_pyranges` must be False. :type as_pyranges: bool, default True :param nb_cpu: How many cpus to use. 
Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: Result of applying f to each partition of the DataFrames in the PyRanges. :rtype: dict of lists .. seealso:: :obj:`pyranges.PyRanges.apply_pair` apply a function to a pair of PyRanges :obj:`pyranges.PyRanges.apply_chunks` apply a row-based function to a PyRanges in parallel :obj:`pyranges.iter` iterate over two or more PyRanges .. note:: This is the function used internally to carry out almost all comparison functions in PyRanges. .. rubric:: Examples >>> gr = pr.data.chipseq() >>> gr2 = pr.data.chipseq_background() >>> gr.apply_pair(gr2, pr.methods.intersection._intersection) # same as gr.intersect(gr2) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 226987603 | 226987617 | U0 | 0 | + | | chr8 | 38747236 | 38747251 | U0 | 0 | - | | chr15 | 26105515 | 26105518 | U0 | 0 | + | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
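The key-matching rule stated in the description (the function is applied to every chromosome or chromosome/strand key found in at least one of the two PyRanges) can be pictured with plain dicts. The sketch below is not the pyranges implementation: `apply_pair_sketch` is a hypothetical helper, per-key tables are plain lists of (start, end) tuples rather than DataFrames, and passing an empty list for a key missing on one side is an assumption made for illustration.

```python
def apply_pair_sketch(left, right, f):
    """Call f on the per-key tables of two dicts, over the union of their keys."""
    result = {}
    for key in sorted(set(left) | set(right)):
        # Assumption: a key absent from one side is represented by an empty table.
        result[key] = f(left.get(key, []), right.get(key, []))
    return result


# Hypothetical toy data: keys are (chromosome, strand), values are interval lists.
left = {("chr1", "+"): [(3, 6), (8, 9)], ("chr1", "-"): [(5, 7)]}
right = {("chr1", "+"): [(1, 2)], ("chr2", "+"): [(4, 5)]}

# Count the rows seen on each side for every key present in either dict.
sizes = apply_pair_sketch(left, right, lambda a, b: (len(a), len(b)))
print(sizes)
```

Here `("chr1", "-")` exists only on the left and `("chr2", "+")` only on the right, yet both appear in the result, which is the behaviour the description above attributes to `apply_pair`.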
>>> f1 = pr.data.f1() >>> f1 +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f2 = pr.data.f2() >>> f2 +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 1 | 2 | a | 0 | + | | chr1 | 6 | 7 | b | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f1.apply_pair(f2, lambda df, df2: (len(df), len(df2)), as_pyranges=False) {('chr1', '+'): (2, 2), ('chr1', '-'): (1, 2)} .. py:method:: as_df() Return PyRanges as DataFrame. :returns: A DataFrame natural sorted on Chromosome and Strand. The ordering of rows within chromosomes and strands is preserved. :rtype: DataFrame .. seealso:: :obj:`PyRanges.df` Return PyRanges as DataFrame. .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1, 2, 2], "Start": [1, 2, 3, 9], ... 
"End": [3, 3, 10, 12], "Gene": ["A", "B", "C", "D"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Gene | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | 1 | 1 | 3 | A | | 1 | 2 | 3 | B | | 2 | 3 | 10 | C | | 2 | 9 | 12 | D | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 4 rows and 4 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.as_df() Chromosome Start End Gene 0 1 1 3 A 1 1 2 3 B 2 2 3 10 C 3 2 9 12 D .. py:method:: assign(col, f, strand=None, nb_cpu=1, **kwargs) Add or replace a column. Does not change the original PyRanges. :param col: Name of column. :type col: str :param f: Function to create new column. :type f: function :param strand: Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: A copy of the PyRanges with the column inserted. :rtype: PyRanges .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 2], "End": [3, 5], ... "Name": ["a", "b"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | 1 | 1 | 3 | a | | 1 | 2 | 5 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> gr.assign("Blabla", lambda df: df.Chromosome.astype(str) + "_yadayada") +--------------+-----------+-----------+------------+------------+ | Chromosome | Start | End | Name | Blabla | | (category) | (int64) | (int64) | (object) | (object) | |--------------+-----------+-----------+------------+------------| | 1 | 1 | 3 | a | 1_yadayada | | 1 | 2 | 5 | b | 1_yadayada | +--------------+-----------+-----------+------------+------------+ Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. Note that assigning to an existing name replaces the column: >>> gr.assign("Name", ... lambda df, **kwargs: df.Start.astype(str) + kwargs["sep"] + ... df.Name.str.capitalize(), sep="_") +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | 1 | 1 | 3 | 1_A | | 1 | 2 | 5 | 2_B | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: boundaries(group_by, agg=None) Return the boundaries of groups of intervals (e.g. transcripts) :param group_by: Name(s) of column(s) to group intervals :type group_by: str or list of str :param agg: Defines how to aggregate metadata columns. Provided as dictionary of column names -> functions, function names or list of such, as accepted by the Pandas.DataFrame.agg method. :type agg: dict or None :returns: One interval per group, with the min(Start) and max(End) of the group :rtype: PyRanges .. 
rubric:: Examples >>> d = {"Chromosome": [1, 1, 1], "Start": [1, 60, 110], "End": [40, 68, 130], "transcript_id": ["tr1", "tr1", "tr2"], "meta": ["a", "b", "c"]} >>> gr = pr.from_dict(d) >>> gr.length=gr.lengths() >>> gr +--------------+-----------+-----------+-----------------+------------+-----------+ | Chromosome | Start | End | transcript_id | meta | length | | (category) | (int64) | (int64) | (object) | (object) | (int64) | |--------------+-----------+-----------+-----------------+------------+-----------| | 1 | 1 | 40 | tr1 | a | 39 | | 1 | 60 | 68 | tr1 | b | 8 | | 1 | 110 | 130 | tr2 | c | 20 | +--------------+-----------+-----------+-----------------+------------+-----------+ Unstranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.boundaries("transcript_id") +--------------+-----------+-----------+-----------------+ | Chromosome | Start | End | transcript_id | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+-----------------| | 1 | 1 | 68 | tr1 | | 1 | 110 | 130 | tr2 | +--------------+-----------+-----------+-----------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.boundaries("transcript_id", agg={"length":"sum", "meta": ",".join}) +--------------+-----------+-----------+-----------------+------------+-----------+ | Chromosome | Start | End | transcript_id | meta | length | | (category) | (int64) | (int64) | (object) | (object) | (int64) | |--------------+-----------+-----------+-----------------+------------+-----------| | 1 | 1 | 68 | tr1 | a,b | 47 | | 1 | 110 | 130 | tr2 | c | 20 | +--------------+-----------+-----------+-----------------+------------+-----------+ Unstranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. 
py:method:: calculate_frame(by) Calculate the frame of each genomic interval, assuming all are coding sequences (CDS), and add it as a column in place. After this, the input PyRanges will contain an added "Frame" column, which determines the base of the CDS that is the first base of a codon. Resulting values range from 0 to 2 inclusive. 0 indicates that the first base of the CDS is the first base of a codon, 1 indicates the second base and 2 indicates the third base of the CDS. While the 5'-most interval of each transcript always has frame 0, the following ones may have any of these values. :param by: Column(s) used to group the intervals: coding exons belonging to the same transcript have the same values in this/these column(s). :type by: str or list of str :returns: The "Frame" column is added in place. :rtype: None .. rubric:: Examples >>> p = pr.from_dict({"Chromosome": [1,1,1,2,2], ... "Strand": ["+","+","+","-","-"], ... "Start": [1,31,52,101,201], ... "End": [10,45,90,130,218], ... "transcript_id": ["t1","t1","t1","t2","t2"] }) >>> p +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 10 | t1 | | 1 | + | 31 | 45 | t1 | | 1 | + | 52 | 90 | t1 | | 2 | - | 101 | 130 | t2 | | 2 | - | 201 | 218 | t2 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
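The bookkeeping behind a per-transcript frame can be sketched in plain Python. This is only an illustration under stated assumptions, not the pyranges implementation: the helper ``upstream_coding_length`` and the list-of-dicts layout are hypothetical. For each interval it sums the lengths of the intervals lying 5' of it in the same group, taking the strand into account.

```python
from collections import defaultdict


def upstream_coding_length(rows, by="transcript_id"):
    """Sum the lengths of the intervals lying 5' of each interval
    within its group (hypothetical helper, not part of pyranges)."""
    groups = defaultdict(list)
    for idx, row in enumerate(rows):
        groups[row[by]].append(idx)

    upstream = [0] * len(rows)
    for indices in groups.values():
        # On the reverse strand the 5'-most interval has the highest Start.
        reverse = rows[indices[0]]["Strand"] == "-"
        ordered = sorted(indices, key=lambda i: rows[i]["Start"], reverse=reverse)
        total = 0
        for i in ordered:
            upstream[i] = total
            total += rows[i]["End"] - rows[i]["Start"]
    return upstream


# Same intervals as the PyRanges ``p`` in this example.
rows = [
    {"Strand": "+", "Start": 1, "End": 10, "transcript_id": "t1"},
    {"Strand": "+", "Start": 31, "End": 45, "transcript_id": "t1"},
    {"Strand": "+", "Start": 52, "End": 90, "transcript_id": "t1"},
    {"Strand": "-", "Start": 101, "End": 130, "transcript_id": "t2"},
    {"Strand": "-", "Start": 201, "End": 218, "transcript_id": "t2"},
]
upstream = upstream_coding_length(rows)
print(upstream)                   # [0, 9, 23, 17, 0]
print([u % 3 for u in upstream])  # [0, 0, 2, 2, 0]
```

The first list equals the values shown in the "Frame" column of this example's output. Reducing each value modulo 3 gives a number between 0 and 2, the range the description refers to; which codon-position convention that corresponds to is not verified here.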
>>> p.calculate_frame(by=['transcript_id']) >>> p +--------------+--------------+-----------+-----------+-----------------+-----------+ | Chromosome | Strand | Start | End | transcript_id | Frame | | (category) | (category) | (int64) | (int64) | (object) | (int64) | |--------------+--------------+-----------+-----------+-----------------+-----------| | 1 | + | 1 | 10 | t1 | 0 | | 1 | + | 31 | 45 | t1 | 9 | | 1 | + | 52 | 90 | t1 | 23 | | 2 | - | 101 | 130 | t2 | 17 | | 2 | - | 201 | 218 | t2 | 0 | +--------------+--------------+-----------+-----------+-----------------+-----------+ Stranded PyRanges object has 5 rows and 6 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: cluster(strand=None, by=None, slack=0, count=False, nb_cpu=1) Give overlapping intervals a common id. :param strand: Whether to ignore strand information if PyRanges is stranded. :type strand: bool, default None, i.e. auto :param by: Only intervals with an equal value in column(s) `by` are clustered. :type by: str or list, default None :param slack: Consider intervals separated by less than `slack` to be in the same cluster. If `slack` is negative, intervals overlapping less than `slack` are not considered to be in the same cluster. :type slack: int, default 0 :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: PyRanges with an ID-column "Cluster" added. :rtype: PyRanges .. warning:: Bookended intervals (i.e. the End of a PyRanges interval is the Start of another one) are by default considered to overlap. Avoid this with slack=-1. .. seealso:: :obj:`PyRanges.merge` combine overlapping intervals into one .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1, 1, 1], "Start": [1, 2, 3, 9], ... 
"End": [3, 3, 10, 12], "Gene": [1, 2, 3, 3]}) >>> gr +--------------+-----------+-----------+-----------+ | Chromosome | Start | End | Gene | | (category) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------| | 1 | 1 | 3 | 1 | | 1 | 2 | 3 | 2 | | 1 | 3 | 10 | 3 | | 1 | 9 | 12 | 3 | +--------------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.cluster() +--------------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | Gene | Cluster | | (category) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------| | 1 | 1 | 3 | 1 | 1 | | 1 | 2 | 3 | 2 | 1 | | 1 | 3 | 10 | 3 | 1 | | 1 | 9 | 12 | 3 | 1 | +--------------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.cluster(by="Gene", count=True) +--------------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | Gene | Cluster | Count | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------| | 1 | 1 | 3 | 1 | 1 | 1 | | 1 | 2 | 3 | 2 | 2 | 1 | | 1 | 3 | 10 | 3 | 3 | 2 | | 1 | 9 | 12 | 3 | 3 | 2 | +--------------+-----------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
Avoid clustering bookended intervals with slack=-1: >>> gr.cluster(slack=-1) +--------------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | Gene | Cluster | | (category) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------| | 1 | 1 | 3 | 1 | 1 | | 1 | 2 | 3 | 2 | 1 | | 1 | 3 | 10 | 3 | 2 | | 1 | 9 | 12 | 3 | 2 | +--------------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.data.ensembl_gtf()[["Feature", "Source"]] >>> gr2.cluster(by=["Feature", "Source"]) +--------------+--------------+---------------+-----------+-----------+--------------+-----------+ | Chromosome | Feature | Source | Start | End | Strand | Cluster | | (category) | (category) | (object) | (int64) | (int64) | (category) | (int64) | |--------------+--------------+---------------+-----------+-----------+--------------+-----------| | 1 | CDS | ensembl | 69090 | 70005 | + | 1 | | 1 | CDS | ensembl | 925941 | 926013 | + | 2 | | 1 | CDS | ensembl | 925941 | 926013 | + | 2 | | 1 | CDS | ensembl | 925941 | 926013 | + | 2 | | ... | ... | ... | ... | ... | ... | ... | | 1 | transcript | havana_tagene | 167128 | 169240 | - | 1142 | | 1 | transcript | mirbase | 17368 | 17436 | - | 1143 | | 1 | transcript | mirbase | 187890 | 187958 | - | 1144 | | 1 | transcript | mirbase | 632324 | 632413 | - | 1145 | +--------------+--------------+---------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 2,446 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: copy() Make a deep copy of the PyRanges. .. rubric:: Notes See the pandas docs for deep-copying caveats. .. 
py:method:: count_overlaps(other, strandedness=None, keep_nonoverlapping=True, overlap_col='NumberOverlaps') Count number of overlaps per interval. Count how many intervals in self overlap with those in other. :param strandedness: Whether to perform the operation on the same, opposite or no strand. Use False to ignore the strand. None means use "same" if both PyRanges are stranded, otherwise ignore. :type strandedness: {"same", "opposite", None, False}, default None, i.e. auto :param keep_nonoverlapping: Keep intervals without overlaps. :type keep_nonoverlapping: bool, default True :param overlap_col: Name of column with overlap counts. :type overlap_col: str, default "NumberOverlaps" :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: PyRanges with a column of overlaps added. :rtype: PyRanges .. seealso:: :obj:`PyRanges.coverage` find coverage of PyRanges :obj:`pyranges.count_overlaps` count overlaps from multiple PyRanges .. rubric:: Examples >>> f1 = pr.data.f1().drop() >>> f1 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 6 | + | | chr1 | 8 | 9 | + | | chr1 | 5 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f2 = pr.data.f2().drop() >>> f2 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 2 | + | | chr1 | 6 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. >>> f1.count_overlaps(f2, overlap_col="Count") +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Count | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | chr1 | 3 | 6 | + | 0 | | chr1 | 8 | 9 | + | 0 | | chr1 | 5 | 7 | - | 1 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: coverage(other, strandedness=None, keep_nonoverlapping=True, overlap_col='NumberOverlaps', fraction_col='FractionOverlaps', nb_cpu=1) Count number of overlaps and their fraction per interval. Count how many intervals in self overlap with those in other. :param strandedness: Whether to perform the operation on the same, opposite or no strand. Use False to ignore the strand. None means use "same" if both PyRanges are stranded, otherwise ignore. :type strandedness: {"same", "opposite", None, False}, default None, i.e. auto :param keep_nonoverlapping: Keep intervals without overlaps. :type keep_nonoverlapping: bool, default True :param overlap_col: Name of column with overlap counts. :type overlap_col: str, default "NumberOverlaps" :param fraction_col: Name of column with fraction of counts. :type fraction_col: str, default "FractionOverlaps" :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: PyRanges with a column of overlaps added. :rtype: PyRanges .. seealso:: :obj:`pyranges.count_overlaps` count overlaps from multiple PyRanges .. rubric:: Examples >>> f1 = pr.from_dict({"Chromosome": [1, 1, 1], "Start": [3, 8, 5], ... 
"End": [6, 9, 7]}) >>> f1 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 3 | 6 | | 1 | 8 | 9 | | 1 | 5 | 7 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> f2 = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 6], ... "End": [2, 7]}) >>> f2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 1 | 2 | | 1 | 6 | 7 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> f1.coverage(f2, overlap_col="C", fraction_col="F") +--------------+-----------+-----------+-----------+-------------+ | Chromosome | Start | End | C | F | | (category) | (int64) | (int64) | (int64) | (float64) | |--------------+-----------+-----------+-----------+-------------| | 1 | 3 | 6 | 0 | 0 | | 1 | 8 | 9 | 0 | 0 | | 1 | 5 | 7 | 1 | 0.5 | +--------------+-----------+-----------+-----------+-------------+ Unstranded PyRanges object has 3 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: drop(drop=None, like=None) Drop column(s). If no arguments are given, all the columns except Chromosome, Start, End and Strand are dropped. :param drop: Columns to drop. :type drop: str or list, default None :param like: Regex-string matching columns to drop. Matches with Chromosome, Start, End or Strand are ignored. :type like: str, default None .. seealso:: :obj:`PyRanges.unstrand` drop strand information .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": [1, 1], "Start": [1, 4], "End": [5, 6], ... "Strand": ["+", "-"], "Count": [1, 2], ... 
"Type": ["exon", "exon"]}) >>> gr +--------------+-----------+-----------+--------------+-----------+------------+ | Chromosome | Start | End | Strand | Count | Type | | (category) | (int64) | (int64) | (category) | (int64) | (object) | |--------------+-----------+-----------+--------------+-----------+------------| | 1 | 1 | 5 | + | 1 | exon | | 1 | 4 | 6 | - | 2 | exon | +--------------+-----------+-----------+--------------+-----------+------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.drop() +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | 1 | 1 | 5 | + | | 1 | 4 | 6 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. Matches with position-columns are ignored: >>> gr.drop(like="Chromosome|Strand") +--------------+-----------+-----------+--------------+-----------+------------+ | Chromosome | Start | End | Strand | Count | Type | | (category) | (int64) | (int64) | (category) | (int64) | (object) | |--------------+-----------+-----------+--------------+-----------+------------| | 1 | 1 | 5 | + | 1 | exon | | 1 | 4 | 6 | - | 2 | exon | +--------------+-----------+-----------+--------------+-----------+------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> gr.drop(like="e$") +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Count | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | 1 | 1 | 5 | + | 1 | | 1 | 4 | 6 | - | 2 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: drop_duplicate_positions(strand=None, keep='first') Return PyRanges with duplicate position rows removed. :param strand: Whether to take strand-information into account when considering duplicates. :type strand: bool, default None, i.e. auto :param keep: Whether to keep first, last or drop all duplicates. :type keep: {"first", "last", False} .. rubric:: Examples >>> gr = pr.from_string('''Chromosome Start End Strand Name ... 1 1 2 + A ... 1 1 2 - B ... 1 1 2 + Z''') >>> gr +--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Name | | (category) | (int64) | (int64) | (category) | (object) | |--------------+-----------+-----------+--------------+------------| | 1 | 1 | 2 | + | A | | 1 | 1 | 2 | + | Z | | 1 | 1 | 2 | - | B | +--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.drop_duplicate_positions() +--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Name | | (category) | (int64) | (int64) | (category) | (object) | |--------------+-----------+-----------+--------------+------------| | 1 | 1 | 2 | + | A | | 1 | 1 | 2 | - | B | +--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.drop_duplicate_positions(keep="last") +--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Name | | (category) | (int64) | (int64) | (category) | (object) | |--------------+-----------+-----------+--------------+------------| | 1 | 1 | 2 | + | Z | | 1 | 1 | 2 | - | B | +--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. Note that the reverse strand is considered to be behind the forward strand: >>> gr.drop_duplicate_positions(keep="last", strand=False) +--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Name | | (category) | (int64) | (int64) | (category) | (object) | |--------------+-----------+-----------+--------------+------------| | 1 | 1 | 2 | - | B | +--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.drop_duplicate_positions(keep=False, strand=False) Empty PyRanges .. py:method:: extend(ext, group_by=None) Extend the intervals from the ends. :param ext: The number of nucleotides to extend the ends with. If an int is provided, the same extension is applied to both the start and the end of each interval; a dict allows the two ends to be extended by different amounts. Note that 5' and 3' extensions take the strand into account if the intervals are stranded. :type ext: int or dict of ints with "3" and/or "5" as keys. :param group_by: Group intervals by these column name(s), so that the extension is applied only to the left-most and/or right-most interval of each group. :type group_by: str or list of str, default: None .. 
seealso:: :obj:`PyRanges.subsequence` obtain subsequences of intervals :obj:`PyRanges.spliced_subsequence` obtain subsequences of intervals, providing transcript-level coordinates .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], 'End': [6, 9, 7], ... 'Strand': ['+', '+', '-']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 6 | + | | chr1 | 8 | 9 | + | | chr1 | 5 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.extend(4) +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 0 | 10 | + | | chr1 | 4 | 13 | + | | chr1 | 1 | 11 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.extend({"3": 1}) +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 7 | + | | chr1 | 8 | 10 | + | | chr1 | 4 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
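The strand-aware meaning of the "5" and "3" keys can be made concrete with a small plain-Python sketch. It is an illustration under assumptions, not the pyranges code: ``extend_interval`` is a hypothetical helper, clipping negative starts to 0 is inferred from the ``gr.extend(4)`` output above, and a ``ValueError`` stands in for the library's assertion.

```python
def extend_interval(start, end, strand, ext):
    """Extend one interval; ``ext`` is an int or a dict with "5"/"3" keys
    (hypothetical helper, not part of pyranges)."""
    if isinstance(ext, int):
        new_start, new_end = start - ext, end + ext
    else:
        five, three = ext.get("5", 0), ext.get("3", 0)
        if strand == "-":
            # Reverse strand: the 5' end is End, the 3' end is Start.
            new_start, new_end = start - three, end + five
        else:
            new_start, new_end = start - five, end + three
    new_start = max(new_start, 0)  # assumption: starts are clipped at 0
    if new_end <= new_start:
        raise ValueError("interval has zero or negative length after extend")
    return new_start, new_end


rows = [(3, 6, "+"), (8, 9, "+"), (5, 7, "-")]
print([extend_interval(s, e, st, 4) for s, e, st in rows])
# [(0, 10), (4, 13), (1, 11)]
print([extend_interval(s, e, st, {"3": 1}) for s, e, st in rows])
# [(3, 7), (8, 10), (4, 7)]
print([extend_interval(s, e, st, {"3": 1, "5": 2}) for s, e, st in rows])
# [(1, 7), (6, 10), (4, 9)]
```

On the reverse strand a 3' extension lowers Start instead of raising End, which is why the last interval becomes ``(4, 7)`` for ``{"3": 1}``.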
>>> gr.extend({"3": 1, "5": 2}) +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 7 | + | | chr1 | 6 | 10 | + | | chr1 | 4 | 9 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.extend(-1) Traceback (most recent call last): ... AssertionError: Some intervals are negative or zero length after applying extend! .. py:method:: five_end() Return the five prime end of intervals. The five prime end is the start of a forward strand or the end of a reverse strand. :returns: PyRanges with the five prime ends :rtype: PyRanges .. rubric:: Notes Requires the PyRanges to be stranded. .. seealso:: :obj:`PyRanges.three_end` return the 3' end .. rubric:: Examples >>> gr = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [3, 5], 'End': [9, 7], ... 'Strand': ["+", "-"]}) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 9 | + | | chr1 | 5 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.five_end() +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 4 | + | | chr1 | 6 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. 
py:method:: head(n=8) Return the n first rows. :param n: Return n rows. :type n: int, default 8 :returns: PyRanges with the n first rows. :rtype: PyRanges .. seealso:: :obj:`PyRanges.tail` return the last rows :obj:`PyRanges.sample` return random rows .. rubric:: Examples >>> gr = pr.data.chipseq() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | | chr1 | 144227079 | 144227104 | U0 | 0 | + | | ... | ... | ... | ... | ... | ... | | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.head(3) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: insert(other, loc=None) Add one or more columns to the PyRanges. :param other: Data to insert into the PyRanges. 
`other` must have the same number of rows as the PyRanges. :type other: Series, DataFrame or dict :param loc: Insertion index. :type loc: int, default None, i.e. after last column of PyRanges. :returns: A copy of the PyRanges with the column(s) inserted starting at `loc`. :rtype: PyRanges .. note:: If a Series, or a dict of Series is used, the Series must have a name. .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["L", "E", "E", "T"], "Start": [1, 1, 2, 3], "End": [5, 8, 13, 21]}) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | E | 1 | 8 | | E | 2 | 13 | | L | 1 | 5 | | T | 3 | 21 | +--------------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 3 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> s = pd.Series(data = [1, 3, 3, 7], name="Column") >>> gr.insert(s) +--------------+-----------+-----------+-----------+ | Chromosome | Start | End | Column | | (category) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------| | E | 1 | 8 | 1 | | E | 2 | 13 | 3 | | L | 1 | 5 | 3 | | T | 3 | 21 | 7 | +--------------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 4 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> df = pd.DataFrame({"NY": s, "AN": s}) >>> df NY AN 0 1 1 1 3 3 2 3 3 3 7 7 Note that the original PyRanges was not affected by previously inserting Column: >>> gr.insert(df, 1) +--------------+-----------+-----------+-----------+-----------+ | Chromosome | NY | AN | Start | End | | (category) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------| | E | 1 | 1 | 1 | 8 | | E | 3 | 3 | 2 | 13 | | L | 3 | 3 | 1 | 5 | | T | 7 | 7 | 3 | 21 | +--------------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> arbitrary_result = gr.apply( ... lambda df: pd.Series(df.Start + df.End, name="Hi!"), as_pyranges=False) >>> arbitrary_result {'E': 1 9 2 15 Name: Hi!, dtype: int64, 'L': 0 6 Name: Hi!, dtype: int64, 'T': 3 24 Name: Hi!, dtype: int64} >>> gr.insert(arbitrary_result) +--------------+-----------+-----------+-----------+ | Chromosome | Start | End | Hi! | | (category) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------| | E | 1 | 8 | 9 | | E | 2 | 13 | 15 | | L | 1 | 5 | 6 | | T | 3 | 21 | 24 | +--------------+-----------+-----------+-----------+ Unstranded PyRanges object has 4 rows and 4 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: intersect(other, strandedness=None, how=None, invert=False, nb_cpu=1) Return overlapping subintervals. Returns the segments of the intervals in self which overlap with those in other. :param other: PyRanges to intersect. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param how: What intervals to report. 
By default reports all overlapping intervals. "containment" reports only the intervals in self that are entirely contained within an interval in other. :type how: {None, "first", "last", "containment"}, default None, i.e. all :param invert: Whether to return the intervals without overlaps. :type invert: bool, default False :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: A PyRanges with overlapping subintervals. :rtype: PyRanges .. seealso:: :obj:`PyRanges.set_intersect` set-intersect PyRanges :obj:`PyRanges.overlap` report overlapping intervals .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10], ... "End": [3, 9, 11], "ID": ["a", "b", "c"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]}) >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 2 | 9 | | chr1 | 9 | 10 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome.
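To make the semantics of the default mode concrete, here is a minimal plain-Python sketch that works on half-open ``(start, end, id)`` tuples from a single chromosome. The helper name ``intersect_sketch`` and the tuple layout are assumptions made for this illustration; it is not how pyranges implements ``intersect``, and it ignores strands and the ``how`` and ``invert`` options.

```python
def intersect_sketch(self_intervals, other_intervals):
    """Return the part of each interval in self that overlaps an interval in other.

    Hypothetical helper for illustration only: intervals are half-open
    (start, end) pairs on one chromosome, and every overlapping pair is reported.
    """
    rows = []
    for start, end, label in self_intervals:
        for other_start, other_end in other_intervals:
            lo = max(start, other_start)
            hi = min(end, other_end)
            if lo < hi:  # a non-empty overlap exists
                rows.append((lo, hi, label))
    return rows


# Same coordinates as gr and gr2 in the example above.
gr_rows = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2_rows = [(2, 3), (2, 9), (9, 10)]

result = intersect_sketch(gr_rows, gr2_rows)
print(result)
```

Interval ``a`` overlaps two intervals in ``gr2`` and is therefore reported twice, while ``c`` only touches ``(9, 10)`` and is dropped.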
>>> gr.intersect(gr2) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 2 | 3 | a | | chr1 | 2 | 3 | a | | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.intersect(gr2, how="first") +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 2 | 3 | a | | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.intersect(gr2, how="containment") +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: items() Return the pairs of keys and DataFrames. :returns: The dict mapping keys to DataFrames in the PyRanges. :rtype: dict .. seealso:: :obj:`PyRanges.chromosomes` return the chromosomes :obj:`PyRanges.keys` return the keys :obj:`PyRanges.values` return the DataFrames in the PyRanges .. rubric:: Examples >>> gr = pr.data.f1() >>> gr.items() [(('chr1', '+'), Chromosome Start End Name Score Strand 0 chr1 3 6 interval1 0 + 2 chr1 8 9 interval3 0 +), (('chr1', '-'), Chromosome Start End Name Score Strand 1 chr1 5 7 interval2 0 -)] .. 
py:method:: join(other, strandedness=None, how=None, report_overlap=False, slack=0, suffix='_b', nb_cpu=1, apply_strand_suffix=None, preserve_order=False) Join PyRanges on genomic location. :param other: PyRanges to join. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param how: How to handle intervals without overlap. None means only keep overlapping intervals. "left" keeps all intervals in self, "right" keeps all intervals in other. :type how: {None, "left", "right"}, default None, i.e. "inner" :param report_overlap: Report amount of overlap in base pairs. :type report_overlap: bool, default False :param slack: Lengthen intervals in self before joining. :type slack: int, default 0 :param suffix: Suffix to give overlapping columns in other. :type suffix: str or tuple, default "_b" :param apply_strand_suffix: If the first PyRanges is unstranded, but the second is not, the first will be given a strand column. apply_strand_suffix makes the added strand column a regular data column instead by adding a suffix. :type apply_strand_suffix: bool, default None :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param preserve_order: If True, preserves the order after performing the join (only relevant in "outer", "left" and "right" joins). :type preserve_order: bool, default False :returns: A PyRanges appended with columns of another. :rtype: PyRanges .. rubric:: Notes The chromosome from other will never be reported as it is always the same as in self.
As pandas did not have NaN for non-float datatypes until recently, "left" and "right" join give non-overlapping rows the value -1 to avoid promoting columns to object. This will change to NaN in a future version as general NaN becomes stable in pandas. .. seealso:: :obj:`PyRanges.new_position` give joined PyRanges new coordinates .. rubric:: Examples >>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], ... 'End': [6, 9, 7], 'Name': ['interval1', 'interval3', 'interval2']}) >>> f1 +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 3 | 6 | interval1 | | chr1 | 8 | 9 | interval3 | | chr1 | 5 | 7 | interval2 | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [2, 7], 'Name': ['a', 'b']}) >>> f2 +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 2 | a | | chr1 | 6 | 7 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> f1.join(f2) +--------------+-----------+-----------+------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | Start_b | End_b | Name_b | | (category) | (int64) | (int64) | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------+-----------+-----------+------------| | chr1 | 5 | 7 | interval2 | 6 | 7 | b | +--------------+-----------+-----------+------------+-----------+-----------+------------+ Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> f1.join(f2, how="right") +--------------+-----------+-----------+------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | Start_b | End_b | Name_b | | (category) | (int64) | (int64) | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------+-----------+-----------+------------| | chr1 | 5 | 7 | interval2 | 6 | 7 | b | | chr1 | -1 | -1 | -1 | 1 | 2 | a | +--------------+-----------+-----------+------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. With slack 1, bookended features are joined (see row 1): >>> f1.join(f2, slack=1) +--------------+-----------+-----------+------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | Start_b | End_b | Name_b | | (category) | (int64) | (int64) | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------+-----------+-----------+------------| | chr1 | 3 | 6 | interval1 | 6 | 7 | b | | chr1 | 5 | 7 | interval2 | 6 | 7 | b | +--------------+-----------+-----------+------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
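The three calls above can be mimicked with a small plain-Python sketch. The helper ``join_sketch`` is hypothetical and only illustrates the semantics; it assumes that ``slack`` widens each interval in self by that amount on both sides for the overlap test only, and that unmatched rows in a "right" join get the placeholder ``-1`` in every column coming from self. It is not the pyranges implementation.

```python
def join_sketch(left, right, slack=0, how=None):
    """Pair every interval in left with each overlapping interval in right.

    Hypothetical helper: rows are (start, end, name) with half-open coordinates.
    """
    rows = []
    matched_right = set()
    for l_start, l_end, l_name in left:
        # Assumption: slack extends the query on both sides, never below zero.
        q_start, q_end = max(l_start - slack, 0), l_end + slack
        for j, (r_start, r_end, r_name) in enumerate(right):
            if max(q_start, r_start) < min(q_end, r_end):
                # Report the original coordinates, not the widened ones.
                rows.append((l_start, l_end, l_name, r_start, r_end, r_name))
                matched_right.add(j)
    if how == "right":
        # Keep intervals from right that found no partner; fill with -1.
        for j, (r_start, r_end, r_name) in enumerate(right):
            if j not in matched_right:
                rows.append((-1, -1, -1, r_start, r_end, r_name))
    return rows


f1_rows = [(3, 6, "interval1"), (8, 9, "interval3"), (5, 7, "interval2")]
f2_rows = [(1, 2, "a"), (6, 7, "b")]

inner = join_sketch(f1_rows, f2_rows)
right_join = join_sketch(f1_rows, f2_rows, how="right")
with_slack = join_sketch(f1_rows, f2_rows, slack=1)
print(inner, right_join, with_slack, sep="\n")
```

With ``slack=1``, ``interval1`` (3 to 6) is tested as 2 to 7 and therefore reaches the bookended interval ``b`` (6 to 7), which is why it appears in the last table above.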
>>> f1.join(f2, how="right", preserve_order=True) +--------------+-----------+-----------+------------+-----------+-----------+------------+ | Chromosome | Start | End | Name | Start_b | End_b | Name_b | | (category) | (int64) | (int64) | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------+-----------+-----------+------------| | chr1 | -1 | -1 | -1 | 1 | 2 | a | | chr1 | 5 | 7 | interval2 | 6 | 7 | b | +--------------+-----------+-----------+------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: keys() Return the keys. :returns: The keys (chromosomes or chromosome/strand pairs) as strings or tuples of strings, in natsorted order. .. seealso:: :obj:`PyRanges.chromosomes` return the chromosomes .. rubric:: Examples >>> gr = pr.data.chipseq() >>> gr.keys() [('chr1', '+'), ('chr1', '-'), ('chr2', '+'), ('chr2', '-'), ('chr3', '+'), ('chr3', '-'), ('chr4', '+'), ('chr4', '-'), ('chr5', '+'), ('chr5', '-'), ('chr6', '+'), ('chr6', '-'), ('chr7', '+'), ('chr7', '-'), ('chr8', '+'), ('chr8', '-'), ('chr9', '+'), ('chr9', '-'), ('chr10', '+'), ('chr10', '-'), ('chr11', '+'), ('chr11', '-'), ('chr12', '+'), ('chr12', '-'), ('chr13', '+'), ('chr13', '-'), ('chr14', '+'), ('chr14', '-'), ('chr15', '+'), ('chr15', '-'), ('chr16', '+'), ('chr16', '-'), ('chr17', '+'), ('chr17', '-'), ('chr18', '+'), ('chr18', '-'), ('chr19', '+'), ('chr19', '-'), ('chr20', '+'), ('chr20', '-'), ('chr21', '+'), ('chr21', '-'), ('chr22', '+'), ('chr22', '-'), ('chrX', '+'), ('chrX', '-'), ('chrY', '+'), ('chrY', '-')] >>> gr.unstrand().keys() ['chr1', 'chr2', 'chr3', 'chr4', 'chr5', 'chr6', 'chr7', 'chr8', 'chr9', 'chr10', 'chr11', 'chr12', 'chr13', 'chr14', 'chr15', 'chr16', 'chr17', 'chr18', 'chr19', 'chr20', 'chr21', 'chr22', 'chrX', 'chrY'] ..
py:method:: k_nearest(other, k=1, ties=None, strandedness=None, overlap=True, how=None, suffix='_b', nb_cpu=1, apply_strand_suffix=None) Find k nearest intervals. :param other: PyRanges to find nearest interval in. :type other: PyRanges :param k: Number of closest intervals to return. If iterable, must be same length as PyRanges. :type k: int or list/array/Series of int :param ties: How to resolve ties, i.e. closest intervals with equal distance. None means that the k nearest intervals are kept. "first" means that the first tie is kept, "last" means that the last is kept. "different" means that all nearest intervals with the k unique nearest distances are kept. :type ties: {None, "first", "last", "different"}, default None :param strandedness: Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param overlap: Whether to include overlaps. :type overlap: bool, default True :param how: Whether to only look for nearest in one direction. Always with respect to the PyRanges it is called on. :type how: {None, "upstream", "downstream"}, default None, i.e. both directions :param suffix: Suffix to give columns with shared name in other. :type suffix: str, default "_b" :param apply_strand_suffix: If the first PyRanges is unstranded, but the second is not, the first will be given a strand column. apply_strand_suffix makes the added strand column a regular data column instead by adding a suffix. :type apply_strand_suffix: bool, default None :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: A PyRanges with columns of nearest interval horizontally appended. :rtype: PyRanges .. rubric:: Notes nearest also exists, and is more performant. ..
seealso:: :obj:`PyRanges.new_position` give joined PyRanges new coordinates :obj:`PyRanges.nearest` find nearest intervals .. rubric:: Examples >>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], ... 'End': [6, 9, 7], 'Strand': ['+', '+', '-']}) >>> f1 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 6 | + | | chr1 | 8 | 9 | + | | chr1 | 5 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [2, 7], 'Strand': ['+', '-']}) >>> f2 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 2 | + | | chr1 | 6 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> f1.k_nearest(f2, k=2) +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b | Distance | | (category) | (int64) | (int64) | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------| | chr1 | 3 | 6 | + | 6 | 7 | - | 1 | | chr1 | 3 | 6 | + | 1 | 2 | + | -2 | | chr1 | 8 | 9 | + | 6 | 7 | - | -2 | | chr1 | 8 | 9 | + | 1 | 2 | + | -7 | | chr1 | 5 | 7 | - | 6 | 7 | - | 0 | | chr1 | 5 | 7 | - | 1 | 2 | + | 4 | +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 6 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f1.k_nearest(f2, how="upstream", k=2) +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b | Distance | | (category) | (int64) | (int64) | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------| | chr1 | 3 | 6 | + | 1 | 2 | + | -2 | | chr1 | 8 | 9 | + | 6 | 7 | - | -2 | | chr1 | 8 | 9 | + | 1 | 2 | + | -7 | | chr1 | 5 | 7 | - | 6 | 7 | - | 0 | +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 4 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> f1.k_nearest(f2, k=[1, 2, 1]) +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b | Distance | | (category) | (int64) | (int64) | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------| | chr1 | 3 | 6 | + | 6 | 7 | - | 1 | | chr1 | 8 | 9 | + | 6 | 7 | - | -2 | | chr1 | 8 | 9 | + | 1 | 2 | + | -7 | | chr1 | 5 | 7 | - | 6 | 7 | - | 0 | +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 4 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> d1 = {"Chromosome": [1], "Start": [5], "End": [6]} >>> d2 = {"Chromosome": 1, "Start": [1] * 2 + [5] * 2 + [9] * 2, ... "End": [3] * 2 + [7] * 2 + [11] * 2, "ID": range(6)} >>> gr, gr2 = pr.from_dict(d1), pr.from_dict(d2) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 5 | 6 | +--------------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 +--------------+-----------+-----------+-----------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------| | 1 | 1 | 3 | 0 | | 1 | 1 | 3 | 1 | | 1 | 5 | 7 | 2 | | 1 | 5 | 7 | 3 | | 1 | 9 | 11 | 4 | | 1 | 9 | 11 | 5 | +--------------+-----------+-----------+-----------+ Unstranded PyRanges object has 6 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> gr.k_nearest(gr2, k=2) +--------------+-----------+-----------+-----------+-----------+-----------+------------+ | Chromosome | Start | End | Start_b | End_b | ID | Distance | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+------------| | 1 | 5 | 6 | 5 | 7 | 2 | 0 | | 1 | 5 | 6 | 5 | 7 | 3 | 0 | +--------------+-----------+-----------+-----------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.k_nearest(gr2, k=2, ties="different") +--------------+-----------+-----------+-----------+-----------+-----------+------------+ | Chromosome | Start | End | Start_b | End_b | ID | Distance | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+------------| | 1 | 5 | 6 | 5 | 7 | 2 | 0 | | 1 | 5 | 6 | 5 | 7 | 3 | 0 | | 1 | 5 | 6 | 1 | 3 | 1 | -3 | | 1 | 5 | 6 | 1 | 3 | 0 | -3 | +--------------+-----------+-----------+-----------+-----------+-----------+------------+ Unstranded PyRanges object has 4 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.k_nearest(gr2, k=3, ties="first") +--------------+-----------+-----------+-----------+-----------+-----------+------------+ | Chromosome | Start | End | Start_b | End_b | ID | Distance | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+------------| | 1 | 5 | 6 | 5 | 7 | 2 | 0 | | 1 | 5 | 6 | 1 | 3 | 1 | -3 | | 1 | 5 | 6 | 9 | 11 | 4 | 4 | +--------------+-----------+-----------+-----------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
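The difference between the default and ``ties="different"`` can be sketched in plain Python for the unstranded case. The helpers ``signed_distance`` and ``k_nearest_sketch`` are hypothetical names, and the sign convention (negative when the other interval lies to the left) and the "gap plus one" distance are inferred from the tables above rather than taken from the pyranges source. Row order within a tie group may differ from what pyranges reports.

```python
def signed_distance(query, subject):
    """Distance between half-open intervals: 0 if overlapping, 1 if bookended.

    Assumed convention: negative when the subject lies to the left of the query.
    """
    (q_start, q_end), (s_start, s_end) = query, subject
    if max(q_start, s_start) < min(q_end, s_end):
        return 0
    if s_start >= q_end:
        return s_start - q_end + 1
    return -(q_start - s_end + 1)


def k_nearest_sketch(query, subjects, k, ties=None):
    """Return (distance, subject) pairs for one query interval."""
    scored = sorted(
        ((signed_distance(query, s[:2]), s) for s in subjects),
        key=lambda pair: abs(pair[0]),
    )
    if ties == "different":
        # Keep every subject whose distance is among the k smallest unique ones.
        kept = sorted({abs(d) for d, _ in scored})[:k]
        return [(d, s) for d, s in scored if abs(d) in kept]
    return scored[:k]  # default: simply the k closest rows


query = (5, 6)
# (start, end, ID), same values as gr2 above.
subjects = [(1, 3, 0), (1, 3, 1), (5, 7, 2), (5, 7, 3), (9, 11, 4), (9, 11, 5)]

default_rows = k_nearest_sketch(query, subjects, k=2)
different_rows = k_nearest_sketch(query, subjects, k=2, ties="different")
print(default_rows)
print(different_rows)
```

With ``k=2`` the default keeps the two overlapping intervals, whereas ``ties="different"`` keeps all four intervals at the two smallest distinct distances (0 and 3).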
>>> gr.k_nearest(gr2, k=1, overlap=False) +--------------+-----------+-----------+-----------+-----------+-----------+------------+ | Chromosome | Start | End | Start_b | End_b | ID | Distance | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+------------| | 1 | 5 | 6 | 1 | 3 | 1 | -3 | +--------------+-----------+-----------+-----------+-----------+-----------+------------+ Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: lengths(as_dict=False) Return the length of each interval. :param as_dict: Whether to return lengths as Series or dict of Series per key. :type as_dict: bool, default False :returns: The length of each interval. :rtype: Series or dict of Series .. seealso:: :obj:`PyRanges.lengths` return the intervals lengths .. rubric:: Examples >>> gr = pr.data.f1() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand.
>>> gr.lengths() 0 3 1 1 2 2 dtype: int64 To find the length of the genome covered by the intervals, use merge first: >>> gr.Length = gr.lengths() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+-----------+ | Chromosome | Start | End | Name | Score | Strand | Length | | (category) | (int64) | (int64) | (object) | (int64) | (category) | (int64) | |--------------+-----------+-----------+------------+-----------+--------------+-----------| | chr1 | 3 | 6 | interval1 | 0 | + | 3 | | chr1 | 8 | 9 | interval3 | 0 | + | 1 | | chr1 | 5 | 7 | interval2 | 0 | - | 2 | +--------------+-----------+-----------+------------+-----------+--------------+-----------+ Stranded PyRanges object has 3 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: max_disjoint(strand=None, slack=0, **kwargs) Find the maximal disjoint set of intervals. :param strand: Find the max disjoint set separately for each strand. :type strand: bool, default None, i.e. auto :param slack: Consider intervals within a distance of slack to be overlapping. :type slack: int, default 0 :returns: PyRanges with maximal disjoint set of intervals. :rtype: PyRanges .. rubric:: Examples >>> gr = pr.data.f1() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
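One textbook way to obtain a maximal set of non-overlapping intervals is the greedy interval-scheduling strategy: sort by end coordinate and keep each interval that starts at or after the end of the last one kept. The sketch below uses that strategy on half-open ``(start, end)`` tuples; the name ``max_disjoint_sketch`` is hypothetical, strands and ``slack`` are ignored, and pyranges may use a different algorithm internally.

```python
def max_disjoint_sketch(intervals):
    """Greedy selection of a largest set of mutually non-overlapping intervals."""
    chosen = []
    last_end = None
    # Intervals that end earliest leave the most room for later ones.
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Coordinates of the three intervals in pr.data.f1(), strand ignored.
selected = max_disjoint_sketch([(3, 6), (8, 9), (5, 7)])
print(selected)
```

Here ``(5, 7)`` is skipped because it overlaps ``(3, 6)``, which was kept first.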
>>> gr.max_disjoint(strand=False) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: merge(strand=None, count=False, count_col='Count', by=None, slack=0) Merge overlapping intervals into one. :param strand: Only merge intervals on same strand. :type strand: bool, default None, i.e. auto :param count: Count intervals in each superinterval. :type count: bool, default False :param count_col: Name of column with counts. :type count_col: str, default "Count" :param by: Only merge intervals with equal values in these columns. :type by: str or list of str, default None :param slack: Allow this many nucleotides between each interval to merge. :type slack: int, default 0 :returns: PyRanges with superintervals. :rtype: PyRanges .. rubric:: Notes To avoid losing metadata, use cluster instead. If you want to perform a reduction function on the metadata, use pandas groupby. .. seealso:: :obj:`PyRanges.cluster` annotate overlapping intervals with common ID .. 
rubric:: Examples >>> gr = pr.data.ensembl_gtf()[["Feature", "gene_name"]] >>> gr +--------------+--------------+-----------+-----------+--------------+-------------+ | Chromosome | Feature | Start | End | Strand | gene_name | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-------------| | 1 | gene | 11868 | 14409 | + | DDX11L1 | | 1 | transcript | 11868 | 14409 | + | DDX11L1 | | 1 | exon | 11868 | 12227 | + | DDX11L1 | | 1 | exon | 12612 | 12721 | + | DDX11L1 | | ... | ... | ... | ... | ... | ... | | 1 | gene | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | transcript | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1179364 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1173055 | 1176396 | - | TTLL10-AS1 | +--------------+--------------+-----------+-----------+--------------+-------------+ Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.merge(count=True, count_col="Count") +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Count | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | 1 | 11868 | 14409 | + | 12 | | 1 | 29553 | 31109 | + | 11 | | 1 | 52472 | 53312 | + | 3 | | 1 | 57597 | 64116 | + | 7 | | ... | ... | ... | ... | ... | | 1 | 1062207 | 1063288 | - | 4 | | 1 | 1070966 | 1074306 | - | 10 | | 1 | 1081817 | 1116361 | - | 319 | | 1 | 1173055 | 1179555 | - | 4 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 62 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
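The effect of ``count`` and ``slack`` can be illustrated with a short plain-Python sketch on half-open ``(start, end)`` tuples from one chromosome and strand. The helper ``merge_sketch`` is hypothetical. It assumes that two intervals merge when the gap between them is at most ``slack``, so that touching (bookended) intervals merge at ``slack=0``; whether pyranges treats bookended intervals exactly this way is an assumption of the sketch.

```python
def merge_sketch(intervals, slack=0):
    """Merge intervals into superintervals and count the members of each.

    Returns (start, end, count) tuples sorted by start.
    """
    merged = []
    for start, end in sorted(intervals):
        # Assumption: a gap of at most `slack` positions still merges.
        if merged and start - merged[-1][1] <= slack:
            merged[-1][1] = max(merged[-1][1], end)
            merged[-1][2] += 1
        else:
            merged.append([start, end, 1])
    return [tuple(row) for row in merged]


toy = [(1, 5), (3, 8), (10, 12)]  # made-up coordinates for illustration
no_slack = merge_sketch(toy)
with_slack = merge_sketch(toy, slack=2)
print(no_slack)
print(with_slack)
```

The third element of each tuple plays the role of the ``Count`` column.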
>>> gr.merge(by="Feature", count=True) +--------------+-----------+-----------+--------------+--------------+-----------+ | Chromosome | Start | End | Strand | Feature | Count | | (category) | (int64) | (int64) | (category) | (category) | (int64) | |--------------+-----------+-----------+--------------+--------------+-----------| | 1 | 65564 | 65573 | + | CDS | 1 | | 1 | 69036 | 70005 | + | CDS | 2 | | 1 | 924431 | 924948 | + | CDS | 1 | | 1 | 925921 | 926013 | + | CDS | 11 | | ... | ... | ... | ... | ... | ... | | 1 | 1062207 | 1063288 | - | transcript | 1 | | 1 | 1070966 | 1074306 | - | transcript | 1 | | 1 | 1081817 | 1116361 | - | transcript | 19 | | 1 | 1173055 | 1179555 | - | transcript | 1 | +--------------+-----------+-----------+--------------+--------------+-----------+ Stranded PyRanges object has 748 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.merge(by=["Feature", "gene_name"], count=True) +--------------+-----------+-----------+--------------+--------------+-------------+-----------+ | Chromosome | Start | End | Strand | Feature | gene_name | Count | | (category) | (int64) | (int64) | (category) | (category) | (object) | (int64) | |--------------+-----------+-----------+--------------+--------------+-------------+-----------| | 1 | 1020172 | 1020373 | + | CDS | AGRN | 1 | | 1 | 1022200 | 1022462 | + | CDS | AGRN | 2 | | 1 | 1034555 | 1034703 | + | CDS | AGRN | 2 | | 1 | 1035276 | 1035324 | + | CDS | AGRN | 4 | | ... | ... | ... | ... | ... | ... | ... | | 1 | 347981 | 348366 | - | transcript | RPL23AP24 | 1 | | 1 | 1173055 | 1179555 | - | transcript | TTLL10-AS1 | 1 | | 1 | 14403 | 29570 | - | transcript | WASH7P | 1 | | 1 | 185216 | 195411 | - | transcript | WASH9P | 1 | +--------------+-----------+-----------+--------------+--------------+-------------+-----------+ Stranded PyRanges object has 807 rows and 7 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: mp(n=8, formatting=None) Merge location and print. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: mpc(n=8, formatting=None) Merge location, print and return self. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: msp(n=30, formatting=None) Sort on location, merge location info and print. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: mspc(n=30, formatting=None) Sort on location, merge location, print and return self. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: nearest(other, strandedness=None, overlap=True, how=None, suffix='_b', nb_cpu=1, apply_strand_suffix=None) Find closest interval. :param other: PyRanges to find nearest interval in. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param overlap: Whether to include overlaps. :type overlap: bool, default True :param how: Whether to only look for nearest in one direction. Always with respect to the PyRanges it is called on. :type how: {None, "upstream", "downstream"}, default None, i.e. both directions :param suffix: Suffix to give columns with shared name in other. :type suffix: str, default "_b" :param apply_strand_suffix: If the first PyRanges is unstranded, but the second is not, the first will be given the strand column of the second. apply_strand_suffix makes the added strand column a regular data column instead by adding a suffix. :type apply_strand_suffix: bool, default None :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets.
:type nb_cpu: int, default 1 :returns: A PyRanges with columns representing nearest interval horizontally appended. :rtype: PyRanges .. rubric:: Notes A k_nearest also exists, but is less performant. .. seealso:: :obj:`PyRanges.new_position` give joined PyRanges new coordinates :obj:`PyRanges.k_nearest` find k nearest intervals .. rubric:: Examples >>> f1 = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], ... 'End': [6, 9, 7], 'Strand': ['+', '+', '-']}) >>> f1 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 6 | + | | chr1 | 8 | 9 | + | | chr1 | 5 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [2, 7], 'Strand': ['+', '-']}) >>> f2 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 2 | + | | chr1 | 6 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
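As a rough guide to reading the ``Distance`` column that ``nearest`` appends, the sketch below computes, for each interval in ``f1``, the closest interval in ``f2`` while ignoring strand. The helpers ``distance`` and ``nearest_sketch`` are hypothetical, and the convention used (0 for overlapping intervals, 1 for bookended ones, otherwise gap plus one) is inferred from the example tables in this section rather than from the pyranges source.

```python
def distance(a, b):
    """Unsigned distance between two half-open (start, end) intervals."""
    (a_start, a_end), (b_start, b_end) = a, b
    if max(a_start, b_start) < min(a_end, b_end):
        return 0  # the intervals overlap
    if b_start >= a_end:
        return b_start - a_end + 1  # b lies to the right of a
    return a_start - b_end + 1  # b lies to the left of a


def nearest_sketch(queries, subjects):
    """For each query return (query, closest subject, distance)."""
    rows = []
    for q in queries:
        best = min(subjects, key=lambda s: distance(q, s))
        rows.append((q, best, distance(q, best)))
    return rows


f1_rows = [(3, 6), (8, 9), (5, 7)]
f2_rows = [(1, 2), (6, 7)]

closest = nearest_sketch(f1_rows, f2_rows)
print(closest)
```

The brute-force search is quadratic and only meant to show the distance convention on a handful of intervals.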
>>> f1.nearest(f2) +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b | Distance | | (category) | (int64) | (int64) | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------| | chr1 | 3 | 6 | + | 6 | 7 | - | 1 | | chr1 | 8 | 9 | + | 6 | 7 | - | 2 | | chr1 | 5 | 7 | - | 6 | 7 | - | 0 | +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 3 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> f1.nearest(f2, how="upstream") +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Start_b | End_b | Strand_b | Distance | | (category) | (int64) | (int64) | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------| | chr1 | 3 | 6 | + | 1 | 2 | + | 2 | | chr1 | 8 | 9 | + | 6 | 7 | - | 2 | | chr1 | 5 | 7 | - | 6 | 7 | - | 0 | +--------------+-----------+-----------+--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 3 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: new_position(new_pos, columns=None) Give new position. A join produces a PyRanges with two start columns and two end columns, one start/end pair from each of the joined PyRanges. This operation uses these to give the PyRanges a new position. :param new_pos: Change of coordinates. :type new_pos: {"union", "intersection", "swap"} :param columns: The name of the coordinate columns.
By default uses the two first columns containing "Start" and the two first columns containing "End". :type columns: tuple of str, default None, i.e. auto .. seealso:: :obj:`PyRanges.join` combine two PyRanges horizontally with SQL-style joins. :returns: PyRanges with new coordinates. :rtype: PyRanges .. rubric:: Examples >>> gr = pr.from_dict({'Chromosome': ['chr1', 'chr1', 'chr1'], ... 'Start': [3, 8, 5], 'End': [6, 9, 7]}) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 3 | 6 | | chr1 | 8 | 9 | | chr1 | 5 | 7 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.from_dict({'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [4, 7]}) >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 1 | 4 | | chr1 | 6 | 7 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> j = gr.join(gr2) >>> j +--------------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | Start_b | End_b | | (category) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------| | chr1 | 3 | 6 | 1 | 4 | | chr1 | 5 | 7 | 6 | 7 | +--------------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
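As a rough illustration of the three ``new_pos`` modes, the arithmetic can be sketched in plain Python on a single pair of intervals. This is not the pyranges implementation; ``new_position_row`` is a hypothetical helper, and the rules (union takes the smallest start and largest end, intersection the largest start and smallest end, swap exchanges the two pairs) are an assumption based on the parameter description.

```python
def new_position_row(start, end, start_b, end_b, new_pos):
    """Return (start, end, start_b, end_b) for one joined row (hypothetical helper)."""
    if new_pos == "union":
        return min(start, start_b), max(end, end_b), start_b, end_b
    if new_pos == "intersection":
        return max(start, start_b), min(end, end_b), start_b, end_b
    if new_pos == "swap":
        return start_b, end_b, start, end
    raise ValueError(f"unknown new_pos: {new_pos!r}")


# One pair of intervals: 1-3 and 2-5.
row = (1, 3, 2, 5)
for mode in ("union", "intersection", "swap"):
    print(mode, new_position_row(*row, mode))
```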
>>> j.new_position("swap") +--------------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | Start_b | End_b | | (category) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------| | chr1 | 1 | 4 | 3 | 6 | | chr1 | 6 | 7 | 5 | 7 | +--------------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> j.new_position("union").mp() +--------------------+-----------+-----------+ | - Position - | Start_b | End_b | | (Multiple types) | (int64) | (int64) | |--------------------+-----------+-----------| | chr1 1-6 | 1 | 4 | | chr1 5-7 | 6 | 7 | +--------------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> j.new_position("intersection").mp() +--------------------+-----------+-----------+ | - Position - | Start_b | End_b | | (Multiple types) | (int64) | (int64) | |--------------------+-----------+-----------| | chr1 1-4 | 1 | 4 | | chr1 6-7 | 6 | 7 | +--------------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> j2 = pr.from_dict({"Chromosome": [1], "Start": [3], ... "End": [4], "A": [1], "B": [3], "C": [2], "D": [5]}) >>> j2 +--------------+-----------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | A | B | C | D | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+-----------| | 1 | 3 | 4 | 1 | 3 | 2 | 5 | +--------------+-----------+-----------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome. >>> j2.new_position("intersection", ("A", "B", "C", "D")) +--------------+-----------+-----------+-----------+-----------+-----------+-----------+ | Chromosome | Start | End | A | B | C | D | | (category) | (int64) | (int64) | (int64) | (int64) | (int64) | (int64) | |--------------+-----------+-----------+-----------+-----------+-----------+-----------| | 1 | 2 | 3 | 1 | 3 | 2 | 5 | +--------------+-----------+-----------+-----------+-----------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 7 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: overlap(other, strandedness=None, how='first', invert=False, nb_cpu=1) Return overlapping intervals. Returns the intervals in self which overlap with those in other. :param other: PyRanges to find overlaps with. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite strand, or to ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param how: What intervals to report. By default, reports every interval in self with overlap once. "containment" reports only the intervals in self that are contained within an interval in other. :type how: {"first", "containment", False, None}, default "first" :param invert: Whether to return the intervals without overlaps. :type invert: bool, default False :param nb_cpu: How many CPUs to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: A PyRanges with overlapping intervals. :rtype: PyRanges .. seealso:: :obj:`PyRanges.intersect` report overlapping subintervals :obj:`PyRanges.set_intersect` set-intersect PyRanges .. 
rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10], ... "End": [3, 9, 11], "ID": ["a", "b", "c"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]}) >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 2 | 9 | | chr1 | 9 | 10 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.overlap(gr2) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.overlap(gr2, how=None) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
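The effect of ``how`` and ``invert`` can be sketched in plain Python on the coordinates of ``gr`` and ``gr2`` above. This is only an illustration under stated assumptions, not the pyranges implementation: ``overlap_rows`` is a hypothetical helper, intervals are treated as half-open, strand is ignored, and "containment" is read as "the interval in self lies entirely inside an interval in other".

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def contained(a, b):
    """True if interval a lies entirely inside interval b."""
    return b[0] <= a[0] and a[1] <= b[1]


def overlap_rows(self_rows, other_ivs, how="first", invert=False):
    """Select rows of self_rows (start, end, id) by their overlap with other_ivs."""
    rows = []
    for row in self_rows:
        iv = row[:2]
        test = contained if how == "containment" else overlaps
        hits = [o for o in other_ivs if test(iv, o)]
        if invert:
            if not hits:
                rows.append(row)              # keep only rows without any hit
        elif how is None or how is False:
            rows.extend([row] * len(hits))    # one output row per hit
        elif hits:
            rows.append(row)                  # report each row at most once
    return rows


gr = [(1, 3, "a"), (4, 9, "b"), (10, 11, "c")]
gr2 = [(2, 3), (2, 9), (9, 10)]
print([r[2] for r in overlap_rows(gr, gr2)])
print([r[2] for r in overlap_rows(gr, gr2, how=None)])
print([r[2] for r in overlap_rows(gr, gr2, how="containment")])
print([r[2] for r in overlap_rows(gr, gr2, invert=True)])
```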
>>> gr.overlap(gr2, how="containment") +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 4 | 9 | b | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.overlap(gr2, invert=True) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 1 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: pc(n=8, formatting=None) Print and return self. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: print(n=8, merge_position=False, sort=False, formatting=None, chain=False) Print the PyRanges. :param n: The number of rows to print. :type n: int, default 8 :param merge_position: Print location in same column to save screen space. :type merge_position: bool, default False :param sort: Sort the PyRanges before printing. Will print chromosomes or strands interleaved on sort columns. :type sort: bool or str, default False :param formatting: Formatting options per column. :type formatting: dict, default None :param chain: Return the PyRanges. Useful to print intermediate results in call chains. :type chain: bool, default False .. seealso:: :obj:`PyRanges.pc` print chain :obj:`PyRanges.sp` sort print :obj:`PyRanges.mp` merge print :obj:`PyRanges.spc` sort print chain :obj:`PyRanges.mpc` merge print chain :obj:`PyRanges.msp` merge sort print :obj:`PyRanges.mspc` merge sort print chain :obj:`PyRanges.rp` raw print dictionary of DataFrames .. 
rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5000], ... 'End': [6, 9, 7000], 'Name': ['i1', 'i3', 'i2'], ... 'Score': [1.1, 2.3987, 5.9999995], 'Strand': ['+', '+', '-']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+------------+-------------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (float64) | (category) | |--------------+-----------+-----------+------------+-------------+--------------| | chr1 | 3 | 6 | i1 | 1.1 | + | | chr1 | 8 | 9 | i3 | 2.3987 | + | | chr1 | 5000 | 7000 | i2 | 6 | - | +--------------+-----------+-----------+------------+-------------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.print(formatting={"Start": "{:,}", "Score": "{:.2f}"}) +--------------+-----------+-----------+------------+-------------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (float64) | (category) | |--------------+-----------+-----------+------------+-------------+--------------| | chr1 | 3 | 6 | i1 | 1.1 | + | | chr1 | 8 | 9 | i3 | 2.4 | + | | chr1 | 5,000 | 7000 | i2 | 6 | - | +--------------+-----------+-----------+------------+-------------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.print(merge_position=True) # gr.mp() +--------------------+------------+-------------+ | - Position - | Name | Score | | (Multiple types) | (object) | (float64) | |--------------------+------------+-------------| | chr1 3-6 + | i1 | 1.1 | | chr1 8-9 + | i3 | 2.3987 | | chr1 5000-7000 - | i2 | 6 | +--------------------+------------+-------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. >>> chipseq = pr.data.chipseq() >>> chipseq +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | | chr1 | 144227079 | 144227104 | U0 | 0 | + | | ... | ... | ... | ... | ... | ... | | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. To interleave strands in output, use print with `sort=True`: >>> chipseq.print(sort=True, n=20) # chipseq.sp() +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 1325303 | 1325328 | U0 | 0 | - | | chr1 | 1541598 | 1541623 | U0 | 0 | + | | chr1 | 1599121 | 1599146 | U0 | 0 | + | | chr1 | 1820285 | 1820310 | U0 | 0 | - | | chr1 | 2448322 | 2448347 | U0 | 0 | - | | chr1 | 3046141 | 3046166 | U0 | 0 | - | | chr1 | 3437168 | 3437193 | U0 | 0 | - | | chr1 | 3504032 | 3504057 | U0 | 0 | + | | chr1 | 3637087 | 3637112 | U0 | 0 | - | | chr1 | 3681903 | 3681928 | U0 | 0 | - | | ... | ... | ... | ... | ... | ... 
| | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 15548022 | 15548047 | U0 | 0 | + | | chrY | 16045242 | 16045267 | U0 | 0 | - | | chrY | 16495497 | 16495522 | U0 | 0 | - | | chrY | 21559181 | 21559206 | U0 | 0 | + | | chrY | 21707662 | 21707687 | U0 | 0 | - | | chrY | 21751211 | 21751236 | U0 | 0 | - | | chrY | 21910706 | 21910731 | U0 | 0 | - | | chrY | 22054002 | 22054027 | U0 | 0 | - | | chrY | 22210637 | 22210662 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome, Start, End and Strand. >>> pr.data.chromsizes().print() +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 0 | 249250621 | | chr2 | 0 | 243199373 | | chr3 | 0 | 198022430 | | chr4 | 0 | 191154276 | | ... | ... | ... | | chr22 | 0 | 51304566 | | chrM | 0 | 16571 | | chrX | 0 | 155270560 | | chrY | 0 | 59373566 | +--------------+-----------+-----------+ Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: rp() Print dict of DataFrames. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: rpc() Print dict of DataFrames and return self. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: sample(n=8, replace=False) Subsample arbitrary rows of PyRanges. If n is larger than length of PyRanges, replace must be True. :param n: Number of rows to return :type n: int, default 8 :param replace: Reuse rows. :type replace: bool, False .. 
rubric:: Examples >>> gr = pr.data.chipseq() >>> np.random.seed(0) >>> gr.sample(n=3) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr2 | 76564764 | 76564789 | U0 | 0 | + | | chr3 | 185739979 | 185740004 | U0 | 0 | - | | chr20 | 40373657 | 40373682 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.sample(10001) Traceback (most recent call last): ... ValueError: Cannot take a larger sample than population when 'replace=False' .. py:method:: set_intersect(other, strandedness=None, how=None, new_pos=False, nb_cpu=1) Return set-theoretical intersection. Like intersect, but both PyRanges are merged first. :param other: PyRanges to set-intersect. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite strand, or to ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param how: What intervals to report. By default, reports all overlapping intervals. "containment" reports intervals where the overlapping is contained within it. :type how: {None, "first", "last", "containment"}, default None, i.e. all :param nb_cpu: How many CPUs to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: A PyRanges with overlapping subintervals. :rtype: PyRanges .. 
seealso:: :obj:`PyRanges.intersect` find overlapping subintervals :obj:`PyRanges.overlap` report overlapping intervals .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10], ... "End": [3, 9, 11], "ID": ["a", "b", "c"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]}) >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 2 | 9 | | chr1 | 9 | 10 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.set_intersect(gr2) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 4 | 9 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. In this simple unstranded case, this is the same as the below: >>> gr.merge().intersect(gr2.merge()) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 4 | 9 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
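The "merge first, then intersect" idea can be sketched in plain Python on the coordinates of ``gr`` and ``gr2``. This is an illustration only, not the pyranges implementation: ``merge`` and ``set_intersect`` below are hypothetical stand-ins that ignore chromosome and strand, treat intervals as half-open, and assume that touching intervals are merged along with overlapping ones.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def set_intersect(a, b):
    """Intersect two interval sets after merging each of them."""
    out = []
    for a_start, a_end in merge(a):
        for b_start, b_end in merge(b):
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:               # keep only non-empty intersections
                out.append((start, end))
    return out


gr = [(1, 3), (4, 9), (10, 11)]
gr2 = [(2, 3), (2, 9), (9, 10)]
print(set_intersect(gr, gr2))
```

The nested loop is quadratic in the number of merged intervals; it is written for readability, since both inputs are sorted and a linear sweep would be the natural choice for large inputs.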
>>> gr.set_intersect(gr2, how="containment") +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 4 | 9 | +--------------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: set_union(other, strandedness=None, nb_cpu=1) Return set-theoretical union. :param other: PyRanges to do union with. :type other: PyRanges :param strandedness: Whether to compare PyRanges on the same strand, the opposite strand, or to ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param nb_cpu: How many CPUs to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: A PyRanges with the union of intervals. :rtype: PyRanges .. seealso:: :obj:`PyRanges.set_intersect` set-theoretical intersection :obj:`PyRanges.overlap` report overlapping intervals .. rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10], ... "End": [3, 9, 11], "ID": ["a", "b", "c"]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
>>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]}) >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 2 | 9 | | chr1 | 9 | 10 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.set_union(gr2) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 1 | 11 | +--------------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: sort(by=None, nb_cpu=1) Sort by position or columns. :param by: Column(s) to sort by. Default is Start and End. Special value "5" can be provided to sort by 5': intervals on + strand are sorted in ascending order, while those on - strand are sorted in descending order. :type by: str or list of str, default None :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 .. note:: Since a PyRanges contains multiple DataFrames, the sorting only happens within dataframes. :returns: Sorted PyRanges :rtype: PyRanges .. seealso:: :obj:`pyranges.multioverlap` find overlaps with multiple PyRanges .. rubric:: Examples >>> p = pr.from_dict({"Chromosome": [1, 1, 1, 1, 1, 1], ... "Strand": ["+", "+", "-", "-", "+", "+"], ... "Start": [40, 1, 10, 70, 140, 160], ... "End": [60, 11, 25, 80, 152, 190], ... 
"transcript_id":["t3", "t3", "t2", "t2", "t1", "t1"] }) By default, intervals are sorted by position: >>> p.sort() +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t3 | | 1 | + | 40 | 60 | t3 | | 1 | + | 140 | 152 | t1 | | 1 | + | 160 | 190 | t1 | | 1 | - | 10 | 25 | t2 | | 1 | - | 70 | 80 | t2 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. (Note how sorting takes place within Chromosome-Strand pairs.) To sort according to a specified column: >>> p.sort(by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 140 | 152 | t1 | | 1 | + | 160 | 190 | t1 | | 1 | + | 40 | 60 | t3 | | 1 | + | 1 | 11 | t3 | | 1 | - | 10 | 25 | t2 | | 1 | - | 70 | 80 | t2 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
If the special value "5" is provided, intervals are sorted according to their five-prime end: >>> p.sort("5") +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t3 | | 1 | + | 40 | 60 | t3 | | 1 | + | 140 | 152 | t1 | | 1 | + | 160 | 190 | t1 | | 1 | - | 70 | 80 | t2 | | 1 | - | 10 | 25 | t2 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 6 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: sp(n=30, formatting=None) Sort on location and print. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: spc(n=30, formatting=None) Sort on location, print and return self. .. seealso:: :obj:`PyRanges.print` print PyRanges. .. py:method:: slack(slack) Deprecated: this function has been moved to PyRanges.extend. .. py:method:: spliced_subsequence(start=0, end=None, by=None, strand=None, **kwargs) Get subsequences of the intervals, using coordinates mapping to spliced transcripts (without introns). The returned intervals are subregions of self, cut according to specifications. Start and end are relative to the 5' end: 0 means the leftmost nucleotide for + strand intervals, while it means the rightmost one for - strand. This method can also manipulate groups of intervals (e.g. exons belonging to the same transcript) through the 'by' argument. When using it, start and end refer to the spliced transcript coordinates, meaning that introns are ignored in the count. :param start: Start of subregion, 0-based and included, counting from the 5' end. Use a negative int to count from the 3' (e.g. -1 is the last nucleotide) :type start: int :param end: End of subregion, 0-based and excluded, counting from the 5' end. 
Use a negative int to count from the 3' (e.g. -1 is the last nucleotide) If None, the existing 3' end is returned. :type end: int, default None :param by: intervals are grouped by this/these ID column(s) beforehand, e.g. exons belonging to same transcripts :type by: list of str, default None strand : bool, default None, i.e. auto Whether strand is considered when interpreting the start and end arguments of this function. If True, counting is from the 5' end, which is the leftmost coordinate for + strand and the rightmost for - strand. If False, all intervals are processed like they reside on the + strand. If None (default), strand is considered if the PyRanges is stranded. :returns: Subregion of self, subsequenced as specified by arguments :rtype: PyRanges .. note:: If the request goes out of bounds (e.g. requesting 100 nts for a 90nt region), only the existing portion is returned .. seealso:: :obj:`subsequence` analogous to this method, but input coordinates refer to the unspliced transcript .. rubric:: Examples >>> p = pr.from_dict({"Chromosome": [1, 1, 2, 2, 3], ... "Strand": ["+", "+", "-", "-", "+"], ... "Start": [1, 40, 10, 70, 140], ... "End": [11, 60, 25, 80, 152], ... "transcript_id":["t1", "t1", "t2", "t2", "t3"] }) >>> p +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t1 | | 1 | + | 40 | 60 | t1 | | 2 | - | 10 | 25 | t2 | | 2 | - | 70 | 80 | t2 | | 3 | + | 140 | 152 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
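The coordinate arithmetic described above can be sketched in plain Python for the exons of a single transcript. This is not the pyranges implementation: ``spliced_subsequence`` below is a hypothetical stand-alone helper that takes a list of non-overlapping ``(start, end)`` exons and a strand, and it assumes that ``start`` and ``end`` follow Python slice semantics on the spliced length (negative values count from the 3' end, out-of-bounds values are clamped).

```python
def spliced_subsequence(exons, strand, start=0, end=None):
    """Cut [start, end) in spliced coordinates out of one transcript's exons."""
    # Order exons from the 5' end to the 3' end of the transcript.
    ordered = sorted(exons, reverse=(strand == "-"))
    total = sum(ex_end - ex_start for ex_start, ex_end in ordered)
    # slice.indices resolves None and negative values and clamps to [0, total].
    start, end, _ = slice(start, end).indices(total)

    result, offset = [], 0   # offset: spliced position where the current exon begins
    for ex_start, ex_end in ordered:
        length = ex_end - ex_start
        lo = max(start, offset)
        hi = min(end, offset + length)
        if lo < hi:          # the request covers part of this exon
            if strand == "-":
                # On the - strand, spliced position 0 is the rightmost nucleotide.
                result.append((ex_end - (hi - offset), ex_end - (lo - offset)))
            else:
                result.append((ex_start + (lo - offset), ex_start + (hi - offset)))
        offset += length
    return result


t1 = [(1, 11), (40, 60)]    # + strand exons of transcript t1
t2 = [(10, 25), (70, 80)]   # - strand exons of transcript t2
print(spliced_subsequence(t1, "+", 0, 15))
print(spliced_subsequence(t2, "-", 0, 15))
print(spliced_subsequence(t2, "-", -20))
```

Grouping rows by the ``by`` column(s) and applying such a function to each group is one way to picture what the ``by`` argument does; the actual grouping mechanics in pyranges may differ.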
# Get the first 15 nucleotides of *each spliced transcript*, grouping exons by transcript_id: >>> p.spliced_subsequence(0, 15, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t1 | | 1 | + | 40 | 45 | t1 | | 2 | - | 70 | 80 | t2 | | 2 | - | 20 | 25 | t2 | | 3 | + | 140 | 152 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. # Get the last 20 nucleotides of each spliced transcript: >>> p.spliced_subsequence(-20, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 40 | 60 | t1 | | 2 | - | 70 | 75 | t2 | | 2 | - | 10 | 25 | t2 | | 3 | + | 140 | 152 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 4 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. # Get region from 25 to 60 of each spliced transcript, or their existing subportion: >>> p.spliced_subsequence(25, 60, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 55 | 60 | t1 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 1 rows and 5 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. # Get region of each spliced transcript which excludes their first and last 3 nucleotides: >>> p.spliced_subsequence(3, -3, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 4 | 11 | t1 | | 1 | + | 40 | 57 | t1 | | 2 | - | 70 | 77 | t2 | | 2 | - | 13 | 25 | t2 | | 3 | + | 143 | 149 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: split(strand=None, between=False, nb_cpu=1) Split into non-overlapping intervals. :param strand: Whether to ignore strand information if PyRanges is stranded. :type strand: bool, default None, i.e. auto :param between: Include lengths between intervals. :type between: bool, default False :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: PyRanges with intervals split at overlap points. :rtype: PyRanges .. seealso:: :obj:`pyranges.multioverlap` find overlaps with multiple PyRanges .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1', 'chr1'], 'Start': [3, 5, 5, 11], ... 
'End': [6, 9, 7, 12], 'Strand': ['+', '+', '-', '-']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 3 | 6 | + | | chr1 | 5 | 9 | + | | chr1 | 5 | 7 | - | | chr1 | 11 | 12 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 4 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.split() +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Strand | | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 3 | 5 | + | | chr1 | 5 | 6 | + | | chr1 | 6 | 9 | + | | chr1 | 5 | 7 | - | | chr1 | 11 | 12 | - | +--------------+-----------+-----------+------------+ Stranded PyRanges object has 5 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.split(between=True) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | Strand | | (object) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 3 | 5 | + | | chr1 | 5 | 6 | + | | chr1 | 6 | 9 | + | | chr1 | 5 | 7 | - | | chr1 | 7 | 11 | - | | chr1 | 11 | 12 | - | +--------------+-----------+-----------+------------+ Stranded PyRanges object has 6 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.split(strand=False) +--------------+-----------+-----------+ | Chromosome | Start | End | | (object) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 3 | 5 | | chr1 | 5 | 6 | | chr1 | 6 | 7 | | chr1 | 7 | 9 | | chr1 | 11 | 12 | +--------------+-----------+-----------+ Unstranded PyRanges object has 5 rows and 3 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome. >>> gr.split(strand=False, between=True) +--------------+-----------+-----------+ | Chromosome | Start | End | | (object) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 3 | 5 | | chr1 | 5 | 6 | | chr1 | 6 | 7 | | chr1 | 7 | 9 | | chr1 | 9 | 11 | | chr1 | 11 | 12 | +--------------+-----------+-----------+ Unstranded PyRanges object has 6 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: subset(f, strand=None, **kwargs) Return a subset of the rows. :param f: Function which returns boolean Series equal to length of df. :type f: function :param strand: Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` .. rubric:: Notes PyRanges can also be subsetted directly with a boolean Series. This function is slightly faster, but more cumbersome. :returns: PyRanges subset on rows. :rtype: PyRanges .. rubric:: Examples >>> gr = pr.data.f1() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 3 | 6 | interval1 | 0 | + | | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> gr.subset(lambda df: df.Start > 4) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. Also possible: >>> gr[gr.Start > 4] +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 8 | 9 | interval3 | 0 | + | | chr1 | 5 | 7 | interval2 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 2 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: subsequence(start=0, end=None, by=None, strand=None, **kwargs) Get subsequences of the intervals. The returned intervals are subregions of self, cut according to specifications. Start and end are relative to the 5' end: 0 means the leftmost nucleotide for + strand intervals, while it means the rightmost one for - strand. This method also allows to manipulate groups of intervals (e.g. exons belonging to same transcripts) through the 'by' argument. When using it, start and end refer to the unspliced transcript coordinates, meaning that introns are included in the count. :param start: Start of subregion, 0-based and included, counting from the 5' end. Use a negative int to count from the 3' (e.g. 
-1 is the last nucleotide) :type start: int :param end: End of subregion, 0-based and excluded, counting from the 5' end. Use a negative int to count from the 3' (e.g. -1 is the last nucleotide) If None, the existing 3' end is returned. :type end: int, default None :param by: intervals are grouped by this/these ID column(s) beforehand, e.g. exons belonging to same transcripts :type by: list of str, default None :param strand: Whether strand is considered when interpreting the start and end arguments of this function. If True, counting is from the 5' end, which is the leftmost coordinate for + strand and the rightmost for - strand. If False, all intervals are processed like they reside on the + strand. If None (default), strand is considered if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :returns: Subregion of self, subsequenced as specified by arguments :rtype: PyRanges .. note:: If the request goes out of bounds (e.g. requesting 100 nts for a 90nt region), only the existing portion is returned .. seealso:: :obj:`spliced_subsequence` analogous to this method, but intronic regions are not counted, so that input coordinates refer to the spliced transcript .. rubric:: Examples >>> p = pr.from_dict({"Chromosome": [1, 1, 2, 2, 3], ... "Strand": ["+", "+", "-", "-", "+"], ... "Start": [1, 40, 2, 30, 140], ... "End": [20, 60, 13, 45, 155], ... "transcript_id":["t1", "t1", "t2", "t2", "t3"] }) >>> p +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 20 | t1 | | 1 | + | 40 | 60 | t1 | | 2 | - | 2 | 13 | t2 | | 2 | - | 30 | 45 | t2 | | 3 | + | 140 | 155 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. # Get the first 10 nucleotides (at the 5') of *each interval* (each line of the dataframe): >>> p.subsequence(0, 10) +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t1 | | 1 | + | 40 | 50 | t1 | | 2 | - | 3 | 13 | t2 | | 2 | - | 35 | 45 | t2 | | 3 | + | 140 | 150 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 5 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. # Get the first 10 nucleotides of *each transcript*, grouping exons by transcript_id: >>> p.subsequence(0, 10, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 1 | 11 | t1 | | 2 | - | 35 | 45 | t2 | | 3 | + | 140 | 150 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 3 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
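The per-interval arithmetic behind ``p.subsequence(0, 10)`` can be sketched in plain Python. This is only an illustration of the coordinate rules stated above (5'-relative offsets, negative values counted from the 3' end, out-of-bounds requests clamped); ``subsequence_interval`` is a hypothetical helper, not part of pyranges, and it does not cover the grouped ``by`` case.

```python
from typing import Optional, Tuple


def subsequence_interval(
    start: int,
    end: int,
    strand: str,
    sub_start: int = 0,
    sub_end: Optional[int] = None,
) -> Optional[Tuple[int, int]]:
    """Cut one 0-based, half-open interval using 5'-relative offsets.

    Hypothetical helper for illustration; not the pyranges implementation.
    """
    length = end - start
    # Negative offsets count from the 3' end of the interval.
    rel_start = sub_start if sub_start >= 0 else length + sub_start
    if sub_end is None:
        rel_end = length  # keep the existing 3' end
    else:
        rel_end = sub_end if sub_end >= 0 else length + sub_end
    # Clamp out-of-bounds requests to the existing portion.
    rel_start = max(0, min(rel_start, length))
    rel_end = max(0, min(rel_end, length))
    if rel_start >= rel_end:
        return None  # nothing left of this interval
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost coordinate.
        return end - rel_end, end - rel_start
    return start + rel_start, start + rel_end


rows = [(1, 20, "+"), (40, 60, "+"), (2, 13, "-"), (30, 45, "-"), (140, 155, "+")]
first_ten = [subsequence_interval(s, e, strand, 0, 10) for s, e, strand in rows]
print(first_ten)
# [(1, 11), (40, 50), (3, 13), (35, 45), (140, 150)]
```

The printed pairs match the Start and End columns of the ``p.subsequence(0, 10)`` table above.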
# Get the last 20 nucleotides of each transcript: >>> p.subsequence(-20, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 40 | 60 | t1 | | 2 | - | 2 | 13 | t2 | | 3 | + | 140 | 155 | t3 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 3 rows and 5 columns from 3 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. # Get region from 30 to 300 of each transcript, or their existing subportion: >>> p.subsequence(30, 300, by='transcript_id') +--------------+--------------+-----------+-----------+-----------------+ | Chromosome | Strand | Start | End | transcript_id | | (category) | (category) | (int64) | (int64) | (object) | |--------------+--------------+-----------+-----------+-----------------| | 1 | + | 40 | 60 | t1 | | 2 | - | 2 | 13 | t2 | +--------------+--------------+-----------+-----------+-----------------+ Stranded PyRanges object has 2 rows and 5 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: subtract(other, strandedness=None, nb_cpu=1) Subtract intervals. :param strandedness: Whether to compare PyRanges on the same strand, the opposite or ignore strand information. The default, None, means use "same" if both PyRanges are stranded, otherwise ignore the strand information. :type strandedness: {None, "same", "opposite", False}, default None, i.e. auto :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 .. seealso:: :obj:`pyranges.PyRanges.overlap` use with invert=True to return all intervals without overlap ..
rubric:: Examples >>> gr = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [1, 4, 10], ... "End": [3, 9, 11], "ID": ["a", "b", "c"]}) >>> gr2 = pr.from_dict({"Chromosome": ["chr1"] * 3, "Start": [2, 2, 9], "End": [3, 9, 10]}) >>> gr +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 3 | a | | chr1 | 4 | 9 | b | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 3 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 2 | 3 | | chr1 | 2 | 9 | | chr1 | 9 | 10 | +--------------+-----------+-----------+ Unstranded PyRanges object has 3 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.subtract(gr2) +--------------+-----------+-----------+------------+ | Chromosome | Start | End | ID | | (category) | (int64) | (int64) | (object) | |--------------+-----------+-----------+------------| | chr1 | 1 | 2 | a | | chr1 | 10 | 11 | c | +--------------+-----------+-----------+------------+ Unstranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: summary(to_stdout=True, return_df=False) Return info. Count refers to the number of intervals, the rest to the lengths. The column "pyrange" describes the data as is. "coverage_forward" and "coverage_reverse" describe the data after strand-specific merging of overlapping intervals. "coverage_unstranded" describes the data after merging, without considering the strands. The row "count" is the number of intervals and "sum" is their total length. The rest describe the lengths of the intervals. 
:param to_stdout: Print summary. :type to_stdout: bool, default True :param return_df: Return df with summary. :type return_df: bool, default False :rtype: None or DataFrame with summary. .. rubric:: Examples >>> gr = pr.data.ensembl_gtf()[["Feature", "gene_id"]] >>> gr +--------------+--------------+-----------+-----------+--------------+-----------------+ | Chromosome | Feature | Start | End | Strand | gene_id | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-----------------| | 1 | gene | 11868 | 14409 | + | ENSG00000223972 | | 1 | transcript | 11868 | 14409 | + | ENSG00000223972 | | 1 | exon | 11868 | 12227 | + | ENSG00000223972 | | 1 | exon | 12612 | 12721 | + | ENSG00000223972 | | ... | ... | ... | ... | ... | ... | | 1 | gene | 1173055 | 1179555 | - | ENSG00000205231 | | 1 | transcript | 1173055 | 1179555 | - | ENSG00000205231 | | 1 | exon | 1179364 | 1179555 | - | ENSG00000205231 | | 1 | exon | 1173055 | 1176396 | - | ENSG00000205231 | +--------------+--------------+-----------+-----------+--------------+-----------------+ Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> gr.summary() +-------+------------------+--------------------+--------------------+-----------------------+ | | pyrange | coverage_forward | coverage_reverse | coverage_unstranded | |-------+------------------+--------------------+--------------------+-----------------------| | count | 2446 | 39 | 23 | 32 | | mean | 2291.92 | 7058.1 | 30078.6 | 27704.2 | | std | 11906.9 | 10322.3 | 59467.7 | 67026.9 | | min | 1 | 83 | 154 | 83 | | 25% | 90 | 1051 | 1204 | 1155 | | 50% | 138 | 2541 | 6500 | 6343 | | 75% | 382.25 | 7168 | 23778 | 20650.8 | | max | 241726 | 43065 | 241726 | 291164 | | sum | 5.60603e+06 | 275266 | 691807 | 886534 | +-------+------------------+--------------------+--------------------+-----------------------+ >>> gr.summary(return_df=True, to_stdout=False) pyrange coverage_forward coverage_reverse coverage_unstranded count 2.446000e+03 39.000000 23.000000 32.000000 mean 2.291918e+03 7058.102564 30078.565217 27704.187500 std 1.190685e+04 10322.309347 59467.695265 67026.868647 min 1.000000e+00 83.000000 154.000000 83.000000 25% 9.000000e+01 1051.000000 1204.000000 1155.000000 50% 1.380000e+02 2541.000000 6500.000000 6343.000000 75% 3.822500e+02 7168.000000 23778.000000 20650.750000 max 2.417260e+05 43065.000000 241726.000000 291164.000000 sum 5.606031e+06 275266.000000 691807.000000 886534.000000 .. py:method:: tail(n=8) Return the n last rows. :param n: Return n rows. :type n: int, default 8 :returns: PyRanges with the n last rows. :rtype: PyRanges .. seealso:: :obj:`PyRanges.head` return the first rows :obj:`PyRanges.sample` return random rows .. 
rubric:: Examples >>> gr = pr.data.chipseq() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | | chr1 | 144227079 | 144227104 | U0 | 0 | + | | ... | ... | ... | ... | ... | ... | | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.tail(3) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 3 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: tile(tile_size, overlap=False, strand=None, nb_cpu=1) Return overlapping genomic tiles. The genome is divided into bookended tiles of length `tile_size` and one is returned per overlapping interval. :param tile_size: Length of the tiles. :type tile_size: int :param overlap: Add column of nucleotide overlap to each tile. 
:type overlap: bool, default False :param strand: Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: Tiled PyRanges. :rtype: PyRanges .. seealso:: :obj:`pyranges.PyRanges.window` divide intervals into windows .. rubric:: Examples >>> gr = pr.data.ensembl_gtf()[["Feature", "gene_name"]] >>> gr +--------------+--------------+-----------+-----------+--------------+-------------+ | Chromosome | Feature | Start | End | Strand | gene_name | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-------------| | 1 | gene | 11868 | 14409 | + | DDX11L1 | | 1 | transcript | 11868 | 14409 | + | DDX11L1 | | 1 | exon | 11868 | 12227 | + | DDX11L1 | | 1 | exon | 12612 | 12721 | + | DDX11L1 | | ... | ... | ... | ... | ... | ... | | 1 | gene | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | transcript | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1179364 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1173055 | 1176396 | - | TTLL10-AS1 | +--------------+--------------+-----------+-----------+--------------+-------------+ Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
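The tiling rule described for ``tile`` (bookended tiles of length ``tile_size``, one returned per tile that an interval overlaps) can be sketched for a single interval in plain Python. ``tile_interval`` is a hypothetical helper written for this illustration, not the pyranges implementation; the third tuple element mirrors what the ``TileOverlap`` column reports when ``overlap=True``.

```python
from typing import List, Tuple


def tile_interval(start: int, end: int, tile_size: int) -> List[Tuple[int, int, int]]:
    """Return (tile_start, tile_end, overlap) for every tile touching [start, end).

    Hypothetical helper for illustration; coordinates are 0-based, half-open.
    """
    # Snap the interval start down to the nearest tile boundary.
    first_tile_start = (start // tile_size) * tile_size
    tiles = []
    for tile_start in range(first_tile_start, end, tile_size):
        tile_end = tile_start + tile_size
        # Number of nucleotides of the interval that fall inside this tile.
        overlap = min(end, tile_end) - max(start, tile_start)
        tiles.append((tile_start, tile_end, overlap))
    return tiles


gene_tiles = tile_interval(11868, 14409, 100)
print(gene_tiles[:2])
# [(11800, 11900, 32), (11900, 12000, 100)]
```

The interval ``11868-14409`` is the DDX11L1 gene row printed above; its first tile of size 100 starts at 11800 and overlaps the gene by 32 nucleotides.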
>>> gr.tile(200) +--------------+--------------+-----------+-----------+--------------+-------------+ | Chromosome | Feature | Start | End | Strand | gene_name | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-------------| | 1 | gene | 11800 | 12000 | + | DDX11L1 | | 1 | gene | 12000 | 12200 | + | DDX11L1 | | 1 | gene | 12200 | 12400 | + | DDX11L1 | | 1 | gene | 12400 | 12600 | + | DDX11L1 | | ... | ... | ... | ... | ... | ... | | 1 | exon | 1175600 | 1175800 | - | TTLL10-AS1 | | 1 | exon | 1175800 | 1176000 | - | TTLL10-AS1 | | 1 | exon | 1176000 | 1176200 | - | TTLL10-AS1 | | 1 | exon | 1176200 | 1176400 | - | TTLL10-AS1 | +--------------+--------------+-----------+-----------+--------------+-------------+ Stranded PyRanges object has 30,538 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.tile(100, overlap=True) +--------------+--------------+-----------+-----------+--------------+-------------+---------------+ | Chromosome | Feature | Start | End | Strand | gene_name | TileOverlap | | (category) | (category) | (int64) | (int64) | (category) | (object) | (int64) | |--------------+--------------+-----------+-----------+--------------+-------------+---------------| | 1 | gene | 11800 | 11900 | + | DDX11L1 | 32 | | 1 | gene | 11900 | 12000 | + | DDX11L1 | 100 | | 1 | gene | 12000 | 12100 | + | DDX11L1 | 100 | | 1 | gene | 12100 | 12200 | + | DDX11L1 | 100 | | ... | ... | ... | ... | ... | ... | ... | | 1 | exon | 1176000 | 1176100 | - | TTLL10-AS1 | 100 | | 1 | exon | 1176100 | 1176200 | - | TTLL10-AS1 | 100 | | 1 | exon | 1176200 | 1176300 | - | TTLL10-AS1 | 100 | | 1 | exon | 1176300 | 1176400 | - | TTLL10-AS1 | 96 | +--------------+--------------+-----------+-----------+--------------+-------------+---------------+ Stranded PyRanges object has 58,516 rows and 7 columns from 1 chromosomes. 
For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: to_example(n=10) Return as dict. Used for easily creating examples for copy and pasting. :param n: Number of rows. Half is taken from the start, the other half from the end. :type n: int, default 10 .. seealso:: :obj:`PyRanges.from_dict` create PyRanges from dict .. rubric:: Examples >>> gr = pr.data.chipseq() >>> gr +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chr1 | 216711011 | 216711036 | U0 | 0 | + | | chr1 | 144227079 | 144227104 | U0 | 0 | + | | ... | ... | ... | ... | ... | ... | | chrY | 15224235 | 15224260 | U0 | 0 | - | | chrY | 13517892 | 13517917 | U0 | 0 | - | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 10,000 rows and 6 columns from 24 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
>>> d = gr.to_example(n=4) >>> d {'Chromosome': ['chr1', 'chr1', 'chrY', 'chrY'], 'Start': [212609534, 169887529, 8010951, 7405376], 'End': [212609559, 169887554, 8010976, 7405401], 'Name': ['U0', 'U0', 'U0', 'U0'], 'Score': [0, 0, 0, 0], 'Strand': ['+', '+', '-', '-']} >>> pr.from_dict(d) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 212609534 | 212609559 | U0 | 0 | + | | chr1 | 169887529 | 169887554 | U0 | 0 | + | | chrY | 8010951 | 8010976 | U0 | 0 | - | | chrY | 7405376 | 7405401 | U0 | 0 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 4 rows and 6 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: three_end() Return the 3'-end. The 3'-end is the start of intervals on the reverse strand and the end of intervals on the forward strand. :returns: PyRanges with the 3'. :rtype: PyRanges .. seealso:: :obj:`PyRanges.five_end` return the five prime end .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [5, 8], 'Strand': ['+', '-']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 5 | + | | chr1 | 6 | 8 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
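The strand rule stated for ``three_end`` can be sketched for a single interval in plain Python. ``three_end_of`` is a hypothetical helper used only to illustrate the rule; it assumes 0-based, half-open coordinates and returns a one-nucleotide interval.

```python
from typing import Tuple


def three_end_of(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Return the 1-nt interval at the 3' end of [start, end).

    Hypothetical helper for illustration; not the pyranges implementation.
    """
    if strand == "-":
        # Reverse strand: the 3' end is the leftmost nucleotide.
        return start, start + 1
    # Forward strand: the 3' end is the rightmost nucleotide.
    return end - 1, end


three_ends = [three_end_of(1, 5, "+"), three_end_of(6, 8, "-")]
print(three_ends)
# [(4, 5), (6, 7)]
```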
>>> gr.three_end() +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 4 | 5 | + | | chr1 | 6 | 7 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: to_bed(path=None, keep=True, compression='infer', chain=False) Write to bed. :param path: Where to write. If None, returns string representation. :type path: str, default None :param keep: Whether to keep all columns, not just Chromosome, Start, End, Name, Score, Strand when writing. :type keep: bool, default True :param compression: See pandas.DataFrame.to_csv for more info. :type compression: str, compression type to use, by default infer based on extension. :param chain: Whether to return the PyRanges after writing. :type chain: bool, default False .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 'End': [5, 8], 'Strand': ['+', '-'], "Gene": [1, 2]} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Gene | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | chr1 | 1 | 5 | + | 1 | | chr1 | 6 | 8 | - | 2 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.to_bed() 'chr1\t1\t5\t.\t.\t+\t1\nchr1\t6\t8\t.\t.\t-\t2\n' # File contents: chr1 1 5 . . + 1 chr1 6 8 . . - 2 Does not include noncanonical bed-column `Gene`: >>> gr.to_bed(keep=False) 'chr1\t1\t5\t.\t.\t+\nchr1\t6\t8\t.\t.\t-\n' # File contents: chr1 1 5 . . + chr1 6 8 . .
- >>> gr.to_bed("test.bed", chain=True) +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Gene | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | chr1 | 1 | 5 | + | 1 | | chr1 | 6 | 8 | - | 2 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 2 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> open("test.bed").readlines() ['chr1\t1\t5\t.\t.\t+\t1\n', 'chr1\t6\t8\t.\t.\t-\t2\n'] .. py:method:: to_bigwig(path=None, chromosome_sizes=None, rpm=True, divide=None, value_col=None, dryrun=False, chain=False) Write regular or value coverage to bigwig. .. note:: To create one bigwig per strand, subset the PyRanges first. :param path: Where to write bigwig. :type path: str :param chromosome_sizes: If dict: map of chromosome names to chromosome length. :type chromosome_sizes: PyRanges or dict :param rpm: Whether to normalize data by dividing by total number of intervals and multiplying by 1e6. :type rpm: bool, default True :param divide: (Only useful with value_col) Divide value coverage by regular coverage and take log2. :type divide: bool, default False :param value_col: Name of column to compute coverage of. :type value_col: str, default None :param dryrun: Return data that would be written without writing bigwigs. :type dryrun: bool, default False :param chain: Whether to return the PyRanges after writing. :type chain: bool, default False .. note:: Requires pybigwig to be installed. If you require more control over the normalization process, use pyranges.to_bigwig() .. seealso:: :obj:`pyranges.to_bigwig` write pandas DataFrame to bigwig. .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [1, 4, 6], ... 'End': [7, 8, 10], 'Strand': ['+', '-', '-'], ...
'Value': [10, 20, 30]} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Value | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | chr1 | 1 | 7 | + | 10 | | chr1 | 4 | 8 | - | 20 | | chr1 | 6 | 10 | - | 30 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.to_bigwig(dryrun=True, rpm=False) +--------------+-----------+-----------+-------------+ | Chromosome | Start | End | Score | | (category) | (int64) | (int64) | (float64) | |--------------+-----------+-----------+-------------| | chr1 | 1 | 4 | 1 | | chr1 | 4 | 6 | 2 | | chr1 | 6 | 7 | 3 | | chr1 | 7 | 8 | 2 | | chr1 | 8 | 10 | 1 | +--------------+-----------+-----------+-------------+ Unstranded PyRanges object has 5 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.to_bigwig(dryrun=True, rpm=False, value_col="Value") +--------------+-----------+-----------+-------------+ | Chromosome | Start | End | Score | | (category) | (int64) | (int64) | (float64) | |--------------+-----------+-----------+-------------| | chr1 | 1 | 4 | 10 | | chr1 | 4 | 6 | 30 | | chr1 | 6 | 7 | 60 | | chr1 | 7 | 8 | 50 | | chr1 | 8 | 10 | 30 | +--------------+-----------+-----------+-------------+ Unstranded PyRanges object has 5 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. 
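The two dry-run tables above can be reproduced with a small sweep over interval boundaries. This is a plain-Python sketch of regular versus value coverage for a single chromosome, not the code path pyranges uses; ``coverage_runs`` is a hypothetical helper, and the weight is 1 for regular coverage or the ``Value`` column for value coverage.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def coverage_runs(intervals: Iterable[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Return (start, end, coverage) runs for weighted, half-open intervals.

    Hypothetical helper for illustration; handles one chromosome only.
    """
    # Record how much the coverage level changes at each boundary.
    deltas = defaultdict(int)
    for start, end, weight in intervals:
        deltas[start] += weight
        deltas[end] -= weight

    runs = []
    level = 0
    positions = sorted(deltas)
    for pos, next_pos in zip(positions, positions[1:]):
        level += deltas[pos]
        if level == 0:
            continue  # uncovered gap between intervals
        if runs and runs[-1][1] == pos and runs[-1][2] == level:
            # Extend the previous run when the level did not change.
            runs[-1] = (runs[-1][0], next_pos, level)
        else:
            runs.append((pos, next_pos, level))
    return runs


rows = [(1, 7, 10), (4, 8, 20), (6, 10, 30)]  # (Start, End, Value)
regular = coverage_runs((s, e, 1) for s, e, _ in rows)
valued = coverage_runs(rows)
print(regular)
# [(1, 4, 1), (4, 6, 2), (6, 7, 3), (7, 8, 2), (8, 10, 1)]
print(valued)
# [(1, 4, 10), (4, 6, 30), (6, 7, 60), (7, 8, 50), (8, 10, 30)]
```

Dividing each value run by the matching regular run and taking log2 gives the scores reported with ``divide=True``, for example ``log2(30 / 2)`` is about 3.90689 for the run from 4 to 6.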
>>> gr.to_bigwig(dryrun=True, rpm=False, value_col="Value", divide=True) +--------------+-----------+-----------+-------------+ | Chromosome | Start | End | Score | | (category) | (int64) | (int64) | (float64) | |--------------+-----------+-----------+-------------| | chr1 | 0 | 1 | nan | | chr1 | 1 | 4 | 3.32193 | | chr1 | 4 | 6 | 3.90689 | | chr1 | 6 | 7 | 4.32193 | | chr1 | 7 | 8 | 4.64386 | | chr1 | 8 | 10 | 4.90689 | +--------------+-----------+-----------+-------------+ Unstranded PyRanges object has 6 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: to_csv(path=None, sep=',', header=True, compression='infer', chain=False) Write to comma- or other value-separated file. :param path: Where to write file. :type path: str, default None, i.e. return string representation. :param sep: String of length 1. Field delimiter for the output file. :type sep: str, default "," :param header: Write out the column names. :type header: bool, default True :param compression: Which compression to use. Uses file extension to infer by default. :type compression: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer" :param chain: Whether to return the PyRanges after writing. :type chain: bool, default False .. note:: The output encodes intervals just like PyRanges: 0-based, Start included and End excluded. .. rubric:: Examples >>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]} >>> gr = pr.from_dict(d) >>> gr.to_csv(sep="\t") 'Chromosome\tStart\tEnd\tFeature\n1\t1\t4\tgene\n1\t3\t6\texon\n1\t5\t9\texon\n' # The file contents Chromosome Start End Feature 1 1 4 gene 1 3 6 exon 1 5 9 exon .. py:method:: to_gff3(path=None, compression='infer', chain=False, map_cols=None) Write to General Feature Format 3. The GFF format consists of a tab-separated file without header. GFF contains a fixed amount of columns, indicated below (names before ":"). 
For each of these, PyRanges will use the corresponding column (names after ":"). ``seqname: Chromosome source: Source type: Feature start: Start end: End score: Score strand: Strand phase: Frame attribute: autofilled`` Columns which are not mapped to GFF columns are appended as a field in the attribute string (i.e. the last field). :param path: Where to write file. :type path: str, default None, i.e. return string representation. :param compression: Which compression to use. Uses file extension to infer by default. :type compression: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer" :param chain: Whether to return the PyRanges after writing. :type chain: bool, default False :param map_cols: Override mapping between GFF and PyRanges fields for any number of columns. Format: ``{gff_column : pyranges_column}`` If a mapping is found for the "attribute" column, it is not auto-filled. :type map_cols: dict, default None .. rubric:: Notes Nonexistent columns will be added with a '.' to represent the missing values. .. seealso:: :obj:`pyranges.read_gff3` read GFF3 files :obj:`pyranges.to_gtf` write to GTF format .. rubric:: Examples >>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]} >>> gr = pr.from_dict(d) >>> gr.to_gff3() '1\t.\tgene\t2\t4\t.\t.\t.\t\n1\t.\texon\t4\t6\t.\t.\t.\t\n1\t.\texon\t6\t9\t.\t.\t.\t\n' # How the file would look 1 . gene 2 4 . . . 1 . exon 4 6 . . . 1 . exon 6 9 . . . >>> gr.Gene = [1, 2, 3] >>> gr.function = ["a b", "c", "def"] >>> gr.to_gff3() '1\t.\tgene\t2\t4\t.\t.\t.\tGene=1;function=a b\n1\t.\texon\t4\t6\t.\t.\t.\tGene=2;function=c\n1\t.\texon\t6\t9\t.\t.\t.\tGene=3;function=def\n' # How the file would look 1 . gene 2 4 . . . Gene=1;function=a b 1 . exon 4 6 . . . Gene=2;function=c 1 . exon 6 9 . . .
Gene=3;function=def >>> gr.the_frame = [0, 2, 1] >>> gr.tag = ['mRNA', 'CDS', 'CDS'] >>> gr +--------------+-----------+-----------+------------+-----------+------------+-------------+------------+ | Chromosome | Start | End | Feature | Gene | function | the_frame | tag | | (category) | (int64) | (int64) | (object) | (int64) | (object) | (int64) | (object) | |--------------+-----------+-----------+------------+-----------+------------+-------------+------------| | 1 | 1 | 4 | gene | 1 | a b | 0 | mRNA | | 1 | 3 | 6 | exon | 2 | c | 2 | CDS | | 1 | 5 | 9 | exon | 3 | def | 1 | CDS | +--------------+-----------+-----------+------------+-----------+------------+-------------+------------+ Unstranded PyRanges object has 3 rows and 8 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.to_gff3(map_cols={'phase':'the_frame', 'feature':'tag'}) '1\t.\tmRNA\t2\t4\t.\t.\t0\tFeature=gene;Gene=1;function=a b\n1\t.\tCDS\t4\t6\t.\t.\t2\tFeature=exon;Gene=2;function=c\n1\t.\tCDS\t6\t9\t.\t.\t1\tFeature=exon;Gene=3;function=def\n' # How the file would look 1 . mRNA 2 4 . . 0 Feature=gene;Gene=1;function=a b 1 . CDS 4 6 . . 2 Feature=exon;Gene=2;function=c 1 . CDS 6 9 . . 1 Feature=exon;Gene=3;function=def >>> gr.to_gff3(map_cols={'attribute':'Gene'}) '1\t.\tgene\t2\t4\t.\t.\t.\tGene=1\n1\t.\texon\t4\t6\t.\t.\t.\tGene=1\n1\t.\texon\t6\t9\t.\t.\t.\tGene=1\n' # How the file would look 1 . gene 2 4 . . . Gene=1 1 . exon 4 6 . . . Gene=1 1 . exon 6 9 . . . Gene=1 .. py:method:: to_gtf(path=None, compression='infer', chain=False, map_cols=None) Write to Gene Transfer Format. The GTF format consists of a tab-separated file without header. It contains a fixed number of columns, indicated below (names before ":"). For each of these, PyRanges will use the corresponding column (names after ":").
``seqname: Chromosome source: Source type: Feature start: Start end: End score: Score strand: Strand frame: Frame attribute: auto-filled`` Columns which are not mapped to GTF columns are appended as a field in the attribute string (i.e. the last field). :param path: Where to write file. :type path: str, default None, i.e. return string representation. :param compression: Which compression to use. Uses file extension to infer by default. :type compression: {‘infer’, ‘gzip’, ‘bz2’, ‘zip’, ‘xz’, None}, default "infer" :param chain: Whether to return the PyRanges after writing. :type chain: bool, default False :param map_cols: Override mapping between GTF and PyRanges fields for any number of columns. Format: ``{gtf_column : pyranges_column}`` If a mapping is found for the "attribute" column, it is not auto-filled. :type map_cols: dict, default None .. rubric:: Notes Columns that do not exist are added and filled with '.' to represent the missing values. .. seealso:: :obj:`pyranges.read_gtf` read GTF files :obj:`pyranges.to_gff3` write to GFF3 format .. rubric:: Examples >>> d = {"Chromosome": [1] * 3, "Start": [1, 3, 5], "End": [4, 6, 9], "Feature": ["gene", "exon", "exon"]} >>> gr = pr.from_dict(d) >>> gr.to_gtf() # the raw string output '1\t.\tgene\t2\t4\t.\t.\t.\t\n1\t.\texon\t4\t6\t.\t.\t.\t\n1\t.\texon\t6\t9\t.\t.\t.\t\n' # What the file contents look like: 1 . gene 2 4 . . . 1 . exon 4 6 . . . 1 . exon 6 9 . . . >>> gr.name = ["Tim", "Eric", "Endre"] >>> gr.prices = ["Cheap", "Premium", "Fine European"] >>> gr.to_gtf() # the raw string output '1\t.\tgene\t2\t4\t.\t.\t.\tname "Tim"; prices "Cheap";\n1\t.\texon\t4\t6\t.\t.\t.\tname "Eric"; prices "Premium";\n1\t.\texon\t6\t9\t.\t.\t.\tname "Endre"; prices "Fine European";\n' # What the file contents look like: 1 . gene 2 4 . . . name "Tim"; prices "Cheap"; 1 . exon 4 6 . . . name "Eric"; prices "Premium"; 1 . exon 6 9 . . .
name "Endre"; prices "Fine European"; >>> gr.to_gtf(map_cols={"feature":"name", "attribute":"prices"}) # the raw string output '1\t.\tTim\t2\t4\t.\t.\t.\tprices "Cheap";\n1\t.\tEric\t4\t6\t.\t.\t.\tprices "Premium";\n1\t.\tEndre\t6\t9\t.\t.\t.\tprices "Fine European";\n' # What the file contents look like: 1 . Tim 2 4 . . . prices "Cheap"; 1 . Eric 4 6 . . . prices "Premium"; 1 . Endre 6 9 . . . prices "Fine European"; .. py:method:: to_rle(value_col=None, strand=None, rpm=False, nb_cpu=1) Return as RleDict. Create a collection of Rles representing the coverage or another numerical value. :param value_col: Numerical column to create RleDict from. :type value_col: str, default None :param strand: Whether to treat strands separately. :type strand: bool, default None, i.e. auto :param rpm: Normalize by multiplying by `1e6/(number_intervals)`. :type rpm: bool, default False :param nb_cpu: How many CPUs to use. At most one CPU can be used per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :returns: RleDict with coverage or other info from the PyRanges. :rtype: pyrle.RleDict .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [3, 8, 5], ... 'End': [6, 9, 7], 'Score': [0.1, 5, 3.14], 'Strand': ['+', '+', '-']} >>> gr = pr.from_dict(d) >>> gr.to_rle() chr1 + -- +--------+-----+-----+-----+-----+ | Runs | 3 | 3 | 2 | 1 | |--------+-----+-----+-----+-----| | Values | 0.0 | 1.0 | 0.0 | 1.0 | +--------+-----+-----+-----+-----+ Rle of length 9 containing 4 elements (avg. length 2.25) chr1 - -- +--------+-----+-----+ | Runs | 5 | 2 | |--------+-----+-----| | Values | 0.0 | 1.0 | +--------+-----+-----+ Rle of length 7 containing 2 elements (avg. length 3.5) RleDict object with 2 chromosomes/strand pairs.
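To make the runs and values in the output above easier to read, the following plain-Python sketch recomputes the ``chr1 +`` coverage by hand for the two ``+`` intervals of the example (3-6 and 8-9). It is only an illustration and not how pyrle works internally; ``coverage_runs`` and its ``scale`` argument are hypothetical names introduced here. Passing ``scale=1e6 / 3`` mimics the ``rpm`` factor ``1e6/(number_intervals)`` for the three intervals in ``gr``.

```python
from itertools import groupby


def coverage_runs(intervals, scale=1.0):
    """Hypothetical helper: run-length encode the coverage of
    0-based, half-open (start, end) intervals."""
    length = max(end for _, end in intervals)
    depth = [0.0] * length
    # Add `scale` to every position covered by an interval.
    for start, end in intervals:
        for pos in range(start, end):
            depth[pos] += scale
    # Collapse consecutive equal depths into (run length, value) pairs.
    runs, values = [], []
    for value, group in groupby(depth):
        runs.append(len(list(group)))
        values.append(value)
    return runs, values


plus_strand = [(3, 6), (8, 9)]  # the two "+" intervals from the example
runs, values = coverage_runs(plus_strand)
print(runs, values)

# Same coverage scaled by 1e6 / 3, assuming three intervals in total.
rpm_runs, rpm_values = coverage_runs(plus_strand, scale=1e6 / 3)
print(rpm_runs, rpm_values)
```

The unscaled call yields runs ``[3, 3, 2, 1]`` with values ``[0.0, 1.0, 0.0, 1.0]``, which is the ``chr1 +`` table shown above.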
>>> gr.to_rle(value_col="Score") chr1 + -- +--------+-----+-----+-----+-----+ | Runs | 3 | 3 | 2 | 1 | |--------+-----+-----+-----+-----| | Values | 0.0 | 0.1 | 0.0 | 5.0 | +--------+-----+-----+-----+-----+ Rle of length 9 containing 4 elements (avg. length 2.25) chr1 - -- +--------+-----+------+ | Runs | 5 | 2 | |--------+-----+------| | Values | 0.0 | 3.14 | +--------+-----+------+ Rle of length 7 containing 2 elements (avg. length 3.5) RleDict object with 2 chromosomes/strand pairs. >>> gr.to_rle(value_col="Score", strand=False) chr1 +--------+-----+-----+------+------+-----+-----+ | Runs | 3 | 2 | 1 | 1 | 1 | 1 | |--------+-----+-----+------+------+-----+-----| | Values | 0.0 | 0.1 | 3.24 | 3.14 | 0.0 | 5.0 | +--------+-----+-----+------+------+-----+-----+ Rle of length 9 containing 6 elements (avg. length 1.5) Unstranded RleDict object with 1 chromosome. >>> gr.to_rle(rpm=True) chr1 + -- +--------+-----+-------------------+-----+-------------------+ | Runs | 3 | 3 | 2 | 1 | |--------+-----+-------------------+-----+-------------------| | Values | 0.0 | 333333.3333333333 | 0.0 | 333333.3333333333 | +--------+-----+-------------------+-----+-------------------+ Rle of length 9 containing 4 elements (avg. length 2.25) chr1 - -- +--------+-----+-------------------+ | Runs | 5 | 2 | |--------+-----+-------------------| | Values | 0.0 | 333333.3333333333 | +--------+-----+-------------------+ Rle of length 7 containing 2 elements (avg. length 3.5) RleDict object with 2 chromosomes/strand pairs. .. py:method:: unstrand() Remove strand. .. note:: Removes Strand column even if PyRanges is not stranded. .. seealso:: :obj:`PyRanges.stranded` whether PyRanges contains valid strand info. .. rubric:: Examples >>> d = {'Chromosome': ['chr1', 'chr1'], 'Start': [1, 6], ... 
'End': [5, 8], 'Strand': ['+', '-']} >>> gr = pr.from_dict(d) >>> gr +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 1 | 5 | + | | chr1 | 6 | 8 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 2 rows and 4 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr.unstrand() +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 1 | 5 | | chr1 | 6 | 8 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. .. py:method:: values() Return the underlying DataFrames. .. py:method:: window(window_size, strand=None) Return overlapping genomic windows. Windows of length `window_size` are returned. :param window_size: Length of the windows. :type window_size: int :param strand: Whether to do operations on chromosome/strand pairs or chromosomes. If None, will use chromosome/strand pairs if the PyRanges is stranded. :type strand: bool, default None, i.e. auto :param nb_cpu: How many cpus to use. Can at most use 1 per chromosome or chromosome/strand tuple. Will only lead to speedups on large datasets. :type nb_cpu: int, default 1 :param \*\*kwargs: Additional keyword arguments to pass as keyword arguments to `f` :returns: Tiled PyRanges. :rtype: PyRanges .. seealso:: :obj:`pyranges.PyRanges.tile` divide intervals into adjacent tiles. .. 
rubric:: Examples >>> import pyranges as pr >>> gr = pr.from_dict({"Chromosome": [1], "Start": [895], "End": [1259]}) >>> gr +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 895 | 1259 | +--------------+-----------+-----------+ Unstranded PyRanges object has 1 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr.window(200) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | 1 | 895 | 1095 | | 1 | 1095 | 1259 | +--------------+-----------+-----------+ Unstranded PyRanges object has 2 rows and 3 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome. >>> gr2 = pr.data.ensembl_gtf()[["Feature", "gene_name"]] >>> gr2 +--------------+--------------+-----------+-----------+--------------+-------------+ | Chromosome | Feature | Start | End | Strand | gene_name | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-------------| | 1 | gene | 11868 | 14409 | + | DDX11L1 | | 1 | transcript | 11868 | 14409 | + | DDX11L1 | | 1 | exon | 11868 | 12227 | + | DDX11L1 | | 1 | exon | 12612 | 12721 | + | DDX11L1 | | ... | ... | ... | ... | ... | ... | | 1 | gene | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | transcript | 1173055 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1179364 | 1179555 | - | TTLL10-AS1 | | 1 | exon | 1173055 | 1176396 | - | TTLL10-AS1 | +--------------+--------------+-----------+-----------+--------------+-------------+ Stranded PyRanges object has 2,446 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. 
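The arithmetic behind the ``gr.window(200)`` result above can be sketched in plain Python. This is an illustration under the assumption that each interval is cut into consecutive windows from its Start, with the last window truncated at End; ``split_into_windows`` is a hypothetical helper, not part of pyranges.

```python
def split_into_windows(start, end, window_size):
    """Hypothetical helper: cut one 0-based, half-open interval
    into consecutive windows of at most `window_size` bases."""
    windows = []
    for window_start in range(start, end, window_size):
        # The final window is shortened so it never runs past `end`.
        windows.append((window_start, min(window_start + window_size, end)))
    return windows


windows = split_into_windows(895, 1259, 200)
print(windows)
```

For the interval 895-1259 this gives ``(895, 1095)`` and ``(1095, 1259)``, the same two rows as the ``gr.window(200)`` output.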
>>> gr2 = pr.data.ensembl_gtf()[["Feature", "gene_name"]] >>> gr2.window(1000) +--------------+--------------+-----------+-----------+--------------+-------------+ | Chromosome | Feature | Start | End | Strand | gene_name | | (category) | (category) | (int64) | (int64) | (category) | (object) | |--------------+--------------+-----------+-----------+--------------+-------------| | 1 | gene | 11868 | 12868 | + | DDX11L1 | | 1 | gene | 12868 | 13868 | + | DDX11L1 | | 1 | gene | 13868 | 14409 | + | DDX11L1 | | 1 | transcript | 11868 | 12868 | + | DDX11L1 | | ... | ... | ... | ... | ... | ... | | 1 | exon | 1173055 | 1174055 | - | TTLL10-AS1 | | 1 | exon | 1174055 | 1175055 | - | TTLL10-AS1 | | 1 | exon | 1175055 | 1176055 | - | TTLL10-AS1 | | 1 | exon | 1176055 | 1176396 | - | TTLL10-AS1 | +--------------+--------------+-----------+-----------+--------------+-------------+ Stranded PyRanges object has 7,516 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:method:: __getstate__() Helper for pickle. .. py:method:: __setstate__(d) .. py:function:: read_bam(f, sparse=True, as_df=False, mapq=0, required_flag=0, filter_flag=1540) Return bam file as PyRanges. :param f: Path to bam file :type f: str :param sparse: Whether to return only the Chromosome, Start, End, Strand and Flag columns. :type sparse: bool, default True :param as_df: Whether to return as pandas DataFrame instead of PyRanges. :type as_df: bool, default False :param mapq: Minimum mapping quality score. :type mapq: int, default 0 :param required_flag: Flags which must be present for the interval to be read. :type required_flag: int, default 0 :param filter_flag: Ignore reads with these flags. Default 1540, which skips reads that are unmapped, that failed vendor or platform quality checks, or that are PCR or optical duplicates. :type filter_flag: int, default 1540 .. rubric:: Notes This functionality requires the library `bamread`.
It can be installed with `pip install bamread` or `conda install -c bioconda bamread`. .. rubric:: Examples >>> path = pr.get_example_path("control.bam") >>> pr.read_bam(path).sort() +--------------+-----------+-----------+--------------+------------+ | Chromosome | Start | End | Strand | Flag | | (category) | (int64) | (int64) | (category) | (uint16) | |--------------+-----------+-----------+--------------+------------| | chr1 | 1041102 | 1041127 | + | 0 | | chr1 | 2129359 | 2129384 | + | 0 | | chr1 | 2239108 | 2239133 | + | 0 | | chr1 | 2318805 | 2318830 | + | 0 | | ... | ... | ... | ... | ... | | chrY | 10632456 | 10632481 | - | 16 | | chrY | 11918814 | 11918839 | - | 16 | | chrY | 11936866 | 11936891 | - | 16 | | chrY | 57402214 | 57402239 | - | 16 | +--------------+-----------+-----------+--------------+------------+ Stranded PyRanges object has 10,000 rows and 5 columns from 25 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:function:: read_bed(f, as_df=False, nrows=None) Return bed file as PyRanges. This is a reader for files that follow the bed format. They can have from 3-12 columns which will be named like so: Chromosome Start End Name Score Strand ThickStart ThickEnd ItemRGB BlockCount BlockSizes BlockStarts :param f: Path to bed file :type f: str :param as_df: Whether to return as pandas DataFrame instead of PyRanges. :type as_df: bool, default False :param nrows: Number of rows to return. :type nrows: int, default None .. rubric:: Notes If you just want to create a PyRanges from a tab-delimited bed-like file, use `pr.PyRanges(pandas.read_table(f))` instead. .. 
rubric:: Examples >>> path = pr.get_example_path("aorta.bed") >>> pr.read_bed(path, nrows=5) +--------------+-----------+-----------+------------+-----------+--------------+ | Chromosome | Start | End | Name | Score | Strand | | (category) | (int64) | (int64) | (object) | (int64) | (category) | |--------------+-----------+-----------+------------+-----------+--------------| | chr1 | 9939 | 10138 | H3K27me3 | 7 | + | | chr1 | 9953 | 10152 | H3K27me3 | 5 | + | | chr1 | 9916 | 10115 | H3K27me3 | 5 | - | | chr1 | 9951 | 10150 | H3K27me3 | 8 | - | | chr1 | 9978 | 10177 | H3K27me3 | 7 | - | +--------------+-----------+-----------+------------+-----------+--------------+ Stranded PyRanges object has 5 rows and 6 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> pr.read_bed(path, as_df=True, nrows=5) Chromosome Start End Name Score Strand 0 chr1 9916 10115 H3K27me3 5 - 1 chr1 9939 10138 H3K27me3 7 + 2 chr1 9951 10150 H3K27me3 8 - 3 chr1 9953 10152 H3K27me3 5 + 4 chr1 9978 10177 H3K27me3 7 - .. py:function:: read_gff3(f, full=True, annotation=None, as_df=False, nrows=None) Read files in the General Feature Format. :param f: Path to GFF file. :type f: str :param full: Whether to read and interpret the annotation column. :type full: bool, default True :param as_df: Whether to return as pandas DataFrame instead of PyRanges. :type as_df: bool, default False :param nrows: Number of rows to read. Default None, i.e. all. :type nrows: int, default None .. rubric:: Notes The gff3 format encodes both Start and End as 1-based included. PyRanges (and also the DF returned by this function, if as_df=True), instead encodes intervals as 0-based, Start included and End excluded. .. seealso:: :obj:`pyranges.read_gtf` read files in the Gene Transfer Format .. py:function:: read_gtf(f, full=True, as_df=False, nrows=None, duplicate_attr=False, rename_attr=False, ignore_bad: bool = False) Read files in the Gene Transfer Format. 
:param f: Path to GTF file. :type f: str :param full: Whether to read and interpret the annotation column. :type full: bool, default True :param as_df: Whether to return as pandas DataFrame instead of PyRanges. :type as_df: bool, default False :param nrows: Number of rows to read. Default None, i.e. all. :type nrows: int, default None :param duplicate_attr: Whether to handle (potential) duplicate attributes or just keep last one. :type duplicate_attr: bool, default False :param rename_attr: Whether to rename (potential) attributes with reserved column names with the suffix '_attr' or to just raise an error (default) :type rename_attr: bool, default False :param ignore_bad: Whether to ignore bad lines or raise an error. :type ignore_bad: bool, default False .. note:: The GTF format encodes both Start and End as 1-based included. PyRanges (and also the DF returned by this function, if as_df=True), instead encodes intervals as 0-based, Start included and End excluded. .. seealso:: :obj:`pyranges.read_gff3` read files in the General Feature Format .. rubric:: Examples >>> path = pr.get_example_path("ensembl.gtf") >>> gr = pr.read_gtf(path) >>> # +--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------+ >>> # | Chromosome | Source | Feature | Start | End | Score | Strand | Frame | gene_id | gene_version | +18 | >>> # | (category) | (object) | (category) | (int64) | (int64) | (object) | (category) | (object) | (object) | (object) | ... | >>> # |--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------| >>> # | 1 | havana | gene | 11868 | 14409 | . | + | . | ENSG00000223972 | 5 | ... | >>> # | 1 | havana | transcript | 11868 | 14409 | . | + | . | ENSG00000223972 | 5 | ... | >>> # | 1 | havana | exon | 11868 | 12227 | . | + | . | ENSG00000223972 | 5 | ... 
| >>> # | 1 | havana | exon | 12612 | 12721 | . | + | . | ENSG00000223972 | 5 | ... | >>> # | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | >>> # | 1 | ensembl | transcript | 120724 | 133723 | . | - | . | ENSG00000238009 | 6 | ... | >>> # | 1 | ensembl | exon | 133373 | 133723 | . | - | . | ENSG00000238009 | 6 | ... | >>> # | 1 | ensembl | exon | 129054 | 129223 | . | - | . | ENSG00000238009 | 6 | ... | >>> # | 1 | ensembl | exon | 120873 | 120932 | . | - | . | ENSG00000238009 | 6 | ... | >>> # +--------------+------------+--------------+-----------+-----------+------------+--------------+------------+-----------------+----------------+-------+ >>> # Stranded PyRanges object has 95 rows and 28 columns from 1 chromosomes. >>> # For printing, the PyRanges was sorted on Chromosome and Strand. >>> # 18 hidden columns: gene_name, gene_source, gene_biotype, transcript_id, transcript_version, transcript_name, transcript_source, transcript_biotype, tag, transcript_support_level, ... (+ 8 more.) .. py:function:: from_dict(d, int64=False) Create a PyRanges from dict. :param d: Dict with data. :type d: dict of array-like :param int64: Whether to use 64-bit integers for starts and ends. :type int64: bool, default False. .. warning:: On versions of Python prior to 3.6, this function returns a PyRanges with the columns in arbitrary order. .. seealso:: :obj:`pyranges.from_string` create a PyRanges from a multiline string. .. 
rubric:: Examples >>> d = {"Chromosome": [1, 1, 2], "Start": [1, 2, 3], "End": [4, 9, 12], "Strand": ["+", "+", "-"], "ArbitraryValue": ["a", "b", "c"]} >>> pr.from_dict(d) +--------------+-----------+-----------+--------------+------------------+ | Chromosome | Start | End | Strand | ArbitraryValue | | (category) | (int64) | (int64) | (category) | (object) | |--------------+-----------+-----------+--------------+------------------| | 1 | 1 | 4 | + | a | | 1 | 2 | 9 | + | b | | 2 | 3 | 12 | - | c | +--------------+-----------+-----------+--------------+------------------+ Stranded PyRanges object has 3 rows and 5 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:function:: from_string(s, int64=False) Create a PyRanges from multiline string. :param s: String with data. :type s: str :param int64: Whether to use 64-bit integers for starts and ends. :type int64: bool, default False. .. seealso:: :obj:`pyranges.from_dict` create a PyRanges from a dictionary. .. rubric:: Examples >>> s = '''Chromosome Start End Strand ... chr1 246719402 246719502 + ... chr5 15400908 15401008 + ... chr9 68366534 68366634 + ... chr14 79220091 79220191 + ... chr14 103456471 103456571 -''' >>> pr.from_string(s) +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | chr1 | 246719402 | 246719502 | + | | chr5 | 15400908 | 15401008 | + | | chr9 | 68366534 | 68366634 | + | | chr14 | 79220091 | 79220191 | + | | chr14 | 103456471 | 103456571 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 5 rows and 4 columns from 4 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. .. py:function:: itergrs(prs, strand=None, keys=False) Iterate over multiple PyRanges at once. :param prs: PyRanges to iterate over. 
:type prs: list of PyRanges :param strand: Whether to iterate over strands. If True, all PyRanges must be stranded. :type strand: bool, default None, i.e. auto :param keys: Return tuple with key and value from iterator. :type keys: bool, default False .. rubric:: Examples >>> d1 = {"Chromosome": [1, 1, 2], "Start": [1, 2, 3], "End": [4, 9, 12], "Strand": ["+", "+", "-"]} >>> d2 = {"Chromosome": [2, 3, 3], "Start": [5, 9, 21], "End": [81, 42, 25], "Strand": ["-", "+", "-"]} >>> gr1, gr2 = pr.from_dict(d1), pr.from_dict(d2) >>> gr1 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | 1 | 1 | 4 | + | | 1 | 2 | 9 | + | | 2 | 3 | 12 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> gr2 +--------------+-----------+-----------+--------------+ | Chromosome | Start | End | Strand | | (category) | (int64) | (int64) | (category) | |--------------+-----------+-----------+--------------| | 2 | 5 | 81 | - | | 3 | 9 | 42 | + | | 3 | 21 | 25 | - | +--------------+-----------+-----------+--------------+ Stranded PyRanges object has 3 rows and 4 columns from 2 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> ranges = [gr1, gr2] >>> for key, dfs in pr.itergrs(ranges, keys=True): ... print("-----------\n" + str(key) + "\n-----------") ... for df in dfs: ... 
print(df) ----------- ('1', '+') ----------- Chromosome Start End Strand 0 1 1 4 + 1 1 2 9 + Empty DataFrame Columns: [Chromosome, Start, End, Strand] Index: [] ----------- ('2', '-') ----------- Chromosome Start End Strand 2 2 3 12 - Chromosome Start End Strand 0 2 5 81 - ----------- ('3', '+') ----------- Empty DataFrame Columns: [Chromosome, Start, End, Strand] Index: [] Chromosome Start End Strand 1 3 9 42 + ----------- ('3', '-') ----------- Empty DataFrame Columns: [Chromosome, Start, End, Strand] Index: [] Chromosome Start End Strand 2 3 21 25 - .. py:function:: random(n=1000, length=100, chromsizes=None, strand=True, int64=False, seed=None) Return PyRanges with random intervals. :param n: Number of intervals. :type n: int, default 1000 :param length: Length of intervals. :type length: int, default 100 :param chromsizes: Draw intervals from within these bounds. :type chromsizes: dict or DataFrame, default None, i.e. use "hg19" :param strand: Data should have strand. :type strand: bool, default True :param int64: Use int64 to represent Start and End. :type int64: bool, default False .. rubric:: Examples # >>> pr.random() # +--------------+-----------+-----------+--------------+ # | Chromosome | Start | End | Strand | # | (category) | (int64) | (int64) | (category) | # |--------------+-----------+-----------+--------------| # | chr1 | 216128004 | 216128104 | + | # | chr1 | 114387955 | 114388055 | + | # | chr1 | 67597551 | 67597651 | + | # | chr1 | 26306616 | 26306716 | + | # | ... | ... | ... | ... | # | chrY | 20811459 | 20811559 | - | # | chrY | 12221362 | 12221462 | - | # | chrY | 8578041 | 8578141 | - | # | chrY | 43259695 | 43259795 | - | # +--------------+-----------+-----------+--------------+ # Stranded PyRanges object has 1,000 rows and 4 columns from 24 chromosomes. # For printing, the PyRanges was sorted on Chromosome and Strand. 
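To show how ``n``, ``length`` and ``chromsizes`` interact, here is a standard-library sketch that draws fixed-length intervals inside chromosome bounds. It is not the pyranges implementation: ``random_intervals`` and ``toy_sizes`` are hypothetical, and picking chromosomes with probability proportional to their size is an assumption made for this example.

```python
import random


def random_intervals(n, length, chromsizes, seed=None):
    """Hypothetical helper: draw `n` intervals of `length` bases that
    lie fully inside the chromosomes described by `chromsizes`."""
    rng = random.Random(seed)
    names = list(chromsizes)
    # Assumption: larger chromosomes are chosen proportionally more often.
    weights = [chromsizes[name] for name in names]
    intervals = []
    for chrom in rng.choices(names, weights=weights, k=n):
        # Pick Start so that Start + length never exceeds the chromosome size.
        start = rng.randrange(0, chromsizes[chrom] - length + 1)
        strand = rng.choice("+-")
        intervals.append((chrom, start, start + length, strand))
    return intervals


toy_sizes = {"chr1": 1000, "chr2": 500}  # made-up sizes for illustration
intervals = random_intervals(n=5, length=100, chromsizes=toy_sizes, seed=0)
for row in intervals:
    print(row)
```

Every drawn interval has End equal to Start plus ``length``, which is why the Start and End columns in the output above always differ by 100.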
To have random interval lengths: # >>> gr = pr.random(length=1) # >>> gr.End += np.random.randint(int(1e5), size=len(gr)) # >>> gr.Length = gr.lengths() # >>> gr # +--------------+-----------+-----------+--------------+-----------+ # | Chromosome | Start | End | Strand | Length | # | (category) | (int64) | (int64) | (category) | (int64) | # |--------------+-----------+-----------+--------------+-----------| # | chr1 | 203654331 | 203695380 | + | 41049 | # | chr1 | 46918271 | 46978908 | + | 60637 | # | chr1 | 97355021 | 97391587 | + | 36566 | # | chr1 | 57284999 | 57323542 | + | 38543 | # | ... | ... | ... | ... | ... | # | chrY | 31665821 | 31692660 | - | 26839 | # | chrY | 20236607 | 20253473 | - | 16866 | # | chrY | 33255377 | 33315933 | - | 60556 | # | chrY | 31182964 | 31205467 | - | 22503 | # +--------------+-----------+-----------+--------------+-----------+ # Stranded PyRanges object has 1,000 rows and 5 columns from 24 chromosomes. # For printing, the PyRanges was sorted on Chromosome and Strand. .. py:function:: to_bigwig(gr, path, chromosome_sizes) Write df to bigwig. Must contain the columns Chromosome, Start, End and Score. All others are ignored. :param path: Where to write bigwig. :type path: str :param chromosome_sizes: If dict: map of chromosome names to chromosome length. :type chromosome_sizes: PyRanges or dict .. rubric:: Examples Extended example with how to prepare your data for writing bigwigs: >>> d = {'Chromosome': ['chr1', 'chr1', 'chr1'], 'Start': [1, 4, 6], ... 'End': [7, 8, 10], 'Strand': ['+', '-', '-'], ... 'Value': [10, 20, 30]} >>> import pyranges as pr >>> gr = pr.from_dict(d) >>> hg19 = pr.data.chromsizes() >>> print(hg19) +--------------+-----------+-----------+ | Chromosome | Start | End | | (category) | (int64) | (int64) | |--------------+-----------+-----------| | chr1 | 0 | 249250621 | | chr2 | 0 | 243199373 | | chr3 | 0 | 198022430 | | chr4 | 0 | 191154276 | | ... | ... | ... 
| | chr22 | 0 | 51304566 | | chrM | 0 | 16571 | | chrX | 0 | 155270560 | | chrY | 0 | 59373566 | +--------------+-----------+-----------+ Unstranded PyRanges object has 25 rows and 3 columns from 25 chromosomes. For printing, the PyRanges was sorted on Chromosome. Overlapping intervals are invalid in bigwigs: >>> to_bigwig(gr, "outpath.bw", hg19) Traceback (most recent call last): ... AssertionError: Can only write one strand at a time. Use an unstranded PyRanges or subset on strand first. >>> to_bigwig(gr["-"], "outpath.bw", hg19) Traceback (most recent call last): ... AssertionError: Intervals must not overlap. >>> gr +--------------+-----------+-----------+--------------+-----------+ | Chromosome | Start | End | Strand | Value | | (category) | (int64) | (int64) | (category) | (int64) | |--------------+-----------+-----------+--------------+-----------| | chr1 | 1 | 7 | + | 10 | | chr1 | 4 | 8 | - | 20 | | chr1 | 6 | 10 | - | 30 | +--------------+-----------+-----------+--------------+-----------+ Stranded PyRanges object has 3 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> value = gr.to_rle(rpm=False, value_col="Value") >>> value chr1 + -- +--------+-----+------+ | Runs | 1 | 6 | |--------+-----+------| | Values | 0.0 | 10.0 | +--------+-----+------+ Rle of length 7 containing 2 elements (avg. length 3.5) chr1 - -- +--------+-----+------+------+------+ | Runs | 4 | 2 | 2 | 2 | |--------+-----+------+------+------| | Values | 0.0 | 20.0 | 50.0 | 30.0 | +--------+-----+------+------+------+ Rle of length 10 containing 4 elements (avg. length 2.5) RleDict object with 2 chromosomes/strand pairs. >>> raw = gr.to_rle(rpm=False) >>> raw chr1 + -- +--------+-----+-----+ | Runs | 1 | 6 | |--------+-----+-----| | Values | 0.0 | 1.0 | +--------+-----+-----+ Rle of length 7 containing 2 elements (avg. 
length 3.5) chr1 - -- +--------+-----+-----+-----+-----+ | Runs | 4 | 2 | 2 | 2 | |--------+-----+-----+-----+-----| | Values | 0.0 | 1.0 | 2.0 | 1.0 | +--------+-----+-----+-----+-----+ Rle of length 10 containing 4 elements (avg. length 2.5) RleDict object with 2 chromosomes/strand pairs. >>> result = (value / raw).apply_values(np.log10) >>> result chr1 + -- +--------+-----+-----+ | Runs | 1 | 6 | |--------+-----+-----| | Values | nan | 1.0 | +--------+-----+-----+ Rle of length 7 containing 2 elements (avg. length 3.5) chr1 - -- +--------+-----+--------------------+--------------------+--------------------+ | Runs | 4 | 2 | 2 | 2 | |--------+-----+--------------------+--------------------+--------------------| | Values | nan | 1.3010300397872925 | 1.3979400396347046 | 1.4771212339401245 | +--------+-----+--------------------+--------------------+--------------------+ Rle of length 10 containing 4 elements (avg. length 2.5) RleDict object with 2 chromosomes/strand pairs. >>> out = result.numbers_only().to_ranges() >>> out +--------------+-----------+-----------+-------------+--------------+ | Chromosome | Start | End | Score | Strand | | (category) | (int64) | (int64) | (float64) | (category) | |--------------+-----------+-----------+-------------+--------------| | chr1 | 1 | 7 | 1 | + | | chr1 | 4 | 6 | 1.30103 | - | | chr1 | 6 | 8 | 1.39794 | - | | chr1 | 8 | 10 | 1.47712 | - | +--------------+-----------+-----------+-------------+--------------+ Stranded PyRanges object has 4 rows and 5 columns from 1 chromosomes. For printing, the PyRanges was sorted on Chromosome and Strand. >>> to_bigwig(out["-"], "deleteme_reverse.bw", hg19) >>> to_bigwig(out["+"], "deleteme_forward.bw", hg19) .. py:function:: version_info()