NAME

    Bio::Cigar - Parse CIGAR strings and translate coordinates to/from
    reference/query

SYNOPSIS

        use 5.014;
        use Bio::Cigar;
        my $cigar = Bio::Cigar->new("2M1D1M1I4M");
        say "Query length is ", $cigar->query_length;
        say "Reference length is ", $cigar->reference_length;
    
        my ($qpos, $op) = $cigar->rpos_to_qpos(3);
        say "Alignment operation at reference position 3 is $op";
    
        my $query = "GCAAATGC";
        my $ref   = "AAAAGCAAATGC";
        my $aln   = $cigar->align($query, $ref, 5);  # align query to pos 5 of ref
        say foreach @$aln;

DESCRIPTION

    Bio::Cigar is a small library to parse CIGAR strings ("Compact
    Idiosyncratic Gapped Alignment Report"), such as those used in the SAM
    file format. CIGAR strings are a run-length encoding which minimally
    describes the alignment of a query sequence to an (often longer)
    reference sequence.

    Parsing follows the SAM v1 spec
    <http://samtools.github.io/hts-specs/SAMv1.pdf> for the CIGAR column.

    Parsed strings are represented by an object that provides a few utility
    methods.

ATTRIBUTES

    All attributes are read-only.

 string

    The CIGAR string for this object.

 reference_length

    The length of the reference sequence segment aligned with the query
    sequence described by the CIGAR string.

 query_length

    The length of the query sequence described by the CIGAR string.

 ops

    An arrayref of [length, operation] tuples describing the CIGAR string.
    Lengths are integers, possible operations are below.

  CIGAR operations

    The CIGAR operations are given in the following table, taken from the
    SAM v1 spec:

        Op  Description
        ‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
        M   alignment match (can be a sequence match or mismatch)
        I   insertion to the reference
        D   deletion from the reference
        N   skipped region from the reference
        S   soft clipping (clipped sequences present in SEQ)
        H   hard clipping (clipped sequences NOT present in SEQ)
        P   padding (silent deletion from padded reference)
        =   sequence match
        X   sequence mismatch
    
        • H can only be present as the first and/or last operation.
        • S may only have H operations between them and the ends of the string.
        • For mRNA-to-genome alignment, an N operation represents an intron.
          For other types of alignments, the interpretation of N is not defined.
        • Sum of the lengths of the M/I/S/=/X operations shall equal the length of SEQ.

CONSTRUCTOR

 new

    Takes a CIGAR string as the sole argument and returns a new Bio::Cigar
    object.

METHODS

 rpos_to_qpos

    Takes a reference position (origin 1, base-numbered) and returns the
    corresponding position (origin 1, base-numbered) on the query sequence.
    Indels affect how the numbering maps from reference to query.

    In list context returns a tuple of [query position, operation at
    position]. Operation is a single-character string. See the table of
    CIGAR operations.

    If the reference position does not map to the query sequence (as with a
    deletion, for example), returns undef or [undef, operation].

 qpos_to_rpos

    Takes a query position (origin 1, base-numbered) and returns the
    corresponding position (origin 1, base-numbered) on the reference
    sequence. Indels affect how the numbering maps from query to reference.

    In list context returns a tuple of [references position, operation at
    position]. Operation is a single-character string. See the table of
    CIGAR operations.

    If the query position does not map to the reference sequence (as with
    an insertion, for example), returns undef or [undef, operation].

 op_at_rpos

    Takes a reference position and returns the operation at that position.
    Simply a shortcut for calling "rpos_to_qpos" in list context and
    discarding the first return value.

 op_at_qpos

    Takes a query position and returns the operation at that position.
    Simply a shortcut for calling "qpos_to_rpos" in list context and
    discarding the first return value.

 reversed

    Returns a new Bio::Cigar object with a CIGAR string that's the reverse
    of this one, i.e. the last operation becomes the first, the
    second-to-last the second, etc. until the first operation becomes the
    last.

 align($query, $reference, $start_pos=1, $reversed=0)

    Takes a query sequence and a reference sequence and aligns them
    according to the CIGAR string, using gap characters (-) for indels and
    spaces for soft clipping. This is pure string manipulation and as such
    the match and mismatch operators (= and X) are assumed to be correct
    for the given input sequences and not verified. Returns an array ref of
    [query seq, ref seq].

    Optionally, the leftmost reference position (origin 1) can be passed,
    i.e. the query is aligned starting at that position.

    When $reversed is given a true value, the reverse complement of the
    passed query sequence is used to generate the alignment. Only the IUPAC
    nucleotide codes ATCGU are currently supported for reverse
    complementation.

AUTHOR

    Thomas Sibley <trsibley@uw.edu>

    Felix Kühnl <felix@bioinf.uni-leipzig.de>

COPYRIGHT

    Copyright 2014- Mullins Lab, Department of Microbiology, University of
    Washington.

LICENSE

    This library is free software; you can redistribute it and/or modify it
    under the GNU General Public License, version 2.

SEE ALSO

    SAMv1 spec <http://samtools.github.io/hts-specs/SAMv1.pdf>