NAME
    CPAN::Visitor - Generic traversal of distributions in a CPAN repository

VERSION
    version 0.005

SYNOPSIS
        use CPAN::Visitor;
        my $visitor = CPAN::Visitor->new( cpan => "/path/to/cpan" );

        # Prepare to visit all distributions
        $visitor->select();

        # Or a subset of distributions
        $visitor->select(
          subtrees => [ 'D/DA', 'A/AD' ], # relative to authors/id/
          exclude => qr{/Acme-},          # No Acme- dists
          match => qr{/Test-}             # Only Test- dists
        );

        # Action is specified via a callback
        $visitor->iterate(
          visit => sub {
            my $job = shift;
            print $job->{distfile} if -f 'Build.PL'
          }
        );

        # Or start with a list of files
        $visitor = CPAN::Visitor->new(
          cpan => "/path/to/cpan",
          files => \@distfiles,     # e.g. ANDK/CPAN-1.94.tar.gz
        );
        $visitor->iterate( visit => \&callback );

        # Iterate in parallel
        $visitor->iterate( visit => \&callback, jobs => 5 );

DESCRIPTION
    A very generic, callback-driven program to iterate over a CPAN
    repository.

    Needs better documentation and tests, but is provided for others to
    examine, use or contribute to.

USAGE
  new
      my $visitor = CPAN::Visitor->new( @args );

    Object attributes include:

    *   "cpan" — path to CPAN or mini CPAN repository. Required.

    *   "quiet" — whether warnings should be silenced (e.g. from
        extraction). Optional.

    *   "stash" — hash-ref of user-data to be made available during
        iteration. Optional.

    *   "files" — array-ref with a pre-selection of of distribution files.
        These must be in AUTHOR/NAME.suffix format. Optional.

  select
      $visitor->select( @args );

    Valid arguments include:

    *   "subtrees" — path or array-ref of paths to search. These must be
        relative to the 'authors/id/' directory within a CPAN repo. If
        given, only files within those subtrees will be considered. If not
        specified, the entire 'authors/id' tree is searched.

    *   "exclude" — qr() or array-ref of qr() patterns. If a path matches
        *any* pattern, it is excluded

    *   "match" — qr() or array-ref of qr() patterns. If an array-ref is
        provided, only paths that match *all* patterns are included

    *   all_files — boolean that determines whether all files or only files
        that have a distribution archive suffix are selected. Default is
        false.

    *   append — boolean that determines whether the selected files should
        be appended to previously selected files. The default is false,
        which replaces any previous selection

    The "select" method returns a count of files selected.

  iterate
     $visitor->iterate( @args );

    Valid arguments include:

    *   "jobs" — non-negative integer specifying the maximum number of
        forked processes. Defaults to none.

    *   "check" — code reference callback

    *   "start" — code reference callback

    *   "extract" — code reference callback

    *   "enter" — code reference callback

    *   "visit" — code reference callback

    *   "leave" — code reference callback

    *   "finish" — code reference callback

    See "ACTION CALLBACKS" for more. Generally, you only need to provide the
    "visit" callback, which is called from inside the unpacked distribution
    directory.

    The "iterate" method always returns true.

ACTION CALLBACKS
    Each selected distribution is processed with a series of callback
    functions. These are each passed a hash-ref with information about the
    particular distribution being processed.

      sub _my_visit {
        my $job = shift;
        # do stuff
      }

    The job hash-ref is initialized with the following fields:

    *   "distfile" — the unique, short CPAN distfile name, e.g.
        DAGOLDEN/CPAN-Visitor-0.001.tar.gz

    *   "distpath" — the absolute path the distribution archive, e.g.
        /my/cpan/authors/id/D/DA/DAGOLDEN/CPAN-Visitor-0.001.tar.gz

    *   "tempdir" — a File::Temp directory object for extraction or other
        things

    *   "stash" — the 'stash' hashref from the Visitor object

    *   "quiet" — the 'quiet' flag from the Visitor object

    *   "result" — an empty hashref to start; the return values from each
        action are added and may be referenced by subsequent actions

    The "result" field is used to accumulate the return values from action
    callbacks. For example, the return value from the default 'extract'
    action is the unpacked distribution directory:

      $job->{result}{extract} # distribution directory path

    You do not need to store the results yourself — the "iterate" method
    takes care of it for you.

    Callbacks occur in the following order. Some callbacks skip further
    processing if the return value is false.

    *   "check" — determines whether the distribution should be processed;
        goes to next file if false; default is always true

    *   "start" — used for any setup, logging, etc; default does nothing

    *   "extract" — operate on the tarball to prepare for visiting; skips to
        finish action if it returns a false value; the default extracts a
        distribution into a temp directory and returns the path to the
        extracted directory; if the "stash" has a true value for
        "prefer_bin", binary tar, etc. will be preferred. This is faster,
        but less portable.

    *   "enter" — skips to the finish action if it returns false; default
        takes the result of extract, chdir's into it, and returns the
        original directory; if the extract result is missing the +x
        permissions, this will attempt to add it before calling chdir.

    *   "visit" — examine the distribution or otherwise do stuff; the
        default does nothing;

    *   "leave" — default returns to the original directory (the result of
        enter)

    *   "finish" — any teardown processing, logging, etc.

    These allow complete customization of the iteration process. For
    example, one could do something like this:

    *   replace the default "extract" callback with one that returns an
        arrayref of distribution files without actually unpacking it into a
        physical directory

    *   replace the default "enter" callback with one that does nothing but
        return a true value; replace the default "leave" callback likewise

    *   have the "visit" callback get the "$job->{result}{extract}" listing
        and examine it for the presence of certain files

    This could potentially speed up iteration if only the file names within
    the distribution are of interest and not the contents of the actual
    files.

SEE ALSO
    *   App::CPAN::Mini::Visit

    *   CPAN::Mini::Visit

SUPPORT
  Bugs / Feature Requests
    Please report any bugs or feature requests through the issue tracker at
    <https://github.com/dagolden/CPAN-Visitor/issues>. You will be notified
    automatically of any progress on your issue.

  Source Code
    This is open source software. The code repository is available for
    public review and contribution under the terms of the license.

    <https://github.com/dagolden/CPAN-Visitor>

      git clone https://github.com/dagolden/CPAN-Visitor.git

AUTHOR
    David Golden <dagolden@cpan.org>

COPYRIGHT AND LICENSE
    This software is Copyright (c) 2010 by David Golden.

    This is free software, licensed under:

      The Apache License, Version 2.0, January 2004