HTML::TagParser - Yet another HTML tag parser by pure Perl

    Parse an HTML file and find the value of its <title> element.

        my $html = HTML::TagParser->new( "index-j.html" );
        my $elem = $html->getElementsByTagName( "title" );
        print "<title>", $elem->innerText(), "</title>\n" if ref $elem;

    Parse an HTML string and find the action attribute value of its first <form> element.

        my $html = HTML::TagParser->new( '<html><form action="hoge.cgi"></form></html>' );
        my $elem = $html->getElementsByTagName( "form" );
        print "<form action=\"", $elem->getAttribute("action"), "\">\n" if ref $elem;

    Fetch an HTML file via HTTP and display all of its <a> elements and their attributes.

        my $html = HTML::TagParser->new( "" );
        my @list = $html->getElementsByTagName( "a" );
        foreach my $elem ( @list ) {
            my $tagname = $elem->tagName;
            my $attr = $elem->attributes;
            my $text = $elem->innerText;
            print "<$tagname";
            foreach my $key ( sort keys %$attr ) {
                print " $key=\"$attr->{$key}\"";
            }
            if ( $text eq "" ) {
                print " />\n";
            } else {
                print ">$text</$tagname>\n";
            }
        }

    HTML::TagParser is a pure Perl implementation for parsing HTML files.
    It provides DOM-like methods. It is not strict about XHTML syntax,
    because many HTML pages are not strict themselves: many pages use <br>
    instead of <br/> and contain <p> elements that are never closed.
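    A minimal sketch of this leniency, assuming HTML::TagParser is installed
    from CPAN. The markup is invented for illustration: it uses <br> without
    a slash and leaves the <p> element unclosed, and the parser still finds
    the elements.

```perl
use strict;
use warnings;
use HTML::TagParser;

# Sloppy, non-XHTML markup: <br> without a slash and a <p> that is never closed.
my $source = '<html><body><p>first line<br>second line<br>third line</body></html>';
my $html   = HTML::TagParser->new($source);

# In list context, getElementsByTagName() returns every matching element.
my @breaks = $html->getElementsByTagName("br");
print scalar(@breaks), " <br> elements found\n";

# The unclosed <p> is still reported as an element.
my $para = $html->getElementsByTagName("p");
print "found a <p> element\n" if ref $para;
```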

  $html = HTML::TagParser->new();
    This method constructs an empty instance of the "HTML::TagParser" class.

  $html = HTML::TagParser->new( $url );
    If new() is called with a URL, this method fetches an HTML file from a
    remote web server, parses it, and returns the instance. The URI::Fetch
    module is required to fetch files.

  $html = HTML::TagParser->new( $file );
    If new() is called with a filename, this method parses a local HTML file
    and returns the instance.

  $html = HTML::TagParser->new( "<html>...snip...</html>" );
    If new() is called with a string of HTML source code, this method parses
    it and returns the instance.
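    The constructor forms above can be compared side by side. This is a
    sketch under the assumption that HTML::TagParser is installed; the
    temporary file (created with the core File::Temp module) and the sample
    markup exist only for illustration. The URL form is left out because it
    needs network access and URI::Fetch.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TagParser;

my $source = '<html><head><title>Sample page</title></head><body></body></html>';

# 1. HTML source passed directly: new() parses the string.
my $from_string = HTML::TagParser->new($source);

# 2. A filename: new() reads and parses the local file.
#    The temporary file is only for this illustration.
my ( $fh, $filename ) = tempfile( SUFFIX => '.html', UNLINK => 1 );
print {$fh} $source;
close $fh;
my $from_file = HTML::TagParser->new($filename);

# 3. No argument: an empty instance, to be filled later.
my $empty = HTML::TagParser->new();

for my $doc ( $from_string, $from_file ) {
    my $title = $doc->getElementsByTagName("title");
    print $title->innerText(), "\n" if ref $title;
}
```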

  $html->fetch( $url, %param );
    This method fetches an HTML file from a remote web server and parses it.
    The optional %param arguments are parameters for the URI::Fetch module.

  $html->open( $file );
    This method parses a local HTML file.

  $html->parse( $source );
    This method parses a string of HTML source code.
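    A short sketch of filling an empty instance with parse(), assuming
    HTML::TagParser is installed; the markup is a made-up example. open() can
    be used the same way with a local filename.

```perl
use strict;
use warnings;
use HTML::TagParser;

# Start from an empty instance and feed it source code afterwards.
my $html = HTML::TagParser->new();
$html->parse('<html><body><h1>Heading</h1></body></html>');

my $h1 = $html->getElementsByTagName("h1");
print $h1->innerText(), "\n" if ref $h1;
```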

  $elem = $html->getElementById( $id );
    This method returns the element whose id attribute is $id.
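    A minimal lookup by id, assuming HTML::TagParser is installed. The two
    <div> elements and their ids are invented for this sketch.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body><div id="header">Top</div><div id="main">Body text</div></body></html>'
);

# getElementById() returns a single element object, not a list.
my $main = $html->getElementById("main");
print $main->tagName(), ": ", $main->innerText(), "\n" if ref $main;
```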

  @elem = $html->getElementsByName( $name );
    This method returns an array of elements whose name attribute is $name.
    In scalar context, only the first element is returned.
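    The list-versus-scalar behaviour can be seen with a small form. This
    sketch assumes HTML::TagParser is installed; the form fields are
    hypothetical.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body><form>'
  . '<input type="radio" name="color" value="red">'
  . '<input type="radio" name="color" value="blue">'
  . '<input type="text" name="comment">'
  . '</form></body></html>'
);

# List context: every element whose name attribute is "color".
my @colors = $html->getElementsByName("color");
print $_->getAttribute("value"), "\n" for @colors;

# Scalar context: only the first matching element.
my $first = $html->getElementsByName("color");
```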

  @elem = $html->getElementsByTagName( $tagname );
    This method returns an array of elements whose tag name is $tagname. In
    scalar context, only the first element is returned.
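    A sketch contrasting the two calling contexts, assuming HTML::TagParser
    is installed; the list markup is made up for illustration.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body><ul><li>one</li><li>two</li><li>three</li></ul></body></html>'
);

my @items = $html->getElementsByTagName("li");   # all <li> elements
my $first = $html->getElementsByTagName("li");   # only the first <li>

print join( ", ", map { $_->innerText() } @items ), "\n";
print $first->innerText(), "\n";
```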

  @elem = $html->getElementsByClassName( $class );
    This method returns an array of elements whose class attribute
    (className) is $class. In scalar context, only the first element is
    returned.
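    A minimal class lookup, assuming HTML::TagParser is installed. Each
    element in this invented markup carries exactly one class name, so the
    sketch does not depend on how attributes with several space-separated
    classes are matched.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body>'
  . '<p class="note">First note</p>'
  . '<p class="warning">Careful</p>'
  . '<p class="note">Second note</p>'
  . '</body></html>'
);

my @notes = $html->getElementsByClassName("note");
print $_->innerText(), "\n" for @notes;
```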

  @elem = $html->getElementsByAttribute( $attrname, $value );
    This method returns an array of elements whose $attrname attribute has
    the value $value. In scalar context, only the first element is returned.
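    A sketch selecting links by an arbitrary attribute, assuming
    HTML::TagParser is installed; the links and the rel="nofollow" values
    are illustrative.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body>'
  . '<a href="/home" rel="nofollow">Home</a>'
  . '<a href="/about">About</a>'
  . '<a href="/ads" rel="nofollow">Ads</a>'
  . '</body></html>'
);

# Every element whose rel attribute equals "nofollow".
my @nofollow = $html->getElementsByAttribute( "rel", "nofollow" );
print $_->getAttribute("href"), "\n" for @nofollow;
```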

HTML::TagParser::Element SUBCLASS
  $tagname = $elem->tagName();
    This method returns $elem's tagName.

  $text = $elem->id();
    This method returns $elem's id attribute.

  $text = $elem->innerText();
    This method returns $elem's inner text with all tags removed.

  $attr = $elem->attributes();
    This method returns a reference to a hash of all of $elem's attributes.

  $value = $elem->getAttribute( $key );
    This method returns the value of $elem's attribute named $key.
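    The element methods can be exercised together on one element. This is a
    sketch assuming HTML::TagParser is installed; the paragraph, its id and
    its class are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TagParser;

my $html = HTML::TagParser->new(
    '<html><body><p id="intro" class="lead">Hello <b>world</b>!</p></body></html>'
);

my $elem = $html->getElementById("intro");
if ( ref $elem ) {
    print "tagName:   ", $elem->tagName(), "\n";
    print "id:        ", $elem->id(), "\n";
    print "innerText: ", $elem->innerText(), "\n";   # the nested <b> tags are stripped
    my $attr = $elem->attributes();                  # a hash reference
    print "attribute: $_=$attr->{$_}\n" for sort keys %$attr;
    print "class:     ", $elem->getAttribute("class"), "\n";
}
```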

    This module natively understands the character encoding declared in a
    document by parsing its meta element, for example:

        <meta http-equiv="Content-Type" content="text/html; charset=Shift_JIS">

    The parsed document is converted to this class's fixed internal
    encoding, UTF-8.
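    The detect-then-convert idea can be sketched with the core Encode module
    alone. This is not HTML::TagParser's own code: the helper name
    to_utf8_by_meta_charset and its regular expression are hypothetical,
    and it only handles a charset declared in a <meta> element.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of HTML::TagParser): read the charset from a
# <meta> declaration and convert the raw bytes to UTF-8.
sub to_utf8_by_meta_charset {
    my ($bytes) = @_;
    my ($charset) = $bytes =~ /<meta[^>]*charset\s*=\s*["']?([\w\-]+)/i;
    # Leave the bytes untouched if no known charset is declared.
    return $bytes unless defined $charset && Encode::find_encoding($charset);
    Encode::from_to( $bytes, $charset, "UTF-8" );    # converts in place
    return $bytes;
}

# Build a Shift_JIS document whose title holds U+65E5 U+672C.
my $text     = "\x{65E5}\x{672C}";
my $sjis_doc = Encode::encode( "Shift_JIS",
    '<html><head><meta http-equiv="Content-Type" '
  . 'content="text/html; charset=Shift_JIS">'
  . "<title>$text</title></head></html>" );

my $utf8_doc = to_utf8_by_meta_charset($sjis_doc);
print $utf8_doc, "\n";
```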

    Yusuke Kawasaki,

    Copyright (c) 2006 Yusuke Kawasaki. All rights reserved. This program is
    free software; you can redistribute it and/or modify it under the same
    terms as Perl itself.