org.apache.cocoon.generation
Class LinkStatusGenerator

java.lang.Object
  extended by org.apache.cocoon.util.AbstractLogEnabled
      extended by org.apache.cocoon.xml.AbstractXMLProducer
          extended by org.apache.cocoon.generation.AbstractGenerator
              extended by org.apache.cocoon.generation.ServiceableGenerator
                  extended by org.apache.cocoon.generation.LinkStatusGenerator
All Implemented Interfaces:
Configurable, Disposable, org.apache.cocoon.generation.Generator, Poolable, Recyclable, Serviceable, org.apache.cocoon.sitemap.SitemapModelComponent, org.apache.cocoon.xml.XMLProducer

public class LinkStatusGenerator
extends org.apache.cocoon.generation.ServiceableGenerator
implements Configurable

Generates a list of links that are reachable from the src and their status.

Version:
$Id: LinkStatusGenerator.html 1304280 2012-03-23 11:18:01Z ilgrosso $

Field Summary
static String ACCEPT_CONFIG
          Config element name specifying the HTTP header value for Accept.
static String ACCEPT_DEFAULT
          Default value of accept configuration value.
protected  AttributesImpl attributes
           
protected static String CONTENT_ATTR_NAME
           
static String EXCLUDE_CONFIG
          Config element name specifying excluding regular expression pattern.
protected static String HREF_ATTR_NAME
           
static String INCLUDE_CONFIG
          Config element name specifying including regular expression pattern.
static String LINK_CONTENT_TYPE_CONFIG
          Config element name specifying the expected link content-type.
 String LINK_CONTENT_TYPE_DEFAULT
          Default value of link-content-type configuration value.
protected static String LINK_NODE_NAME
           
static String LINK_VIEW_QUERY_CONFIG
          Config element name specifying the query string appended when requesting the links of a URL.
static String LINK_VIEW_QUERY_DEFAULT
          Default value of link-view-query configuration value.
protected static String MESSAGE_ATTR_NAME
           
protected static String PREFIX
          The namespace prefix for this namespace.
protected static String REFERRER_ATTR_NAME
           
protected static String STATUS_ATTR_NAME
           
protected static String TOP_NODE_NAME
           
protected static String URI
          The URI of the namespace of this generator.
static String USER_AGENT_CONFIG
          Config element name specifying the HTTP header value for User-Agent.
static String USER_AGENT_DEFAULT
          Default value of user-agent configuration value.
 
Fields inherited from class org.apache.cocoon.generation.ServiceableGenerator
manager
 
Fields inherited from class org.apache.cocoon.generation.AbstractGenerator
objectModel, parameters, resolver, source
 
Fields inherited from class org.apache.cocoon.xml.AbstractXMLProducer
contentHandler, EMPTY_CONTENT_HANDLER, lexicalHandler, xmlConsumer
 
Fields inherited from interface org.apache.cocoon.generation.Generator
ROLE
 
Constructor Summary
LinkStatusGenerator()
           
 
Method Summary
 void configure(Configuration configuration)
          Configure the crawler component.
 void generate()
          Generate XML data.
protected  List getLinksFromConnection(String url_link_string, URL url_of_referrer)
          Retrieves the list of links of a URL.
protected  String processURL(URL url, String referrer)
          Generates the XML attributes of a URL and computes the URL for retrieving its links.
 void recycle()
           
 void setup(org.apache.cocoon.environment.SourceResolver resolver, Map objectModel, String src, Parameters par)
           
 
Methods inherited from class org.apache.cocoon.generation.ServiceableGenerator
dispose, service
 
Methods inherited from class org.apache.cocoon.xml.AbstractXMLProducer
setConsumer, setContentHandler, setLexicalHandler
 
Methods inherited from class org.apache.cocoon.util.AbstractLogEnabled
getLogger, setLogger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.apache.cocoon.xml.XMLProducer
setConsumer
 

Field Detail

URI

protected static final String URI
The URI of the namespace of this generator.

See Also:
Constant Field Values

PREFIX

protected static final String PREFIX
The namespace prefix for this namespace.

See Also:
Constant Field Values

TOP_NODE_NAME

protected static final String TOP_NODE_NAME
See Also:
Constant Field Values

LINK_NODE_NAME

protected static final String LINK_NODE_NAME
See Also:
Constant Field Values

HREF_ATTR_NAME

protected static final String HREF_ATTR_NAME
See Also:
Constant Field Values

REFERRER_ATTR_NAME

protected static final String REFERRER_ATTR_NAME
See Also:
Constant Field Values

CONTENT_ATTR_NAME

protected static final String CONTENT_ATTR_NAME
See Also:
Constant Field Values

STATUS_ATTR_NAME

protected static final String STATUS_ATTR_NAME
See Also:
Constant Field Values

MESSAGE_ATTR_NAME

protected static final String MESSAGE_ATTR_NAME
See Also:
Constant Field Values

attributes

protected AttributesImpl attributes

LINK_CONTENT_TYPE_CONFIG

public static final String LINK_CONTENT_TYPE_CONFIG
Config element name specifying the expected link content-type.

Its value is link-content-type.

Since:
See Also:
Constant Field Values

LINK_CONTENT_TYPE_DEFAULT

public final String LINK_CONTENT_TYPE_DEFAULT
Default value of link-content-type configuration value.

Its value is application/x-cocoon-links.

Since:
See Also:
Constant Field Values

LINK_VIEW_QUERY_CONFIG

public static final String LINK_VIEW_QUERY_CONFIG
Config element name specifying the query string appended when requesting the links of a URL.

Its value is link-view-query.

Since:
See Also:
Constant Field Values

LINK_VIEW_QUERY_DEFAULT

public static final String LINK_VIEW_QUERY_DEFAULT
Default value of link-view-query configuration value.

Its value is ?cocoon-view=links.

Since:
See Also:
Constant Field Values

EXCLUDE_CONFIG

public static final String EXCLUDE_CONFIG
Config element name specifying excluding regular expression pattern.

Its value is exclude.

Since:
See Also:
Constant Field Values

INCLUDE_CONFIG

public static final String INCLUDE_CONFIG
Config element name specifying including regular expression pattern.

Its value is include.

Since:
See Also:
Constant Field Values

USER_AGENT_CONFIG

public static final String USER_AGENT_CONFIG
Config element name specifying the HTTP header value for User-Agent.

Its value is user-agent.

Since:
See Also:
Constant Field Values

USER_AGENT_DEFAULT

public static final String USER_AGENT_DEFAULT
Default value of user-agent configuration value.

Since:
See Also:
Constants.COMPLETE_NAME

ACCEPT_CONFIG

public static final String ACCEPT_CONFIG
Config element name specifying the HTTP header value for Accept.

Its value is accept.

Since:
See Also:
Constant Field Values

ACCEPT_DEFAULT

public static final String ACCEPT_DEFAULT
Default value of accept configuration value.

Its value is */*.

Since:
See Also:
Constant Field Values
Constructor Detail

LinkStatusGenerator

public LinkStatusGenerator()
Method Detail

configure

public void configure(Configuration configuration)
               throws ConfigurationException
Configure the crawler component.

The configuration can specify which URIs to include in crawling and which URIs to exclude from it. The patterns are given as regular expressions.

Moreover, you can configure the required content-type of a crawling request and the query string appended to each crawling request.


 <include>.*\.html?</include> or <include>.*\.html?, .*\.xsp</include>
 <exclude>.*\.gif</exclude> or <exclude>.*\.gif, .*\.jpe?g</exclude>
 <link-content-type> application/x-cocoon-links </link-content-type>
 <link-view-query> ?cocoon-view=links </link-view-query>
 <user-agent> Cocoon </user-agent>
 <accept> text/xml </accept>
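
The include and exclude elements above can be read as a simple URI filter. The following is a minimal sketch of that idea, not the generator's actual code. It assumes that each element value is a comma-separated list of patterns, that a pattern must match the whole URI, that exclusion takes precedence over inclusion, and that an empty include list admits every URI. The class LinkFilterSketch and its methods are hypothetical names, and java.util.regex is used here only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Cocoon. */
final class LinkFilterSketch {
    private final List<Pattern> includes;
    private final List<Pattern> excludes;

    LinkFilterSketch(String includeConfig, String excludeConfig) {
        this.includes = compile(includeConfig);
        this.excludes = compile(excludeConfig);
    }

    /**
     * Splits a comma-separated element value and compiles each entry.
     * Assumption: patterns never contain a comma themselves.
     */
    static List<Pattern> compile(String configValue) {
        List<Pattern> patterns = new ArrayList<>();
        if (configValue == null) {
            return patterns;
        }
        for (String token : configValue.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                patterns.add(Pattern.compile(trimmed));
            }
        }
        return patterns;
    }

    /** Returns true if the URI passes the exclude check and then the include check. */
    boolean shouldCrawl(String uri) {
        for (Pattern p : excludes) {
            if (p.matcher(uri).matches()) {
                return false; // assumption: exclusion wins over inclusion
            }
        }
        if (includes.isEmpty()) {
            return true; // assumption: no include patterns means "include all"
        }
        for (Pattern p : includes) {
            if (p.matcher(uri).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        LinkFilterSketch filter =
                new LinkFilterSketch(".*\\.html?, .*\\.xsp", ".*\\.gif, .*\\.jpe?g");
        System.out.println(filter.shouldCrawl("http://host/index.html"));
        System.out.println(filter.shouldCrawl("http://host/logo.gif"));
    }
}
```

With the patterns from the configuration example, a URI ending in .html or .xsp passes the filter, while one ending in .gif or .jpeg does not.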
 

Specified by:
configure in interface Configurable
Parameters:
configuration - XML configuration of this Avalon component.
Throws:
ConfigurationException - thrown if the configuration is invalid.
Since:

setup

public void setup(org.apache.cocoon.environment.SourceResolver resolver,
                  Map objectModel,
                  String src,
                  Parameters par)
           throws org.apache.cocoon.ProcessingException,
                  SAXException,
                  IOException
Specified by:
setup in interface org.apache.cocoon.sitemap.SitemapModelComponent
Throws:
org.apache.cocoon.ProcessingException
SAXException
IOException

generate

public void generate()
              throws SAXException,
                     org.apache.cocoon.ProcessingException
Generate XML data.

Specified by:
generate in interface org.apache.cocoon.generation.Generator
Throws:
SAXException - if an error occurs while outputting the document
org.apache.cocoon.ProcessingException - if the requested URI wasn't found

getLinksFromConnection

protected List getLinksFromConnection(String url_link_string,
                                      URL url_of_referrer)
Retrieves the list of links of a URL.

Parameters:
url_link_string - URL for requesting links; it is assumed that url_link_string queries the Cocoon links view, i.e. it has the form http://host/foo/bar?cocoon-view=links
url_of_referrer - base URL whose links are requested, i.e. of the form http://host/foo/bar
Returns:
List of links from url_of_referrer, as result of requesting url url_link_string
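
As an illustration of how a links response could be turned into such a list, here is a minimal sketch that works on an already fetched response body. It assumes the body holds one link per line and that relative links are resolved against the referrer; both points are assumptions for this example, as are the class LinkListSketch and the method parseLinks. The sketch takes a java.net.URI instead of a URL to keep it free of network access and checked exceptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Cocoon. */
final class LinkListSketch {

    /**
     * Turns a response body into a list of absolute links.
     * Assumption: the body contains one link per line.
     */
    static List<String> parseLinks(String body, URI referrer) {
        List<String> links = new ArrayList<>();
        for (String line : body.split("\\r?\\n")) {
            String link = line.trim();
            if (link.isEmpty()) {
                continue; // ignore blank lines
            }
            try {
                // Resolve relative links against the referring page.
                links.add(referrer.resolve(link).toString());
            } catch (IllegalArgumentException e) {
                // Skip entries that are not valid URI references.
            }
        }
        return links;
    }

    public static void main(String[] args) {
        URI referrer = URI.create("http://host/foo/bar");
        String body = "baz.html\n/img/a.gif\nhttp://other/x\n";
        for (String link : parseLinks(body, referrer)) {
            System.out.println(link);
        }
    }
}
```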

processURL

protected String processURL(URL url,
                            String referrer)
                     throws SAXException
Generates the XML attributes of a URL and computes the URL for retrieving its links.

Parameters:
url - the URL to process
referrer - the referrer of the URL
Returns:
String URL for retrieving links, or null if the URL is an excluded URL and not an included URL.
Throws:
SAXException
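
The return value described above can be pictured as the processed URL with the link-view query appended. The sketch below shows one way to compute it under stated assumptions: the caller has already decided whether the URL may be crawled, a leading "?" in the configured query is tolerated, and "&" is used when the URL already carries a query string. The class LinkViewUrlSketch and the method linkViewUrl are hypothetical and do not reproduce the generator's own logic.

```java
/** Hypothetical helper for illustration; not part of Cocoon. */
final class LinkViewUrlSketch {

    /**
     * Builds the URL used to request the links of a page.
     * Returns null when the page must not be crawled.
     */
    static String linkViewUrl(String url, String linkViewQuery, boolean crawlable) {
        if (!crawlable) {
            return null;
        }
        // Accept the query with or without a leading '?'.
        String query = linkViewQuery.startsWith("?")
                ? linkViewQuery.substring(1)
                : linkViewQuery;
        // Assumption: join with '&' if the URL already has a query string.
        String separator = url.indexOf('?') == -1 ? "?" : "&";
        return url + separator + query;
    }

    public static void main(String[] args) {
        System.out.println(linkViewUrl("http://host/foo/bar", "?cocoon-view=links", true));
        System.out.println(linkViewUrl("http://host/foo/bar?a=1", "?cocoon-view=links", true));
    }
}
```

For http://host/foo/bar and the documented default query, the result has the form http://host/foo/bar?cocoon-view=links, which matches the url_link_string form expected by getLinksFromConnection.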

recycle

public void recycle()
Specified by:
recycle in interface Recyclable


Copyright 1999-2008 The Apache Software Foundation. All Rights Reserved.