2006-06-19

Transformers in Cocoon

Transformers are the components that get the job done. Without them, a pipeline could only generate some SAX events and serialize them back out. You’d get a little functionality, but not much. The standard Cocoon distribution includes a number of transformers, and the following list gives an overview of the built-in ones. Some of these transformers are quite sophisticated, so we’ll give only a general description of how each one works. To learn about all the features of a particular transformer, consult the components appendix of the Cocoon manual:


  • XSLT transformer—We’ve already looked at this transformer in the example sitemap. The transformer is in org.apache.cocoon.transformation.TraxTransformer and uses the name xslt.

  • Fragment extractor transformer—This transformer assumes that the incoming event stream represents an XML document that contains embedded SVG images. It replaces each embedded image with an XLink locator that points to the image. This transformer is in org.apache.cocoon.transformation.FragmentExtractorTransformer and uses the name extractor.

  • I18N transformer—This transformer makes it easier to internationalize your Web application by translating text content and attribute values into various languages. Translations can contain parameters, much as with java.text.MessageFormat, and dates, numbers, and currency values are formatted using the classes in the java.text package.
    The I18N transformer assumes there are elements and attributes from the namespace http://apache.org/cocoon/i18n/2.1. The <text> element marks text that is supplied from a message catalog. The content of the <text> element is used as the key: the message catalog is queried with this key, and the resulting text replaces the <text> element in the output event stream. The attr attribute contains a space-separated list of attribute names; the values of these attributes are also treated as keys into the message catalog and are replaced with the text obtained from the catalog. Parameter substitution is done by enclosing the <text> element in a <translate> element and using {n} notation to mark the placeholders to be filled in. Placeholders are numbered starting at 0 and are filled in by the <param> elements that follow the <text> element inside the <translate> element, one for each placeholder. The content of a <param> element is the value of the parameter. The parameters can themselves be translated by enclosing the content of the <param> element in a <text> element.
    You can format dates, times, or combined dates and times according to the current locale using the <date>, <time>, or <datetime> element. Each of these elements takes a value attribute that contains the value to be formatted. A src-pattern attribute tells the transformer how to parse the value, and a pattern attribute tells it how to format the value into the output event stream. You can also specify locale and src-locale attributes to indicate the output locale and the locale of the input value, respectively. The patterns use the syntax of java.text.SimpleDateFormat.
    The <number> element is used to format numbers. To format numbers only, you can specify a pattern attribute and a value attribute. The patterns follow the syntax of java.text.DecimalFormat. You can also use <number> to format currency values or percentages by specifying a type attribute with the value "currency" or "percent" instead of pattern.
    The message catalogs used by the I18N transformer are XML files whose root element is <catalogue> and whose child elements are <message> elements. The content of each <message> element is the translated text, and its key attribute associates a key with that text. Each message catalog is given a name when the transformer is defined, and this name is used as the base name for the catalog files. The I18N transformer allows a hierarchy of message catalogs like the one used by java.util.ResourceBundle. The hierarchy search proceeds by trying basename.xml, followed by basename_language.xml, then basename_language_country.xml, and finally basename_language_country_variant.xml.
    The message catalogs are configured in the <transformer> element, which takes three configuration elements as children. The <catalogues> element contains a sequence of <catalogue> elements, one for each catalog. Each <catalogue> has an id attribute for identification, a name attribute that provides the base name for the catalog files, and a location attribute that specifies where the catalog files are found. After the <catalogues> element, an optional <untranslated-text> element contains the text that’s returned if a key can’t be translated (by default, the key name is output instead). The optional <cache-at-startup> element contains "true" or "false" ("false" is the default); if it is "true", Cocoon tries to cache the catalog’s messages when it starts up.
    The I18n transformer is in org.apache.cocoon.transformation.I18nTransformer and is known by the name i18n.

  • Log transformer—The log transformer prints all the events that pass through it into a file. When you use the log transformer in a <transform> tag, you can supply two parameters: logfile, which tells the transformer which file to write the events into; and append, which tells the transformer whether it should append to the logfile or start the log over. If you don’t specify a value for logfile, the events are logged to the servlet engine’s standard output. This transformer is primarily used for debugging. It uses the name log and is available in org.apache.cocoon.transformation.LogTransformer.

  • SQL transformer—This transformer is one way of interacting with a SQL database in Cocoon. It expects special XML elements in the input stream that are destined for it, taken from the namespace http://apache.org/cocoon/SQL/2.0. The way it works is a little tricky. The input stream must contain a <page> element from the SQL namespace, and there must be an <execute-query> element inside the <page> element. Here’s the difficult part: the <page> element may contain elements from other namespaces, and the <execute-query> element may be nested inside one of those elements or their children. This is necessary because you want to be able to position the results of the SQL query at the correct place in the document/event stream. The <execute-query> element has a single child element called <query>, whose content is a SQL query. You can use simple SQL statements like select, insert, and update, or you can call a SQL stored procedure. If you use a stored procedure, you must supply an isstoredprocedure attribute on <query> with the value "true". The <query> element also takes a name attribute that’s used to name the result set.
    After the transformer has executed, the output event stream contains a <rowset> element where the <execute-query> element was. If a name attribute was supplied for the <query> element, the <rowset> has a name attribute with the same value. If you set the show-nr-of-rows parameter in the <transform> element, the <rowset> also has an nrofrows attribute whose value is the number of rows it contains. The content of the <rowset> is a sequence of <row> elements. Each <row> element contains an element for each column in the result set, and the content of that element is the value of that column in that row.
    When you specify the SQL transformer in a <transform> element, you need to supply a parameter called use-connection, whose value is the name of a datasource connection defined in the Cocoon configuration file cocoon.xconf. The optional show-nr-of-rows parameter adds the nrofrows attribute to the <rowset>. You can also supply a parameter called clob-encoding that specifies the character encoding used when reading data out of CLOB columns.
    The SQL transformer is available under the name sql and is in the class org.apache.cocoon.transformation.SQLTransformer.

  • Filter transformer—The filter transformer lets you reduce the number of elements in a sequence so that you don’t have to process all of them. It assumes that the incoming event stream contains a sequence of the same element. Its parameters specify which element should be filtered (element-name), how many elements should be passed through (count), and which block number to start at (blocknr). When the transformer executes, it breaks the sequence into blocks whose size is determined by the count parameter, and in the output event stream each block of elements is wrapped in a <block> element. There are count elements per <block>, and each block is given an id attribute whose value starts at 1. The blocknr parameter specifies the id of the <block> to be filled in: that’s the only <block> that contains elements from the sequence, and all the other <block> elements are empty. This transformer is useful for producing paged output, because you can use variables to supply the parameter values. The name assigned to this transformer is filter, and the class is org.apache.cocoon.transformation.FilterTransformer.

  • Write DOM session transformer—This transformer converts the input event stream into a DOM tree and stores that DOM tree in the servlet session. There are two parameters to this transformer: dom-name is the name used to store the DOM tree in the servlet session, and dom-root-element allows you to specify the name of the element in the input event stream that’s used as the root of the DOM tree. You use the name writeDOMsession to use this transformer, and the class is org.apache.cocoon.transformation.WriteDOMSessionTransformer.

  • Read DOM session transformer—This transformer retrieves a DOM tree from the servlet session and converts it back into a SAX event stream. The dom-name parameter is the name of the DOM tree that’s retrieved from the session. The trigger-name parameter is the name of the element in the input event stream that triggers the transformer to start generating events. The position parameter determines how the events from the DOM tree are placed relative to the trigger element. If position is "before", then the events from the tree appear before the trigger element. If position is "in", then the transformer generates a startElement for the trigger element, generates all the events for the DOM tree, and then resumes generating events from the input event stream. If the position is "after", then the events for the DOM tree are generated right after the endElement event for the trigger element. In all cases, the events from the DOM tree are added to the stream coming from the transformer input. It’s just a question of where. This transformer is available via the name readDOMsession, and the class is org.apache.cocoon.transformation.ReadDOMSessionTransformer.

  • XInclude transformer—The XInclude transformer expects the input event stream to contain at least one XInclude element. XInclude provides a way to merge one or more XML documents into another. The transformer performs the inclusion specified by the XInclude element or elements and outputs an event stream containing the merged document. The class for the XInclude transformer is org.apache.cocoon.transformation.XIncludeTransformer, and the name is xinclude.

  • CInclude transformer—In addition to using XInclude to combine documents, Cocoon has defined its own inclusion mechanism. This is available via the CInclude transformer. It expects the input event stream to contain elements from the namespace http://apache.org/cocoon/include/1.0. The simplest form of include is an <include> element, which has a src attribute that indicates the document to include. You can also specify an element attribute that defines the name of an element used to wrap the included XML. If the wrapper element is specified, the <include> element in the input stream is replaced by the wrapper element, and the child of the wrapper element is the contents of the included document; otherwise, the <include> element is replaced by the document contents. The namespace and prefix of the wrapper element are controlled by the ns and prefix attributes of the <include> element.
    The CInclude transformer also allows you to include XML retrieved from an external HTTP URL via either the GET or POST method. The GET method is relatively simple. Instead of <include>, you use <includexml>, which needs no attributes and has a single child element, <src>. The content of the <src> element is the URL that should be accessed using the GET method. If an error occurs, the input event stream is lost; if you wish to proceed anyway, you can set the ignoreErrors attribute of <includexml> to "true".
    To use the CInclude transformer to request a document with POST, you again use the <includexml> element, but this time it has three child elements. In addition to the <src> element, it contains a <configuration> element that contains a <parameter> element. A <parameter> element has two children, <name> and <value>, which hold the parameter’s name and value as their content. To perform a POST, the <parameter> is named method and its value is POST. After the <configuration> element comes a <parameters> element, which contains a sequence of <parameter> elements (just like the one used in <configuration>), one for each parameter to the POST method.
    The CInclude transformer is in org.apache.cocoon.transformation.CIncludeTransformer and is available under the name cinclude.

  • EncodeURL transformer—The EncodeURL transformer takes care of encoding URLs that appear in the input event stream. This is much easier than trying to call encodeURL at all the right points. By making the EncodeURL transformer the last transformer in your pipeline (before the serializer), you can ensure that all URLs in the output event stream are properly encoded. The transformer takes two configuration options as children of the <transformer> element where it’s defined. The <include-name> option specifies a regular expression that determines which attributes are treated as URLs to be encoded; the expressions are matched against names of the form element-name/@attribute-name. The default value for <include-name> is .*/@href|.*/@action|frame/@src, which covers any href attribute, any action attribute, and the src attribute of a <frame> element. The <exclude-name> option lets you exclude attributes that should not be treated as URLs. Its default value is img/@src, which means the src attributes of <img> elements aren’t encoded. This transformer is in the class org.apache.cocoon.transformation.EncodeURLTransformer and is assigned the name encodeURL.

  • Augment transformer—This transformer looks at all href attributes in the input event stream and converts any relative URLs to absolute URLs. Normally, relative URLs are made absolute with respect to the request URI. If you specify the mount parameter as a child of the <transform> element, URLs are instead made absolute relative to the servlet context URL with the value of mount appended. For example, if the value of mount is "resources" and the Cocoon Web application has been installed as http://localhost:8080/cocoon, then URLs are made absolute against http://localhost:8080/cocoon/resources, so the relative URL icon.gif becomes http://localhost:8080/cocoon/resources/icon.gif. This transformer is in org.apache.cocoon.transformation.AugmentTransformer and uses the name augment.
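
The I18N transformer’s parameter substitution and formatting are described in terms of java.text classes. The sketch below is not the I18nTransformer source; it only shows the plain java.text calls that correspond to <translate> with <param> values, <date> with src-pattern and pattern, and <number> with a pattern or type attribute. The class and method names are hypothetical, and catalog lookup is left out.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.MessageFormat;
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper that mirrors the java.text calls behind the I18N features.
// It is an illustration, not the Cocoon I18nTransformer implementation.
final class I18nFormattingSketch {

    // <translate>: fill the {0}, {1}, ... placeholders of a catalog message
    // with the <param> values, in order.
    static String translate(String catalogText, Locale locale, String... params) {
        Object[] args = params;
        return new MessageFormat(catalogText, locale).format(args);
    }

    // <date>/<time>/<datetime>: parse the value with src-pattern in src-locale,
    // then format it with pattern in locale (SimpleDateFormat syntax).
    static String reformatDate(String value, String srcPattern, Locale srcLocale,
                               String pattern, Locale locale) {
        try {
            Date parsed = new SimpleDateFormat(srcPattern, srcLocale).parse(value);
            return new SimpleDateFormat(pattern, locale).format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException("value does not match src-pattern: " + value, e);
        }
    }

    // <number value="..." pattern="...">: DecimalFormat pattern syntax.
    static String formatNumber(double value, String pattern, Locale locale) {
        return new DecimalFormat(pattern, DecimalFormatSymbols.getInstance(locale)).format(value);
    }

    // <number type="currency"> or <number type="percent">: locale-specific formats.
    static String formatNumberByType(double value, String type, Locale locale) {
        NumberFormat format;
        if ("currency".equals(type)) {
            format = NumberFormat.getCurrencyInstance(locale);
        } else if ("percent".equals(type)) {
            format = NumberFormat.getPercentInstance(locale);
        } else {
            throw new IllegalArgumentException("unsupported type: " + type);
        }
        return format.format(value);
    }

    public static void main(String[] args) {
        System.out.println(translate("Hello {0}, you have {1} new messages.", Locale.US, "Ann", "3"));
        System.out.println(reformatDate("2006-06-19", "yyyy-MM-dd", Locale.US, "MMMM d, yyyy", Locale.US));
        System.out.println(formatNumber(1234.5, "#,##0.00", Locale.US));   // 1,234.50
        System.out.println(formatNumberByType(12.5, "currency", Locale.US)); // $12.50
        System.out.println(formatNumberByType(0.25, "percent", Locale.US));  // 25%
    }
}
```

Note that MessageFormat treats single quotes as escape characters, so a catalog message containing an apostrophe would need it doubled for this sketch to reproduce it literally.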
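
The message catalog file naming described for the I18N transformer can be illustrated with a short sketch that lists the candidate file names for a base name and a java.util.Locale, in the order given in the text. The class name is hypothetical; the sketch only builds names, does not load or merge catalogs, and stops at the first empty locale component.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: candidate catalog file names, from basename.xml
// down to basename_language_country_variant.xml.
final class CatalogFileNames {

    static List<String> candidates(String baseName, Locale locale) {
        List<String> names = new ArrayList<>();
        names.add(baseName + ".xml");
        String[] parts = {locale.getLanguage(), locale.getCountry(), locale.getVariant()};
        String suffix = "";
        for (String part : parts) {
            if (part.isEmpty()) {
                break; // no more specific components in this locale
            }
            suffix += "_" + part;
            names.add(baseName + suffix + ".xml");
        }
        return names;
    }

    public static void main(String[] args) {
        // [messages.xml, messages_fr.xml, messages_fr_CA.xml]
        System.out.println(candidates("messages", Locale.CANADA_FRENCH));
    }
}
```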
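
The filter transformer’s paging behavior can be modeled on an ordinary list. In this sketch the names are hypothetical, and the real transformer works on SAX events and wraps each block in a <block> element with an id attribute; here the items are split into blocks of count elements, and only the block whose 1-based number equals blocknr keeps its items.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the filter transformer's block logic.
final class FilterBlocks {

    static List<List<String>> filter(List<String> items, int count, int blockNr) {
        if (count <= 0) {
            throw new IllegalArgumentException("count must be positive");
        }
        List<List<String>> blocks = new ArrayList<>();
        for (int start = 0, id = 1; start < items.size(); start += count, id++) {
            int end = Math.min(start + count, items.size());
            List<String> block = new ArrayList<>();
            if (id == blockNr) {
                block.addAll(items.subList(start, end)); // only the selected block is filled
            }
            blocks.add(block); // every other block stays empty
        }
        return blocks;
    }
}
```

For example, with five items, count set to 2, and blocknr set to 2, the result has three blocks and only the second one is filled, which corresponds to page 2 of a listing with two entries per page.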
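
The position parameter of the read DOM session transformer can be illustrated with a toy event model. Here each SAX event is represented as a string such as "start:name" or "end:name"; this representation and the class name are hypothetical, and the sketch inserts the stored events at every occurrence of the trigger element, which may differ from what the real transformer does.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: splice stored DOM events into an input event list.
final class ReadDomPositionSketch {

    static List<String> splice(List<String> input, String trigger,
                               List<String> domEvents, String position) {
        if (!position.equals("before") && !position.equals("in") && !position.equals("after")) {
            throw new IllegalArgumentException("position must be before, in, or after");
        }
        String start = "start:" + trigger;
        String end = "end:" + trigger;
        List<String> out = new ArrayList<>();
        for (String event : input) {
            if (event.equals(start) && position.equals("before")) {
                out.addAll(domEvents); // stored tree goes ahead of the trigger element
            }
            out.add(event); // every input event is passed through
            if (event.equals(start) && position.equals("in")) {
                out.addAll(domEvents); // right after the trigger's startElement
            }
            if (event.equals(end) && position.equals("after")) {
                out.addAll(domEvents); // right after the trigger's endElement
            }
        }
        return out;
    }
}
```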
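
The way the EncodeURL transformer’s default <include-name> and <exclude-name> expressions select attributes can be checked with java.util.regex. This sketch assumes, as the text describes, that each attribute is matched as a string of the form element-name/@attribute-name; the class is hypothetical, and the URL rewriting itself (which a servlet application would do with HttpServletResponse.encodeURL) is not shown.

```java
import java.util.regex.Pattern;

// Hypothetical check of which attributes the default expressions select.
final class EncodeUrlSelection {

    // Default expressions as quoted in the text.
    static final Pattern INCLUDE = Pattern.compile(".*/@href|.*/@action|frame/@src");
    static final Pattern EXCLUDE = Pattern.compile("img/@src");

    // True if the attribute would be treated as a URL to encode.
    static boolean shouldEncode(String element, String attribute) {
        String name = element + "/@" + attribute;
        return INCLUDE.matcher(name).matches() && !EXCLUDE.matcher(name).matches();
    }
}
```

With these defaults, a/@href, form/@action, and frame/@src are selected, while img/@src and attributes such as a/@title are left alone.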
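
The URL arithmetic performed by the augment transformer can be reproduced with java.net.URI.resolve. In this sketch (hypothetical class and method names), the mount value is appended to the context URL with a trailing slash so that it acts as a directory; without the slash, resolving icon.gif against http://localhost:8080/cocoon/resources would replace resources instead of appending to it.

```java
import java.net.URI;

// Hypothetical sketch of relative-to-absolute URL conversion.
final class AugmentSketch {

    // Default behavior: resolve href against the request URI.
    static String againstRequest(String requestUri, String href) {
        return URI.create(requestUri).resolve(href).toString();
    }

    // With a mount parameter: resolve href against <context URL>/<mount>/.
    static String againstMount(String contextUrl, String mount, String href) {
        StringBuilder base = new StringBuilder(contextUrl);
        if (base.charAt(base.length() - 1) != '/') {
            base.append('/');
        }
        String m = mount.startsWith("/") ? mount.substring(1) : mount;
        base.append(m);
        if (!m.isEmpty() && !m.endsWith("/")) {
            base.append('/'); // make the mount behave like a directory
        }
        return URI.create(base.toString()).resolve(href).toString();
    }

    public static void main(String[] args) {
        // http://localhost:8080/cocoon/resources/icon.gif
        System.out.println(againstMount("http://localhost:8080/cocoon", "resources", "icon.gif"));
    }
}
```

Absolute URLs pass through unchanged, because URI.resolve returns its argument as-is when that argument is already absolute.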
