TagSoup Clean-Up Service

Introduction

The XAK TagSoup Clean-Up Service is a simple online service that corrects malformed and otherwise dodgy HTML documents found in the wild, on the fly. The cleaned-up output can then be processed further, e.g. to extract metadata or apply XSLT transformations.

The service is based on John Cowan's TagSoup Parser.

Base URL

The Base URL of the query service is: http://xmlarmyknife.com/api/xhtml/tagsoup

Request Methods

This service currently supports only the HTTP GET method.

Request Parameters

Parameter   Notes                         Required?   Occurrence
html-uri    URL of HTML data to process   Yes         1

TagSoup supports a number of other options, and the same options can be passed to this service as request parameters. For the complete list, consult the TagSoup documentation (see section "TagSoup as a stand-alone program").
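
As an illustration of how a request might be put together, the following minimal Python sketch builds the GET URL from the Base URL and the html-uri parameter described above. The helper name build_cleanup_url, the extra_options argument, and the example.org address are hypothetical and exist only for this example. How additional TagSoup options should be spelled as query parameters is not specified here, so any names passed through extra_options are assumptions.

```python
from urllib.parse import urlencode

# Base URL as given in this document.
BASE_URL = "http://xmlarmyknife.com/api/xhtml/tagsoup"


def build_cleanup_url(html_uri, extra_options=None):
    """Return the GET URL that asks the service to clean up the page at html_uri.

    extra_options is a hypothetical dict of further query parameters;
    their names and values are assumptions, not documented behaviour.
    """
    params = {"html-uri": html_uri}
    if extra_options:
        params.update(extra_options)
    # urlencode percent-escapes the target URL so it survives as a single value.
    return BASE_URL + "?" + urlencode(params)


# Placeholder address of a page to clean up.
request_url = build_cleanup_url("http://example.org/messy.html")
print(request_url)
```

The resulting URL can be fetched with any HTTP client that issues a GET request, for example urllib.request.urlopen from the Python standard library.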

Response Codes

Response Format

The service currently returns all responses with a Content-Type of application/xhtml+xml, unless the html option is specified, in which case the response is served as text/html.

TagSoup also supports responses in PYX format. These responses are returned as text/plain.
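
A client that accepts more than one of these formats may want to decide how to handle a response from its Content-Type header. The sketch below is one possible way to do that in Python. The function name classify_response and the labels "xhtml", "html", "pyx" and "unknown" are hypothetical, and the mapping assumes only the three media types listed above.

```python
# Media types as listed in this section, mapped to hypothetical labels.
KNOWN_FORMATS = {
    "application/xhtml+xml": "xhtml",
    "text/html": "html",
    "text/plain": "pyx",
}


def classify_response(content_type):
    """Map a Content-Type header value to one of the labels above."""
    # Drop any parameters such as "; charset=UTF-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return KNOWN_FORMATS.get(media_type, "unknown")


print(classify_response("application/xhtml+xml; charset=UTF-8"))
```

Treating unrecognised media types as "unknown" rather than raising an error is a design choice for this sketch, since this document does not describe what the service returns when a request fails.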

Implementation Notes

This service has been implemented using TagSoup 1.0 Release Candidate 6.