This package contains some simple routines for manipulating XML
documents stored in PostgreSQL. This is a work-in-progress and
somewhat basic at the moment (see the file TODO for some outline of
what remains to be done).

At present, two modules (based on different XML handling libraries)
are provided.

Prerequisite:

pgxml.c:
expat parser 1.95.0 or newer (http://expat.sourceforge.net)

or

pgxml_dom.c:
libxml2 (http://xmlsoft.org)

The libxml2 version provides more complete XPath functionality, and
seems like a good way to go. I've left the old versions in there for
comparison.

Compiling and loading:
----------------------

The Makefile only builds the libxml2 version.

To compile, just type make.

Then you can use psql to load the two function definitions:
\i pgxml_dom.sql


Function documentation and usage:
---------------------------------

pgxml_parse(text) returns bool
  parses the provided text and returns true if it is well-formed and
  false if it is not. It returns NULL if the parser couldn't be
  created for any reason.

pgxml_xpath (XQuery functions) - differs between the versions:

pgxml.c (expat version) has:

pgxml_xpath(text doc, text xpath, int n) returns text
  parses doc and returns the cdata of the nth occurrence of
  the "simple path" entry.

However, the remainder of this document will cover the pgxml_dom.c version.

pgxml_xpath(text doc, text xpath, text toptag, text septag) returns text
  evaluates xpath on doc, and returns the result wrapped in
  <toptag>...</toptag> and each result node wrapped in
  <septag></septag>. toptag and septag may be empty strings, in which
  case the respective tag will be omitted.
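
The toptag/septag wrapping described for pgxml_xpath can be tried out
away from the database. What follows is only a rough Python sketch of
that behaviour, not the C code in pgxml_dom.c. The names
is_well_formed, wrap_results and xpath_text are made up for this
illustration. The standard-library ElementTree module supports only a
subset of XPath (no text() step), so the sketch selects elements and
reads their text instead. Escaping the text is an assumption made to
keep the output well-formed, and the NULL case of pgxml_parse is not
modelled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(doc):
    """Rough analogue of pgxml_parse: True if doc parses, False if not."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


def wrap_results(values, toptag, septag):
    """Wrap each value in <septag> and the whole result in <toptag>.

    An empty tag name means that level of wrapping is omitted.
    """
    # Assumption: escape text so the wrapped result stays well-formed.
    if septag:
        parts = ["<%s>%s</%s>" % (septag, escape(v), septag) for v in values]
    else:
        parts = [escape(v) for v in values]
    body = "".join(parts)
    if toptag:
        return "<%s>%s</%s>" % (toptag, body, toptag)
    return body


def xpath_text(doc, path, toptag, septag):
    """Select elements with an ElementTree path and wrap their text."""
    root = ET.fromstring(doc)
    values = [el.text or "" for el in root.findall(path)]
    return wrap_results(values, toptag, septag)


# Hypothetical document with two <find> entries.
doc = ("<site><find><type>Pottery</type></find>"
       "<find><type>Animal bone</type></find></site>")

print(xpath_text(doc, ".//find/type", "set", "findtype"))
# prints: <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>
```

A document with no matching elements gives <set></set>, and passing
empty strings for both tags returns the matched text with no wrapping
at all.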

Example:

Given a table docstore:

 Attribute |  Type   | Modifier
-----------+---------+----------
 docid     | integer |
 document  | text    |

containing documents such as (these are archaeological site
descriptions, in case anyone is wondering):

<?xml version="1.0"?>
<site provider="Foundations" sitecode="ak97" version="1">
   <name>Church Farm, Ashton Keynes</name>
   <invtype>watching brief</invtype>
   <location scheme="osgb">SU04209424</location>
</site>

one can type:

select docid,
   pgxml_xpath(document,'//site/name/text()','','') as sitename,
   pgxml_xpath(document,'//site/location/text()','','') as location
 from docstore;

and get as output:

 docid |               sitename               |  location
-------+--------------------------------------+------------
     1 | Church Farm, Ashton Keynes           | SU04209424
     2 | Glebe Farm, Long Itchington          | SP41506500
     3 | The Bungalow, Thames Lane, Cricklade | SU10229362
(3 rows)

or, to illustrate the use of the extra tags:

select docid as id,
   pgxml_xpath(document,'//find/type/text()','set','findtype')
 from docstore;

 id |                               pgxml_xpath
----+-------------------------------------------------------------------------
  1 | <set></set>
  2 | <set><findtype>Urn</findtype></set>
  3 | <set><findtype>Pottery</findtype><findtype>Animal bone</findtype></set>
(3 rows)

Each result is a new, well-formed document. Note that document 1 had
no matching instances, so the set returned contains no elements.
Document 2 has 1 matching element and document 3 has 2.

This is just scratching the surface because XPath allows all sorts of
operations.

Note: I've only implemented the return of nodeset and string values so
far. This covers (I think) many types of queries, however.

John Gray <jgray@azuli.co.uk>  16 August 2001