HowTo – Using SimpleXMLElement And xpath To Grab XML Node Data

Whether you are grabbing an XML feed off of the Interwebs someplace or from a local database or file, accessing the data in the nodes can be a drag.

In this example we will load the raw XML into SimpleXML for processing, then use xpath to find the node that holds the data we want to extract.

The first thing you have to understand about SimpleXML is that it is very picky about character encoding. It expects clean, well-formed input, UTF-8 by default, so your script is likely to choke until you get some clean XML to push through SimpleXML.

Depending on your source, this may require a lot of preg_replace calls to strip out the bad characters.

This line should help a lot of you.

$rawxml is your XML source and $utfXMLresults is your processed XML.

[php]

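// Strip every byte outside the ASCII range x20-x7F.
// Note: this also removes tabs, newlines and all non-ASCII characters.
$utfXMLresults = preg_replace('/[^\x20-\x7F]/', '', $rawxml);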

[/php]
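If stripping tabs and line breaks is too aggressive for your feed, here is a minimal sketch of a gentler variant. The sample string is made up for illustration, and the character class is my own assumption about what you want to keep (tab, line feed, carriage return and printable ASCII). Accented letters and curly quotes are still dropped, not converted.

[php]
// Hypothetical sample input: an e-acute and a curly apostrophe inside plain XML.
$rawxml = "<contact>\n\t<name>Ren\u{00E9}e O\u{2019}Neil</name>\n</contact>";

// Keep tab (x09), LF (x0A), CR (x0D) and printable ASCII (x20-x7E).
// Without the /u modifier the pattern works byte by byte, so every byte
// of a multi-byte UTF-8 character is removed.
$utfXMLresults = preg_replace('/[^\x09\x0A\x0D\x20-\x7E]/', '', $rawxml);

echo $utfXMLresults, "\n";
[/php]

With this input the name comes out as "Rene ONeil", which shows the trade-off: the XML parses, but non-ASCII text is lost.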

Once you get the XML feed into SimpleXML, echo it back with asXML() or print_r it to make sure SimpleXML is reading it. You will see a layout of your XML file.

Now you have a choice: hunt deep into the nodes and construct the exact path to the data you want to extract, or use xpath to grab the data based on the node name.
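As a rough sketch of that sanity check, the snippet below loads a small, made-up address book (the $xmlFeed contents are an assumption for illustration) and either dumps the parsed structure or lists what the parser complained about. It uses simplexml_load_string so a bad feed returns false instead of throwing.

[php]
// Hypothetical feed; in practice this would be your cleaned-up XML string.
$xmlFeed = '<addressbook><contact><name>Ann</name>'
    . '<telephone>555-0100</telephone></contact></addressbook>';

// Collect parse errors quietly so we can report them ourselves.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($xmlFeed);

if ($xml === false) {
    // Show each problem libxml found in the feed.
    foreach (libxml_get_errors() as $error) {
        echo 'XML error: ', trim($error->message), "\n";
    }
    libxml_clear_errors();
} else {
    // Dump the parsed structure to confirm SimpleXML is reading the feed.
    print_r($xml);
}
[/php]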
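To make the two options concrete, here is a small sketch using a made-up address book. The element names (addressbook, contact, telephone) are assumptions for illustration, not part of any real feed.

[php]
// Hypothetical feed with two contacts.
$xmlFeed = '<addressbook>'
    . '<contact><name>Ann</name><telephone>555-0100</telephone></contact>'
    . '<contact><name>Bob</name><telephone>555-0199</telephone></contact>'
    . '</addressbook>';

$xml = new SimpleXMLElement($xmlFeed);

// Option 1: spell out the exact path to one node.
$firstPhone = (string) $xml->contact[0]->telephone;

// Option 2: let xpath find every telephone node by name.
$allPhones = [];
foreach ($xml->xpath('//telephone') as $node) {
    $allPhones[] = (string) $node;
}

echo 'First: ', $firstPhone, "\n";
echo 'All: ', implode(', ', $allPhones), "\n";
[/php]

The exact path is handy when you know the layout and want one value. The xpath version keeps working if the nodes move deeper into the document.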

In the code below, we read in the $xmlFeed that was stored when we used curl to grab it.

Once the data is in SimpleXML, we use xpath with NodeName; replace NodeName with your actual node’s name to read the data in a specific node. If you had a node named telephone in your address book, this code would read all the nodes and return just the telephone numbers by looping through the XML.

As you will notice in the xpath call, the leading // in //NodeName tells xpath to find every node named NodeName, wherever it is in the XML file. For this reason, this code works best when you are careful setting up your XML and know there is always a node named NodeName available. If not, read your XML feed and find the nodes you need to grab.

[php]
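$xml = new SimpleXMLElement($xmlFeed);

// '//NodeName' matches every NodeName element anywhere in the document.
$result = $xml->xpath('//NodeName');

// each() was removed in PHP 8, so loop with foreach instead.
foreach ($result as $node) {
    echo 'NodeName: ', $node, "\n";
}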
[/php]
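Since this approach depends on the node actually being there, it can help to check the result before looping. The sketch below assumes a made-up feed that has telephone nodes but no fax nodes. Both names are only examples.

[php]
// Hypothetical feed: telephone nodes exist, fax nodes do not.
$xmlFeed = '<addressbook>'
    . '<contact><telephone>555-0100</telephone></contact>'
    . '<contact><telephone>555-0199</telephone></contact>'
    . '</addressbook>';

$xml = new SimpleXMLElement($xmlFeed);

foreach (['telephone', 'fax'] as $nodeName) {
    $result = $xml->xpath('//' . $nodeName);

    // empty() covers both "no matches" and an xpath error.
    if (empty($result)) {
        echo 'No ', $nodeName, " nodes found\n";
        continue;
    }

    foreach ($result as $node) {
        echo $nodeName, ': ', $node, "\n";
    }
}
[/php]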

I hope this helps. You can find more info about SimpleXML and xpath on the PHP documentation site.