2007-10-10
The .NET framework includes some powerful classes for navigating XML data. Jeff Cogswell shows you how to use them.
Over the past decade or so, XML has proved an incredibly powerful means of storing data. If you're writing a .NET application, you can work with XML using a set of framework classes designed specifically for processing and navigating it.
In this article I explore how to navigate XML files using a class called XPathNavigator. In future articles I'll discuss how to read, write, and manipulate XML files.
Understanding XPath
If you're going to do much work with processing XML files, you'll really want to master XPath. XPath is a notation for specifying individual elements or groups of elements in an XML file. XPath is not a language for processing XML files; it's simply a way of specifying nodes. Here I'll show you some basics of XPath; however, I strongly encourage you to explore further and learn as much as you can about it.
Before I show you some examples, let me draw an analogy. When you need to specify a file on your hard drive, you use directory names and slashes, such as this:
C:\Windows\System32\winlogon.exe
This is a path in the file system: each directory name is separated by a backslash, and the final item is a file rather than a directory.
XPath works similarly. Like directories and files, the elements inside an XML file are organized hierarchically, so you can use a very similar notation, with forward slashes, to refer to individual elements inside your XML file.
Here's a sample XML file:
<customers>
  <customer name="Ziff Davis Enterprise">
    <orders>
      <order id="1000" item="Windows Vista Ultimate" Quantity="500" />
      <order id="1001" item="Microsoft Office Professional"
             Quantity="200" />
    </orders>
  </customer>
  <customer name="ACME Publishing">
    <orders>
      <order id="1002" item="Windows Vista Ultimate" Quantity="10" />
    </orders>
  </customer>
</customers>
The outermost element is customers. Inside it are two customer elements, each with an attribute called name. Inside each customer element is an orders element, which in turn contains one or more order elements.
You can see the hierarchical structure; the outermost element contains two elements, each of which contains elements, and so on.
Using a path notation much like that of a directory path, you can refer to individual items inside the XML file. Take a look at this path:
/customers/customer/orders/order
This refers to an order element inside an orders element, inside a customer element, which is inside the root customers element.
Note, however, that so far this path isn't precise. There's only one root customers element (the XML standard requires a document to have exactly one root element), but inside that element there are two customer elements. So the portion
/customers/customer
could refer to two different elements. The entire path
/customers/customer/orders/order
then refers to all the order elements in this XML file.
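To make this concrete outside of .NET, here is a minimal sketch using Java's built-in XPath support (javax.xml.xpath). This is an assumption for illustration only: the .NET classes this article covers come later, and the class name AllOrders and its helpers parse and select are hypothetical. It loads the sample document and evaluates the path above, which should match all three order elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class AllOrders {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Evaluates an XPath expression and returns every node it selects.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList orders = select(parse(), "/customers/customer/orders/order");
        System.out.println(orders.getLength() + " order elements");
        // Nodes come back in document order, so the ids print as 1000, 1001, 1002.
        for (int i = 0; i < orders.getLength(); i++) {
            System.out.println("id " + ((Element) orders.item(i)).getAttribute("id"));
        }
    }
}
```

Because the path names no particular customer, the orders of both customers are included in the result.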
XPath also lets you narrow a selection down to particular elements by testing their attributes as well as their names. I don't have room to give a fully detailed explanation here, but in short, you do this with predicates: expressions that filter the nodes selected at a given step. Predicates go inside square brackets and contain an expression. To refer to an element's attributes in a predicate, you use the @ symbol like so:
/customers/customer[@name='Ziff Davis Enterprise']
Now this path refers to a specific element, the customer element whose name is Ziff Davis Enterprise.
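A sketch in the same hypothetical Java setup (javax.xml.xpath standing in for .NET; CustomerByName, parse, and select are illustrative names) shows the predicate at work by counting matches for /customers/customer with and without it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class CustomerByName {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Evaluates an XPath expression and returns every node it selects.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        // No predicate: both customer elements match.
        System.out.println(select(doc, "/customers/customer").getLength());
        // The predicate keeps only the customer whose name attribute matches.
        NodeList match = select(doc, "/customers/customer[@name='Ziff Davis Enterprise']");
        System.out.println(match.getLength());
        System.out.println(((Element) match.item(0)).getAttribute("name"));
    }
}
```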
From there we can access a particular order, like so:
/customers/customer[@name='Ziff Davis Enterprise']/orders/order[@id='1000']
Notice, however, that only one order element in the entire file has an id attribute of 1000. XPath lets us skip levels in the hierarchy by using two slashes (//), which match elements at any depth below the preceding step. Take a look at this path:
/customers//order[@id='1000']
Since only one order element has an id of 1000, this path will refer to the same single order element.
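One way to check that claim is a sketch along the same lines, again assuming Java's javax.xml.xpath as a stand-in for .NET (DescendantShortcut and its helpers are hypothetical names). It evaluates both the fully spelled-out path and the // shortcut against one parsed document and compares the resulting DOM nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class DescendantShortcut {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Evaluates an XPath expression and returns every node it selects.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        NodeList full = select(doc,
            "/customers/customer[@name='Ziff Davis Enterprise']/orders/order[@id='1000']");
        NodeList shortcut = select(doc, "/customers//order[@id='1000']");
        // Each path should select exactly one node...
        System.out.println(full.getLength() + " " + shortcut.getLength());
        // ...and both should be the same element in the parsed tree.
        System.out.println(full.item(0).isSameNode(shortcut.item(0)));
    }
}
```

Both expressions are parsed against the same Document on purpose: node identity only makes sense within a single tree.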
Note, however, that using a predicate doesn't guarantee that a path selects a single node. For example, look at this path:
/customers//order[@item='Windows Vista Ultimate']
In this case, there are two order nodes whose item attribute is Windows Vista Ultimate. Thus, this path refers to a collection containing two elements.
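Evaluated in the same hypothetical Java sketch (javax.xml.xpath, not .NET; PredicateMatches and its helpers are illustrative names), that expression returns a node list of length two, one order from each customer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class PredicateMatches {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Evaluates an XPath expression and returns every node it selects.
    static NodeList select(Document doc, String expression) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        NodeList vista = select(parse(), "/customers//order[@item='Windows Vista Ultimate']");
        System.out.println(vista.getLength() + " matching orders");
        for (int i = 0; i < vista.getLength(); i++) {
            Element order = (Element) vista.item(i);
            // Attribute names are case-sensitive, hence the capital Q in Quantity.
            System.out.println(order.getAttribute("id") + " x" + order.getAttribute("Quantity"));
        }
    }
}
```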
Moving back up (sort of)
When you refer to directories on a file system, you can specify the names of the directories, but you can also step up a directory using two dots, like so:
c:\windows\system32\..\notepad.exe
This refers to the notepad.exe file in the windows directory. Of course, that's not particularly useful in practice, because it's identical to this shorter, simpler path:
c:\windows\notepad.exe
In XPath, this concept is much more useful. First, look again at the earlier example that referred to a specific order:
/customers//order[@id='1000']
This refers to a single order for a single customer. But which customer? To find out, you step up two levels: first to the orders element, and then to the customer element. To accomplish that in XPath, you use two dots to move up, just as with a directory:
/customers//order[@id='1000']/../..
That gets us to the customer element. But what is the customer's name? You can find that out from the customer element's name attribute. In this case, however, we're not testing nodes to find a specific one, so we don't use a predicate inside square brackets. Instead, we select the attribute directly. Here's how you do it:
/customers//order[@id='1000']/../../@name
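Here is a sketch of both expressions using Java's javax.xml.xpath (an assumed stand-in for .NET; ParentSteps, node, and string are hypothetical names). Requesting a node result for the first expression yields the customer element, and requesting a string result for the second yields the attribute's value.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class ParentSteps {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Returns the first node (in document order) the expression selects, or null.
    static Node node(Node context, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    // Returns the XPath string value of the expression's result.
    static String string(Node context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Two parent steps climb from the order to orders, then to customer.
        Node customer = node(doc, "/customers//order[@id='1000']/../..");
        System.out.println(customer.getNodeName());
        // A final /@name step selects the attribute; its string value is the name.
        System.out.println(string(doc, "/customers//order[@id='1000']/../../@name"));
    }
}
```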
(This might seem a little odd: the second set of dots gets you to the customer element, and from there you use yet another slash to get to the name attribute, even though an attribute feels like part of the element rather than something beneath it. In XPath's data model, however, attributes are nodes in their own right, reached through the attribute axis that the @ symbol abbreviates, so the slash is simply how you step to them.)
Before moving on, I want to point out something. Notice I called this section "Moving back up (sort of)". I added "sort of" because with XPath you're not really moving around in an XML document. An XPath expression describes a static position (or set of positions) within an XML document. True, the double dot refers to the parent of the current node, but you're not actually moving about. Rather, the XPath is taken as a whole, and its final step identifies a location or set of locations.
Still, by evaluating XPath expressions in succession, you can move around a document. Now let's see how you can do that in .NET.
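Before turning to .NET, the idea can be sketched with Java's javax.xml.xpath API (an assumption for illustration only; RelativeSteps, node, and string are hypothetical names). Instead of one long expression, each step is a separate, shorter expression evaluated relative to the node the previous one returned.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Sketch only: Java's JAXP XPath API stands in for .NET; names are hypothetical.
public class RelativeSteps {
    // The article's sample document, with the whitespace between tags dropped.
    static final String XML = "<customers>"
        + "<customer name='Ziff Davis Enterprise'><orders>"
        + "<order id='1000' item='Windows Vista Ultimate' Quantity='500'/>"
        + "<order id='1001' item='Microsoft Office Professional' Quantity='200'/>"
        + "</orders></customer>"
        + "<customer name='ACME Publishing'><orders>"
        + "<order id='1002' item='Windows Vista Ultimate' Quantity='10'/>"
        + "</orders></customer>"
        + "</customers>";

    // Parses the sample into a DOM tree, wrapping checked exceptions.
    static Document parse() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    // Evaluates an expression relative to a context node; returns the first match or null.
    static Node node(Node context, String expression) {
        try {
            return (Node) XPathFactory.newInstance().newXPath()
                .evaluate(expression, context, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    // Evaluates an expression relative to a context node and returns its string value.
    static String string(Node context, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, context);
        } catch (Exception e) {
            throw new IllegalStateException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse();
        // Step 1: an absolute path locates the order.
        Node order = node(doc, "/customers//order[@id='1000']");
        // Step 2: a relative path, evaluated from that order, climbs to its customer.
        Node customer = node(order, "../..");
        // Step 3: another relative path, evaluated from the customer, reads its name.
        System.out.println(string(customer, "@name"));
        // From the same customer, a relative path can also count its orders.
        System.out.println(string(customer, "count(orders/order)"));
    }
}
```

Each intermediate node acts like a current position in the document, which is the sense in which successive expressions let you move around.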