Welcome to Part II of the Java and XML series. Last week we looked at the basics of parsing an
XML document using the IBM SAX parser. After a brief introduction to XML you learned that SAX is an
event-driven parser, and how to override the appropriate SAX event methods to extract data from an XML
document.
In our last example, two of the methods that we overrode were the startElement(...) and endElement(...) methods. As a quick
review, these methods are called every time the SAX parser encounters the opening tag or closing tag of an element, respectively.
Below is the sample XML document we used previously.
<?xml version="1.0"?>
<order>
<item>
<name>Soccer Ball</name>
<price>15.00</price>
<quantity>5</quantity>
</item>
</order>
To extract information from the XML document we overrode the startElement(...) and endElement(...) methods
as shown below.
public void startElement(String uri, String local_name, String raw_name, Attributes amap)
    throws SAXException
{
    System.out.println("start " + local_name + " found ");
}

public void endElement(String uri, String local_name, String raw_name)
    throws SAXException
{
    System.out.println("end " + local_name + " found");
}
To illustrate the logic behind these event methods, the startElement(...) method is called for the <order>, <item>, <name>,
<price> and <quantity> tags. As you might guess, the endElement(...) method is called for the
</name>, </price>, </quantity>, </item> and </order> tags, each at the point where the parser reaches that closing tag.
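To see the order in which these two events actually arrive, here is a minimal sketch that records them in a list. It is not the code from this series: it assumes the SAX parser bundled with the JDK (obtained through SAXParserFactory) instead of the Xerces class used in this article, reads the sample document from a string, and uses the qualified name (raw_name in our listings) because that parser is not namespace-aware by default. The names EventOrderDemo, ORDER_XML and recordEvents are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the article's source code.
public class EventOrderDemo {
    // The sample order document from this article, inlined as a string.
    static final String ORDER_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<order>\n<item>\n<name>Soccer Ball</name>\n<price>15.00</price>\n"
      + "<quantity>5</quantity>\n</item>\n</order>";

    // Parses the given XML and returns the start/end events in arrival order.
    static List<String> recordEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                events.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end " + qName);
            }
        };
        // Assumption: the JDK's default (non-namespace-aware) SAX parser.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : recordEvents(ORDER_XML)) {
            System.out.println(event);
        }
    }
}
```

The start and end events are interleaved in document order, so "end name" is recorded before "start price", and "end order" comes last.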
To extract any character data within the context of an element we override the characters(...) method as shown below.
public void characters(char[] ch, int start, int length)
    throws SAXException
{
    System.out.println("characters " + new String(ch,start,length) + " found ");
}
However, unlike the startElement(...) and endElement(...) methods, the characters(...) method has no
parameter identifying which element the character data belongs to. To get around this little inconvenience
we will add a state variable to our class that tracks which element is currently being parsed. Our new
class is shown below.
import org.xml.sax.*;
import org.xml.sax.helpers.DefaultHandler;

public class SimpleSax extends DefaultHandler
{
    private String currentElement = null; // State variable

    public static void main(String[] args)
    {
        try
        {
            Class c = Class.forName("org.apache.xerces.parsers.SAXParser");
            XMLReader reader = (XMLReader)c.newInstance();
            SimpleSax ss = new SimpleSax();
            reader.setContentHandler(ss);
            reader.parse("department.xml");
        }
        catch(Exception e){System.out.println(e);}
    }

    public void startElement(String uri, String local_name, String raw_name, Attributes amap)
        throws SAXException
    {
        currentElement = local_name; // Remember the most recently opened element
        System.out.println("start " + local_name + " found ");
    }

    public void endElement(String uri, String local_name, String raw_name)
        throws SAXException
    {
        System.out.println("end " + local_name + " found");
    }

    public void startDocument()
        throws SAXException
    {
        System.out.println("start document found");
    }

    public void endDocument()
        throws SAXException
    {
        System.out.println("end document found");
    }

    public void characters(char[] ch, int start, int length)
        throws SAXException
    {
        /* Print characters along with element context */
        System.out.println("characters " +
            new String(ch,start,length) +
            " found for element " + currentElement);
    }
}
In our new class we added a member variable of type String called currentElement. We also made a simple modification
to our startElement(...) method, which now sets currentElement to the element's local_name each time a start tag is
encountered. With that in place we can easily determine the context of character
data processed in the characters(...) method. The output of our new class when run is shown below.
start document found
start order found
characters
found for element order
start item found
characters
found for element item
start name found
characters Soccer Ball found for element name
end name found
characters
found for element name
start price found
characters 15.00 found for element price
end price found
characters
found for element price
start quantity found
characters 5 found for element quantity
end quantity found
characters
found for element quantity
end item found
characters
found for element quantity
end order found
end document found
So far so good, right? Well, almost. If you look carefully at the output above you will see several instances
of the following text.
characters
found for element name
Where is this coming from? Don't worry, there is nothing wrong with the code. This is just the way that XML parsers handle whitespace. Whitespace, if you
are unfamiliar with the term, is what separates words or characters from each other. This includes not only spaces but also
non-visible control characters such as tabs and newline characters. What we are seeing above is the XML parser reporting the newline
character that follows each tag as character data. Because currentElement is only updated in startElement(...), that newline is reported
against whichever element was opened most recently. To fix this problem we could put all the XML data on one line, but that
would make the XML document a bit difficult to read. A better solution is to modify our characters(...) method to skip any chunk of
character data that begins with a control character. Note that a space is not a control character, so this check would not skip
indentation made of spaces. The modified characters(...) method is shown below along with our new output.
public void characters(char[] ch, int start, int length)
    throws SAXException
{
    // The data for this event begins at ch[start], not ch[0]
    if(length > 0 && !Character.isISOControl(ch[start]))
    {
        System.out.println("characters " +
            new String(ch,start,length) +
            " found for element " + currentElement);
    }
}
start document found
start order found
start item found
start name found
characters Soccer Ball found for element name
end name found
start price found
characters 15.00 found for element price
end price found
start quantity found
characters 5 found for element quantity
end quantity found
end item found
end order found
end document found
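The control-character check works for our sample document, but it only looks at the first character of each chunk. As a rough sketch of one alternative, and not the approach used in this article, the handler below collects character data in a buffer and prints it at the end tag only if something is left after trimming. This also covers the case where a parser delivers one piece of text through several characters(...) calls, which SAX allows. The class name TrimmedTextDemo and the extract helper are made up for illustration, the example assumes the SAX parser bundled with the JDK, and it deliberately ignores mixed content (text that sits next to child elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; not part of the article's source code.
public class TrimmedTextDemo extends DefaultHandler {
    private final StringBuilder text = new StringBuilder(); // text seen since the last tag
    private final List<String> lines = new ArrayList<>();   // collected "element=value" pairs

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes atts) {
        text.setLength(0); // discard whitespace that preceded this start tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called more than once per text run
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            lines.add(qName + "=" + value); // keep only elements with real text
        }
        text.setLength(0);
    }

    // Parses the given XML and returns one "element=value" entry per text element.
    static List<String> extract(String xml) throws Exception {
        TrimmedTextDemo handler = new TrimmedTextDemo();
        // Assumption: the JDK's default (non-namespace-aware) SAX parser.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order>\n  <item>\n    <name>Soccer Ball</name>\n"
                   + "    <price>15.00</price>\n    <quantity>5</quantity>\n"
                   + "  </item>\n</order>";
        for (String line : extract(xml)) {
            System.out.println(line);
        }
    }
}
```

Because the test is made on the trimmed buffer, whitespace-only data is dropped whether it consists of newlines, tabs or the spaces used for indentation in this example.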
Well, once again I am out of time. I hope this has helped some of you get a bit more familiar with XML. Next week we will
continue our coverage of XML by looking at how to use SAX to parse more complex XML documents. Until then, happy programming!