[Java] Choosing a JAXP XPath Implementation  Original Posts, Software Technology, Computers and Networking
Posted by 邢红瑞 on 2007/4/8 17:33:16
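XPath works on a tree model such as DOM, not on SAX events. For XML parsing, Java developers commonly use JDOM (a Java-centric alternative to DOM), or dom4j, which is fast and is what Hibernate uses, or Sun's JAXP. This post covers only JAXP. The dedicated XPath library Jaxen is also a good choice; Apache CXF uses it.

The XPathFactory implementation is declared in META-INF\services\javax.xml.xpath.XPathFactory. It can also be selected programmatically. The documentation for newInstance(String uri) says that it gets a new XPathFactory instance using the specified object model. To find the XPathFactory object, the method looks in the following places, in this order, where "class loader" means the context class loader:

1. If the system property DEFAULT_PROPERTY_NAME + ":uri" is present (where uri is the parameter to this method), its value is read as a class name. The method tries to create a new instance of that class using the class loader and returns it if creation succeeds.
2. ${java.home}/lib/jaxp.properties is read and the value associated with the key that is the system property above is looked up. If it is present, the value is processed as above.
3. The class loader is asked for service provider provider-configuration files matching javax.xml.xpath.XPathFactory in the resource directory META-INF/services. See the JAR File Specification for the file format and parsing rules. Each potential service provider must implement isObjectModelSupported(String objectModel); the first service provider found in class loader order that supports the specified object model is returned.
4. The platform default XPathFactory is located in a platform-specific way. There must be a platform default XPathFactory for the W3C DOM, i.e. DEFAULT_OBJECT_MODEL_URI.

If all of these fail, an XPathFactoryConfigurationException is thrown.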
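The outcome of this lookup can be observed from code. The following is a minimal sketch, not part of the original post: the class name XPathFactoryLookup and the example.org URI are made up for illustration, and the implementation class printed for the W3C DOM depends on the JDK and classpath in use.

```java
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFactoryConfigurationException;

class XPathFactoryLookup {

    /** Returns the class name of the factory located for the given object model URI. */
    static String implementationFor(String objectModelUri)
            throws XPathFactoryConfigurationException {
        XPathFactory factory = XPathFactory.newInstance(objectModelUri);
        return factory.getClass().getName();
    }

    /** Returns true if the lookup finds a factory for the URI, false if it fails. */
    static boolean isAvailable(String objectModelUri) {
        try {
            XPathFactory.newInstance(objectModelUri);
            return true;
        } catch (XPathFactoryConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String dom = XPathFactory.DEFAULT_OBJECT_MODEL_URI;
        System.out.println(dom + " -> " + implementationFor(dom));

        // Hypothetical URI: no provider is expected to support it.
        String unknown = "http://example.org/no-such-object-model";
        System.out.println(unknown + " available: " + isAvailable(unknown));
    }
}
```

Running it with and without an extra XPath provider jar on the classpath is a quick way to see which implementation actually wins.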
Troubleshooting tip:
See Properties.load(java.io.InputStream) for exactly how a property file is parsed. In particular, colons ':' need to be escaped in a property file, so make sure the URIs are properly escaped in it. For example:

    http\://java.sun.com/jaxp/xpath/dom=org.acme.DomXPathFactory

Parameters: uri - identifies the underlying object model. The specification only defines the URI DEFAULT_OBJECT_MODEL_URI, http://java.sun.com/jaxp/xpath/dom, for the W3C DOM, the org.w3c.dom package. Implementations are free to introduce other URIs for other object models.
Returns: an instance of XPathFactory.
Throws:
    XPathFactoryConfigurationException - if the specified object model is unavailable.
    NullPointerException - if uri is null.
    IllegalArgumentException - if uri is null or uri.length() == 0.

For example, the following selects the JDOM implementation:

    XPathFactory factory = XPathFactory.newInstance("http://jdom.org/jaxp/xpath/jdom");

An XPath evaluation generally returns one of STRING, NUMBER, BOOLEAN, NODE, or NODESET. Returning a string:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    String result = xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")));
    System.out.println(result);

The attribute can also be handled as a node:

    Attr result = (Attr) xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NODE);
    System.out.println(result.getValue());

Returning a NUMBER. Note that the result is a Double, which can be converted to an int:

    Double result = (Double) xPath.evaluate("/schedule/@seriesId",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NUMBER);
    System.out.println(result.intValue());

The most commonly used return type is NODESET, which is an org.w3c.dom.NodeList object:

    NodeList shows = (NodeList) xPath.evaluate("/schedule/show",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NODESET);
    System.out.println("Document has " + shows.getLength() + " shows.");
    for (int i = 0; i < shows.getLength(); i++) {
        Element show = (Element) shows.item(i);
        // Relative expressions are evaluated with the show element as context node.
        String guestName = xPath.evaluate("guest/name/text()", show);
        String guestCredit = xPath.evaluate("guest/credit/text()", show);
        System.out.println(show.getAttribute("weekday") + ", "
                + show.getAttribute("date") + " - " + guestName
                + " (" + guestCredit + ")");
    }

JAXP's XPath implementation can work with namespaces. Saxon apparently does not support namespaces, though I have not bothered to verify this; if you know, please tell me. Saxon's XPath is also noticeably slower than JAXP's, perhaps because Saxon supports XPath 2 and has more features. Unless you need XPath 2 or XQuery, I do not recommend Saxon.

XPath supports variables. First implement the XPathVariableResolver interface:

import java.util.HashMap;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPathVariableResolver;

public class MapVariableResolver implements XPathVariableResolver {
    // Variable values keyed by qualified name.
    private HashMap<QName, Object> variables = new HashMap<QName, Object>();

    public void addVariable(String namespaceURI, String localName, Object value) {
        addVariable(new QName(namespaceURI, localName), value);
    }

    public void addVariable(QName name, Object value) {
        variables.put(name, value);
    }

    // Called by the XPath engine whenever it meets $name in an expression.
    public Object resolveVariable(QName name) {
        return variables.get(name);
    }
}

A class that uses the resolver:

import java.io.File;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class GuestManager {
    private Document document;
    private XPathExpression expression;
    private MapVariableResolver resolver = new MapVariableResolver();
    private SimpleDateFormat xmlDateFormat = new SimpleDateFormat("MM.dd.yy");

    public GuestManager(String fileName) throws ParserConfigurationException,
            SAXException, IOException, XPathExpressionException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = dbf.newDocumentBuilder();
        document = builder.parse(new File(fileName));

        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();
        // The resolver must be set before compile() so that $date can be resolved.
        xPath.setXPathVariableResolver(resolver);
        expression = xPath.compile("/schedule/show[@date=$date]/guest");
    }

    public synchronized Element getGuest(Date guestDate)
            throws XPathExpressionException {
        String formattedDate = xmlDateFormat.format(guestDate);
        resolver.addVariable(null, "date", formattedDate);
        return (Element) expression.evaluate(document, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        GuestManager gm = new GuestManager("tds.xml");
        // June 14, 2006. The deprecated Date(int, int, int) constructor counts
        // years from 1900, so a calendar is used instead.
        Date date = new java.util.GregorianCalendar(
                2006, java.util.Calendar.JUNE, 14).getTime();
        Element guest = gm.getGuest(date);
        System.out.println(guest.getElementsByTagName("name").item(0)
                .getTextContent());
    }
}

getGuest() must be synchronized because XPathExpression is not thread-safe, and neither is the XPathVariableResolver used here. If several threads call addVariable at the same time, the result is unpredictable.

XPath also supports functions, but you cannot override XPath's built-in functions. The function itself implements XPathFunction, and a resolver class that implements the XPathFunctionResolver interface returns it from its resolveFunction() method.

import java.io.FileReader;
import java.util.List;
import javax.xml.namespace.QName;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import javax.xml.xpath.XPathFunctionResolver;

import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SampleFunction implements XPathFunction {

    public Object evaluate(List args) throws XPathFunctionException {
        if (args.size() != 1)
            throw new XPathFunctionException("I need exactly one argument");

        // args is a single guest node
        NodeList guestNodes = (NodeList) args.get(0);
        Element guest = (Element) guestNodes.item(0);
        NodeList nameNodes =
                guest.getElementsByTagNameNS("uri:comedy:guest", "name");
        NodeList creditNodes =
                guest.getElementsByTagNameNS("uri:comedy:guest", "credit");

        return evaluate(nameNodes, creditNodes);
    }

    private String evaluate(NodeList nameNodes, NodeList creditNodes) {
        return "I hope " + nameNodes.item(0).getTextContent()
                + " makes a good joke about being "
                + creditNodes.item(0).getTextContent();
    }
}

class SampleFunctionResolver implements XPathFunctionResolver {

    // Returns the function only for g:joke with exactly one argument.
    public XPathFunction resolveFunction(QName functionName, int arity) {
        if ("uri:comedy:guest".equals(functionName.getNamespaceURI())
                && "joke".equals(functionName.getLocalPart())
                && (arity == 1)) {
            return new SampleFunction();
        } else
            return null;
    }
}
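
The FunctionExample class that follows uses a SimpleNamespaceContext, which is not a JDK class and is not defined in this post. The block below is a minimal sketch of what such a helper might look like, assuming it only needs to map prefixes to URIs through javax.xml.namespace.NamespaceContext; it is not the author's class. The NamespaceContextDemo class and its inline XML are hypothetical stand-ins, since the content of tds_ns.xml is not shown.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<String, String>();

    public void addNamespace(String prefix, String namespaceURI) {
        prefixToUri.put(prefix, namespaceURI);
    }

    // The XPath engine calls this to resolve prefixes such as "s" or "g".
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        List<String> prefixes = new ArrayList<String>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                prefixes.add(entry.getKey());
            }
        }
        return prefixes.iterator();
    }
}

class NamespaceContextDemo {
    public static void main(String[] args) throws Exception {
        // Inline stand-in for a namespaced schedule document.
        String xml = "<s:schedule xmlns:s='uri:comedy:schedule'>"
                + "<s:show weekday='Monday'/></s:schedule>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for prefixed XPath steps to match
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
        nsContext.addNamespace("s", "uri:comedy:schedule");
        XPath xPath = XPathFactory.newInstance().newXPath();
        xPath.setNamespaceContext(nsContext);
        System.out.println(xPath.evaluate("/s:schedule/s:show/@weekday", doc));
    }
}
```

The sketch skips the reserved xml and xmlns prefixes that the full NamespaceContext contract describes, which is usually enough for XPath evaluation because the engine only asks for the URI of a prefix.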
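public class FunctionExample {

    public static void main(String[] args) throws Exception {
        XPathFactory factory = XPathFactory.newInstance();
        XPath xPath = factory.newXPath();
        // SimpleNamespaceContext is not a JDK class: it is a helper that
        // implements javax.xml.namespace.NamespaceContext.
        SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
        xPath.setNamespaceContext(nsContext);
        nsContext.addNamespace("s", "uri:comedy:schedule");
        nsContext.addNamespace("g", "uri:comedy:guest");
        xPath.setXPathFunctionResolver(new SampleFunctionResolver());

        NodeList shows = (NodeList) xPath.evaluate("/s:schedule/s:show",
                new InputSource(new FileReader("tds_ns.xml")),
                XPathConstants.NODESET);
        for (int i = 0; i < shows.getLength(); i++) {
            Element show = (Element) shows.item(i);
            // g:joke is resolved through SampleFunctionResolver.
            String guestJoke = xPath.evaluate("g:joke(g:guest)", show);
            System.out.println(show.getAttribute("weekday") + " - " + guestJoke);
        }
    }
}

JAXP's XPath is not only cumbersome to use, most of it is also not thread-safe.

The XPathFactory class is not thread-safe. In other words, it is the application's responsibility to ensure that at most one thread is using an XPathFactory object at any given moment. Implementations are encouraged to mark methods as synchronized to protect themselves from broken clients.

XPathFactory is not reentrant. While one of the newInstance methods is being invoked, applications may not attempt to recursively invoke a newInstance method, even from the same thread.

An XPathExpression is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPathExpression object is not used from more than one thread at any given time, and that while the evaluate method is invoked, applications do not recursively call the evaluate method.

An XPath object is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPath object is not used from more than one thread at any given time, and that while the evaluate method is invoked, applications do not recursively call the evaluate method.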
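
One common way to live with these rules, other than synchronizing every call, is to give each thread its own objects. The following is a minimal sketch under that assumption and not something the original post proposes: the class PerThreadXPath, its method names, and the inline XML are invented for illustration, and it requires Java 8 or later for the lambdas. Each call also parses its own copy of the XML, so no DOM tree is shared between threads either.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

import org.xml.sax.InputSource;

class PerThreadXPath {

    // Each thread lazily creates its own factory, XPath and compiled expression,
    // so no JAXP XPath object is shared between threads.
    private static final ThreadLocal<XPathExpression> NAME_EXPR =
            ThreadLocal.withInitial(() -> {
                try {
                    return XPathFactory.newInstance().newXPath()
                            .compile("/schedule/@name");
                } catch (XPathExpressionException e) {
                    throw new IllegalStateException(e);
                }
            });

    /** Evaluates the calling thread's expression against a freshly parsed copy of the XML. */
    static String scheduleName(String xml) {
        try {
            return NAME_EXPR.get().evaluate(new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Runs the same query as several tasks on a thread pool and collects the results. */
    static List<String> queryConcurrently(String xml, int threads, int tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Callable<String> task = () -> scheduleName(xml);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String xml = "<schedule name='demo'><show/></schedule>";
        System.out.println(queryConcurrently(xml, 4, 16));
    }
}
```

The trade-off is memory for contention: every pool thread keeps one compiled expression alive, but no thread ever waits on a lock to evaluate it.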
In fact, Sun's JDK bundles xalan's XPath implementation, XPathAPI. This class is unaffected even when the Saxon jar is added to the classpath, and it is convenient and safe to use. Its only drawback is that it is slow, because every call creates a new XPathContext, a new DTMManager, a new DTM, and many other objects. To make it faster, precompile expressions with the lower-level API instead of using the static methods; CachedXPathAPI is a good choice.

But that brings a new problem, because this implementation is flawed: the more nodes the XML has, the slower it gets. For example, with 2000 nodes each query takes about 10 ms at the start and more than 1 s by the end, and the implementation leaks memory. Open xalan's XPathContext.java and you will laugh out loud:

    protected DTMManager m_dtmManager = DTMManager.newInstance(
            org.apache.xpath.objects.XMLStringFactoryImpl.getFactory());

The author was probably a VC programmer. The main code is here:

    public int getDTMHandleFromNode(org.w3c.dom.Node node) {
        return m_dtmManager.getDTMHandleFromNode(node);
    }

DTMManager is an abstract class:

    public abstract int getDTMHandleFromNode(org.w3c.dom.Node node);

The concrete implementation is in DOM2DTM.java:

    /**
     * Get the handle from a Node.
     * <p>%OPT% This will be pretty slow.</p>
     *
     * <p>%OPT% An XPath-like search (walk up DOM to root, tracking path;
     * walk down DTM reconstructing path) might be considerably faster
     * on later nodes in large documents. That might also imply improving
     * this call to handle nodes which would be in this DTM but
     * have not yet been built, which might or might not be a Good Thing.</p>
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    private int getHandleFromNode(Node node) {
        if (null != node) {
            int len = m_nodes.size();
            boolean isMore;
            int i = 0;
            do {
                for (; i < len; i++) {
                    if (m_nodes.elementAt(i) == node)
                        return makeNodeHandle(i);
                }

                isMore = nextNode();
                len = m_nodes.size();
            } while (isMore || i < len);
        }
        return DTM.NULL;
    }

    /** Get the handle from a Node. This is a more robust version of
     * getHandleFromNode, intended to be usable by the public.
     *
     * <p>%OPT% This will be pretty slow.</p>
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    public int getHandleOfNode(Node node) {
        if (null != node) {
            // Is Node actually within the same document? If not, don't search!
            // This would be easier if m_root was always the Document node, but
            // we decided to allow wrapping a DTM around a subtree.
            if ((m_root == node) ||
                (m_root.getNodeType() == DOCUMENT_NODE &&
                 m_root == node.getOwnerDocument()) ||
                (m_root.getNodeType() != DOCUMENT_NODE &&
                 m_root.getOwnerDocument() == node.getOwnerDocument())
               ) {
                // If node _is_ in m_root's tree, find its handle
                //
                // %OPT% This check may be improved significantly when DOM
                // Level 3 nodeKey and relative-order tests become
                // available!
                for (Node cursor = node;
                     cursor != null;
                     cursor = (cursor.getNodeType() != ATTRIBUTE_NODE)
                         ? cursor.getParentNode()
                         : ((org.w3c.dom.Attr) cursor).getOwnerElement()) {
                    if (cursor == m_root)
                        // We know this node; find its handle.
                        return getHandleFromNode(node);
                } // for ancestors of node
            } // if node and m_root in same Document
        } // if node!=null

        return DTM.NULL;
    }

The problem is here. getHandleFromNode scans this vector from index 0 by object identity, so the cost of a lookup grows with the node's position in the document:

    /** The node objects. The instance part of the handle indexes
     * directly into this vector. Each DTM node may actually be
     * composed of several DOM nodes (for example, if logically-adjacent
     * Text/CDATASection nodes in the DOM have been coalesced into a
     * single DTM Text node); this table points only to the first in
     * that sequence. */
    protected Vector m_nodes = new Vector();

If you do not believe it, run this code:

import com.sun.org.apache.xml.internal.utils.PrefixResolverDefault;
import com.sun.org.apache.xpath.internal.XPath;
import com.sun.org.apache.xpath.internal.XPathContext;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilderFactory;

// Note: the com.sun.org.apache.* classes are the JDK's internal copy of xalan.
// On JDK 9 and later they are not exported by the java.xml module by default.
public class TestXpath {
    public static void main(String[] argv) throws Exception {
        int numChilds = 100000 + 1;

        System.out.println("Building a document with " + numChilds + " children");
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        for (int i = 0; i < numChilds; i++) {
            Element child = doc.createElement("child");
            root.appendChild(child);
            Element subChild = doc.createElement("sub-child");
            child.appendChild(subChild);
            Element subSubChild = doc.createElement("sub-sub-child");
            subChild.appendChild(subSubChild);
            subSubChild.setAttribute("title", "title" + i);
        }

        XPathContext xpathSupport = new XPathContext();
        PrefixResolverDefault prefixResolver = new PrefixResolverDefault(doc);
        XPath titleXpath = new XPath("sub-child/sub-sub-child/@title", null,
                prefixResolver, XPath.SELECT, null);
        Runtime r = Runtime.getRuntime();

        System.out.println("Evaluating XPath for each of the " + numChilds + " children");
        NodeList nodeList = root.getChildNodes();
        int size = nodeList.getLength();
        for (int i = 0; i < size; i++) {
            // Time only the node-to-handle lookup for each child.
            long start = System.currentTimeMillis();
            Element child = (Element) nodeList.item(i);
            int ctxtNode = xpathSupport.getDTMHandleFromNode(child);
            long duration = System.currentTimeMillis() - start;
            if (i < 10 || (i % (numChilds / 10)) == 0)
                System.out.println("child #" + i + "\t took " + duration + " ms."
                        + "\tfreeMemory: " + r.freeMemory()
                        + "\ttotalMemory: " + r.totalMemory());
            else if (i == 10)
                System.out.println("printing some selected children only from now on...");
        }
    }
}
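
The growth in lookup time can be reasoned about without xalan at all. The following toy model is a sketch and not xalan's code: it counts the identity comparisons that a linear scan like the one over m_nodes would need when every node is looked up once in document order. The class HandleLookupCost and the IdentityHashMap index are hypothetical, the latter only to show what a constant-time lookup would look like. The model covers time only and says nothing about the memory behaviour described above.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class HandleLookupCost {

    /** Scans the list by object identity and returns how many comparisons were needed. */
    static long comparisonsForLinearScan(List<Object> nodes, Object target) {
        long comparisons = 0;
        for (Object candidate : nodes) {
            comparisons++;
            if (candidate == target) {
                break;
            }
        }
        return comparisons;
    }

    /** Total comparisons when each of nodeCount nodes is looked up once, in order. */
    static long totalLinearComparisons(int nodeCount) {
        List<Object> nodes = new ArrayList<Object>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new Object());
        }
        long total = 0;
        for (Object node : nodes) {
            total += comparisonsForLinearScan(nodes, node);
        }
        return total;
    }

    /** Hypothetical alternative: an identity-keyed index answers each lookup directly. */
    static Map<Object, Integer> buildIdentityIndex(List<Object> nodes) {
        Map<Object, Integer> index = new IdentityHashMap<Object, Integer>();
        for (int i = 0; i < nodes.size(); i++) {
            index.put(nodes.get(i), i);
        }
        return index;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 2000, 4000}) {
            System.out.println(n + " nodes -> "
                    + totalLinearComparisons(n) + " comparisons");
        }
    }
}
```

For n nodes the total is n(n+1)/2 comparisons, so doubling the document size roughly quadruples the work, which is consistent with later lookups in a large document being much slower than early ones.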