Xing Hongrui's blog -- Choosing a JAXP XPath implementation

[Java language] Choosing a JAXP XPath implementation

Posted by Xing Hongrui on 2007/4/8 17:33:16

XPath operates on a DOM-style tree, not on a SAX event stream. For XML handling in Java people usually pick JDOM (roughly "Java + DOM"), dom4j (fast, and the library Hibernate uses), or Sun's JAXP. This post only covers JAXP. The dedicated XPath library Jaxen is also a good choice; Apache CXF uses it.

The XPathFactory implementation is declared in META-INF/services/javax.xml.xpath.XPathFactory. It can also be selected programmatically. The Javadoc for newInstance(String uri) reads:

Get a new XPathFactory instance using the specified object model. To find a XPathFactory object, this method looks in the following places in the following order, where "the class loader" refers to the context class loader:

1. If the system property DEFAULT_PROPERTY_NAME + ":uri" is present, where uri is the parameter to this method, then its value is read as a class name. The method will try to create a new instance of this class by using the class loader, and returns it if it is successfully created.
2. ${java.home}/lib/jaxp.properties is read and the value associated with the key being the system property above is looked for. If present, the value is processed just like above.
3. The class loader is asked for service provider provider-configuration files matching javax.xml.xpath.XPathFactory in the resource directory META-INF/services. See the JAR File Specification for file format and parsing rules. Each potential service provider is required to implement the method isObjectModelSupported(String objectModel). The first service provider found in class loader order that supports the specified object model is returned.
4. The platform default XPathFactory is located in a platform-specific way. There must be a platform default XPathFactory for the W3C DOM, i.e. DEFAULT_OBJECT_MODEL_URI.

If everything fails, an XPathFactoryConfigurationException will be thrown.

Tip for troubleshooting: see Properties.load(java.io.InputStream) for exactly how a property file is parsed. In particular, colons ':' need to be escaped in a property file, so make sure the URIs are properly escaped in it. For example:

    http\://java.sun.com/jaxp/xpath/dom=org.acme.DomXPathFactory

Parameters: uri - identifies the underlying object model. The specification only defines the URI DEFAULT_OBJECT_MODEL_URI, http://java.sun.com/jaxp/xpath/dom, for the W3C DOM (the org.w3c.dom package); implementations are free to introduce other URIs for other object models.
Returns: an instance of XPathFactory.
Throws: XPathFactoryConfigurationException - if the specified object model is unavailable. NullPointerException - if uri is null. IllegalArgumentException - if uri is null or uri.length() == 0.

For example, this selects the JDOM implementation:

    XPathFactory factory = XPathFactory.newInstance("http://jdom.org/jaxp/xpath/jdom");

An XPath evaluation returns one of STRING, NUMBER, BOOLEAN, NODE or NODESET. Returning a string:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    String result = xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")));
    System.out.println(result);

You can also get hold of the attribute node itself:

    Attr result = (Attr) xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NODE);
    System.out.println(result.getValue());

Returning NUMBER. Note that the result is a Double, which you can convert to an int:

    Double result = (Double) xPath.evaluate("/schedule/@seriesId",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NUMBER);
    System.out.println(result.intValue());

The most common case is NODESET, which comes back as an org.w3c.dom.NodeList:

    NodeList shows = (NodeList) xPath.evaluate("/schedule/show",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NODESET);
    System.out.println("Document has " + shows.getLength() + " shows.");
    for (int i = 0; i < shows.getLength(); i++) {
        Element show = (Element) shows.item(i);
        // Relative expressions are evaluated against the current show element.
        String guestName = xPath.evaluate("guest/name/text()", show);
        String guestCredit = xPath.evaluate("guest/credit/text()", show);
        System.out.println(show.getAttribute("weekday") + ", "
                + show.getAttribute("date") + " - " + guestName
                + " (" + guestCredit + ")");
    }

JAXP's XPath implementation supports namespaces. My impression is that Saxon does not, but I have not bothered to verify this; if you know, please tell me. Saxon's XPath is also noticeably slower than JAXP's, perhaps because it supports XPath 2 and therefore has more features. Unless you need XPath 2 or XQuery, I do not recommend Saxon.

XPath supports variables. First implement the XPathVariableResolver interface:

    import java.util.HashMap;
    import java.util.Map;

    import javax.xml.namespace.QName;
    import javax.xml.xpath.XPathVariableResolver;

    public class MapVariableResolver implements XPathVariableResolver {
        // Variable values keyed by qualified name.
        private final Map<QName, Object> variables = new HashMap<QName, Object>();

        public void addVariable(String namespaceURI, String localName, Object value) {
            addVariable(new QName(namespaceURI, localName), value);
        }

        public void addVariable(QName name, Object value) {
            variables.put(name, value);
        }

        // Called by the XPath engine whenever an expression references $name.
        public Object resolveVariable(QName name) {
            return variables.get(name);
        }
    }

Then use it with a precompiled expression:

    import java.io.File;
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Calendar;
    import java.util.Date;
    import java.util.GregorianCalendar;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.xml.sax.SAXException;

    public class GuestManager {
        private Document document;
        private XPathExpression expression;
        private MapVariableResolver resolver = new MapVariableResolver();
        private SimpleDateFormat xmlDateFormat = new SimpleDateFormat("MM.dd.yy");

        public GuestManager(String fileName) throws ParserConfigurationException,
                SAXException, IOException, XPathExpressionException {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            document = builder.parse(new File(fileName));

            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            // The resolver must be set before compile() so that $date can be resolved later.
            xPath.setXPathVariableResolver(resolver);
            expression = xPath.compile("/schedule/show[@date=$date]/guest");
        }

        public synchronized Element getGuest(Date guestDate)
                throws XPathExpressionException {
            String formattedDate = xmlDateFormat.format(guestDate);
            resolver.addVariable(null, "date", formattedDate);
            return (Element) expression.evaluate(document, XPathConstants.NODE);
        }

        public static void main(String[] args) throws Exception {
            GuestManager gm = new GuestManager("tds.xml");
            // Calendar months are zero-based. The deprecated Date(int, int, int)
            // constructor would also add 1900 to the year, so it is avoided here.
            Date date = new GregorianCalendar(2006, Calendar.JUNE, 14).getTime();
            Element guest = gm.getGuest(date);
            System.out.println(guest.getElementsByTagName("name").item(0)
                    .getTextContent());
        }
    }

getGuest() has to be synchronized because XPathExpression is not thread-safe, and neither is MapVariableResolver: if several threads call addVariable at the same time, the result is unpredictable.

XPath also supports custom functions, although you cannot override the built-in XPath functions. The function itself implements XPathFunction, and you make it available by implementing the XPathFunctionResolver interface and its resolveFunction() method. Note that SimpleNamespaceContext in the example below is not a JDK class; it is a helper that implements javax.xml.namespace.NamespaceContext and is not listed here.

    import java.io.FileReader;
    import java.util.List;

    import javax.xml.namespace.QName;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import javax.xml.xpath.XPathFunction;
    import javax.xml.xpath.XPathFunctionException;
    import javax.xml.xpath.XPathFunctionResolver;

    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    class SampleFunction implements XPathFunction {
        public Object evaluate(List args) throws XPathFunctionException {
            if (args.size() != 1)
                throw new XPathFunctionException("I need exactly one argument");

            // The single argument is a node-set holding one guest element.
            NodeList guestNodes = (NodeList) args.get(0);
            Element guest = (Element) guestNodes.item(0);
            NodeList nameNodes = guest.getElementsByTagNameNS("uri:comedy:guest", "name");
            NodeList creditNodes = guest.getElementsByTagNameNS("uri:comedy:guest", "credit");
            return evaluate(nameNodes, creditNodes);
        }

        private String evaluate(NodeList nameNodes, NodeList creditNodes) {
            return "I hope " + nameNodes.item(0).getTextContent()
                    + " makes a good joke about being "
                    + creditNodes.item(0).getTextContent();
        }
    }

    class SampleFunctionResolver implements XPathFunctionResolver {
        public XPathFunction resolveFunction(QName functionName, int arity) {
            // Only g:joke() with exactly one argument is handled.
            if ("uri:comedy:guest".equals(functionName.getNamespaceURI())
                    && "joke".equals(functionName.getLocalPart())
                    && (arity == 1)) {
                return new SampleFunction();
            } else
                return null;
        }
    }

    public class FunctionExample {
        public static void main(String[] args) throws Exception {
            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();

            // SimpleNamespaceContext: helper class, not part of the JDK (see note above).
            SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
            xPath.setNamespaceContext(nsContext);
            nsContext.addNamespace("s", "uri:comedy:schedule");
            nsContext.addNamespace("g", "uri:comedy:guest");

            xPath.setXPathFunctionResolver(new SampleFunctionResolver());

            NodeList shows = (NodeList) xPath.evaluate("/s:schedule/s:show",
                    new InputSource(new FileReader("tds_ns.xml")),
                    XPathConstants.NODESET);
            for (int i = 0; i < shows.getLength(); i++) {
                Element show = (Element) shows.item(i);
                String guestJoke = xPath.evaluate("g:joke(g:guest)", show);
                System.out.println(show.getAttribute("weekday") + " - " + guestJoke);
            }
        }
    }

JAXP's XPath is not only cumbersome to use, hardly any of it is thread-safe. Quoting the Javadoc:

The XPathFactory class is not thread-safe. In other words, it is the application's responsibility to ensure that at most one thread is using a XPathFactory object at any given moment. Implementations are encouraged to mark methods as synchronized to protect themselves from broken clients. XPathFactory is not re-entrant. While one of the newInstance methods is being invoked, applications may not attempt to recursively invoke a newInstance method, even from the same thread.

An XPathExpression is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPathExpression object is not used from more than one thread at any given time, and while the evaluate method is invoked, applications may not recursively call the evaluate method.

An XPath object is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPath object is not used from more than one thread at any given time, and while the evaluate method is invoked, applications may not recursively call the evaluate method.

In fact Sun's JDK bundles Xalan's XPath implementation, XPathAPI. That class is unaffected even when the Saxon jar is added to the classpath, and it is convenient and safe to use. Its only drawback is that it is slow, because every call creates a new XPathContext, a new DTMManager, a new DTM, and many other objects. To make it faster you have to precompile expressions with the low-level API instead of calling the static methods; CachedXPathAPI is a good choice.

That brings a new problem, though, because this implementation is flawed: the more nodes the XML document has, the slower it gets. With 2000 nodes, for example, each query takes about 10 ms at the start and more than 1 s by the end, and the implementation leaks memory. Open Xalan's XPathContext.java and you will laugh your head off:

    protected DTMManager m_dtmManager = DTMManager.newInstance(
            org.apache.xpath.objects.XMLStringFactoryImpl.getFactory());

The author must have been a VC programmer. The key code is here:

    public int getDTMHandleFromNode(org.w3c.dom.Node node) {
        return m_dtmManager.getDTMHandleFromNode(node);
    }

DTMManager is an abstract class:

    public abstract int getDTMHandleFromNode(org.w3c.dom.Node node);

The concrete implementation is in DOM2DTM.java:

    /**
     * Get the handle from a Node.
     * %OPT% This will be pretty slow.
     *
     * %OPT% An XPath-like search (walk up DOM to root, tracking path;
     * walk down DTM reconstructing path) might be considerably faster
     * on later nodes in large documents. That might also imply improving
     * this call to handle nodes which would be in this DTM but
     * have not yet been built, which might or might not be a Good Thing.
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    private int getHandleFromNode(Node node) {
        if (null != node) {
            int len = m_nodes.size();
            boolean isMore;
            int i = 0;
            do {
                for (; i < len; i++) {
                    if (m_nodes.elementAt(i) == node)
                        return makeNodeHandle(i);
                }
                isMore = nextNode();
                len = m_nodes.size();
            } while (isMore || i < len);
        }
        return DTM.NULL;
    }

    /** Get the handle from a Node. This is a more robust version of
     * getHandleFromNode, intended to be usable by the public.
     *
     * %OPT% This will be pretty slow.
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    public int getHandleOfNode(Node node) {
        if (null != node) {
            // Is Node actually within the same document? If not, don't search!
            // This would be easier if m_root was always the Document node, but
            // we decided to allow wrapping a DTM around a subtree.
            if ((m_root == node)
                    || (m_root.getNodeType() == DOCUMENT_NODE
                        && m_root == node.getOwnerDocument())
                    || (m_root.getNodeType() != DOCUMENT_NODE
                        && m_root.getOwnerDocument() == node.getOwnerDocument())) {
                // If node _is_ in m_root's tree, find its handle
                //
                // %OPT% This check may be improved significantly when DOM
                // Level 3 nodeKey and relative-order tests become
                // available!
                for (Node cursor = node;
                        cursor != null;
                        cursor = (cursor.getNodeType() != ATTRIBUTE_NODE)
                                ? cursor.getParentNode()
                                : ((org.w3c.dom.Attr) cursor).getOwnerElement()) {
                    if (cursor == m_root)
                        // We know this node; find its handle.
                        return getHandleFromNode(node);
                } // for ancestors of node
            } // if node and m_root in same Document
        } // if node!=null
        return DTM.NULL;
    }

The problem is right here:

    /** The node objects. The instance part of the handle indexes
     * directly into this vector. Each DTM node may actually be
     * composed of several DOM nodes (for example, if logically-adjacent
     * Text/CDATASection nodes in the DOM have been coalesced into a
     * single DTM Text node); this table points only to the first in
     * that sequence.
     */
    protected Vector m_nodes = new Vector();

getHandleFromNode walks this Vector from the beginning and compares by object identity, so the later a node sits in the document, the longer its lookup takes. If you do not believe it, run this code:

    // JDK-internal copies of the Xalan classes; not a supported API.
    import com.sun.org.apache.xml.internal.utils.PrefixResolverDefault;
    import com.sun.org.apache.xpath.internal.XPath;
    import com.sun.org.apache.xpath.internal.XPathContext;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    import javax.xml.parsers.DocumentBuilderFactory;

    public class TestXpath {
        public static void main(String[] argv) throws Exception {
            int numChilds = 100000 + 1;
            System.out.println("Building a document with " + numChilds + " children");
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            for (int i = 0; i < numChilds; i++) {
                Element child = doc.createElement("child");
                root.appendChild(child);
                Element subChild = doc.createElement("sub-child");
                child.appendChild(subChild);
                Element subSubChild = doc.createElement("sub-sub-child");
                subChild.appendChild(subSubChild);
                subSubChild.setAttribute("title", "title" + i);
            }

            XPathContext xpathSupport = new XPathContext();
            PrefixResolverDefault prefixResolver = new PrefixResolverDefault(doc);
            XPath titleXpath = new XPath("sub-child/sub-sub-child/@title", null,
                    prefixResolver, XPath.SELECT, null);
            Runtime r = Runtime.getRuntime();

            System.out.println("Evaluating XPath for each of " + numChilds + " children");
            NodeList nodeList = root.getChildNodes();
            int size = nodeList.getLength();
            for (int i = 0; i < size; i++) {
                // Time only the DOM-node-to-DTM-handle lookup.
                long start = System.currentTimeMillis();
                Element child = (Element) nodeList.item(i);
                int ctxtNode = xpathSupport.getDTMHandleFromNode(child);
                long duration = System.currentTimeMillis() - start;
                if (i < 10 || (i % (numChilds / 10)) == 0)
                    System.out.println("child #" + i + "\t took " + duration + " ms."
                            + "\tfreeMemory: " + r.freeMemory()
                            + "\ttotalMemory: " + r.totalMemory());
                else if (i == 10)
                    System.out.println("printing only selected children from now on...");
            }
        }
    }
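
To make the cost of that Vector scan concrete, here is a small self-contained sketch. It does not use Xalan at all: the class HandleLookupSketch and its method names are made up for illustration, and plain Object instances stand in for DOM nodes. It counts the identity comparisons made by a getHandleFromNode-style linear scan and contrasts them with an IdentityHashMap index, which is just one possible alternative and not something Xalan does.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Xalan code.
class HandleLookupSketch {
    private final List<Object> nodes = new ArrayList<Object>();
    private final Map<Object, Integer> index = new IdentityHashMap<Object, Integer>();
    long comparisons = 0; // identity comparisons made by linearHandle

    // Registers a node; its handle is its position in the list.
    int add(Object node) {
        int handle = nodes.size();
        nodes.add(node);
        index.put(node, handle);
        return handle;
    }

    // Linear scan by object identity, in the style of getHandleFromNode.
    int linearHandle(Object node) {
        for (int i = 0; i < nodes.size(); i++) {
            comparisons++;
            if (nodes.get(i) == node) {
                return i;
            }
        }
        return -1;
    }

    // Hash lookup by object identity; no scan needed.
    int indexedHandle(Object node) {
        Integer handle = index.get(node);
        return handle == null ? -1 : handle;
    }

    // Looks up each of n nodes once and returns the total comparison count.
    static long totalLinearComparisons(int n) {
        HandleLookupSketch sketch = new HandleLookupSketch();
        List<Object> all = new ArrayList<Object>();
        for (int i = 0; i < n; i++) {
            Object node = new Object();
            all.add(node);
            sketch.add(node);
        }
        for (Object node : all) {
            if (sketch.linearHandle(node) != sketch.indexedHandle(node)) {
                throw new IllegalStateException("lookups disagree");
            }
        }
        return sketch.comparisons;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1000, 10000}) {
            System.out.println(n + " nodes: " + totalLinearComparisons(n)
                    + " comparisons for " + n + " lookups");
        }
    }
}
```

Looking up all n nodes once costs n(n+1)/2 comparisons with the scan, so ten times as many nodes means roughly a hundred times as much work. That is consistent with per-node lookups getting slower and slower toward the end of a large document.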
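
One more note on the custom function example: FunctionExample calls a SimpleNamespaceContext class that is neither part of the JDK nor listed in this post. Below is a minimal sketch of what such a helper might look like. It assumes the class only has to map prefixes to URIs through an addNamespace(prefix, uri) method, which is the one call the example makes; everything else is a guess at a reasonable implementation of javax.xml.namespace.NamespaceContext, and the reserved xml and xmlns prefixes are deliberately left out to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

// Hypothetical helper: a prefix-to-URI map exposed as a NamespaceContext.
// Simplification: the reserved "xml" and "xmlns" prefixes are not handled.
class SimpleNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<String, String>();

    // Binds a prefix used in XPath expressions to a namespace URI.
    public void addNamespace(String prefix, String namespaceURI) {
        prefixToUri.put(prefix, namespaceURI);
    }

    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty string.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) {
            throw new IllegalArgumentException("namespaceURI must not be null");
        }
        // Collect every prefix currently bound to this URI.
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, String> entry : prefixToUri.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) {
                result.add(entry.getKey());
            }
        }
        return result.iterator();
    }
}
```

With a class like this in the same package, the nsContext.addNamespace("s", "uri:comedy:schedule") and xPath.setNamespaceContext(nsContext) calls in FunctionExample should compile as written.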
