[Java] Choosing a JAXP XPath implementation
Original content, Software technology, Computers and networking

Posted by 邢红瑞 on 2007/4/8 17:33:16

XPath works on an in-memory tree such as DOM, not on SAX event streams. The XML libraries most often used from Java are JDOM (roughly "Java + DOM", although it has its own object model rather than the W3C DOM), Dom4J (fast, and used by Hibernate), and Sun's JAXP. This post covers only JAXP. The dedicated XPath library Jaxen is also a good choice; Apache CXF uses it.

The XPathFactory implementation is declared in META-INF/services/javax.xml.xpath.XPathFactory. It can also be chosen programmatically. The Javadoc for newInstance(String uri) describes the lookup as follows.

Gets a new XPathFactory instance that uses the specified object model. To find an XPathFactory object, this method looks in the following places, in this order, where "the class loader" means the context class loader:

1. If the system property DEFAULT_PROPERTY_NAME + ":uri" is present (where uri is the parameter to this method), its value is read as a class name. The method tries to create a new instance of that class using the class loader and returns it if creation succeeds.
2. ${java.home}/lib/jaxp.properties is read and the value associated with the same key as the system property is looked up. If present, the value is processed as above.
3. The class loader is asked for service provider configuration files matching javax.xml.xpath.XPathFactory in the resource directory META-INF/services. See the JAR File Specification for the file format and parsing rules. Every candidate service provider must implement isObjectModelSupported(String objectModel); the first provider in class loader order that supports the specified object model is returned.
4. The platform default XPathFactory is located in a platform-specific way. A platform default XPathFactory must exist for the W3C DOM, i.e. DEFAULT_OBJECT_MODEL_URI.

If all of these fail, an XPathFactoryConfigurationException is thrown.

Troubleshooting tip: see Properties.load(java.io.InputStream) for exactly how a property file is parsed. In particular, a colon ':' must be escaped in a property file, so make sure URIs are escaped correctly there. For example:

    http\://java.sun.com/jaxp/xpath/dom=org.acme.DomXPathFactory

Parameters: uri - identifies the underlying object model. The specification defines only the URI DEFAULT_OBJECT_MODEL_URI, http://java.sun.com/jaxp/xpath/dom, for the W3C DOM (the org.w3c.dom package). Implementations are free to introduce other URIs for other object models.
Returns: an instance of XPathFactory.
Throws: XPathFactoryConfigurationException if the specified object model is unavailable; NullPointerException if uri is null; IllegalArgumentException if uri is null or uri.length() == 0.

For example, the following asks for a JDOM-based implementation (which has to be on the classpath):

    XPathFactory factory =
            XPathFactory.newInstance("http://jdom.org/jaxp/xpath/jdom");

An XPath evaluation generally returns one of STRING, NUMBER, BOOLEAN, NODE, or NODESET. Returning a string:

    XPathFactory factory = XPathFactory.newInstance();
    XPath xPath = factory.newXPath();
    String result = xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")));
    System.out.println(result);

To work with the attribute node itself, ask for a NODE:

    Attr result = (Attr) xPath.evaluate("/schedule/@name",
            new InputSource(new FileReader("tds.xml")), XPathConstants.NODE);
    System.out.println(result.getValue());

Returning a NUMBER. Note that the result is a Double, which can be converted to an int:

    Double result = (Double) xPath.evaluate("/schedule/@seriesId",
            new InputSource(new FileReader("tds.xml")),
            XPathConstants.NUMBER);
    System.out.println(result.intValue());

The most common case is NODESET, which comes back as an org.w3c.dom.NodeList:

    NodeList shows = (NodeList) xPath.evaluate("/schedule/show",
            new InputSource(new FileReader("tds.xml")),
            XPathConstants.NODESET);
    System.out.println("Document has " + shows.getLength() + " shows.");
    for (int i = 0; i < shows.getLength(); i++) {
        Element show = (Element) shows.item(i);
        // Relative expressions are evaluated with the show element as context.
        String guestName = xPath.evaluate("guest/name/text()", show);
        String guestCredit = xPath.evaluate("guest/credit/text()", show);
        System.out.println(show.getAttribute("weekday") + ", "
                + show.getAttribute("date") + " - " + guestName + " ("
                + guestCredit + ")");
    }

JAXP's XPath implementation can work with namespaces. Saxon seems not to support namespaces here; I have not verified this in detail, so tell me if you know for certain. Saxon's XPath is also noticeably slower than JAXP's, possibly because it supports XPath 2 and therefore more features. Unless you need XPath 2 or XQuery, I do not recommend Saxon.

XPath supports variables. First implement the XPathVariableResolver interface:

    import java.util.HashMap;
    import java.util.Map;

    import javax.xml.namespace.QName;
    import javax.xml.xpath.XPathVariableResolver;

    public class MapVariableResolver implements XPathVariableResolver {

        // Variable values keyed by qualified name. Not thread-safe.
        private final Map<QName, Object> variables = new HashMap<QName, Object>();

        public void addVariable(String namespaceURI, String localName, Object value) {
            addVariable(new QName(namespaceURI, localName), value);
        }

        public void addVariable(QName name, Object value) {
            variables.put(name, value);
        }

        public Object resolveVariable(QName name) {
            return variables.get(name);
        }
    }

Then use it with a precompiled expression:

    import java.io.File;
    import java.io.IOException;
    import java.text.SimpleDateFormat;
    import java.util.Calendar;
    import java.util.Date;
    import java.util.GregorianCalendar;

    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathExpressionException;
    import javax.xml.xpath.XPathFactory;

    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.xml.sax.SAXException;

    public class GuestManager {
        private Document document;
        private XPathExpression expression;
        private MapVariableResolver resolver = new MapVariableResolver();
        private SimpleDateFormat xmlDateFormat = new SimpleDateFormat("MM.dd.yy");

        public GuestManager(String fileName) throws ParserConfigurationException,
                SAXException, IOException, XPathExpressionException {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = dbf.newDocumentBuilder();
            document = builder.parse(new File(fileName));

            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            // The resolver must be set before the expression is compiled.
            xPath.setXPathVariableResolver(resolver);
            expression = xPath.compile("/schedule/show[@date=$date]/guest");
        }

        public synchronized Element getGuest(Date guestDate)
                throws XPathExpressionException {
            String formattedDate = xmlDateFormat.format(guestDate);
            resolver.addVariable(null, "date", formattedDate);
            return (Element) expression.evaluate(document, XPathConstants.NODE);
        }

        public static void main(String[] args) throws Exception {
            GuestManager gm = new GuestManager("tds.xml");
            // The deprecated new Date(2006, 5, 14) counts years from 1900
            // (2006 would mean 3906), so build June 14, 2006 with a calendar.
            Date date = new GregorianCalendar(2006, Calendar.JUNE, 14).getTime();
            Element guest = gm.getGuest(date);
            System.out.println(guest.getElementsByTagName("name").item(0)
                    .getTextContent());
        }
    }

getGuest() has to be synchronized because XPathExpression is not thread-safe, and neither is this XPathVariableResolver implementation. If several threads call addVariable at the same time, the result is unpredictable.

XPath also supports functions, although you cannot override the built-in XPath functions. To add one, implement the XPathFunctionResolver interface and define its resolveFunction() method.

    import java.io.FileReader;
    import java.util.List;

    import javax.xml.namespace.QName;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import javax.xml.xpath.XPathFunction;
    import javax.xml.xpath.XPathFunctionException;
    import javax.xml.xpath.XPathFunctionResolver;

    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.InputSource;

    class SampleFunction implements XPathFunction {

        public Object evaluate(List args) throws XPathFunctionException {
            if (args.size() != 1)
                throw new XPathFunctionException("I need exactly one argument");

            // args is a single guest node
            NodeList guestNodes = (NodeList) args.get(0);
            Element guest = (Element) guestNodes.item(0);
            NodeList nameNodes = guest.getElementsByTagNameNS("uri:comedy:guest",
                    "name");
            NodeList creditNodes = guest.getElementsByTagNameNS("uri:comedy:guest",
                    "credit");

            return evaluate(nameNodes, creditNodes);
        }

        private String evaluate(NodeList nameNodes, NodeList creditNodes) {
            return "I hope " + nameNodes.item(0).getTextContent()
                    + " makes a good joke about being "
                    + creditNodes.item(0).getTextContent();
        }
    }

    class SampleFunctionResolver implements XPathFunctionResolver {

        public XPathFunction resolveFunction(QName functionName, int arity) {
            // Only g:joke with exactly one argument is handled.
            if ("uri:comedy:guest".equals(functionName.getNamespaceURI())
                    && "joke".equals(functionName.getLocalPart()) && (arity == 1)) {
                return new SampleFunction();
            } else
                return null;
        }
    }

    public class FunctionExample {

        public static void main(String[] args) throws Exception {
            XPathFactory factory = XPathFactory.newInstance();
            XPath xPath = factory.newXPath();
            // SimpleNamespaceContext: a user-supplied NamespaceContext
            // implementation (not a JDK class).
            SimpleNamespaceContext nsContext = new SimpleNamespaceContext();
            xPath.setNamespaceContext(nsContext);
            nsContext.addNamespace("s", "uri:comedy:schedule");
            nsContext.addNamespace("g", "uri:comedy:guest");
            xPath.setXPathFunctionResolver(new SampleFunctionResolver());

            NodeList shows = (NodeList) xPath.evaluate("/s:schedule/s:show",
                    new InputSource(new FileReader("tds_ns.xml")),
                    XPathConstants.NODESET);
            for (int i = 0; i < shows.getLength(); i++) {
                Element show = (Element) shows.item(i);
                String guestJoke = xPath.evaluate("g:joke(g:guest)", show);
                System.out.println(show.getAttribute("weekday") + " - " + guestJoke);
            }
        }
    }

JAXP's XPath API is not only cumbersome to use; hardly any of it is thread-safe. The Javadoc says:

The XPathFactory class is not thread-safe. In other words, it is the application's responsibility to ensure that at most one thread is using an XPathFactory object at any given moment. Implementations are encouraged to mark methods as synchronized to protect themselves from broken clients. XPathFactory is not re-entrant: while one of the newInstance methods is being invoked, applications may not attempt to recursively invoke a newInstance method, even from the same thread.

An XPathExpression is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPathExpression object is not used from more than one thread at any given time, and while the evaluate method is invoked, applications may not recursively call the evaluate method.

An XPath object is not thread-safe and not reentrant. In other words, it is the application's responsibility to make sure that one XPath object is not used from more than one thread at any given time, and while the evaluate method is invoked, applications may not recursively call the evaluate method.

Sun's JDK actually bundles Xalan's XPath implementation, XPathAPI. This class is unaffected even when the Saxon jar is added to the classpath, and it is convenient and safe to use. Its only drawback is that it is slow, because every call creates a new XPathContext, a new DTMManager, a new DTM, and many other objects. To speed it up, precompile with the lower-level API instead of calling the static methods; CachedXPathAPI is a good choice.

That brings a new problem, though, because this implementation is flawed: the more nodes the XML has, the slower it gets. With 2000 nodes, for example, each query takes about 10 ms at the start and more than 1 s by the end, and the implementation leaks memory. Open Xalan's XPathContext.java and you will laugh out loud:

    protected DTMManager m_dtmManager = DTMManager.newInstance(
            org.apache.xpath.objects.XMLStringFactoryImpl.getFactory());

The author must have been a VC programmer. The main code is here:

    public int getDTMHandleFromNode(org.w3c.dom.Node node)
    {
      return m_dtmManager.getDTMHandleFromNode(node);
    }

DTMManager is an abstract class:

    public abstract int getDTMHandleFromNode(org.w3c.dom.Node node);

The concrete implementation is in DOM2DTM.java:

    /**
     * Get the handle from a Node.
     * <p>%OPT% This will be pretty slow.</p>
     *
     * <p>%OPT% An XPath-like search (walk up DOM to root, tracking path;
     * walk down DTM reconstructing path) might be considerably faster
     * on later nodes in large documents. That might also imply improving
     * this call to handle nodes which would be in this DTM but
     * have not yet been built, which might or might not be a Good Thing.</p>
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    private int getHandleFromNode(Node node)
    {
      if (null != node)
      {
        int len = m_nodes.size();
        boolean isMore;
        int i = 0;
        do
        {
          for (; i < len; i++)
          {
            if (m_nodes.elementAt(i) == node)
              return makeNodeHandle(i);
          }

          isMore = nextNode();

          len = m_nodes.size();

        }
        while(isMore || i < len);
      }

      return DTM.NULL;
    }

    /** Get the handle from a Node. This is a more robust version of
     * getHandleFromNode, intended to be usable by the public.
     *
     * <p>%OPT% This will be pretty slow.</p>
     *
     * %REVIEW% This relies on being able to test node-identity via
     * object-identity. DTM2DOM proxying is a great example of a case where
     * that doesn't work. DOM Level 3 will provide the isSameNode() method
     * to fix that, but until then this is going to be flaky.
     *
     * @param node A node, which may be null.
     *
     * @return The node handle or <code>DTM.NULL</code>.
     */
    public int getHandleOfNode(Node node)
    {
      if (null != node)
      {
        // Is Node actually within the same document? If not, don't search!
        // This would be easier if m_root was always the Document node, but
        // we decided to allow wrapping a DTM around a subtree.
        if((m_root==node) ||
           (m_root.getNodeType()==DOCUMENT_NODE &&
            m_root==node.getOwnerDocument()) ||
           (m_root.getNodeType()!=DOCUMENT_NODE &&
            m_root.getOwnerDocument()==node.getOwnerDocument())
           )
          {
            // If node _is_ in m_root's tree, find its handle
            //
            // %OPT% This check may be improved significantly when DOM
            // Level 3 nodeKey and relative-order tests become
            // available!
            for(Node cursor=node;
                cursor!=null;
                cursor=
                  (cursor.getNodeType()!=ATTRIBUTE_NODE)
                  ? cursor.getParentNode()
                  : ((org.w3c.dom.Attr)cursor).getOwnerElement())
              {
                if(cursor==m_root)
                  // We know this node; find its handle.
                  return getHandleFromNode(node);
              } // for ancestors of node
          } // if node and m_root in same Document
      } // if node!=null

      return DTM.NULL;
    }

The problem is right here:

    /** The node objects.  The instance part of the handle indexes
     * directly into this vector.  Each DTM node may actually be
     * composed of several DOM nodes (for example, if logically-adjacent
     * Text/CDATASection nodes in the DOM have been coalesced into a
     * single DTM Text node); this table points only to the first in
     * that sequence. */
    protected Vector m_nodes = new Vector();

getHandleFromNode scans this Vector linearly from the beginning on every call, so the lookup cost grows with the node's position in the document.

If you don't believe it, run this code:

    import com.sun.org.apache.xml.internal.utils.PrefixResolverDefault;
    import com.sun.org.apache.xpath.internal.XPath;
    import com.sun.org.apache.xpath.internal.XPathContext;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    import javax.xml.parsers.DocumentBuilderFactory;

    public class TestXpath {
        public static void main(String[] argv) throws Exception {
            int numChilds = 100000 + 1;

            System.out.println("Building a document with " + numChilds + " children");
            Document doc =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("root");
            doc.appendChild(root);
            // Each child gets a sub-child/sub-sub-child chain with a title attribute.
            for (int i = 0; i < numChilds; i++) {
                Element child = doc.createElement("child");
                root.appendChild(child);
                Element subChild = doc.createElement("sub-child");
                child.appendChild(subChild);
                Element subSubChild = doc.createElement("sub-sub-child");
                subChild.appendChild(subSubChild);
                subSubChild.setAttribute("title", "title" + i);
            }

            XPathContext xpathSupport = new XPathContext();
            PrefixResolverDefault prefixResolver = new
                    PrefixResolverDefault(doc);
            XPath titleXpath = new XPath("sub-child/sub-sub-child/@title",
                    null, prefixResolver, XPath.SELECT, null);
            Runtime r = Runtime.getRuntime();

            System.out.println("Looking up the DTM handle for each of " + numChilds +
                    " children");
            NodeList nodeList = root.getChildNodes();
            int size = nodeList.getLength();
            for (int i = 0; i < size; i++) {
                // Time only the DOM node to DTM handle lookup.
                long start = System.currentTimeMillis();
                Element child = (Element) nodeList.item(i);
                int ctxtNode = xpathSupport.getDTMHandleFromNode(child);
                long duration = System.currentTimeMillis() - start;
                if (i < 10 || (i % (numChilds / 10)) == 0)
                    System.out.println("child #" + i + "\t took " +
                            duration + " ms." +
                            "\tfreeMemory: " + r.freeMemory() +
                            "\ttotalMemory: " + r.totalMemory());
                else if (i == 10)
                    System.out.println("printing some selected children only from now on...");
            }
        }
    }
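
A note on the FunctionExample earlier in this post: it relies on a SimpleNamespaceContext class that is never shown and is not part of the JDK. The following is only a minimal sketch of what such a class might look like, assuming a simple prefix-to-URI map is enough. The class NamespaceXPathDemo, the method guestName(), and the inline XML document are all hypothetical and exist only to make the sketch runnable.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class NamespaceXPathDemo {

    /** Hypothetical stand-in for the post's SimpleNamespaceContext. */
    static class SimpleNamespaceContext implements NamespaceContext {
        private final Map<String, String> prefixToUri = new HashMap<String, String>();

        public void addNamespace(String prefix, String uri) {
            prefixToUri.put(prefix, uri);
        }

        // Called by the XPath engine for every prefix in an expression.
        // This sketch does not special-case the reserved xml/xmlns prefixes.
        public String getNamespaceURI(String prefix) {
            if (prefix == null) {
                throw new IllegalArgumentException("prefix must not be null");
            }
            String uri = prefixToUri.get(prefix);
            return uri != null ? uri : XMLConstants.NULL_NS_URI;
        }

        public String getPrefix(String namespaceURI) {
            for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
                if (e.getValue().equals(namespaceURI)) {
                    return e.getKey();
                }
            }
            return null;
        }

        public Iterator<String> getPrefixes(String namespaceURI) {
            String prefix = getPrefix(namespaceURI);
            return prefix == null
                    ? Collections.<String>emptyList().iterator()
                    : Collections.singletonList(prefix).iterator();
        }
    }

    // Made-up document; it uses default namespaces, so the prefixes in the
    // XPath expression come only from the NamespaceContext.
    private static final String XML =
            "<schedule xmlns='uri:comedy:schedule'>"
            + "<show weekday='Monday'>"
            + "<guest xmlns='uri:comedy:guest'><name>Example Guest</name></guest>"
            + "</show></schedule>";

    public static String guestName() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // required for namespace-qualified XPath
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));

            SimpleNamespaceContext ns = new SimpleNamespaceContext();
            ns.addNamespace("s", "uri:comedy:schedule");
            ns.addNamespace("g", "uri:comedy:guest");

            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(ns);
            return xPath.evaluate("/s:schedule/s:show/g:guest/g:name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(guestName());
    }
}
```

The prefixes used in an XPath expression do not have to match the prefixes in the document; only the namespace URIs they resolve to matter, which is why the sketch can query a document that declares no prefixes at all.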

