dongdong's Camera -- [Repost] A Preliminary Summary of HTMLParser

[Repost] A Preliminary Summary of HTMLParser
Online Resources, Computers and Networking

Posted by dongdong on 2009/11/27 17:37:05

The book does not cover the use of HTMLParser in detail, so I need to read the documentation to understand it better. To start, I have compiled some of its methods here for easy lookup.

Interface Node:

Method                               Returns    Description
getChildren()                        NodeList   Get the children of this node.
getStartPosition()                   int        Get the starting position of the node.
getEndPosition()                     int
getFirstChild()                      Node
getLastChild()                       Node
getParent()                          Node       This will always return null when parsing with the Lexer.
getNextSibling()                     Node
getPreviousSibling()                 Node
getPage()                            Page       Get the page this node came from, that is, the page that supplied this node.
getText()                            String     Return the text of the node.
toHtml()                             String     Return the HTML for this node.
toHtml(boolean)                      String     Return the HTML for this node.
toPlainTextString() *                String     A string representation of the node.
toString()                           String     Return the string representation of the node.
collectInto(NodeList, NodeFilter) *  void       Collect this node and its child nodes into a list, provided the node satisfies the filtering criteria.
accept(NodeVisitor)                  void       Apply the visitor to this node.
setChildren(NodeList)                void
setStartPosition(int)                void
setEndPosition(int)                  void
setPage(Page)                        void
setParent(Node)                      void
setText(String)                      void
clone()                              Object     Allow cloning of nodes.

* toPlainTextString(): A string representation of the node. This is an important method: it allows a simple string transformation of a web page, regardless of the node. For a Text node this is obviously the textual contents itself. For a Remark node this is the remark contents (sic). For tags this is the text contents of its children (if any). Because multiple nodes are combined when presenting a page in a browser, this will not reflect what a user would see. Put simply, it returns all of the text inside the node: System.out.println(node.toPlainTextString());

* collectInto(NodeList, NodeFilter): stores every node that satisfies the NodeFilter in the NodeList.

Open questions: what is the practical purpose of the series of set methods? How are accept and clone used?

Package org.htmlparser.filters:

AndFilter
CssSelectorNodeFilter   Accepts all nodes that match a CSS2 selector.
HasAttributeFilter
HasChildFilter
HasParentFilter
HasSiblingFilter
IsEqualFilter           This class accepts only one specific node.
LinkRegexFilter         Accepts all LinkTag nodes whose link value matches the given regular expression.
LinkStringFilter        Accepts all LinkTag nodes whose link value matches the given string.
NodeClassFilter         Accepts all nodes of the specified class.
NotFilter
OrFilter
RegexFilter             Accepts all string nodes that match the specified regular expression.
StringFilter            Accepts all string nodes that contain the specified string.
TagNameFilter
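
The list gives no description for AndFilter, OrFilter, and NotFilter, so here is a rough sketch of how I understand filter composition and collectInto to fit together. It deliberately does not use the HTMLParser library: SketchNode, SketchFilter, and every method name below are hypothetical stand-ins written with plain Java collections, and the assumed semantics are that "and" requires both filters to accept, "or" requires either one, and "not" inverts the result. Treat it as a model for reading the API list, not as the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in model only: none of these types come from org.htmlparser.
class FilterSketch {

    /** Hypothetical counterpart of a node filter: decides whether a node is kept. */
    interface SketchFilter {
        boolean accept(SketchNode node);
    }

    /** Hypothetical, heavily simplified tag node with attributes and children. */
    static final class SketchNode {
        final String tagName;
        final Map<String, String> attributes = new HashMap<>();
        final List<SketchNode> children = new ArrayList<>();

        SketchNode(String tagName, String id) {
            this.tagName = tagName;
            attributes.put("id", id);
        }

        SketchNode attr(String name, String value) {
            attributes.put(name, value);
            return this;
        }

        SketchNode add(SketchNode... kids) {
            children.addAll(Arrays.asList(kids));
            return this;
        }

        /** Adds this node to 'out' if the filter accepts it, then recurses into the children. */
        void collectInto(List<SketchNode> out, SketchFilter filter) {
            if (filter.accept(this)) {
                out.add(this);
            }
            for (SketchNode child : children) {
                child.collectInto(out, filter);
            }
        }
    }

    // Leaf filters, loosely modelled on the ideas behind TagNameFilter and HasAttributeFilter.
    static SketchFilter tagName(String name) {
        return n -> n.tagName.equalsIgnoreCase(name);
    }

    static SketchFilter hasAttribute(String name) {
        return n -> n.attributes.containsKey(name);
    }

    // Combinators, loosely modelled on the ideas behind AndFilter, OrFilter and NotFilter.
    static SketchFilter and(SketchFilter a, SketchFilter b) {
        return n -> a.accept(n) && b.accept(n);
    }

    static SketchFilter or(SketchFilter a, SketchFilter b) {
        return n -> a.accept(n) || b.accept(n);
    }

    static SketchFilter not(SketchFilter f) {
        return n -> !f.accept(n);
    }

    /** Collects matching nodes depth-first and returns their "id" attribute values. */
    static List<String> matchingIds(SketchNode root, SketchFilter filter) {
        List<SketchNode> hits = new ArrayList<>();
        root.collectInto(hits, filter);
        List<String> ids = new ArrayList<>();
        for (SketchNode hit : hits) {
            ids.add(hit.attributes.get("id"));
        }
        return ids;
    }

    /** A small made-up tree: div#root > (a#home[href], a#anchor[name], p#text > a#docs[href]). */
    static SketchNode sampleTree() {
        return new SketchNode("div", "root").add(
                new SketchNode("a", "home").attr("href", "/index.html"),
                new SketchNode("a", "anchor").attr("name", "top"),
                new SketchNode("p", "text").add(
                        new SketchNode("a", "docs").attr("href", "/docs.html")));
    }

    public static void main(String[] args) {
        SketchNode root = sampleTree();
        // "a" tags that also carry an href attribute.
        System.out.println(matchingIds(root, and(tagName("a"), hasAttribute("href")))); // [home, docs]
        // Either a "p" tag, or any node without an href attribute.
        System.out.println(matchingIds(root, or(tagName("p"), not(hasAttribute("href"))))); // [root, anchor, text]
    }
}
```

Because collectInto walks the tree recursively, a single call on the root gathers matches at every depth, which is why a#docs is found even though it sits inside the p tag.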
