title | description | author | tags |
Delphi HTML Parser | Very small and fast module for parsing HTML pages. | sandbil | Delphi |
This module lets you work with an HTML document as a DOM tree and use XPath to search for tags.
It is a very simple way to parse HTML.
It has been tested with Delphi XE5, XE6 and 10.3.
OpenSSL library (libeay32.dll, ssleay32.dll)
Current versions of OpenSSL can be downloaded at https://indy.fulgan.com/SSL/
- Add the module's unit to your uses list.
- Example usage:
DomTree: TDomTree;
DomTreeNode: TDomTreeNode;
HtmlTxt: string;
NodeList: TNodeList;
ValueList: TStringList;
HtmlTxt := '...'; // the HTML text to parse
NodeList := TNodeList.Create;
ValueList := TStringList.Create;
DomTree := TDomTree.Create;
DomTreeNode := DomTree.RootNode;
// RunParse returns True when parsing succeeds
if DomTreeNode.RunParse(HtmlTxt) then
  // short example code: NodeList receives the found nodes, ValueList the found values
  DomTreeNode.FindXPath('//*[@id="TopBox"]/div[1]/div[@class="draw default"]', NodeList, ValueList);
- enjoy!!!
- property Count - count of nodes
- property RootNode - root node (TDomTreeNode)
- property ParseErr - TStringList containing all parsing errors and warnings
- property Tag - name of tag
- property AttributesTxt - string with all attributes
- property Attributes - parsed attributes (TDictionary<string, string>)
- property Text - text
- property TypeTag -
- property Child - contains the child nodes (TChildList of TDomTreeNode)
- property Parent - contains the parent node
- property Owner - contains pointer to owner TDomTree
- function FindNode - boolean function; if true, TNodeList contains the found nodes
- function FindTagOfIndex - boolean function; if true, TNodeList contains the found nodes
- function GetAttrValue - returns the value of an attribute of the current node
- function GetComment - returns the comment with the given index in the current container node
- function GetTagName - returns the tag name followed by AttributesTxt
- function GetText - returns the text with the given index in the current container node
- function GetXPath - returns the XPath of the current node
- function RunParse - if parsing succeeds, the Child property contains the HTML DOM tree
- function FindXPath - boolean function; if true, TNodeList contains the found nodes and TStringList contains the found attribute, comment, or text values
XPath support:
attributes - //*[@id="TopBox"]/div/@class
comment - //*[@id="TopBox"]/div/comment()[3]
text - //*[@id="TopBox"]/div/text()[2]
previous level - /../div[@class="draw default"]/img[2]/@alt
partial match on an attribute value:
returned nodes: [[div class="draw default"], [div class="draw"], [div class="draw any"], ..]
This works like XPath's contains function.
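The partial match can be pictured as a plain substring test on the attribute value. The sketch below is an illustration only, not the module's code: `AttrValueContains` is a hypothetical helper, the search value `draw` and the non-matching value `header` are assumptions, and the three matching class values are copied from the returned-nodes list.

```delphi
program PartialMatchSketch;

{$APPTYPE CONSOLE}

// Hypothetical helper, not part of the module: True when Wanted occurs
// anywhere inside AttrValue (case-sensitive substring test).
function AttrValueContains(const AttrValue, Wanted: string): Boolean;
begin
  Result := Pos(Wanted, AttrValue) > 0;
end;

const
  // Three class values from the returned-nodes list, plus one assumed non-match.
  ClassValues: array[0..3] of string =
    ('draw default', 'draw', 'draw any', 'header');
var
  I: Integer;
begin
  for I := Low(ClassValues) to High(ClassValues) do
    if AttrValueContains(ClassValues[I], 'draw') then
      Writeln('match:    ', ClassValues[I])
    else
      Writeln('no match: ', ClassValues[I]);
end.
```

Under this reading, a node with class "draw default" is found by a search for "draw", which is why the returned list is wider than an exact comparison would give.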
Note: an XPath search always starts from the current node. To search the whole document, start from the root node.
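A minimal sketch of a whole-document search started from the root node, using only the names listed in this document (TDomTree, RootNode, RunParse, FindXPath, TNodeList). Two things are assumptions: the unit name `parser` in the uses clause (the unit name is not given here, so substitute the module's actual unit), and the small HTML literal, which is made up for illustration.

```delphi
program RootSearchSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  parser; // ASSUMPTION: unit name not stated in this document; use the module's real unit

const
  // Made-up HTML for illustration only.
  SampleHtml =
    '<html><body><div id="TopBox">' +
    '<div class="draw default">first</div>' +
    '</div></body></html>';

var
  DomTree: TDomTree;
  NodeList: TNodeList;
  ValueList: TStringList;
  I: Integer;
begin
  DomTree := TDomTree.Create;
  NodeList := TNodeList.Create;
  ValueList := TStringList.Create;
  try
    // Parse from the root node, then search from the same node so the
    // query covers the whole document.
    if DomTree.RootNode.RunParse(SampleHtml) then
      if DomTree.RootNode.FindXPath('//*[@id="TopBox"]/div/@class', NodeList, ValueList) then
        for I := 0 to ValueList.Count - 1 do
          Writeln(ValueList[I]); // one line per attribute value found
  finally
    ValueList.Free;
    // Freeing NodeList and DomTree is left out on purpose: their ownership
    // rules are not documented here, so check the module before adding it.
  end;
end.
```

Calling FindXPath on a node other than RootNode would limit the search to that node's subtree, as the note describes.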