Delphi HTML Parser

title	description	author	tags
Delphi HTML Parser	Very small and fast module for parsing HTML pages.	sandbil	Delphi

This module lets you work with an HTML document as a DOM tree and search it for tags with XPath expressions.
It is a very simple way to parse HTML.

Tested with Delphi XE5, XE6, and 10.3.

Requirements

OpenSSL library (libeay32.dll, ssleay32.dll)
Current versions of OpenSSL can be downloaded at https://indy.fulgan.com/SSL/

Usage

Add the parser unit (parser.pas) to your uses clause.

Example usage:

  {...}
  var
    DomTree: TDomTree;
    DomTreeNode: TDomTreeNode;
    HtmlTxt: string;
    NodeList: TNodeList;
    ValueList: TStringList;
  begin
    HtmlTxt := '...'; // the HTML source to parse
    NodeList := TNodeList.Create;
    ValueList := TStringList.Create;
    DomTree := TDomTree.Create;
    DomTreeNode := DomTree.RootNode;
    if DomTreeNode.RunParse(HtmlTxt) then
    begin
      // short example code:
      DomTreeNode.FindXPath('//*[@id="TopBox"]/div[1]/div[@class="draw default"]', NodeList, ValueList);

      {...}
    end;
    {...}
  end;
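Once FindXPath returns True, the matches can be read back from the two lists. The following is a minimal sketch of a complete console program that uses only the calls shown in the example above. The program name and the HTML sample are made up for illustration, and it assumes that an XPath ending in /@class places the matched attribute values in ValueList, as the XPath notes in this README describe. Cleanup is left out, as in the example above, because the README does not say who owns the nodes held by a TNodeList.

```pascal
program XPathDemo; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes,
  parser; // parser.pas from this repository

var
  DomTree: TDomTree;
  NodeList: TNodeList;
  ValueList: TStringList;
  HtmlTxt: string;
  I: Integer;
begin
  // Made-up sample document
  HtmlTxt := '<html><body><div id="TopBox">' +
             '<div class="draw default">Hello</div>' +
             '</div></body></html>';

  NodeList := TNodeList.Create;
  ValueList := TStringList.Create;
  DomTree := TDomTree.Create;

  if DomTree.RootNode.RunParse(HtmlTxt) then
  begin
    // Search from the root node so the whole document is covered
    if DomTree.RootNode.FindXPath('//*[@id="TopBox"]/div/@class', NodeList, ValueList) then
    begin
      // ValueList is a plain TStringList, so it can be read by index
      for I := 0 to ValueList.Count - 1 do
        Writeln(ValueList[I]);
    end
    else
      Writeln('No match');
  end
  else
    // ParseErr is a TStringList with parsing errors and warnings
    Writeln(DomTree.ParseErr.Text);
end.
```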

enjoy!!!

Available properties and methods:

TDomTree
- property Count - number of nodes
- property RootNode - root node (TDomTreeNode)
- property ParseErr - TStringList containing all parsing errors and warnings

TDomTreeNode
- property Tag - tag name
- property AttributesTxt - string with all attributes
- property Attributes - parsed attributes (TDictionary<string, string>)
- property Text - text
- property TypeTag -
- property Child - child nodes (TChildList of TDomTreeNode)
- property Parent - parent node
- property Owner - pointer to the owning TDomTree
- function FindNode - Boolean function; if it returns True, the TNodeList contains the nodes found
- function FindTagOfIndex - Boolean function; if it returns True, the TNodeList contains the nodes found
- function GetAttrValue - returns the value of an attribute of the current node
- function GetComment - returns the comment at the given index in the current container node
- function GetTagName - returns the tag name plus AttributesTxt
- function GetText - returns the text at the given index in the current container node
- function GetXPath - returns the XPath of the current node
- function RunParse - if parsing succeeds, the Child property contains the HTML DOM tree
- function FindXPath - Boolean function; if it returns True, the TNodeList contains the nodes found and the TStringList contains the values found (attribute, comment, or text)
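The properties above are enough to walk the parsed tree by hand. The sketch below prints every node with its tag and raw attribute string. It rests on assumptions that the list does not confirm: that Tag and AttributesTxt are strings, and that TChildList offers Count and index access in the manner of a generic TList. Check parser.pas before relying on either. The program name, the PrintNode helper, and the sample HTML are made up for illustration.

```pascal
program TreeWalk; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes,
  parser; // parser.pas from this repository

// Hypothetical helper: prints one line per node, indented by depth.
procedure PrintNode(Node: TDomTreeNode; Depth: Integer);
var
  I: Integer;
begin
  Writeln(StringOfChar(' ', Depth * 2), Node.Tag, ' ', Node.AttributesTxt);
  // Assumption: TChildList exposes Count and index access like TList<T>
  for I := 0 to Node.Child.Count - 1 do
    PrintNode(Node.Child[I], Depth + 1);
end;

var
  DomTree: TDomTree;
begin
  DomTree := TDomTree.Create;
  try
    if DomTree.RootNode.RunParse('<html><body><p class="a">Hi</p></body></html>') then
    begin
      PrintNode(DomTree.RootNode, 0);
      Writeln('Nodes: ', DomTree.Count);
    end
    else
      Writeln(DomTree.ParseErr.Text);
  finally
    DomTree.Free;
  end;
end.
```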
XPath support:
- attributes - //*[@id="TopBox"]/div/@class
- comment - //*[@id="TopBox"]/div/comment()[3]
- text - //*[@id="TopBox"]/div/text()[2]
- parent level (..) - /../div[@class="draw default"]/img[2]/@alt
- partial match on an attribute value:
  /div[@class="draw"] returns nodes such as [div class="draw default"], [div class="draw"], [div class="draw any"], like XPath's contains function.

Note: an XPath search always starts from the current node. For a global search, start from the root node.
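The forms listed above can be tried side by side with a small helper. This is a sketch under stated assumptions: Query is a hypothetical helper, the HTML sample is made up, and the exact values returned depend on how parser.pas numbers text and comment items. Every search starts from the root node, following the note above.

```pascal
program XPathForms; // hypothetical program name

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes,
  parser; // parser.pas from this repository

// Hypothetical helper: runs one query from Node and prints the values found.
procedure Query(Node: TDomTreeNode; const XPath: string);
var
  NodeList: TNodeList;
  ValueList: TStringList;
begin
  // NodeList is deliberately not freed: the README does not say
  // whether a TNodeList owns the nodes it holds.
  NodeList := TNodeList.Create;
  ValueList := TStringList.Create;
  try
    Writeln(XPath);
    if Node.FindXPath(XPath, NodeList, ValueList) then
      Writeln(ValueList.Text)
    else
      Writeln('  no match');
  finally
    ValueList.Free;
  end;
end;

var
  DomTree: TDomTree;
begin
  DomTree := TDomTree.Create;
  try
    // Made-up sample document
    if DomTree.RootNode.RunParse(
      '<html><body><div id="TopBox">' +
      '<div class="draw default">first<!-- note --></div>' +
      '</div></body></html>') then
    begin
      Query(DomTree.RootNode, '//*[@id="TopBox"]/div/@class');       // attribute value
      Query(DomTree.RootNode, '//*[@id="TopBox"]/div/text()[1]');    // text item
      Query(DomTree.RootNode, '//*[@id="TopBox"]/div/comment()[1]'); // comment item
      Query(DomTree.RootNode, '//*[@id="TopBox"]/div[@class="draw"]'); // partial match
    end;
  finally
    DomTree.Free;
  end;
end.
```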
