Home
A description of the classes and its functions
The language used to describe what to extract and how
The DataTreeShell class or pre and post processing
Some examples
Multi-threading
Warnings, Errors and other messages
Testing your data_defs
Glossary
The DataTreeGrab module consists of two main classes: DataTreeGrab.HTMLtree and DataTreeGrab.JSONtree. Each reads a given HTML or JSON page into a tree of nodes with properties, and both derive from the DataTreeGrab.DATAtree class.
Similarly, there are the DataTreeGrab.HTMLnode, DataTreeGrab.JSONnode and DataTreeGrab.DATAnode classes, but these are normally only called internally. The DataTreeGrab.NULLnode class is used to indicate a Null search result.
Version 1.1 adds a warnings framework and a new DataTreeGrab.DataTreeShell class.
Every node has the following properties:
- parent: the parent node
- children[]: a list of child-nodes
- dtree: the tree and through it its root
- level: its level, with the root being 0
- child_index: its index among its siblings, as used in the parent's children[], keys[] and key_index[]
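The common node properties can be illustrated with a small stand-alone sketch. The `SketchNode` and `SketchTree` classes below are hypothetical and are not the DataTreeGrab implementation; only the property names (`parent`, `children`, `dtree`, `level`, `child_index`) are taken from the list above, and giving the root a `child_index` of 0 is an assumption.

```python
class SketchNode:
    """Hypothetical stand-in for a tree node; not the DataTreeGrab code."""

    def __init__(self, dtree, parent=None):
        self.dtree = dtree        # the owning tree, which gives access to the root
        self.parent = parent      # None for the root node
        self.children = []        # child nodes in the order they were added
        if parent is None:
            self.level = 0        # the root sits at level 0
            self.child_index = 0  # assumption: the root gets index 0
        else:
            self.level = parent.level + 1
            # The position among the siblings equals the current child count.
            self.child_index = len(parent.children)
            parent.children.append(self)


class SketchTree:
    """Hypothetical tree that only holds a root node."""

    def __init__(self):
        self.root = SketchNode(self)


tree = SketchTree()
first = SketchNode(tree, tree.root)
second = SketchNode(tree, tree.root)
grandchild = SketchNode(tree, second)

# A node can always be found again through its parent and child_index.
same_node = second.parent.children[second.child_index]
```

Here `same_node` is `second`, and `grandchild.dtree.root` leads back to the root from any depth.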
For HTML every tag represents a node with the following additional properties:
- tag: the tag-name: always lower-case
- text: any contained text
- tail: any trailing text
- attributes[<name>]: the attributes with their content. The attribute name is converted to lower-case.
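The difference between `text` and `tail` is easiest to see on a small snippet. The sketch below uses Python's standard `xml.etree.ElementTree`, not DataTreeGrab, on the assumption that both use the same convention: `text` is the text directly inside a tag before its first child, and `tail` is the text that follows the closing tag. The lower-casing of attribute names is done by hand here to mirror the description above.

```python
import xml.etree.ElementTree as ET

snippet = '<p Class="intro">Hello <b>bold</b> world</p>'
p = ET.fromstring(snippet)
b = p[0]  # the first (and only) child of <p>

p_text = p.text  # text inside <p> before its first child
b_text = b.text  # text inside <b>
b_tail = b.tail  # text after </b>, up to the next tag

# ElementTree keeps the attribute name as written, so lower-case it ourselves.
p_attributes = {name.lower(): value for name, value in p.attrib.items()}
```

With this input, `p_text` is `'Hello '`, `b_text` is `'bold'`, `b_tail` is `' world'` and `p_attributes` is `{'class': 'intro'}`.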
For JSON every list, dict and value represents a node with the following additional properties:
- type: [list|dict|value]
- key: either the numeric list-index or the dict key
- keys[]: a list of the child keys
- key_index[]: the reverse of keys[], mapping each child key back to its index
- value
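As a rough illustration of how these properties relate to each other, the following sketch builds a node tree from parsed JSON. `SketchJSONNode` is a hypothetical class written for this example; it is not DataTreeGrab.JSONnode, and storing `key_index` as a dict is an assumption.

```python
import json


class SketchJSONNode:
    """Hypothetical JSON node; only the property names follow the list above."""

    def __init__(self, data, key=None):
        self.key = key        # list index or dict key in the parent (None for the root)
        self.children = []
        self.keys = []        # child_index -> key
        self.key_index = {}   # key -> child_index, the reverse of keys
        self.value = None
        if isinstance(data, dict):
            self.type = "dict"
            items = list(data.items())
        elif isinstance(data, list):
            self.type = "list"
            items = list(enumerate(data))
        else:
            self.type = "value"
            self.value = data
            items = []
        for index, (child_key, child_data) in enumerate(items):
            self.keys.append(child_key)
            self.key_index[child_key] = index
            self.children.append(SketchJSONNode(child_data, child_key))


root = SketchJSONNode(json.loads('{"title": "News", "tags": ["a", "b"]}'))

# Look up a child by key: key_index gives the position in children[].
tags = root.children[root.key_index["tags"]]
second_tag = tags.children[1]
```

In this sketch `tags.type` is `"list"`, `tags.keys` is `[0, 1]`, and `second_tag` has `key` 1 and `value` `"b"`.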
Through these properties you can traverse the tree and select the desired data. At present, in the JSONtree class the index for dicts has no meaning except as an internal reference. To relate it to the order in the original JSON data, a custom parser would first have to be added that bypasses Python's randomizing of the order within a dict structure.
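For reference, one standard-library way to keep the order of keys as they appear in the JSON text is the `object_pairs_hook` argument of `json.loads`. This is only a sketch of that technique and not something DataTreeGrab is known to do. Since Python 3.7 plain dicts also preserve insertion order, so the hook mainly matters on older interpreters.

```python
import json
from collections import OrderedDict

raw = '{"zebra": 1, "apple": 2, "mango": 3}'

# object_pairs_hook receives the key/value pairs in the order in which they
# appear in the JSON text, so an OrderedDict keeps that order.
ordered = json.loads(raw, object_pairs_hook=OrderedDict)
keys_in_source_order = list(ordered.keys())
```

Here `keys_in_source_order` is `['zebra', 'apple', 'mango']`, matching the source text rather than any sorted or hashed order.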
Glossary
accept-header
autoclose-tags
caller_id
current_date
current_ordinal
child_index
data_def
data-format
DATAnode
DATAtree
date-range-splitter
date-sequence
date-splitter
datetimestring
default-item-count
empty-values
enclose-with-html-tag
encoding
init_def
item-range-splitter
key_def
key-node
link_def
link-value
month-names
name-value
node_def
NULLnode
path_def
.print_searchtree
relative-weekdays
root-node
severity
.show_result
start_node
str-list-splitter
text_replace
time-splitter
time-type
timezone
unquote_html
URL_def
url
url-data
url-date-format
url-date-multiplier
url-date-type
url-header
url-type
url-weekdays
value_def
value-filters
warngoal
weekdays