Home

The DataTreeGrab module consists of two main classes: the DataTreeGrab.HTMLtree class and the DataTreeGrab.JSONtree class. Each reads a given HTML or JSON page into a tree of nodes with properties, and both derive from the DataTreeGrab.DATAtree class. Similarly, there are the DataTreeGrab.HTMLnode, DataTreeGrab.JSONnode and DataTreeGrab.DATAnode classes, but they are normally only called internally. The DataTreeGrab.NULLnode class is used to indicate a Null search result.
For HTML, every tag represents a node with the following properties:

tag: the tag name, always lower-case
text: any text the tag contains
tail: any trailing text
attributes[<name>]: the attributes with their content. The attribute name is converted to lower-case.
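
The HTML node properties listed above can be illustrated with a small stand-alone sketch. It does not use DataTreeGrab itself: SketchNode, SketchBuilder and parse_html_sketch are hypothetical names, built on the standard-library html.parser, and the sketch assumes well-formed input in which every tag is explicitly closed.

```python
from html.parser import HTMLParser


class SketchNode:
    """Hypothetical stand-in for an HTML node; NOT DataTreeGrab.HTMLnode."""

    def __init__(self, tag, attributes, parent=None):
        self.tag = tag.lower()  # tag name, always lower-case
        # attribute names are lower-cased, values are kept as given
        self.attributes = {name.lower(): value for name, value in attributes}
        self.text = ""   # text directly after the opening tag
        self.tail = ""   # text directly after the closing tag
        self.parent = parent
        self.children = []


class SketchBuilder(HTMLParser):
    """Builds a tree of SketchNode objects (assumes every tag is closed)."""

    def __init__(self):
        super().__init__()
        self.root = SketchNode("root", [])
        self.current = self.root   # innermost element that is still open
        self.last_closed = None    # element that receives tail text, if any

    def handle_starttag(self, tag, attrs):
        node = SketchNode(tag, attrs, parent=self.current)
        self.current.children.append(node)
        self.current = node
        self.last_closed = None

    def handle_endtag(self, tag):
        self.last_closed = self.current
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if self.last_closed is not None:
            self.last_closed.tail += data
        else:
            self.current.text += data


def parse_html_sketch(html):
    """Parse an HTML string and return the artificial root node."""
    builder = SketchBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


root = parse_html_sketch('<P Class="intro">Hello <B>world</B>!</P>')
paragraph = root.children[0]
bold = paragraph.children[0]
print(paragraph.tag, paragraph.attributes, repr(paragraph.text))
print(bold.tag, repr(bold.text), repr(bold.tail))
```

Note how the "!" ends up as the tail of the b node rather than as text of the p node: tail text belongs to the element it follows.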

For JSON, every list, dict and value represents a node with the following properties:

type: [list|dict|value]
key: either the numeric list-index or the dict key
keys[]: a list of the child keys
key_index[]: the reverse of keys[]
value
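
As a rough illustration of the JSON node properties above, the following sketch wraps parsed JSON data in node objects. SketchJSONNode is a hypothetical class, not DataTreeGrab.JSONnode, and it assumes that key_index maps each child key back to its position in keys.

```python
import json


class SketchJSONNode:
    """Hypothetical stand-in for a JSON node; NOT DataTreeGrab.JSONnode."""

    def __init__(self, data, key=None, parent=None):
        self.key = key          # list index or dict key within the parent
        self.parent = parent
        self.children = []
        self.keys = []          # child keys in child order
        self.key_index = {}     # assumed: child key -> position in keys
        self.value = None       # only set for plain values
        if isinstance(data, dict):
            self.type = "dict"
            items = list(data.items())
        elif isinstance(data, list):
            self.type = "list"
            items = list(enumerate(data))
        else:
            self.type = "value"
            self.value = data
            items = []
        for index, (child_key, child_data) in enumerate(items):
            self.keys.append(child_key)
            self.key_index[child_key] = index
            self.children.append(SketchJSONNode(child_data, child_key, self))


json_root = SketchJSONNode(json.loads('{"title": "News", "tags": ["a", "b"]}'))
tags = json_root.children[json_root.key_index["tags"]]
print(json_root.type, json_root.keys)
print(tags.type, [child.value for child in tags.children])
```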

Both also have the following properties:

parent: the parent node
children[]: a list of child-nodes
dtree: the tree and through it its root
level: its level, with the root being 0
child_index: an index among its siblings. This is the index in children[], keys[] and key_index[]
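
To show how parent, children[], level and child_index relate to each other, here is a minimal sketch with a hypothetical SketchTreeNode class. It is not the DataTreeGrab.DATAnode implementation, and the dtree property is left out for brevity; path_to_root and node_at are illustrative helpers only.

```python
class SketchTreeNode:
    """Hypothetical stand-in for a tree node; NOT DataTreeGrab.DATAnode."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        # the root has level 0, every child is one level deeper than its parent
        self.level = 0 if parent is None else parent.level + 1
        self.child_index = 0
        if parent is not None:
            # position of this node among its siblings
            self.child_index = len(parent.children)
            parent.children.append(self)


def path_to_root(node):
    """Return the child_index values leading from the root down to node."""
    path = []
    while node.parent is not None:
        path.append(node.child_index)
        node = node.parent
    return list(reversed(path))


def node_at(root, path):
    """Follow a list of child_index values down from root."""
    node = root
    for index in path:
        node = node.children[index]
    return node


tree_root = SketchTreeNode("root")
first = SketchTreeNode("first", tree_root)
second = SketchTreeNode("second", tree_root)
leaf = SketchTreeNode("leaf", second)
print(leaf.level, path_to_root(leaf))
print(node_at(tree_root, path_to_root(leaf)).name)
```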

Through these properties you can walk the tree and select the desired data. At present, in the JSONtree class the index for dicts has no meaning except as an internal reference. To use it against the original JSON data, we first have to add our own parser that bypasses Python's randomizing of the order within a dict structure.
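
One possible way to keep dict members in document order, sketched here purely as an illustration and not as what DataTreeGrab does, is the object_pairs_hook parameter of the standard-library json.loads. The helper name load_ordered is hypothetical.

```python
import json
from collections import OrderedDict


def load_ordered(text):
    """Parse JSON text, keeping object members in their original order."""
    # object_pairs_hook receives each JSON object as a list of (key, value)
    # pairs in document order; OrderedDict preserves that order.
    return json.loads(text, object_pairs_hook=OrderedDict)


data = load_ordered('{"b": 1, "a": 2, "c": 3}')
for index, key in enumerate(data):
    print(index, key, data[key])
```

With the order preserved, the position of a key in the parsed dict matches its position in the original JSON text, so an index into the keys would be meaningful against the source data.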

Glossary

accept-header
autoclose-tags
caller_id
current_date
current_ordinal
child_index
data_def
data-format
DATAnode
DATAtree
date-range-splitter
date-sequence
date-splitter
datetimestring
default-item-count
empty-values
enclose-with-html-tag
encoding
init_def
item-range-splitter
key_def
key-node
link_def
link-value
month-names
name-value
node_def
NULLnode
path_def
.print_searchtree
relative-weekdays
root-node
severity
.show_result
start_node
str-list-splitter
text_replace
time-splitter
time-type
timezone
unquote_html
URL_def
url
url-data
url-date-format
url-date-multiplier
url-date-type
url-header
url-type
url-weekdays
value_def
value-filters
warngoal
weekdays
