Methods for Working with DOM
Nodes
When a digger works with an HTML or XML document, it works with the document's DOM (Document Object Model) structure. Such a document consists of nodes. When you use the find method, you search through the nodes of the document and switch to the found node, and therefore to the block context. The current block (node) can contain nested (child) nodes, which may have child nodes of their own, and so on. In a block context, you can manipulate the nodes of the current block: delete them or replace them.
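The nesting described above can be sketched as follows. This is a minimal illustration, not a definitive config: it assumes that a `find` command may be placed inside the `do` block of another `find`, and the `div` and `span` paths are placeholders chosen for the example.

# SWITCH TO THE `div` NODE (BLOCK CONTEXT)
- find:
    path: div
    do:
    # SEARCH ONLY AMONG THE NODES NESTED IN THE CURRENT `div` (ASSUMED BEHAVIOR)
    - find:
        path: span
        do:
        - parse

Under that assumption, the inner `find` sees only the child nodes of the block selected by the outer one.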
Examples of commands you can use for node manipulation:
# DELETE ALL NON-TEXT CHILD NODES
- node_remove_all
# DELETE ALL `a` NODES
- node_remove: a
# REPLACE ALL `a` NODES WITH EMPTY `p` NODES
- node_replace:
    path: a
    with: <p></p>
# REPLACE ALL `a` NODES WITH THEIR CONTENTS
- node_replace:
    path: a
    with: content
Let's use the following HTML source as an example:
<div>
    <span>some text</span>
    <a>some link</a>
    <span>another text</span>
</div>
Examples of usage:
- find:
    path: div
    do:
    - node_remove_all
    - parse
# REGISTER WILL BE EMPTY AS ALL NODES WERE REMOVED
- find:
    path: div
    do:
    - node_remove: span
    - parse
# REGISTER VALUE: some link
# BECAUSE ALL `span` NODES WERE REMOVED
- find:
    path: div
    do:
    - node_replace:
        path: span
        with: ' some text '
    - parse
# REGISTER VALUE: " some text some link some text "
# BECAUSE ALL `span` NODES WERE REPLACED WITH TEXT " some text "
- find:
    path: div
    do:
    - node_replace:
        path: span
        with: content
    - parse
# REGISTER VALUE: some textsome linkanother text
# BECAUSE ALL `span` NODES WERE REPLACED WITH THEIR CONTENTS
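The commands can also be combined. The sketch below is an illustration only: it uses just the commands shown above and assumes they run in the order listed inside the `do` block, applied to the same sample HTML.

- find:
    path: div
    do:
    # FIRST DROP THE `a` NODE
    - node_remove: a
    # THEN REPLACE THE REMAINING `span` NODES WITH THEIR CONTENTS
    - node_replace:
        path: span
        with: content
    - parse
# AFTER THESE STEPS THE BLOCK SHOULD HOLD ONLY THE TEXT OF THE TWO `span` NODES

Removing unwanted nodes before `parse` keeps their text out of the register.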
In the next chapter, we will learn how to manipulate the attributes of a node.