Methods for Working with DOM

Nodes

When a digger works with an HTML or XML document, it works with the document's DOM (Document Object Model) structure, so the document is essentially a tree of nodes. When you use the find method, you search through the nodes of the document and switch to each found node and, with it, to the block context. The current block (node) can contain nested (child) nodes, which may have child nodes of their own, and so on. In a block context, you can manipulate the nodes of the current block: delete them or replace them.
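To illustrate how block context follows the node tree, here is a minimal sketch that nests one find inside another. The selectors `div` and `span` are placeholders, and the nesting itself is an assumption based on the description above, not a tested configuration.

```yaml
# Sketch only: selectors are placeholders, not taken from a real page
- find:
    path: div        # switch to each `div` node (outer block context)
    do:
    - find:
        path: span   # search only among the nodes nested inside the current `div`
        do:
        - parse      # read the content of the current `span` block
```

The inner find searches only within the node selected by the outer find, so each level of nesting narrows the block context to a deeper part of the tree.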

Examples of commands you can use to manipulate nodes:

    # DELETE ALL NON-TEXT CHILD NODES
    - node_remove_all

    # DELETE ALL `a` NODES
    - node_remove: a

    # REPLACE ALL `a` NODES WITH EMPTY `p` NODES
    - node_replace:
        path: a
        with: <p></p>

    # REPLACE ALL `a` NODES WITH THEIR CONTENTS
    - node_replace:
        path: a
        with: content

Let's use the following HTML source as an example:

    <div>
        <span>some text</span>
        <a>some link</a>
        <span>another text</span>
    </div>

Examples of usage, in order: delete all nodes, delete nodes by CSS selector, replace nodes, and replace nodes with their contents.

    - find:
        path: div
        do:
        - node_remove_all
        - parse

        # REGISTER WILL BE EMPTY AS ALL NODES WERE REMOVED

    - find:
        path: div
        do:
        - node_remove: span
        - parse

        # REGISTER VALUE: some link
        # BECAUSE ALL `span` NODES WERE REMOVED

    - find:
        path: div
        do:
        - node_replace:
            path: span
            with: ' some text '
        - parse

        # REGISTER VALUE: " some text some link some text "
        # BECAUSE ALL `span` NODES WERE REPLACED WITH TEXT " some text "

    - find:
        path: div
        do:
        - node_replace:
            path: span
            with: content
        - parse

        # REGISTER VALUE: some textsome linkanother text
        # BECAUSE ALL `span` NODES WERE REPLACED WITH THEIR CONTENTS
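The commands can also be combined inside a single block. The sketch below assumes that they are applied in the order listed and that each behaves as in the separate examples above. It has not been run against a live digger, so treat the described result as an expectation, not a guarantee.

```yaml
# Sketch only: assumes commands inside `do` run top to bottom
- find:
    path: div
    do:
    # First drop the link node entirely
    - node_remove: a
    # Then unwrap the remaining `span` nodes, keeping only their contents
    - node_replace:
        path: span
        with: content
    - parse
```

Under those assumptions, the register should end up holding only the text that was inside the two `span` nodes, with the link text gone.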

In the next chapter, we will learn how to manipulate the attributes of a node.