Method for Working with DOM
Split the Block
In some cases, you may need to split the contents of a block into several blocks,
so that you can step into each of them and parse its contents separately.
For example, some parameters of an item may be listed as comma-separated values in
a single block, or separated by the `<br>` tag. In this case, the split command will help you.
It can work in two contexts: text and HTML. In the text context, it works with the contents of all text nodes of the current block; in the HTML context, it works with the HTML content of the current block.
The command produces a new block, and the digger automatically switches into the context of that block.
The command accepts the following parameters:
Parameter | Description |
---|---|
context | Defines the context for the command: text or html. |
delimiter | Separator used to split the contents into blocks. |
Let's use the following HTML source:
<div>
    <p>Some text</p>
    <br/>
    <p>Some,other,text with
    comma
    and
    newline</p>
</div>
Usage examples:
- find:
    path: div > p:contains(",")
    do:
    - split:
        context: text
        delimiter: ","
    # AT THIS POINT WE ARE IN A NEW BLOCK
    # WHICH IS CREATED BY THE `split` COMMAND
    # THIS BLOCK HAS THE FOLLOWING HTML CONTENT:
    # <div class="splitted element_0">Some</div>
    # <div class="splitted element_1">other</div>
    # <div class="splitted element_2">text with comma and newline</div>
    # LET'S USE THE FOLLOWING CSS SELECTOR TO SELECT THE LAST SPLITTED BLOCK
    - find:
        path: .splitted
        slice: -1
        do:
        - parse
        # REGISTER VALUE: text with comma and newline
- find:
    path: div
    do:
    - split:
        context: html
        delimiter: <br/>
    # AT THIS POINT WE ARE IN A NEW BLOCK
    # WHICH IS CREATED BY THE `split` COMMAND
    # THIS BLOCK HAS THE FOLLOWING HTML CONTENT:
    # <div class="splitted element_0"><p>Some text</p></div>
    # <div class="splitted element_1"><p>Some,other,text with comma and newline</p></div>
    # LET'S USE THE FOLLOWING CSS SELECTOR TO SELECT THE LAST SPLITTED BLOCK
    - find:
        path: .splitted
        slice: -1
        do:
        - parse
        # REGISTER VALUE: Some,other,text with comma and newline
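Both examples above pick only the last block with `slice: -1`. To step into every block produced by `split` and parse each one separately, a sketch along the following lines may work. It reuses only the commands shown above and assumes that `find` without the `slice` parameter visits every element matching the selector in turn; that behavior is an assumption here and is not stated in this chapter.

```yaml
- find:
    path: div > p:contains(",")
    do:
    - split:
        context: text
        delimiter: ","
    # ASSUMPTION: WITHOUT `slice`, `find` VISITS EACH .splitted BLOCK IN TURN
    - find:
        path: .splitted
        do:
        - parse
        # IF THE ASSUMPTION HOLDS, THE REGISTER SHOULD RECEIVE ONE VALUE PER PASS,
        # MATCHING THE BLOCKS LISTED IN THE FIRST EXAMPLE:
        # Some
        # other
        # text with comma and newline
```

Omitting `slice` keeps the configuration independent of how many values the source block contains, which is useful when the number of comma-separated items varies from page to page.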
In the next chapter, we will learn how to split the contents of a block into blocks using sequences.