Methods for Working with DOM
Attributes
Nodes may have attributes set, such as style or class. In some cases, you may need to remove them. The attr_remove command removes the specified attributes from all nodes in the current block.
The command requires the selector parameter, which specifies the attributes to remove. To remove all attributes, pass the wildcard selector *.
Let's use the following HTML source:
<div class="container">
<span style="width: 200px;">some text</span>
<a href="link.html">some link</a>
<span style="width: 400px;">another text</span>
</div>
Example of usage:
- find:
    path: div
    do:
    - attr_remove:
        selector: '*'
    - parse:
        format: html
    # ALL ATTRIBUTES WERE REMOVED, REGISTER VALUE:
    # <span>some text</span>
    # <a>some link</a>
    # <span>another text</span>
- find:
    path: div
    do:
    - attr_remove:
        selector: style
    - parse:
        format: html
    # REMOVED ONLY STYLE ATTRIBUTE, REGISTER VALUE:
    # <span>some text</span>
    # <a href="link.html">some link</a>
    # <span>another text</span>
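By the same logic, passing a different attribute name should strip only that attribute and leave the others in place. The sketch below uses the same HTML source. It assumes that selector treats href the same way it treats style. The register value in the comments is the expected result under that assumption, not captured output.

```yaml
- find:
    path: div
    do:
    - attr_remove:
        # ASSUMPTION: any attribute name works here the same way style does
        selector: href
    - parse:
        format: html
    # EXPECTED IF ONLY HREF IS REMOVED, REGISTER VALUE:
    # <span style="width: 200px;">some text</span>
    # <a>some link</a>
    # <span style="width: 400px;">another text</span>
```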
In the next chapter, we will learn how to split the contents of a block into multiple blocks manually.