Entity Manipulations
Link Pools
There are two commands for working with link pools: link_add adds a link to a specific pool, and pool_clear clears a given pool.
The pool_clear command clears the link pool with the given name. If no name is provided, the pool named "default" (the default pool) is cleared.
The link_add command adds a link to a pool. Depending on the context and the parameters used, the link is taken either from the register or from an explicitly given value. In a block context you can add a link from the register. In any other context the register is not available, so the link must be given explicitly. The full list of parameters is given below:
Parameter | Description |
---|---|
pool | Pool name. If omitted, digger uses "default" as the pool name. |
url | A single link or a list of links (see examples) to add to the pool explicitly. If the parameter is omitted, the register value is used as the link. |
Usage examples:
# CREATE BLOCK FROM HTML STRING
- register_set: '<body>
    <a href="http://www.somesite.com/1">link1</a>
    <a href="http://www.somesite.com/2">link2</a>
    <a href="http://www.somesite.com/3">link3</a>
    <a href="http://www.somesite.com/4">link4</a>
    </body>'
- to_block
# -------------------------------------------------------------
# FIND ALL `a` TAGS
- find:
    path: a
    do:
    # READ `href` ATTRIBUTE TO THE REGISTER
    - parse:
        attr: href
    # ADD LINK FROM REGISTER TO THE POOL (DEFAULT)
    - link_add
# ITERATE OVER LINKS IN THE POOL, LOAD PAGE AND EXECUTE `do` BLOCK
- walk:
    to: links
    do:
    ...
    ...
# CLEAR POOL WITH NAME `default`
- pool_clear
# -------------------------------------------------------------
# FIND ALL `a` TAGS
- find:
    path: a
    do:
    # READ `href` ATTRIBUTE TO THE REGISTER
    - parse:
        attr: href
    # ADD LINK FROM REGISTER TO THE POOL WITH NAME `main`
    - link_add:
        pool: main
# ITERATE OVER LINKS IN THE POOL `main`, LOAD PAGE AND EXECUTE `do` BLOCK
- walk:
    to: links
    pool: main
    do:
    ...
    ...
# CLEAR POOL WITH NAME `main`
- pool_clear: main
# -------------------------------------------------------------
# EXPLICITLY ADD URL http://www.somesite.com/somecoolurl TO THE POOL WITH NAME `somepool`
- link_add:
    pool: somepool
    url: http://www.somesite.com/somecoolurl
# -------------------------------------------------------------
# EXPLICITLY ADD LIST OF URLS TO THE POOL WITH NAME `somepool`
- link_add:
    pool: somepool
    url:
    - http://www.somesite.com/somecoolurl1
    - http://www.somesite.com/somecoolurl2
    - http://www.somesite.com/somecoolurl3
    - http://www.somesite.com/somecoolurl4
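The two ways of adding links can also be combined in one scenario. The following is a minimal sketch rather than a tested configuration: it seeds one pool with explicit URLs, walks it, and collects links from each loaded page into a second pool. The pool names `catalog` and `items` and the category URLs are hypothetical. The sketch also assumes that the `do` block of `walk` runs in the context of the loaded page, so that `find` and the register are available there.

```yaml
# SEED THE POOL `catalog` WITH EXPLICIT URLS (HYPOTHETICAL ADDRESSES)
- link_add:
    pool: catalog
    url:
    - http://www.somesite.com/category1
    - http://www.somesite.com/category2
# ITERATE OVER LINKS IN THE POOL `catalog`, LOAD PAGE AND EXECUTE `do` BLOCK
- walk:
    to: links
    pool: catalog
    do:
    # FIND ALL `a` TAGS ON THE LOADED PAGE
    - find:
        path: a
        do:
        # READ `href` ATTRIBUTE TO THE REGISTER
        - parse:
            attr: href
        # ADD LINK FROM REGISTER TO THE POOL WITH NAME `items` (ASSUMED NAME)
        - link_add:
            pool: items
# CLEAR POOL WITH NAME `catalog` AFTER IT HAS BEEN PROCESSED
- pool_clear: catalog
```

Keeping the seed URLs and the collected links in separate named pools means that clearing `catalog` leaves the links gathered in `items` untouched.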
In the next chapter, we show you how to work with data objects.