Set URL list
Opening the digger configuration editor just to change the list of URLs the digger should scrape can be confusing.
Now you can do it in simplified mode, without editing the digger config.
This is especially convenient when the logic of the digger does not change, but you need to change one or more
URLs. Please note that this works only if your walk command uses the default link pool to navigate pages.
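As a rough illustration of that limitation, the sketch below shows a configuration that would not be affected by the URL list. It assumes that walk is given a page address directly in its to parameter instead of the links keyword, so the default link pool is never consulted. The address is simply the sandbox URL reused from this article's example.

```yaml
---
config:
    debug: 2
    agent: Chrome
do:
# Assumption: the URL is passed to walk directly, so the default link pool
# is not used and a list saved via Set URL list would have no effect here.
- walk:
    to: https://www.diggernaut.com/sandbox/details.html#1
    do:
```

For a digger like this to pick up the URL list, the to value would need to be changed to links.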
For example, suppose you have the following digger code in the meta-language:
---
config:
    debug: 2
    agent: Chrome
do:
- link_add:
    url:
    - https://www.diggernaut.com/sandbox/details.html#1
    - https://www.diggernaut.com/sandbox/details.html#2
    - https://www.diggernaut.com/sandbox/details.html#3
- walk:
    to: links
    do:
In this code, you add several URLs with the link_add command and then iterate over them using the walk command with to: links. Previously, to change the URLs the digger iterates over, you had to edit the link_add command in the config. Now you can handle it more easily. First, edit your config and remove the link_add command completely.
Now, your digger code will look like this:
---
config:
    debug: 2
    agent: Chrome
do:
- walk:
    to: links
    do:
The link pool will then contain only the URLs you specify in the URL list.
To fill the list, click the Options button and select the Set URL list item.
Let's take a closer look at the interface and what you can do here.
- URL address
- Add link
- The list of URLs
- Delete selected URLs
- Clear the list
- Save the list
URL address - the URL of the page you want to add to the default link pool.
Add link - adds the link to the pool.
Delete selected URLs - deletes the URLs selected in the list; you can select one or more URLs using Shift or Ctrl.
Clear the list - lets you clear the entire list at once.
Save the list - saves your changes; don't forget to click it!
Pay attention!
You must save the list for the changes to take effect.
If you need to remove the list from the digger, clear it completely and save.