Entity Manipulations
Using Hashes
Hashes serve as dictionaries for various purposes. For example, you can keep a dictionary of pages already visited, so you can check whether a page has been visited before loading it. You can also build reference data sets for populating objects with data later. Hashes cannot be used directly as data for substitution, but you can read their values into the register and work with them there.
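The visited-pages idea can be modeled outside the scraper as a plain dictionary lookup. The Python sketch below only illustrates the concept and is not Diggernaut code; the `visited` dictionary, the `should_load` helper, and the example.com URLs are hypothetical names chosen for this example.

```python
# Hypothetical model of a "visited pages" dictionary; not Diggernaut syntax.
visited = {}  # page URL -> marker value


def should_load(url):
    """Return True the first time a URL is seen, False on every later call."""
    if url in visited:
        return False
    visited[url] = "1"  # remember the page so it is skipped next time
    return True


urls = [
    "https://example.com/page1",
    "https://example.com/page2",
    "https://example.com/page1",  # repeated on purpose
]
# Keep only the URLs that have not been seen before, in order of appearance.
loaded = [u for u in urls if should_load(u)]
print(loaded)
```

The lookup happens before the page is loaded, so a repeated URL costs one dictionary check instead of a second request.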
The hashmap_set command sets a hash field either to the value of the register (only in the block context) or to a value given directly:
# SWITCHING TO THE BLOCK
- find:
    path: .somepath
    do:
    - parse
    # WRITING REGISTER VALUE TO THE HASH FIELD
    - hashmap_set:
        name: currency
        field: EUR
    # WRITING VALUE TO THE HASH FIELD DIRECTLY
    - hashmap_set:
        name: currency
        field: USD
        value: United States Dollar
The command hashmap_get is used to write the value of the hash field to the register:
# SWITCHING TO THE BLOCK
- find:
    path: .somepath
    do:
    # READING HASH FIELD TO THE REGISTER
    - hashmap_get:
        name: currency
        field: EUR
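To make the interplay between the register and a hash concrete, here is a small Python model of the two commands. It is a sketch under stated assumptions, not Diggernaut's implementation: the `Scraper` class and its `register` attribute are hypothetical, the "Euro" register content is made up for illustration, and leaving the register empty when a field is missing is an assumption inferred from the duplicate-check example in this section.

```python
class Scraper:
    """Toy model: named hashes plus a single text register (hypothetical)."""

    def __init__(self):
        self.register = ""
        self.hashes = {}  # hash name -> {field: value}

    def hashmap_set(self, name, field, value=None):
        # Without an explicit value, the current register content is stored.
        stored = self.register if value is None else value
        self.hashes.setdefault(name, {})[field] = stored

    def hashmap_get(self, name, field):
        # Assumption: a missing hash or field leaves the register empty.
        self.register = self.hashes.get(name, {}).get(field, "")


s = Scraper()
s.register = "Euro"                # pretend `parse` filled the register
s.hashmap_set("currency", "EUR")   # value taken from the register
s.hashmap_set("currency", "USD", "United States Dollar")  # direct value
s.hashmap_get("currency", "USD")   # read the field back into the register
print(s.register)
```

In this model a hash is addressed by two keys, the hash name and the field, which mirrors the `name` and `field` parameters of both commands.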
Let's see how you can use a hash to avoid collecting duplicate events (events that share the same Activity number).
The source HTML for our scraper is available at https://www.diggernaut.com/sandbox/meta-lang-hash-table-en.html.
Please note that Activity number 363101-09 is duplicated in the table, and we only need to collect
the first record encountered and ignore all subsequent duplicates under the same number.
---
config:
    debug: 2
    agent: Firefox
do:
- walk:
    to: https://www.diggernaut.com/sandbox/meta-lang-hash-table-en.html
    do:
    - find:
        # LET'S FIND ALL `tr` TAGS
        path: tbody > tr
        do:
        # CLEAR VARIABLE FOR KEEPING ACTIVITY NUMBER
        - variable_clear: number
        - find:
            path: td.col2
            do:
            - parse
            # SAVE NUMBER TO THE VARIABLE
            - variable_set: number
        # TRYING TO FIND HASH WITH NAME AS ACTIVITY NUMBER AND READ FIELD `name` TO THE REGISTER
        - hashmap_get:
            name: <%number%>
            field: name
        - if:
            # CHECK IF REGISTER IS NOT EMPTY
            match: \S
            # IF IT IS EMPTY
            else:
            # CREATE OBJECT `item`
            - object_new: item
            - find:
                path: td.col3
                do:
                - parse
                # CREATE HASH WITH NAME AS ACTIVITY NUMBER AND SAVE REGISTER VALUE (IT HAS NAME OF ACTIVITY) TO THE FIELD `name`
                # THIS HASH WILL BE USED FURTHER FOR DUPLICATES CHECKING
                - hashmap_set:
                    name: <%number%>
                    field: name
                # SAVE VALUE OF THE REGISTER TO THE FIELD `name` OF THE OBJECT `item`
                - object_field_set:
                    object: item
                    field: name
            - find:
                path: td.col4
                do:
                - parse
                # SAVE LOCATION TO THE OBJECT
                - object_field_set:
                    object: item
                    field: location
            - find:
                path: td.col10
                do:
                - parse
                # SAVE STATUS OF EVENT TO THE OBJECT
                - object_field_set:
                    object: item
                    field: isAvailable
            # SAVE OBJECT TO THE DB
            - object_save:
                name: item
The HTML source of the sample page:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Diggernaut | Meta-language | Hash table sample</title>
</head>
<body>
    <div class="result-content">
        <div>
            <h3>363101 - Jr Golf Clinic Orange</h3>
        </div>
        <table cellspacing="2" border="1" cellpadding="5">
            <thead>
                <tr>
                    <th>Activity</th>
                    <th>Description</th>
                    <th>Location</th>
                    <th>Status</th>
                </tr>
            </thead>
            <tbody>
                <tr>
                    <td class="col2">
                        <span class="nowrap">363101-07</span>
                    </td>
                    <td class="col3">Jr Golf-Orange 4,5:31</td>
                    <td class="col4">Randall Oaks Golf Cl</td>
                    <td class="col10">
                        <span class="success arstatus">Available</span>
                    </td>
                </tr>
                <tr>
                    <td class="col2">
                        <span class="nowrap">363101-09</span>
                    </td>
                    <td class="col3">Jr Golf-Orange 4,5:30</td>
                    <td class="col4">Randall Oaks Golf Cl</td>
                    <td class="col10">
                        <span class="success arstatus">Available</span>
                    </td>
                </tr>
                <tr>
                    <td class="col2">
                        <span class="nowrap">363101-09</span>
                    </td>
                    <td class="col3">Jr Golf-Orange 5,5:30</td>
                    <td class="col4">Randall Oaks Golf Cl</td>
                    <td class="col10">
                        <span class="success arstatus">Available</span>
                    </td>
                </tr>
                <tr>
                    <td class="col2">
                        <span class="nowrap">363101-10</span>
                    </td>
                    <td class="col3">Jr Golf-Orange 5,6:23</td>
                    <td class="col4">Randall Oaks Golf Cl</td>
                    <td class="col10">
                        <span class="success arstatus">Available</span>
                    </td>
                </tr>
            </tbody>
        </table>
    </div>
</body>
</html>
The resulting dataset, with the second 363101-09 record skipped:
[
    {
        "item": {
            "isAvailable": "Available",
            "location": "Randall Oaks Golf Cl",
            "name": "Jr Golf-Orange 4,5:31"
        }
    },
    {
        "item": {
            "isAvailable": "Available",
            "location": "Randall Oaks Golf Cl",
            "name": "Jr Golf-Orange 4,5:30"
        }
    },
    {
        "item": {
            "isAvailable": "Available",
            "location": "Randall Oaks Golf Cl",
            "name": "Jr Golf-Orange 5,6:23"
        }
    }
]
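The duplicate check in this scraper follows a common "first record wins" pattern. The Python sketch below replays that logic on the four rows of the sample table; it is an illustration rather than Diggernaut code, and the `rows`, `hashes`, and `items` names are assumptions made for this example.

```python
# Rows as (activity number, name, location, status), copied from the sample table.
rows = [
    ("363101-07", "Jr Golf-Orange 4,5:31", "Randall Oaks Golf Cl", "Available"),
    ("363101-09", "Jr Golf-Orange 4,5:30", "Randall Oaks Golf Cl", "Available"),
    ("363101-09", "Jr Golf-Orange 5,5:30", "Randall Oaks Golf Cl", "Available"),
    ("363101-10", "Jr Golf-Orange 5,6:23", "Randall Oaks Golf Cl", "Available"),
]

hashes = {}  # hash name (activity number) -> {field: value}
items = []   # collected objects, in the shape of the result dataset

for number, name, location, status in rows:
    # Counterpart of hashmap_get followed by the non-empty register check:
    # a non-empty `name` field means this activity number was already seen.
    if hashes.get(number, {}).get("name", ""):
        continue
    # Counterpart of hashmap_set: remember the number for later rows.
    hashes.setdefault(number, {})["name"] = name
    items.append(
        {"item": {"isAvailable": status, "location": location, "name": name}}
    )

print([entry["item"]["name"] for entry in items])
```

Only the first 363101-09 row reaches `items`, so three objects are collected, matching the dataset shown above.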
Next, we consider how useful counters can be and what methods are provided for them.