We recently added a couple of neat features that let you work with data more efficiently. One of them is JSON Schema support. JSON Schema is useful in many cases: for instance, when you need to make sure that a digger still works properly and the data you are getting is still in good shape, or when you need only specific records and want to skip the rest. For example, if you are gathering events, you may want only the events that are not canceled or that still have open slots. If the website exposes this information, you can easily set rules in a JSON schema to pick only the records you need.
So what is JSON Schema? As json-schema.org states: “JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.” I recommend learning more about it on that site, as this article does not cover JSON Schema syntax and usage. You can quickly learn it and experiment with it in debug mode at Diggernaut without paying a dime.
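As a small taste of the filtering use case, the fragment below is a minimal sketch of a schema that accepts only records whose `is_available` field equals a given value. The field name comes from the test config later in this article; the value "Available" is purely an assumption, so check what the site actually shows in that column before relying on it.

```json
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "properties": {
        "item": {
            "type": "object",
            "properties": {
                "is_available": {
                    "description": "Hypothetical value, adjust to what the site really returns",
                    "type": "string",
                    "enum": ["Available"]
                }
            },
            "required": ["is_available"]
        }
    },
    "required": ["item"]
}
```

Under a schema like this, records with any other value in `is_available` should fail validation and be left out of the results.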
So how do you set a JSON schema for a digger? First, log in to your Diggernaut account, then go to Projects > Diggers, find the digger you need, and click the “Config” button.
This opens the editor panel where you usually enter the digger config. You can see that it now has two additional tabs. Click the “Validator” tab.
Then paste in your JSON schema and click the “Save” button.
The next time your digger runs, it applies your JSON schema to validate the data. To understand this better, you may want to look at the digger config we used for tests:
---
config:
    debug: 2
do:
- link_add: 'https://diggernaut.com/sandbox/'
- walk:
    to: links
    do:
    - sleep: 1
    - find:
        path: .result-content
        do:
        - variable_clear: name
        - variable_clear: descr
        - find:
            path: h3
            do:
            - parse
            - variable_set: name
        - find:
            path: p
            do:
            - parse
            - variable_set: descr
        - find:
            path: table
            do:
            - find:
                path: 'tbody > tr'
                do:
                - object_new: item
                - variable_get: name
                - object_field_set:
                    object: item
                    field: name
                - variable_get: descr
                - object_field_set:
                    object: item
                    field: descr
                - find:
                    path: .col2
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: number
                - find:
                    path: .col3
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: short_descr
                - find:
                    path: .col4
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: location
                - find:
                    path: .col5
                    do:
                    - object_new: date
                    - find:
                        path: ' .nowrap:nth-child(1)'
                        do:
                        - parse
                        - object_field_set:
                            object: date
                            field: start
                    - find:
                        path: ' .nowrap:nth-child(2)'
                        do:
                        - parse
                        - object_field_set:
                            object: date
                            field: end
                    - object_save:
                        name: date
                        to: item
                - find:
                    path: .col6
                    do:
                    - object_new: time
                    - find:
                        path: ' .nowrap:nth-child(1)'
                        do:
                        - parse
                        - object_field_set:
                            object: time
                            field: start
                    - find:
                        path: ' .nowrap:nth-child(2)'
                        do:
                        - parse
                        - object_field_set:
                            object: time
                            field: end
                    - object_save:
                        name: time
                        to: item
                - find:
                    path: .col7
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: days
                - find:
                    path: .col8
                    do:
                    - parse:
                        filter:
                        - "\\s*\\$\\s*(\\d+)\\/"
                        - "\\s*\\$\\s*(\\d+)"
                    - object_field_set:
                        object: item
                        type: int
                        field: member_fee
                    - parse:
                        filter:
                        - "\\s*\\/\\s*\\$\\s*(\\d+)"
                        - "\\s*\\$\\s*(\\d+)"
                    - object_field_set:
                        object: item
                        type: int
                        field: non_member_fee
                - find:
                    path: .col9
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: ages
                - find:
                    path: .col10
                    do:
                    - parse
                    - object_field_set:
                        object: item
                        field: is_available
                - find:
                    path: .ajaxLoad.info-icon.tooltips
                    do:
                    - parse:
                        attr: href
                    - walk:
                        to: value
                        do:
                        - find:
                            path: 'tr:nth-of-type(2) td:nth-of-type(2)'
                            do:
                            - parse
                            - object_field_set:
                                object: item
                                field: gender
                - object_save:
                    name: item
    - find:
        path: .next a
        do:
        - parse:
            attr: href
        - link_add
And here is the JSON schema we used for it:
{
    "$schema": "http://json-schema.org/draft-04/schema#",
    "title": "Activities",
    "description": "Park district activities",
    "type": "object",
    "properties": {
        "item": {
            "type": "object",
            "properties": {
                "number": {
                    "description": "The unique identifier for an activity",
                    "type": "string"
                },
                "name": {
                    "description": "Activity name",
                    "type": "string"
                },
                "descr": {
                    "description": "Activity description",
                    "type": "string"
                },
                "gender": {
                    "description": "Gender specification for an activity",
                    "type": "string"
                },
                "short_descr": {
                    "description": "Activity short description",
                    "type": "string"
                },
                "ages": {
                    "description": "Allowed ages",
                    "type": "string"
                },
                "days": {
                    "description": "Weekdays when activity takes place",
                    "type": "string"
                },
                "member_fee": {
                    "description": "Fee for members",
                    "type": "number"
                },
                "non_member_fee": {
                    "description": "Fee for non-members",
                    "type": "number"
                },
                "is_available": {
                    "description": "Shows if activity is still available",
                    "type": "string"
                },
                "location": {
                    "description": "Location where activity takes place",
                    "type": "string"
                },
                "dates": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "start": {
                                "description": "Start date for activity session",
                                "type": "string"
                            },
                            "end": {
                                "description": "End date for activity session",
                                "type": "string"
                            }
                        },
                        "required": ["start","end"]
                    },
                    "minItems": 1,
                    "uniqueItems": true
                },
                "time": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "start": {
                                "description": "Start time for activity event",
                                "type": "string"
                            },
                            "end": {
                                "description": "End time for activity event",
                                "type": "string"
                            }
                        },
                        "required": ["start","end"]
                    },
                    "minItems": 1,
                    "uniqueItems": true
                }
            },
            "required": ["number","name","gender"]
        }
    },
    "required": ["item"]
}
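To make the `required` rules in this schema concrete, here is a hand-written record with made-up values (it is not actual output from the sandbox). Assuming the digger emits each saved object under an `item` key, as the schema expects, a record like this should pass validation because `number`, `name`, and `gender` are all present and are strings, and both fees are numbers.

```json
{
    "item": {
        "number": "A-100",
        "name": "Example activity",
        "gender": "Coed",
        "member_fee": 25,
        "non_member_fee": 30
    }
}
```

Remove `gender` from this record, or turn `member_fee` into a string such as "25", and it would no longer satisfy the schema.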