Entity Manipulations

Data Objects

Data objects organize the structure of the stored data. Before you can save data to an object, you must create it with the object_new command, which takes a single argument: the name of the object. From the moment the object is created until it is saved with the object_save command, you can work with its fields. The object_field_set command writes a value (or a list of values) to a field, and the object_field_push command appends a value to a field as an element of an array. You can also save one object into a field of another object by passing a special parameter to the object_save command. If your digger uses a JSON validation schema, the object_save command checks the object for validity before saving; if there are validation errors, the object is not saved. To check an object without saving it to the DB, use the object_check command: it validates the object and executes the commands from the do block if the check passes, or from the else block if it fails.

Allowed commands:

          # CREATE OBJECT
- object_new: item
          
          # SET VALUE OF THE FIELD OF THE OBJECT
- object_field_set:
    # OBJECT NAME
    object: item
    # FIELD NAME
    # CAN BE SET AS STRING: somefield
    # OR WITH USING VARIABLES/ARGUMENTS/COUNTER VALUE: somefield_<%somevar%>
    field: somefield
          
          # SET VALUE OF THE FIELD OF THE OBJECT AS SPECIFIC DATA TYPE
- object_field_set:
    # OBJECT NAME
    object: item
    # FIELD NAME
    # CAN BE SET AS STRING: somefield
    # OR WITH USING VARIABLES/ARGUMENTS/COUNTER VALUE: somefield_<%somevar%>
    field: somefield
    # SETTING FIELD TYPE (CAN BE: float, date OR int)
    type: float
    # YOU CAN ALSO SET PRECISION FOR FLOAT TYPE
    precision: 2
          
          # PUSHING VALUE TO THE FIELD AS ELEMENT OF ARRAY
- object_field_push:
    # OBJECT NAME
    object: item
    # FIELD NAME
    # CAN BE SET AS STRING: somefield
    # OR WITH USING VARIABLES/ARGUMENTS/COUNTER VALUE: somefield_<%somevar%>
    field: somefield
    # CHECK FOR UNIQUENESS (OPTIONAL)
    # IF SUCH VALUE ALREADY EXIST IN THE ARRAY, IT WILL NOT BE PUSHED
    unique: yes
          
# SAVING OBJECT TO THE FIELD OF ANOTHER OBJECT
- object_save:
    # OBJECT NAME
    name: item
    # TARGET OBJECT NAME
    to: anotherobj
# OR
- object_save:
    # OBJECT NAME
    name: item
    # TARGET OBJECT NAME
    to: anotherobj
    # SAVE OBJECT AS MAP (OPTIONAL; WITHOUT THIS OPTION THE OBJECT IS PUSHED AS AN ELEMENT OF AN ARRAY, WHICH ALLOWS YOU TO PUSH SEVERAL OBJECTS TO THE FIELD)
    as: map
          
          # SAVING OBJECT TO THE DB
- object_save:
    # OBJECT NAME
    name: item
          
# CHECKING OBJECT
- object_check:
    # OBJECT NAME
    name: item
    do:
    # IF THE CHECK PASSED, THIS BLOCK IS EXECUTED
    - register_set: All is good
    else:
    # IF THE CHECK FAILED, THIS BLOCK IS EXECUTED
    - exit
        

An object can be saved into another object in two ways: as an element of a list stored in a field, or as a single object stored as a hashmap.

A list of objects is saved into an object as follows:

              # CREATE BLOCK FROM HTML STRING
- register_set: '<ul>
                    <li>A</li>
                    <li>B</li>
                    <li>C</li>
                    <li>D</li>
                </ul>'
- to_block
- find:
    path: ul
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: li
        do:
        # CREATE ANOTHER OBJECT WITH NAME `sub-item`
        - object_new: sub-item

        # PARSE TO REGISTER
        - parse

        # SAVE REGISTER VALUE TO THE OBJECT `sub-item` FIELD `sometext`
        - object_field_set:
            object: sub-item
            field: sometext

        # SAVE OBJECT `sub-item` TO THE OBJECT `item`
        - object_save:
            name: sub-item
            to: item

    # SAVE OBJECT `item` TO THE DB
    - object_save:
        name: item
              
{
    "item": {
        "sub-item": [
            {
                "sometext": "A"
            },
            {
                "sometext": "B"
            },
            {
                "sometext": "C"
            },
            {
                "sometext": "D"
            }
        ]
    }
}
              

Saving an object into another object as a hashmap:

              # CREATE BLOCK FROM HTML STRING
- register_set: '<ul>
                    <li>A</li>
                    <li>B</li>
                    <li>C</li>
                    <li>D</li>
                </ul>'
- to_block
- find:
    path: ul
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: li
        do:
        # CREATE ANOTHER OBJECT WITH NAME `sub-item`
        - object_new: sub-item

        # PARSE TO THE REGISTER
        - parse

        # SAVE REGISTER VALUE TO THE OBJECT `sub-item` FIELD `sometext`
        - object_field_set:
            object: sub-item
            field: sometext

        # SAVE OBJECT `sub-item` TO THE OBJECT `item` AS HASHMAP
        - object_save:
            name: sub-item
            to: item
            as: map

    # SAVE OBJECT `item` TO THE DB
    - object_save:
        name: item
              
{
    "item": {
        "sub-item": {
            "sometext": "D"
        }
    }
}
              

Please note:
In this example, the script iterates over all li tags, but only the last element, with the value D, ends up in the dataset. This happens because each iteration overwrites the field with the object built from the next element, so only the value of the last element remains in the field.
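
To make the difference between the two modes concrete, here is a minimal Python sketch that models it with plain dictionaries. It is only an illustration of the behaviour described above, not how the platform is implemented; the helper names save_to_list and save_as_map are hypothetical.

```python
def save_to_list(parent: dict, name: str, child: dict) -> None:
    """Model of object_save without `as: map`: append the child to an array field."""
    parent.setdefault(name, []).append(dict(child))


def save_as_map(parent: dict, name: str, child: dict) -> None:
    """Model of object_save with `as: map`: the field holds one object, so each save overwrites it."""
    parent[name] = dict(child)


values = ["A", "B", "C", "D"]  # text of the four <li> elements

as_list: dict = {}
as_map: dict = {}
for text in values:
    sub_item = {"sometext": text}  # a fresh `sub-item` object for every <li>
    save_to_list(as_list, "sub-item", sub_item)
    save_as_map(as_map, "sub-item", sub_item)

print(as_list)  # all four objects are kept, in order
print(as_map)   # only the object built from the last <li> remains
```

If every element has to be kept, the list form is the one to use; the map form suits a selector that matches exactly one element.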

An example of correctly creating and saving objects:

# EVERY TIME THE DIGGER ITERATES OVER THE SAME BLOCK, IT EXECUTES THE SAME CODE AND APPLIES THE SAME LOGIC
# THAT'S WHY THE DIGGER CREATES A NEW OBJECT WITH EACH NEW ITERATION
- find:
    path: .somepath
    do:
    # CREATE OBJECT WITH NAME `someobj`
    - object_new: someobj

    # PARSE TEXT TO THE REGISTER
    - parse

    # SET `somefield` FIELD OF THE `someobj` OBJECT WITH THE VALUE OF THE REGISTER
    - object_field_set:
        object: someobj
        field: somefield

    # JUMP TO SOME OTHER BLOCK, PARSE TEXT CONTENT AND SAVE IT TO SOME OTHER FIELD OF THE OBJECT `someobj`
    # PLEASE NOTE: IF THE DIGGER FINDS A SINGLE ELEMENT USING THE CSS SELECTOR, THE FIELD VALUE WILL BE CORRECT;
    # IF THE DIGGER FINDS SEVERAL ELEMENTS, IT WILL ITERATE OVER ALL OF THEM SEQUENTIALLY AND ONLY THE CONTENT OF THE LAST ELEMENT WILL BE KEPT IN THE FIELD
    # WE WILL COVER THIS IN DETAIL LATER; FOR NOW LET'S JUST SWITCH TO A SINGLE BLOCK
    - find:
        path: li:nth-of-type(1)
        do:
        - parse

        # SET `anotherfield` FIELD OF THE `someobj` OBJECT WITH THE VALUE OF THE REGISTER
        - object_field_set:
            object: someobj
            field: anotherfield

    # NOW LET'S USE A CSS SELECTOR THAT SELECTS SEVERAL ELEMENTS;
    # WHILE ITERATING, WE WILL CREATE NEW OBJECTS AND PUT THEM INTO OUR MAIN OBJECT `someobj`
    - find:
        path: .anotherpath
        do:
        # CREATE NEW OBJECT
        - object_new: anotherobj

        # SET FIELD `somefield` OF THE OBJECT `anotherobj`
        - parse
        - object_field_set:
            object: anotherobj
            field: somefield

        # SAVE OBJECT `anotherobj` TO THE OBJECT `someobj`
        - object_save:
            name: anotherobj
            to: someobj

    # SAVE MAIN OBJECT `someobj`
    - object_save:
        name: someobj
              
# EXAMPLE OF HOW YOU SHOULD NOT DO IT :)
- find:
    path: .somepath
    do:
    # CREATE OBJECT `someobj`
    - object_new: someobj

    # PARSE TEXT TO THE REGISTER
    - parse

    # SET FIELD `somefield` OF THE OBJECT `someobj` WITH THE VALUE OF THE REGISTER
    - object_field_set:
        object: someobj
        field: somefield

    # LET'S USE A CSS SELECTOR THAT FINDS SEVERAL ELEMENTS;
    # WHILE ITERATING, WE WILL CREATE NEW OBJECTS AND PUT THEM INTO OUR MAIN OBJECT `someobj`
    - find:
        path: .anotherpath
        do:
        # CREATE NEW OBJECT
        - object_new: anotherobj

        # SET FIELD `somefield` OF THE OBJECT `anotherobj`
        - parse
        - object_field_set:
            object: anotherobj
            field: somefield

        # SAVE OBJECT `anotherobj` TO THE OBJECT `someobj`
        - object_save:
            name: anotherobj
            to: someobj

    # SWITCH TO OTHER BLOCK AND SAVE PARSED DATA TO SOME OTHER FIELD OF THE OBJECT `someobj`
    - find:
        path: li:nth-of-type(1)
        do:
        - parse

        # SET FIELD `anotherfield` OF THE OBJECT `someobj` WITH THE VALUE OF THE REGISTER
        - object_field_set:
            object: someobj
            field: anotherfield

        # SAVE OBJECT `someobj`
        - object_save:
            name: someobj

    # AS YOU CAN SEE, THE MAIN OBJECT IS SAVED IN A DIFFERENT SCOPE THAN THE ONE WHERE IT WAS CREATED
    # THIS CAN LEAD TO A PROBLEM:
    # IF SOME PAGES HAVE NO ELEMENTS MATCHING THE SELECTOR `li:nth-of-type(1)`, THE OBJECT WILL NOT BE SAVED
    # AND YOU WILL LOSE DATA FOR SUCH PAGES
    # SO IT WORKS ONLY IF THE SELECTOR MATCHES AT LEAST 1 ELEMENT ON THE PAGE
    # THAT'S WHY YOU SHOULD ALWAYS SAVE YOUR OBJECT IN THE SAME SCOPE WHERE YOU CREATE IT
              
              {
    "someobj": {
        "somefield" : "somedata",
        "anotherfield": "anotherdata",
        "anotherobj" : [
            {
                "somefield":"somedata"
            },
            {
                "somefield":"somedata"
            },
            ...
            ...
            ...
            {
                "somefield":"somedata"
            },
            {
                "somefield":"somedata"
            }
        ]
    }
}
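
The scope problem described in the comments of the second example can be modeled in a few lines of Python. This is a rough sketch under assumptions: the page dictionary and the scrape function are hypothetical stand-ins for a parsed block and a digger run, and the inner loop plays the role of a find that may match zero elements.

```python
def scrape(page: dict, save_in_inner_scope: bool) -> list:
    """Return the records 'saved' for one page.

    `page` is a hypothetical stand-in for a parsed block:
    {"title": str, "first_li": list of matched <li> texts (possibly empty)}.
    """
    saved = []
    obj = {"somefield": page["title"]}       # object created in the outer scope

    for li_text in page["first_li"]:         # inner find: zero or more matches
        obj["anotherfield"] = li_text
        if save_in_inner_scope:
            saved.append(dict(obj))          # runs only if the selector matched

    if not save_in_inner_scope:
        saved.append(dict(obj))              # same scope as creation: always runs

    return saved


page_without_li = {"title": "somedata", "first_li": []}

print(scrape(page_without_li, save_in_inner_scope=True))   # nothing is saved
print(scrape(page_without_li, save_in_inner_scope=False))  # the object is still saved
```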
              

To save a field with a specific type, pass the type parameter (supported types are: int, float, bool and string). If it is omitted, the default type "string" is used:

              # CREATE A BLOCK FROM HTML STRING
- register_set: '<ul>
                    <li class="float">12.85</li>
                    <li class="int">158</li>
                    <li class="bool_false">false</li>
                    <li class="bool_true">true</li>
                </ul>'
- to_block
- find:
    path: ul
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: .float
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SAVE VALUE OF THE REGISTER TO `somefloat` FIELD OF THE `item` OBJECT WITH SPECIFYING FIELD TYPE AS `float`
        - object_field_set:
            object: item
            field: somefloat
            type: float

    - find:
        path: .int
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SAVE VALUE OF THE REGISTER TO `someint` FIELD OF THE `item` OBJECT WITH SPECIFYING FIELD TYPE AS `int`
        - object_field_set:
            object: item
            field: someint
            type: int
    - find:
        path: .bool_false
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SAVE VALUE OF THE REGISTER TO `falsebool` FIELD OF THE `item` OBJECT WITH SPECIFYING FIELD TYPE AS `bool`
        - object_field_set:
            object: item
            field: falsebool
            type: bool

    - find:
        path: .bool_true
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SAVE VALUE OF THE REGISTER TO `truebool` FIELD OF THE `item` OBJECT WITH SPECIFYING FIELD TYPE AS `bool`
        - object_field_set:
            object: item
            field: truebool
            type: bool
      
    # SAVE OBJECT `item`
    - object_save:
        name: item
              
              {
    "item": {
        "somefloat": 12.85,
        "someint": 158,
        "falsebool": false,
        "truebool": true
    }
}
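
As a rough mental model of what the type parameter does, the following Python sketch converts the register text into a typed value. It is an assumption-laden illustration, not the platform's code: the convert function is hypothetical, and the bool rule (only the text "true" becomes true) is a guess that is merely consistent with the example above.

```python
def convert(raw: str, type_name: str = "string"):
    """Convert register text to the requested type; 'string' is the default."""
    text = raw.strip()
    if type_name == "int":
        return int(text)
    if type_name == "float":
        return float(text)
    if type_name == "bool":
        # Assumption: only the literal text "true" maps to True.
        return text.lower() == "true"
    return text


item = {
    "somefloat": convert("12.85", "float"),
    "someint": convert("158", "int"),
    "falsebool": convert("false", "bool"),
    "truebool": convert("true", "bool"),
}
print(item)
```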
              

Setting a field value by combining it with the field's previous value (the joinby parameter):

              # CREATE BLOCK FROM HTML STRING
- register_set: '<ul class="int">
                    <li>125</li>
                    <li>158</li>
                </ul>
                <ul class="float">
                    <li>12.5</li>
                    <li>15.8</li>
                </ul>
                <ul class="default">
                    <li>sometext</li>
                    <li>15.8</li>
                </ul>'
- to_block
- find:
    path: ul.int
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `int` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: multint
            type: int
            joinby: "*" # MULTIPLY

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `int` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: divint
            type: int
            joinby: "/" # DIVIDE

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `int` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: subint
            type: int
            joinby: "-" # SUBTRACT

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `int` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: sumint
            type: int
            joinby: "+" # ADD

    # SAVE OBJECT `item`
    - object_save:
        name: item
              
              {
    "item": {
        "divint": 0,
        "multint": 19750,
        "subint": -33,
        "sumint": 283
    }
}
              
              # CREATE BLOCK FROM HTML STRING
- register_set: '<ul class="int">
                    <li>125</li>
                    <li>158</li>
                </ul>
                <ul class="float">
                    <li>12.5</li>
                    <li>15.8</li>
                </ul>
                <ul class="default">
                    <li>sometext</li>
                    <li>15.8</li>
                </ul>'
- to_block
- find:
    path: ul.float
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `float` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: multfloat
            type: float
            joinby: "*" # MULTIPLY
            precision: 2

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `float` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: divfloat
            type: float
            joinby: "/" # DIVIDE
            precision: 3

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `float` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: subfloat
            type: float
            joinby: "-" # SUBTRACT
            precision: 4

    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING FIELD TYPE AS `float` AND OPERATION FOR JOINING
        - object_field_set:
            object: item
            field: sumfloat
            type: float
            joinby: "+" # ADD
            precision: 5

    # SAVE OBJECT `item`
    - object_save:
        name: item
              
              {
    "item": {
        "divfloat": 0.7911392405063291,
        "multfloat": 197.5,
        "subfloat": -3.3000000000000007,
        "sumfloat": 28.3
    }
}
              
              # CREATE BLOCK FROM HTML STRING
- register_set: '<ul class="int">
                    <li>125</li>
                    <li>158</li>
                </ul>
                <ul class="float">
                    <li>12.5</li>
                    <li>15.8</li>
                </ul>
                <ul class="default">
                    <li>sometext</li>
                    <li>15.8</li>
                </ul>'
- to_block
- find:
    path: ul.default
    do:
    # CREATE OBJECT WITH NAME `item`
    - object_new: item
    - find:
        path: li
        do:
        # PARSE TEXT TO THE REGISTER
        - parse

        # SETTING OBJECT FIELD WITH SPECIFYING DELIMITER FOR JOINING
        - object_field_set:
            object: item
            field: default
            joinby: "*" # DELIMITER

    # SAVE OBJECT `item`
    - object_save:
        name: item
              
              {
    "item": {
        "default": "sometext*15.8"
    }
}
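
The joinby examples above behave like a left-to-right fold over the values written to the field: numeric types apply the arithmetic operation, while the default string type uses joinby as a delimiter. The Python sketch below models that reading. The join_values function is hypothetical, truncating integer division is an assumption chosen to match "divint": 0, and the precision parameter is not modeled.

```python
import operator
from functools import reduce

# Operations shared by the int and float models.
ARITHMETIC = {"*": operator.mul, "-": operator.sub, "+": operator.add}


def join_values(values, joinby, type_name="string"):
    """Fold the values written to a field, left to right."""
    if type_name == "string":
        return joinby.join(values)            # joinby acts as a delimiter
    if type_name == "int":
        # Assumption: integer division truncates toward zero.
        ops = {**ARITHMETIC, "/": lambda a, b: int(a / b)}
        return reduce(ops[joinby], (int(v) for v in values))
    ops = {**ARITHMETIC, "/": operator.truediv}
    return reduce(ops[joinby], (float(v) for v in values))


ints = ["125", "158"]
int_item = {
    "divint": join_values(ints, "/", "int"),
    "multint": join_values(ints, "*", "int"),
    "subint": join_values(ints, "-", "int"),
    "sumint": join_values(ints, "+", "int"),
}
default_value = join_values(["sometext", "15.8"], "*")

print(int_item)
print(default_value)
```

With these inputs the sketch reproduces the integer and string results shown in the samples above.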
              

The object_save and object_check commands support a special update mode. It is turned off by default; to enable it, pass the mode parameter with the value update, as shown in the sample below. Along with the mode parameter, you also need to pass the primary_key parameter, whose value is the name of the field in your dataset that you want to use as the primary (unique) key for records. In this mode, the system takes the checksum of the last saved record with the given primary key and compares it with the checksum of the new record. If the checksums match, the record is considered a duplicate and is not saved. Otherwise, the new record is saved and the checksum in the cache is updated to the most recent one.

              # SAVE OBJECT `item`
- object_save:
    name: item
    mode: update
    primary_key: title
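
The checksum comparison behind update mode can be sketched in Python as follows. This is a simplified model, not the platform's implementation: the function names, the in-memory cache dictionary, and the choice of SHA-256 over a sorted JSON dump are all assumptions made for illustration.

```python
import hashlib
import json


def checksum(record: dict) -> str:
    """Stable checksum of a record (assumed here: SHA-256 of sorted JSON)."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def save_in_update_mode(record: dict, primary_key: str, cache: dict, storage: list) -> bool:
    """Save the record unless its checksum equals the cached one for the same key."""
    key = record[primary_key]
    digest = checksum(record)
    if cache.get(key) == digest:
        return False                 # duplicate of the last saved record: skip
    storage.append(dict(record))     # new or changed record: save it
    cache[key] = digest              # remember the most recent checksum
    return True


cache: dict = {}
storage: list = []
first = save_in_update_mode({"title": "Widget", "price": 10}, "title", cache, storage)
repeat = save_in_update_mode({"title": "Widget", "price": 10}, "title", cache, storage)
changed = save_in_update_mode({"title": "Widget", "price": 12}, "title", cache, storage)
print(first, repeat, changed, len(storage))
```

The second call is skipped because nothing changed for the key "Widget", while the third is saved because the price differs.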
              

In the next chapter, we'll review the methods for working with the DOM structure.