Evgeniy Solomanidin in Learning meta-languageWeb scraping

I want to have my data flow directly from digger to RDBMS

Do you want to save data to RDBMS?

It’s not that hard.

So today I’m going to show you how easy it can be done using Diggernaut’s Compile Service.

Stage One: preparation.

This is test configuration you can use as a sample for education, its fully working example and doesn’t break any TOS. So its safe to use it in any country.

---
do:
  - walk:
      to: 'https://www.diggernaut.com/sandbox/'
      do:
        - find:
            path: 'div:nth-child(6) .result-content'
            do:
              - find:
                  path: h3
                  do:
                    - parse
                    - variable_clear: name
                    - variable_set: name
              - find:
                  path: p
                  do:
                    - parse
                    - variable_clear: description
                    - variable_set: description
              - find:
                  path: 'tbody > tr'
                  do:
                    - parse
                    - object_new: item
                    - variable_get: name
                    - object_field_set:
                        object: item
                        field: name
                    - variable_get: description
                    - object_field_set:
                        object: item
                        field: descr
                    - object_new: date
                    - find:
                        path: '.col5 > .nowrap:nth-child(1)'
                        do:
                          - parse
                          - object_field_set:
                              object: date
                              field: date
                    - find:
                        path: '.col5 > .nowrap:nth-child(2)'
                        do:
                          - parse
                          - object_field_set:
                              object: date
                              field: date
                              joinby: " - "
                    - find:
                        path: .col6
                        do:
                          - parse
                          - normalize:
                              routine: replace_substring
                              args:
                                  \s*(\d+)A: 's'   
                          - object_field_set:
                              object: date
                              field: time
                    - object_save:
                        name: date
                        to: item
                    - object_save:
                        name: item

Resulting dataset has the following structure:

item: [
        {
            "name":"",
            "descr":"",
            "date": [
                        {
                            "date":",
                            "time":""
                        }
                        ...
                        ...
                    ]
            },
            ...
            ...
        ]

Let’s prepare our database. Digger can save data to MySQL, Microsoft SQL and PostgreSQL. So we need to create database and tables for our data.

MYSQL

    CREATE DATABASE `digger`;

    CREATE TABLE `items` (
        `id` INT(11) NOT NULL AUTO_INCREMENT,
        `name` VARCHAR(50) NULL DEFAULT NULL,
        `descr` TEXT NULL,
        PRIMARY KEY (`id`)
    )
    COLLATE='utf8_general_ci'
    ENGINE=InnoDB;

    CREATE TABLE `dates` (
        `id` INT(11) NOT NULL AUTO_INCREMENT,
        `item_id` INT(11) NOT NULL,
        `date` VARCHAR(50) NULL DEFAULT NULL,
        `time` VARCHAR(50) NULL DEFAULT NULL,
        INDEX `Index 1` (`id`),
        INDEX `FK__items` (`item_id`),
        CONSTRAINT `FK__items` FOREIGN KEY (`item_id`) REFERENCES `items` (`id`) ON UPDATE NO ACTION ON DELETE NO ACTION
        )
    COLLATE='utf8_general_ci'
    ENGINE=InnoDB;

MS SQL

    CREATE DATABASE `digger`;

    CREATE TABLE `items` (
        id INT NOT NULL IDENTITY(1,1) CONSTRAINT pk_items_pid PRIMARY KEY,
        `name` VARCHAR(50),
        `descr` TEXT 
    );

    CREATE TABLE `dates` (
        id INT NOT NULL IDENTITY(1,1) CONSTRAINT pk_dates_pid PRIMARY KEY,
        item_id int CONSTRAINT fk_item_id FOREIGN KEY(item_id) REFERENCES items(id),
        date TEXT,
        time TEXT 
    );

PostgreSQL

    CREATE DATABASE digger;

    CREATE TABLE items (
        id SERIAL PRIMARY KEY,
        name varchar(50),
        descr text,
    );

    CREATE TABLE dates (
        id SERIAL PRIMARY KEY,
        date varchar(50),
        time varchar(50),
        item_id integer REFERENCES items (id),
    );

Now let’s compile our digger.

Download it and run. Once the job is finished, you can see data in your DB. So now you can have your data flow directly from digger to your database without a hassle.

Next Read: Diggernaut can run JS routines »

Evgeniy Solomanidin:

New in Diggernaut: expanded functionality to work with Selenium, new static variables, and proxy management
For paid subscribers, it became possible to set the proxy type for use in diggers.…
How to avoid getting detected during web scraping
It becomes fairly easy to gather information from the Internet with all the advanced scrapers.…
Learning how to scrape the data from eBay
eBay is a very famous and popular marketplace. Very often, it is used by small…