Scrape data from fashion stores with Diggernaut: Betsey Johnson

Betsey Johnson is an American fashion designer known in the fashion world for her quirky designs for women. Her online store sells clothes, shoes, accessories, and jewelry. We are publishing digger configurations that help you scrape data from fashion stores, and this scraper extracts data from the betseyjohnson.com online store.

Approx. number of goods: 1000
Approx. number of page requests: 7000
Recommended subscription plan: X-Small

PLEASE NOTE! The number of requests can exceed the number of products because data about variations, images, and so on may be scraped from other resources, which requires additional requests. Part of the product data may also be delivered via XHR requests, which further increases the total number of page requests.
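
As a rough illustration of the figures above, the ratio of page requests to products can be computed directly. The function name `estimate_requests` is hypothetical, and the linear scaling it applies is an assumption made for this sketch, not a Diggernaut formula.

```python
APPROX_GOODS = 1000          # approximate number of goods, from the estimate above
APPROX_PAGE_REQUESTS = 7000  # approximate number of page requests, from the estimate above


def estimate_requests(goods, requests_per_product=APPROX_PAGE_REQUESTS / APPROX_GOODS):
    """Scale the request count linearly with catalog size (an assumption)."""
    return round(goods * requests_per_product)


# Average number of page requests spent on each product.
print(APPROX_PAGE_REQUESTS / APPROX_GOODS)
# Hypothetical catalog size, used only to show the scaling.
print(estimate_requests(1500))
```

The real number depends on how many category pages, variations, and image sets the store exposes at the time of the run, so treat the result as a ballpark only.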

How to use the web scraper to extract data about goods and prices from betseyjohnson.com

To use the web scraper for the Betsey Johnson store's website, you need an account with our Diggernaut service. Just follow these steps:

  1. Go through this registration link to open a free account with Diggernaut
  2. After registering and confirming your email address, log in to your account
  3. Create a project with any name and description
  4. Switch to the created project and create a digger with any name
  5. Copy the following digger configuration to the clipboard and paste it into the digger you created
  6. Switch the mode of the digger from Debug to Active
  7. Run your digger and wait until it completes
  8. Download the scraped dataset in the format you need

If you do not know how to perform any of steps 3 to 8, please refer to our documentation.

You can also set up a schedule to run your scraper and collect data regularly.

Scraping configuration for the digger

---
config:
    debug: 2
    agent: Firefox
do:
- walk:
    to: http://www.betseyjohnson.com/
    do:
    - find: 
        path: .off > a 
        do: 
        - parse:
            attr: href
        - normalize:
            routine: replace_substring
            args:
                '\s*#': ''
        
        - normalize:
            routine: url
        - link_add:
            pool: main
- walk:
    to: links
    pool: main
    do:
    - find: 
        path: .categoryNav a
        do: 
        - parse:
            attr: href
        - walk:
            to: value
            do:
            - find: 
                path: .viewAll 
                do: 
                - parse:
                    attr: data-href
                - normalize:
                    routine: url
                - walk:
                    to: value
                    do:
                    - find: 
                        path: .mainImage 
                        do: 
                        - parse:
                            attr: href
                            filter: 
                                - (.+)\?
                                - (.+)
                        - normalize:
                            routine: url
                        - link_add:
                            pool: sub
            - find: 
                path: .mainImage 
                do: 
                - parse:
                    attr: href
                    filter: 
                        - (.+)\?
                        - (.+)
                - normalize:
                    routine: url
                - link_add:
                    pool: sub
- walk:
    to: links
    pool: sub
    do:
    - object_new: product
    - find: 
        in: doc
        path: head 
        do: 
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
    - find: 
        path: 'meta[itemprop="productID"]'
        do: 
        - parse:
            attr: content
        - space_dedupe
        - trim
        - object_field_set:
            object: product
            field: sku
    - find: 
        path: .breadcrumb a 
        do: 
        - parse
        - space_dedupe
        - trim
        - normalize:
            routine: lower
        - object_field_set:
            object: product
            field: category
            joinby: "|"
    - find: 
        path: 'select.COLOR_NAME > option'
        do: 
        - parse:
            attr: value
        - space_dedupe
        - trim
        - if:
            match: (\S)
            do:
            - object_field_set:
                object: product
                field: variations
                joinby: "|"
    - find: 
        path: .item-name 
        do: 
        - parse
        - space_dedupe
        - trim
        - object_field_set:
            object: product
            field: name
    - find: 
        path: .productPrice 
        do: 
        - parse:
            filter:
                - ^\s*\$\s*(\d+\.?\d*)
        - if:
            match: (\d+)
            do:
            - object_field_set:
                object: product
                field: price
                type: float
            - register_set: USD
            - object_field_set:
                object: product
                field: currency
    - find: 
        path: script:matches(variantMatrices)
        do: 
        - parse:
            filter: 
                - \/\/var\s*thumbsAndStuff\s*=\s*(.+);\s* 
        - normalize:
            routine: json2xml
        - to_block
        - find:
            path: alts
            do:
            - parse
            - walk:
                to: http://www.betseyjohnson.com/scene7_proxy.jsp?cb=&req=set,json,utf-8&id=<%register%>
                do:
                - find:
                    path: body
                    do:
                    - parse
                    - normalize:
                        routine: replace_substring
                        args:
                            - \/\*jsonp\*\/\s*: ''
                            - s7jsonResponse\(: ''
                            - \}\}\}\,\"\"\)\;: '}}}'
                    - normalize:
                        routine: unescape_html
                    - normalize:
                        routine: json2xml
                    - to_block
                    - find: 
                        path: item > i > n 
                        do: 
                        - parse
                        - register_set: http://s7d9.scene7.com/is/image/<%register%>?scl=1
                        - object_field_set:
                            object: product
                            field: images
                            joinby: "|"
    - find: 
        path: .detailsWrap > p
        do: 
        - parse
        - space_dedupe
        - trim
        - object_field_set:
            object: product
            field: description
    - find: 
        path: meta[itemprop="brand"] 
        do: 
        - parse:
            attr: content
        - space_dedupe
        - trim
        - object_field_set:
            object: product
            field: brand
    - object_save:
        name: product
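
Two of the less obvious steps in the configuration above, the price filter and the Scene7 image lookup, can be mirrored in Python to see what they do. This is only a sketch under stated assumptions: the function names are hypothetical, the sample response body is invented to match the `item > i > n` path used in the config, and the top-level `set` key and the single-object form of `item` are assumptions about the response shape.

```python
import html
import json
import re


def extract_price(text):
    """Mirror the config's price filter: '$', optional spaces, then digits and an optional dot."""
    match = re.match(r"^\s*\$\s*(\d+\.?\d*)", text)
    return float(match.group(1)) if match else None


# The same three substitutions as the replace_substring step, applied in order.
JSONP_SUBSTITUTIONS = [
    (r"/\*jsonp\*/\s*", ""),
    (r"s7jsonResponse\(", ""),
    (r'\}\}\},""\);', "}}}"),
]


def image_urls(body):
    """Strip the JSONP wrapper, decode the JSON, and build one URL per image name."""
    for pattern, replacement in JSONP_SUBSTITUTIONS:
        body = re.sub(pattern, replacement, body)
    # Assumption: the image entries live under set > item.
    items = json.loads(html.unescape(body))["set"]["item"]
    if isinstance(items, dict):  # assumption: a lone image is an object, not a list
        items = [items]
    return ["http://s7d9.scene7.com/is/image/%s?scl=1" % item["i"]["n"] for item in items]


# Hypothetical response body, shaped only to match the paths used in the config.
sample_body = (
    '/*jsonp*/s7jsonResponse({"set":{"item":{"i":{"n":'
    '"BetseyJohnson/HOLIDAY-GIVING-BUNNY-PEN_PINK"}}}},"");'
)

print(extract_price(" $35.00"))
print(image_urls(sample_body))
```

One thing the sketch makes visible: the price pattern stops at the first character that is neither a digit nor a dot, so in this Python mirror a price written with a thousands separator, such as "$1,200.00", is captured as 1. If the store lists such prices, the filter may need to be adjusted.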

Sample of scraped data

Below is a sample of the dataset with several products, shown in JSON format so you can easily review it and see the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using templates.

[{
    "product": {
        "brand": "Betsey Johnson",
        "category": "accessories|all accessories|gifty goodies",
        "currency": "USD",
        "date": "2017-12-07T18:45:38.519Z",
        "description": "This pen is an extra special gift. Its candy-hued coloring gets extra girly flair with a soft, feathery accent.",
        "images": "http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BUNNY-PEN_PINK?scl=1|http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BUNNY-PEN_PINK_PACKAGED?scl=1",
        "name": "HOLIDAY GIVING BUNNY PEN",
        "price": 35,
        "sku": "247714",
        "url": "https://www.betseyjohnson.com/product/HOLIDAY-GIVING-BUNNY-PEN/247714.uts",
        "variations": "PINK"
    }
}
,{
    "product": {
        "brand": "Betsey Johnson",
        "category": "accessories|all accessories|gifty goodies",
        "currency": "USD",
        "date": "2017-12-07T18:45:40.633Z",
        "description": "This black cat pen will bring your favorite friend some good luck! Its colorful, oil slick surface is topped off by a feathery black cat embellishment.",
        "images": "http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BLACK-CAT-PEN_BLACK?scl=1|http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BLACK-CAT-PEN_BLACK_PACKAGED?scl=1",
        "name": "HOLIDAY GIVING BLACK CAT PEN",
        "price": 35,
        "sku": "247715",
        "url": "https://www.betseyjohnson.com/product/HOLIDAY-GIVING-BLACK-CAT-PEN/247715.uts",
        "variations": "BLACK"
    }
}
,{
    "product": {
        "brand": "Betsey Johnson",
        "category": "accessories|all accessories|gifty goodies",
        "currency": "USD",
        "date": "2017-12-07T18:45:41.840Z",
        "description": "This colorful parrot pen is the perfect way to add some color to the holiday season! Its pastel surface is accented by a vivid feather plump on an opal parrot.",
        "images": "http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-PARROT-PEN_MULTI?scl=1|http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-PARROT-PEN_MULTI_PACKAGING?scl=1",
        "name": "HOLIDAY GIVING PARROT PEN",
        "price": 35,
        "sku": "247716",
        "url": "https://www.betseyjohnson.com/product/HOLIDAY-GIVING-PARROT-PEN/247716.uts",
        "variations": "MULTI"
    }
}]
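
Because the `category`, `images`, and `variations` fields are joined with "|" in the dataset, you may want to split them back into lists after downloading the JSON. The following is a minimal sketch of one way to do that; the function name `load_products` is hypothetical, and the sample string is a trimmed copy of the first record above.

```python
import json

# Fields that the digger joins with "|" (see joinby in the configuration).
PIPE_FIELDS = ("category", "images", "variations")


def load_products(text):
    """Unwrap each {"product": {...}} record and split the pipe-joined fields."""
    products = []
    for record in json.loads(text):
        product = dict(record["product"])
        for field in PIPE_FIELDS:
            value = product.get(field)
            product[field] = value.split("|") if value else []
        products.append(product)
    return products


sample = """[{"product": {
    "category": "accessories|all accessories|gifty goodies",
    "images": "http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BUNNY-PEN_PINK?scl=1|http://s7d9.scene7.com/is/image/BetseyJohnson/HOLIDAY-GIVING-BUNNY-PEN_PINK_PACKAGED?scl=1",
    "name": "HOLIDAY GIVING BUNNY PEN",
    "price": 35,
    "sku": "247714",
    "variations": "PINK"
}}]"""

products = load_products(sample)
print(products[0]["name"], products[0]["category"], len(products[0]["images"]))
```

Fields that are missing from a record come back as empty lists, so later code can iterate over them without checking for their presence first.
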
Mikhail Sisin: Co-founder of cloud-based web scraping and data extraction platform Diggernaut. Over 10 years of experience in data extraction, ETL, AI, and ML.