In this article, we show how to scrape product and price information from the Cartier website. Cartier is the famous French house that produces and sells jewelry and watches. It was founded in 1847 by Louis-François Cartier as a small workshop. The house became popular in 1867 after the World Exhibition in Paris, and its products have been highly valued all over the world ever since.
Approximate number of products: 2000
Approximate number of page requests: 2000
Recommended subscription plan: Free
PLEASE NOTE! The number of requests can exceed the number of products, because data such as variations and images may have to be scraped from other resources, which requires additional requests. Part of the product data may also be delivered via XHR requests, which further increases the total number of page requests.
How to use the web scraper to extract data about goods and prices from cartier.com
To use the web scraper for the Cartier store website, you need an account with our Diggernaut service. Follow these steps:
- Use the registration link to open a free account with Diggernaut
- After registering and confirming your email address, log in to your account
- Create a project with any name and description; if you do not know how to do this, please refer to our documentation
- Switch to the created project and create a digger with any name; if you do not know how to do this, please refer to our documentation
- Copy the digger configuration given below to the clipboard and paste it into the digger you created; if you do not know how to do this, please refer to our documentation
- Switch the digger's mode from Debug to Active; if you do not know how to do this, please refer to our documentation
- Run your digger and wait until it completes; if you do not know how to do this, please refer to our documentation
- Download the scraped dataset in the format you need; if you do not know how to do this, please refer to our documentation
You can also set up a schedule to run your scraper and collect data regularly.
Scraping configuration for the digger
---
config:
    debug: 2
    agent: Firefox
do:
- walk:
    to: http://www.cartier.com/en-us/collections.html
    do:
    - find:
        path: ul.c-navigation__ulist a
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
- walk:
    to: links
    pool: catalog
    do:
    - sleep: 2
    - find:
        path: a.c-collection-link
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: catalog
    - find:
        path: a.prod-link
        do:
        - parse:
            attr: href
            filter: ^([^\?]+)
        - space_dedupe
        - trim
        - normalize:
            routine: replace_matched
            args:
                javascript\:: ''
        - if:
            match: \s*[a-z]+
            do:
            - normalize:
                routine: url
            - link_add:
                pool: pages
- walk:
    to: links
    pool: pages
    do:
    - sleep: 2
    - find:
        path: div.main-container
        do:
        - variable_clear: desc
        - object_new: product
        - eval:
            routine: js
            body: '(function (){var d = new Date(); return d.toISOString()})();'
        - object_field_set:
            object: product
            field: date
        - static_get: url
        - object_field_set:
            object: product
            field: url
        - find:
            path: span.c-pdp__cta-section--product-title
            do:
            - parse
            - space_dedupe
            - trim
            - object_field_set:
                object: product
                field: name
        - register_set: Cartier
        - object_field_set:
            object: product
            field: brand
        - find:
            path: div.c-pdp__cta-section--product-ref-id>span
            do:
            - parse
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - variable_set: pid
                - object_field_set:
                    object: product
                    field: sku
        - find:
            in: doc
            path: meta[property="description"]
            do:
            - parse:
                attr: content
            - space_dedupe
            - trim
            - variable_set: desc
        - find:
            path: div.c-pdp__desc--content
            do:
            - parse
            - space_dedupe
            - trim
            - variable_set: desc
        - variable_get: desc
        - object_field_set:
            object: product
            field: description
        - find:
            path: div.c-pdp__cta-section--product-price
            do:
            - find:
                path: div.price
                do:
                - parse
                - normalize:
                    routine: replace_matched
                    args:
                        \$: USD
                - object_field_set:
                    object: product
                    field: currency
                - parse:
                    filter:
                    - ([0-9\.\,]+)\s*-
                    - ([0-9\.\,]+)
                - normalize:
                    routine: replace_substring
                    args:
                        \,: ''
                - space_dedupe
                - trim
                - object_field_set:
                    object: product
                    type: float
                    field: price
        - find:
            path: ul.c-breadcrumb__list>li.c-breadcrumb__list-item>a
            do:
            - parse
            - space_dedupe
            - trim
            - normalize:
                routine: replace_matched
                args:
                    Collections: ''
                    Categories: ''
            - if:
                match: \w+
                do:
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: categories
        - find:
            path: div.c-pdp__image--wrapper
            do:
            - parse:
                attr: data-src
            - space_dedupe
            - trim
            - if:
                match: \w+
                do:
                - normalize:
                    routine: url
                - object_field_set:
                    object: product
                    joinby: "|"
                    field: images
        - object_save:
            name: product
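To make the configuration easier to follow, here is a rough Python sketch of the two text-processing chains it relies on: the link cleanup applied to every href before it is added to a pool, and the price extraction. This is only an illustration under stated assumptions, not how Diggernaut works internally. The function names clean_link and parse_price are hypothetical. We assume that replace_matched replaces the whole value when the pattern matches (which agrees with the sample output, where currency is exactly "USD"), that the two price filters are tried in order with the first match winning, and that relative links are resolved against the start page.

```python
import re
from urllib.parse import urljoin

# Assumption: relative links are resolved against the start page from the config.
BASE_URL = "http://www.cartier.com/en-us/collections.html"

# The two price filters from the config, tried in order (assumed: first match wins).
PRICE_FILTERS = (r"([0-9\.\,]+)\s*-", r"([0-9\.\,]+)")


def clean_link(href, base_url=BASE_URL):
    """Return an absolute URL for href, or None if the link should be skipped."""
    # parse + filter: keep everything before the first "?"
    match = re.search(r"^([^\?]+)", href)
    if not match:
        return None
    # space_dedupe + trim: collapse whitespace runs and strip both ends
    value = re.sub(r"\s+", " ", match.group(1)).strip()
    # replace_matched (assumed semantics): blank the whole value for javascript: links
    if re.search(r"javascript\:", value):
        value = ""
    # if/match: continue only when some lowercase text is left
    if not re.search(r"\s*[a-z]+", value):
        return None
    # normalize/url: turn a relative link into an absolute one
    return urljoin(base_url, value)


def parse_price(text):
    """Extract a price such as "$133,000" as a float, or None if there is none."""
    for pattern in PRICE_FILTERS:
        match = re.search(pattern, text)
        if match:
            try:
                # replace_substring: drop thousands separators before conversion
                return float(match.group(1).replace(",", ""))
            except ValueError:
                return None  # the match was only punctuation, not a number
    return None


if __name__ == "__main__":
    print(clean_link(" /en-us/collections/watches.html?page=2 "))
    print(clean_link("javascript:void(0)"))
    print(parse_price("$133,000"))
```

Because the first price filter requires a trailing hyphen, a price range such as "$1,000 - $2,500" would yield the lower bound under these assumptions, while a single price falls through to the second filter.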
Sample of scraped data
Below is a sample dataset with several products in JSON format, so you can easily review the data structure. The dataset can also be downloaded as CSV, XLSX, XML, or any other text format using templates.
[
  {
    "product": {
      "brand": "Cartier",
      "categories": "Watches|Women's watches|Crash",
      "currency": "USD",
      "date": "2017-12-27T10:58:53.896Z",
      "description": "Created in 1967 in *Swinging London*, the Crash watch expresses the sparkling, carefree spirit of an era that was all about complete freedom. The unlikely design of this watch could only have been conceived by Cartier, the great maker of shaped watches. Passionate and in touch with the spirit of the times, it sought to create a unique watch that would capture the joyous burst of rebellion and pop culture that shook up the conformism of the time.",
      "images": "http://www.cartier.com/content/dam/rcq/car/59/37/24/593724.png|http://www.cartier.com/content/dam/rcq/car/59/29/55/592955.png",
      "name": "Crash watch",
      "price": 133000,
      "sku": "HPI00654",
      "url": "http://www.cartier.com/en-us/collections/watches/womens-watches/crash/hpi00654-crash-watch.html"
    }
  },
  {
    "product": {
      "brand": "Cartier",
      "categories": "Watches|Gifts|Cartier Classics",
      "currency": "USD",
      "date": "2017-12-27T10:58:57.333Z",
      "description": "Louis Cartier created the Santos watch in 1904, sealing his friendship with the aviator Alberto Santos Dumont. The famous aviator's wish was granted: he could check the time while flying. The dial's rounded angles and exposed screws made this an iconic timepiece. Cartier marked the centenary of the watch with the introduction of a new version.",
      "images": "http://www.cartier.com/content/dam/rcq/car/58/46/40/584640.png|http://www.cartier.com/content/dam/rcq/car/15/35/39/2/1535392.png",
      "name": "Santos 100 watch",
      "price": 7000,
      "sku": "W20073X8",
      "url": "http://www.cartier.com/en-us/collections/watches/selections/cartier-classics/w20073x8-santos-100-watch.html"
    }
  },
  {
    "product": {
      "brand": "Cartier",
      "categories": "Watches|Gifts|Cartier Classics",
      "currency": "USD",
      "date": "2017-12-27T10:59:00.589Z",
      "description": "The Tank story takes an unexpected turn with the Tank Anglaise. This variation of the distinctive features of the Tank recreates the perfect alignment of the original thanks to a winding mechanism seamlessly incorporated into the case. Featuring a concentrated form and reinforced lines, the streamlined design reinterprets the original model and gives it a new dimension.",
      "images": "http://www.cartier.com/content/dam/rcq/car/10/28/14/2/1028142.png",
      "name": "Tank Anglaise watch",
      "price": 9100,
      "sku": "W5310047",
      "url": "http://www.cartier.com/en-us/collections/watches/selections/cartier-classics/w5310047-tank-anglaise-watch.html"
    }
  }
]
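If you download the dataset as JSON, the categories and images fields arrive as single strings joined with "|", as in the sample above. The following Python sketch shows one possible way to load such a file and split those fields into lists. The helper names split_joined and load_products are hypothetical, and the embedded SAMPLE string is an abbreviated copy of the first product above; in practice you would read the downloaded file instead.

```python
import json

# Abbreviated copy of the first product from the sample; a real export would be read from a file.
SAMPLE = """
[{"product": {"brand": "Cartier",
              "categories": "Watches|Women's watches|Crash",
              "currency": "USD",
              "images": "http://www.cartier.com/content/dam/rcq/car/59/37/24/593724.png|http://www.cartier.com/content/dam/rcq/car/59/29/55/592955.png",
              "name": "Crash watch",
              "price": 133000,
              "sku": "HPI00654"}}]
"""


def split_joined(value, sep="|"):
    """Split a "|"-joined field into a list, ignoring empty or missing values."""
    if not value:
        return []
    return [part for part in value.split(sep) if part]


def load_products(raw_json):
    """Unwrap each {"product": {...}} record and split the joined fields."""
    products = []
    for record in json.loads(raw_json):
        product = dict(record["product"])
        product["categories"] = split_joined(product.get("categories"))
        product["images"] = split_joined(product.get("images"))
        products.append(product)
    return products


if __name__ == "__main__":
    for item in load_products(SAMPLE):
        print(item["sku"], item["name"], item["price"], item["categories"])
```

Splitting on "|" works here because the configuration sets joinby to "|" for both fields; if you change that separator in the digger, pass the same value as sep.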