Athleta, a subsidiary of Gap Inc., develops, manufactures and sells women’s and children’s clothing for sports and active lifestyles. This free web scraper collects data about all the products listed in the athleta.gap.com online store.
Approximate number of products: 20,000
Approximate number of page requests: 20,000
Recommended subscription plan: X-Small
Please note: the number of requests can exceed the number of products, because data about variations, images, etc. may be scraped from other resources, which requires additional requests. Part of the product data may also be delivered through XHR requests, which further increases the total number of page requests.
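As a rough illustration of why the request count can exceed the product count, the sketch below tallies requests for a crawl shaped like the configuration on this page: one request for the home page, one per top-navigation section, one per category result page, and one per product page. The function name and all the counts passed in are hypothetical, chosen only for illustration; they are not measurements from the store.

```python
def estimate_requests(nav_sections, category_pages, product_visits):
    """Rough request estimate for a crawl of the shape described on this page.

    nav_sections   -- number of top-navigation sections opened
    category_pages -- list with the number of result pages in each category
    product_visits -- number of product pages opened (assumption: a product
                      listed in several categories may be opened more than once)
    """
    home = 1  # the start page itself
    return home + nav_sections + sum(category_pages) + product_visits


# Hypothetical numbers, for illustration only.
total = estimate_requests(nav_sections=8, category_pages=[1, 3, 2], product_visits=500)
print(total)  # 1 + 8 + 6 + 500 = 515
```

Even in this small example, 15 of the 515 requests do not return a product, which is why the request estimate is only approximate.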
How to use the web scraper to extract data about goods and prices from athleta.gap.com
To use this web scraper for the athleta.gap.com website, you need an account with our Diggernaut service. Follow the steps below; if you are unsure how to complete any of them, please refer to our documentation:
- Follow the registration link to open a free account with Diggernaut
- After registering and confirming your email address, log in to your account
- Create a project with any name and description
- Switch to the created project and create a digger with any name
- Copy the digger configuration below to the clipboard and paste it into the digger you created
- Switch the digger from Debug mode to Active mode
- Run your digger and wait until it completes
- Download the scraped dataset in the format you need
You can also set up a schedule to run your scraper and collect data regularly.
Scraping configuration for the digger
---
config:
debug: 2
agent: Firefox
do:
- walk:
to: http://athleta.gap.com/
do:
- find:
path: div.topnav_atol>ul>li>a
do:
- parse:
attr: href
- space_dedupe
- trim
- if:
match: \w+
do:
- link_add:
pool: main
- walk:
to: links
pool: main
do:
- find:
path: .sidebar-navigation
do:
- node_remove: h1
- sequence:
header: h2
selector: h2,div
- find:
path: div.sequence
do:
- variable_clear: catname
- find:
path: h2
do:
- parse
- space_dedupe
- trim
- variable_set: catname
- find:
path: .sidebar-navigation--category--link
do:
- pool_clear: pager
- parse:
attr: href
filter:
- cid=(.+)
- variable_set: cid
- register_set: http://athleta.gap.com/resources/productSearch/v1/search?cid=<%cid%>&locale=en_US&isFacetsEnabled=true
- link_add:
pool: pager
- walk:
to: links
pool: pager
do:
- variable_clear: ptot
- find:
path: pageNumberTotal
do:
- parse
- if:
match: (^\s*[0-1]\s*$)
else:
- variable_set: ptot
- find:
path: pageNumberRequested
do:
- parse
- if:
match: (^\s*0\s*$)
do:
- variable_get: ptot
- if:
match: (\d)
do:
- if:
gt: 1
do:
- eval:
routine: js
body: '(function (){var r = ""; for (var i = 1; i<<%ptot%>; i++){r += "<div>"+i+"</div>"}; return r;})();'
- to_block
- find:
path: div
do:
- parse
- variable_set: pageid
- register_set: http://athleta.gap.com/resources/productSearch/v1/search?cid=<%cid%>&locale=en_US&pageId=<%pageid%>&isFacetsEnabled=true
- link_add:
pool: pager
- find:
path: productCategory > name
do:
- parse
- space_dedupe
- trim
- variable_set: catname2
- find:
path: productCategory > childProducts
do:
- find:
path: parentBusinessCatalogItemId
do:
- parse
- if:
match: (\S)
do:
- variable_set: pid
- register_set: http://athleta.gap.com/browse/product.do?pid=<%pid%>&cid=<%cid%>
- walk:
to: value
do:
- variable_clear: isP
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- variable_set:
field: isP
value: 1
- find:
path: html
do:
- variable_get: isP
- if:
match: (1)
do:
- object_new: product
- find:
path: head
do:
- eval:
routine: js
body: '(function (){var d = new Date(); return d.toISOString()})();'
- object_field_set:
object: product
field: date
- static_get: url
- object_field_set:
object: product
field: url
- register_set: 'GAP'
- object_field_set:
object: product
field: brand
- find:
path: meta[name="keywords"]
do:
- parse:
attr: content
- object_field_set:
object: product
field: description
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- parse:
filter:
- gap\.currentBrand\s*=\s*\"(.+)\"\;
- if:
match: (\S)
do:
- object_field_set:
object: product
field: brand
- parse
- normalize:
routine: replace_substring
args:
var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
gap\.pageProductData\s*=\s*: ''
\s*;\s*gap.currentBrand\s*=\s*.*\;: ''
- normalize:
routine: json2xml
- to_block
- find:
path: productimages
do:
- parse:
format: html
- variable_set: imghtml
- find:
path: variants > productstylecolors > productstylecolorimages
do:
- parse
- normalize:
routine: lower
- variable_set: imgpath
- register_set: <%imghtml%>
- to_block
- find:
path: safe_<%imgpath%>
do:
- variable_clear: getit
- find:
path: xlarge
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: large
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: medium
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: small
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- find:
path: body_safe > variants > productstylecolors > colorname
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: variations
joinby: "|"
- find:
path: body_safe > name
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: name
- find:
path: body_safe > currentmaxprice, body_safe > currentminprice
do:
- parse:
filter:
- (\d+\.?\d*)
- if:
match: (\d+)
do:
- object_field_set:
object: product
field: price
type: float
- register_set: USD
- object_field_set:
object: product
field: currency
- find:
path: styleid
slice: 0
do:
- parse
- object_field_set:
object: product
field: sku
- find:
path: body
do:
- find:
path: '.selected'
do:
- parse
- space_dedupe
- trim
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname2
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- object_save:
name: product
- find:
path: productCategory > childCategories
do:
- variable_clear: catname3
- find:
path: name
slice: 0
do:
- parse
- space_dedupe
- trim
- variable_set: catname3
- find:
path: parentBusinessCatalogItemId
do:
- parse
- if:
match: (\S)
do:
- variable_set: pid
- register_set: http://athleta.gap.com/browse/product.do?pid=<%pid%>&cid=<%cid%>
- walk:
to: value
do:
- variable_clear: isP
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- variable_set:
field: isP
value: 1
- find:
path: html
do:
- variable_get: isP
- if:
match: (1)
do:
- object_new: product
- find:
path: head
do:
- eval:
routine: js
body: '(function (){var d = new Date(); return d.toISOString()})();'
- object_field_set:
object: product
field: date
- static_get: url
- object_field_set:
object: product
field: url
- register_set: 'GAP'
- object_field_set:
object: product
field: brand
- find:
path: meta[name="keywords"]
do:
- parse:
attr: content
- object_field_set:
object: product
field: description
- find:
path: script:matches(gap.pageProductData\s*=\s*\{)
do:
- parse:
filter:
- gap\.currentBrand\s*=\s*\"(.+)\"\;
- if:
match: (\S)
do:
- object_field_set:
object: product
field: brand
- parse
- normalize:
routine: replace_substring
args:
var\s*gap\s*=\s*window\.gap\s*\|\|\s*\{\s*\}\;: ''
gap\.pageProductData\s*=\s*: ''
\s*;\s*gap.currentBrand\s*=\s*.*\;: ''
- normalize:
routine: json2xml
- to_block
- find:
path: productimages
do:
- parse:
format: html
- variable_set: imghtml
- find:
path: variants > productstylecolors > productstylecolorimages
do:
- parse
- normalize:
routine: lower
- variable_set: imgpath
- register_set: <%imghtml%>
- to_block
- find:
path: safe_<%imgpath%>
do:
- variable_clear: getit
- find:
path: xlarge
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: large
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: medium
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- variable_get: getit
- if:
match: (1)
else:
- find:
path: small
do:
- parse
- if:
match: (\S)
do:
- variable_set:
field: getit
value: 1
- normalize:
routine: url
- object_field_set:
object: product
field: images
joinby: "|"
- find:
path: body_safe > variants > productstylecolors > colorname
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: variations
joinby: "|"
- find:
path: body_safe > name
do:
- parse
- if:
match: (\S)
do:
- object_field_set:
object: product
field: name
- find:
path: body_safe > currentmaxprice, body_safe > currentminprice
do:
- parse:
filter:
- (\d+\.?\d*)
- if:
match: (\d+)
do:
- object_field_set:
object: product
field: price
type: float
- register_set: USD
- object_field_set:
object: product
field: currency
- find:
path: styleid
slice: 0
do:
- parse
- object_field_set:
object: product
field: sku
- find:
path: body
do:
- find:
path: '.selected'
do:
- parse
- space_dedupe
- trim
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname2
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- variable_get: catname3
- if:
match: (\S)
do:
- object_field_set:
object: product
field: category
joinby: "|"
- object_save:
name: product
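The paging logic in the configuration above can be read as follows: the first search request for a category returns `pageNumberTotal`, and when that total is greater than 1 the digger queues one more search URL per remaining page, with `pageId` running from 1 to the total minus 1. Below is a minimal Python sketch of that URL generation, assuming the same URL template as in the configuration. The name `search_urls` is hypothetical and is not part of Diggernaut.

```python
BASE = "http://athleta.gap.com/resources/productSearch/v1/search"


def search_urls(cid, page_total):
    """Return the search URLs for one category.

    The first URL carries no pageId (it is page 0); additional URLs
    use pageId = 1 .. page_total - 1, mirroring the loop in the config.
    """
    first = f"{BASE}?cid={cid}&locale=en_US&isFacetsEnabled=true"
    rest = [
        f"{BASE}?cid={cid}&locale=en_US&pageId={page}&isFacetsEnabled=true"
        for page in range(1, page_total)
    ]
    return [first] + rest


# Category id taken from the sample URLs on this page; 3 pages is made up.
for url in search_urls("1006482", 3):
    print(url)
```

A total of 0 or 1 yields only the first URL, which matches the configuration skipping the paging step in that case.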
Sample of scraped data
Below is a sample of the dataset with several products in JSON format, so you can easily review it and see the data structure. The dataset can be downloaded as CSV, XLSX, XML, or any other text format using templates.
[{
"product": {
"brand": "athleta",
"category": "New Arrivals|CATEGORIES|All New Arrivals",
"currency": "USD",
"date": "2017-12-06T19:35:53.451Z",
"description": "Easy Cozy Karma Jacket, New Arrivals, New Arrivals All New Arrivals, Athleta",
"images": "http://athleta.gap.com/webcontent/0014/295/432/cn14295432.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/088/415/cn14088415.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/130/170/cn14130170.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/068/604/cn14068604.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/295/432/cn14295432.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/088/415/cn14088415.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/130/170/cn14130170.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/068/604/cn14068604.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/295/432/cn14295432.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/088/415/cn14088415.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/130/170/cn14130170.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg|http://athleta.gap.com/webcontent/0014/068/604/cn14068604.jpg|http://athleta.gap.com/webcontent/0014/295/469/cn14295469.jpg|http://athleta.gap.com/webcontent/0014/295/464/cn14295464.jpg|http://athleta.gap.com/webcontent/0014/295/460/cn14295460.jpg|http://athleta.gap.com/webcontent/0014/509/387/cn14509387.jpg",
"name": "Easy Cozy Karma Jacket",
"price": 118,
"sku": "158372",
"url": "http://athleta.gap.com/browse/product.do?pid=158372&cid=1006482",
"variations": "White Heather|Charcoal Heather|Cassis Heather|Black|White Heather|Charcoal Heather|Cassis Heather|Black|White Heather|Charcoal Heather|Cassis Heather|Black"
}
}
,{
"product": {
"brand": "athleta",
"category": "New Arrivals|CATEGORIES|All New Arrivals",
"currency": "USD",
"date": "2017-12-06T19:35:56.279Z",
"description": "Velour Hoodie, New Arrivals, New Arrivals All New Arrivals, Athleta",
"images": "http://athleta.gap.com/webcontent/0014/120/934/cn14120934.jpg|http://athleta.gap.com/webcontent/0014/121/309/cn14121309.jpg|http://athleta.gap.com/webcontent/0014/449/374/cn14449374.jpg",
"name": "Velour Hoodie",
"price": 118,
"sku": "158403",
"url": "http://athleta.gap.com/browse/product.do?pid=158403&cid=1006482",
"variations": "Charcoal Grey Heather"
}
}
,{
"product": {
"brand": "athleta",
"category": "New Arrivals|CATEGORIES|All New Arrivals",
"currency": "USD",
"date": "2017-12-06T19:35:57.948Z",
"description": "Luxe Stronger Hoodie, New Arrivals, New Arrivals All New Arrivals, Athleta",
"images": "http://athleta.gap.com/webcontent/0012/348/901/cn12348901.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0014/422/782/cn14422782.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0012/302/088/cn12302088.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0012/348/901/cn12348901.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0014/422/782/cn14422782.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0012/348/901/cn12348901.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0014/422/782/cn14422782.jpg|http://athleta.gap.com/webcontent/0014/422/795/cn14422795.jpg|http://athleta.gap.com/webcontent/0012/302/897/cn12302897.jpg|http://athleta.gap.com/webcontent/0014/522/557/cn14522557.jpg|http://athleta.gap.com/webcontent/0012/204/913/cn12204913.jpg",
"name": "Luxe Stronger Hoodie",
"price": 148,
"sku": "456789",
"url": "http://athleta.gap.com/browse/product.do?pid=456789&cid=1006482",
"variations": "Oatmeal Heather|Black Multi|Black|Oatmeal Heather|Black Multi|Oatmeal Heather|Black Multi"
}
}
,{
"product": {
"brand": "athleta",
"category": "New Arrivals|CATEGORIES|All New Arrivals",
"currency": "USD",
"date": "2017-12-06T19:36:03.291Z",
"description": "Stronger Long Hoodie, New Arrivals, New Arrivals All New Arrivals, Athleta",
"images": "http://athleta.gap.com/webcontent/0014/365/879/cn14365879.jpg|http://athleta.gap.com/webcontent/0014/365/874/cn14365874.jpg|http://athleta.gap.com/webcontent/0014/330/558/cn14330558.jpg|http://athleta.gap.com/webcontent/0014/365/856/cn14365856.jpg|http://athleta.gap.com/webcontent/0014/365/874/cn14365874.jpg|http://athleta.gap.com/webcontent/0014/330/558/cn14330558.jpg",
"name": "Stronger Long Hoodie",
"price": 138,
"sku": "158356",
"url": "http://athleta.gap.com/browse/product.do?pid=158356&cid=1006482",
"variations": "Light Grey Multi|Black Multi"
}
}]
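In the sample above, the `images` and `variations` fields are pipe-joined strings that contain repeated values. If you post-process the downloaded JSON yourself, one possible approach is to split those fields and drop repeats while keeping the original order. The sketch below assumes the record shape shown in the sample. The helper names `split_unique` and `clean_records` are hypothetical, and the image file names in the demo record are shortened placeholders.

```python
import json


def split_unique(value, sep="|"):
    """Split a joined field and drop repeats, keeping first-seen order."""
    return list(dict.fromkeys(part for part in value.split(sep) if part))


def clean_records(records, fields=("images", "variations")):
    """Return flat product dicts with the given joined fields turned into lists."""
    cleaned = []
    for item in records:
        product = dict(item["product"])  # copy, so the input stays untouched
        for field in fields:
            if field in product:
                product[field] = split_unique(product[field])
        cleaned.append(product)
    return cleaned


# Shortened demo record in the same shape as the sample (placeholder images).
raw = """[{"product": {
    "name": "Easy Cozy Karma Jacket",
    "price": 118,
    "images": "a.jpg|b.jpg|c.jpg|b.jpg|c.jpg",
    "variations": "White Heather|Black|White Heather|Black"
}}]"""

for product in clean_records(json.loads(raw)):
    print(product["name"], product["variations"], len(product["images"]))
```

Using `dict.fromkeys` rather than `set` keeps the first occurrence of each value in its original position, so the first image stays first.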