You’ve probably seen galleries of user-generated content in online stores that sell clothing, shoes, home products, and so on. They help sell a product because they let a potential buyer see how it looks on a real person rather than on a model, which leads to a more informed decision. You might want to collect such user-generated content yourself but not know how to do it on a limited budget.
Technically, the mechanism works as follows: an aggregator service collects user-generated images from the Internet (for example, from Instagram), determines the brand and model of the item or items shown in each photo, and delivers the results in a feed. Connecting to such a service can be costly for a small store, so mostly large mono-brand and multi-brand online stores can afford it.
The second option is to build such an aggregation service yourself, but this is a long, time-consuming process that costs far more than connecting a single online store to an existing aggregator.
However, there is a budget option. Many brands and well-known online stores are already customers of such aggregators and have their own feeds with user-generated photos and information about the corresponding products. Therefore, if you sell products of these brands, you can take the information from these feeds, process it, and use it in your online store to sell those products.
You might object that coding a scraper for every site and brand, when there are hundreds of them, is tedious and time-consuming. However, you do not need to scrape the websites themselves; you only need the feed with user content. Moreover, such feeds come from a limited set of aggregators. Technically, then, you need only one scraper with standard logic, and you use different URLs or parameters to pick up the feeds of different stores and brands.
One such service is Like2Buy, provided by Curalate. It serves more than 6000 online stores and brands. You can find the feeds by googling “like2buy.curalate.com” and clicking the “show all results” link. For your reference, below we also list a few stores and their IDs for use with the free web scraper we share in this article.
This data can be useful not only for online stores but also for companies conducting research for brands, as well as companies working in the machine learning area.
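To make the "one scraper, many feeds" idea concrete, here is a minimal Python sketch of how a comma-separated list of store IDs could be turned into gallery URLs. The URL pattern is taken from the digger configuration later in this article; the function names `parse_store_ids` and `gallery_url` are hypothetical and exist only for illustration.

```python
from typing import List

# URL pattern as used in the digger configuration in this article.
GALLERY_URL = "https://like2buy.curalate.com/{store_id}/"


def parse_store_ids(value: str) -> List[str]:
    """Split a comma-separated string of store IDs, dropping blanks."""
    return [part.strip() for part in value.split(",") if part.strip()]


def gallery_url(store_id: str) -> str:
    """Build the Like2Buy gallery URL for a single store ID."""
    return GALLERY_URL.format(store_id=store_id)


if __name__ == "__main__":
    for store in parse_store_ids("forever21, forever21men, guess"):
        print(gallery_url(store))
```

The same logic serves every store; only the ID changes.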
To run the scraper, you need a free account with our Diggernaut service. Follow these steps:
- Follow this registration link to open a free Diggernaut account.
- After registering and confirming your email address, log in to your account.
- Create a project with any name and description. If you do not know how, please refer to our documentation.
- Switch to the created project and create a digger with any name. If you do not know how, please refer to our documentation.
- Copy the digger configuration below to the clipboard and paste it into the digger you created. If you do not know how, please refer to our documentation.
- In the iterator section of the digger configuration, enter one or more comma-separated store IDs from the table below.
- Switch the digger mode from Debug to Active. If you do not know how, please refer to our documentation.
- Run your digger and wait until it completes. If you do not know how, please refer to our documentation.
- Download the scraped dataset in the format you need. If you do not know how, please refer to our documentation.
You can also set up a schedule for running your scraper and collect data regularly.
The scraper configuration is shown below. Copy it into any of your diggers, enter one or more store IDs from the store table, and start your digger.
---
config:
    debug: 2
    agent: Firefox
iterator:
    type: csv
    name: shop
    value: # Set here single store ID or few store IDs separated by comma
do:
- walk:
    to: https://like2buy.curalate.com/<%shop%>/
    do:
    - pool_clear: sub
    - find:
        path: html
        do:
        - eval:
            routine: js
            body: '(function() {return "xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy]/g, function(e) {var t = 16 * Math.random() | 0, r = "x" === e ? t : 3 & t | 8; return r.toString(16)})})();'
        - variable_set: rid
        - register_set: http://api.curalate.com/v1/like2buy/<%shop%>/products.json?rid=<%rid%>
        - link_add:
            pool: sub
    - walk:
        to: links
        pool: sub
        do:
        - find:
            path: qbookmark
            do:
            - parse
            - register_set: http://api.curalate.com/v1/like2buy/<%shop%>/products.json?qBookmark=<%register%>&rid=<%rid%>
            - link_add:
                pool: sub
        - find:
            path: items
            do:
            - object_new: item
            - argument_get: shop
            - object_field_set:
                object: item
                field: shop
            - find:
                path: largephotourl
                slice: 0
                do:
                - parse
                - normalize:
                    routine: url
                - object_field_set:
                    object: item
                    field: image
            - find:
                path: products
                do:
                - parse
                - object_new: product
                - find:
                    path: destinationurl
                    do:
                    - parse
                    - object_field_set:
                        object: product
                        field: url
                - find:
                    path: name
                    do:
                    - parse
                    - space_dedupe
                    - trim
                    - object_field_set:
                        object: product
                        field: name
                - object_save:
                    name: product
                    to: item
            - object_save:
                name: item
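For readers who want to follow the logic of the configuration, it comes down to three steps: generate a random request ID (the JavaScript snippet builds a string shaped like a version 4 UUID), request the first page of `products.json`, and then keep following the `qbookmark` value to the next page. Below is a minimal Python sketch of that flow, not the Diggernaut implementation. The function `iter_feed_items` and its `fetch` parameter are hypothetical; the casing of the JSON keys in the real response is an assumption (keys are lowercased before lookup), URL-encoding the bookmark is an assumption, and whether the endpoint still responds this way has not been verified.

```python
import uuid
from typing import Callable, Iterator, Optional
from urllib.parse import quote

# Endpoint pattern copied from the digger configuration above.
API_URL = "http://api.curalate.com/v1/like2buy/{shop}/products.json"


def iter_feed_items(
    shop: str,
    fetch: Callable[[str], dict],
    rid: Optional[str] = None,
) -> Iterator[dict]:
    """Yield feed items page by page, following the bookmark until it runs out.

    `fetch` is any callable that takes a URL and returns the decoded JSON
    body as a dict (hypothetical; supply your own HTTP client).
    """
    # uuid4() yields the same xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx shape
    # that the JavaScript snippet in the configuration generates.
    rid = rid or str(uuid.uuid4())
    base = API_URL.format(shop=shop)
    url: Optional[str] = f"{base}?rid={rid}"
    seen = set()  # guard against a bookmark that points back to a visited page

    while url and url not in seen:
        seen.add(url)
        # Assumption: key casing is unknown, so normalize to lowercase.
        page = {key.lower(): value for key, value in fetch(url).items()}
        yield from page.get("items") or []
        bookmark = page.get("qbookmark")
        if bookmark:
            url = f"{base}?qBookmark={quote(str(bookmark), safe='')}&rid={rid}"
        else:
            url = None
```

Passing `fetch` in as a parameter keeps the pagination logic separate from the HTTP client, so the loop can be exercised against canned responses before it is pointed at a live feed.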
As a result, you get a dataset with the following structure:
[
  {
    "item": {
      "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/PPYWso07RgBC_UHzxcrgAO_Wk0twhD3XHvviHlJ7-ZY=/d/l",
      "product": [
        {
          "name": "Marco Faux-Leather Moto Jacket",
          "url": "https://shop.guess.com/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b"
        },
        {
          "name": "CAN: Marco Faux-Leather Moto Jacket",
          "url": "https://www.guess.ca/en/catalog/view/women/jackets-and-outerwear/view-all/marco-faux-leather-moto-jacket/w74l10r72y1?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=w74l10r72y1&crl8_id=670ce9b5-3465-4372-b0fe-df6a0c71ed4b"
        }
      ],
      "shop": "guess"
    }
  },
  {
    "item": {
      "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/Wn0kXxTmnzmAy6hTP3_bynEdtv9Ph7Y0M9FOVyLen00=/d/l",
      "product": [
        {
          "name": "US: Silver-Tone Charm Bracelet Box Set",
          "url": "https://shop.guess.com/en/catalog/view/434044G21?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434044G21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        },
        {
          "name": "US: Boxed Rose Gold-Tone Charm Bracelet",
          "url": "https://shop.guess.com/en/catalog/view/434042G21?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434042G21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        },
        {
          "name": "US: GUESS 1981 Eau De Toilette, 3.4 oz.",
          "url": "https://shop.guess.com/en/catalog/view/accessories/women/fragrance/guess-1981-eau-de-toilette-3-4-oz/32667861000?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=32667861000&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        },
        {
          "name": "US: Metallic Mini Backpack Keychain",
          "url": "https://shop.guess.com/en/catalog/view/17GUP248?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=17GUP248&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        },
        {
          "name": "CAN: Boxed Gold-Tone Stud Earring Set",
          "url": "https://guess.ca/en/Catalog/View/434046GC21/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=434046GC21&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d#434046GC21"
        },
        {
          "name": "CAN: GUESS 1981 Eau De Toilette, 3.4 oz.",
          "url": "https://www.guess.ca/en/catalog/view/accessories/women/fragrance/guess-1981-eau-de-toilette-3-4-oz/32667861000?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=32667861000&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        },
        {
          "name": "CAN: Metallic Mini Backpack Keychain",
          "url": "https://www.guess.ca/en/Catalog/View/17GUP248/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=17GUP248&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d#17GUP248"
        },
        {
          "name": "EU: Holiday Delivery",
          "url": "https://www.guess.eu/en/CustomerCare/guaranteed-delivery/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=a0dd62bd-9024-4224-bcdf-323d6e6e601d"
        }
      ],
      "shop": "guess"
    }
  },
  {
    "item": {
      "image": "https://d28m5bx785ox17.cloudfront.net/v1/img/oCSER6z1bD-KgCCgMcbH9Xk9OifDOvwuXgXNwAQmIeI=/d/l",
      "product": [
        {
          "name": "CAN: Lily Faux-Fur Coat",
          "url": "https://www.guess.ca/en/catalog/view/women/jackets-and-outerwear/faux-fur/lily-faux-fur-coat/w74l14w9t70?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&utm_content=w74l14w9t70&crl8_id=9a9df613-d531-4252-a63e-2566d16dedd2"
        },
        {
          "name": "EU: Floral Faux-Fur Coat",
          "url": "https://www.guess.eu/en/catalog/view/women/apparel/coats-and-jackets/floral-faux-fur-coat/w74l14w9t70?color=dpid%3FCMP%3DSMC-INSTAGRAM-LIKETOBUY&crl8_id=9a9df613-d531-4252-a63e-2566d16dedd2"
        }
      ],
      "shop": "guess"
    }
  }
]
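Since each record nests a list of products under one image, it is often easier to work with one row per image and product pair. The following Python sketch flattens the dataset shown above and splits off the region prefix ("US:", "CAN:", "EU:") that some product names in this sample carry. The helper names `split_region` and `flatten` are hypothetical, and the prefix pattern is an assumption based only on the sample records above; other stores may name products differently.

```python
import re
from typing import Dict, List, Optional, Tuple

# Assumption: a region prefix is 2-3 uppercase letters followed by a colon,
# as seen in the sample ("US: ...", "CAN: ...", "EU: ...").
REGION_PREFIX = re.compile(r"^([A-Z]{2,3}):\s+(.+)$")


def split_region(name: str) -> Tuple[Optional[str], str]:
    """Return (region, bare name); region is None when there is no prefix."""
    match = REGION_PREFIX.match(name)
    if match:
        return match.group(1), match.group(2)
    return None, name


def flatten(dataset: List[dict]) -> List[Dict[str, Optional[str]]]:
    """Turn nested item/product records into one flat row per product."""
    rows = []
    for record in dataset:
        item = record["item"]
        for product in item.get("product", []):
            region, name = split_region(product.get("name", ""))
            rows.append({
                "shop": item.get("shop"),
                "image": item.get("image"),
                "region": region,
                "name": name,
                "url": product.get("url"),
            })
    return rows
```

The resulting rows can be written out with `csv.DictWriter` or loaded into a database table keyed by product URL.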
As you can see, our basic scraper extracts only the image URL and the names and URLs of the products. By changing the scraper logic, you can extract other data available in the feed and transform the extracted data to shape the dataset exactly as you need it. Below is the structure of one source feed object, to help you compose CSS selectors for the containers that hold the data:
<items>
    <candelete>false</candelete>
    <caption_safe>Introducing the next generation of #GUESSConnect Smartwatches ⌚️? Powered by Android Wear (and compatible
        with iOS 9+), our fav feature is swiping through the hundreds of watch faces to pair perfectly
        with whatever you're wearing + the Google Assistant! ➡️ Click the link in our bio to
        discover more #GUESSWatches #LoveGUESS</caption_safe>
    <commentcount>182</commentcount>
    <isfeatured>true</isfeatured>
    <largephotourl>https://d28m5bx785ox17.cloudfront.net/v1/img/9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=/d/l</largephotourl>
    <largevideourl>https://scontent.cdninstagram.com/vp/d9e6c226c2cadbf3bc45167c1f24fff9/5A3D679E/t50.2886-16/24383086_151063558867804_2812871925800370176_n.mp4</largevideourl>
    <likecount>13306</likecount>
    <mediumphotourl>https://d28m5bx785ox17.cloudfront.net/v1/img/9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=/d/m</mediumphotourl>
    <mediumvideourl>https://scontent.cdninstagram.com/vp/d9e6c226c2cadbf3bc45167c1f24fff9/5A3D679E/t50.2886-16/24383086_151063558867804_2812871925800370176_n.mp4</mediumvideourl>
    <networkidentifier>f1ffd186-3ee1-42ec-b463-135b26139ab7</networkidentifier>
    <networkurl>https://www.instagram.com/p/BcNdy1oluYh/</networkurl>
    <originalfileidandsource>
        <fileid>9w5j3aXjw6pKZUvbDwEAEB9wXM8RqUpsxHL3wHF0i5A=</fileid>
        <osource>instagram</osource>
    </originalfileidandsource>
    <products>
        <croppedthumbnailimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=/sc/350x350</croppedthumbnailimageurl>
        <destinationurl>https://shop.guess.com/en/catalog/browse/lifestyle/guess-connect-touch/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=f1ffd186-3ee1-42ec-b463-135b26139ab7</destinationurl>
        <fileid>dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=</fileid>
        <id>0</id>
        <imageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=/d/l</imageurl>
        <name>US: GUESS CONNECT</name>
        <position>1</position>
        <productstyleid>u_2765_00c88d1540a358f1f4cadff87341b5122c7ac0900f11568a7e434923c71aa2f4</productstyleid>
        <sourceimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/dXSPdD25vkxoHZMw7xCH21i3Xm5Bda6gi5-MMFGEBNI=</sourceimageurl>
    </products>
    <products>
        <croppedthumbnailimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=/sc/350x350</croppedthumbnailimageurl>
        <destinationurl>https://shop.guess.ca/en/catalog/browse/lifestyle/guess-connect-touch/?utm_source=instagram&utm_medium=social&utm_campaign=like2buy&crl8_id=f1ffd186-3ee1-42ec-b463-135b26139ab7</destinationurl>
        <fileid>j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=</fileid>
        <id>0</id>
        <imageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=/d/l</imageurl>
        <name>CAN: GUESS CONNECT</name>
        <position>2</position>
        <productstyleid>u_2765_8a7e0d6ae928e7b95cd25781dadb917ab9d5d5826cb0dd14c7425e5c9c99c5e5</productstyleid>
        <sourceimageurl>https://d28m5bx785ox17.cloudfront.net/v1/img/j3aEX6aK9BPSma4E8OrRXxT4JjCrcJn7zmhJ_rEFcPA=</sourceimageurl>
    </products>
    <storeid>938</storeid>
    <timeposted>1512240829000</timeposted>
</items>
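If you extend the scraper to capture some of these extra fields, a little post-processing makes them easier to use. The Python sketch below works on a single feed object represented as a dictionary with the lowercased field names shown above. The function name `summarize_post` is hypothetical, and reading `timeposted` as a Unix timestamp in milliseconds is an assumption based on the sample value, which decodes to a date in December 2017.

```python
from datetime import datetime, timezone


def summarize_post(post: dict) -> dict:
    """Pick engagement fields from one feed object and normalize their types."""
    # Assumption: timeposted is epoch time in milliseconds.
    posted_at = datetime.fromtimestamp(int(post["timeposted"]) / 1000, tz=timezone.utc)
    return {
        "network_url": post.get("networkurl"),
        "likes": int(post.get("likecount", 0)),
        "comments": int(post.get("commentcount", 0)),
        "posted_at": posted_at.isoformat(),
        # A non-empty largevideourl suggests the post is a video.
        "is_video": bool(post.get("largevideourl")),
    }


if __name__ == "__main__":
    sample = {
        "networkurl": "https://www.instagram.com/p/BcNdy1oluYh/",
        "likecount": "13306",
        "commentcount": "182",
        "timeposted": "1512240829000",
        "largevideourl": "",
    }
    print(summarize_post(sample))
```

Values arrive as strings when parsed from the feed, so the sketch converts the counts to integers before they are used for sorting or filtering.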
Below, we list stores that use Like2Buy to deliver user-generated content, along with their IDs. The list is incomplete: if you do not find the brand or store you are interested in, try googling it or ask us. We are always happy to help 🙂
Store or brand | ID | Store or brand | ID |
---|---|---|---|
Aldo | aldo_shoes | Ann Taylor | anntaylor |
Anthropologie | anthropologie | Bed, Bath and Beyond | bedbathandbeyond |
Brilliant Earth | brilliantearth | Cartier | cartier |
CB2 | cb2 | Champion | champion |
Chobani | chobani | Chumbak | chumbak |
Crate and Barrel | crateandbarrel | Creative Recreation | creativerecreation |
Covergirl | covergirl | David’s Bridal | davidsbridal |
Disney | disney | Dune London | dune_london |
Farfetch | farfetch | Fawn Shoppe | fawn_shoppe |
Forever21 | forever21,forever21men | Fossil | fossil |
Free People | freepeople | Gap | gap |
Garage Clothing | garageclothing | Guess | guess |
HauteLook | hautelook | Herbal Essences | herbalessences |
Hot Topic | hottopic | House of Lashes | houseoflashes |
J. Crew | jcrew | Karl Lagerfeld | karllagerfeld |
Kohl’s | kohls | Laura Mercier | lauramercier |
Lilly Pulitzer | lillypulitzer | Louis Vuitton | louisvuitton |
lululemon | lululemon | Lulus | lulus |
Macy’s | macys | Misspap | misspap |
Neiman Marcus | neimanmarcus | Next Com AU | nextofficial_au |
Nordstrom | nordstrom | Paint Nite | paintnite |
PB Teen | pbteen | Pendleton | pendletonwm |
Pier 1 | pier1 | Pottery Barn | potterybarn |
Raymour & Flanigan | raymourflanigan | Schoolhouse Electric & Supply Co | schoolhouse |
Schutz | schutzshoes | Sephora | sephora |
Sperry | sperry | Target | target |
The Bump | thebump | The Company Store | thecompanystore |
Topman | topman | TopShop | topshop |
Victoria’s Secret | victoriassecret | Vineyard Vines | vineyardvines |
West Elm | westelm | Williams Sonoma | williamssonoma |
Windsor | windsorstore | Z Gallerie | zgallerie |
Zumiez | zumiez | | |