
A Dive Into Ecommerce and Web Scraping with Pierluigi Vinciguerra [DataTalk #1]

Expert interview with Pierluigi Vinciguerra about the future of ecommerce and web scraping

For our first entry of this interview format we’ll call DataTalk, we’ve invited Pierluigi Vinciguerra, Co-Founder and CTO at Re Analytics and CTO at databoutique.com, to talk about his experience in the ecommerce industry.

We hope you enjoy it!


LEO: Thanks a lot for your time, Pier! To get started, why don’t you tell us more about you and your business?

PIER: First of all, I wanted to thank you for inviting me to this space. My name is Pierluigi, and I’m the CTO of Re Analytics and Databoutique.com.

I started working at Accenture in 2009, where I met Andrea, the other co-founder of Databoutique.

We worked on managing the data infrastructure of banks and insurance companies, and then, after a vendor showcased a geospatial database, we started thinking about which data could be a perfect fit for it.

We realized that, at the time, there was huge potential in the stream of data flowing through real estate websites, which no one was collecting, and we started to do so with some basic scrapers.

Soon, we had the biggest database of real estate classifieds in Italy, even bigger than the ones used by official data providers for the government. Unfortunately, it didn’t translate into a product worth selling, but we understood the potential of web scraping as a data source.

After the first experiments in web scraping, we refined our technique, and, in 2015, we pivoted to e-commerce scraping for fashion, which was a great choice for several reasons:

  • We’re Italian and therefore close to most of the fashion brands and their networks. Typically, a fashion brand has one website that serves different countries, each with different prices.
  • Also, most of these brands are listed on stock exchanges, so they’re interesting not only to the fashion industry itself but also to the financial world.

So we started building some analyses on the extracted data, and when we had the first proof of market fit, Andrea and I both quit our day jobs to found Re Analytics.

It was, and still is, a web data extraction company that adds industry expertise on top of the data, focused on fashion e-commerce. Today we sell our analysis and services, directly or indirectly, to most of the major fashion brands in Italy and to some investors in this field.

The company works, but over our years in the field we found some limits common to all web scraping factories: it’s hard to serve every request from your customers or potential ones.

To address all these issues, Andrea and I created Databoutique, a marketplace specifically designed for public, legal, and quality-assured web-scraped data.

LEO: I guess it’s accurate to say you’ve seen many trends come and go by now, so I’m wondering, how would you say data is changing the e-commerce industry? – hopefully, I’m expressing myself correctly.

PIER: Ecommerce is such a huge industry that I cannot say how it’s changing from a global perspective, but I can say what I’ve seen from my perspective in Europe, mainly on luxury goods websites.

Over the past 10 years, e-commerce in this industry went from insignificant (most high-end luxury brands didn’t have any e-commerce at all) to a must-have that drives a significant part of their revenues.

But having an e-commerce website poses some challenges, since you’re basically exposing your prices to the public, something the luxury industry was not used to doing.

You’re entering competition in the same arena (the web) with your own wholesale customers (multi-brand stores that buy from you to resell to their customers) and with your own physical stores.

This means your prices should be consistent across all channels, and while brands control what happens in their direct stores, this is not true for wholesalers. Hence the demand for web data: to better understand what’s happening in the market.
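To make the idea concrete, here is a minimal sketch of the kind of check scraped price data enables. The SKUs, channel names, and prices below are made-up illustrative data, not real observations:

```python
# Hypothetical sketch: flag price incoherence for the same SKU across channels.
# All SKUs, channels, and prices below are invented for illustration.

def flag_incoherent_prices(observations, tolerance=0.05):
    """Group scraped price observations by SKU and flag SKUs whose price
    spread across channels exceeds `tolerance` (relative to the minimum)."""
    by_sku = {}
    for obs in observations:
        by_sku.setdefault(obs["sku"], []).append(obs)

    flagged = {}
    for sku, rows in by_sku.items():
        prices = [r["price"] for r in rows]
        lo, hi = min(prices), max(prices)
        if lo > 0 and (hi - lo) / lo > tolerance:
            flagged[sku] = rows
    return flagged

observations = [
    {"sku": "BAG-001", "channel": "brand.com/it", "price": 1200.0},
    {"sku": "BAG-001", "channel": "wholesaler-a", "price": 960.0},  # heavy discount
    {"sku": "SHOE-002", "channel": "brand.com/it", "price": 450.0},
    {"sku": "SHOE-002", "channel": "wholesaler-a", "price": 455.0},
]

print(flag_incoherent_prices(observations))  # only BAG-001 is flagged
```

A real pipeline would feed this from daily scrapes of the brand site and wholesaler sites, but the grouping-and-spread logic stays the same.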

LEO: In that sense, when would be a good time for a business to start collecting data?

PIER: That’s a good question, and the answer is not easy because, generally speaking, it’s difficult to understand the return on investment for every data project.

I mean, you’re not buying a solution that saves you X percent on your cloud bills; when you buy data (or set up web scraping operations), you’re basically buying the first ingredient of a recipe. Then you need the skill of the chef to carve out something precious, and a customer who finds the dish delicious.

Coming back to your question: of course, the more data a company has, the better, since the data science team has more chances to find something meaningful. But I think a company first needs to set goals on specific KPIs and try to understand which data sources could help it better understand the phenomenon observed and improve that KPI. As soon as this is defined, you should acquire the data needed and start baking some solutions.

LEO: Based on your experience, do you think businesses today are using the full potential of alternative data?

PIER: The term alternative data refers to data sources that can help describe the financial KPIs of a company but don’t come from the company itself, and this kind of data is extremely interesting for hedge funds and investors in general.

The most famous type of alternative data is credit card transaction data: by knowing where and what people bought, you can build mathematical models to predict how a single publicly traded company is performing compared to the previous year.

You might not get the exact revenue number, since there’s no single data provider covering all the transactions happening in the world, but you can get close to it.

Web scraping, of course, can be used to generate alternative data: many websites expose inventory levels, so you can estimate units sold from there, track the online reputation and reviews of a company, the happiness of its customers, and many other use cases.
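The inventory-to-sales idea can be sketched in a few lines. The daily snapshot numbers below are invented, and the model is deliberately naive (day-over-day drops count as sales, rises are treated as restocks and ignored):

```python
# Illustrative sketch: estimate units sold from daily inventory snapshots
# scraped from a product page. The snapshot values are made-up numbers.

def estimate_units_sold(snapshots):
    """Sum the day-over-day inventory drops; increases are treated as
    restocks and contribute zero estimated sales for that day."""
    sold = 0
    for prev, curr in zip(snapshots, snapshots[1:]):
        if curr < prev:
            sold += prev - curr
    return sold

daily_inventory = [120, 115, 110, 150, 144]  # the jump on day 4 is a restock
print(estimate_units_sold(daily_inventory))  # 16 units estimated
```

A production model would need to handle restocks that mask same-day sales and multi-store inventories, but even this crude delta gives a usable trend signal over enough history.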

Having only one dataset may not give you the full picture, but by adding different datasets on the same company, you can have a clearer view.

Again, the adoption of web-scraping-based alternative data is still limited by the cost of this kind of data and the difficulty of linking the data product to a specific KPI, which makes the return on investment hard to quantify, but we’re still at the beginning of the adoption curve.

LEO: You published an interesting article recently about inventory levels that fits perfectly here. Could you tell us more about it?

PIER: That’s exactly an example of alternative data based on web scraping that could be used!

In this article, I described how a listed home improvement company, Lowe’s, shows on its website the exact number of units available for each item.

This means that once we understand how much the online channel contributes to the company’s global revenues and where the inventory sold comes from (warehouses or physical stores), we can estimate the sales happening on the website and, from there, the company’s global revenues.

In this particular case, you need to choose your preferred pick-up store before you can see an item’s availability in it. Since the website was protected by Akamai and I needed to change the preferred store, I opted to use Playwright.

This way, I could set the browser’s coordinates to match the store I wanted to select, and at the same time I could easily bypass Akamai, since I was using a real browser to interact with the website.
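A hedged sketch of that setup with Playwright for Python is shown below: launch a real browser and pin its geolocation to the target store’s coordinates so the site resolves availability for that store. The store keys, coordinates, and URL are illustrative placeholders, not Lowe’s actual values or the author’s exact script:

```python
# Sketch: use Playwright's geolocation option to "stand at" a given store.
# Store names and coordinates are illustrative, not real store data.

STORE_COORDS = {
    "nyc-store": {"latitude": 40.7128, "longitude": -74.0060},
    "la-store": {"latitude": 34.0522, "longitude": -118.2437},
}

def context_options_for_store(store_key):
    """Build the Playwright browser-context options that pin the
    browser's geolocation to the chosen store's coordinates."""
    return {
        "geolocation": STORE_COORDS[store_key],
        "permissions": ["geolocation"],
    }

def fetch_page_for_store(url, store_key):
    """Open `url` in a real Chromium browser located at the store."""
    # Imported here so the pure helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**context_options_for_store(store_key))
        page = context.new_page()
        page.goto(url)
        html = page.content()
        browser.close()
        return html
```

Because the availability check is driven by the browser’s reported position, switching stores is just a matter of creating a new context with different coordinates.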

Since the article was meant to show the potential of web scraping to create alternative data for the financial industry, I didn’t run the scraper on the full website but just on the “French Door refrigerators” category. What I found was that, probably, when you order an item from Lowe’s website and decide to pick it up at a store, the item comes from a warehouse and not from the store itself.

In fact, picking two different stores in NY, at least for this category, the inventory levels were the same, while choosing a store in LA gave different levels. So, probably, they are served by different warehouses.

But why is this information interesting? By tracking inventory levels, you can estimate sales and, if you have enough history, clearly see trends, not only for Lowe’s itself but also for the brands sold by Lowe’s, and check whether they correlate with revenues or how a new product is being received by the market (say, a new version of the Apple TV).

LEO: Thank you so much for your amazing answers, Pier! I really appreciate your time and the passion you have for this industry. Before you go, can you tell our readers how to keep in touch with you?

PIER: I share my journey in web scraping through articles like the one you mentioned on a Substack called The Web Scraping Club. Most of the articles are free, and they range from courses for beginners to product reviews and advanced techniques with code and examples. We also have a Discord server for sharing ideas and doubts, and you can reach me on LinkedIn.


We hope you enjoyed our first DataTalk interview! We have many more interesting conversations lined up for 2024, so stay tuned ^^

Want to learn more about ecommerce scraping? Check out our latest tutorials and guides.

About the author

Leonardo Rodriguez

Leo is a technical content writer based in Italy with experience in Python and Node.js. He’s currently ScraperAPI's content manager and lead writer. Contact him on LinkedIn.
