Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize many importxml from the same website?

I have a Google Sheet where rows are products and columns are information about these products (Material, Color, Price, ...).

These information are retrieved using the function IMPORTXML with the corresponding xpath query. In the following example, the column D is the URL of this product. These functions retrieve information (Glass, Color and Type) for the product on row 3:

=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Glass')]]/td")
=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Color')]]/td")
=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,'Type')]]/td")

For each product, there are around 10 columns that are retrieved and thus 10 different functions IMPORTXML for the same URL.

It works properly when there are few products but when there are too many products only some information is retrieved and the other cells stay frozen with the value "Loading...".

The only workaround I found is manually copy/pasting the retrieved values (Ctrl+C, Ctrl+Maj+v) and then some other cells are unblocked and start import the values. In the end, I need to do this for all cells and it takes a lot of time.

  1. Why do I have cells that stay frozen in "Loading..."? Is there a limit imposed by Google?
  2. Should I adapt my functions to optimize the process and if possible reduce the number of IMPORTXML functions?
  3. Do you have a better solution than my time consuming workaround?
  4. The xpath query is always very similar. Is it possible to use a list of predefined elements. And then use other Google Sheet functions to obtain the same result but with only one IMPORTXML function

For example:

=IMPORTXML(D3,"//table[@class='info-table']/tr[th/text()[contains(.,{'Glass', 'Color', 'Type'})]]/td")
like image 472
Romain Capron Avatar asked Mar 06 '23 02:03

Romain Capron


1 Answers

This issue can be fixed by adapting the xpath query and using a combination of different Google Spreadsheet formulas.

  1. It is possible to combine several xpath queries using the character |
  2. To only get the first result, you can surround the xpath query by (YOUR_XPATH)[1]
  3. The TRANSPOSE function will move the several results on the same row

For example:

=TRANSPOSE(IMPORTXML(D2,"
(//table[@class='info-table']/tr[th/text()[contains(.,'Diameter')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Material')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Glass')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'W/R')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Caliber')]]/td)[1] | 
(//table[@class='info-table']/tr[th/text()[contains(.,'Type')]]/td)[1]"))

By doing so, there is only one importXML per product.

like image 71
Romain Capron Avatar answered Mar 24 '23 14:03

Romain Capron