Web Scraping of Zomato

Hi, I am just wondering which library or approach I should use to scrape data from Zomato. Thanks!

Hello @joylyjelly, Welcome to the community.
I think the BeautifulSoup library is good enough to scrape Zomato. If you need to log in or handle dynamic content, you can look at the Selenium Python library.
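A minimal sketch of the static-page approach, assuming the data you want is already present in the HTML the server sends (no JavaScript rendering needed). It uses the standard library's `urllib.request` to download the page and BeautifulSoup's built-in `"html.parser"`. The helper name `fetch_soup` and the browser-like User-Agent header are illustrative assumptions, not something Zomato documents or requires.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and parse it with BeautifulSoup (hypothetical helper)."""
    # Assumption: some sites reject clients without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Decode leniently so a stray byte does not abort the whole page.
        html = response.read().decode("utf-8", errors="replace")
    return BeautifulSoup(html, "html.parser")
```

Calling `fetch_soup(url)` returns a BeautifulSoup object you can query with `find()` / `find_all()`. If the element you want is missing from it, the page is probably building that part with JavaScript, which is the case where Selenium helps.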

Thanks a lot, I will try. Sorry for another question: I tried to scrape the tooltip text “Score…”
but was unable to do it. Can you please guide me on what code I should use?

Can you show me what you tried?

Thanks.
This is the code:
soup.find('div', {'class': 'rc-tooltip-inner'})

Are you getting an error here? If not, try one of these:

soup.find('div', {'class': 'rc-tooltip-inner'}).text

or,
soup.find('div', {'class': 'rc-tooltip-inner'}).find('div').text
If you are getting an error, please share the error message.

Thanks. I got this error:

Can you please share the notebook where you were trying?

Thanks, here it is.

I’ve visited this page and there’s no rc-tooltip-inner class on any element.

Your first find() looks for something that doesn't exist, so it returns None. You then call find() again on this None, which raises that error (AttributeError: 'NoneType' object has no attribute 'find').
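A defensive version of the lookup, sketched under the assumption that you would rather get `None` back than an exception when the tooltip is absent. The helper name `tooltip_text` is hypothetical, the class name is the one from the code above, and the two HTML snippets are made-up samples.

```python
from bs4 import BeautifulSoup


def tooltip_text(soup):
    """Return the tooltip text, or None if the element is not in the HTML."""
    tooltip = soup.find("div", {"class": "rc-tooltip-inner"})
    if tooltip is None:
        # find() returns None when nothing matches; calling .find() or .text
        # on that None is what raises the AttributeError.
        return None
    return tooltip.get_text(strip=True)


# Usage with two made-up HTML snippets:
present = BeautifulSoup('<div class="rc-tooltip-inner"><div>Score 2.3</div></div>', "html.parser")
missing = BeautifulSoup("<div>No tooltip here</div>", "html.parser")
print(tooltip_text(present))  # Score 2.3
print(tooltip_text(missing))  # None
```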

As @Sebgolos said, there's no rc-tooltip-inner class on this site (https://www.kogan.com/au/shop/category/led-tv/?page=4). Check that you are using the correct site.

Thanks for that. In fact, I am trying to extract data from the tooltip (it shows text when the mouse hovers over the rating stars). Any guidance on how to do it?

I see now: this HTML only loads when you hover over these boxes; otherwise it is hidden. Beautiful Soup doesn't run the page's JavaScript, so it only sees the static HTML and can't access content that appears dynamically. You can try Selenium for this: with Selenium you can hover over all these icons first and then get the data from the page.
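A rough Selenium sketch of the hover approach, assuming Selenium 4 and a local Chrome installation. The `hover_selector` argument is a placeholder: you would need to find the right CSS selector for the rating elements in the browser's developer tools. The helper name is hypothetical. In case earlier tooltips remain in the page but hidden, it waits until at least one `.rc-tooltip-inner` element is visible and reads the last visible one.

```python
def hover_and_collect_tooltips(url, hover_selector,
                               tooltip_selector=".rc-tooltip-inner", timeout=5):
    """Hover over each element matching hover_selector and collect tooltip text."""
    # Imported inside the function so this file can be loaded without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def visible_tooltips(driver):
        # Keep only tooltips that are currently displayed.
        found = driver.find_elements(By.CSS_SELECTOR, tooltip_selector)
        return [element for element in found if element.is_displayed()]

    driver = webdriver.Chrome()  # assumes Chrome is installed locally
    texts = []
    try:
        driver.get(url)
        for target in driver.find_elements(By.CSS_SELECTOR, hover_selector):
            # Move the mouse over the element so the page renders its tooltip.
            ActionChains(driver).move_to_element(target).perform()
            # until() returns the first truthy result, i.e. a non-empty list.
            tooltips = WebDriverWait(driver, timeout).until(visible_tooltips)
            texts.append(tooltips[-1].text)
    finally:
        driver.quit()
    return texts
```

This is slower and more fragile than parsing static HTML, since it drives a real browser; it is mainly worth it when the text only exists after the hover.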

@birajde @joylyjelly
You can actually parse this with bs4.

The “score” is actually rendered as an image of stars:


The score for the first TV is 2.3.
The “width” of the yellow part of the stars is 46%.
A full five-star bar is 100% wide, so each star is 20%, and 46 / 20 = 2.3 gives the desired score.
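The conversion as a small function, assuming the width has already been extracted as a string such as `"46%"` (for example from an inline style). The function name is hypothetical.

```python
def width_to_score(width, max_stars=5):
    """Convert a star-bar width such as '46%' into a score out of max_stars."""
    percent = float(width.strip().rstrip("%"))
    # 100% width corresponds to max_stars, so each star covers 100 / max_stars percent.
    return round(percent / (100 / max_stars), 1)


print(width_to_score("46%"))  # 2.3
```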

I’ve marked the parent element from which I would start the actual scraping.

Each child of this element (with the exception of the last one) is an actual TV.

As you can see, the class names are generated, so you can't reliably scrape by using them. But you could, in theory, look for the first image with alt='star'.
Then take the parent of that image: it is the element with the desired width, which can be converted to the score.
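A sketch of that approach with BeautifulSoup. It assumes (not confirmed from the page source) that the width is an inline style on the star image's parent, e.g. `style="width: 46%"`, and it runs against a made-up HTML snippet that mimics the structure described above: one child per TV plus a trailing non-TV child. On the real page you would first locate the marked parent element; the `id="products"` used here and the helper name are hypothetical.

```python
import re

from bs4 import BeautifulSoup

# Made-up HTML imitating the described structure (not the real page).
SAMPLE_HTML = """
<div id="products">
  <div><h2>TV A</h2><div><div style="width: 46%"><img alt="star"><img alt="star"></div></div></div>
  <div><h2>TV B</h2><div><div style="width:80%"><img alt="star"></div></div></div>
  <div>Load more</div>
</div>
"""

WIDTH_RE = re.compile(r"width\s*:\s*([\d.]+)%")


def star_score(product):
    """Return the score for one product element, or None if no stars are found."""
    star = product.find("img", {"alt": "star"})  # first star image in this product
    if star is None:
        return None
    match = WIDTH_RE.search(star.parent.get("style", ""))
    if match is None:
        return None
    return round(float(match.group(1)) / 20, 1)  # 100% = 5 stars, 20% per star


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Direct children of the parent are the TVs, except the last one.
products = soup.find(id="products").find_all(recursive=False)[:-1]
print([star_score(p) for p in products])  # [2.3, 4.0]
```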

Wow, that's a nice observation; I didn't notice the width of the div element. Yes, you can get the star information the way @Sebgolos mentioned. Just to add: the classes are not auto-generated. The class of the star div element where the width is set is “_2HrrH”, and it is common to all star blocks on the site, so you can access these divs with the “_2HrrH” class selector.
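For comparison, a sketch of the class-based lookup using BeautifulSoup's CSS `select()`. It assumes, as described, that the width sits in the inline style of each div with class `_2HrrH`; the HTML snippet and helper name are made up for illustration. As the next reply points out, a name like this may change when the site is updated.

```python
import re

from bs4 import BeautifulSoup

# Made-up snippet; on the real page these divs come from the fetched HTML.
SAMPLE_HTML = '<div class="_2HrrH" style="width: 46%"></div><div class="_2HrrH" style="width: 90%"></div>'


def scores_by_class(soup, class_name="_2HrrH"):
    """Convert the width of every div with class_name into a 0-5 score."""
    scores = []
    for div in soup.select(f"div.{class_name}"):
        match = re.search(r"width\s*:\s*([\d.]+)%", div.get("style", ""))
        if match:
            scores.append(round(float(match.group(1)) / 20, 1))  # 20% per star
    return scores


print(scores_by_class(BeautifulSoup(SAMPLE_HTML, "html.parser")))  # [2.3, 4.5]
```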


A name like that suggests the classes are generated by some build system. They might be the same across the site, but I suspect they will change whenever a new or updated version of their CSS is deployed.

This will of course work (for some time); I'm just being my purist-perfectionist self 😛
I would try find("img", {"alt": "star"}) (not tested, but should work).


Thanks a lot to both of you, Sebgolos and Birajde. It really showed me how to do it. I got this now.
