
I'm using a Python script to scrape Google, and this is what I get when the script finishes. Imagine I have 100 results (I'm showing 2 as an example).

{'query_num_results_total': 'Око 64 резултата (0,54 секунде/и)\xa0', 'query_num_results_page': 77, 'query_page_number': 1, 'query': 'example', 'serp_rank': 1, 'serp_type': 'results', 'serp_url': 'example2.com', 'serp_rating': None, 'serp_title': '', 'serp_domain': 'example2.com', 'serp_visible_link': 'example2.com', 'serp_snippet': '', 'serp_sitelinks': None, 'screenshot': ''}
{'query_num_results_total': 'Око 64 резултата (0,54 секунде/и)\xa0', 'query_num_results_page': 77, 'query_page_number': 1, 'query': 'example', 'serp_rank': 2, 'serp_type': 'results', 'serp_url': 'example.com', 'serp_rating': None, 'serp_title': 'example', 'serp_domain': 'example.com', 'serp_visible_link': 'example.com', 'serp_snippet': '', 'serp_sitelinks': None, 'screenshot': ''}

This is the code that runs the script:

import serpscrap
import pprint
import sys

config = serpscrap.Config()
config_new = {
   'cachedir': '/tmp/.serpscrap/',
   'clean_cache_after': 24,
   'sel_browser': 'chrome',
   'chrome_headless': True,
   'database_name': '/tmp/serpscrap',
   'do_caching': True,
   'num_pages_for_keyword': 2,
   'scrape_urls': False,
   'search_engines': ['google'],
   'google_search_url': 'https://www.google.com/search?num=100',
   'executable_path': '/usr/local/bin/chromedriver',
   'headers': {
      'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
      'Accept-Language': 'de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4',
      'Accept-Encoding': 'gzip, deflate, sdch',
      'Connection': 'keep-alive',
   },
}

arr = sys.argv

keywords = ['example']

config.apply(config_new)
scrap = serpscrap.SerpScrap()
scrap.init(config=config.get(), keywords=keywords)
results = scrap.run()


for result in results:
    print(result)

I want to stop the script if the results contain a URL I'm looking for, for example "example.com".

If a result has https, as in 'serp_url': 'https://example2.com', I still want it to match and stop the script when I pass the argument without https, just example2.com. If it's not possible to check while the script is running, I need an explanation of how to find a result by serp_url using the argument I provided.

I'm not familiar with Python, but I'm building a PHP application that will run this Python script and output the results. I don't want to process the results in PHP (extracting by serp_url, etc.); I want everything to be done in Python.

2 Answers

0

You can do it with something like this:

for result in results:
    # Substring match: 'example.com' is found in 'http://example.com',
    # in 'http://example.com/whatever', and in URLs beginning with 'https'.
    if my_url in result['serp_url']:
        sys.exit()  # a bare `exit` is only a name lookup and stops nothing

Another solution uses any():

if any(my_url in result['serp_url'] for result in results):
    sys.exit()
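
To tie this to the command-line argument from the question, here is a minimal sketch. It assumes the URL to look for is passed as the first argument; the names find_result and main and the sample data are made up for illustration, and the exit statuses (0 for a match, 1 for no match) are just one convention the calling PHP application could check.

```python
import sys

def find_result(results, my_url):
    """Return the first result whose serp_url contains my_url, else None."""
    for result in results:
        if my_url in result['serp_url']:
            return result
    return None

# Sample data shaped like the scraper output in the question (trimmed).
sample_results = [
    {'serp_rank': 1, 'serp_url': 'https://example2.com'},
    {'serp_rank': 2, 'serp_url': 'example.com'},
]

def main(argv, results=sample_results):
    # Assumption: the URL to look for is the first command-line argument.
    if len(argv) < 2:
        print('usage: script.py URL')
        return 2
    match = find_result(results, argv[1])
    if match is None:
        return 1      # no result contains the URL
    print(match)      # only the matching result is printed
    return 0
```

In a real script, the last line would be sys.exit(main(sys.argv)), with the list returned by scrap.run() passed in place of sample_results.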

0

First of all, you need to access the value of serp_url.

Since each result is a dictionary, result['serp_url'] returns that result's URL.

Inside the for loop where you print your results, add an if statement that compares result['serp_url'] with a variable holding the URL you are looking for (I don't think your code defines one yet). It could be something like the following:

for result in results:
    print(result)
    if my_url == result['serp_url']:
        sys.exit()  # sys is already imported in your script

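The same idea applies to the https case, but now we need the startswith() method to strip the scheme before comparing:

for result in results:
    print(result)
    url = result['serp_url']
    # Drop a leading scheme so 'https://example2.com' becomes 'example2.com'.
    for scheme in ('https://', 'http://'):
        if url.startswith(scheme):
            url = url[len(scheme):]
    if my_url == url:
        sys.exit()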
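
If a substring or prefix test turns out to be too loose (for instance, 'example.com' is also a substring of 'myexample.com'), one option is to compare host names instead. This is a minimal sketch using the standard library's urllib.parse; the helper names host_of and same_site are hypothetical, and treating a leading 'www.' as optional is an assumption about what should count as a match.

```python
from urllib.parse import urlparse

def host_of(url):
    """Return the lower-cased host name of url, with or without a scheme."""
    # urlparse only fills in the host when '//' precedes it,
    # so add it for bare values such as 'example.com'.
    if '//' not in url:
        url = '//' + url
    host = urlparse(url).hostname or ''
    # Assumption: 'www.example.com' and 'example.com' are the same site.
    return host[4:] if host.startswith('www.') else host

def same_site(my_url, serp_url):
    """True when both values point at the same host."""
    return host_of(my_url) == host_of(serp_url)
```

Inside the loop, the comparison would then read: if same_site(my_url, result['serp_url']).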

Hope it helps.

2 Comments

Thank you very much, it will be useful! But I don't need my argument to match exactly (==); serp_url should contain my argument. If serp_url is https://example.com and my argument is example.com, the statement should find a match. Can that be done?
I didn't understand that you have more than one desired URL. In that case, Tzomas's answer does the trick.
