❗ Please use the skills learned in this article responsibly and make sure you adhere to the terms of service of any mentioned service!
You can find the GitHub repository for this article here:
Do you want to scrape all Google search results into one file automatically with Python? Do you want the answer to your search query within a few minutes? That would be awesome, right?
You do not have to open Google Chrome and type your query into the search box, or manually scroll through more than 10 pages of results to find an answer. Everything is done for you automatically. Great, right?
It is possible to automate Google search results using Selenium webdriver and Python. You need only a basic understanding of Python and HTML programming to achieve this.
In this article, you will learn how to enter a search query on google.com and submit it using webdriver. Then you will learn to scrape website URLs with their titles and to get answers to the query.
We will explain each element of the Google search results page and how to extract it.
Why is it so important to automate Google search results? The main reason is that you get all the details organized in one file, quickly and without errors or omissions. Suppose, for example, that you copy the URLs from all 10 result pages into a file by hand for further research: you might copy partial URLs or omit some important pages. It is a laborious and boring task.
In what other ways can we use Selenium webdriver? You can automate posts on Facebook, Twitter, and so on. It is also used for scraping competitors’ product prices for price comparison.
Can we begin learning?
What is Selenium?
Selenium is open-source software for automating web applications for testing purposes. In simple terms, it is free software that automates your browser. It was created in 2004 by Jason Huggins, an engineer at ThoughtWorks whose routine duty was testing web applications. Manual testing was tedious and time-consuming, so he wrote a JavaScript program that automated the browser interactions.
In the beginning, Selenium was restricted by the browser's same-origin policy: because it ran as JavaScript inside the page, it could only drive pages served from the same domain it was loaded from. A script loaded from google.com, for example, could not access pages on yahoo.com. To overcome this, engineers developed new versions of Selenium.
The table below shows how Selenium developed over time.
Creator | Software Name | Developments |
Paul Hammant | Selenium Remote Control (Selenium 1) | Created an HTTP proxy server to make the browser believe that Selenium comes from the same domain. |
Patrick Lightbody | Selenium Grid | Reduced test execution time. |
Shinya Kasatani | Selenium IDE | Built a Firefox extension that automates the browser through a record-and-playback feature. This reduced execution time further. |
Simon Stewart | WebDriver | Automates the browser at the OS level rather than through JavaScript. |
Selenium Team | Selenium 2 | Merged WebDriver and Selenium RC into one powerful tool for quicker automation. |
What is Web Driver?
It is a modern tool for automating web tests across browsers. Tests can be executed in different browsers such as Firefox, Google Chrome, Internet Explorer, and Safari.
WebDriver supports the Java, PHP, Python, Perl, and Ruby programming languages.
The main benefits of Web Driver are as follows:
- Installation is simple because no server has to be installed,
- Direct communication between Driver and Browser,
- Realistic browser interaction and faster execution,
- Can execute in any operating system,
- Reduces the cost of hiring testers because of automated testing.
Each browser communicates directly with its own driver, such as ChromeDriver for Chrome, GeckoDriver for Firefox, SafariDriver for Safari, EdgeDriver for Edge, and IEDriver for Internet Explorer.
You can use Selenium WebDriver to automate routine tasks such as tweeting, Google searching, searching LinkedIn profiles, and web scraping.
We can also use it to automate form filling, such as time sheets for project management.
The limitation of Selenium WebDriver is that it supports only web-based applications: it cannot automate Windows desktop applications or test mobile applications. It also cannot support new browsers and cannot handle captchas or barcodes.
Elements of Google Search Results Page
If you want to learn about a topic or buy a product at the best price, you google it in most cases, right? Have you ever analyzed the elements of the Google results page? The Google search engine results page, or SERP for short, has different elements, such as organic results, knowledge graphs, People also ask, videos, top stories, related searches, and more. In this section, we will look at them in detail.
Organic Results:
Google shows all the results that are earned naturally and not paid for. These results are ranked for the search query by Google’s secret algorithm, and Search Engine Optimization is used to improve a page's ranking in organic results. Each result consists of the title as a blue link, the URL shown in green, and a snippet or short description of the website.
People also ask:
Based on your search query, Google uses its algorithm and previous users’ queries to display blocks of related questions. When you expand a question block, you see a snippet answering the question along with a URL link. This block appears after a few organic or paid results. It populates more question blocks whenever you click on the last block.
Knowledge Graph:
When you search for a topic, for example “python”, or a brand or company name such as “Apple”, Google collects large amounts of data and presents it to you in an information box. This is the area we target to get the answer to your search query.
In this element, you can see all the information about the search query in a concise form. Google gets the data from credible sources such as Wikipedia, the CIA World Factbook, schema information, and more. It is on the right side of the SERP on desktop.
You can get the answers to your search query from the five elements marked in Image 2:
- Images – These are pictures related to your search query. For example, the Python 3 and Python logos are shown.
- Heading – The title is shown here.
- Description – Basic information about your search query is shown. For example, this section explains what Python is.
- Subheading – Important facts about your search query are shown.
- URL links – A few important topics related to the search query are displayed.
Videos:
SERPs display videos for certain keywords. They appear among the other elements as a separate element called Videos. The SERP often pulls videos from YouTube, and also from websites where an embedded video is available. Initially, you can see 3 videos in the SERP; when you click the arrow button, you can view more. SEMrush states that Google shows video results for only 6% of search queries. Is that so? You can research it.
Related searches:
This feature is shown at the bottom of the search results. Even though it is at the bottom, it is essential data for us. The keywords are derived from Google's algorithms and previous user searches. You might not find the information you need with your own search query, but the related search keywords can lead you to it, and they are a source of great ideas for your research.
Setting up of Framework for Automating
Well, you now understand the elements of the Google results page. Great! Now let’s begin to automate Google queries using Selenium, WebDriver, and Python.
Before proceeding further, I hope that you are familiar with the basic HTML structure. Let’s start without further delay.
Installation of Library:
First, we should install Selenium.
Open a terminal or command prompt and type the following command:
pip install selenium
Then install Webdriver Chrome Driver using this link.
Finally, note that the csv module is part of Python's standard library, so it does not need to be installed separately.
Now our framework is set up to proceed further to automate Google Search.
Before diving into code let us go through procedures to get google search results in CSV file.
Procedure to Automate Google Search
Now let’s dive into coding. Open your Python IDLE shell.
Import Python Libraries
First, let us import Selenium Webdriver, Sleep, and CSV using the code:
from selenium import webdriver
from time import sleep
import csv
Accessing and Navigating Web Page
We are telling the computer to open the chrome browser, go to www.google.com and search for query “Python”.
# specify path of Chrome Driver, code 1
driver = webdriver.Chrome('/Users/mohamedthoufeeq/Desktop/chromedriver')
# use driver.get() method to navigate the web page by giving URL address, code 2
driver.get('https://www.google.com/')
# locate "English" Language by _xpath / to change language to English, code 3
English = driver.find_element_by_xpath('//*[@id="SIvCob"]/a[2]')
English.click()
The variable driver is an instance of Google Chrome. We will use this variable to perform commands. Find where the Chrome driver is installed on your PC and put that path into code 1. When you execute code 1 and code 2, the Google Chrome browser opens and goes to google.com automatically: code 1 starts the browser and the driver.get() method in code 2 opens the web page. Code 3 is optional; it is needed only if the Google page opens in a language other than English, in which case it switches the language to English. Let’s look at this code in more detail.
Open the HTML source by right-clicking on the web page and clicking Inspect (see Image 5).
HTML has different attributes and tags, such as class, id, href, a, div, p, etc., that we can use to access specific elements. We can access any element using the find_element_by_* methods.
These methods are shown below:
find_element_by_class_name
find_element_by_css_selector
find_element_by_id
find_element_by_link_text
find_element_by_name
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_xpath
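Note that these find_element_by_* helpers come from the Selenium 3 era. As far as I know, current Selenium 4 releases no longer ship them and expect find_element(by, value) or find_elements(by, value) instead. The sketch below is only an illustrative translation table: the names LEGACY_TO_STRATEGY and modern_locator are made up for this article, and the strategy strings are the values of the constants on selenium.webdriver.common.by.By.

```python
# Hypothetical helper (not part of Selenium): maps the legacy method names
# used in this article to the locator strategy strings that Selenium 4's
# find_element(by, value) accepts. The strings equal the By.* constants,
# for example By.XPATH == "xpath".
LEGACY_TO_STRATEGY = {
    "find_element_by_class_name": "class name",
    "find_element_by_css_selector": "css selector",
    "find_element_by_id": "id",
    "find_element_by_link_text": "link text",
    "find_element_by_name": "name",
    "find_element_by_partial_link_text": "partial link text",
    "find_element_by_tag_name": "tag name",
    "find_element_by_xpath": "xpath",
}


def modern_locator(legacy_method, value):
    """Return the (by, value) pair equivalent to a legacy find_element(s)_by_* call."""
    # The plural find_elements_by_* variants use the same strategies.
    singular = legacy_method.replace("find_elements_by_", "find_element_by_")
    return LEGACY_TO_STRATEGY[singular], value


print(modern_locator("find_element_by_name", "q"))
print(modern_locator("find_elements_by_xpath", '//*[@class="yuRUbf"]/a[@href]'))
```

With a Selenium 4 driver, driver.find_element(*modern_locator("find_element_by_name", "q")) would then stand in for driver.find_element_by_name("q").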
Click the Inspect element icon (marked with a black circle in the image below), which lets you hover over any element on the webpage. Hover over the “ENGLISH” link to inspect the element. Find the href element in the HTML, then right-click > Copy > Copy XPath and paste it in the Find bar.
The XPath you get looks like //*[@id="SIvCob"]/a[1], which contains the attribute id and the tag a. The index in a[...] depends on the position of the English link among the languages offered, so it may differ for you; code 3 uses a[2]. Use the XPath you copied to access the English link on the Google home page in code 3 (see Image 6).
Let’s discuss XPath in Selenium. It is an XML path expression for navigating through the attributes and tags of an HTML document. The syntax of XPath is
xpath=//tagname[@attribute='value']
// : selects matching nodes anywhere in the document
tagname : tag name of the target node
@ : selects an attribute
attribute : name of the attribute of the node
value : value of the attribute
In case you can’t find elements using general selectors such as _class_name or _id, XPath is used to find the element.
Thus, we have chosen XPath to find the English link.
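To get a feel for the syntax without opening a browser, the following sketch runs the same kind of expressions against a tiny hand-written snippet using Python's built-in xml.etree.ElementTree, which understands a limited subset of XPath. The snippet, its links and the class name "result" are made up for illustration. In ElementTree the expression starts with a dot (.//) because it is evaluated relative to the root element.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed snippet standing in for a real page.
SNIPPET = """
<html>
  <body>
    <div id="SIvCob">
      <a href="/setprefs?hl=de">Deutsch</a>
      <a href="/setprefs?hl=en">English</a>
    </div>
    <div class="result"><a href="https://www.python.org/">Python</a></div>
  </body>
</html>
"""

root = ET.fromstring(SNIPPET.strip())

# //tagname[@attribute='value']: any tag (*) whose id is "SIvCob", then its <a> children
language_links = root.findall(".//*[@id='SIvCob']/a")
print([link.text for link in language_links])

# Position predicates are 1-based, so a[2] is the second <a> child
english = root.find(".//*[@id='SIvCob']/a[2]")
print(english.text, english.get("href"))

# [@href] keeps only <a> elements that carry an href attribute
result_urls = [a.get("href") for a in root.findall(".//*[@class='result']/a[@href]")]
print(result_urls)
```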
Once the English link is stored in the English variable, click it to proceed. The click() method of the element is used to interact with the web page (see the second line of code 3).
To type the search query “python” in the search box and submit it, create these 4 lines of code as shown below:
# locate search query form in html script by _name, code 1
search_query = driver.find_element_by_name("q")
# use send_keys() to simulate key strokes / type the search term "python", code 2
search_query.send_keys("python")
# locate Google Search button by _xpath, code 3
google_search_btn = driver.find_element_by_xpath('//*[@type="submit"]')
# use submit() to mimic enter key, code 4
google_search_btn.submit()
The first step is to look for the search box HTML element. Create the search_query variable to store the search box element for performing keystrokes. When you inspect the search box in the HTML, you can see the attribute name="q" (see Image 7). Use this attribute to locate the search box as shown in code 1.
In code 2, use the send_keys() method to simulate the keystrokes for typing “python”. To proceed, we have to press the submit button; code 3 and code 4 do this. The XPath for locating the Google Search button uses the attribute [@type="submit"] (see Image 8).
Note that the asterisk “*” matches any tag.
Excellent! You have automated a search query.
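The search steps can also be wrapped in one small function. This is a sketch under a few assumptions: it uses the find_element(by, value) form that Selenium 4 expects, passes the locator strategy as the plain string "name" (the value of By.NAME), and calls submit() on the search box itself instead of locating the button first. The function name run_search is made up.

```python
def run_search(driver, term):
    """Type `term` into Google's search box and submit the form.

    `driver` is expected to be a Selenium WebDriver that already has
    https://www.google.com/ open.
    """
    # name="q" is the search box attribute found with Inspect
    search_box = driver.find_element("name", "q")
    # remove any text that is already in the box
    search_box.clear()
    search_box.send_keys(term)
    # submit() on an element inside a form submits that form
    search_box.submit()
    return search_box
```

Assuming driver was created as in code 1 earlier, run_search(driver, "python") performs the same search as the four lines of code.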
Now let’s start to code to extract Google Search elements.
Organic Results
These results give us all the websites normally derived using Google’s secret algorithms and SEO keywords.
# locate URL for organic results element from html script by _xpath, code 1
organic_result = driver.find_elements_by_xpath('//*[@class="yuRUbf"]/a[@href]')
# get all URL and store it in variable "url_list1" list using for loop, code 2
url_list1 = []
for organic_url in organic_result:
    if 'google' not in organic_url.get_attribute("href"):
        url_list1.append(organic_url.get_attribute("href"))
# locate title of URL for organic results element from html script by _xpath, code 3
url1_title = driver.find_elements_by_xpath('//*[@class="LC20lb DKV0Md"]')
# get all title of the URL and store it in variable "title_url_list1" list using for loop, code 4
title_url_list1 = []
for title_url1 in url1_title:
    text = title_url1.text
    title_url_list1.append(text)
On the webpage, hover over the heading of the first search result, “https://www.python.org”, and inspect the element.
You can see the href link attribute. Then identify which class this href link belongs to, which is class = “yuRUbf” (see Image 9).
You create the XPath for locating the organic result URLs in code 1.
XPath in more detail: ('//*[@class="yuRUbf"]/a[@href]')
// - selects matching nodes anywhere in the document
* - matches any tag name; here the element with class = “yuRUbf” is a div
[@class="yuRUbf"] - selects the element whose class has the value "yuRUbf"
/a[@href] - selects the a child element that has an href attribute
The organic URL elements are stored in the variable organic_result. Code 2 stores the URL of each organic result element in the list url_list1. To get the value of the href attribute, i.e. the URL link, use the get_attribute method. We also need to remove links that point to Google itself, as these belong to the “People also ask” element. After that, the title of each organic result is extracted and stored in the list title_url_list1. To do that, inspect the title element “Welcome to Python.org” and identify its XPath. The class for locating the title is “LC20lb DKV0Md” (refer to Image 10), which is used in code 3. Code 4 then loops over the elements and appends each title to the list title_url_list1.
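One caveat with the substring test in code 2: 'google' in url also matches unrelated pages such as https://example.com/google-tips, and dropping URLs without dropping their titles can leave the two lists out of step. A possible refinement, sketched below with made-up names (is_google_host, keep_external) and made-up sample data, is to look at the host name only and to filter title/URL pairs together.

```python
from urllib.parse import urlparse


def is_google_host(url):
    """True when one of the host name's dot-separated labels is exactly 'google'."""
    host = (urlparse(url).hostname or "").lower()
    return "google" in host.split(".")


def keep_external(pairs):
    """Keep only (title, url) pairs whose URL does not point to a Google host."""
    return [(title, url) for title, url in pairs if url and not is_google_host(url)]


# Made-up sample data in the shape of zip(title_url_list1, all_urls)
pairs = [
    ("Welcome to Python.org", "https://www.python.org/"),
    ("More results", "https://www.google.com/search?q=python"),
    ("Google tips", "https://example.com/google-tips"),
]
print(keep_external(pairs))
```

Only the google.com entry is dropped here; the example.com page survives because the word appears in its path, not in its host name.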
The organic results of your search query are now extracted and stored in their respective variables. Next, we can move to the next element.
People Also Ask
In this element, we can find the questions other people asked that relate to your search query.
This is useful data for your research content.
Now let’s scrape the People also ask element following steps similar to the ones above.
First, locate the URL links of the “People also ask” element in the HTML using the Inspect element option.
# locate URL in "People also ask" element from html script by _xpath, code 1
People_quest = driver.find_elements_by_xpath('//*[@class="AuVD cUnQKe"]//a[@href]')
# get all URL and store it in variable "url_list2" list using for loop, code 2
url_list2 = []
for People_url in People_quest:
    if 'google' not in People_url.get_attribute("href"):
        if 'search' not in People_url.get_attribute("href"):
            url_list2.append(People_url.get_attribute("href"))
# locate title of URL in "People also ask" element from html script by _xpath, code 3
url2_title = driver.find_elements_by_xpath('//*[@class="iDjcJe IX9Lgd wwB5gf"]')
# get all title of the URL and store it in variable "title_url_list2" list using for loop, code 4
title_url_list2 = []
for title_url2 in url2_title:
    text = title_url2.text
    title_url_list2.append(text)
You can get the URLs of the People also ask element using class = “AuVD cUnQKe”. This class belongs only to the People also ask element (see Image 11). In code 1, create the People_quest variable to store the URL elements of People also ask. Refer to Image 12 to get the titles of the URLs from the People also ask element.
Next, store the titles and URLs in title_url_list2 and url_list2.
Now let us move on to extracting search terms from the Related searches element.
Related Searches
This element provides great new ideas related to your search query. It is at the bottom of the page. There are 8 unique search terms derived from other people’s searches and Google’s algorithms. Let us see how to scrape this superb element.
Scroll down the page, right-click on this element, and then click Inspect.
Refer to Image 13 and Image 14.
# locate URL for Related searches element from html script by _xpath, code 1
related_search = driver.find_elements_by_xpath('//a[@class ="k8XOCe R0xfCb VCOFK s8bAkb"][@href]')
# get all URL and store it in variable "url_list5" list using for loop
url_list5 = []
for related_url in related_search:
    url_list5.append(related_url.get_attribute("href"))
# locate title of URL for Related searches element from html script by _xpath
url5_title = driver.find_elements_by_xpath('//*[@class="s75CSd OhScic AB4Wff"]')
# get all title of the URL and store it in variable "title_url_list5" list using for loop
title_url_list5 = []
for title_url5 in url5_title:
    text = title_url5.text
    title_url_list5.append(text)
The related_search variable stores the URL elements of the Related searches element using the find_elements_by_xpath method.
The tag a comes before class = “k8XOCe R0xfCb VCOFK s8bAkb”, so the XPath is ('//a[@class ="k8XOCe R0xfCb VCOFK s8bAkb"][@href]') as shown in code 1.
Next, store the titles and URLs of the Related searches in the list variables title_url_list5 and url_list5 using the code above.
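Organic results, People also ask and Related searches all follow the same two-step pattern: one XPath for the links and one for the titles. As a sketch, the pattern can be folded into a single helper. The name collect_titles_and_urls is made up, and the function uses the Selenium 4 find_elements(by, value) form with the strategy string "xpath" (the value of By.XPATH) rather than the find_elements_by_xpath call used in this article.

```python
def collect_titles_and_urls(driver, link_xpath, title_xpath):
    """Return a list of (title, url) tuples for one element of the results page."""
    links = driver.find_elements("xpath", link_xpath)
    titles = driver.find_elements("xpath", title_xpath)
    # href values of the link elements
    urls = [link.get_attribute("href") for link in links]
    # visible text of the title elements
    texts = [title.text for title in titles]
    # zip() stops at the shorter list, so unmatched leftovers are dropped
    return list(zip(texts, urls))
```

For Related searches, the two arguments would be the XPath strings from the code above: the //a[...][@href] expression for the links and the //*[@class="s75CSd OhScic AB4Wff"] expression for the titles.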
Knowledge Graph
This is an interesting new element in the Google search results page. In this element, you can find the answer to your search query in the Description segment.
The information is displayed in condensed form with text, images, videos, and URLs.
Let’s break up the knowledge graph into segments and scrape a few essential pieces of data from them.
- Top Images
- Main Text Heading
- Description/ Snippets
- Subheadings
- URL Links
Extracting details of Main Text Heading:
Inspect the heading element and identify its attributes and tags.
# locate the main title for Knowledge Graph element from html script by _xpath
Know_Main_head = driver.find_elements_by_xpath('//*[@class="K20DDe R9GLFb JXFbbc LtKgIf a1vOw BY2RHc"]')
# get the main title and store it in variable "text_url3" using for loop
for title_url3 in Know_Main_head:
    text_url3 = title_url3.text
The class of the Knowledge Graph’s main heading is “K20DDe R9GLFb JXFbbc LtKgIf a1vOw BY2RHc” (refer to Image 15).
The element is stored in the variable Know_Main_head, and its text is then stored in text_url3. Even though the main heading is a single string, find_elements_by_xpath returns the element inside a list, and .text cannot be used on a list, so we use a for loop to get the text.
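If Google shows no Knowledge Graph for a query, the list is empty, the loop body never runs, and text_url3 is never created, which would raise a NameError later when the variable is used. A small guard avoids that. The sketch below uses a made-up helper name, first_text, and stand-in objects instead of real web elements.

```python
from types import SimpleNamespace


def first_text(elements, default=""):
    """Return the .text of the first element, or `default` for an empty list."""
    for element in elements:
        return element.text
    return default


# Stand-ins for what find_elements_by_xpath() returns
found = [SimpleNamespace(text="Python")]
missing = []

print(first_text(found))
print(first_text(missing, default="(no Knowledge Graph)"))
```

With the variables from this section, text_url3 = first_text(Know_Main_head) would always leave text_url3 defined.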
Extracting Details of Description / Snippets:
Identify the attributes and tags for this element using inspect element icon.
# locate description of Knowledge Graph element from html script by _xpath
Know_desc = driver.find_elements_by_xpath('//*[@class="PZPZlf hb8SAc"]')
# get description and store it in variable "text_desc" using for loop
for desc in Know_desc:
    text_desc = desc.text
The class is “PZPZlf hb8SAc”, and the matching elements are stored in the Know_desc variable (refer to Image 16).
Using the for loop and the .text attribute, we get the text of the element.
Extracting details of subheadings:
These subheadings are below the snippet and hold useful facts about the search query.
Identify the attributes and tags of this element for extracting the data:
# locate title of sub head for Knowledge Graph element from html script by _xpath
Know_subhead = driver.find_elements_by_xpath('//*[@class="rVusze"]')
# get all title of the URL and store it in variable "title_subhead" list using for loop
title_subhead = []
for subhead in Know_subhead:
    text = subhead.text
    title_subhead.append(text)
The class for subheadings is rVusze; the matching elements are stored in the variable Know_subhead (see Image 17).
Likewise, use a for loop and the .text attribute to store the facts in the list variable title_subhead.
Here the list holds several subheading items.
Extracting Website title and URLs:
Inspect element for the webpage name and Url links using the hover action.
# locate title of URL for Knowledge Graph element from html script by _xpath
Know_links_name = driver.find_elements_by_xpath('//*[@class="OS8yje oJc6P QTsT3e"]')
# get all title of the URL and store it in variable "title_url_list3" list using for loop
title_url_list3 = []
for title_url3 in Know_links_name:
    text = title_url3.text
    title_url_list3.append(text)
# locate URL for Knowledge Graph element from html script by _xpath
Know_graph = driver.find_elements_by_xpath('//*[@class ="mFVw3b"]//a[@href]')
# get all URL and store it in variable "url_list6" list using for loop
url_list6 = []
for graph_url in Know_graph:
    url_list6.append(graph_url.get_attribute("href"))
You can identify class = “OS8yje oJc6P QTsT3e” for the webpage names and class = “mFVw3b” for the URL links (see Images 18-20).
The variable Know_links_name stores the elements for the webpage names. The variable Know_graph stores the URL links of the webpages in the Knowledge Graph.
Using a for loop, .text, and the get_attribute method, we get lists of webpage names and URL links.
You have now extracted all the items in the Knowledge Graph and stored them in list variables.
Now you can move to the next interesting element.
Videos
You can view videos related to your search queries.
These videos mostly come from YouTube which is the leading search engine for Video.
# locate URL for Videos element from html script by _xpath
Video = driver.find_elements_by_xpath('//a[@class ="X5OiLe"][@href]')
# get all URL and store it in variable "vid_url" list using for loop
vid_url = []
for vid in Video:
    vid_url.append(vid.get_attribute("href"))
# locate title of URL for Videos element from html script by _xpath
Video_title = driver.find_elements_by_xpath('//*[@class="fc9yUc oz3cqf p5AXld"]')
# get all title of the URL and store it in variable "vid_title" list using for loop
vid_title = []
for Vid_text in Video_title:
    text = Vid_text.text
    vid_title.append(text)
Hover over the video URL and title to get the tags and attributes (see Images 21-22).
The XPath for the video URL is '//a[@class ="X5OiLe"][@href]', where a, the tag of the video URL link, comes first in the path. The elements are stored in the Video variable.
The XPath for the video title is '//*[@class="fc9yUc oz3cqf p5AXld"]', and the elements are stored in the Video_title variable.
The titles and URL links are stored in the vid_title and vid_url list variables.
Congratulations! You have extracted all the details from the elements of the Google search results page using Selenium.
A few points should be added so that the program runs smoothly without errors.
- Use the sleep function to make the program wait, so that find_elements has enough time to find the HTML elements.
from time import sleep
# use sleep method between each Google elements
sleep(1)
- The script I have written above scrapes search results only for the first page. You can add a few lines of code to scrape results from more pages. For this purpose, use a for loop and click the Next link to access the next page, as shown below:
for i in range(7):
    Next_page = driver.find_element_by_xpath('//*[@id="pnnext"]')
    ''' Script for extracting Search result from Organic Result google elements'''
    ...
    Next_page.click()
    sleep(1)
- You should move to the next page only for extracting the Organic Results element and not for the other elements, because those elements are available on the first page only. The following code does this:
for i in range(7):
    Next_page = driver.find_element_by_xpath('//*[@id="pnnext"]')
    ''' Script for extracting Search result from Organic Result google elements'''
    if i == 0:
        ''' Script for extracting Search result from "People also ask" google element'''
        ''' Script for extracting Search result from "Related searches" google element'''
        ...
    Next_page.click()
    sleep(1)
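Regarding the first point, a fixed sleep(1) either wastes time or is too short on a slow connection. Selenium ships WebDriverWait (in selenium.webdriver.support.ui) for waiting on a condition; the sketch below shows the same polling idea in plain Python so that it can be tried without a browser. The helper name wait_until, its parameters and the stand-in results_loaded function are made up for illustration.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call `condition()` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for a page whose results only show up on the third look
attempts = []


def results_loaded():
    attempts.append(1)
    return ["https://www.python.org/"] if len(attempts) >= 3 else []


urls = wait_until(results_loaded, timeout=2.0, poll=0.01)
print(urls, "after", len(attempts), "checks")
```

With a real driver, the condition could be a lambda that calls find_elements_by_xpath with the organic results XPath, since that call returns an empty list until the results are present.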
Exporting data to CSV file
Below is the code to export the results of all elements to the Google_Search.csv file.
with open('Google_Search.csv', 'w', newline="") as Google:
    # People also ask
    Main_header1 = ["People also ask"]
    People_header_writer = csv.DictWriter(Google, fieldnames=Main_header1)
    People_header_writer.writeheader()
    header1 = ['Question', 'URL']
    People_writer = csv.DictWriter(Google, fieldnames=header1)
    People_writer.writeheader()
    for a, b in zip(title_url_list2, url_list2):
        People_writer.writerow({'Question': a, 'URL': b})
    # Related searches
    Main_header2 = ["Related Search"]
    Related_header_writer = csv.DictWriter(Google, fieldnames=Main_header2)
    Related_header_writer.writeheader()
    header2 = ['Search Terms', 'URL']
    Related_writer = csv.DictWriter(Google, fieldnames=header2)
    Related_writer.writeheader()
    for c, d in zip(title_url_list5, url_list5):
        Related_writer.writerow({'Search Terms': c, 'URL': d})
    # Knowledge Graph: main heading, description, subheadings, links
    Main_header3 = ["Knowledge Graph"]
    Knowledge_header_writer1 = csv.DictWriter(Google, fieldnames=Main_header3)
    Knowledge_header_writer1.writeheader()
    Know_Main_header = [text_url3]
    Know_Main_header_writer = csv.DictWriter(Google, fieldnames=Know_Main_header)
    Know_Main_header_writer.writeheader()
    Know_descp = [text_desc]
    Know_descp_writer = csv.DictWriter(Google, fieldnames=Know_descp)
    Know_descp_writer.writeheader()
    Know_subhead_header = ["subhead"]
    Know_subhead_writer = csv.DictWriter(Google, fieldnames=Know_subhead_header)
    Know_subhead_writer.writeheader()
    # iterate over the list directly; zip() on a single list would yield 1-tuples
    for i in title_subhead:
        Know_subhead_writer.writerow({'subhead': i})
    header3 = ['Title', 'URL']
    Know_writer = csv.DictWriter(Google, fieldnames=header3)
    Know_writer.writeheader()
    for e, f in zip(title_url_list3, url_list6):
        Know_writer.writerow({'Title': e, 'URL': f})
    # Videos
    Main_header4 = ["Videos"]
    Video_header_writer1 = csv.DictWriter(Google, fieldnames=Main_header4)
    Video_header_writer1.writeheader()
    header4 = ['Title', 'URL']
    Video_writer = csv.DictWriter(Google, fieldnames=header4)
    Video_writer.writeheader()
    for g, h in zip(vid_title, vid_url):
        Video_writer.writerow({'Title': g, 'URL': h})
    # Organic Results
    Main_header5 = ["Organic Results"]
    Organic_header_writer1 = csv.DictWriter(Google, fieldnames=Main_header5)
    Organic_header_writer1.writeheader()
    header5 = ['Web Site Name', 'URL']
    Organic_writer = csv.DictWriter(Google, fieldnames=header5)
    Organic_writer.writeheader()
    for j, k in zip(title_url_list1, url_list1):
        Organic_writer.writerow({'Web Site Name': j, 'URL': k})
Titles and URLs are stored in separate list variables. To export them to the CSV file, each title/URL pair must be converted into a dictionary. The csv.DictWriter class writes such dictionaries to the CSV file. The zip function pairs each title with its URL, and each pair becomes one row dictionary whose keys are the column names.
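Because every section is written the same way (a section title, a column header, then the rows), the export can also be expressed more compactly with csv.writer. This is an alternative sketch, not the script above: the function name write_sections and the sample rows are made up, and the function writes to any file-like object so that it can be tried on an in-memory buffer.

```python
import csv
import io


def write_sections(handle, sections):
    """Write (section_title, column_names, rows) triples to one CSV file."""
    writer = csv.writer(handle)
    for title, columns, rows in sections:
        writer.writerow([title])   # section heading on its own line
        writer.writerow(columns)   # column header
        writer.writerows(rows)     # one line per (title, url) pair


# Made-up sample data in the shape produced by zip(titles, urls)
sections = [
    ("People also ask", ["Question", "URL"],
     [("What is Python used for?", "https://example.com/uses")]),
    ("Organic Results", ["Web Site Name", "URL"],
     [("Welcome to Python.org", "https://www.python.org/")]),
]

buffer = io.StringIO(newline="")
write_sections(buffer, sections)
print(buffer.getvalue())
```

To write a real file, open it the same way as in the script, with open('Google_Search.csv', 'w', newline=""), and pass the file object to write_sections in place of the buffer.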
The output of the Google_Search.csv file:
Conclusion
Selenium, which automates web browsers, is a powerful tool for scraping useful data from any webpage promptly. You can extract all the URLs and information about your query from the elements of the SERP into one file. This information is very useful for further research. You can also use Selenium webdriver to extract information from the websites returned in the Google search results. Automated web scraping is widely used in areas such as market research, price comparison, machine learning, and product development. So how will you use Selenium webdriver for extracting data?