When scraping the web or automating a browser, we often need to read the text of an HTML element on a page. Selenium lets us do this with the .text property of a web element. It is a property rather than a method, so it is used without parentheses. It returns the text of the element as it is visible on the rendered page. Today we will dive into it to get a better understanding of this feature.
Setting Up the Environment
So, let us get started. We import the webdriver module from selenium and create a driver object from it. We also need to specify the path of chromedriver, since we will be using the Chrome browser to load the page. The maximize_window() method enlarges the browser window for a better view. Then we connect to the website using the driver.get() method. We will be using an implicit wait of 10 seconds.
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'G:/chromedriver_win32/chromedriver.exe')
driver.maximize_window()
driver.get('https://theautomationzone.blogspot.com/2020/07/mix-of-basic-webelements.html')
driver.implicitly_wait(10)
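Newer Selenium 4 releases no longer accept the executable_path argument; the driver path is passed through a Service object instead. The sketch below shows how the same setup might look there. The function name make_driver and its parameters are made up for illustration, and the Selenium imports sit inside the function only so that the snippet can be loaded on a machine without Selenium installed.

```python
def make_driver(chromedriver_path, wait_seconds=10):
    """Create a maximized Chrome driver with an implicit wait (Selenium 4 style)."""
    # Imported lazily so this module still loads where Selenium is absent.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # Selenium 4 takes the chromedriver path via a Service object.
    driver = webdriver.Chrome(service=Service(chromedriver_path))
    driver.maximize_window()
    # The implicit wait applies to every later element lookup.
    driver.implicitly_wait(wait_seconds)
    return driver


# Usage with a real browser (not executed here):
# driver = make_driver(r"G:/chromedriver_win32/chromedriver.exe")
# driver.get("https://theautomationzone.blogspot.com/2020/07/mix-of-basic-webelements.html")
```

Setting the implicit wait before the first driver.get() call keeps all of the configuration in one place; it affects element lookups, not the page load itself.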
Finding Header Text on a Website with .text
Today we will find the header text of the “the automation zone” blog. First we need to locate the element, and then we will use the .text property of Python Selenium to get the text of the header. Move the mouse pointer inside the webpage and right-click. From the context menu, click the Inspect option.
In the HTML, we can use the class attribute to find the element and then read its .text property to get the text of the title. We will now create a title variable and store the text of the located web element in it.
title = driver.find_element_by_class_name('title').text
print(title)
The title text “the automation zone” will be printed in the console.
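The find_element_by_class_name() helper belongs to older Selenium releases; Selenium 4 dropped the find_element_by_* family in favour of find_element(by, value). As a rough sketch, the same lookup could be wrapped as follows. The helper name element_text is hypothetical, and "class name" is the plain string that By.CLASS_NAME stands for.

```python
def element_text(driver, by, value):
    """Locate one element and return its visible text without outer whitespace."""
    element = driver.find_element(by, value)
    # .text is a property, so there are no parentheses after it.
    return element.text.strip()


# Usage with a live driver (not executed here):
# title = element_text(driver, "class name", "title")
# print(title)
```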
How to Get the Text with get_attribute()
Selenium offers another method, get_attribute(), which also lets us get the text out of the HTML. It takes the name of an attribute or property as its argument, for example textContent, value, or innerHTML. For instance, suppose we want to get the text of the third paragraph. We can get it with the following code:
paragraph3 = driver.find_element_by_id('p3').get_attribute("textContent")
print(paragraph3)
Here, after locating the web element, we used get_attribute("textContent") to get the text. The result will look like this:
This is an example of paragraphs with a span inside
Difference Between .text and get_attribute()
Notice the output text of paragraph 3 above. It does not look the same as the text visible on the webpage: there are extra spaces between the phrases. This is because the paragraph contains a span element, and textContent gives us the text exactly as it is written, line by line, in the HTML source. The spaces and line breaks inside the element's tag are returned along with the words.
Now let us get the same text of the third paragraph using the .text property:
para3 = driver.find_element_by_id('p3').text
print(para3)
The output will be:
This is an example of paragraphs with a span inside
As we can see, the output text is the same as it appears on the web page. The extra spaces and line breaks inside the HTML file are ignored.
So the main difference is that get_attribute("textContent") returns the text as it is written in the HTML source, while the .text property returns the text as it is displayed on the webpage.
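The whitespace difference can be imitated without a browser. The sketch below uses Python's standard html.parser on a made-up snippet of markup (the real source of paragraph 3 may be laid out differently) to contrast the raw text nodes with a whitespace-collapsed version. It only approximates what the two Selenium calls do; .text also takes visibility into account, which is not modelled here.

```python
from html.parser import HTMLParser


class TextContent(HTMLParser):
    """Collect every text node unchanged, roughly what textContent holds."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


# Hypothetical markup, for illustration only.
html = """<p id="p3">This is an example
    of paragraphs   with a
    <span>span</span> inside</p>"""

parser = TextContent()
parser.feed(html)
parser.close()

raw = "".join(parser.chunks)     # keeps the source spaces and line breaks
visible = " ".join(raw.split())  # collapses whitespace runs, like rendered text

print(repr(raw))
print(visible)
```

The first print shows the line breaks and indentation carried over from the source, while the second prints a single clean sentence.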
How to Get the Text of a URL
The get_attribute() method not only lets us read the text of an element but also the text stored inside an attribute of the element's tag. For instance, suppose we need to find the link attached to the “this is an example of link” part of the webpage.
By inspecting the HTML of the Google link portion of the webpage, we can see that the URL is stored in the href attribute of the <a> tag. We can use get_attribute("href") to get the value of href.
link = driver.find_element_by_id('link').get_attribute('href')
print(link)
Here, after locating the element by id, we passed 'href' to the get_attribute() method because that attribute contains the URL of the Google link. It returns the output as plain text:
https://www.google.com/
This is a very useful way of getting the text value of an attribute inside an HTML tag.
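The same idea extends to every link on a page. As a small sketch, the hypothetical helper below (link_targets is not part of Selenium) walks over all <a> elements and pairs each link's visible text with its href. It assumes a Selenium 4 style driver, where "tag name" is the string behind By.TAG_NAME.

```python
def link_targets(driver):
    """Map each link's visible text to its href, skipping anchors without one."""
    targets = {}
    for anchor in driver.find_elements("tag name", "a"):
        href = anchor.get_attribute("href")
        if href:  # get_attribute returns None when the attribute is missing
            targets[anchor.text] = href
    return targets


# Usage with a live driver (not executed here):
# for label, url in link_targets(driver).items():
#     print(label, url)
```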
How to Get the Text From a Dropdown
Let’s try to set the “select your favorite food” dropdown to “Pineapple” and get the text “Pineapple” from it. If we inspect the element by right-clicking it, we will find that the “Pineapple” option sits under the select tag.
The Finxter blog has a separate article on how to select a dropdown menu, which explains the process of finding the select tag element.
We need to import the Select class from selenium.webdriver.support.ui, and the code to get the text “Pineapple” is as follows:
from selenium.webdriver.support.ui import Select
dropdown = driver.find_element_by_id("mySelect")
dropdown.click()
element = Select(dropdown)
element.select_by_index(2)
fruit = driver.find_element_by_id("mySelect").get_attribute("value")
print(fruit)
Here we located the element first and then, with the help of the Select class, selected the Pineapple option from the dropdown by its index. Finally, we used get_attribute("value") on the select element to read the value of the selected option, “pineapple”.
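Note that the value attribute of an option and the label shown to the user are two separate things, and they can differ (for example in capitalization). The sketch below reads both from whichever option is currently selected. The helper name selected_option is made up for illustration, and it assumes a Selenium 4 style web element, where "tag name" is the string behind By.TAG_NAME.

```python
def selected_option(dropdown):
    """Return (visible text, value) of the selected <option>, or None if none is selected."""
    for option in dropdown.find_elements("tag name", "option"):
        if option.is_selected():
            return option.text, option.get_attribute("value")
    return None


# Usage with a live driver (not executed here):
# dropdown = driver.find_element("id", "mySelect")
# print(selected_option(dropdown))
```

The Select class offers a similar shortcut through its first_selected_option property, whose .text gives the visible label.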
That’s all about how to get the text with Selenium in Python. I hope it will now be easier for you to get the text from a webpage.