How to Get the Text with Selenium in Python


When scraping the web or automating a browser, we often need to read the text of an HTML element on the page. Selenium exposes this through the ".text" property of a web element. It is a property, so it is written without parentheses. It returns the text that is visible on the rendered page. This article looks at how it works and how it compares with get_attribute().

Setting Up the Environment

Let us start with the setup. Import the webdriver module from selenium and create a driver object from it. Since we are using the Chrome browser to load the page, we pass the path to chromedriver. After that, maximize_window() gives a better view, driver.get() opens the website, and implicitly_wait(10) sets an implicit wait of 10 seconds. Note that the executable_path keyword is Selenium 3 style; later Selenium 4 releases take the driver path through a Service object instead.

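from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'G:/chromedriver_win32/chromedriver.exe')  # Selenium 3 style
driver.maximize_window()
driver.implicitly_wait(10)  # wait up to 10 seconds when locating elements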
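
For readers on Selenium 4, a minimal sketch of the same setup might look like this. It assumes Selenium 4.6 or later is installed. The helper names make_chrome_driver and open_page are made up for illustration, and the page URL is left as a parameter because the address is not given here.

```python
def make_chrome_driver(driver_path=None):
    """Create a Chrome driver the Selenium 4 way (hypothetical helper)."""
    # Imported inside the function so the sketch can be read without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path is None:
        # Selenium 4.6+ can locate a matching chromedriver by itself.
        return webdriver.Chrome()
    # Otherwise the path goes through a Service object.
    return webdriver.Chrome(service=Service(executable_path=driver_path))


def open_page(driver, url, wait_seconds=10):
    """Maximize the window, set the implicit wait, and load the page."""
    driver.maximize_window()
    driver.implicitly_wait(wait_seconds)  # applies to every later element lookup
    driver.get(url)
    return driver
```

A call would then read open_page(make_chrome_driver(r'G:/chromedriver_win32/chromedriver.exe'), url), where url is the address of the page to open.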

Finding Header Text From a Website with .text

We will read the header text of the "the automation zone" blog. First we locate the element, then we use Selenium's text property to read the header. To find a locator, move the mouse pointer over the webpage, right-click, and choose Inspect from the context menu.

In the HTML, the header carries a class attribute that we can use to locate the element; applying ".text" to it returns the title text. We store the result in a variable named "title". Note that find_element_by_class_name() and the other find_element_by_* helpers used in this article belong to Selenium 3 and were removed in Selenium 4.3, where the equivalent call is driver.find_element(By.CLASS_NAME, 'title') with By imported from selenium.webdriver.common.by.

title = driver.find_element_by_class_name('title').text
print(title)

The title text "the automation zone" is printed in the console.
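
For readers on Selenium 4, the same lookup goes through find_element() with a locator strategy. The following is only a sketch: the helper name visible_text is made up for illustration, and the strategy is written as the plain string "class name", which is the value of By.CLASS_NAME, so that the snippet needs no import of its own.

```python
CLASS_NAME = "class name"  # same value as selenium.webdriver.common.by.By.CLASS_NAME


def visible_text(driver, by, value):
    """Return the rendered text of the first element matching the locator."""
    element = driver.find_element(by, value)
    return element.text  # .text is a property, so no parentheses
```

With a live driver, title = visible_text(driver, CLASS_NAME, 'title') would give the same result as the one-liner above.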

How to Get the Text with get_attribute()

Selenium offers another method, get_attribute(), that can also pull text out of the HTML. It accepts names such as "textContent", "value", and "innerHTML". For instance, to get the text of the third paragraph we can use the following code:

paragraph3 = driver.find_element_by_id('p3').get_attribute("textContent")

Here, after locating the web element, we used get_attribute("textContent") to get the text. The result looks like this:

This is           an example of paragraphs                with a span inside

Difference Between .text and get_attribute()

Notice the output text of paragraph 3 above. It does not look the same as the text visible on the webpage: there are extra spaces between the phrases. This is because the paragraph contains a <span> element and its HTML source is spread over several lines. textContent returns the text as it is written in the source, including the indentation and line breaks inside the element, and does not collapse them the way the browser does when it displays the page.

Now let us get the same text of the third paragraph using ".text":

para3 = driver.find_element_by_id('p3').text

The output will be:

This is an example of paragraphs with a span inside

As we can see, the output text is the same as it appears on the web page. The extra spaces in the HTML source are ignored.

So the main difference is that get_attribute("textContent") returns the text as it is written in the HTML source, while ".text" returns the text as it is displayed on the webpage.
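
The gap between the two results comes down to whitespace handling, which can be imitated without a browser. In the standard-library sketch below, the markup is a made-up stand-in for the third paragraph (the real page source is not shown here), TextCollector is a hypothetical helper, and collapsing whitespace with split() only approximates what a browser does when it renders text.

```python
from html.parser import HTMLParser

# Hypothetical markup standing in for the third paragraph of the page.
SOURCE = """<p id="p3">This is
        <span>an example of paragraphs</span>
        with a span inside</p>"""


class TextCollector(HTMLParser):
    """Collect text nodes exactly as written, similar to textContent."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


collector = TextCollector()
collector.feed(SOURCE)
collector.close()

raw_text = "".join(collector.parts)         # keeps line breaks and indentation
rendered_text = " ".join(raw_text.split())  # collapses each whitespace run to one space

print(repr(raw_text))
print(rendered_text)
```

The first print shows the line breaks and indentation that textContent would keep; the second shows the single-spaced sentence that resembles what ".text" reports.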

How to Get the Text of a URL

The get_attribute() method returns not only the text of an element but also the value of any attribute written in the element's tag. For instance, suppose we need the link attached to the "this is an example of link" part of the webpage.

By inspecting the HTML of the Google link portion of the webpage, we can see that the URL is stored in the href attribute of the <a> tag. We can call get_attribute('href') to get the value of href.

link = driver.find_element_by_id('link').get_attribute('href')

Here, after locating the element by id, we passed 'href' to get_attribute() because that attribute contains the URL of the Google link. It returns the URL as a plain string.

This is a very useful way of getting the text value of an attribute inside an HTML tag.
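
One detail to keep in mind: for href, get_attribute() reports the link as the browser resolved it, so a relative link in the markup comes back as an absolute URL. The sketch below imitates that resolution with the standard library. The page address and the link values are invented for illustration and are not taken from the example page.

```python
from urllib.parse import urljoin

PAGE_URL = "https://example.com/blog/selenium-text/"  # hypothetical page address

# href values as they might be written in the markup (made up)
declared_hrefs = ["https://www.google.com/", "/about", "contact.html"]

# A browser resolves each one against the address of the page it is on.
resolved = [urljoin(PAGE_URL, href) for href in declared_hrefs]

for declared, absolute in zip(declared_hrefs, resolved):
    print(f"{declared!r} -> {absolute}")
```

Selenium 4 also provides get_dom_attribute('href') for reading the value as it is declared in the markup.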

How to Get the Text From a Dropdown

Let's set the "select your favorite food" dropdown to "Pineapple" and get the text "Pineapple" from it. If we inspect the element by right-clicking it, we find that the "Pineapple" option sits under the select tag.

The Finxter blog has an article on "how to select a dropdown menu" that explains how to find the select tag element.

We need to import the Select class, and the code to get the text "Pineapple" is as follows:

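from selenium.webdriver.support.ui import Select
dropdown = driver.find_element_by_id("mySelect")
element = Select(dropdown)
element.select_by_visible_text("Pineapple")  # choose the option by its visible label
fruit = dropdown.get_attribute("value")  # value of the currently selected option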

Here we located the element first, wrapped it in Select(), and chose the "Pineapple" option from the dropdown. Finally, we used get_attribute("value") to read the value of the selected option, "pineapple". Note that this is the option's value attribute, which may be spelled differently from the visible label.
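
Because the value attribute and the visible label of an option are not always the same string, it can help to read both. The helper below is a small sketch that assumes select is a Select object from selenium.webdriver.support.ui; the name selected_value_and_label is made up for illustration.

```python
def selected_value_and_label(select):
    """Return (value attribute, visible text) of the selected option."""
    option = select.first_selected_option  # first selected <option> element
    return option.get_attribute("value"), option.text
```

With the objects from the code above, value, label = selected_value_and_label(element) would return the option's value together with the text shown in the dropdown.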

That's all about how to get the text with Selenium in Python. I hope it is now easier for you to get the text from a webpage.
