5 Best Ways to Get the Inner Text of a Webpage Using JavaScript Executor in Selenium with Python

💡 Problem Formulation: In web automation, you may often need to retrieve the inner text of webpage elements for testing or data extraction. Using Selenium with Python, the text content can be collected efficiently through JavaScript execution. This article explores how to accomplish this by illustrating several JavaScript-based methods to access the DOM’s inner text. Suppose you have loaded a webpage and wish to extract all text within a <div> element with an ID of “content”. The desired output is the textual content without HTML tags.

Method 1: Using getElementById()

The getElementById() function returns the single element with the given ID, and reading that element’s innerText property yields its rendered text. This method is fast and accurate for individual elements with unique IDs and is implemented in Selenium by executing a JavaScript command.

Here’s an example:

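from selenium import webdriver

driver = webdriver.Chrome()
driver.get('http://example.com')  # placeholder URL; use a page that contains <div id="content">
# Run JavaScript in the page and return the element's rendered text to Python
innerText = driver.execute_script('return document.getElementById("content").innerText')
print(innerText)
driver.quit()  # close the browser when finished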

Output:

Hello, World! This is an example text within the content div.

This code snippet launches a new Chrome browser instance, opens a webpage, executes a JavaScript command to find the element with the ID “content”, and retrieves its inner text. The execute_script() method runs the JavaScript in the context of the loaded page and passes the value of the return statement back to Python.
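One caveat with the one-liner: if no element has the requested ID, getElementById() returns null, and reading innerText on null raises a JavaScript error that Selenium reports as an exception. A minimal sketch of a null-safe wrapper follows. The helper name inner_text_by_id is hypothetical, and the sketch assumes driver is a Selenium WebDriver (any object exposing execute_script()). The ID is passed through arguments[0] instead of being spliced into the script string.

```python
def inner_text_by_id(driver, element_id):
    """Return the innerText of the element with the given ID, or None if absent.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver.
    """
    script = (
        # arguments[0] is the first extra value passed to execute_script()
        "var el = document.getElementById(arguments[0]);"
        # Return null (None in Python) instead of throwing when nothing matches
        "return el ? el.innerText : null;"
    )
    return driver.execute_script(script, element_id)
```

With a live session, inner_text_by_id(driver, "content") would return the text of the “content” div, or None when the page has no such element.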

Method 2: Using querySelector()

The querySelector() JS function is versatile: it accepts any CSS selector and returns the first element that matches. Through Selenium’s JavaScript execution ability, it lets you pinpoint a specific element by class, attribute, or position in the document and fetch its inner text content.

Here’s an example:

innerText = driver.execute_script('return document.querySelector(".my-class").innerText')

Output:

Text found using the .my-class selector.

This line of code uses JavaScript within Selenium to select the first element with the class “my-class” and get its inner text. It’s a more flexible approach to targeting elements without relying on an ID.
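Because querySelector() stops at the first match, it can help to keep a second helper around for cases where every match is wanted. The sketch below is illustrative only: both function names are hypothetical, driver is assumed to be a Selenium WebDriver, and the second helper swaps in querySelectorAll(), which is a different DOM call from the one shown above.

```python
def inner_text_by_selector(driver, css_selector):
    """Return the innerText of the first element matching css_selector, or None."""
    script = (
        "var el = document.querySelector(arguments[0]);"
        "return el ? el.innerText : null;"
    )
    return driver.execute_script(script, css_selector)


def inner_texts_by_selector(driver, css_selector):
    """Return a list with the innerText of every element matching css_selector."""
    script = (
        # querySelectorAll() returns a static NodeList; Array.from() makes it mappable
        "return Array.from(document.querySelectorAll(arguments[0]))"
        ".map(function (el) { return el.innerText; });"
    )
    return driver.execute_script(script, css_selector)
```

For instance, inner_text_by_selector(driver, ".my-class") mirrors the one-liner above, while inner_texts_by_selector(driver, "#content p") would collect the text of each paragraph inside the “content” div.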

Method 3: Using getElementsByTagName()

This method retrieves a live HTMLCollection of elements with a given tag name and gets their inner text using JavaScript. It is commonly used when dealing with multiple elements sharing the same tag.

Here’s an example:

innerTexts = driver.execute_script(
    'return Array.from(document.getElementsByTagName("p")).map(function(el) {return el.innerText;});'
)
print(innerTexts)

Output:

['First paragraph text.', 'Second paragraph text.', 'Third paragraph text.']

The code utilizes JavaScript to convert the collection of <p> elements to an array and maps through it to compile an array of inner text strings, retrieved via Selenium’s JavaScript executor.
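Selenium converts the returned JavaScript array into an ordinary Python list of strings, so any tidying can happen on the Python side. Real pages often contain empty paragraphs or stray whitespace, so a small post-processing step is a common follow-up. The helper name clean_texts and the sample list below are made up for illustration.

```python
def clean_texts(texts):
    """Strip surrounding whitespace and drop entries that are empty or None."""
    cleaned = []
    for text in texts:
        if text is None:
            continue
        stripped = text.strip()
        if stripped:  # skip strings that were empty or whitespace only
            cleaned.append(stripped)
    return cleaned


# Illustrative input, standing in for the list returned by execute_script()
raw = ["First paragraph text.", "  ", "", "Second paragraph text.\n"]
print(clean_texts(raw))
# ['First paragraph text.', 'Second paragraph text.']
```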

Method 4: Using getElementsByClassName()

Similar to getElementsByTagName(), this JavaScript method gets elements by their class name. It’s useful for batch retrieval of inner text from elements sharing the same class.

Here’s an example:

innerTexts = driver.execute_script(
    'return Array.from(document.getElementsByClassName("common-class")).map(el => el.innerText);'
)

Output:

['First element text.', 'Second element text.']

The script finds all elements with “common-class” and extracts their inner text. Array.from() converts the HTMLCollection to an array, and a concise arrow function maps each element to its text content.
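getElementsByClassName() also accepts several class names separated by spaces and then matches only elements that carry all of them. A short sketch of a wrapper built on that behavior follows. The function name inner_texts_by_classes and the extra class name “highlighted” used in the usage note are hypothetical, and driver is assumed to be a Selenium WebDriver.

```python
def inner_texts_by_classes(driver, *class_names):
    """Return the innerText of every element that has all the given classes."""
    # A space-separated string asks for elements carrying every listed class
    class_arg = " ".join(class_names)
    script = (
        "return Array.from(document.getElementsByClassName(arguments[0]))"
        ".map(function (el) { return el.innerText; });"
    )
    return driver.execute_script(script, class_arg)
```

Calling inner_texts_by_classes(driver, "common-class") behaves like the example above, while inner_texts_by_classes(driver, "common-class", "highlighted") would narrow the result to elements that have both classes.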

Bonus One-Liner Method 5: Using XPath Expression

This succinct method evaluates an XPath expression with document.evaluate() and reads the text of the first matching element straight from the result, all in a single call through Selenium’s JavaScript executor.

Here’s an example:

innerText = driver.execute_script('return document.evaluate("//div[@id=\'content\']", document, null, XPathResult.STRING_TYPE, null).stringValue;')

Output:

This is content via XPath.

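This compact code evaluates an XPath expression within Selenium’s JavaScript executor and returns the string value of the first element with the ID “content”. That string value is the concatenated text of all descendant text nodes (equivalent to textContent), so it can differ from innerText when the element contains text hidden by CSS.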
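If the CSS-aware innerText is specifically what you need from an XPath lookup, one option is to ask document.evaluate() for the node itself and then read its innerText property. The following is a sketch under that assumption, not a replacement for the one-liner: the helper name inner_text_by_xpath is hypothetical, and driver is assumed to be a Selenium WebDriver.

```python
def inner_text_by_xpath(driver, xpath):
    """Return the innerText of the first node matching the XPath, or None."""
    script = (
        # FIRST_ORDERED_NODE_TYPE yields the first match in document order
        "var node = document.evaluate(arguments[0], document, null, "
        "XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue;"
        # singleNodeValue is null when the expression matches nothing
        "return node ? node.innerText : null;"
    )
    return driver.execute_script(script, xpath)
```

For the running example, inner_text_by_xpath(driver, "//div[@id='content']") targets the same element as the one-liner. Passing the expression as an argument also avoids the backslash-escaped quotes needed when it is embedded in the script string.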

Summary/Discussion

  • Method 1: getElementById(). Fast and precise, ideal for unique elements. Limited to ID selectors.
  • Method 2: querySelector(). Flexible CSS-like selector capability. Gets only the first matching element.
  • Method 3: getElementsByTagName(). Suitable for collecting texts from multiple items with the same tag. Requires converting the HTMLCollection to an array, which adds overhead on very large collections.
  • Method 4: getElementsByClassName(). Batch retrieves inner texts for elements with a common class. Has the same conversion overhead on very large collections.
  • Method 5: XPath Expression. Concise one-liner suitable for complex element selection scenarios, but XPath requires precise syntax and understanding of document structure. With XPathResult.STRING_TYPE, the result is the first matching node’s text content, which may include text hidden by CSS.