In this article, I’ll share my method of retrieving the first n
Google search results for a given keyword and extract the text content from the resulting URLs.
The search_google(query, num_results)
function performs a search using the Google Custom Search JSON API, with the search query and the desired number of results (num_results
) as parameters.
Upon a successful API request (HTTP Status Code 200
), the function extracts and prints titles, URLs, and snippets of the search results.
Subsequently, it employs the get_texts_from_urls
function to retrieve the text content from each URL, which internally uses get_text_from_url
to send a GET request to each URL, parse the HTML using BeautifulSoup, and extract the text. The extracted texts are then printed along with their respective lengths.
The API key (google_search_api
) and Custom Search Engine ID (google_search_cx
) are needed to authenticate and specify the search engine, respectively. Both are free though. π
π§βπ» Note: The API key and Custom Search Engine ID should be obtained from Google Cloud Console and Programmable Search Engine, respectively, and should be kept confidential to prevent unauthorized usage. Both are free for up to 100 requests per hour or so.
Otherwise, you can just copy and paste the following code into your Jupyter notebook (or Google Colab):
import requests from bs4 import BeautifulSoup keyword = 'get stock price in excel cell' google_search_api = '...' # https://developers.google.com/custom-search/v1/introduction google_search_cx = '...' # https://programmablesearchengine.google.com/controlpanel/create def search_google(query, num_results=10): '''Returns a list of strings with the text of the first n google results''' # Google Custom Search JSON API URL api_url = "https://www.googleapis.com/customsearch/v1" # Parameters for the API request params = { 'key': google_search_api, # Your API key 'cx': google_search_cx, # Your Custom Search Engine ID 'q': query, # The search query 'num': num_results # The number of results to retrieve } # Make the API request response = requests.get(api_url, params=params) # Check if the request was successful (HTTP Status Code 200) if response.status_code == 200: # Parse the JSON response results = response.json() titles = [item['title'] for index, item in enumerate(results.get('items', []))] print(titles) urls = [item['link'] for index, item in enumerate(results.get('items', []))] print(urls) snippets = [item['snippet'] for index, item in enumerate(results.get('items', []))] print(snippets) # Get the Text of all URLs texts = get_texts_from_urls(urls) print(texts) for text in texts: print(len(text)) else: print("Failed to retrieve results:", response.status_code, response.text) def get_text_from_url(url): try: # Send a GET request to the URL response = requests.get(url) # Check if the request was successful (HTTP Status Code 200) if response.status_code == 200: # Parse the HTML content of the page with BeautifulSoup soup = BeautifulSoup(response.text, 'html.parser') # Extract text from the parsed HTML return soup.get_text(separator="\n", strip=True) else: print(f"Failed to retrieve content from {url}. Status code: {response.status_code}") return None except Exception as e: print(f"An error occurred while fetching {url}: {str(e)}") return None def get_texts_from_urls(urls): texts = [] for url in urls: text = get_text_from_url(url) if text: texts.append(text) return texts search_google(keyword, 5)
If you don’t use Google Colab Notebook to run the code (e.g., in your own .py
script), you may need to install BeautifulSoup and Requests — check out the linked articles for a quick tutorial on accomplishing this.
For the provided keyword 'get stock price in excel cell'
, the output of this code was as follows:
['Get a stock quote - Microsoft Support', 'The formula that returns real-time stock prices', 'Excel data types: Stocks and geography - Microsoft Support', 'Add Real-Time Stock Prices And Metrics To Excel', 'Get stock price (latest close) - Excel formula | Exceljet'] ['https://support.microsoft.com/en-us/office/get-a-stock-quote-e5af3212-e024-4d4c-bea0-623cf07fbc54', 'https://www.morningbrew.com/money-scoop/stories/2021/08/24/formula-returns-realtime-stock-prices', 'https://support.microsoft.com/en-us/office/excel-data-types-stocks-and-geography-61a33056-9935-484f-8ac8-f1a89e210877', 'https://www.thespreadsheetguru.com/add-real-time-stock-prices-and-metrics-to-excel/', 'https://exceljet.net/formulas/get-stock-price-latest-close'] ["Type some text in cells. Β· Then select the cells. Β· Although it's not required, we recommend creating an Excel table. Β· With the cells still selected, go to the\xa0...", 'Aug 24, 2021 ... In Excel: Yep, Excel can return stock prices, too. Enter each ticker in its own cell > Highlight the cells > Select βDataβ > βStocksβ > Tap\xa0...', '... price are getting extracted from the Geography data type in column A. Type some text in cells. If you want stock information, type a ticker symbol, company\xa0...', 'Jul 28, 2019 ... If you click the Convert to
...
problem with this page\nHelp us improve Exceljet\nYou must have JavaScript enabled to use this form.\nName\nEmail\nProblem\n*\nYour email address is private and not shared.\nExceljet\nQuick, clean, and to the point\nResources\nTraining\nVideos\nFunctions\nFormulas\nShortcuts\nArticles\nAbout Us\nAbout\nTestimonials\nTopics\nContact\nNewsletter Sign-up\nWork faster in Excel.\nJoin more than 100,000\npeople who get weekly tips from us.\nEmail\nHP\nΒ© 2012-2023 Exceljet.\nTerms of use\nVisit our Twitter page'] 6189 9885 8960
Thanks for reading the article and feel free to join our 100% free email academy by downloading our Python cheat sheets: