How to Get HTML Source in Selenium?

Using Selenium for web scraping often requires obtaining the HTML source of a webpage. This is especially important when dealing with dynamic websites where the content changes based on user interactions or JavaScript execution. Selenium, a powerful tool for web automation, makes this task straightforward.

To get the HTML source of a webpage using Selenium, read the page_source property of the WebDriver object. It returns the HTML of the current page as a string, which can then be parsed or processed as needed. Below is an example of how to get the HTML source in Selenium with Python:

from selenium import webdriver

# Set up the WebDriver (using Chrome in this example)
driver = webdriver.Chrome()

# Navigate to the desired webpage
driver.get('https://www.example.com')

# Get the HTML source of the page
html_source = driver.page_source

# Print the HTML source
print(html_source)

# Close the WebDriver
driver.quit()
    

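In this example, the WebDriver navigates to a specified URL, retrieves the HTML source using page_source, and then prints it. This method is useful for scraping dynamic websites because page_source generally reflects the page as the browser holds it at the moment of the call, including changes made by JavaScript that has already run. Content that loads asynchronously may still be missing if page_source is read immediately after driver.get() returns, so it is often worth waiting for a known element to appear first.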
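For pages that fill in content after the initial load, one possible approach is to wait explicitly for an element before reading page_source, and then parse the returned string. The following is a minimal sketch rather than a definitive implementation: the helper names fetch_rendered_html and extract_links, the CSS selector argument, and the 10-second default timeout are all assumptions made for illustration and are not part of Selenium itself. The parsing step uses Python's built-in html.parser module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Hypothetical helper: return all link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_rendered_html(url, css_selector, timeout=10):
    """Hypothetical helper: open url, wait for css_selector, return page_source."""
    # Selenium is imported here so extract_links() can be used without it installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until the element is present in the DOM; raises
        # TimeoutException if it does not appear within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

With these helpers, calling fetch_rendered_html('https://www.example.com', 'h1') and passing the result to extract_links() would return the link targets found in the page source. The selector should be chosen to match an element that only appears once the content you need has loaded.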

Conclusion

Using Selenium for web scraping allows you to interact with web elements, simulate user actions, and retrieve data from dynamic websites. However, building and maintaining your own scrapers can be time-consuming and complex. Instead, you can leverage Bright Data’s web scraping APIs to scrape websites with ease. These APIs handle all the intricacies of web scraping, providing structured data via an API to any application, saving you the hassle of managing scrapers and ensuring high-quality results.

 
