Web Scraping with Selenium and Python – Beginners Guide – Installation and Setup

Introduction

Selenium is one of the most widely used frameworks for web scraping. Modern websites often render their content dynamically because it makes for a better user experience. For us web scrapers, however, that is more challenging, and libraries like Requests and Scrapy won’t get the job done in such situations.

Overview

In this first article, we explain how to install Selenium and how to use it with Google Chrome and the Python programming language.

Setting everything up requires four things installed on your computer: Google Chrome, Chrome WebDriver, Python and Selenium.

How does Selenium work?

Selenium allows us to automate the browser, meaning that anything we can do manually can be automated. That is useful for web testing. More than that, it gives us access to the HTML of the page it renders, which is what we use for web scraping.
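As a rough illustration of that last point, the sketch below loads a page and returns the HTML once the browser has rendered it. The names fetch_rendered_html and demo, and the example.com URL, are placeholders of mine; only driver.get() and driver.page_source come from Selenium. It assumes the installation described later in this article is already done.

```python
def fetch_rendered_html(driver, url):
    """Load `url` in the automated browser and return the rendered HTML."""
    driver.get(url)            # navigate, just as typing the URL would
    return driver.page_source  # HTML of the page as the browser currently holds it


def demo():
    # Imported here so the helper above can be read and reused on its own.
    from selenium import webdriver

    # Assumes Chrome and a matching WebDriver can be found on this machine.
    driver = webdriver.Chrome()
    try:
        html = fetch_rendered_html(driver, "https://example.com")  # placeholder URL
        print(html[:200])
    finally:
        driver.quit()  # always close the browser window
```

Calling demo() should open a Chrome window, print the first 200 characters of the page’s HTML and close the window again.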

Why use Selenium?

When it comes to scraping dynamic websites (websites that use JavaScript to render pages or sections of pages) or automating interactions with the browser, a framework like Selenium is necessary. There are alternatives to this framework, but it is among the most popular.

Like any other framework, Selenium has its pros and cons. One should always experiment with several technologies to get a clear idea of which tools are suitable for a specific job.

Pros and Cons of using Selenium

Pros

  • Easy to use
  • Can be used for both web scraping and testing

Cons

  • Relatively slower than other frameworks
  • Installation process can be challenging for beginners (if Chrome WebDriver is set up manually)

Alternatives to Selenium

The most popular alternative is Splash: in terms of popularity, Selenium comes first and Splash second.

One advantage of Splash over Selenium is that it is faster.

Installation and Setup

For the installation process, we first install Selenium using pip (assuming Python is installed on your system), then Chrome, and finally the WebDriver, either manually or by using a package.

Installing Selenium

Installing Selenium itself is a straightforward operation. Open the Terminal or the Command Line and type the following:

pip install selenium 
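To confirm that pip really installed the package, one option is to ask Python for the installed version. This is a small sketch using only the standard library (Python 3.8 or later); the helper name installed_version is my own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("selenium:", installed_version("selenium") or "not installed")
```

If this prints "not installed", pip most likely ran under a different Python interpreter than the one executing the script.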

Installing Google Chrome

Install Google Chrome directly from the official website.

Installing Chrome WebDriver

There are two ways to do it.

1 – Install manually (the challenging part)

After installing Google Chrome, you will have to know the version that you have. Then install the corresponding version of the Chrome WebDriver.

How to know what version of Google Chrome you have installed?

In Chrome, click on the three dots at the top right, then click on Settings, and finally click on About Chrome.

Google Chrome version

In my case, as you can see in the screenshot above, I have version 93.0.4577.63 (at the time of writing this article).

I recommend that you always update Chrome to the latest version.

After that, we download the WebDriver corresponding to that version.
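To make "corresponding" a little more concrete, the sketch below compares two version strings by their first number only. That is a simplification I am assuming for illustration; the download page lists full version numbers, and the driver version used here is hypothetical.

```python
def major_version(version_string):
    """Return the first number of a dotted version string, e.g. 93 for '93.0.4577.63'."""
    return int(version_string.split(".")[0])


def versions_match(chrome_version, driver_version):
    """Simplified check: Chrome and the WebDriver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


chrome = "93.0.4577.63"  # version shown on the About Chrome page in this article
driver = "93.0.4577.15"  # hypothetical WebDriver version, for illustration only
print(versions_match(chrome, driver))  # True
```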

Go to the following website, and click on the version you have.

Chrome WebDriver website - All versions

It will take you to this page. Click on the file corresponding to your operating system to start the download.

Chrome WebDriver website - download

Once the file is downloaded, unzip it and put it in a folder that you will remember, because we will need to access it via its path.

2 – Using a package (the easier way)
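Since everything later depends on that path being right, it can help to check it before handing it to Selenium. The following is a minimal sketch using only the standard library; the function name check_driver_path and the example path are hypothetical.

```python
import os
from pathlib import Path


def check_driver_path(path):
    """Return a short problem description, or None if the path looks usable."""
    p = Path(path)
    if not p.is_file():
        return f"no file found at {p}"
    if not os.access(p, os.X_OK):
        return f"{p} is not executable"
    return None


# Hypothetical location; replace it with the folder you unzipped the driver into.
problem = check_driver_path("drivers/chromedriver")
print(problem or "driver path looks fine")
```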

The second way to set up the Chrome WebDriver is the webdriver-manager package. This package automates the whole process for you: it detects which version of Chrome you have, downloads the corresponding WebDriver into a folder that it manages, and then links to it for you.

I recommend that you use this method.

Webdriver-manager package used with selenium

To install the webdriver-manager package, type the command below in your Terminal and press Enter:

pip install webdriver-manager

That’s all you need to do for now. This package only manages the WebDriver; the actual WebDriver will be downloaded later, when we call it in the Python script.

The different ways Selenium is used

Selenium is used either alone, meaning with no additional package, or together with other packages like Scrapy or Requests with BeautifulSoup.

For most projects, using it alone is sufficient, but sometimes we come across websites that are not entirely dynamic, and using Selenium alone to scrape everything would make the process slower. That’s why we combine technologies in such situations. For example, Selenium would handle the dynamic part and BeautifulSoup would handle the static part.

In this post we will try it alone.
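One way that split could look is sketched below: Selenium renders the page, and BeautifulSoup then parses the resulting HTML as static text. It assumes the beautifulsoup4 package is installed (pip install beautifulsoup4); the function names and the choice of h2 tags are mine, purely for illustration.

```python
def extract_headings(html):
    """Parse already-rendered HTML and return the text of every <h2> tag."""
    # Imported inside the function so this file loads even without beautifulsoup4.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2")]


def scrape_headings(driver, url):
    """Selenium handles the dynamic part, BeautifulSoup the static part."""
    driver.get(url)                              # the browser runs the page's JavaScript
    return extract_headings(driver.page_source)  # parsing happens outside the browser
```

Here driver is a Selenium driver object created as shown in the Getting started section of this article.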

Getting started with Selenium

Now that we have installed everything we need, we will try out Selenium with both methods.

Using Chrome WebDriver

To use Selenium with the downloaded WebDriver you start with the boilerplate code below:

# import the webdriver module and the Service class
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# save the path in a variable
PATH = 'your Chrome WebDriver path'

# instantiate the driver
# (Selenium 4 takes the path through a Service object;
# webdriver.Chrome(PATH) only works on older releases)
driver = webdriver.Chrome(service=Service(PATH))

We first imported the webdriver module and the Service class from Selenium, then saved the path of the downloaded Chrome WebDriver in a variable we named PATH. Finally, we instantiated the driver by calling webdriver.Chrome() and passing it a Service built from PATH. The driver object is what allows us to automate the browser.

Using webdriver-manager

The way we use Selenium with the webdriver-manager is as follows:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

We first import the webdriver module and the Service class from Selenium, then import ChromeDriverManager from webdriver_manager.chrome. Finally, we instantiate the driver using webdriver.Chrome(service=Service(ChromeDriverManager().install())), where install() downloads the WebDriver if needed and returns its path.

You will see messages like these in your Command Prompt, reporting the WebDriver’s installation (lines 3 to 7):

Command Prompt showing Chrome WebDriver being installed to be used with Selenium

After that, this Chrome window will pop up.

Google Chrome automated using Selenium

In the Chrome window you can read the line "Chrome is being controlled by automated test software", which means that Selenium and WebDriver are set up successfully.
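Once the window is up, a natural next step is to visit a page and then close the browser cleanly. The helper below is only a sketch: the name show_title_and_quit and the URL in the usage note are my own, while driver.get(), driver.title and driver.quit() are part of Selenium.

```python
def show_title_and_quit(driver, url):
    """Open `url`, return the page title, and always close the browser."""
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # closes every window and ends the WebDriver process
```

With a driver created by either method above, print(show_title_and_quit(driver, "https://example.com")) should print the page’s title and then close the Chrome window.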

Conclusion

In this post we have learned how to set up Selenium and WebDriver to work with Google Chrome and Python. We have seen that setting up WebDriver via the webdriver-manager package is more beginner friendly. In the next posts of this series, we will dive deeper and learn how to use the framework better.

Check out my post on how to get started with BeautifulSoup here.

Thank you for reading my post.