Selenium is one of the most important frameworks for web scraping. Modern websites often render their content dynamically because it makes for a better user experience. For us web scrapers, however, that is more challenging, and libraries like Requests and Scrapy alone won’t get the job done in such situations.
In this first article, we explain how to install Selenium and use it with Google Chrome and the Python programming language.
Setting everything up requires four things installed on your computer: Google Chrome, Chrome WebDriver, Python, and Selenium.
How does Selenium work?
Selenium allows us to automate the browser, meaning that anything we can do manually can be automated. That is useful for web testing. Beyond that, it lets us retrieve the HTML of the page it renders, which is what we use for web scraping.
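As a rough illustration of that idea, the sketch below loads a page and returns the HTML the browser ended up with. The helper name fetch_rendered_html is made up for this example, and it assumes you pass it a driver that is already running (setting one up is covered later in this post).

```python
def fetch_rendered_html(driver, url):
    """Load `url` in a running Selenium driver and return the rendered HTML."""
    # Navigate to the page, just as a user typing the address would.
    driver.get(url)
    # page_source holds the HTML of the page as the browser currently has it.
    return driver.page_source
```

With a real browser this would be called as fetch_rendered_html(driver, "https://example.com"), followed by driver.quit() to close the window.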
Why use Selenium?
Like any other framework, Selenium has its pros and cons. One should always experiment with several technologies to get a clear idea of which tools suit a specific job.
Pros and Cons of using Selenium
Pros:
- Easy to use
- Can be used for both web scraping and testing
Cons:
- Relatively slower than other frameworks
- Installation can be challenging for beginners (if Chrome WebDriver is set up manually)
Alternatives to Selenium
The most popular alternative is Splash, which ranks second to Selenium in popularity.
One advantage of Splash over Selenium is that it is faster.
Installation and Setup
For the installation process, we first install Selenium using pip (assuming Python is installed on your system), then Chrome, and finally WebDriver, either manually or by using a package.
Installing Selenium itself is straightforward: open the Terminal or the Command Line and type the following:
pip install selenium
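To confirm that the install worked, one option is to ask Python which version it can see. This is only a small sketch using the standard library; the helper name installed_version is an assumption made for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the Selenium version if pip installed it, otherwise "not installed".
print("selenium:", installed_version("selenium") or "not installed")
```

The same helper can be pointed at webdriver-manager once that package is installed.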
Installing Google Chrome
Install Google Chrome directly from the official website.
Installing Chrome WebDriver
There are two ways:
1 – Install manually (the challenging part)
After installing Google Chrome, you need to know which version you have. Then download the corresponding version of the Chrome WebDriver.
How do you know which version of Google Chrome you have installed?
In Chrome, click on the three dots at the top right, then click on Settings, and finally click on About Chrome.
In my case, I have version 93.0.4577.63 (at the time of writing this article).
I recommend that you always update Chrome to the latest version.
After that, we download the WebDriver corresponding to that version.
Go to the following website, and click on the version you have.
It will take you to a page where you click on the file corresponding to your operating system to start the download.
Once the file is downloaded, unzip it and put it in a folder that you will remember; we will need to access it via its path.
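If you are unsure whether a driver download "corresponds" to your browser, a quick comparison can help. The sketch below assumes that corresponding means the same major, minor, and build numbers (the first three parts), with only the last part free to differ; that is my reading of ChromeDriver's version guidance for Chrome releases of this era, not something stated in this post. The function names and the driver versions passed in are illustrative.

```python
def build_prefix(version_string):
    """Return the first three numeric parts of a version, e.g. '93.0.4577'."""
    return ".".join(version_string.strip().split(".")[:3])


def driver_matches_chrome(chrome_version, driver_version):
    """True when both versions share the same major, minor, and build numbers."""
    return build_prefix(chrome_version) == build_prefix(driver_version)


# The Chrome version is the one from this article; the driver versions are example inputs.
print(driver_matches_chrome("93.0.4577.63", "93.0.4577.15"))   # True
print(driver_matches_chrome("93.0.4577.63", "92.0.4515.107"))  # False
```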
2 – Using a package (the easier way)
The second way to get the Chrome WebDriver is the webdriver-manager package. This package automates the whole process for you: it detects which version of Chrome you have, downloads the corresponding driver into a folder that it manages, then links to it for you.
I recommend that you use this method.
To install the webdriver-manager package, type the command below in your Terminal and hit Enter:
pip install webdriver-manager
That’s all you need to do for now. This package lets us manage the WebDriver; the actual WebDriver will be downloaded later, when we call it in the Python script.
The different ways Selenium is used
Selenium is used either alone, meaning with no additional package, or together with other packages like Scrapy, or Requests with BeautifulSoup.
For most projects, using it alone is sufficient, but sometimes we come across websites that are not entirely dynamic, and using Selenium alone to scrape everything would make the process slower. That’s why we combine technologies in such situations. For example, Selenium would handle the dynamic part and BeautifulSoup would handle the static part.
In this post we will test it alone.
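To make that split concrete, here is a minimal sketch of the hand-off: Selenium renders the page, and a separate parser works on the resulting HTML. To keep the example free of extra installs, it uses Python's built-in html.parser as a stand-in for BeautifulSoup; the names LinkCollector, extract_links, and scrape_links are invented for this illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in a piece of HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Static step: parse HTML that has already been rendered."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def scrape_links(driver, url):
    """Dynamic step: let Selenium render the page, then hand the HTML to the parser."""
    driver.get(url)
    return extract_links(driver.page_source)


# The parsing half can be tried without a browser:
print(extract_links('<p><a href="/page-1">One</a> <a href="/page-2">Two</a></p>'))
# ['/page-1', '/page-2']
```

In a real project, extract_links would be the place to swap in BeautifulSoup, while scrape_links stays the only function that touches the browser.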
Getting started with Selenium
Now that we have installed everything we need, we will try out Selenium with both methods.
Using Chrome WebDriver
To use Selenium with the downloaded WebDriver, start with the boilerplate code below:
# import the webdriver module and the Service helper
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# save the path in a variable
PATH = 'your Chrome WebDriver path'

# instantiate the driver
driver = webdriver.Chrome(service=Service(PATH))
We first imported the webdriver module and the Service class from Selenium, then saved the path of the downloaded Chrome WebDriver in a variable named PATH. Finally, we instantiated the driver by calling webdriver.Chrome() and passing it a Service built from PATH; recent Selenium 4 releases no longer accept the path as a bare positional argument. The driver object is what allows us to automate the browser.
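A wrong path is the most common stumbling block with the manual route, so it can be worth checking the path before handing it to Selenium. The following is only a sketch; driver_path_problem is a hypothetical helper, and the path string is the same placeholder used above.

```python
from pathlib import Path


def driver_path_problem(path):
    """Return a short description of what is wrong with `path`, or None if it looks usable."""
    candidate = Path(path)
    if not candidate.exists():
        return f"{candidate} does not exist"
    if candidate.is_dir():
        return f"{candidate} is a folder; point to the chromedriver file inside it"
    return None


# Replace the placeholder with the real location of your downloaded driver.
problem = driver_path_problem("your Chrome WebDriver path")
print(problem or "path looks fine")
```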
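The way we use Selenium with the webdriver-manager is as follows:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# download (or reuse) a matching driver and start Chrome with it
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
We first import the webdriver module and the Service class from selenium, then import ChromeDriverManager from webdriver_manager.chrome. Finally, we instantiate the driver using webdriver.Chrome(service=Service(ChromeDriverManager().install())); the install() call returns the path of the driver it downloaded, which Service then passes on to Selenium 4.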
When you run the script, you will see messages in your Command Prompt reporting the WebDriver's download and installation.
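Once a driver object exists, a simple first check is to open a page and read its title. The sketch below assumes a driver created with either method above; page_title is a name chosen for this example, and the URL in the usage note is just a placeholder.

```python
def page_title(driver, url):
    """Open `url`, return the page title, and always close the browser afterwards."""
    try:
        driver.get(url)
        return driver.title
    finally:
        # quit() closes every window and ends the WebDriver session.
        driver.quit()
```

Calling print(page_title(driver, "https://example.com")) should open a Chrome window briefly, print the title, and close the browser even if loading the page fails.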
In this post we have learned how to set up Selenium and WebDriver to work with Google Chrome and Python. We have seen that setting up WebDriver via the webdriver-manager package is more beginner-friendly. In the next posts in this series, we will dive deeper and learn how to use the framework better.
Check out my post on how to get started with Beautifulsoup here.
Thank you for reading my post.