How does web scraping work? Explanation for programmers

In this article we take a technical, top-level look at how web scraping works under the hood and cover the fundamentals of web scraping in Python using two very common packages: requests and bs4 (beautifulsoup).

General Overview

First we will explain why we use both requests and beautifulsoup, then set up our workspace by installing the required packages. After that, we will go over HTML tags and CSS attributes, since we target them to scrape data. Then we will look at the browser’s inspector tool, which shows the HTML and CSS applied to an element so that we can target it. Finally, we will move on to writing code in Python.

Why do we need both requests and beautifulsoup for web scraping?

requests sends a request to the server and gets back a response containing the web page’s source code. Its job ends there.

Then beautifulsoup takes that web page, parses it into a format that we can work with, and extracts the data we specify using its search methods.

That’s why we need both packages for web scraping.
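
As a minimal sketch of that division of labor, the snippet below separates the two stages into two functions. The HTML string is a made-up stand-in for what requests would return, so the example runs without a network connection; the names fetch_html, parse_title and SAMPLE_HTML are illustrative assumptions, not part of either library.

```python
import requests
from bs4 import BeautifulSoup

# Made-up page standing in for a real server response (assumption).
SAMPLE_HTML = "<html><body><h1>Hello, scraper</h1><p>First paragraph.</p></body></html>"


def fetch_html(url):
    """Stage 1 (requests): send a GET request and return the page source as text."""
    response = requests.get(url, timeout=10)
    return response.text


def parse_title(html):
    """Stage 2 (beautifulsoup): parse the source and pull out the <h1> text."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("h1").text


# Only stage 2 is exercised here, so no request is actually sent.
print(parse_title(SAMPLE_HTML))
```

On a real site, the string returned by fetch_html(url) would take the place of SAMPLE_HTML.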

Setting up the workspace

First we will install requests and bs4 with pip using the following command (Python needs to be installed on your system):

pip install requests bs4

HTML tags and CSS Attributes – Overview for web scraping

Before we start extracting the data, we need a basic understanding of HTML and CSS, since we will use their properties to extract the data from the website.

HTML stands for Hypertext Markup Language. It defines the structure of the elements on a web page.

Each type of element has its own tag: for example, <p> is for a paragraph, <img> for an image, <div> for a section, <h1> for a header, etc. When scraping, we target these specific tags. For example, if we want a paragraph we target the <p> tag.

However, a web page can contain multiple paragraphs and therefore multiple <p> tags. That’s one of the reasons why we need CSS attributes.

CSS stands for Cascading Style Sheets and we use it to style web pages. The most common way to use CSS is to give classes and ids to HTML tags and then style the elements based on those classes and ids.

As web scrapers, we use those classes and ids, and sometimes other attributes, to target elements on a web page and then extract the data.
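
To make the difference concrete, here is a small sketch of targeting by tag alone versus tag plus class or id. The HTML and its class and id names (intro, body-text, footer-note) are invented for illustration and do not come from any real page.

```python
from bs4 import BeautifulSoup

# Made-up HTML with three paragraphs (assumption, not a real page).
html = """
<div>
  <p class="intro">Welcome to the page.</p>
  <p class="body-text">Some details.</p>
  <p id="footer-note">Last updated today.</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Tag only: matches every <p>, so we get all three paragraphs.
all_paragraphs = soup.find_all("p")
print(len(all_paragraphs))

# Tag + class: narrows the match to the paragraph with class "intro".
# (class is a reserved word in Python, hence the trailing underscore.)
intro = soup.find("p", class_="intro")
print(intro.text)

# Tag + id: ids are meant to be unique on a page, so this targets one element.
note = soup.find("p", id="footer-note")
print(note.text)
```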

With all that being said, how do we know what tags, classes and ids are used for each element on a specific web page? That’s where the inspector tool comes in.

The inspector tool – Overview for web scraping

The inspector tool is built into almost every modern browser. It serves many purposes, one of which is displaying the HTML and CSS of web pages.

Using that, we can find each element’s HTML tag or tags and CSS attributes. Once we know them, we translate that into Python code and scrape that element, and we repeat this for every element we want to extract.

How to access the inspector tool?

We will show an example on Python’s Wikipedia page that’s accessed using this URL:

On your browser:

  • Right click on the page
  • When a menu appears, choose the last item, which is “Inspect”

Let’s have a quick overview of the inspector:

What’s really important to a beginner is the Elements tab, since it shows the HTML tags and CSS attributes of all the elements on the web page.

In the following video, we can see the web page on the left and the inspector (Elements tab selected) on the right.

All the code you see in the Elements tab is the HTML code of that page, and that’s where we can find each element’s tag and CSS attributes.

Now, how do we find each element using the browser’s inspector tool?

We find each specific element as follows:

  • First, click on the select icon in the inspector.
  • Then, click on the desired element on the web page.
  • Finally, the element’s HTML tags will be highlighted in the Elements tab, as shown in blue at the end of the video.

Let’s talk about what was highlighted:

In the video, the title is represented by the following HTML:

<h1 id="section_0">Python (programming language)</h1>

This means that the title has an HTML tag of <h1> and an id with the value section_0. We use that specific information to scrape that specific element.

We do that for all the elements that we want to scrape.

Extracting data using python – The web scraping code

Now we can start the work. Open a python file (.py) and import requests and beautifulsoup (I named my python file

# import requests and beautifulsoup
import requests
from bs4 import BeautifulSoup

Notice that we only imported BeautifulSoup from bs4.

Now we can make our first scraper.

# assigning the URL
URL = ''

# sending a request to that URL and saving the response
response = requests.get(URL)

# parse the web page using beautifulsoup and Python's built-in html.parser
page = BeautifulSoup(response.content, 'html.parser')

# searching for the title and assigning its value
title = page.find('h1', {'id': 'section_0'}).text

# printing the title
print(title)

Let’s look at what we did.

First, we initialized the URL of the page we want to scrape.

After initializing URL, we sent a request using the requests.get() method and saved the response (the returned web page) in the response variable.

Next, we parsed the response using BeautifulSoup(), with html.parser as the parser argument, and stored the result in a variable we named page.

Now, the page variable is a beautifulsoup object that we can use to search for elements on the parsed page using their tags and attributes. We searched for the title, which has an h1 tag and an id with the value section_0, then extracted its text with the .text attribute.

We then saved that data into the title variable, which allows us to save it in a file or use it in any other way.
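
One caveat: find() returns None when nothing matches, so calling .text on the result fails if the page layout differs from what we inspected. The following is a hedged sketch of a more defensive variant; the function names extract_title and scrape_title are hypothetical, and the timeout value is an arbitrary choice.

```python
import requests
from bs4 import BeautifulSoup


def extract_title(html, element_id="section_0"):
    """Return the text of the <h1> with the given id, or None if it is absent."""
    page = BeautifulSoup(html, "html.parser")
    heading = page.find("h1", {"id": element_id})
    # find() returns None when nothing matches, so guard before reading .text.
    if heading is None:
        return None
    return heading.text.strip()


def scrape_title(url):
    """Fetch a page and extract its title; raises on HTTP errors such as 404."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return extract_title(response.content)


# Offline check using the element shown in the inspector earlier.
sample = '<h1 id="section_0">Python (programming language)</h1>'
print(extract_title(sample))
print(extract_title("<h1>No id here</h1>"))
```

Keeping the parsing step in its own function also makes it possible to test the extraction logic on saved HTML without sending any requests.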

Note: We scrape all the information using the same method. For example, to scrape other elements we would do the following:

# for a table (the attributes are passed as a dictionary)
table = page.find('table', {'attribute': 'attribute value'})

# for a paragraph
text = page.find('p', {'attribute': 'attribute value'}).text

# and so on...
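
For a table, finding the element is usually only the first step, since the data lives in its rows and cells. Here is a minimal sketch of walking through them, assuming a made-up table with a hypothetical class of releases; a real page will use different attributes and columns.

```python
from bs4 import BeautifulSoup

# Made-up table (assumption); real pages will use different classes and columns.
html = """
<table class="releases">
  <tr><th>Version</th><th>Year</th></tr>
  <tr><td>2.0</td><td>2000</td></tr>
  <tr><td>3.0</td><td>2008</td></tr>
</table>
"""

page = BeautifulSoup(html, "html.parser")
table = page.find("table", {"class": "releases"})

rows = []
for tr in table.find_all("tr"):
    # Collect header (<th>) and data (<td>) cells alike, stripped of whitespace.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    rows.append(cells)

print(rows)
```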


In this article we saw how web scraping works under the hood and how web scrapers go about it. Remember that this is just a brief introduction and that there’s more to web scraping than what is covered in this article.

Web scraping is an ever-changing skill that requires the scraper to stay up to date with the latest technologies and techniques.

Thank you for reading my post!