Excel Automation – Automate Xlsx Workbooks using Python and Openpyxl

excel - macbook pro

Table of Contents

Introduction

Excel automation is the most in use form of office automation. Using python, we can automation any operation that require manual human input. The idea behind automating a .xlsx document is so that we gain time automating a routinely work.

Nowadays, we don’t just automate certain tasks like combining multiple files, or executing certain operations and calculations in a specified worksheet. Today we automate entire workflows, from getting the data, to populating the excel workbook with these data, to then performing all required calculations and finally outputting the results in the same or another excel file, or even a word document. Python can automate it all.

In this tutorial, we will explore one of the popular packages in python that is used for excel automation, openpyxl.

Overview

We will start our tutorial by installing the package. Then we will start by working with workbooks. Then pass to worksheets. After that we will pass to working with individual cells. Finally, we will make a project which consists of summing up everything we will see in this article.

Note that excel automation is a vast subject, and this article is about getting you started quickly.

For official documentation click here.

Installation and set up – excel automation environment

To install the openpyxl package, we use pip (after you have installed python).

Open a Terminal on Mac or Linux or a Command Prompt on windows and type the following:

pip install openpyxl

And hit Enter. It will start installing.

How does the package work? – principals of excel automation with openpyxl

The openpyxl packages is a versatile packages that allow us to do anything with excel documents. First we can start by creating a new workbook or open an existing one. Then we can get the active worksheet inside the workbook or get all the worksheets in the case of an existing workbook, we can change the title of any worksheet and most importantly, get the values, append new values, and modify individual cells’ values inside of a worksheet.

When modifying or appending values, we must save the document either by replacing the original on or by saving in a new one.

The openpyxl package allows us to modify, add, and remove style from individual cells. Styling translates to applying colors and changing font weight and so on.

One important thing to note for beginners is that when you save the file at the end using python, the excel workbook must be closed and not opened in excel or another xlsx processor otherwise you will get an error.

Also keep in mind that the openpyxl packages only work with files that have the .xlsx extension only.

Working with Workbooks – First step towards excel automation

To start with, a workbook is the entire excel document, and as mentioned before, we use the package to either create a new empty one or open an existing one. Choosing one method over another depends on your use case. For example, if you have an existing file that holds some data and you want to automate some operations, then you will open it. On the other hand, if you have some data in a certain format other than xlsx, like JSON or CSV, you would open this file, then create a new excel workbook, and insert the data in…etc.

creating a new workbook

We create a new document by importing the Workbook class, creating an instance of it, and then saving that instance using the save() method as follows:

# import the Workbook class
from openpyxl import Workbook

# Create and open the workbook
wb = Workbook()

# save the workbook under the name demo.xlsx
wb.save('demo.xlsx')

Opening an existing workbook

To open an existing file we import the load_workbook class, and create an instance of it, then adding the path of the file we want to open. in this case I’ve put the demo.xlsx file in the same directory (folder) as my python script so I’ll just need to give it the name. Here’s the code:

import the load_workbook class
from openpyxl import load_workbook

# open file named demo.xlsx that's in the same folder as our python file
wb = load_workbook('demo.xlsx')

Working with worksheets – Second step towards excel automation

In an excel document, worksheets are the sheets inside of the workbook, an excel file can have just one or multiple worksheets. The more complex the document, the more worksheets we add to simplify the work.

Using the openpyxl packages, we can know the names of all the worksheets that we have in a workbook, get the one we want, change its title and apply any modification to it that we usually do manually.

Getting the active worksheet

The active worksheet is the one we find ourselves in when opening the file using either excel or python. In case we have only one worksheet in or document, it becomes the active worksheet by default.

We get the worksheet using the active method as follows:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

# getting the active worksheet from the wb workbook
ws = wb.active

Getting the name of all the worksheet in a workbook

Getting the name (title) of all the worksheets is very important since we would later select a worksheet by its name , to get all the names we use the sheetnames method.

Here’s an example:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

# getting all worksheets name and saving them in the variable ws_titles
ws_titles = wb.sheetnames

Creating a new worksheet

Creating new empty worksheet is necessary for either outputting results in or using it as a way to organize our code like we mentioned earlier. We can create a new one in a newly created workbook or in an existing one.

We create a new worksheet using the create_sheet() method, we give it a name, then we save it.

Here’s an example:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

# create a new worksheet under the name new_worksheet - you can give it any name you want 
wb.create_sheet('new_worksheet')

wb.save('demo.xlsx')

Note that we can change the name of the sheet later.

Access a specific worksheet by its name

We access a specific worksheet by its name like so:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

# access specific worksheet by its name
ws = wb['Sheet1']

We usually get all worksheets names, then access them one by one using those names.

Get and change title of a worksheet

To modify the tile of a worksheet we use the title that comes with the class as follows:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

ws = wb['Sheet1']

# change name of worksheet
ws.title = 'Data'

# save to a document
wb.save('demo.xlsx')

Adding a row of data to a worksheet

The way we add a row is by assigning the value to each cell of the row using the append method. The append method takes in a list:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

ws = wb['Sheet1']

# append a row of data to worksheet
ws.append(['1', '2', '3', "somethhing"])

# save to a document
#wb.save('demo.xlsx')

Working with individual cells – foundation of excel automation

Cells are the building blocks of excel, and working with them is the most important step. In this section we will see how we can select individual cells, get and change their values.

In the final section we will insert data to cells from a dictionary (JSON styled format).

Get a cell’s value – and change a cell’s value

Accessing cells is the same as accessing an element in a python dictionary, where the same name (example A1) is the name, and .value is the value of that cell.

We change the value of that cell easily by assigning a new value to that cell like we assign values in python dictionaries.

Here’s a snippet of code for that:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

#ws = wb.active

# getting value of cell A1 
cell_a1_value = ws['A1'].value

# changing value of cell D4
ws['D4'] = 'Hola'

# save to a document
#wb.save('demo.xlsx')

Merging and unmerging cells

Merging cells is turning multiple cells into one, whether they are in row or in multiple rows.

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

#ws = wb.active

#merge cells - merging a row
ws.merge_cells('A1:D1')

# merge cells - merging an area
ws.merge_cells('A3:D4') 

# unmerge merge_cells
ws.unmerge_cells('A1:D1')

# save to a document
#wb.save('demo.xlsx')

Insert and delete a rows at specified position

To insert an empty row at a specific position we use the insert_rows() method and we pass it the number of the row.

The delete rows() method is used to achieve the opposite which is deleting a row at a certain position.

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

#ws = wb.active

# insert empty row at position 7
ws.insert_rows(7)

# delete rows at position 11
ws.delete_rows(11)

# save to a document
#wb.save('demo.xlsx')

Inserting and deleting columns at specified position

Same as we did with rows, we can add empty columns and delete ones using the two methods, insert_cols() and delete_cols(), then passing the number corresponding to the position on the alphabet (example: A=1, D=4).

here’s a code snippet for that:

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

#ws = wb.active

# insert empty column at position 4 meaning D
ws.insert_cols(4)

# delete column at position 5 meaning E
ws.delete_cols(5)

# save to a document
#wb.save('demo.xlsx')

Move an entire section

Using the openpyxl packages, we can move entire sections from one place to another using the move_range() method.

We specify the beginning and the end of the section, then how many rows and how many columns should it move by. A positive number means move from left to right for columns and from up to down for rows. The opposite applies for negative numbers.

from openpyxl import load_workbook

wb = load_workbook('demo.xlsx')

#ws = wb.active

# move columns and rows - move two rows down and move two columns to the left
ws.move_range('C1:D11' rows=2, cols=2)

# save to a document
#wb.save('demo.xlsx')

How to add styles to worksheets

One important thing to keep in mind is that we can only style individual cells, meaning that if we want to style an entire row, we must style each cell at a time. Usually, styles are added to a row by looping through all the cells that make up that row.

We style cells using the Font class that we can get from openpyxl.styles

for example if we want to change the font weight from regular to bold, and color to blue of the cell A1, we would do that as follow:

from openpyxl.styles import Font

# change text of cell A1 to blue bold
ws.['A1'].font = Font(bold=True, color='0099CCFF')

Style is a larger subject, it’s better to check the documentation for all styles you can apply, and all colors and so on here.

Populate a worksheet with data – Applying excel automation to a project

Before we get started I want to mention that you can find the code of this project on my GitHub by clicking here.

Now, let’s pass to a real example. We have data of five Sellers that sell the same four Products, and we want to assert them to an excel document, change the header to blue bold, and calculate the average value of each product’s price.

The plan of work:

  • Create and open a new excel workbook
  • Get the active worksheet and change it’s name to “Sellers Listings”.
  • Append the headings to the worksheet (Product1, Product2, Product3, Product4)
  • Insert the value of each product for each Seller
  • Calculate the average of each product.
  • Add styling to the headings.
  • Save the workbook under the name SellersListings.xlsx.

Here’s the code:

# required imports
from openpyxl import Workbook
from openpyxl.utils import get_column_letter
from openpyxl.styles import Font

# our data 
data = {
	"Seller1": {
		"product1": 78,
		"product2": 62,
		"product3": 14,
		"product4": 8
	},
	"Seller2": {
		"product1": 75,
		"product2": 64,
		"product3": 12,
		"product4": 9
	},
	"Seller3": {
		"product1": 77,
		"product2": 62,
		"product3": 11,
		"product4": 9
	},
	"Seller4": {
		"product1": 75,
		"product2": 63,
		"product3": 14,
		"product4": 9
	},
	"Seller5": {
		"product1": 76,
		"product2": 62,
		"product3": 13,
		"product4": 8
	}
}

# create and open a new workbook
wb = Workbook()

# get the active worksheet in the workbook
ws = wb.active

# change the tile of the workbook
ws.title = "Sellers Listings"

# create the headings list from the data we have 
headings = ['Seller'] + list(data['Seller1'].keys())

# adding the heading to the worksheet
ws.append(headings)

# adding the values of data to the worksheet
for seller in data:
	products = list(data[seller].values())
	ws.append([seller] + products)

# calculating the average value of each product
for col in range(2, len(data['Seller1']) + 2):
	char = get_column_letter(col)
	ws[char + "7"] = f"=SUM({char + '2'}:{char + '6'})/{len(data)}"

# styling the headings
for col in range(1, 6):
	ws[get_column_letter(col) + '1'].font = Font(bold=True, color="0099CCFF")

# saving the file
wb.save("SellersListings.xlsx")

The results:

excel workbook - worksheet - excel automation - openpyxl

As we can see, the output is an excel workbook that has one worksheet titled Sellers Listings, where the first row is the headings which has a bold weight and a blue colored text. Each row represents a Seller and each column represent a Product. Finally, the last row (7) represents the average price of each products.

Conclusion

In this article we explored excel automation using python’s openpyxl packages. Note that what we saw is just a little portion of what the package is capable of. As mentioned before, we can literally automate every single manual process in excel using python.

In the next Articles on excel, I’ll go even deeper, showcasing more complex and useful operations. We will explore other packages as well like Pandas and xlwings which are very common.

I encourage you to read the documentation yourself.

If you are searching for a Word document (Docx) automation article for beginners, click here.

Thank you for reading my post!