Excel Automation using Pandas – How to combine multiple Workbooks and Worksheets the Easy Way

macbook pro

Table of Contents


Pandas is one of the most important python packages. Built on the Numpy package, it allowed for python to expand in the world of Dana Science and Data Analytics. Pandas is an essential when it comes to anything data-related with python, from opening files and documents, to performing complex calculations with few lines of code in fast amounts of time, to saving files in different formats.

In this post, we will have a beginners understanding on how we can use Pandas for Excel automation to handle workbooks and worksheets.


In this article we explore how we use the pandas package to handle excel workbooks and worksheets.

We start by having a basic understanding on how the package works. Then we install it. After that we explain how to open files and read data from them using Pandas, which is always the first step.

We will handle working with workbooks and worksheets in different situations like how to save multiple workbooks into one.

We will use a CSV file to start our work. The data we will use is from the Kaggle website, and it represents the best selling books on amazon from 2009 to 2019.

Note that you can get the data and the script from my GitHub repository here.

How Does Pandas Work?

According to the official documentation: “pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.”

The way it works: “When working with tabular data, such as data stored in spreadsheets or databases, pandas is the right tool for you. pandas will help you to explore, clean, and process your data. In pandas, a data table is called a Dataframe.

I encourage you to check out Pandas official documentation by clicking here.

As a bonus, here’s the official pandas Cheat Sheet.

Installation and Setup

To install pandas (after installing python from python.org) you need to open a Terminal on Mac or Linux, or a Command Prompt on Windows and write the following command:

pip install pandas

Then hit Enter.

Importing pandas

To import pandas we simply write import pandas. However the convention is to import it as a variable named pd, as follows:

import pandas as pd

Opening files and workbooks with pandas – First step towards automation

We can open different files using the pandas package like Excel (xlsx) and CSV. each format has its open method. For Excel we use read_excel(), and for CSV we use read_csv(), then we give the the path of the file we want to open.

Please check out the documentation on how we open other formats here.

#import pandas
import pandas as pd

# open an excel file named demo.xlsx
excel_file = pd.read_excel('demo.xlsx')

# open a csv file named demo.csv
csv_file = pd.read_csv('demo.csv')

Convert CSV file to an Excel workbook

To automatically convert a file from CSV to Excel, we first open the CSV file using Pandas, then save it as an Excel file. Pandas will do the rest. As simple as that.

Note that when saving a file using Pandas, you almost always want to set index = None, Otherwise you will have an Id column generated by pandas in your excel, or any other saved file.

# import pandas
import pandas as pd

# open csv file named 'bestsellers with categories.csv'
read_file = pd.read_csv('bestsellers with categories.csv')

# save to excel file named bestsellersWithCaregories.xlsx
read_file.to_excel('bestsellersWithCaregories.xlsx', index = None, header=True)
CSV file

This first screenshot is the CSV file.

Excel workbook - workbooks

This second screenshot is the excel workbook.

One worksheet to multiple workbooks – Excel automation

To transfer data from one worksheet to multiple workbooks using Pandas, we first need to open the original workbook, then get the worksheet using the read_excel() method, and passing the path of the original file.

If the original file has more than one worksheet, we must specify the name of the wanted worksheet by passing the argument sheet_name = ‘the worksheet name’ (line 7), other wise we don’t (line 4).

After getting the worksheet we dived or filter it however we want into new dataframes. In the example bellow, I separated the data corresponding to the 2015, 2016 and 2017 years into different dataframes (Lines 10, 11 and 12).

Finally, we save each dataframe in a separate file using the to_excel() method (Lines 14, 15, 16).

# import pandas
import pandas as pd

# open excel file - if one worksheet
df_file = pd.read_excel('bestsellersWithCaregories.xlsx')

# open excel file - if multiple worksheets
df_file = pd.read_excel('bestsellersWithCaregories.xlsx', sheet_name='sheetname')

# extract data related to year 2015
results_2015 = df_file[df_file['Year'].apply(str).str.match('2015')]
# extract data related to year 2016
results_2016 = df_file[df_file['Year'].apply(str).str.match('2016')]
# extract data related to year 2017
results_2017 = df_file[df_file['Year'].apply(str).str.match('2017')]

# save 2015 data in a workbook named bestsellersWithCaregories2015.xlsx
results_2015.to_excel('bestsellersWithCaregories2015.xlsx', index = None, header=True)
# save 2016 data in a workbook named bestsellersWithCaregories2016.xlsx
results_2016.to_excel('bestsellersWithCaregories2016.xlsx', index = None, header=True)
# save 2017 data in a workbook named bestsellersWithCaregories2017.xlsx
results_2017.to_excel('bestsellersWithCaregories2017.xlsx', index = None, header=True)
excel 2015 file - pandas -worksheet - workbook - workbooks

Excel workbook that contains the 2015 data.

excel 2016 file - pandas -worksheet - workbook

Here’s the Excel workbook that contains the 2016 data.

excel 2017 file - pandas -worksheet - workbooks

An Excel workbook that contains the 2017 data.

One worksheet to multiple worksheets in the same workbook

To save one worksheet to multiple ones, we open and get the data as we did in the example before, however when saving the data we use a different method which is the ExcelWriter().

The way we use ExcelWriter is that we first create a document by calling the ExcelWriter() method then giving it a name (in the example above we called it final.xlsx – Line 10). After that, save each dataframe into the workbook by passing the writer to the to_excel() method, then giving a name to the worksheet, setting the index to None (Lines 12, 13 and 14), and finally saving the writer (Line 16).

Note that the resulting worksheets can be saved in the original workbook or in a different one.

import pandas as pd

df_file = pd.read_excel('bestsellersWithCaregories.xlsx')

results_2015 = df_file[df_file['Year'].apply(str).str.match('2015')]
results_2016 = df_file[df_file['Year'].apply(str).str.match('2016')]
results_2017 = df_file[df_file['Year'].apply(str).str.match('2017')]

# open a new excel file named final.xlsx using ExcelWriter
writer = pd.ExcelWriter('final.xlsx')

# save the 2015 data in a worksheet named '2015' inside the final.xlsx workbook 
results_2015.to_excel(writer,'2015', index = None)
# save the 2016 data in a worksheet named '2016' inside the final.xlsx workbook 
results_2016.to_excel(writer,'2016', index = None)
# save the 2017 data in a worksheet named '2017' inside the final.xlsx workbook 
results_2017.to_excel(writer,'2017', index = None)

# save the final.xlsx workbook

If you take line 12 for example, what we did was that we saved the dataframe name results_2015 (holds data for year 2015) using the to_excel() method, and passing three arguments:

  • The first argument is writer, which corresponds to the ExcelWriter that opened the final.xlsx file.
  • The second argument is 2015 which corresponds to the name of the worksheet that will hold the data.
  • The third argument is Index = None, which as mentioned before, we are telling Pandas to not create a new column for Ids.

Here’s the resulted workbook:

excel workbook with multiple worksheets - pandas

We can see that we have one workbook with 3 worksheets named 2015, 2016, and 2017.

Multiple worksheets to one workbook or one worksheet

To automate the process of converting multiple excel worksheets into one, we use the concat() method. the concat() method accept a list of the dataframes we want to add together starting by the first one and finishing by last one.

Here’s a code that shows how to achieve that:

# import panads
import pandas as pd

# open excel workbook 'bestsellersWithCaregories2015.xlsx'
df_file_2015 = pd.read_excel('bestsellersWithCaregories2015.xlsx')
# open excel workbook 'bestsellersWithCaregories2016.xlsx'
df_file_2016 = pd.read_excel('bestsellersWithCaregories2016.xlsx')
# open excel workbook 'bestsellersWithCaregories2017.xlsx'
df_file_2017 = pd.read_excel('bestsellersWithCaregories2017.xlsx')

df_2015_2017 = pd.concat([df_file_2015, df_file_2016, df_file_2017])

df_2015_2017.to_excel('bestsellersWithCaregories201520162017.xlsx', index = None, header=True)
Excel multiple workbooks combined into one worksheet


In this post we explored some of the methods used to open and handle different combinations of workbooks and worksheets, namely, excel automation using pandas. Keep in mind that we barely scratched the surface of the packages. The main goal of Pandas is to explore, clean, and process data, which is what we will explore in future articles.

One again, I encourage you to check out the official documentation of Pandas.

If you want to automate Excel using the Openpyxl package, here’s an article for beginners.

Thank you for reading my post!