Word Automation – Automate Docx Documents using Python and python-docx

Introduction to Word automation using python-docx

Nowadays, Python automation is everywhere, from simple everyday hacks to professional workflows. In the corporate and professional world, one of the most important uses is Excel and Word automation. Python is used to automate financial reports, combine multiple Excel sheets, and insert calculation results and charts into a formatted Word report that can then be converted into a PDF file, and so on. Python can automate it all.

In this article, we will look at one of the most common Python packages for creating a Word document, inserting data into it, and modifying it. In other words, automating one or more tasks and workflows related to a .docx document.

The package that we will explore is python-docx.

Overview

In this first article, we will create a new empty Word document. Then we will look at how to create and add text elements such as paragraphs and headings. After that, we will see how to style them by changing their color and applying bold or italic. Then we will move on to adding images. Finally, we will explore how to add and style tables, which take a few more steps than the other elements.

Note that this is a beginner-friendly post, which is why we will stick to simple manipulations that explain the basics of the python-docx library. After this post, I’ll add another one where we explore more features and build a real project.

One important thing to keep in mind is that Python automation is easy once you understand the basics well, which is why I made this first easy-to-digest article on Word automation.

Installation and Setup

pip install python-docx

Understanding the basics of word automation using python-docx

python-docx allows us both to create a new document and to modify an existing one. When creating a new file, everything is added using pure Python code. However, when working with an existing document, we usually insert placeholder variables into the document, close it, and then replace those placeholders using the package.

One important thing to note for beginners is that the file should be closed before the script saves its modifications. If the document is open in a word processor like MS Word when you execute the Python script, saving will typically fail with an error (on Windows, usually a PermissionError).

Creating an empty document – first step towards word automation

To create an empty file, we use the following code:

from docx import Document

document = Document()

document.save('demo.docx')

Note that I named the output document demo.docx; you can name it whatever you want as long as it ends with .docx.

Also note that everything we insert into the document must come after the document is created (the second line) and before it is saved (the third line), as shown below. These three lines are present in every example in this article.

from docx import Document

document = Document()
############################

# everything we insert must be in this area

############################
document.save('demo.docx')

Adding Elements – word automation principles

Creating and adding headings using python-docx

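We add headings to our document using the add_heading() method. There are 10 heading levels, numbered from 0 to 9. Level 0 uses the document's Title style and is the biggest; levels 1 to 9 map to the Heading 1 to Heading 9 styles, with 9 being the smallest.

The add_heading() method accepts two arguments: the first is the text and the second is the level, which defaults to 1 if omitted.

Here’s an example: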

from docx import Document

document = Document()

document.add_heading('Heading, level 0', level=0)
document.add_heading('Heading, level 1', level=1)
document.add_heading('Heading, level 2', level=2)
document.add_heading('Heading, level 3', level=3)
document.add_heading('Heading, level 4', level=4)
document.add_heading('Heading, level 5', level=5)
document.add_heading('Heading, level 6', level=6)
document.add_heading('Heading, level 7', level=7)
document.add_heading('Heading, level 8', level=8)
document.add_heading('Heading, level 9', level=9)


document.save('demo.docx')

The output is a Word document with the following headings:

word automation - python-docx - heading

As we can see, 10 headings were added to our document. Heading level 0 is the biggest and level 9 is the smallest. We can also see that, by default, the headings are blue; this color comes from the heading styles of the default template that python-docx uses.

Working with paragraphs using python-docx

Creating paragraphs

We add a paragraph using the add_paragraph() method, passing the text to the method like we did with headings.

from docx import Document

document = Document()

document.add_paragraph('This is a simple plain paragraph.')

document.save('demo.docx')

Resulting document:

word automation - python-docx - paragraph

Styling Text Elements like headings and paragraphs

In order to style text in python-docx, we use what are known as runs. A run is a stretch of text inside a paragraph that shares the same formatting.

We often proceed as follows:

  • Create an element and save it in a variable.
  • Add a run to that variable using .add_run().
  • Finally, set the style attributes on that run.

There are many styles you can apply to text elements. In this example, we will explore how to change the color, turn regular text into bold or italic, and finally combine these styles.

Note that we style all sorts of text this way, whether it’s a paragraph, a heading, a list element, etc.

# import 
from docx import Document
from docx.shared import RGBColor

# Create and open the document
document = Document()

####
#### Styling Paragraphs
####

# bold paragraph
p2 = document.add_paragraph('')
p2.add_run('This is a bold paragraph.').bold = True

# italic paragraph
p2 = document.add_paragraph('')
p2.add_run('This is an italic paragraph.').italic = True

# paragraph with an orange color
p2 = document.add_paragraph('')
p2.add_run('This is an orange paragraph.').font.color.rgb = RGBColor(255, 165, 0)

# paragraph with mixed style
p2 = document.add_paragraph('This paragraph has a mixture of styles, like ')
p2.add_run('bold ').bold = True
p2.add_run('and ')
p2.add_run('italic ').italic = True
p2.add_run('and ')
p2.add_run('green color.').font.color.rgb = RGBColor(0, 255, 0)

# paragraph with a sentence mixed with two styles, color and italic
p3 = document.add_paragraph('This paragraph has an ')
p3_word = p3.add_run('italic aqua colored ')
p3_word.italic = True
p3_word.font.color.rgb = RGBColor(0,255,250)
p3.add_run('sentence.')

####
#### Styling Headings
####

# heading with red color
h =  document.add_heading('', level=1)
h.add_run('This is a red heading').font.color.rgb = RGBColor(255, 0, 0)

# an italic heading
h2 = document.add_heading('', level=1)
h2.add_run('This is an italic heading').italic = True

# heading with three colors
h4 = document.add_heading('', level=1)
h4.add_run('This is black ').font.color.rgb = RGBColor(0, 0, 0)
h4.add_run('This is yellow ').font.color.rgb = RGBColor(255, 255, 0)
h4.add_run('This is green').font.color.rgb = RGBColor(0,255,0)

# saving and closing the document
document.save('demo.docx')

word automation - python-docx - adding style

Working with lists using python-docx

Creating an ordered list

In order to create an ordered list in python-docx, you create multiple paragraphs (the same way we did in the section above) and give them the List Number style as follows:

from docx import Document
document = Document()

document.add_paragraph('first item in ordered list', style='List Number')
document.add_paragraph('second item in ordered list', style='List Number')
document.add_paragraph('third item in ordered list', style='List Number')
    
document.save('demo.docx')

Here is the output of the script above:

word automation - python-docx - ordered list

As you can see, the List Number style lets us create an ordered list.

Creating an unordered list

To create an unordered list (bulleted list), we do the same as in the ordered list example; we only replace the List Number style with the List Bullet style as follows:

from docx import Document
document = Document()

document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('second item in unordered list', style='List Bullet')
document.add_paragraph('third item in unordered list', style='List Bullet')
    
document.save('demo.docx')

Output of the code:

python-docx - unordered list

As you can see, the code worked.

Styling Lists

To style list elements, we apply the same style principles and attributes as we did with paragraphs and headings, since list elements are just text (check out the examples above).

Adding images using python-docx

To add an image to a Word document using the python-docx package, we use the add_picture() method and give it the path of the image file.

One thing to note is that add_picture() inserts the image at its original size, which is not always what we want. We can change the size of the image by passing a width or height expressed in a measurement unit such as Inches, Cm or Mm, imported from the docx.shared module.

In the example below, we add the same image twice. It is named python-logo.png and sits in the same folder as our Python file. The first time we insert it at its original size, and the second time we set its width.

# import 
from docx import Document
from docx.shared import Inches

# create and open the document
document = Document()

# adding images with original size
document.add_picture('python-logo.png')

# adding image with a size of 1.25 inches
document.add_picture('python-logo.png', width=Inches(1.25))

# save and close the document
document.save('demo.docx')

Here’s the output of the code:

python-docx - add picture

As we can see, when we open the created demo.docx file, we find two images. The first image is large, which means it kept its original size, and the second one is the same image with a width of 1.25 inches (its height is scaled to match).

Working with tables using python-docx

Creating a Table

Working with tables is a bit different from what we have seen before. The approach is as follows:

  • First, get the data.
  • Then, create a table with one row and as many columns as you need.
  • After that, populate the header row.
  • Finally, using a loop, add a row for each record and populate it with the data.

# import
from docx import Document

# create and open document
document = Document()

# get table data
records = (
    (1, 'product 1', '99'),
    (2, 'product 2', '13'),
    (3, 'product 3', '43'),
    (4, 'product 4', '104')
)

# add the table
table = document.add_table(rows=1, cols=3)

# add grid 
table.style = 'Table Grid'

# populate the header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Id'
hdr_cells[1].text = 'Product'
hdr_cells[2].text = 'Price'

# add a data row for each item
for id, product, price in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(id)
    row_cells[1].text = product
    row_cells[2].text = price

# save and close the document
document.save('demo.docx')

The output is a word document with the following table:

python-docx - add table

Styling tables

To style a table, we set its style attribute, the same one we used in the example above to add a grid: table.style = ‘built-in style name’.

Note that there are a lot of built-in styles to choose from; you can find them in the official python-docx documentation, under the table styles of the default template.

Let’s check three examples here:

# import
from docx import Document

# create and open document
document = Document()

# get table data
records = (
    (1, 'product 1', '99'),
    (2, 'product 2', '13'),
    (3, 'product 3', '43'),
    (4, 'product 4', '104')
)

# add the table
table = document.add_table(rows=1, cols=3)

# add grid
table.style = 'Table Grid'

# populate the header row
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Id'
hdr_cells[1].text = 'Product'
hdr_cells[2].text = 'Price'

# add a data row for each item
for id, product, price in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(id)
    row_cells[1].text = product
    row_cells[2].text = price


####
#### table 2
####

# add the table
table2 = document.add_table(rows=1, cols=3)

# apply the 'Medium List 1 Accent 1' style
table2.style = 'Medium List 1 Accent 1'

# populate the header row
hdr_cells = table2.rows[0].cells
hdr_cells[0].text = 'Id'
hdr_cells[1].text = 'Product'
hdr_cells[2].text = 'Price'

# add a data row for each item
for id, product, price in records:
    row_cells = table2.add_row().cells
    row_cells[0].text = str(id)
    row_cells[1].text = product
    row_cells[2].text = price



####
#### table 3
####

# add the table
table3 = document.add_table(rows=1, cols=3)

# apply the 'Medium Shading 2' style
table3.style = 'Medium Shading 2'

# populate the header row
hdr_cells = table3.rows[0].cells
hdr_cells[0].text = 'Id'
hdr_cells[1].text = 'Product'
hdr_cells[2].text = 'Price'

# add a data row for each item
for id, product, price in records:
    row_cells = table3.add_row().cells
    row_cells[0].text = str(id)
    row_cells[1].text = product
    row_cells[2].text = price

# save and close the document
document.save('demo.docx')

The output of the script:

styling tables

As we can see, we have three different styles. In the first one we just added the grid using ‘Table Grid’, in the second one we set the style to ‘Medium List 1 Accent 1’, and in the final one we set the style to ‘Medium Shading 2’.

As mentioned before, check out the official documentation to explore all the available styles.

Conclusion

In this article, we learned the basics of Word automation using the python-docx package. We explored how to create, add and style text-based elements like headings and lists, as well as other elements such as images and tables. Note that these are just simple examples to ease your way into the library and help you get comfortable with it. In the next post, I’ll dive deeper into python-docx and work with an existing file used as a template, which is what most people do.

Thank you for reading my post!