Python DocX Charts

Where I do charts the stupid way
Published on Monday, 25 September 2023

DocX content the stupid way:

Python-docx is a great tool for working with docx files in Python. One problem is that it doesn't do graphs. We needed it to print some pretty reports, and had to get creative.

Of course, first we searched high and low for a Python solution, but none seemed to exist. Then we looked for a C# solutions to maybe find something closer to Microsoft's ecosystem. Short of OpenXmlSDK2, which is simply too low level to be useful, only a few expensive, proprietary libraries seemed to have chart capabilities.

So we got creative. We took the knowledge we had and put it to use:

  • DocX files are simply zip files. Unzip them to get all the goodies
  • DocX is made up of a lot of XML files
  • BeautifulSoup is excellent for working with XML
  • When making charts in a DocX file, it embeds an excel file with the data
  • Openpyxl is a great library for working with excel files

Right, so armed with this knowledge, we got to work. Instead of using Python-DocX to write files directly, we prepared word templates with the charts we wanted. So for example, a simple word file with just a pie chart. We can style it, resize it and add whatever headers we want.

Screenshot of Microsoft Word's 'Add Chart' dialog

Screenshot of Microsoft Word document with a PieChart in it

Great. With such templates in place, the steps to fill in data for the charts are:

  • unzip the word template into a temporary folder
  • Locate the excel file and modify the data series with openpyxl
  • Locate the chart XML file and modify it with BeautifulSoup
  • Zip the file back up
  • Delete the temporary folder to cleanup

Not saying this is pretty or particularly smart, but it works.

There is one fun catch, which is that you need to edit the XML as well as the Excel. Technically, you can get away with only editing the XML, because then when you open the word file, your changes will be visible right away. But if you then click the ‘refresh’ button on the chart, it will reload from the embedded excel file.

So to make sure you get it right when you open the file as well as when data is refreshed, you need to edit both the XML and the Excel.

What about using OpenXmlSDK2? Well, I'm sure it is possible. Seems like the SDK can do anything the office suite of tools can do. The problem is that there are no guard rails and no shortcuts. You have to do everything yourself: add the elements, retain the references, update excel formulas, etc. I tried a couple of approaches but all attempts ended up with corrupted files that would open, and Word didn’t offer any debugging information at all , other than “sorry, file not working 🙃”.

So for my usage at least, this was the most straightforward approach I could get working, and it has the added benefit, that I can style and prepare templates in word, so I know exactly what it’ll look in the end.

Finally, here is a short code snippet to give an idea of what it looks like. It may look a bit odd and confusing, but try unzipping a word file with a PieChart and look at the XML. Should help it making sense of it.

First, Install requirements with pip install python-docx openpyxl bs4, and then

from bs4 import BeautifulSoup
from openpyxl.reader.excel import load_workbook

# The values to put in the PieChart
list_of_labels = ["foo", "bar", "baz"]
list_of_values = [1, 42, 1337]

template_path = "/path/to/the/template.docx"
tmp_dir = "/tmp/workdir"

# ------------------------------
# Unzip the docx
# ------------------------------
os.makedirs(tmp_dir, exist_ok=True)
with zipfile.ZipFile(template_path, "r") as zip_ref:
    zip_ref.extractall(temp_dir)

# ------------------------------
# Load and fix the docx xlsx
# ------------------------------
xlsx_path = os.path.join(
    temp_dir, "word", "embeddings", "Microsoft_Excel_Worksheet.xlsx"
)
workbook = load_workbook(xlsx_path)
sheet = workbook.active
for i, label in enumerate(list_of_labels):
    sheet[f"A{i+2}"] = label
for i, value in enumerate(list_of_values):
    sheet[f"B{i+2}"] = value
workbook.save(xlsx_path)
workbook.close()

# ------------------------------
# Load and fix the docx xml
# ------------------------------
chart_xml_path = os.path.join(temp_dir, "word", "charts", "chart1.xml")
with open(chart_xml_path) as xml_file:
    contents = xml_file.read()

soup = BeautifulSoup(contents, "xml")
plot_area = soup.find("c:plotArea")

# Fix categories/labels of the pie chart
cat = plot_area.find("c:ser").find("c:cat")
cache = cat.find("c:strCache")

cache.clear()
ptCount = soup.new_tag("c:ptCount", val=str(len(list_of_labels)))
cache.append(ptCount)
for i, key in enumerate(list_of_labels):
    pt = soup.new_tag("c:pt", idx=str(i))
    v = soup.new_tag("c:v")
    v.string = key
    pt.append(v)
    cache.append(pt)

# Fix values of the chart
val = plot_area.find("c:ser").find("c:val")
cache = val.find("c:numCache")

cache.clear()
ptCount = soup.new_tag("c:ptCount", val=str(len(list_of_values)))
cache.append(ptCount)
for i, key in enumerate(list_of_values):
    pt = soup.new_tag("c:pt", idx=str(i))
    v = soup.new_tag("c:v")
    v.string = str(key)
    pt.append(v)
    cache.append(pt)

with open(chart_xml_path, "w") as xml_file:
    xml_file.write(str(soup))

# ------------------------------
# Recompress and remove tmp folder
# ------------------------------
destination_file = os.path.join(
    os.path.dirname(__file__),
    "..",
    "docx_templates",
    f"my_finished_report.docx",
)
with zipfile.ZipFile(destination_file, "w") as new_zip:
    for foldername, subfolders, filenames in os.walk(temp_dir):
        for filename in filenames:
            file_path = os.path.join(foldername, filename)
            arcname = os.path.relpath(file_path, temp_dir)
            new_zip.write(file_path, arcname)

shutil.rmtree(temp_dir)


Blog Logo

Hi! thanks for dropping in to pick my brain

I write on this blog for my own benefit. I write because I like to, and because it helps me process topics. It is also my own little home on the web and a place for me to experiment.

Since you're here though, I'd love to hear your thoughts on what you've read here, so please leave a comment below. Also, if you like what you read and want to give a small tip, fell free to:

Buy Me A Coffee