Python library for exporting formulas to Excel and other formats

★ Posted on August 16, 2020

One of the typical challenges when exporting numerical computations is how to include the way the calculations were achieved. Typically, if results should be in an Excel sheet, you want to see a formula if you click on each cell. A naive method is using some Excel driver (like XlsxWriter) directly. But if you are dealing with a big Excel sheet (or multiple sheets in one file), this approach becomes quite inconvenient. Therefore, this article presents a library called Portable Spreadsheet - it can easily export defined formulas in Python to many formats, including Excel and JSON.

Simple problem

Say that you have an e-shop written in Python and want to export some simple Excel sheet that summarizes revenues per category. There can be just two columns to simplify things, one representing the category and another representing revenue.

Figure 1: Simple problem definition
Figure 1: Simple problem definition

In the end, you want to see some statistics, like the sum of all revenues for each category, the minimal revenue in all categories, and say the sum of food and drinks as a new category (called grocery). What is crucial, the Spreadsheet should contain formulas and be editable. Meaning, when someone clicks on the cell, there should be a computation definition.

Direct approach with Excel drivers

Multiple drivers are capable of handling the XLSX 2010 format. The most important ones are XlsxWriter and openpyxl.

XlsxWriter driver

You can tackle this issue directly using some driver for Excel available in Python. Driver, in this sense, means a library that allows you to handle files in Excel format. A very simple yet effective one is XlsxWriter - the following example demonstrate how it works:

import xlsxwriter
workbook = xlsxwriter.Workbook('revenues.xlsx')
worksheet = workbook.add_worksheet()
food_rev = 1000
drinks_rev = 100
headphones_rev = 300
# Write categories
worksheet.write(0, 0, "Food")
worksheet.write(1, 0, "Drinks")
worksheet.write(2, 0, "Headphones")
# Write revenues
worksheet.write(0, 1, food_rev)
worksheet.write(1, 1, drinks_rev)
worksheet.write(2, 1, headphones_rev)
# Write results labels
worksheet.write(3, 0, "Total")
worksheet.write(4, 0, "Minimal")
worksheet.write(5, 0, "Grocery")
# Write results
worksheet.write_formula(
    3, 1, "=SUM(B1:B3)",
    value=(food_rev + drinks_rev + headphones_rev)
)
worksheet.write_formula(
    4, 1, "=MIN(B1:B3)",
    value=min(food_rev, drinks_rev, headphones_rev)
)
worksheet.write_formula(5, 1, "=B1+B2",
                        value=food_rev + drinks_rev)
# Close the sheet (in order to write it to the file)
workbook.close()

This code is inflexible, as all values and formulas are explicitly written into each cell. That makes code editing difficult and prone to error. The typical bug is when the values do not match the formula - Excel (or LibreOffice calc) has no way to recognize it as it only shows what is in the cell unless something is updated. Code is also quite complex, despite that example is incomplete - for example, styles are missing.

Also, is that when you decide to export to a different format (say JSON), you have to write this code again.

openpyxl driver

A very similar driver is the openpyxl library (pip install openpyxl). This driver is slightly smarter than XlsxWriter as it allows not only writing of values but also reading from Excel files. Another advantage is the support for writing formulas without defining values - this makes code simpler. However, if you open the output in Excel, it takes longer to interpret correct values in each cell. Also, the driver works itself is slower than the XlsxWriter library. Our example would look like this:

import openpyxl
workbook = openpyxl.Workbook()
worksheet = workbook.active
food_rev = 1000
drinks_rev = 100
headphones_rev = 300
# Write categories
worksheet["A1"] = "Food"
worksheet["A2"] = "Drinks"
worksheet["A3"] = "Headphones"
# Write revenues
worksheet["B1"] = food_rev
worksheet["B2"] = drinks_rev
worksheet["B3"] = headphones_rev
# Write results labels
worksheet["A4"] = "Total"
worksheet["A5"] = "Minimal"
worksheet["A6"] = "Grocery"
# Write results
worksheet["B4"] = "=SUM(B1:B3)"
worksheet["B5"] = "=MIN(B1:B3)"
worksheet["B6"] = "=B1+B2"
# Close the sheet (in order to write it to the file)
workbook.save('revenues1.xlsx')

As you can see, the code is slightly more straightforward. However, one obvious disadvantage is that cell coordinates must be passed as strings and not as matrix integer coordinates - but an internal function can help transform integer position into Excel definition.

Generally, the code structure is almost identical to the previously discussed driver (with described peculiarities). Also, all described advantages and disadvantages hold even in this case.

Portable Spreadsheet approach

High-level object-oriented programming languages have a way to deal with the inconveniences of this type called encapsulation. In this case, commonly used constructs (like a sheet) are defined as a class with some typical operations (in our case, sum, minimal value, adding). You can then call these methods on concrete sheet objects. This is what the library called Portable Spreadsheet does.

The library Portable Spreadsheet reacts to a missing package that would encapsulate the sheet. As the name suggests, it does not export only to Excel (or LibreOffice) format but also to other formats like JSON, HTML, and user-defined ones - therefore portable. Technically, it is a Python library that can easily be installed using pip install portable-spreadsheet.

The previous example written using Portable Spreadsheet library would look like this:

import portable_spreadsheet as ps

# Create the spreadsheet with the correct labels
sheet = ps.Sheet.create_new_sheet(
    6, 1,
    rows_labels=['Food', 'Drink',
                 'Headphones', 'Total',
                 'Minimal', 'Grocery'],
    columns_labels=['Revenue']
)
# Set the values (in the Pandas-like way)
sheet.loc['Food', 'Revenue'] = 15000
sheet.loc['Drink', 'Revenue'] = 16000
sheet.loc['Headphones', 'Revenue'] = 1000
# Set the computations
sheet.loc['Total', 'Revenue'] = \
    sheet.loc['Food':'Total', 'Revenue'].sum()
sheet.loc['Minimal', 'Revenue'] = \
    sheet.loc['Food':'Total', 'Revenue'].min()
sheet.loc['Grocery', 'Revenue'] = (
        sheet.loc['Food', 'Revenue'] +
        sheet.loc['Drink', 'Revenue']
    )
# Export result to Excel
sheet.to_excel("rev.xlsx")
# Export result to JSON
json = sheet.to_json()

This example shows the fundamental logic of the Portable Spreadsheet library. The updated logic removes all redundancies (compared to the naive approach using driver directly). The concept remotely resembles Pandas DataFrame logic - you can now directly set the cell's value and use the cell in operations.

How does Portable Spreadsheet work?

Portable Spreadsheet internally keeps the value of each cell (either string or numerical value) and then the defining string for each cell and each language. If we say language here, it is essential to mention what it means - it is a generic concept Noam Chomsky first defined. In our case, we are talking about so-called Context-Free languages. Therefore, each formula in Excel is just a word in a language that uses particular grammar. The same holds true for JSON - which is just another Context-Free language or HTML or XML, etc.

Therefore, the library defines the grammar for each language in use and allows you to describe languages. Practically, if you want to sum values in a column, you need to have a definition for summation, aggregation of values, and insertion into a formula. The picture below shows the example of this logic for summation.

Figure 2: Word construction process
Figure 2: Word construction process

Generally, each operation has its prefixes, suffices, separators and operands. The order and values of each must be specified when the operation grammar is defined. As in Excel, words (operations) can be nested, which means inserting words into another word. Practically, each language defines many operations, like aggregation functions, binary and unary operations, references to cells, and others. All the library does is compose words based on overloaded operators (or direct method calls) and keep their value for each language and cell.

Summary

If there is a need to export data into Excel format with formulas, two options are generally available. One uses some Excel file driver directly, and another uses the Portable Spreadsheet library. If you choose to use drivers directly, there are two popular XLSX 2010 file format drivers in Python - XlsxWriter and openpyxl. The disadvantage of this approach is that the code is complex and fragile. Another option is to use the encapsulation of the driver, the Portable Spreadsheet library. It encapsulates all standard operations and allows export into multiple formats like JSON, XLSX or XML.

❋ Tags: Data Visualisation Exporting Python Design Excel