Skip to main content

Chapter 14 - Excel Spreadsheets (Python)

Here's a concise walkthrough of the main ideas in Chapter 14, each with a small example.


Intro and setup

  • Goal: use openpyxl to read/write .xlsx files from Excel or LibreOffice Calc, automating repetitive spreadsheet tasks.
  • openpyxl works on files, not the live Excel app; you must close the workbook in Excel (or download/re-upload in Office 365) before/after running scripts.
  • Install with pip install openpyxl; examples use example3.xlsx, censuspopdata.xlsx, produceSales3.xlsx, etc., from the book resources.

Reading Excel files

Workbooks, sheets, cells

  • A spreadsheet file is a workbook (.xlsx).
  • Each workbook has sheets (worksheets); one is the active sheet.
  • Sheets have lettered columns (A, B, ...) and numbered rows (1, 2, ...).
  • A cell is identified by its address, like B3.

The example workbook example3.xlsx has sheets Sheet1, Sheet2, Sheet3, with Sheet1 containing a small table of dates, fruit, and quantities.

Opening a workbook

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')
print(type(wb)) # <class 'openpyxl.workbook.workbook.Workbook'>

Getting sheets from the workbook

wb.sheetnames              # ['Sheet1', 'Sheet2', 'Sheet3']
sheet = wb['Sheet3'] # Worksheet object
print(sheet.title) # 'Sheet3'

active_sheet = wb.active # last viewed sheet in Excel
print(active_sheet.title)

Sheets are Worksheet objects, accessible like dict keys by name, or via wb.active.

Getting cells from a sheet

import openpyxl

wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']

print(sheet['A1']) # Cell object
print(sheet['A1'].value) # datetime(2035, 4, 5, 13, 34, 2)
c = sheet['B1']
print(c.value) # 'Apples'
print(f'Row {c.row}, Column {c.column} is {c.value}')
print(f'Cell {c.coordinate} is {c.value}')
print(sheet['C1'].value) # 73

Dates are returned as datetime objects, not strings.

You can also use sheet.cell(row=..., column=...) with 1-based indices:

print(sheet.cell(row=1, column=2).value)  # 'Apples'

for i in range(1, 8, 2): # every other row
print(i, sheet.cell(row=i, column=2).value)

Size of the used area:

print(sheet.max_row)     # 7
print(sheet.max_column) # 3

Converting between column letters and numbers

from openpyxl.utils import get_column_letter, column_index_from_string

print(get_column_letter(1)) # 'A'
print(get_column_letter(27)) # 'AA'
print(get_column_letter(900)) # 'AHP'

print(column_index_from_string('A')) # 1
print(column_index_from_string('AA')) # 27

You can also use get_column_letter(sheet.max_column) to get the last column's letter.

Getting ranges, rows, and columns

A rectangular slice:

sheet = wb['Sheet1']

for row_cells in sheet['A1':'C3']:
for cell in row_cells:
print(cell.coordinate, cell.value)
print('--- END OF ROW ---')

You can also use sheet.rows and sheet.columns (convert to list first):

second_col = list(sheet.columns)[1]
for cell in second_col:
print(cell.value) # Apples, Cherries, ...

list(sheet.rows) is a list of row tuples; list(sheet.columns) is a list of column tuples.

Review checklist

To read a cell, the steps are:

  1. import openpyxl
  2. wb = openpyxl.load_workbook(...)
  3. Get sheet via wb.active or wb[sheetname].
  4. Get a Worksheet object.
  5. Use indexing (sheet['B3']) or sheet.cell(row=, column=).
  6. Get a Cell object.
  7. Read cell.value.

Project: Gather Census Statistics

Task: from censuspopdata.xlsx, count for each county:

  • Number of census tracts.
  • Total population.

Spreadsheet Population by Census Tract has columns: A (tract number), B (state abbreviation), C (county name), D (population).

Step 1: Read spreadsheet data

# readCensusExcel.py
import openpyxl, pprint

print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
county_data = {}

print('Reading rows...')
for row in range(2, sheet.max_row + 1): # skip header
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
# will fill county_data

Step 2: Populate nested dict structure

county_data structure:

{
'AK': {
'Anchorage': {'tracts': 55, 'pop': 291826},
'Bethel': {'tracts': 3, 'pop': 17013},
},
}

Code:

for row in range(2, sheet.max_row + 1):
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value

county_data.setdefault(state, {})
county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
county_data[state][county]['tracts'] += 1
county_data[state][county]['pop'] += int(pop)

setdefault ensures nested dictionaries exist before incrementing.

Step 3: Write results to a Python file

Use pprint.pformat to dump the dict as valid Python code.

print('Writing results...')
with open('census2010.py', 'w', encoding='utf-8') as result_file:
result_file.write('allData = ' + pprint.pformat(county_data))
print('Done.')

Then you can import and use it:

import census2010

anchorage = census2010.allData['AK']['Anchorage']
print(anchorage['pop'], anchorage['tracts'])

Ideas for similar programs

Use this pattern (load sheet → loop rows → build structure) to:

  • Compare rows within one sheet.
  • Compare across multiple spreadsheets.
  • Validate data (blank rows, invalid cells).
  • Use spreadsheet data as input for other Python programs.

Writing Excel documents

Creating and saving workbooks

import openpyxl

wb = openpyxl.Workbook() # new blank workbook
print(wb.sheetnames) # ['Sheet']
sheet = wb.active
print(sheet.title) # 'Sheet'

sheet.title = 'Spam Bacon Eggs Sheet'
wb.save('example_new.xlsx')

When editing existing files, always save to a different filename to avoid corrupting the original; Excel must not have the file open when you call save().

Creating and deleting sheets

wb = openpyxl.Workbook()
print(wb.sheetnames) # ['Sheet']

wb.create_sheet() # appended, name 'Sheet1'
wb.create_sheet(index=0, title='First Sheet')
wb.create_sheet(index=2, title='Middle Sheet')
print(wb.sheetnames) # ['First Sheet', 'Sheet', 'Middle Sheet', 'Sheet1']

del wb['Middle Sheet']
del wb['Sheet1']
print(wb.sheetnames) # ['First Sheet', 'Sheet']

wb.save('multi_sheets.xlsx')

Writing values to cells

wb = openpyxl.Workbook()
sheet = wb['Sheet']

sheet['A1'] = 'Hello, world!'
print(sheet['A1'].value) # 'Hello, world!'

wb.save('hello.xlsx')

You assign directly to sheet['A1'] or to sheet.cell(row=..., column=...).value.


Project: Update a Spreadsheet

produceSales3.xlsx has rows: Produce (A), Cost per Pound (B), Pounds Sold (C), TOTAL (D) as a formula like =ROUND(B2*C2, 2).

Task: fix incorrect prices for Garlic, Celery, Lemon in thousands of rows.

Step 1: Data structure for updated prices

# updateProduce.py
import openpyxl

wb = openpyxl.load_workbook('produceSales3.xlsx')
sheet = wb['Sheet']

PRICE_UPDATES = {
'Garlic': 3.07,
'Celery': 1.19,
'Lemon': 1.27,
}

Step 2: Loop rows and update

for row_num in range(2, sheet.max_row + 1):   # skip header
produce_name = sheet.cell(row=row_num, column=1).value
if produce_name in PRICE_UPDATES:
sheet.cell(row=row_num, column=2).value = PRICE_UPDATES[produce_name]

wb.save('updatedProduceSales3.xlsx')

Because D uses formulas, Excel recalculates totals automatically when B changes.

Ideas for similar programs

  • Copy data between spreadsheets.
  • Pull data from websites/files/clipboard into Excel.
  • "Clean" spreadsheets: standardize phone numbers, strip whitespace, etc., using regex and text processing.

Styling cells: fonts

Import Font and assign to cell.font.

from openpyxl.styles import Font
import openpyxl

wb = openpyxl.Workbook()
sheet = wb['Sheet']

italic_24_font = Font(size=24, italic=True)
sheet['A1'].font = italic_24_font
sheet['A1'] = 'Hello, world!'

wb.save('styles3.xlsx')

Useful Font keyword args:

  • name: 'Calibri', 'Times New Roman', ...
  • size: integer point size.
  • bold: True/False.
  • italic: True/False.

Example with two styles:

bold_font = Font(name='Times New Roman', bold=True)
sheet['A1'].font = bold_font
sheet['A1'] = 'Bold Times New Roman'

italic_font = Font(size=24, italic=True)
sheet['B3'].font = italic_font
sheet['B3'] = '24 pt Italic'
wb.save('styles3.xlsx')

Formulas

Formulas are just strings starting with =; openpyxl writes them, Excel calculates them.

Set a formula:

sheet['B9'] = '=SUM(B1:B8)'

Simple example:

import openpyxl

wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 200
sheet['A2'] = 300
sheet['A3'] = '=SUM(A1:A2)'
wb.save('writeFormula3.xlsx')

openpyxl does not evaluate the formula; Excel does when you open/save the file.

If you open in Excel, save, then reload with data_only=True, you can read computed values instead of formula text:

# After opening writeFormula3.xlsx in Excel and saving it
wb = openpyxl.load_workbook('writeFormula3.xlsx', data_only=True)
sheet = wb['Sheet']
print(sheet['A3'].value) # 500 (not '=SUM(A1:A2)')

Overall idea of the chapter

Chapter 14 shows how to automate Excel tasks with openpyxl: reading workbooks/sheets/cells, building data structures from spreadsheet rows (census project), writing and updating cells (produce project), styling with fonts, and writing formulas. The key pattern is: load workbook → loop rows → read or write cells → save.