Chapter 14 - Excel Spreadsheets (Python)
Here's a concise walkthrough of the main ideas in Chapter 14, each with a small example.
Intro and setup
- Goal: use
openpyxlto read/write.xlsxfiles from Excel or LibreOffice Calc, automating repetitive spreadsheet tasks. openpyxlworks on files, not the live Excel app; you must close the workbook in Excel (or download/re-upload in Office 365) before/after running scripts.- Install with
pip install openpyxl; examples useexample3.xlsx,censuspopdata.xlsx,produceSales3.xlsx, etc., from the book resources.
Reading Excel files
Workbooks, sheets, cells
- A spreadsheet file is a workbook (
.xlsx). - Each workbook has sheets (worksheets); one is the active sheet.
- Sheets have lettered columns (A, B, ...) and numbered rows (1, 2, ...).
- A cell is identified by its address, like
B3.
The example workbook example3.xlsx has sheets Sheet1, Sheet2, Sheet3, with Sheet1 containing a small table of dates, fruit, and quantities.
Opening a workbook
import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
print(type(wb)) # <class 'openpyxl.workbook.workbook.Workbook'>
Getting sheets from the workbook
wb.sheetnames # ['Sheet1', 'Sheet2', 'Sheet3']
sheet = wb['Sheet3'] # Worksheet object
print(sheet.title) # 'Sheet3'
active_sheet = wb.active # last viewed sheet in Excel
print(active_sheet.title)
Sheets are Worksheet objects, accessible like dict keys by name, or via wb.active.
Getting cells from a sheet
import openpyxl
wb = openpyxl.load_workbook('example3.xlsx')
sheet = wb['Sheet1']
print(sheet['A1']) # Cell object
print(sheet['A1'].value) # datetime(2035, 4, 5, 13, 34, 2)
c = sheet['B1']
print(c.value) # 'Apples'
print(f'Row {c.row}, Column {c.column} is {c.value}')
print(f'Cell {c.coordinate} is {c.value}')
print(sheet['C1'].value) # 73
Dates are returned as datetime objects, not strings.
You can also use sheet.cell(row=..., column=...) with 1-based indices:
print(sheet.cell(row=1, column=2).value) # 'Apples'
for i in range(1, 8, 2): # every other row
print(i, sheet.cell(row=i, column=2).value)
Size of the used area:
print(sheet.max_row) # 7
print(sheet.max_column) # 3
Converting between column letters and numbers
from openpyxl.utils import get_column_letter, column_index_from_string
print(get_column_letter(1)) # 'A'
print(get_column_letter(27)) # 'AA'
print(get_column_letter(900)) # 'AHP'
print(column_index_from_string('A')) # 1
print(column_index_from_string('AA')) # 27
You can also use get_column_letter(sheet.max_column) to get the last column's letter.
Getting ranges, rows, and columns
A rectangular slice:
sheet = wb['Sheet1']
for row_cells in sheet['A1':'C3']:
for cell in row_cells:
print(cell.coordinate, cell.value)
print('--- END OF ROW ---')
You can also use sheet.rows and sheet.columns (convert to list first):
second_col = list(sheet.columns)[1]
for cell in second_col:
print(cell.value) # Apples, Cherries, ...
list(sheet.rows) is a list of row tuples; list(sheet.columns) is a list of column tuples.
Review checklist
To read a cell, the steps are:
import openpyxlwb = openpyxl.load_workbook(...)- Get sheet via
wb.activeorwb[sheetname]. - Get a
Worksheetobject. - Use indexing (
sheet['B3']) orsheet.cell(row=, column=). - Get a
Cellobject. - Read
cell.value.
Project: Gather Census Statistics
Task: from censuspopdata.xlsx, count for each county:
- Number of census tracts.
- Total population.
Spreadsheet Population by Census Tract has columns: A (tract number), B (state abbreviation), C (county name), D (population).
Step 1: Read spreadsheet data
# readCensusExcel.py
import openpyxl, pprint
print('Opening workbook...')
wb = openpyxl.load_workbook('censuspopdata.xlsx')
sheet = wb['Population by Census Tract']
county_data = {}
print('Reading rows...')
for row in range(2, sheet.max_row + 1): # skip header
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
# will fill county_data
Step 2: Populate nested dict structure
county_data structure:
{
'AK': {
'Anchorage': {'tracts': 55, 'pop': 291826},
'Bethel': {'tracts': 3, 'pop': 17013},
},
}
Code:
for row in range(2, sheet.max_row + 1):
state = sheet['B' + str(row)].value
county = sheet['C' + str(row)].value
pop = sheet['D' + str(row)].value
county_data.setdefault(state, {})
county_data[state].setdefault(county, {'tracts': 0, 'pop': 0})
county_data[state][county]['tracts'] += 1
county_data[state][county]['pop'] += int(pop)
setdefault ensures nested dictionaries exist before incrementing.
Step 3: Write results to a Python file
Use pprint.pformat to dump the dict as valid Python code.
print('Writing results...')
with open('census2010.py', 'w', encoding='utf-8') as result_file:
result_file.write('allData = ' + pprint.pformat(county_data))
print('Done.')
Then you can import and use it:
import census2010
anchorage = census2010.allData['AK']['Anchorage']
print(anchorage['pop'], anchorage['tracts'])
Ideas for similar programs
Use this pattern (load sheet → loop rows → build structure) to:
- Compare rows within one sheet.
- Compare across multiple spreadsheets.
- Validate data (blank rows, invalid cells).
- Use spreadsheet data as input for other Python programs.
Writing Excel documents
Creating and saving workbooks
import openpyxl
wb = openpyxl.Workbook() # new blank workbook
print(wb.sheetnames) # ['Sheet']
sheet = wb.active
print(sheet.title) # 'Sheet'
sheet.title = 'Spam Bacon Eggs Sheet'
wb.save('example_new.xlsx')
When editing existing files, always save to a different filename to avoid corrupting the original; Excel must not have the file open when you call save().
Creating and deleting sheets
wb = openpyxl.Workbook()
print(wb.sheetnames) # ['Sheet']
wb.create_sheet() # appended, name 'Sheet1'
wb.create_sheet(index=0, title='First Sheet')
wb.create_sheet(index=2, title='Middle Sheet')
print(wb.sheetnames) # ['First Sheet', 'Sheet', 'Middle Sheet', 'Sheet1']
del wb['Middle Sheet']
del wb['Sheet1']
print(wb.sheetnames) # ['First Sheet', 'Sheet']
wb.save('multi_sheets.xlsx')
Writing values to cells
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 'Hello, world!'
print(sheet['A1'].value) # 'Hello, world!'
wb.save('hello.xlsx')
You assign directly to sheet['A1'] or to sheet.cell(row=..., column=...).value.
Project: Update a Spreadsheet
produceSales3.xlsx has rows: Produce (A), Cost per Pound (B), Pounds Sold (C), TOTAL (D) as a formula like =ROUND(B2*C2, 2).
Task: fix incorrect prices for Garlic, Celery, Lemon in thousands of rows.
Step 1: Data structure for updated prices
# updateProduce.py
import openpyxl
wb = openpyxl.load_workbook('produceSales3.xlsx')
sheet = wb['Sheet']
PRICE_UPDATES = {
'Garlic': 3.07,
'Celery': 1.19,
'Lemon': 1.27,
}
Step 2: Loop rows and update
for row_num in range(2, sheet.max_row + 1): # skip header
produce_name = sheet.cell(row=row_num, column=1).value
if produce_name in PRICE_UPDATES:
sheet.cell(row=row_num, column=2).value = PRICE_UPDATES[produce_name]
wb.save('updatedProduceSales3.xlsx')
Because D uses formulas, Excel recalculates totals automatically when B changes.
Ideas for similar programs
- Copy data between spreadsheets.
- Pull data from websites/files/clipboard into Excel.
- "Clean" spreadsheets: standardize phone numbers, strip whitespace, etc., using regex and text processing.
Styling cells: fonts
Import Font and assign to cell.font.
from openpyxl.styles import Font
import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
italic_24_font = Font(size=24, italic=True)
sheet['A1'].font = italic_24_font
sheet['A1'] = 'Hello, world!'
wb.save('styles3.xlsx')
Useful Font keyword args:
name:'Calibri','Times New Roman', ...size: integer point size.bold:True/False.italic:True/False.
Example with two styles:
bold_font = Font(name='Times New Roman', bold=True)
sheet['A1'].font = bold_font
sheet['A1'] = 'Bold Times New Roman'
italic_font = Font(size=24, italic=True)
sheet['B3'].font = italic_font
sheet['B3'] = '24 pt Italic'
wb.save('styles3.xlsx')
Formulas
Formulas are just strings starting with =; openpyxl writes them, Excel calculates them.
Set a formula:
sheet['B9'] = '=SUM(B1:B8)'
Simple example:
import openpyxl
wb = openpyxl.Workbook()
sheet = wb['Sheet']
sheet['A1'] = 200
sheet['A2'] = 300
sheet['A3'] = '=SUM(A1:A2)'
wb.save('writeFormula3.xlsx')
openpyxl does not evaluate the formula; Excel does when you open/save the file.
If you open in Excel, save, then reload with data_only=True, you can read computed values instead of formula text:
# After opening writeFormula3.xlsx in Excel and saving it
wb = openpyxl.load_workbook('writeFormula3.xlsx', data_only=True)
sheet = wb['Sheet']
print(sheet['A3'].value) # 500 (not '=SUM(A1:A2)')
Overall idea of the chapter
Chapter 14 shows how to automate Excel tasks with openpyxl: reading workbooks/sheets/cells, building data structures from spreadsheet rows (census project), writing and updating cells (produce project), styling with fonts, and writing formulas. The key pattern is: load workbook → loop rows → read or write cells → save.