4 Effective Methods to Extract HTML Tables to Excel (Manual & Automated)

2025-08-20 08:28:17 zaki zou

Extract HTML Tables to Excel

Overview

Extracting HTML tables into Excel is a common requirement for data analysts, researchers, developers, and business professionals who frequently work with structured web data. HTML tables often contain valuable information such as financial reports, product catalogs, research results, or performance statistics. However, transferring that data into Excel in a clean and usable format can be tricky—especially when dealing with complex tables that include merged cells (rowspan, colspan), nested headers, or large datasets.

Fortunately, there are multiple approaches to convert HTML tables into Excel files. These methods range from quick, manual copy-paste actions suitable for small tasks to fully automated scripts using VBA or Python for large-scale or recurring jobs.

In this article, we’ll explore four effective methods for extracting HTML tables to Excel:

  1. Manual Copy-Paste (simplest method)
  2. Excel’s Built-in “From Web” Feature
  3. VBA Macro (Excel Automation)
  4. Python (BeautifulSoup + Spire.XLS)

Finally, we’ll compare these approaches in a summary table to help you choose the best method based on your use case.

Manual Copy-Paste (Simplest Method)

For small, one-off extractions, the simplest option is to use copy and paste directly from your browser into Excel.

Copy an HTML table

Steps:

  1. Open the HTML page in a browser (e.g., Chrome, Edge, or Firefox).
  2. Highlight the table you want to extract.
  3. Copy it with Ctrl+C (or right-click → Copy).
  4. Open Excel and paste with Ctrl+V .

Pros:

  • Extremely simple—no setup or coding required.
  • Works instantly for small, clean tables.

Cons:

  • Manual process—tedious and inefficient for frequent or large datasets.
  • Doesn’t always preserve merged cells or formatting.
  • Cannot handle dynamic (JavaScript-rendered) tables reliably.

When to use : Best suited for small tables, ad-hoc data collection, or quick testing.

Excel’s Built-in “From Web” Feature

Excel includes a powerful “Get & Transform Data” tool (formerly Power Query) that allows users to pull tables directly from a web page.

Excel's from web feature

Steps:

  1. Open Excel.
  2. Go to DataFrom Web .
  3. Enter the URL of the webpage containing the table.
  4. Excel will display detected tables; select the one you want.
  5. Load the data into your worksheet.

Pros:

  • Direct integration into Excel—no external tools required.
  • Works well for structured HTML tables.
  • Supports refresh—can re-pull updated data from the same source.

Cons:

  • Limited support for dynamic or JavaScript-rendered content.
  • Sometimes fails to detect complex tables.
  • Requires internet access and valid URL (not for local HTML files unless imported manually).

When to use : Best for analysts pulling live, structured data from websites that are updated regularly.

VBA Macro (Excel Automation)

For users who frequently extract HTML tables and want more control, VBA (Visual Basic for Applications) provides an excellent solution. VBA allows you to fetch tables from a URL and correctly process merged cells, something basic copy-paste cannot handle.

VBA to extract html table to excel

Steps:

  1. Launch Microsoft Excel.
  2. Press Alt + F11 to open the VBA editor.
  3. Right-click the project explorer → InsertModule .
  4. Paste the provided VBA code.
  5. Close the VBA editor.
  6. Press Alt + F8 , select the macro name, and click Run .

Sample VBA Code:

Sub ExtractHTMLTableWithProperMerging()
    Dim html As Object, tables As Object, table As Object, row As Object, cell As Object
    Dim ws As Worksheet
    Dim iRow As Long, iCol As Long, realCol As Long
    Dim url As String
    Dim colspan As Integer, rowspan As Integer
    Dim cellTracker() As Boolean ' Track occupied cells

    ' Set target worksheet
    Set ws = ThisWorkbook.Sheets("Sheet1")
    ws.Cells.ClearContents
    ws.Cells.UnMerge ' Clear any existing merged cells

    ' Get URL input
    url = InputBox("Enter webpage URL:", "HTML Table Extractor")
    If url = "" Then Exit Sub

    ' Load HTML
    Set html = CreateObject("htmlfile")
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        html.body.innerHTML = .responseText
    End With

    ' Get first table (change index if needed)
    Set tables = html.getElementsByTagName("table")
    If tables.Length = 0 Then
        MsgBox "No tables found!", vbExclamation
        Exit Sub
    End If
    Set table = tables(0)

    ' Initialize cell tracker array
    Dim maxRows As Long, maxCols As Long
    maxRows = table.Rows.Length
    maxCols = 0
    For Each row In table.Rows
        If row.Cells.Length > maxCols Then maxCols = row.Cells.Length
    Next
    ReDim cellTracker(1 To maxRows, 1 To maxCols)

    ' Process table
    iRow = 1
    For Each row In table.Rows
        realCol = 1 ' Track actual column position accounting for rowspans

        ' Find first available column in this row
        While realCol <= maxCols And cellTracker(iRow, realCol)
            realCol = realCol + 1
        Wend

        iCol = 1 ' Track logical column position
        For Each cell In row.Cells
            ' Get merge attributes
            colspan = 1
            rowspan = 1
            On Error Resume Next ' In case attributes don't exist
            colspan = cell.colspan
            rowspan = cell.rowspan
            On Error GoTo 0

            ' Skip already occupied cells (from rowspan above)
            While realCol <= maxCols And cellTracker(iRow, realCol)
                realCol = realCol + 1
            Wend

            If realCol > maxCols Then Exit For

            ' Write value
            ws.Cells(iRow, realCol).Value = cell.innerText

            ' Mark all cells that will be occupied by this cell
            Dim r As Long, c As Long
            For r = iRow To iRow + rowspan - 1
                For c = realCol To realCol + colspan - 1
                    If r <= maxRows And c <= maxCols Then
                        cellTracker(r, c) = True
                    End If
                Next c
            Next r

            ' Merge cells if needed
            If colspan > 1 Or rowspan > 1 Then
                With ws.Range(ws.Cells(iRow, realCol), ws.Cells(iRow + rowspan - 1, realCol + colspan - 1))
                    .Merge
                    .HorizontalAlignment = xlCenter
                    .VerticalAlignment = xlCenter
                End With
            End If

            realCol = realCol + colspan
            iCol = iCol + 1
        Next cell
        iRow = iRow + 1
    Next row

    ' Formatting
    ws.UsedRange.Columns.AutoFit
    ws.UsedRange.Borders.Weight = xlThin
    MsgBox "Table extracted with proper merging!", vbInformation
End Sub

Pros:

  • Runs entirely within Excel—no external tools required.
  • Handles complex tables with merged cells.
  • Can be customized for multiple tables or scheduled execution.

Cons:

  • Setup requires knowledge of VBA.
  • Cannot handle JavaScript-rendered data without extra steps.
  • Only works in Excel desktop (not Excel Online).

When to use : Perfect for users who regularly extract similar tables and want a one-click solution.

Python (BeautifulSoup & Spire.XLS)

For developers or power users, Python provides the most flexible, scalable, and automated solution. With libraries like BeautifulSoup for parsing HTML and Spire.XLS for Python for Excel manipulation, you can programmatically fetch, clean, and export tables with full control.

Steps:

  1. Install Python (3.8+ recommended).
  2. Create a new project in your IDE (e.g., VS Code, PyCharm).
  3. Install dependencies:
pip install requests beautifulsoup4 spire.xls
  1. Copy and run the following script.

Python Code:

import requests

from bs4 import BeautifulSoup
from spire.xls import Workbook, ExcelVersion

# Get HTML string from url
response = requests.get("https://cdn.e-iceblue.com/resource/sample.html")
html = response.text

# Parse HTML
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table")  # Get the first table

# Initialize Excel
workbook = Workbook()
sheet = workbook.Worksheets[0]

# Track merged cells to skip them later
skip_cells = set()

# Loop through HTML rows and cells
for row_idx, row in enumerate(table.find_all("tr")):
    col_idx = 1  # Excel columns start at 1
    for cell in row.find_all(["th", "td"]):
        # Skip already merged cells
        while (row_idx + 1, col_idx) in skip_cells:
            col_idx += 1

        # Get colspan/rowspan values (default to 1 if not present)
        colspan = int(cell.get("colspan", 1))
        rowspan = int(cell.get("rowspan", 1))

        # Write cell value to Excel
        sheet.Range[row_idx + 1, col_idx].Text = cell.get_text(strip=True)

        # Merge cells if colspan/rowspan > 1
        if colspan > 1 or rowspan > 1:
            end_row = row_idx + rowspan
            end_col = col_idx + colspan - 1
            sheet.Range[row_idx + 1, col_idx, end_row, end_col].Merge()

            # Mark merged cells to skip
            for r in range(row_idx + 1, end_row + 1):
                for c in range(col_idx, end_col + 1):
                    if r != row_idx + 1 or c != col_idx:  # Skip the main cell
                        skip_cells.add((r, c))

        col_idx += colspan

# Auto fit column width in all used range
sheet.AllocatedRange.AutoFitColumns()

# Save to Excel
workbook.SaveToFile("TableToExcel.xlsx", ExcelVersion.Version2016)
workbook.Dispose()

Pros:

  • Full control—can parse, clean, and transform data.
  • Handles merged cells properly.
  • Easily scalable to multiple tables or websites.
  • Automatable for scheduled tasks or batch jobs.

Cons:

  • Requires Python installation and basic programming knowledge.
  • More setup than built-in Excel solutions.
  • External dependencies (BeautifulSoup, Spire.XLS).

When to use : Best for developers or advanced users extracting large or complex tables regularly.

Output:

Python Extract HTML Tables to Excel

To enhance the visual appeal of the generated Excel worksheet in Python, you can apply styles to cells or worksheets in Excel.

Summary Table: Best Method by Use Case

Method Best For Pros Cons Automation?
Manual Copy-Paste Quick, one-time use Fast, no setup No automation, formatting issues ❌No
Excel From Web Live structured data Integrated, supports refresh Limited for dynamic tables ❌No
VBA Macro Repeated tasks in Excel Automates extraction, handles merges Requires VBA knowledge ✅Yes
Python (BeautifulSoup + Spire.XLS) Developers, large/complex tables Full control, scalable, automatable Requires coding & dependencies ✅Yes

Final Thoughts

The method you choose depends largely on your use case :

  • If you only need to grab a small table occasionally, manual copy-paste is the fastest.
  • If you want to pull structured data from a webpage that updates frequently, Excel’s From Web is convenient.
  • For business users who work in Excel daily and want automation, a VBA macro is ideal.
  • For developers handling multiple datasets or complex HTML structures, Python with BeautifulSoup and Spire.XLS provides the most flexibility and scalability.

By combining these methods with your workflow, you can save hours of manual effort and ensure cleaner, more reliable data extraction into Excel.

See Also