How to Convert HTML to Markdown: 3 Methods

May 2, 2026 · 5 min read

Converting HTML to markdown strips away HTML tags and produces clean, readable plain text with markdown formatting. You can convert HTML to markdown using our free online tool, the Turndown JavaScript library, or Pandoc on the command line. This tutorial covers each method for migrating CMS content, cleaning up web pages, and simplifying documentation.

Why Convert HTML to Markdown?

HTML is verbose. Markdown is clean. Common reasons to convert include migrating blog content from WordPress or other CMS platforms to static site generators (Hugo, Jekyll, Gatsby), cleaning up copied web content for notes or documentation, converting email HTML templates to readable markdown drafts, moving Confluence or SharePoint pages to Git-based documentation, and simplifying HTML documentation for easier editing.

A typical HTML paragraph like <p>This is <strong>bold</strong> and <em>italic</em> text.</p> becomes This is **bold** and *italic* text. in markdown. The content is identical, but markdown is easier to read and edit.

Method 1: Convert HTML to Markdown Online (Free)

The fastest approach needs no installation:

Step 1: Open the HTML to Markdown converter.

Step 2: Paste your HTML content into the editor.

Step 3: The tool strips HTML tags and produces markdown output with headings, links, images, lists, bold, italic, and table formatting preserved.

The converter handles common HTML elements: <h1> through <h6> become # headings, <strong> becomes **bold**, <a href> becomes [text](url), <ul>/<li> become bullet lists, and <table> becomes a pipe-and-dash markdown table.

Method 2: Convert with Turndown (JavaScript)

Turndown is a JavaScript library purpose-built for HTML-to-markdown conversion. It runs in Node.js and browsers.

Basic usage:

import TurndownService from 'turndown';

// Turndown defaults to setext (underlined) headings; 'atx' gives '# Heading'.
const turndown = new TurndownService({ headingStyle: 'atx' });
const html = '<h1>Hello</h1><p>This is <strong>bold</strong> text.</p>';
const markdown = turndown.turndown(html);
console.log(markdown);
// # Hello
//
// This is **bold** text.

Install: npm install turndown

Configuring output style:

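const turndown = new TurndownService({
  headingStyle: 'atx',       // '# Heading' (default: 'setext', underlined)
  codeBlockStyle: 'fenced',  // fenced blocks (default: 'indented')
  bulletListMarker: '-',     // default: '*'
  emDelimiter: '*',          // default: '_'
});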

Adding GFM support (tables, strikethrough, task lists):

import TurndownService from 'turndown';
import { gfm } from 'turndown-plugin-gfm';

const turndown = new TurndownService();
turndown.use(gfm);

const html = '<table><tr><th>Name</th></tr><tr><td>Alice</td></tr></table>';
const markdown = turndown.turndown(html);

Install the GFM plugin: npm install turndown-plugin-gfm

Turndown handles edge cases well: nested lists, image alt text extraction, and code blocks with language identifiers. Tables are converted only when the GFM plugin is loaded, and simple tables convert most cleanly. In our testing, it produced the cleanest output of any JavaScript library for typical blog content.

Method 3: Convert with Pandoc (Command Line)

Pandoc converts HTML files to markdown from the command line:

pandoc input.html -f html -t markdown -o output.md

For GitHub Flavored Markdown output (with tables, task lists):

pandoc input.html -f html -t gfm -o output.md

Converting a live web page:

curl -s https://example.com/page | pandoc -f html -t gfm -o output.md

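Pandoc drops scripts and styles and preserves semantic content. Navigation menus, headers, and footers count as ordinary content, so expect to trim them from the output when converting full pages. It handles footnotes, definition lists, and metadata that simpler converters miss.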

What Gets Converted (and What Does Not)

| HTML Element | Markdown Output | Quality |
| --- | --- | --- |
| <h1> to <h6> | # to ###### | Excellent |
| <strong>/<b> | **text** | Excellent |
| <em>/<i> | *text* | Excellent |
| <a href> | [text](url) | Excellent |
| <img> | ![alt](src) | Good |
| <ul>/<ol>/<li> | - or 1. lists | Excellent |
| <table> | Pipe tables | Good |
| <pre><code> | Fenced code blocks | Good |
| CSS/JavaScript | Removed | Expected |

Visual styling, layout, and interactive elements do not survive the conversion. Markdown is a content format, not a layout format.

Migrating WordPress Content to Markdown

WordPress is the most common source for HTML-to-markdown migration. Here is a practical workflow:

Step 1: Export your WordPress content using the built-in XML export (Tools > Export).

Step 2: Convert the XML to individual markdown files using wordpress-export-to-markdown:

npx wordpress-export-to-markdown --input export.xml --output posts/

This tool extracts each post as a markdown file with YAML frontmatter (title, date, tags) and downloads images.

Step 3: Review and clean up each file. Common issues include shortcode remnants, WordPress-specific HTML classes, and broken image paths.
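For the shortcode remnants mentioned in Step 3, a small script can flag the lines that need attention. This is a rough sketch, assuming leftover shortcodes appear as bracketed tags such as [caption] or [gallery]. The function name findShortcodes and the default name list are illustrative and not part of any tool mentioned here, so extend the list with whatever shortcodes your plugins define.

```javascript
// Hypothetical helper: report leftover WordPress shortcodes in converted markdown.
// `names` is an assumed list of plain shortcode names (letters, digits, _ or -).
function findShortcodes(
  markdown,
  names = ['caption', 'gallery', 'embed', 'video', 'audio', 'playlist']
) {
  // Matches opening and closing tags, e.g. [caption id="x"] and [/caption].
  const pattern = new RegExp(`\\[/?(?:${names.join('|')})\\b[^\\]\\n]*\\]`, 'gi');
  return [...markdown.matchAll(pattern)].map((match) => ({
    // 1-based line number of the match
    line: markdown.slice(0, match.index).split('\n').length,
    text: match[0],
  }));
}

const post = [
  'Intro paragraph.',
  '[caption id="a1"]Photo[/caption]',
  '[Read more](https://example.com)',
].join('\n');

for (const hit of findShortcodes(post)) {
  console.log(`line ${hit.line}: ${hit.text}`);
}
```

Ordinary markdown links such as [Read more](url) are left alone because only the listed names are matched.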

For smaller migrations (under 20 posts), our online converter works well for converting posts one at a time.

Common Issues and Fixes

Problem: Extra blank lines in output. Some converters add excessive whitespace. Collapse runs of blank lines with a text editor's find-and-replace or a regex, for example replacing \n{3,} with \n\n.
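A minimal sketch of the regex cleanup, in plain JavaScript with no dependencies. The function name collapseBlankLines is made up for this example. It leaves trailing spaces on content lines alone, since two trailing spaces are a markdown line break, but it does not distinguish blank lines inside code blocks from any others.

```javascript
// Hypothetical helper: reduce any run of blank lines to a single blank line.
function collapseBlankLines(markdown) {
  return (
    markdown
      .replace(/\r\n/g, '\n')        // normalize Windows line endings
      .replace(/^[ \t]+$/gm, '')     // empty out whitespace-only lines
      .replace(/\n{3,}/g, '\n\n')    // 2+ blank lines -> 1 blank line
      .replace(/^\n+|\n+$/g, '') +   // drop leading/trailing blank lines
    '\n'                             // end the file with one newline
  );
}

const messy = '# Title\n\n\n\nFirst paragraph.\n   \n\n\nSecond paragraph.\n\n\n';
console.log(JSON.stringify(collapseBlankLines(messy)));
// "# Title\n\nFirst paragraph.\n\nSecond paragraph.\n"
```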

Problem: Tables are misformatted. Complex HTML tables with colspan or rowspan do not convert cleanly to markdown's simple pipe format. Simplify the table structure manually.
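Before converting a batch of pages, it can help to know which tables will need manual work. The sketch below is an assumption-laden regex scan rather than a real HTML parser: it flags tables that contain a colspan or rowspan greater than 1, and it does not handle nested tables. The name findSpannedTables is hypothetical.

```javascript
// Hypothetical helper: return the HTML of each table that uses merged cells.
function findSpannedTables(html) {
  // Non-greedy match per table; nested tables are not supported by this sketch.
  const tables = html.match(/<table\b[\s\S]*?<\/table>/gi) || [];
  return tables.filter((table) =>
    [...table.matchAll(/\b(?:colspan|rowspan)\s*=\s*["']?\s*(\d+)/gi)].some(
      (match) => Number(match[1]) > 1 // a span of 1 is a normal cell
    )
  );
}

const page =
  '<table><tr><th>Name</th></tr><tr><td>Alice</td></tr></table>' +
  '<table><tr><th colspan="2">Totals</th></tr><tr><td>1</td><td>2</td></tr></table>';

console.log(findSpannedTables(page).length); // 1
```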

Problem: Images have empty alt text. If the source HTML lacks alt attributes, the markdown output shows ![](url). Add descriptive alt text for accessibility.
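To locate the images that still need alt text, a short scan of the converted markdown is usually enough. This is a sketch under simple assumptions: inline image syntax only, URLs without parentheses or spaces, and no attempt to skip code blocks. The name findImagesMissingAlt is illustrative.

```javascript
// Hypothetical helper: list images whose alt text is empty or whitespace-only.
function findImagesMissingAlt(markdown) {
  const results = [];
  markdown.split('\n').forEach((line, index) => {
    // ![](url) or ![ ](url "optional title")
    for (const match of line.matchAll(/!\[\s*\]\(([^)\s]+)[^)]*\)/g)) {
      results.push({ line: index + 1, url: match[1] });
    }
  });
  return results;
}

const doc = 'Intro\n![](hero.png)\n![Logo](logo.png)';
console.log(findImagesMissingAlt(doc));
// [ { line: 2, url: 'hero.png' } ]
```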

Summary

Converting HTML to markdown simplifies web content for editing, version control, and platform migration. Our online tool handles one-off conversions instantly. Turndown provides a fast JavaScript library for web apps and batch processing. Pandoc converts from the command line and supports advanced markdown flavors. For WordPress migrations, dedicated tools extract posts as markdown files with frontmatter and images. Start with the converter above, then use the markdown cheat sheet to format your converted content.

Written by the Markdown Editor Online team. Last updated May 2026.