This document is for people who are unfamiliar with command line tools. Command-line experts can go straight to the User’s Guide or the pandoc man page.
First, install pandoc, following the instructions for your platform.
Pandoc is a command-line tool. There is no graphical user interface. So, to use it, you’ll need to open a terminal window:
- On OS X, the Terminal application can be found in `/Applications/Utilities`. Open a Finder window and go to `Applications`, then `Utilities`. Then double click on `Terminal`. (Or, click the spotlight icon in the upper right hand corner of your screen and type `Terminal`; you should see `Terminal` under `Applications`.)
- On Windows, you can use either the classic command prompt or the more modern PowerShell terminal. If you use Windows in desktop mode, run the `cmd` or `powershell` command from the Start menu. If you use the Windows 8 start screen instead, simply type `cmd` or `powershell`, and then run either the “Command Prompt” or “Windows Powershell” application. If you are using `cmd`, type `chcp 65001` before using pandoc, to set the encoding to UTF-8.
- On Linux, there are many possible configurations, depending on what desktop environment you’re using:
  - In Unity, use the search function on the `Dash`, and search for `Terminal`. Or, use the keyboard shortcut `Ctrl-Alt-T`.
  - In Gnome, go to `Applications`, then `Accessories`, and select `Terminal`, or use `Ctrl-Alt-T`.
  - In XFCE, go to `Applications`, then `System`, then `Terminal`, or use `Super-T`.
  - In KDE, go to `KMenu`, then `System`, then `Terminal Program (Konsole)`.
You should now see a rectangle with a “prompt” (possibly just a symbol like `%`, but probably including more information, such as your username and directory), and a blinking cursor.
Let’s verify that pandoc is installed. Type `pandoc --version` and hit enter. You should see a message telling you which version of pandoc is installed, and giving you some additional information.
First, let’s see where we are. Type `pwd` on Linux or OSX, or `echo %cd%` on Windows, and hit enter. Your terminal should print your current working directory. (Guess what `pwd` stands for?) This should be your home directory.
Let’s navigate now to our `Documents` directory: type `cd Documents` and hit enter. Now type `pwd` (or `echo %cd%` on Windows) again. You should be in the `Documents` subdirectory of your home directory. To go back to your home directory, you could type `cd ..`. The `..` means “one level up.”
Go back to your `Documents` directory if you’re not there already. Let’s try creating a subdirectory called `pandoc-test`: type `mkdir pandoc-test` and hit enter.
Now change to the `pandoc-test` directory with `cd pandoc-test`.
If the prompt doesn’t tell you what directory you’re in, you can confirm that you’re there by doing `pwd` (or `echo %cd%`) again.
OK, that’s all you need to know for now about using the terminal. But here’s a secret that will save you a lot of typing. You can always type the up-arrow key to go back through your history of commands. So if you want to use a command you typed earlier, you don’t need to type it again: just use up-arrow until it comes up. Try this. (You can use down-arrow as well, to go the other direction.) Once you have the command, you can also use the left and right arrows and the backspace/delete key to edit it.
Most terminals also support tab completion of directories and filenames. To try this, let’s first go back up to our `Documents` directory with `cd ..`.
Now, type `cd pandoc-` and hit the tab key instead of enter. Your terminal should fill in the rest (`test`), and then you can hit enter.
To review:
- `pwd` (or `echo %cd%` on Windows) to see what the current working directory is.
- `cd foo` to change to the `foo` subdirectory of your working directory.
- `cd ..` to move up to the parent of the working directory.
- `mkdir foo` to create a subdirectory called `foo` in the working directory.
- up-arrow to go back through your command history.
- tab to complete directories and file names.
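The commands in this review can be strung together in one short session. The sketch below uses the Linux/OS X forms and runs in a scratch directory created with `mktemp -d`. That scratch directory is an assumption made here, so that nothing in your real `Documents` folder is touched.

```shell
# Work in a throwaway directory so the example leaves your files alone
scratch="$(mktemp -d)"
cd "$scratch"

mkdir pandoc-test      # create a subdirectory
cd pandoc-test         # move into it
inside="$(pwd)"        # remember the current working directory
echo "$inside"

cd ..                  # move back up one level
pwd
```

On Windows, `echo %cd%` takes the place of `pwd`, as described above.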
Type `pandoc` and hit enter. You should see the cursor just sitting there, waiting for you to type something. Type this:
When you’re finished (the cursor should be at the beginning of the line), type `Ctrl-D` on OS X or Linux, or `Ctrl-Z` followed by `Enter` on Windows. You should now see your text converted to HTML!
What just happened? When pandoc is invoked without specifying any input files, it operates as a “filter,” taking input from the terminal and sending its output back to the terminal. You can use this feature to play around with pandoc.
By default, input is interpreted as pandoc markdown, and output is HTML. But we can change that. Let’s try converting from HTML to markdown with `pandoc -f html -t markdown`.
Now type:
and hit `Ctrl-D` (or `Ctrl-Z` followed by `Enter` on Windows). You should see:
Now try converting something from markdown to LaTeX. What command do you think you should use?
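Filter mode also works without typing at the prompt, because text can be piped into pandoc. Here is a small sketch, assuming a Unix-like shell with `printf`. The sample sentence is only an illustration, and the commands are skipped if pandoc is not on your `PATH`.

```shell
# Run only if pandoc is available on this machine
if command -v pandoc >/dev/null 2>&1; then
    # markdown (the default input) to HTML (the default output)
    html="$(printf 'Hello *pandoc*!\n' | pandoc)"
    echo "$html"

    # HTML back to markdown, naming both formats explicitly
    md="$(printf '<p>Hello <em>pandoc</em>!</p>\n' | pandoc -f html -t markdown)"
    echo "$md"
fi
```

The first command should print an HTML paragraph with the word in `<em>` tags, and the second should print the markdown sentence again.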
You’ll probably want to use pandoc to convert a file, not to read text from the terminal. That’s easy, but first we need to create a text file in our `pandoc-test` subdirectory.
Important: To create a text file, you’ll need to use a text editor, not a word processor like Microsoft Word. On Windows, you can use Notepad (in `Accessories`). On OS X, you can use `TextEdit` (in `Applications`). On Linux, different platforms come with different text editors: Gnome has `GEdit`, and KDE has `Kate`.
Start up your text editor. Type the following:
Now save your file as `test1.md` in the directory `Documents/pandoc-test`.
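If you would rather stay in the terminal, a file like this can also be written with a shell here-document. This is a sketch for Linux or OS X, assuming it is run from your `Documents` directory. The file content below is placeholder text chosen for illustration, not necessarily what the tutorial asks you to type.

```shell
# Create the working directory if it does not exist yet
mkdir -p pandoc-test

# Write a small markdown file; the content is placeholder text
cat > pandoc-test/test1.md <<'EOF'
# Test

This is a test of *pandoc*.

- item one
- item two
EOF

# Show what was written
cat pandoc-test/test1.md
```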
Note: If you use plain text a lot, you’ll want a better editor than `Notepad` or `TextEdit`. You might want to look at Sublime Text or (if you’re willing to put in some time learning an unfamiliar interface) Vim or Emacs.
Go back to your terminal. We should still be in the `Documents/pandoc-test` directory. Verify that with `pwd`.
Now type `ls` (or `dir` if you’re on Windows). This will list the files in the current directory. You should see the file you created, `test1.md`.
To convert it to HTML, use this command: `pandoc test1.md -f markdown -t html -s -o test1.html`
The filename `test1.md` tells pandoc which file to convert. The `-s` option says to create a “standalone” file, with a header and footer, not just a fragment. And the `-o test1.html` says to put the output in the file `test1.html`. Note that we could have omitted `-f markdown` and `-t html`, since the default is to convert from markdown to HTML, but it doesn’t hurt to include them.
Check that the file was created by typing `ls` again. You should see `test1.html`. Now open this in a browser. On OS X, you can type `open test1.html`. On Windows, type `.\test1.html`. You should see a browser window with your document.
To create a LaTeX document, you just need to change the command slightly: `pandoc test1.md -f markdown -t latex -s -o test1.tex`. Try opening `test1.tex` in your text editor.
Pandoc can often figure out the input and output formats from the filename extensions. So, you could have just used `pandoc test1.md -s -o test1.tex`. Pandoc knows you’re trying to create a LaTeX document, because of the `.tex` extension.
Now try creating a Word document (with extension `docx`).
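Because the output format is inferred from the extension, several outputs can be produced in one loop. This is a sketch, assuming `test1.md` is in the current directory. If it is missing, a short placeholder file is created so the loop has something to convert.

```shell
# Assumes test1.md exists in the current directory; create a stub if not
[ -f test1.md ] || printf '# Test\n\nThis is a test of *pandoc*.\n' > test1.md

# Run only if pandoc is available on this machine
if command -v pandoc >/dev/null 2>&1; then
    # pandoc picks the output format from each file extension
    for ext in html tex docx; do
        pandoc test1.md -s -o "test1.$ext"
    done
    ls test1.*
fi
```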
If you want to create a PDF, you’ll need to have LaTeX installed. (See MacTeX on OS X, MiKTeX on Windows, or install the texlive package on Linux.) Then do `pandoc test1.md -s -o test1.pdf`.
You now know the basics. Pandoc has a lot of options. At this point you can start to learn more about them by reading the User’s Guide.
Here’s an example. The `--mathml` option causes pandoc to convert TeX math into MathML. Type `pandoc --mathml`, then enter this text, followed by `Ctrl-D` (`Ctrl-Z` followed by `Enter` on Windows):
Now try the same thing without `--mathml`. See the difference in output?
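The same comparison can be run non-interactively. In this sketch the formula `$x = y^2$` is an arbitrary sample picked for illustration, and the commands are skipped if pandoc is not installed.

```shell
# Run only if pandoc is available on this machine
if command -v pandoc >/dev/null 2>&1; then
    # With --mathml, TeX math is emitted as MathML markup
    with_mathml="$(printf '$x = y^2$\n' | pandoc --mathml)"
    # Without it, pandoc uses its default HTML rendering of math
    without_mathml="$(printf '$x = y^2$\n' | pandoc)"

    echo "$with_mathml"
    echo "$without_mathml"
fi
```

Only the first output should contain a `<math>` element.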
If you forget an option, or forget which formats are supported, you can always do `pandoc --help` to get a list of all the supported options.
On OS X or Linux systems, you can also do `man pandoc` to get the pandoc manual page. All of this information is also in the User’s Guide.
If you get stuck, you can always ask questions on the pandoc-discuss mailing list. But be sure to check the FAQs first, and search through the mailing list to see if your question has been answered before.
Either you've already heard of `pandoc`, or, if you have searched online for `markdown` to `pdf` or similar, you are sure to have come across it. This tutorial will help you use `pandoc` to generate `pdf` and `epub` from a GitHub style markdown file. The main motivation for this blog post is to highlight the customizations I made to generate `pdf` and `epub` versions for self-publishing my ebooks. It wasn't easy to arrive at the set-up I ended up with, so I hope this will be useful for those looking to use `pandoc` to generate `pdf` and `epub` formats. This guide is specifically aimed at technical books that have code snippets.
Installation
If you use a Debian-based distro like Ubuntu, the steps below are enough for the demos in this tutorial. If you get an error or warning, search for that issue online and you'll likely find what else has to be installed.
I first downloaded the `deb` file from pandoc: releases and installed it, followed by the packages needed for `pdf` generation.
For more details, and a guide for other operating systems, refer to pandoc: installation.
Minimal example
Once `pandoc` is working on your system, try generating a sample `pdf` without any customization: `pandoc sample_1.md -f gfm -o sample_1.pdf`
See the learnbyexample.github.io repo for all the input and output files referred to in this tutorial.
Here `sample_1.md` is the input markdown file and `-f` is used to specify that the input format is GitHub style markdown. The `-o` option names the output file, and pandoc infers the output type from its extension. The default output is probably good enough. But I wished to customize hyperlinks, inline code style, add page breaks between chapters, etc. This blog post will discuss these customizations one by one.
`pandoc` has its own flavor of `markdown` with many useful extensions; see pandoc: pandocs-markdown for details. GitHub style markdown is recommended if you wish to use the same source (or with minor changes) in multiple places.
It is advised to use `markdown` headers in order without skipping. For example, `H1` for chapter heading and `H2` for chapter sub-section is fine; `H1` for chapter heading and `H3` for sub-section is not. Using the former can give automatic index navigation on ebook readers.
On Evince reader, the index navigation for above sample looks like this:
Chapter breaks
As observed from the previous demo, by default there are no chapter breaks. Searching for a solution online, I got this piece of `tex` code:
This can be added using the `-H` option. From the `pandoc` manual,
-H FILE, --include-in-header=FILE
Include contents of FILE, verbatim, at the end of the header. This can be used, for example, to include special CSS or JavaScript in HTML documents. This option can be used repeatedly to include multiple files in the header. They will be included in the order specified. Implies --standalone.
The `pandoc` invocation now looks like:
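Here is a minimal sketch of this step, under stated assumptions. The file name `chapter_break.tex` and the output file name are placeholders chosen here. The `tex` snippet assumes the `sectsty` package, which provides `\sectionfont`, and it may differ from the exact snippet the author found.

```shell
# Write the header snippet (hypothetical file name)
cat > chapter_break.tex <<'EOF'
\usepackage{sectsty}
\sectionfont{\clearpage}
EOF

# Include the snippet in the LaTeX header with -H.
# Skipped when pandoc or the input file is missing; producing the
# pdf additionally requires a LaTeX installation.
if command -v pandoc >/dev/null 2>&1 && [ -f sample_1.md ]; then
    pandoc sample_1.md -f gfm -H chapter_break.tex -o sample_1_chapter_break.pdf
fi
```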
You can add further customization to headings; for example, use `\sectionfont{\underline\clearpage}` to underline chapter names or `\sectionfont{\LARGE\clearpage}` to make chapter names even bigger. Here are some more links to read about various customizations:
Changing settings via -V option
-V KEY[=VAL], --variable=KEY[:VAL]
Set the template variable KEY to the value VAL when rendering the document in standalone mode. This is generally only useful when the --template option is used to specify a custom template, since pandoc automatically sets the variables used in the default templates. If no VAL is specified, the key will be given the value true.
The `-V` option lets you change variable values to customize settings like page size, font, link color, etc. As more settings are changed, it is better to use a simple script to call `pandoc` instead of typing the whole command in the terminal.
- `mainfont` is for normal text
- `monofont` is for code snippets
- `geometry` for page size and margins
- `linkcolor` to set hyperlink color
- to increase default font size, use `-V fontsize=12pt`
  - See stackoverflow: change font size if you need even bigger size options
Using `xelatex` as the `pdf-engine` allows you to use any font installed on the system. One reason I chose `DejaVu` was that it supports Greek and other Unicode characters that were causing errors with other fonts. See tex.stackexchange: Using XeLaTeX instead of pdfLaTeX for some more details.
The `pandoc` invocation is now through a script:
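A wrapper script along these lines would cover the variables listed above. This is a sketch, not the author's actual script. The function name `md2pdf` and the specific values (`blue`, `a4paper`, `2cm`) are assumptions, and the `DejaVu` fonts must be installed for `xelatex` to find them.

```shell
#!/bin/bash
# Hypothetical wrapper; usage: md2pdf INPUT.md OUTPUT.pdf
md2pdf() {
    pandoc "$1" \
        -f gfm \
        -V linkcolor:blue \
        -V geometry:a4paper \
        -V geometry:margin=2cm \
        -V mainfont="DejaVu Serif" \
        -V monofont="DejaVu Sans Mono" \
        --pdf-engine=xelatex \
        -o "$2"
}

# When invoked with two arguments, convert the first file into the second
if [ "$#" -eq 2 ]; then
    md2pdf "$1" "$2"
fi
```

Saved as an executable file, it would be called with the input and output names as its two arguments. Keeping the options in one place makes it easy to add further flags, such as a header file passed with `-H`.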
Do compare the pdf generated side by side with previous output before proceeding.
On my system, `DejaVu Serif` did not have the italic variation installed, so I had to use `sudo apt install ttf-dejavu-extra` to get it.
Syntax highlighting
One option to customize syntax highlighting for code snippets is to save one of the `pandoc` themes and edit it. See stackoverflow: What are the available syntax highlighters? for available themes and more details (as a good practice on stackoverflow, go through all answers and comments; the linked/related sections on the sidebar are useful as well).
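One way to save a theme is pandoc's `--print-highlight-style` option, which prints a built-in style as JSON. In this sketch the choice of the `pygments` style and the file name `pygments.theme` are assumptions based on the file name used later in this post. The option may not be available in very old pandoc releases.

```shell
# Run only if pandoc is available on this machine
if command -v pandoc >/dev/null 2>&1; then
    # Dump the built-in pygments style as an editable JSON theme file
    pandoc --print-highlight-style=pygments > pygments.theme
    # Peek at the first few lines of the theme
    head -n 5 pygments.theme
fi
```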
Edit the above file to customize the theme. Use sites like colorhexa to help with color choices, hex values, etc. For this demo, the below settings are changed:
Inline code
Similar to changing background color for code snippets, I found a solution online to change background color for inline code snippets.
Add `--highlight-style pygments.theme` and `--include-in-header inline_code.tex` to the script and generate the `pdf` again.
With `pandoc sample_2.md -f gfm -o sample_2.pdf` the output would be:
With `./md2pdf_syn.sh sample_2.md sample_2_syn.pdf` the output is:
For my Python re(gex)? book, I found by chance that using `ruby` instead of `python` as the language for REPL code snippets gave better syntax highlighting. A snapshot of the `./md2pdf_syn.sh sample_3.md sample_3.pdf` result is shown below. With the `python` directive, string output gets treated as a comment, and the color for boolean values isn't easy to distinguish from string values. The `ruby` directive treats string values as expected, and boolean values are easier to spot.
Bullet styling
This stackoverflow Q&A helped with bullet styling.
Comparing `pandoc sample_4.md -f gfm -o sample_4.pdf` vs `./md2pdf_syn_bullet.sh sample_4.md sample_4_bullet.pdf` gives:
PDF properties
This tex.stackexchange Q&A helped to change metadata. See also pspdfkit: What’s Hiding in Your PDF? and discussion on HN.
`./md2pdf_syn_bullet_prop.sh sample_4.md sample_4_bullet_prop.pdf` gives:
Adding table of contents
There's a handy option, `--toc`, to automatically include a table of contents at the top of the generated `pdf`. You can control the number of levels using the `--toc-depth` option; the default is 3 levels. You can also change the default string `Contents` to something else using the `-V toc-title` option.
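The three options can be combined as in the sketch below. The function name `md2pdf_toc`, the depth of 2, and the replacement title are illustrative choices, not values taken from the author's script.

```shell
# Hypothetical helper; usage: md2pdf_toc INPUT.md OUTPUT.pdf
md2pdf_toc() {
    pandoc "$1" \
        -f gfm \
        --toc \
        --toc-depth=2 \
        -V toc-title="Table of contents" \
        -o "$2"
}

# When invoked with two arguments, convert the first file into the second
if [ "$#" -eq 2 ]; then
    md2pdf_toc "$1" "$2"
fi
```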
`./md2pdf_syn_bullet_prop_toc.sh sample_1.md sample_1_toc.pdf` gives:
Adding cover image
To add something prior to the table of contents, a cover image for example, you can use a `tex` file and include it verbatim. Create a `tex` file (named `cover.tex` here) with content as shown below:
Then, modify the previous script `md2pdf_syn_bullet_prop_toc.sh` by adding `--include-before-body cover.tex` and, tada, you get the cover image before the table of contents. `\thispagestyle{empty}` helps to avoid a page number on the cover page; see also tex.stackexchange: clear page.
The `bash` script invocation is now `./md2pdf_syn_bullet_prop_toc_cover.sh sample_5.md sample_5.pdf`.
You'll need at least one image in the input markdown file, otherwise the settings won't apply to the cover image and you may end up with weird output. The `sample_5.md` file used in the command above includes an image. Also, be careful to use escapes if the image path can contain `tex` metacharacters.
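As a rough sketch of the cover step, `cover.tex` could be created as follows. The image name `cover.png` is a placeholder, and the exact content of the author's file may differ. `\includegraphics` comes from the `graphicx` package, which the default pandoc LaTeX template only loads when the document contains an image, which is likely why an image is needed in the input.

```shell
# Write the cover snippet; cover.png is a placeholder image name
cat > cover.tex <<'EOF'
\includegraphics{cover.png}
\thispagestyle{empty}
EOF

# Place the snippet before the body (and so before the table of contents).
# Skipped when pandoc, the input file, or the image is missing.
if command -v pandoc >/dev/null 2>&1 && [ -f sample_5.md ] && [ -f cover.png ]; then
    pandoc sample_5.md -f gfm --toc --include-before-body cover.tex -o sample_5.pdf
fi
```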
Stylish blockquote
By default, blockquotes (lines starting with `>` in markdown) are just indented in the `pdf` output. To make them stand out, tex.stackexchange: change the background color and border of blockquote helped.
Create `quote.tex` with the contents as shown below. You can change the colors to suit your own preferred style.
The `bash` script invocation is now `./md2pdf_syn_bullet_prop_toc_cover_quote.sh sample_5.md sample_5_quote.pdf`. The difference between default and styled blockquote is shown below.
Customizing epub
For a long time, I thought `epub` didn't make sense for programming books. It turned out that I wasn't using the right ebook readers. FBReader is good for novels but not for ebooks with code snippets. When I used atril and calibre ebook-viewer, the results were good.
I didn't know how to use `css` before trying to generate the `epub` version. Somehow, I managed to take the default epub.css provided by `pandoc` and customize it to be as close as possible to the `pdf` version. The modified `epub.css` is available from the learnbyexample.github.io repo. The `bash` script to generate the `epub` is shown below and invoked as `./md2epub.sh sample_5.md sample_5.epub`. Note that `pygments.theme` is the same as in the `pdf` customization discussed before.
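A sketch of what such a script might contain is given here. It is not the author's actual `md2epub.sh`. The function name, the `--toc` flag, and the title metadata are assumptions, while `epub.css` and `pygments.theme` are the files described above and must exist in the current directory.

```shell
#!/bin/bash
# Hypothetical wrapper; usage: md2epub INPUT.md OUTPUT.epub
md2epub() {
    pandoc "$1" \
        -f gfm \
        --toc \
        --highlight-style pygments.theme \
        --css epub.css \
        --metadata title="Sample book" \
        -o "$2"
}

# When invoked with two arguments, convert the first file into the second
if [ "$#" -eq 2 ]; then
    md2epub "$1" "$2"
fi
```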
Resource links
More options and workflows for generating ebooks:
- pandoc-latex-template — a clean pandoc LaTeX template to convert your markdown files to PDF or LaTeX
- Jupyter Book — open source project for building beautiful, publication-quality books and documents from computational material
- See also fastdoc — the output of fastdoc is an asciidoc file for each input notebook. You can then use asciidoctor to convert that to HTML, DocBook, epub, mobi, and so forth
- Asciidoctor
- Sphinx
Miscellaneous
- picular: search engine for colors and colorhexa