Generating Jekyll pages from data

I wrote a Jekyll-generated site for Botanic Organic that sells products and includes product and ingredient pages generated from data. Jekyll is a blog-aware static site generator. Botanic Organic’s site was forked from Octopress, which uses Jekyll but adds a few plugins and styles.

Update: The current version of Botanic Organic’s site is no longer based on Jekyll.

Botanic Organic’s product database is a couple of sheets in a Google Documents spreadsheet maintained by Botanic Organic. I wrote a Ruby class that fetches the separate product and ingredient sheets, combines them, and saves the result in _source/_data/products.json_. I also added a rake task to run this fetch. The remaining problem was how to inject the data into the pages that needed generating. The site required a products page that lists all products, a separate product page for each product, and an ingredients page that describes all the ingredients used in Botanic Organic’s products.
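The combine-and-save step can be sketched roughly as follows. This is a minimal illustration, not the original class: the names build_catalog and save_catalog are hypothetical, the rows are assumed to have already been fetched as arrays of hashes, and the spreadsheet access itself is left out because its details are not covered here.

```ruby
require 'json'
require 'fileutils'

# Hypothetical helper: combine the rows of the two sheets into one hash.
# Assumes each sheet has already been fetched as an array of row hashes.
def build_catalog(product_rows, ingredient_rows)
  { 'products' => product_rows, 'ingredients' => ingredient_rows }
end

# Hypothetical helper: write the combined catalog to a JSON file,
# creating the target directory if it does not exist yet.
def save_catalog(catalog, path)
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, JSON.pretty_generate(catalog))
  path
end

# In a Rakefile this could be wrapped in a task, for example:
#   task :fetch_products do
#     catalog = build_catalog(fetch_rows(:products), fetch_rows(:ingredients))
#     save_catalog(catalog, 'source/_data/products.json')
#   end
# where fetch_rows stands for whatever code reads a sheet (an assumption).
```

Keeping the fetch separate from the combine-and-save step means the JSON file can be regenerated or inspected without touching the site build.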

Jekyll and Octopress make it easy to create pages from markdown or html files. Your markdown file specifies a template via its YAML header, and the output page is created from a combination of your markdown page and the specified template. What is less obvious is how to insert data into pages or create pages in a data-driven way. For example, I was hoping to create an ingredients page and use Liquid syntax to load the JSON data. This approach was not feasible because Liquid is limited to filters and tags, neither of which could be used to bring in a large data hash object for processing. Liquid Drops looked like they might work, but it was at this point that I discovered generator plugins.

The solution for Botanic Organic was to write a Ruby generator plugin that loads the JSON data and generates the pages. The existing category_generator.rb plugin that is part of Octopress is a good example of how to do this. Plugins are placed in the plugins folder and include code to register themselves as a generator. Jekyll loads all plugins when it runs.

The plugin for Botanic Organic is shown below. It reads the JSON file into a hash containing an array of products and an array of ingredients. This hash is added to the global page object via a ['key'] = value assignment (line 75 or 81), making it available to the project’s pages. I used three pages that I placed in a folder _source/_products_: an ingredients page that lists and describes all the ingredients, a products page that lists all the products, and a product page that is used for each individual product. The generator plugin specifies where these pages are found and where to write the output files.
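As a rough illustration of the generator approach (a simplified sketch, not the Botanic Organic plugin itself), the code below plans one page per product plus the two listing pages, then registers them with Jekyll. The helper names product_slug and planned_pages, the class names DataPage and DataPageGenerator, the template file names, the output directories, and the 'name' key on each product are all assumptions made for this example. The Jekyll classes are only defined when Jekyll is loaded, so the planning helpers can be tried on their own.

```ruby
require 'json'

# Hypothetical helper: turn a product name into a URL-safe slug.
def product_slug(name)
  name.downcase.gsub(/[^a-z0-9]+/, '-').gsub(/\A-+|-+\z/, '')
end

# Hypothetical helper: list the pages to generate from the catalog hash.
# Assumes the catalog looks like {"products" => [...], "ingredients" => [...]}.
def planned_pages(catalog)
  pages = [
    { template: 'products.html', dir: 'products',
      data: { 'products' => catalog['products'] } },
    { template: 'ingredients.html', dir: 'ingredients',
      data: { 'ingredients' => catalog['ingredients'] } }
  ]
  catalog['products'].each do |product|
    pages << { template: 'product.html',
               dir: File.join('products', product_slug(product['name'])),
               data: { 'product' => product, 'title' => product['name'] } }
  end
  pages
end

if defined?(Jekyll)
  module Jekyll
    # A page whose template lives in source/_products and whose data
    # comes from the catalog rather than from the template file alone.
    class DataPage < Page
      def initialize(site, base, spec)
        @site = site
        @base = base
        @dir  = spec[:dir]   # output directory
        @name = 'index.html' # output file name
        process(@name)
        read_yaml(File.join(base, '_products'), spec[:template])
        data.merge!(spec[:data]) # exposed to Liquid as page.product etc.
      end
    end

    # Registers itself as a generator; Jekyll calls generate(site).
    class DataPageGenerator < Generator
      safe true

      def generate(site)
        json = File.read(File.join(site.source, '_data', 'products.json'))
        planned_pages(JSON.parse(json)).each do |spec|
          site.pages << DataPage.new(site, site.source, spec)
        end
      end
    end
  end
end
```

Separating the planning step from the Jekyll classes keeps the data handling easy to check in isolation; the Page subclass then only has to copy each planned entry into the page's data hash.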

Below is a simplified version of my product.html page that is used when generating each of the individual product pages.


---
layout: page
title: "Product Page - Title will be replaced by generator plugin"
---

<p>{{ page.product.description }}</p>
<p><strong>Ingredients: </strong>{{ page.product.ingredients }}</p>
<p>{{ page.product.size }}</p>
<p>{{ page.product.price }}</p>

Implementation Issues

One late addition to the code above was to override the to_liquid method of Jekyll’s Page object. I did this because the Liquid page.url property was not being set to the URL of the deployed page. This happens because I pass separate dest_dir and src_dir arguments to my Page.initialize method. The URL used by Disqus and Twitter to identify the page is taken from page.url and therefore needs to be correct. In making this change I battled a bit with the Page object’s @dir and @name properties. I’m not sure my code is optimal, but it is working correctly.
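The idea behind the override can be sketched without Jekyll. In the example below, StandInPage is a deliberately tiny stand-in for Jekyll's Page (an assumption for illustration, not the real class), and GeneratedPage, dest_url, and the directory names are hypothetical. The point is only that the subclass replaces the 'url' entry with one built from the destination directory.

```ruby
# Stand-in for Jekyll's Page, reduced to what this sketch needs.
# It builds its URL from the source directory, which is the behaviour
# that gives the wrong result for generated pages.
class StandInPage
  attr_reader :data

  def initialize(src_dir, name)
    @src_dir = src_dir
    @name = name
    @data = {}
  end

  def to_liquid
    data.merge('url' => File.join('/', @src_dir, @name))
  end
end

# Hypothetical generated page that knows where it will be deployed.
class GeneratedPage < StandInPage
  def initialize(src_dir, dest_dir, name)
    super(src_dir, name)
    @dest_dir = dest_dir
  end

  # URL of the deployed page; a trailing index.html is dropped.
  def dest_url
    File.join('/', @dest_dir, @name).sub(/index\.html\z/, '')
  end

  # Keep everything the parent exposes, but correct the URL.
  def to_liquid
    super.merge('url' => dest_url)
  end
end

page = GeneratedPage.new('_products', 'products/rose-face-oil', 'index.html')
puts page.to_liquid['url']
```

Calling super and merging keeps whatever else the parent exposes to Liquid, so only the URL changes.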

Another problem I encountered was that sitemap entries were not being generated by the stock sitemap_generator plugin. This necessitated modifications to my product_generator (above) and a modified version of sitemap_generator. The modified sitemap generator looks for two additional properties, dest_url and src_mtime, in the Page object. This allows my product generator to set these values when the pages are generated, rather than leaving it to the sitemap generator to try to figure them out.
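The lookup for those two properties might look something like the sketch below. It is an assumption about the shape of the change, not the modified plugin: sitemap_entry, the fallback_mtime parameter, the Struct-based pages, and the example.com base URL are all hypothetical.

```ruby
require 'time'

# Hypothetical helper: build one sitemap entry for a page.
# Prefers dest_url and src_mtime when the page provides them,
# otherwise falls back to the page's url and a supplied mtime.
def sitemap_entry(page, base_url, fallback_mtime)
  path  = (page.dest_url if page.respond_to?(:dest_url)) || page.url
  mtime = (page.src_mtime if page.respond_to?(:src_mtime)) || fallback_mtime
  { loc: base_url.chomp('/') + path, lastmod: mtime.utc.iso8601 }
end

# Stand-ins for an ordinary page and a generated page.
PlainPage     = Struct.new(:url)
GeneratedInfo = Struct.new(:url, :dest_url, :src_mtime)

generated = GeneratedInfo.new('/_products/product.html',
                              '/products/rose-face-oil/',
                              Time.utc(2013, 1, 2, 3, 4, 5))
entry = sitemap_entry(generated, 'http://example.com/', Time.now)
puts entry[:loc]
puts entry[:lastmod]
```

Checking with respond_to? lets ordinary pages pass through unchanged while generated pages supply their own values.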