Process Jekyll content to replace first occurrence of any post title with a hyperlink of the post with that title




I am building a Jekyll Ruby plugin that replaces the first occurrence of any word in a post's body text with a hyperlink to the post whose title is that word.



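I've gotten this to work, but I can't figure out two problems in the process_words method:

1) How to only search for a post title in the main content copy text of the post, and not the meta tags before the post or the table of contents (which is also generated before main post copy text)? I can't get this to work with Nokogiri, even though that seems to be the tool of choice here.

2) If a post's URL is not at post.data['url'], where is it?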



The current code works, but it replaces the first occurrence even when that occurrence is inside an HTML tag, for example the value of an attribute on an anchor or a meta tag.
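A minimal reproduction of that problem, using made-up markup rather than real Jekyll output. The helper name `naive_link`, the sample HTML and the URL are only illustrative assumptions; the replacement itself mirrors the `String#sub` call in process_words:

```ruby
# Naive replacement, mirroring the String#sub call in process_words.
# String#sub replaces the first match anywhere in the string, tags included.
def naive_link(html, title, url)
  html.sub(title, "<a href=\"#{url}\">#{title}</a>")
end

# Hypothetical sample markup; the tag link and the URL are made up.
sample = '<a href="/tags/bicycles/">Tags</a> <p>I love bicycles in general.</p>'

# The first "bicycles" sits inside the href attribute, so that is what gets
# wrapped, and the sentence in the paragraph is left untouched.
puts naive_link(sample, "bicycles", "https://example.com/link/to/bicycles/")
```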



We have a blog with 3 posts:



And in the "Hobbies" post body text, we have a sentence with each word appearing in it for the first time in the post, like so:


I love mountain biking and bicycles in general.



The plugin would process that sentence and output it as:


I love mountain biking and <a href="https://example.com/link/to/bicycles/">bicycles</a> in general.


# _plugins/hyperlink_first_word_occurance.rb
require "jekyll"
require "uri"


module Jekyll

  # Replace the first occurrence of each post title in the content with a hyperlink to that post
  module HyperlinkFirstWordOccurance
    POST_CONTENT_CLASS = "page__content"
    BODY_START_TAG = "<body"
    ASIDE_START_TAG = "<aside"
    OPENING_BODY_TAG_REGEX = %r!<body(.*)>\s*!
    CLOSING_ASIDE_TAG_REGEX = %r!</aside(.*)>\s*!

    class << self
      # Public: Processes the content and turns the
      # first occurrence of each word that also has a post
      # of the same title into a hyperlink.
      #
      # content - the document or page to be processed.
      def process(content)
        @title = content.data['title']
        @posts = content.site.posts

        content.output = if content.output.include? BODY_START_TAG
                           process_html(content)
                         else
                           process_words(content.output)
                         end
      end


      # Public: Determines if the content should be processed.
      #
      # doc - the document being processed.
      def processable?(doc)
        (doc.is_a?(Jekyll::Page) || doc.write?) &&
          doc.output_ext == ".html" || (doc.permalink&.end_with?("/"))
      end


      private

      # Private: Processes html content which has a body opening tag.
      #
      # content - html to be processed.
      def process_html(content)
        content.output = if content.output.include? ASIDE_START_TAG
                           head, opener, tail = content.output.partition(CLOSING_ASIDE_TAG_REGEX)
                         else
                           head, opener, tail = content.output.partition(POST_CONTENT_CLASS)
                         end
        body_content, *rest = tail.partition("</body>")

        processed_markup = process_words(body_content)

        content.output = String.new(head) << opener << processed_markup << rest.join
      end

      # Private: Processes each word of the content and turns
      # the first occurrence of each word that also has a post
      # of the same title into a hyperlink.
      #
      # html - the html which includes all the content.
      def process_words(html)
        page_content = html
        @posts.docs.each do |post|
          post_title = post.data['title'] || post.name
          post_title_lowercase = post_title.downcase
          if post_title != @title
            if page_content.include?(" " + post_title_lowercase + " ") ||
               page_content.include?(post_title_lowercase + " ") ||
               page_content.include?(post_title_lowercase + ",") ||
               page_content.include?(post_title_lowercase + ".")
              page_content = page_content.sub(post_title_lowercase, "<a href=\"#{ post.url }\">#{ post_title.downcase }</a>")
            elsif page_content.include?(" " + post_title + " ") ||
                  page_content.include?(post_title + " ") ||
                  page_content.include?(post_title + ",") ||
                  page_content.include?(post_title + ".")
              page_content = page_content.sub(post_title, "<a href=\"#{ post.data['url'] }\">#{ post_title }</a>")
            end
          end
        end
        page_content
      end
    end
  end
end


Jekyll::Hooks.register %i[posts pages], :post_render do |doc|
  # code to call after Jekyll renders a post
  Jekyll::HyperlinkFirstWordOccurance.process(doc) if Jekyll::HyperlinkFirstWordOccurance.processable?(doc)
end



Update: I changed my code following @Keith Mifsud's advice. It now uses either the sidebar's aside element or the page__content class to select the body content to work on.

I also improved how the correct term is checked and replaced.

PS: I based my plugin on the code of @Keith Mifsud's jekyll-target-blank plugin.




1 Answer



This code looks very familiar :) I suggest you look at the RSpec test file in my plugin and use it as a basis for testing against your issues: https://github.com/keithmifsud/jekyll-target-blank

I'll try to answer your questions. Sorry, I couldn't test these myself at the time of writing.



How to only search for a post title in the main content copy text of the post, and not the meta tags before the post or the table of contents (which is also generated before main post copy text)? I can't get this to work with Nokogiri, even though that seems to be the tool of choice here.



Your requirements here are:



1) Ignore content outside the <body></body> tags.

This seems to already be implemented in the process_html() method. That method is meant to process only the body_content, and it should work as it is. Have you got tests for it? How are you debugging it? The same string splitting works in my plugin, i.e. only content inside the body is processed.



2) Ignore content inside the Table of Contents (TOC).
I suggest you extend the process_html() method by splitting the body_content variable further. Search for the content between the opening and closing tags of your TOC (by id, CSS class, etc.) and exclude it, then add it back in its position before or after the string returned by process_words.

3) Whether to use Nokogiri?
Nokogiri is great for parsing HTML. I think you are parsing strings and then creating HTML, so vanilla Ruby and the URI library should suffice. You can still use it if you want, but it won't be any faster than splitting strings in Ruby.
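As a rough, untested-in-Jekyll sketch of points 2 and 3 in plain Ruby: split the markup into tags and text, link only the first match found in text, and leave the TOC region alone. The names `TOC_REGION`, `skip_region` and `link_first_in_text` are hypothetical, the `toc` class is an assumption about your theme, and the sketch assumes the TOC comes before the main copy, as described in the question:

```ruby
# Assumed TOC markup; adjust the tag and class to whatever your theme emits.
TOC_REGION = %r{<nav class="toc">.*?</nav>}m

# Leaves everything up to and including the region untouched and yields only
# the markup after it. Falls back to the whole string if there is no region.
def skip_region(html, region)
  before, toc, after = html.partition(region)
  return yield(html) if toc.empty?
  before + toc + yield(after)
end

# Links the first whole-word match that appears in text, never inside a tag
# (so attributes are safe) and never inside an existing <a> element.
def link_first_in_text(html, title, url)
  pattern = /\b#{Regexp.escape(title)}\b/i
  linked = false
  inside_anchor = false

  # The capture group keeps the tags in the resulting array.
  html.split(/(<[^>]*>)/).map do |segment|
    if segment.start_with?("<")
      inside_anchor = true if segment =~ /\A<a[\s>]/i
      inside_anchor = false if segment =~ %r{\A</a\s*>}i
      segment
    elsif linked || inside_anchor
      segment
    else
      segment.sub(pattern) do |match|
        linked = true
        "<a href=\"#{url}\">#{match}</a>"
      end
    end
  end.join
end

html = '<nav class="toc"><a href="#bicycles">bicycles</a></nav>' \
       '<p title="bicycles">I love bicycles in general.</p>'
puts skip_region(html, TOC_REGION) { |body| link_first_in_text(body, "bicycles", "/bicycles/") }
```

The tag-splitting regex is deliberately simple. It will misread a literal `<` in text or a `>` inside an attribute value, and `\b` only works for titles that start and end with a word character, so treat this as a starting point for your tests rather than a finished parser.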



If a post's URL is not at post.data['url'], where is it?



I think you should have a method that collects all post titles and then matches the "words" against that array. You can get the posts collection from the doc itself via doc.site.posts and, for each post, return the title. The process_words() method can then check each word to see whether it matches an item in the array. But what if the title is made up of more than one word?
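One possible shape for that lookup, as a sketch only. `FakePost` is a stand-in so the example runs without Jekyll, and `title_links` and `titles_in` are hypothetical helper names; in the plugin you would pass doc.site.posts.docs, and the code in the question already calls post.url on those documents. Matching whole phrases instead of single words, longest title first, is one way to cope with multi-word titles:

```ruby
# Stand-in for a Jekyll post so the sketch runs on its own.
FakePost = Struct.new(:data, :url)

# Builds a title => url lookup, skipping the page currently being rendered.
def title_links(posts, current_title)
  posts.each_with_object({}) do |post, links|
    title = post.data["title"]
    next if title.nil? || title == current_title
    links[title] = post.url
  end
end

# Returns the titles that occur in the text as whole phrases (case-insensitive),
# longest first, so a multi-word title is handled before a shorter title that
# is contained in it.
def titles_in(text, links)
  links.keys
       .sort_by { |title| -title.length }
       .select { |title| text.match?(/(?<!\w)#{Regexp.escape(title)}(?!\w)/i) }
end

# Made-up posts for illustration only.
posts = [
  FakePost.new({ "title" => "Bicycles" }, "/bicycles/"),
  FakePost.new({ "title" => "Road Bicycles" }, "/road-bicycles/"),
  FakePost.new({ "title" => "Hobbies" }, "/hobbies/")
]
links = title_links(posts, "Hobbies")
p titles_in("I love road bicycles in general.", links)
```

The lookarounds are used instead of `\b` so that titles ending in punctuation, such as a closing parenthesis, can still match as a whole phrase.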



Also, is there a more efficient, cleaner way to do this?



So far so good. I'll start with getting the issues fixed and then refactor for speed and coding standards.



Again, I suggest you use tests to help you with this.



Let me know if I can help more :)





lol how did you find it!? Did you run into it or just into Jekyll/ruby on Stack? PS: thanks for a good starting point/code base!
– Andre Bulatov
Jul 2 at 0:01






You're very welcome Andre. We are a community! I started following @jekyllbot on Twitter, just a couple of days ago and this came up on the feed. :)
– Keith Mifsud
Jul 2 at 2:13





