Just another Random Renegade Bits site

Personal Search Engine

Ever needed to do work somewhere without an internet connection? I've done it often enough that I decided to download as much documentation as I could, sometimes with httrack. That left me with a new problem: how do I quickly find what I need in this mass of HTML files?

I wanted something fairly small that I could start up without needing a VM or a full Apache installation. After a little searching, I found something that seemed to fit my requirements: TNTSearch, a full-text search engine written in PHP. It worked well for what I needed, since I could simply spin up PHP's built-in development server to query it.

I tried to automatically detect the useful text in each HTML file for my index, but ultimately settled on simply passing SimpleHtmlDom a selector for the element that holds each page's main content. This method worked well enough for my needs.
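SimpleHtmlDom is a PHP library, so what follows is only a rough illustration of the selector idea, not the code behind this search engine. It is a minimal Python sketch using the standard library's html.parser, assuming a deliberately tiny selector syntax ("tag", "tag#id", "tag.class", "#id" or ".class"); the class and function names are hypothetical. It collects the text inside every element matching the selector and joins it into one string ready for indexing.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SelectorTextExtractor(HTMLParser):
    """Collect text inside elements matching a tiny (hypothetical) selector:
    "tag", "tag#id", "tag.class", "#id" or ".class"."""

    def __init__(self, selector):
        super().__init__()
        self.tag, self.attr, self.value = self._parse(selector)
        self.depth = 0    # > 0 while inside a matching element
        self.chunks = []  # stripped text fragments found inside matches

    @staticmethod
    def _parse(selector):
        if "#" in selector:
            tag, value = selector.split("#", 1)
            return tag, "id", value
        if "." in selector:
            tag, value = selector.split(".", 1)
            return tag, "class", value
        return selector, None, None

    def _matches(self, tag, attrs):
        # An empty tag part ("#id" or ".class") matches any element name.
        if self.tag and tag != self.tag:
            return False
        if self.attr is None:
            return True
        attr_value = dict(attrs).get(self.attr) or ""
        if self.attr == "class":
            return self.value in attr_value.split()
        return attr_value == self.value

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1        # nested element inside a match
        elif self._matches(tag, attrs):
            self.depth = 1         # entering a matching element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, selector):
    """Return the text of every element matching selector, joined by spaces."""
    parser = SelectorTextExtractor(selector)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

For example, extract_text(page_html, "div#content") would return only the text inside a div with the (hypothetical) id "content", skipping navigation and footers. Unlike a real DOM parser such as SimpleHtmlDom, this sketch tracks nesting by counting tags, so sloppy HTML with unclosed elements can make it capture too much or too little.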

The result:

Screenshot: search results for "XGraphics"

Pros:

  • Very easy to set up
  • Minimal requirements
  • Search itself is very fast
  • Fuzzy search available (though I didn’t use it)

Cons:

  • No built-in support for explicit wildcards that I could find
  • Not so much a con as a note: use transactions if you're adding a lot of resources at once, since committing each insert separately is much slower.
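The reason transactions help is that a database can perform many inserts quickly but only a limited number of commits, because each committed transaction waits for its data to reach the disk. Assuming the index sits in an SQLite database, as TNTSearch's default index files do, the effect is easy to see with Python's built-in sqlite3 module. This is a generic sketch, not TNTSearch's API; the docs table, its columns, and the helper names are made up for illustration.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical schema standing in for a search index's document table.
SCHEMA = "CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, body TEXT)"
INSERT = "INSERT INTO docs (id, body) VALUES (?, ?)"


def insert_documents(conn, docs, batched):
    """Insert (id, body) rows, either in one transaction or one commit per row."""
    if batched:
        # The connection's context manager commits once at the end,
        # or rolls back every row if any insert fails.
        with conn:
            conn.executemany(INSERT, docs)
    else:
        for doc in docs:
            conn.execute(INSERT, doc)
            conn.commit()  # every row becomes its own transaction


def timed_insert(path, count, batched):
    """Create the table at path, insert count rows, return (row_count, seconds)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        conn.commit()
        docs = [(i, f"document {i}") for i in range(count)]
        start = time.perf_counter()
        insert_documents(conn, docs, batched)
        elapsed = time.perf_counter() - start
        rows = conn.execute("SELECT COUNT(*) FROM docs").fetchone()[0]
        return rows, elapsed
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for batched in (False, True):
            path = os.path.join(tmp, f"index_batched_{batched}.db")
            rows, elapsed = timed_insert(path, 500, batched)
            print(f"batched={batched}: {rows} rows in {elapsed:.3f}s")
```

Run as a script, it prints the row count and elapsed time for both modes against a file on disk. The single-transaction run should be noticeably faster, though the exact numbers depend on the machine and filesystem. A side benefit of the batched version is atomicity: if one insert fails, none of the batch is kept.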