Fuzzy Search using PHP similar_text() and levenshtein()

After writing my Google Summer of Code application on fuzzy search techniques, I have found myself constantly on the lookout for other ways to approach the problem. I found one today that looks promising, although it still requires quite a bit of processing during the actual search. The tutorial is good and describes in depth how to achieve fuzzy search without using too much of your server's resources.

Fuzzy Search Tutorial

Excellent project and ideas.

Weight based criteria
----------------------
Relevance [Keyword (search term) density]
Activity
Ratings (Editors | Users)
Sponsored Content (Advertising)
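
As a rough illustration of how criteria like these might be combined, here is a minimal sketch of a weighted sum. The weight values, the `WEIGHTS` constant and the `weightedScore()` function are all made up for the example, and each signal is assumed to be normalised to the range 0..1 before scoring.

```php
<?php
// Hypothetical weights: the list above names the criteria but not their values.
const WEIGHTS = [
    'relevance' => 0.5, // keyword (search term) density
    'activity'  => 0.2,
    'ratings'   => 0.2, // editors and users combined
    'sponsored' => 0.1, // advertising
];

// Combine the signals into one score. Each signal is assumed to be
// normalised to 0..1 already; missing signals count as 0.
function weightedScore(array $signals): float
{
    $score = 0.0;
    foreach (WEIGHTS as $name => $weight) {
        $score += $weight * ($signals[$name] ?? 0.0);
    }
    return $score;
}

// Example: a highly relevant, moderately active result with no ratings yet.
echo weightedScore(['relevance' => 0.9, 'activity' => 0.5]), "\n";
```

Results would then be sorted by this score, highest first. Tuning the weights is the hard part and depends entirely on the site.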

Other Reference:
-----------------
Apache Lucene - Scoring
http://lucene.apache.org/java/docs/scoring.html

Darly

Hey,

I came across this page a while ago while trying to figure out how to do this myself, but sadly the link didn't (and still doesn't) work. So for those of you who are interested, this is how I did it:

http://porteightyeight.com/archives/21-Fuzzy-Searching-in-PHP-Part-1.htm...

Andy

Ouch, not sure what happened to that tutorial by codewalkers. It essentially outlined a way to implement a search engine by first fetching candidate results from the database with a LIKE query:

SELECT * FROM table WHERE words LIKE 'keyword%'

The results are then passed through the levenshtein() function to check the edit distance between each result and the keyword. This may work all right, but a stemming algorithm would likely be more effective, since it matches on word stems directly instead of computing an edit distance for every result.
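
Since the tutorial itself is gone, here is a minimal sketch of what the ranking step could look like. It is not the tutorial's code: `rankByLevenshtein()`, the `$maxDistance` cut-off of 2 and the sample words are all assumptions, and the `$candidates` array simply stands in for whatever rows the LIKE query returned. The similar_text() percentage is recorded alongside the distance for comparison only.

```php
<?php
// Hypothetical helper: keep the candidates within $maxDistance edits of the
// keyword and return them sorted with the closest match first.
function rankByLevenshtein(string $keyword, array $candidates, int $maxDistance = 2): array
{
    $keyword = strtolower($keyword);
    $scored = [];

    foreach ($candidates as $word) {
        $lower = strtolower($word);
        $distance = levenshtein($keyword, $lower);
        if ($distance > $maxDistance) {
            continue; // too far from the keyword to count as a match
        }
        // similar_text() writes a 0..100 similarity into $percent.
        similar_text($keyword, $lower, $percent);
        $scored[] = ['word' => $word, 'distance' => $distance, 'percent' => $percent];
    }

    // Smallest edit distance first.
    usort($scored, function ($a, $b) {
        return $a['distance'] <=> $b['distance'];
    });

    return $scored;
}

// Stand-in for the rows returned by the database query (assumed data).
$candidates = ['search', 'searching', 'seared', 'season'];
print_r(rankByLevenshtein('serch', $candidates));
```

Note that levenshtein() runs once per candidate, so the cost grows with the number of rows the query returns. That is the "processing during the actual search" mentioned in the post.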

I've been doing a lot of research lately into the Apache Solr search engine, which is very fast, easy to use, and extremely powerful.

thank you
