Real-time search – fact and fiction, and why?

One of the latest trends in ‘the valley’ is the notion of real-time search, which Twitter search has brought into the spotlight.  While I understand the idea of real-time, applying the concept to the entire internet seems like a big job, so it is with some scepticism that I hear people say Twitter has solved that riddle.

An example elsewhere in the text was of banks – their information is not real-time.  Consider ATM deposits that appear under the next day's date, or payroll electronic funds transfers, which are deposited in batch mode overnight.  Batch mode is the opposite of real-time.  Like everything, there is a reason for it, and while it's not optimal, real-time is a long way off in many respects.  It will depend on computing power, data storage, disk access speeds, bandwidth, and the standardisation of data and messaging formats, amongst other things I am sure to have missed.

I came across this in the comments on RWW.  They are doing a series of posts on new ‘net trends, and did a piece on real-time, which produced this comment from Falafulu Fisi, who, if he does not work for Google yet, I am sure soon might.

Anyhow, the distinction he draws is this: you can have real-time search of raw data, or real-time search of cleansed data.  [warning – lots of mathematics here]

Here is an attempted summary, which does no justice to the Fisi comment on real-time search.

Raw data search: clumsy, inefficient, duplicative, and requires the user to sift through enormous amounts of results to locate what he needs.

Cleansed data search: using techniques such as feature computation or feature decomposition, the data can be indexed and computations performed on it, allowing qualified and targeted results.

He notes that Twitter uses raw data search, and the results, while only seconds old, are what you might expect – very much a kitchen-sink approach, and not elegant.
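
To make the raw versus cleansed distinction a little more concrete, here is a toy Python sketch. It is only an illustration under my own assumptions: it is not the mathematics from the Fisi comment, and it is not how Twitter search is built. The sample posts, the stopword list and the function names (raw_search, build_index, indexed_search) are all made up for the example, and simple term counts stand in for the far richer feature computation he describes.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample messages, invented for this sketch (not real Twitter data).
POSTS = [
    "Bank fees are rising again",
    "bank fees are rising again",  # duplicate that differs only in case
    "Sat on the river bank all afternoon",
    "New bank fees: which bank charges least?",
]

# A tiny, assumed stopword list; a real system would use a much larger one.
STOPWORDS = {"the", "on", "are", "all", "which", "a", "of"}


def raw_search(posts, keyword):
    """Raw search: return every post containing the keyword, unranked."""
    return [p for p in posts if keyword.lower() in p.lower()]


def tokenize(text):
    """Lower-case the text, keep alphabetic words, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_index(posts):
    """Cleansing step: normalise, drop duplicates, index term counts per post."""
    unique, seen = [], set()
    for post in posts:
        key = tuple(tokenize(post))
        if key not in seen:
            seen.add(key)
            unique.append(post)
    index = defaultdict(dict)  # term -> {doc_id: occurrences}
    for doc_id, post in enumerate(unique):
        for term, count in Counter(tokenize(post)).items():
            index[term][doc_id] = count
    return unique, index


def indexed_search(unique, index, query):
    """Rank posts by the total number of query-term occurrences they contain."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    return [unique[doc_id] for doc_id, _ in scores.most_common()]


if __name__ == "__main__":
    print(raw_search(POSTS, "bank"))
    unique, index = build_index(POSTS)
    print(indexed_search(unique, index, "bank fees"))
```

The raw search hands back all four posts in whatever order they arrived, duplicate and river bank included. The indexed version costs extra work up front, but it drops the duplicate and puts the posts that say the most about bank fees first. That up-front work is the price of better targeting, and it is exactly the work that is hard to do within seconds of the data arriving.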

In summary, my conclusion is that real-time search is a much different animal from Twitter search.  Twitter search is just that … a keyword search of Twitter.  You get nothing for nothing, even in computational matters, and although the promise of real-time remains interesting, I am not sure it is the appropriate target.

Relevance to Bankwatch:

As a side note, I was involved in one situation at a bank where we wanted to purchase a search tool for our web site.  The general consensus of many is that search is a commodity, and therefore the cheapest ought to be enough.  That could not be further from the truth.  The algorithms and calculations involved in search are what make it useful, and there is much disparity between the tools.

In fact I would be OK with 24-hour-old search if it meant the quality and targeting of the results were better.  Once we have solved the targeting and accuracy of results, meaning the results that I expect or that you expect, which are not the same, then real-time could be the next target, but not before.  Unfortunately, by the time the ‘real-timers’ get what they are asking for, they might realise it is not what they wanted.

Written by Colin Henderson

May 25, 2009 at 12:25

