15 June 2009

True Knowledge is at SemTech

True Knowledge is at SemTech 2009 in San Jose all this week.

On Wednesday, True Knowledge's founder and CEO William Tunstall-Pedoe will be participating in their keynote panel on semantic search with representatives from Google, Yahoo!, Bing, Ask and Hakia.

We're very interested in meeting people at this conference, so if you are attending, please introduce yourself.

14 June 2009

Flying high

Holiday season is here, so True Knowledge is learning about air travel. We've added a flight-time calculator and a bunch of facts about airports and airlines.

So next time you're planning a journey, try a few questions like these:

  • Which airport serves Seattle?
  • How long does it take to fly from Lima to San Francisco?
  • Which airlines fly to Amsterdam Schiphol?
  • Where does Qantas fly to?

03 June 2009

Our Firefox add-on now works with Bing

Microsoft's cool new search engine, Bing, was released earlier this week. Everyone at True Knowledge really likes Bing, and to show our support for the new search engine we're releasing an update to our recently launched search-enhancing Firefox add-on so that it works with Bing.

Download the latest version here.

There are a few subtle tweaks and features included in this upgrade as well.

For example, you now have the option to show a loading animation while True Knowledge is 'thinking' (this is off by default).

Also, some answers now come with mouse-over contextual menus containing useful links and information.

We hope you like this upgrade. Please give feedback in the comments or in the dedicated forum.

16 May 2009

How to build a Universal Answer Engine: ten vital principles


At True Knowledge we're building a Universal Answer Engine: a computer system designed to answer users' questions on any subject, directly and automatically.

We've been working on this for a while and have made great progress. We also still have lots to do.

However, in the process, we've learned a great deal about what it takes to build such a system and in this blog post I’m going to try to distil this knowledge into Ten Principles: all of which, we suggest, are vital for success.

Principle 1. Knowledge not Code
All the world's information systems are developed in much the same way: the designers of the system decide what knowledge the system needs to cover, what kinds of queries it will respond to and what the responses to those queries will be.

Database schemas are then designed and database tables populated with data. Software engineers then write large quantities of computer code to read the data from these tables and present it to users in the desired format. After much time has elapsed and a lot of money has been spent, a new computer system exists which supports the types of queries it was designed for.

This method works well for individual vertical areas. Multiple verticals can be supported on the same system by doing a similar process for each vertical: multiplying up the amount of code and the number of database tables. However, this approach cannot be scaled to a truly open-domain system which has to support an unlimited number of verticals. The reason is that time and money are finite resources.  The maintenance requirements of code and ever more complex database schemas also scale non-linearly.

Because of this, True Knowledge has two basic design requirements:
  • All knowledge has a single structured representation that is unrelated to its meaning. The database schema is of fixed size and supporting new types of knowledge is done by adding more knowledge into this universal representation: facts about facts. 
  • The Knowledge Engine (the query-processing heart of our platform) is knowledge neutral:  it contains no program code specific to any knowledge area.
It isn't possible to escape domain-specific code entirely, however. For example, code is needed to recognise that the string “the 23rd of October 2007” denotes a date, and to calculate square roots. However, in our system, such code is isolated from the core query-processing system. It is attached in a soft way to specific inference rules (which are also soft). Our platform even supports this happening with externally supplied scripts which can be read from a database.
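
The two requirements above can be illustrated with a minimal Python sketch, assuming a hypothetical store in which every fact has the same four fields: an id, a subject, a relation and an object. The class names and relation names are invented for illustration and are not True Knowledge's actual schema. Because a fact's id can itself be the subject of another fact, "facts about facts" need no extra tables.

```python
from dataclasses import dataclass
from itertools import count

# Hypothetical fixed-size schema: every fact is (id, subject, relation, object).
# New kinds of knowledge add rows, never tables or columns.
@dataclass(frozen=True)
class Fact:
    fact_id: str
    subject: str
    relation: str
    obj: str

class FactStore:
    def __init__(self):
        self._facts = {}
        self._ids = count(1)

    def add(self, subject, relation, obj):
        """Store a fact and return its id so other facts can refer to it."""
        fact_id = f"fact{next(self._ids)}"
        self._facts[fact_id] = Fact(fact_id, subject, relation, obj)
        return fact_id

    def query(self, subject=None, relation=None, obj=None):
        # Generic pattern matching: nothing here knows what any relation means.
        return [
            f for f in self._facts.values()
            if (subject is None or f.subject == subject)
            and (relation is None or f.relation == relation)
            and (obj is None or f.obj == obj)
        ]

store = FactStore()
f1 = store.add("william_tunstall_pedoe", "is_the_founder_of", "true_knowledge")
# A "fact about a fact": its subject is the id of another fact.
store.add(f1, "has_source", "true_knowledge_blog")
```

Supporting a new kind of knowledge in this sketch means adding facts that use a new relation name; `FactStore` itself never changes.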

Principle 2. You can’t do it without understanding
In the True Knowledge system no answer to a question is attempted unless we’ve first managed to translate it into a language-independent machine-processable query: a full semantic interpretation of the user’s intent.

Although it is possible to build question-answering systems by directly matching text questions to pre-written answers, we don’t believe this approach can be made to scale. Every natural language question has huge numbers of variants (thousands in some cases), and working out that they are the same question requires semantic processing.

This step also underpins our ability to disambiguate queries and throw away interpretations which are unlikely to be what the user was intending.
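
As a rough illustration of many phrasings collapsing into one machine-processable query, here is a toy Python sketch. It assumes a hypothetical tuple format for queries and uses a few regular expressions as a stand-in for real semantic parsing; anything it cannot interpret returns `None`, so no answer is attempted.

```python
import re

# Hypothetical surface patterns that all map onto the same structured query.
# A real system would need a parser and lexical knowledge, not a handful of regexes.
PATTERNS = [
    re.compile(r"^which airport serves (?P<place>.+?)\??$", re.I),
    re.compile(r"^what airport is in (?P<place>.+?)\??$", re.I),
    re.compile(r"^(?:what is the )?airport (?:for|in|near) (?P<place>.+?)\??$", re.I),
]

def interpret(question):
    """Return a language-independent query tuple, or None if not understood."""
    text = question.strip()
    for pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            place = match.group("place").strip().lower()
            # The same query comes out however the question was phrased.
            return ("query", "?airport", "serves", place)
    return None  # No interpretation, so no answer is attempted.
```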

Principle 3. You can’t do it without inference
Our experience with True Knowledge is that only a small percentage of questions can be directly answered by looking up the answer from a static source. Most questions require one or more logical steps or calculations to generate the answer the user wants. This is the case no matter how big the information source – it even applies to vast sources like the billions of web pages indexed by search engines such as Google.

For example, True Knowledge will happily answer questions like “is lisa rinna older than cindy Crawford?” and infer the answer from both their dates of birth. We currently know about more than half a million people, so for questions of exactly this form alone we would need 250 billion facts to answer them without inference (one for each ordered pair of people). Now consider simple distance questions like “How far is it from chicago to madingley in miles?” When such a question can be asked about any pair of fixed points on the globe (of which there are millions), in any unit of distance, you can appreciate the scale of the problem.
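
The Python sketch below shows both kinds of question being answered by calculation from stored facts rather than by lookup. The two people and their birth dates are placeholders, not real data. The distance uses the standard haversine formula on a spherical Earth, which is an assumption on our part rather than a description of how True Knowledge computes it. The last lines reproduce the arithmetic behind the 250 billion figure.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Placeholder facts: hypothetical people with made-up birth dates.
BIRTH_DATES = {
    "person_a": date(1960, 1, 1),
    "person_b": date(1965, 6, 30),
}

def is_older(a, b, birth_dates=BIRTH_DATES):
    """Infer 'is a older than b?' from two stored birth-date facts."""
    return birth_dates[a] < birth_dates[b]

EARTH_RADIUS_KM = 6371.0   # mean radius; assumes a spherical Earth
KM_PER_MILE = 1.609344     # the international mile, exact by definition

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(h)))  # clamp float error

def distance(lat1, lon1, lat2, lon2, unit="km"):
    """Distance between any two fixed points, in kilometres or miles."""
    km = great_circle_km(lat1, lon1, lat2, lon2)
    return km / KM_PER_MILE if unit == "miles" else km

# Without inference, every ordered pair of distinct people needs its own stored answer.
PEOPLE = 500_000
ORDERED_PAIRS = PEOPLE * (PEOPLE - 1)   # 249,999,500,000: roughly 250 billion
```

Two birth-date facts per person, plus one coordinate pair per place, replace what would otherwise be one stored answer per question.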

In True Knowledge, inference is also used for small extensions of knowledge: for example, including the CEO’s name in a list of people who ‘run’ a named business, derived from the knowledge that chief executives are part of the management team.
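
A minimal forward-chaining sketch of the CEO example, in Python. The relation names and the two rules are hypothetical; the point is that the rules are plain data handed to a generic loop, in keeping with Principle 1.

```python
# Hypothetical implication rules: (premise relation, concluded relation).
RULES = [
    ("is_the_ceo_of", "is_in_the_management_of"),
    ("is_in_the_management_of", "runs"),
]

def infer(facts, rules):
    """Forward-chain simple implications until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for subject, relation, obj in list(known):
                if relation == premise:
                    new_fact = (subject, conclusion, obj)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known

def who_runs(business, facts):
    """List everyone who 'runs' a business, including inferred answers."""
    return sorted(s for s, r, o in infer(facts, RULES) if r == "runs" and o == business)

facts = {("william_tunstall_pedoe", "is_the_ceo_of", "true_knowledge")}
```

Only the CEO fact is stored; the "runs" answer is derived on demand.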

Another way of thinking about inference is that it allows relatively small knowledge bases to punch well above their weight. The 160 million (and growing) facts that True Knowledge currently knows allow it to answer trillions of questions. Without a general inference system, each fact would only answer one possible question.

Principle 4. The only truly scalable way to learn everything is by allowing users to contribute
One of the biggest success stories on the internet is Wikipedia. It is vastly bigger than any other encyclopaedia and one of the most trafficked websites on the internet. It was built almost entirely from the unpaid efforts of internet users and is kept up-to-date by thousands of volunteers.

True Knowledge automatically sources facts from Wikipedia, harnessing this user-generated source. It also has vast amounts of knowledge that has been imported from databases and added by our own staff.

We have also developed tools that allow users to add knowledge directly to our knowledge base and to vote to correct knowledge that is believed to be incorrect. Sometimes this knowledge is directly prompted for by the system. Our internal metrics show that this knowledge, although a small percentage of what we know, is disproportionately valuable in answering other users’ questions.

One of the difficulties of this is that external users (unlike staff) are untrusted. However, a truly effective system needs to have ways of dealing with untrusted knowledge sources. Understanding the knowledge is a big advantage here, as we can automatically suppress knowledge the system believes is incorrect (see Principle 2). The multiple-source approach (see Principle 8) is also a big help.

Principle 5. Silence is way better than getting it wrong (when you have a decent backfill)
True Knowledge is designed to reliably know when it doesn’t understand or doesn’t know. When it can produce a good direct answer, it does. When it can’t, it stays silent and other kinds of results can be presented to the user instead, perhaps standard internet search. Producing a wrong answer to a question and finding an interpretation of the user’s request that isn’t what they intended are equally bad: the user hasn’t got what they wanted, and valuable screen space has been taken up with bad data.
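
The answer-or-silence behaviour can be sketched in Python as follows. The engine and search functions are toy stand-ins invented for illustration; the only thing the sketch commits to is that a missing answer leaves the backfill results exactly as they were.

```python
from typing import Callable, List, Optional

def results_page(query: str,
                 answer_engine: Callable[[str], Optional[str]],
                 backfill: Callable[[str], List[str]]) -> List[str]:
    """Put a direct answer on top only when one exists; otherwise stay silent."""
    results = backfill(query)
    answer = answer_engine(query)
    if answer is None:
        return results  # silence: the page is exactly what the backfill produced
    return [f"ANSWER: {answer}"] + results

def toy_engine(query: str) -> Optional[str]:
    # Hypothetical stand-in: answers only the one question it "understands".
    return {"what is 6 times 7?": "42"}.get(query.lower())

def toy_search(query: str) -> List[str]:
    # Hypothetical stand-in for standard web search results.
    return [f"link 1 for {query}", f"link 2 for {query}"]
```

Returning `None` rather than a low-confidence guess is what makes the silence possible; deciding when to return `None` is the hard part, and it is not modelled here.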

A great example of this principle in action is the browser plugin we launched yesterday, which seamlessly passes your standard search engine queries through our platform and inserts our answers into the results page when appropriate. Here the backfill could hardly be better: it’s Google. When the plugin fires inappropriately, or when it can’t add value to the results, it takes up valuable real estate at the top of the page. Staying silent in these cases is exactly what is required. However, when it can produce a perfect direct answer, it does so, saving the user the effort of searching through the links and improving on the results page that would otherwise be there.

Principle 6. Model the universe the same way as your users do 
(or communication is only possible between equals)

The True Knowledge platform contains a comprehensive ontology mapping all the things in the world into hundreds of thousands of classes (people, places, animals, substances etc.). This knowledge underpins the system that translates users’ questions into queries corresponding to what they mean. It is also used to disambiguate. Without this ontology and commonsense knowledge, it would be far harder to respond to users' questions in a sensible way.
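
A toy Python sketch of how a class hierarchy can disambiguate: the word "jordan" could mean a country or a person, but only a place has a population and only a person has a date of birth. The ontology, entity ids and relation constraints below are hypothetical and tiny compared with the hundreds of thousands of classes described above.

```python
# Hypothetical ontology fragment: class -> parent class.
SUBCLASS_OF = {
    "country": "place",
    "city": "place",
    "place": "physical_object",
    "athlete": "person",
    "person": "living_thing",
}
# Hypothetical entities and the class each belongs to.
INSTANCE_OF = {
    "jordan_the_country": "country",
    "jordan_the_athlete": "athlete",
}
# Which entities an ambiguous word can refer to.
DENOTES = {"jordan": ["jordan_the_country", "jordan_the_athlete"]}
# The class a relation's subject must belong to for the question to make sense.
SUBJECT_CLASS = {"population": "place", "date_of_birth": "person"}

def is_a(entity, cls):
    """Walk up the class hierarchy from the entity's own class."""
    current = INSTANCE_OF.get(entity)
    while current is not None:
        if current == cls:
            return True
        current = SUBCLASS_OF.get(current)
    return False

def interpretations(word, relation):
    """Keep only the readings of a word that make sense for the relation."""
    required = SUBJECT_CLASS[relation]
    return [e for e in DENOTES.get(word.lower(), []) if is_a(e, required)]
```

A question like "what is the population of jordan?" would then have a single sensible interpretation.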

Principle 7. Lexical knowledge is just another kind of knowledge 
(or language independence is achievable)

As discussed in Principle 1, True Knowledge represents all knowledge in the same basic way. This includes knowledge of which English words correspond to which entities. The grammatical forms of English words are also facts like any other.

This means that both the core technology and the knowledge representation are free of anything tied to the English language, and expanding to other languages is achievable essentially just by adding soft knowledge.
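
To make this concrete, here is a small Python sketch in which words from two languages are just facts pointing at the same entity, so one shared, language-independent fact answers the question in either language. The relation names and the French string are illustrative assumptions, not True Knowledge's data.

```python
# Lexical and non-lexical knowledge share one representation: (subject, relation, object).
FACTS = {
    ("amsterdam schiphol", "denotes_in_english", "schiphol_airport"),
    ("schiphol", "denotes_in_english", "schiphol_airport"),
    ("aéroport de schiphol", "denotes_in_french", "schiphol_airport"),  # illustrative string
    ("airports", "is_plural_of_in_english", "airport"),  # a grammatical form is a fact too
    ("schiphol_airport", "is_located_in", "the_netherlands"),
}

def entities_for(text, language):
    """Look up which entities a string denotes in the given language."""
    relation = f"denotes_in_{language}"
    return {obj for subj, rel, obj in FACTS if rel == relation and subj == text.lower()}

def where_is(text, language):
    """Answer 'where is X?' from facts that mention no language at all."""
    return {
        obj
        for entity in entities_for(text, language)
        for subj, rel, obj in FACTS
        if subj == entity and rel == "is_located_in"
    }
```

Adding a language in this sketch means adding more `denotes_in_…` facts; neither function changes.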

In future implementations of True Knowledge, users will be asking questions in multiple languages yet having their questions answered from a shared knowledge source.

Principle 8. All facts need sources and these need to be available to the user
In True Knowledge the facts used to answer a question are shown to the user after the answer and the sources for those individual facts are available with a single mouse click. Multiple sources can point at a single fact and the history of users endorsing or disagreeing with a fact is also visible. This history is also used for automatic assessment of a fact's truth.
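
A minimal Python sketch of a fact that carries its own sources and endorsement history. The scoring rule in `believed()` is a hypothetical placeholder chosen for illustration (each source and endorsement counts +1, each disagreement -1); we make no claim that this is how True Knowledge assesses truth.

```python
from dataclasses import dataclass, field

@dataclass
class SourcedFact:
    statement: tuple
    sources: list = field(default_factory=list)  # many sources may back one fact
    history: list = field(default_factory=list)  # ("endorse" | "disagree", user) pairs

    def endorse(self, user):
        self.history.append(("endorse", user))

    def disagree(self, user):
        self.history.append(("disagree", user))

    def believed(self):
        # Hypothetical rule: sources and endorsements count +1, disagreements -1.
        endorsements = sum(1 for kind, _ in self.history if kind == "endorse")
        disagreements = sum(1 for kind, _ in self.history if kind == "disagree")
        return len(self.sources) + endorsements - disagreements > 0
```

Keeping the history on the fact means the same record can be shown to the user and used for automatic assessment.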

We believe this approach is significantly better than a system which is just a black box and prints out an answer without the user being able to explore where the answer came from.

Principle 9. Get it working scalably on cheap hardware
If engineered correctly, modern web platforms should be able to run on cheap servers available from cloud computing vendors, enabling capacity to be turned on and off at a moment’s notice. At a launch or in a situation of high demand, capacity can be increased without having to buy more servers or make long-term financial commitments.

Similarly, by making use of modern open source software, systems can be built in a way that avoids licence fees and avoids tying the business to any particular commercial supplier.

True Knowledge follows these principles to the letter: it is cloud-based and not tied to any commercial software.

Principle 10. It has to work fast
Modern search engines work lightning fast and this has become the expectation of users. Highly complex computing tasks can be done on modern hardware in a tiny fraction of a second, and even highly complex AI systems should be no exception. At True Knowledge we believe strongly in this principle and are spending significant resources to live up to it fully.


14 May 2009

Google-enhancing Firefox add-on available now!

The True Knowledge team is excited to announce the availability of the first test version of a Firefox add-on that enhances your regular search results.

The add-on works by integrating our answers with the Google, Yahoo, Live and Ask search engines. Each time you make a search on your usual search engine, your query is simultaneously run through True Knowledge's answer engine, without interfering with what you're doing. If our system can find a useful, direct answer to enhance your search, it'll automatically insert it above your regular results. If it doesn't find anything useful, your search results will remain unaltered.

The add-on is available for download on the Mozilla add-ons site here:
https://addons.mozilla.org/en-US/firefox/addon/11738/

The add-on is currently still in what Mozilla calls 'experimental mode'. What we would really like is for people to try it out and write some reviews of their experience with it, either on the Mozilla site or on their own blogs. Having a few reviews is a requirement for an add-on to be considered 'not-experimental' and ready for general use by the Mozilla Foundation (the makers of Firefox).

A large percentage of the time the True Knowledge add-on will "keep quiet", as we won't have anything that can add value to regular search results. For a list of example questions that we do answer, click here.

If you have any suggestions for the add-on, or find anything wrong with it, please post your feedback in this dedicated forum.

We hope you enjoy taking it for a spin!

Kind regards,

The True Knowledge team.