NSF PROGRAM: INFORMATION & DATA MANAGEMENT
PRINCIPAL INVESTIGATOR: Geller, Ilya S
TITLE: Lexical Cloning: the novel approach for textual information processing and knowledge flow control
What is the intellectual merit of the proposed activity?
There are some weaknesses in the description of the project and in the demonstration of the proposer's ability to carry it out.
The proposer's qualifications to conduct this research are not well documented. His CV shows three associate degrees, and his experience includes programming for a securities firm and five years as principal scientist and executive officer of LexiClone, a technology startup. He lists three publications: a 2004 TREC report (a general article, not an experimental report), a Russian Internet article, and a US patent for the LexiClone software. He reports TREC results in which "the superiority of this technology was demonstrated" (according to the website, 3rd in the QA track and 9th in the Novelty track). The citations for these results point to a participants-only website, so further information is not available.
The account of previous work is very limited. There are four citations: the LexiClone web page, the LexiClone patent, the TREC results (a closed site, as noted above), and a philosophy text. There is no evidence of grounding in prior research.
As noted in the summary statement, the goals of the project are very ambitious, yet there is no evidence that the work will build on prior knowledge, nor a clear indication of the underlying mechanisms to be explored. The goals are also very broad, and the proposal gives no sense of how difficult they are in reality; one goal, for instance, appears to be the generation of text in an author's voice, a very difficult problem.
The proposal does not indicate how the project will be evaluated, which in IR terms is a serious omission.
What are the broader impacts of the proposed activity?
The goals of the project are those of much IR research: to create a retrieval mechanism that is effective and can be personalized to the user. However, it is questionable how well these goals can be met, particularly since the proposed methods are not documented in detail. The proposer is affiliated with a technology startup whose product, LexiClone, is the basis for this research. Because LexiClone is the subject of a US patent, it is not clear how this will affect the dissemination of results and the contribution to research in the IR field.
The "lexical clone" in the title appears to be a computer-generated summary of all the text an individual has written, read, or queried, based on "triads", which are equated with "meaningful key phrases from a sentence". The proposal states that the lexical clone will act as a filter to obtain useful information for its user. The proposed work involves building an English part-of-speech dictionary, creating a summary/profile, using the summary to filter text, adding AI (learning) techniques to the clones, creating new texts on demand, and developing principles for automatically structuring text.