
1. The Assistant

DanielCardenas: What is GNOME? A user interface (UI) to the operating system and a development platform to extend that UI. When we talk about changing the UI, we are talking about changing the heart of GNOME. In the past the UI has used a desktop metaphor to make it easier to understand how to interact and get things done. I suggest it is time for a new metaphor. In the distant future we will see a UI that acts more like an assistant. Let's say we have a talking assistant, Susan, to help us get things done. Voice interaction would be the ideal UI for many people.

see also GnomeVoiceControl -- ThiloPfennig 2006-05-28 15:53:31

So yes, this is impractical for today's systems, but if we know where the finish line is, it will help us win the race. What is practical today, and what would make this kind of UI inviting? The OS is getting more capable and providing more functions. That means more menus, more submenus, and more sub-submenus. Not a very inviting UI. What is practical, and showing up on more and more UIs, is a text entry toolbar (TET). The TET accepts natural language input and guides the user to the operation they want to complete. For example, they can type in "Burn CD", and the designated CD-writing software pops up.

How does the TET work? We definitely want it to be practical. In addition to helping the user get things done, the TET will also be the major interface for help. I suggest the TET runs a search engine against an internet wiki site. The wiki site will have the ability to start a designated program on the user's system. If a blank page shows up, the user should be instructed to provide more details of what they are trying to do. Advanced users can look for the most-requested blank pages and enter the appropriate data, rather like how Wikipedia works. We can at a later date provide a mechanism for locally caching the wiki.
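
As a rough sketch of how a TET backend could work, here is a minimal Python illustration; the wiki URL, its query API, and the action-to-program table are all hypothetical, just to make the flow concrete:

  import subprocess
  import urllib.parse
  import urllib.request

  # Hypothetical wiki search endpoint that returns a designated action name.
  WIKI_SEARCH = "https://wiki.example.org/tet?q="

  # Local table mapping wiki-designated actions to programs on this system
  # (the program names are illustrative).
  ACTIONS = {
      "burn-cd": ["brasero"],
      "write-email": ["evolution"],
  }

  def handle_query(text):
      """Ask the wiki which action matches the user's request, then run it."""
      url = WIKI_SEARCH + urllib.parse.quote(text)
      with urllib.request.urlopen(url) as response:
          action = response.read().decode().strip()  # e.g. "burn-cd"
      if action in ACTIONS:
          subprocess.Popen(ACTIONS[action])  # start the designated program
      else:
          # the "blank page" case: ask the user for more detail
          print("No match yet - please describe what you are trying to do.")

  handle_query("Burn CD")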

Oh, and can we be really radical with GNOME 3.0? Instead of calling it the GNOME desktop, can we call it the Operating-System User Interface (OUI)? The GNOME OUI for short. :)

FilipAndonov: I think that implementing this idea needs ALICE, a powerful chatbot engine with knowledge stored in AIML files. The advantage is that the user can say something in many different ways, but if the AIML is cleverly designed the system will still understand them. Maybe it is a good idea for a Desktop Assistant, something like the Office Assistant in MS Office (there is some work in a similar project, Charlix). I know that power users hate that paperclip, but a nice Tux that speaks and understands human speech would be great for all the users who love the paperclip. Of course, a Desktop Assistant is not the only option for representing the intelligent-agent metaphor; a simple edit box like the one in Beagle would do the trick as a default. There are several implementations of ALICE under the GPL or other free licenses, written in C++, Perl, Python, PHP, C#, Java, and Delphi.
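
To make this concrete, here is a minimal sketch of driving an AIML engine from Python, assuming the PyAIML package (one of the free ALICE implementations mentioned above); the single AIML category is made up for the example, and real bots ship thousands of them:

  import tempfile

  import aiml  # PyAIML, a free Python implementation of the ALICE engine

  # One made-up AIML category: a pattern and the reply template for it.
  CATEGORY = """<?xml version="1.0" encoding="UTF-8"?>
  <aiml version="1.0">
    <category>
      <pattern>BURN A CD</pattern>
      <template>Starting the CD writer for you.</template>
    </category>
  </aiml>
  """

  with tempfile.NamedTemporaryFile("w", suffix=".aiml", delete=False) as f:
      f.write(CATEGORY)
      path = f.name

  kernel = aiml.Kernel()  # the ALICE-style pattern matcher
  kernel.learn(path)      # load our single category
  print(kernel.respond("burn a CD"))  # -> "Starting the CD writer for you."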

The applications of such a system are:

DaveRaggett: Textbots like ALICE are limited in the mechanisms they use for natural language understanding, and we may want to look at newer approaches that make use of statistical models based upon a large corpus of past conversations. The words in the user's input are treated as statistical evidence for different interpretations of the tasks the user wants to achieve, taking into account the a priori likelihood of a particular task and the user's input in previous dialogue turns. This approach is being used for telephone-based automated assistants that start by asking an open-ended question such as "How can I help?".

The approach tends to rely on a taxonomy of tasks and statistically propagates the evidence from the leaves up to the roots. This allows the system to infer what the user is talking about in an indirect fashion. The system can then respond to the user to clarify the nature of the task, using graph-based dialogue models. This could be just a list of static choices, or it could involve natural language generation tailored to the specific context of the user's input, which would let users check that they have been correctly understood, e.g. "You want to email John Smith at his work email address with the agenda for November 3 (agenda.odt), right?", where the system has looked John up in the address book and offered the address you last used for him (his work rather than his home account), and likewise offered a title for the file as well as the file name.
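
A toy illustration of this idea in Python, with a made-up corpus and a two-level task taxonomy: a naive Bayes model treats each input word as evidence for the leaf tasks, and the leaf scores are then combined under each root of the taxonomy:

  import math
  from collections import Counter, defaultdict

  # Made-up corpus of past requests, labelled with leaf tasks from a
  # two-level taxonomy (email.*, media.*).
  CORPUS = [
      ("send mail to john",            "email.compose"),
      ("email the agenda to the team", "email.compose"),
      ("read my new mail",             "email.read"),
      ("burn these files to a cd",     "media.burn"),
      ("write this iso to a disc",     "media.burn"),
  ]

  word_counts = defaultdict(Counter)
  task_counts = Counter()
  for text, task in CORPUS:
      task_counts[task] += 1
      word_counts[task].update(text.split())
  VOCAB = len({w for c in word_counts.values() for w in c})

  def leaf_scores(utterance):
      """log P(task) + sum of log P(word|task), with add-one smoothing."""
      scores = {}
      for task in task_counts:
          total = sum(word_counts[task].values())
          score = math.log(task_counts[task] / sum(task_counts.values()))
          for w in utterance.lower().split():
              score += math.log((word_counts[task][w] + 1) / (total + VOCAB))
          scores[task] = score
      return scores

  def root_scores(leaves):
      """Propagate leaf evidence up to the taxonomy roots (log-sum-exp)."""
      grouped = defaultdict(list)
      for task, score in leaves.items():
          grouped[task.split(".")[0]].append(score)
      return {root: max(vals) + math.log(sum(math.exp(s - max(vals)) for s in vals))
              for root, vals in grouped.items()}

  leaves = leaf_scores("please mail john the agenda")
  roots = root_scores(leaves)
  print(max(leaves, key=leaves.get))  # -> email.compose
  print(max(roots, key=roots.get))    # -> email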

Such a solution would necessitate collecting lots of data, and providing analysis and editing tools for use by the volunteer maintainers. Safeguarding people's privacy would be a major concern, as would security, to prevent attackers from taking over people's machines. The privacy issue could be addressed by a clean separation between the data handled by the client on the local machine, the information passed to the server, and the information logged by the server for later analysis. For example, a preprocessor in the client could identify proper names and replace them with anonymous identifiers in the text sent to the server, then perform the corresponding substitution before showing the user the server's response.
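
A sketch of that client-side preprocessor in Python; the capitalized-word heuristic below stands in for a real named-entity recognizer and is purely illustrative:

  import re

  def anonymize(text):
      """Replace likely proper names with stable placeholders."""
      mapping = {}
      def repl(match):
          name = match.group(0)
          return mapping.setdefault(name, "PERSON_%d" % (len(mapping) + 1))
      # crude heuristic: a capitalized word following a lowercase word
      masked = re.sub(r"(?<=[a-z] )[A-Z][a-z]+", repl, text)
      return masked, mapping

  def deanonymize(text, mapping):
      """Substitute the real names back into the server's response."""
      for name, token in mapping.items():
          text = text.replace(token, name)
      return text

  masked, names = anonymize("email the agenda to John at work")
  print(masked)  # -> "email the agenda to PERSON_1 at work"
  print(deanonymize("Send PERSON_1 the agenda now?", names))  # -> "Send John ..."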

A starting point for developing such a service would be to provide a human-to-human service, where users can send questions via email or instant messages to a website where volunteers provide answers. As time goes by and a significant corpus is built up, the service could start to provide automated responses for common cases. You can think of this as a more interactive version of FAQs. One issue is ensuring the availability of volunteers. This is a touchy issue, but a means for users to pay a modest amount for the service would make this a lot more likely to work out, particularly if there is a way to route questions to volunteers according to their areas of expertise.

DanteAshton: I do not recommend ALICE for this task. Whilst having a conversation with my computer is something I'd love GNOME to do in the future, the only chatbots really able to do this today require a fairly large computer to churn through data and extract relevant points (my pet chatbot, Jeeney, requires a very high-powered server). So a conversational interface is something that should be on the list for the future, but not now.

However, whilst this means two-way conversation is out of the picture, I would like to second Simon's idea. If I could do basic tasks through a natural-language, one-way conversational interface (as opposed to a non-natural-language CLI or snooping around the GUI), then we'd have a nice merge between GUI and CLI for those who need the CLI's simplicity but are more comfortable in the GUI. A bonus would also go to users who are disabled (like those who cannot use a mouse).

It would mean we'd need a semantic system of some sort, though, even for a one-way conversational system. Metadata would play a larger role than before.

So, a few example commands (a rough dispatch sketch follows the list):

"Show me all the conversations I've had with John" (integrating with, say, Pidgin)

"Read me the last conversation I've had with Mary" (integrating with Pidgin and a speech synth system)

"Tell David (mailto: David) I can't make tonight's meeting (message body)" (integrating with an email client, and sending off a message to David)

"I have an Doctor's Appointment (event) at midday (time) tomorrow (date)" (would integrate with a calendar)

"Download GNOME art manager" (speaks for itself, really)

"Send a copy of this (the active Window, presumably a word document) to David" (Mailto:David attach FileA.pdf)

"Read this out" (calls upon speech synth to read out active window or current selection)

Likewise, there are many other functions.
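
One way the example commands above could be routed is a simple table of patterns and handlers; the patterns and stub handlers here are illustrative, and real integration would presumably go through each application's D-Bus interface:

  import re

  def show_conversations(person):
      print("(would ask Pidgin for the logs with %s)" % person)

  def send_mail(person, body):
      print("(would mail %s: %r)" % (person, body))

  COMMANDS = [
      (re.compile(r"show me all the conversations i've had with (\w+)", re.I),
       lambda m: show_conversations(m.group(1))),
      (re.compile(r"tell (\w+) (.+)", re.I),
       lambda m: send_mail(m.group(1), m.group(2))),
  ]

  def dispatch(utterance):
      for pattern, handler in COMMANDS:
          match = pattern.match(utterance)
          if match:
              return handler(match)
      print("Sorry, I don't understand that yet.")

  dispatch("Tell David I can't make tonight's meeting")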

"What is a Car?" could bring up a page from Wikipedia about cars. Current semantic databases (like http://wiki.dbpedia.org/About) could provide better information for the computer to perhaps better explain it. But I don't think this can be done without serious effort, though I would like to see it in the future.

"What is WALL-E about?" could bring up a page (or just the summierzation) from IMDB.

Basically, it should replace the deskbar.

Let us consider the command "Format my external HD to EXT3". The program must be able to look at the word 'format' and understand that I want to wipe a drive. Next, it looks at 'HD'; it should have a list of words that correspond to it, so it knows I mean a mass storage device, and 'external' corresponds to 'USB', so there's no problem there. Now it knows what I want to do and what I want it done to, but how should it do it? 'EXT3' corresponds to a filesystem name, so it understands. These, however, are quite complex commands.

A system would need to be in place that could handle these commands phrased differently, so it would feel more like natural language. Conclusion: the assistant would more or less be a natural-language command line; effectively, the CLI with the ease of use of a GUI. It should look for keywords relating to the command, so in the above example it would only look at "{Format} {external HD} {EXT3}". This would enable the system to understand commands regardless of how they are phrased (like "Please oh please would you kindly put EXT3 on my hard drive?", where it is obvious which drive is meant).
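
A sketch of that keyword approach in Python: fill {action} {device} {filesystem} slots from wherever the keywords occur, so phrasing and word order stop mattering; the vocabulary is illustrative, and a real system would need a much richer one:

  SLOTS = {
      "action":     {"format": "format", "wipe": "format", "put": "format"},
      "device":     {"external hd": "usb-disk", "usb drive": "usb-disk",
                     "hard drive": "disk"},
      "filesystem": {"ext3": "ext3", "ext4": "ext4", "fat32": "vfat"},
  }

  def parse(utterance):
      """Fill slots from keywords, ignoring word order and filler words."""
      text = utterance.lower()
      filled = {}
      for slot, vocab in SLOTS.items():
          for phrase, value in vocab.items():
              if phrase in text:  # toy substring match, not real tokenizing
                  filled[slot] = value
                  break
      return filled

  print(parse("Format my external HD to EXT3"))
  print(parse("Please oh please would you kindly put EXT3 on my hard drive?"))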

REALISTIC FUNCTIONS:

A more achievable goal would be enabling the system to perform basic tasks as described above; telling the program I have an appointment on a certain date is something within easy reach of most chatbot systems. This could also be reverse-engineered, so the program would know I have to finish a certain document by a due date (and again, this would enable much in the way of groupware). Should time permit, the SemanticSpaces system could also be integrated, thus giving a friendly face to this program.
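
The appointment case really is within easy reach; a standard-library sketch that turns "midday tomorrow" into a minimal iCalendar event (the phrase handling is hard-coded for the example):

  from datetime import datetime, timedelta

  def make_event(summary, when):
      """Render a minimal iCalendar VEVENT for the parsed appointment."""
      stamp = when.strftime("%Y%m%dT%H%M%S")
      return "BEGIN:VEVENT\nSUMMARY:%s\nDTSTART:%s\nEND:VEVENT" % (summary, stamp)

  # "midday tomorrow" -> noon on the following day
  when = (datetime.now() + timedelta(days=1)).replace(
      hour=12, minute=0, second=0, microsecond=0)
  print(make_event("Doctor's appointment", when))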

ABOUT VOICE CONTROL:

Voice control has many benefits. As a writer, it is far easier for me to use speech recognition than to type (something to do with my brain's inability to transfer from the mental to the digital, but not from the mental to the vocal), but the slowness of the system compared to a keyboard/mouse interface is terrible. Either voice recognition is offered as a secondary input method, or GNOME 3.0 will have to alienate its users.

ULTIMATE GOAL (?)

Something similar to Project CALO (http://caloproject.sri.com/): a natural-language frontend to applications and the OS itself. Easier for users, easier for disabled users, easier for the computer-illiterate. To use precise terminology, we'd be looking at an Intelligent Interface: http://www.aaai.org/AITopics/pmwiki/pmwiki.php/AITopics/Interfaces

For a design goal, I'd like the user to feel as though they are accompanied by this assistant, rather than it being treated as just another program. If done properly, GNOME may be the first step towards an AI OS. I believe the Assistant should be a priority if GNOME is serious about doing an overhaul.

