As a component of our robotic cooking project, we have developed a knowledge representation known as the functional object-oriented network (FOON), which is based on the graph data structure. A robot should understand its intentions and actions in a way similar to a human’s understanding in order to facilitate the communication of knowledge between robots and humans. A knowledge representation such as FOON also allows a robot to derive a plan of action for meal preparation and serves as a means of explaining the decisions it makes (which is essential for explainable AI).
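To make the structure concrete, here is a minimal sketch of how a graph like FOON pairs object nodes with motion nodes. The class and attribute names below are illustrative assumptions, not the authors' actual implementation:

```python
# Illustrative sketch of FOON's core structure: object nodes and motion
# nodes, grouped into functional units (inputs -> motion -> outputs).

class ObjectNode:
    def __init__(self, name, states=None):
        self.name = name            # e.g. "tomato"
        self.states = states or []  # e.g. ["whole"] or ["sliced"]

class MotionNode:
    def __init__(self, motion):
        self.motion = motion        # e.g. "slice"

class FunctionalUnit:
    """One manipulation: input objects -> motion -> output objects."""
    def __init__(self, inputs, motion, outputs):
        self.inputs = inputs
        self.motion = motion
        self.outputs = outputs

# "slice tomato": {tomato (whole), knife} --slice--> {tomato (sliced), knife}
unit = FunctionalUnit(
    inputs=[ObjectNode("tomato", ["whole"]), ObjectNode("knife")],
    motion=MotionNode("slice"),
    outputs=[ObjectNode("tomato", ["sliced"]), ObjectNode("knife")],
)
```

Because edges only ever connect objects to motions (never object to object or motion to motion), the resulting graph is bipartite, as described above.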
Below we summarize work done on FOON. For more details, please visit FOON.
Defining a Representation: Functional Object-Oriented Network for Manipulation Learning
In our introductory paper on FOON, we formally present its structure as a bipartite network describing the interaction between objects and actions as motions. A FOON is constructed directly from observations of human manipulation in cooking or recipe videos. With an annotated FOON graph, a robot can perform task tree retrieval, which is the process of task planning for a robot: given a target goal and a list of items in its surroundings, the robot can find the ideal plan for meal preparation.
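The retrieval idea can be sketched as a backward search from the goal: find a unit that produces the goal, then recursively plan for that unit's inputs until everything grounds out in available items. This is a simplified illustration (assuming acyclic knowledge and made-up unit data), not the published algorithm:

```python
# Simplified task tree retrieval: search backward from a goal object to
# functional units whose inputs can themselves be produced or are on hand.

def retrieve_task_tree(goal, available, units):
    """units: list of (inputs, motion, outputs) tuples.
    Returns an ordered list of motions, or None if the goal is unreachable."""
    if goal in available:
        return []                       # nothing to do: item already exists
    for inputs, motion, outputs in units:
        if goal in outputs:
            plan, feasible = [], True
            for obj in inputs:
                sub = retrieve_task_tree(obj, available, units)
                if sub is None:         # this input cannot be obtained
                    feasible = False
                    break
                plan.extend(sub)
            if feasible:
                return plan + [motion]
    return None

# Toy knowledge: slice a tomato, then mix a salad.
units = [
    (["tomato_whole", "knife"], "slice", ["tomato_sliced"]),
    (["tomato_sliced", "lettuce", "bowl"], "mix", ["salad"]),
]
plan = retrieve_task_tree("salad", {"tomato_whole", "knife", "lettuce", "bowl"}, units)
# plan == ["slice", "mix"]
```

A real planner must also handle cycles and choose among multiple units producing the same object; this sketch simply takes the first feasible one.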
Expanding Knowledge for Completeness: Functional Object-Oriented Network: Construction & Expansion
Following our previous work, we explore how we can abstract knowledge in FOON at three levels of hierarchy, where objects are represented with anywhere from maximum detail (object type, state, and ingredient make-up) to minimal detail (object type only). Furthermore, we also look at how we can use semantic information to expand the knowledge in FOON without the need for annotating new videos. Through two methods, expansion and compression, we can acquire new knowledge using lexical knowledge bases such as WordNet and, more recently, ConceptNet.
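The expansion idea can be illustrated with object substitution: if a lexical resource deems two objects similar, a functional unit involving one can be copied with the other substituted in. The data and the `expand` function below are illustrative assumptions, not the paper's actual procedure:

```python
# Toy expansion by substitution: objects are (name, state) pairs, and
# `similar` stands in for similarity judgments from WordNet/ConceptNet.

def expand(units, similar):
    """units: list of (inputs, motion, outputs); similar: {object: substitute}.
    Returns new functional units created by substituting similar objects."""
    new_units = []
    for inputs, motion, outputs in units:
        for old, new in similar.items():
            if any(name == old for name, _ in inputs):
                swap = lambda objs: [(new if n == old else n, s) for n, s in objs]
                new_units.append((swap(inputs), motion, swap(outputs)))
    return new_units

# Knowing how to slice a lemon suggests how to slice a lime.
units = [([("lemon", "whole"), ("knife", None)], "slice", [("lemon", "sliced")])]
new = expand(units, {"lemon": "lime"})
# new == [([("lime", "whole"), ("knife", None)], "slice", [("lime", "sliced")])]
```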
Translating from Human Understanding to Robot Understanding of Motions: Manipulation Motion Taxonomy and Coding for Robots
In order to manage the scalability of the network and to properly translate annotations from other data sets, we formally presented the motion taxonomy as a means of embedding manipulation labels based on the mechanical properties of those actions. The taxonomy is intended to be used much like a neural network: it accepts a demonstration of an action as input and outputs an embedding of that action as a motion code. The advantage of motion code embeddings over existing embedding methods such as Word2Vec is that the vectors more accurately reflect distances (or dissimilarities) between motions based on properties such as trajectory type, contact type, and engagement type.
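The distance property can be sketched as follows: represent each motion as a fixed-length vector of mechanical attributes and count how many attributes differ. The attribute names and values below are illustrative, not the published taxonomy's actual encoding:

```python
# Hamming-style dissimilarity between motion codes: the more mechanical
# attributes two motions share, the smaller their distance.

def motion_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

# Illustrative (trajectory, contact, engagement) attribute triples.
slice_code = ("prismatic", "rigid", "soft")
chop_code  = ("prismatic", "rigid", "soft")
stir_code  = ("revolute",  "rigid", "neutral")

motion_distance(slice_code, chop_code)  # 0: mechanically alike
motion_distance(slice_code, stir_code)  # 2: differ in trajectory and engagement
```

A word embedding trained on text might place "slice" closer to "stir" than to "chop" purely from co-occurrence; grounding the code in mechanics avoids that.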
Real Robotic Programming with FOON: A Weighted Functional Object-Oriented Network for Task Planning
Having established the representation, we explored how we can use FOON for problem solving. In our investigation with a simple robotic system, the NAO robot (pictured below), we found that it can be difficult for certain robots to perform the human manipulations seen in cooking with the required degree of dexterity. To overcome this, we posed the robot task planning and execution problems as a human-robot collaboration, where the robot is aided by a human assistant in performing actions that it cannot do. Our main contributions in this work are the introduction of weights in FOON that innately reflect the robot’s abilities as well as the human’s capabilities, and a variation of the task tree retrieval algorithm that accounts for these weights.
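The collaboration idea can be sketched as a simple assignment rule: each step in a retrieved plan carries a weight reflecting the robot's chance of success, and steps below a threshold are handed to the human assistant. The weights, threshold, and function below are illustrative assumptions, not the paper's actual formulation:

```python
# Toy step assignment in a human-robot collaboration: low-confidence
# manipulations are delegated to the human assistant.

def assign_steps(plan_weights, robot_threshold=0.7):
    """plan_weights: list of (motion, robot_success_weight) pairs."""
    return [(motion, "robot" if w >= robot_threshold else "human")
            for motion, w in plan_weights]

plan = [("pick-and-place", 0.9), ("pour", 0.8), ("slice", 0.3)]
assign_steps(plan)
# -> [('pick-and-place', 'robot'), ('pour', 'robot'), ('slice', 'human')]
```

In the weighted retrieval setting, such weights would instead steer the search itself toward task trees the robot can mostly execute on its own, rather than relabeling a fixed plan after the fact.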