Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)

Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents (+Author)

Feb 16, 2022
|
41 views
Details
#gpt3 #embodied #planning In this video: Paper explanation, followed by first author interview with Wenlong Huang. Large language models contain extraordinary amounts of world knowledge that can be queried in various ways. But their output format is largely uncontrollable. This paper investigates the VirtualHome environment, which expects a particular set of actions, objects, and verbs to be used. Turns out, with proper techniques and only using pre-trained models (no fine-tuning), one can translate unstructured language model outputs into the structured grammar of the environment. This is potentially very useful anywhere where the models' world knowledge needs to be provided in a particular structured format. OUTLINE: 0:00 - Intro & Overview 2:45 - The VirtualHome environment 6:25 - The problem of plan evaluation 8:40 - Contributions of this paper 16:40 - Start of interview 24:00 - How to use language models with environments? 34:00 - What does model size matter? 40:00 - How to fix the large models' outputs? 55:00 - Possible improvements to the translation procedure 59:00 - Why does Codex perform so well? 1:02:15 - Diving into experimental results 1:14:15 - Future outlook Paper: https://arxiv.org/abs/2201.07207 Website: https://wenlong.page/language-planner/ Code: https://github.com/huangwl18/language-planner Wenlong's Twitter: https://twitter.com/wenlong_huang Abstract: Can world knowledge learned by large language models (LLMs) be used to act in interactive environments? In this paper, we investigate the possibility of grounding high-level tasks, expressed in natural language (e.g. "make breakfast"), to a chosen set of actionable steps (e.g. "open fridge"). While prior work focused on learning from explicit step-by-step examples of how to act, we surprisingly find that if pre-trained LMs are large enough and prompted appropriately, they can effectively decompose high-level tasks into low-level plans without any further training. However, the plans produced naively by LLMs often cannot map precisely to admissible actions. We propose a procedure that conditions on existing demonstrations and semantically translates the plans to admissible actions. Our evaluation in the recent VirtualHome environment shows that the resulting method substantially improves executability over the LLM baseline. The conducted human evaluation reveals a trade-off between executability and correctness but shows a promising sign towards extracting actionable knowledge from language models. Website at this https URL Authors: Wenlong Huang, Pieter Abbeel, Deepak Pathak, Igor Mordatch Links: Merch: http://store.ykilcher.com TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

0:00 - Intro & Overview 2:45 - The VirtualHome environment 6:25 - The problem of plan evaluation 8:40 - Contributions of this paper 16:40 - Start of interview 24:00 - How to use language models with environments? 34:00 - What does model size matter? 40:00 - How to fix the large models' outputs? 55:00 - Possible improvements to the translation procedure 59:00 - Why does Codex perform so well? 1:02:15 - Diving into experimental results 1:14:15 - Future outlook
Comments
loading...