ChatGPT: This AI has a JAILBREAK?! (Unbelievable AI Progress)

ChatGPT: This AI has a JAILBREAK?! (Unbelievable AI Progress)

Jan 02, 2023
|
33 views
Details
#chatgpt #ai #openai ChatGPT, OpenAI's newest model is a GPT-3 variant that has been fine-tuned using Reinforcement Learning from Human Feedback, and it is taking the world by storm! Sponsor: Weights & Biases https://wandb.me/yannic OUTLINE: 0:00 - Intro 0:40 - Sponsor: Weights & Biases 3:20 - ChatGPT: How does it work? 5:20 - Reinforcement Learning from Human Feedback 7:10 - ChatGPT Origins: The GPT-3.5 Series 8:20 - OpenAI's strategy: Iterative Refinement 9:10 - ChatGPT's amazing capabilities 14:10 - Internals: What we know so far 16:10 - Building a virtual machine in ChatGPT's imagination (insane) 20:15 - Jailbreaks: Circumventing the safety mechanisms 29:25 - How OpenAI sees the future References: https://openai.com/blog/chatgpt/ https://openai.com/blog/language-model-safety-and-misuse/ https://beta.openai.com/docs/model-index-for-researchers https://scale.com/blog/gpt-3-davinci-003-comparison#Conclusion https://twitter.com/johnvmcdonnell/status/1598470129121374209 https://twitter.com/blennon_/status/1597374826305318912 https://twitter.com/TimKietzmann/status/1598230759118376960/photo/1 https://twitter.com/_lewtun/status/1598056075672027137/photo/2 https://twitter.com/raphaelmilliere/status/1598469100535259136 https://twitter.com/CynthiaSavard/status/1598498138658070530/photo/1 https://twitter.com/tylerangert/status/1598389755997290507/photo/1 https://twitter.com/amasad/status/1598042665375105024/photo/1 https://twitter.com/goodside/status/1598129631609380864/photo/1 https://twitter.com/moyix/status/1598081204846489600/photo/2 https://twitter.com/JusticeRage/status/1598959136531546112 https://twitter.com/yoavgo/status/1598594145605636097 https://twitter.com/EladRichardson/status/1598333315764871174 https://twitter.com/charles_irl/status/1598319027327307785/photo/4 https://twitter.com/jasondebolt/status/1598243854343606273 https://twitter.com/mattshumer_/status/1598185710166896641/photo/1 https://twitter.com/i/web/status/1598246145171804161 https://twitter.com/bleedingedgeai/status/1598378564373471232 https://twitter.com/MasterScrat/status/1598830356115124224 https://twitter.com/Sentdex/status/1598803009844256769 https://twitter.com/harrison_ritz/status/1598828017446371329 https://twitter.com/parafactual/status/1598212029479026689 https://www.engraved.blog/building-a-virtual-machine-inside/ https://twitter.com/317070 https://twitter.com/zehavoc/status/1599193444043268096 https://twitter.com/yoavgo/status/1598360581496459265 https://twitter.com/yoavgo/status/1599037412411596800 https://twitter.com/yoavgo/status/1599045344863879168 https://twitter.com/natfriedman/status/1598477452661383168 https://twitter.com/conradev/status/1598487973351362561/photo/1 https://twitter.com/zswitten/status/1598100186605441024 https://twitter.com/CatEmbedded/status/1599141379879600128/photo/2 https://twitter.com/mattshumer_/status/1599175127148949505 https://twitter.com/vaibhavk97/status/1598930958769860608/photo/1 https://twitter.com/dan_abramov/status/1598800508160024588/photo/1 https://twitter.com/MinqiJiang/status/1598832656422432768/photo/2 https://twitter.com/zswitten/status/1598088280066920453 https://twitter.com/m1guelpf/status/1598203861294252033/photo/1 https://twitter.com/SilasAlberti/status/1598257908567117825/photo/1 https://twitter.com/gf_256/status/1598962842861899776/photo/1 https://twitter.com/zswitten/status/1598088267789787136 https://twitter.com/gf_256/status/1598178469955112961/photo/1 https://twitter.com/samczsun/status/1598564871653789696/photo/1 https://twitter.com/haus_cole/status/1598541468058390534/photo/3 https://twitter.com/tailcalled/status/1599181030065246208/photo/1 https://twitter.com/pensharpiero/status/1598731292278865920 https://twitter.com/sleepdensity/status/1598233414683197441 https://twitter.com/goodside/status/1598253337400717313 https://twitter.com/Carnage4Life/status/1598332648723976193/photo/2 https://github.com/sw-yx/ai-notes/blob/main/TEXT.md#jailbreaks https://twitter.com/dannypostmaa/status/1599352584963170309/photo/4 https://twitter.com/sama/status/1599112749833125888 https://twitter.com/sama/status/1599114807474810884 https://twitter.com/sama/status/1599461195005587456 https://twitter.com/deliprao/status/1599451192215887872 https://twitter.com/michlbrmly/status/1599168681711656961 https://twitter.com/zoink/status/1599281052115034113 Links: https://ykilcher.com Merch: https://ykilcher.com/merch YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://ykilcher.com/discord If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

0:00 - Intro 0:40 - Sponsor: Weights & Biases 3:20 - ChatGPT: How does it work? 5:20 - Reinforcement Learning from Human Feedback 7:10 - ChatGPT Origins: The GPT-3.5 Series 8:20 - OpenAI's strategy: Iterative Refinement 9:10 - ChatGPT's amazing capabilities 14:10 - Internals: What we know so far 16:10 - Building a virtual machine in ChatGPT's imagination (insane) 20:15 - Jailbreaks: Circumventing the safety mechanisms 29:25 - How OpenAI sees the future
Comments
loading...