Join us for an engaging session on improving large language models (LLMs) for autonomous agents. We'll explore the challenges of making AI summaries both accurate and trustworthy, crucial for building smarter, more reliable agents. This discussion promises fresh insights into advancing AI's ability to evaluate and refine its own outputs, a key step towards more independent and effective autonomous systems.
Quick Welcome (5 mins) - Kicking off the session with a warm greeting and what to expect.
Brief Introductions from Participants (1 min each) - Participants share their name, location, occupation, and one expectation for the call.
Overview of Summarization as a Theme for the Session (5 mins) - Diving into why we're focusing on summarization and the challenges of evaluating large language model summaries.
Overview of the Papers Discussed (5 mins) - Quick summary of the key papers we'll be exploring:

  Paper 1: "Identifying Factual Inconsistency in Summaries: Towards Effective Utilization of Large Language Model"
  https://arxiv.org/abs/2402.12821

  Paper 2: "TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness"
  https://arxiv.org/abs/2402.12545

  Bonus Reading: "How to evaluate a summarization task"
  https://cookbook.openai.com/examples/evaluation/how_to_eval_abstractive_summarization

Introduction by the Community Agent (5 mins) - A brief introduction to the session by the AI agent, setting the stage for the discussions.
Q&A from the Research Agent (15 mins) - An interactive Q&A session, with questions prepared by the AI agent to stimulate discussion and deepen understanding.
After-party (15 mins) - An informal chat to relax, network, and discuss AI, life, and everything in between.