DiRP Trustworthy LLM
The scope of the reading group is to exploring the trustworthiness of Large Language Models (LLMs), e.g., ChatGPT, Llama, etc.
Major reading materials:
- DecodingTrust: Comprehensive Assessment of Trustworthiness in GPT Models. [website]
- OpenAI GPT API document. [link]
Schedule
Weekly meeting: 5 pm (Central Time), Friday
Date | Topic | Location |
---|---|---|
10/04 | Introduction to Trustworthy LLM | EER 7.650 |
10/13 | Introduction to benchmarks and DecodingTrust | EER 7.650 |
10/20 | Reading: Privacy (Jocelyn), OoD Robustness (Daniel) | Online |
10/27 | Reading: Fairness (Satvik) | EER 7.650 |
11/03 | Reading: Ethics (Rishabh), Stereotype (Satvik) | EER 7.650 |
11/10 | Reading: Adversarial Demonstrations (Jocelyn) | EER 7.650 |
12/01 | Code and play | EER 7.650 |
Assignment 1: Decoding the trustworthiness of Large Language Models
- Read the introduction of DecodingTrust.
- Select a preferred topic (a perspective of trust) and read the corresponding section.
- Present the main challenge, measurement of the topic in 10 min.
Assignment 2: Code and play!
- Find a perspective in DecodingTrust that you want to play with.
- In your slides, write down
- What the metric is conceptually?
- Why does this metric matter?
- how to compute the score (e.g., success rate of private email extraction for privacy).
- Implement the score computation in Python with OpenAI API.
- Debug and play with a small set of samples. (To save you money, don’t do large-scale experiments).
Note, you are free to use any tools and online materials to do this (even reading/copying DecodingTrust codes). Just rock me with the coolest result that you can get!
The feature image is generated by DALL-E by below prompts:
Me: Create a teaser image for my seminar on trustworthy large language models.
Me: Modify your images to include more information about language model (or Artificial Intelligence) and security.
Me: I like the third one. But could you change the color theme? Make it lighter?
Me: Change the background to white.