DiRP Trustworthy LLM
The scope of the reading group is to exploring the trustworthiness of Large Language Models (LLMs), e.g., ChatGPT, Llama, etc.
Major reading materials:
- DecodingTrust: Comprehensive Assessment of Trustworthiness in GPT Models. [website]
- OpenAI GPT API document. [link]
Weekly meeting: 5 pm (Central Time), Friday
|10/04||Introduction to Trustworthy LLM||EER 7.650|
|10/13||Introduction to benchmarks and DecodingTrust||EER 7.650|
|10/20||Reading: Privacy (Jocelyn), OoD Robustness (Daniel)||Online|
|10/27||Reading: Fairness (Satvik)||EER 7.650|
|11/03||Reading: Ethics (Rishabh), Stereotype (Satvik)||EER 7.650|
|11/10||Reading: Adversarial Demonstrations (Jocelyn)||EER 7.650|
|12/01||Code and play||EER 7.650|
Assignment 1: Decoding the trustworthiness of Large Language Models
- Read the introduction of DecodingTrust.
- Select a preferred topic (a perspective of trust) and read the corresponding section.
- Present the main challenge, measurement of the topic in 10 min.
Assignment 2: Code and play!
- Find a perspective in DecodingTrust that you want to play with.
- In your slides, write down
- What the metric is conceptually?
- Why does this metric matter?
- how to compute the score (e.g., success rate of private email extraction for privacy).
- Implement the score computation in Python with OpenAI API.
- Debug and play with a small set of samples. (To save you money, don’t do large-scale experiments).
Note, you are free to use any tools and online materials to do this (even reading/copying DecodingTrust codes). Just rock me with the coolest result that you can get!
The feature image is generated by DALL-E by below prompts:
Me: Create a teaser image for my seminar on trustworthy large language models. Me: Modify your images to include more information about language model (or Artificial Intelligence) and security. Me: I like the third one. But could you change the color theme? Make it lighter? Me: Change the background to white.