DiRP Trustworthy LLM

Sep 15, 2023

The scope of the reading group is to exploring the trustworthiness of Large Language Models (LLMs), e.g., ChatGPT, Llama, etc.

Major reading materials:

DecodingTrust: Comprehensive Assessment of Trustworthiness in GPT Models. [website]
OpenAI GPT API document. [link]

Schedule

Weekly meeting: 5 pm (Central Time), Friday

Date	Topic	Location
10/04	Introduction to Trustworthy LLM	EER 7.650
10/13	Introduction to benchmarks and DecodingTrust	EER 7.650
10/20	Reading: Privacy (Jocelyn), OoD Robustness (Daniel)	Online
10/27	Reading: Fairness (Satvik)	EER 7.650
11/03	Reading: Ethics (Rishabh), Stereotype (Satvik)	EER 7.650
11/10	Reading: Adversarial Demonstrations (Jocelyn)	EER 7.650
12/01	Code and play	EER 7.650

Assignment 1: Decoding the trustworthiness of Large Language Models

Read the introduction of DecodingTrust.
Select a preferred topic (a perspective of trust) and read the corresponding section.
Present the main challenge, measurement of the topic in 10 min.

Assignment 2: Code and play!

Find a perspective in DecodingTrust that you want to play with.
In your slides, write down
- What the metric is conceptually?
- Why does this metric matter?
- how to compute the score (e.g., success rate of private email extraction for privacy).
Implement the score computation in Python with OpenAI API.
Debug and play with a small set of samples. (To save you money, don’t do large-scale experiments).

Note, you are free to use any tools and online materials to do this (even reading/copying DecodingTrust codes). Just rock me with the coolest result that you can get!

The feature image is generated by DALL-E by below prompts:

Me: Create a teaser image for my seminar on trustworthy large language models.
Me: Modify your images to include more information about language model (or Artificial Intelligence) and security.
Me: I like the third one. But could you change the color theme? Make it lighter?
Me: Change the background to white.

Junyuan Hong

Postdoctoral Fellow

My research interest lies in the interaction of human-centered AI and healthcare.