DiRP Trustworthy LLM

The scope of the reading group is to exploring the trustworthiness of Large Language Models (LLMs), e.g., ChatGPT, Llama, etc.

Major reading materials:

  • DecodingTrust: Comprehensive Assessment of Trustworthiness in GPT Models. [website]
  • OpenAI GPT API document. [link]

Schedule

Weekly meeting: 5 pm (Central Time), Friday

Date Topic Location
10/04 Introduction to Trustworthy LLM EER 7.650
10/13 Introduction to benchmarks and DecodingTrust EER 7.650
10/20 Reading: Privacy (Jocelyn), OoD Robustness (Daniel) Online
10/27 Reading: Fairness (Satvik) EER 7.650
11/03 Reading: Ethics (Rishabh), Stereotype (Satvik) EER 7.650
11/10 Reading: Adversarial Demonstrations (Jocelyn) EER 7.650
12/01 Code and play EER 7.650

Assignment 1: Decoding the trustworthiness of Large Language Models

  1. Read the introduction of DecodingTrust.
  2. Select a preferred topic (a perspective of trust) and read the corresponding section.
  3. Present the main challenge, measurement of the topic in 10 min.

Assignment 2: Code and play!

  1. Find a perspective in DecodingTrust that you want to play with.
  2. In your slides, write down
    • What the metric is conceptually?
    • Why does this metric matter?
    • how to compute the score (e.g., success rate of private email extraction for privacy).
  3. Implement the score computation in Python with OpenAI API.
  4. Debug and play with a small set of samples. (To save you money, don’t do large-scale experiments).

Note, you are free to use any tools and online materials to do this (even reading/copying DecodingTrust codes). Just rock me with the coolest result that you can get!


The feature image is generated by DALL-E by below prompts:

Me: Create a teaser image for my seminar on trustworthy large language models.
Me: Modify your images to include more information about language model (or Artificial Intelligence) and security.
Me: I like the third one. But could you change the color theme? Make it lighter?
Me: Change the background to white.
Junyuan Hong
Junyuan Hong
Postdoctoral Fellow

My research interests include data privacy and trustworthy machine learning.