
Trustworthy-ML-Lab

Popular repositories

  1. Label-free-CBM Public

    [ICLR 23] A new framework to transform any neural network into an interpretable concept bottleneck model (CBM) without needing labeled concept data

    Jupyter Notebook 94 19

  2. CLIP-dissect Public

    [ICLR 23 spotlight] An automatic and efficient tool to describe the functionality of individual neurons in deep neural networks (DNNs)

    Jupyter Notebook 48 15

  3. VLG-CBM Public

    [NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept bottleneck models (CBMs)

    Jupyter Notebook 16

  4. Linear-Explanations Public

    [ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks

    Jupyter Notebook 11

  5. CB-LLMs Public

    [ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

    Python 9 1

  6. Describe-and-Dissect Public

    [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models

    Jupyter Notebook 8 1

Repositories

Showing 10 of 21 repositories
  • ThinkEdit Public

    An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

    Python 6 1 0 0 Updated Apr 17, 2025
  • posthoc-generative-cbm Public

    [CVPR 2025] Concept Bottleneck Autoencoder (CB-AE), which efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision while preserving image quality

    Jupyter Notebook 6 0 1 0 Updated Apr 16, 2025
  • effective_skill_unlearning Public

    [NAACL 25] Two novel, lightweight, and training-free skill unlearning methods for LLMs

    Python 3 0 0 0 Updated Mar 27, 2025
  • RAT_MisD Public

    Boosting misclassification detection via radius-aware training (RAT)

    Python 0 0 0 0 Updated Mar 21, 2025
  • Describe-and-Dissect Public

    [TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models

    Jupyter Notebook 8 1 1 0 Updated Feb 20, 2025
  • CB-LLMs Public

    [ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

    Python 9 1 0 1 Updated Feb 13, 2025
  • Concept-Bottleneck-LLM
    Python 4 0 0 0 Updated Feb 1, 2025
  • provable-efficient-dataset-distill-KRR
    Python 1 Apache-2.0 0 0 0 Updated Dec 10, 2024
  • VLG-CBM Public

    [NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept bottleneck models (CBMs)

    Jupyter Notebook 16 0 0 0 Updated Dec 7, 2024
  • Linear-Explanations Public

    [ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks

    Jupyter Notebook 11 0 0 0 Updated Nov 22, 2024

People

This organization has no public members. You must be a member to see who’s a part of this organization.
