Trustworthy-ML-Lab

ThinkEdit Public
An effective weight-editing method for mitigating overly short reasoning in LLMs, and a mechanistic study uncovering how reasoning length is encoded in the model’s representation space.

Python 6 1 0 0 Updated Apr 17, 2025
posthoc-generative-cbm Public
[CVPR 2025] Concept Bottleneck Autoencoder (CB-AE): efficiently transforms any pretrained (black-box) image generative model into an interpretable generative concept bottleneck model (CBM) with minimal concept supervision, while preserving image quality

Jupyter Notebook 6 0 1 0 Updated Apr 16, 2025
effective_skill_unlearning Public
[NAACL 25] Two novel, lightweight, training-free skill unlearning methods for LLMs

Python 3 0 0 0 Updated Mar 27, 2025
RAT_MisD Public
Boosting misclassification detection ability by radius-aware training (RAT)

Python 0 0 0 0 Updated Mar 21, 2025
Describe-and-Dissect Public
[TMLR 25] An automated method for explaining complex neuron behaviors in deep vision models using large language models

Jupyter Notebook 8 1 1 0 Updated Feb 20, 2025
CB-LLMs Public
[ICLR 25] A novel framework for building intrinsically interpretable LLMs with human-understandable concepts to ensure safety, reliability, transparency, and trustworthiness.

Python 9 1 0 1 Updated Feb 13, 2025
Concept-Bottleneck-LLM Public

Python 4 0 0 0 Updated Feb 1, 2025
provable-efficient-dataset-distill-KRR Public

Python 1 Apache-2.0 0 0 0 Updated Dec 10, 2024
VLG-CBM Public
[NeurIPS 24] A new training and evaluation framework for learning interpretable deep vision models and benchmarking different interpretable concept bottleneck models (CBMs)

Jupyter Notebook 16 0 0 0 Updated Dec 7, 2024
Linear-Explanations Public
[ICML 24] A novel automated neuron explanation framework that can accurately describe polysemantic concepts in deep neural networks

Jupyter Notebook 11 0 0 0 Updated Nov 22, 2024
