satej@home:~$

Detoxify user prompts



With the emergence of so many text based ai generative models, there is a initial layer of input prompt validator / filter which detects bias, racist and other inappropriate comments. Building this model, takes time and resources. Should the team focus on the core model or also spend effort on checking for valid inputs?

In this context, there is a helpful open source python package called detoxify which helps with detecting bias and other aspects along same lines in an input text. Quoting from the github page:

Trained models & code to predict toxic comments on 3 Jigsaw challenges: Toxic comment classification, Unintended Bias in Toxic comments, Multilingual toxic comment classification.

Installation command:

pip install detoxify

Sample code:


from detoxify import Detoxify

# each model takes in either a string or a list of strings

results = Detoxify('original').predict('example text')

results = Detoxify('unbiased').predict(['example text 1','example text 2'])

results = Detoxify('multilingual').predict(['example text','exemple de texte','texto de ejemplo','testo di esempio','texto de exemplo','örnek metin','пример текста'])

# to specify the device the model will be allocated on (defaults to cpu), accepts any torch.device input

model = Detoxify('original', device='cuda')

# optional to display results nicely (will need to pip install pandas)

import pandas as pd

print(pd.DataFrame(results, index=input_text).round(5))

Do feel free to go through the limitations shared in the github page. Hope it helps!




LEAVE A COMMENT
Comments are powered by Utterances. A free GitHub account is required. Comments are moderated. Be respectful. No swearing or inflammatory language. No spam.

I reserve the right to delete any inappropriate comments. All comments for all pages can be viewed and searched online here. To edit or delete your comment: click the "Comments" link at the top of the comments section below where it says how many comments have been left (this will take you to a GitHub page with all comments for this page) --> find your comment on this GitHub page and click the 3 dots in the top-right --> click "Edit" or "Delete". Editing or adding a comment from the GitHub page also gives you a nicer editor.