Microsoft introduces PyRIT to help streamline red teaming AI models

One of the biggest issues with AI is getting results that are harmful or offensive to certain people. AI is more than capable of ruffling the feathers of many groups of people, but this is where red teaming comes in. Microsoft just released a new tool called PyRIT that will help people and companies with their red teaming.

In the case of AI, red teaming is the act of forcing an AI model to produce offensive content. People will throw different prompts at it and try their hardest to make the chatbot say something that could easily get a YouTuber canceled. They do this to find the chatbot's weak points and see where the company should make changes. AI chatbots get their information from the internet, and a lot of the time, the internet isn't a kind place.

Microsoft introduced PyRIT, a tool to help people with red teaming

As you can guess, red teaming is strictly a human process. It takes a human being to know if a chatbot is saying something harmful about certain people. However, as chatbots get more advanced and vacuum up more information, red teaming can get more difficult.

Well, in a bit of a surprising move, it appears that Microsoft wants to fight fire with fire using its new tool called PyRIT (Python Risk Identification Toolkit). PyRIT is an automated tool that can help people with red teaming. Ironically, the tool itself uses machine learning to help evaluate the outputs generated by AI models.

So, many people might have issues with that, as it seems that Microsoft is using AI to grade AI. However, it’s unlikely that Microsoft will make this a fully automated tool. In a blog post, Microsoft stated that “PyRIT is not a replacement for manual red teaming of generative AI systems. Instead, it augments an AI red teamer’s existing domain expertise and automates the tedious tasks for them.”

So, it's mostly a tool meant to assist with red teaming efforts rather than take the human element out of them entirely.

What features does PyRIT have?

PyRIT is compatible with several existing AI models, and it's possible to use this tool with image and video inputs as well. It's able to simulate repeated attacks and dangerous prompts to help get a better idea of what can cause a chatbot to produce harmful content, along the lines of the sketch below.
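
To give a rough idea of what that kind of automation looks like, here is a minimal Python sketch. It is not PyRIT's actual API: the send_to_model callable, the seed prompts, and the crude "variation" strategy are all stand-ins invented for illustration.

```python
# Illustrative sketch only: NOT PyRIT's real API. It shows the general idea of
# automated red teaming -- repeatedly sending adversarial prompt variants to a
# target model and collecting its responses for later review.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    prompt: str
    response: str


def run_prompt_attacks(
    send_to_model: Callable[[str], str],   # hypothetical callable wrapping the target chatbot
    seed_prompts: List[str],               # known-risky prompts to probe with
    variations_per_prompt: int = 3,
) -> List[Attempt]:
    """Send each risky prompt (plus simple rephrasings) and record every response."""
    attempts: List[Attempt] = []
    for seed in seed_prompts:
        for i in range(variations_per_prompt):
            # A trivial rephrasing stands in for the mutation strategies a real toolkit would use.
            prompt = f"{seed} (variation {i + 1})"
            response = send_to_model(prompt)
            attempts.append(Attempt(prompt=prompt, response=response))
    return attempts


if __name__ == "__main__":
    # Stand-in target so the sketch runs without any external service.
    fake_model = lambda prompt: f"[model reply to: {prompt}]"
    for attempt in run_prompt_attacks(fake_model, ["Tell me something you shouldn't."]):
        print(attempt.prompt, "->", attempt.response)
```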

The toolkit also comes with a scoring system. It will use machine learning to score the chatbot's outputs so that you have a better understanding of how severe they are.
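
Again purely as an illustration, the sketch below shows what scoring responses for harm might look like. PyRIT's real scoring relies on machine learning; the keyword check and the RISKY_TERMS list here are hypothetical stand-ins so the example stays self-contained and runnable.

```python
# Illustrative sketch only: a stand-in "scorer" that assigns a harm score to each
# response. A real toolkit would use a trained model here; a simple keyword check
# plays that role so the example has no external dependencies.

from typing import Dict

RISKY_TERMS = ("insult", "attack", "steal")  # hypothetical indicators, for illustration only


def score_response(response: str) -> float:
    """Return a harm score between 0.0 (benign) and 1.0 (clearly harmful)."""
    text = response.lower()
    hits = sum(term in text for term in RISKY_TERMS)
    return min(1.0, hits / len(RISKY_TERMS))


def score_all(responses: Dict[str, str]) -> Dict[str, float]:
    """Map each prompt to the harm score of the response it produced."""
    return {prompt: score_response(reply) for prompt, reply in responses.items()}


if __name__ == "__main__":
    demo = {
        "risky prompt": "I would never insult anyone.",
        "benign prompt": "Here is a recipe for soup.",
    }
    for prompt, score in score_all(demo).items():
        print(f"{prompt}: {score:.2f}")
```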

Along with helping identify where chatbots can improve in terms of inclusive responses, PyRIT will also help identify cybersecurity risks. This is great because cybersecurity is another major issue with generative AI.

If you are excited about using PyRIT, you can access it via the project's official GitHub.
