Local LLM for Ultimate Hacker Chat?

Path Cybersec [Slava Moskvin]
6 min read · Mar 7, 2024


The Idea

I’m quite intrigued by AI capabilities, LLMs in particular. I use ChatGPT in my work almost every day, certainly every week. One big downside, however, is that I can’t use it with private data. Imagine how cool it’d be to feed an LLM hundreds of customer docs and quickly get answers out of them. Or to put in artifacts from firmware (e.g., strings) and see what insights you can get from that. I know, genius!

Of course, there are self-hosted alternatives to ChatGPT/Copilot/Gemini/etc. So, I decided to run some experiments on my own PC. As I couldn’t use any private data for that, my goal was simple: to use existing research and blog posts on hacking to create an ultimate hacking LLM.

With enough knowledge, anyone can snag an open-source model from https://huggingface.co, throw in some code, or harness an open-source UI for a chat. My non-existent ML knowledge was just enough to copy and paste example code from the Internet.
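
That example code boils down to surprisingly few lines. A minimal sketch, assuming the Hugging Face transformers library and an instruct-tuned model small enough for your hardware (the model id and prompt are just examples):

```python
# Minimal "chat with an open-source model" sketch using Hugging Face transformers.
# Requires: pip install transformers accelerate torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What mitigations make heap exploitation harder?"}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```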

I was stuck with a “mere” 16 GB of RAM (more like 10 GB usable on my Windows 11 setup), and each attempt insisted on loading the model from SSD into RAM before transferring it to the GPU, a bottleneck for every query. To add insult to injury, my setup could barely handle models over 3 billion parameters (for context, an RTX card can run LLAMA 13B, and ChatGPT 3.5 is reported to be around 175B).
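
For reference, the usual trick for squeezing a bigger model into a memory budget like that is 4-bit quantization. A rough sketch (not something from the setup above; it assumes the bitsandbytes package and a CUDA-capable GPU):

```python
# Load a ~7B-parameter model quantized to 4 bits: the weights take roughly 4 GB
# instead of ~14 GB in fp16. Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # example model id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # place layers on the GPU, spilling to CPU if needed
)
```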

So, I deemed that experiment a failure and moved on. Until Chat with RTX came along. It’s a tool from NVIDIA that lets you chat with an LLM running locally on an RTX video card, and it’s specifically tailored to chatting with any set of documents you have on your PC.

About Chat with RTX

The first impression is that it’s very fast, both to set up and to run. The setup can be done even by a simple individual like myself: just double-click the installer downloaded from NVIDIA’s website. Loading documents takes no longer than a minute. Chat with RTX needs no more than a couple of minutes and 2.5 GB of RAM to process the whole dataset, and after that it answers in an instant. With the beefier LLAMA 13B (the alternative to the only other supported model, Mistral 7B), it consumes 12.5 GB of video memory. But what about the quality of the answers? Well, let’s see.

The Chat

To start with, I downloaded every post from the Google Project Zero blog. That made for 194 files filled with hacking knowledge. With Chat with RTX locked and loaded, I asked it about the techniques it took away from the posts… It didn’t go very well.
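
Getting the dataset was the easy part. Here’s a rough sketch of how the posts can be pulled, assuming the blog exposes the standard Blogger Atom feed (the feed URL, pagination, and output folder are assumptions for illustration, not the exact script I used):

```python
# Rough sketch: download Project Zero posts via the standard Blogger Atom feed
# and save each one as a plain-text file for Chat with RTX to index.
# Requires: pip install feedparser beautifulsoup4
import os
import re

import feedparser
from bs4 import BeautifulSoup

FEED = "https://googleprojectzero.blogspot.com/feeds/posts/default"
OUT_DIR = "p0_posts"  # arbitrary output folder name
os.makedirs(OUT_DIR, exist_ok=True)

start = 1
while True:
    feed = feedparser.parse(f"{FEED}?start-index={start}&max-results=25")
    if not feed.entries:
        break
    for entry in feed.entries:
        html = entry.content[0].value if "content" in entry else entry.summary
        text = BeautifulSoup(html, "html.parser").get_text("\n")
        name = re.sub(r"[^\w\- ]", "", entry.title).strip()[:80] or "untitled"
        with open(os.path.join(OUT_DIR, f"{name}.txt"), "w", encoding="utf-8") as f:
            f.write(text)
    start += len(feed.entries)
```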

I tried to be sneaky and ask it not about “hacking” but about “security evaluation” techniques it knows from the files I provided. Instead, it started to describe the security evaluation of files.

Asking about “exploitation techniques” improved things: a somewhat coherent response that could be considered good for an LLM only familiar with P0 posts, which focus on specific techniques.

Then it changed its mood to uncooperative again:

Begging didn’t help. The model is too modest despite knowing everything that’s in the P0 blog, and NVIDIA is too scared of giving you any hacking-related info (even from your own documents!).

Even simpler questions turned out not to be that easy. Asked to summarize the files, it listed 4 random files from the folder and tried to describe each one individually.

Of course, there were more than 4 files.

Formulating the request differently didn’t help either:

Desperate, I just asked it what it could do. To answer that, it took data from a random file, for some reason:

On the other hand, it was somewhat decent, or at least coherent, when asked to chat about a single document:

It even remembered the context and guessed that SLUB is somehow related to memory allocation. It also told me that the bug was “in some random kernel subsystem,” which raised the obvious question: which subsystem?

Aaaand the magic is over:

Okay, maybe LLAMA 13B is too heavy for my setup and that’s why the results aren’t great. MAYBE. Let’s try Mistral 7B then. Welp… it got even worse.

Okay, it’s somewhat usable when asked to chat about a specific document. Not that useful for me, but still. Maybe it can analyze a set of files in a different way. For example, list all files on a specific topic.

Asked to list all files related to iOS, it picked just 3 random files. Even a plain search for files with “iOS” or “iPhone” in the name turns up more results than that!
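
(That baseline really is a couple of lines; “p0_posts” below is the hypothetical dataset folder from the earlier sketch.)

```python
# Naive baseline: just match filenames in the dataset folder.
from pathlib import Path

matches = [p.name for p in Path("p0_posts").iterdir()
           if any(k in p.name.lower() for k in ("ios", "iphone"))]
print(f"{len(matches)} matching files")
for name in sorted(matches):
    print(name)
```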

Conclusion

You can see that I’m not that enthusiastic about the results. For my purposes, Chat with RTX may be marginally better than grep in very narrow use cases. Maybe I just misused the tool, or the topic is too complex, or my dataset is a bad fit. Another possibility is that it needs a single file per narrow topic. If any of those is the case, let me know in the comments how I can improve this.

Still, maybe it could be useful for more generic (and SFW, unlike scary hacking!) topics. And it’s a step toward making local LLMs more accessible to the general public. I hope to see more progress in that direction over the next 1–2 years.
