Using AI safely for WordPress/WooCommerce - Urumi.AI

We have been using AI extensively to build Urumi.AI, but this blog post is not about that. We will see more and more AI tools come up that promise the world (including us). It’s important to understand what guardrails these tools have, because not all of them will have the proper safeguards to use safely with live sites.

This post is about what to look out for, how to keep your site safe, and some general guardrails and best practices. These are the things we follow ourselves diligently as we use AI to build production services for our merchants, while aiming for maximum productivity.

No write access to your production database

At this point, there are lots of content tools coming up, especially around adding blog posts, pages, images and so on. These tools are usually safe to use, as long as they are not writing directly to the database. For instance, when you can see them adding data inside the post editor, it should be fine.

On the other hand, tools that are adding data directly to your production database are the ones you want to stay away from.

There was an incident where a user claimed that Replit deleted a production database. While this particular incident may have been exaggerated for social media attention, ultimately, AI shouldn’t have unfiltered write access to your production database.

The chances are low that you will end up in a state where it drops your database, but it’s very easy to end up with a corrupted set of data.

When using AI tools with direct database access, stay away from those that don’t mention “sandboxing” or “staging” or some other way of limiting access.

Look for Full-Loop AI Capabilities

We are exploring AI tools because we want to maximize productivity. To get the best bang for your buck, look for the ones that can work in a full loop. This includes:

AI has access to generate what it needs to generate (always the case)
It can also read or execute what it has generated, in a sandboxed environment
It can evaluate and test that whatever was generated works as expected
If not, it can adjust the generated content by going back to step 1

this way, you would be able to get to desired results in fewer number of requests, saving you time and money. Full loop access will also allow you to execute it fairly autonomously.

For instance, if you are generating documentation for your WordPress plugin, using a tool that also has browser access to test out the documentation is going to end up with drastically higher quality results.

Adversarial Testing

AIs are pretty good at spotting issues, including their own. This works great on code, but also on content. Higher quality tools will use the same or different agents to verify the work as a separate request.

The separate request/flow part is important. Based on our anecdotal experience, LLM agents seem to be very lenient on the work they have generated in the same request. But if you verify the work via another request (even with the same agent) they will be pretty effective in identifying blind spots that the generation request would have missed.

Higher quality tools will have adversarial testing like this built into them. For example, Claude seems to be rolling this out by default. (Claude is also our choice of LLM foundation model).

Usually this is an implementation detail that not all AI tools will mention, even if they are using it. But it’s a great sign if you can tell that they are verifying and testing what they have generated.

Implement Human Verification Steps

You are on the hook for the work done by AI, which means you will need to verify it. This will be especially hard in some cases. For instance, if you don’t have adequate skills in what you are asking the AI to generate.

But do what you can. If you are generating code as a non-technical person, then test out the flows. Ask for security validation of the generated content from a different AI agent, and so on.

If you are generating content, check its readability and verify the value add. Ultimately, we are responsible, whether the output is slop or quality.

How Are You Using AI?

We’d love to know about how you are using AI and any interesting anecdotes you have found so far. Do you find it a super power, or overhyped?