Introducing Latio's Actually Useful Product Guide - AI AppSec Engineers
Evaluating the AI AppSec Engineering Hype
This is a preview of our first full industry report, the Latio Actually Useful Product Guide! In this initial report, we dive deep with objective testing of how AppSec companies use AI. You’ll learn the different approaches vendors have taken to creating auto-fixes and how they map to outcomes. Ultimately, you’ll get real-world guidance for deciding which approach is best for you - no pointless quadrant graphs here! The full report is available to download either from one of our sponsors (who sponsored only after the report was completed) or directly through our paid subscription.
Download the Report for Free from the Sponsors:
Or subscribe to Latio Pulse for the full report, plus at least three more reports a year
Report Introduction: The AI Code Security Landscape
As application security companies frantically re-brand into “your friendly neighborhood AI AppSec engineer,” their marketing teams would have you believe their product is the all-in-one AI engineer of your dreams. However, before you fire your entire AppSec team, this guide will help you determine whether AI is ready for the job.
In this report, we’ll assess the different technical approaches taken by several vendors to see how close they get to the reality of deploying automatic code fixes to your application and help you decide which one is the right investment for your security team.
There are two primary use cases in AI and code security: using AI to do static code analysis and using AI to create fixes for discovered issues. In this report, we focus primarily on using AI to create the fixes for discovered issues.
The Battle for the Future of Code Security: AI Upstarts vs. Established Platforms
The early days of ChatGPT led to the rapid launch of at least six dedicated AI code security companies: Amplify (2022), Corgea (2023), DryRun (2022), Pixee (2022), Mobb (2021), and Zeropath (2024).
As an analyst, I was fortunate to engage in early conversations with each of these founding teams. From our interactions, two things were immediately clear:
Each of these companies focused on providing developers with high-quality insights into their code by using AI to fix the core problems of traditional SAST:
The false positive problem - most SAST findings are false positives
The “Time-To-Fix” problem - it takes much longer to fix an issue than to discover it
Each company had a convincing and unique approach to the right way to use AI in security.
As the value of auto-fixing became clear, it didn’t take long for all of the major SAST providers to claim that they, too, do AI auto-fixing. This remains a lingering question for me - is AI auto-fixing a big enough moat to justify a standalone product? In this report, we’ll answer these questions:
Why does AI auto-fixing matter?
What approach to AI auto-fixing is the best?
What innovations are happening using AI for static code analysis?
This is our first paid report, but it can also be accessed for free from any of the report’s sponsors listed below. These vendors sponsored the report only after testing was complete, ensuring no outside influence on the results; furthermore, all of the raw test results are shared in a linked Google Sheet so you can judge the fix quality for yourself!
Testing Methodology
Vendors were chosen based on their ability to detect issues or ingest SAST reports, and to create actual code fixes for those issues. This excluded vendors like Moderne and Grit, who provide ways to make large-scale code changes but do not use SAST as the middle ground. It also excluded vendors like Backslash Security, which offers AI fixing based on examples similar to your code.
Semgrep’s scan results were used as the baseline for code fixing in this study. Semgrep was chosen because it was the scan engine most widely supported by the vendors. This particularly disadvantaged two vendors, Snyk and Mobb, because they don’t share the same baseline detections. Due to the different baselines, both vendors were excluded from most of the visuals, but their raw results are still available. Mobb’s Semgrep CE support is in beta, and Snyk’s SAST results didn’t have enough crossover with Semgrep to generate meaningful coverage.
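For context on what that baseline looks like in practice, here is a minimal sketch (not the report’s actual harness) of generating and summarizing a Semgrep scan. It assumes the semgrep CLI is installed, uses the public `auto` ruleset, and relies on Semgrep’s JSON output exposing a `results` list with a `check_id` per finding.

```python
# Minimal sketch of producing and summarizing a Semgrep baseline scan.
# Assumptions: the semgrep CLI is installed, and its JSON report contains a
# top-level "results" list whose entries carry a "check_id" (rule name).
import json
import subprocess
from collections import Counter


def run_semgrep_baseline(repo_path: str, output_file: str = "baseline.json") -> dict:
    """Run a Semgrep scan over repo_path and return the parsed JSON report."""
    subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", "--output", output_file, repo_path],
        check=True,
    )
    with open(output_file) as f:
        return json.load(f)


def summarize(report: dict) -> Counter:
    """Count findings per rule, giving a stable baseline to compare fixes against."""
    return Counter(result["check_id"] for result in report.get("results", []))


if __name__ == "__main__":
    report = run_semgrep_baseline(".")
    for rule, count in summarize(report).most_common():
        print(f"{count:3d}  {rule}")
```

Counting findings per rule like this is one simple way to confirm that every vendor is working from the same set of detections before comparing fix quality.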
Auto-fixes were generated using the platforms, with the final code output added to the shared sheet's second tab. Scores were then subjectively assigned by the team at Latio based on the following factors:
Were unidentified issues also fixed?
Were false positives identified?
Was the fix presented in a way that integrated with the overall code base?
Was the suggested fix logical and easy to understand?
Was an elegant solution provided, or at least guided towards?
The final outputs are publicly available for you to assess if helpful. General scores were also added for usability, detections, time taken, and triage. These scores factor only into the “total score” and are meant to give a feel for how each product works as a whole, as the rest of the report focuses on the fixes.
Testing also covered many different kinds of findings and coding languages, with tests for Python, Java, JavaScript, and Infrastructure-as-Code (IaC) findings. Two findings were known false positives, which most tools correctly hid from their dashboards. Third-party libraries were used to assess whether the tool provided a fix in the proper context of that library.
A final methodological caveat is that testing took place in a repo mostly built around very simple, intentionally vulnerable code. This creates odd patterns, such as taking a user's SQL query and running it directly. These tests intentionally probe how an LLM responds to seemingly insecure-by-design issues, but they skew the repo away from more realistic examples.
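To make that concrete, the sketch below is a hypothetical illustration (not code from the test repo) of the two situations an auto-fixer faces: a feature whose entire purpose is running user-supplied SQL, where any “fix” changes the intended behavior, next to a routine lookup where parameterization is the obvious, behavior-preserving change. The function names `run_user_query` and `lookup_user` are made up for this example.

```python
# Hypothetical illustration of "insecure by design" vs. a straightforward fix.
import sqlite3


def _demo_db() -> sqlite3.Connection:
    """Tiny in-memory database so the example runs on its own."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO users VALUES (1, 'alice')")
    return conn


def run_user_query(user_sql: str) -> list:
    """The feature *is* running the user's SQL. The tainted-SQL pattern scanners
    look for is genuinely present, but parameterizing or allow-listing it would
    break what the feature is supposed to do."""
    return _demo_db().execute(user_sql).fetchall()


def lookup_user(name: str) -> list:
    """Contrast: a normal lookup where the parameterized form (instead of
    string-building the query) is the behavior-preserving fix an auto-fixer
    should propose."""
    return _demo_db().execute(
        "SELECT * FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    print(run_user_query("SELECT * FROM users"))
    print(lookup_user("alice"))
```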
Subscribe to download the full report, which includes a decision guide and the full test results! Or, download a preview here.