Web App Pen Testing: What Gets Tested and Why It Matters

Development teams spend months on security. Input validation on every form. Rate limiting on the login page. Content Security Policy headers and strong TLS configuration, all genuinely good work. Then a tester changes a number in the URL and pulls up another customer's invoices. The entire access control model relied on the front end hiding buttons that the user shouldn't click. The API behind it served whatever you asked for.
That is a typical web application finding: the technical controls were genuinely solid, and the application was still wide open. Here's the gap: no scanner would ever catch it, because no scanner understands what the application is supposed to do.
What a web app test covers that infrastructure testing does not
Your web application sits on a server, and the two require different types of testing. An infrastructure test checks whether that server is patched, whether the network around it is properly segmented, and whether the TLS certificate is valid. Important work, but it tells you nothing about the application running on top.
A web application test checks the application itself. Does the login page leak which usernames exist? Can a standard user reach admin functions by modifying the request? Does the search box sanitise input, or can someone inject a database query through it? Is the session token predictable enough that an attacker could guess a valid one?
These are different questions from "is the server patched." They require different skills and different tools.
The OWASP Top 10 and what each category actually means
The OWASP Top 10 is the standard framework. I use it as the baseline for every engagement, but the real work is adapting it to the specific application.
Broken Access Control is number one on the list and the finding I report most often. Your application has user roles like customer, staff, admin, and maybe others. Each role should only access its own data and functions. In practice, the front end enforces this by showing different menus to different users. The back end often does not enforce it at all. Change a user ID in the request, add a parameter, or navigate directly to an admin URL, and you are in. This is the gap I found in that customer portal, and I find some version of it on most engagements.
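The fix is an object-level check on the server, not a hidden menu in the front end. A minimal sketch of what that check looks like (the `User` and `Invoice` types are illustrative, not from any specific framework):

```python
# Hypothetical sketch: the authorisation decision the back end must make
# for every request, regardless of what the front end displays.
from dataclasses import dataclass

@dataclass
class User:
    id: int
    role: str  # "customer", "staff", or "admin"

@dataclass
class Invoice:
    id: int
    owner_id: int

def can_view_invoice(user: User, invoice: Invoice) -> bool:
    """Authorise per object: admins see everything, everyone else
    only their own invoices. Hiding the link in the UI is not a control."""
    return user.role == "admin" or invoice.owner_id == user.id

alice = User(id=1, role="customer")
someone_elses_invoice = Invoice(id=42, owner_id=2)
print(can_view_invoice(alice, someone_elses_invoice))  # False: changing the ID in the URL fails here
```

The point is that the check runs on every request that touches the object, so modifying an ID in the URL hits the same gate as clicking the intended link.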
Injection is the classic. SQL injection, command injection, template injection. The application takes user input and treats it as code. If the search function passes your query directly into a database statement without parameterising it, I can extract the entire database through the search box. Modern frameworks make this harder to do accidentally, but legacy code, stored procedures, and custom query builders still produce injection flaws regularly.
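The difference between the two cases can be shown in a few lines with Python's standard-library sqlite3 (the table and data are invented for illustration):

```python
# Sketch: string concatenation versus a parameterised query.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

term = "' OR '1'='1"  # classic injection payload

# Vulnerable: user input concatenated straight into the statement,
# so the payload rewrites the query and every row comes back.
rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + term + "'"
).fetchall()
print(rows)  # [('alice',)]

# Parameterised: the driver treats the input as data, never as SQL.
rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (term,)
).fetchall()
print(rows)  # [] -- the literal string matches nothing
```

Every mainstream database driver supports the second form; injection findings almost always trace back to code that built the first form by hand.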
Cryptographic Failures covers sensitive data that is not protected properly. Passwords stored as unsalted MD5 hashes. API keys embedded in client-side JavaScript. Session tokens sent over unencrypted connections on a mixed-content page. The application looks secure from the outside but the data handling underneath is wrong.
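For the password storage case, a minimal sketch of salted hashing with the standard library's PBKDF2, as opposed to unsalted MD5 (the iteration count here is illustrative; pick it from current OWASP guidance):

```python
# Sketch: per-user salt plus a slow key-derivation function,
# verified with a constant-time comparison.
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative; tune to current guidance and hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # unique per user, stored alongside the hash
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```

The salt defeats precomputed tables and the iteration count makes each guess expensive, which is exactly what unsalted MD5 fails to do.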
Security Misconfiguration is the catch-all for things that were left in their default state. Debug mode enabled in production, which outputs stack traces and internal paths to anyone who triggers an error. Default admin credentials on a framework that the developer forgot to remove. Verbose error messages that tell an attacker exactly which library version is running. Each one individually is minor. Together they give an attacker a map of the application's internals.
Authentication and Session Management covers weak login flows and session handling. I check whether passwords have a minimum complexity requirement, whether session tokens rotate after login, whether "remember me" tokens expire within a reasonable timeframe, and whether the password reset flow reveals information about which accounts exist. I regularly find applications where the password reset endpoint returns a different HTTP status code for valid and invalid email addresses. That is an account enumeration vulnerability that lets an attacker build a list of every user in the system.
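The enumeration fix is simple once stated: the reset endpoint returns the same status code and message whether or not the address exists. A sketch (the user store and response shape are invented for illustration):

```python
# Sketch: an enumeration-resistant password reset endpoint.
# The response is identical for valid and invalid addresses.
REGISTERED = {"alice@example.com"}

def request_reset(email: str) -> tuple[int, str]:
    if email in REGISTERED:
        pass  # queue the reset email here; never reflect this in the response
    # Same status code and body either way.
    return 200, "If that address is registered, a reset link has been sent."

print(request_reset("alice@example.com"))
print(request_reset("nobody@example.com"))  # indistinguishable to an attacker
```

The response timing also needs to be similar in both branches, since a measurable delay when the account exists leaks the same information.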
The other five categories on the OWASP list (Insecure Design, Vulnerable Components, Identification Failures, Integrity Failures, and Logging Failures) round out the framework. A thorough test covers all ten, but I spend the most time on whatever is most relevant to the specific application.
| OWASP Category | What I Am Checking | Scanner Catches It? |
|---|---|---|
| Broken Access Control | Can one user reach another's data or functions | Rarely |
| Injection | Does user input get executed as code | Partially |
| Cryptographic Failures | Is sensitive data properly encrypted | Sometimes |
| Security Misconfiguration | Are development defaults still active | Partially |
| Authentication Issues | Can the login flow be bypassed or abused | Partially |
| Insecure Design | Are there fundamental flaws in the application's logic | No |
| Vulnerable Components | Do third-party libraries have known CVEs | Yes |
| Identification Failures | Can user identity be spoofed | Rarely |
| Integrity Failures | Can code or data be modified undetected | No |
| Logging Failures | Would an attack be detected | No |
The pattern in that table is consistent. Scanners handle known technical vulnerabilities in third-party components. They struggle with anything that requires understanding what the application is supposed to do. That gap is where most of the serious findings live.
How a web app test runs day to day
Most engagements run between two and five days depending on the application. I need test accounts at different permission levels and documentation about what the application does. Good documentation speeds the whole process up considerably. No documentation means I spend the first day figuring out what the application is before I can test whether it is secure.
The first day is spent on mapping the application. I crawl every page, every form, every parameter, every endpoint. This goes deeper than an automated crawl because I am noting which functions matter most based on the data they handle and the business impact if they fail. A payment processing form gets more scrutiny than a contact form.
The testing phase works through each function against the OWASP categories. Can a customer account reach the admin endpoints through a modified request? Can one user view another's invoices by changing an ID? Does the file upload check the actual file content, or just the extension? What happens when I submit a form with unexpected data types? If the application has a multi-step process (registration, checkout, approval), I test what happens when steps are skipped or replayed out of order.
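The file upload question is worth a concrete example. A PNG file starts with a fixed eight-byte signature, so checking content rather than the filename catches a script renamed to `.png` (a minimal sketch; real validation would go further):

```python
# Sketch: validate the magic bytes, not the extension.
# The eight-byte value is the published PNG file signature.
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"

def looks_like_png(data: bytes) -> bool:
    return data.startswith(PNG_MAGIC)

real_png = PNG_MAGIC + b"...rest of the image data..."
renamed_script = b"<?php system($_GET['cmd']); ?>"  # filename says .png, content says otherwise

print(looks_like_png(real_png))        # True
print(looks_like_png(renamed_script))  # False: rejected despite the extension
```

Extension checks alone fail this test on most engagements where uploads exist.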
For a payment application, I spend extra time on transaction manipulation. Can I change the amount after it has been calculated? Can I apply a discount code twice in the same transaction? Can I complete a purchase and then modify the order before it is processed? These are business logic tests that require understanding the intended workflow.
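The double-discount case illustrates why these rules belong on the server. A sketch of the kind of check I expect to find when the test fails to apply a code twice (names and the flat 10% discount are illustrative):

```python
# Sketch: a business rule enforced where the total is recalculated,
# not in the front end -- a code applies at most once per transaction.
def apply_discount(total: float, code: str, applied_codes: set[str]) -> float:
    if code in applied_codes:
        raise ValueError("code already applied to this transaction")
    applied_codes.add(code)
    return round(total * 0.9, 2)  # assume the code is a flat 10% off

applied: set[str] = set()
total = apply_discount(100.00, "SAVE10", applied)
print(total)  # 90.0
# A replayed request with the same code is rejected server-side:
# apply_discount(total, "SAVE10", applied) raises ValueError
```

When this state lives only in the browser, replaying the request applies the discount again, and that is exactly the request I replay.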
The last day is dedicated to writing the report. Every finding gets evidence (request and response captures, screenshots), a severity rating based on actual business impact, and remediation guidance written for the development team. I am specific about what needs fixing and how to fix it. "Parameterise the query in the search endpoint using prepared statements" is useful. "Fix the SQL injection" tells them nothing.
What I typically find
Broken access control appears on most engagements I run. The front end hides things from the user, but the back end serves them anyway. This is the number one finding across the industry, not just my own work.
Authentication gaps show up on most engagements as well. Session tokens that live for weeks are one example. Password reset flows that confirm whether an email address exists. Login forms with no rate limiting or account lockout, which means a brute force attack can run indefinitely.
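The missing lockout is the easiest of those to sketch: count recent failures per account and refuse attempts past a threshold (the threshold and window here are illustrative; a real implementation would persist this state and pair it with rate limiting by IP):

```python
# Sketch: a sliding-window lockout counter, so a brute force
# cannot run indefinitely against one account.
FAILURE_LIMIT = 5
LOCKOUT_SECONDS = 900  # 15-minute window

failures: dict[str, list[float]] = {}

def record_failure(username: str, now: float) -> None:
    failures.setdefault(username, []).append(now)

def is_locked(username: str, now: float) -> bool:
    recent = [t for t in failures.get(username, []) if now - t < LOCKOUT_SECONDS]
    failures[username] = recent
    return len(recent) >= FAILURE_LIMIT

for _ in range(5):
    record_failure("alice", now=1000.0)
print(is_locked("alice", now=1001.0))  # True: further attempts are refused
print(is_locked("alice", now=2000.0))  # False: the window has expired
```

Without something like this, a login form will happily accept millions of guesses, and the test confirms it by sending a few hundred and watching none get refused.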
Input validation failures are less common than they used to be because modern frameworks handle the basics, but custom code, older components, and anything that builds queries dynamically still produces injection flaws.
The business logic findings are the ones clients remember. A refund process that can be triggered multiple times. A registration flow that assigns elevated permissions based on a hidden form field. A file download endpoint that accepts path traversal characters and serves files from outside the intended directory. None of these appear in a scan report because no scanner knows the intended business rules.
How this relates to Cyber Essentials
Cyber Essentials and web application testing assess different layers. CE and CE Plus check whether your five technical controls are in place: firewalls, access control, patching, malware protection, and secure configuration. They do not test whether your application's code is secure.
If your business runs a customer-facing web application, you need both. CE Plus confirms the infrastructure beneath the application is right. A web app test confirms the application is right. The results from one do not replace the other.
When to test
Your application handles customer data, processes payments, or manages sensitive information: test annually at minimum. You are launching a new application or pushing a major update: test before release. A contract or regulator requires application-level testing: test to meet the requirement. You have had a security incident involving the application: test to understand the full extent of the exposure.
If the application has not changed significantly and your last test is less than 12 months old, those findings still give you an accurate picture of your exposure.
Related articles
- API Security Testing: What a Pen Tester Actually Checks
- Can AI Actually Do a Pen Test?
- Infrastructure Pen Testing: What We Actually Test
- The Five Cyber Essentials Controls: A Technical Guide