Preparing for Your Next Web Pentest
A practical guide to pre-attack reconnaissance: setting engagement rules, Google dorking, discovering subdomains and endpoints, identifying tech stacks and WAFs, and mining JS files for secrets using tools like Subfinder, Wayback Machine, and Wappalyzer.
Let's face it: not all pentests go as planned. Sometimes we come across seemingly overwhelming targets, complex features, or technologies we have no working knowledge of. This can cause us to get stuck or fall down rabbit holes before finding real, impactful vulnerabilities. As hackers and pentesters, we want to look in the right places for these vulnerabilities rather than waste time, which is never on our side.
In this article, we'll explore web recon strategies that are useful when approaching a web target, and how to prepare for an engagement. Hopefully you can craft an effective recon methodology using this article as a guide.
1. Drafting clear "Rules of Engagement"
During the pre-engagement phase, it is important to confirm and document the rules of engagement (ROE) to avoid any discrepancies. This gives us a clear understanding of the client's needs and lets us set boundaries for the testing process, so we avoid tampering with critical infrastructure in ways that could hurt business operations. Drafting clear rules of engagement requires the tester to understand the goal the client is trying to achieve from the penetration test and the testing methodology (whether blackbox, greybox, or whitebox).
Important things to take note of when setting ROE:
- Scope Definition: Clearly identify in-scope assets and out-of-scope assets.
- Permitted Test Types: The ROE should clearly specify authorized test types and prohibited test types.
- Testing Timeline: The ROE must clearly state the testing schedule, including the daily testing hours, start date, and end date.
- Contact Points: The ROE must include a clear communication plan stating the authorized personnel who can be contacted during the testing period.
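The items above can be kept in a small structured record so that tooling can check them before any request is sent. The sketch below is only one way to do it: the class name, field names, dates, and contact address are all hypothetical, chosen for illustration.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time


@dataclass
class RulesOfEngagement:
    """Hypothetical container for the ROE items: scope, test types, timeline, contacts."""
    in_scope: list[str]
    out_of_scope: list[str]
    permitted_tests: list[str]
    prohibited_tests: list[str]
    start_date: date
    end_date: date
    daily_start: time
    daily_end: time
    contacts: list[str] = field(default_factory=list)

    def testing_allowed_at(self, moment: datetime) -> bool:
        """Return True if `moment` falls inside the agreed dates and daily hours."""
        in_dates = self.start_date <= moment.date() <= self.end_date
        in_hours = self.daily_start <= moment.time() <= self.daily_end
        return in_dates and in_hours


# Example values are made up for illustration.
roe = RulesOfEngagement(
    in_scope=["app.example.vuln"],
    out_of_scope=["payments.example.vuln"],
    permitted_tests=["web application testing"],
    prohibited_tests=["denial of service"],
    start_date=date(2025, 1, 6),
    end_date=date(2025, 1, 17),
    daily_start=time(9, 0),
    daily_end=time(17, 0),
    contacts=["security-contact@example.vuln"],
)
```

A script could call `roe.testing_allowed_at(datetime.now())` before starting a scan and refuse to run outside the agreed window.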
2. Sticking to the target's Scope
It is very important during a web pentest to make sure we don't test beyond the assets in scope. One way to achieve this is to clearly outline the in-scope assets, out-of-scope assets, and also third-party services used by the web application in an organized manner.
When using a proxy tool like Burp Suite to capture requests from the browser, make sure the in-scope and excluded domains are specified in the tool's scope settings, so that out-of-scope traffic is not tested by accident.
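The same include/exclude logic can be applied in your own scripts. Here is a minimal sketch, assuming scope is written as exact hostnames or `*.` wildcard patterns; the function names and the sample scope lists are hypothetical.

```python
from urllib.parse import urlsplit


def host_matches(host: str, pattern: str) -> bool:
    """Match an exact host, or any subdomain when the pattern starts with '*.'."""
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("*."):
        # "*.example.vuln" -> host must end with ".example.vuln"
        return host.endswith(pattern[1:])
    return host == pattern


def is_in_scope(url: str, in_scope: list[str], out_of_scope: list[str]) -> bool:
    """Exclusions win over inclusions, so an excluded host is never tested."""
    host = urlsplit(url).hostname or ""
    if any(host_matches(host, p) for p in out_of_scope):
        return False
    return any(host_matches(host, p) for p in in_scope)


# Sample scope lists, made up for illustration.
IN_SCOPE = ["example.vuln", "*.example.vuln"]
OUT_OF_SCOPE = ["payments.example.vuln"]
```

Checking exclusions first is a deliberate choice: a wildcard in-scope entry should never override an explicitly excluded host.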
3. Google is your friend
When it comes to web security engagements, Google is not just a search engine; it becomes a hacking tool. Effective Google dorking can provide a plethora of information on the target, ranging from basics like subdomains and URLs to more sensitive data like API keys.
Here are a few Google dorking techniques that might prove useful during an engagement.
Publicly indexed documents:
- site:example.vuln filetype:pdf
- site:example.vuln filetype:docx
- site:example.vuln filetype:xlsx
- site:example.vuln intext:"confidential" filetype:pdf
Exposed sensitive files:
- site:example.vuln filetype:git
- site:example.vuln filetype:env
- site:example.vuln filetype:sql
- site:example.vuln filetype:yaml
Generic cloud infrastructure:
- site:example.vuln inurl:".s3.amazonaws.com"
- site:example.vuln inurl:"storage.googleapis.com"
- site:example.vuln inurl:"blob.core.windows.net"
Leaked credentials:
- site:example.vuln intext:client_secret
- site:example.vuln intext:aws_access_key_id
- site:example.vuln intext:private_key
- site:example.vuln intext:"BEGIN RSA PRIVATE KEY"
- site:example.vuln intext:api_key | intext:apikey
- site:example.vuln intext:access_token
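Typing these queries by hand for every target gets tedious, so it can help to generate them. The sketch below builds a subset of the dorks above for a given domain and turns each into a search URL. The function names and the template dictionary are hypothetical, and the URL format assumes Google's standard `q` query parameter.

```python
from urllib.parse import quote_plus

# A subset of the dorks from this section, grouped by purpose.
DORK_TEMPLATES = {
    "documents": ["filetype:pdf", "filetype:docx", "filetype:xlsx"],
    "sensitive_files": ["filetype:env", "filetype:sql", "filetype:yaml"],
    "credentials": ["intext:client_secret", 'intext:"BEGIN RSA PRIVATE KEY"'],
}


def build_dorks(domain: str) -> dict[str, list[str]]:
    """Prefix every template with site:<domain>."""
    return {
        category: [f"site:{domain} {suffix}" for suffix in suffixes]
        for category, suffixes in DORK_TEMPLATES.items()
    }


def search_url(dork: str) -> str:
    """URL-encode a dork so it can be opened directly in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(dork)


dorks = build_dorks("example.vuln")
```

Opening the generated URLs manually in a browser is usually the safer route, since automated querying tends to trigger CAPTCHAs.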
4. Wayback Machine
The Wayback Machine, found at archive.org, can be used to retrieve webpage snapshots, URLs, and sitemaps of a target from as far back as 1996. It is very helpful for uncovering web URLs and API endpoints.
An alternative to browsing the Wayback Machine is gau (getallurls), a command-line tool that fetches known URLs from AlienVault's Open Threat Exchange, the Wayback Machine, and Common Crawl.
- gau vuln.com
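If you would rather query the archive yourself, the Wayback Machine exposes a CDX search endpoint. The sketch below only builds the request URL and parses a response body; it does not make the request. The function names are hypothetical, and it assumes the JSON output is a list of rows whose first row is a header.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str) -> str:
    """Ask for every archived URL under the domain, one row per unique URL."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "original",       # only return the original URL column
        "collapse": "urlkey",   # collapse repeated captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_response(body: str) -> list[str]:
    """Drop the header row and return the unique URLs, sorted."""
    rows = json.loads(body) if body.strip() else []
    return sorted({row[0] for row in rows[1:]})


# Sample response body, made up for illustration.
sample = '[["original"], ["http://vuln.com/a"], ["http://vuln.com/b"], ["http://vuln.com/a"]]'
urls = parse_cdx_response(sample)
```

The URL from `build_cdx_url` can be fetched with any HTTP client and the response text passed to `parse_cdx_response`.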
5. Technology Stack Identification
Understanding the technologies used to build a web application gives the hacker better insight into how to approach the target. In a black box pentest on a closed-source app this can be a challenge, but there are tools and techniques that help.
Error Messages:
Triggering error responses can reveal the underlying technologies that the application is built on. These kinds of errors can be triggered by making requests to non-existent routes, injecting unexpected values into query parameters, or changing request methods. There's really no single way to achieve this; the tester just needs to be creative.
For example, a verbose error page that names ASP.NET in its output shows that the application uses ASP.NET.
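Matching error text to a framework can be sketched as a small signature lookup. The patterns below are illustrative assumptions based on well-known default error pages (the ASP.NET "Server Error in '/' Application" page, Express's "Cannot GET /path", and Spring Boot's "Whitelabel Error Page"). They are not an exhaustive or authoritative list, and the function name is hypothetical.

```python
import re

# Illustrative signatures for default error pages; extend as needed.
ERROR_SIGNATURES = {
    "ASP.NET": re.compile(r"Server Error in '.*' Application|ASP\.NET", re.I),
    "Express": re.compile(r"Cannot (GET|POST|PUT|DELETE) /"),
    "Spring Boot": re.compile(r"Whitelabel Error Page"),
}


def fingerprint_error(body: str) -> list[str]:
    """Return the names of all signatures that appear in the response body."""
    return [name for name, pattern in ERROR_SIGNATURES.items() if pattern.search(body)]


# Sample bodies, made up for illustration.
aspnet_hits = fingerprint_error("<title>Server Error in '/' Application.</title>")
express_hits = fingerprint_error("Cannot GET /does-not-exist")
```

A match is only a hint: error pages can be customized or proxied, so confirm with a second signal such as response headers or cookies.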
Wappalyzer:
Wappalyzer is a popular browser extension that fingerprints a website and detects technologies and frameworks the site runs on.
Detecting WAFs (Web Application Firewalls):
Web Application Firewalls can be a thorn in the flesh when testing, especially when dealing with injection vulnerabilities like XSS and SQLi. However, knowing which WAF we're working against brings us a step closer to a bypass, because some WAFs are known to have behaviors specific to them.
A great WAF detection tool is wafw00f.
- wafw00f vuln.com
We can see from the result that the application is behind Cloudflare.
Response headers can also give away the WAF in use. For example, X-Route-Akamai suggests an app is behind the Akamai WAF.
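A header-based check can be sketched as a lookup over response headers. The hints below are assumptions for illustration: `CF-RAY` and `Server: cloudflare` are commonly seen on Cloudflare responses, `X-Sucuri-ID` on Sucuri, and `X-Route-Akamai` is simply the header named in this article. The function name is hypothetical, and dedicated tools like wafw00f use far more signals.

```python
# Each check receives a dict of lowercased header names mapped to values.
WAF_HEADER_HINTS = {
    "Cloudflare": lambda h: "cf-ray" in h or h.get("server", "").lower() == "cloudflare",
    "Sucuri": lambda h: "x-sucuri-id" in h,
    "Akamai": lambda h: "x-route-akamai" in h,  # header taken from the article text
}


def guess_waf(headers: dict[str, str]) -> list[str]:
    """Return every WAF whose header hint is present in the response."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return [waf for waf, check in WAF_HEADER_HINTS.items() if check(lowered)]


# Sample headers, made up for illustration.
cloudflare_guess = guess_waf({"Server": "cloudflare", "CF-RAY": "0000000000000000-LHR"})
```

Header names are lowercased first because HTTP header names are case-insensitive.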
6. Subdomain Discovery
Digging through subdomains is an effective way to uncover hidden misconfigurations and sensitive application components, such as admin dashboards, that might be exposed unintentionally. Various techniques can be used to map out a target's subdomains, such as Google dorking or searching certificate transparency logs with a CT search engine like crt.sh.
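As a rough illustration of the certificate transparency approach, crt.sh can return results as JSON (for example with a query such as `https://crt.sh/?q=%25.host.vuln&output=json`). The sketch below parses such a response, assuming each entry carries a `name_value` field with one or more newline-separated hostnames. The function name and sample data are hypothetical.

```python
import json


def subdomains_from_crtsh(body: str, domain: str) -> list[str]:
    """Collect unique hostnames under `domain` from a crt.sh JSON response."""
    found = set()
    for entry in json.loads(body):
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


# Sample response body, made up for illustration.
sample = json.dumps([
    {"name_value": "host.vuln\n*.host.vuln"},
    {"name_value": "admin.host.vuln"},
    {"name_value": "unrelated.test"},
])
hosts = subdomains_from_crtsh(sample, "host.vuln")
```

Wildcard entries are folded into their parent name, and anything outside the target domain is discarded so the list stays in scope.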
Subfinder:
Subfinder is one reliable tool that can be used to gather subdomains and organize them into a text file without actively engaging the target.
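- subfinder -d host.vuln -o hosts.txt
(The -recursive flag can also be used to limit enumeration to sources that handle subdomains recursively. Note that -r sets custom resolvers rather than enabling recursion.)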
Shuffledns:
Once the subdomains have been gathered, the next step is to identify the live subdomains and get rid of the false positives. For this you can use shuffledns together with a list of DNS resolver IP addresses, such as the dns-resolvers.txt file from SecLists.
- shuffledns -d vuln.com -list hosts.txt -r /usr/share/seclists/Miscellaneous/dns-resolvers.txt -mode resolve -o live.txt
7. IP Addresses
Behind a target's domains and subdomains are live servers with IP addresses that tell a different story. Resolving IP addresses gives us a closer look at what's happening on the network layer, and the results can be used to enumerate the servers for exploitable services. A tool that can be used for this is dnsx.
dnsx:
The command below uses dnsx to query the A records (IPv4 addresses) of the subdomains listed in live.txt and writes only the resolved addresses to ips.txt. The -a flag selects A records and -ro (response only) prints just the answers.
- dnsx -l live.txt -a -ro -o ips.txt
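Once you have a file of addresses, it is worth deduplicating them and separating public addresses from private ones, since a public hostname that resolves to an internal address is interesting in itself. This is a minimal sketch that assumes one IPv4 address per line, as in ips.txt. The function name and sample values are hypothetical.

```python
import ipaddress


def split_ips(lines: list[str]) -> tuple[list[str], list[str]]:
    """Return (public, private) IPv4 addresses, deduplicated and numerically sorted."""
    public, private = set(), set()
    for line in lines:
        text = line.strip()
        if not text:
            continue
        try:
            ip = ipaddress.IPv4Address(text)
        except ValueError:
            continue  # skip anything that is not a valid IPv4 address
        (private if ip.is_private else public).add(ip)
    return [str(ip) for ip in sorted(public)], [str(ip) for ip in sorted(private)]


# Sample lines, made up for illustration.
public_ips, private_ips = split_ips(["8.8.8.8", "10.0.0.5", "1.1.1.1", "8.8.8.8", "not-an-ip", ""])
```

Only the public list should feed into port scanning, and only for addresses that the scope document actually covers.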
8. Crawling for Endpoints/Directories
A web crawler makes the recon process faster and more efficient by following links on the website, revealing the site's endpoints and directories. There are many web crawling tools out there, such as katana and gospider.
GoSpider:
- gospider -s https://example.vuln
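To show what a crawler does at its core, here is a minimal sketch of a single crawl step: it extracts links from one HTML page, resolves them against the page URL, and keeps only those on the same host. The class and function names are hypothetical, and real crawlers like gospider also handle JavaScript, robots.txt, recursion depth, and more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect href, src, and action attribute values as absolute URLs."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: set[str] = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.add(urljoin(self.base_url, value))


def same_host_links(base_url: str, html: str) -> list[str]:
    """Return the sorted links in `html` that stay on the host of `base_url`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlsplit(base_url).hostname
    return sorted(u for u in collector.links if urlsplit(u).hostname == host)


# Sample page, made up for illustration.
page = (
    '<a href="/admin/login">Login</a>'
    '<script src="static/app.js"></script>'
    '<a href="https://other.test/">External</a>'
    '<form action="/api/v1/search"></form>'
)
links = same_host_links("https://example.vuln/", page)
```

Filtering by host keeps the crawl from wandering onto third-party domains that are likely out of scope.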
9. JS Files are Goldmines
Whether it's discovering hardcoded API keys, locating internal paths, finding bugs in auth workflows, or locating sensitive API endpoints, a lot of vulnerabilities can be uncovered just by reviewing JS code. Reading JS files also helps in understanding the application better.
The best way (at least for me) to find JS files is in the browser. Just keep your developer tools open and keep looking at the JS files the site calls on each request.
You can also use asset discovery tools like gau to hunt for JS files.
- gau vuln.com | grep -iE '\.js(\?|$)'
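Once the JS files are downloaded, a first pass for secrets and endpoints can be automated with regular expressions. The sketch below is illustrative only: the two patterns (the common `AKIA` prefix format of AWS access key IDs and quoted `/api/...` paths) are assumptions about what is worth flagging, the function name is hypothetical, and the sample key is made up.

```python
import re

# Illustrative patterns; real-world scanners ship with many more.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "api_path": re.compile(r"""["'](/api/[A-Za-z0-9_\-/{}.]+)["']"""),
}


def scan_js(source: str) -> dict[str, list[str]]:
    """Return the unique matches for each pattern found in the JS source."""
    findings = {}
    for name, pattern in PATTERNS.items():
        matches = sorted(set(pattern.findall(source)))
        if matches:
            findings[name] = matches
    return findings


# Sample JS with a made-up key, for illustration only.
sample_js = (
    'const key = "AKIAEXAMPLEKEY123456";'
    'fetch("/api/v1/users");'
    "fetch('/api/v1/admin/export');"
)
findings = scan_js(sample_js)
```

Regex hits still need manual review, since test fixtures and documentation strings produce plenty of false positives.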
Conclusion
Successful web penetration testing begins with effective reconnaissance. By creating a detailed map of the attack surface, including subdomains, endpoints, tech stacks, and JavaScript files, you'll get through vulnerability scans faster and achieve more in limited time windows. The key takeaway: reconnaissance saves you time during penetration testing, reveals hidden assets in the target environment, and turns unmanageable targets into opportunities.