Conference notes: Mechanizing the Methodology

Posted in Conference notes on November 22, 2022

Hi! After a long hiatus, I’m reviving the blog starting with conference notes.

“Mechanizing the Methodology” is a short but excellent talk given by Daniel Miessler at DEFCON 28 Red Team Village.
I watched it way back in 2020 and forgot to share my notes at the time. But it is still very relevant, so it’s worth (re)discovering.

Links

Slides

TL;DR

Daniel Miessler shows how to run an automated testing platform on a Linux box, for any kind of testing (pentest or bug bounty):

Break your techniques into questions
Create a separate UNIXY module for each step
- Think of every technique as a distinct tool with one function
- Use only trusted sources
- Avoid abstraction
Create intuitive output artifacts that can be used as inputs to other modules
- Make sure your output is well named and clean
Chain those modules according to a methodology that resonates with you
Continuously run those modules using Cron
Use Amazon SES or Slack for alerting
Wire up your full config using Terraform/Ansible/Axiom for easy deployment
Follow the best testers/creators in the industry and add any techniques you learn as new modules
Come back and hack manually on what your automation finds
Profit (in relaxation, time, money, or all of the above)

Why automate?

Use automation to feed your manual hacking (not to replace it)
Find bugs while you are doing other things

Turn everything into a question

Why

Tools, methodologies and security are generally broken down into categories like vulnerability assessment, pentest or bug hunting
Daniel prefers abstracting security into questions rather than categories
Automation is a way of asking and answering these questions for any arbitrary target

How

Break down all your testing into individual specific distinct questions, using a Unix-like philosophy:
1. Make each program do one thing well
2. Expect the output of every program to become the input to another, as yet unknown, program
Use a methodology to build your questions (e.g. Jason Haddix’s “The Bug Hunter Methodology”)
Avoid abstraction (i.e. no high-level frameworks)
Use only reliable sources
- You need to fully understand your sources and develop trust in them
- You never want to wonder how they get their data or if they might be stale or broken

Question examples

What are their subdomains?
What ports are open?
Is this ip running a web server?
Has this site changed?
Is this a sensitive site?
What urls are in their js?
Which of these share analytics code?
What domains do they own?
Which certs are about to expire?
What are all the links on this site?
What are this customer’s asns?
What ips are in their asns?
Which ips are running web servers?
What stack is this site running?
Which of these sites is running wordpress?
Which of these sites is running drupal?
Who works at this company?
Do they have personal github accounts?
Do those accounts have sensitive content?
Do those accounts have content related to work?
Do they have any s3 buckets that are open?
Are they serving databases?
Are they open or bruteforceable?…
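
As a rough illustration, one of the questions above ("Has this site changed?") can be turned into a single-purpose module. This is a minimal sketch and not Daniel's code: the function name `has_changed` and the file names are hypothetical, and it assumes another module has already saved the page's HTML to disk.

```shell
# has_changed: answer "Has this site changed?" for one saved page.
#   $1 = freshly fetched HTML file
#   $2 = file holding the checksum from the previous run (hypothetical name)
# Prints "changed" or "unchanged", then stores the new checksum.
# On the very first run there is no baseline, so it prints "changed".
has_changed() {
    new_sum=$(cksum < "$1")
    old_sum=$(cat "$2" 2>/dev/null)
    printf '%s\n' "$new_sum" > "$2"
    if [ "$new_sum" = "$old_sum" ]; then
        echo unchanged
    else
        echo changed
    fi
}

# Example call (file names are assumptions):
#   has_changed example.com.html example.com.sum
```

Pages with dynamic content (timestamps, tokens) would report a change on every run, so a real module would probably normalize the HTML before comparing.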

Module philosophy

The level of code

There are two extremes:
- Completely custom low-level code
- High-level frameworks
Each approach has its tradeoffs
Some frameworks are amazing, save you tons of work and combine multiple steps. But they abstract steps away from you so you can’t easily see how they’re being accomplished
Daniel’s approach is hybrid: Building extremely small Unixy modules that leverage a low-level utility
Two key parts of Daniel’s automation are ipinfo.io and host.io
- E.g. for ASN & IP range lookups, write wrappers around ipinfo.io instead of using a framework

Module examples

Finding subdomains & Which IPs are running Web servers

Finding live hosts

Q: For a given IP range, what hosts are alive?

Many ways to scan ports
Daniel uses masscan for speed and nmap for follow-up and NSE
Snippet from check_live.sh:

# Return any host that is listening on any of nmap's top 100 ports
# This is the nmap equivalent of `--top-ports 100`
$ masscan --rate 100000 -p7,9,13,21-23,25-26,37,53,79-81,88,106,110-111,113,119,135,139,143-144,179,199,389,427,443-445,465,513-515,543-544,548,554,587,631,646,873,990,993,995,1025-1029,1110,1433,1720,1723,1755,1900,2000-2001,2049,2121,2717,3000,3128,3306,3389,3986,4899,5000,5009,5051,5060,5101,5190,5357,5432,5631,5666,5800,5900,6000-6001,6646,7070,8000,8008-8009,8080-8081,8443,8888,9100,9999-10000,32768,49152-49157 -iL ips.txt | awk '{ print $6 }' | sort -u > live_ips.txt

check_live.sh’s output is live_ips.txt which contains naked IP addresses ready to become input:

$ cat live_ips.txt
1.2.3.4
1.2.3.5
1.2.3.6
1.2.3.7
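
The `awk '{ print $6 }'` step relies on masscan's default text output, where each hit looks like `Discovered open port 443/tcp on 1.2.3.4`. A slightly more defensive sketch filters on that line shape and also keeps the ports for the nmap follow-up. The function names below are hypothetical and not from the talk.

```shell
# live_ips_from_masscan: read masscan's default text output on stdin and
# print the unique IPs, one per line.
# Assumes lines shaped like: "Discovered open port 443/tcp on 1.2.3.4"
live_ips_from_masscan() {
    awk '$1 == "Discovered" && $2 == "open" { print $6 }' | LC_ALL=C sort -u
}

# open_ports_from_masscan: same input, but prints "IP PORT" pairs so a
# follow-up module (e.g. an nmap wrapper) knows which ports to look at.
open_ports_from_masscan() {
    awk '$1 == "Discovered" && $2 == "open" {
             split($4, p, "/")      # "443/tcp" -> p[1] = "443"
             print $6, p[1]
         }' | LC_ALL=C sort -u
}

# Example call (scan.txt is a hypothetical file holding masscan output):
#   live_ips_from_masscan < scan.txt > live_ips.txt
```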

Getting a page’s HTML

Q: What is the full HTML for a given page?

A page’s full HTML is a fundamental seed for many other modules
This should be done as authentically as possible, hence headless Chromium rather than curl (curl is denied by a lot of servers):

$ chromium-browser --headless --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' --dump-dom "$site" > "$site.html"

The full HTML is saved to $site.html for you to parse and inspect:

$ cat $site.html
<!DOCTYPE html>
<html lang="en-US"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"> <style media="all">@font-face{font-family:'concourse-t3';src:url(//danielmiessler.com/wp-content/themes/danielmiessler/fonts/concourse_t3_regular-webfont.woff) format('woff');font-style:normal;font-weight:400;font-stretch:normal;font-display:fallback} …

Use this raw HTML as input for other modules
- E.g. parse links, pull out JavaScript files, parse them to see if the page might be marked as sensitive, look for artifacts that indicate the tech stack, look for fields that are known to be vulnerable to injections, etc.
- Daniel has a dozen of these modules just for parsing HTML
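
One such parsing module could answer "Which JavaScript files does this page load?". The sketch below is an assumption about how that might look, not one of Daniel's modules: `js_urls` is a hypothetical name, and the regex only handles double-quoted `src` attributes on lowercase `<script>` tags, so it is a rough first pass and not a real HTML parser.

```shell
# js_urls: print the unique src values of <script> tags in an HTML file.
#   $1 = saved HTML file (e.g. the $site.html artifact)
# Limitations: double-quoted attributes only, lowercase tags only.
js_urls() {
    grep -oE '<script[^>]* src="[^"]+"' "$1" \
        | sed -e 's/.* src="//' -e 's/"$//' \
        | LC_ALL=C sort -u
}

# Example call (the output file name is an assumption):
#   js_urls "$site.html" > "$site.js_urls"
```

The output is one URL per line, so it can feed a follow-up module that downloads each script and greps it for endpoints.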

Getting the domains that redirect to a domain

Q: What domains redirect to my domain?

When you want to get the total, top-level scope for a given company, you need to pivot from known TLDs to other related TLDs (i.e. start with some known TLDs and look for others that are related)
One way to do that is to follow redirects to your target domain
Advantage: It helps find related domains that do not have the target’s name in the domain itself
Drawback: Results might include domains that redirect to your target without being related to it
get_redirects.sh:

# Uses host.io
$ curl -s "https://host.io/api/domains/redirects/$site?&limit=1000" | jq -r '.domains[]' > "$site.redirects"

Redirects found are stored in $site.redirects:

$ head -5 $site.redirects
teslasmail.com
telsamotors-losangeles.com
teslamotorsantartica.com
telsa-newyork.com
mevg.info

Make sure to verify them to see if they belong to your target company

Getting ranges for an ASN

Q: What IP ranges are associated with these ASNs?

get_ranges.sh:

# Uses ipinfo.io
$ curl -s "https://ipinfo.io/AS394161/json/" | jq -r '.prefixes[].netblock' > ranges.txt

Ranges found are stored in ranges.txt and can be passed to other testing modules:

$ head -5 ranges.txt
192.95.64.0/24
199.120.48.0/24
199.120.49.0/24
199.120.50.0/24
199.66.10.0/24

Module chain / Workflow examples

Use workflows when you want to answer a complex question
Remember: Think Unix & The output of one becomes the input of another

Ranges

Q: What is the workflow I need to answer a particular question?

This is a simplified view of a workflow
In reality, you might have multiple submodules that add sources or do cleanup or validation of another module
E.g. domains.txt might have 5 modules feeding into it, and 1 or 2 cleanup mechanisms to remove noise
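
A cleanup step of that kind might look like the sketch below. It is only an illustration under stated assumptions: `merge_domains` and the per-source file names are hypothetical, and the hostname pattern is a loose plausibility check, not full validation.

```shell
# merge_domains: combine the outputs of several source modules into one
# clean, de-duplicated list on stdout.
#   $@ = per-source files, one candidate domain per line
merge_domains() {
    cat "$@" \
        | tr -d '\r' \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
              -e 's/^\*\.//' -e 's/\.$//' \
        | grep -E '^[a-z0-9][a-z0-9-]*(\.[a-z0-9][a-z0-9-]*)+$' \
        | LC_ALL=C sort -u
}

# Example call (source file names are assumptions):
#   merge_domains source_a.txt source_b.txt > domains.txt
```

The `sed` step trims whitespace, strips a leading wildcard (`*.`) and a trailing dot, and the `grep` step drops anything that does not look like a hostname.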

Site testing

Q: What is the workflow I need to answer a particular question?

You can have one piece of output that feeds many different modules.
All of them (on the right in the picture) can then in turn feed each other or produce their own outputs.

Automation / Continuous monitoring

Use Cron to run all the modules continuously & send notifications
Figure out which modules must finish before other ones start
Use code checks inside modules and cron scheduling to ensure a module finishes before its output is used by another module
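
The talk does not show these code checks, so the following is only one possible sketch. The helper names (`artifact_ready`, `publish`), the paths and the schedule are all assumptions. The idea: a producer writes its artifact atomically, and a consumer refuses to run unless the artifact exists, is non-empty and is recent.

```shell
# Hypothetical crontab entries, staggered so the producer runs first:
#   0  * * * *  /opt/recon/get_ranges.sh
#   20 * * * *  /opt/recon/check_live.sh

# artifact_ready: succeed only if file $1 exists, is non-empty, and was
# modified within the last $2 minutes.
artifact_ready() {
    [ -s "$1" ] && [ -n "$(find "$1" -mmin "-$2" 2>/dev/null)" ]
}

# publish: run a command and move its output into place only on success,
# so downstream modules never read a half-written artifact.
#   $1 = final artifact path, remaining arguments = command to run
publish() {
    publish_out=$1
    shift
    publish_tmp="$publish_out.tmp.$$"
    if "$@" > "$publish_tmp"; then
        mv "$publish_tmp" "$publish_out"
    else
        rm -f "$publish_tmp"
        return 1
    fi
}

# Example use at the top of a consumer module:
#   artifact_ready ranges.txt 60 || exit 0   # skip this run, input is stale
```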

Notifications / Continuous alerting

Use email, Slack or other types of API-based notification to be alerted as soon as your automation finds something new
Daniel likes Amazon SES for sending emails and Slack for something richer. E.g.:

# Run sSMTP on Amazon SES to send emails
ssmtp "$RECIPIENT" < domain.notification

# Send a message to Slack using an Incoming Webhook
curl -X POST -H 'Content-type: application/json' --data '{"text":"Hey, there’s a new yummy (open) PostgresDB @1.2.3.4"}' YOUR_WEBHOOK_URL

Note

Amazon Simple Email Service (SES) is a cloud email service provider that is relatively cheap and has a free tier
sSMTP hasn’t been maintained since 2019 (remember the talk was given in 2020?). Debian Wiki recommends using an alternative like msmtp
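
To alert only on what is new, a module needs to compare the current artifact with the previous run's copy. The sketch below shows one way to do that with `comm`; the function name `new_items` and the file names are hypothetical, and the talk does not say how Daniel computes the difference.

```shell
# new_items: print lines present in the current artifact but absent from
# the previous run's copy.
#   $1 = previous artifact (may not exist yet), $2 = current artifact
new_items() {
    prev_sorted=$(mktemp)
    cur_sorted=$(mktemp)
    # comm needs sorted input; a missing previous file counts as empty
    LC_ALL=C sort -u "$1" > "$prev_sorted" 2>/dev/null
    LC_ALL=C sort -u "$2" > "$cur_sorted"
    LC_ALL=C comm -13 "$prev_sorted" "$cur_sorted"
    rm -f "$prev_sorted" "$cur_sorted"
}

# Example wiring (file names are assumptions):
#   new_items domains.prev domains.txt > domain.notification
#   [ -s domain.notification ] && ssmtp "$RECIPIENT" < domain.notification
#   cp domains.txt domains.prev
```

On the first run everything counts as new, so the first notification may be large.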

Deployment: Build & Configuration

We have our scripts / modules that are automated via Cron and sending alerts with continuous monitoring. How do we deploy all of this to the Internet?
It’s a bad idea to run it on your home system and use your home connection
Option 1: Build a Linux box on a VPS
- Works but is hard to maintain
- It’ll take work to replicate the same environment (code, configuration, libraries, third-party tools added…) on another box
Option 2: Move to a config management technology (e.g. Terraform, Ansible, Git or Axiom)
- Daniel uses Terraform & Ansible (combined with GitHub) to deploy boxes to AWS (or Digital Ocean)
  - He makes changes locally, runs terraform apply and his monitoring and alerting goes live on the Internet
  - As soon as the box is deployed, it automatically starts monitoring, running his scripts and sending alerts since cron is already configured!
- You can also use Axiom to deploy a Linux box to a VPS (e.g. Digital Ocean)
