Conference notes: Mechanizing the Methodology

Posted in Conference notes on November 22, 2022

Hi! After a long hiatus, I’m reviving the blog starting with conference notes.

“Mechanizing the Methodology” is a short but excellent talk given by Daniel Miessler at DEFCON 28 Red Team Village.
I watched it way back in 2020 and forgot to share my notes at that time. But it is still very relevant, so it’s worth (re)discovering.



Daniel Miessler shows how to run an automated testing platform on a Linux box, for any kind of testing (pentest or bug bounty):

  1. Break your techniques into questions
  2. Create a separate UNIXY module for each step
    • Think of every technique as a distinct tool with one function
    • Use only trusted sources
    • Avoid abstraction
  3. Create intuitive output artifacts that can be used as inputs to other modules
    • Make sure your output is well named and clean
  4. Chain those modules according to a methodology that resonates with you
  5. Continuously run those modules using Cron
  6. Use Amazon SES or Slack for alerting
  7. Wire up your full config using Terraform/Ansible/Axiom for easy deployment
  8. Follow the best testers/creators in the industry and add any techniques you learn as new modules
  9. Come back and hack manually on what your automation finds
  10. Profit (in relaxation, time, money, or all of the above)

Why automate?

  • Use automation to feed your manual hacking (not to replace it)
  • Find bugs while you are doing other things

Turn everything into a question


  • Tools, methodologies and security are generally broken down into categories like vulnerability assessment, pentest or bug hunting
  • Daniel prefers abstracting security into questions rather than categories
  • Automation is a way of asking and answering these questions for any arbitrary target


  • Break down all your testing into individual specific distinct questions, using a Unix-like philosophy:
    1. Make each program do one thing well
    2. Expect the output of every program to become the input to another, as yet unknown, program
  • Use a methodology to build your questions (e.g. Jason Haddix’s “The Bug Hunter’s Methodology”)
  • Avoid abstraction (i.e. no high-level frameworks)
  • Use only reliable sources
    • You need to fully understand your sources and develop trust in them
    • You never want to wonder how they get their data or if they might be stale or broken
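The "output of one program becomes the input of another" idea can be sketched with two tiny hypothetical modules (the filenames and the `example.com` scope filter are illustrative, not from the talk):

```shell
# Module 1: normalize a raw host list (lowercase, dedupe) -> hosts.txt
printf 'App.Example.com\napp.example.com\napi.example.com\n' > raw_hosts.txt
tr 'A-Z' 'a-z' < raw_hosts.txt | sort -u > hosts.txt

# Module 2: keep only hosts under the target domain -> in_scope.txt
grep '\.example\.com$' hosts.txt > in_scope.txt
cat in_scope.txt
```

Each module does one thing, and each artifact (`hosts.txt`, `in_scope.txt`) is a clean, well-named file that any later, as-yet-unknown module can consume.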

Question examples

  • What are their subdomains?
  • What ports are open?
  • Is this IP running a web server?
  • Has this site changed?
  • Is this a sensitive site?
  • What URLs are in their JS?
  • Which of these share analytics code?
  • What domains do they own?
  • Which certs are about to expire?
  • What are all the links on this site?
  • What are this customer’s ASNs?
  • What IPs are in their ASNs?
  • Which IPs are running web servers?
  • What stack is this site running?
  • Which of these sites is running WordPress?
  • Which of these sites is running Drupal?
  • Who works at this company?
  • Do they have personal GitHub accounts?
  • Do those accounts have sensitive content?
  • Do those accounts have content related to work?
  • Do they have any S3 buckets that are open?
  • Are they serving databases?
  • Are they open or bruteforceable?…

Module philosophy

The level of code

  • There are two extremes:
    • Completely custom low-level code
    • High-level frameworks
  • Each approach has its tradeoffs
  • Some frameworks are amazing, save you tons of work and combine multiple steps. But they abstract steps away from you so you can’t easily see how they’re being accomplished
  • Daniel’s approach is hybrid: Building extremely small Unixy modules that leverage a low-level utility
  • Two key parts of Daniel’s automation are ipinfo.io and host.io
    • E.g. for ASN & IP range lookups, write wrappers around ipinfo.io instead of using a framework

Module examples

Finding subdomains & which IPs are running web servers

Finding live hosts

Q: For a given IP range, what hosts are alive?

  • Many ways to scan ports
  • Daniel uses masscan for speed and nmap for follow-up and NSE
  • Snippet from check_live.sh:
# Return any host that is listening on any of nmap's top 100 ports
# This is the nmap equivalent of `--top-ports 100`
$ masscan --rate 100000 -p7,9,13,21-23,25-26,37,53,79-81,88,106,110-111,113,119,135,139,143-144,179,199,389,427,443-445,465,513-515,543-544,548,554,587,631,646,873,990,993,995,1025-1029,1110,1433,1720,1723,1755,1900,2000-2001,2049,2121,2717,3000,3128,3306,3389,3986,4899,5000,5009,5051,5060,5101,5190,5357,5432,5631,5666,5800,5900,6000-6001,6646,7070,8000,8008-8009,8080-8081,8443,8888,9100,9999-10000,32768,49152-49157 -iL ips.txt | awk '{ print $6 }' | sort -u > live_ips.txt
  • check_live.sh’s output is live_ips.txt which contains naked IP addresses ready to become input:
$ cat live_ips.txt
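The `awk`/`sort` step can be checked offline: masscan prints one line per open port in the form `Discovered open port 443/tcp on 203.0.113.7`, so field 6 is the IP. The sample lines below are made up to simulate that output:

```shell
# Simulated masscan stdout; field 6 of each line is the IP address
printf '%s\n' \
  'Discovered open port 443/tcp on 203.0.113.7' \
  'Discovered open port 80/tcp on 203.0.113.7' \
  'Discovered open port 22/tcp on 203.0.113.9' \
  | awk '{ print $6 }' | sort -u > live_ips.txt
cat live_ips.txt
```

Hosts with several open ports collapse to a single line, which is exactly the "naked IPs, ready to become input" contract.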

Getting a page’s HTML

Q: What is the full HTML for a given page?

  • A page’s full HTML is a fundamental seed for many other modules
  • Should be done as authentically as possible, hence Chromium rather than curl (curl is blocked by a lot of servers):
$ chromium-browser --headless --user-agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36' --dump-dom $site > $site.html
  • The full HTML is saved to $site.html for you to parse and inspect:
$ cat $site.html
<!DOCTYPE html>
<html lang="en-US"><head><meta charset="UTF-8"><meta name="viewport" content="width=device-width, initial-scale=1"> <style media="all">@font-face{font-family:'concourse-t3';src:url(//
format('woff');font-style:normal;font-weight:400;font-stretch:normal;font-display:fallback} …
  • Use this raw HTML as input for other modules
    • E.g. parse links, pull out JavaScript files, parse them to see if the page might be marked as sensitive, look for artifacts that indicate the tech stack, look for fields that are known to be vulnerable to injections, etc
    • Daniel has a dozen of these modules just for parsing HTML
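As an example of one such parsing module, here is a naive grep-based link extractor over the saved HTML (a sketch, not Daniel's actual module; the sample page stands in for `$site.html`):

```shell
# Sample saved page (stand-in for $site.html)
cat > site.html <<'EOF'
<a href="https://example.com/login">Login</a>
<script src="/static/app.js"></script>
<a href="/admin">Admin</a>
EOF

# Naive extraction: pull href/src targets out of the HTML -> site.links
grep -Eo '(href|src)="[^"]*"' site.html | cut -d'"' -f2 > site.links
cat site.links
```

The output artifact `site.links` can then seed further modules (e.g. fetching each JS file or flagging admin paths). A regex extractor like this is fragile on real-world HTML; a proper parser is sturdier, but this shows the input/output contract.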

Getting the domains that redirect to a domain

Q: What domains redirect to my domain?

  • When you want to get the total, top-level scope for a given company, you need to pivot from known TLDs to other related TLDs (i.e. start with some known TLDs and look for others that are related)
  • One way to do that is to follow redirects to your target domain
  • Advantage: It helps find related domains that do not have the target’s name in the domain itself
  • Drawback: Results might include domains that redirect to your target without being related to it
  • get_redirects.sh:
# Uses host.io
$ curl -s "https://host.io/api/domains/redirects/$site?&limit=1000" | jq -r '.domains' | jq '.[]' | tr -d \" > $site.redirects
  • Redirects found are stored in $site.redirects:
$ head -5 $site.redirects
  • Make sure to verify them to see if they belong to your target company
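The jq pipeline itself can be checked offline against a sample payload (the response shape below is assumed from the command above, not a real host.io response, and no API call is made):

```shell
# Assumed shape of the host.io redirects response
echo '{"domains":["promo.example.net","old-brand.example.org"]}' > sample.json

# Same extraction as the module: array -> one domain per line, quotes stripped
jq -r '.domains' sample.json | jq '.[]' | tr -d \" > site.redirects
cat site.redirects
```

(The three-stage `jq | jq | tr` chain is equivalent to a single `jq -r '.domains[]'`; the longer form is kept here to mirror the module.)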

Getting ranges for an ASN

Q: What IP ranges are associated with these ASNs?

  • get_ranges.sh:
# Uses ipinfo.io
$ curl -s "https://ipinfo.io/AS394161/json/" | jq '.prefixes[].netblock' | tr -d \" > ranges.txt
  • Ranges found are stored in ranges.txt and can be passed to other testing modules:
$ head -5 ranges.txt

Module chain / Workflow examples

  • Use workflows when you want to answer a complex question
  • Remember: Think Unix & The output of one becomes the input of another


Q: What is the workflow I need to answer a particular question?

  • This is a simplified view of a workflow
  • In reality, you might have multiple submodules that add sources or do cleanup or validation of another module
  • E.g. domains.txt might have 5 modules feeding into it, and 1 or 2 cleanup mechanisms to remove noise
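A `domains.txt` fed by multiple modules plus a cleanup pass might look like this (the source module names and the noise filter are illustrative):

```shell
# Two hypothetical source modules feeding the same artifact
printf 'a.example.com\nb.example.com\n' > from_certs.txt
printf 'b.example.com\ncdn.thirdparty.net\n' > from_redirects.txt

# Merge + dedupe, then a cleanup pass to drop known out-of-scope noise
cat from_certs.txt from_redirects.txt | sort -u > domains.raw
grep -v '\.thirdparty\.net$' domains.raw > domains.txt
cat domains.txt
```

Overlapping findings collapse via `sort -u`, and the cleanup module keeps downstream consumers of `domains.txt` from wasting time on out-of-scope hosts.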

Site testing

Q: What is the workflow I need to answer a particular question?

  • You can have one piece of output that feeds many different modules.
  • All of them (on the right in the picture) can then in turn feed each other or produce their own outputs.

Automation / Continuous monitoring

  • Use Cron to run all the modules continuously & send notifications
  • Figure out which modules must finish before other ones start
  • Use code checks inside modules and cron scheduling to ensure a module finishes before its output is used by another module
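A minimal crontab sketch of that ordering (paths and timings are illustrative; each job is spaced so that its inputs exist before it runs):

```shell
# m h dom mon dow  command
0 2 * * * /opt/recon/get_domains.sh    # writes domains.txt
0 3 * * * /opt/recon/check_live.sh     # reads domains.txt, writes live_ips.txt
0 4 * * * /opt/recon/test_sites.sh     # reads live_ips.txt, sends alerts
```

Fixed offsets are the simplest scheduling approach; the in-module code checks mentioned above (e.g. refusing to run if an expected input file is missing or empty) guard against a slow upstream job.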

Notifications / Continuous alerting

  • Use email, Slack or other types of API-based notification to be alerted as soon as your automation finds something new
  • Daniel likes Amazon SES for sending emails and Slack for something richer. E.g.:
# Run sSMTP on Amazon SES to send emails
ssmtp "$RECIPIENT" < domain.notification

# Send a message to Slack using an Incoming Webhook
curl -X POST -H 'Content-type: application/json' --data '{"text":"Hey, there’s a new yummy (open) PostgresDB @"}' YOUR_WEBHOOK_URL
  • Amazon Simple Email Service (SES) is a cloud email service that is relatively cheap and has a free tier
  • sSMTP hasn’t been maintained since 2019 (remember, the talk was given in 2020), and the Debian Wiki recommends an alternative such as msmtp
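A common pattern behind "be alerted as soon as your automation finds something new" is diffing the current run against the previous one and notifying only on additions. A sketch with made-up files (the actual SES/Slack call is left out):

```shell
# Previous and current runs (both sorted, as comm requires)
printf 'a.example.com\nb.example.com\n' > domains.old
printf 'a.example.com\nb.example.com\nc.example.com\n' > domains.new

# comm -13 keeps lines that appear only in the second (new) file
comm -13 domains.old domains.new > domains.added
cat domains.added

# Only fire the notification when something actually changed
if [ -s domains.added ]; then
  echo "new findings: send via SES/Slack here"
fi
```

Without this diff step, a nightly cron job would re-alert on everything it already knows about, and the notifications quickly get ignored.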

Deployment: Build & Configuration

  • We have our scripts / modules automated via Cron, with continuous monitoring and alerting. How do we deploy all of this to the Internet?
  • It’s a bad idea to run it on your home system and use your home connection
  • Option 1: Build a Linux box on a VPS
    • Works but is hard to maintain
    • It’ll take work to replicate the same environment (code, configuration, libraries, third-party tools added…) on another box
  • Option 2: Move to a config management technology (e.g. Terraform, Ansible, Git or Axiom)
    • Daniel uses Terraform & Ansible (combined with GitHub) to deploy boxes to AWS (or Digital Ocean)
      • He makes changes locally, runs terraform apply and his monitoring and alerting goes live on the Internet
      • As soon as the box is deployed, it automatically starts monitoring, running his scripts and sending alerts since cron is already configured!
    • You can also use Axiom to deploy a Linux box to a VPS (e.g. Digital Ocean)