
Conference notes: Practical recon techniques for bug hunters & pen testers (LevelUp 0x02 / 2018)

Posted in Conference notes on October 17, 2018

Hi, these are the notes I took while watching the “Practical recon techniques for bug hunters & pen testers” talk given by Bharath Kumar at LevelUp 0x02 / 2018.


This talk is about some practical recon techniques for bug hunters & pentesters. It’s a continuation of Bharath’s talk about niche subdomain enumeration techniques.

Demo environment

  • Nameservers:
    • ns1.insecuredns.com
    • ns2.insecuredns.com
  • Domains:
    • totallylegit.in
    • insecuredns.com
  • Bharath authorizes running only the DNS & DNSSEC attacks mentioned later against them

What is recon?

Reconnaissance is the act of gathering preliminary data or intelligence on your target. The data is gathered in order to better plan for your attack. Reconnaissance can be performed actively or passively.

What to look for during recon?

  • Info to increase attack surface (domains, net blocks)
  • Credentials (email, passwords, API keys)
  • Sensitive information
  • Infrastructure details (technologies used)

Enumerating domains

  • Goal: find/correlate all domain names owned by a single entity
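  • Types of domain correlation:
    • Vertical correlation: finding the subdomains of a domain
    • Horizontal correlation: finding other domains owned by the same entity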

Subdomain enumeration

  • Use advanced search operators of Google & Bing:
    • site: for vertical correlation
      • Automated with tools like sublist3r
    • ip: for horizontal correlation
      • Useful if the app is on shared hosting

Using 3rd party information aggregators


# Query VirusTotal for the subdomains it knows about
# Usage: find-subdomains-vt <domain> <limit>
find-subdomains-vt() {
	curl -s "https://www.virustotal.com/ui/domains/$1/subdomains?limit=$2" | jq '.data[].id'
}

find-subdomains-vt eff.org 20


  • Handy service for recon (DNS, WHOIS, reverse WHOIS…)
  • Use the email addresses found for horizontal domain correlation

Certificate Transparency

Certificate Transparency - Side effect

  • CT logs by design contain all the certificates issued by a CA for any given domain
  • They allow attackers to passively gather a lot of information about an organization’s infrastructure: domains including internal domains, subdomains, email addresses
  • Certificate Transparency Part 3— The dark side

Searching through CT logs
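
A minimal sketch of pulling names out of CT search results, assuming the JSON output of crt.sh (for example `https://crt.sh/?q=%25.example.com&output=json`), where each entry has a `name_value` field holding one or more newline-separated names. The function name and the sample data are hypothetical and only illustrate the parsing step:

```python
import json


def names_from_crtsh(json_text, domain):
    """Return the sorted, de-duplicated names under `domain` found in crt.sh JSON output."""
    domain = domain.lower()
    suffix = "." + domain
    names = set()
    for entry in json.loads(json_text):
        # name_value may hold several names separated by newlines
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard label
            if name == domain or name.endswith(suffix):
                names.add(name)
    return sorted(names)


# Hypothetical sample standing in for a real crt.sh response
sample = json.dumps([
    {"name_value": "www.example.com\nmail.example.com"},
    {"name_value": "*.dev.example.com"},
    {"name_value": "unrelated.org"},
])
print(names_from_crtsh(sample, "example.com"))
```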

Downside of CT for recon

  • CT logs are append-only => There is no way to delete existing entries
  • => Many false positives: Domain names found in CT logs may not exist anymore & thus can’t be resolved to an IP address
  • Not all subdomains found will be accessible, some are internal domains. But they would be useful for internal pentests
  • A penetration tester’s guide to subdomain enumeration

Solution: CT logs + Massdns

  • Use Massdns with CT logs script to quickly identify resolvable domain names
  • python3 ct.py example.com | ./bin/massdns -r resolvers.txt -t A -a -o -w results.txt -
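
The idea behind that pipeline (keep only the names that still resolve) can be sketched in a few lines of Python. This is a much slower stand-in for Massdns, not a replacement: it resolves names one at a time with the system resolver. The `resolve` parameter and the fake resolver are assumptions added here so the example runs without network access:

```python
import socket


def resolvable(names, resolve=socket.gethostbyname):
    """Keep only the names that `resolve` can turn into an IP address."""
    alive = {}
    for name in names:
        try:
            alive[name] = resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
    return alive


def fake_resolve(name):
    """Hypothetical resolver used for the demo; knows a single name."""
    table = {"www.example.com": "192.0.2.10"}
    if name not in table:
        raise socket.gaierror("name not found")
    return table[name]


print(resolvable(["www.example.com", "old.example.com"], fake_resolve))
```

Calling `resolvable(names)` without the second argument uses real DNS lookups.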

Using Certspotter

  • Does vertical & horizontal domain correlation
find-cert() {
	curl -s "https://certspotter.com/api/v0/certs?domain=$1" | jq -c '.[].dns_names' | grep -o '"[^"]\+"'
}

Using Certdb.com

  • crt.sh only gets data from CT logs, to which “legit” CAs submit the certs they issue
  • https://certdb.com not only gets certificates from CT logs, but also scans the IPv4 address space, domains and “finds & analyzes” all the certificates
  • curl -L -sd "api_key=API-KEY&q=Organization:\"tesla\"&response_type=0

Finding vulnerable CMS using CT

  • CT logs can expose (in real time) HTTPS apps during their CMS (WordPress, Joomla…) install process, when the installer has no form of authentication
  • => If you are fast enough, you could take over the server
  • This is a known attack technique


Content Security Policy (CSP)

  • CSP HTTP headers allow devs to create a whitelist of sources of trusted content & instruct the browser to only execute or render resources from those sources
  • => It’s basically a list of domains!
  • Extract domains from CSP headers with domains-from-csp
    • python csp_parser.py https://flipkart.com
    • python csp_parser.py https://flipkart.com -r (also resolves the domains)
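
To illustrate why a CSP header is "basically a list of domains", here is a rough sketch of the extraction step. It is not the domains-from-csp implementation; the function name and the sample header are hypothetical:

```python
from urllib.parse import urlparse


def domains_from_csp(header_value):
    """Extract host names from the source lists of a CSP header value."""
    domains = set()
    for directive in header_value.split(";"):
        tokens = directive.split()
        for source in tokens[1:]:  # tokens[0] is the directive name
            if source.startswith("'") or source.endswith(":") or source == "*":
                continue  # keywords ('self'), bare schemes (data:) and the lone wildcard
            # Sources may omit the scheme, so add "//" to let urlparse find the host
            host = urlparse(source if "://" in source else "//" + source).hostname
            if host:
                domains.add(host.lstrip("*."))  # "*.cdn.example.com" -> "cdn.example.com"
    return sorted(domains)


# Hypothetical header value
header = (
    "default-src 'self' https://cdn.example.com; "
    "img-src *.images.example.net data:; "
    "script-src 'nonce-abc' static.example.org"
)
print(domains_from_csp(header))
```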

Sender Policy Framework (SPF)

  • DNS record used to indicate to receiving mail exchanges which hosts are authorized to send mail for a given domain
  • => Lists all hosts authorized to send email on behalf of a domain
  • => Exposes subdomains & IP ranges
  • dig +short TXT icann.org | grep spf
  • Extract net blocks/domains from SPF record with assets-from-spf
    • python assets_from_spf.py reddit.com
    • python assets_from_spf.py reddit.com --asn | jq . => returns ASN info of all the assets
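
A minimal sketch of what extracting assets from an SPF record involves, assuming the TXT record has already been fetched (for example with the dig command above). This is not the assets-from-spf code; the function name and the sample record are hypothetical, and it does not follow `include:` records recursively:

```python
def assets_from_spf(record):
    """Split an SPF TXT record into the domains and net blocks it references."""
    domains, netblocks = [], []
    for term in record.split()[1:]:  # skip the leading "v=spf1"
        term = term.lstrip("+-~?")   # drop the qualifier, if any
        if term.startswith(("ip4:", "ip6:")):
            netblocks.append(term.split(":", 1)[1])
        elif term.startswith(("include:", "a:", "mx:")):
            domains.append(term.split(":", 1)[1])
        elif term.startswith("redirect="):
            domains.append(term.split("=", 1)[1])
    return domains, netblocks


# Hypothetical SPF record
record = "v=spf1 ip4:192.0.2.0/24 ip6:2001:db8::/32 include:_spf.example.net a:mail.example.com ~all"
print(assets_from_spf(record))
```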


  • DNS zone transfer (misconfiguration rarely found today)
    • dig AXFR @ns1.insecuredns.com totallylegit.in
  • Authenticated Denial of Existence (RFC 7129)
    • In DNS, when clients query for a non-existent domain, the server must deny its existence. This is harder to do in DNSSEC due to cryptographic signing. It can be done using NSEC or NSEC3 records
  • => DNSSEC zone walking is the new zone transfer
  • If DNSSEC is enabled on your target & NSEC records are used you’ll get all the domains:
    • Install ldnsutils on Kali/Debian/Ubuntu: sudo apt-get install ldnsutils
    • Try zone walking NSEC - LDNS
      • ldns-walk (part of ldnsutils) can be used to zone walk a DNSSEC-signed zone that uses NSEC
      • ldns-walk iana.org
      • ldns-walk @ns1.insecuredns.com totallylegit.in
  • NSEC3 records are like NSEC records but provide a signed gap of hashes of domain names, to prevent zone enumeration or make it expensive
    • An attacker can still collect all the subdomain hashes & crack them offline using Nsec3walker & nsec3map
    • Example of zone walking an NSEC3-protected zone:
# Detect if DNSSEC NSEC or NSEC3 is used
$ ldns-walk insecuredns.com

# Collect NSEC3 hashes of a domain
$ ./collect insecuredns.com > insecuredns.com.collect

# Undo the hashing, expose the subdomain information
$ ./unhash < insecuredns.com.collect > insecuredns.com.unhash

# Check the number of successfully cracked subdomain hashes
$ cat insecuredns.com.unhash | grep "insecuredns" | wc -l

# List only the subdomain part from the unhashed data
$ cat insecuredns.com.unhash | grep "insecuredns" | awk '{print $2;}'
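
To show why NSEC3 hashes can be cracked offline, here is a rough sketch of the hash itself (iterated, salted SHA-1 over the wire-format name, base32hex-encoded, as specified in RFC 5155) plus a simple dictionary attack. It only illustrates the idea and is not how nsec3walker or nsec3map are implemented; the `crack` helper, the wordlist, the salt and the iteration count are made up for the demo:

```python
import base64
import hashlib

# Translate the standard base32 alphabet into the "base32hex" alphabet used by NSEC3
_B32_TO_B32HEX = str.maketrans(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567", "0123456789ABCDEFGHIJKLMNOPQRSTUV"
)


def nsec3_hash(name, salt_hex, iterations):
    """Compute the NSEC3 (SHA-1) hash of a domain name as defined in RFC 5155."""
    labels = name.lower().rstrip(".").split(".")
    # DNS wire format: each label prefixed by its length, terminated by a zero byte
    wire = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels) + b"\x00"
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(wire + salt).digest()
    for _ in range(iterations):  # extra rounds on top of the first hash
        digest = hashlib.sha1(digest + salt).digest()
    return base64.b32encode(digest).decode("ascii").translate(_B32_TO_B32HEX).lower()


def crack(hashes, wordlist, zone, salt_hex, iterations):
    """Map each collected hash to the candidate name that produces it."""
    found = {}
    for word in wordlist:
        candidate = f"{word}.{zone}"
        hashed = nsec3_hash(candidate, salt_hex, iterations)
        if hashed in hashes:
            found[hashed] = candidate
    return found


# Hypothetical zone parameters and a "collected" hash we pretend came from the zone
salt, iterations = "aabbccdd", 12
collected = {nsec3_hash("www.example.com", salt, iterations)}
print(crack(collected, ["mail", "www", "dev"], "example.com", salt, iterations))
```

The salt and iteration count are public (they appear in the NSEC3 records), which is what makes the offline guessing possible.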

A few things that changed with the advent of APIs/DevOps

  • Storage
  • Authentication
    • API keys & token based authentication instead of username & password
  • More & more code
  • CI/CD pipelines

Cloud storage

  • Cloud storage has become popular (inexpensive & easy to set up), especially object/block storage
  • Object storage is ideal for storing static, unstructured data like audio, video, documents, images, logs & large amounts of text
  • Popular object storage services:
    • AWS S3 buckets
    • DigitalOcean Spaces

What’s the catch with object storage?

  • It’s a treasure trove of information
  • Users store anything on 3rd-party services, including passwords in plain text files, log files, backup files…

Amazon S3 buckets

Hunting for publicly accessible S3 buckets

  • Users can store Files (Objects) in a Bucket
  • Each Bucket will get a unique, predictable URL
    • Just go to the Bucket’s URL to find out if it’s a public or private Bucket
  • Each file in a Bucket will get a unique URL
  • There are Access Control Mechanisms available at both Bucket & Object level
  • Finding buckets
    • Google Dorks
      • site:s3.amazonaws.com filetype:pdf
      • site:s3.amazonaws.com password
    • Do a dictionary based attack (since Buckets have a predictable URL)
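
A minimal sketch of the dictionary-based approach: build candidate bucket names from the target name and a wordlist, then turn them into URLs. It assumes the `https://<bucket>.s3.amazonaws.com` URL form; the function name, the naming patterns and the wordlist are hypothetical, and the HTTP requests themselves are left out:

```python
def candidate_bucket_urls(target, words):
    """Build candidate S3 bucket URLs from a target name and a wordlist."""
    names = [target]
    for word in words:
        # Two common naming patterns; real wordlists and patterns vary
        names += [f"{target}-{word}", f"{word}-{target}"]
    return [f"https://{name}.s3.amazonaws.com" for name in names]


for url in candidate_bucket_urls("example", ["backup", "dev"]):
    print(url)
```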

DigitalOcean Spaces

  • Spaces is an object storage service by DigitalOcean, similar to AWS S3 buckets
  • Spaces API aims to be interoperable with Amazon’s AWS S3 API
    • => All tools & attacks used to test AWS S3 can be used against Spaces!
  • Spaces URL pattern
    • Users can store Files in a “Space”
    • Each Space will get a unique, predictable URL
    • Each file in a Space will get a unique URL
    • Access Control Mechanisms are available at Space & file level

Hunting for publicly accessible Spaces

  • A Space is typically considered:
    • “public” if any user can list the contents of the Space
    • “private” if the Space’s contents can only be listed or written by certain users (accessing it otherwise returns an AccessDenied error)
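
The public/private distinction can be checked from the XML body returned by the Space's (or bucket's) URL. The sketch below assumes S3-style responses: a `ListBucketResult` document with `Key` elements for a public listing, and an `Error` document with code `AccessDenied` for a private one. The function names and sample bodies are hypothetical:

```python
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def classify_listing(xml_body):
    """Return ("public", [keys]), ("private", []) or ("unknown", []) for a listing response."""
    root = ET.fromstring(xml_body)
    if _local(root.tag) == "ListBucketResult":
        keys = [el.text for el in root.iter() if _local(el.tag) == "Key"]
        return "public", keys
    if _local(root.tag) == "Error":
        codes = [el.text for el in root if _local(el.tag) == "Code"]
        if "AccessDenied" in codes:
            return "private", []
    return "unknown", []


# Hypothetical response bodies
public_body = (
    '<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    "<Name>demo</Name><Contents><Key>backup.sql</Key></Contents>"
    "</ListBucketResult>"
)
private_body = "<Error><Code>AccessDenied</Code><Message>Access Denied</Message></Error>"

print(classify_listing(public_body))
print(classify_listing(private_body))
```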
Spaces finder
  • Spaces finder is a tool that looks for publicly accessible DigitalOcean Spaces using a wordlist, lists all the accessible files on a public Space & downloads them
  • It’s AWSBucketDump tweaked to work with DO Spaces, since Spaces API is interoperable with Amazon’s S3 API
  • python3 spaces_finder.py -l sample_spaces.txt -g interesting_keywords.txt -D -m 500000 -t 2


  • With almost every service exposing an API, keys have become critical in authenticating
  • API keys are treated as keys to the kingdom
  • For apps, API keys tend to be the Achilles’ heel (see “APIs are 2FA’s Achilles Heel”)

Code repos for recon

  • Code repos (Github, GitLab, Bitbucket…) are a treasure trove during recon
  • They can reveal a lot: credentials, potential vulnerabilities, infrastructure details

Github for recon

  • Github has a powerful search feature with advanced operators
  • It has a well designed REST API that can be used to automate your Github recon
  • Github for Bug Bounty Hunters
Things to focus on in Github
  • Repositories

  • Code

  • Commits (Bharath’s favorite!)

  • Issues

  • Examples:

    • “Delete the private ssh” commit message
    • “Multiple XSS vulnerabilities” issue
Mass cloning on Github
  • Clone all the target organization’s repos & analyze them locally
  • Use GithubCloner
    • python githubcloner.py --org organization -o /tmp/output
Static code analysis
  • Try to understand the code, language used & architecture
  • Look for keywords & patterns:
    • API and key (To get more endpoints & find API keys)
    • token
    • secret
    • vulnerable
    • http://
  • Tools for finding secrets
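
A rough sketch of the keyword search on locally cloned code, using the keywords listed above. The regular expressions are deliberately simple and will produce false positives; the pattern labels, the function name and the sample text are hypothetical:

```python
import re

# One loose pattern per keyword from the list above
PATTERNS = {
    "api key": re.compile(r"api[_-]?key", re.I),
    "token": re.compile(r"token", re.I),
    "secret": re.compile(r"secret", re.I),
    "vulnerable": re.compile(r"vulnerable", re.I),
    "plain http": re.compile(r"http://"),
}


def scan_text(text):
    """Yield (line_number, label, line) for every line matching a pattern."""
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                yield number, label, line.strip()


# Hypothetical file content
sample = 'API_KEY = "abc123"\nurl = "http://internal.example.com/login"\nprint("ok")\n'
for hit in scan_text(sample):
    print(hit)
```

To scan a cloned repo, the same function could be applied to the contents of each file found with `pathlib.Path(repo_dir).rglob("*")`.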
Github Dorks

Passive recon using public datasets

  • Various projects gather Internet wide scan data & make it public, including: port scans, DNS data, SSL/TLS cert data, data breach dumps
  • Find your needle in the haystack

Why use public data sets for recon?

  • To reduce dependency on 3rd party APIs & services (be paranoid!)
  • To reduce active probing of target infrastructure
  • The more sources, the better the coverage (better understanding of the history of the organization & how its infrastructure evolved)
  • Build your own recon platforms

Data sources

Name | Description | Price
--- | --- | ---
Censys.io | TCP, TLS, HTTP, HTTPS scan data | FREE (non-commercial)
CZDS | DNS zone files for “new” global TLDs | FREE
ARIN | American IP registry information (ASN, Org, Net, Poc) | FREE
CAIDA PFX2AS IPv4 | Daily snapshots of ASN to IPv4 mappings | FREE
CAIDA PFX2AS IPv6 | Daily snapshots of ASN to IPv6 mappings | FREE
US Gov | US government domain names | FREE
UK Gov | UK government domain names | FREE
RIR Delegations | Regional IP allocations | FREE
PremiumDrops | DNS zone files for com/net/info/org/biz/xxx/sk/us TLDs | $24.95/mo
WhoisXMLAPI.com | New domain whois data | $24.95/mo

Source: https://github.com/hdm/inetdata

Rapid7 Forward DNS dataset
  • FDNS is a massive dataset, 20+ GB compressed & 300+GB uncompressed
  • Rapid7 publishes it on scans.io
  • It aims to discover all domains found on the Internet
  • There is also an RDNS (reverse DNS) dataset
Hunting subdomains in the FDNS dataset
  • The data format is a gzip-compressed JSON file
  • Use jq to extract subdomains of a specific domain:
    • curl --silent -L https://scans.io/data/rapid7/sonar.fdns_v2/20170417-fdns.json.gz | pigz -dc | head -n 10 | jq .
    • cat 20170417-fdns.json.gz | pigz -dc | grep "\.example\.com"
    • https://sonar.labs.rapid7.com
  • Analyzing the results:
# Extract subdomain names for a given domain from FDNS data
cat 20170417-fdns.json.gz | pigz -dc | grep "\.example\.com" | jq .name > example.com.domains.fdns

# Display the first 15 unique subdomains gathered
# (sort -u is used because uniq alone only removes adjacent duplicates)
cat example.com.domains.fdns | grep "\.example\.com" | sort -u | head -n 15

# Total number of unique subdomains enumerated
cat example.com.domains.fdns | grep "\.example\.com" | sort -u | wc -l
  • These methods are slow. See HD Moore’s talk on how to do faster lookups
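
The same extraction can be sketched in Python by streaming the gzip file line by line instead of decompressing it fully. This is no faster than the shell pipeline. It assumes one JSON object per line with a `name` field (as used with jq above); the `type` and `value` fields in the sample, and the function name, are assumptions:

```python
import gzip
import io
import json


def subdomains_from_fdns(stream, domain):
    """Return the unique names under `domain` found in a gzip-compressed FDNS stream."""
    suffix = "." + domain
    found = set()
    with gzip.open(stream, "rt", encoding="utf-8") as lines:
        for line in lines:
            if suffix not in line:  # cheap substring check before parsing JSON
                continue
            name = json.loads(line).get("name", "")
            if name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Hypothetical records standing in for a real FDNS file
records = [
    {"name": "www.example.com", "type": "a", "value": "192.0.2.1"},
    {"name": "mail.example.com", "type": "a", "value": "192.0.2.2"},
    {"name": "www.example.com", "type": "a", "value": "192.0.2.3"},
    {"name": "notexample.com", "type": "a", "value": "192.0.2.4"},
]
blob = gzip.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))
print(subdomains_from_fdns(io.BytesIO(blob), "example.com"))
```

For a real dataset, pass the path of the downloaded `.json.gz` file instead of the in-memory stream.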


Tool | Number of subdomains | What the tool finds
--- | --- | ---
Sublist3r | 278 | Apps indexed by search engines
CT logs (crt.sh) | 46 | Domains that have SSL/TLS certs on CT logs
Zone walking NSEC3 | 182 | Depends on your computation power. Can’t find domains with very unconventional names
FDNS dataset | 3681 | Doesn’t have these restrictions (app being hosted or having an SSL/TLS cert…)


See you next time!