Posted in Podcast on April 1, 2019
The Bug Hunter Podcast Ep. 2: Wayback Machine & Reading ebooks on the move
Posted in Podcast on March 1, 2019
Hi, here’s a new episode of the Bug Hunter podcast!
Apple podcasts (iTunes) is in the works. And if your favorite podcasting app is missing from this list, please let me know so I can add it.
Also, if you prefer written text, you’ll find the whole transcript below. It’s also helpful for finding all links or commands mentioned in the audio.
Hey hackers! This is the Bug Hunter podcast by Pentesterland. The podcast for pentesters & bug bounty hunters.
We tackle technical questions & inspirational topics to help you develop both a hacker skillset & mindset.
Welcome to podcast number 2! I’m your host, Mariem. And the title of this episode is: “Wayback Machine & Reading ebooks on the move”.
I have two segments for you today:
- Q&A: on the Wayback Machine
- Productivity hack: on reading and annotating PDF books on the go
As usual, we have one technical segment and one on living a happier, more productive life.
Also, the transcript for this episode is on pentester.land/categories/podcast/. Under episode n°2, you’ll find this show’s transcript, including all the links, tools and command lines I mention.
Today’s question comes from a public Tweet. It was asked by @PrincessYadhavi who said:
Everyone is telling download js from wayback macine. but idk how. How to download/extract old js files from wayback machine?
I think it’s a great question because the Wayback Machine is a great source of information and should absolutely be used. But I know from personal experience that if you’re not used to it and if you don’t have the right tools to query it, it can seem quite daunting.
So I am going to tell you everything you need to know about it to start using it efficiently as quickly as possible. I will explain what the Wayback Machine is, why pentesters and bug hunters use it, how to use it manually, and which tools you can leverage to query it automatically.
The Wayback Machine is an Internet archive, located at http://archive.org/web/. It’s a collection of more than 349 billion snapshots of web pages saved over time.
For bug hunters and pentesters, it falls under the category of online passive reconnaissance tools.
They use it to access old versions of websites. The reason why is that sometimes, the current version of a site is relatively secure but the snapshots of older versions reveal interesting information or bugs. So the Wayback Machine is like a time machine which allows you to go back in time, see what the site used to look like, and get more information than what is available on the current version.
This can be things like:
- old forgotten endpoints
- interesting JS files
- sensitive information
- vulnerabilities that no longer exist on the site, like URLs which were vulnerable to directory listing and revealed interesting files
Many bug hunters confirm this. Here’s what Jason Haddix shared on Twitter. He said:
#BountyProTip: found a 401/403, basic auth, or domain that seems interesting but is somehow locked down? Look at its http://archive.org/web/ entries. Sometimes you win instantly with API keys or URL structure that you can forcefully browse to unprotected content still there.
But the Wayback Machine doesn’t record everything. It just takes a snapshot of a site from time to time.
If you want to play with it manually, go to http://archive.org/web/ and enter a URL, let’s say https://uber.com. It looks like a calendar. Every time you see a colored circle around a date, it means that a snapshot was taken that day. And if you click on it, you can see what the page looked like at that time.
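Each snapshot you open from the calendar has a predictable address: https://web.archive.org/web/ followed by a 14-digit timestamp (YYYYMMDDhhmmss) and the original URL. Here is a minimal Python sketch that builds such an address. The function name is my own, not part of any tool. If no snapshot exists for that exact time, the Wayback Machine redirects to the closest one it has.

```python
from datetime import datetime


def snapshot_url(target: str, when: datetime) -> str:
    """Build the address of the Wayback Machine snapshot closest to `when`."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{target}"


# Address of the snapshot closest to June 1, 2015
print(snapshot_url("https://uber.com", datetime(2015, 6, 1)))
```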
This is how you manually browse older versions of a site. It’s a great tool, but it’s not very practical when you are testing dozens of subdomains and you need to quickly find every JS file or URL mentioned in all historical versions of every subdomain.
For this purpose, it’s better to use tools. Some of the best that I’ve tried are:
- Waybackurls by Tomnomnom
- ReconCat by Dawood Ikhlaq
- Curate by EdOverflow
- Waybackunifier by Mohammed Diaa
- Waybackrobots.py by Mohammed Diaa
- Waybackurls.py by Mohammed Diaa
The tools I just mentioned don’t all do the same thing; they have different purposes and types of output. They don’t all apply to the retrieval of JS files. But since we’re learning about the Wayback Machine, we might as well learn what they do in case we need them in another context.
Let’s dig a little bit deeper into what each tool does.
FYI, I’m not going to detail all the options and commands for each tool. It wouldn’t be very practical to do on audio.
But check out the show’s transcript on pentester.land/categories/podcast/. It contains the baseline command to use for each tool. And if you need a cheat sheet with more details, just send me a DM or an email and I will share one on the site.
The first tool is Waybackunifier. It scans snapshots of the URL you give it. Then it aggregates all its previous versions and returns a unified file which contains all unique lines ever included in that page.
So basically Waybackunifier creates a single file which contains everything that the URL ever contained, merged together.
waybackunifier -url example.com
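To make the idea of a “unified file” concrete, here is a rough Python sketch of the merging step. It is not Waybackunifier’s actual code, and the unify function and sample data are made up for illustration: it takes several versions of a page as text and keeps every distinct line once, in the order it first appears.

```python
def unify(versions):
    """Merge several versions of a page into one list of unique lines."""
    seen = set()
    merged = []
    for text in versions:
        for line in text.splitlines():
            line = line.strip()
            # Skip blank lines and lines already seen in an earlier version
            if line and line not in seen:
                seen.add(line)
                merged.append(line)
    return merged


# Two hypothetical snapshots of the same robots.txt file
old = "Disallow: /admin\nDisallow: /backup"
new = "Disallow: /admin\nDisallow: /api"
print(unify([old, new]))
```

Here /backup only exists in the older version, but it still ends up in the merged result, which is exactly why this kind of output is interesting.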
ReconCat returns all the URLs of snapshots available. It’s not their contents, just the URLs.
The output is inside a folder named after the domain you entered. It contains one file for each year and inside is the list of available snapshots for that year.
php recon --url=https://example.com --year=all
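If you already have a list of snapshot timestamps, grouping them by year is simple, because the first four digits of a Wayback timestamp are the year. This Python sketch only illustrates that kind of per-year output. It is not ReconCat’s code, and the function name and timestamps are invented.

```python
from collections import defaultdict


def snapshots_by_year(target, timestamps):
    """Group snapshot URLs by the year in their 14-digit Wayback timestamp."""
    by_year = defaultdict(list)
    for ts in sorted(timestamps):
        # The first four digits of a Wayback timestamp are the year
        by_year[ts[:4]].append(f"https://web.archive.org/web/{ts}/{target}")
    return dict(by_year)


# Hypothetical timestamps, for illustration only
stamps = ["20180105120000", "20170301000000", "20180712093000"]
for year, urls in snapshots_by_year("https://example.com", stamps).items():
    print(year, len(urls), "snapshot(s)")
```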
Waybackurls returns a list of all the URLs that the Wayback Machine knows about for a domain.
Curate queries multiple tools including the Wayback Machine. It returns a list of URLs found on your target domain using those tools.
It also has an option to search for any keywords you want. This is useful for detecting sensitive information like passwords and API keys, or new endpoints.
Waybackurls.py returns a JSON file containing all the URLs on your target domain found by querying the Wayback Machine.
python waybackurls.py example.com
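Tools like this one get their data from the Wayback Machine’s CDX API, which you can also query yourself. The sketch below is my own minimal version, not the script’s source: the function names are hypothetical, and the parameters (url, output, fl, collapse) are ones documented for the CDX server. With output=json, the first row of the response is a header naming the fields, so the parser skips it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain: str) -> str:
    """Build a CDX query asking for every archived URL under `domain`."""
    params = {
        "url": f"{domain}/*",   # every path under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # only return the original URL field
        "collapse": "urlkey",   # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx(body: str):
    """Extract the URLs from a CDX JSON response body."""
    rows = json.loads(body) if body.strip() else []
    # The first row is the header (["original"]), so skip it
    return [row[0] for row in rows[1:]]


def fetch_urls(domain: str):
    """Query the CDX API over the network and return the archived URLs."""
    with urlopen(cdx_query_url(domain), timeout=30) as response:
        return parse_cdx(response.read().decode("utf-8"))


print(cdx_query_url("example.com"))
```

Calling fetch_urls("example.com") performs the actual request. For large domains the response can be big, so expect it to take a while.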
Waybackrobots.py only returns robots.txt files found on your target domain and snapshotted by the Wayback Machine.
python waybackrobots.py example.com
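One thing you can do with old robots.txt files is pull out the Disallow entries, since they list paths the site did not want crawled at some point. Here is a small Python sketch that does this for the text of one robots.txt file. The function name and the sample file are made up for illustration.

```python
def disallowed_paths(robots_txt: str):
    """Return the non-empty Disallow paths found in a robots.txt file."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop comments, which start with '#'
        line = line.split("#", 1)[0].strip()
        field, _, value = line.partition(":")
        # Field names are case-insensitive; an empty Disallow blocks nothing
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


# Hypothetical archived robots.txt
sample = """User-agent: *
Disallow: /admin/ # private
Disallow:
Allow: /public
disallow: /old-api
"""
print(disallowed_paths(sample))
```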
So finally to answer @PrincessYadhavi’s question, here is how you programmatically get JS files from the Wayback Machine.
Use the tools I mentioned which return a list of URLs on your domain. You can either play with them and see what works best for you, or use them all and aggregate their results.
So use the tools to get all URLs known for a domain, and then grep for JS files.
Personally, I like Waybackurls best. The command I use is:
waybackurls https://freight.uber.com | grep "\.js$" | sort | uniq
Let’s break this down. This one-liner has 4 commands separated by a “pipe” character:
- The first one is waybackurls followed by the target URL.
- The second command is grep "\.js$". It takes the results of the waybackurls query and only prints the lines which end with .js. The backslash makes grep treat the dot as a literal dot rather than “any character”. The $ at the end makes sure the URLs not only contain .js but end with it, because otherwise you would also get .json files.
- The last two commands are sort and uniq. Their purpose is to sort the results and remove any duplicates. sort has to come first because uniq only removes duplicate lines that are next to each other.
So once again, the command is:
waybackurls https://freight.uber.com | grep "\.js$" | sort | uniq
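One limitation of the one-liner: grep only looks at the end of the line, so a URL like app.js?v=2 is skipped because of its query string. If that matters to you, here is a Python sketch of the same filter that checks the path part of each URL instead. The function name and sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def js_urls(urls):
    """Keep unique URLs whose path ends in .js, sorted alphabetically."""
    matches = set()
    for url in urls:
        url = url.strip()
        # Look at the path only, so query strings like ?v=2 are ignored
        if urlsplit(url).path.endswith(".js"):
            matches.add(url)
    return sorted(matches)


# Hypothetical tool output, for illustration only
found = [
    "https://example.com/app.js",
    "https://example.com/data.json",
    "https://example.com/app.js",
    "https://example.com/lib.js?v=2",
]
print("\n".join(js_urls(found)))
```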
Now that you have a list of JS files from the Wayback Machine, you need to make sure they still exist on the target, and analyze them.
That’s it for today’s Q&A. I hope it helps. And if you have any question or if you have trouble with anything bug hunting or pentesting related, you know what to do!
Some people love reading books, paper physical books. They love their smell, feeling the paper, turning the pages. I used to be one of them. I had a library. But the books were scattered around my parents’ house, my house and I had to give away so many of them because I used to move a lot.
Today, I don’t have a physical library anymore. Since I’ve discovered ebooks and audiobooks, there is no turning back. The advantages are just so huge:
- Digital books are generally cheaper. So you can save money.
- You don’t need to carry them with you every time you move, travel, commute or go to a park. Books will be on your phone or tablet which you have with you all the time anyway. There’s no additional weight or risk of forgetting them.
- Ebooks can have a longer lifetime than physical books, especially if you do external backups of your digital library.
- And last but not least, you can search for keywords, annotate digital books, add comments, highlight sections, and even draw or write anything. And it’s not irreversible like with a paper book. The ability to change and erase annotations without wearing out the book is fantastic.
That said, when I made the switch, I tried several apps that didn’t really work for me. So I want to share with you what works in case it helps you or gives you new ideas to try.
What do you use when you want to read a PDF book on your laptop, take annotations, then when you’re on the move, continue reading and annotating the exact same PDF on your phone or tablet?
I had this need when I was working a corporate job, and had a lot of time to kill during my commute. It was a great opportunity to read books on Web security and penetration testing… except that I have a terrible memory. I can’t remember anything if I don’t take notes.
And sometimes there are entire passages that I like. I don’t want to clumsily copy paste them to a note taking app on my small mobile screen! I want to be able to highlight them directly on the book, using my laptop or mobile interchangeably. Also, it had to be available on iOS and Android because at the time, I had both.
That was my need. And here is the best solution that I came up with, which answered all these requirements: Xodo. To find it, go to xodo.com.
It’s an app available on iOS, Android and Windows Phone, and also as a Windows app, a Web app and a Chrome extension. It’s free.
You can upload PDF files from your laptop, or import them from Dropbox and Google Drive.
Once they’ve been uploaded to your Xodo account, you can start editing them. Since changes are synchronized with a remote server, you can continue reading, editing and annotating the same files from any device.
The only downside of Xodo is that it only supports PDF. So if I have an ebook in another format like .epub, I just convert it to PDF.
In addition to Xodo, I generally also use a note taking app like Google Keep or Evernote. It’s for taking notes in the form of bullet points.
That’s my productivity hack for today. Use technology to take advantage of the free time that you have in the day to read and learn. You don’t need to wait until you’re sitting in front of your computer or until you have the whole day free.
If you have any other tip or preferred way to read and annotate PDF books, I’d love to know. Please share it by commenting on this Twitter post so that everyone can benefit from it.
As a bonus, I have a short and sweet programmer joke:
A programmer was arrested for writing unreadable code. He refused to comment.
That’s it for today guys!
Thanks for listening to The Bug Hunter podcast by PentesterLand. If you like what you just heard, please share with your friends and colleagues, like, subscribe and comment.
See you next time! Keep on hacking!