The Bug Hunter Podcast Ep. 2: Wayback Machine & Reading ebooks on the move

Posted in Podcast on March 1, 2019

Transcript

Hey hackers! This is the Bug Hunter podcast by Pentesterland. The podcast for pentesters & bug bounty hunters.
We tackle technical questions & inspirational topics to help you develop both a hacker skillset & mindset.

Welcome to episode number 2 of the podcast! I’m your host, Mariem. And the title of this episode is: “Wayback Machine & Reading ebooks on the move”.

I have two segments for you today:

Q&A: on the Wayback Machine
Productivity hack: on reading and annotating PDF books on the go

As usual, we have one technical segment and one on living a happier, more productive life.

Also, the transcript for this episode is on pentester.land/categories/podcast/. Under episode n°2, you’ll find this show’s transcript, including all the links, tools and command lines I mention.

Q&A segment

Today’s question comes from a public Tweet. It was asked by @PrincessYadhavi who said:

Everyone is telling download js from wayback macine. but idk how. How to download/extract old js files from wayback machine?

I think it’s a great question because the Wayback Machine is a great source of information and should absolutely be used. But I know from personal experience that if you’re not used to it and if you don’t have the right tools to query it, it can seem quite daunting.

So I am going to tell you everything you need to know about it to start using it efficiently as quickly as possible. I will explain what the Wayback Machine is, why pentesters and bug hunters use it, how to use it manually, and which tools you can leverage to query it automatically.

The Wayback Machine is an Internet archive, located at http://archive.org/web/. It’s a collection of more than 349 billion snapshots of web pages saved over time.

For bug hunters and pentesters, it falls under the category of online passive reconnaissance tools.

They use it to access old versions of websites. The reason is that sometimes the current version of a site is relatively secure, but the snapshots of older versions reveal interesting information or bugs. So the Wayback Machine is like a time machine which allows you to go back in time, see what the site used to look like, and get more information than what is available on the current version.

This can be things like:

old forgotten endpoints
interesting JS files
sensitive information
vulnerabilities that no longer exist on the site, like URLs which were vulnerable to directory listing and revealed interesting files

Many bug hunters confirm this. Here’s what Jason Haddix shared on Twitter. He said:

#BountyProTip: found a 401/403, basic auth, or domain that seems interesting but is somehow locked down? Look at its http://archive.org/web/ entries. Sometimes you win instantly with API keys or URL structure that you can forcefully browse to unprotected content still there.

But the Wayback Machine doesn’t record everything. It just takes a snapshot of a site from time to time.
If you want to play with it manually, go to http://archive.org/web/ and enter a URL, let’s say https://uber.com. It looks like a calendar. Every time you see a colored circle around a date, it means that a snapshot was taken that day. And if you click on it, you can see what the page looked like at that time.

This is how you manually browse older versions of a site. It’s a great tool, but it’s not very practical when you are testing dozens of subdomains and you need to quickly find every JS file or URL mentioned in all historical versions of every subdomain.

For this purpose, it’s better to use tools. Some of the best that I’ve tried are:

Waybackunifier
ReconCat
Waybackurls
Curate
Waybackurls.py
Waybackrobots.py

These tools don’t all do the same thing: they have different purposes and types of output, and not all of them apply to retrieving JS files. But since we’re learning about the Wayback Machine, we might as well learn what they do in case we need them in another context.

Let’s dig a little bit deeper into what each tool does.

FYI, I’m not going to detail all the options and commands for each tool. It wouldn’t be very practical to do on audio.
But check out the show’s transcript on pentester.land/categories/podcast/. It contains the baseline command to use for each tool. And if you need a cheat sheet with more details, just send me a DM or an email and I will share one on the site.

Waybackunifier

The first tool is Waybackunifier. It scans snapshots of the URL you give it. Then it aggregates all its previous versions and returns a unified file which contains all unique lines ever included in that page.
So basically Waybackunifier creates a single file which contains everything that the URL ever contained, merged together.

Usage: waybackunifier -url example.com
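
To make the idea of a "unified file" concrete, here is a minimal Python sketch of the merging step only. It is not Waybackunifier's actual code: the function name unify_snapshots and the sample robots.txt contents are made up for illustration, and it assumes the snapshots have already been downloaded as text.

```python
from typing import Iterable, List


def unify_snapshots(snapshots: Iterable[str]) -> List[str]:
    """Merge several versions of a page into one list of unique lines.

    Lines are kept in the order they were first seen, oldest snapshot first.
    """
    seen = set()
    unified = []
    for snapshot in snapshots:
        for line in snapshot.splitlines():
            if line not in seen:
                seen.add(line)
                unified.append(line)
    return unified


# Hypothetical example: two archived versions of the same robots.txt file.
old_version = "User-agent: *\nDisallow: /admin\n"
new_version = "User-agent: *\nDisallow: /beta\n"
merged = unify_snapshots([old_version, new_version])
```

Here merged keeps the /admin entry even though it disappeared from the newer version, which is exactly the kind of forgotten information you are looking for.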

ReconCat

ReconCat returns all the URLs of snapshots available. It’s not their contents, just the URLs.

The output is inside a folder named after the domain you entered. It contains one file for each year and inside is the list of available snapshots for that year.

Usage: php recon --url=https://example.com --year=all

Waybackurls

Waybackurls returns a list of all the URLs that the Wayback Machine knows about for a domain.

Usage: waybackurls https://example.com
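
For a sense of what such a lookup can look like without any third-party tool, here is a minimal Python sketch that builds a query for the Wayback Machine's CDX API and parses its JSON answer. It is not taken from Waybackurls itself. The function names build_cdx_query and parse_cdx_rows are made up for illustration, and the parameters used (fl=original, collapse=urlkey) are just one reasonable choice.

```python
import json
from typing import List
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain: str) -> str:
    """Build a CDX API URL listing every archived URL under a domain."""
    params = {
        "url": f"{domain}/*",   # trailing /* asks for every path under the domain
        "output": "json",       # answer as a JSON array of rows
        "fl": "original",       # only return the originally archived URL
        "collapse": "urlkey",   # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_rows(body: str) -> List[str]:
    """Extract the URLs from a CDX JSON response body."""
    rows = json.loads(body)
    # With output=json, the first row is a header naming the fields.
    return [row[0] for row in rows[1:]]
```

To actually run the query, you could fetch build_cdx_query("example.com") with urllib.request.urlopen and pass the decoded body to parse_cdx_rows. Large domains can return a lot of rows, so expect the request to take a while.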

Curate

Curate queries multiple tools including the Wayback Machine. It returns a list of URLs found on your target domain using those tools.

It also has an option to search for any keywords you want. This is useful for detecting sensitive information like passwords and API keys, or new endpoints.

Usage: curate https://example.com
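
If you already have a list of URLs and just want the keyword search part, a few lines of Python are enough. This is only a sketch of the idea, not how Curate implements it. The function name find_keyword_hits and the sample keywords are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def find_keyword_hits(urls: Iterable[str], keywords: List[str]) -> Dict[str, List[str]]:
    """Group the URLs by the keywords they contain (case-insensitive)."""
    hits = {keyword: [] for keyword in keywords}
    for url in urls:
        lowered = url.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                hits[keyword].append(url)
    return hits


# Hypothetical URLs, as they might come out of one of the tools above.
sample_urls = [
    "https://example.com/login?api_key=123",
    "https://example.com/about",
    "https://example.com/reset?TOKEN=abc",
]
sample_hits = find_keyword_hits(sample_urls, ["api_key", "token"])
```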

Waybackurls.py

Waybackurls.py returns a JSON file containing all the URLs on your target domain found by querying the Wayback Machine.

Usage: python waybackurls.py example.com

Waybackrobots.py

Waybackrobots.py only returns robots.txt files found on your target domain and snapshotted by the Wayback Machine.

Usage: python waybackrobots.py example.com

Getting JavaScript files from the Wayback Machine

So finally to answer @PrincessYadhavi’s question, here is how you programmatically get JS files from the Wayback Machine.
Use the tools I mentioned which return a list of URLs on your domain. You can either play with them and see what works best for you. Or you can always use them all and aggregate their results.

So use the tools to get all URLs known for a domain, and then grep for JS files.

Personally, I like Waybackurls best. The exact command I use is: waybackurls https://freight.uber.com | grep '\.js$' | sort | uniq

Let’s break this down. This one-liner has 4 commands separated by a “pipe” character:

The first one is waybackurls followed by the target URL.
The second command is grep '\.js$'. It takes the results of the waybackurls query and only prints the lines which end with .js. The backslash makes the dot match a literal dot rather than any character. The $ at the end makes sure the URLs not only contain .js but end with it, because otherwise you would also get .json files.
The last two commands are sort and uniq. Their purpose is to sort the results and remove any duplicates. The order matters: uniq only removes duplicate lines that are next to each other, so the results have to be sorted first.

So once again, the command is: waybackurls https://freight.uber.com | grep '\.js$' | sort | uniq.
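
If you prefer to post-process the results in a script, the same filtering can be sketched in Python. The function name filter_js_urls is made up for this example. Note one deliberate difference from the grep version: it looks at the path of each URL, so a file like lib.js?v=2 is kept even though the line does not end with .js.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def filter_js_urls(urls: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated URLs whose path ends with .js."""
    js_urls = set()
    for url in urls:
        url = url.strip()
        # Only the path is checked, so query strings do not hide a .js file
        # and .json files are still excluded.
        if urlsplit(url).path.lower().endswith(".js"):
            js_urls.add(url)
    return sorted(js_urls)


# Hypothetical input, as it might come out of a URL listing tool.
found = filter_js_urls([
    "https://example.com/app.js",
    "https://example.com/data.json",
    "https://example.com/app.js",
    "https://example.com/lib.js?v=2",
    "https://example.com/nojs",
])
```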

Now that you have a list of JS files from the Wayback Machine, you need to make sure they still exist on the target, and analyze them.
For that, I refer you to this excellent article: Static Analysis of Client-Side JavaScript for pen testers and bug bounty hunters. You’ll find the link in the show’s transcript.
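
The "make sure they still exist" step can be automated too. Here is a rough Python sketch, assuming a plain HEAD request and a 200 status code are a good enough signal that a file is still there. Some servers reject HEAD requests or answer with redirects, so treat the result as a first pass. The function names http_status and split_by_liveness are made up for this example.

```python
import urllib.error
import urllib.request
from typing import Callable, Iterable, List, Tuple


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the status code of a HEAD request, or 0 if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 or 403
    except OSError:
        return 0  # DNS failure, timeout, refused connection...


def split_by_liveness(
    urls: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> Tuple[List[str], List[str]]:
    """Split URLs into those answering 200 and all the others."""
    live, gone = [], []
    for url in urls:
        (live if get_status(url) == 200 else gone).append(url)
    return live, gone
```

Calling split_by_liveness(js_urls) on your list returns the files that are still served and the ones that are not. The get_status parameter is only there so the check can be swapped out, for example for testing without touching the network.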

That’s it for today’s Q&A. I hope it helps. And if you have any questions or if you have trouble with anything bug hunting or pentesting related, you know what to do!

Productivity hack

Some people love reading books, paper physical books. They love their smell, feeling the paper, turning the pages. I used to be one of them. I had a library. But the books were scattered between my parents’ house and my own, and I had to give away so many of them because I used to move a lot.

Today, I don’t have a physical library anymore. Since I’ve discovered ebooks and audiobooks, there is no turning back. The advantages are just so huge:

Digital books are generally cheaper. So you can save money.
You don’t need to carry them with you every time you move, travel, commute or go to a park. Books will be on your phone or tablet which you have with you all the time anyway. There’s no additional weight or risk of forgetting them.
Ebooks can have a longer lifetime than physical books, especially if you do external backups of your digital library.
And last but not least, you can search for keywords, annotate digital books, add comments, highlight sections, and even draw or write anything. And it’s not irreversible like with a paper book. The ability to change and erase annotations without wearing out the book is fantastic.

That said, when I made the switch, I tried several apps that didn’t really work for me. So I want to share with you what works in case it helps you or gives you new ideas to try.

What do you use when you want to read a PDF book on your laptop, annotate it, then when you’re on the move, continue reading and annotating the exact same PDF on your phone or tablet?

I had this need when I was working a corporate job and had a lot of time to kill during my commute. It was a great opportunity to read books on Web security and penetration testing… except that I have a terrible memory. I can’t remember anything if I don’t take notes.
And sometimes there are entire passages that I like. I don’t want to clumsily copy and paste them into a note-taking app on my small mobile screen! I want to be able to highlight them directly in the book, using my laptop or mobile interchangeably. Also, it had to be available on iOS and Android because at the time, I had both.

That was my need. And here is the best solution that I came up with, which answered all these requirements: Xodo. To find it, go to xodo.com.

It’s an app available on iOS, Android and Windows Phone, and also as a Windows app, a Web app and a Chrome extension. It’s free.

You can upload PDF files from your laptop, or import them from Dropbox and Google Drive.
Once they’ve been uploaded to your Xodo account, you can start editing them. Since changes are synchronized with a remote server, you can continue reading, editing and annotating the same files from any device.

The only downside of Xodo is that it only supports PDF. So if I have an ebook in another format like .epub, I just convert it to PDF.

In addition to Xodo, I generally also use a note-taking app like Google Keep or Evernote. It’s for taking notes in the form of bullet points.

That’s my productivity hack for today. Use technology to take advantage of the free time that you have in the day to read and learn. You don’t need to wait until you’re sitting in front of your computer or until you have the whole day free.

If you have any other tip or preferred way to read and annotate PDF books, I’d love to know. Please share it by commenting on this Twitter post so that everyone can benefit from it.

Bonus segment

As a bonus, I have a short and sweet programmer joke:

A programmer was arrested for writing unreadable code. He refused to comment.

Conclusion

That’s it for today guys!

Thanks for listening to The Bug Hunter podcast by PentesterLand. If you like what you just heard, please share with your friends and colleagues, like, subscribe and comment.

Also, send your questions and suggestions by DM on Twitter at twitter.com/pentesterland or send us an email to [email protected].

See you next time! Keep on hacking!
