In this blog post we will see how we can create a custom wordlist from a blog using some command line fu and the WordPress REST API.

If there is one trick I can give you when bug hunting, it is to build your own wordlist.

Words are potent weapons for all causes, good or bad. - Manly Hall

If everyone is using the same tools with the same wordlist then it is just a matter of who is going to be the first to run the tool against the target.

Whereas if you spend time creating a wordlist specifically for your needs you will have a better chance of finding something that the others might not have found already.

Finding the right words

You will need some classic recon skills. Company websites and engineering blogs are both good candidates, but they are not the only ones: GitHub and LinkedIn can also be a trove of information. Just be creative ;)

In this blog post I will focus on extracting words from a WordPress blog. We will look at the Uber Engineering blog since it is full of the names of Uber projects and the technologies they use.

Gotta Catch ‘Em All

Usually when I need to do something like this my go-to tool is CeWL by @digininja.

CeWL is a ruby app which spiders a given URL to a specified depth, optionally following external links, and returns a list of words which can then be used for password crackers such as John the Ripper.

CeWL is easy to use:

cewl --depth 2 --min_word_length 3 --write wordlist.txt

The advantage is that it should work with every website, but the downside is that it is not the most efficient and you do not have much control over what gets downloaded. There must be an easier way to do this…

WPScan to the rescue ?

When doing some recon using WPScan, I noticed an interesting header:

[+] Interesting entry from robots.txt:
[+] Interesting header: KEEP-ALIVE: timeout=20
[+] Interesting header: LINK: <>; rel=""
[+] Interesting header: SERVER: nginx
[+] Interesting header: X-CACHE: HIT: 2

It turns out that WordPress has had a REST API enabled by default since version 4.7.

These endpoints provide machine-readable external access to your WordPress site with a clear, standards-driven interface, allowing new and innovative apps for interacting with your site.

Using these endpoints it is possible to retrieve posts, comments, tags and categories easily. Sounds like something we could use !
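To make the four collections concrete, here is a minimal sketch of the routes involved. The `/wp-json/wp/v2/...` paths are the core WordPress REST API routes; `BASE_URL` and the `wp_route` helper are placeholders of mine, not something taken from the target blog.

```shell
#!/usr/bin/env bash
# BASE_URL is a placeholder for the blog being studied (assumption).
BASE_URL="${BASE_URL:-https://blog.example.com}"

# Print the collection URL for one of the core WordPress REST API routes.
wp_route() {
  case "$1" in
    posts|comments|tags|categories)
      printf '%s/wp-json/wp/v2/%s\n' "$BASE_URL" "$1" ;;
    *)
      echo "unknown collection: $1" >&2
      return 1 ;;
  esac
}

# Example (needs network access, HTTPie and jq):
#   http --ignore-stdin GET "$(wp_route tags)" | jq -r '.[].name'
```

Tag and category names are worth grabbing too, since they tend to be short, project-specific words.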

Command line fu time !

The API is pretty easy to use so let's just dive into it. My command line HTTP client of choice is HTTPie so that is what we will be using to fetch the posts:

http GET "" | \
jq '.[] | .title.rendered, .excerpt.rendered, .content.rendered'

Here we are using jq to extract only the fields that are interesting. You will need to use the pagination parameters (page and per_page) to make sure you have retrieved everything.

Now it is time to read through obscure documentation and copy/paste StackOverflow answers until you get a clean list.

After some time (it can take a while depending on your command line mastery level) you should end up with something that looks like this:
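A rough sketch of what walking the pages could look like. It relies on the WordPress `page` and `per_page` query parameters (`per_page` is capped at 100) and on HTTPie's `--check-status` flag. `BASE_URL`, `posts_url` and `fetch_all_posts` are hypothetical names used for illustration only.

```shell
#!/usr/bin/env bash
# BASE_URL is a placeholder, not the URL used in this post (assumption).
BASE_URL="${BASE_URL:-https://blog.example.com}"

# Build the URL for one page of posts, 100 posts per page.
posts_url() {
  local page="$1"
  printf '%s/wp-json/wp/v2/posts?per_page=100&page=%s\n' "$BASE_URL" "$page"
}

# Print every page of posts (one JSON array per page) until the API runs out.
fetch_all_posts() {
  local page=1 body
  while :; do
    # --check-status makes HTTPie exit non-zero on 4xx/5xx responses,
    # which is what WordPress sends once we go past the last page.
    body="$(http --check-status --ignore-stdin GET "$(posts_url "$page")")" || break
    # Stop on an empty array as well, just in case.
    [ "$(printf '%s' "$body" | jq 'length')" -gt 0 ] || break
    printf '%s\n' "$body"
    page=$((page + 1))
  done
}
```

The output of `fetch_all_posts` can then be piped into the same jq filter as above. If you would rather know the page count up front, WordPress also reports it in the X-WP-TotalPages response header.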

http GET "" | \
jq '.[] | .title.rendered, .excerpt.rendered, .content.rendered' | \
awk '{gsub("<[^>]*>", "")}1' | \
tr " " "\n" | \
perl -pe 's/\\n/\n/g' | \
tr '[:upper:]' '[:lower:]' | \
tr '[:punct:]' "\n" | \
tr -s '\n' '\n' | \
tr -cd '\11\12\15\40-\176' | \
sort | \
uniq >> uberengblog.txt
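If you want to play with the cleanup stage without hitting the network, here is a simplified sketch of the same idea as a function that reads HTML on stdin. It is a variation rather than the exact pipeline above: it assumes the input was produced with `jq -r` (so the strings are already decoded), and `html_to_words` is just a name I made up.

```shell
#!/usr/bin/env bash
# Turn rendered HTML on stdin into a sorted list of unique lowercase words.
html_to_words() {
  sed -e 's/<[^>]*>/ /g' |        # replace single-line HTML tags with a space
    tr '[:upper:]' '[:lower:]' |  # lowercase everything
    tr -c 'a-z0-9\n' '\n' |       # any other character becomes a line break
    sed -e '/^$/d' |              # drop the empty lines this leaves behind
    LC_ALL=C sort -u              # sort and keep one copy of each word
}

# Example:
#   printf '<p>Hello <b>World</b>, hello S3-bucket!</p>\n' | html_to_words
# prints:
#   bucket
#   hello
#   s3
#   world
```

Keep in mind that this is crude: HTML entities such as `&#8217;` end up as stray numbers in the list, and tags spanning several lines are not stripped.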

Does it work ?

Using this simple trick I was able to identify 18 Amazon S3 buckets probably belonging to Uber so that is already a win I would say.

Magic words: application, dashboard, data, dev, eats, mobile, spark, analysis, support, 01, bucket, chargebacks, cors, packer, perks, sandbox, screenshots, watchdog
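I did not detail above how the words become bucket names, so take the following as one possible approach and not the exact recipe: combine a company prefix with each word from the list. The `bucket_candidates` function, the separators and the `acme` prefix are all assumptions for illustration.

```shell
#!/usr/bin/env bash
# Print candidate names for each word on stdin, combined with a prefix.
# The three patterns (dash, dot, no separator) are a guess, not a rule.
bucket_candidates() {
  local prefix="$1" word
  while IFS= read -r word; do
    [ -n "$word" ] || continue   # skip empty lines
    printf '%s-%s\n%s.%s\n%s%s\n' \
      "$prefix" "$word" "$prefix" "$word" "$prefix" "$word"
  done
}

# Example:
#   printf 'dev\n' | bucket_candidates acme
# prints:
#   acme-dev
#   acme.dev
#   acmedev
```

The resulting candidates can then be fed to whatever bucket checking tool you prefer, within the scope of the program you are testing.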

If you have more tricks like this let me know :)

Still there ? Do you want more ? You should probably learn how to use Commonspeak and start creating content discovery wordlists using BigQuery, that’s some next level shit ;)

PS: If someone at Uber is reading this you should probably update your plugins:

[!] Title: Yoast SEO <= 5.7.1 - Unauthenticated Cross-Site Scripting (XSS)
[i] Fixed in: 5.8

Aloïs Thévenot

Jack of all trades, master of none. I {tweet|blog} about #technology, #security #startup #infosec #bugbounty