dorking (how to find anything on the Internet)

tl;dr: Use advanced Google Search to find any webpage, emails, info, or secrets

cost: $0

time: 2 minutes


Software engineers have long joked about how much of their job is simply Googling things

Now you can do the same, but for free

Below, I'll cover dorking, the use of search engines to find very specific data

You can paste each example directly into Google to see the results
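If you'd rather open a query from a script or a bookmark than paste it by hand, here is a minimal sketch. It assumes Google's usual https://www.google.com/search?q= URL format, and dorkUrl is a made-up helper name, not something from Google.

```javascript
// Hypothetical helper: turn a dork query into a Google search URL.
// Assumes the usual https://www.google.com/search?q=... URL format.
function dorkUrl(query) {
    // encodeURIComponent escapes spaces, quotes, colons, etc. so the query survives the URL
    return "https://www.google.com/search?q=" + encodeURIComponent(query);
}

var exampleUrl = dorkUrl("site:gumroad.com dynamodb");
console.log(exampleUrl);
```

Opening the printed URL in a browser should run the same search as pasting the query into the search box.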


webpages

Inspired by this Twitter exchange with Gumroad CEO Sahil Lavingia, the next few examples will cover Gumroad and Sahil.

find specific pages within a website (ex: for DynamoDB e-books)

site:gumroad.com dynamodb

find specific pages that must include a phrase in the Title text

intitle:"support this" site:gumroad.com

find similar sites (Google only)

related:gumroad.com

you can chain operators together (ex: looking for bug bounties with either security or bug-bounty in the URL)

(inurl:security OR inurl:bug-bounty OR site:hackerone.com) + "gumroad"
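If you build chained queries like this often, a small helper can assemble them. This is only a sketch: chainDork is a hypothetical name, and it leaves out the + on the assumption that a space between terms already means AND.

```javascript
// Hypothetical helper: OR a list of operators together, then require a quoted phrase.
function chainDork(alternatives, phrase) {
    // Parentheses keep the OR group separate from the phrase that follows
    var group = "(" + alternatives.join(" OR ") + ")";
    return group + ' "' + phrase + '"';
}

var bugBountyQuery = chainDork(
    ["inurl:security", "inurl:bug-bounty", "site:hackerone.com"],
    "gumroad"
);
console.log(bugBountyQuery);
```

Swap in a different company name or a different list of operators to reuse the same pattern.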

you can restrict to certain top-level domains (ex: lists of teachers)

site:.edu filetype:xls inurl:"email.xls"


emails

find Gmail accounts

alec barrett-wilsdon "@gmail.com"

find work accounts (you'll need to find their domain first)

alec "@contextify.io"

not finding what you're looking for with either of those? Try to guess the format of the email (go to this site, search for the domain, and click Identified Name Formats)

"abarrett.wilsdon@"

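Guessing formats by hand gets tedious, so here is a rough sketch that generates a few candidates to search for. The list of patterns is my assumption about common formats, not an authoritative set, and guessEmailPrefixes is a made-up name.

```javascript
// Hypothetical helper: guess common work-email prefixes for a person.
// The patterns below are assumptions, not an exhaustive or authoritative list.
function guessEmailPrefixes(first, last) {
    var f = first.toLowerCase();
    // Treat hyphens in a double-barrelled surname as dots, as in the example above
    var l = last.toLowerCase().replace(/-/g, ".");
    return [
        f,              // first name only
        f + "." + l,    // first.last
        f[0] + l,       // first initial + last
        f + l[0],       // first + last initial
        l               // last name only
    ];
}

// Wrap each guess as a quoted search, ready to paste into Google
var searches = guessEmailPrefixes("Alec", "Barrett-Wilsdon").map(function (prefix) {
    return '"' + prefix + '@"';
});
console.log(searches.join("\n"));
```

Each printed line is a separate search; try them one at a time, optionally followed by the company's name.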
you can always find every page with emails on it (and then run the snippet below on each one)

site:alec.fyi intext:"@"

find every email on the web page you're on by pasting this snippet into the Chrome DevTools console (more here)

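// Match email-like strings anywhere in the page's HTML.
// A regex literal keeps the \. escape intact; the g flag returns every match.
var re = /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+/g;
var matches = document.body.innerHTML.match(re) || [];
// Log each distinct address once
var seen = {};
for (var i = 0; i < matches.length; i++) {
    if (!seen[matches[i]]) {
        seen[matches[i]] = true;
        console.log(matches[i]);
    }
}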

this will log every email found, without you having to scan through the whole page.
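The same idea can be written as a plain function that takes a string, which makes it usable outside the browser (for example, on HTML you have saved to disk). This is a sketch: extractEmails is a hypothetical name, and the image-extension filter reflects my assumption that matches like logo@2x.png are asset filenames rather than addresses.

```javascript
// Hypothetical helper: pull unique email-like strings out of any text or HTML.
function extractEmails(text) {
    var re = /[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+/g;
    var matches = text.match(re) || [];
    // Assumption: matches ending in an image extension are assets (logo@2x.png), not emails
    var assetExt = /\.(png|jpe?g|gif|svg|webp)$/i;
    var seen = {};
    return matches.filter(function (m) {
        var key = m.toLowerCase();
        if (assetExt.test(m) || seen[key]) return false;
        seen[key] = true;
        return true;
    });
}

var sample = '<a href="mailto:alec@contextify.io">alec@contextify.io</a> <img src="logo@2x.png">';
var found = extractEmails(sample);
console.log(found);
```

The pattern is deliberately loose, so expect the occasional false positive and skim the output before using it.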

finally, you can validate if an email that you found or guessed is real by hovering over it in a new compose window in Gmail:

if you look carefully, you'll notice the chat and video call options are greyed out on an invalid email.


files

find spreadsheets

filetype:csv OR filetype:xlsx OR filetype:xls OR filetype:xltx OR filetype:xlt OR inurl:airtable.com/universe/

find Google Docs and Google Sheets

site:docs.google.com "gumroad"

find where your competitor's logo is (ex: partners or customers' websites)

"Gumroad Logo.png"

find your competitors' sales pitches and whitepapers

site:intercom.com (filetype:pdf OR filetype:ppt)

find case studies written about competitors

inurl:hubspot-case-study -site:hubspot.com
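The site: operator is normally written with a bare domain, so if you are pasting URLs from the address bar it can help to strip them down first. A minimal sketch, assuming the standard URL class available in browsers and Node; siteOperand is a made-up name, and dropping a leading "www." is my assumption about what you usually want.

```javascript
// Hypothetical helper: reduce a pasted URL to the bare hostname used with site: / -site:
function siteOperand(input) {
    // Add a scheme if one is missing so the URL parser accepts the input
    var withScheme = /^[a-z]+:\/\//i.test(input) ? input : "https://" + input;
    // Drop a leading "www." so the rest of the domain is matched too (an assumption about intent)
    return new URL(withScheme).hostname.replace(/^www\./, "");
}

var caseStudies = "inurl:hubspot-case-study -site:" + siteOperand("http://hubspot.com");
console.log(caseStudies);
```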


SEO

find sites with specific keywords in the anchor text

inanchor:"cyber security"

research blog posts with specific keywords in their title

inposttitle:"diy slime"

find backlinks (ex: other sites that link to a particular blog post). note: the link operator is now deprecated

intext:intercom.com/intercom-api-reference/reference

find keyword permutations with the wildcard operator

* design tools

find companies using a given widget

intext:"Powered by Intercom" -site:intercom.com


coupons!

search the site itself for codes

site:curology.com ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")

next, try Twitter

site:twitter.com + "meundies" + ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")

next, try Mailchimp emails

site:campaign-archive.com + "blueapron" + ("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")
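Since the three coupon searches share the same list of terms, they can be generated for any brand. This is only a sketch with a hypothetical couponDorks helper; it quotes the brand name and relies on spaces rather than + between parts, which I'm assuming behaves the same way.

```javascript
// Terms copied from the queries above; | means OR
var COUPON_TERMS = '("coupon" | "referral code" | "affiliate code" | "discount code" | "VIP")';

// Hypothetical helper: build the three coupon searches for a brand.
function couponDorks(brand, brandDomain) {
    return [
        // 1. the brand's own site
        "site:" + brandDomain + " " + COUPON_TERMS,
        // 2. tweets mentioning the brand
        'site:twitter.com "' + brand + '" ' + COUPON_TERMS,
        // 3. archived Mailchimp campaigns mentioning the brand
        'site:campaign-archive.com "' + brand + '" ' + COUPON_TERMS
    ];
}

var curologyDorks = couponDorks("curology", "curology.com");
console.log(curologyDorks.join("\n"));
```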


secrets

cybersecurity experts use dorking, as one tool among many, to find potential vulnerabilities in a company. I will not be covering any such queries, out of concern for their potential for misuse.


operator review

operators are components of a search query that narrow the results down. You can combine as many as you want in one query. The most useful ones you'll want to know are:

operator                description
"phrase"                results must include "phrase"
-phrase                 exclude results with phrase
phrase1 AND phrase2     phrase1 and phrase2 must both be included
phrase1 OR phrase2      one of phrase1 and phrase2 must be included (or both)
site:example.com        results must be on domain example.com
filetype:jpg            results must be of type .jpg
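To see how these operators fit together, here is a small sketch that assembles a query from a few of them. buildQuery and its option names (phrase, exclude, site, filetype) are made up for illustration, and "sales deck" is just a sample phrase.

```javascript
// Hypothetical helper: assemble a query from the operators listed above.
// The option names (phrase, exclude, site, filetype) are invented for this sketch.
function buildQuery(opts) {
    var parts = [];
    if (opts.phrase) parts.push('"' + opts.phrase + '"');
    if (opts.exclude) parts.push("-" + opts.exclude);
    if (opts.site) parts.push("site:" + opts.site);
    if (opts.filetype) parts.push("filetype:" + opts.filetype);
    // A space between parts acts as an implicit AND
    return parts.join(" ");
}

var deckQuery = buildQuery({ phrase: "sales deck", site: "intercom.com", filetype: "pdf" });
console.log(deckQuery);
```

Any option you leave out is simply skipped, so the same helper covers one-operator and multi-operator searches.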

AND/OR logic can be used to combine distinct queries. Use parentheses to make the grouping explicit

"phrase1" OR ("phrase 2" AND "phrase3")
# equivalent to running these two searches
>> "phrase1"
>> "phrase 2" AND "phrase3"
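One way to keep the grouping straight is to treat a combined query as a list of AND groups joined by OR. The sketch below uses two hypothetical helpers, combine and separate, and adds parentheses around multi-phrase groups so the result does not depend on how the search engine prioritizes AND versus OR.

```javascript
// Quote each phrase in a group and join the group with AND
function andGroup(group) {
    return group.map(function (p) { return '"' + p + '"'; }).join(" AND ");
}

// Hypothetical helper: OR the groups together, parenthesizing any multi-phrase group
function combine(groups) {
    return groups.map(function (group) {
        return group.length > 1 ? "(" + andGroup(group) + ")" : andGroup(group);
    }).join(" OR ");
}

// Hypothetical helper: the separate searches that the combined query stands for
function separate(groups) {
    return groups.map(andGroup);
}

var groups = [["phrase1"], ["phrase 2", "phrase3"]];
var combined = combine(groups);
var individual = separate(groups);
console.log(combined);
console.log(individual.join("\n"));
```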


Thank you to Tejas, Chris, Ian, and Brandon for contributing edits!

Thanks for reading. Questions or comments? 👉🏻 alec@contextify.io