August 14, 2017 GK Bryn

Get a list of URLs Indexed in Google

So, sometimes you just need to get a list of URLs Indexed in Google for a specific site – could be a million reasons for that…me, I needed to get a list of indexed pages to set up some 301 redirects to a new domain for a photographer.

In my experience, Photographers like a blog or two, so I knew the the page count could be in the hundreds – a quick check on indexed pages using ‘site:someradsite.com’ in the google search box returned just over 100 results. Score! Still…definitly an couple of hours I wouldn’t get back was i to manually grab each one to get a list of indexed urls…nobody got time for that crap!

So mad props to Matt Porter at www.mathewporter.co.uk for keeping an eye on ways to get this done and delivering the low-down for me in blog-gold form.

Get a list of URLs Indexed in Google

Google Drive is the answer(to so many things recently too…for me) more specifically, Google Sheets for this – that handy spreadsheet ‘ting.

Open a new one, and in the A1 column insert this bit of magic code, swapping out www.someradsite.com for your required domain :

=importXml("https://www.google.com/search?q=site:www.someradsite.com&num=100&start=1"; "//cite")

NOTE A: Make sure the quotes are the straight ones – not those curly quote suckers – that shit won’t work.

unsanitised

This will populate the first column with the first hundred indexed pages in google…Noice.
Need more though? Fair enough. Bang the same code it in again on line 101, swapping out start=1 for start=100

=importXml("https://www.google.com/search?q=site:www.someradsite.com&num=100&start=100"; "//cite")

This of course, will append the next 100 indexed URLs to the spreadsheet. Rinse and repeat for more – I’m sure you get the gist here. You can do up to 1000 URLs with this method.

But they come in dirty don’t they? Ooooh..filthy looking URLs with query strings all up in the blend. We need to sanitize that shit, and sanitize it now!

Word! This next bit of magic code – it goes in cell 1B of the spredsheet 2- stick it in and you’ll see it sanitize the first URL and give you a nice clean one to play with. Sweet.

=mid(A1,search("?q=",A1)+3,search("&sa=",A1)-(search("?q=",A1)+3))

NOTE B: See Note A

sanitised

You’ll also see a little blue bordered highlight box surrounding the URL – grab the little square block handle and drag it down the column to the length of your dirty list of URLs.

You should now have a clean list of your required URLs for use in…well, whatever you wanted it for in the first place.

Now you know how to get a list of URLs that are indexed in google quick and fast.
You’re very welcome.