To check whether a page has Bazaarvoice / hReview / PowerReviews structure

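import os

# Class/ID fragments that indicate hReview, PowerReviews or Bazaarvoice markup.
MARKERS = ('hreview', 'prreviewwrap', 'pr-review-wrap',
           'bvstandalonereviewsectionreview', 'bvrrcontentreview')

count = 0
# Append one "mv" command per matching page to the file "shortlisted".
with open("shortlisted", "a+") as f1:
    for name in os.listdir('.'):
        # Skip directories and the output file itself.
        if not os.path.isfile(name) or name == "shortlisted":
            continue
        # Python 3: ignore undecodable bytes in crawled pages.
        with open(name, errors="ignore") as f:
            content = f.read().lower()
        if any(marker in content for marker in MARKERS):
            text = "mv crawler_6DEC/" + name + " shortlisted/" + name + "\n"
            print(text)
            f1.write(text)
            f1.flush()
            count += 1

print(count)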
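
The script above only records that a page matched, not which review system it matched. If that distinction is useful, the same substring check can be wrapped in a small function, roughly as sketched below. The function name `detect_review_systems` and the grouping of markers by system are assumptions made for illustration (inferred from the marker prefixes), not something the original script defines.

```python
# Hypothetical mapping from marker substring to the review system it suggests.
# The grouping is an assumption inferred from the marker prefixes.
MARKERS = {
    'hreview': 'hReview',
    'prreviewwrap': 'PowerReviews',
    'pr-review-wrap': 'PowerReviews',
    'bvstandalonereviewsectionreview': 'Bazaarvoice',
    'bvrrcontentreview': 'Bazaarvoice',
}


def detect_review_systems(html):
    """Return the sorted names of review systems whose markers occur in html."""
    content = html.lower()
    return sorted({system for marker, system in MARKERS.items()
                   if marker in content})


if __name__ == '__main__':
    sample = '<div class="BVRRContentReview">Great product</div>'
    print(detect_review_systems(sample))
```

A page would then be shortlisted whenever the returned list is non-empty, and the list itself could be written next to the `mv` command for later inspection.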