XPath in Python using lxml

from lxml import etree

def extract_html(xpath, html):
	# Takes an XPath expression and HTML content as input and returns
	# a list of 'tag:::text' strings, one per matched element.
	result = []
	tree = etree.HTML(html)
	for x in tree.xpath(xpath):
		# x.text is None when an element has no leading text, so fall back to ''
		result.append(x.tag + ":::" + (x.text or ""))
	return result
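
A quick usage sketch may help show what the 'tag:::text' output looks like. The HTML snippet and the helper name `extract_tag_text` below are made up for illustration and are not part of the original post. The helper assumes the XPath expression selects elements rather than text nodes or attributes.

```python
from lxml import etree

def extract_tag_text(xpath, html):
    # Hypothetical helper for illustration: builds 'tag:::text' strings.
    # Assumes the XPath selects elements; el.text may be None, so use ''.
    tree = etree.HTML(html)
    return [f"{el.tag}:::{el.text or ''}" for el in tree.xpath(xpath)]

# Made-up sample document
sample_html = "<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>"

matches = extract_tag_text("//p", sample_html)
print(matches)  # ['p:::First', 'p:::Second']
```

Splitting each entry on ':::' gives the tag name and its text back, for example `tag, text = matches[0].split(":::", 1)`.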