Extracting attribute values from a soup

Sunday, September 16, 2018

This is for when you have an HTML-like object from BeautifulSoup and want to pull out the contents (the URL) of the href attribute inside each <a> tag.

from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
Out:
<h3>hogehoge</h3>
<a href="http://aaa.html">AAA</a><br/>
<a href="http://bbb.html">BBB</a><br/>

...
To get the addresses for AAA and BBB from this:

soup.find_all('a')
for s in soup.find_all('a'):
    print(s.text) # AAA, BBB
    print(s.attrs['href']) # http://aaa.html, http://bbb.html

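Note that href is an attribute of each element s inside the list returned by soup.find_all('a'), not of the list itself (which is why a for loop is used to take the elements out one by one).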
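
For reference, here is a minimal self-contained sketch of the same idea. The html_doc string is an assumption, reconstructed from the output shown above with one extra <a> tag that has no href, and extract_links is a hypothetical helper name, not something from BeautifulSoup. Passing href=True to find_all skips tags without the attribute, which avoids the KeyError that s.attrs['href'] would raise on them.

```python
from bs4 import BeautifulSoup

# Assumed input, rebuilt from the output shown in the post,
# plus one <a> tag without an href to show the edge case.
html_doc = """
<h3>hogehoge</h3>
<a href="http://aaa.html">AAA</a><br/>
<a href="http://bbb.html">BBB</a><br/>
<a name="top">no href here</a>
"""


def extract_links(html):
    """Return (link text, href) pairs for every <a> tag that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only the <a> tags that actually carry an href attribute.
    return [(a.text, a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(html_doc)
for text, href in links:
    print(text, href)

# Tag.get returns None instead of raising KeyError when the attribute is missing.
missing = BeautifulSoup('<a name="top">x</a>', "html.parser").a.get("href")
```

If you would rather keep every <a> tag and handle the missing ones yourself, s.get('href') inside the loop is the gentler alternative to s.attrs['href'].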

shimo lab2