Crawling (Beautiful Soup library) - Tag attributes

usop 2023. 2. 1. 11:53

Let's use Beautiful Soup to read and modify the attributes of HTML tags.

###################### Create fruits.html first ######################

<!doctype html>
<html lang="ko">
 <head>
  <meta charset="UTF-8">
  <meta name="Generator" content="EditPlus®">
  <meta name="Author" content="">
  <meta name="Keywords" content="">
  <meta name="Description" content="">
  <title>Document</title>
 </head>
 <body>
	<p class="ptag red" align="center">사과</p>
	<p class="ptag yellow" align="center">참외</p>
	<p class="ptag blue" align="center">블루베리</p>
	<div id="container">
		<p class="hard">과일</p>
	</div>
 </body>
</html>

#################################################################

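from bs4 import BeautifulSoup

# Open the file in a with block so that it is closed after parsing.
with open("fruits.html", "r", encoding="utf-8") as html:
    soup = BeautifulSoup(html, "html.parser")
body = soup.select_one("body")
ptag = body.find('p')
print('First p tag: ', ptag['class'])

# class is returned as a list (['ptag', 'red']), so index 1 is 'red'.
ptag['class'][1] = 'white'

# red has been changed to white.
print('First p tag: ', ptag['class'])

# Assigning to an attribute that does not exist yet adds it to the tag.
ptag['id'] = 'apple'
print('id attribute of the first p tag: ', ptag['id'])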
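The same dictionary-style access supports a few more operations. Below is a minimal sketch, assuming a shortened inline copy of fruits.html (the html string is made up for illustration so the snippet runs without the file). It shows get() for attributes that may be missing, attrs for the full attribute dictionary, and del for removing an attribute.

```python
from bs4 import BeautifulSoup

# Shortened inline copy of fruits.html, used here only for illustration.
html = '<body><p class="ptag red" align="center">사과</p></body>'
soup = BeautifulSoup(html, "html.parser")
ptag = soup.find("p")

# class is a multi-valued attribute, so it comes back as a list.
print(ptag["class"])   # ['ptag', 'red']
# align is single-valued, so it comes back as a plain string.
print(ptag["align"])   # center

# get() returns None for a missing attribute; ptag["id"] would raise KeyError.
print(ptag.get("id"))  # None

# attrs is a dict holding every attribute of the tag.
print(ptag.attrs)

# del removes an attribute from the tag.
del ptag["align"]
print(ptag.has_attr("align"))  # False
print(ptag)
```

Using get() is usually the safer choice when crawling real pages, since not every tag carries the attribute you are looking for.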

#soup.find

body_tag = soup.find('body')
print(body_tag)
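Note that find() returns only the first match, or None when nothing matches. As a rough sketch of the difference from find_all(), again assuming a shortened inline copy of fruits.html rather than the file itself:

```python
from bs4 import BeautifulSoup

# Shortened inline copy of fruits.html, used here only for illustration.
html = """
<body>
<p class="ptag red">사과</p>
<p class="ptag yellow">참외</p>
<p class="ptag blue">블루베리</p>
<div id="container"><p class="hard">과일</p></div>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching tag.
first_p = soup.find("p")
print(first_p.get_text())  # 사과

# find_all() returns every match, including the p nested inside the div.
all_p = soup.find_all("p")
print(len(all_p))  # 4

# class_ matches tags that have the given class among their classes.
ptag_only = soup.find_all("p", class_="ptag")
print([p.get_text() for p in ptag_only])

# find() returns None when no tag matches.
missing = soup.find("table")
print(missing)  # None
```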

#children

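# Working with a tag's properties (children)

print('Viewing child elements with the children attribute')
print('Whitespace text nodes are included as well')
# enumerate() replaces the manual counter; numbering starts at 1.
for idx, child in enumerate(body_tag.children, start=1):
    print(f'Element {idx}: ', child)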
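Because children also yields the whitespace between tags, it is often convenient to keep only the actual tags. One possible way to do this, sketched with a shortened inline copy of fruits.html (an assumption made so the snippet runs on its own), is to filter with isinstance() or to call find_all() with recursive=False:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Shortened inline copy of fruits.html, used here only for illustration.
html = """<body>
<p class="ptag red">사과</p>
<p class="ptag yellow">참외</p>
<div id="container">
<p class="hard">과일</p>
</div>
</body>"""
soup = BeautifulSoup(html, "html.parser")
body_tag = soup.find("body")

# children yields text nodes (the newlines) as well as tags.
all_children = list(body_tag.children)

# Option 1: keep only the Tag objects.
tag_children = [c for c in body_tag.children if isinstance(c, Tag)]

# Option 2: find_all(True, recursive=False) matches every direct child tag.
direct_tags = body_tag.find_all(True, recursive=False)

print(len(all_children), len(tag_children))
print([c.name for c in tag_children])  # ['p', 'p', 'div']
print([c.name for c in direct_tags])   # ['p', 'p', 'div']
```

Both options skip the p tag inside the div, since it is a grandchild of body rather than a direct child.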

#parent - find_parent(), find_parents()

mydiv = soup.find("div")
print(mydiv)

print('What is the parent tag of the div tag?')
print(mydiv.parent)

mytag = soup.find("p", attrs={'class':'hard'})
print(mytag)

print('What is the parent tag of mytag?')
print(mytag.find_parent())

print('Names of all ancestor tags of mytag')
parents = mytag.find_parents()
for p in parents:
    print(p.name)
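The last name printed by the loop above is [document], which is the BeautifulSoup object itself sitting at the top of the tree. As a small sketch of what can be done with the ancestor list, assuming a shortened inline copy of fruits.html (the path variable is only an illustrative name), the snippet below looks up a specific ancestor by name and builds a path from html down to the tag:

```python
from bs4 import BeautifulSoup

# Shortened inline copy of fruits.html, used here only for illustration.
html = '<html><body><div id="container"><p class="hard">과일</p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")
mytag = soup.find("p", attrs={"class": "hard"})

# parents walks upward from the nearest parent to the document root.
names = [p.name for p in mytag.parents]
print(names)  # ['div', 'body', 'html', '[document]']

# find_parent() accepts a tag name and returns the nearest matching ancestor.
nearest_div = mytag.find_parent("div")
print(nearest_div["id"])  # container

# Drop '[document]', add the tag itself, and reverse to get a top-down path.
path = " > ".join(reversed([mytag.name] + names[:-1]))
print(path)  # html > body > div > p
```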