Scraping XML Element Attributes with BeautifulSoup

December 01, 2023

I have the following code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://api.stlouisfed.org/fred/...')
bsObj = BeautifulSoup(html.read(),

Solution 1:

Find all the attribute tags and just extract the attributes you want:

x = """<?xml version="1.0" encoding="utf-8" ?>
<html><body>
<observations count="276" file_type="xml" limit="100000" observation_end="9999-12-31" observation_start="1776-07-04" offset="0" order_by="observation_date" output_type="1" realtime_end="2016-06-22" realtime_start="2016-06-22" sort_order="asc" units="lin">
<observation date="1947-04-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="-0.4"></observation>
<observation date="1947-07-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="-0.4"></observation>
<observation date="1947-10-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="6.4"></observation>
<observation date="1948-01-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="6"></observation>
<observation date="1948-04-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="6.7"></observation>
<observation date="1948-07-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="2.3"></observation>
<observation date="1948-10-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="0.4"></observation>
<observation date="1949-01-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="-5.4"></observation>
<observation date="1949-04-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="-1.3"></observation>
<observation date="1949-07-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="4.5"></observation>
<observation date="1949-10-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="-3.5"></observation>
<observation date="1950-01-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="16.9"></observation>
<observation date="1950-04-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="12.7"></observation>
<observation date="1950-07-01" realtime_end="2016-06-22" realtime_start="2016-06-22" value="16.3"></observation>
</observations>
</body></html>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(x,"lxml")

for ob in soup.find_all("observation"):
    print(ob["date"])
    print(ob["value"])

Which will give you:

1947-04-01
-0.4
1947-07-01
-0.4
1947-10-01
6.4
1948-01-01
6
1948-04-01
6.7
1948-07-01
2.3
1948-10-01
0.4
1949-01-01
-5.4
1949-04-01
-1.3
1949-07-01
4.5
1949-10-01
-3.5
1950-01-01
16.9
1950-04-01
12.7
1950-07-01
16.3
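Indexing a tag with ob["value"] raises a KeyError when the attribute is missing. If that can happen in your data, a more tolerant loop might look like the sketch below. It assumes a shortened two-element sample (the missing value on the second element is made up for illustration) and uses the built-in "html.parser" backend rather than "lxml", so nothing beyond bs4 is required:

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical sample: the second observation has no "value" attribute.
sample = """<observations count="2">
<observation date="1947-04-01" value="-0.4"></observation>
<observation date="1947-07-01"></observation>
</observations>"""

soup = BeautifulSoup(sample, "html.parser")

rows = []
for ob in soup.find_all("observation"):
    # Tag.get() returns None (or the supplied default) instead of raising KeyError.
    rows.append((ob.get("date"), ob.get("value", "")))

print(rows)
```

Each tag also exposes all of its attributes as a dictionary through ob.attrs, which can help when you are not sure which attributes a response carries.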

To write to a csv:

from bs4 import BeautifulSoup
import csv

soup = BeautifulSoup(x, "lxml")
with open("out.csv", "w", newline="") as f:
    csv.writer(f).writerows((ob["date"], ob["value"])
                            for ob in soup.find_all("observation"))

Which gives you a csv file with:

1947-04-01,-0.4
1947-07-01,-0.4
1947-10-01,6.4
1948-01-01,6
1948-04-01,6.7
1948-07-01,2.3
1948-10-01,0.4
1949-01-01,-5.4
1949-04-01,-1.3
1949-07-01,4.5
1949-10-01,-3.5
1950-01-01,16.9
1950-04-01,12.7
1950-07-01,16.3
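The attribute values are strings, so the CSV holds text. If you want numbers when reading the file back, something along these lines could work. This is only a sketch: it assumes the two-column date,value layout with no header row, and it reads from an in-memory string (a stand-in for out.csv) so that it runs on its own:

```python
import csv
import io

# Stand-in for the contents of out.csv: the first three rows of the output.
csv_text = "1947-04-01,-0.4\n1947-07-01,-0.4\n1947-10-01,6.4\n"

# Each row is [date, value]; convert the value column to float.
observations = [
    (date, float(value))
    for date, value in csv.reader(io.StringIO(csv_text))
]

print(observations)
```

To read the real file, replace io.StringIO(csv_text) with a file opened via open("out.csv", newline="").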

Solution 2:

The above works great! If you want to read the XML from a URL instead of a local string, the code looks like this:

from bs4 import BeautifulSoup
from urllib.request import urlopen
html = urlopen("https://api.stlouisfed.org/fred/series/.......")
bsObj = BeautifulSoup(html.read(), "lxml")

for ob in bsObj.find_all("observation"):
    print(ob["date"])
    print(ob["value"])

And to write the CSV:

import csv
with open("out.csv", "w", newline="") as f:
    csv.writer(f).writerows((ob["date"], ob["value"])
                            for ob in bsObj.find_all("observation"))
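One way to tidy up the URL version is to keep the download and the parsing in separate functions, so the parsing can be checked without a network connection. The sketch below is an illustration only: parse_observations and fetch_observations are hypothetical helper names, and it uses the built-in "html.parser" backend instead of "lxml":

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def parse_observations(markup):
    """Return (date, value) pairs for every <observation> tag in the markup."""
    soup = BeautifulSoup(markup, "html.parser")
    return [(ob["date"], ob["value"]) for ob in soup.find_all("observation")]


def fetch_observations(url):
    """Download the document at url and parse it (hypothetical helper)."""
    # The with block closes the HTTP response once the body has been read.
    with urlopen(url) as response:
        return parse_observations(response.read())


# Offline check with a one-element sample taken from the data above.
sample = '<observations><observation date="1947-04-01" value="-0.4"></observation></observations>'
pairs = parse_observations(sample)
print(pairs)
```

With a complete request URL in hand, fetch_observations(url) would return the same kind of list, which can be passed straight to csv.writer(f).writerows(...).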
