
(web Scraping) I've Located The Proper Tags, Now How Do I Extract The Text?

I'm creating my first web scraping application that collects the titles of games currently on the 'new and trending' tab on https://store.steampowered.com/. Once I figure out how t

Solution 1:

Just read the documentation:

If you only want the text part of a document or tag, you can use the get_text() method. It returns all the text in a document or beneath a tag, as a single Unicode string.

So just do:

# Should be `title` IMO, because you are currently handling a single title
for titles in containers:
    print(titles.get_text())
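To try this without hitting the network, here is a minimal sketch that runs get_text() on a small inline snippet. The markup in SAMPLE_HTML is made up for illustration and only borrows the tab_item_name class from the question; it is not the real Steam page. The separator and strip arguments are optional and are used here to tidy whitespace around nested tags.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the Steam page (an assumption, not the real HTML).
SAMPLE_HTML = """
<div class="tab_item_name"> Game <b>One</b> </div>
<div class="tab_item_name">Game Two</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# get_text() gathers every string beneath the tag; strip=True trims each piece
# and drops empty ones, and the separator joins the remaining pieces.
titles = [
    div.get_text(" ", strip=True)
    for div in soup.find_all("div", class_="tab_item_name")
]

for title in titles:
    print(title)
```

Without the separator, the pieces of the first title would be joined with nothing between them once they are stripped, so passing " " is the safer default when a tag may contain nested markup.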

Solution 2:

Use titles.text or titles.get_text(), whichever you prefer, to get the title text, as shown below:

from urllib.request import urlopen
from bs4 import BeautifulSoup

my_url = 'https://store.steampowered.com/'
uClient = urlopen(my_url)
page_html = uClient.read()
uClient.close()

page_soup = BeautifulSoup(page_html, "html.parser")
containers = page_soup.find_all("div", {"class": "tab_item_name"}, limit=11)

for titles in containers:
    print(titles.text)
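As a small sanity check on the claim that the two spellings are interchangeable, the sketch below compares .text and get_text() on an inline snippet, and contrasts them with .string, which is easy to confuse with them. The markup is invented for illustration; only the class name comes from the code above.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet (an assumption): a title whose text is split by a nested tag.
html = '<div class="tab_item_name">Cube <b>World</b></div>'
div = BeautifulSoup(html, "html.parser").find("div", class_="tab_item_name")

# .text is a property that returns the same value as get_text() with no arguments.
same = div.text == div.get_text()
print(same)        # True
print(div.text)    # Cube World

# .string is different: it is None when the tag has more than one child.
print(div.string)  # None
```

So .text and get_text() are safe on tags with nested markup, while .string only helps when the tag wraps a single string.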

Solution 3:

Another convenient way is to use lxml:

import requests
import lxml.html

url = 'https://store.steampowered.com/'
# Make the request
response = requests.get(url=url, timeout=5)
# Parse tree
tree = lxml.html.fromstring(response.text)
# Select section corresponding to new games
sub_tree = tree.get_element_by_id('tab_newreleases_content')
# Extract data
games_list = [a.text_content() for a in sub_tree.find_class('tab_item_name')]

# Check
for game in games_list[:11]:
    print(game)
# Destiny 2: Shadowkeep
# Destiny 2
# Destiny 2: Forsaken
# Destiny 2: Shadowkeep Digital Deluxe Edition
# NGU IDLE
# Fernbus Simulator - MAN Lion's Intercity
# Euro Truck Simulator 2 - Pink Ribbon Charity Pack
# Spaceland
# Cube World
# CODE VEIN
# CODE VEIN
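The same lxml calls can be tried offline on a static snippet, which makes it easier to see why narrowing to one section by id matters. This is only a sketch: SAMPLE_HTML and the tab_topsellers_content id are made up for illustration, while tab_newreleases_content and tab_item_name are taken from the code above.

```python
import lxml.html

# Hypothetical markup (an assumption): two tabs that share the same item class.
SAMPLE_HTML = """
<html><body>
<div id="tab_newreleases_content">
  <div class="tab_item_name">Cube World</div>
  <div class="tab_item_name">CODE <i>VEIN</i></div>
</div>
<div id="tab_topsellers_content">
  <div class="tab_item_name">Destiny 2</div>
</div>
</body></html>
"""

tree = lxml.html.fromstring(SAMPLE_HTML)

# Restrict the search to one section so items from other tabs are not picked up.
sub_tree = tree.get_element_by_id("tab_newreleases_content")

# text_content() returns the text of the element and all of its descendants.
games = [el.text_content().strip() for el in sub_tree.find_class("tab_item_name")]

for game in games:
    print(game)
```

Calling find_class on the whole tree instead of sub_tree would also return the entry from the other tab, which is the kind of mix-up the id lookup avoids.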
