Learn how to use web scraping in Python to download audio from Audible product pages to your local machine. With the requests, BeautifulSoup, and lxml libraries, you can fetch a page, parse it, and save the MP3 file it links to.
Before getting started, ensure you have the libraries installed. You can easily install them using pip:
pip install requests beautifulsoup4 lxml
Begin by importing the necessary libraries into your Python script:
import requests
from bs4 import BeautifulSoup
from lxml import etree
requests: Send HTTP requests to the web page.
BeautifulSoup: Parse HTML and XML documents.
etree from lxml: Powerful methods for parsing and creating XML and HTML.
Define a function called download_mp3 that accepts a URL and a filename as parameters. This function is responsible for downloading content from the specified URL and saving it as an MP3 file locally.
def download_mp3(url, filename):
    response = requests.get(url)
    if response.status_code == 200:
        with open(filename, 'wb') as file:
            file.write(response.content)
        print('MP3 file downloaded successfully!')
        return True
    else:
        print(f'Failed to download the file. Status code: {response.status_code}')
        return False
requests.get(url): Send a GET request to the URL.
response.status_code == 200: Verify the success of the request.
open(filename, 'wb') as file: Save the content to a file in write-binary mode.
Next, define the main function download_audible_audio to handle the process of fetching Audible audio content.
def download_audible_audio(audible_url):
    # Fetch the product page
    response = requests.get(audible_url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
    else:
        print(f'Failed to retrieve the web page. Status code: {response.status_code}')
        return
    # Re-parse the page with lxml so it can be queried with XPath
    html_str = str(soup)
    tree = etree.HTML(html_str)
    name_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/h1')
    author_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/div[1]/span/text()[2]')
    audio_url_element = tree.xpath('//button[contains(@class, "bc-button-text") and @data-mp3]')
    # These XPaths depend on the page layout; stop if any of them matches nothing
    if not (name_element and author_element and audio_url_element):
        print(f'Could not find the title, author, or audio link on {audible_url}')
        return
    book_name = name_element[0].text
    print(book_name)
    # Strip spaces and newlines from the author text
    author_name = author_element[0].replace(" ", "").replace("\n", "")
    print(author_name)
    # The MP3 address is stored in the button's data-mp3 attribute
    audio_url = audio_url_element[0].get('data-mp3')
    print(audio_url)
    downloaded = download_mp3(audio_url, f"{book_name}-{author_name}.mp3")
    if not downloaded:
        print(f"Failed to Download mp3 for {book_name}-{author_name}")
requests.get(audible_url): Issue a GET request to the Audible URL.
BeautifulSoup(response.content, 'html.parser'): Parse the HTML content.
etree.HTML(html_str): Convert the HTML content to an etree object.
tree.xpath(xpath_expression): Extract specific elements from the HTML using XPath. An XPath that matches nothing returns an empty list, so the function checks the results before indexing into them.
Finally, create a list of Audible URLs and iterate through each URL to download the corresponding audio content.
audible_url_lst = [
    "https://www.audible.in/pd/Rich-Dad-Poor-Dad-Audiobook/B079P9PGJB?eac_link=VA4vk4MPcF8x&ref=web_search_eac_asin_1&eac_selected_type=asin&eac_selected=B079P9PGJB&qid=uc6FUZOdpw&eac_id=262-9512741-3990527_uc6FUZOdpw&sr=1-1",
    "https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT?qid=1716737797&sr=1-1&ref_pageloadid=not_applicable&ref=a_search_c3_lProduct_1_1&pf_rd_p=2d02bc98-4366-4f94-99d9-5e898cda0766&pf_rd_r=P4GD86A8W9B1J66ZT3G6&pageLoadId=X7fWZnsLH07y3RYj&creativeId=b2592cc9-1111-40d9-9474-98f67c8075cc"
]
for audible_url in audible_url_lst:
    download_audible_audio(audible_url)
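Two optional refinements to the script above can be sketched with the standard library alone. The helper names clean_url and safe_filename are hypothetical and not part of the original script. The first drops the query string from a product URL, on the assumption (not verified here) that those parameters are search-tracking data the page does not need. The second replaces characters that are commonly invalid in file names, since the scraped book title goes straight into the output path.

```python
import re
from urllib.parse import urlsplit


def clean_url(url):
    """Return the URL without its query string or fragment (hypothetical helper)."""
    return urlsplit(url)._replace(query="", fragment="").geturl()


def safe_filename(name, replacement="_"):
    """Make scraped text usable as a file name (hypothetical helper)."""
    # Collapse runs of whitespace (including newlines) and trim the ends
    name = " ".join(name.split())
    # Replace path separators, Windows-reserved punctuation, and control characters
    return re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, name)


if __name__ == "__main__":
    print(clean_url("https://www.audible.in/pd/Rich-Dad-Poor-Dad-Audiobook/B079P9PGJB?sr=1-1"))
    print(safe_filename("Rich Dad Poor Dad: What the Rich Teach"))
```

In the script, these could be applied by calling download_audible_audio(clean_url(audible_url)) in the loop and by building the output name as safe_filename(f"{book_name}-{author_name}") + ".mp3".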
Use this script to download audio from Audible product pages in MP3 format. It scrapes each page for the book title, author, and audio link, then saves the file locally. Please adhere to Audible’s terms of service when using it.