Learn how to use web scraping with Python to download audio content from Audible to your local device. With the requests, BeautifulSoup, and lxml libraries, you can fetch and parse web pages, then extract and save MP3 files.
Before getting started, ensure you have the libraries installed. You can easily install them using pip:
pip install requests beautifulsoup4 lxml
Begin by importing the necessary libraries into your Python script:
import requests
from bs4 import BeautifulSoup
from lxml import etree
requests: Send HTTP requests to the web page.
BeautifulSoup: Parse HTML and XML documents.
etree from lxml: Powerful methods for parsing and creating XML and HTML.
Define a function called download_mp3 that accepts a URL and a filename as parameters. This function is responsible for downloading content from the specified URL and saving it as an MP3 file locally.
def download_mp3(url, filename):
    response = requests.get(url)
    if response.status_code == 200:
        with open(filename, 'wb') as file:
            file.write(response.content)
        print('MP3 file downloaded successfully!')
        return True
    else:
        print(f'Failed to download the file. Status code: {response.status_code}')
        return False
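The function above reads the whole response body into memory before writing it to disk. For larger files, a streamed variant may be preferable. The following is a minimal sketch rather than the original author's method: the name download_mp3_streamed, the 30-second timeout, and the 8 KB chunk size are assumptions chosen for illustration.

```python
import requests


def download_mp3_streamed(url, filename, chunk_size=8192, timeout=30):
    """Hypothetical variant of download_mp3 that writes the body in chunks."""
    try:
        # stream=True defers downloading the body until it is iterated.
        with requests.get(url, stream=True, timeout=timeout) as response:
            # Raise requests.HTTPError for 4xx/5xx status codes.
            response.raise_for_status()
            with open(filename, 'wb') as file:
                for chunk in response.iter_content(chunk_size=chunk_size):
                    file.write(chunk)
    except requests.RequestException as error:
        # Covers connection errors, timeouts, invalid URLs, and HTTP errors.
        print(f'Failed to download the file: {error}')
        return False
    print('MP3 file downloaded successfully!')
    return True
```

Like download_mp3, this sketch returns True or False, so it could be swapped in without changing the calling code. If the connection drops midway, a partial file may remain on disk.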
requests.get(url): Send a GET request to the URL.
response.status_code == 200: Verify the success of the request.
open(filename, 'wb') as file: Save the content to a file in write-binary mode.
Next, define the main function download_audible_audio to handle the process of fetching Audible audio content.
def download_audible_audio(audible_url):
    # Fetch the Audible product page.
    response = requests.get(audible_url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
    else:
        print(f'Failed to retrieve the web page. Status code: {response.status_code}')
        return

    # Convert the parsed page into an lxml tree so it can be queried with XPath.
    html_str = str(soup)
    tree = etree.HTML(html_str)

    # Book title.
    name_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/h1')
    book_name = name_element[0].text
    print(book_name)

    # Author name, with spaces and newlines removed.
    author_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/div[1]/span/text()[2]')
    author_name = author_element[0].replace(" ", "").replace("\n", "")
    print(author_name)

    # The MP3 URL is stored in the button's data-mp3 attribute.
    audio_url_element = tree.xpath('//button[contains(@class, "bc-button-text") and @data-mp3]')
    audio_url = audio_url_element[0].get('data-mp3')
    print(audio_url)

    response = download_mp3(audio_url, f"{book_name}-{author_name}.mp3")
    if not response:
        print(f"Failed to download MP3 for {book_name}-{author_name}")
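Two details in the function above may be worth guarding against. Indexing with [0] raises an IndexError when an XPath query matches nothing, for example after a page layout change, and the book title goes into the filename without any check for characters that file systems reject. The helpers below are a minimal sketch; the names first_or_none, clean_text, and safe_filename are hypothetical and not part of the original script. Unlike the original, clean_text keeps single spaces between words instead of removing every space.

```python
import re


def first_or_none(results):
    """Return the first item of an XPath result list, or None if it is empty."""
    return results[0] if results else None


def clean_text(text):
    """Collapse runs of whitespace (spaces, newlines, tabs) into single spaces."""
    return ' '.join(text.split())


def safe_filename(book_name, author_name, extension='.mp3'):
    """Build '<book>-<author><extension>', replacing path separators and
    other characters that are invalid in Windows filenames with '_'."""
    stem = f'{clean_text(book_name)}-{clean_text(author_name)}'
    return re.sub(r'[\\/:*?"<>|]', '_', stem) + extension
```

Inside download_audible_audio, a result such as first_or_none(name_element) could be compared against None before continuing, and safe_filename(book_name, author_name) could replace the f-string passed to download_mp3.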
requests.get(audible_url): Issue a GET request to the Audible URL.
BeautifulSoup(response.content, 'html.parser'): Parse the HTML content.
etree.HTML(html_str): Convert the HTML content to an etree object.
tree.xpath(xpath_expression): Extract specific elements from the HTML using XPath.
Finally, create a list of Audible URLs and iterate through each URL to download the corresponding audio content.
audible_url_lst = [
"https://www.audible.in/pd/Rich-Dad-Poor-Dad-Audiobook/B079P9PGJB?eac_link=VA4vk4MPcF8x&ref=web_search_eac_asin_1&eac_selected_type=asin&eac_selected=B079P9PGJB&qid=uc6FUZOdpw&eac_id=262-9512741-3990527_uc6FUZOdpw&sr=1-1",
"https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT?qid=1716737797&sr=1-1&ref_pageloadid=not_applicable&ref=a_search_c3_lProduct_1_1&pf_rd_p=2d02bc98-4366-4f94-99d9-5e898cda0766&pf_rd_r=P4GD86A8W9B1J66ZT3G6&pageLoadId=X7fWZnsLH07y3RYj&creativeId=b2592cc9-1111-40d9-9474-98f67c8075cc"
]
for audible_url in audible_url_lst:
    download_audible_audio(audible_url)
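The URLs in the list carry long search-tracking query strings. Assuming the product page loads from the path alone (an assumption that is not verified here), a small helper could shorten them before the loop runs. This is a sketch using the standard library's urllib.parse; the names strip_query and unique_clean_urls are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL with its query string and fragment removed."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


def unique_clean_urls(urls):
    """Strip query strings and drop duplicates while preserving order."""
    return list(dict.fromkeys(strip_query(url) for url in urls))
```

The loop could then iterate over unique_clean_urls(audible_url_lst), which also avoids fetching the same book twice when it appears under different search links.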
Use this script to download audio content from Audible in MP3 format. It scrapes each product page for the title, author, and audio URL, then saves the audio file locally. Please adhere to Audible’s terms of service when using this tool.