Learn how to use web scraping to download audio content from Audible to your local device with Python. Using libraries such as requests, BeautifulSoup, and lxml, you can fetch and parse web pages, then extract and save MP3 files.
Necessary Libraries for Web Scraping
Before getting started, make sure the required libraries are installed. You can install them with pip:
pip install requests beautifulsoup4 lxml
Importing Essential Libraries
Begin by importing the necessary libraries into your Python script:
import requests
from bs4 import BeautifulSoup
from lxml import etree
requests: Send HTTP requests to the web page.
BeautifulSoup: Parse HTML and XML documents.
etree from lxml: Powerful methods for parsing and creating XML and HTML.
Custom Function for MP3 Download
Define a function called download_mp3 that accepts a URL and a filename as parameters. This function is responsible for downloading content from the specified URL and saving it as an MP3 file locally.
def download_mp3(url, filename):
    # Request the audio file
    response = requests.get(url)
    if response.status_code == 200:
        # Write the raw bytes of the response body to disk
        with open(filename, 'wb') as file:
            file.write(response.content)
        print('MP3 file downloaded successfully!')
        return True
    else:
        print(f'Failed to download the file. Status code: {response.status_code}')
        return False
requests.get(url): Send a GET request to the URL.
response.status_code == 200: Verify the success of the request.
open(filename, 'wb') as file: Save the content to a file in write-binary mode.
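Because response.content holds the whole response body in memory before it is written, large audio files may be easier to handle in chunks. The following is a minimal sketch of a streaming variant, not part of the original script: the function name download_mp3_streamed and the chunk_size and timeout defaults are assumptions chosen for illustration.

```python
import requests


def download_mp3_streamed(url, filename, chunk_size=8192, timeout=30):
    """Hypothetical streaming variant of download_mp3 (illustration only)."""
    # stream=True defers fetching the body until it is iterated over;
    # timeout (an assumed value) stops the call from waiting indefinitely
    with requests.get(url, stream=True, timeout=timeout) as response:
        if response.status_code != 200:
            print(f'Failed to download the file. Status code: {response.status_code}')
            return False
        with open(filename, 'wb') as file:
            # Write the body piece by piece instead of loading it all at once
            for chunk in response.iter_content(chunk_size=chunk_size):
                file.write(chunk)
    print('MP3 file downloaded successfully!')
    return True
```

It takes the same url and filename arguments as download_mp3 and returns True or False in the same way, so it could be swapped in without changing the calling code.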
Extracting Audible Audio Using Web Scraping
Next, define the main function download_audible_audio to handle the process of fetching Audible audio content.
def download_audible_audio(audible_url):
    # Fetch the Audible product page
    response = requests.get(audible_url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html.parser')
    else:
        print(f'Failed to retrieve the web page. Status code: {response.status_code}')
        return

    # Re-parse the HTML with lxml so that XPath queries can be used
    html_str = str(soup)
    tree = etree.HTML(html_str)

    # Book title, author text, and the button that carries the MP3 URL
    name_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/h1')
    author_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/div[1]/span/text()[2]')
    audio_url_element = tree.xpath('//button[contains(@class, "bc-button-text") and @data-mp3]')

    # xpath() returns an empty list when nothing matches, so check before indexing
    if not (name_element and author_element and audio_url_element):
        print(f'Could not find the expected elements on {audible_url}')
        return

    book_name = name_element[0].text
    print(book_name)
    author_name = author_element[0].replace(" ", "").replace("\n", "")
    print(author_name)
    audio_url = audio_url_element[0].get('data-mp3')
    print(audio_url)

    # Save the audio under a name built from the title and author
    success = download_mp3(audio_url, f"{book_name}-{author_name}.mp3")
    if not success:
        print(f"Failed to Download mp3 for {book_name}-{author_name}")
requests.get(audible_url): Issue a GET request to the Audible URL.
BeautifulSoup(response.content, 'html.parser'): Parse the HTML content.
etree.HTML(html_str): Convert the HTML content to an etree object.
tree.xpath(xpath_expression): Extract specific elements from the HTML using XPath.
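Since tree.xpath returns a list, an expression that matches nothing yields an empty list, and indexing it with [0] raises an IndexError. The sketch below shows one way to guard that lookup. It runs against a small inline HTML snippet rather than a real Audible page; the helper name first_match and the sample markup are hypothetical and serve only as illustration.

```python
from lxml import etree

# Hypothetical sample markup for illustration; not copied from an Audible page
SAMPLE_HTML = """
<html><body>
  <h1>Example Title</h1>
  <button class="bc-button-text" data-mp3="https://example.com/sample.mp3">Sample</button>
</body></html>
"""


def first_match(tree, xpath_expression):
    """Return the first XPath result, or None if nothing matched."""
    results = tree.xpath(xpath_expression)
    return results[0] if results else None


tree = etree.HTML(SAMPLE_HTML)
title = first_match(tree, '//h1')
button = first_match(tree, '//button[contains(@class, "bc-button-text") and @data-mp3]')
missing = first_match(tree, '//h2')  # no <h2> in the sample, so this is None

print(title.text)               # Example Title
print(button.get('data-mp3'))   # https://example.com/sample.mp3
print(missing)                  # None
```

Checking for None before reading .text or calling .get lets the script report a layout change on the page instead of stopping with an exception.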
Automating Download for Multiple Audiobooks
Finally, create a list of Audible URLs and iterate through each URL to download the corresponding audio content.
audible_url_lst = [
    "https://www.audible.in/pd/Rich-Dad-Poor-Dad-Audiobook/B079P9PGJB?eac_link=VA4vk4MPcF8x&ref=web_search_eac_asin_1&eac_selected_type=asin&eac_selected=B079P9PGJB&qid=uc6FUZOdpw&eac_id=262-9512741-3990527_uc6FUZOdpw&sr=1-1",
    "https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT?qid=1716737797&sr=1-1&ref_pageloadid=not_applicable&ref=a_search_c3_lProduct_1_1&pf_rd_p=2d02bc98-4366-4f94-99d9-5e898cda0766&pf_rd_r=P4GD86A8W9B1J66ZT3G6&pageLoadId=X7fWZnsLH07y3RYj&creativeId=b2592cc9-1111-40d9-9474-98f67c8075cc"
]
for audible_url in audible_url_lst:
    download_audible_audio(audible_url)
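The URLs in the list above carry long query strings copied from search results. As a minimal sketch, the standard library's urllib.parse can reduce a URL to its scheme, host, and path. This assumes Audible serves the same product page without those parameters, which the original script does not confirm; the helper name strip_query is hypothetical, and the example URL is a shortened form of the second entry.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Return the URL without its query string or fragment."""
    parts = urlsplit(url)
    # Keep scheme, host, and path; drop the query and fragment components
    return urlunsplit((parts.scheme, parts.netloc, parts.path, '', ''))


# Shortened form of one of the URLs from the list, for illustration
url = "https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT?qid=1716737797&sr=1-1"
print(strip_query(url))
# https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT
```

If the shorter URLs do load the same pages, the list becomes easier to read and maintain; if not, the original URLs can be kept unchanged.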
Use this script to download audio from Audible in MP3 format. It relies on web scraping to gather the title, author, and audio URL from each page, then saves the audio files locally. Please adhere to Audible’s terms of service when using this tool.