PYTOC

Web Scraping to Download MP3s from Audible Using Python

Learn how to use web scraping to download audio from Audible to your local machine with Python. With the requests, BeautifulSoup, and lxml libraries, you can fetch a product page, parse it, and save the MP3 file that the page's HTML links to.

Necessary Libraries for Web Scraping

Before you start, install the required libraries with pip:

pip install requests beautifulsoup4 lxml

Importing Essential Libraries

Begin by importing the necessary libraries into your Python script:

import requests
from bs4 import BeautifulSoup
from lxml import etree

  • requests: Sends HTTP requests and returns the responses.
  • BeautifulSoup: Parses HTML and XML documents.
  • etree from lxml: Parses HTML and XML into a tree that supports XPath queries.

Custom Function for MP3 Download

Define a function called download_mp3 that accepts a URL and a filename as parameters. This function is responsible for downloading content from the specified URL and saving it as an MP3 file locally.

def download_mp3(url, filename):
    # A timeout keeps the script from hanging on an unresponsive server.
    response = requests.get(url, timeout=30)
    if response.status_code == 200:
        # Write the raw bytes of the response body to disk.
        with open(filename, 'wb') as file:
            file.write(response.content)
        print('MP3 file downloaded successfully!')
        return True
    else:
        print(f'Failed to download the file. Status code: {response.status_code}')
        return False

  • requests.get(url, timeout=30): Send a GET request to the URL and give up if the server does not respond within 30 seconds.
  • response.status_code == 200: Verify the success of the request.
  • open(filename, 'wb') as file: Save the content to a file in write-binary mode.
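
The function above holds the whole response in memory through response.content before writing it. For larger files, one possible alternative is to write the body in chunks. The sketch below uses a hypothetical helper, save_chunks, which is not part of the original script. It takes any iterable of byte chunks and writes it to a temporary file first, so an interrupted download does not leave a truncated MP3 behind. It relies only on the standard library.

```python
import os
import tempfile


def save_chunks(chunks, filename):
    """Write an iterable of byte chunks to `filename` and return the bytes written.

    Hypothetical helper: data goes to a temporary ".part" file in the same
    directory and is renamed only after every chunk has been written.
    """
    directory = os.path.dirname(os.path.abspath(filename))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    written = 0
    try:
        with os.fdopen(fd, "wb") as file:
            for chunk in chunks:
                if chunk:  # skip empty chunks
                    file.write(chunk)
                    written += len(chunk)
        # Move the finished file into place in a single step.
        os.replace(tmp_path, filename)
    except BaseException:
        # Remove the partial file, then let the error propagate.
        os.remove(tmp_path)
        raise
    return written
```

With requests, this could be called as save_chunks(response.iter_content(chunk_size=8192), filename) on a response obtained from requests.get(url, stream=True, timeout=30).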

Extracting Audible Audio Using Web Scraping

Next, define the main function download_audible_audio to handle the process of fetching Audible audio content.

def download_audible_audio(audible_url):
    response = requests.get(audible_url, timeout=30)
    if response.status_code != 200:
        print(f'Failed to retrieve the web page. Status code: {response.status_code}')
        return

    # Parse the markup with BeautifulSoup, then hand it to lxml for XPath queries.
    soup = BeautifulSoup(response.content, 'html.parser')
    tree = etree.HTML(str(soup))

    name_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/h1')
    author_element = tree.xpath('//*[@id="adbl-primary-button-area"]/div/div/div/div[1]/div/div/div[2]/div[1]/span/text()[2]')
    audio_url_element = tree.xpath('//button[contains(@class, "bc-button-text") and @data-mp3]')

    # xpath() returns an empty list when nothing matches, e.g. if the page layout changes.
    if not (name_element and author_element and audio_url_element):
        print(f'Could not find the expected elements on {audible_url}')
        return

    book_name = (name_element[0].text or '').strip()
    print(book_name)

    # Collapse newlines and repeated spaces while keeping the spaces inside the name.
    author_name = ' '.join(author_element[0].split())
    print(author_name)

    audio_url = audio_url_element[0].get('data-mp3')
    print(audio_url)

    downloaded = download_mp3(audio_url, f"{book_name}-{author_name}.mp3")
    if not downloaded:
        print(f"Failed to Download mp3 for {book_name}-{author_name}")

  • requests.get(audible_url, timeout=30): Issue a GET request to the Audible URL.
  • BeautifulSoup(response.content, 'html.parser'): Parse the HTML content.
  • etree.HTML(str(soup)): Convert the parsed HTML into an lxml tree.
  • tree.xpath(xpath_expression): Extract specific elements from the HTML using XPath. The result is a list, which is empty when nothing matches.
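
The output file name is built directly from book_name and author_name, and scraped titles can contain characters such as ":" or "/" that are not valid in file names on every operating system. A minimal sketch of one way to handle this follows. The helper safe_filename and its extension parameter are hypothetical names introduced here, not part of the original script.

```python
import re


def safe_filename(book_name, author_name, extension=".mp3"):
    """Build a file name from a title and an author (hypothetical helper)."""
    raw = f"{book_name}-{author_name}"
    # Collapse newlines and repeated whitespace into single spaces.
    cleaned = " ".join(raw.split())
    # Replace path separators and characters that Windows rejects.
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", cleaned)
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.strip(" .")
    return cleaned + extension


print(safe_filename("Some Title: A Subtitle", "Jane  Doe\n"))
# prints: Some Title_ A Subtitle-Jane Doe.mp3
```

The result could be passed to download_mp3 in place of the f-string used above.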

Automating Download for Multiple Audiobooks

Finally, create a list of Audible URLs and iterate through each URL to download the corresponding audio content.

audible_url_lst = [
    "https://www.audible.in/pd/Rich-Dad-Poor-Dad-Audiobook/B079P9PGJB?eac_link=VA4vk4MPcF8x&ref=web_search_eac_asin_1&eac_selected_type=asin&eac_selected=B079P9PGJB&qid=uc6FUZOdpw&eac_id=262-9512741-3990527_uc6FUZOdpw&sr=1-1",
    "https://www.audible.in/pd/The-Psychology-of-Money-Audiobook/B08D9WJCBT?qid=1716737797&sr=1-1&ref_pageloadid=not_applicable&ref=a_search_c3_lProduct_1_1&pf_rd_p=2d02bc98-4366-4f94-99d9-5e898cda0766&pf_rd_r=P4GD86A8W9B1J66ZT3G6&pageLoadId=X7fWZnsLH07y3RYj&creativeId=b2592cc9-1111-40d9-9474-98f67c8075cc"
]

for audible_url in audible_url_lst:
    download_audible_audio(audible_url)
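
The URLs in the list carry long query strings that appear to be search and tracking parameters. The sketch below shows one way to trim them and drop duplicates before looping, using only the standard library. It assumes the product page still resolves without those parameters, which is not verified here. The names strip_tracking and unique_clean_urls are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_tracking(url):
    """Return `url` without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(query="", fragment=""))


def unique_clean_urls(urls):
    """Clean each URL and drop duplicates while keeping the original order."""
    return list(dict.fromkeys(strip_tracking(u) for u in urls))
```

The loop above could then iterate over unique_clean_urls(audible_url_lst), optionally pausing between requests with time.sleep to avoid sending them in rapid succession.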

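Run the script to save one MP3 file for each Audible URL in the list. It scrapes the book title, author, and MP3 link from the product page and writes the audio to a local file named after the title and author. The MP3 link comes from the data-mp3 attribute of a button on the page, so the file saved is whatever that button plays, which is typically a sample clip rather than the full audiobook. Please adhere to Audible’s terms of service when using this script.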
