How to deploy a Python web scraper with Selenium on Heroku


Web scraper example

from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://medium.com")
print(driver.page_source)
driver.quit()
print("Finished!")

Set up our code for Heroku

Add some arguments to ChromeOptions

from selenium import webdriver
import os
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
  • The --headless argument runs Chrome without opening a browser window, so the web scraper works in the background. Heroku itself also requires it, unless this has changed.
  • The sandbox is an additional Chrome feature that isn't included on the Linux box that Heroku spins up for you. Therefore, we run Chrome without it by passing --no-sandbox.
  • /dev/shm is an implementation of the traditional shared memory concept.
    The shared memory space is typically too small for Chrome and will cause Chrome to crash when rendering large pages.
    In the past, the size of the shared memory had to be increased.
    Since Chrome version 65, this is no longer necessary. Instead, launch the browser with the --disable-dev-shm-usage flag.
    This writes shared memory files to /tmp instead of /dev/shm.
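
The three flags described above can be kept in one place so that the reason for each stays next to it. The following is a minimal sketch, not the article's own code: the names HEROKU_CHROME_FLAGS and apply_heroku_flags are hypothetical, and the function accepts any object that has an add_argument method, as webdriver.ChromeOptions() does.

```python
# Flags discussed above, each paired with the reason it is needed.
HEROKU_CHROME_FLAGS = {
    "--headless": "run Chrome without opening a browser window",
    "--no-sandbox": "the sandbox is not available on the Heroku Linux box",
    "--disable-dev-shm-usage": "write shared memory files to /tmp instead of /dev/shm",
}


def apply_heroku_flags(options):
    """Add every flag to `options` and return it.

    `options` is assumed to expose add_argument(str), as
    selenium's webdriver.ChromeOptions does.
    """
    for flag in HEROKU_CHROME_FLAGS:
        options.add_argument(flag)
    return options
```

With Selenium installed, chrome_options = apply_heroku_flags(webdriver.ChromeOptions()) would give the same options as the three add_argument calls.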

Storing the paths in Heroku’s environment variables

chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
from selenium import webdriver
import os
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
driver.get("https://medium.com")
print(driver.page_source)
driver.quit()
print("Finished!")
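
The executable_path and chrome_options keyword arguments used above belong to older Selenium releases. They were deprecated in Selenium 4 and removed in later 4.x versions, so a fresh pip install selenium may reject them. A variant along the following lines may then be needed. This is a sketch under that assumption, not the author's code: build_driver and main are hypothetical names, and Selenium is imported inside the function so the file can be loaded without it.

```python
import os


def build_driver(env=os.environ):
    """Create a headless Chrome driver from the two Heroku variables.

    Assumes Selenium 4, where the driver path is passed through a
    Service object and the options through the `options` keyword.
    """
    # Imported lazily so this file can be imported without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    chrome_options = webdriver.ChromeOptions()
    for flag in ("--headless", "--disable-dev-shm-usage", "--no-sandbox"):
        chrome_options.add_argument(flag)

    # Only override the binary location when the variable is actually set.
    binary = env.get("GOOGLE_CHROME_BIN")
    if binary:
        chrome_options.binary_location = binary

    service = Service(executable_path=env.get("CHROMEDRIVER_PATH"))
    return webdriver.Chrome(service=service, options=chrome_options)


def main():
    driver = build_driver()
    try:
        driver.get("https://medium.com")
        print(driver.page_source)
    finally:
        # Always release the browser, even if the page load fails.
        driver.quit()
    print("Finished!")
```

Calling main() at the bottom of main.py would reproduce the script above. The try/finally block is a design choice so that the Chrome process is closed even when the request raises.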

Preparation for deployment

Virtual environment

C:\Users\Romik>cd Desktop
C:\Users\Romi\Desktop>cd myproject
C:\Users\Romi\Desktop\myproject>
C:\Users\Romi\Desktop\myproject>pip install virtualenv
C:\Users\Romi\Desktop\myproject>virtualenv env
C:\Users\Romi\Desktop\myproject>env\Scripts\activate
(env) C:\Users\Romi\Desktop\myproject>

Requirements and installation of the modules

(env) C:\Users\Romi\Desktop\myproject> pip install selenium
(env) C:\Users\Romi\Desktop\myproject> pip freeze > requirements.txt
(env) C:\Users\Romi\Desktop\myproject> echo worker: python main.py > Procfile
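
The echo redirect above creates a one-line Procfile that declares a worker process running main.py. If the shell writes the file in an unexpected encoding or adds stray characters, the same file can be written from Python instead. A minimal sketch; write_procfile is a hypothetical helper, not part of Heroku's tooling.

```python
from pathlib import Path


def write_procfile(directory=".", command="python main.py"):
    """Write a Procfile with a single worker process and return its path."""
    procfile = Path(directory) / "Procfile"
    # Plain UTF-8 text with a Unix newline; the file has no extension.
    with open(procfile, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(f"worker: {command}\n")
    return procfile
```

Running write_procfile() from the project folder would produce a file containing worker: python main.py.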

Deploying to Heroku

First things first

[Screenshots: Heroku create-app page (name the app and choose a region); Heroku Personal view]

Environment variables

chrome_options.binary_location = os.environ.get("GOOGLE_CHROME_BIN")
driver = webdriver.Chrome(executable_path=os.environ.get("CHROMEDRIVER_PATH"), chrome_options=chrome_options)
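
Because os.environ.get returns None for a variable that is not set, a missing config var only shows up later as a less obvious Selenium error. A small check at startup can make the failure explicit. This is a sketch that assumes only the two variable names used above; require_env is a hypothetical helper.

```python
import os

REQUIRED_VARS = ("GOOGLE_CHROME_BIN", "CHROMEDRIVER_PATH")


def require_env(names=REQUIRED_VARS, env=os.environ):
    """Return {name: value} for every name, or raise if any is unset or empty."""
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in names}
```

Calling require_env() before creating the driver would stop the worker with a readable message if either variable was not added in the Heroku settings.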

Buildpacks

[Screenshots: adding the Python buildpack, then a second and a third buildpack]

Ready for deployment

[Screenshot: Deploy Heroku]

Last step

Thank you for reading

Romik Kelesh

Machine Learning | Full Stack | Computer Scientist | Economist