/src$ make

Installing Selenium (Python) and Chromedriver on Ubuntu To Scrape Webpages

3/5/2018

 

Introduction

Often, we need to scrape data from webpages or programmatically do something in our web browser. This can be tricky, because not all pages are static HTML: JavaScript may need to run before the content we want appears, or we may need to submit a form (which may include logging in to a website with an account) to reach the page we need.

To handle these situations, we're going to use Selenium. (Selenium has bindings for several programming languages, but we're going to use the Python version.) Selenium will help us by simulating web browser actions, such as clicking certain HTML elements. Selenium needs a real browser to drive, so we're going to use Chrome through Chromedriver.

Chromedriver (and Chrome) Installation

So we want to simulate web browser actions (specifically, submitting forms and letting JavaScript on pages run, if we need to). To do that, we'll need to actually drive a web browser. In our case, we're going to use Chrome, with the aid of Chrome's webdriver. From the documentation, "WebDriver is an open source tool for automated testing of webapps across many browsers. It provides capabilities for navigating to web pages, user input, JavaScript execution, and more."
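We need to install chromedriver and put it in the proper location (somewhere on our PATH), so that any time we start it from our Python code, Selenium knows where to find it. Thanks to this article, we can install chromedriver on Ubuntu using a few easy commands in the terminal. Note that the chromedriver version in the download URL (2.26 here) needs to be compatible with the version of Chrome you have installed, so adjust it if the two don't match.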
wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver

sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
This should install chromedriver and place it in the proper folder (as well as create some symbolic links). Again, full credit to christopher.su for the commands.
P.S. Chromedriver only drives Chrome and doesn't include the browser itself, so you need Chrome installed too. If you don't already have Chrome on your Ubuntu machine, use the following commands (which I found here):
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt-get update 
sudo apt-get install google-chrome-stable
The above commands should install Chrome (if you don't already have it).
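Before moving on, it can help to confirm that both programs are actually reachable from the terminal. The following is a minimal sketch, not part of the original commands: it assumes the binaries are named google-chrome and chromedriver (which is what the installs above should produce), and the function names are made up for illustration.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the output of `<name> --version`, or None if <name> is not on PATH."""
    path = shutil.which(name)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Most tools print the version to stdout; fall back to stderr just in case.
    return result.stdout.strip() or result.stderr.strip()


def report(names=("google-chrome", "chromedriver")):
    """Build one status line per tool name."""
    lines = []
    for name in names:
        version = tool_version(name)
        status = version if version is not None else "NOT FOUND on PATH"
        lines.append(f"{name}: {status}")
    return lines


if __name__ == "__main__":
    print("\n".join(report()))
```

Saved as, say, check_tools.py and run with python3 check_tools.py, this should print one line per tool. If either shows NOT FOUND, or the two versions look far apart, revisit the steps above.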

Selenium Installation

Our web browser is set up, so now we're going to actually create a project. I find that it's best practice to use a Python virtual environment to handle our libraries, so that's what we're going to use. If you don't know how to use a Python virtual environment (or if you don't have Python installed yet), then see this article quickly and get it all installed.

In your project folder, run the following commands to create our main.py file (where we'll write our Python code) and to install Selenium into the project's virtual environment using pipenv.
touch main.py
pipenv install selenium
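To double check that Selenium really landed in the project's virtual environment, a small script along these lines may be useful. It is only a sketch: the helper name is hypothetical, and it assumes Python 3.8 or newer (for importlib.metadata).

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("selenium")
    if version is None:
        print("selenium is not installed in this environment")
    else:
        print(f"selenium {version} is installed")
```

Saved under a name like check_selenium.py, it can be run with pipenv run python check_selenium.py so that it looks inside the pipenv environment rather than the system Python.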
Next, in our main.py file, we're going to write some test code. (Thanks to eyalfrank.com for providing this example code.)
from selenium import webdriver
 
# The place we will direct our WebDriver to
url = 'http://www.srcmake.com/'

# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()

# Directing the driver to the defined url
driver.get(url)
This code will simply start Chrome through chromedriver and navigate to srcmake.com.

Run the python code with the following command.
pipenv run python main.py
You should see a Chrome window pop up and display the webpage.
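The test code leaves the browser window open when the script finishes. As a small next step, here is a sketch of a helper that loads a page, reads its title, and always shuts the browser down afterwards. The function name fetch_title is made up for this example; the only things it relies on are the driver's get method, its title attribute, and its quit method.

```python
def fetch_title(driver, url):
    """Navigate `driver` to `url` and return the page title.

    The browser is closed afterwards even if loading the page fails,
    so no stray Chrome processes are left behind.
    """
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()
```

In main.py it could be used as print(fetch_title(webdriver.Chrome(), url)), with webdriver and url defined as in the test code above. Passing the driver in as an argument keeps the helper independent of which browser we start.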

Conclusion

In this tutorial, we set up our environment to use Chrome's webdriver and created a Python project that installs Selenium, with some simple code to get Selenium and chromedriver working.

However, there's much more that we can do with Selenium to interact with webpages, from clicking buttons to parsing HTML and scraping data. In future tutorials we'll dive deeper into Selenium's tools to build web scrapers and automate things in our browser.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.

Twitter: @srcmake
Discord: srcmake#3644
Youtube: srcmake
Twitch: www.twitch.tv/srcmake
Github: srcmake
References
1. www.eyalfrank.com/scraping-web-forms-with-selenium/
2. sites.google.com/a/chromium.org/chromedriver/
3. christopher.su/2015/selenium-chromedriver-ubuntu/


    Author

    Hi, I'm srcmake. I play video games and develop software. 

    License: All code and instructions are provided under the MIT License.
