Introduction
Often, we need to scrape data from webpages or programmatically do something in our web browser. This can be tricky, because not all pages are static HTML: a page may need to run JavaScript before its content appears, or we may need to submit a form to reach the page we want (which may include logging in to a website with an account).
To handle these situations, we're going to use Selenium. (Selenium has bindings for several programming languages, but we're going to use the Python version.) Selenium will help us by simulating web browser actions, such as clicking certain HTML elements. We'll also need some help from a web browser, so we're going to use Chromedriver.
Chromedriver (and Chrome) Installation
So we want to simulate web browser actions (specifically, submitting forms and letting JavaScript on pages run, if we need to). To do that, we'll need to actually drive a real web browser. In our case, we're going to use Chrome, with the aid of Chrome's webdriver. From the documentation, "WebDriver is an open source tool for automated testing of webapps across many browsers. It provides capabilities for navigating to web pages, user input, JavaScript execution, and more."
We need to install chromedriver and put it in the proper location, so that any time we call it from our Python code, Python knows where to find it. Thanks to this article, we can install chromedriver on Ubuntu using a few easy commands in the terminal.
wget -N http://chromedriver.storage.googleapis.com/2.26/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
chmod +x chromedriver
sudo mv -f chromedriver /usr/local/share/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/local/bin/chromedriver
sudo ln -s /usr/local/share/chromedriver /usr/bin/chromedriver
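This should install chromedriver and place it in the proper folder (as well as create some symbolic links). Note that the 2.26 in the download URL is the chromedriver version; each chromedriver release only supports specific Chrome versions, so substitute the release that matches the Chrome you have installed. Again, full credits to christopher.su for the commands.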
P.S. You need Chrome itself for this, since chromedriver only drives an installed copy of Chrome. If you don't already have Chrome on your Ubuntu machine, then use the following commands (which I found here):
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
echo 'deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main' | sudo tee /etc/apt/sources.list.d/google-chrome.list
sudo apt-get update
sudo apt-get install google-chrome-stable
The above commands should install Chrome if you don't already have it.
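To double-check that both binaries ended up where the shell can find them, something like the following can be run. It's only a sketch: the `tool_version` helper is a made-up name for this example, and it assumes Chrome was installed under the command name `google-chrome` and that both programs print their version when given `--version`.

```python
import shutil
import subprocess


def tool_version(name):
    """Return the `--version` output of an executable on PATH, or None if it is absent."""
    path = shutil.which(name)  # same lookup the shell does when you type the name
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Most tools print the version to stdout; fall back to stderr just in case.
    return result.stdout.strip() or result.stderr.strip()


# Report on the two programs this tutorial depends on.
for tool in ("chromedriver", "google-chrome"):
    version = tool_version(tool)
    print(f"{tool}: {'not found on PATH' if version is None else version}")
```

If either line reads "not found on PATH", the corresponding install step above needs another look.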
Selenium Installation
Now that our web browser is set up, we're going to actually create a project. I find that it's best practice to use a Python virtual environment to handle our libraries, so that's what we're going to use. If you don't know how to use a Python virtual environment (or if you don't have Python installed yet), then take a quick look at this article and get it all installed.
In your project folder, run the following commands to create our main.py file (where we'll write our Python code) and to install Selenium into the project's virtual environment using pipenv.
touch main.py
pipenv install selenium
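Before writing any Selenium code, it can help to confirm that the package really landed in the virtual environment. Here is a small sketch, assuming Python 3.8 or newer (for `importlib.metadata`); the `installed_version` helper name is invented for this example.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("selenium")
print("selenium:", version if version else "not installed in this environment")
```

Saved under a hypothetical name such as check_selenium.py, it should be run with `pipenv run python check_selenium.py` so that it inspects the project's environment rather than the system Python.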
Next, in our main.py file, we're going to write some test code. (Thanks to eyalfrank.com for providing this example code.)
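The example script itself isn't reproduced here, so what follows is only a minimal sketch of what main.py might contain, not the original code. It assumes chromedriver is on your PATH as set up earlier; the `visit` helper, the `URL` constant, the five-second pause, and the early PATH check are illustrative choices of mine.

```python
import shutil
import time

URL = "https://www.srcmake.com"  # the page this tutorial navigates to


def visit(driver, url, hold_seconds=5):
    """Load url in an already-started WebDriver and return the page title."""
    driver.get(url)           # navigate; returns once the page has loaded
    time.sleep(hold_seconds)  # keep the window open long enough to see it
    return driver.title


def main():
    # Fail early with a readable message if the driver binary is missing.
    if shutil.which("chromedriver") is None:
        print("chromedriver not found on PATH; install it first.")
        return
    # Imported here so the check above can report even if selenium is missing.
    from selenium import webdriver

    driver = webdriver.Chrome()  # starts chromedriver, which opens a Chrome window
    try:
        print(visit(driver, URL))
    finally:
        driver.quit()  # close the browser and stop chromedriver


if __name__ == "__main__":
    main()
```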
This code will simply start the chrome webdriver and navigate to srcmake.com.
Run the python code with the following command.
pipenv run python main.py
You should see a Chrome window pop up and display the webpage.
Conclusion
In this tutorial, we set up our environment so that we can use Chrome's webdriver, and created a Python project that installs Selenium, with some simple code to get Selenium and chromedriver working together.
However, there's much more that we can do with Selenium to interact with webpages, from clicking buttons to parsing HTML and scraping data. In future tutorials we'll dive deeper into Selenium's tools to build web scrapers and automate things in our browser.
Author: srcmake
License: All code and instructions are provided under the MIT License.