To watch the youtube video where I explain the process and we walk through it together, please click here.
Introduction
In the previous blog post in this series, we installed Actian Vector locally on our Ubuntu 16 virtual machine, we loaded a dataset into a Vector database, and we queried the database using basic SQL commands. Make sure to check that blog post out before doing this one.
In this blog post, we'll look at visualizing some key insights from the dataset using the fundamentals of creating a visualization dashboard. What Is a Data Visualization Dashboard?
Basically, we have some data, and we want to look at important parts of the data in some pretty picture format. A dashboard is typically an HTML page with multiple chart types, which is possibly interactive and allows the viewer to change views/data shown in the graphs.
Goals
There are several tools to assist with creating a dashboard. However, in this tutorial, we're going to focus on the actual mechanics, concepts, and programming behind making a dashboard, rather than using a tool.
Remember: we have an Actian Vector database installed locally on our computer that we're able to query. We want to query this data and present it in a nice format. So what do we have to do? Here's our task list: 1. We need to choose what data we want to visualize. 2. We need to extract this data from the database with some SQL queries. 3. We need to write a program that puts the extracted data into chart formatting. 4. We want to put the charts into a simple HTML page. What Part of the Dataset Should We Visualize?
Before going into the code and querying, let's think about this particular dataset and try to come up with some interesting insights that would be good to visualize.
The dataset is of airplane data from 1987 to 2017, and contains records from each month, for each flight. Let's try to think of at least six interesting facts that we'd want to learn from the data, as well as the chart type that would best suit that data.
1. The number of flights that were cancelled (compared to the number of successful flights). - Pie Chart
2. The season with the most flights. - Pie Chart 3. The number of flights by year. - Scatter Plot 4. The seven most popular destination states. - Bar Graph 5. The average flight time. - Just the number itself is enough. 6. The delay type that causes the most problems. - Bar Graph
Now that we know the data we want, we have to design some SQL queries to get this data.
The SQL Queries
Before doing anything., open a new terminal, and make actian a sudo user. Then log in as Actian, and start ingres.
sudo usermod -aG sudo actian su - actian source ~/.ingVWsh Let's go through each query that we need. (Note: There are cleaner SQL queries we could use, but that would make the programming a bit trickier. Let's keep it simple for this tutorial.) For each query, we'll create a .sql file that stores the SQL queries. (Black background = terminal commands to create and open the command.sql files, and white background = the SQL queries that go inside the files.)
1. We need to know the number of cancelled flights, and the number of non-cancelled flights.
touch /home/actian/command1.sql sudo gedit /home/actian/command1.sql SELECT COUNT(*) FROM ontime WHERE (Cancelled!=0);\g SELECT COUNT(*) FROM ontime WHERE (Cancelled=0);\g 2. We need to count how many flights there were in January, February, ..., December. touch /home/actian/command2.sql sudo gedit /home/actian/command2.sql SELECT COUNT(*) FROM ontime WHERE (Quarter=1);\g SELECT COUNT(*) FROM ontime WHERE (Quarter=2);\g SELECT COUNT(*) FROM ontime WHERE (Quarter=3);\g SELECT COUNT(*) FROM ontime WHERE (Quarter=4);\g 3. We need to count the number of flights for each year. touch /home/actian/command3.sql sudo gedit /home/actian/command3.sql SELECT COUNT(*) FROM ontime WHERE (Year=1987);\g SELECT COUNT(*) FROM ontime WHERE (Year=1988);\g SELECT COUNT(*) FROM ontime WHERE (Year=1989);\g SELECT COUNT(*) FROM ontime WHERE (Year=1990);\g SELECT COUNT(*) FROM ontime WHERE (Year=1991);\g SELECT COUNT(*) FROM ontime WHERE (Year=1992);\g SELECT COUNT(*) FROM ontime WHERE (Year=1993);\g SELECT COUNT(*) FROM ontime WHERE (Year=1994);\g SELECT COUNT(*) FROM ontime WHERE (Year=1995);\g SELECT COUNT(*) FROM ontime WHERE (Year=1996);\g SELECT COUNT(*) FROM ontime WHERE (Year=1997);\g SELECT COUNT(*) FROM ontime WHERE (Year=1998);\g SELECT COUNT(*) FROM ontime WHERE (Year=1999);\g SELECT COUNT(*) FROM ontime WHERE (Year=2000);\g SELECT COUNT(*) FROM ontime WHERE (Year=2001);\g SELECT COUNT(*) FROM ontime WHERE (Year=2002);\g SELECT COUNT(*) FROM ontime WHERE (Year=2003);\g SELECT COUNT(*) FROM ontime WHERE (Year=2004);\g SELECT COUNT(*) FROM ontime WHERE (Year=2005);\g SELECT COUNT(*) FROM ontime WHERE (Year=2006);\g SELECT COUNT(*) FROM ontime WHERE (Year=2007);\g SELECT COUNT(*) FROM ontime WHERE (Year=2008);\g SELECT COUNT(*) FROM ontime WHERE (Year=2009);\g SELECT COUNT(*) FROM ontime WHERE (Year=2010);\g SELECT COUNT(*) FROM ontime WHERE (Year=2011);\g SELECT COUNT(*) FROM ontime WHERE (Year=2012);\g SELECT COUNT(*) FROM ontime WHERE (Year=2013);\g SELECT COUNT(*) FROM ontime WHERE (Year=2014);\g SELECT COUNT(*) FROM ontime WHERE (Year=2015);\g SELECT COUNT(*) FROM ontime WHERE (Year=2016);\g SELECT COUNT(*) FROM ontime WHERE (Year=2017);\g 4. We need to count each destination state, order them by the ones that show up most, and then take the top seven. touch /home/actian/command4.sql sudo gedit /home/actian/command4.sql SELECT TOP 7 DISTINCT DestState AS State, Count(DestState) AS FlightsToState FROM ontime GROUP BY DestState ORDER BY FlightsToState DESC;\g 5. We need to average the value in the (arrival time - departure time) for all flights that were not cancelled. touch /home/actian/command5.sql sudo gedit /home/actian/command5.sql SELECT AVG(ArrTime-DepTime) FROM ontime;\g 6. For each delay type, we'll check if the value is non-zero or empty (meaning the delay type caused an issue). (Note: We declared this value as a VARCHAR, not INTEGER.) There are CarrierDelays, WeatherDelays, NASDelays, SecurityDelays, LateAircraftDelays. touch /home/actian/command6.sql sudo gedit /home/actian/command6.sql SELECT COUNT(*) FROM ontime WHERE (CarrierDelay != 0 AND CarrierDelay IS NOT NULL);\g SELECT COUNT(*) FROM ontime WHERE (WeatherDelay != 0 AND WeatherDelay IS NOT NULL);\g SELECT COUNT(*) FROM ontime WHERE (NASDelay != 0 AND NASDelay IS NOT NULL);\g SELECT COUNT(*) FROM ontime WHERE (SecurityDelay != 0 AND SecurityDelay IS NOT NULL);\g SELECT COUNT(*) FROM ontime WHERE (LateAircraftDelay != 0 AND LateAircraftDelay IS NOT NULL);\g Our SQL queries are all chosen, and inside of the files that we'll use to invoke them. DON'T do this yet, but the following commands are what we're going to be using to invoke our SQL commands. For each command.sql file (which holds data that we're trying to visualize with a chart), we're going to run the SQL commands inside, and output the results to a text file. sql ontimedb -S < command1.sql >/home/actian/1.txt sql ontimedb -S < command2.sql >/home/actian/2.txt sql ontimedb -S < command3.sql >/home/actian/3.txt sql ontimedb -S < command4.sql >/home/actian/4.txt sql ontimedb -S < command5.sql >/home/actian/5.txt sql ontimedb -S < command6.sql >/home/actian/6.txt
And with that, our SQL setup is done.
A Program To Format The Data Into Pictures
So we have all of the data that we're going to need, but how do we visualize it? Basically, we want to make charts/diagrams for each of our facts right now. To say it more concretely, we'll have the results of our queries in six different files, and we want to create pictures (.png files) for each. How can we do this? Any programming language will work, but let's try to keep it simple by using Python.
Create a file named "makepics.py" in your actian user's directory with the following command. The subsequent commands will install the python libraries that we need. (Make sure your terminal is still logged in as actian.)
touch /home/actian/makepics.py sudo apt-get install python-pip sudo pip install matplotlib sudo gedit /home/actian/makepics.py
Open "makepics.py" and add the following code, which will create our pictures for us. (Read through the commands to see what the code does.)
The code will read the contents of each data file, that was made with our SQL queries, and create the appropriate chart for that data. These charts will be saved as PNG images. So we have the images, but to be a "dashboard" we should format the images nicely.
Putting The Pictures In HTML Format
Okay, so our data was queried, turned into charts, and saved as PNG files. We just need to present these images. We could do this a lot of ways, but let's put them on an HTML page for now.
Since our image files are going to be the same name no matter what, we can actually write a static HTML page that displays each image! The following commands will create a file named "dashboard.html" and will open it for editing.
touch /home/actian/dashboard.html sudo gedit /home/actian/dashboard.html
Add the following HTML code to dashboard.html.
(Obviously, we could make the HTML a lot prettier and responsive, but I don't want to make this tutorial TOO detailed. Study responsive web design to make it even better. You can also add some basic CSS to make it look nicer.)
(Notice: We use some javascript to display the average time, since it isn't a picture.) So our dashboard is set up. Now how do we put everything together? Automating The Build Process
We want to run our SQL queries, then we want to run the script to make our images. But we want to automate that process so it's done automatically, so that any time we need to make a dashboard (such as when our database has updated data), that this automatically happens.
The best way to handle this is with a makefile. (We could create a bash script, but since all we need to do is run terminal commands, a makefile is easier.) Run the following command in your terminal to create a file named "makefile" and open it for editing. touch /home/actian/makefile sudo gedit /home/actian/makefile
Add the following code to the makefile:
Now cd into actian's home directory, and actually run the makefile.
cd /home/actian make
Open "dashboard.html" to see the results! (Drag the file into your web browser, or right click-> open it using your web browser.)
Conclusion
We created a data visualization dashboard, with Actian Vector as our database. Since we were working on a local machine, it was easiest to make SQL queries directly from the command line, and store those results in files. We then wrote a python script to make the resulting data pretty, in picture format, which we then visualized in an HTML page.
In further tutorials, we'll look at Actian Vector using AWS and Azure, where we can use this same visualization method. The only difference will be how we query the database.
The video where I walk through this tutorial with you is below.
Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels.
Twitter: @srcmake Discord: srcmake#3644 Youtube: srcmake Twitch: www.twitch.tv/srcmake Github: srcmake
References
Comments are closed.
|
AuthorHi, I'm srcmake. I play video games and develop software. Pro-tip: Click the "DIRECTORY" button in the menu to find a list of blog posts.
License: All code and instructions are provided under the MIT License.
|