Powered by
/src$ make
  • Home
  • About
  • Directory
  • Contact
  • Home
  • About
  • Directory
  • Contact

Actian Vector Data Visualization Dashboard

3/23/2018

 
To watch the youtube video where I explain the process and we walk through it together, please click here.

Introduction

In the previous blog post in this series, we installed Actian Vector locally on our Ubuntu 16 virtual machine, we loaded a dataset into a Vector database, and we queried the database using basic SQL commands. Make sure to check that blog post out before doing this one.

In this blog post, we'll look at visualizing some key insights from the dataset using the fundamentals of creating a visualization dashboard.  

What Is a Data Visualization Dashboard?

Basically, we have some data, and we want to look at important parts of the data in some pretty picture format. A dashboard is typically an HTML page with multiple chart types, which is possibly interactive and allows the viewer to change views/data shown in the graphs.

Goals

There are several tools to assist with creating a dashboard. However, in this tutorial, we're going to focus on the actual mechanics, concepts, and programming behind making a dashboard, rather than using a tool.

Remember: we have an Actian Vector database installed locally on our computer that we're able to query. We want to query this data and present it in a nice format. ​

So what do we have to do? Here's our task list:
​1. We need to choose what data we want to visualize.
2. We need to extract this data from the database with some SQL queries.
3. We need to write a program that puts the extracted data into chart formatting. 
​4. We want to put the charts into a simple HTML page. 

What Part of the Dataset Should We Visualize?

Before going into the code and querying, let's think about this particular dataset and try to come up with some interesting insights that would be good to visualize. 

The dataset is of airplane data from 1987 to 2017, and contains records from each month, for each flight. Let's try to think of at least six interesting facts that we'd want to learn from the data, as well as the chart type that would best suit that data.
1. The number of flights that were cancelled (compared to the number of successful flights). - Pie Chart
2. The season with the most flights. - Pie Chart
3. The number of flights by year. - Scatter Plot
4. The seven most popular destination states. - Bar Graph
5. The average flight time. - Just the number itself is enough. 
6. The delay type that causes the most problems. - Bar Graph
Now that we know the data we want, we have to design some SQL queries to get this data. 

The SQL Queries

Before doing anything., open a new terminal, and make actian a sudo user. Then log in as Actian, and start ingres.
sudo usermod -aG sudo actian
su - actian
source ~/.ingVWsh

​Let's go through each query that we need. (Note: There are cleaner SQL queries we could use, but that would make the programming a bit trickier. Let's keep it simple for this tutorial.)

For each query, we'll create a .sql file that stores the SQL queries. (Black background = terminal commands to create and open the command.sql files, and white background = the SQL queries that go inside the files.)
​1. We need to know the number of cancelled flights, and the number of non-cancelled flights. 
touch /home/actian/command1.sql
sudo gedit /home/actian/command1.sql
SELECT COUNT(*) FROM ontime WHERE (Cancelled!=0);\g
SELECT COUNT(*) FROM ontime WHERE (Cancelled=0);\g

2. We need to count how many flights there were in January, February, ..., December. 
touch /home/actian/command2.sql
sudo gedit /home/actian/command2.sql
SELECT COUNT(*) FROM ontime WHERE (Quarter=1);\g
SELECT COUNT(*) FROM ontime WHERE (Quarter=2);\g
SELECT COUNT(*) FROM ontime WHERE (Quarter=3);\g
SELECT COUNT(*) FROM ontime WHERE (Quarter=4);\g

3. We need to count the number of flights for each year.
touch /home/actian/command3.sql
sudo gedit /home/actian/command3.sql
SELECT COUNT(*) FROM ontime WHERE (Year=1987);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1988);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1989);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1990);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1991);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1992);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1993);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1994);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1995);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1996);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1997);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1998);\g
SELECT COUNT(*) FROM ontime WHERE (Year=1999);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2000);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2001);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2002);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2003);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2004);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2005);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2006);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2007);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2008);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2009);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2010);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2011);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2012);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2013);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2014);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2015);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2016);\g
SELECT COUNT(*) FROM ontime WHERE (Year=2017);\g

​4. We need to count each destination state, order them by the ones that show up most, and then take the top seven.
touch /home/actian/command4.sql
sudo gedit /home/actian/command4.sql
SELECT TOP 7 DISTINCT DestState AS State, Count(DestState) AS FlightsToState FROM ontime GROUP BY DestState ORDER BY FlightsToState DESC;\g

​5. We need to average the value in the (arrival time - departure time) for all flights that were not cancelled.
touch /home/actian/command5.sql
sudo gedit /home/actian/command5.sql
SELECT AVG(ArrTime-DepTime) FROM ontime;\g

6. For each delay type, we'll check if the value is non-zero or empty (meaning the delay type caused an issue). (Note: We declared this value as a VARCHAR, not INTEGER.) There are CarrierDelays, WeatherDelays, NASDelays, SecurityDelays, LateAircraftDelays. 
touch /home/actian/command6.sql
sudo gedit /home/actian/command6.sql
SELECT COUNT(*) FROM ontime WHERE (CarrierDelay != 0 AND CarrierDelay IS NOT NULL);\g
SELECT COUNT(*) FROM ontime WHERE (WeatherDelay != 0 AND WeatherDelay IS NOT NULL);\g
SELECT COUNT(*) FROM ontime WHERE (NASDelay != 0 AND NASDelay IS NOT NULL);\g
SELECT COUNT(*) FROM ontime WHERE (SecurityDelay != 0 AND SecurityDelay IS NOT NULL);\g
SELECT COUNT(*) FROM ontime WHERE (LateAircraftDelay != 0 AND LateAircraftDelay IS NOT NULL);\g

Our SQL queries are all chosen, and inside of the files that we'll use to invoke them.

DON'T do this yet, but the following commands are what we're going to be using to invoke our SQL commands. For each command.sql file (which holds data that we're trying to visualize with a chart), we're going to run the SQL commands inside, and output the results to a text file.
sql ontimedb -S < command1.sql >/home/actian/1.txt
sql ontimedb -S < command2.sql >/home/actian/2.txt
sql ontimedb -S < command3.sql >/home/actian/3.txt
sql ontimedb -S < command4.sql >/home/actian/4.txt
sql ontimedb -S < command5.sql >/home/actian/5.txt
sql ontimedb -S < command6.sql >/home/actian/6.txt
And with that, our SQL setup is done.

A Program To Format The Data Into Pictures

So we have all of the data that we're going to need, but how do we visualize it? Basically, we want to make charts/diagrams for each of our facts right now. To say it more concretely, we'll have the results of our queries in six different files, and we want to create pictures (.png files) for each. How can we do this? Any programming language will work, but let's try to keep it simple by using Python. 
Create a file named "makepics.py" in your actian user's directory with the following command. The subsequent commands will install the python libraries that we need. (Make sure your terminal is still logged in as actian.)
touch /home/actian/makepics.py
sudo apt-get install python-pip
sudo pip install matplotlib
sudo gedit /home/actian/makepics.py
Open "makepics.py" and add the following code, which will create our pictures for us. (Read through the commands to see what the code does.)
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
# Copyright srcmake.com 2018.
# This code will take the six specific files that contain data we're trying to 
# visualize using Actian Vector and plot them in the appropriate manner, saving them to files.

####################################################################
# LIBRARIES
import numpy as np
import matplotlib
matplotlib.use('pdf')
import matplotlib.pyplot as plt
####################################################################


####################################################################
### 1 - CANCELLED FLIGHTS - PIE CHART ###
# Read the data from 1.txt file into "content". remove whitespace, convert to int.
def MakePic1():
        # Clear the plot if necessary.
        plt.gcf().clear()

        # Open up the results of the SQL queries and prep the data.
        with open("/home/actian/1.txt") as f:
            content = f.readlines()
        content = [x.strip() for x in content] 
        content = map(int, content)

        # The pie chart details.
        labels = ['Cancelled', 'Successful']
        colors = ['gold', 'lightgreen']

        # Make the plot and save it to a file named 1.png.
        plt.pie(content, labels=labels, colors=colors, autopct='%1.0f%%',shadow=True, startangle=0, labeldistance=1.2)
        plt.savefig('1.png')
####################################################################


####################################################################
### 2 - SEASON FLIGHTS - PIE CHART ###
def MakePic2():
        # Clear the plot if necessary.
        plt.gcf().clear()

        # Open up the results of the SQL queries and prep the data.
        with open("/home/actian/2.txt") as f:
            content = f.readlines()
        content = [x.strip() for x in content] 
        content = map(int, content)

        # The pie chart details.
        labels = ['Winter', 'Spring', 'Summer', 'Fall']
        colors = ['lightskyblue', 'lightgreen', 'gold', 'pink']

        # Make the plot and save it to a file named 2.png.
        plt.pie(content, labels=labels, colors=colors, autopct='%1.0f%%', shadow=True, startangle=0)
        plt.savefig('2.png')
####################################################################


####################################################################
### 3 - FLIGHTS BY YEAR - SCATTER CHART ###
def MakePic3():
        # Clear the plot if necessary.
        plt.gcf().clear()

        # Open up the results of the SQL queries and prep the data.
        with open("/home/actian/3.txt") as f:
            content = f.readlines()
        content = [x.strip() for x in content] 
        content = map(int, content)

        # The chart details.
        dates = [1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017]

        plt.xlabel('Year')
        plt.ylabel('Number of Flights')
        
        # Make the plot and save it to a file named 3.png.
        plt.plot(dates, content, 'xb-')
        plt.savefig('3.png')
####################################################################


####################################################################
### 4 - TOP 7 DESTINATION STATES - BAR CHART ###
def MakePic4():
        # Clear the plot if necessary.
        plt.gcf().clear()

        # Open up the results of the SQL queries and prep the data.
        with open("/home/actian/4.txt") as f:
            content = f.readlines()

        # Initialize states array and count array.
        states = []
        count = []

        # Go through each line of the query.
        for line in content:
                # Remove whitespace and get both the statename and count.
                words = line.split();   
                states.append(words[0])
                count.append(words[1])

        # Convert the count to int.
        count = map(int, count)

        # The chart details.
        plt.xlabel('State')
        plt.ylabel('Number of Flights To State (in ten millions)')
        colors = ['red', 'orange', 'yellow', 'green', 'blue', 'indigo', 'violet']

        # Make the plot and save it to a file named 4.png.
        # From: https://stackoverflow.com/questions/33203645/how-to-plot-a-histogram-using-matplotlib-in-python-with-a-list-of-data
        x=np.arange(len(states))
        plt.bar(x, height=count, color=colors)
        plt.xticks(x+0.4, states)
        plt.savefig('4.png')
####################################################################


####################################################################
### 5 - AVERAGE FLIGHT TIME - NUMBER ###
# No need to use python here. Javascript in the dashboard handles this.
####################################################################

####################################################################
### 6 - DELAY ISSUES - BAR PLOT ###
def MakePic6():
        # Clear the plot if necessary.
        plt.gcf().clear()

        # Open up the results of the SQL queries and prep the data.
        with open("/home/actian/6.txt") as f:
            content = f.readlines()
        content = [x.strip() for x in content] 
        content = map(int, content)

        # The delay types.
        delays = ['Carrier', 'Weather', 'NAS', 'Security', 'LateAircraft']

        # The chart details.
        plt.xlabel('Delay Type')
        plt.ylabel('Number of Times This Delay Caused Problems')
        colors = ['#624ea7', 'g', 'yellow', 'k', 'maroon']
        
# Make the plot and save it to a file named 4.png.
        x=np.arange(len(content))
        plt.bar(x, height=content, color=colors)
        plt.xticks(x+0.4, delays)
        plt.savefig('6.png')
####################################################################

####################################################################
### MAIN ###
# When this file is called, make all pics.
if __name__ == "__main__":
        MakePic1()
        MakePic2()
        MakePic3()
        MakePic4()
        MakePic6()
###################################################################
The code will read the contents of each data file, that was made with our SQL queries, and create the appropriate chart for that data. These charts will be saved as PNG images. So we have the images, but to be a "dashboard" we should format the images nicely. 

Putting The Pictures In HTML Format

Okay, so our data was queried, turned into charts, and saved as PNG files. We just need to present these images. We could do this a lot of ways, but let's put them on an HTML page for now.
Since our image files are going to be the same name no matter what, we can actually write a static HTML page that displays each image! The following commands will create a file named "dashboard.html" and will open it for editing. 
touch /home/actian/dashboard.html
sudo gedit /home/actian/dashboard.html
Add the following HTML code to dashboard.html.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
<html>
        <head>
                <meta charset="utf-8" name="viewport" content="width=device-width, initial-scale=1.0">
                <title>Actian Vector Data Dashboard</title>
    </head>
        <body>
                <center>
                <h1>The Data Dashboard for US Airplane Flights Between 1987 to 2017<h1>
                <h4>Powered by an Actian Vector database.</h4>
                <br>
                <h3>Cancelled Flights</h3>
                <img src="1.png" alt="Cancelled Flights">
                <br>
                
                <h3>Most Popular Seasons</h3>
                <img src="2.png" alt="Most Popular Seasons">
                <br>
                
                
                <h3>Flights By Year</h3>
                <img src="3.png" alt="Flights By Year">
                <br>
                
                
                <h3>Top 7 Destination States</h3>
                <img src="4.png" alt="Top 7 Destination States">
                <br><br><br>
                
                <h3>Average Flight Time</h3>
                <div id="flighttime"></div>
                <br><br><br>
                
                <h3>Biggest Delay Reasons</h3>
                <img src="6.png" alt="Biggest Delay Reasons">
                <br>
                </center>
        </body>

        <script>
        // Update the Average Flight Time div with the time inside 5.txt.
        // From: https://stackoverflow.com/questions/14446447/how-to-read-a-local-text-file
        function readTextFile() 
                {
                var rawFile = new XMLHttpRequest();
                rawFile.open("GET", "5.txt", true);
                rawFile.onreadystatechange = function() 
                        {
                        if (rawFile.readyState === 4) 
                                {
                                var allText = rawFile.responseText;
                                document.getElementById("flighttime").innerHTML = "The <b><em>average flight time</em></b> for flights within the US is <b>" + allText + "</b> minutes.";
                                }
                        }
                rawFile.send();
                };
        readTextFile();
        </script>

</html>
(Obviously, we could make the HTML a lot prettier and responsive, but I don't want to make this tutorial TOO detailed. Study responsive web design to make it even better. You can also add some basic CSS to make it look nicer.)

(Notice: We use some javascript to display the average time, since it isn't a picture.)

So our dashboard is set up. Now how do we put everything together?

Automating The Build Process

We want to run our SQL queries, then we want to run the script to make our images. But we want to automate that process so it's done automatically, so that any time we need to make a dashboard (such as when our database has updated data), that this automatically happens. 

The best way to handle this is with a makefile. (We could create a bash script, but since all we need to do is run terminal commands, a makefile is easier.)

​Run the following command in your terminal to create a file named "makefile" and open it for editing.
touch /home/actian/makefile
sudo gedit /home/actian/makefile
Add the following code to the makefile:​
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
all:
        echo "Running SQL commands." 
        sql ontimedb -S < command1.sql >/home/actian/1.txt
        sql ontimedb -S < command2.sql >/home/actian/2.txt
        sql ontimedb -S < command3.sql >/home/actian/3.txt
        sql ontimedb -S < command4.sql >/home/actian/4.txt
        sql ontimedb -S < command5.sql >/home/actian/5.txt
        sql ontimedb -S < command6.sql >/home/actian/6.txt
        echo "Creating the images."
        python makepics.py
Now cd into actian's home directory, and actually run the makefile. 
cd /home/actian
make
Open "dashboard.html" to see the results! (Drag the file into your web browser, or right click-> open it using your web browser.)

Conclusion

We created a data visualization dashboard, with Actian Vector as our database. Since we were working on a local machine, it was easiest to make SQL queries directly from the command line, and store those results in files. We then wrote a python script to make the resulting data pretty, in picture format, which we then visualized in an HTML page. 

In further tutorials, we'll look at Actian Vector using AWS and Azure, where we can use this same visualization method. The only difference will be how we query the database. 
The video where I walk through this tutorial with you is below.
​Like this content and want more? Feel free to look around and find another blog post that interests you. You can also contact me through one of the various social media channels. 

Twitter: @srcmake
Discord: srcmake#3644
Youtube: srcmake
Twitch: www.twitch.tv/srcmake
​Github: srcmake
References


Comments are closed.

    Author

    Hi, I'm srcmake. I play video games and develop software. 

    Pro-tip: Click the "DIRECTORY" button in the menu to find a list of blog posts.
    Metamask tip button
    License: All code and instructions are provided under the MIT License.

    Discord

    Chat with me.


    Youtube

    Watch my videos.


    Twitter

    Get the latest news.


    Twitch

    See the me code live.


    Github

    My latest projects.

Powered by Create your own unique website with customizable templates.