Fisseha Berhane, PhD

Data Scientist

443-970-2353 fisseha@jhu.edu CV Resume Linkedin GitHub twitter twitter

MySQL Excercise 1

This is MySQL excercise done with Jupyter. I simply love Jupyter. I use Python, R, Matlab and MySQL inside Jupyter, a simply cool tool!!

The first thing I should do every time I start working with a database: load the SQL library

In [ ]:
%load_ext sql

The second thing: connect to the database I need to use.

The following command will log me to the MySQL server at mysqlserver as the user 'studentuser' and will select the database named 'dognitiondb' :

In [ ]:
%sql mysql://studentuser:studentpw@mysqlserver/dognitiondb
  • To make this the default database for my queries:
In [ ]:
%sql USE dognitiondb

The third thing: get to know my data

(1) confirm how many tables each database has, and (2) identify the fields contained in each table of the database. To determine how many tables each database:

In [43]:
%sql SHOW tables
6 rows affected.
Out[43]:
Tables_in_dognitiondb
complete_tests
dogs
exam_answers
reviews
site_activities
users

To determine what columns or fields (we will use those terms interchangeably in this course) are in each table, you can use the SHOW command again, but this time (1) you have to clarify that you want to see columns instead of tables, and (2) you have to specify from which table you want to examine the columns.

The syntax, which sounds very similar to what you would actually say in the spoken English language, looks like this:

SHOW columns FROM (enter table name here)

or if you have multiple databases loaded:

SHOW columns FROM (enter table name here) FROM (enter database name here)

or

SHOW columns FROM databasename.tablename

Whenever you have multiple databases loaded, you will need to specify which database a table comes from using one of the syntax options described above.

Question 1: How many columns does the "dogs" table have? Enter the appropriate query below to find out:

In [44]:
%sql SHOW columns FROM dogs
21 rows affected.
Out[44]:
Field Type Null Key Default Extra
gender varchar(255) YES None
birthday varchar(255) YES None
breed varchar(255) YES None
weight int(11) YES None
dog_fixed tinyint(1) YES None
dna_tested tinyint(1) YES None
created_at datetime NO None
updated_at datetime NO None
dimension varchar(255) YES None
exclude tinyint(1) YES None
breed_type varchar(255) YES None
breed_group varchar(255) YES None
dog_guid varchar(60) YES MUL None
user_guid varchar(60) YES MUL None
total_tests_completed varchar(255) YES None
mean_iti_days varchar(255) YES None
mean_iti_minutes varchar(255) YES None
median_iti_days varchar(255) YES None
median_iti_minutes varchar(255) YES None
time_diff_between_first_and_last_game_days varchar(255) YES None
time_diff_between_first_and_last_game_minutes varchar(255) YES None

An alternate way to learn the same information would be to use the DESCRIBE function. The syntax is:

DESCRIBE tablename

Question 2: Try using the DESCRIBE function to learn how many columns are in the "reviews" table:

In [45]:
%sql DESCRIBE reviews
7 rows affected.
Out[45]:
Field Type Null Key Default Extra
rating int(11) YES None
created_at datetime NO None
updated_at datetime NO None
user_guid varchar(60) YES MUL None
dog_guid varchar(60) YES MUL None
subcategory_name varchar(60) YES None
test_name varchar(60) YES None

Using SELECT to look at your raw data

Remember:

  • SQL syntax and keywords are case insensitive. I recommend that you always enter SQL keywords in upper case and table or column names in either lower case or their native format to make it easy to read and troubleshoot your code, but it is not a requirement to do so. Table or column names are often case insensitive as well, but defaults may vary across database platforms so it's always a good idea to check.
  • Table or column names with spaces in them need to be surrounded by quotation marks in SQL. MySQL accepts both double and single quotation marks, but some database systems only accept single quotation marks. In all database systems, if a table or column name contains an SQL keyword, the name must be enclosed in backticks instead of quotation marks.

    'the marks that surrounds this phrase are single quotation marks'
    "the marks that surrounds this phrase are double quotation marks"
    `the marks that surround this phrase are backticks`

  • The semi-colon at the end of a query is only required when you have multiple separate queries saved in the same text file or editor. That said, I recommend that you make it a habit to always include a semi-colon at the end of your queries.

An important note for executing queries in Jupyter: in order to tell Python that you want to execute SQL language on multiple lines, you must include two percent signs in front of the SQL prefix instead of one. Therefore, to execute the above query, you should enter:

%%sql
SELECT breed
FROM dogs;

Using LIMIT to restrict the number of rows in my output (and prevent system crashes)

SELECT breed
FROM dogs LIMIT 5;

The "5" in this case indicates that you will only see the first 5 rows of data you select.

Question : the first 10 rows of the breed column in the dogs table.

In [46]:
%sql SELECT breed FROM dogs LIMIT 10;
10 rows affected.
Out[46]:
breed
Labrador Retriever
Shetland Sheepdog
Golden Retriever
Golden Retriever
Shih Tzu
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix

You can also select rows of data from different parts of the output table, rather than always just starting at the beginning. To do this, use the OFFSET clause after LIMIT. The number after the OFFSET clause indicates from which row the output will begin querying. Note that the offset of Row 1 of a table is actually 0. Therefore, in the following query:

SELECT breed
FROM dogs LIMIT 10 OFFSET 5;

10 rows of data will be returned, starting at Row 6.

An alternative way to write the OFFSET clause in the query is:

SELECT breed
FROM dogs LIMIT 5, 10;

In this notation, the offset is the number before the comma, and the number of rows returned is the number after the comma.

In [47]:
%%sql
SELECT breed 
FROM dogs limit 10 OFFSET 5
10 rows affected.
Out[47]:
breed
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix
German Shepherd Dog-Pembroke Welsh Corgi Mix
Vizsla
Pug
Boxer
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix
In [48]:
%%sql
SELECT breed 
FROM dogs limit 5,10;
10 rows affected.
Out[48]:
breed
Siberian Husky
Shih Tzu
Mixed
Labrador Retriever
Shih Tzu-Poodle Mix
German Shepherd Dog-Pembroke Welsh Corgi Mix
Vizsla
Pug
Boxer
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix

The LIMIT command is one of the pieces of syntax that can vary across database platforms. MySQL uses LIMIT to restrict the output, but other databases including Teradata use a statement called "TOP" instead. Oracle has yet another syntax:

http://www.tutorialspoint.com/sql/sql-top-clause.htm

Make sure to look up the correct syntax for the database type you are using.

Using SELECT to query multiple columns

In [49]:
%%sql
SELECT breed, breed_type, breed_group
FROM dogs LIMIT 5,10;
10 rows affected.
Out[49]:
breed breed_type breed_group
Siberian Husky Pure Breed Working
Shih Tzu Pure Breed Toy
Mixed Mixed Breed/ Other/ I Don't Know None
Labrador Retriever Pure Breed Sporting
Shih Tzu-Poodle Mix Cross Breed None
German Shepherd Dog-Pembroke Welsh Corgi Mix Cross Breed None
Vizsla Pure Breed Sporting
Pug Pure Breed Toy
Boxer Pure Breed Working
German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix Cross Breed None

Another trick to know about when using SELECT is that you can use an asterisk as a "wild card" to return all the data in a table. (A wild card is defined as a character that will represent or match any character or sequence of characters in a query.) Take note, this is very risky to do if you do not limit your output or if you don't know how many data are in your database, so use the wild card with caution. However, it is a handy tool to use when you don't have all the column names easily available or when you know you want to query an entire table.

In [50]:
%%sql
SELECT *
FROM dogs LIMIT 5,10;
10 rows affected.
Out[50]:
gender birthday breed weight dog_fixed dna_tested created_at updated_at dimension exclude breed_type breed_group dog_guid user_guid total_tests_completed mean_iti_days mean_iti_minutes median_iti_days median_iti_minutes time_diff_between_first_and_last_game_days time_diff_between_first_and_last_game_minutes
male 2011 Siberian Husky 60 1 0 2013-02-05 18:14:14 2013-07-25 19:41:49 stargazer None Pure Breed Working fd27b948-7144-11e5-ba71-058fbc01cf0b ce13615c-7144-11e5-ba71-058fbc01cf0b 20 0.1785873538 257.16578947 0.0035648148035 5.1333333171 3.3931597222 4886.15
male 1982 Shih Tzu 190 1 0 2013-02-05 18:16:24 2014-05-30 15:52:54 maverick 1 Pure Breed Toy fd27ba1a-7144-11e5-ba71-058fbc01cf0b ce135e14-7144-11e5-ba71-058fbc01cf0b 27 6.1905898326 8914.449359 0.00033564807185 0.48333322347 160.95533565 231775.68333
male 2012 Mixed 50 1 0 2013-02-05 18:44:02 2013-07-25 19:41:49 protodog None Mixed Breed/ Other/ I Don't Know None fd27bbbe-7144-11e5-ba71-058fbc01cf0b ce135f2c-7144-11e5-ba71-058fbc01cf0b 20 0.0080750487303 11.628070172 0.0046412037941 6.6833334635 0.15342592588 220.93333326
male 2008 Labrador Retriever 70 1 0 2013-02-05 20:59:42 2013-07-25 19:41:49 einstein None Pure Breed Sporting fd27c1c2-7144-11e5-ba71-058fbc01cf0b ce136a1c-7144-11e5-ba71-058fbc01cf0b 20 0.68410453216 985.11052631 0.0033796295731 4.8666665853 12.997986111 18717.1
male 2008 Shih Tzu-Poodle Mix 0 1 0 2013-02-05 21:30:14 2013-07-25 19:41:49 socialite None Cross Breed None fd27c5be-7144-11e5-ba71-058fbc01cf0b ce136ac6-7144-11e5-ba71-058fbc01cf0b 20 0.22328155458 321.5254386 0.0038888889878 5.6000001424 4.2423495371 6108.9833334
female 2011 German Shepherd Dog-Pembroke Welsh Corgi Mix 40 1 0 2013-02-05 22:29:24 2013-07-25 19:41:49 None None Cross Breed None fd27c74e-7144-11e5-ba71-058fbc01cf0b ce136c24-7144-11e5-ba71-058fbc01cf0b 14 25.69619391 37002.519231 0.0058449073678 8.4166666097 334.05052083 481032.75
male 2011 Vizsla 60 1 0 2013-02-05 23:09:37 2013-07-25 19:41:49 socialite None Pure Breed Sporting fd27c7d0-7144-11e5-ba71-058fbc01cf0b ce136e36-7144-11e5-ba71-058fbc01cf0b 20 0.0053776803082 7.7438596438 0.0044791667034 6.4500000529 0.10217592586 147.13333323
female 2007 Pug 20 1 0 2013-02-05 23:10:40 2013-07-31 21:57:21 stargazer None Pure Breed Toy fd27c852-7144-11e5-ba71-058fbc01cf0b ce136ee0-7144-11e5-ba71-058fbc01cf0b 20 0.051511939572 74.177192984 0.0039930555499 5.7499999919 0.97872685187 1409.3666667
female 2010 Boxer 50 1 0 2013-02-05 23:27:59 2013-07-25 19:41:49 ace None Pure Breed Working fd27c8d4-7144-11e5-ba71-058fbc01cf0b ce136f94-7144-11e5-ba71-058fbc01cf0b 20 0.15856725147 228.33684211 0.0039351851456 5.6666666097 3.0127777779 4338.4000001
male 2010 German Shepherd Dog-Nova Scotia Duck Tolling Retriever Mix 30 1 0 2013-02-06 00:09:33 2014-05-30 15:52:56 None None Cross Breed None fd27c956-7144-11e5-ba71-058fbc01cf0b ce134be0-7144-11e5-ba71-058fbc01cf0b 11 2.8247997685 4067.7116667 0.0037731481142 5.4333332845 28.247997685 40677.116667

SELECT statements can also be used to make new derivations of individual columns using "+" for addition, "-" for subtraction, "*" for multiplication, or "/" for division. For example, if you wanted the median inter-test intervals in hours instead of minutes or days, you could query:

Question: Go ahead and try it, adding in a column to your output that shows you the original median_iti in minutes.

In [51]:
%%sql
SELECT median_iti_minutes/60
FROM dogs LIMIT 5,10;
10 rows affected.
Out[51]:
median_iti_minutes/60
0.085555555285
0.0080555537245
0.11138889105833334
0.081111109755
0.09333333570666666
0.14027777682833334
0.10750000088166667
0.09583333319833334
0.094444443495
0.09055555474166667

Question: How would you retrieve the first 15 rows of data from the dog_guid, subcategory_name, and test_name fields of the Reviews table, in that order?

In [52]:
%%sql
SELECT dog_guid, subcategory_name,test_name
FROM reviews LIMIT 15;
15 rows affected.
Out[52]:
dog_guid subcategory_name test_name
ce3ac77e-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Warm-up
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Warm-up
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Game
ce2aedcc-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Warm-up
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Game
ce405c52-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Game
ce405e28-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce405e28-7144-11e5-ba71-058fbc01cf0b Cunning Turn Your Back
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Treat Warm-up
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Arm Pointing
ce2609c4-7144-11e5-ba71-058fbc01cf0b Communication Foot Pointing
ce260c1c-7144-11e5-ba71-058fbc01cf0b Shaker Game Shaker Warm-Up
ce3fd48a-7144-11e5-ba71-058fbc01cf0b Empathy Yawn Game
ce3fd48a-7144-11e5-ba71-058fbc01cf0b Empathy Eye Contact Warm-up

Question: How would you retrieve 10 rows of data from the activity_type, created_at, and updated_at fields of the site_activities table, starting at row 50? What do you notice about the created_at and updated_at fields?

In [53]:
%%sql
SELECT activity_type, created_at,updated_at
FROM site_activities LIMIT 10 OFFSET 49
10 rows affected.
Out[53]:
activity_type created_at updated_at
point_in_cat 2013-07-25 20:55:08 2013-07-25 20:55:08
point_in_cat 2013-07-25 20:55:07 2013-07-25 20:55:07
point_in_cat 2013-07-25 20:55:50 2013-07-25 20:55:50
point_in_cat 2013-07-25 20:56:09 2013-07-25 20:56:09
point_in_cat 2013-07-25 20:56:21 2013-07-25 20:56:21
point_in_cat 2013-07-25 21:00:31 2013-07-25 21:00:31
point_in_cat 2013-07-25 21:02:29 2013-07-25 21:02:29
point_in_cat 2013-07-25 21:02:31 2013-07-25 21:02:31
point_in_cat 2013-07-25 21:02:45 2013-07-25 21:02:45
point_in_cat 2013-07-25 21:05:41 2013-07-25 21:05:41

Question: How would you retrieve 20 rows of data from all the columns in the users table, starting from row 2000? What do you notice about the free_start_user field?

In [54]:
%%sql
SELECT *
FROM users LIMIT 1999, 20
20 rows affected.
Out[54]:
sign_in_count created_at updated_at max_dogs membership_id subscribed exclude free_start_user last_active_at membership_type user_guid city state zip country utc_correction
1 2013-02-21 23:19:08 2015-01-28 20:51:53 1 1 1 None None None 1 ce245818-7144-11e5-ba71-058fbc01cf0b None None None None None
3 2013-02-21 23:25:03 2015-01-28 20:51:53 1 2 1 None None None 2 ce24589a-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 01:21:20 2015-01-28 20:51:53 1 1 1 None None None 1 ce245a3e-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 03:05:42 2015-01-28 20:51:53 1 2 1 None None None 2 ce245c50-7144-11e5-ba71-058fbc01cf0b None None None None None
5 2013-02-22 03:29:06 2015-01-28 20:51:53 1 1 1 None None 2014-06-23 23:31:28 1 ce245d7c-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 15:05:13 2015-01-28 20:51:53 1 1 1 None None None 1 ce245eda-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 16:30:17 2015-01-28 20:51:53 1 2 1 None None None 2 ce245f98-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 18:16:05 2015-01-28 20:51:53 1 1 0 None None None 1 ce24611e-7144-11e5-ba71-058fbc01cf0b None None None None None
3 2013-02-22 20:26:53 2015-01-28 20:51:53 1 1 1 None None None 1 ce24629a-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-22 23:14:40 2015-01-28 20:51:54 2 2 1 None None None 2 ce2465ec-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-23 13:05:38 2015-01-28 20:51:54 1 1 1 None None None 1 ce246772-7144-11e5-ba71-058fbc01cf0b None None None None None
3 2013-02-23 16:56:55 2015-01-28 20:51:54 2 2 1 None None None 2 ce24688a-7144-11e5-ba71-058fbc01cf0b None None None None None
3 2013-02-23 21:07:41 2015-01-28 20:51:54 1 2 1 None None None 2 ce246b28-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-23 21:22:09 2015-01-28 20:51:54 1 1 1 None None None 1 ce246b82-7144-11e5-ba71-058fbc01cf0b None None None None None
4 2013-02-23 22:48:15 2015-01-28 20:51:54 1 2 1 None None 2014-08-04 14:16:34 2 ce246c40-7144-11e5-ba71-058fbc01cf0b None None None None None
4 2013-02-24 00:14:21 2015-01-28 20:51:54 2 2 0 None None None 2 ce246dbc-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-24 07:34:32 2015-01-28 20:51:54 1 1 0 None None None 1 ce246e84-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-24 14:48:43 2015-01-28 20:51:54 1 1 1 None None None 1 ce246ff6-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-24 15:34:10 2015-01-28 20:51:54 1 1 1 None None None 1 ce247050-7144-11e5-ba71-058fbc01cf0b None None None None None
1 2013-02-24 15:36:19 2015-01-28 20:51:54 1 1 0 None None None 1 ce2470aa-7144-11e5-ba71-058fbc01cf0b None None None None None




comments powered by Disqus