Structured Query Language (known as SQL) is a programming language used to interact with a database. Specifically, SQL is used to interact with a Relational Database Management System (RDBMS). In an RDBMS, related tables are connected with relationships that link matching columns known as keys.
SQL is almost certainly the most widely used tool across BI, data science, and computer science and provides analysts with an exceptionally transferable skillset that also helps them understand the logical nature of data. Using SQL, analysts can read data from the database for analysis. More advanced users also use SQL to write, edit, and delete data that is stored.
Commonly Used SQL Statements
SQL queries are written using a series of English-sounding statements which are designed to make it easy to understand. The following is a list of the most commonly used SQL commands, used in various combinations to define what data should be extracted from a database, and how it should be presented and summarized.
FROM & JOIN: FROM and JOIN are used to define where the data should be sourced in the database. The FROM statement tells the database FROM which table the main data should be extracted. The JOIN statement is used to connect additional data to the main data table.
WHERE: The WHERE clause is used to filter the extracted rows using a logical condition. Only the rows WHERE the logical condition is met are kept.
GROUP BY: GROUP BY is used to consolidate the filtered rows into buckets. For example, for a list of transactions, return the sales grouped by category.
HAVING: The HAVING statement is similar to the WHERE statement, in that is it used to filter the data. The difference is that HAVING is used to filter the grouped rows while WHERE is used to filter the original data.
SELECT: The SELECT statement is used to define which of the columns should be returned as part of the final query.
ORDER BY: Finally, the ORDER BY statement is used to present the finished rows in a specified order, according to the values in one or more columns.
SQL Order of Operations
The above SQL statements are listed in the order in which they are executed by the database. It is the logical order, known as the SQL Order of Execution. However, when analysts write SQL code, they write it in a slightly different order, which typically makes it easier to explain the query in words. The primary difference is that the SELECT statement is written first.
SELECT the following columns
FROM the following tables
WHERE the following conditions are met
GROUP the rows BY these buckets
And only keep the groups HAVING the following attributes
Finally, ORDER the results BY the following columns
Example SQL Queries
Below is an example SQL query used to extract the sum of sales for each customer from a table called FactInternetSales. In addition a couple of conditions must be met; order year must be 2020 or greater, and the total sales of each returned customer must be greater than 10,000. Once the data has been aggregated, the customers and their sales are presented in descending order of total sales.
CustomerKey AS CustomerID,
SUM(SalesAmount) AS SalesAmount
WHERE YEAR(OrderDate) > 2020
GROUP BY CustomerKey
HAVING SUM(SalesAmount) > 10000
ORDER BY SalesAmount DESC
Modifying the Data in a Database
While most analysts will only ever read data from a database, there do exist many other SQL statements that can be used to create or modify the data in a database. These actions include creating new tables, inserting data, modifying data, removing data, and changing the structure of the tables. Generally, analysts will require more skill and experience before they are given the appropriate permissions to modify a database.
CREATE: The CREATE statement is used to create tables in a database. This statement can define the field names and field data types within a table. The create statement is also used to define the unique identities of the table using primary key constraints. It is also used to describe the relationships between tables by defining a foreign key.
ALTER: The ALTER statement is used to change the structure of a table in the database. This statement can be used to create a new column or to change the data type of an existing column.
INSERT: The INSERT statement is used to add records or rows to a table. The name of the table to insert records into is defined first. Then, the column names within the table are defined. Finally, the values of the records are defined to insert into these columns.
UPDATE: The UPDATE statement is used to modify records in a table. This changes the values of a specified subset of records held in the defined columns of the table. It is a good practice to filter rows using a WHERE clause when updating records. Otherwise, all records will be altered by the UPDATE statement.
DELETE:The DELETE statement is used to delete rows from a table based on criteria defined using a WHERE clause. The DELETE statement should be used carefully, as all deletion in a database is permanent. If a mistake is made using a DELETE statement, the database will need to be restored from a backup.
History of SQL
SQL was first introduced in a paper from 1970, “A Relational Model of Data for Large Shared Data Banks,” by Edgar F. Codd. Codd introduced relational algebra, which is used to define relationships between data tables. This is the theoretical foundation of SQL. The first implementation of SQL was developed by two researchers at IBM, Donald Chamberlin and Raymond Boyce.