In this article, we’ll look at one of the statements you’ll use a lot in SQL: the SELECT statement.
What is the SELECT Statement?
The SELECT statement in SQL is used to retrieve rows from a database table. It lets you specify which columns of the table to fetch and the criteria for rows. The data returned is called a result set and is displayed in your development tool or used by the application that ran the query.
It’s a popular statement and it’s the one I use the most. SELECT statements can be simple, where they can select a few columns from a small table. Or, they can be complicated with lots of columns and criteria.
We’ll look at the syntax of the SELECT statement in several different database vendors, explain the features of them, and see some examples.
Get a summary of the SELECT command in my SQL Cheat Sheets.
Syntax
The syntax of an SQL SELECT statement is mostly the same between different database vendors (Oracle, SQL Server, MySQL, PostgreSQL):
SELECT [ ALL | DISTINCT ] { select_list }
[ INTO new_table ]
FROM { table_source }
[ WHERE search_condition ]
[ GROUP BY group_expression ]
[ HAVING search_condition ]
[ ORDER BY order_clause ]
The symbols in this syntax are:
- {} are mandatory, such as the select_list
- | are options, so you can specify either ALL or DISTINCT
- [] are optional parameters, such as WHERE
This syntax shows some of the more common and simpler features of the SELECT statement. There’s a lot more that the SELECT statement can do, such as WITH clauses and subqueries, which I’ve written about elsewhere. For this article, we’ll just stick to the basics.
Also, if you’re interested in the full syntax, you can find the references for several database vendors here:
Let’s take a look at each of the keywords within the SELECT statement.
SELECT
The SELECT clause of the SELECT statement refers to the first line. It includes the word SELECT and anything that comes after the SELECT up until the next keyword, which is either INTO or FROM.
It’s where you specify what columns will be shown in the results.
Columns
The most common thing to put after the SELECT clause is one or more names of columns. This is where you specify which columns you want to show in the results of the SELECT query.
For example, to show the first and last name of all employees, your query would start like this:
SELECT first_name, last_name
...
Note: SQL is not case sensitive, but to stick to a consistent coding standard and to make the code more readable, I like to capitalise SQL keywords and lowercase many other words such as column names.
You can put more than just column names into the SELECT clause. You can also use:
- Expressions
- Static values
- Functions
- Subqueries
Expressions
An expression is a general term that refers to a combination of several pieces of information. It can include mathematical operators, combining more than one column, and other features of SQL.
For example, to combine the first and last name into a single column in your query result, you can write a query like this:
SELECT first_name || ' ' || last_name
...
This concatenates the first name, a space character, and a last name into a single field.
You can also use numbers:
SELECT salary * 0.2
...
This will show the salary column multiplied by 0.2.
Static Values
You can select static values or hardcoded values in your SELECT query if you like.
For example, this query will select the number 2:
SELECT 2
...
Or you can select a text value:
SELECT 'Test'
...
Why would you need to do this? It can be helpful when using UNION queries to label each result set based on which table it came from. You can read more about the UNION and UNION ALL keywords here.
Functions
Similar to expressions, you can also use SQL functions in the SELECT clause.
They can be simple functions such as COUNT:
SELECT COUNT(*)
...
Or they can be functions on individual columns:
SELECT first_name, LENGTH(first_name)
...
Or you can combine functions to get another value:
SELECT LENGTH(first_name) + LENGTH(last_name)
...
Functions in SQL can get quite complex, but they are pretty powerful. If you need the value from a function to be displayed in your results, then there’s a good chance you can do it using SQL’s built-in functions.
Subqueries
Finally, you can use subqueries in your SELECT statement. This is a more advanced feature and you probably won’t need to use it unless you’re writing a more complicated query.
Adding a subquery in the SELECT clause means that the result of the subquery is shown as a column value. One example of this is:
SELECT first_name, last_name,
(SELECT AVG(salary) FROM employee))
...
This shows the first and last name, along with the average salary for all employees. There are other ways to get this value, but this just demonstrates the ability to use a subquery in a SELECT clause.
I’ve written more about subqueries in this post.
INTO
The INTO clause is an optional clause in SQL Server that allows you to populate a table with the results of your query. This can be helpful during the development of a database or an application.
So, instead of doing this:
- Running a SELECT query
- Copying the data to another format (Excel, text file)
- Transforming the data to an SQL script to insert it
- Run the INSERT script
You can do this instead:
- Run a SELECT script directly into the table
I’ll cover more about this feature in another article as it deserves its own post.
FROM
The SQL FROM clause is where you specify where your data is to be retrieved from. This will often be a table in your database, but it can also be a view or a subquery.
The syntax of the FROM clause looks the same regardless of what you’re selecting from:
SELECT first_name, last_name
FROM employee;
This example selects data from one table: the employee table. We can’t actually tell if it’s a table or a view by looking at the name, but it doesn’t matter for the purpose of this query.
You can also use joins to SELECT data from multiple tables. I’ve written an entire post on joins, as they are a pretty big feature of SQL, but a basic query with a join looks like this:
SELECT first_name, last_name, department_name
FROM employee
INNER JOIN department ON employee.dept_id = department.id;
This query will show the employee’s first name and last name, as well as their department name. It uses an INNER JOIN to match records based on each table’s department ID.
WHERE
The WHERE clause of the SQL SELECT statement is where you determine which rows to return from your table or view. It’s an optional clause, so if you don’t specify a WHERE clause in your query, you’ll get all the records:
SELECT first_name, last_name
FROM employee;
To add a WHERE clause, you specify the WHERE keyword, then your criteria:
SELECT first_name, last_name
FROM employee
WHERE <criteria>;
The criteria is a rule that is checked against each row. If the criteria is true for a row, the row is included in the results. If it is false, the row is not included.
For example, this query shows all employees with a first_name of John.
SELECT first_name, last_name
FROM employee
WHERE first_name = 'John';
The WHERE clause can get quite complicated. You can have multiple criteria using the AND or OR keywords, you can use brackets to group criteria, you can have greater than and less than, partial matches, and much more.
I’ve written a more in-depth guide to the WHERE clause here.
GROUP BY
The SQL GROUP BY clause is used when you are using aggregate functions and want to specify what values you want to have those aggregate functions calculated by.
For example, this query will count all employees:
SELECT COUNT(*)
FROM employee;
This query will count employees in each department, using the GROUP BY clause:
SELECT dept_id, COUNT(*)
FROM employee
GROUP BY dept_id;
It’s an optional clause, but if you’re using an aggregate function like COUNT along with other columns, you’ll need to use it.
For more on the GROUP BY function, read this guide.
HAVING
The HAVING clause lets you filter a SELECT query that uses a GROUP BY clause based on the results after a GROUP BY.
It uses the same concept as a WHERE clause, except the WHERE clause filters data before the grouping, and HAVING filters data after the grouping.
For example, this query shows the count of employees in each department where there is more than 1 employee in the department.
SELECT dept_id, COUNT(*)
FROM employee
GROUP BY dept_id
HAVING COUNT(*) > 1;
You can read more about the HAVING clause here.
ORDER BY
The ORDER BY clause lets you specify the order you want the query results to be displayed in.
The database does not guarantee the order that the results are returned in. Even if it looks like there is an order (e.g. ordered by ID), if you run the same query at a later date, the order could change.
So, if you want your data in a specific order, you can use the ORDER BY clause:
SELECT first_name, last_name
FROM employee
ORDER BY last_name ASC;
You can order by one or more columns either in ascending or descending order. You can read all about the ORDER BY clause here.
Order of Processing for SELECT Statements
As you can see, there are many different clauses involved in a SELECT statement.
An important thing to know is that the SELECT statement is not processed in the order it is written. Databases process the clauses in a specific order. This is done for many reasons, such as looking up the table before looking up the columns in that table.
The order of operations or processing of a SELECT statement is:
- FROM
- ON (the JOIN feature)
- JOIN
- WHERE
- GROUP BY
- HAVING
- SELECT
- DISTINCT
- ORDER BY
There are a few points to notice here:
- The WHERE clause is processed before the SELECT clause. This means you can’t use a column alias in the WHERE clause, because the column alias is assigned in the SELECT clause, before the WHERE clause knows about it.
- The ORDER BY can use aliases because it’s the last thing to be processed.
- The SELECT clause is where the columns are determined and it’s calculated almost at the end of the process.
Conclusion
The SELECT statement is a common SQL statement. You’ll use it all the time. It can be as simple as selecting a few columns from a table, or it can include joins, functions, and grouping. You can also use subqueries and other more advanced features with it.