Introduction to MS SQL Server and T-SQL. T-SQL Syntax and Conventions


Programming in T-SQL

T-SQL Syntax and Conventions

Rules for generating identifiers

All objects in SQL Server have names (identifiers). Examples of objects are tables, views, stored procedures, etc. Identifiers can be up to 128 characters long and may contain letters, digits, and the characters _, @, $ and #.

The first character must be alphabetic (variables and temporary tables use special naming schemes). An object name cannot contain spaces and cannot be the same as a SQL Server reserved keyword, regardless of case. By enclosing an identifier in square brackets, characters that are otherwise prohibited can be used in object names.
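
For illustration, here is a small sketch of how square brackets allow otherwise prohibited names (the table and column names are made up for this example):

-- [Order Details] contains a space and SELECT is a reserved word,
-- so both must be enclosed in square brackets
CREATE TABLE [Order Details] ([Select] int, Qty int);
SELECT [Select], Qty FROM [Order Details];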

Completing the instructions

The ANSI SQL standard requires a semicolon at the end of every statement. In the T-SQL language, however, the semicolon is optional.

Comments

The T-SQL language allows two styles of comments: ANSI and C. The first of them begins with two hyphens and ends at the end of the line:

-- This is a one-line ANSI style comment

ANSI style comments can also be inserted at the end of the statement line:

SELECT CityName -- extracted columns

FROM City -- source table

WHERE IdCity = 1; -- row filter

The SQL editor can apply and remove comments on all selected lines. To do this, select the appropriate commands in the menu Edit or on the toolbar.

C-style comments begin with a slash and an asterisk (/*) and end with the same characters in reverse order (*/). This comment type is best used for commenting blocks of lines, such as headers or large test queries.

/*
multiline
comments
*/

One of the main advantages of C-style comments is that a multi-line query can be commented out (and later restored) by adding or removing just the two delimiters, without touching every line.
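
For example, the whole test query used above can be disabled and re-enabled by touching only the delimiter lines:

/*
SELECT CityName
FROM City
WHERE IdCity = 1;
*/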

T-SQL Batches

A query is a single T-SQL statement, and a batch is a set of them. The entire sequence of statements in a batch is sent to the server by the client application as a single unit.

SQL Server treats the entire batch as a unit of work. An error in even one statement makes it impossible to execute the whole batch. At the same time, parsing does not check object and schema names, since a schema may change by the time the statement is executed.

A SQL script file or Query Analyzer window can contain multiple batches. In that case the batches are separated by a terminator keyword. By default this keyword is GO, and it must be the only thing on its line. Any other characters on that line (even comments) invalidate the batch separator.
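
A minimal illustration, reusing the City table from the examples above - two batches separated by GO:

SELECT CityName FROM City WHERE IdCity = 1;
GO
SELECT COUNT(*) FROM City;
GO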

Debugging T-SQL

When the SQL editor encounters an error, it displays the nature of the error and the line number in the batch. By double-clicking on an error, you can jump directly to the corresponding line.

The Management Studio utility shipped with SQL Server 2005 does not include a T-SQL debugger - one is present in the Visual Studio package.

SQL Server offers several commands that make debugging batches easier. In particular, the PRINT command sends a message to the client without generating a result set and can be used to track the progress of a batch. With the Query Analyzer in grid mode, run the following batch:

SELECT CityName

FROM City

WHERE IdCity = 1;

PRINT 'Check Point';

The resulting dataset will be displayed in a grid and will consist of one row. At the same time, the following result will be displayed in the Messages tab:

(1 row(s) affected)

Check Point

Variables

T-SQL variables are created using the DECLARE command, which has the following syntax:

DECLARE @Variable_Name Data_Type [,

@Variable_Name Data_Type, ...]

All local variable names must begin with the @ symbol. For example, to declare a local variable @UStr that stores up to 16 Unicode characters, you can use the following statement:

DECLARE @UStr nvarchar(16)

The data types used for variables are exactly the same as those available for table columns. One DECLARE command can list several variables separated by commas. Specifically, the following example creates two integer variables, a and b:

DECLARE

@a int,

@b int

The scope of a variable (i.e. its lifetime) extends only to the current batch. By default, newly created variables contain NULL and must be initialized before they are used in expressions.
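
A small sketch of this behavior:

DECLARE @i int;   -- contains NULL until assigned
SELECT @i;        -- returns NULL
SET @i = 0;       -- initialize before using the variable in expressions
SELECT @i + 1;    -- now returns 1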

Setting Variable Values

There are currently two ways to set the value of a variable in SQL - you can use either the SELECT or the SET statement. In terms of functionality these statements work almost identically, except that SELECT also allows the assigned value to be obtained from a table referenced in the SELECT statement.

The SET statement is typically used to set variable values in the form more common in procedural languages. Typical examples of its use are:

SET @a = 1;

SET @b = @a * 1.5

Note that all of these statements perform direct assignments, using either explicit values or other variables. You cannot use SET to assign a variable the result of a bare query; the query must either be executed separately or be wrapped in parentheses as a subquery. For example, an attempt to execute the following statement causes an error:

DECLARE @c int

SET @c = COUNT (*) FROM City

SELECT @c

and the following statement executes quite successfully:

DECLARE @c int

SET @c = (SELECT COUNT (*) FROM City)

SELECT @c

The SELECT statement is typically used to assign values ​​to variables when the source of the information to be stored in the variable is a query. For example, the actions performed in the code above are much more commonly implemented using a SELECT statement:

DECLARE @c int

SELECT @c = COUNT (*) FROM City

SELECT @c

Note that this code is a little clearer (in particular, more concise), although it performs the same actions.

Thus, the following generally accepted convention for the use of the two statements can be formulated:

  • The SET statement is used for a simple assignment, i.e. when the value being assigned is already given explicitly as a specific value or as another variable.

  • The SELECT statement is used when the assignment of a value to a variable must be based on a query.

Using Variables in SQL Queries

One of the useful properties of T-SQL is that variables can be used in queries without building complex dynamic strings that embed values into the program code. Dynamic SQL still exists, but a single value can be changed more easily with a variable.

Wherever an expression can be used in a query, a variable can also be used. The following example demonstrates the use of a variable in a WHERE clause:

DECLARE @IdProd int ;

SET @IdProd = 1;

SELECT *

FROM Product

WHERE IdProd = @IdProd;

Global system variables

SQL Server has over thirty parameterless global variables that are defined and maintained by the system. All global variables are prefixed with two @ characters. You can retrieve the value of any of them with a simple SELECT query, as in the following example:

SELECT @@CONNECTIONS

This uses the @@CONNECTIONS global variable to retrieve the number of connections made to SQL Server since it was last started.

Among the most commonly used system variables are the following:

  • @@ERROR - Contains the number of the error that occurred while executing the last T-SQL statement on the current connection, or 0 if no error was detected. The value of this system variable is reset after each successive statement, so if you want to keep it, copy it into a local variable immediately after the statement whose error code you need (see the sketch after this list).
  • @@IDENTITY - Contains the last identity value inserted into the database by the most recent INSERT statement. If the last INSERT did not produce an identity value, @@IDENTITY contains NULL - even when the missing identity value was caused by a failure during execution of the statement. If multiple rows are inserted by a single statement, only the last identity value is stored in this system variable.
  • @@ROWCOUNT - One of the most widely used system variables; it returns the number of rows affected by the last statement. It is typically used to detect problems other than run-time errors. For example, if a program discovers that a DELETE with a WHERE clause affected zero rows, it can conclude that something unexpected happened and raise an error message manually.
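
A minimal sketch of capturing these values (the IdCity value 999 is hypothetical):

DECLARE @Err int, @Rows int;

DELETE FROM City WHERE IdCity = 999;          -- hypothetical row
SELECT @Err = @@ERROR, @Rows = @@ROWCOUNT;    -- capture immediately: the next statement resets them

IF @Rows = 0
    PRINT 'Nothing was deleted - check the WHERE condition';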

! Note that since SQL Server 2000 global variables have officially been called functions. The word "global" confused users into thinking that the scope of such variables was wider than that of local ones; global variables were often mistakenly credited with the ability to store information across batches, which, of course, was never true.

Command flow controls. Software constructs

The T-SQL language provides most of the classic procedural means of controlling program flow, including conditional statements and loops.

The IF...ELSE statement

IF...ELSE statements work in T-SQL basically the same way as in any other programming language. The general syntax of the statement is as follows:

IF boolean_expression
    { SQL_statement | BEGIN SQL_statement_block END }
[ELSE
    { SQL_statement | BEGIN SQL_statement_block END }]

Almost any expression that returns a Boolean value can be used as the condition.

It should be taken into account that only the statement that immediately follows the IF statement (closest to it) is considered to be executed according to the condition. Instead of one statement, you can provide conditional execution of several statements by combining them into a block of code using the BEGIN...END construct.

In the example below the IF condition evaluates to false, so the statement immediately following it is not executed; the second PRINT is not controlled by the IF and runs anyway.

IF 1 = 0

PRINT 'First line'

PRINT 'Second line'

The optional ELSE command allows you to specify a statement that will be executed if the IF condition is not satisfied. Like IF, the ELSE statement controls only the immediately following command or block of code between BEGIN...END.

Although the IF statement appears limited, its condition can use powerful constructs similar to the WHERE clause, in particular IF EXISTS().

The IF EXISTS() construct uses the presence of any row returned by a SELECT statement as its condition. Since any row will do, the column list in that SELECT can be replaced with an asterisk. This method is faster than checking @@ROWCOUNT > 0 because it does not require counting the total number of rows: as soon as a single row satisfies the IF EXISTS() condition, the query can continue executing.

The following example uses IF EXISTS to check whether the customer with ID 1 has any orders before removing that customer from the database. If at least one order exists for this customer, the deletion is not performed.

IF EXISTS (SELECT * FROM [Order] WHERE IdCust = 1)   -- the orders table name is assumed here
    PRINT 'The customer cannot be deleted because there are records associated with it in the database'
ELSE
BEGIN
    DELETE FROM Customer WHERE IdCust = 1            -- the customers table name is assumed here
    PRINT 'Removal completed successfully'
END

The WHILE, BREAK and CONTINUE statements

The WHILE statement in SQL works in much the same way as in other languages a programmer usually works with. Essentially, it checks a condition before each iteration of the loop: if the check returns TRUE, the loop body is executed; otherwise execution of the statement ends.

The WHILE statement has the following syntax:

WHILE boolean_expression
    { SQL_statement | BEGIN SQL_statement_block END }

Of course, a WHILE statement can be used to execute just one statement in the loop (similar to how IF is typically used), but in practice a WHILE statement that is not followed by a BEGIN...END block is rare.

The BREAK statement allows you to immediately exit the loop without having to wait until the end of the loop is completed and the conditional expression is retested.

The CONTINUE statement interrupts a single iteration of the loop. Briefly, it transfers control back to the beginning of the WHILE loop: as soon as CONTINUE is encountered, regardless of where it is located, the condition is re-evaluated (and if it is no longer TRUE, the loop exits).

The following short script demonstrates the use of the WHILE statement to create a loop:

DECLARE @Temp int ;

SET @Temp = 0;

WHILE @Temp < 3

BEGIN

PRINT @Temp;

SET @Temp = @Temp + 1;

END

Here the integer variable @Temp is incremented from 0 inside the loop, and its value (0, 1, 2) is printed on each iteration until the condition @Temp < 3 fails.
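
As a sketch (not from the original text), the same loop can be extended with CONTINUE and BREAK:

DECLARE @Temp int;
SET @Temp = 0;
WHILE @Temp < 10
BEGIN
    SET @Temp = @Temp + 1;
    IF @Temp % 2 = 0
        CONTINUE;      -- skip even values and re-check the condition
    IF @Temp > 5
        BREAK;         -- leave the loop entirely
    PRINT @Temp;       -- prints 1, 3, 5
END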

The RETURN statement

The RETURN statement is used to stop the execution of a batch, and therefore of a stored procedure or trigger (covered in future labs).
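
A small sketch of RETURN cutting a batch short, using the City table from the earlier examples:

IF NOT EXISTS (SELECT * FROM City WHERE IdCity = 1)
    RETURN;            -- nothing to do, stop the batch here
PRINT 'The city exists, processing continues';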

What follows is a translated article on writing efficient T-SQL code.

It's not enough to write code that's easy to read: it also needs to run fast.

There are three basic rules to write T-SQL code that will work well. They are cumulative - following all these rules will have a positive impact on the code. Omitting or changing any of these will likely have a negative impact on the performance of your code.

  • Write based on the data storage structure: if you store datetime data, use datetime, not varchar or anything else.
  • Write with indexes in mind: If your table has indexes, and should have them, write your code so that it can take full advantage of those indexes. Make sure that the clustered index, of which there can only be one per table, is used in the most efficient manner.
  • Write in a way that helps the query optimizer: the query optimizer is a wonderful part of a DBMS. Unfortunately, you can make its work very difficult by writing a query that is hard for it to parse, for example one containing nested views - where one view selects from another, which selects from a third, and so on. Take the time to understand how the optimizer works and write queries in a way that helps it rather than hinders it.
There are a few common mistakes people make in their T-SQL code - don't make them.

Using the wrong data types

In theory, avoiding this mistake is very simple, but in practice it happens quite often. Say you use some data type in your database - then use the same type in your parameters and variables! Yes, I know SQL Server can implicitly cast one data type to another. But when an implicit type conversion occurs, or when you yourself cast a column to another type, you are performing the conversion on the entire column. When that conversion happens on a column in a WHERE clause or a join condition, you will always see a scan. An excellent index may exist on that column, but since you are applying a CAST to the stored values - for example, comparing a date stored in the column with a char value used in the condition - the index will not be used.

Don't believe me? Let's look at this query:

SELECT e.BusinessEntityID, e.NationalIDNumber FROM HumanResources.Employee AS e WHERE e.NationalIDNumber = 112457891;
Well written and very simple. It must be covered by an index created on this table. But here's the execution plan:

This query is fast enough, and the table small enough, that only four reads are required to scan the index. Note the small exclamation point on the SELECT operator. If we look at its properties, we will see:

Right. This is a warning (new in SQL Server 2012) that a type conversion is taking place that affects the execution plan. In short, it appears because the query uses the wrong data type. Here is the corrected query:

SELECT e.BusinessEntityID, e.NationalIDNumber FROM HumanResources.Employee AS e WHERE e.NationalIDNumber = '112457891';
And we get this query execution plan:

And here only two read operations are used, instead of four. And yes, I understand that I made an already fast-executing query a little faster. But what would happen if there were millions of rows stored in the table? Yeah, then I would become a hero.

Use the correct data types.

Using Functions in Join Conditions and WHERE Clauses

Speaking of functions: most functions used in join conditions or WHERE clauses with a column passed as an argument prevent the proper use of indexes. You will see how much slower queries become when functions take columns as arguments. For example:

SELECT a.AddressLine1, a.AddressLine2, a.City, a.StateProvinceID FROM Person.Address AS a WHERE '4444' = LEFT(a.AddressLine1, 4);
This function, LEFT, takes a column as an argument, which results in this execution plan:

This results in 316 reads to find the desired data, and this takes 9 milliseconds (I have very fast drives). This is because '4444' must be compared against every string returned by this function. SQL Server can't even just scan the table, it needs to do a LEFT on every row. However, you can do something like this:

SELECT a.AddressLine1, a.AddressLine2, a.City, a.StateProvinceID FROM Person.Address AS a WHERE a.AddressLine1 LIKE '4444%';
And now we see a completely different execution plan:

The query requires 3 reads and 0 milliseconds to complete - well, let it be 1 millisecond, for objectivity. This is a huge performance boost, all because I rewrote the condition in a form that allows an index search (this property used to be called "sargeable": SARG = Search ARGument; if an expression is SARGable, a column can appear in it and an Index Seek can still be used; if it is not, an Index Scan will always be used - translator's note). In any case, do not use functions in WHERE clauses or search conditions, or use only those that can participate in index search conditions.

Using Multi-statement UDF

"Multi-statement UDF" is rendered in the Russian edition of MSDN roughly as "user-defined functions consisting of several statements", which sounds odd to me, so in the title and throughout the text I avoid translating the term - translator's note.

Essentially, they are a trap. At first glance, this wonderful mechanism lets us use T-SQL as a real programming language: you can create these functions, call them from one another, and reuse the code - not like those old stored procedures. It is amazing - until you try to run that code on a large amount of data.

The problem with these functions is that they are built on table variables. Table variables are a very nice thing when used as intended. They have one obvious difference from temporary tables: no statistics are built on them. That difference can be very useful - or it can kill you. Without statistics, the optimizer assumes that any query against a table variable or UDF returns exactly one row. One (1) row. That is fine as long as they really do return only a row or two. But one day they will return hundreds or thousands of rows, and you will decide to join one UDF to another... Performance will drop very, very quickly and very, very far.

The example is quite large. Here are some UDFs:

CREATE FUNCTION dbo.SalesInfo ()
RETURNS @return_variable TABLE
    (SalesOrderID INT, OrderDate DATETIME, SalesPersonID INT,
     PurchaseOrderNumber dbo.OrderNumber, AccountNumber dbo.AccountNumber,
     ShippingCity NVARCHAR(30))
AS
BEGIN;
    INSERT INTO @return_variable
        (SalesOrderID, OrderDate, SalesPersonID, PurchaseOrderNumber, AccountNumber, ShippingCity)
    SELECT soh.SalesOrderID, soh.OrderDate, soh.SalesPersonID,
           soh.PurchaseOrderNumber, soh.AccountNumber, a.City
    FROM Sales.SalesOrderHeader AS soh
    JOIN Person.Address AS a ON soh.ShipToAddressID = a.AddressID;
    RETURN;
END;
GO
CREATE FUNCTION dbo.SalesDetails ()
RETURNS @return_variable TABLE
    (SalesOrderID INT, SalesOrderDetailID INT, OrderQty SMALLINT, UnitPrice MONEY)
AS
BEGIN;
    INSERT INTO @return_variable (SalesOrderID, SalesOrderDetailID, OrderQty, UnitPrice)
    SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.OrderQty, sod.UnitPrice
    FROM Sales.SalesOrderDetail AS sod;
    RETURN;
END;
GO
CREATE FUNCTION dbo.CombinedSalesInfo ()
RETURNS @return_variable TABLE
    (SalesPersonID INT, ShippingCity NVARCHAR(30), OrderDate DATETIME,
     PurchaseOrderNumber dbo.OrderNumber, AccountNumber dbo.AccountNumber,
     OrderQty SMALLINT, UnitPrice MONEY)
AS
BEGIN;
    INSERT INTO @return_variable
        (SalesPersonID, ShippingCity, OrderDate, PurchaseOrderNumber, AccountNumber, OrderQty, UnitPrice)
    SELECT si.SalesPersonID, si.ShippingCity, si.OrderDate,
           si.PurchaseOrderNumber, si.AccountNumber, sd.OrderQty, sd.UnitPrice
    FROM dbo.SalesInfo() AS si
    JOIN dbo.SalesDetails() AS sd ON si.SalesOrderID = sd.SalesOrderID;
    RETURN;
END;
GO
Great structure. It allows you to create very simple queries. Well, for example, here:

SELECT csi.OrderDate, csi.PurchaseOrderNumber, csi.AccountNumber, csi.OrderQty, csi.UnitPrice
FROM dbo.CombinedSalesInfo() AS csi
WHERE csi.SalesPersonID = 277
  AND csi.ShippingCity = 'Odessa';
One, very simple request. Here is its execution plan, also very simple:

But it runs for 2.17 seconds, returns 148 rows, and performs 1456 reads. Note that the function has zero cost, and only the scan of the table variable contributes to the cost of the query. Hmm, is that really so? Let's see what is hidden behind the zero-cost UDF operator. This query retrieves the function's execution plan from the cache:

SELECT deqp.query_plan, dest.text,
       SUBSTRING(dest.text, (deqs.statement_start_offset / 2) + 1,
                 (deqs.statement_end_offset - deqs.statement_start_offset) / 2 + 1) AS actualstatement
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_query_plan(deqs.plan_handle) AS deqp
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
WHERE deqp.objectid = OBJECT_ID('dbo.CombinedSalesInfo');
And here's what's really happening there:

Wow, looks like there are a few more of these little functions and table scans lurking around here that are almost, but not quite, worth nothing. Plus the Hash Match join operator, which writes to tempdb and has a considerable cost to execute. Let's look at the execution plan of another UDF:

Here! Now we see a Clustered Index Scan that reads a large number of rows. That is no longer so great. Altogether, the UDF looks less and less attractive in this situation. What if we - well, I don't know - just tried to access the tables directly? Like this, for example:

SELECT soh.OrderDate, soh.PurchaseOrderNumber, soh.AccountNumber, sod.OrderQty, sod.UnitPrice
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
JOIN Person.Address AS ba ON soh.BillToAddressID = ba.AddressID
JOIN Person.Address AS sa ON soh.ShipToAddressID = sa.AddressID
WHERE soh.SalesPersonID = 277
  AND sa.City = 'Odessa';
Running this query, we get exactly the same data, but in just 310 milliseconds instead of 2170, and SQL Server performs only 911 reads instead of 1456. Honestly, it is very easy to run into performance problems with UDFs.

Enabling the Work Faster Setting: Using Dirty Reads

Going back to old computers with 286 processors, you may remember that for a number of reasons they had a "Turbo" button on the front panel. If you accidentally popped it out, the computer instantly slowed to a crawl. So you understood that some things should always be switched on to get maximum throughput. In the same way, many people treat the READ_UNCOMMITTED isolation level and the NOLOCK hint as a turbo button for SQL Server: use them, and almost any query - and the system as a whole - supposedly becomes faster, because no locks are taken or checked while reading. Fewer locks - faster results. But...

When you use READ_UNCOMMITTED or NOLOCK in your queries, you get dirty reads. Everyone understands that this means you can read "dog" instead of "cat" if an update is in progress but not yet committed. But you can also get more or fewer rows than actually exist, as well as duplicate rows, because data pages can move around while your query runs and you take no locks to prevent it. I don't know about you, but most companies I have worked for expected most queries on most systems to return consistent data: the same query with the same parameters against the same data should return the same result. Not so if you use NOLOCK. To convince yourself of this, I recommend reading this post.
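
For reference, a sketch of what this "turbo" code usually looks like (the City table is reused from the earlier examples):

-- dirty reads via a table hint: fast, but may return rows that are being modified
SELECT CityName FROM City WITH (NOLOCK);

-- the same effect for the whole session
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT CityName FROM City;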

Unreasonable use of hints in queries

People are too quick to decide to use hints. The most common situation is when a hint helps solve a very rare problem on one of the requests. But when people see a significant performance gain on this query... they immediately start pushing it everywhere.

For example, many people believe that a LOOP JOIN is the best way to join tables. They reach this conclusion because it is most common in small, fast queries. So they decide to force SQL Server to use LOOP JOIN. That is not difficult at all:

SELECT s.Name AS StoreName, p.LastName + ', ' + p.FirstName
FROM Sales.Store AS s
JOIN Sales.SalesPerson AS sp ON s.SalesPersonID = sp.BusinessEntityID
JOIN HumanResources.Employee AS e ON sp.BusinessEntityID = e.BusinessEntityID
JOIN Person.Person AS p ON e.BusinessEntityID = p.BusinessEntityID
OPTION (LOOP JOIN);
This query runs for 101 milliseconds and makes 4115 reads. In general, not bad, but if we remove this hint, the same query will execute in 90 milliseconds and produce only 2370 reads. The more loaded the system is, the more obvious the effectiveness of the request without using a hint will be.

Here's another example. People often create an index on a table expecting it to solve a problem. So we have a query:

SELECT * FROM Purchasing.PurchaseOrderHeader AS poh WHERE poh.PurchaseOrderID * 2 = 3400;
The problem again is that when you perform a column transformation, no index will be adequately used. Performance degrades because the clustered index is scanned. So, when people see that their index is not being used, this is what they do:

SELECT * FROM Purchasing.PurchaseOrderHeader AS poh WITH (INDEX (PK_PurchaseOrderHeader_PurchaseOrderID)) WHERE poh.PurchaseOrderID * 2 = 3400;
And now they get a scan of their chosen index, not the clustered one, so the index is "in use", right? But the performance of the query changes - now instead of 11 read operations, 44 are performed (the execution time for both is about 0 milliseconds, since I have really fast disks). “To be used” means it is used, but not at all as intended. The solution to this problem is to rewrite the query like this:

SELECT * FROM Purchasing.PurchaseOrderHeader poh WHERE PurchaseOrderID = 3400 / 2;
Now the number of reads has dropped to two, because an index seek is used - the index is being used correctly.

Hints in queries should always be applied last, after all other possible options have been tried and have not produced a positive result.

Using ‘Row by Agonizing Row’ processing

Row-by-row processing is done with cursors or WHILE-loop operations instead of set-based operations, and its performance is very, very low. Cursors are typically used for two reasons: developers who are used to row-by-row processing in their code, and developers who came from Oracle and believe that cursors are a good thing. Whatever the reason, cursors kill performance.

Here is a typical example of bad cursor usage. We need to update the color of products selected by a certain criterion. It's not made up - it's based on code that I once had to optimize.

BEGIN TRANSACTION
DECLARE @Name NVARCHAR(50),
        @Color NVARCHAR(15),
        @Weight DECIMAL(8, 2)
DECLARE BigUpdate CURSOR FOR
    SELECT p.Name, p.Color, p.Weight
    FROM Production.Product AS p;
OPEN BigUpdate;
FETCH NEXT FROM BigUpdate INTO @Name, @Color, @Weight;
WHILE @@FETCH_STATUS = 0
BEGIN
    IF @Weight < 3
    BEGIN
        UPDATE Production.Product
        SET Color = 'Blue'
        WHERE CURRENT OF BigUpdate
    END
    FETCH NEXT FROM BigUpdate INTO @Name, @Color, @Weight;
END
CLOSE BigUpdate;
DEALLOCATE BigUpdate;
SELECT * FROM Production.Product AS p WHERE Color = 'Blue';
ROLLBACK TRANSACTION
In each iteration, we perform two read operations, and the number of products that meet our criteria is in the hundreds. On my machine, with no load, the execution time is over a second. This is completely unacceptable, especially since it is very simple to rewrite this query:

BEGIN TRANSACTION
UPDATE Production.Product
SET Color = 'Blue'
WHERE Weight < 3;
ROLLBACK TRANSACTION
Now only 15 reads are performed and the execution time is only 1 millisecond. Do not laugh. People often write code like this and even worse. Cursors are something that should be avoided and only used where you can't do without them - for example, in maintenance tasks where you need to "run" through different tables or databases.

Inappropriate use of nested views

Views referencing views joined to views referencing other views joined to views... A view is just a query. But since views can be treated like tables, people start thinking of them as tables - in vain. What happens when you join one view to another, which refers to a third, and so on? You simply create a hellishly complicated query plan. The optimizer will try to simplify it; it will try plans that do not use all of the tables, but its time for choosing a plan is limited, and the more complex the plan it is given, the less likely it is to end up with a reasonably simple execution plan. Performance problems then become almost inevitable.

For example, here is a sequence of simple queries defining views:

CREATE VIEW dbo.SalesInfoView
AS
SELECT soh.SalesOrderID, soh.OrderDate, soh.SalesPersonID,
       soh.PurchaseOrderNumber, soh.AccountNumber, a.City AS ShippingCity
FROM Sales.SalesOrderHeader AS soh
JOIN Person.Address AS a ON soh.ShipToAddressID = a.AddressID;
GO
CREATE VIEW dbo.SalesDetailsView
AS
SELECT sod.SalesOrderID, sod.SalesOrderDetailID, sod.OrderQty, sod.UnitPrice
FROM Sales.SalesOrderDetail AS sod;
GO
CREATE VIEW dbo.CombinedSalesInfoView
AS
SELECT si.SalesPersonID, si.ShippingCity, si.OrderDate,
       si.PurchaseOrderNumber, si.AccountNumber, sd.OrderQty, sd.UnitPrice
FROM dbo.SalesInfoView AS si
JOIN dbo.SalesDetailsView AS sd ON si.SalesOrderID = sd.SalesOrderID;
GO
Here the author forgot to include the query itself, but he gives it in the comments (translator's note):
SELECT csi.OrderDate FROM dbo.CombinedSalesInfoView AS csi WHERE csi.SalesPersonID = 277
As a result, our query takes 155 milliseconds to complete and uses 965 reads. Here is its execution plan:

Looks good, especially since we get 7000 rows, so everything seems to be in order. But what if we try to run a query like this:

SELECT soh.OrderDate FROM Sales.SalesOrderHeader AS soh WHERE soh.SalesPersonID = 277 ;
Now the query runs in 3 milliseconds and uses 685 reads - quite a difference. And here is its execution plan:

As you can see, the optimizer cannot discard all the extraneous tables as part of simplifying the query. Therefore the first execution plan contains two extra operations - an Index Scan and a Hash Match that gathers the data together. You could have saved SQL Server that extra work by writing the query without views. And remember: this example is very simple; most real-life queries are far more complex and lead to far bigger performance problems.

There is a small dispute in the comments to this article: Grant (the author) apparently ran his queries not against the standard AdventureWorks database but against a similar one with a slightly different structure, which is why the "suboptimal" execution plan of the query in the last section differs from what you will see if you repeat the experiment yourself. - translator's note

SQL syntax

This section describes the main differences in the syntax of the SQL language used by the Firebird DBMS and MS SQL.

Both Firebird and MS SQL can refer to database objects (tables, fields, etc.) directly by name if the names contain no spaces or other characters that are not allowed in a direct reference (for example, non-Latin letters). To use spaces and such characters, MS SQL uses square brackets, [ and ], while Firebird uses double quotes, ". Another difference is that MS SQL can use a schema to refer to an object: database_object_owner.object. Firebird does not allow this notation.

Attention

The MS SQL DBMS uses case-sensitive object names if a case-sensitive collation was chosen during installation; otherwise object names are case-insensitive. Funny? Not really...

Clue

The MS SQL DBMS is capable of working with identifiers whose names are enclosed in double quotes, but by default this feature is available only when accessed via OLE DB and ODBC, but not when accessed via DB-Library. For this reason, this work practice should be avoided.

DBMS MS SQL 7 and higher supports updatable joins (update, delete, insert). The Firebird DBMS does not recognize this syntax.

The data types vary, of course. Although both DBMSs have a common subset of the most commonly used types. This issue rarely causes problems when migrating a database.

Built-in functions vary. Most of the built-in functions of the MS SQL DBMS can be replaced in the Firebird DBMS by using user-defined functions (UDFs).

The formats of string constants for dates also differ. Firebird accepts strings in various formats, regardless of the platform used. MS SQL, in turn, uses a combination of server-independent formats, server platform settings and client connection configuration. In addition, MS SQL access methods typically add one or two more layers in which a string constant may be converted to a date in one way or another.
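
As an illustration for the MS SQL side, the unseparated ISO format is interpreted the same way under any language/date settings, while other strings depend on them (a sketch, not from the original text):

SELECT CAST('20150408' AS datetime);            -- yyyymmdd: safe under any settings
SELECT CONVERT(datetime, '08.04.2015', 104);    -- style 104 = dd.mm.yyyy, stated explicitly
SELECT CAST('04/08/2015' AS datetime);          -- depends on SET DATEFORMAT / language settings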

In the MS SQL DBMS you can define a larger number of environment variables than in the Firebird DBMS, but the most common ones can also be found in the Firebird DBMS (retrieving the identifier and user name). The only important variable that is missing in the Firebird DBMS is a variable that returns the number of rows of the last operation (from version 1.5 of the Firebird DBMS such a variable was introduced - approx. transl.).

An important difference was that the Firebird 1.0 DBMS did not support the CASE statement of the MS SQL DBMS. Sometimes it was possible to replace its functionality by using a stored procedure. Starting from version 1.5, the Firebird DBMS supports the use of the CASE operator.

A small difference between the DBMS is that the MS SQL DBMS does not use delimiters for statements, which can be a source of difficult-to-detect errors during the transition, especially when using many parentheses. The Firebird DBMS in scripts requires ending each statement with a semicolon (unless another delimiter is defined - approx. transl.), so errors are easier to detect.

Both MS SQL and Firebird support comments between /* and */ delimiters. The MS SQL DBMS also supports the “two hyphens” syntax for a one-line comment. Some Firebird DBMS utilities also support this syntax.

WHILE

The WHILE statement exists in both Firebird and MS SQL, but with some differences. Firebird has no BREAK or CONTINUE statements, but they can be emulated with additional constructs. There is also a small syntax difference: Firebird requires the DO keyword after the loop condition. Compare the following equivalent pieces of code.

/* Firebird syntax. */
WHILE (i < 3) DO
BEGIN
  i = i + 1;
  j = j * 2;
END

/* MS SQL syntax. */
WHILE (i < 3)
BEGIN
  SET @i = @i + 1
  SET @j = @j * 2
END

RETURN

The RETURN operator in the MS SQL DBMS returns the value of an integer variable and stops execution. In the Firebird DBMS, there is an EXIT statement that transfers control to the final END of a stored procedure. However, there is no hidden return variable, so if you need to return a value (which is optional in MS SQL), you must explicitly declare the return variable in the procedure.

WAITFOR

The WAITFOR operator in the MS SQL DBMS suspends execution for some time or until the specified time. Something similar can be done using user-defined functions (UDFs) in the Firebird DBMS. But in both DBMSs, the use of such an operator should be excluded, since interaction with the client is completely suspended (in the Firebird DBMS, this can lead to the suspension of servicing of all connections, and not just the connection that called the analogue of the specified operator - approx. transl.).

Standard Operators

The standard operators available in both DBMSs are SELECT, INSERT, UPDATE and DELETE. Firebird and MS SQL DBMS support them, but MS SQL DBMS has several non-standard extensions to these operators that need to be discussed in case they are used.

In the Firebird DBMS, the SELECT statement does not allow the INTO keyword to create a new table "on the fly". Instead, the INTO keyword is used to place the result of a query into a variable.

/* MS SQL syntax for assigning a field value to a variable. */
SELECT @my_state = state
FROM authors
WHERE auth_name = 'John'

/* Firebird syntax. */
SELECT state
INTO :state   /* --> note the ":" before the variable name */
FROM authors
WHERE auth_name = 'John'

In DBMS MS SQL 7 and higher, you can specify the TOP specifier in the SELECT statement to limit the returned data set. This feature is currently under development for the Firebird DBMS. (The FIRST and SKIP specifiers were introduced in the Firebird DBMS starting with version 1.5 - approx. transl.)

Both MS SQL and Firebird DBMS support the usual syntax of the INSERT statement and the INSERT..SELECT statement.

Both MS SQL and Firebird DBMS support the usual UPDATE statement syntax. The MS SQL DBMS also supports the syntax of the UPDATE statement, in which a join is performed and one of the join tables is updated. You can think of this as a WHERE clause on steroids. If such a function is really needed, then it can be implemented in the Firebird DBMS using views.
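
For clarity, this is roughly what the MS SQL-only updatable join looks like; the table and column names here (authors, publishers, pub_id) are made up in the spirit of the examples above:

UPDATE a
SET    a.state = p.state
FROM   authors AS a
JOIN   publishers AS p ON p.pub_id = a.pub_id
WHERE  p.pub_name = 'New Moon Books';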

Both MS SQL and Firebird DBMS support the usual DELETE statement syntax. MS SQL DBMS also supports the TRUNCATE TABLE operator, which is more efficient (but also more dangerous) than the DELETE operator. (The MS SQL DBMS also supports the syntax of the DELETE statement in which the join is performed. - approx. transl.)

/* MS SQL syntax to delete all my_table records. */
TRUNCATE TABLE my_table
/* ...or... */
DELETE FROM my_table

/* Firebird syntax. */
DELETE FROM my_table

Using Transactions

In the Firebird DBMS, transactions are not used “directly” in DSQL (dynamic SQL). In this case, named transactions are not available at all. (DSQL statements are executed in the context of transactions started and controlled by the client application. - approx. transl.) The syntax of both DBMSs supports the key word WORK for compatibility.

In most cases, problems with transactions should not arise: explicit in-place transaction management in MS SQL DBMS is usually used due to the lack of support for management through exceptions.

Clue

The MS SQL DBMS has the XACT_ABORT setting (SET XACT_ABORT ON), which makes a run-time error roll back the entire transaction. Otherwise, you have to check the value of the @@ERROR variable after executing each statement.
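
A minimal sketch of the difference, reusing the City table from earlier examples:

SET XACT_ABORT ON;    -- any run-time error now rolls the whole transaction back automatically
BEGIN TRANSACTION;
    UPDATE City SET CityName = 'Test' WHERE IdCity = 1;
    -- if any statement in this transaction fails, everything is rolled back and the batch stops
COMMIT TRANSACTION;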

In general, most of the problems associated with transaction isolation levels in the MS SQL DBMS disappear when switching to the Firebird DBMS. The competition between “readers” and “writers” is minimal due to the use of multigeneration architecture (MGA).

In the MS SQL DBMS, cursors are used mainly to move through the results of queries in order to perform certain actions on these results. Other than the syntax, there is not much difference to perform the same task. Although there are options for moving forward and backward, in practice, mostly unidirectional cursors are used.

/* MS SQL syntax. */
DECLARE my_cursor CURSOR FOR
    SELECT au_lname FROM authors ORDER BY au_lname
DECLARE @au_lname varchar(40)
OPEN my_cursor
FETCH NEXT FROM my_cursor INTO @au_lname
WHILE @@FETCH_STATUS = 0
BEGIN
    /* Do something interesting with @au_lname. */
    FETCH NEXT FROM my_cursor INTO @au_lname
END
CLOSE my_cursor
DEALLOCATE my_cursor

/* Firebird syntax. */
DECLARE VARIABLE au_lname VARCHAR(40);
...
FOR SELECT au_lname FROM authors ORDER BY au_lname
    INTO :au_lname
DO
BEGIN
    /* Do something interesting with au_lname. */
END

Note that the MS SQL DBMS can place cursors in variables, and pass these variables as parameters; this is not possible in the Firebird DBMS.

Here is a complete list of string functions taken from BOL:

ASCII NCHAR SOUNDEX
CHAR PATINDEX SPACE
CHARINDEX REPLACE STR
DIFFERENCE QUOTENAME STUFF
LEFT REPLICATE SUBSTRING
LEN REVERSE UNICODE
LOWER RIGHT UPPER
LTRIM RTRIM

Let's start with two mutually inverse functions - ASCII and CHAR.

The ASCII function returns the ASCII code of the leftmost character of the string expression that is the function argument.

Here, for example, is how you can determine how many different letters there are that start the names of ships in the Ships table:
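
The query itself was lost during conversion; presumably it was something along these lines (ASCII looks only at the leftmost character, so applying it to the whole name gives the code of the first letter):

SELECT COUNT(DISTINCT ASCII(name)) AS first_letters
FROM Ships;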


It should be noted that a similar result can be obtained more easily using another function - LEFT, which has the following syntax:

LEFT (<string expression>, <integer expression>)

and returns, from the string given as the first argument, the number of leftmost characters specified by the second argument. So,

SELECT DISTINCT LEFT(name, 1) FROM Ships ORDER BY 1

Here's how, for example, you can get a table of codes for all alphabetic characters:

SELECT CHAR(ASCII('a')+ num-1) letter, ASCII('a')+ num - 1
FROM (SELECT 5*5*(a-1)+5*(b-1) + c AS num
FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x
CROSS JOIN
(SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y
CROSS JOIN
(SELECT 1 c UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z
) x
WHERE ASCII('a')+ num -1 BETWEEN ASCII('a') AND ASCII('z')

For those who are not yet aware of the generation of a number sequence, I refer you to the corresponding article.

As you know, the codes of lowercase and uppercase letters differ. Therefore, to get the full set without rewriting the query, you just need to add a similar query for the uppercase range to the code above:


I figure it wouldn't be too difficult to add this letter to the table if needed.

Let us now consider the task of finding the position of a desired substring within a string expression. Two functions can be used for this - CHARINDEX and PATINDEX. Both return the starting position (the position of the first character) of the substring within the string. The CHARINDEX function has the syntax:

CHARINDEX ( search_expression, string_expression [, start_position])

Here the optional integer parameter start_position defines the position in string_expression from which the search for search_expression begins. If this parameter is omitted, the search is performed from the beginning of string_expression. For example:
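
The example query did not survive the conversion; presumably it was something in this spirit - finding ships whose names contain a given substring:

SELECT name
FROM Ships
WHERE CHARINDEX('sh', name) > 0;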

It should be noted that if the searched substring or string expression is NULL, then the result of the function will also be NULL.

The following example determines the positions of the first and second occurrences of the character 'a' in the ship name "California":

SELECT CHARINDEX('a', name) first_a,
CHARINDEX('a', name, CHARINDEX('a', name) + 1) second_a
FROM Ships WHERE name = 'California'

Note that to find the second occurrence, the starting position passed to the function is the position of the character immediately after the first letter "a" - CHARINDEX('a', name) + 1. The correctness of the result - 2 and 10 - is easy to check :-).

The PATINDEX function has the syntax:

PATINDEX ('%pattern%', string_expression)

The main difference between this function and CHARINDEX is that the search string can contain the wildcard characters % and _. In this case the enclosing "%" characters are required. For example, using this function in the first example would look like this:


The result of this query looks like this:


The fact that we end up with an empty result set means that there are no such ships in the database. Let's take a combination of values ​​- the class and name of the ship.

Combining two string values into one is called concatenation; in SQL Server the "+" sign is used for this operation (the standard uses "||"). So:
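
The original example is missing here; as an illustration, concatenating the class and the name might look like this (the space separator is my own choice):

SELECT class + ' ' + name
FROM Ships;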

What if the string expression consists of only one letter? The query will return it too. You can easily verify this by writing the corresponding query yourself.

In the first part, we already touched a little on the DML language, using almost the entire set of its commands, with the exception of the MERGE command.

I will talk about DML according to my own sequence, developed from personal experience. Along the way, I will also try to talk about the “slippery” places that are worth focusing on; these “slippery” places are similar in many dialects of the SQL language.

Because the textbook is aimed at a wide range of readers (not only programmers), the explanations will sometimes be correspondingly long and tedious. This is my view of the material, formed mainly in practice through professional work.

The main goal of this tutorial, step by step, is to develop a complete understanding of the essence of the SQL language and teach you how to correctly apply its constructs. Professionals in this field may also be interested in leafing through this material, maybe they will be able to learn something new for themselves, or maybe it will simply be useful to read it in order to refresh their memory. I hope everyone will find it interesting.

Because DML in the MS SQL dialect is very closely tied to the syntax of the SELECT construct, I will start the discussion of DML with it. In my opinion, SELECT is the most important construct of the DML language, because it (or its parts) is what retrieves the necessary data from the database.

The DML language contains the following constructs:

  • SELECT – data selection
  • INSERT – inserting new data
  • UPDATE – data update
  • DELETE – deleting data
  • MERGE – data merging

In this part, we will look at only the basic syntax of the SELECT command, which looks like this:

SELECT column_list or * FROM source WHERE filter ORDER BY sort_expression
The topic of the SELECT statement is very broad, so in this part I will focus only on its basic form. Without knowing the basics well, you cannot move on to more complex constructs, because everything else revolves around this basic form (subqueries, joins, etc.).

In this part I will also cover the TOP clause. I intentionally did not include it in the basic syntax above because it is implemented differently in different SQL dialects.

While DDL is more static - it creates rigid structures (tables, relationships, etc.) - DML is dynamic by nature: the same correct result can be obtained in different ways.

Training will continue in step-by-step mode: as you read, try to run each example with your own hands, then analyze the result and try to understand it intuitively. If something remains unclear - for example, the meaning of a function - turn to the Internet for help.

The examples will be shown in the Test database, which was created using DDL+DML in the first part.

For those who did not create a database in the first part (since not everyone may be interested in the DDL language), they can use the following script:

Test database creation script

-- create the database
CREATE DATABASE Test
GO
-- make the Test database current
USE Test
GO
-- create reference tables
CREATE TABLE Positions(
    ID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Positions PRIMARY KEY,
    Name nvarchar(30) NOT NULL)
CREATE TABLE Departments(
    ID int IDENTITY(1,1) NOT NULL CONSTRAINT PK_Departments PRIMARY KEY,
    Name nvarchar(30) NOT NULL)
GO
-- fill the reference tables with data
SET IDENTITY_INSERT Positions ON
INSERT Positions(ID,Name)VALUES
    (1,N'Accountant'), (2,N'Director'), (3,N'Programmer'), (4,N'Senior Programmer')
SET IDENTITY_INSERT Positions OFF
GO
SET IDENTITY_INSERT Departments ON
INSERT Departments(ID,Name)VALUES
    (1,N'Administration'), (2,N'Accounting'), (3,N'IT')
SET IDENTITY_INSERT Departments OFF
GO
-- create the employees table
CREATE TABLE Employees(
    ID int NOT NULL,
    Name nvarchar(30),
    Birthday date,
    Email nvarchar(30),
    PositionID int,
    DepartmentID int,
    HireDate date NOT NULL CONSTRAINT DF_Employees_HireDate DEFAULT SYSDATETIME(),
    ManagerID int,
    CONSTRAINT PK_Employees PRIMARY KEY (ID),
    CONSTRAINT FK_Employees_DepartmentID FOREIGN KEY(DepartmentID) REFERENCES Departments(ID),
    CONSTRAINT FK_Employees_PositionID FOREIGN KEY(PositionID) REFERENCES Positions(ID),
    CONSTRAINT FK_Employees_ManagerID FOREIGN KEY (ManagerID) REFERENCES Employees(ID),
    CONSTRAINT UQ_Employees_Email UNIQUE(Email),
    CONSTRAINT CK_Employees_ID CHECK(ID BETWEEN 1000 AND 1999),
    INDEX IDX_Employees_Name(Name))
GO
-- fill it with data
-- note: the e-mail addresses were stripped during conversion and are left as placeholders;
-- replace them with distinct values, otherwise the UNIQUE constraint on Email will fail
INSERT Employees(ID,Name,Birthday,Email,PositionID,DepartmentID,ManagerID)VALUES
    (1000,N'Ivanov I.I.','19550219','[email protected]',2,1,NULL),
    (1001,N'Petrov P.P.','19831203','[email protected]',3,3,1003),
    (1002,N'Sidorov S.S.','19760607','[email protected]',1,2,1000),
    (1003,N'Andreev A.A.','19820417','[email protected]',4,3,1000)

That's it, now we are ready to start learning the DML language.

SELECT – data selection operator

First of all, for the active query editor, let’s make the current database Test by selecting it in the drop-down list or using the “USE Test” command.

Let's start with the most basic form of SELECT:

SELECT * FROM Employees
In this query, we are asking to return all columns (indicated by a "*") from the Employees table - you can read this as "SELECT all_fields FROM the employees_table." If there is a clustered index, the returned data will most likely be sorted by it, in this case by the ID column (but this is not important, since in most cases we will explicitly specify the sorting ourselves using ORDER BY ...) :

ID Name Birthday Email PositionID DepartmentID HireDate ManagerID
1000 Ivanov I.I. 1955-02-19 [email protected] 2 1 2015-04-08 NULL
1001 Petrov P.P. 1983-12-03 [email protected] 3 3 2015-04-08 1003
1002 Sidorov S.S. 1976-06-07 [email protected] 1 2 2015-04-08 1000
1003 Andreev A.A. 1982-04-17 [email protected] 4 3 2015-04-08 1000

In general, it is worth noting that in the MS SQL dialect the simplest form of a SELECT query need not contain a FROM block at all; in that case it can be used to compute values:

SELECT
    5550/100*15,
    SYSDATETIME(),   -- get the database system date and time
    SIN(0)+COS(0)

(No column name) (No column name) (No column name)
825 2015-04-11 12:12:36.0406743 1

Please note that the expression (5550/100*15) gave 825, although a calculator would give 832.5. The result 825 appears because every number in our expression is an integer, so the result is an integer too: (5550/100) gives 55, not 55.5.

Remember that the following logic works in MS SQL:

  • Integer / Integer = Integer (i.e. in this case integer division occurs)
  • Real / Integer = Real
  • Integer / Real = Real
That is, the result is converted to the larger type, so in the last two cases we get a real number (think of it as in mathematics: the range of real numbers is larger than the range of integers, so the result is converted to it):

SELECT
    123/10,    -- 12
    123./10,   -- 12.3
    123/10.    -- 12.3
Here (123.) = (123.0); the 0 can simply be dropped, leaving only the decimal point.

The same logic applies to other arithmetic operations, but in the case of division this nuance is more relevant.

Therefore, pay attention to the data types of numeric columns. If a column is integer and you need a real result, use a conversion or simply put a decimal point after a numeric constant (123.).

You can use the CAST or CONVERT function to convert fields. For example, let's use the ID field, it is of type int:

SELECT
    ID,
    ID/100,                  -- integer division happens here
    CAST(ID AS float)/100,   -- use the CAST function to convert to the float type
    CONVERT(float,ID)/100,   -- use the CONVERT function to convert to the float type
    ID/100.                  -- force the conversion by making the denominator a real number
FROM Employees

ID (No column name) (No column name) (No column name) (No column name)
1000 10 10 10 10.000000
1001 10 10.01 10.01 10.010000
1002 10 10.02 10.02 10.020000
1003 10 10.03 10.03 10.030000

On a note. In the ORACLE database, syntax without a FROM block is unacceptable; there, for this purpose, the DUAL system table is used, which contains one line:

SELECT
    5550/100*15,     -- in ORACLE the result will be 832.5
    sysdate,
    sin(0)+cos(0)
FROM DUAL


Note. The table name in many RDBs may be preceded by the schema name:

SELECT * FROM dbo.Employees -- dbo – schema name

A schema is a logical unit of a database, which has its own name and allows you to group database objects such as tables, views, etc. within itself.

The definition of a schema in different databases may differ; in some cases, the schema is directly related to the database user, i.e. in this case, we can say that the schema and the user are synonyms and all objects created in the schema are essentially objects of this user. In MS SQL, a schema is an independent logical unit that can be created on its own (see CREATE SCHEMA).

By default, an MS SQL database has one schema named dbo (Database Owner), and all objects are created in it unless another schema is specified. Accordingly, if we simply give a table name in a query, it will be looked up in the dbo schema of the current database. To create an object in a specific schema, its name must be prefixed with the schema name, for example "CREATE TABLE schema_name.table_name(...)".
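
A hedged sketch (the schema and table names here are made up):

CREATE SCHEMA hr;
GO
CREATE TABLE hr.Vacations(ID int NOT NULL PRIMARY KEY, EmployeeID int NOT NULL);
GO
SELECT * FROM hr.Vacations;   -- the schema name is now required to reach the table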

In the case of MS SQL, the schema name may also be preceded by the name of the database in which the schema is located:

SELECT * FROM Test.dbo.Employees -- database_name.schema_name.table
This clarification can be useful, for example, if:

  • in one request we access objects located in different schemas or databases
  • you need to transfer data from one schema or database to another
  • being in one database, you need to request data from another database
  • and so on.
A schema is a very convenient tool that is worth using when designing the database architecture, especially for large databases.

Also, do not forget that in the request text we can use both single-line “-- ...” and multi-line “/* ... */” comments. If the request is large and complex, then comments can greatly help you or someone else, after some time, to remember or understand its structure.

If there are a lot of columns in the table, and especially if there are still a lot of rows in the table, plus if we are making queries to the database over the network, then it would be preferable to select with a direct listing of the fields you need, separated by commas:

SELECT ID,Name FROM Employees

That is, here we say that we only want the ID and Name fields returned from the table. The result will be as follows (by the way, the optimizer decided here to use the index created on the Name field):

ID Name
1003 Andreev A.A.
1000 Ivanov I.I.
1001 Petrov P.P.
1002 Sidorov S.S.

On a note. Sometimes it is useful to look at how data is retrieved, for example, to find out which indexes are used. This can be done by clicking the “Display Estimated Execution Plan” button or setting “Include Actual Execution Plan” (in this case, we will be able to see the actual plan, respectively, only after executing the request) :

Execution plan analysis is very useful when optimizing a query; it allows you to find out which indexes are missing or which indexes are not used at all and can be removed.

If you have just started learning DML, this is not so important for you right now; just take note of it and you can safely set it aside (you may never need it). Our initial goal is to learn the basics of the DML language and how to use them correctly; optimization is a separate art. Sometimes it is more important simply to have a correctly written query that returns the right result from the subject-matter point of view, and to let dedicated people optimize it. First of all, you need to learn how to write queries correctly, using any means to achieve the goal. The main goal you need to reach now is for your query to return correct results.

Setting table aliases

When listing columns, they can be preceded by the name of the table located in the FROM block:

SELECT Employees.ID,Employees.Name FROM Employees

But this syntax is usually inconvenient to use, because the table name can be long. For these purposes, shorter names are usually specified and used - aliases:

SELECT emp.ID,emp.Name FROM Employees AS emp
or

SELECT emp.ID,emp.Name FROM Employees emp -- the AS keyword can be omitted (I prefer this option)

Here emp is an alias for the Employees table, which can be used within this SELECT statement. In other words, we can say that within this SELECT statement we give the table a new name.

Of course, in this case the query result will be exactly the same as for "SELECT ID,Name FROM Employees". Why this is needed will become clear later (not even in this part); for now just remember that a column name can be prefixed (qualified) either with the table name itself or with an alias. You can use one of the two: if you set an alias, you must use it and can no longer use the table name.

On a note. In ORACLE, only the option of specifying a table alias without the AS keyword is allowed.

DISTINCT – discarding duplicate rows

The DISTINCT keyword is used to discard duplicate rows from the query result. Roughly speaking, imagine first executing a query without the DISTINCT option, and then discarding all duplicates from the result. Let us demonstrate this for greater clarity using an example:

-- create a temporary table for the demonstration
CREATE TABLE #Trash(
  ID int NOT NULL PRIMARY KEY,
  Col1 varchar(10),
  Col2 varchar(10),
  Col3 varchar(10)
)
-- fill this table with all sorts of garbage
INSERT #Trash(ID,Col1,Col2,Col3)VALUES
(1,'A','A','A'), (2,'A','B','C'), (3,'C','A','B'), (4,'A','A','B'),
(5,'B','B','B'), (6,'A','A','B'), (7,'A','A','A'), (8,'C','A','B'),
(9,'C','A','B'), (10,'A','A','B'), (11,'A',NULL,'B'), (12,'A',NULL,'B')
-- see what the query returns without the DISTINCT option
SELECT Col1,Col2,Col3 FROM #Trash
-- see what the query returns with the DISTINCT option
SELECT DISTINCT Col1,Col2,Col3 FROM #Trash
-- delete the temporary table
DROP TABLE #Trash

Visually, it will look like this (all duplicates are marked with the same color):

Now let's look at where this can be applied, using a more practical example - we will return only unique department identifiers from the Employees table (i.e., we will find out the IDs of the departments in which employees are registered):

SELECT DISTINCT DepartmentID FROM Employees

Here the duplicates are discarded and we get one row for each distinct DepartmentID value present in the table.

Let's return to DDL for a moment.

Since we are starting to run out of data for the demo examples, and we want to discuss things more extensively and clearly, let's expand our Employees table a little. In addition, let's recall a little DDL (as they say, "repetition is the mother of learning"), and also jump ahead a little once more and apply the UPDATE statement:

-- create new columns
ALTER TABLE Employees ADD
  LastName nvarchar(30),   -- last name
  FirstName nvarchar(30),  -- first name
  MiddleName nvarchar(30), -- middle name
  Salary float,            -- and, of course, salary in some units
  BonusPercent float       -- percentage used to calculate the bonus from the salary
GO
-- fill them with data (some data is intentionally omitted)
UPDATE Employees SET LastName=N'Ivanov',FirstName=N'Ivan',MiddleName=N'Ivanovich',
  Salary=5000, BonusPercent=50 WHERE ID=1000  -- Ivanov I.I.
UPDATE Employees SET LastName=N'Petrov',FirstName=N'Petr',MiddleName=N'Petrovich',
  Salary=1500, BonusPercent=15 WHERE ID=1001  -- Petrov P.P.
UPDATE Employees SET LastName=N'Sidorov',FirstName=N'Sidor',MiddleName=NULL,
  Salary=2500, BonusPercent=NULL WHERE ID=1002 -- Sidorov S.S.
UPDATE Employees SET LastName=N'Andreev',FirstName=N'Andrey',MiddleName=NULL,
  Salary=2000, BonusPercent=30 WHERE ID=1003  -- Andreev A.A.

Let's make sure that the data was updated successfully:

SELECT * FROM Employees

ID Name LastName FirstName MiddleName Salary BonusPercent
1000 Ivanov I.I. Ivanov Ivan Ivanovich 5000 50
1001 Petrov P.P. Petrov Peter Petrovich 1500 15
1002 Sidorov S.S. Sidorov Sidor NULL 2500 NULL
1003 Andreev A.A. Andreev Andrey NULL 2000 30

Setting aliases for query columns

I think it will be easier to show here than to write:

SELECT
  LastName+' '+FirstName+' '+MiddleName AS FullName, -- give a name to the computed column
  HireDate AS "Reception date",                      -- double quotes, because the alias contains a space
  Birthday AS [Date of Birth],                       -- square brackets also work for names with spaces
  Salary ZP                                          -- the AS keyword is optional
FROM Employees

FullName Reception date Date of Birth ZP
Ivanov Ivan Ivanovich 2015-04-08 1955-02-19 5000
Petrov Petr Petrovich 2015-04-08 1983-12-03 1500
NULL 2015-04-08 1976-06-07 2500
NULL 2015-04-08 1982-04-17 2000

As we can see, the column aliases we specified are reflected in the header of the resulting table. Actually, this is the main purpose of column aliases.

Please note that because the last two employees have no middle name specified (a NULL value), the expression LastName+' '+FirstName+' '+MiddleName also returned NULL for them.

To join (add, concatenate) strings in MS SQL, the “+” symbol is used.

Remember that all expressions that involve NULL (for example, division by NULL, addition with NULL) will return NULL.

On a note.
In the case of ORACLE, the "||" operator is used to concatenate strings, and the concatenation would look like LastName||' '||FirstName||' '||MiddleName. It is worth noting that ORACLE has an exception for string types: for them NULL and the empty string '' are the same thing, so in ORACLE such an expression would return "Sidorov Sidor" and "Andreev Andrey" for the last 2 employees. As of ORACLE 12c, as far as I know, there is no option that changes this behavior (if I am wrong, please correct me). It is hard for me to judge whether this is good or bad, because in some cases the NULL-string behavior of MS SQL is more convenient, and in others that of ORACLE.

In ORACLE, all of the column aliases listed above are also valid, except [...].


To avoid building a bulky construction with the ISNULL function, in MS SQL we can use the CONCAT function. Let's consider and compare 3 options:

SELECT
  LastName+' '+FirstName+' '+MiddleName FullName1,
  -- 2 options for replacing NULL with empty strings '' (we get the same behavior as in ORACLE)
  ISNULL(LastName,'')+' '+ISNULL(FirstName,'')+' '+ISNULL(MiddleName,'') FullName2,
  CONCAT(LastName,' ',FirstName,' ',MiddleName) FullName3
FROM Employees

FullName1 FullName2 FullName3
Ivanov Ivan Ivanovich Ivanov Ivan Ivanovich Ivanov Ivan Ivanovich
Petrov Petr Petrovich Petrov Petr Petrovich Petrov Petr Petrovich
NULL Sidorov Sidor Sidorov Sidor
NULL Andreev Andrey Andreev Andrey

In MS SQL, aliases can also be specified using the equal sign:

SELECT
  "Reception date"=HireDate, -- besides '...' and [...] you can also use "..."
  [Date of Birth]=Birthday,
  ZP=Salary
FROM Employees

Using the AS keyword or the equal sign to specify an alias is probably more a matter of taste. But when analyzing other people's queries, this knowledge can be useful.

Finally, I will say that it is better to name aliases using only Latin characters and numbers, avoiding '...', "..." and [...], that is, following the same rules we used when naming tables. In the examples that follow I will use only such names.

Basic SQL Arithmetic Operators

The basic arithmetic operators are + (addition), - (subtraction), * (multiplication), / (division) and % (remainder of division).

The execution priority of arithmetic operators is the same as in mathematics. If necessary, the order in which the operators are applied can be changed using parentheses - (a+b)*(x/(y-z)).

And I repeat once again: any operation involving NULL produces NULL, for example: 10+NULL, NULL*15/3, 100/NULL - all of these result in NULL. Simply put, an undefined value cannot produce a defined result. Take this into account when composing your query and, if necessary, handle NULL values using the ISNULL and COALESCE functions:

SELECT ID,Name,
       Salary/100*BonusPercent AS Result1,            -- without handling NULL values
       Salary/100*ISNULL(BonusPercent,0) AS Result2,  -- using the ISNULL function
       Salary/100*COALESCE(BonusPercent,0) AS Result3 -- using the COALESCE function
FROM Employees

I'll tell you a little about the COALESCE function:

COALESCE (expr1, expr2, ..., exprn) - Returns the first non-NULL value from a list of values.

SELECT COALESCE(f1, f1*f2, f2*f3) val -- in this case the third value will be returned
FROM (SELECT null f1, 2 f2, 3 f3) q

I will mostly focus on the DML constructs themselves and, for the most part, will not describe the functions that appear in the examples. If you don't understand what a particular function does, look up its description on the Internet; you can even search for information about a whole group of functions at once, for example by searching for "MS SQL string functions", "MS SQL mathematical functions" or "MS SQL NULL handling functions". There is a lot of information on functions, and it is easy to find. For example, in the MSDN library you can read more about the COALESCE function:

Excerpt from MSDN. Comparison of COALESCE and CASE.

The COALESCE expression is a syntactic shortcut for the CASE expression. This means that COALESCE(expression1,...n) is rewritten by the query optimizer as the following CASE expression:

CASE
  WHEN (expression1 IS NOT NULL) THEN expression1
  WHEN (expression2 IS NOT NULL) THEN expression2
  ...
  ELSE expressionN
END

For example, let's look at how the remainder-of-division operator (%) can be used. This operator is very useful when you need to split records into groups. For example, let's pull out all employees with even personnel numbers (ID), i.e. those IDs that are divisible by 2:

SELECT ID,Name FROM Employees WHERE ID%2=0 -- remainder when divided by 2 is 0

ORDER BY – sorting the query result

The ORDER BY clause is used to sort the result of a query.

SELECT LastName, FirstName, Salary FROM Employees ORDER BY LastName,FirstName -- order the result by 2 columns - by Last Name, and then by First Name

On a note. There is an ASC keyword for sorting in ascending order, but since ascending order is the default, you can practically forget about it (I can't remember ever having to use it).
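For completeness, the explicit form looks like this and returns exactly the same result as the query above:

SELECT LastName, FirstName, Salary
FROM Employees
ORDER BY LastName ASC, FirstName ASC -- same as ORDER BY LastName, FirstName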

It is worth noting that the ORDER BY clause can also use fields that are not listed in the SELECT clause (except when DISTINCT is used, which I will discuss below). As an example, I'll jump ahead a little and use the TOP option to show how you can select the 3 employees with the highest salary, bearing in mind that for confidentiality reasons I should not show the salary itself:

SELECT TOP 3 -- return only the first 3 records from the entire result
       ID,LastName,FirstName
FROM Employees
ORDER BY Salary DESC -- sort the result in descending order of Salary

ID LastName FirstName
1000 Ivanov Ivan
1002 Sidorov Sidor
1003 Andreev Andrey

Of course, several employees may have the same salary, and it is hard to say which three of them this query will return; this needs to be settled with whoever set the task. Let's say that after discussing it with the manager you agreed to use the following option: sort additionally by the date of birth field (i.e. we favor the youngest), and since the dates of birth of several employees may also coincide (this cannot be ruled out either), add a third sort in descending order of ID (the last ones in the sample will be those with the highest ID, for example those hired last, assuming personnel numbers are issued sequentially):

SELECT TOP 3 -- return only the first 3 records from the entire result
       ID,LastName,FirstName
FROM Employees
ORDER BY
  Salary DESC, -- 1. sort the result in descending order of Salary
  Birthday,    -- 2. then by Date of Birth
  ID DESC      -- 3. and, for complete unambiguity of the result, add sorting by ID

In other words, you should try to make the result of the query predictable, so that in the event of a debriefing you can explain why these particular people ended up on the "black list", i.e. that everyone was chosen fairly, according to the established rules.

You can also sort using different expressions in the ORDER BY clause:

SELECT LastName,FirstName FROM Employees ORDER BY CONCAT(LastName,' ',FirstName) -- use an expression

You can also use aliases specified for columns in ORDER BY:

SELECT CONCAT(LastName,' ',FirstName) fi FROM Employees ORDER BY fi -- use an alias

It is worth noting that when the DISTINCT clause is used, only the columns listed in the SELECT block can be used in the ORDER BY clause. That is, after the DISTINCT operation is applied, we get a new data set with a new set of columns. For this reason, the following example will not work:

SELECT DISTINCT LastName,FirstName,Salary FROM Employees ORDER BY ID -- ID is not in the resulting set we got with DISTINCT

That is, the ORDER BY clause is applied to this resulting set just before the result is returned to the user.

Note 1. You can also use the numbers of the columns listed in the SELECT in the ORDER BY clause:

SELECT LastName,FirstName,Salary
FROM Employees
ORDER BY -- sort in the following order
  3 DESC, -- 1. descending by Salary
  1,      -- 2. by Last Name
  2       -- 3. by First Name

For beginners it looks convenient and tempting, but it is better to forget and never use this sorting option.

If in this case (when the fields are explicitly listed) this option is still acceptable, then when "*" is used it is better never to use it. Why? Because if someone, for example, changes the order of the columns in the table or deletes columns (and this is a normal situation), your query may keep working, but incorrectly, since the sorting may now be done by other columns. This is insidious because the error may not be revealed for a long time.

If the columns were explicitly listed, then in the situation described above the query would either continue to work correctly (since everything is explicitly defined), or it would simply throw an error saying that the column does not exist.

So you can safely forget about sorting by column numbers.

Note 2.
In MS SQL, when sorting in ascending order, NULL values ​​will be displayed first.

SELECT BonusPercent FROM Employees ORDER BY BonusPercent

Accordingly, when using DESC they will be at the end

SELECT BonusPercent FROM Employees ORDER BY BonusPercent DESC

If you need to change the logic for sorting NULL values, then use expressions, for example:

SELECT BonusPercent FROM Employees ORDER BY ISNULL(BonusPercent,100)
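If you prefer not to rely on a sentinel value such as 100, one possible alternative (just a sketch, not the only way) is to sort by an explicit CASE expression first:

SELECT BonusPercent
FROM Employees
ORDER BY
  CASE WHEN BonusPercent IS NULL THEN 1 ELSE 0 END, -- push NULL values to the end
  BonusPercent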

ORACLE provides two options for this purpose: NULLS FIRST and NULLS LAST (used by default). For example:

SELECT BonusPercent FROM Employees ORDER BY BonusPercent DESC NULLS LAST

Pay attention to this when switching to a particular database.

TOP – return the specified number of records

Extract from MSDN. TOP—Limits the number of rows returned in the query result set to a specified number or percentage. When a TOP clause is used in conjunction with an ORDER BY clause, the result set is limited to the first N rows of the sorted result. Otherwise, the first N rows are returned in unspecified order.

Typically this expression is used with an ORDER BY clause, and we have already looked at examples where it was necessary to return the first N rows from the result set.

Without ORDER BY, this clause is usually used when we just need to take a look at an unfamiliar table that may contain a lot of records; in that case we can, for example, ask to return only the first 10 rows, but for clarity let's ask for just 2:

SELECT TOP 2 * FROM Employees

You can also specify the word PERCENT to return the corresponding percentage of rows from the result set:

SELECT TOP 25 PERCENT * FROM Employees

In my practice, sampling by the number of rows is most often used.

You can also use the WITH TIES option with TOP. It returns, in addition to the TOP N rows, all rows that are equal to the last row of the TOP N selection in terms of the ORDER BY columns; as a result, more than N rows can be returned. For the demonstration, let's add another "Programmer" with a salary of 1500:

INSERT Employees(ID,Name,Email,PositionID,DepartmentID,ManagerID,Salary)
VALUES(1004,N'Nikolaev N.N.','[email protected]',3,3,1003,1500)

And let’s add another employee without indicating the position and department with a salary of 2000:

INSERT Employees(ID,Name,Email,PositionID,DepartmentID,ManagerID,Salary)
VALUES(1005,N'Alexandrov A.A.','[email protected]',NULL,NULL,1000,2000)

Now, using the WITH TIES option, let's select the 3 employees with the smallest salary together with everyone whose salary ties with theirs (I hope what I am getting at will become clear below):

SELECT TOP 3 WITH TIES ID,Name,Salary FROM Employees ORDER BY Salary

Here, although TOP 3 is specified, the query returned 4 records, because the Salary values that made it into the TOP 3 (1500 and 2000) are shared by 4 employees. Visually it works something like this:

On a note.
TOP is implemented in different ways in different databases; in MySQL there is a LIMIT clause for this, in which you can additionally set the starting offset.

In ORACLE 12c, they also introduced their own analogue, combining the functionality of TOP and LIMIT - search for the words “ORACLE OFFSET FETCH”. Prior to version 12c, the pseudo-column ROWNUM was typically used for this purpose.
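For reference, starting with SQL Server 2012, T-SQL also has a similar OFFSET ... FETCH clause (shown here as a sketch; it can only be used together with ORDER BY):

SELECT ID, Name
FROM Employees
ORDER BY ID
OFFSET 2 ROWS          -- skip the first 2 rows
FETCH NEXT 2 ROWS ONLY -- then return the next 2 rows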


What happens if you apply the DISTINCT and TOP clauses at the same time? Such questions are easy to answer by experimenting. In general, don't be afraid, and don't be too lazy, to experiment, because most things are learned through practice. The word order in the SELECT statement is fixed: DISTINCT comes first, followed by TOP, i.e. if you think logically and read from left to right, duplicates are discarded first, and then TOP is applied to that set. Let's check and make sure that this is the case:

SELECT DISTINCT TOP 2 Salary FROM Employees ORDER BY Salary

Salary
1500
2000

That is, as a result we received the 2 smallest distinct salaries. Of course, it may happen that the salary for some employees is not specified (NULL), since the schema allows this. Therefore, depending on the task, we either handle NULL values in the ORDER BY clause, or simply discard all records where Salary is NULL, and for that we move on to the WHERE clause.
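Running a little ahead, a sketch of that second option (simply discarding the rows where Salary is not specified) could look like this:

SELECT DISTINCT TOP 2 Salary
FROM Employees
WHERE Salary IS NOT NULL -- discard employees with no salary specified
ORDER BY Salary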

WHERE – row selection condition

This clause is used to filter records by a given condition. For example, let's select all employees working in the "IT" department (its ID=3):

SELECT ID,LastName,FirstName,Salary
FROM Employees
WHERE DepartmentID=3 -- IT
ORDER BY LastName,FirstName

ID LastName FirstName Salary
1004 NULL NULL 1500
1003 Andreev Andrey 2000
1001 Petrov Peter 1500

The WHERE clause is written before the ORDER BY command.

The order of applying the commands to the initial Employees set is as follows:

  1. WHERE – if specified, then the first step from the entire set of Employees is to select only records that satisfy the condition
  2. DISTINCT – if specified, all duplicates are discarded
  3. ORDER BY – if specified, the result is sorted
  4. TOP – if specified, only the specified number of records is returned from the sorted result

Let's look at an example for clarity:

SELECT DISTINCT TOP 1 Salary FROM Employees WHERE DepartmentID=3 ORDER BY Salary

Visually it will look like this:

It is worth noting that checking for NULL is done not with an equal sign, but with the IS NULL and IS NOT NULL operators. Just remember that you cannot compare with NULL using the "=" operator, because the result of such an expression will itself be NULL.
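For instance, here is an (intentionally wrong) sketch of such a comparison, which returns no rows at all, because the condition evaluates to NULL for every record:

SELECT ID, Name
FROM Employees
WHERE DepartmentID = NULL -- wrong: the comparison yields NULL, so no rows are returned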

For example, let's select all employees who do not have a department specified (i.e. DepartmentID IS NULL):

SELECT ID,Name FROM Employees WHERE DepartmentID IS NULL

Now, as an example, let’s calculate the bonus for all employees who have the BonusPercent value specified (i.e. BonusPercent IS NOT NULL):

SELECT ID,Name,Salary/100*BonusPercent AS Bonus FROM Employees WHERE BonusPercent IS NOT NULL

Yes, by the way, if you think about it, the BonusPercent value can be equal to zero (0), and the value can also be entered with a minus sign, because we did not impose any restrictions on this field.

Well, having reported the problem, we were told for now to assume that if (BonusPercent<=0 OR BonusPercent IS NULL), it means that the employee likewise has no bonus. To start with, let's do exactly as we were told and implement this using the logical operators OR and NOT:

SELECT ID,Name,Salary/100*BonusPercent AS Bonus FROM Employees WHERE NOT(BonusPercent<=0 OR BonusPercent IS NULL)

That is, here we begin to get acquainted with logical operators. The expression in parentheses (BonusPercent<=0 OR BonusPercent IS NULL) checks that an employee has no bonus, and NOT inverts this value, i.e. it says "return all employees who are not the employees without a bonus".

This expression can also be rewritten by saying directly "return all employees who have a bonus", expressing it as (BonusPercent>0 AND BonusPercent IS NOT NULL):

SELECT ID,Name,Salary/100*BonusPercent AS Bonus FROM Employees WHERE BonusPercent>0 AND BonusPercent IS NOT NULL

Also in the WHERE block you can check various kinds of expressions using arithmetic operators and functions. For example, a similar check can be done using an expression with the ISNULL function:

SELECT ID,Name,Salary/100*BonusPercent AS Bonus FROM Employees WHERE ISNULL(BonusPercent,0)>0

Boolean operators and simple comparison operators

Yes, we can’t do without mathematics here, so let’s take a short excursion into Boolean and simple comparison operators.

There are only 3 Boolean operators in SQL - AND, OR and NOT:

AND - true only when both conditions are true
OR - true when at least one of the conditions is true
NOT - inverts the condition that follows it

For each Boolean operator, a truth table can be given that additionally shows what the result will be when one of the conditions is NULL: for example, (TRUE AND NULL) gives NULL, (FALSE AND NULL) gives FALSE, (TRUE OR NULL) gives TRUE, (FALSE OR NULL) gives NULL, and NOT NULL gives NULL.

The following simple comparison operators are used to form conditions:

= equal to
<> or != not equal to
> greater than
>= greater than or equal to
< less than
<= less than or equal to

Plus there are 2 operators for checking a value/expression for NULL:

IS NULL Testing for NULL equality
IS NOT NULL Testing for NULL inequality

Priority: 1) All comparison operators; 2) NOT; 3) AND; 4) OR.
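For example, here is a small sketch showing what this priority means in practice:

SELECT ID, Name
FROM Employees
WHERE DepartmentID=1 OR DepartmentID=2 AND PositionID=3
-- because AND binds tighter than OR, this is interpreted as
-- DepartmentID=1 OR (DepartmentID=2 AND PositionID=3)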

When constructing complex logical expressions parentheses are used:

((condition1 AND condition2) OR NOT(condition3 AND condition4 AND condition5)) OR (…)

Also, by using parentheses, you can change the standard sequence of calculations.

Here I tried to give an idea of ​​Boolean algebra in a volume sufficient for work. As you can see, in order to write more complex conditions you cannot do without logic, but there is not much of it here (AND, OR and NOT) and it was invented by people, so everything is quite logical.

Let's go to the end of the second part

As you can see, even the basic syntax of the SELECT statement can be discussed for a very long time, but in order to stay within the scope of the article, I will finally show a few additional logical operators - BETWEEN, IN and LIKE.

BETWEEN – checking for inclusion in a range

Test_value BETWEEN start_value AND end_value

Expressions can act as values.

Let's look at an example:

SELECT ID,Name,Salary FROM Employees WHERE Salary BETWEEN 2000 AND 3000 -- who has a salary in the range of 2000-3000

Actually, BETWEEN is a simplified notation of the form:

SELECT ID,Name,Salary FROM Employees WHERE Salary>=2000 AND Salary<=3000 -- everyone whose salary is in the range 2000-3000

The word NOT can be used before the word BETWEEN, which will check if the value is not within the specified range:

SELECT ID,Name,Salary FROM Employees WHERE Salary NOT BETWEEN 2000 AND 3000 -- similar to NOT(Salary>=2000 AND Salary<=3000)

Accordingly, if you use BETWEEN, IN, LIKE, you can also combine them with other conditions using AND and OR:

SELECT ID,Name,Salary
FROM Employees
WHERE Salary BETWEEN 2000 AND 3000 -- whose salary is in the range 2000-3000
  AND DepartmentID=3               -- take into account employees of department 3 only

IN – check for inclusion in the list of values

This operator has the following form:

Test_value IN (value1, value2, ...)

I think it's easier to show with an example:

SELECT ID,Name,Salary FROM Employees WHERE PositionID IN(3,4) -- whose position is 3 or 4

Those. this is essentially the same as the following expression:

SELECT ID,Name,Salary FROM Employees WHERE PositionID=3 OR PositionID=4 -- whose position is 3 or 4

In the case of NOT it works similarly (we will get everyone except those in positions 3 and 4):

SELECT ID,Name,Salary FROM Employees WHERE PositionID NOT IN(3,4) -- similar to NOT(PositionID=3 OR PositionID=4)

A query with NOT IN can also be expressed using AND:

SELECT ID,Name,Salary FROM Employees WHERE PositionID<>3 AND PositionID<>4 -- equivalent to PositionID NOT IN(3,4)

Please note that searching for NULL values ​​using the IN construct will not work, because checking NULL=NULL will also return NULL, not True:

SELECT ID,Name,DepartmentID FROM Employees WHERE DepartmentID IN(1,2,NULL) -- NULL records will not be included in the result

In this case, break the check into several conditions:

SELECT ID,Name,DepartmentID
FROM Employees
WHERE DepartmentID IN(1,2)  -- 1 or 2
   OR DepartmentID IS NULL  -- or NULL

Or you could write something like:

SELECT ID,Name,DepartmentID FROM Employees WHERE ISNULL(DepartmentID,-1) IN(1,2,-1) -- if you are sure that there is no department with ID=-1

I think the first option is more correct and reliable in this case; the second is just an example to demonstrate what other constructions can be built.

It is also worth mentioning an even more insidious error associated with NULL, which can be made when using the NOT IN construct. For example, let's try to select all employees except those whose department is 1 or whose department is not specified at all, i.e. equals NULL. As a solution, the following option suggests itself:

SELECT ID,Name,DepartmentID FROM Employees WHERE DepartmentID NOT IN(1,NULL)

But after executing the query, we will not receive a single row, although we expected to see the following:

Again, the joke here was played by NULL specified in the list of values.

Let's look at why a logical error occurred in this case. Let's expand the query using AND:

SELECT ID,Name,DepartmentID FROM Employees WHERE DepartmentID<>1 AND DepartmentID<>NULL -- the problem is because of this NULL check - this condition will always return NULL

The right-hand condition (DepartmentID<>NULL) will always yield an undefined result here, i.e. NULL. Now recall the truth table for the AND operator, where (TRUE AND NULL) gives NULL. That is, even when the left condition (DepartmentID<>1) is met, because of the undefined right condition the whole expression (DepartmentID<>1 AND DepartmentID<>NULL) ends up undefined, so the row is not included in the result.

The condition can be rewritten correctly as follows:

SELECT ID,Name,DepartmentID
FROM Employees
WHERE DepartmentID NOT IN(1)   -- or, in this case, simply DepartmentID<>1
  AND DepartmentID IS NOT NULL -- and separately check for NOT NULL

IN can also be used with subqueries, but we will return to this form in subsequent parts of this tutorial.
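Just to give a flavor of that form, here is a sketch which assumes the Departments table (with an ID column) created in the DDL part of this tutorial:

SELECT ID,Name,DepartmentID
FROM Employees
WHERE DepartmentID IN (SELECT ID FROM Departments) -- only employees whose department exists in Departments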

LIKE – checking a string using a pattern

I will talk about this operator only in its simplest form, which is a standard and is supported by most dialects of the SQL language. Even in this form, it can be used to solve many problems that require checking the contents of a string.

This operator has the following form:

Test_string LIKE string_pattern

The following special characters can be used in “pattern_string”:

  1. The underscore "_" means that any single character can take its place
  2. The percent sign "%" stands for any sequence of characters, including an empty one
Let's look at examples with the “%” symbol (in practice, by the way, it is used more often):

SELECT ID,Name FROM Employees WHERE Name LIKE 'Pet%'         -- whose name begins with the letters "Pet"
SELECT ID,LastName FROM Employees WHERE LastName LIKE '%ov'  -- whose last name ends with "ov"
SELECT ID,LastName FROM Employees WHERE LastName LIKE '%re%' -- whose last name contains the combination "re"

Let's look at examples with the symbol “_”:

SELECT ID,LastName FROM Employees WHERE LastName LIKE '_etrov' -- last name consists of any first character followed by the letters "etrov"
SELECT ID,LastName FROM Employees WHERE LastName LIKE '____ov' -- last name consists of any four characters followed by the letters "ov"

Using ESCAPE, you can specify an escape character that cancels the special meaning of the characters "_" and "%". This clause is used when you need to search a string for an actual percent sign or underscore.

To demonstrate ESCAPE, let's put garbage into one entry:

UPDATE Employees SET FirstName=N'This is trash containing %' WHERE ID=1005

And let's see what the following queries return:

SELECT * FROM Employees WHERE FirstName LIKE '%!%%' ESCAPE '!' -- the string contains the "%" sign
SELECT * FROM Employees WHERE FirstName LIKE '%!_%' ESCAPE '!' -- the string contains the "_" sign

If you need to check a string for a complete match, then instead of LIKE it is better to simply use the “=” sign:

SELECT * FROM Employees WHERE FirstName=N'Petr'

On a note.
In MS SQL, in the LIKE operator template, you can also specify a search using regular expressions; read about it on the Internet if the standard capabilities of this operator are not enough for you.

ORACLE uses the REGEXP_LIKE function to search using regular expressions.
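By the way, even without full regular expressions, the MS SQL version of LIKE understands character sets in square brackets (an MS SQL extension beyond the simple standard form described above), for example:

SELECT ID,LastName FROM Employees WHERE LastName LIKE '[PS]%'  -- last name starts with P or S
SELECT ID,LastName FROM Employees WHERE LastName LIKE '[^PS]%' -- last name starts with any character except P or S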

A little about strings

If you need to check a string for Unicode characters, you will need to place the letter N before the quotes, i.e. N'...'. But since all character fields in our table are in Unicode format (the nvarchar type), you can always use this format for these fields. Example:

SELECT ID,Name FROM Employees WHERE Name LIKE N'Pet%'
SELECT ID,LastName FROM Employees WHERE LastName=N'Petrov'

To do it properly: when comparing with a field of type varchar (ASCII), try to write constants as '...', and when comparing with a field of type nvarchar (Unicode), try to write constants as N'...'. This is done to avoid implicit type conversions during query execution. We use the same rule when inserting (INSERT) values into a field or updating them (UPDATE).

When comparing strings, keep in mind that, depending on the database settings (collation), string comparison can be either case-insensitive (when 'Petrov' = 'PETROV') or case-sensitive (when 'Petrov' <> 'PETROV').
In the case of a case-sensitive setting, if you want to perform a case-insensitive search, you can, for example, convert both the left and right expressions to the same case - upper or lower:

SELECT ID,Name FROM Employees WHERE UPPER(Name) LIKE UPPER(N'Pet%')      -- or LOWER(Name) LIKE LOWER(N'Pet%')
SELECT ID,LastName FROM Employees WHERE UPPER(LastName)=UPPER(N'Petrov') -- or LOWER(LastName)=LOWER(N'Petrov')

A little about dates

When filtering by a date you can, just as with strings, use single quotes '...'.

Regardless of the regional settings in MS SQL, you can use the date syntax 'YYYYMMDD' (year, month, day written together, without separators). MS SQL will always understand this date format:

SELECT ID,Name,Birthday
FROM Employees
WHERE Birthday BETWEEN '19800101' AND '19891231' -- employees born in the 80s
ORDER BY Birthday

In some cases, it is more convenient to set the date using the DATEFROMPARTS function:

SELECT ID,Name,Birthday
FROM Employees
WHERE Birthday BETWEEN DATEFROMPARTS(1980,1,1) AND DATEFROMPARTS(1989,12,31)
ORDER BY Birthday

There is also a similar function DATETIMEFROMPARTS, which is used to set the Date and Time (for the datetime type).
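For example, a small sketch of both functions (the second set of arguments is just an arbitrary date and time chosen for illustration):

SELECT DATEFROMPARTS(1980,1,1),               -- 1980-01-01
       DATETIMEFROMPARTS(2015,4,8,17,20,15,0) -- 2015-04-08 17:20:15.000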

You can also use the CONVERT function if you need to convert a string to a date or datetime value:

SELECT CONVERT(date,'12.03.2015',104), CONVERT(datetime,'2014-11-30 17:20:15',120)

The values ​​104 and 120 indicate which date format is used in the string. You can find a description of all valid formats in the MSDN library by searching for “MS SQL CONVERT”.

There are a lot of functions for working with dates in MS SQL, look for “ms sql functions for working with dates.”
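Just as a taste, here are a few of the most common ones (a sketch; see the documentation for exact semantics, e.g. DATEDIFF counts crossed calendar boundaries rather than full years):

SELECT GETDATE(),                          -- current date and time
       DATEADD(day, 7, GETDATE()),         -- current date plus 7 days
       DATEDIFF(year, Birthday, GETDATE()) -- difference in years between Birthday and today
FROM Employees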

Note. All dialects of the SQL language have their own set of functions for working with dates and apply their own approach to working with them.

A little about numbers and their transformations

The information in this section will probably be more useful to IT specialists. If you are not one, and your goal is simply to learn how to write queries to obtain the information you need from the database, you may not need such subtleties; in any case, you can skim the text and take note of it, because if you have taken up studying SQL, you are already getting into IT.

Unlike the CAST conversion function, you can specify a third parameter in the CONVERT function, which is responsible for the conversion style (format). Different data types may have their own set of styles, which may affect the returned result. We have already touched on the use of styles when considering the conversion of a string using the CONVERT function into the date and datetime types.

You can read more about the CAST, CONVERT functions and styles in MSDN - “CAST and CONVERT Functions (Transact-SQL)”: msdn.microsoft.com/ru-ru/library/ms187928.aspx

To simplify the examples, the Transact-SQL language statements DECLARE and SET will be used here.

Of course, in the case of converting an integer to a real number (which I gave at the beginning of this lesson, in order to demonstrate the difference between integer and real division), knowledge of the nuances of the conversion is not so critical, because there we did an integer to real conversion (the range of which is much larger than the range of integers):

DECLARE @min_int int SET @min_int=-2147483648
DECLARE @max_int int SET @max_int=2147483647
SELECT
  -- (-2147483648)
  @min_int, CAST(@min_int AS float), CONVERT(float,@min_int),
  -- 2147483647
  @max_int, CAST(@max_int AS float), CONVERT(float,@max_int),
  -- numeric(16,6)
  @min_int/1., -- (-2147483648.000000)
  @max_int/1.  -- 2147483647.000000

Perhaps it was not worth mentioning the implicit conversion obtained by dividing by (1.), since it is preferable to do explicit conversions for better control over the type of the result. Still, if we want to get a numeric result with a given number of digits after the decimal point, in MS SQL we can use the trick of multiplying an integer value by (1., 1.0, 1.00, and so on):

DECLARE @int int SET @int=123
SELECT
  @int*1.,   -- numeric(12, 0) - 0 decimal places
  @int*1.0,  -- numeric(13, 1) - 1 decimal place
  @int*1.00, -- numeric(14, 2) - 2 decimal places
  -- although sometimes it is better to do an explicit conversion
  CAST(@int AS numeric(20, 0)), -- 123
  CAST(@int AS numeric(20, 1)), -- 123.0
  CAST(@int AS numeric(20, 2))  -- 123.00

In some cases, the details of the conversion can be really important, because... they affect the correctness of the result obtained, for example, in the case when a conversion is made from a numeric value to a string (varchar). Let's look at examples of converting money and float values ​​to varchar:

-- Behavior when converting money to varchar
DECLARE @money money
SET @money = 1025.123456789 -- an implicit conversion to 1025.1235 will occur, because the money type stores only 4 digits after the decimal point
SELECT
  @money, -- 1025.1235
  -- by default CAST and CONVERT behave the same (i.e., roughly speaking, style 0 is applied)
  CAST(@money as varchar(20)),     -- 1025.12
  CONVERT(varchar(20), @money),    -- 1025.12
  CONVERT(varchar(20), @money, 0), -- 1025.12 (style 0 - no thousands separator and 2 decimal places (default format))
  CONVERT(varchar(20), @money, 1), -- 1,025.12 (style 1 - uses a thousands separator and 2 decimal places)
  CONVERT(varchar(20), @money, 2)  -- 1025.1235 (style 2 - no separator and 4 digits after the decimal point)

-- Behavior when converting float to varchar
DECLARE @float1 float SET @float1 = 1025.123456789
DECLARE @float2 float SET @float2 = 1231025.123456789
SELECT
  @float1, -- 1025.123456789
  @float2, -- 1231025.12345679
  -- by default CAST and CONVERT behave the same (i.e., roughly speaking, style 0 is used)
  -- style 0 - no more than 6 digits; exponential notation is used when necessary
  -- really scary things happen here when converting to varchar
  CAST(@float1 as varchar(20)),     -- 1025.12
  CONVERT(varchar(20), @float1),    -- 1025.12
  CONVERT(varchar(20), @float1, 0), -- 1025.12
  CAST(@float2 as varchar(20)),     -- 1.23103e+006
  CONVERT(varchar(20), @float2),    -- 1.23103e+006
  CONVERT(varchar(20), @float2, 0), -- 1.23103e+006
  -- style 1 - always 8 digits; scientific notation is always used
  -- this style is also not very accurate for float
  CONVERT(varchar(20), @float1, 1), -- 1.0251235e+003
  CONVERT(varchar(20), @float2, 1), -- 1.2310251e+006
  -- style 2 - always 16 digits; scientific notation is always used
  -- here the accuracy is better
  CONVERT(varchar(30), @float1, 2), -- 1.025123456789000e+003 - OK
  CONVERT(varchar(30), @float2, 2)  -- 1.231025123456789e+006 - OK

As can be seen from the example, the floating-point types float and real can in some cases introduce a noticeable error, especially when converted to a string and back (this can happen with various kinds of integration, when data is transferred, for example, in text files from one system to another).

If you need explicit control over precision to a certain number of decimal places, more than 4, then it is sometimes better to use the decimal/numeric type to store the data. If 4 decimal places are enough, you can use the money type - it roughly corresponds to numeric(20,4).

-- decimal and numeric
DECLARE @money money SET @money = 1025.123456789 -- 1025.1235
DECLARE @float1 float SET @float1 = 1025.123456789
DECLARE @float2 float SET @float2 = 1231025.123456789
DECLARE @numeric numeric(28,9) SET @numeric = 1025.123456789
SELECT
  CAST(@numeric as varchar(20)),  -- 1025.123456789
  CONVERT(varchar(20), @numeric), -- 1025.123456789
  CAST(@money as numeric(28,9)),  -- 1025.123500000
  CAST(@float1 as numeric(28,9)), -- 1025.123456789
  CAST(@float2 as numeric(28,9))  -- 1231025.123456789

Note.
Since MS SQL Server 2008, a variable can be declared and initialized in a single statement, instead of a separate DECLARE and SET:
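DECLARE @int int = 123 -- declaration and initialization in one statement (SQL Server 2008 and later)
SELECT @int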