Sql replacing characters in a string by position. Combining string functions in Transact-SQL

Twitter

Basic string functions and operators provide a variety of capabilities and return a string value as a result. Some string functions are two-element, meaning they can operate on two strings at once. The SQL 2003 standard supports string functions.

Concatenation operator

SQL 2003 defines the concatenation operator (||), which joins two separate strings into a single string value.

DB2 platform

The DB2 platform supports the SQL 2003 concatenation operator as well as its synonym, the CONCAT function.

MySQL platform

The MySQL platform supports the CONCATQ function, a synonym for the SQL 2003 concatenation operator.

Oracle and PostgreSQL

The PostgreSQL and Oracle platforms support the SQL 2003 double vertical bar concatenation operator.

SQL Server platform

The SQL Server platform uses the plus sign (+) as a synonym for the SQL 2003 concatenation operator. SQL Server has a system parameter CONCAT_NULL_YIELDS_NULL that controls how the system behaves if NULL values are encountered when concatenating string values.

/* SQL 2003 syntax */

stringl || string2 || string3

string1string2string3

/* For MySQL */

CONCAT("stringl", "string2")

If any of the concatenated values are empty, then an empty string is returned. Additionally, if a numeric value is involved in the concatenation, it is implicitly converted to a string value.

SELECT CONCAT("My ", "bologna", "has", "a", "first", "name...");

My bologna has a first name

SELECT CONCAT("My ", NULL, "has", "first", "name...");

CONVERT and TRANSLATE

The CONVERT function changes the display of a character string within a character set and collation. For example, the CONVERT function can be used to change the number of bits per character.

The TRANSLATE function translates a string value from one character set to another. For example, the TRANSLATE function can be used to convert a value from the English character set to the Kanji (Japanese) or Cyrillic (Russian) character set. The translation itself must already exist - either specified by default or created using the CREATE TRANSLATION command.

SQL 2003 syntax

CONVERT (character_value USING character_conversion_name)

TRANSLATE(character_value USING translation_name)

The CONVERT function converts a character value to the character set with the name specified in the character_conversion_name parameter. The TRANSLATE function converts a character value to the character set specified in translation_name.

Among the platforms reviewed, only Oracle supports the CONVERT and TRANSLATE functions as defined in the SQL 2003 standard. Oracle's implementation of the TRANSLATE function is very similar to, but not identical to, SQL 2003. In this implementation, the function takes only two arguments and translates only between the database character set and the locale-enabled character set.

MySQL's implementation of the CONV function only converts numbers from one base to another. SQL Server's implementation of the CONVERT function is quite rich in capabilities and changes the data type of the expression, but in all other aspects it differs from the CONVERT function of the SQL 2003 standard. The PostgreSQL platform does not support the CONVERT function, and the implementation of the TRANSLATE function converts all occurrences of a character string to any another character string.

DB2

The DB2 platform does not support the CONVERT function, and support for the TRANSLATE function is not ANSI compliant. The TRANSLATE function is used to transform substrings and has historically been synonymous with the UPPER function because the UPPER function was only recently added to DB2. If the TRANSLATE function is used in DB2 with a single argument as a character expression, the result is the same string converted to uppercase. If the function is used with multiple arguments, such as TRANSLATE(ucmo4HUK, replace, match), then the function converts all characters in the source that are also in the match parameter. Every character in the source that is in the same position as in the match parameter will be replaced with the character from the replace parameter. Below is an example.

TRANSLATE("Hello, World!") "HELLO; WORLD!"

TRANSLATE("Hello, World1", "wZ", "1W") "Hewwo, Zorwd1

MySQL

The MySQL platform does not support the TRANSLATE and CONVERT functions.

Oracle

The Oracle platform supports the following syntax for the CONVERT and TRANSLATE functions.

In Oracle's implementation, the CONVERT function returns the text of a character value converted to the target_charset_set. The char_value parameter is the string to be converted, the target_charset_set is the name of the character set into which the string is to be converted, and the source_charset parameter is the character set in which the string value was originally stored.

The TRANSLATE function in Oracle conforms to ANSI syntax, but you can only choose one of two character sets: the database character set (CHARJCS) and the national language-specific character set (NCHARJZS).

Oracle also supports another function, also called TRANSLATE (without using the USING keyword). This TRANSLATE function has nothing to do with character set conversion.

The names of the target and source character sets can be passed either as string constants or as a reference to a table column. Note that when converting a string to a character set that does not display all the characters being converted, you can substitute replacement characters.

Oracle supports several common character sets, which include US7ASCII and WE8DECDEC. WE8HP, F7DEC, WE8EBCDIC500, WE8PC850u WE8ISO8859PI. For example:

SELECT CONVERT("Gro2", "US7ASCII", "WE8HP") FROM DUAL;

PostgreSQL

The PostgreSQL platform supports the ANSI standard CONVERT statement, and conversions here can be defined using the CREATE CONVERSION command. PostgreSQL's implementation of the TRANSLATE function provides an extended set of functions that allow you to transform any text into other text within a specified string.

TRANSLATE (character string, from_text, to_text)

Here are some examples:

SELECT TRANSLATE("12345abcde", "5a", "XX"); "1234XXbcde" SELECT TRANSLATE(title, "Computer", "PC") FROM titles WHERE type="Personal_computer" SELECT CONVERT("PostgreSQL" USING iso_8859_1_to_utf_8) "PostgreSQL"

SQL Server

The SQL Server platform does not support the TRANSLATE function. The implementation of the CONVERT function in SQL Server is not compliant with the SQL 2003 standard. This function in SQL Server is equivalent to the CAST function.

CONVERT (data_type[(length) | (precision, scale)], expression, style])

The style clause is used to define the date conversion format. For more information, see the SQL Server documentation. Below is an example.

SELECT title, CONVERT(char(7), ytd_sales) FROM titles ORDER BY title GO

Sql String Functions

This group of functions allows you to manipulate text. There are many string functions, we will look at the most common ones.

CONCAT(str1,str2...) Returns a string created by concatenating the arguments (the arguments are in parentheses - str1,str2...). For example, in our Vendors table there is a City column and an Address column. Suppose we want the resulting table to have Address and City in the same column, i.e. we want to combine data from two columns into one. To do this, we will use the CONCAT() string function, and as arguments we will indicate the names of the columns to be combined - city and address:
SELECT CONCAT(city, address) FROM vendors;

Please note that the merging occurred without splitting, which is not very readable. Let's adjust our query so that there is a space between the columns being joined:
SELECT CONCAT(city, " ", address) FROM vendors;

As you can see, a space is also considered an argument and is indicated separated by a comma. If there were more columns to be merged, then specifying spaces each time would be irrational. In this case, one could use the string function CONCAT_WS(separator, str1,str2...), which places a separator between the concatenated strings (the separator is specified as the first argument). Our query will then look like this:
SELECT CONCAT_WS(" ", city, address) FROM vendors;
The result did not change externally, but if we were to combine 3 or 4 columns, the code would be significantly reduced.

INSERT(str, pos, len, new_str) Returns the string str with the substring starting at position pos and having a length of len characters replaced by the substring new_str. Suppose we decide not to display the first 3 characters in the Address column (abbreviations st., pr., etc.), then we will replace them with spaces:
SELECT INSERT(address, 1, 3, " ") FROM vendors;

That is, three characters, starting from the first, are replaced by three spaces.

LPAD(str, len, dop_str) Returns the string str, left padded with dop_str to length len. Let's say we want to display supplier cities to the right and fill the empty space with dots:
SELECT LPAD(city, 15, ".") FROM vendors;

RPAD(str, len, dop_str) Returns the string str right padded with dop_str to length len. Let's say we want to display supplier cities to the left, and fill the empty space with dots:
SELECT RPAD(city, 15, ".") FROM vendors;

Please note that the len value limits the number of characters displayed, i.e. if the city name is longer than 15 characters, it will be truncated.

LTRIM(str) Returns the string str with all leading spaces removed. This string function is convenient for correctly displaying information in cases where random spaces are allowed when entering data:
SELECT LTRIM(city) FROM vendors;

RTRIM(str) Returns the string str with all trailing spaces removed:
SELECT RTRIM(city) FROM vendors;
In our case, there were no extra spaces, so we won’t see the result externally.

TRIM(str) Returns the string str with all leading and trailing spaces removed:
SELECT TRIM(city) FROM vendors;

LOWER(str) Returns the string str with all characters converted to lowercase. It does not work correctly with Russian letters, so it is better not to use it. For example, let's apply this function to the city column:
SELECT city, LOWER(city) FROM vendors;

See what kind of gobbledygook it turned out to be. But everything is fine with the Latin alphabet:
SELECT LOWER("CITY");

UPPER(str) Returns the string str with all characters converted to uppercase. It is also better not to use it with Russian letters. But everything is fine with the Latin alphabet:
SELECT UPPER(email) FROM customers;

LENGTH(str) Returns the length of the string str. For example, let's find out how many characters are in our supplier addresses:
SELECT address, LENGTH(address) FROM vendors;

LEFT(str, len) Returns the len left characters of the string str. For example, let only the first three characters be displayed in supplier cities:
SELECT name, LEFT(city, 3) FROM vendors;

RIGHT(str, len) Returns the len right characters of the string str. For example, let only the last three characters be displayed in supplier cities: SELECT LOAD_FILE("C:/proverka");
Please note that you must specify the absolute path to the file.

As already mentioned, there are many more string functions, but even some of those discussed here are used extremely rarely. Therefore, let’s finish considering them here and move on to more commonly used date and time functions.

In this part we will talk about functions for working with text information that can be used in queries and program code in the PL/SQL language.

Function CONCAT(strl, str2)

This function concatenates the strings strl and str2. If one of the arguments is NULL, then it is treated as an empty string. If both arguments are NULL, then the function returns NULL. Example:

SELECT CONCAT("The priest had a dog") x1,
CONCATCTest" , NULL) x2,
CONCAT(NULL, "Test") x3,
CONCAT(NULL, NULL) x4
FROM dual


The priest had a dog

To concatenate strings, Oracle supports a special concatenation operator "||" that works similar to the CONCAT function, for example:

SELECT CONCAT("The priest "had a dog") x1, "The priest " || "had a dog" x2
FROM dual

Do not confuse the concatenation operator "||", which is equivalent to calling the CONCAT function, and the "+" operator used in arithmetic operations. In Oracle these are different operators, but due to automatic type casting, subtle errors are possible, for example:

SELECT "5" + "3" x1
FROM dual

In this case, the numeric value 8 is returned rather than the text string "53". This is because when Oracle detects the arithmetic operator “+”, Oracle automatically tries to cast the arguments to type NUMBER.

Function LOWER(str)

The LOWER function converts all characters in str to lowercase. Example:

SELECT LOWER("TeXt DATA") X
FROM dual

FunctionUPPER(str)

The UPPER function converts all characters in the string str to uppercase. Example:

SELECT UPPER("TeXt DATA") X
FROM dual

INITCAP(str) function

Returns the string str with the first letters of all words converted to uppercase. The function is convenient for formatting the full name when building reports. Example:

SELECT INITCAPCIVANOV peter sidorovich") X
FROM dual

FunctionsLTRIM(str [,set])AndRTRIM(str [,set])

The LTRIM function removes all characters from the beginning of a string up to the first character that is not in the set character set. By default, set consists of a single space and may not be specified. The RTRIM function is similar to LTRIM, but removes characters starting from the end of the string. Let's look at a few examples:

SELECT LTRIM(" TeXt DATA") X1,
LTRIM(" _ # TeXt DATA", " #_") X2,
LTRIM(" 1234567890 TeXt DATA", " 1234567890") X3
FROM dual

Function REPLACE(str, search_str, [,replace_str])

The REPLACE function searches for a search_str pattern in the string str and replaces each occurrence found with replace_str. By default, replace_str is the empty string, so calling REPLACE with two arguments removes all matches found. The search for a substring is case sensitive. Example:

SELECT REPLACE("The priest had a dog", "dog", "cat") x1,
REPLACE("The priest had an evil dog", "evil") x2,
REPLACE("The priest had a dog", "Dog", "Cat") x3
FROM dual


The priest had a cat	The priest had a dog	The priest had a dog

Function TRANSLATE(str, from_mask, to_mask)

The TRANSLATE function parses the string str and replaces all characters appearing in the from_mask string with the corresponding characters from to_mask. For the function to work correctly, the from_mask and to_mask strings must be the same length, or the from_mask string must be longer than to_mask. If from_mask is longer than to_mask, and while processing the string str, characters are found that match one of the from_mask characters, and there is no match for them in to_mask, then such characters will be removed from the string str. If you pass from_mask or to_mask equal to NULL, the function will return NULL. The comparison is made case-sensitive.

SELECT TRANSLATE("Test 12345", "e2\"E!") x1,
TRANSLATE("Test 12345", "e234", "E") x2
FROM dual

This function is convenient for solving a number of practical problems related to character conversion or searching for prohibited characters. For example, you need to analyze a password and find out whether it contains at least one digit. The implementation of this check using TRANSLATE looks like this:

IF TRANSLATE(PassWd, "0123456789", "*") = PassWd THEN
ADD_ERR0R("Error - Password must contain at least one digit!");
RETURN 1;
ENDIF;

Another example: a number is being prepared for its conversion to NUMBER. It is necessary to replace the decimal separators "," and "." on "." and remove spaces. The implementation of this operation using TRANSLATE looks like this:

SELECT TRANSLATE("123 455.23", "., ", " . . ") x1,
TRANSLATE("-123 455.23", "., ", " . . ") x2
FROM dual

Function SUBSTR(str, m [,n])

The SUBSTR function returns a fragment of the string str, starting at character m, with a length of n characters. The length can be omitted - in this case, the string is returned from the character m to the end of the string str. Characters are numbered starting from 1. If you specify m = 0, then copying will still start from the first character. Specifying a negative value for m causes characters to be counted from the end of the string rather than from the beginning. Specifying values of m that are greater in absolute value than the length of the string causes the function to return NULL.

SELECT SUBSTR("The priest had a dog", 13) x1,
SUBSTR("The priest had a dog", -6) x2,
SUBSTR("This is test text", 5, 8) x3,
SUBSTR("The priest had a dog", 150) x4
FROM dual


		text

Function INSTR(str, search_str [,n[,m]])

The INSTR function returns the position of the first character m-ro of the string fragment str that matches search_str. The comparison is carried out from the nth character of the string str; the comparison is case-sensitive. By default, n = m = 1, that is, the search is carried out from the beginning of the line and the position of the first fragment found is returned. If the search is unsuccessful, the function returns 0.

SELECT INSTR("y butt was a dog", "dog") x1,
INSTR("y butt was a dog", "cat") x2,
INSTR("This is text to demonstrate text search", "text", 1, 2) x3,
INSTR(‘11111000000001", "1", 7) x4
FROM dual

With this function, as well as with all others in Oracle, common errors associated with handling NULL values are often made. If str=NULL, then the function will return NULL, not zero! This must be taken into account when constructing various conditions. For example, this fragment of a PL/SQL program does not take this feature into account:

IF INSTR(TXT_VAR,"*") = 0 THEN
...
ENDIF;

In this case, it would be correct to write like this:

IF NVL(INSTR(TXT_VAR, "*"), 0) = 0 THEN
...
ENDIF;

LENGTH (str) and LENGTHB (str) functions

The LENGTH(str) function returns the length of the string str in characters. For an empty string and a NULL value, the function returns NULL, so it is recommended to use NVL in conjunction with this function.

SELECT LENGTH("The priest had a dog") x1,
LENGTH("") x2,
LENGTH(NULL) x3,
NVL(LENGTH(""), 0) x4
FROM dual

The LENGTHB function is similar to the LENGTH function, but returns the length of the string in bytes.

ASCII(str) function

Returns the ASCII value of the first character of the string str when using ASCII character encoding, and the value of the first byte of a multibyte character when using multibyte character encoding. Example:

SELECT ASCII("Test") x1 FROM dual

Function CHR(n)

Returns a character by its code.

SELECT CHR(64) x1
FROM dual

Here is a complete list of string functions taken from BOL:

The result is 11. To find out what letters these are, we can use the CHAR function, which returns the character by a known ASCII code (from 0 to 255):

Here's how, for example, you can get a table of codes for all alphabetic characters:

SELECT CHAR(ASCII("a")+ num-1) letter, ASCII("a")+ num - 1 FROM (SELECT 5*5*(a-1)+5*(b-1) + c AS num FROM (SELECT 1 a UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) x CROSS JOIN (SELECT 1 b UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) y CROSS JOIN ( SELECT 1 c UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5) z) x WHERE ASCII("a")+ num -1 BETWEEN ASCII("a") AND ASCII("z")

I refer those who are not yet aware of the generation of a number sequence to the corresponding article.

As you know, the codes for lowercase and uppercase letters are different. Therefore, to get the full set without rewriting the request, you just need to add a similar one to the above code:

I figure it wouldn't be too difficult to add this letter to the table if needed.

Let us now consider the task of determining where to find the desired substring in a string expression. Two functions can be used for this - CHARINDEX And PATINDEX. They both return the starting position (the position of the first character of the substring) of the substring in the string. The CHARINDEX function has the syntax:

CHARINDEX ( search_expression, string_expression[, start_position])

Here's an optional integer parameter start_position defines the position in a string expression from which the search is performed search_expression. If this parameter is omitted, the search is performed from the beginning string_expression. For example, request

It should be noted that if the searched substring or string expression is NULL, then the result of the function will also be NULL.

The following example determines the positions of the first and second occurrence of the character "a" in the ship name "California"

But, for example, how can you find the names of ships that contain a sequence of three characters, the first and last of which are “e”:

Steam room to LEFT function RIGHT returns the specified number of characters to the right from a string expression:

RIGHT(<string expression>,<number of characters>)

Here, for example, is how you can determine the names of ships that begin and end with the same letter:

Here we separate the class name and the ship name with a space. In addition, in order not to repeat the entire construction as a function argument, we use a subquery. The result will look like:

To eliminate this case, you can use another useful function LEN (<string expression>) , which returns the number of characters in the string. Let's limit ourselves to the case when the number of characters is greater than one:

Function REPLICATE pads the constant "abcde" with five spaces on the right, which are not taken into account by the function LEN, - in both cases we get 5.
Function DATALENGTH returns the number of bytes in the variable's representation and shows us the difference between CHAR and VARCHAR types. DATALENGTH will give us 12 for the CHAR type and 10 for the VARCHAR type.
As expected, DATALENGTH for a variable of type VARCHAR, returned the actual length of the variable. But why did the result turn out to be 12 for a variable of type CHAR? The point is that CHAR is a type fixed length. If the value of a variable is less than its length, and we declared the length as CHAR(12), then the value of the variable will be “aligned” to the required length by adding trailing spaces.

There are tasks on the site in which you need to arrange (find the maximum, etc.) in numerical order the values presented in text format. For example, airplane seat number (“2d”) or CD speed (“24x”). The problem is that the text is sorted like this (ascending)

If you want to arrange the places in ascending order of rows, then the order should be like this

If we limit ourselves to this, we get

All that remains is to sort

Today I propose to look at simple examples of use Transact-SQL string functions, and not just a description and examples of some functions, but their combination, i.e. how can they be nested within each other, since to implement many tasks standard functions are not enough and they have to be used together. And so I would like to show you a couple of simple examples of writing such queries.

You and I have already looked at SQL string functions, but since the implementations of this language in different DBMSs are different, for example, some functions are not in Transact-SQL, but they are in PL/PGSql, and just last time we looked at string functions that can be use in plpgsql and therefore today we will talk specifically about Transact-SQL.

How to combine SUBSTRING, CHARINDEX and LEN

And so, for example, you need to search for a part of a string according to a certain criterion and cut it out, and not just search for a part of the same type, but dynamically, i.e. The search string will be different for each string. We will write examples in Management Studio SQL Server 2008.

To do this we will use the following functions:

SUBSTRING(str, start, len) – this function cuts part of a string from another string. Has three parameters 1. This is the string itself; 2. The starting position from which to start cutting; 3. The number of characters that need to be cut.
CHARINDEX(str1, str2) - searches str1 in str2 and returns the serial number of the first character if such a string is found. It has a third optional parameter, with which you can specify which side to start the search from.
LEN(str1) - string length, i.e. Characters.

As you can see, here I used variable declarations, and you can substitute your own fields in the request instead of variables. Here's the code itself:

Declare @rezult as varchar(10) --source string declare @str1 as varchar(100) --search string declare @str2 as varchar(10) set @str1 = "Test string to search for another string in it" set @str2 = "string" set @rezult=substring(@str1,CHARINDEX(@str2, @str1), LEN(@str2)) select @rezult

The point here is this: using the len function, we find out how many characters need to be cut, and charindex sets the position from which we need to start cutting, and accordingly substring performs the selection itself in the string.

How to combine LEFT, RIGHT and LEN

Let's say that you need to get the first few characters in a string or check these first characters in a string for the presence of something, for example, some kind of number, and its length is naturally different (the example is, of course, a test one).

Left(str, kol) – functions cut out the specified number of characters from the left, have two parameters, the first is the string and the second is the number of characters;
Right(str, kol) - functions cut out the specified number of characters from the right, the parameters are the same.

Now we will use simple queries against the table

First, let's create a table test_table:

CREATE TABLE ( IDENTITY(1,1) NOT NULL, (18, 0) NULL, (50) NULL, CONSTRAINT PRIMARY KEY CLUSTERED ( ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON ) ON GO

Let's fill it with test data and write the following queries:

As you understand, the first query is simply a selection of all rows (SQL Basics - select statement), and the second is a direct combination of our functions, here is the code:

Select * from test_table select number, left(text,LEN(number)) as str from test_table

And if these numbers were on the right, then we would use the function RIGHT.

Using Rtrim, Ltrim, Upper and Lower in combination

Let's say you have a line with spaces at the beginning and at the end, and you would, of course, like to get rid of them and, for example, make the first letter in this line capitalized.

Rtrim(str) – removes spaces from the right;
Ltrim(str) – removes spaces on the left;
Upper(str) – converts the string to uppercase;
Lower(str) - converts the string to lowercase.

As you can see, to secure here we also used Substring And Len. The meaning of the query is simple, we remove spaces on both the right and left, then we convert the first character to uppercase by cutting it out, then we concatenate (the + operator) this character with the remaining string. Here's the code:

Declare @str1 as varchar(100) set @str1 = " test string with leading and trailing spaces " select @str1 select upper(substring(rtrim(ltrim(@str1)),1,1))+ lower(substring( rtrim(ltrim(@str1)),2,LEN(rtrim(ltrim(@str1)))-1))

I think that’s enough for today, and if you like programming in SQL, then on this site we have touched on this very interesting topic more than once, for example.