Search and Replace in PL/SQL Using Regular Expressions

Introduction to Regular Expressions

Regular expressions (RegExp) are a very effective way to work with strings.


By composing a regular expression with its special syntax, you can:

  • search for text in a string
  • replace substrings in a string
  • extract information from a string
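
In Oracle SQL and PL/SQL, these three tasks correspond to the built-in functions REGEXP_INSTR, REGEXP_REPLACE, and REGEXP_SUBSTR. A minimal sketch, assuming an Oracle database; the sample string is illustrative and is not taken from the original text:

```sql
-- Search: position of the first digit in the string
-- Result: 7
SELECT REGEXP_INSTR('Order 4521 shipped', '[0-9]') AS first_digit_pos
FROM dual;

-- Replace: mask the run of digits
-- Result: 'Order # shipped'
SELECT REGEXP_REPLACE('Order 4521 shipped', '[0-9]+', '#') AS masked
FROM dual;

-- Extract: return the run of digits itself
-- Result: '4521'
SELECT REGEXP_SUBSTR('Order 4521 shipped', '[0-9]+') AS order_no
FROM dual;
```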

Almost every programming language has regular expressions. There are small differences between implementations, but the general concepts apply nearly everywhere.

Regular expressions date back to the 1950s, when they were formalized as a conceptual search pattern for string-processing algorithms.

Regular expressions, as implemented in UNIX tools such as grep and sed and in popular text editors, began to gain popularity and were added to the Perl programming language, and later to many other languages.

JavaScript, along with Perl, is one of the programming languages in which regular expression support is built directly into the language.

Notes

Vertica's REGEXP_REPLACE function operates on UTF-8 strings using the default locale, even if the locale has been set to something else.

If you are porting a regular expression query from an Oracle database, remember that Oracle considers a zero-length string to be equivalent to NULL, while Vertica does not.

Another key difference between Oracle and Vertica is that Vertica can handle an unlimited number of captured subexpressions, while Oracle is limited to nine.

In Vertica, you can use \10 in the replacement pattern to access the substring captured by the tenth set of parentheses in the regular expression. In Oracle, \10 is treated as the substring captured by the first set of parentheses, followed by a zero. To force this Oracle behavior in Vertica, use the \g back reference and enclose the number of the captured subexpression in curly braces. For example, \g{1}0 is the substring captured by the first set of parentheses followed by a zero.

You can also name captured subexpressions to make your regular expressions less ambiguous. See the PCRE documentation for details.
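
The two ways of reading a reference to the tenth group can be sketched with a pattern that has ten capture groups. This is an illustrative Vertica query, not one from the original notes. The sample string is an assumption, and the comments restate the behavior described above rather than verified output:

```sql
-- Ten capture groups, one per letter a..j; the trailing 'k' is not part of the match.
-- In Vertica, \10 in the replacement refers to the tenth group (the letter j).
SELECT REGEXP_REPLACE('abcdefghijk', '(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)', '\10');

-- \g{1}0 forces the Oracle-style reading: the first group (a) followed by a literal 0.
SELECT REGEXP_REPLACE('abcdefghijk', '(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)', '\g{1}0');
```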

Example — Match on Words

Let’s start by extracting the first word from a string.

For example:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)')
FROM dual;

Result: 'TechOnTheNet '

This example returns 'TechOnTheNet ' because the pattern extracts all non-whitespace characters, as specified by (\S*), and then the first whitespace character, as specified by (\s). The result includes both the first word and the space after it.

If we did not want to include the space in the result, we could modify our example as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)')
FROM dual;

Result: 'TechOnTheNet'

This example would return 'TechOnTheNet' with no space at the end.

If we wanted to find the second word in the string, we could modify our function as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)', 1, 2)
FROM dual;

Result: 'is '

This example would return 'is ' with a space at the end of the string.

If we wanted to find the third word in the string, we could modify our function as follows:

SELECT REGEXP_SUBSTR ('TechOnTheNet is a great resource', '(\S*)(\s)', 1, 3)
FROM dual;

Result: 'a '
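
To return a word without the trailing space while still selecting it by occurrence, one option is to match runs of non-whitespace characters with \S+. Another is the sub-expression argument of REGEXP_SUBSTR. A sketch, assuming Oracle Database 11g or later for the six-argument form:

```sql
-- Second word, no trailing space: \S+ matches only the non-whitespace run
-- Result: 'is'
SELECT REGEXP_SUBSTR('TechOnTheNet is a great resource', '\S+', 1, 2)
FROM dual;

-- Same word via the original pattern, returning only sub-expression 1, i.e. (\S*)
-- Arguments: position 1, occurrence 2, match_parameter NULL, sub-expression 1
-- Result: 'is'
SELECT REGEXP_SUBSTR('TechOnTheNet is a great resource', '(\S*)(\s)', 1, 2, NULL, 1)
FROM dual;
```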

Example — Match on Digit Characters

Next, let's look at how we would use the REGEXP_REPLACE function to match a digit character pattern.

For example:

SELECT REGEXP_REPLACE ('7, 8, and 15 are numbers in this example', '\\d', 'abc');
Result: 'abc, abc, and abcabc are numbers in this example'

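This example replaces each numeric digit in the string with "abc", as specified by \d. The pattern is written '\\d' because the query appears to target a dialect in which the backslash is an escape character inside string literals (such as MySQL or MariaDB); in Oracle the same pattern is written '\d'. Here the pattern matches the digits in 7, 8, and 15, so 15 becomes "abcabc".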

We could change our pattern to replace only two-digit numbers.

For example:

SELECT REGEXP_REPLACE ('7, 8, and 15 are numbers in this example', '(\\d)(\\d)', 'abc');
Result: '7, 8, and abc are numbers in this example'

This example replaces every pair of adjacent digits with "abc", as specified by (\d)(\d). In this case, it skips the single-digit values 7 and 8 and replaces only the number 15.

Now, let's look at how we would use the REGEXP_REPLACE function with a table column, this time to remove digits.

For example:

SELECT REGEXP_REPLACE (address, '\\d', '')
FROM contacts;

In this example, we remove all digits from the address field in the contacts table. This is done by matching every digit with \d and replacing it with an empty string ('').
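
For comparison, the same replacements can be written for Oracle, where the backslash is not an escape character inside a string literal and a standalone query selects from dual. A sketch under that assumption; \d{2} is used here as a shorter equivalent of (\d)(\d):

```sql
-- Replace every digit
-- Result: 'abc, abc, and abcabc are numbers in this example'
SELECT REGEXP_REPLACE('7, 8, and 15 are numbers in this example', '\d', 'abc')
FROM dual;

-- Replace each pair of adjacent digits
-- Result: '7, 8, and abc are numbers in this example'
SELECT REGEXP_REPLACE('7, 8, and 15 are numbers in this example', '\d{2}', 'abc')
FROM dual;
```

Note that a pair-of-digits pattern also matches inside longer numbers, so a value such as 1234 would become "abcabc".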

Example of Matching Multiple Alternatives

The next example uses the | pattern. The | pattern acts as an "OR" that specifies several alternatives.

For example:

SELECT REGEXP_REPLACE ('AeroSmith', 'a|e|i|o|u', 'R')
FROM dual;

Result: 'ARrRSmRth'

This example returns 'ARrRSmRth' because it replaces every lowercase vowel (a, e, i, o, or u) in the string. Since we did not specify a match_parameter value, REGEXP_REPLACE performs a case-sensitive search, which means that the 'A' in 'AeroSmith' is not matched.

We could change our query to perform a case-insensitive search as follows:

SELECT REGEXP_REPLACE ('AeroSmith', 'a|e|i|o|u', 'R', 1, 0, 'i')
FROM dual;

Result: 'RRrRSmRth'

Now, because we specified match_parameter = 'i', the search ignores case, so the 'A' in 'AeroSmith' matches the pattern and is replaced. Note also that we set the fifth parameter to 0 so that all occurrences are replaced.

Now let's look at how you would use this function with a column.

Suppose we have a contacts table with the following data:

contact_id  last_name
1000        AeroSmith
2000        Joy
3000        Scorpions

Now let's run the following query:

SELECT contact_id, last_name, REGEXP_REPLACE (last_name, 'a|e|i|o|u', 'R', 1, 0, 'i') AS "New Name"
FROM contacts;

The query returns the following results:

contact_id  last_name  New Name
1000        AeroSmith  RRrRSmRth
2000        Joy        JRy
3000        Scorpions  ScRrpRRns
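
Because every alternative here is a single character, the same result can be obtained with a bracket expression, and the fifth argument can be used to limit the replacement to one occurrence. A short sketch using the same sample value:

```sql
-- [aeiou] is equivalent to a|e|i|o|u when each alternative is one character
-- Result: 'RRrRSmRth'
SELECT REGEXP_REPLACE('AeroSmith', '[aeiou]', 'R', 1, 0, 'i')
FROM dual;

-- Occurrence 1: replace only the first vowel found
-- Result: 'ReroSmith'
SELECT REGEXP_REPLACE('AeroSmith', 'a|e|i|o|u', 'R', 1, 1, 'i')
FROM dual;
```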

Examples

Find groups of "word characters" (letters, numbers, and underscores) ending with "thy" in the string "healthy, wealthy, and wise" and replace them with nothing.

=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy');
 REGEXP_REPLACE 
----------------
 , , and wise
(1 row)

Find groups of word characters ending with "thy" and replace them with the string "something".

=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something');
         REGEXP_REPLACE         
--------------------------------
 something, something, and wise
(1 row)

Find groups of word characters ending with "thy" and replace them with the string "something", starting at the third character in the string.

=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 3);
          REGEXP_REPLACE          
----------------------------------
 hesomething, something, and wise
(1 row)

Replace the second group of word characters ending with "thy" with the string "something".

=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','\w+thy', 'something', 1, 2);
        REGEXP_REPLACE        
------------------------------
 healthy, something, and wise
(1 row)

Find groups of word characters ending with "thy", capturing the letters before the "thy", and replace them with the captured letters plus the letters "ish".

=> SELECT REGEXP_REPLACE('healthy, wealthy, and wise','(\w+)thy', '\1ish');
       REGEXP_REPLACE       
----------------------------
 healish, wealish, and wise
(1 row)

Create a table to demonstrate replacing strings in a query.

=> CREATE TABLE customers (name varchar(50), phone varchar(11));
CREATE TABLE
=> CREATE PROJECTION customers1 AS SELECT * FROM customers;
CREATE PROJECTION
=> COPY customers FROM stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself.
>> Able, Adam|17815551234
>> Baker,Bob|18005551111
>> Chu,Cindy|16175559876
>> Dodd,Dinara|15083452121
>> \. 

Query the customers, using REGEXP_REPLACE to format the phone numbers.

=> SELECT name, REGEXP_REPLACE(phone, '(\d)(\d{3})(\d{3})(\d{4})', 
'\1-(\2) \3-\4') as phone FROM customers;
    name     |      phone       
-------------+------------------
 Able, Adam  | 1-(781) 555-1234
 Baker,Bob   | 1-(800) 555-1111
 Chu,Cindy   | 1-(617) 555-9876
 Dodd,Dinara | 1-(508) 345-2121
(4 rows)
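
The same formatting can be tried in Oracle, which also supports \d, interval quantifiers, and the back references \1 through \9 in the replacement string. A sketch that uses one of the phone numbers above as a literal, so no table is needed:

```sql
-- Four capture groups: country digit, area code, exchange, line number
-- Result: '1-(781) 555-1234'
SELECT REGEXP_REPLACE('17815551234',
                      '(\d)(\d{3})(\d{3})(\d{4})',
                      '\1-(\2) \3-\4') AS phone
FROM dual;
```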

Oracle REGEXP_INSTR Function

The Oracle REGEXP_INSTR function lets you search a string for a regular expression pattern, and returns a number that indicates where the pattern was found.

It’s similar to the Oracle INSTR function, but it handles regular expressions where INSTR does not.

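The syntax for the REGEXP_INSTR function is:

REGEXP_INSTR ( source_string, pattern [, position [, occurrence [, return_option [, match_parameter [, sub_expression ] ] ] ] ] )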

This looks pretty complicated! Don’t worry, I’ll explain it all. There are a lot of parameters here, most of which are optional.

Let’s look at them and see what they do.

  • source_string (mandatory): This is the character string that the expression is searched in. It can be any of CHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.
  • pattern (mandatory): This is the regular expression that is used to search within the source_string. It can be any of CHAR, VARCHAR2, NCHAR, or NVARCHAR2, and can be up to 512 bytes.
  • position (optional): This is the position in the source_string where the function should begin the search for the pattern. It must be a positive integer, and the default value is 1 (the search begins at the first character).
  • occurrence (optional): This is a positive integer that indicates which occurrence of the pattern within the source_string the function should search for. The default value is 1, which means the function finds the first occurrence. If the value is greater than 1, then the function looks for the second occurrence (or further occurrences) after the first occurrence is found.
  • return_option (optional): This lets you specify what happens when an occurrence is found. If you specify 0, which is the default, the function returns the position of the first character of the occurrence. If you specify 1, then the function returns the position of the character after the occurrence.
  • match_parameter (optional): This allows you to change the default matching behaviour of the function, which can be one or more of:
    • “i”: case-insensitive matching
    • “c”: case-sensitive matching
    • “n”: allows the “.” character to match the newline character (by default, “.” does not match a newline)
    • “m”: treats the source_string value as multiple lines, where ^ is the start of a line and $ is the end of a line.
  • sub_expression (optional): If the pattern has subexpressions, this value indicates which subexpression is used in the function. The default value is 0.
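
The effect of the optional parameters is easiest to see on a literal. The following sketch uses illustrative strings chosen for this example; they do not come from the original text:

```sql
-- Defaults: first occurrence of 'ee', searching from position 1
-- Result: 3
SELECT REGEXP_INSTR('Freedom', 'ee') FROM dual;

-- return_option = 1: position of the character after the match
-- Result: 5
SELECT REGEXP_INSTR('Freedom', 'ee', 1, 1, 1) FROM dual;

-- match_parameter = 'i': the uppercase pattern still matches lowercase text
-- Result: 3
SELECT REGEXP_INSTR('Freedom', 'EE', 1, 1, 0, 'i') FROM dual;

-- sub_expression = 2: position of the second parenthesized group (the digits)
-- Result: 4
SELECT REGEXP_INSTR('abc123', '([a-z]+)([0-9]+)', 1, 1, 0, 'c', 2) FROM dual;
```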

Oracle REGEXP_LIKE Examples

Let’s take a look at some examples of the REGEXP_LIKE function:

Example 1

This example uses just a source and pattern value.

TITLE
Waterfall
Wishful
Wellness

It returns these three values because they start with W and have one or more characters following them.

Example 2

This example uses another source and pattern value. It looks for values where there are at least two consecutive “e” characters.

TITLE
Tree
Freedom

These two values have the “ee” characters within them.

Example 3


This example looks for values that have an uppercase letter “C” in them.

TITLE
Chair
Crypt
QUICKLY

This example shows it doesn’t matter if the “C” is at the start or in the middle of the string.
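
The queries for Examples 1 to 3 are not shown in the text. The following is one possible reconstruction, not the author's original: the inline sample_titles data set and the three patterns are assumptions chosen to fit the descriptions.

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Waterfall' AS title FROM dual UNION ALL
  SELECT 'Tree'      FROM dual UNION ALL
  SELECT 'Freedom'   FROM dual UNION ALL
  SELECT 'Chair'     FROM dual UNION ALL
  SELECT 'QUICKLY'   FROM dual
)
SELECT title,
       -- Example 1: starts with W, followed by one or more characters
       CASE WHEN REGEXP_LIKE(title, '^W.+')  THEN 'Y' ELSE 'N' END AS starts_with_w,
       -- Example 2: at least two consecutive "e" characters
       CASE WHEN REGEXP_LIKE(title, 'e{2,}') THEN 'Y' ELSE 'N' END AS double_e,
       -- Example 3: an uppercase "C" anywhere in the value
       CASE WHEN REGEXP_LIKE(title, 'C')     THEN 'Y' ELSE 'N' END AS has_c
FROM sample_titles;
```

To get output in the shape shown in the examples, a single condition can be moved into a WHERE clause, for instance WHERE REGEXP_LIKE(title, '^W.+').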

Example 4

This example looks for values that have either a “V” or a “v” in it. The “i” denotes that the search is case-insensitive.

TITLE
Vacuum
November
TraVERse
Undercover
Vain

It shows results that have either a “V” or a “v”, at any point in the string.

Example 5

This example shows results that have an uppercase “V”, but not those that have only a lowercase “v”.

TITLE
Vacuum
TraVERse
Vain
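
The case-insensitive and case-sensitive searches for the letter V described above could be written along these lines. The queries are not given in the text, so the data set name and the patterns below are assumptions:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Vacuum' AS title FROM dual UNION ALL
  SELECT 'November'   FROM dual UNION ALL
  SELECT 'TraVERse'   FROM dual UNION ALL
  SELECT 'Undercover' FROM dual UNION ALL
  SELECT 'Chair'      FROM dual
)
SELECT title,
       -- 'i': matches both "V" and "v"
       CASE WHEN REGEXP_LIKE(title, 'v', 'i') THEN 'Y' ELSE 'N' END AS any_v,
       -- 'c': matches only an uppercase "V"
       CASE WHEN REGEXP_LIKE(title, 'V', 'c') THEN 'Y' ELSE 'N' END AS upper_v
FROM sample_titles;
```

With this data, November and Undercover are flagged only in the any_v column, and Chair is flagged in neither.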

Example 6

This example shows values that have digits inside them.

TITLE
Summer of 69
The year is 2017
1955

It shows values that are only digits, and those that have digits inside them.

Example 7

This example shows values that have alphabetical characters.

TITLE
Box
Chair
Vacuum
Desk
Round
Under

This also only shows a sample of values.
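
Examples 6 and 7 can be sketched with POSIX character classes. The original queries are not shown, so the patterns and the inline data set below are assumptions that fit the descriptions:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Summer of 69' AS title FROM dual UNION ALL
  SELECT '1955' FROM dual UNION ALL
  SELECT 'Box'  FROM dual
)
SELECT title,
       -- Example 6: at least one digit anywhere in the value
       CASE WHEN REGEXP_LIKE(title, '[[:digit:]]') THEN 'Y' ELSE 'N' END AS has_digit,
       -- Example 7: at least one alphabetical character anywhere in the value
       CASE WHEN REGEXP_LIKE(title, '[[:alpha:]]') THEN 'Y' ELSE 'N' END AS has_alpha
FROM sample_titles;
```

Here 'Summer of 69' is flagged in both columns, '1955' only in has_digit, and 'Box' only in has_alpha.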

Further Reading

O’Reilly’s book Oracle Regular Expressions Pocket Reference is a very handy 64-page volume that tells you everything you need to know about regular expressions in Oracle Database 10g. Despite the book’s cover, it actually contains both a tutorial and a reference. Since Oracle’s regular expression support is fairly limited, this small book is all you need to successfully use regular expressions with Oracle.

Oracle REGEXP_INSTR Examples

Let’s take a look at some examples of the Oracle REGEXP_INSTR function.

Example 1

This example finds the position of the “ee” within a string.

TITLE REG
Tree 3
Freedom 3

Example 2

This example finds the position of a string that starts with either A, B, or C, and then has 4 alphabetical characters following it.

TITLE REG
Chair 1
Crypt 1
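
The queries behind these two examples are not shown. One possible reconstruction follows; the inline data set and the patterns are assumptions, and [ABC] is written as an explicit list rather than a range so that the result does not depend on collation settings:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Tree' AS title FROM dual UNION ALL
  SELECT 'Freedom' FROM dual UNION ALL
  SELECT 'Chair'   FROM dual UNION ALL
  SELECT 'Crypt'   FROM dual
)
SELECT title,
       -- Position of "ee": 3 for Tree and Freedom, 0 where there is no match
       REGEXP_INSTR(title, 'ee') AS double_e_pos,
       -- A, B, or C followed by four letters: 1 for Chair and Crypt, 0 otherwise
       REGEXP_INSTR(title, '[ABC][[:alpha:]]{4}') AS abc_word_pos
FROM sample_titles;
```

REGEXP_INSTR returns 0 when the pattern is not found, so the tables above presumably list only the rows where the result is greater than 0.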

Example 3

This example finds the position of strings that have two vowels in a row.

TITLE REG
Chair 3
Vacuum 4
Round 2
Superficial 9
Suspicious 7
Tree 3
breakfast 3
Freedom 3
Helium 4
Laundromat 2
Exclaim 5
Vain 2
The year is 2017 6

As you can see, the REGEXP_INSTR value is different for each row depending on where the two vowels start.

Example 4

This example shows the position of values where there are two vowels in a row, with the search starting at position 4.

TITLE REG
Vacuum 4
Superficial 9
Suspicious 7
Helium 4
Exclaim 5
The year is 2017 6
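
The difference between Examples 3 and 4 comes down to the third argument, position. A sketch with an assumed pattern of [aeiou]{2} and a small inline data set:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Chair' AS title FROM dual UNION ALL
  SELECT 'Vacuum'  FROM dual UNION ALL
  SELECT 'Exclaim' FROM dual
)
SELECT title,
       -- Search from the start of the string
       REGEXP_INSTR(title, '[aeiou]{2}')    AS from_start,
       -- Search from position 4 onward; the result is still counted from position 1
       REGEXP_INSTR(title, '[aeiou]{2}', 4) AS from_pos_4
FROM sample_titles;
-- Chair: 3 and 0; Vacuum: 4 and 4; Exclaim: 5 and 5
```

Chair drops out of the second result because its only vowel pair starts at position 3, before the search begins.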

Example 5

This example shows the position of the second vowel found when the search starts at position 5.

TITLE REG
Superficial 9
Suspicious 7
Designate 9
Hawkeye 7
Laundromat 9
Mathematical 7
Exclaim 6
Undercover 9
Xylophone 9
Zucchini 8
Summer of 69 8
The year is 2017 7

Example 6

This example also finds the second vowel when the search starts at position 5, but returns the position of the character after the match instead of the position of the match itself.

TITLE REG
Superficial 10
Suspicious 8
Designate 10
Hawkeye 8
Laundromat 10
Mathematical 8
Exclaim 7
Undercover 10
Xylophone 10
Zucchini 9
Summer of 69 9
The year is 2017 8
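
Examples 5 and 6 differ only in the return_option argument. A sketch with an assumed pattern of [aeiou], position 5, and occurrence 2:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Superficial' AS title FROM dual UNION ALL
  SELECT 'Hawkeye' FROM dual UNION ALL
  SELECT 'Exclaim' FROM dual
)
SELECT title,
       -- return_option omitted (0): position of the matched vowel
       REGEXP_INSTR(title, '[aeiou]', 5, 2)    AS second_vowel_pos,
       -- return_option 1: position of the character after the matched vowel
       REGEXP_INSTR(title, '[aeiou]', 5, 2, 1) AS pos_after_match
FROM sample_titles;
-- Superficial: 9 and 10; Hawkeye: 7 and 8; Exclaim: 6 and 7
```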

Example 7

This example shows the position of values that have an A, B, C, D, or E, followed by a vowel, using a case-insensitive search.

TITLE REG
Box 1
Chair 3
Vacuum 3
Desk 1
Under 3
Dismiss 1
Superficial 8
Suspicious 6
Tree 3
breakfast 3
Designate 1

Only some of the values are shown here. “breakfast” is shown because the search is case-insensitive, so it doesn’t matter that it has lowercase values.

Example 8

This example shows values that have an A, B, or C, followed by a vowel, and finds the position of the vowel.

TITLE REG
Box 2
Chair 4
Vacuum 4
Superficial 9
Suspicious 7
November 7
October 6
Laundromat 3
Mathematical 11
Exclaim 6
Undercover 7
Vain 3
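
Examples 7 and 8 can be sketched with the match_parameter and sub_expression arguments. The patterns below are assumptions that reproduce the positions listed above for these values; the original queries are not shown:

```sql
-- sample_titles is a hypothetical inline data set built from values shown above
WITH sample_titles AS (
  SELECT 'Box' AS title FROM dual UNION ALL
  SELECT 'Chair'   FROM dual UNION ALL
  SELECT 'Vacuum'  FROM dual UNION ALL
  SELECT 'October' FROM dual
)
SELECT title,
       -- Example 7: A to E followed by a vowel, case-insensitive; start of the match
       REGEXP_INSTR(title, '[abcde][aeiou]', 1, 1, 0, 'i')    AS match_pos,
       -- Example 8: A to C followed by a vowel; sub_expression 1 is the vowel itself
       REGEXP_INSTR(title, '[abc]([aeiou])', 1, 1, 0, 'i', 1) AS vowel_pos
FROM sample_titles;
-- Box: 1 and 2; Chair: 3 and 4; Vacuum: 3 and 4; October: 5 and 6
```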
