REGEXP_SUBSTR函数的整理


有网友提出对REGEXP_SUBSTR函数中参数有些疑问,网上搜了一些,对该参数的解释和测试确实有点少,自己测试了一下,整理如下:

注:regexp_replace的与该函数语法类似

REGEXP_SUBSTR函数语法解释:

Syntax

Purpose

REGEXP_SUBSTR extends the functionality of the SUBSTR function by letting you search a string for a regular expression pattern. It is also similar toREGEXP_INSTR, but instead of returning the position of the substring, it returns the substring itself. This function is useful if you need the contents of a match string but not its position in the source string. The function returns the string as VARCHAR2 or CLOB data in the same character set assource_char.

This function complies with the POSIX regular expression standard and the Unicode Regular Expression Guidelines. For more information, refer toAppendix D, "Oracle Regular Expression Support".

• source_char is a character expression that serves as the search value. It is commonly a character column and can be of any of the data typesCHAR, VARCHAR2, NCHAR, NVARCHAR2, CLOB, or NCLOB.


• pattern is the regular expression. It is usually a text literal and can be of any of the data types CHAR, VARCHAR2, NCHAR, or NVARCHAR2. It can contain up to 512 bytes. If the data type of pattern is different from the data type of source_char, then Oracle Database converts pattern to the data type of source_char. For a listing of the operators you can specify in pattern, refer to Appendix D, "Oracle Regular Expression Support".


• position is a positive integer indicating the character of source_char where Oracle should begin the search. The default is 1, meaning that Oracle begins the search at the first character of source_char.


• occurrence is a positive integer indicating which occurrence of pattern in source_char Oracle should search for. The default is 1, meaning that Oracle searches for the first occurrence of pattern.

If occurrence is greater than 1, then the database searches for the second occurrence beginning with the first character following the first occurrence of pattern, and so forth. This behavior is different from the SUBSTR function, which begins its search for the second occurrence at the second character of the first occurrence.


• match_parameter is a text literal that lets you change the default matching behavior of the function. The behavior of this parameter is the same for this function as for REGEXP_COUNT. Refer to REGEXP_COUNT for detailed information.


• For a pattern with subexpressions, subexpr is a nonnegative integer from 0 to 9 indicating which subexpression in pattern is to be returned by the function. This parameter has the same semantics that it has for the REGEXP_INSTR function. Refer to REGEXP_INSTR for more information.


网上大部分给出的解释如下:

function REGEXP_SUBSTR(String, pattern, position, occurrence, modifier)

__srcstr    :需要进行正则处理的字符串

__pattern    :进行匹配的正则表达式

__position  :起始位置,从第几个字符开始正则表达式匹配(默认为1)

__occurrence :标识第几个匹配组,默认为1

__modifier  :模式('i'不区分大小写进行检索;'c'区分大小写进行检索。默认为'c'

 

 


其中主要的问题有两点

1、occurrence表示第几个匹配组,这里所谓的匹配组指的不是正则表达式中的匹配组,是字符串中第几次出现的匹配结果,也许是上面的解释是教老版本中的函数,给出例子看一下:

SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 1, 'i')
FROM DUAL;
结果:12345678  该结果为第一个1234567890中的12345678

SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 2, 'i')
FROM DUAL;
结果:12345678  该结果是第二个1234567890中的12345678  也就是说第二个匹配到的结果

SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 3, 'i')
FROM DUAL;
结果:NULL 


2、关于第6个参数pattern,该参数的作用更倾向于网上大多数多occurrence的解释,用来表示匹配第几个正则表达式中的分组
 
比如(123)(4(56)(78))
第一组是(123)
第二组是(4(56)(78))
第三组是(56)
第四组是(78)

SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 1, 'i',1) FROM DUAL;
123
SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 1, 'i',2) FROM DUAL;
45678
SELECT REGEXP_SUBSTR('12345678901234567890', '(123)(4(56)(78))', 1, 1, 'i',3) FROM DUAL;
56

相关内容