[Leetcode] Regular Expression Matching
Published: 2019-06-19


Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "a*") → true
isMatch("aa", ".*") → true
isMatch("ab", ".*") → true
isMatch("aab", "c*a*b") → true

It seems that some readers are confused about why the regex pattern ".*" matches the string "ab". ".*" means repeat the preceding element 0 or more times. Here, the "preceding" element is the dot character in the pattern, which can match any character. Therefore, the regex pattern ".*" allows the dot to be repeated any number of times, which matches any string (even an empty string).

Think carefully about how you would match '*'. Please note that '*' in a regular expression is different from wildcard matching, as we match the previous character 0 or more times. But how many times? If you are stuck, recursion is your friend.
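The claim that ".*" matches "ab" can be cross-checked against an existing engine. The sketch below is not part of the original post: it assumes the C++ standard library's std::regex (ECMAScript grammar by default), whose std::regex_match requires the whole string to match, just as the problem statement does. The helper name fullMatch is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: full-string match using the standard library engine.
// std::regex_match succeeds only if the entire string matches the pattern.
bool fullMatch(const std::string& s, const std::string& p) {
    return std::regex_match(s, std::regex(p));
}
```

Calling fullMatch("ab", ".*") and fullMatch("aab", "c*a*b") should both report a match, while fullMatch("aa", "a") should not, which agrees with the examples listed in the problem statement.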

 

[cpp] 
 
    bool isMatch(const char *s, const char *p) {
        // Empty pattern matches only the empty string.
        if (*p == 0) return *s == 0;
        if (*(p+1) != '*')
        {
            // Next pattern char is not '*': the current characters must match.
            if (*s != 0 && (*p == *s || *p == '.')) return isMatch(s+1, p+1);
            else return false;
        }
        else
        {
            // Next pattern char is '*': try 0, 1, 2, ... repeats of *p.
            while (*s != 0 && (*s == *p || *p == '.'))
            {
                if (isMatch(s, p+2)) return true;
                s++;
            }
            // No more characters can be absorbed by "x*"; match the rest.
            return (isMatch(s, p+2));
        }
    }

 

 

[cpp] 
 
    #include <cassert>

    bool isMatch(const char *s, const char *p) {
        assert(s && p);
        if (*p == '\0') return *s == '\0';

        // next char is not '*': must match current character
        if (*(p+1) != '*') {
            assert(*p != '*');
            return ((*p == *s) || (*p == '.' && *s != '\0')) && isMatch(s+1, p+1);
        }

        // next char is '*'
        while ((*p == *s) || (*p == '.' && *s != '\0')) {
            if (isMatch(s, p+2)) return true;
            s++;
        }
        return isMatch(s, p+2);
    }

This problem is a tricky one. Due to the huge number of edge cases, many people write lengthy code and have numerous bugs on their first try. Try your best to get your code correct first, then refactor mercilessly until it is as clean and concise as possible!

Figure: a sample diagram of a deterministic finite automaton (DFA). DFAs are useful for lexical analysis and pattern matching; an example is UNIX's grep tool. Please note that this post does not attempt to describe a solution using a DFA.

 

Solution:

This looks just like straightforward string matching, doesn't it? Couldn't we just match the pattern and the input string character by character? The question is, how do we match a '*'?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many times as we can. Does this work? Let us look at some examples.

s = "abbbc"
p = "ab*c"
Assume we have matched the first 'a' in both s and p. When we see "b*" in p, we skip all b's in s. Since the last 'c' matches on both sides, s and p match.

s = "ac"
p = "ab*c"
After the first 'a', we see that there are no b's to skip for "b*". We match the last 'c' on both sides and conclude that s and p match.

It seems that being greedy is good. But how about this case?

s = "abbc"
p = "ab*bbc"
When we see "b*" in p, we would have skipped all b's in s. They should match, but there are no more b's left in s to match the "bb" that follows in p. Therefore, the greedy approach fails in this case.
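The three examples above can be reproduced with a small greedy matcher. This is a minimal sketch written for illustration, not code from the original post; greedyMatch is a hypothetical name. For every "x*" it consumes all matching characters and never gives any back.

```cpp
#include <cassert>

// Hypothetical greedy matcher with no backtracking (illustration only).
bool greedyMatch(const char* s, const char* p) {
    while (*p != '\0') {
        if (*(p + 1) == '*') {
            // "x*": swallow every character that x can match.
            while (*s != '\0' && (*s == *p || *p == '.')) ++s;
            p += 2;
        } else {
            // Plain element: must match exactly one character of s.
            if (*s == '\0' || (*s != *p && *p != '.')) return false;
            ++s;
            ++p;
        }
    }
    // Full match only if the input is exhausted as well.
    return *s == '\0';
}
```

With this sketch, greedyMatch("abbbc", "ab*c") and greedyMatch("ac", "ab*c") succeed, but greedyMatch("abbc", "ab*bbc") returns false even though the strings should match, because "b*" has already consumed both b's.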

One might be tempted to think of a quick workaround. How about counting the number of consecutive b's in s? If it is greater than or equal to the number of consecutive b's after "b*" in p, we conclude that they match and continue from there. Otherwise, we conclude that there is no match.

This seems to solve the above problem, but how about this case?

s = "abcbcd"
p = "a.*c.*d"

Here, ".*" in p means repeat '.' 0 or more times. Since '.' can match any character, it is not clear how many times '.' should be repeated. Should the 'c' in p match the first or second 'c' in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.

We need some kind of backtracking mechanism such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with '*'. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases:

 

  1. If the next character of p is NOT '*', then it must match the current character of s. Continue pattern matching with the next character of both s and p.
  2. If the next character of p is '*', then we do a brute force exhaustive matching of 0, 1, or more repeats of the current character of p, until we cannot match any more characters.

 

You would need to consider the base case carefully too. That would be left as an exercise to the reader. :)

 

 

Further Thoughts:

Some extra exercises to this problem:

  1. If you think carefully, you can find inputs on which the above code runs in exponential time. Could you think of some examples? How would you make the above code more efficient?
  2. Try to implement partial matching instead of full matching. In addition, add '^' and '$' to the rule. '^' matches the starting position within the string, while '$' matches the ending position of the string.
  3. Try to implement wildcard matching where '*' means any sequence of zero or more characters.
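For the first exercise, one possible direction (an assumption on my part, not something the original post prescribes) is to memoize the recursion. A pattern such as "a*a*a*a*c" against a long run of a's followed by 'b' makes the plain recursion revisit the same pair of suffixes of s and p many times; caching the result per pair bounds the work by the number of pairs. The names isMatchMemo and matchFrom are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// memo[i][j]: -1 = not computed yet, 0 = no match, 1 = match,
// for the suffixes s[i..] and p[j..].
static bool matchFrom(const std::string& s, const std::string& p,
                      std::size_t i, std::size_t j,
                      std::vector<std::vector<int>>& memo) {
    if (memo[i][j] != -1) return memo[i][j] == 1;
    bool result;
    if (j == p.size()) {
        // Pattern exhausted: match only if the input is exhausted too.
        result = (i == s.size());
    } else {
        bool first = i < s.size() && (p[j] == s[i] || p[j] == '.');
        if (j + 1 < p.size() && p[j + 1] == '*') {
            // Zero repeats of p[j], or consume one character and stay on "x*".
            result = matchFrom(s, p, i, j + 2, memo) ||
                     (first && matchFrom(s, p, i + 1, j, memo));
        } else {
            // No '*' follows: current characters must match.
            result = first && matchFrom(s, p, i + 1, j + 1, memo);
        }
    }
    memo[i][j] = result ? 1 : 0;
    return result;
}

bool isMatchMemo(const std::string& s, const std::string& p) {
    std::vector<std::vector<int>> memo(
        s.size() + 1, std::vector<int>(p.size() + 1, -1));
    return matchFrom(s, p, 0, 0, memo);
}
```

Each (i, j) pair is computed at most once, so the running time of this sketch is proportional to the product of the two lengths. It takes std::string arguments instead of the const char* prototype used above purely to keep the indexing simple.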

For the interested reader, real-world regular expression matchers (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read.

 

 

ref: 

    class Solution {
    public:
        bool isMatch(const char *s, const char *p) {
            if (s == NULL || p == NULL) return false;
            if (*p == '\0') return *s == '\0';
            if (*(p + 1) == '*') {
                while ((*s != '\0' && *p == '.') || *s == *p) {
                    if (isMatch(s, p + 2)) return true;
                    ++s;
                }
                return isMatch(s, p + 2);
            } else if ((*s != '\0' && *p == '.') || *s == *p){
                return isMatch(s + 1, p + 1);
            }
            return false;
        }
    };

 

Reposted from: http://iipix.baihongyu.com/
