- Overview
- Brief introduction
- Examples
- Printing every part of a string that matches an expression
- Replacing matched strings
- Links
1 Overview
boost::regex is a library that lets you use regular expressions (RegularExpression) inside a program. There are in fact quite a few libraries that are easier to use and more compact than regex. I chose regex anyway because it is fairly feature-rich and, since it ships with the boost libraries, I figured installation would be comparatively painless.
2 Brief introduction
This section is a translation of the documentation that comes with the library.
Regular expressions are a form of pattern matching that is often used in text processing. Many users will be familiar with utilities such as grep, sed and awk, or with languages such as perl, all of which use regular expressions. Traditionally, C++ users have relied on the POSIX C API (the regXXX family) to handle regular expressions. Although the regex library provides the same functionality as these APIs, using it the way one uses the old API does not take full advantage of the library. For example, the regex library can handle wide-character (Unicode) strings and also provides search and replace. These features are absent from the traditional C libraries.
boost::basic_regex is the key class of the library. It represents a "machine readable" regular expression and is designed to look very much like std::string: think of it as a std::string plus the state information needed by the regular expression algorithms. As with std::string, there are two typedefs.
namespace boost{
    template <class charT,
              class traits = regex_traits<charT>,
              class Allocator = std::allocator<charT> >
    class basic_regex;
    typedef basic_regex<char> regex;
    typedef basic_regex<wchar_t> wregex;
}
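As a rough illustration of the two typedefs, the sketch below applies the same pattern to a narrow string and to a wide string. It is written against std::regex and std::wregex from the C++11 <regex> header, whose interface was modeled on Boost.Regex, so that it compiles without a Boost build. That substitution is an assumption of this sketch, and the helper names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the narrow string is exactly four decimal digits.
bool is_four_digits(const std::string& s)
{
    static const std::regex e("\\d{4}");    // corresponds to basic_regex<char>
    return std::regex_match(s, e);
}

// Hypothetical helper: the same check for a wide string.
bool is_four_digits_w(const std::wstring& s)
{
    static const std::wregex e(L"\\d{4}");  // corresponds to basic_regex<wchar_t>
    return std::regex_match(s, e);
}
```

The only differences between the two helpers are the character type of the pattern literal and the typedef used, which is the same relationship as between std::string and std::wstring.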
To see how the library is used, imagine we are writing an application that processes credit cards. A credit card number consists of four groups of four decimal digits, that is, 16 digits in total, with a ' ' or '-' character between the groups. Suppose we want to check that a card number has the right format before storing it in a database. To test for a decimal digit we could use the traditional [0-9], but such a range is not actually locale-independent. Instead, let us use the POSIX standard form [[:digit:]] or its perl shorthand '\d'. (Note that older libraries are likely to have \d hard-coded regardless of locale.) With this, a credit card number can be expressed as follows.
(\d{4}[- ]){3}\d{4}
The parentheses form a group, and {4} means "repeat exactly 4 times". This is an example of the extended regular expression syntax used by perl, awk and egrep. The regex library also supports the older "basic" syntax used by sed and grep, but it is usually less practical. Of course, if you already have old regular expressions in use, you can keep using the "basic" syntax.
Turned into C++ code that validates a credit card number, the expression above looks like this.
bool validate_card_format(const std::string& s)
{
    static const boost::regex e("(\\d{4}[- ]){3}\\d{4}");
    return regex_match(s, e);
}
Anyone who knows C++ syntax will be used to doubling each '\' character in C++ source. Also, this example and all the others assume that your compiler supports Koenig lookup (argument-dependent lookup, which finds a function in the namespaces of its arguments). If your compiler does not support it (VC6, for example), you will need to add the boost:: prefix to some of the function calls.
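The following sketch shows the [[:digit:]] spelling mentioned earlier doing the same job as \d. It is written against std::regex from the C++11 <regex> header (modeled on Boost.Regex) so that it compiles without a Boost build. That substitution and the function name are assumptions of this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical variant of the validator that spells the digit class
// as [[:digit:]] instead of \d.
bool validate_card_format_posix(const std::string& s)
{
    static const std::regex e("([[:digit:]]{4}[- ]){3}[[:digit:]]{4}");
    // regex_match succeeds only if the whole string matches the expression.
    return std::regex_match(s, e);
}
```

With this definition, "1234-5678-9012-3456" and "1234 5678 9012 3456" are accepted, while "1234567890123456" (no separators), "1234-5678-9012-345" (one digit short) and "x1234-5678-9012-3456" (extra leading character) are rejected.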
Anyone familiar with credit card processing will also realize that the format above (separated by ' ' or '-'), while easy for people to read, does not match the format used by most online card processing systems, which expect the number without the ' ' or '-' characters in between. So we need to be able to convert between the format with separators and the format without. This calls for search and replace, as those familiar with sed or perl will already have guessed. In the regex library it is done with the regex_replace algorithm. For the credit card example, we can write the following expressions to convert between the two formats.
// match any format with the regular expression:
const boost::regex e("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z");
const std::string machine_format("\\1\\2\\3\\4");
const std::string human_format("\\1-\\2-\\3-\\4");
std::string machine_readable_card_number(const std::string& s)
{
    return regex_replace(s, e, machine_format, boost::match_default | boost::format_sed);
}
std::string human_readable_card_number(const std::string& s)
{
    return regex_replace(s, e, human_format, boost::match_default | boost::format_sed);
}
Sub-expressions are used to split the card number into its separate groups. The format strings use sed-style syntax, so that each matched part is substituted into the reformatted string.
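To make the conversion concrete, here is a minimal sketch of the same idea written against std::regex from the C++11 <regex> header (modeled on Boost.Regex) so that it compiles without a Boost build. Two things are assumptions of this sketch and not part of the example above: the anchors are written as ^ and $ because the default std::regex grammar has no \A or \z, and the function names to_machine and to_human are hypothetical.

```cpp
#include <regex>
#include <string>

// Shared helper: rewrite a card number using a sed-style format string,
// where \1 .. \4 refer to the four captured digit groups.
static std::string reformat_card(const std::string& s, const std::string& fmt)
{
    static const std::regex e(
        "^(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})$");
    return std::regex_replace(s, e, fmt, std::regex_constants::format_sed);
}

// Strip the separators: "1234-5678-9012-3456" becomes "1234567890123456".
std::string to_machine(const std::string& s)
{
    return reformat_card(s, "\\1\\2\\3\\4");
}

// Insert '-' between the groups, whatever separators the input used.
std::string to_human(const std::string& s)
{
    return reformat_card(s, "\\1-\\2-\\3-\\4");
}
```

Input that does not match the expression at all is returned unchanged, because regex_replace copies unmatched text to the output by default.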
The examples above did not handle the result of a regular expression match directly. In general, however, the result contains not only the overall match but also the sub-expression matches. The regex library uses the following classes to return these results.
namespace boost{
    typedef match_results<const char*> cmatch;
    typedef match_results<const wchar_t*> wcmatch;
    typedef match_results<std::string::const_iterator> smatch;
    typedef match_results<std::wstring::const_iterator> wsmatch;
}
The regex_search and regex_match algorithms report their results through the match_results class. regex_match tells you whether the entire given string matches the expression, whereas regex_search looks for a part of the given string that matches the expression.
Note that these algorithms are not restricted to ordinary C strings: they can be applied to any data type that provides bidirectional iterators.
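The difference between the two algorithms, and the way sub-matches are read from a match_results object, can be sketched as follows. The code is written against std::regex from the C++11 <regex> header (modeled on Boost.Regex) so that it compiles without a Boost build. That substitution and all helper names are assumptions of this sketch.

```cpp
#include <regex>
#include <string>

// Pattern with four capturing groups, one per block of digits.
static const std::regex& card_pattern()
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    return e;
}

// Hypothetical helper: true only if the whole text is a card number.
bool is_exactly_card(const std::string& text)
{
    return std::regex_match(text, card_pattern());
}

// Hypothetical helper: first card-like substring of text, or "" if none.
std::string find_first_card(const std::string& text)
{
    std::smatch m;  // match_results<std::string::const_iterator>
    if (std::regex_search(text, m, card_pattern()))
        return m[0].str();  // m[0] is the whole match
    return std::string();
}

// Hypothetical helper: last block of digits of the first card, or "".
std::string last_block(const std::string& text)
{
    std::smatch m;
    if (std::regex_search(text, m, card_pattern()))
        return m[4].str();  // m[1] .. m[4] are the parenthesized groups
    return std::string();
}
```

For the text "card: 1234-5678-9012-3456, thanks", is_exactly_card is false because of the surrounding words, while find_first_card still locates the number inside it.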
For search and replace, besides the regex_replace algorithm we have already seen, the match_results class also provides a format member function, which takes a format string and returns the result of merging the match into it.
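As a small illustration of formatting a single match, the sketch below masks all but the last block of a card number. It is written against std::regex from the C++11 <regex> header (modeled on Boost.Regex), whose match_results also has a format member. That substitution, the default $n format syntax used here, and the name mask_card are assumptions of this sketch.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: find the first card number in text and return it
// with the first three blocks masked, or "" if no card number is found.
std::string mask_card(const std::string& text)
{
    static const std::regex e("(\\d{4})[- ](\\d{4})[- ](\\d{4})[- ](\\d{4})");
    std::smatch m;
    if (!std::regex_search(text, m, e))
        return std::string();
    // $4 is replaced by the fourth group; everything else is copied literally.
    return m.format("****-****-****-$4");
}
```

Unlike regex_replace, format produces only the formatted match: the text surrounding the match is not part of the result.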
To iterate over all occurrences of an expression within a text, there are two iterator types: regex_iterator enumerates the match_results objects found, while regex_token_iterator enumerates a series of strings (similar to perl-style split operations).
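The two iterator types can be sketched as follows. The code is written against std::regex from the C++11 <regex> header (modeled on Boost.Regex) so that it compiles without a Boost build. That substitution and the helper names all_matches and split are assumptions of this sketch.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collect every match of e in text using regex_iterator.
std::vector<std::string> all_matches(const std::string& text, const std::regex& e)
{
    std::vector<std::string> out;
    // A default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(text.begin(), text.end(), e), last; it != last; ++it)
        out.push_back(it->str());  // *it is a match_results object
    return out;
}

// Split text on matches of sep using regex_token_iterator.
// Sub-match index -1 selects the pieces between the matches.
std::vector<std::string> split(const std::string& text, const std::regex& sep)
{
    std::vector<std::string> out;
    for (std::sregex_token_iterator it(text.begin(), text.end(), sep, -1), last;
         it != last; ++it)
        out.push_back(it->str());  // *it is a single sub_match
    return out;
}
```

For example, all_matches("a1 b22 c333", std::regex("\\d+")) collects the three numbers, and split("a, b,c", std::regex(",\\s*")) yields the three letters.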
For compatibility, the POSIX API functions regcomp, regexec, regfree and regerror are also available, in both narrow-character (1-byte) and wide-character versions.
3 Examples
To use the BoostRegex library you have to build it first. See the BoostBuild page for the build procedure. The library has many features, but basic search and replace should be enough for most tasks...
3.1 Printing every part of a string that matches an expression
This example finds and prints the WikiNames in a given string.
#include <boost/regex.hpp>
#include <exception>
#include <iostream>
#include <string>
// MSVC only: link against the Boost.Regex library built for VC 7.1.
#pragma comment(lib, "libboost_regex-vc71-mt-sgd-1_31.lib")
int main()
{
    try
    {
        const boost::regex e("([A-Z][a-z]+[A-Z][a-z]+|\\[.*\\])");
        // const, so that text.begin() yields a const_iterator like m[0].first
        const std::string text =
            "TestCase is WikiName. but TESTCase is not WikiName! Is [this] WikiName?";
        boost::match_results<std::string::const_iterator> m;
        std::string::const_iterator start = text.begin();
        std::string::const_iterator end = text.end();
        while (boost::regex_search(start, end, m, e))
        {
            // m[0] is the matched substring,
            // m[0].first is the position where the match starts and
            // m[0].second is the position where it ends.
            // m[n] is the n-th sub-expression (parenthesized expression).
            // The only group in this pattern wraps the whole expression,
            // so m[1] equals m[0] and only [0] is needed here.
            std::cout << m[0] << " = "
                      << m[0].first - text.begin() << "~"
                      << m[0].second - text.begin() << std::endl;
            // continue searching right after the current match
            start = m[0].second;
        }
    }
    catch (std::exception& e)
    {
        std::cerr << e.what() << std::endl;
    }
    return 0;
}
3.2 Replacing matched strings
This example wraps every wiki name in two pairs of braces. (The original goal was three, but that is awkward to express in this wiki.)
#include <boost/regex.hpp>
#include <exception>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
// MSVC only: link against the Boost.Regex library built for VC 7.1.
#pragma comment(lib, "libboost_regex-vc71-mt-sgd-1_31.lib")
int main()
{
    try
    {
        const boost::regex e("([A-Z][a-z]+[A-Z][a-z]+|\\[.*\\])");
        const std::string text =
            "TestCase is WikiName. but TESTCase is not WikiName! Is [this] WikiName?";
        std::string::const_iterator start = text.begin();
        std::string::const_iterator end = text.end();
        std::stringstream result;
        std::ostream_iterator<char, char> oi(result);
        // $0 stands for the entire part matched by the regular expression.
        // See the perl syntax for details.
        boost::regex_replace(oi, start, end,
            e, "{{$0}}", boost::match_default | boost::format_all);
        std::cout << result.str() << std::endl;
        // This works too.
        // std::cout << boost::regex_replace(text,
        //     e, "{{$0}}", boost::match_default | boost::format_all);
    }
    catch (std::exception& e)
    {
        std::cerr << "Exception: " << e.what() << std::endl;
    }
    return 0;
}
4 Links