Showing posts with label Reular Expression. Show all posts

Friday, 23 August 2013

Validate email id with regular expression

// siddhu vydyabhushana // 6 comments
Email Regular Expression Pattern
^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*
      @[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$
(Backslashes are doubled because the pattern is shown as it appears in a Java string literal.)
Description
^			# start of the line
  [_A-Za-z0-9-\\+]+	#   one or more (+) of the characters in the brackets [ ]
  (			#   start of group #1
    \\.[_A-Za-z0-9-]+	#     a dot "." followed by one or more (+) of the characters in the brackets [ ]
  )*			#   end of group #1, which may occur zero or more times (*)
    @			#     a mandatory "@" symbol
     [A-Za-z0-9-]+      #       followed by one or more (+) of the characters in the brackets [ ]
      (			#         start of group #2 - additional domain labels
       \\.[A-Za-z0-9]+  #           a dot "." followed by one or more (+) of the characters in the brackets [ ]
      )*		#         end of group #2, which may occur zero or more times (*)
      (			#         start of group #3 - final TLD check
       \\.[A-Za-z]{2,}  #           a dot "." followed by at least 2 letters
      )			#         end of group #3
$			# end of the line
In short, the local part of the address must start with one or more characters from “_A-Za-z0-9-+”, optionally followed by any number of “.[_A-Za-z0-9-]+” groups, and is then followed by an “@” symbol. The domain name must start with one or more characters from “A-Za-z0-9-”, optionally followed by further dot-separated labels “\\.[A-Za-z0-9]+” (the “.com” in “.com.au”), and must end with a TLD “\\.[A-Za-z]{2,}” (.com, .net, .au), which is a dot “.” followed by at least 2 letters.

1. Java Regular Expression Example

Here’s a Java example that shows how to use the regex to validate an email address.
EmailValidator.java
package com.mkyong.regex;
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class EmailValidator {
 
	private Pattern pattern;
	private Matcher matcher;
 
	private static final String EMAIL_PATTERN = 
		"^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
		+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$";
 
	public EmailValidator() {
		pattern = Pattern.compile(EMAIL_PATTERN);
	}
 
	/**
	 * Validate an email address with the regular expression
	 * 
	 * @param email
	 *            email address to validate
	 * @return true if the email is valid, false otherwise
	 */
	public boolean validate(final String email) {
 
		matcher = pattern.matcher(email);
		return matcher.matches();
 
	}
}
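
One thing to keep in mind: the class above stores its Matcher in an instance field, and a Matcher is not safe to share between threads (a compiled Pattern is). Below is a minimal sketch of a stateless alternative that uses the same pattern. The class name EmailCheck and the method isValid are made up for illustration and are not part of the original example.

```java
import java.util.regex.Pattern;

// Hypothetical stateless variant; names are illustrative only.
final class EmailCheck {

    // Same pattern as EMAIL_PATTERN above, compiled once and shared.
    private static final Pattern EMAIL = Pattern.compile(
        "^[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
        + "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$");

    static boolean isValid(String email) {
        // A fresh Matcher is created per call, so callers share no mutable state.
        return email != null && EMAIL.matcher(email).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("javatyro+100@gmail.com"));   // true
        System.out.println(isValid("javatyro..2002@gmail.com")); // false
    }
}
```

The behaviour is meant to be the same as validate(); only the handling of the Matcher differs.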

2. Valid Emails

1. javatyro@yahoo.com, javatyro-100@yahoo.com, javatyro.100@yahoo.com
2. javatyro111@javatyro.com, javatyro-100@javatyro.net, javatyro.100@javatyro.com.au
3. javatyro@1.com, javatyro@gmail.com.com
4. javatyro+100@gmail.com, javatyro-100@yahoo-test.com

3. Invalid Emails

1. javatyro – must contain an “@” symbol
2. javatyro@.com.my – the domain can not start with a dot “.”
3. javatyro123@gmail.a – “.a” is not a valid tld, the last tld must contain at least two characters
4. javatyro123@.com – the domain can not start with a dot “.”
5. javatyro123@.com.com – the domain can not start with a dot “.”
6. .javatyro@javatyro.com – the email’s first character can not be a dot “.”
7. javatyro()*@gmail.com – the part before “@” only allows letters, digits, underscore, dash, plus and single dots
8. javatyro@%*.com – the domain only allows letters, digits and dash
9. javatyro..2002@gmail.com – double dots “.” are not allowed
10. javatyro.@gmail.com – the part before “@” can not end with a dot “.”
11. javatyro@javatyro@gmail.com – a double “@” is not allowed
12. javatyro@gmail.com.1a – the last tld can not contain a digit

4. Unit Test

Here’s a unit test using TestNG.
EmailValidatorTest.java

// same package as EmailValidator, so no import is needed
package com.mkyong.regex;

import org.testng.Assert;
import org.testng.annotations.*;

/**
 * Email validator Testing
 * 
 * @author javatyro
 * 
 */
public class EmailValidatorTest {

    private EmailValidator emailValidator;

    @BeforeClass
    public void initData() {
        emailValidator = new EmailValidator();
    }

    @DataProvider
    public Object[][] ValidEmailProvider() {
        return new Object[][] { { new String[] { "javatyro@yahoo.com",
            "javatyro-100@yahoo.com", "javatyro.100@yahoo.com",
            "javatyro111@javatyro.com", "javatyro-100@javatyro.net",
            "javatyro.100@javatyro.com.au", "javatyro@1.com",
            "javatyro@gmail.com.com", "javatyro+100@gmail.com",
            "javatyro-100@yahoo-test.com" } } };
    }

    @DataProvider
    public Object[][] InvalidEmailProvider() {
        return new Object[][] { { new String[] { "javatyro", "javatyro@.com.my",
            "javatyro123@gmail.a", "javatyro123@.com", "javatyro123@.com.com",
            ".javatyro@javatyro.com", "javatyro()*@gmail.com", "javatyro@%*.com",
            "javatyro..2002@gmail.com", "javatyro.@gmail.com",
            "javatyro@javatyro@gmail.com", "javatyro@gmail.com.1a" } } };
    }

    @Test(dataProvider = "ValidEmailProvider")
    public void ValidEmailTest(String[] Email) {

        for (String temp : Email) {
            boolean valid = emailValidator.validate(temp);
            System.out.println("Email is valid : " + temp + " , " + valid);
            Assert.assertTrue(valid);
        }

    }

    @Test(dataProvider = "InvalidEmailProvider", dependsOnMethods = "ValidEmailTest")
    public void InValidEmailTest(String[] Email) {

        for (String temp : Email) {
            boolean valid = emailValidator.validate(temp);
            System.out.println("Email is valid : " + temp + " , " + valid);
            Assert.assertFalse(valid);
        }

    }
}

Here’s the unit test result.

Email is valid : javatyro@yahoo.com , true
Email is valid : javatyro-100@yahoo.com , true
Email is valid : javatyro.100@yahoo.com , true
Email is valid : javatyro111@javatyro.com , true
Email is valid : javatyro-100@javatyro.net , true
Email is valid : javatyro.100@javatyro.com.au , true
Email is valid : javatyro@1.com , true
Email is valid : javatyro@gmail.com.com , true
Email is valid : javatyro+100@gmail.com , true
Email is valid : javatyro-100@yahoo-test.com , true
Email is valid : javatyro , false
Email is valid : javatyro@.com.my , false
Email is valid : javatyro123@gmail.a , false
Email is valid : javatyro123@.com , false
Email is valid : javatyro123@.com.com , false
Email is valid : .javatyro@javatyro.com , false
Email is valid : javatyro()*@gmail.com , false
Email is valid : javatyro@%*.com , false
Email is valid : javatyro..2002@gmail.com , false
Email is valid : javatyro.@gmail.com , false
Email is valid : javatyro@javatyro@gmail.com , false
Email is valid : javatyro@gmail.com.1a , false
PASSED: ValidEmailTest([Ljava.lang.String;@15f48262)
PASSED: InValidEmailTest([Ljava.lang.String;@789934d4)

===============================================
    Default test
    Tests run: 2, Failures: 0, Skips: 0
===============================================

Tuesday, 20 August 2013

Validate regular expression for username in java

// siddhu vydyabhushana // 6 comments
Username Regular Expression Pattern
^[a-z0-9_-]{3,15}$
Description
^                    # Start of the line
  [a-z0-9_-]	     # Match characters and symbols in the list, a-z, 0-9, underscore, hyphen
             {3,15}  # Length at least 3 characters and maximum length of 15 
$                    # End of the line
Taken together, this means 3 to 15 characters, each a lowercase letter, a digit, or one of the special symbols “_” and “-”. This is a common username pattern that’s widely used on different websites.

1. Java Regular Expression Example

UsernameValidator.java
package com.mkyong.regex;
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class UsernameValidator{
 
	  private Pattern pattern;
	  private Matcher matcher;
 
	  private static final String USERNAME_PATTERN = "^[a-z0-9_-]{3,15}$";
 
	  public UsernameValidator(){
		  pattern = Pattern.compile(USERNAME_PATTERN);
	  }
 
	  /**
	   * Validate username with regular expression
	   * @param username username for validation
	   * @return true valid username, false invalid username
	   */
	  public boolean validate(final String username){
 
		  matcher = pattern.matcher(username);
		  return matcher.matches();
 
	  }
}

2. Usernames that match:

1. mkyong34
2. mkyong_2002
3. mkyong-2002
4. mk3-4_yong

3. Usernames that don’t match:

1. mk (too short, min 3 characters)
2. mk@yong (the “@” character is not allowed)
3. mkyong123456789_- (too long, max 15 characters)

4. Unit Test – UsernameValidator

Using TestNG to perform the unit test.
UsernameValidatorTest.java
package com.mkyong.regex;
 
import org.testng.Assert;
import org.testng.annotations.*;
 
/**
 * Username validator Testing
 * @author mkyong
 *
 */
public class UsernameValidatorTest {
 
	private UsernameValidator usernameValidator;
 
	@BeforeClass
	public void initData(){
		usernameValidator = new UsernameValidator();
	}
 
	@DataProvider
	public Object[][] ValidUsernameProvider() {
		return new Object[][]{
		   {new String[] {
	             "mkyong34", "mkyong_2002","mkyong-2002" ,"mk3-4_yong"
		   }}
      	        };
	}
 
	@DataProvider
	public Object[][] InvalidUsernameProvider() {
		return new Object[][]{
		   {new String[] {
		     "mk","mk@yong","mkyong123456789_-"	  
		   }}
	        };
	}
 
	@Test(dataProvider = "ValidUsernameProvider")
	public void ValidUsernameTest(String[] Username) {
 
	   for(String temp : Username){
		boolean valid = usernameValidator.validate(temp);
		System.out.println("Username is valid : " + temp + " , " + valid);
		Assert.assertEquals(true, valid);
	   }
 
	}
 
	@Test(dataProvider = "InvalidUsernameProvider", 
                 dependsOnMethods="ValidUsernameTest")
	public void InValidUsernameTest(String[] Username) {
 
	   for(String temp : Username){
		boolean valid = usernameValidator.validate(temp);
		System.out.println("username is valid : " + temp + " , " + valid);
		Assert.assertEquals(false, valid);
	   }
 
	}	
}

5. Unit Test – Result

Username is valid : mkyong34 , true
Username is valid : mkyong_2002 , true
Username is valid : mkyong-2002 , true
Username is valid : mk3-4_yong , true
username is valid : mk , false
username is valid : mk@yong , false
username is valid : mkyong123456789_- , false
PASSED: ValidUsernameTest([Ljava.lang.String;@1d4c61c)
PASSED: InValidUsernameTest([Ljava.lang.String;@116471f)
 
===============================================
    com.mkyong.regex.UsernameValidatorTest
    Tests run: 2, Failures: 0, Skips: 0
===============================================
 
 
===============================================
mkyong
Total tests run: 2, Failures: 0, Skips: 0
===============================================
 

Tuesday, 16 July 2013

How To Extract HTML Links With Regular Expression

// siddhu vydyabhushana // 11 comments
In this tutorial, we will show you how to extract hyperlinks from an HTML page. For example, to get the link from the following content:
this is text1 <a href='mkyong.com' target='_blank'>hello</a> this is text2...
  1. First, get the “value” of the a tag – Result : a href='mkyong.com' target='_blank'
  2. Then get the “link” from the extracted value – Result : mkyong.com

1. Regular Expression Pattern

Extract A tag Regular Expression Pattern
(?i)<a([^>]+)>(.+?)</a>
Extract Link From A tag Regular Expression Pattern
\s*(?i)href\s*=\s*("([^"]*")|'[^']*'|([^'">\s]+))
Description
(?i)		#inline flag: all matching is case insensitive (not a capturing group)
<a              #start with "<a"
  (		#  start of group #1
    [^>]+	#     anything except ">", at least one character
   )		#  end of group #1
  >		#     followed by ">"
    (.+?)	#	group #2: match anything, as few characters as possible
         </a>	#	  end with "</a>"

\s*			   #can start with whitespace
  (?i)			   # all matching is case insensitive
     href		   #  followed by the word "href"
        \s*=\s*		   #   allows spaces on either side of the equal sign
              (		   #    start of group #1
               "([^"]*")   #      a string enclosed in double quotes - "string"
               |	   #	  ..or
               '[^']*'	   #      a string enclosed in single quotes - 'string'
               |           #	  ..or
               ([^'">\s]+) #      an unquoted value: no single quotes, double quotes, ">" or whitespace
	      )		   #    end of group #1
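
To see the two patterns work together on the sample content from the introduction, here is a minimal sketch. The class name LinkSketch and the method firstLink are invented for this illustration, and the code only handles the first a tag it finds.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
final class LinkSketch {

    private static final Pattern A_TAG = Pattern.compile("(?i)<a([^>]+)>(.+?)</a>");
    private static final Pattern HREF = Pattern.compile(
        "\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))");

    // Returns the href value of the first a tag with its quotes stripped, or null.
    static String firstLink(String html) {
        Matcher tag = A_TAG.matcher(html);
        if (!tag.find()) {
            return null;
        }
        // Step 1: group 1 holds everything between "<a" and ">".
        Matcher href = HREF.matcher(tag.group(1));
        if (!href.find()) {
            return null;
        }
        // Step 2: group 1 holds the href value, possibly still quoted.
        return href.group(1).replace("'", "").replace("\"", "");
    }

    public static void main(String[] args) {
        String html = "this is text1 <a href='mkyong.com' target='_blank'>hello</a> this is text2...";
        System.out.println(firstLink(html)); // mkyong.com
    }
}
```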

2. Java Link Extractor Example

Here’s a simple Java link extractor example: the 1st pattern extracts the contents of the a tag, and the 2nd pattern extracts the link from that result.
HTMLLinkExtractor.java
package com.mkyong.crawler.core;
 
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
 
public class HTMLLinkExtractor {
 
	private Pattern patternTag, patternLink;
	private Matcher matcherTag, matcherLink;
 
	private static final String HTML_A_TAG_PATTERN = "(?i)<a([^>]+)>(.+?)</a>";
	private static final String HTML_A_HREF_TAG_PATTERN = 
		"\\s*(?i)href\\s*=\\s*(\"([^\"]*\")|'[^']*'|([^'\">\\s]+))";
 
 
	public HTMLLinkExtractor() {
		patternTag = Pattern.compile(HTML_A_TAG_PATTERN);
		patternLink = Pattern.compile(HTML_A_HREF_TAG_PATTERN);
	}
 
	/**
	 * Extract links from html with regular expressions
	 * 
	 * @param html
	 *            html content to scan
	 * @return Vector of links and link text
	 */
	public Vector<HtmlLink> grabHTMLLinks(final String html) {
 
		Vector<HtmlLink> result = new Vector<HtmlLink>();
 
		matcherTag = patternTag.matcher(html);
 
		while (matcherTag.find()) {
 
			String href = matcherTag.group(1); // href
			String linkText = matcherTag.group(2); // link text
 
			matcherLink = patternLink.matcher(href);
 
			while (matcherLink.find()) {
 
				String link = matcherLink.group(1); // link
				HtmlLink obj = new HtmlLink();
				obj.setLink(link);
				obj.setLinkText(linkText);
 
				result.add(obj);
 
			}
 
		}
 
		return result;
 
	}
 
	class HtmlLink {
 
		String link;
		String linkText;
 
		HtmlLink(){};
 
		@Override
		public String toString() {
			return new StringBuffer("Link : ").append(this.link)
			.append(" Link Text : ").append(this.linkText).toString();
		}
 
		public String getLink() {
			return link;
		}
 
		public void setLink(String link) {
			this.link = replaceInvalidChar(link);
		}
 
		public String getLinkText() {
			return linkText;
		}
 
		public void setLinkText(String linkText) {
			this.linkText = linkText;
		}
 
		private String replaceInvalidChar(String link){
			link = link.replaceAll("'", "");
			link = link.replaceAll("\"", "");
			return link;
		}
 
	}
}

3. Unit Test

Unit test with TestNG. Simulate the HTML content via @DataProvider.
TestHTMLLinkExtractor.java
package com.mkyong.crawler.core;
 
import java.util.Vector;
 
import org.testng.Assert;
import org.testng.annotations.BeforeClass;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
 
import com.mkyong.crawler.core.HTMLLinkExtractor.HtmlLink;
 
/**
 * HTML link extractor Testing
 * 
 * @author mkyong
 * 
 */
public class TestHTMLLinkExtractor {
 
	private HTMLLinkExtractor htmlLinkExtractor;
	String TEST_LINK = "http://www.google.com";
 
	@BeforeClass
	public void initData() {
		htmlLinkExtractor = new HTMLLinkExtractor();
	}
 
	@DataProvider
	public Object[][] HTMLContentProvider() {
	  return new Object[][] {
	    new Object[] { "abc hahaha <a href='" + TEST_LINK + "'>google</a>" },
	    new Object[] { "abc hahaha <a HREF='" + TEST_LINK + "'>google</a>" },
 
	    new Object[] { "abc hahaha <A HREF='" + TEST_LINK + "'>google</A> , "
		+ "abc hahaha <A HREF='" + TEST_LINK + "' target='_blank'>google</A>" },
 
	    new Object[] { "abc hahaha <A HREF='" + TEST_LINK + "' target='_blank'>google</A>" },
	    new Object[] { "abc hahaha <A target='_blank' HREF='" + TEST_LINK + "'>google</A>" },
	    new Object[] { "abc hahaha <A target='_blank' HREF=\"" + TEST_LINK + "\">google</A>" },
	    new Object[] { "abc hahaha <a HREF=" + TEST_LINK + ">google</a>" }, };
	}
 
	@Test(dataProvider = "HTMLContentProvider")
	public void ValidHTMLLinkTest(String html) {
 
		Vector<HtmlLink> links = htmlLinkExtractor.grabHTMLLinks(html);
 
		// at least one link must be found
		Assert.assertTrue(links.size() != 0);
 
		for (int i = 0; i < links.size(); i++) {
			HtmlLink htmlLinks = links.get(i);
			//System.out.println(htmlLinks);
			Assert.assertEquals(htmlLinks.getLink(), TEST_LINK);
		}
 
	}
}
Result
[TestNG] Running:
  /private/var/folders/w8/jxyz5pf51lz7nmqm_hv5z5br0000gn/T/testng-eclipse--530204890/testng-customsuite.xml
 
PASSED: ValidHTMLLinkTest("abc hahaha <a href='http://www.google.com'>google</a>")
PASSED: ValidHTMLLinkTest("abc hahaha <a HREF='http://www.google.com'>google</a>")
PASSED: ValidHTMLLinkTest("abc hahaha <A HREF='http://www.google.com'>google</A> , abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>")
PASSED: ValidHTMLLinkTest("abc hahaha <A HREF='http://www.google.com' target='_blank'>google</A>")
PASSED: ValidHTMLLinkTest("abc hahaha <A target='_blank' HREF='http://www.google.com'>google</A>")
PASSED: ValidHTMLLinkTest("abc hahaha <A target='_blank' HREF="http://www.google.com">google</A>")
PASSED: ValidHTMLLinkTest("abc hahaha <a HREF=http://www.google.com>google</a>")

Top 10 Simple Java Regular Expressions

// siddhu vydyabhushana // 6 comments
Regular expressions are an art of programming: they are hard to debug, learn and understand, but their powerful features still attract many developers to write them. Let’s explore the following 10 practical regular expressions ~ enjoy :)

1. Username Regular Expression Pattern

^[a-z0-9_-]{3,15}$
^                    # Start of the line
  [a-z0-9_-]	     # Match characters and symbols in the list, a-z, 0-9 , underscore , hyphen
             {3,15}  # Length at least 3 characters and maximum length of 15 
$                    # End of the line

2. Password Regular Expression Pattern

((?=.*\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})
(			# Start of group
  (?=.*\d)		#   must contain at least one digit from 0-9
  (?=.*[a-z])		#   must contain at least one lowercase character
  (?=.*[A-Z])		#   must contain at least one uppercase character
  (?=.*[@#$%])		#   must contain at least one special symbol from the list "@#$%"
              .		#     match any character, subject to the previous conditions
                {6,20}	#        length at least 6 characters and at most 20
)			# End of group
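
As a rough sketch of how this pattern could be used from Java (the class name PasswordCheck and the method isValid are assumptions made for this example):

```java
import java.util.regex.Pattern;

// Illustrative helper; the class and method names are not from the original post.
final class PasswordCheck {

    // Pattern #2 as a Java string literal (the backslash in \d is doubled).
    private static final Pattern PASSWORD = Pattern.compile(
        "((?=.*\\d)(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%]).{6,20})");

    static boolean isValid(String password) {
        // matches() anchors at both ends, so the 6-20 length applies to the whole input.
        return password != null && PASSWORD.matcher(password).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("mkyong1A@")); // true
        System.out.println(isValid("mkyong12@")); // false: no uppercase character
        System.out.println(isValid("mY1A@"));     // false: shorter than 6 characters
    }
}
```

Each (?=...) lookahead checks one condition from the start of the input without consuming anything, which is why the conditions can be satisfied in any order.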

3. Hexadecimal Color Code Regular Expression Pattern

^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
^		 #start of the line
 #		 #  must contain a "#" symbol
 (		 #  start of group #1
  [A-Fa-f0-9]{6} #    exactly 6 characters from the list
  |		 #    ..or
  [A-Fa-f0-9]{3} #    exactly 3 characters from the list
 )		 #  end of group #1 
$		 #end of the line
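
A small sketch of this pattern in Java; the class name HexColorCheck and the method isValid are hypothetical names chosen for the example:

```java
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class HexColorCheck {

    private static final Pattern HEX_COLOR =
        Pattern.compile("^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$");

    static boolean isValid(String color) {
        return color != null && HEX_COLOR.matcher(color).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("#1f1f1F")); // true
        System.out.println(isValid("#AFA"));    // true
        System.out.println(isValid("123456"));  // false: the "#" is missing
        System.out.println(isValid("#afafah")); // false: "h" is not a hex digit
    }
}
```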

4. Email Regular Expression Pattern

^[_A-Za-z0-9-]+(\\.[_A-Za-z0-9-]+)*@[A-Za-z0-9]+
(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})$
^			#start of the line
  [_A-Za-z0-9-]+	#  one or more (+) of the characters in the brackets [ ]
  (			#  start of group #1
    \\.[_A-Za-z0-9-]+	#     a dot "." followed by one or more (+) of the characters in the brackets [ ]
  )*			#  end of group #1, which may occur zero or more times (*)
    @			#     a mandatory "@" symbol
     [A-Za-z0-9]+       #        followed by one or more (+) of the characters in the brackets [ ]
      (			#	   start of group #2 - additional domain labels
       \\.[A-Za-z0-9]+  #	     a dot "." followed by one or more (+) of the characters in the brackets [ ]
      )*		#	   end of group #2, which may occur zero or more times (*)
      (			#	   start of group #3 - final TLD check
       \\.[A-Za-z]{2,}  #	     a dot "." followed by at least 2 letters
      )			#	   end of group #3
$			#end of the line

5. Image File Extension Regular Expression Pattern

([^\s]+(\.(?i)(jpg|png|gif|bmp))$)
(			#Start of the group #1
 [^\s]+			#  must contain one or more characters (anything except white space)
       (		#    start of the group #2
         \.		#	followed by a dot "."
         (?i)		#	the following match is case insensitive
             (		#	  start of the group #3
              jpg	#	    contains characters "jpg"
              |		#	    ..or
              png	#	    contains characters "png"
              |		#	    ..or
              gif	#	    contains characters "gif"
              |		#	    ..or
              bmp	#	    contains characters "bmp"
             )		#	  end of the group #3
       )		#     end of the group #2	
  $			#  end of the string
)			#end of the group #1
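
The capturing groups can also be used to pull the extension out of a file name. A minimal sketch, assuming a made-up class ImageNameCheck with a made-up method extension:

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class ImageNameCheck {

    private static final Pattern IMAGE =
        Pattern.compile("([^\\s]+(\\.(?i)(jpg|png|gif|bmp))$)");

    // Returns the lower-cased extension of a valid image file name, or null.
    static String extension(String fileName) {
        if (fileName == null) {
            return null;
        }
        Matcher m = IMAGE.matcher(fileName);
        // Group 3 is the extension without the dot; (?i) is a flag, not a group.
        return m.matches() ? m.group(3).toLowerCase(Locale.ROOT) : null;
    }

    public static void main(String[] args) {
        System.out.println(extension("photo.JPG")); // jpg
        System.out.println(extension("notes.txt")); // null
    }
}
```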

6. IP Address Regular Expression Pattern

^([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.
([01]?\\d\\d?|2[0-4]\\d|25[0-5])\\.([01]?\\d\\d?|2[0-4]\\d|25[0-5])$
^		#start of the line
 (		#  start of group #1
   [01]?\\d\\d? #    one or two digits; if there are three digits, the first must be 0 or 1
		#    e.g ([0-9], [0-9][0-9],[0-1][0-9][0-9])
    |		#    ...or
   2[0-4]\\d	#    start with 2, followed by 0-4 and end with any digit (2[0-4][0-9]) 
    |           #    ...or
   25[0-5]      #    start with 25 and end with 0-5 (25[0-5]) 
 )		#  end of group #1
  \\.           #  followed by a dot "."
....            # group and dot appear 3 times in total, then the group once more without a dot
$		#end of the line
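
Because the same group is repeated four times, the Java version reads more easily when the group is kept in a constant. A minimal sketch; the names IpCheck, OCTET and isValid are assumptions made for this example:

```java
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class IpCheck {

    // One number from 0 to 255: 0-199 (leading zeros allowed), 200-249 or 250-255.
    private static final String OCTET = "([01]?\\d\\d?|2[0-4]\\d|25[0-5])";

    // Three "number + dot" parts followed by a final number.
    private static final Pattern IPV4 = Pattern.compile(
        "^" + OCTET + "\\." + OCTET + "\\." + OCTET + "\\." + OCTET + "$");

    static boolean isValid(String ip) {
        return ip != null && IPV4.matcher(ip).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("192.168.1.1")); // true
        System.out.println(isValid("256.1.1.1"));   // false: 256 is out of range
        System.out.println(isValid("10.10.10"));    // false: only three numbers
    }
}
```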

7. Time Format Regular Expression Pattern

Time in 12-Hour Format Regular Expression Pattern

(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)
(				#start of group #1
 1[012]				#  10, 11 or 12
 |				#  or
 [1-9]				#  1, 2, ... 9
)				#end of group #1
 :				#    followed by a colon (:)
  [0-5][0-9]			#   followed by 0..5 and 0..9, which means 00 to 59
            (\\s)?		#        followed by a white space (optional)
                  (?i)		#          the next check is case insensitive
                      (am|pm)	#            followed by am or pm

Time in 24-Hour Format Regular Expression Pattern

([01]?[0-9]|2[0-3]):[0-5][0-9]
(				#start of group #1
 [01]?[0-9]			#  0-9, 00-09 or 10-19
 |				#  or
 2[0-3]				#  20-23
)				#end of group #1
 :				#  followed by a colon (:)
  [0-5][0-9]			#    followed by 0..5 and 0..9, which means 00 to 59
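
Both time patterns can be tried out with a few lines of Java. This is only a sketch; the class name TimeCheck and the methods isTime12 and isTime24 are invented for the example:

```java
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class TimeCheck {

    private static final Pattern TIME_12 =
        Pattern.compile("(1[012]|[1-9]):[0-5][0-9](\\s)?(?i)(am|pm)");
    private static final Pattern TIME_24 =
        Pattern.compile("([01]?[0-9]|2[0-3]):[0-5][0-9]");

    static boolean isTime12(String time) {
        // matches() requires the whole input to fit, so no ^ and $ are needed.
        return time != null && TIME_12.matcher(time).matches();
    }

    static boolean isTime24(String time) {
        return time != null && TIME_24.matcher(time).matches();
    }

    public static void main(String[] args) {
        System.out.println(isTime12("12:59 PM")); // true
        System.out.println(isTime12("13:00 pm")); // false: hour must be 1-12
        System.out.println(isTime24("23:59"));    // true
        System.out.println(isTime24("24:00"));    // false: hour must be 0-23
    }
}
```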

8. Date Format (dd/mm/yyyy) Regular Expression Pattern

(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)
(			#start of group #1
 0?[1-9]		#  01-09 or 1-9
 |                  	#  ..or
 [12][0-9]		#  10-19 or 20-29
 |			#  ..or
 3[01]			#  30, 31
) 			#end of group #1
  /			#  followed by a "/"
   (			#    start of group #2
    0?[1-9]		#	01-09 or 1-9
    |			#	..or
    1[012]		#	10,11,12
    )			#    end of group #2
     /			#	followed by a "/"
      (			#	  start of group #3
       (19|20)\\d\\d	#	    19[0-9][0-9] or 20[0-9][0-9]
       )		#	  end of group #3
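
Note that this pattern only checks the shape of the date, so an impossible date such as 31/02/2012 still matches. One possible way to close that gap, sketched here as an assumption rather than as part of the pattern itself, is to hand the captured groups to java.time.LocalDate (Java 8 or later). The class name DateCheck and its method names are made up for the example.

```java
import java.time.DateTimeException;
import java.time.LocalDate;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class DateCheck {

    private static final Pattern DATE = Pattern.compile(
        "(0?[1-9]|[12][0-9]|3[01])/(0?[1-9]|1[012])/((19|20)\\d\\d)");

    // Shape check only: this is what the regular expression alone can tell us.
    static boolean matchesShape(String date) {
        return date != null && DATE.matcher(date).matches();
    }

    // Shape check plus a calendar check on the captured day, month and year.
    static boolean isRealDate(String date) {
        if (date == null) {
            return false;
        }
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return false;
        }
        try {
            // group 1 = day, group 2 = month, group 3 = year
            LocalDate.of(Integer.parseInt(m.group(3)),
                         Integer.parseInt(m.group(2)),
                         Integer.parseInt(m.group(1)));
            return true;
        } catch (DateTimeException e) {
            return false; // e.g. 31 February
        }
    }

    public static void main(String[] args) {
        System.out.println(matchesShape("31/02/2012")); // true
        System.out.println(isRealDate("31/02/2012"));   // false
        System.out.println(isRealDate("29/02/2012"));   // true: 2012 is a leap year
    }
}
```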

9. HTML tag Regular Expression Pattern

<("[^"]*"|'[^']*'|[^'">])*>
<	  	#start with opening tag "<"
 (		#   start of group #1
   "[^"]*"	#	a string enclosed in double quotes - "string"
   |		#	..or
   '[^']*'	#	a string enclosed in single quotes - 'string'
   |		#	..or
   [^'">]	#	any character except a single quote, a double quote or ">"
 )		#   end of group #1
 )		#   end of group #1
 *		# 0 or more
>		#end with closing tag ">"
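
The quoted alternatives are what allow a ">" inside an attribute value to be treated as part of the tag. A minimal sketch that uses the pattern to remove tags from a short string follows; the class name TagStripper and the method stripTags are assumptions for this example, and this is only a demonstration of the pattern, not a safe HTML sanitizer.

```java
import java.util.regex.Pattern;

// Illustrative helper; names are assumptions for this sketch.
final class TagStripper {

    private static final Pattern TAG =
        Pattern.compile("<(\"[^\"]*\"|'[^']*'|[^'\">])*>");

    // Removes everything the pattern recognises as a tag.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // The ">" inside href='x>y' does not end the tag early.
        System.out.println(stripTags("<b>bold</b> and <a href='x>y'>link</a>")); // bold and link
    }
}
```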

10. HTML links Regular Expression Pattern

HTML A tag Regular Expression Pattern

(?i)<a([^>]+)>(.+?)</a>
(?i)		#inline flag: all matching is case insensitive (not a capturing group)
<a              #start with "<a"
  (		#  start of group #1
    [^>]+	#     anything except ">", at least one character
   )		#  end of group #1
  >		#     followed by ">"
    (.+?)	#	group #2: match anything, as few characters as possible
         </a>	#	  end with "</a>"

Extract HTML link Regular Expression Pattern

\s*(?i)href\s*=\s*("([^"]*")|'[^']*'|([^'">\s]+))
\s*			   #can start with whitespace
  (?i)			   # all matching is case insensitive
     href		   #  followed by the word "href"
        \s*=\s*		   #   allows spaces on either side of the equal sign
              (		   #    start of group #1
               "([^"]*")   #      a string enclosed in double quotes - "string"
               |	   #	  ..or
               '[^']*'	   #      a string enclosed in single quotes - 'string'
               |           #	  ..or
               ([^'">\s]+) #      an unquoted value: no single quotes, double quotes, ">" or whitespace
	      )		   #    end of group #1
