Old regex in java example

I routed an article on RegEx earlier, and I thought I’d follow it up with a simple use of RegEx in Java. The attached file fetches a web page and prints its title and all the links on the page. Give it a look. It can serve as a template for a lot of different things just by changing the RegEx. It requires our commons-1.x.x.jar and jakarta-oro.jar on the classpath. Both of these are available in most of our projects (jakarta-oro.jar is part of Struts). commons-1.x.x.jar is required only because this is a command-line app built on the ConsoleApp class.

Have fun. Let me know if there should be more comments.

import java.io.*;
import java.net.URL;

import org.apache.oro.text.perl.Perl5Util;
import com.dayspringtech.util.ConsoleApp;

/**
 * This is a simple class that illustrates the use of the ORO regex package. The example
 * retrieves a URL passed in on the command line (or defaults to the home page of Wazia)
 * and finds the title and all outgoing links. Hopefully it gives you a quick and easy
 * starting point for using regex in a simple utility.
 *
 * Yes, Bruce, this could be done in Perl with less code, but then you would have to have Perl installed ;-)
 *
 * Requires: commons-1.x.x.jar
 *           jakarta-oro.jar
 */

public class GetOutLinks extends ConsoleApp {
    protected GetOutLinks() {}
    public static void main(String[] args) { (new GetOutLinks())._main(args); }

    protected String url = null;
    public boolean init() {
        boolean ret = super.init();
        if (getNumStrings()>0)
            url = getString(0);
        else
            url = "http://www.wazia.com";
        return ret;
    }

    public void run() {
        try {
            URL inUrl = new URL(url);
            BufferedReader in = new BufferedReader(new InputStreamReader(inUrl.openStream()));
            // PrintWriter out = new PrintWriter(new FileWriter("U:/sample_out.txt"));
            PrintWriter out = new PrintWriter(System.out);
            String line = null;
            String lastLine = "";
            Perl5Util util = new Perl5Util();

            while ((line=in.readLine())!=null) {
                // find title (assumed pattern: a <title> element that sits entirely on one line)
                if(util.match("/<title>(.*?)<\\/title>/i", line)) {
                    out.println("Page Title \"" + util.group(1) + "\""); // Prints the 1st "group" (part of the match within parens)
                    out.flush();
                }

                // find link

                // This part is a little trickier because our HTML editors can wrap links onto multiple lines.
                // lastLine holds all of the text read since the last match. This is not intuitive,
                // but as long as there isn’t a match, lastLine keeps getting longer.
                // This illustrates the "by line" nature of the method.
                // Assumed pattern: group 1 is the href value, group 2 is the link text.
                line = lastLine + " " + line; // readLine() strips the line break, so put a space back
                while(util.match("/<a\\s+href=\"(.*?)\".*?>(.*?)<\\/a>/i", line)) {
                    out.println(util.group(0)); // Prints whole match
                    out.println(" " + util.group(2) + " -> " + util.group(1)); // Prints the 1st and 2nd "groups"
                    line = util.postMatch(); // look at the rest of the line for more matches
                }
                lastLine = line;
            }
            out.flush();
            out.close();
            in.close();
        } catch (Exception e) {
            throw new RuntimeException(e); // wrap the original exception so its stack trace is kept
        }
    }
}
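
If you don’t have the ORO jar handy, roughly the same idea can be sketched with the java.util.regex package that ships with the JDK. This is only a sketch under a few assumptions: the class name LinkExtractor, the two patterns, and the sample HTML are mine, not part of the original utility, and the patterns only handle double-quoted href values. Because it works on the whole page as one string with DOTALL turned on, it doesn’t need the lastLine trick for links that wrap across lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original utility.
class LinkExtractor {
    // Assumed patterns; DOTALL lets '.' cross line breaks inside the page text.
    private static final Pattern TITLE = Pattern.compile(
            "<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the text of the first <title> element, or null if there is none.
    static String findTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }

    // Returns one "link text -> href" entry per anchor found in the page.
    static List<String> findLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            // Collapse wrapped link text onto a single line.
            String text = m.group(2).trim().replaceAll("\\s+", " ");
            links.add(text + " -> " + m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Made-up sample page; the second link is wrapped across lines on purpose.
        String html = "<html><head><title>Sample</title></head><body>\n"
                + "<a href=\"http://example.com/a\">First</a>\n"
                + "<a\n   href=\"http://example.com/b\">Second\nlink</a>\n"
                + "</body></html>";
        System.out.println("Page Title \"" + findTitle(html) + "\"");
        for (String link : findLinks(html)) {
            System.out.println(" " + link);
        }
    }
}
```

As with the ORO version, regex is fine for a quick utility like this, but it will miss links whose markup doesn’t fit the assumed pattern (single-quoted or unquoted href values, for example).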

One Response to “Old regex in java example”

  1. JLG says:

    If you need to test your regexes, you can use this page:

    http://www.dotnetcoders.com/web/Learning/Regex/RegexTester.aspx
