Haskell > Scala > (Java 7 <=> Functional Java) > Java

Write a parser that accepts command line arguments.
For each command line argument, print out true or false,
depending on whether or not that value parses.

For a value to parse, it must consist entirely of these
characters: '(', ')', '[', ']‘. Any other character renders
the parse invalid and so prints “false”.

The value must also be “balanced”. That is, for each opening
parenthesis or bracket, it is matched by a closing parenthesis
or bracket accordingly. The different bracket types must not
intersperse.

Here are some examples of successfully parsing values that
result in printing “true”:

""
"()"
"[]"
"([])"
"[()]"
"[]()"
"[][[([])]]"

Here are some examples of values that fail the parsing and
result in printing “false”:

"("
")"
"["
"]"
"]["
")("
"( )"
"([)"
"[)]"
"([)]"
"({})"
"[())]"

Using Haskell (without Parsec):

import Prelude hiding (foldr, any)
import Control.Monad
import Data.Foldable
import System
 
tokens = [('(', ')'), ('[', ']')]
 
parse x = any null (foldr (\c f -> (\s -> (const (c:s)) `fmap`
            flip lookup ((uncurry (flip (,))) `fmap` tokens)  c `mplus`
              (flip lookup tokens c >>= (\v ->
                do h:t <- return s
                   guard (v == h)
                   return t))) =<< f) (return []) x)
 
main = getArgs >>= print . (parse `fmap`)

Using Scala (without scala.util.parsing):

object Parser {
  val tokens = List(('(', ')'), ('[', ']'))
 
  def parse(s: String) =
    s.foldRight[Option[String]](Some(""))((a, b) => b flatMap
      (ss => {
        val s = ss.toList
        (lookup(tokens map { case (x, y) => (y, x) }, a)
          map (t => (a::s)) orElse
        (lookup(tokens, a) flatMap (v => s match {
          case Nil => None
          case h :: t => if(v == h) Some(t) else None
        }))) map (_.mkString)
      })) exists (_.isEmpty)
 
  def main(args: Array[String]) {
    args foreach (parse _ andThen println)
  }
 
  // The Scala API needs hand-holding
  def lookup[A, B](x: List[(A, B)], a: A) =
    x.find { case (aa, _) => a == aa } map (_._2)
}

Using Java 1.5 and Functional Java:

import fj.F;
import fj.F2;
import fj.Function;
import static fj.P.p;
import fj.P1;
import fj.P2;
import fj.data.List;
import static fj.data.List.asString;
import static fj.data.List.fromString;
import static fj.data.List.list;
import static fj.data.List.lookup;
import fj.data.Option;
import static fj.data.Option.some;
import static fj.function.Strings.isEmpty;
import static fj.pre.Equal.charEqual;
import static fj.pre.Show.booleanShow;
 
final class Parser {
  @SuppressWarnings("unchecked")
  static final List<P2<Character, Character>> tokens =
    list(p('(', ')'), p('[', ']'));
 
  static boolean parse(final String s) {
    return fromString(s).foldRight(
      new F2<Character, Option<String>, Option<String>>() {
      public Option<String> f(final Character c, final Option<String> b) {
        return b.bind(new F<String, Option<String>>() {
          public Option<String> f(String s) {
            final List<Character> ss = fromString(s);
            return lookup(charEqual,
              tokens.map(P2.<Character, Character>swap_()), c)
                .map(Function.<Character,
                  List<Character>>constant(ss.cons(c)))
                .orElse(lookup(charEqual, tokens, c)
                .bind(new F<Character, Option<List<Character>>>() {
                  public Option<List<Character>> f(final Character v) {
                    return ss.isEmpty() || v != ss.head() ?
                        Option.<List<Character>>none() :
                        some(ss.tail());
                  }
                })).map(asString());
          }
        });
      }
    }, some("")).exists(isEmpty);
  }
 
  public static void main(final String[] args) {
    for(final String s : args) {
      booleanShow.println(parse(s));
    }
  }
}

Using Java BGGA and Functional Java:

import fj.Function;
import static fj.P.p;
import fj.P1;
import fj.P2;
import fj.data.List;
import static fj.data.List.asString;
import static fj.data.List.fromString;
import static fj.data.List.list;
import static fj.data.List.lookup;
import fj.data.Option;
import static fj.data.Option.some;
import static fj.function.Strings.isEmpty;
import static fj.pre.Equal.charEqual;
import static fj.pre.Show.booleanShow;
 
final class ParserBGGA {
  @SuppressWarnings("unchecked")
  static final List<P2<Character, Character>> tokens =
    list(p('(', ')'), p('[', ']'));
 
  static boolean parse(final String s) {
    return fromString(s).foldRight({ char c, Option<String> b =>
      b.bind({ String ss =>
        final List<Character> s = fromString(ss);
          lookup(charEqual,
            tokens.map(P2.<Character, Character>swap_()), c)
          .map(Function.<Character, List<Character>>constant(s.cons(c)))
          .orElse(lookup(charEqual, tokens, c)
          .bind({ char v =>
            s.isEmpty() || v != s.head() ?
              Option.<List<Character>>none() :
              some(s.tail())})).map(asString())})}
      , some("")).exists(isEmpty);
  }
 
  public static void main(final String[] args) {
    for(final String s : args) {
      booleanShow.println(parse(s));
    }
  }
}

A submission by a Java programmer who volunteered:

import java.util.ArrayList;
 
public class IsBalanced {
 
  private Stack stack = new Stack();
 
  public boolean isBalanced(String expression) {
 
    boolean isBalanced = true;
 
    for (int i = 0; i < expression.length(); i++) {
      char nextChar = expression.charAt(i);
 
      if (nextChar == '(') {
        stack.push('(');
      }
      else if (nextChar == '[') {
        stack.push('[');
      } else {
        if (!stack.isEmpty()) {
          char stackChar = (Character) stack.peek();
          if (nextChar == ')') {
            if (stackChar == '(') {
              stack.pop();
            } else {
              return false;
            }
 
          } else if (nextChar == ']') {
            if (stackChar == '[') {
              stack.pop();
            } else {
               return false;
            }
          }
        } else {
          return false;
        }
      }
    }
 
    if (!stack.isEmpty()) {
      isBalanced = false;
    }
    return isBalanced;
 
 
  }
 
  public static void main(String[] args) {
    System.out.println(args[0]);
    System.out.println((new IsBalanced()).isBalanced(args[0]));
  }
 
  private class Stack {
 
    private ArrayList stack = new ArrayList();
 
    public void push(char c) {
      stack.add(c);
 
    }
 
    public void pop() {
      stack.remove(stack.size()-1);
    }
 
 
    public Object peek() {
      return stack.get(stack.size()-1);
    }
 
    public boolean isEmpty() {
      return stack.size() == 0;
    }
  }
}

Forget the syntax for a moment and consider the compositional properties of each of these programs. For example, what if I were to add another pair of brackets such as '{' and '}'? Are there sub-programs that exist independently and can be extracted from the larger program? How does that affect my ability to reason (and therefore, read) the code?

What properties about the code make it easy to read?

46 Responses to “Haskell > Scala > (Java 7 <=> Functional Java) > Java”

  1. augustss Says:

    I’m sorry, but that Haskell code is unreadable. Here’s an alternative:

    import Control.Monad
    import Text.ParserCombinators.Parsec
    import System
    
    pArg =
         do char '('; pArg; char ')'
     <|> do char '['; pArg; char ']'
     <|> return ' '
    
    testArg = either (const False) (const True) . parse (pArg >> eof) ""
    
    main = getArgs >>= print . fmap testArg
    
  2. huu Says:

    valid ::= ‘(’ valid ‘)’ | ‘[' valid ']‘ | valid valid | e

  3. bearophile Says:

    Python version, you can add a new pair of brackets adding them to the parens tuple:
    def parse(s, parens=(”[]“,”()”)):
    __ return (not s or
    __ any((s[0]==o and parse(s[1:-1]) and s[-1]==c) for o,c in parens)
    __ or any(parse(s[:i]) and parse(s[i:]) for i in xrange(1, len(s))))

  4. JN Says:

    object Parser {
    def tokenMap = Map(’(’ -> ‘)’, ‘[' -> ']‘)

    def parseUntil(l : List[Char], end : Char) : List[Char] =
    if (l.head == end) l.tail else parseUntil(parseUntil(l.tail, tokenMap(l.head)), end)

    def parse(l : List[Char]) : List[Char] =
    if (l.isEmpty) l else parse(parseUntil(l.tail, tokenMap(l.head)))

    def main(args : Array[String]) = args map (_.toList) foreach parse
    }

  5. bearophile Says:

    A bit better:
    def parse(s, parens=(”[]“, “()”)):
    __ return (not s or
    ___ (parse(s[1:-1]) and any(s[0]==o and s[-1]==c for o,c in parens))
    ___ or any(parse(s[:i]) and parse(s[i:]) for i in xrange(1, len(s))))

  6. JN Says:

    Or in an imperative style:

    object Parser {
      def tokenMap = Map('(' -> ')', '[' -> ']')
    
      def parsePair(l : List[Char]) : List[Char] = {
        var l2 = l.tail
    
        while (l2.head != tokenMap(l.head))
          l2 = parsePair(l2)
    
        l2.tail
      }
    
      def parse(l : List[Char]) = {
        var l2 = l
    
        while (!l2.isEmpty)
          l2 = parsePair(l2)
    
        l2
      }
    
      def main(args : Array[String]) = args map (_.toList) foreach parse
    }
    
  7. Stephen Gregory Says:

    I agree Java isn’t usually the most elegant for the sake of comparison it could be done like this.


    import java.util.Stack;

    public class IsBalanced {

    private static final boolean matches(char stack, char next){
    switch (stack){
    case '(': return next ==')';
    case '[': return next ==']‘;
    }
    return false;
    }

    public boolean isBalanced(String expression) {
    Stack stack = new Stack();

    for(char nextChar: expression.toCharArray()){
    if (nextChar == ‘(’|| nextChar == ‘[') {
    stack.push(nextChar);
    } else if (!stack.isEmpty()&&matches(stack.peek(), nextChar)) {
    stack.pop();
    } else {
    return false;
    }
    }

    return stack.isEmpty();
    }

    public static void main(String[] args) {
    System.out.println(args[0]);
    System.out.println((new IsBalanced()).isBalanced(args[0]));
    }

    }

  8. walter chang Says:

    even shorter, in scala:

    import util.parsing.combinator.syntactical._

    object BracketParenParsers extends StandardTokenParsers {

    def main(args: Array[String]) {
    args.foreach(parse)
    }

    lexical.delimiters += (”[", "]“, “(”, “)”)

    def pairs: Parser[Any] = ( “[" ~ pairs ~ "]” | “(” ~ pairs ~ “)” )*

    def parse(input: String) {
    phrase(pairs)(new lexical.Scanner(input)) match {
    case Success(_, _) => println(”true”)
    case _ => println(”false”)
    }
    }
    }

  9. olt Says:

    IMHO they are all very hard to read. The only readable is the last Java one, but that one is not extensible.

    So, python FTW!
    http://paste.pocoo.org/show/84473/

  10. Foo Says:

    Man, there’s some crappy Java there.

    import java.util.*;
    import java.io.*;

    public class Parse {
    public static ArrayList in = new ArrayList(Arrays.asList(new Object[] { ‘(’, ‘[' }));
    public static ArrayList out = new ArrayList(Arrays.asList(new Object[] { ‘)’, ‘]’ }));

    public static boolean isBalanced(char[] vals) {
    char[] old = new char[vals.length];
    int j = -1;
    for(char val : vals) {
    int n = out.indexOf(val);
    if (j >= 0 && n >= 0 && old[j] == in.get(n)) –j;
    else old[++j] = val;
    }
    return j == -1;
    }

    public static void main(String[] args) {
    Scanner scan = new Scanner(System.in);
    while(scan.hasNextLine())
    System.out.println(isBalanced(scan.nextLine().toCharArray()));
    }
    }

  11. Evgeniy Says:

    public class IsBalanced {
    private final String[] pairs = {”\\(\\)”, “\\[\\]“, “\\{\\}”};
    public boolean isBalanced(String subj) {
    int len;
    while (true) {
    len = subj.length();
    for (int i = 0; i < pairs.length; i++) {
    subj = subj.replaceAll(pairs[i], “”);
    }
    if (len == subj.length())
    break;
    }
    return subj.length() == 0;
    }
    public final static void main(String[] args) {
    IsBalanced me = new IsBalanced();
    System.out.println(”Subject string is” + (me.isBalanced(args[0]) ? “” : ” NOT”) + ” balanced”);
    }
    }

  12. lamby Says:

    /(?(\((?>(?&r))*\))|(\[(?>(?&r))*\]))*/ under PCRE.

  13. Marcio Says:

    I’m sorry, but what is the point? Your Haskell code is UNREADABLE! It’s like russian or chinese!

  14. cschneid Says:

    <?php

    foreach (array_splice($argv, 1) as $arg)
    {
    $tokens = array(”()”, “[]“);
    while (($new = str_replace($tokens, array(), $arg)) != $arg)
    $arg = $new;
    echo $arg === “” ? “true\n” : “false\n”;
    }

  15. lamby Says:

    Prolog anyone?

    balanced(X) :- balanced(X, []).
    balanced([], []) :- true.
    balanced([H|T], [H|Ts]) :- balanced(T, Ts).
    balanced(['['|T], S) :- balanced(T, [']‘|S]).
    balanced(['('|T], S) :- balanced(T, [')'|S]).

  16. michael Says:

    It depends on the programmer

    List in = Arrays.asList(new Character[] { '(', '[' });
    List out = Arrays.asList(new Character[] { ')', ']' });
    
    boolean isBalanced(char[] characters) {
      Stack stack = new Stack();
    
      for(char character : characters) {
        if(in.contains(character)) {
          stack.push(character);
        } else if(out.contains(character) && !stack.isEmpty() && in.indexOf(stack.peek()) == out.indexOf(character)) {
          stack.pop();
        } else {
          return false;
        }
      }
    
     return stack.isEmpty();
    }
    
  17. Bob Says:

    huu has it right. CFG >> all other languages for this problem.

    I wish people would stop making up trivial problems and using them to “prove” their language is best, but in this case the Haskell code offered is so amusingly obscure it refutes your thesis, so I guess it’s ok.

  18. Stefan Wagner Says:

    struggeling again with the wordpress-syntax - maybe it’s fixable, and maybe a permanent link is possible.

    public class PairMatcher
    {
    	public String [] pairs;
    	public PairMatcher (String [] pairs)
    	{
    		this.pairs = pairs;
    	}
    
    	/** reduce as long as it has an effect */
    	public boolean wellformed (String expression)
    	{
    		String old;
    		do
    		{
    			old = expression;
    			expression = reduce (expression);
    		} while (old.length () > (expression.length ()));
    
    		return expression.length () == 0;
    	} 
    
      	/**	replace a pair in the kern "([()])" => "([])"  	*/
    	public String reduce (String expression)
    	{
    		for (String pair : pairs)
    		{
    			if (expression.contains (pair))
    				return expression.replace (pair, "");
    		}
    		return expression;
    	}
    
    	public static void main (String[] args)
    	{
    		PairMatcher pm = new PairMatcher (new String [] {"[]", "()"});
    		for (String test : args)
    		{
    			System.out.println (test + "\t" + pm.wellformed (test));
    		}
    	}
    }
    

    and maybe I missed the topic of the lesson. :)

  19. Stefan Wagner Says:

    Mhm. Talking about wordpress:
    The <pre>

    foo
        bar
             code
        rab
    oof
    

    </pre>
    works quiet nice, but not in the preview. :)
    And I would like to use those nice colorizing.
    I tried {{syntax:java}} and [[code:java]] String foo=”foo”; [[/code]] without success…

  20. Spencer Tipping Says:

    If you’re willing to use ASCII values, then you can make the code algorithmically very simple by using the two least-significant bits as markers. The second-least significant bit is 1 for closers and 0 for openers, and the least-significant bit is 1 for square brackets and 0 for parens.

    (How do you get the code to indent?)

    import java.util.LinkedList;
    
    public class match {
      private static boolean chars_valid (char[] cs) {
        for (char c : cs)
          if ("()[]".indexOf (c) == -1)
            return false;
        return true;
      }
    
      private static boolean match_valid (char[] cs) {
        LinkedList<Character> state = new LinkedList<Character> ();
        for (char c : cs)
          if ((c & 0x2) == 0)
            state.addFirst (c);
          else if ((state.size () == 0) || (((state.remove () ^ c) & 0x1) != 0))
            return false;
        return state.size () == 0;
      }
    
      public static void main (String[] args) {
        for (String a : args)
          System.out.println (
            chars_valid (a.toCharArray ()) &&
            match_valid (a.replace ('(', '0').replace (')', '2')
                          .replace ('[', '1').replace (']', '3').toCharArray ()));
      }
    }
    

    This might be more elegant in C.

    Spencer

  21. cschneid Says:

    PHP:

    foreach (array_splice($argv, 1) as $arg)
    {
    	$tokens = array("()", "[]");
    	while (($new = str_replace($tokens, array(), $arg)) != $arg)
    		$arg = $new;
    	echo $arg === "" ? "true\n" : "false\n";
    }
    
  22. Tony Morris Says:

    So er thanks for all the comments everybody. There are no doubt more elegant solutions to this problem (heck, the existence of Parsec - QED). However, the “thesis” (Bob) is to observe compositional aspects of the programs.

  23. Java Programmer Says:

    What you write tells lot about you. Looking at your coding, at best it describes you as armature programmer. Do you really use f, s and j as variable names? May be when you are the only one to read your code (or never read again at all)

    If you write code like this in any organization you wouldn’t last to see a single full Friday in office

  24. Will Says:

    Yuck. What is this? Why hasn’t someone chimed in with the proper state machine solution? You guys need some Knuth in your life. Desperately so.

  25. Tony Morris Says:

    Who else would submit the charge of amateurism (armaturism?) for variable names, but a self-proclaimed “Java Programmer”. tehe, how amusing :)

    Will, the proper solution is to use a parser such as those submitted (e.g. monadic parsers as in the case of augustss and scala.util.parsing). I think the point of this post was lost and I should have made it clearer.

    I rushed it in the end, since it had been sitting around for a while. Oh well, next time.

  26. Will Says:

    Sweating the compositional aspects at this point won’t gain you an inch, as you haven’t completed your analysis of the problem at hand and grokked its true nature. None of the solutions are conducive to clear thought. The language politics are of no consequence. Everyone loses.

  27. Will Says:

    I agree that using a prepackaged parser makes sense in general. Sorry if I’m coming across like a pathological douche; I know you’re trying to make some weird point and it’s flying over my head.

  28. jb Says:

    Will, by the time you have contemplated the true nature of the stream, the water will have flowed on, and that stream will no longer be in front of you.

  29. Indefinite Articles » Balanced Tokens in Many Languages Says:

    [...] Tony Morris blogs about his solution to the problem of validating ‘balanced pairs’ of to… [...]

  30. DavidLG Says:

    A cute Scala solution using pattern matching:

    object Parser {
      def main(args: Array[String]) {
        def takeBalanced(s: List[Char]): Option[List[Char]] = s match {
          case Nil => Some(Nil);
          case ('(' :: ss) => takeBalanced(ss) flatMap {
            case (')' :: sss) => takeBalanced(sss); case _ => None }
          case ('[' :: ss) => takeBalanced(ss) flatMap {
            case (']' :: sss) => takeBalanced(sss); case _ => None }
          case ss => Some(ss);
        }
        for (s <- args) println(takeBalanced(s.toList) == Some(Nil));
      }
    }
    

    Btw, this to me is the “functional programming” solution, the one that uses recursion and immutable structures instead of iteration and a mutable stack. I don’t even have a name for the technique where there are “maps” and “folds” on every line, except maybe “unreadable”…

  31. Foo Says:

    Tony, your “thesis” is that Haskell is better than Java. It’s proudly announced in the title of your blog entry. the compositional bit is just tucked in at the very end of a long chunk of code. In your example you post an almost unreadable bit of Haskell, which happens to be shorter than an incompetent bit of Java writing. As I showed, it’s trivial to replace this with Java code which is rather better than your Haskell code from a readability and maintainability standpoint, and is almost as short. (Note that your blogmachine strips out double-minuses in comments, so what should have been “decrement j” got turned into “minus j”; in addition to the stripping out of formatting, bah, is this a coding blog or what?)

    I love Haskell. But as it turns out in for this problem, decent Java is far easier to read, maintain, and write, than decent Haskell. Perhaps that’s not what you wanted to say.

  32. James Iry Says:

    Will, this language is not regular so it can’t be recognized by any (finite) state machine. If it could be recognized by a state machine everybody would have written a regular expression and moved on.

    It can, however, be recognized by a pushdown automaton. Many of the solutions offered are based on that - some use an explicit stack and some use the call stack. The ones based on parsers treat it as a CFG which is more powerful than necessary but cleanly solves the problem.

  33. Spencer Tipping Says:

    This comes across very nicely in C as a purely-functional program. When properly commented, the code should be easy enough to read:

    #include <stdio.h>
    
    int type  (char c) { return c == '(' || c == ')'; }
    int open  (char c) { return c == '(' || c == '['; }
    int valid (char c) { return c == '(' || c == ')' || c == '[' || c == ']'; }
    
    char *matches (char current, char *rest) {
      // If the current character opens a group, then check if the first character
      // of the rest closes it. If so, then jump past the group. Otherwise, enter
      // one level.
    
      // Base case. If we get a null string, then return null.
      if (! rest) return NULL;
    
      // Base case. If we hit the end of the string, then the current character must
      // be null as well to indicate that we've closed everything.
      if (*rest == '') return (current == '') ? rest : NULL;
    
      // Check. If the current character is invalid, then reject.
      if (! valid (*rest)) return NULL;
    
      // Base case. If we open and immediately close, then return the character
      // after the closing paren.
      if (open (current) && ! open (*rest) && type (current) == type (*rest))
        return rest + 1;
    
      // Inductive case. If we open and do not immediately close, then step into the
      // parens. We must be prepared to come back out when that happens, though, so
      // pass the current character into the next run and jump past the closing
      // paren.
      if ((open (current) || current == '') && open (*rest))
        return matches (current, matches (*rest, rest + 1));
    
      // Base case. If nothing else works, then fail.
      return NULL;
    }
    
    int main (int argc, char **argv) {
      int i = 0;
      for (i = 1; i < argc; i++)
        printf ("%s\n", matches ('', argv[i]) ? "true" : "false");
      return 0;
    }
    

    This is more straightforward than trying to use imperative programming, since you get a stack for free.

    Spencer

  34. Kevin Says:

    @Foo: “decent Haskell” would use Parsec.

  35. Tony Morris Says:

    Thank you Kevin, I’m not sure why this needs stating. Can anyone talk about the compositional aspects of both programs already? Why can it all be abstracted to a monadic parser for example? Does the language matter in answering this question?

    This has really got out of hand and I didn’t expect it.

  36. Neal Gafter Says:

    Will: this problem can’t be solved with a state machine. This can be proven using the pumping lemma for regular languages. That is why the proposed solutions use push-down automata.

  37. Steve Cooper Says:

    If we’re interested in composability, I’d definitely want something more general than any of the listed solutions. (I’ll use what I think are haskell type annotations, but my haskell’s not so hot, so excuse any mistakes. I do C# by day.)

    First pass, a little more flexible, would be to have a function which accepts a set of brace pairs, with a signature like;

    [string]->string->bool.

    Or in C#;

    bool Parse(string input, IEnumerable bracePairs)

    which you partially-apply like;

    var tokens = new [] { “[]“, “{}” };
    Predicate bracketParser = args => Parse(args, tokens);

    so you can pass your list of open-close characters in. That would extend it to use any number of bracket pairs.

    More generally, though, why not extract the string data type entirely? The underlying idea, I think, is that we want to pair open and close operations on a sequence. So paren-matching is connected to these predicates (C#);

    char => char == ‘(’
    char => char == ‘)’

    But who cares about characters specifically? We can get to a general matcher, which will match paired open/close predicates in any sequence. I think the haskell type looks like [(a->bool, a->bool)]->a->bool. (That first type should be ‘list of pairs of predicates of a’. In C#

    bool Parse(
    IEnumerable
    inputSequence,
    IEnumerable<Pair<Predicate
    , Predicate> openClosePredicates)

    I’ve written a C# version which I think is fairly general. It’s on another machine, but I’ll post it if you’re interested.

    Basic idea, though, is to use a stack to match open and close predicates. So;

    Create a stack of close-predicates. These are, say, the close-brcakets we’re waiting for,

    Loop through the sequence.

    If you match an open-predicate to the current item (say, char => char == ‘(’) you push the paired close-predicate (say, char => char == ‘)’) onto the stack.

    If you can match the close-predicate at the top of the stack, great — pop it off, we’ve got paired brackets.

    Otherwise, you’ve got a mismatched sequence and return false.

    At the end, you return true iff you’ve got an empty stack.

    Anyway, this allows you to write general open/close parsers. If you wanted to write a parser for a VB-like language, you could pass in paired functions like

    statement => Regex.IsMatch(statement, “if .* then”)
    statement => statement == “end if”

    working on a sequence of lines of code.

    Floating around in the back of my head, half-baked, is the idea that maybe you could combine or pass parsers to each other somehow; something like;

    vbParser = Parser.Compose(IfBlockParser, ForLoopParser, WhileLoopParser)

    Hmm…

  38. Steve Cooper Says:

    Oops. The code that starts that link should look like

    bool Parse<A>(
    IEnumerable<A> inputSequence,
    IEnumerable<Pair<Predicate<A>, Predicate<A>> openClosePredicates)

  39. Bruce Chapman Says:

    java - Emphasising readability over efficiency

    public class BracketParser {
    
        public static void main(String[] args) {
            for(String arg : args) {
                System.out.println(parse(arg));
            }
        }
    
        public static boolean parse(String arg) {
           while(arg.length() !=0 && ( arg.contains("()") || arg.contains("[]"))) {
                arg = arg.replace("[]", "").replace("()", "");
            }
            return arg.length() == 0;
        }
    }
    
  40. aes Says:

    Spencer Tipping beat me to it with a purely functional C solution - ah well. Let’s compete in brevity then.

    A C solution:

    #include <stdio.h>
    
    char matching_parens[256] = { ['('] = ')', ['['] = ']' };
    
    const unsigned char* look_for(unsigned char expected, const unsigned char* str)
    {
        return (str == 0) ? 0
            :  (*str == expected) ? *str == 0 ? str : str + 1
            :  (matching_parens[*str]) ?
               look_for(expected, look_for(matching_parens[*str], str + 1))
            :  0;
    }
    
    int main(int argc, const char* argv[]) {
        while (++argv, --argc) {
            const unsigned char* result = look_for(0, *((unsigned char**) argv));
            printf((result != 0 && *result == 0) ? "true\n" : "false\n");
        }
        return 0;
    }
    
  41. aes Says:

    Trying to write a cleaner and better composed version of the above no less than triples the line count from 20 to 60. But at least it’s neatly split into three ‘modules’:

    Bracket configuration. Single point of configuration of valid opening and closing brackets in the initializer of char matching_brackets[]. Public interface:

    is_opening_bracket(char c)
    get_matching_closing_bracket(char c))

    The parsing module. Public interface:

    check(const char* str)

    The main program. Public interface:

    main(int argc, const char* argv[])

    #include <stdio.h>
    #include <assert.h>
    
    /* Section 1: configuration of brackets and accessors to it. */
    
    /* Mapping from opening brackets to closing brackets. */
    char matching_brackets[256] = {
        ['('] = ')',
        ['['] = ']'
    };
    
    int is_opening_bracket(char c)
    {
        return matching_brackets[(unsigned char) c] != '';
    }
    
    char get_matching_closing_bracket(char c)
    {
        assert(is_opening_bracket(c));
        return matching_brackets[(unsigned char) c];
    }
    
    /* Section 2: the parsing function itself. */
    
    /* Parse a sequence of valid start/end brackets followed by
     * 'expected'.*/
    const char* parse(char expected, const char* str)
    {
        return
                /* In case of NULL input, return NULL. */
                (str == NULL) ? NULL
                /* In case of succeeded match, return the rest of the string;
                 * as a special case, if 'expected' is 0, return pointer to
                 * it to signal success. */
            :   (*str == expected) ? *str == 0 ? str : str + 1
                /* In case of opening bracket, parse until its matching
                 * closing bracket and then continue from there */
            :   is_opening_bracket(*str) ?
                parse(expected, parse(get_matching_closing_bracket(*str), str + 1))
                /* In case of anything else, return NULL */
            :   NULL;
    }
    
    /* Is str a valid sequence of paired brackets? */
    int check(const char* str)
    {
        const char* result = parse(0, str);
        return (result != 0 && *result == 0);
    }
    
    /* Section 3: the main program. */
    
    /* Walk through all parameters and show parse results. */
    int main(int argc, const char* argv[])
    {
        while (++argv, --argc) {
            printf(check(*argv) ? "true\n" : "false\n");
        }
        return 0;
    }
    

    Unlike some might say, it would seem that brevity is occasionally the enemy of maintainability. Or is it just a daydream by us Blub programmers?

  42. Arnaud Bailly Says:

    Hello,
    Thank you for the problem. I proposed it for yesterday’s dojo in Paris and we have had a very interesting session. We came up with a simple solution based on replaceAll(), like one proposal among the preceding comments, except of course we did it TDD :-) An interesting point IMHO is that all of the people around the table (about 10) did manage to help making progress towards the solution, as they were not impaired by the syntax (java) nor the environment (Eclipse), beyond the first 10 minutes setup, although most of them were not coding in java daily. I regret to say we did not achieve the same “effectiveness” with Haskell, which seems to be quite hard to grasp for most people.

    Anyway, thanks again for this nice setting. Could you be a bit more clear about the composability issues you were aiming at ?

  43. Paul King Says:

    Groovy version of Bruce’s solution:

    def parse(s) { def t = s.replace(”[]“, “”).replace(”()”, “”); t == s ? t : parse(t) }
    args.each { println parse(it) == “” }

    A little less imperative:

    brackets = ["[]“, “()”]
    def parse2(s) { def t = brackets.inject(s){ a, b -> a.replace(b,”") }; t == s ? t : parse2(t) }

  44. Steven Shaw Says:

    Here’s my top-down parser version in Java.

    public class SimpleBraceParser {
    
      private int index = 0;
      private String s;
    
      public SimpleBraceParser(String s) { this.s = s; }
    
      private void skip(char c) {
        ++index;
      }
    
      private char currentChar() {
        if (index < s.length()) return s.charAt(index);
        else return ''; // Evil "sentinel" value
      }
    
      private void eat(char c) {
        if (currentChar() == c) {
          ++index;
          return;
        }
        throw new RuntimeException("parse error at " + index + ", expecting '" + c + "'");
      }
    
      public void parseRoundBrace() {
        skip('(');
        parseExpr();
        eat(')');
      }
    
      public void parseSquareBracket() {
        skip('[');
        parseExpr();
        eat(']');
      }
    
      public boolean parseExpr() {
        if (currentChar() == '(') {
          parseRoundBrace();
          return true;
        } else if (currentChar() == '[') {
          parseSquareBracket();
          return true;
        } else return false;
      }
    
      public void parseExprs() {
        while (parseExpr()) {
        }
      }
    
      public void parse() {
        parseExprs();
        if (currentChar() != '')
          throw new RuntimeException("not terminated correctly at " + index);
      }
    
      static boolean reportErrors = false;
    
      public static void main(String[] args) {
        for (String arg : args) {
          SimpleBraceParser p = new SimpleBraceParser(arg);
          try {
            p.parse();
            System.out.println("true");
          } catch (Exception e) {
            if (reportErrors) e.printStackTrace(System.err);
            System.out.println("false");
          }
        }
      }
    
    }
    

    Yep, it’s pretty long! I like top-down parsers as they are pretty intuitive. I’d use Antlr for any more serious parser work.

    I really like the look at the Haskell version with Parsec. I tested your Haskell, Scala and Java versions along with my Java version and the Parsec version (amongst others). Your Haskell version and Parsec are very fast - Parsec is faster :). I only used /usr/bin/time to test so the JVM startup time could have hampered the Java solutions. My recursive Java version and your Scala version require stack-size increases (on my large inputs).

    With the Scala version performance extremely poorly on large inputs. Pretty sure it doesn’t matter if the inputs are heavily nested or not. Maybe Scala isn’t the best language for functional style code. Are there any ways to improve the performance and stay functional?

  45. Krzysztof Kościuszkiewicz Says:

    I have to admit that your Haskell solution is far from being clear, and that doesn’t help you with geting your main point across.

    I’m posting a shorter and less dense solution to your problem below. Hope that helps :)


    import Control.Monad
    import System.Environment

    tokens = [('[',']‘), (’(',’)')]

    parse [] = Just []
    parse (c:cs) = case lookup c tokens of
    Nothing -> return (c:cs)
    Just end -> parse cs >>= consume end >>= parse

    consume c [] = Nothing
    consume c (x:xs) = guard (c == x) >> return xs

    main = getArgs >>= mapM (print . maybe False null . parse)

  46. Tony Morris Says:

    Actually, the fact that you can post a neater solution was exactly the point, but that was lost a long time ago. The neater solution can be derived from the less neat solution by replacing lower-order expressions with higher-order expressions and kapoof! Let there be Parsec.

Leave a Reply