Google Guava CharMatcher

Introduction

The Google Guava CharMatcher class is a Predicate for character values, and it also provides basic text processing methods. In this post, we will learn about the Google Guava CharMatcher class.

Google Guava CharMatcher

A CharMatcher represents a particular class of characters, such as digits, whitespace or any arbitrary set of characters. It implements Predicate<Character>, so it can answer the question: is a given character in the set of characters this CharMatcher instance represents?

It also offers many text processing methods on strings based on the matching characters. Here “matching characters” refers to any character value c for which this.matches(c) returns true.

Building a CharMatcher instance

The CharMatcher class provides many static factory methods that return a CharMatcher instance. Let us look at them.

any()

CharMatcher.any() returns a CharMatcher that matches any character. As shown below, it returns true for all sorts of characters.

CharMatcher any = CharMatcher.any();
System.out.println(any.test('c')); //true
System.out.println(any.test('1')); //true
System.out.println(any.test('¶')); //true

none()

As the inverse of the any() method, we have the none() method. It returns a CharMatcher which doesn’t match any character.

CharMatcher none = CharMatcher.none();
System.out.println(none.test('c')); //false
System.out.println(none.test('1')); //false

ascii()

The CharMatcher returned by the ascii() method checks whether a character is ASCII, i.e., whether its code point is less than 128.

CharMatcher ascii = CharMatcher.ascii();
System.out.println(ascii.test(' ')); //true
System.out.println(ascii.test('c')); //true
System.out.println(ascii.test('1')); //true
System.out.println(ascii.test('¶')); //false
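The ASCII rule is a single comparison. As a plain-JDK illustration (not Guava's implementation), the hypothetical AsciiSketch class below expresses the same check as a java.util.function.Predicate<Character>:

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of the rule behind CharMatcher.ascii(); not Guava's code.
final class AsciiSketch {
    // A char is ASCII when its numeric value is below 128 (0x00 to 0x7F).
    static final Predicate<Character> ASCII = c -> c < 128;

    public static void main(String[] args) {
        System.out.println(ASCII.test(' '));      // true  (value 32)
        System.out.println(ASCII.test('c'));      // true  (value 99)
        System.out.println(ASCII.test('\u00B6')); // false (the pilcrow sign, value 182)
    }
}
```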

anyOf()

The anyOf() method takes a CharSequence and returns a CharMatcher that can match any of those characters. For example, let us create a CharMatcher which can match the characters ‘a’, ‘b’, ‘c’, ‘d’, ‘1’ and ‘2’.

CharMatcher anyOf = CharMatcher.anyOf("abcd12");
System.out.println(anyOf.matches('a')); //true
System.out.println(anyOf.matches('d')); //true
System.out.println(anyOf.matches('1')); //true
System.out.println(anyOf.matches('2')); //true

System.out.println(anyOf.matches('e')); //false
System.out.println(anyOf.matches('3')); //false

As shown above, it doesn’t match the characters ‘e’ and ‘3’.
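Conceptually, anyOf() is a membership test against the characters of the sequence we pass in. Below is a minimal plain-JDK sketch of that idea, using java.util.function.Predicate<Character> as a stand-in for CharMatcher. The AnyOfSketch class and its anyOf helper are hypothetical, and Guava's own implementation may well differ.

```java
import java.util.function.Predicate;

// Hypothetical sketch of the membership test performed by CharMatcher.anyOf(); not Guava's code.
final class AnyOfSketch {
    static Predicate<Character> anyOf(CharSequence chars) {
        // Copy the sequence once so later changes to a mutable CharSequence do not affect the predicate.
        String set = chars.toString();
        // A character matches if it occurs anywhere in the copied set.
        return c -> set.indexOf(c.charValue()) >= 0;
    }

    public static void main(String[] args) {
        Predicate<Character> matcher = anyOf("abcd12");
        System.out.println(matcher.test('a')); // true
        System.out.println(matcher.test('2')); // true
        System.out.println(matcher.test('e')); // false
        System.out.println(matcher.test('3')); // false
    }
}
```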

is() and isNot()

The is() method takes a character and returns a CharMatcher that matches only that character. On the other hand, isNot() returns a CharMatcher that matches all characters except the passed character. An example is shown below.

CharMatcher is = CharMatcher.is('a');
System.out.println(is.matches('a')); //true
System.out.println(is.matches('b')); //false

CharMatcher isNot = CharMatcher.isNot('a');
System.out.println(isNot.matches('a')); //false
System.out.println(isNot.matches('b')); //true
System.out.println(isNot.matches('1')); //true

inRange()

The inRange() method takes two characters, a start and an end, and returns a char matcher that matches any character in that range (both endpoints are inclusive). Below, we create a char matcher for the range ‘b’ to ‘e’, so it matches only the characters ‘b’, ‘c’, ‘d’ and ‘e’.

CharMatcher isRange = CharMatcher.inRange('b', 'e');

System.out.println(isRange.matches('a')); //false
System.out.println(isRange.matches('b')); //true
System.out.println(isRange.matches('d')); //true
System.out.println(isRange.matches('e')); //true
System.out.println(isRange.matches('f')); //false
System.out.println(isRange.matches('1')); //false

If the end character is less than the start character, it throws an IllegalArgumentException.
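To make the inclusive endpoints and the argument check concrete, here is a plain-JDK sketch of the same semantics. The InRangeSketch class is hypothetical, java.util.function.Predicate<Character> stands in for CharMatcher, and this is not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of CharMatcher.inRange() semantics; not Guava's code.
final class InRangeSketch {
    static Predicate<Character> inRange(char startInclusive, char endInclusive) {
        // Reject a reversed range (end before start) up front.
        if (endInclusive < startInclusive) {
            throw new IllegalArgumentException(
                "end '" + endInclusive + "' is before start '" + startInclusive + "'");
        }
        // Both endpoints are inclusive.
        return c -> c >= startInclusive && c <= endInclusive;
    }

    public static void main(String[] args) {
        Predicate<Character> bToE = inRange('b', 'e');
        System.out.println(bToE.test('b')); // true
        System.out.println(bToE.test('e')); // true
        System.out.println(bToE.test('f')); // false
        try {
            inRange('e', 'b');
        } catch (IllegalArgumentException expected) {
            System.out.println("reversed range rejected");
        }
    }
}
```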

Operations on CharMatcher

So far, we have looked at the static factory methods of the CharMatcher class. Let us now see the operations we can perform on a CharMatcher instance.

Methods inherited from Predicate

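Since a CharMatcher is a Predicate<Character>, we can call test() on it like on any other Predicate. CharMatcher also provides its own and(), or() and negate() methods, which return a CharMatcher, so the result still supports all the CharMatcher operations.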

and()

The and() method takes another CharMatcher and returns a new char matcher that matches only the characters matched by both matchers.

CharMatcher aToF = CharMatcher.inRange('a', 'f');
CharMatcher bToE = aToF.and(CharMatcher.inRange('b', 'e'));
System.out.println(bToE.matches('a')); //false
System.out.println(bToE.matches('b')); //true
System.out.println(bToE.matches('e')); //true
System.out.println(bToE.matches('f')); //false
  • First, we have an inRange() char matcher for characters ‘a’ to ‘f’.
  • We and() it with an inRange matcher for characters ‘b’ to ‘e’. 
  • The result is a new matcher that matches the characters ‘b’ to ‘e’.
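In set terms, and() is an intersection: a character passes only if both matchers accept it. The plain-JDK sketch below shows the same behaviour with java.util.function.Predicate.and(); the AndSketch class and its range helper are hypothetical stand-ins for CharMatcher.inRange(), not Guava code.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch: and() behaves like set intersection. Not Guava's code.
final class AndSketch {
    // Stand-in for an inclusive character range.
    static Predicate<Character> range(char from, char to) {
        return c -> c >= from && c <= to;
    }

    public static void main(String[] args) {
        Predicate<Character> aToF = range('a', 'f');
        // A character passes only if BOTH predicates accept it.
        Predicate<Character> bToE = aToF.and(range('b', 'e'));
        // Prints false for 'a' and 'f', and true for 'b' to 'e'.
        for (char c = 'a'; c <= 'f'; c++) {
            System.out.println(c + " -> " + bToE.test(c));
        }
    }
}
```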

or()

The or() method returns a new char matcher that matches any character matched by either of the two matchers.

CharMatcher bToE = CharMatcher.inRange('b', 'e');
CharMatcher aToE = bToE.or(CharMatcher.is('a'));
System.out.println(aToE.matches('a')); //true
System.out.println(aToE.matches('b')); //true
System.out.println(aToE.matches('e')); //true
System.out.println(aToE.matches('f')); //false
  • We start with a char matcher for characters ‘b’ to ‘e’.
  • We or() it with a single char matcher for character ‘a’.
  • The resultant char matcher matches characters ‘a’ to ‘e’.
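Correspondingly, or() is a union. Here is the same example as a plain-JDK sketch built on java.util.function.Predicate.or(); the OrSketch class and range helper are hypothetical and only mirror the behaviour described above.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch: or() behaves like set union. Not Guava's code.
final class OrSketch {
    // Stand-in for an inclusive character range.
    static Predicate<Character> range(char from, char to) {
        return c -> c >= from && c <= to;
    }

    public static void main(String[] args) {
        Predicate<Character> bToE = range('b', 'e');
        // A character passes if EITHER predicate accepts it.
        Predicate<Character> aToE = bToE.or(c -> c == 'a');
        System.out.println(aToE.test('a')); // true
        System.out.println(aToE.test('b')); // true
        System.out.println(aToE.test('e')); // true
        System.out.println(aToE.test('f')); // false
    }
}
```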

negate()

The negate() method returns a char matcher that matches every character the original matcher does not match.

CharMatcher notBToE = CharMatcher.inRange('b', 'e')
                .negate();
System.out.println(notBToE.test('b')); //false
System.out.println(notBToE.test('a')); //true
System.out.println(notBToE.test('f')); //true
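negate() is the set complement, and negating twice gives back the original behaviour. A small plain-JDK sketch with java.util.function.Predicate.negate() illustrates both points; the NegateSketch class and range helper are hypothetical, not Guava code.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch: negate() behaves like set complement. Not Guava's code.
final class NegateSketch {
    // Stand-in for an inclusive character range.
    static Predicate<Character> range(char from, char to) {
        return c -> c >= from && c <= to;
    }

    public static void main(String[] args) {
        Predicate<Character> notBToE = range('b', 'e').negate();
        System.out.println(notBToE.test('b')); // false
        System.out.println(notBToE.test('a')); // true
        System.out.println(notBToE.test('f')); // true
        // Negating twice restores the original range check.
        System.out.println(notBToE.negate().test('c')); // true
    }
}
```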

countIn

The countIn() method takes a CharSequence and returns the number of matching characters in the passed sequence.

CharMatcher isRange = CharMatcher.inRange('b', 'e');
System.out.println(isRange.countIn("abdef")); //3
System.out.println(isRange.countIn("java")); //0

Here, the matching characters are ‘b’, ‘c’, ‘d’ and ‘e’. Hence, the first call to countIn() finds three matches (‘b’, ‘d’ and ‘e’) in the string “abdef”, while the second call finds no matches in the string “java”.
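What countIn() computes can be sketched as a single pass over the sequence. The version below is a plain-JDK approximation, with java.util.function.Predicate<Character> standing in for the matcher; the CountInSketch class is hypothetical and not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of what countIn() computes; not Guava's code.
final class CountInSketch {
    static int countIn(CharSequence sequence, Predicate<Character> matcher) {
        int count = 0;
        // Visit every character once and count the ones the matcher accepts.
        for (int i = 0; i < sequence.length(); i++) {
            if (matcher.test(sequence.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Predicate<Character> bToE = c -> c >= 'b' && c <= 'e';
        System.out.println(countIn("abdef", bToE)); // 3
        System.out.println(countIn("java", bToE));  // 0
    }
}
```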

indexIn

The indexIn() method takes a CharSequence and returns the index of the first matching character. It scans the sequence one character at a time from the start and returns the index of the first character the matcher matches. It returns -1 if no character matches.

CharMatcher charMatcher = CharMatcher.anyOf("abcde");
System.out.println(charMatcher.indexIn("java")); //1
System.out.println(charMatcher.indexIn("pear")); //1
System.out.println(charMatcher.indexIn("fig")); //-1

For the string “java”, the first match is at index 1 (character ‘a’). For the string “pear”, the first match is also at index 1 (character ‘e’), whereas no match is found in the string “fig”, so it returns -1.

There is another overloaded indexIn() method which also takes a start parameter and the search will start at this index rather than index 0.

System.out.println(charMatcher.indexIn("java", 2)); //3

Here, using the same char matcher, we start the search at index 2 and hence the first match is at index 3 (the last ‘a’ in “java”).
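Both indexIn() overloads can be sketched as one forward scan that starts at a given position. The IndexInSketch class below is hypothetical plain-JDK code with java.util.function.Predicate<Character> in place of CharMatcher. Its bounds check follows the contract that, as far as we know, Guava's Javadoc describes (an IndexOutOfBoundsException when start is negative or greater than the length), but it is not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of indexIn(sequence) and indexIn(sequence, start); not Guava's code.
final class IndexInSketch {
    static int indexIn(CharSequence sequence, int start, Predicate<Character> matcher) {
        // start == length is allowed and simply finds nothing.
        if (start < 0 || start > sequence.length()) {
            throw new IndexOutOfBoundsException("start " + start + " outside 0.." + sequence.length());
        }
        for (int i = start; i < sequence.length(); i++) {
            if (matcher.test(sequence.charAt(i))) {
                return i; // first match at or after start
            }
        }
        return -1; // no match
    }

    static int indexIn(CharSequence sequence, Predicate<Character> matcher) {
        return indexIn(sequence, 0, matcher);
    }

    public static void main(String[] args) {
        Predicate<Character> abcde = c -> "abcde".indexOf(c.charValue()) >= 0;
        System.out.println(indexIn("java", abcde));    // 1
        System.out.println(indexIn("pear", abcde));    // 1
        System.out.println(indexIn("fig", abcde));     // -1
        System.out.println(indexIn("java", 2, abcde)); // 3
    }
}
```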

lastIndexIn

Similar to indexIn(), but it returns the index of the last matching character (the search runs in reverse, from the end of the sequence).

CharMatcher charMatcher = CharMatcher.anyOf("abcde");
System.out.println(charMatcher.lastIndexIn("java")); //3

Since the search is done in reverse, the last index (3) was the first to match and hence it returned 3.
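The reverse search can be sketched as the same loop run backwards. This is a hypothetical plain-JDK version (LastIndexInSketch, with java.util.function.Predicate<Character> as the matcher), not Guava's code.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of what lastIndexIn() computes; not Guava's code.
final class LastIndexInSketch {
    static int lastIndexIn(CharSequence sequence, Predicate<Character> matcher) {
        // Walk backwards, so the first hit is the last matching position.
        for (int i = sequence.length() - 1; i >= 0; i--) {
            if (matcher.test(sequence.charAt(i))) {
                return i;
            }
        }
        return -1; // no match
    }

    public static void main(String[] args) {
        Predicate<Character> abcde = c -> "abcde".indexOf(c.charValue()) >= 0;
        System.out.println(lastIndexIn("java", abcde)); // 3
        System.out.println(lastIndexIn("fig", abcde));  // -1
    }
}
```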

matchesAllOf, matchesAnyOf and matchesNoneOf

These methods take a CharSequence as input.

  • matchesAllOf: Returns true if every character in the passed sequence matches.
  • matchesAnyOf: Returns true if at least one character in the passed sequence matches.
  • matchesNoneOf: Returns true only if the sequence has no matching characters.
CharMatcher charMatcher = CharMatcher.anyOf("abcde");

System.out.println(charMatcher.matchesAllOf("abcd")); //true
System.out.println(charMatcher.matchesAllOf("ef")); //false

System.out.println(charMatcher.matchesAnyOf("abcd")); //true
System.out.println(charMatcher.matchesAnyOf("ef")); //true
System.out.println(charMatcher.matchesAnyOf("xyz")); //false


System.out.println(charMatcher.matchesNoneOf("ef")); //false
System.out.println(charMatcher.matchesNoneOf("xyz")); //true
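The three checks are closely related, and the plain-JDK sketch below writes two of them in terms of the third. The MatchesSketch class is hypothetical and uses java.util.function.Predicate<Character> instead of CharMatcher; it only mirrors the behaviour described above.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of matchesNoneOf/matchesAnyOf/matchesAllOf; not Guava's code.
final class MatchesSketch {
    static boolean matchesNoneOf(CharSequence s, Predicate<Character> m) {
        for (int i = 0; i < s.length(); i++) {
            if (m.test(s.charAt(i))) {
                return false; // found a matching character
            }
        }
        return true;
    }

    static boolean matchesAnyOf(CharSequence s, Predicate<Character> m) {
        // "At least one match" is exactly the negation of "no match".
        return !matchesNoneOf(s, m);
    }

    static boolean matchesAllOf(CharSequence s, Predicate<Character> m) {
        // Every character matches exactly when no character fails to match.
        return matchesNoneOf(s, m.negate());
    }

    public static void main(String[] args) {
        Predicate<Character> abcde = c -> "abcde".indexOf(c.charValue()) >= 0;
        System.out.println(matchesAllOf("abcd", abcde)); // true
        System.out.println(matchesAllOf("ef", abcde));   // false
        System.out.println(matchesAnyOf("ef", abcde));   // true
        System.out.println(matchesNoneOf("xyz", abcde)); // true
        System.out.println(matchesAllOf("", abcde));     // true
        System.out.println(matchesAnyOf("", abcde));     // false
    }
}
```

One consequence of these definitions, which also holds for Guava's methods: for an empty sequence, matchesAllOf and matchesNoneOf both return true, while matchesAnyOf returns false.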

removeFrom

The removeFrom method accepts a CharSequence as an argument and returns a string which has only the non-matching characters. In other words, it removes all matching characters from the passed char sequence.

CharMatcher charMatcher = CharMatcher.anyOf("ab");
System.out.println(charMatcher.removeFrom("java")); //jv
System.out.println(charMatcher.removeFrom("abab")); // "" (empty string)
System.out.println(charMatcher.removeFrom("c++")); //c++

In the above example, the CharMatcher matches the characters ‘a’ and ‘b’. Hence, removeFrom() removes those characters from the passed char sequence and returns the remaining characters as a string.
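removeFrom() amounts to copying only the characters the matcher rejects. A plain-JDK sketch of that idea follows; the RemoveFromSketch class is hypothetical, java.util.function.Predicate<Character> stands in for CharMatcher, and it is not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of what removeFrom() computes; not Guava's code.
final class RemoveFromSketch {
    static String removeFrom(CharSequence s, Predicate<Character> m) {
        StringBuilder kept = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!m.test(c)) {
                kept.append(c); // keep only non-matching characters
            }
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        Predicate<Character> ab = c -> c == 'a' || c == 'b';
        System.out.println(removeFrom("java", ab)); // jv
        System.out.println(removeFrom("abab", ab)); // "" (empty string)
        System.out.println(removeFrom("c++", ab));  // c++
    }
}
```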

replaceFrom

The replaceFrom method takes a CharSequence and a replacement char and returns a string which has all the matching characters replaced by the passed replacement character.

Suppose the CharMatcher matches the characters ‘a’, ‘b’, ‘c’ and ‘d’, and the replacement character is ‘x’. Then replaceFrom() replaces every matching character with ‘x’.

CharMatcher charMatcher = CharMatcher.anyOf("abcd");
System.out.println(charMatcher.replaceFrom("java", 'x')); //jxvx
System.out.println(charMatcher.replaceFrom("c++", 'x')); //c++

There is another overloaded method which takes a string as a replacement.

System.out.println(charMatcher.replaceFrom("java", "aa")); //jaavaa
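Both replaceFrom() variants follow the same pattern: copy non-matching characters as they are and substitute the replacement for each match. The plain-JDK sketch below is hypothetical (ReplaceFromSketch, with java.util.function.Predicate<Character> as the matcher) and only mirrors the behaviour shown above, not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of the two replaceFrom() variants; not Guava's code.
final class ReplaceFromSketch {
    // Each matching character is replaced by a single character.
    static String replaceFrom(CharSequence s, char replacement, Predicate<Character> m) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            out.append(m.test(c) ? replacement : c);
        }
        return out.toString();
    }

    // Each matching character is replaced by the whole replacement sequence.
    static String replaceFrom(CharSequence s, CharSequence replacement, Predicate<Character> m) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (m.test(c)) {
                out.append(replacement);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Predicate<Character> abcd = c -> "abcd".indexOf(c.charValue()) >= 0;
        System.out.println(replaceFrom("java", 'x', abcd));  // jxvx
        System.out.println(replaceFrom("c++", 'x', abcd));   // c++
        System.out.println(replaceFrom("java", "aa", abcd)); // jaavaa
    }
}
```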

retainFrom

The retainFrom method is the inverse of removeFrom: it retains only the matching characters.

CharMatcher charMatcher = CharMatcher.anyOf("ab");
System.out.println(charMatcher.retainFrom("java")); //aa
System.out.println(charMatcher.retainFrom("abab")); // abab
System.out.println(charMatcher.retainFrom("c++")); //"" (empty string)
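Since retainFrom() is the inverse of removeFrom(), keeping the matches is the same as removing the non-matches. The hypothetical RetainFromSketch class below expresses exactly that in plain JDK code, using java.util.function.Predicate<Character> and its negate() method as stand-ins for CharMatcher; it is a sketch, not Guava's code.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch relating retainFrom() to removeFrom(); not Guava's code.
final class RetainFromSketch {
    static String removeFrom(CharSequence s, Predicate<Character> m) {
        StringBuilder kept = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!m.test(c)) {
                kept.append(c); // keep only non-matching characters
            }
        }
        return kept.toString();
    }

    static String retainFrom(CharSequence s, Predicate<Character> m) {
        // Keeping the matches is the same as removing the non-matches.
        return removeFrom(s, m.negate());
    }

    public static void main(String[] args) {
        Predicate<Character> ab = c -> c == 'a' || c == 'b';
        System.out.println(retainFrom("java", ab)); // aa
        System.out.println(retainFrom("abab", ab)); // abab
        System.out.println(retainFrom("c++", ab));  // "" (empty string)
    }
}
```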

trimLeadingFrom, trimTrailingFrom and trimFrom

All these methods take a CharSequence as the input.

  • trimLeadingFrom: Returns a substring of the passed char sequence with all matching characters removed from the beginning.
  • trimTrailingFrom: Returns a substring of the passed char sequence with all matching characters removed from the end.
  • trimFrom: A combination of the above two methods. It returns a substring of the passed char sequence with all matching characters removed from both the beginning and the end.
CharMatcher charMatcher = CharMatcher.anyOf("xy");
System.out.println(charMatcher.trimLeadingFrom("xyjavayx")); // javayx
System.out.println(charMatcher.trimTrailingFrom("xyjavayx"));// xyjava
System.out.println(charMatcher.trimFrom("xyjavayx")); //java

System.out.println(charMatcher.trimLeadingFrom("xy")); //""
System.out.println(charMatcher.trimTrailingFrom("xy")); //""
System.out.println(charMatcher.trimFrom("xy")); //""
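All three trim methods boil down to moving a start index forward and an end index backward past matching characters, then taking the substring in between. The plain-JDK sketch below is hypothetical (TrimSketch, with java.util.function.Predicate<Character> in place of CharMatcher) and is not Guava's implementation.

```java
import java.util.function.Predicate;

// Hypothetical plain-JDK sketch of the three trim methods; not Guava's code.
final class TrimSketch {
    static String trimLeadingFrom(CharSequence s, Predicate<Character> m) {
        int first = 0;
        // Skip matching characters at the start.
        while (first < s.length() && m.test(s.charAt(first))) {
            first++;
        }
        return s.subSequence(first, s.length()).toString();
    }

    static String trimTrailingFrom(CharSequence s, Predicate<Character> m) {
        int end = s.length();
        // Skip matching characters at the end.
        while (end > 0 && m.test(s.charAt(end - 1))) {
            end--;
        }
        return s.subSequence(0, end).toString();
    }

    static String trimFrom(CharSequence s, Predicate<Character> m) {
        int first = 0;
        int end = s.length();
        while (first < end && m.test(s.charAt(first))) {
            first++;
        }
        // Never move end past first, so an all-matching input yields "".
        while (end > first && m.test(s.charAt(end - 1))) {
            end--;
        }
        return s.subSequence(first, end).toString();
    }

    public static void main(String[] args) {
        Predicate<Character> xy = c -> c == 'x' || c == 'y';
        System.out.println(trimLeadingFrom("xyjavayx", xy));  // javayx
        System.out.println(trimTrailingFrom("xyjavayx", xy)); // xyjava
        System.out.println(trimFrom("xyjavayx", xy));         // java
        System.out.println(trimFrom("xy", xy));               // "" (empty string)
    }
}
```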

Conclusion

This brings us to the end of the post on the Google Guava CharMatcher class.

References

CharMatcher Google Guava wiki page
