Apache Commons Text WordUtils

Introduction

Apache Commons Text provides basic classes for working with text (strings). Its WordUtils class has static utility methods that operate on strings containing words. In this post, we will learn about the Apache Commons Text WordUtils class.

Importing Apache Commons Text

You can import Apache Commons Text into your Maven project as:

<!-- https://mvnrepository.com/artifact/org.apache.commons/commons-text -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-text</artifactId>
    <version>1.9</version>
</dependency>

For Gradle, use:

implementation 'org.apache.commons:commons-text:1.9'

Note: Replace 1.9 with the latest version available.

WordUtils#abbreviate

The static abbreviate method of WordUtils abbreviates a string. It takes the string to abbreviate, a lower limit, an upper limit and a string to append (appendToEnd, which may be null or empty), and returns the abbreviated string. It truncates the string at the first space at or after the lower limit and appends the appendToEnd string to the result.
We can use the upper limit argument to forcibly cut the string off at a given length. If we don’t want to specify an upper limit, we can pass -1 as the upper limit.

The lower limit acts as the minimum length we want the abbreviated string to have, so the method looks for the first space at or after the lower limit. Example: If we specify the lower limit as 5, it will search for the first space starting at index 5 (i.e., after index 4) and truncate the string there. Let us look at a few examples to understand how this works:

String message = "We are using WordUtils from Apache Commons Text library";
System.out.println(WordUtils.abbreviate(message, 21, -1, ""));

The result is:

We are using WordUtils

With a lower limit of 21, we are asking it to abbreviate the passed string while keeping at least 21 characters. Taking just the first 21 characters would give us “We are using WordUtil”. But the result has to end on a complete word, and hence we end up with the output “We are using WordUtils”.

System.out.println(WordUtils.abbreviate(message, 22, -1, "")); //We are using WordUtils

Passing 22 as the lower limit results in the same output.

System.out.println(WordUtils.abbreviate(message, 23, -1, ""));

Results in:

We are using WordUtils from

Reason: With a lower limit of 23, it has to include the space after the word “WordUtils” and hence it picks the entire word following it.

Passing 0 as the lower limit,

System.out.println(WordUtils.abbreviate(message, 0, -1, "")); //We

Passing a string suffix

When passing the appendToEnd argument, it adds it to the end of the abbreviated string.

System.out.println(WordUtils.abbreviate(message, 18, -1, " from Apache")); //We are using WordUtils from Apache

Passing an upper limit

Let us pass an upper limit to forcibly abbreviate the string.

System.out.println(WordUtils.abbreviate(message, 18, 19, "")); //We are using WordUt

In the above call, the 18th character is U and 19th character is t in the word WordUtils. Hence, it forcibly stops the abbreviation at the 19th character so that the result will be of length 19.

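We saw an example where we passed 23 as the lower limit, which returned the string up to the end of the word “from”. If we also pass 23 as the upper limit, the result is cut off at 23 characters, so it ends with the space after the word WordUtils (Note: The 23rd character is the space after the word WordUtils).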

System.out.println(WordUtils.abbreviate(message, 23, 23, "")); //We are using WordUtils 

Let us pass 24 as the upper limit. Then it will include the first character of the word following the space (after the word WordUtils) which is f. Since it is an upper limit, the abbreviation stops there.

System.out.println(WordUtils.abbreviate(message, 23, 24, "")); //We are using WordUtils f

Passing 0 as both the lower and the upper limit results in an empty string.

System.out.println(WordUtils.abbreviate(message, 0, 0, "")); //""

Some corner cases

If the upper limit is greater than the length of the string, it is adjusted to the length of the string.

System.out.println(WordUtils.abbreviate(message, 100, 200, "")); 

Outputs,

We are using WordUtils from Apache Commons Text library

On the other hand, if the upper limit is lower than the lower limit, then it throws an IllegalArgumentException.

//throws java.lang.IllegalArgumentException: upper value is less than lower value
System.out.println(WordUtils.abbreviate(message, 18, 1, ""));
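To tie the rules above together, here is a minimal plain-JDK sketch of how the lower limit, the upper limit and the suffix interact. It is not the library's source code: the class name AbbreviateSketch and its method are hypothetical, and the logic is an assumption reconstructed from the behaviour shown in the examples above.

```java
// Illustrative sketch only: AbbreviateSketch is a hypothetical class, not part of Commons Text.
final class AbbreviateSketch {

    static String abbreviate(String str, int lower, int upper, String appendToEnd) {
        if (upper < -1) {
            throw new IllegalArgumentException("upper value cannot be less than -1");
        }
        if (upper != -1 && upper < lower) {
            throw new IllegalArgumentException("upper value is less than lower value");
        }
        if (str == null || str.isEmpty()) {
            return str;
        }
        int length = str.length();
        // Limits beyond the end of the string are clamped; -1 means "no upper limit".
        int from = Math.min(lower, length);
        int to = (upper == -1 || upper > length) ? length : upper;
        String suffix = (appendToEnd == null) ? "" : appendToEnd;

        // First space at or after the lower limit.
        int space = str.indexOf(' ', from);
        if (space == -1) {
            // No space left: keep up to the upper limit, add the suffix only if text was cut.
            return to == length ? str : str.substring(0, to) + suffix;
        }
        // Cut at the space, unless the upper limit comes first.
        return str.substring(0, Math.min(space, to)) + suffix;
    }

    public static void main(String[] args) {
        String message = "We are using WordUtils from Apache Commons Text library";
        System.out.println(abbreviate(message, 21, -1, "")); // We are using WordUtils
        System.out.println(abbreviate(message, 18, 19, "")); // We are using WordUt
    }
}
```

The two calls in main reproduce two of the results shown earlier, which is a quick way to check one's understanding of the limits before relying on the real method.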

WordUtils#capitalize and capitalizeFully

The capitalize method takes a string and capitalizes each of the individual words in it (words are sequences of characters separated by whitespace).

System.out.println(WordUtils.capitalize("learning java is great"));
System.out.println(WordUtils.capitalize("learning java Is great"));

Result:

Learning Java Is Great
Learning Java Is Great

In the above example, to capitalize the string, it converts the first character in each word to upper case.

It provides an overloaded method where we can pass a set of delimiter characters that mark where a new word starts (the default is whitespace). As an example, let us say the words are delimited by a semi-colon and a colon.

System.out.println(WordUtils.capitalize("learning;java;is;great", ';'));
System.out.println(WordUtils.capitalize("learning;java;is:great", ';',':'));

For the first call, we pass ; (semi-colon) as the delimiter. In the second call, we pass both semi-colon and a colon as the delimiter (the argument type is a var-args).

The output is,

Learning;Java;Is;Great
Learning;Java;Is:Great

However, it only converts the first character of each word to uppercase. It doesn’t convert the rest of the characters in the word to lowercase (if there are upper case characters). Example:

System.out.println(WordUtils.capitalize("learning java Is gReat"));

Results in:

Learning Java Is GReat

It didn’t convert the second character (R) to lowercase. To do that, we have to use capitalizeFully.

WordUtils#capitalizeFully

The capitalizeFully method converts all whitespace-separated words in a string into capitalized words. Each word will consist of a title case character followed by a series of lowercase characters.

System.out.println(WordUtils.capitalizeFully("learning java Is gReat"));

It produces “Learning Java Is Great” as the capitalized result.

The capitalizeFully method also supports using different characters as delimiters, as shown below:

System.out.println(WordUtils.capitalizeFully("learning;java;iS:grEAT", ';',':')); //Learning;Java;Is:Great
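The difference between the two methods can be illustrated with a small plain-JDK sketch. The class CapitalizeSketch and its helper isDelimiter are hypothetical names, and the code is only an approximation of the behaviour described above (for instance, it assumes that passing no delimiters means "split on whitespace" and it ignores characters outside the Basic Multilingual Plane); it is not the library's implementation.

```java
import java.util.Locale;

// Illustrative sketch only: CapitalizeSketch is a hypothetical class, not part of Commons Text.
final class CapitalizeSketch {

    // Assumption for this sketch: with no delimiters given, whitespace separates words.
    static boolean isDelimiter(char c, char[] delimiters) {
        if (delimiters.length == 0) {
            return Character.isWhitespace(c);
        }
        for (char d : delimiters) {
            if (c == d) {
                return true;
            }
        }
        return false;
    }

    // Title-cases the first character of every word and leaves the rest untouched.
    static String capitalize(String str, char... delimiters) {
        StringBuilder out = new StringBuilder(str.length());
        boolean startOfWord = true;
        for (char c : str.toCharArray()) {
            if (isDelimiter(c, delimiters)) {
                out.append(c);
                startOfWord = true;
            } else if (startOfWord) {
                out.append(Character.toTitleCase(c));
                startOfWord = false;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Lower-cases everything first, so only the first character of each word stays capitalized.
    static String capitalizeFully(String str, char... delimiters) {
        return capitalize(str.toLowerCase(Locale.ROOT), delimiters);
    }

    public static void main(String[] args) {
        System.out.println(capitalize("learning java Is gReat"));      // Learning Java Is GReat
        System.out.println(capitalizeFully("learning java Is gReat")); // Learning Java Is Great
    }
}
```

In this sketch, capitalizeFully is simply "lowercase, then capitalize", which is why the stray upper case R disappears only in the second call.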

WordUtils#containsAllWords

The containsAllWords method takes a CharSequence (string) and a var-args of CharSequences (words). It checks if the string contains all the words in the passed var-args (array).

System.out.println(WordUtils.containsAllWords("Word Utils", "Utils", "Word")); //true
System.out.println(WordUtils.containsAllWords("Word Utils", "Utils", "Word", "Apache")); //false

In the first call, the passed string is “Word Utils”. It has all the words passed as the second parameter (var-args), i.e., “Utils” and “Word”. Hence, it returns true.
However, in the second call, the string (“Word Utils”) doesn’t have the word “Apache” in it and hence it returns false.

Note that it only checks whether the string has all the words, so the string can have additional words.

System.out.println(WordUtils.containsAllWords("Word Utils from Apache", "Utils", "Word")); //true

The passed string has the words “Utils” and “Word” and hence it returns true (though the string has two additional words, “from” and “Apache”).

The comparison is case-sensitive. In the below example, both calls return false.

System.out.println(WordUtils.containsAllWords("Word Utils", "utils", "Word")); //false
System.out.println(WordUtils.containsAllWords("WordUtils", "Utils", "Word")); //false

  • In the first case, the case of the word “utils” doesn’t match.
  • In the second case, the words are not present as two separate words in the string.
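The whole-word, case-sensitive matching described above can be modelled with regular-expression word boundaries. The following plain-JDK sketch uses a hypothetical class, ContainsAllWordsSketch, and is an assumption about how such a check could be written, not the library's source code.

```java
import java.util.regex.Pattern;

// Illustrative sketch only: ContainsAllWordsSketch is a hypothetical class, not part of Commons Text.
final class ContainsAllWordsSketch {

    static boolean containsAllWords(CharSequence text, CharSequence... words) {
        if (text == null || text.length() == 0 || words == null || words.length == 0) {
            return false;
        }
        for (CharSequence word : words) {
            if (word == null || word.toString().trim().isEmpty()) {
                return false;
            }
            // \b is a regex word boundary; Pattern.quote treats the word as literal text.
            Pattern wholeWord = Pattern.compile("\\b" + Pattern.quote(word.toString()) + "\\b");
            if (!wholeWord.matcher(text).find()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(containsAllWords("Word Utils", "Utils", "Word")); // true
        System.out.println(containsAllWords("WordUtils", "Utils", "Word"));  // false
    }
}
```

The second call returns false because there is no word boundary between “Word” and “Utils” inside “WordUtils”, which mirrors the second bullet above.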

WordUtils#initials

The initials method extracts the initial character (first character) of each word in the passed String. It joins the initial characters and returns a new string (their case is not changed).

System.out.println(WordUtils.initials("Learning Java Is Great")); //LJIG
System.out.println(WordUtils.initials("Learning Java is great")); //LJig

The first characters of each word – L, J, I and G are concatenated and returned as a string.

An overloaded method allows us to pass a different set of delimiters.

System.out.println(WordUtils.initials("Learning;Java;Is;Great", ';')); //LJIG
System.out.println(WordUtils.initials("Learning;Java;Is:Great", ';', ':')); //LJIG
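As a rough illustration of the idea, the plain-JDK sketch below collects the first character after each delimiter. InitialsSketch and isDelimiter are hypothetical names, and the sketch assumes that an empty delimiter list means whitespace; the real method may treat edge cases differently.

```java
// Illustrative sketch only: InitialsSketch is a hypothetical class, not part of Commons Text.
final class InitialsSketch {

    // Assumption for this sketch: with no delimiters given, whitespace separates words.
    static boolean isDelimiter(char c, char[] delimiters) {
        if (delimiters.length == 0) {
            return Character.isWhitespace(c);
        }
        for (char d : delimiters) {
            if (c == d) {
                return true;
            }
        }
        return false;
    }

    // Keeps the first character of every word, with its case unchanged.
    static String initials(String str, char... delimiters) {
        StringBuilder out = new StringBuilder();
        boolean startOfWord = true;
        for (char c : str.toCharArray()) {
            if (isDelimiter(c, delimiters)) {
                startOfWord = true;
            } else if (startOfWord) {
                out.append(c);
                startOfWord = false;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(initials("Learning Java is great"));           // LJig
        System.out.println(initials("Learning;Java;Is:Great", ';', ':')); // LJIG
    }
}
```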

WordUtils#swapCase

The swapCase method swaps the case of each character in a string. It performs the following conversions:

  • Upper case character to lower case.
  • Title case character to lower case.
  • Lower case character after whitespace or at start to title case.
  • Other lower case character to upper case.

Note that upper case and title case are not the same. For example, consider the dž character (it is one character). Its upper case form is DŽ and the title case form is Dž. (Reference: StackOverflow post on upper case and title case).

System.out.println(WordUtils.swapCase("Learning WordUtils"));
System.out.println(WordUtils.swapCase("learning wordutils")); 

The output is:

lEARNING wORDuTILS
LEARNING WORDUTILS
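The four conversion rules listed above can be written down directly with the JDK's Character methods. The sketch below is a hypothetical SwapCaseSketch class that follows those rules as stated; it is meant as an illustration and should not be read as the library's exact implementation.

```java
// Illustrative sketch only: SwapCaseSketch is a hypothetical class, not part of Commons Text.
final class SwapCaseSketch {

    static String swapCase(String str) {
        StringBuilder out = new StringBuilder(str.length());
        boolean afterWhitespace = true; // the start of the string counts as "after whitespace"
        for (int i = 0; i < str.length(); ) {
            int cp = str.codePointAt(i);
            int swapped;
            if (Character.isUpperCase(cp) || Character.isTitleCase(cp)) {
                // Upper case and title case characters become lower case.
                swapped = Character.toLowerCase(cp);
                afterWhitespace = false;
            } else if (Character.isLowerCase(cp)) {
                // Lower case becomes title case at the start of a word, upper case elsewhere.
                swapped = afterWhitespace ? Character.toTitleCase(cp) : Character.toUpperCase(cp);
                afterWhitespace = false;
            } else {
                swapped = cp;
                afterWhitespace = Character.isWhitespace(cp);
            }
            out.appendCodePoint(swapped);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(swapCase("Learning WordUtils")); // lEARNING wORDuTILS
        System.out.println(swapCase("learning wordutils")); // LEARNING WORDUTILS
    }
}
```

For plain ASCII letters, title case and upper case are the same, so the distinction only becomes visible with characters such as dž (U+01C6), which this sketch turns into Dž (U+01C5) at the start of a word and DŽ (U+01C4) elsewhere.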

WordUtils#uncapitalize

The uncapitalize method uncapitalizes all words in a string. It only changes the first character of each word in the string.

System.out.println(WordUtils.uncapitalize("Learning Java Is Great"));
System.out.println(WordUtils.uncapitalize("Learning java Is great"));

It converts the first character of each word to lower case (if it isn’t already one). The output of above code is:

learning java is great
learning java is great

We can pass a var-args of delimiters if we want to identify words using a delimiter other than whitespace.

System.out.println(WordUtils.uncapitalize("Learning;Java;Is;Great", ';')); //learning;java;is;great
System.out.println(WordUtils.uncapitalize("Learning;Java;Is:Great", ';', ':')); //learning;java;is:great

Note that it will not convert the rest of the characters (other than the first character) to lowercase. It leaves them as they are.

System.out.println(WordUtils.uncapitalize("Learning Java Is grEAT")); //learning java is grEAT

WordUtils#wrap

The wrap method wraps a single line of text into multiple lines based on the wrapLength passed. The wrapLength is the length/size of the column to wrap the words at. Long words (like URLs) will not be wrapped. An example will make it clear:

String message = "We are using WordUtils from Apache Commons Text library";
System.out.println(WordUtils.wrap(message, 19));

We are wrapping the message with a wrapLength of 19, i.e., no line will have more than 19 characters. This results in:

We are using
WordUtils from
Apache Commons Text
library

Note that the length of the substring “Apache Commons Text” is 19. If we used a wrap length of 18, then the result would be,

System.out.println(WordUtils.wrap(message, 18));

The output is:

We are using
WordUtils from
Apache Commons
Text library

Now, it can no longer accommodate the word “Text” on the same line as the wrapLength is 18.

Let us see what will happen if we use a lower wrapLength.

System.out.println(WordUtils.wrap(message, 5));

The output is:

We
are
using
WordUtils
from
Apache
Commons
Text
library

In this case, each word is on its own line. Note that it doesn’t break words to adhere to the passed wrapLength and hence some lines have more than 5 characters.

Newline string and wrap long words parameters

By default, wrap uses the system line separator. It has an overloaded method where we can pass a string to use as the newline string. In addition to that, we can pass a boolean flag that denotes whether it should wrap long words.

System.out.println(WordUtils.wrap(message, 19, "\n***\n", false));

Here we are passing a string that will be used as the newline string. The output is:

We are using
***
WordUtils from
***
Apache Commons Text
***
library

When passing true for the wrapLongWords parameter, it will break longer words.

System.out.println(WordUtils.wrap(message, 5, null, true));

The output is:

We
are
using
WordU
tils
from
Apach
e
Commo
ns
Text
libra
ry

Here, no row has more than 5 characters. For this, it had to break the words.
Note: When we pass null as the newLine string, it will use the system property line separator.
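To see how the wrap length, the newline string and the long-word flag fit together, here is a simplified plain-JDK sketch of a greedy word wrap. WrapSketch and its parameter names are hypothetical, and the sketch makes simplifying assumptions: it only splits on spaces and it collapses runs of spaces, so it should be read as an approximation of the behaviour shown above rather than the library's algorithm.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: WrapSketch is a hypothetical class, not part of Commons Text.
final class WrapSketch {

    static String wrap(String text, int wrapLength, String newLine, boolean breakLongWords) {
        int width = Math.max(1, wrapLength); // guard against zero or negative widths
        String separator = (newLine == null) ? System.lineSeparator() : newLine;

        // Split into words; optionally chop over-long words into width-sized chunks.
        List<String> pieces = new ArrayList<>();
        for (String word : text.trim().split(" +")) {
            while (breakLongWords && word.length() > width) {
                pieces.add(word.substring(0, width));
                word = word.substring(width);
            }
            pieces.add(word);
        }

        // Greedily pack the pieces into lines of at most `width` characters.
        StringBuilder out = new StringBuilder();
        int lineLength = 0;
        for (String piece : pieces) {
            if (lineLength > 0 && lineLength + 1 + piece.length() > width) {
                out.append(separator);
                lineLength = 0;
            } else if (lineLength > 0) {
                out.append(' ');
                lineLength++;
            }
            out.append(piece);
            lineLength += piece.length();
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String message = "We are using WordUtils from Apache Commons Text library";
        System.out.println(wrap(message, 19, "\n***\n", false));
        System.out.println();
        System.out.println(wrap(message, 5, null, true));
    }
}
```

With breakLongWords set to false, a word longer than the width simply stays on a line of its own, which is why lines can exceed the wrap length in that mode.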

The wrapOn parameter

There is another overloaded wrap method to which we can pass a wrapOn regular expression. We use it to specify the regex to find breakable characters. If we pass an empty string, it will use the space character as default.

Example: Let us say the string value is “This/That”. Shown below are the results of using wrapLength of 4 with wrapLongWords set to true and false.

String message = "This/That";
System.out.println(WordUtils.wrap(message, 4, null, true));
System.out.println();
System.out.println(WordUtils.wrap(message, 4, null, false));

The output is:

This
/Tha
t

This/That

In the first call, we configured it to break/wrap longer words. In the second, we didn’t and hence it printed the string as it is.
To break the words on the forward slash (/), we can pass a regex to identify it as the wrapOn parameter i.e., it identifies words by the wrapOn regex and not whitespace.

System.out.println(WordUtils.wrap(message, 4, null, false, "/"));

Prints,

This
That

Conclusion

In this post, we learnt about the Apache Commons Text WordUtils class. Check out the other useful, must-know utilities from Apache Commons.
