StringTokenizer in Java


The StringTokenizer class in Java (java.util.StringTokenizer) breaks a string into tokens using a set of delimiter characters. In this post, we will learn about the various ways to create and use a StringTokenizer in Java.

What is Tokenizing

Tokenizing a string means breaking it into multiple tokens based on a specified delimiter. For example, take the string AB,CD,EF. If we tokenize it with a comma (",") as the delimiter, we get three tokens, namely AB, CD and EF.

Three ways to construct a StringTokenizer

The StringTokenizer class offers three constructors to build a StringTokenizer instance.

StringTokenizer with just the string

We build the StringTokenizer object by passing just the string to be tokenized. By default, it uses whitespace as the delimiter. Specifically, it uses the delimiter set " \t\n\r\f", which includes the:

  • space character
  • tab character
  • newline character
  • carriage-return character, and the
  • form-feed character

The delimiter characters themselves are not treated as tokens, i.e., they are not part of the returned tokens.

StringTokenizer with the string and the delimiter

This constructor accepts the string we want to tokenize and the delimiter characters (as a String). Again, the delimiter characters themselves will not be treated as tokens.

StringTokenizer with the string, the delimiter and returnDelims flag

This constructor accepts a flag called returnDelims in addition to the arguments of the second constructor (the string and the delimiter characters).

  • If returnDelims is true, the delimiter characters are also returned as tokens, each one as a string of length one.
  • If returnDelims is false, the delimiter characters are not returned as tokens; they only separate the tokens.
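To see the three constructors side by side, here is a small sketch that tokenizes the same input each way. The class ConstructorDemo and its helper joinTokens are hypothetical names made up for this illustration; the helper simply joins the tokens with | so that the token boundaries are visible.

```java
import java.util.StringJoiner;
import java.util.StringTokenizer;

class ConstructorDemo {

    // Hypothetical helper: drains the tokenizer and joins the tokens with '|'
    static String joinTokens(StringTokenizer tokenizer) {
        StringJoiner joiner = new StringJoiner("|");
        while (tokenizer.hasMoreTokens()) {
            joiner.add(tokenizer.nextToken());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String input = "ab cd,ef";

        // 1. Default delimiter set " \t\n\r\f": only the space splits this input
        System.out.println(joinTokens(new StringTokenizer(input)));            // ab|cd,ef

        // 2. Custom delimiter set: only the comma splits this input
        System.out.println(joinTokens(new StringTokenizer(input, ",")));       // ab cd|ef

        // 3. Custom delimiter set with returnDelims = true: the comma is returned as a token too
        System.out.println(joinTokens(new StringTokenizer(input, ",", true))); // ab cd|,|ef
    }
}
```

Note that with a custom delimiter set, the space is no longer a delimiter, so "ab cd" stays a single token.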

StringTokenizer methods

Before we look at example code, let us have a brief look at the methods present in the StringTokenizer class.

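  • countTokens(): Returns the number of tokens still left, i.e., how many more times nextToken() can be called before it throws an exception. It does not advance the tokenizer.
  • hasMoreTokens(): Returns true if at least one more token is available.
  • nextToken(): Returns the next token from the StringTokenizer.
  • nextToken(String newDelimiter): Changes the delimiter set of the StringTokenizer and returns the next token based on it. The new delimiter set stays in effect for later calls.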

StringTokenizer examples

To reduce the verbosity of some of the code samples, I'll use the static method printTokens(), shown below, to print the tokens from a StringTokenizer.

// Requires: import java.util.StringTokenizer;
// Prints the number of remaining tokens, then consumes and prints each token on its own line
public static void printTokens(StringTokenizer stringTokenizer) {
    System.out.println("Number of tokens: " + stringTokenizer.countTokens());

    while (stringTokenizer.hasMoreTokens()) {
        System.out.println(stringTokenizer.nextToken());
    }
}

Tokenizing with default delimiter set

If we pass just the string to tokenize, it uses the default whitespace delimiter set (described earlier).

StringTokenizer stringTokenizer = new StringTokenizer("ab cd ef");
System.out.println("Number of tokens: " + stringTokenizer.countTokens()); // 3
while (stringTokenizer.hasMoreTokens()) {
    System.out.println(stringTokenizer.nextToken());
}

It will use the space as the delimiter here and hence there are 3 tokens. This will print:

Number of tokens: 3
ab
cd
ef

Let us add a tab and a newline to the string.

StringTokenizer stringTokenizer = new StringTokenizer("ab cd ef\tgh \n ij  ");
System.out.println("Number of tokens: " + stringTokenizer.countTokens()); // 5
while (stringTokenizer.hasMoreTokens()) {
    System.out.println(stringTokenizer.nextToken());
}

Since the default delimiter set includes tabs and newlines, this will output five tokens.

Number of tokens: 5
ab
cd
ef
gh
ij

Tokenizing with a specified (custom) delimiter

Let us now see examples of using a StringTokenizer in Java with a custom delimiter set.

Single delimiter character

StringTokenizer stringTokenizer = new StringTokenizer("ab-cd-ef", "-");
printTokens(stringTokenizer);

The above example uses a hyphen (-) as the delimiter. This tokenizes the string as:

Number of tokens: 3
ab
cd
ef

If the same delimiter occurs several times in a row, the consecutive delimiters act as a single separator, so the result is the same and no empty tokens are produced.

StringTokenizer stringTokenizer = new StringTokenizer("ab--cd--ef", "-");
printTokens(stringTokenizer);

The above code prints:

Number of tokens: 3
ab
cd
ef
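This follows from how a token is defined: with returnDelims left false, a token is a maximal run of non-delimiter characters, so delimiters at the start or end of the string are skipped as well and StringTokenizer never produces an empty token. A quick sketch to check this (SkipDelimitersDemo and tokenCount are hypothetical names used only here):

```java
import java.util.StringTokenizer;

class SkipDelimitersDemo {

    // Hypothetical helper: counts the tokens without consuming any of them
    static int tokenCount(String input, String delimiters) {
        return new StringTokenizer(input, delimiters).countTokens();
    }

    public static void main(String[] args) {
        // Consecutive delimiters in the middle collapse into one separator
        System.out.println(tokenCount("ab--cd--ef", "-")); // 3

        // Leading and trailing delimiters are skipped too, so no empty tokens appear
        System.out.println(tokenCount("--ab--cd--", "-")); // 2

        // A string made only of delimiters has no tokens at all
        System.out.println(tokenCount("----", "-"));       // 0
    }
}
```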

Multiple delimiter characters

Let us specify multiple delimiter characters, each present at various positions in the input string. 

Let us pass the delimiter string as :- and the input string as ab:cd-ef-gh:i. This means that either a colon (:) or a hyphen (-) acts as a delimiter.

StringTokenizer stringTokenizer = new StringTokenizer("ab:cd-ef-gh:i", ":-");
printTokens(stringTokenizer);

We get 5 tokens as a result, and the above code prints:

Number of tokens: 5
ab
cd
ef
gh
i

Multiple delimiter characters consecutively

It doesn’t make a difference if the different delimiter characters are present consecutively.

StringTokenizer stringTokenizer = new StringTokenizer("ab:-cd:-ef:-gh:-i", ":-");
printTokens(stringTokenizer);

In the above example, both delimiter characters (: and -) are present together (one after another) in the string. The result is the same as in the previous case:

Number of tokens: 5
ab
cd
ef
gh
i

Tokenizing with a specified delimiter and with returnDelims as true

In this section, let us revisit the examples from the previous section, but with the returnDelims flag set to true.

Single delimiter character

Let us use a hyphen as the delimiter and pass true for the returnDelims flag when building the StringTokenizer instance.

StringTokenizer stringTokenizer = new StringTokenizer("ab-cd-ef", "-", true);
printTokens(stringTokenizer);

Setting this flag to true causes the delimiter characters to be returned as tokens as well. Hence, the above code produces five tokens (as opposed to three when we didn't set the returnDelims flag). It prints:

Number of tokens: 5
ab
-
cd
-
ef

When multiple hyphens appear consecutively, each hyphen is returned as a separate token.

StringTokenizer stringTokenizer = new StringTokenizer("ab--cd--ef", "-", true);
printTokens(stringTokenizer);

This prints:

Number of tokens: 7
ab
-
-
cd
-
-
ef

Multiple delimiter characters

Let us see the scenario when we have more than one delimiter.

StringTokenizer stringTokenizer = new StringTokenizer("ab:cd-ef-gh:i", ":-", true);
printTokens(stringTokenizer);

The above code includes the delimiters among the returned tokens. The result is:

Number of tokens: 9
ab
:
cd
-
ef
-
gh
:
i

When the delimiters : and - are present consecutively:

StringTokenizer stringTokenizer = new StringTokenizer("ab:-cd:-ef:-gh:-i", ":-", true);
printTokens(stringTokenizer);

This will result in a total of 13 tokens as shown below.

Number of tokens: 13
ab
:
-
cd
:
-
ef
:
-
gh
:
-
i
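One practical reason to keep the delimiters is when they carry meaning, such as the operators in an arithmetic expression. The sketch below is a toy example under stated assumptions: the input has no spaces and no unary minus, and evaluation is strictly left to right with no operator precedence. ExpressionTokensDemo, tokenize and evaluateLeftToRight are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ExpressionTokensDemo {

    // Splits an expression such as "12+34*2" into operands and operators.
    // returnDelims = true makes each operator character come back as its own token.
    static List<String> tokenize(String expression) {
        StringTokenizer tokenizer = new StringTokenizer(expression, "+-*/", true);
        List<String> tokens = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    // Toy evaluator: assumes tokens alternate operand, operator, operand, ...
    // and applies the operators strictly left to right (no precedence)
    static int evaluateLeftToRight(List<String> tokens) {
        int result = Integer.parseInt(tokens.get(0));
        for (int i = 1; i < tokens.size(); i += 2) {
            int operand = Integer.parseInt(tokens.get(i + 1));
            switch (tokens.get(i)) {
                case "+": result += operand; break;
                case "-": result -= operand; break;
                case "*": result *= operand; break;
                case "/": result /= operand; break;
                default: throw new IllegalArgumentException("Unknown operator: " + tokens.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("12+34*2");
        System.out.println(tokens);                      // [12, +, 34, *, 2]
        System.out.println(evaluateLeftToRight(tokens)); // 92, i.e. (12 + 34) * 2
    }
}
```

Without returnDelims, the operators would be discarded and only the numbers 12, 34 and 2 would remain.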

Change the delimiter dynamically

The overloaded nextToken(String) method lets us change the delimiter set dynamically. We create a StringTokenizer with one delimiter set and, at any point while calling nextToken(), we can pass a new set of delimiter characters instead. Once this call is made, the StringTokenizer instance switches to the new delimiter set, and all subsequent calls to nextToken() or countTokens() use it.

StringTokenizer stringTokenizer = new StringTokenizer("ab-cd:ef:gh", "-");
printTokens(stringTokenizer);

The above code uses a hyphen as the delimiter, hence there are two tokens:

Number of tokens: 2
ab
cd:ef:gh

Let us change the delimiter after getting the first token.

StringTokenizer stringTokenizer = new StringTokenizer("ab-cd:ef:gh", "-");
System.out.println(stringTokenizer.countTokens()); //2
System.out.println(stringTokenizer.nextToken()); //ab
System.out.println(stringTokenizer.countTokens()); //1

So far, we have obtained the first token. Calling countTokens() returns 1, meaning that only one more token (cd:ef:gh) is left. Let us now change the delimiter by calling nextToken() with a colon (:).

System.out.println(stringTokenizer.nextToken(":")); //-cd

This changes the delimiter set of the StringTokenizer from hyphen (-) to colon (:). It also returns the next token as per the new delimiter set, which is -cd: the hyphen is no longer a delimiter, so it becomes part of the token. If we call countTokens() now, it returns 2, as two tokens are left under the new delimiter configuration.

System.out.println(stringTokenizer.countTokens()); //2
System.out.println(stringTokenizer.nextToken()); //ef
System.out.println(stringTokenizer.nextToken()); //gh

Note: Calling nextToken() when there are no more tokens to return will throw a NoSuchElementException.
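A minimal sketch of both sides of this: guarding with hasMoreTokens() avoids the exception, while calling nextToken() on an exhausted tokenizer throws it. NoMoreTokensDemo and nextOrDefault are hypothetical names for illustration.

```java
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class NoMoreTokensDemo {

    // Hypothetical helper: returns the next token, or the fallback when none are left
    static String nextOrDefault(StringTokenizer tokenizer, String fallback) {
        return tokenizer.hasMoreTokens() ? tokenizer.nextToken() : fallback;
    }

    public static void main(String[] args) {
        StringTokenizer tokenizer = new StringTokenizer("ab-cd", "-");
        System.out.println(tokenizer.nextToken()); // ab
        System.out.println(tokenizer.nextToken()); // cd

        // Guarded access: no exception, the fallback is returned instead
        System.out.println(nextOrDefault(tokenizer, "<none>")); // <none>

        // Unguarded access on an exhausted tokenizer throws NoSuchElementException
        try {
            tokenizer.nextToken();
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException caught");
        }
    }
}
```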

StringTokenizer as an Enumeration

StringTokenizer implements Enumeration<Object>, hence it can be used wherever an Enumeration is expected. The methods hasMoreElements and nextElement simply call hasMoreTokens and nextToken, respectively; they exist so that StringTokenizer can implement the Enumeration interface.

StringTokenizer stringTokenizer = new StringTokenizer("abc-de-f-gh-ij", "-");
while (stringTokenizer.hasMoreElements()) {
    System.out.println(stringTokenizer.nextElement());
}

Note that the nextElement method returns an Object, whereas nextToken returns a String.
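Because it is an Enumeration, a StringTokenizer can be handed to APIs that accept one. For example, java.util.Collections.list(Enumeration) copies the remaining tokens into an ArrayList. The sketch below assumes nothing beyond the standard library; EnumerationDemo and toList are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.StringTokenizer;

class EnumerationDemo {

    // Collections.list drains any Enumeration into an ArrayList. Since StringTokenizer
    // implements Enumeration<Object>, the element type of the list is Object.
    static ArrayList<Object> toList(StringTokenizer tokenizer) {
        return Collections.list(tokenizer);
    }

    public static void main(String[] args) {
        ArrayList<Object> tokens = toList(new StringTokenizer("abc-de-f-gh-ij", "-"));
        System.out.println(tokens);        // [abc, de, f, gh, ij]
        System.out.println(tokens.size()); // 5
    }
}
```

After this call the tokenizer is exhausted, since Collections.list consumes every remaining element.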

StringTokenizer alternatives

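The StringTokenizer class is similar to using the split method on a String, although split takes a regular expression and returns an array. In fact, StringTokenizer is a legacy class retained for compatibility reasons, and its use is discouraged in new code; its Javadoc recommends using the String#split method or the java.util.regex package instead.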
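One behavioral difference is worth knowing when switching: split keeps the empty strings between consecutive delimiters, while StringTokenizer collapses them. A sketch comparing the two on the earlier ab--cd--ef example (SplitComparisonDemo and withTokenizer are hypothetical names):

```java
import java.util.Arrays;
import java.util.StringTokenizer;

class SplitComparisonDemo {

    // Hypothetical helper: collects all tokens from a StringTokenizer into an array
    static String[] withTokenizer(String input, String delimiters) {
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        String[] tokens = new String[tokenizer.countTokens()];
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = tokenizer.nextToken();
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "ab--cd--ef";

        // StringTokenizer: consecutive delimiters collapse, no empty tokens
        System.out.println(Arrays.toString(withTokenizer(input, "-")));    // [ab, cd, ef]

        // String#split with "-": an empty string appears between consecutive hyphens
        System.out.println(Arrays.toString(input.split("-")));             // [ab, , cd, , ef]

        // The regex quantifier "+" treats a run of hyphens as one separator
        System.out.println(Arrays.toString(input.split("-+")));            // [ab, cd, ef]

        // A character class plays the role of a delimiter set such as ":-"
        System.out.println(Arrays.toString("ab:-cd:-ef".split("[:-]+"))); // [ab, cd, ef]
    }
}
```

Even with a quantifier, split still returns a leading empty string when the input starts with a delimiter (for example, "-ab".split("-+") gives ["", "ab"]), whereas StringTokenizer skips it.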

Conclusion

This concludes the post on the StringTokenizer in Java.
