Files WalkFileTree

Overview

In the last couple of posts, we saw the NIO Files class methods that deal with files and Files class methods for the operations on directories. In this post we will see the Files.walkFileTree method.

Walking a file tree

The walkFileTree method from the NIO Files class is a static method that we use to walk a file tree. The walk starts at a given Path, which can be either a file or a directory. The traversal starts at that node and recursively visits each node in the tree in a depth-first fashion. In other words, when it encounters a directory, all the files and directories within it will be visited before it continues with the next sibling.

Let us look at an example to understand this. We can visualize the directory structure as a tree. The node at which we start the traversal is the root node. All the directories and files inside this directory will be the children of this node.

Assume we have a directory structure as shown below.

..
|-- data
    |-- dir1
        |-- file3.txt
        |-- file4.txt
    |-- dir2
        |-- file5.txt
    |-- file1.txt
    |-- file2.txt

This when visualized as a tree would look like,

[Figure: File Walk – Directory Tree]

Walking the file tree rooted at the node (directory) data then visits the nodes in an order such as the following (the relative order of siblings may differ):

dir1
file3.txt
file4.txt
file1.txt
file2.txt
dir2
file5.txt

This traversal order is yielded by performing a depth-first search on the tree.

The File Visitor

An implementation of the FileVisitor interface is created and passed to the walkFileTree method to walk the file tree. 

The tree traversal starts at the root node and for each node (file or a directory), it invokes a FileVisitor method. The traversal completes when all the nodes (files and directories) in the tree have been visited.

A FileVisitor has four methods

  • preVisitDirectory: This method is invoked before a directory’s entries are visited i.e., this method is called for each directory in the tree before its children (contents/entries of the directory) are visited.
  • postVisitDirectory: Called after all the directory entries are visited. If there were any exceptions encountered during the traversal, the exception will be passed as an argument to this method. 
  • visitFile: Invoked for each file visited in the tree.
  • visitFileFailed: Called when the file cannot be accessed. The exception encountered is passed to the method.

When the traversal starts, it reads the BasicFileAttributes of the node to check whether it is a file or a directory. If it is a file, it calls the visitFile method. If it is a directory, it calls the preVisitDirectory method and then continues the traversal with the entries (nodes) inside the directory. After visiting all the nodes inside the directory, it calls the postVisitDirectory method to signal the completion of the traversal of that subtree. The traversal then continues at the next sibling of the directory. If a node cannot be visited, for example because its attributes cannot be read or a directory cannot be opened, it calls the visitFileFailed method.

Returning a File Visit Result

Each of the methods in the FileVisitor interface returns an enum FileVisitResult. With this return value we can control the traversal operation. Let us look at each of the FileVisitResult enum instances.

  • CONTINUE: The traversal will continue normally. When we return this from a preVisitDirectory method, the entries in the directory will also be visited.
  • TERMINATE: Causes the traversal to terminate. No more nodes will be visited.
  • SKIP_SUBTREE: This asks the traversal to skip the entries in this directory. Thus, the walk will resume at the next sibling of this directory (if any). This is applicable only when returned from a preVisitDirectory method.
  • SKIP_SIBLINGS: Returning this enum will skip all the remaining siblings of this node. When we return this from a visitFile method, the postVisitDirectory of the parent directory will be called and the walk resumes at the next sibling of the parent directory (if any). When we return this from a preVisitDirectory method, it skips not only the directory’s siblings but also the entries in this directory. Thus, it will not invoke the postVisitDirectory method for this directory.
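
TERMINATE is handy when we only need the first match. The following is a minimal sketch, not part of the original examples: the class FindFirstFile and its findFirst helper are hypothetical names, and it extends the JDK's SimpleFileVisitor (introduced later in this post) so that only visitFile has to be written.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.Optional;

class FindFirstFile {

    // Hypothetical helper: returns the first file under root whose name equals fileName.
    static Optional<Path> findFirst(Path root, String fileName) throws IOException {
        // Single-element holder so that the anonymous class can store the result.
        Path[] found = new Path[1];
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (file.getFileName().toString().equals(fileName)) {
                    found[0] = file;
                    return FileVisitResult.TERMINATE; // no more nodes are visited
                }
                return FileVisitResult.CONTINUE;
            }
        });
        return Optional.ofNullable(found[0]);
    }

    public static void main(String[] args) throws IOException {
        // Build a tiny tree in a temporary directory so that the sketch is self-contained.
        Path root = Files.createTempDirectory("data");
        Path dir1 = Files.createDirectory(root.resolve("dir1"));
        Files.createFile(dir1.resolve("file3.txt"));
        System.out.println(findFirst(root, "file3.txt"));
    }
}
```

Because the walk stops as soon as TERMINATE is returned, the rest of the tree is never read.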

A Simple File Visitor

Let us create a simple implementation of the FileVisitor interface. It just prints the path of the file or directory visited and always returns a FileVisitResult of CONTINUE. We will use it to walk the directory structure shown earlier.

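import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.FileVisitor;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;

public class MySimpleFileVisitorImpl implements FileVisitor<Path> {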

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
        System.out.println("preVisitDirectory " +  dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        System.out.println("visitFile " +  file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException {
        System.out.println("visitFileFailed " +  file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
        System.out.println("postVisitDirectory " +  dir);
        return FileVisitResult.CONTINUE;
    }
}

Files Walk File Tree

There are two overloaded walkFileTree methods. We will look at the second one in just a bit. 

The first one is simple – it takes the Path from which to start the file walking and an implementation of a FileVisitor. We use the above FileVisitor implementation and walk the file tree starting at the folder ‘data’ in our example as:

Path path = Paths.get("/Users/JavaDeveloperCentral/data");
Files.walkFileTree(path, new MySimpleFileVisitorImpl());

It returns the starting file Path (which I have ignored here).

Note: The walkFileTree method can throw an IOException. Since it is a checked exception, we either have to catch it or declare it in the throws clause of the calling method.

The above will produce the following output. I have added blank lines to make it easier to read.

preVisitDirectory /Users/JavaDeveloperCentral/data

visitFile /Users/JavaDeveloperCentral/data/file2.txt
visitFile /Users/JavaDeveloperCentral/data/file1.txt

preVisitDirectory /Users/JavaDeveloperCentral/data/dir2
visitFile /Users/JavaDeveloperCentral/data/dir2/file5.txt
postVisitDirectory /Users/JavaDeveloperCentral/data/dir2

preVisitDirectory /Users/JavaDeveloperCentral/data/dir1
visitFile /Users/JavaDeveloperCentral/data/dir1/file3.txt
visitFile /Users/JavaDeveloperCentral/data/dir1/file4.txt
postVisitDirectory /Users/JavaDeveloperCentral/data/dir1

postVisitDirectory /Users/JavaDeveloperCentral/data

First, it visits the starting directory ‘data’ and calls preVisitDirectory for it. Next, it visits the leaf nodes (files) file1.txt and file2.txt.

Next, it visits the directory dir2. Hence, it calls the preVisitDirectory for dir2. After this, it visits each of the files (we have only one here) inside this directory. It calls the visitFile method for file5.txt. As all the entries in the directory dir2 have been visited, it calls the postVisitDirectory for it.

Then the same happens for directory dir1. The preVisitDirectory is called for dir1, after which the files in it are visited, and finally the postVisitDirectory for dir1 is invoked.

Finally, the walk ends by calling the postVisitDirectory for the node from which the walk started (data).

Note: It does not guarantee the order in which the sibling nodes are visited. In the above example, it could pick file1.txt before file2.txt or it could visit dir1 before dir2. But it does guarantee a depth-first traversal i.e., once it visits dir2, it will visit the next sibling of dir2 only when all the entries within the directory dir2 are visited.  
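
To reproduce the walk without depending on the paths used in this post, one option is to build the same tree in a temporary directory and record the callbacks. The sketch below is an illustration only: the class WalkDemo and its helpers createExampleTree and recordEvents are hypothetical names, and paths are recorded relative to the starting directory.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class WalkDemo {

    // Builds the example tree (data, dir1, dir2 and the five files) in a fresh temp directory.
    static Path createExampleTree() throws IOException {
        Path data = Files.createTempDirectory("walk-demo").resolve("data");
        Files.createDirectories(data.resolve("dir1"));
        Files.createDirectories(data.resolve("dir2"));
        String[] files = {"file1.txt", "file2.txt", "dir1/file3.txt", "dir1/file4.txt", "dir2/file5.txt"};
        for (String name : files) {
            Files.createFile(data.resolve(name));
        }
        return data;
    }

    // Walks the tree and records every callback as "<event> <path relative to root>".
    static List<String> recordEvents(Path root) throws IOException {
        List<String> events = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
                events.add("pre " + root.relativize(dir));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                events.add("file " + root.relativize(file));
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc; // iterating over the directory failed
                }
                events.add("post " + root.relativize(dir));
                return FileVisitResult.CONTINUE;
            }
        });
        return events;
    }

    public static void main(String[] args) {
        // walkFileTree throws the checked IOException, so we catch it here.
        try {
            recordEvents(createExampleTree()).forEach(System.out::println);
        } catch (IOException e) {
            System.err.println("Walk failed: " + e);
        }
    }
}
```

Whatever order the siblings come back in, the events for dir2 (pre, its single file, post) always appear next to each other, which is the depth-first guarantee described in the note.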

Files WalkFileTree following symbolic links and with max depth option

Another overloaded walkFileTree method accepts a set of FileVisitOptions and a max depth to limit the depth of the traversal.

A FileVisitOption enum has one enum constant called FOLLOW_LINKS which we use if we want to follow the symbolic links. By default, it does not follow symbolic links. When following symbolic links, if it encounters a cycle, it calls the visitFileFailed method with an instance of FileSystemLoopException.

The maxDepth parameter specifies the maximum number of levels of directories to visit. A value of 0 means that only the starting node will be visited.

Set<FileVisitOption> fileVisitOptions = EnumSet.of(FileVisitOption.FOLLOW_LINKS);
Files.walkFileTree(path, fileVisitOptions, 1, new MySimpleFileVisitorImpl());

In the above code, we pass a FileVisitOption to follow symbolic links (but we do not have any in our example). We also set a maxDepth parameter to 1. This produces output as (newlines added for visualization).

preVisitDirectory /Users/JavaDeveloperCentral/data

visitFile /Users/JavaDeveloperCentral/data/file2.txt
visitFile /Users/JavaDeveloperCentral/data/file1.txt
visitFile /Users/JavaDeveloperCentral/data/dir2
visitFile /Users/JavaDeveloperCentral/data/dir1

postVisitDirectory /Users/JavaDeveloperCentral/data

You might be surprised to see visitFile method being called for the directories dir1 and dir2. This is because the walkFileTree method will pass even a directory as a visited file if it lies at the max depth we have configured. 

This is stated in the javadoc of walkFileTree:

The visitFile method is invoked for all files, including directories, encountered at maxDepth, unless the basic file attributes cannot be read, in which case the visitFileFailed method is invoked.

Refer to the Stack Overflow Question – walkFileTree calling visitFile on directories
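
If a visitor needs to tell such directories apart from regular files, it can check the BasicFileAttributes passed to visitFile. The following is a small sketch under that assumption; the class DepthLimitedWalk and the method directoriesAtMaxDepth are hypothetical names used only for illustration, and no symbolic links are followed.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

class DepthLimitedWalk {

    // Returns the directories that were passed to visitFile because they lie at maxDepth.
    static List<Path> directoriesAtMaxDepth(Path start, int maxDepth) throws IOException {
        List<Path> result = new ArrayList<>();
        // An empty option set means symbolic links are not followed.
        Files.walkFileTree(start, EnumSet.noneOf(FileVisitOption.class), maxDepth,
                new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        if (attrs.isDirectory()) {
                            result.add(file); // a directory reported as a "file"
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path start = Files.createTempDirectory("data");
        Path dir1 = Files.createDirectory(start.resolve("dir1"));
        Files.createFile(dir1.resolve("file3.txt"));
        Files.createFile(start.resolve("file1.txt"));
        System.out.println(directoriesAtMaxDepth(start, 1));
    }
}
```

Passing Integer.MAX_VALUE as maxDepth visits all levels, in which case no directory is handed to visitFile.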

Walk File Tree – Skipping Subtree

Let us modify our Simple File Visitor to skip the subtree of dir2. In the MySimpleFileVisitorImpl class, we change the preVisitDirectory method as shown below (the other methods remain as they are).

@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
    System.out.println("preVisitDirectory " +  dir);
    if (dir.toString().endsWith("dir2")) {
        System.out.println("Skipping subtree of " + dir);
        return FileVisitResult.SKIP_SUBTREE;
    }
    return FileVisitResult.CONTINUE;
}

Running

Files.walkFileTree(path, new MySimpleFileVisitorImpl());

Gives the output as

preVisitDirectory /Users/JavaDeveloperCentral/data

visitFile /Users/JavaDeveloperCentral/data/file2.txt
visitFile /Users/JavaDeveloperCentral/data/file1.txt

preVisitDirectory /Users/JavaDeveloperCentral/data/dir2
Skipping subtree of /Users/JavaDeveloperCentral/data/dir2

preVisitDirectory /Users/JavaDeveloperCentral/data/dir1
visitFile /Users/JavaDeveloperCentral/data/dir1/file3.txt
visitFile /Users/JavaDeveloperCentral/data/dir1/file4.txt
postVisitDirectory /Users/JavaDeveloperCentral/data/dir1

postVisitDirectory /Users/JavaDeveloperCentral/data

When it visits the directory dir2, we return a FileVisitResult of SKIP_SUBTREE. Hence, the entries of the directory dir2 are not visited (file5.txt). Also, note that since the directory was not processed, it did not call the postVisitDirectory method for dir2.
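
Note that matching with dir.toString().endsWith("dir2") would also match a directory named, say, mydir2. One way to match the directory name exactly is to compare Path.getFileName() instead. The sketch below assumes a hypothetical SkippingVisitor class that takes the set of directory names to skip; it is not part of the visitor used above.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

class SkippingVisitor extends SimpleFileVisitor<Path> {
    private final Set<String> skippedNames;          // exact directory names to skip
    final List<Path> visitedFiles = new ArrayList<>();

    SkippingVisitor(Set<String> skippedNames) {
        this.skippedNames = skippedNames;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) {
        Path name = dir.getFileName(); // null for a root path such as "/"
        if (name != null && skippedNames.contains(name.toString())) {
            return FileVisitResult.SKIP_SUBTREE;
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
        visitedFiles.add(file);
        return FileVisitResult.CONTINUE;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("data");
        Files.createFile(Files.createDirectory(root.resolve("dir2")).resolve("file5.txt"));
        Files.createFile(Files.createDirectory(root.resolve("mydir2")).resolve("file6.txt"));

        SkippingVisitor visitor = new SkippingVisitor(Collections.singleton("dir2"));
        Files.walkFileTree(root, visitor);
        System.out.println(visitor.visitedFiles); // only the file inside mydir2
    }
}
```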

Walk File Tree – Skipping Siblings

Now let us return a FileVisitResult of SKIP_SIBLINGS from the preVisitDirectory method for the directory dir2. The modified method and the resulting output are shown below.

@Override
public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
    System.out.println("preVisitDirectory " +  dir);
    if (dir.toString().endsWith("dir2")) {
        System.out.println("Skipping siblings of " + dir);
        return FileVisitResult.SKIP_SIBLINGS;
    }
    return FileVisitResult.CONTINUE;
}

Gives the output as

preVisitDirectory /Users/JavaDeveloperCentral/data

visitFile /Users/JavaDeveloperCentral/data/file2.txt
visitFile /Users/JavaDeveloperCentral/data/file1.txt

preVisitDirectory /Users/JavaDeveloperCentral/data/dir2
Skipping siblings of /Users/JavaDeveloperCentral/data/dir2

postVisitDirectory /Users/JavaDeveloperCentral/data

When dir2 is visited, we return SKIP_SIBLINGS, which results in

  1. Skipping the entries of dir2 (file5.txt)
  2. Skipping the remaining siblings of dir2 (dir1)

Note that the postVisitDirectory method is not called for dir2 as we did not enter it.
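
The other case, returning SKIP_SIBLINGS from visitFile, can be sketched as follows. This is an illustration with hypothetical names (FirstEntryOnly, walk), run against a directory that contains only files: the first file visited ends the iteration of its directory, and postVisitDirectory is still called for that directory.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

class FirstEntryOnly {

    // Records the names of the callbacks invoked while walking root.
    static List<String> walk(Path root) throws IOException {
        List<String> events = new ArrayList<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                events.add("visitFile");
                return FileVisitResult.SKIP_SIBLINGS; // skip the rest of this directory
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                if (exc != null) {
                    throw exc;
                }
                events.add("postVisitDirectory");
                return FileVisitResult.CONTINUE;
            }
        });
        return events;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("data");
        Files.createFile(root.resolve("a.txt"));
        Files.createFile(root.resolve("b.txt"));
        Files.createFile(root.resolve("c.txt"));
        System.out.println(walk(root)); // [visitFile, postVisitDirectory]
    }
}
```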

Copying a directory (recursively)

In the last post on the NIO Files Directory, we learnt that Files.copy() does not copy a directory along with its contents: it only creates an empty directory at the target. We need a way to copy the entries in the directory recursively (the files and directories within a directory).

We can use the walkFileTree with a custom implementation of a FileVisitor to achieve this.

Before moving to the FileVisitor implementation that will copy a directory recursively, I want to introduce a class called SimpleFileVisitor. It is one of the implementations of the FileVisitor interface. It provides a good default implementation where preVisitDirectory and visitFile return a FileVisitResult of CONTINUE, visitFileFailed rethrows the exception passed to it, and postVisitDirectory returns CONTINUE unless it was passed an exception, in which case it rethrows it. Thus, we can extend it and override only the methods we need.

A File Visitor for copying

Let us implement a FileVisitor to recursively copy a directory. The visitor will have the source and the target (or destination) paths. 

  1. When it first encounters a directory (in the preVisitDirectory method), it will create an empty directory at the destination (or target).
  2. When it visits a file, it will copy the file from the source to the destination.

The implementation is shown below.

public static class CopyingFileVisitor extends SimpleFileVisitor<Path> {
    private final Path source;
    private final Path target;

    public CopyingFileVisitor(Path source, Path target) {
        this.source = source;
        this.target = target;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            throws IOException {
        System.out.println("Processing directory: " + dir);
        Path targetPath = target.resolve(source.relativize(dir));
        if (!Files.exists(targetPath)) {
            Files.createDirectory(targetPath);
        }
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        System.out.println("Copying file " +  file);
        Path targetPath = target.resolve(source.relativize(file));
        Files.copy(file, targetPath, StandardCopyOption.COPY_ATTRIBUTES, StandardCopyOption.REPLACE_EXISTING);
        return FileVisitResult.CONTINUE;
    }
}

This is used as

Path source = Paths.get("/Users/JavaDeveloperCentral/data");
Path target = Paths.get("/Users/JavaDeveloperCentral/copied_data");

Files.walkFileTree(source, new CopyingFileVisitor(source, target));

The output (print statements) of running the copying is shown below. 

Processing directory: /Users/JavaDeveloperCentral/data
Copying file /Users/JavaDeveloperCentral/data/file2.txt
Copying file /Users/JavaDeveloperCentral/data/file1.txt

Processing directory: /Users/JavaDeveloperCentral/data/dir2
Copying file /Users/JavaDeveloperCentral/data/dir2/file5.txt

Processing directory: /Users/JavaDeveloperCentral/data/dir1
Copying file /Users/JavaDeveloperCentral/data/dir1/file3.txt
Copying file /Users/JavaDeveloperCentral/data/dir1/file4.txt

Let us go over the details one by one.

Logic of preVisitDirectory

When we visit a new directory, we have to create an empty directory at the destination. We first have to construct the target path. This is done by using the Path class’s relativize and resolve methods.

We create the new directory only if it does not already exist (otherwise createDirectory would throw a FileAlreadyExistsException).

relativize and resolve methods of Path

The relativize method constructs a relative path between this path (the path object on which the relativize method is called) and a given path.
Example:

Path source = Paths.get("/Users/JavaDeveloperCentral/data");
Path dir = Paths.get("/Users/JavaDeveloperCentral/data/newFolder");
Path file = Paths.get("/Users/JavaDeveloperCentral/data/newFolder/file1");

System.out.println(source.relativize(dir)); //newFolder
System.out.println(source.relativize(file)); //newFolder/file1

The resolve method resolves the given path against this path.
Example: (using the above example, with the copy destination as the target)

Path target = Paths.get("/Users/JavaDeveloperCentral/copied_data");

System.out.println(target.resolve(source.relativize(dir)));
// /Users/JavaDeveloperCentral/copied_data/newFolder

System.out.println(target.resolve(source.relativize(file)));
// /Users/JavaDeveloperCentral/copied_data/newFolder/file1

Play around with this example till you understand how these two methods work.

Logic of visitFile

We simply copy the source file to the target path (constructed as explained above). We set the CopyOptions COPY_ATTRIBUTES to copy the file attributes and REPLACE_EXISTING to overwrite the file at the target if it exists.

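Optionally, you could override the visitFileFailed method to take an action when a file cannot be visited (for example, when its attributes cannot be read). The default implementation in the SimpleFileVisitor throws the exception back. An IOException thrown by Files.copy inside visitFile also propagates out of walkFileTree and stops the walk.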

Notes/Observations:

  • The above implementation behaves much like the copy command cp -r <source> <dest> when <dest> does not exist yet.
  • If the target directory exists, the files and folders from the source will be merged into it. 
  • If any of the target directories has additional files or folders, they will be retained (and will not be deleted).
  • We did not have to use createDirectories method because walking the file tree is performed depth first. Thus, when we create a directory, all the parent paths/directories will be present.
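
The merge behaviour described in these notes can be checked with a small experiment. The sketch below is a compact variant of the copying visitor written as an anonymous SimpleFileVisitor; the class MergeCopyDemo and the method copyTree are hypothetical names, it works on temporary directories, and it omits COPY_ATTRIBUTES for brevity.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.BasicFileAttributes;

class MergeCopyDemo {

    // Copies the tree rooted at source into target, merging with anything already there.
    static void copyTree(Path source, Path target) throws IOException {
        Files.walkFileTree(source, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs) throws IOException {
                Path targetDir = target.resolve(source.relativize(dir));
                if (!Files.exists(targetDir)) {
                    Files.createDirectory(targetDir);
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Path targetFile = target.resolve(source.relativize(file));
                Files.copy(file, targetFile, StandardCopyOption.REPLACE_EXISTING);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    public static void main(String[] args) throws IOException {
        Path source = Files.createTempDirectory("data");
        Path target = Files.createTempDirectory("copied_data");
        Files.write(source.resolve("file1.txt"), "new".getBytes());
        Files.write(target.resolve("file1.txt"), "old".getBytes()); // will be overwritten
        Files.createFile(target.resolve("extra.txt"));              // will be retained

        copyTree(source, target);

        System.out.println(new String(Files.readAllBytes(target.resolve("file1.txt"))));
        System.out.println(Files.exists(target.resolve("extra.txt")));
    }
}
```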

Deleting a directory recursively

Let us look at how we can implement a FileVisitor to delete a directory recursively along with all its subdirectories and the files within it.
When we delete a directory, it has to be empty (else it will throw a DirectoryNotEmptyException). So, we have to delete all the entries of a directory before we delete the directory.

Logic of deleting a directory:
We start the file walk starting at the root (the directory we want to delete)

  1. When we encounter a file, we delete it.
  2. Once all the entries of a directory are visited, the walk calls the postVisitDirectory method. When the walk invokes this method with a null exc argument, all the entries in the directory have been deleted and the directory is empty. Thus, we can delete it successfully. If the passed exc argument is not null, iterating over the directory's entries failed with an I/O error, so we throw the exception back instead of deleting the directory.

The visitor is shown below.

public static class DeletingFileVisitor extends SimpleFileVisitor<Path> {
    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
        System.out.println("Deleting file: " +  file);
        Files.delete(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
            throws IOException {
        System.out.println("Deleting directory: " + dir);
        if (exc == null) {
            Files.delete(dir);
            return FileVisitResult.CONTINUE;
        } else {
            throw exc;
        }
    }
}

Let us delete the directory we created above (the new copied directory).

Path path = Paths.get("/Users/JavaDeveloperCentral/copied_data");
Files.walkFileTree(path, new DeletingFileVisitor());

Output,

Deleting file: /Users/JavaDeveloperCentral/copied_data/file2.txt
Deleting file: /Users/JavaDeveloperCentral/copied_data/file1.txt

Deleting file: /Users/JavaDeveloperCentral/copied_data/dir2/file5.txt
Deleting directory: /Users/JavaDeveloperCentral/copied_data/dir2

Deleting file: /Users/JavaDeveloperCentral/copied_data/dir1/file3.txt
Deleting file: /Users/JavaDeveloperCentral/copied_data/dir1/file4.txt
Deleting directory: /Users/JavaDeveloperCentral/copied_data/dir1

Deleting directory: /Users/JavaDeveloperCentral/copied_data

Observe how the deletion happens bottom up: it starts with the leaf files and works its way up to the starting directory.

Conclusion

The walkFileTree method walks a directory tree in a depth-first order. We saw how we can use it with a FileVisitor implementation. First, we learnt about the methods in a FileVisitor and their purposes. Second, we saw the possible return values from the FileVisitor methods (FileVisitResult). Third, we explored the walkFileTree method with a simple FileVisitor. Fourth, we saw what happens to the traversal when we skip the subtree and the siblings. Finally, we learnt how to recursively copy a directory and recursively delete a directory by using appropriate FileVisitors.

Useful resources

Walking the File Tree from Oracle
Java 8: Copy directory recursively?
Callicoder – How to delete a directory recursively with all its subdirectories and files in Java
Difference between Files list, walkFileTree and walk
