Monday, January 26, 2015

Reason for Slower Reading of Large Lines in JDK 7 and JDK 8

I earlier posted the blog post Reading Large Lines Slower in JDK 7 and JDK 8, and some useful comments on that post described the issue. This post provides more explanation of why the file reading demonstrated in that post (and used by Ant's LineContainsRegExp) is so much slower in Java 7 and Java 8 than in Java 6.

X Wang's post The substring() Method in JDK 6 and JDK 7 describes how String.substring() was changed between JDK 6 and JDK 7. Wang writes in that post that the JDK 6 substring() "creates a new string, but the string's value still points to the same [backing char] array in the heap." He contrasts that with the JDK 7 approach, "In JDK 7, the substring() method actually create a new array in the heap."
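To make the semantic point concrete, here is a small sketch of my own (the class name SubstringSemantics is mine, not from either post): the visible result of substring() is identical across versions; only the call's cost and memory behavior changed.

```java
public class SubstringSemantics
{
   public static void main(final String[] arguments)
   {
      final String original = "Inspired by Actual Events";
      final String prefix = original.substring(0, 8);
      System.out.println(prefix); // Inspired
      // JDK 6: prefix shares original's backing char[] (constant-time call,
      // but the entire array remains reachable through prefix).
      // JDK 7+: prefix receives its own freshly copied char[] (call time
      // linear in the result's length, but no lingering large array).
   }
}
```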

Wang's post is very useful for understanding the differences in String.substring() between Java 6 and Java 7. The comments on that post are also insightful. They include a sentiment I can appreciate: "I would say 'different' not 'improved'." There are also explanations of how the JDK 7 approach avoids a potential memory leak that could occur in JDK 6.

The StackOverflow thread Java 7 String - substring complexity explains the motivation of the change and references bug JDK-4513622 : (str) keeping a substring of a field prevents GC for object. That bug states, "An OutOfMemory error [occurs] because objects don't get garbage collected if the caller stores a substring of a field in the object." The bug contains sample code that demonstrates this error occurring. I have adapted that code here:

/**
 * Minimally adapted from Bug JDK-4513622.
 *
 * @see <a href="http://bugs.java.com/view_bug.do?bug_id=4513622">JDK-4513622</a>
 */
public class TestGC
{
   private String largeString = new String(new byte[100000]);
    
   private String getString()
   {
      return this.largeString.substring(0,2);
   }
    
   public static void main(String[] args)
   {
      java.util.ArrayList<String> list = new java.util.ArrayList<String>();
      for (int i = 0; i < 1000000; i++)
      {
         final TestGC gc = new TestGC();
         list.add(gc.getString());
      }
   }
}

The next screen snapshot demonstrates the last code snippet (adapted from Bug JDK-4513622) being executed with both Java 6 (jdk1.6 is part of the path of the executable Java launcher) and Java 8 (the default version on my host). As the screen snapshot shows, an OutOfMemoryError is thrown when the code is run in Java 6 but not when it is run in Java 8.

In other words, the change in Java 7 fixes a potential memory leak at the cost of a performance penalty when String.substring is executed against lengthy Java Strings. This means that any implementation that uses String.substring (including Ant's LineContainsRegExp) to process really long lines probably needs to be reworked or avoided when migrating from Java 6 to Java 7 and beyond.

Once the issue is known (the change of the String.substring implementation in this case), it is easier to find documentation online regarding what is happening (thanks to the comments that made these resources easy to find). The duplicates of bug JDK-4513622 have write-ups that provide additional details. These bugs are JDK-4637640 : Memory leak due to String.substring() implementation and JDK-6294060 : Use of substring() causes memory leak. Other related online resources include Changes to String.substring in Java 7 [which includes a reference to String.intern() – there are better ways], Java 6 vs Java 7: When implementation matters, and the highly commented (over 350 comments) Reddit thread TIL Oracle changed the internal String representation in Java 7 Update 6 increasing the running time of the substring method from constant to N.

The post Changes to String internal representation made in Java 1.7.0_06 provides a good review of this change and summarizes the original issue, the fix, and the new issue associated with the fix:

Now you can forget about a memory leak described above and never ever use new String(String) constructor anymore. As a drawback, you now have to remember that String.substring has now a linear complexity instead of a constant one.
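For completeness, the JDK 6-era defense against the leak was exactly that now-discouraged constructor: copying the short result so the large backing array could be garbage collected. Here is a sketch of that idea, adapted by me from the TestGC example above (TestGCFixed is my own hypothetical variant, not from the bug report):

```java
public class TestGCFixed
{
   private String largeString = new String(new byte[100000]);

   private String getString()
   {
      // Copying the two-character result detaches it from largeString's
      // backing array, so each TestGCFixed instance can be collected.
      // Under JDK 7+, this copy is redundant because substring() already copies.
      return new String(this.largeString.substring(0, 2));
   }

   public static void main(final String[] args)
   {
      final java.util.ArrayList<String> list = new java.util.ArrayList<String>();
      for (int i = 0; i < 1000000; i++)
      {
         list.add(new TestGCFixed().getString());
      }
      System.out.println(list.size()); // 1000000
   }
}
```

Run with the same heap settings as the original TestGC, this variant completes on Java 6 where the original threw OutOfMemoryError.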

Reading Large Lines Slower in JDK 7 and JDK 8

I recently ran into a case where a particular task (LineContainsRegExp) in an Apache Ant build file ran considerably slower in JDK 7 and JDK 8 than it did in JDK 6 for extremely long character lines. Based on a simple example adapted from the Java code used by the LineContainsRegExp task, I was able to determine that the slowness has nothing to do with the regular expression, but rather with reading characters from the file. The remainder of the post demonstrates this.

For my simple test, I first wrote a small Java class to write out a file that includes a line with as many characters as specified on the command line. The simple class, FileMaker, is shown next:

FileMaker.java
import static java.lang.System.out;

import java.io.FileWriter;

/**
 * Writes a file with a line that contains the number of characters provided.
 */
public class FileMaker
{
   /**
    * Create a file with a line that has the number of characters specified.
    *
    * @param arguments Command-line arguments where the first argument is the
    *    name of the file to be written and the second argument is the number
    *    of characters to be written on a single line in the output file.
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length > 1)
      {
         final String fileName = arguments[0];
         final int maxRowSize = Integer.parseInt(arguments[1]);
         try
         {
            final FileWriter fileWriter = new FileWriter(fileName);
            for (int count = 0; count < maxRowSize; count++)
            {
               fileWriter.write('.');
            }
            fileWriter.write('\n');  // the "one more than specified" character
            fileWriter.flush();
            fileWriter.close();
         }
         catch (Exception ex)
         {
            out.println("ERROR: Cannot write file '" + fileName + "': " + ex.toString());
         }
      }
      else
      {
         out.println("USAGE: java FileMaker <fileName> <maxRowSize>");
         System.exit(-1);
      }
   }
}

The above Java class exists solely to generate a file with a line that has as many characters as specified (actually one more than specified when the \n is counted). The next class demonstrates the difference in runtime behavior between Java 6 and Java 7. The code for this Main class is adapted from the Ant classes that perform the file reading used by LineContainsRegExp, without the regular expression matching. In other words, the regular expression support is not included in my example, yet this class executes much more quickly for very large lines when run in Java 6 than when run in Java 7 or Java 8.

Main.java
import static java.lang.System.out;

import java.io.IOException;
import java.io.FileReader;
import java.io.Reader;
import java.util.concurrent.TimeUnit;

/**
 * Adapted from and intended to represent the basic character reading from file
 * used by the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
 */
public class Main
{
   private Reader in;
   private String line;

   public Main(final String nameOfFile)
   {
      if (nameOfFile == null || nameOfFile.isEmpty())
      {
         throw new IllegalArgumentException("ERROR: No file name provided.");
      }
      try
      {
         in = new FileReader(nameOfFile);
      }
      catch (Exception ex)
      {
         out.println("ERROR: " + ex.toString());
         System.exit(-1);
      }
   }
   

   /**
    * Read a line of characters through '\n' or end of stream and return that
    * line of characters with '\n'; adapted from readLine() method of Apache Ant
    * class org.apache.tools.ant.filters.BaseFilterReader.
    */
   protected final String readLine() throws IOException
   {
      int ch = in.read();

      if (ch == -1)
      {
         return null;
      }
        
      // Renamed from 'line' to avoid shadowing the instance field of the same name.
      final StringBuilder lineBuffer = new StringBuilder();

      while (ch != -1)
      {
         lineBuffer.append((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }

      return lineBuffer.toString();
   }

   /**
    * Provides the next character in the stream; adapted from the method read()
    * in the Apache Ant class org.apache.tools.ant.filters.LineContainsRegExp.
    */
   public int read() throws IOException
   {
      int ch = -1;
 
      if (line != null)
      {
         ch = line.charAt(0);
         if (line.length() == 1)
         {
            line = null;
         }
         else
         {
            // In JDK 7 and later, this creates a new String by copying the
            // remainder of the line's characters on every call.
            line = line.substring(1);
         }
      }
      else
      {
         // In Ant's LineContainsRegExp, a regular expression test filters
         // lines at this point; with that removed, the first line read is
         // always used.
         line = readLine();
         if (line != null)
         {
            return read();
         }
      }
      return ch;
   }

   /**
    * Process provided file and read characters from that file and display
    * those characters on standard output.
    *
    * @param arguments Command-line arguments; expect one argument which is the
    *    name of the file from which characters should be read.
    */
   public static void main(final String[] arguments) throws Exception
   {
      if (arguments.length > 0)
      {
          final long startTime = System.currentTimeMillis();
          out.println("Processing file '" + arguments[0] + "'...");
          final Main instance = new Main(arguments[0]);
          int characterInt = instance.read();
          int totalCharacters = 0;
          while (characterInt != -1)
          {
             totalCharacters++;  // count only actual characters, not the terminal -1
             characterInt = instance.read();
          }
         final long endTime = System.currentTimeMillis();
         out.println(
              "Elapsed Time of "
            + TimeUnit.MILLISECONDS.toSeconds(endTime - startTime)
            + " seconds for " + totalCharacters + " characters.");
      }
      else
      {
         out.println("ERROR: No file name provided.");
      }
   }
}

The runtime performance difference between Java 6 and Java 7 or Java 8 becomes more pronounced as lines grow longer. The next screen snapshot demonstrates running the example in Java 6 (indicated by "jdk1.6" being part of the path of the java launcher) and then in Java 8 (no explicit path provided because Java 8 is my default JRE) against a freshly generated file called dustin.txt that includes a line with 1 million (plus one) characters.

Although a Java 7 example is not shown in the screen snapshot above, my tests have shown that Java 7 has similar slowness to Java 8 in terms of processing very long lines. I have seen this on both Windows and Red Hat Linux JVMs. As the example indicates, the Java 6 version, even for a million characters in a line, reads the file in what rounds to 0 seconds. When the same compiled-for-Java-6 class file is executed with Java 8, the average time to handle the 1 million characters is over 150 seconds (2 1/2 minutes). The same slowness applies when the class is executed in Java 7 and also exists when the class is compiled with JDK 7 or JDK 8.

Java 7 and Java 8 seem to slow down quadratically rather than linearly as the number of characters on a line increases. When I raise the 1 million character line to 10 million characters as shown in the next screen snapshot, Java 6 still reads the file very fast (still rounded to 0 seconds), but Java 8 requires over 5 hours to complete the task (more than 100 times longer for 10 times the characters)!

I don't know why Java 7 and Java 8 read a very long line from a file so much slower than Java 6 does. I hope that someone else can explain this. While I have several ideas for working around the issue, I would like to understand why Java 7 and Java 8 read lines with a very large number of characters so much slower than Java 6. Here are the observations that can be made based on my testing:

  • The issue appears to be a runtime issue (JRE) rather than a JDK issue because even the file-reading class compiled with JDK 6 runs significantly slower in JRE 7 and JRE 8.
  • Both the Windows 8 and RedHat Linux JRE environments consistently indicated that the file reading is dramatically slower for very large lines in Java 7 and in Java 8 than in Java 6.
  • Processing time for reading very long lines appears to increase quadratically (not linearly) with the number of characters in the line in Java 7 and Java 8.
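One of the workaround ideas can be sketched as follows (this is my own illustration, not Ant's eventual fix): keep each line intact and advance a cursor with charAt() rather than repeatedly calling substring(1), which copies the remainder of the line on every call in Java 7 and later.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/**
 * Hypothetical reworking of the read() logic from Main above: the current
 * line is kept whole and a cursor advances through it with charAt(), so no
 * substring call (and no per-character array copy in JDK 7+) is needed.
 */
public class IndexedLineReader
{
   private final Reader in;
   private String line;
   private int position;

   public IndexedLineReader(final Reader in)
   {
      this.in = in;
   }

   /** Provides the next character in the stream, or -1 at end of stream. */
   public int read() throws IOException
   {
      if (line == null || position >= line.length())
      {
         line = readLine();
         position = 0;
         if (line == null)
         {
            return -1;
         }
      }
      return line.charAt(position++);
   }

   /** Reads one line of characters through '\n' or end of stream. */
   private String readLine() throws IOException
   {
      int ch = in.read();
      if (ch == -1)
      {
         return null;
      }
      final StringBuilder lineBuffer = new StringBuilder();
      while (ch != -1)
      {
         lineBuffer.append((char) ch);
         if (ch == '\n')
         {
            break;
         }
         ch = in.read();
      }
      return lineBuffer.toString();
   }

   public static void main(final String[] arguments) throws IOException
   {
      final IndexedLineReader reader = new IndexedLineReader(new StringReader("abc\ndef"));
      final StringBuilder result = new StringBuilder();
      int ch;
      while ((ch = reader.read()) != -1)
      {
         result.append((char) ch);
      }
      System.out.println(result); // prints "abc" and "def" on separate lines
   }
}
```

With this change, reading a line of n characters touches each character once instead of copying on the order of n characters per read, so the total cost returns to linear in the line length regardless of JDK version.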

Monday, January 19, 2015

Total Bummer: Pivotal Drops Groovy

Pivotal announced today that Groovy 2.4 And Grails 3.0 will be the last major releases under Pivotal sponsorship. This is big news that has not surprisingly created a lot of buzz in the blogosphere. In this post, I describe some of the questions that others and I are wondering about and speculate on Groovy's future.

After reading multiple Reddit references to this announcement, my initial thought was to see what Guillaume Laforge had to say about this. Apparently, a lot of people had the same idea because I encountered a 503 error when trying to access his page.

Fortunately, I did not have to wait for Laforge's blog to be available to get more insight from him on this announcement because there were a couple of interviews with him regarding the announcement already online: Voxxed.com's Pivotal's "Sad and Odd" Decision to Set Groovy Adrift and InfoQ's Pivotal Pulls Groovy/Grails Funding. Since that time, Laforge's blog has become available again and has a post on the subject called The Groovy project is looking for a new home. Another person frequently and deservedly associated with Groovy, Graeme Rocher, has also posted on the subject: The Future of Groovy and Grails Sponsorship.

Laforge and Rocher were co-founders of G2One, which was acquired by SpringSource in late 2008. VMware then acquired SpringSource about one year later (and VMware itself had been owned by EMC since late 2003). EMC announced the spin-off of Pivotal in 2013, and Pivotal today announced that it is dropping Groovy support as of 31 March 2015.

Questions, Answers, and Speculations

The posts referenced in this post have collectively answered some of my questions about Groovy while raising additional questions.

Why is Pivotal dropping financial support of Groovy and Grails?
Answer: Pivotal's announcement: "The decision to conclude its sponsorship of Groovy and Grails is part of Pivotal’s larger strategy to concentrate resources on accelerating both commercial and open source projects that support its growing traction in Platform-as-a-Service, Data, and Agile development. Pivotal has determined that the time is right to let further development of Groovy and Grails be led by other interested parties in the open source community who can best serve the goals of those projects."
Who Might Sponsor Groovy and/or Grails Development?
Speculation: Many organizations benefit from Groovy and Grails, but many probably aren't prepared to invest as fully in their development as G2One, SpringSource, VMware, and Pivotal have. An example of an organization with an obvious vested interest in Groovy's future is Gradleware. Ken Kousen has tweeted and written a blog post on the opportunity of acquiring Groovy and Grails sponsorship.
What does this announcement mean for Groovy's future?
Answer Mixed with Speculation: Based on Laforge's and Rocher's posts, it seems clear that the core developers plan to continue working on Groovy. However, it is understandable that if this effort is not funded (sponsored), it will have to be at a slower pace than before (I have found through personal experience that home projects take a lot longer to complete than paid projects). I believe that Groovy has strong momentum already that will continue for some time. It is vital to Gradle, is used with other open source projects and tools such as SoapUI, and could have a promising future running on Android. I primarily use Groovy for scripting and simple "glue" tools in Java applications. The language is mature and serves these purposes well and I see no reason to stop using it at this time.
What does this mean for the future of the Spring Framework?
Speculation: There is some concern that perhaps Spring Framework could be jettisoned next from Pivotal. This seems unlikely to me, but I didn't expect Pivotal to drop Groovy either. As much as I love Groovy and as much effect on Java and JVM development as I acknowledge it has had, I think Spring Framework has been even more pervasive in Java EE development than Groovy and Grails have been in Java SE and Java EE development. That stated, Pivotal has shown that it is willing to, as most successful businesses are, drop a product offering that is perceived as not benefiting its strategy and bottom line. I can certainly understand if this development concerns users of Spring.
Is Standards-Based More Important than Being Open Source?
Answer: This is a difficult question to answer; it often depends on numerous contextual factors, including the tools being compared, the expected lifespan of the products being built, etc. Fortunately, we often don't have to choose between the two, as many reference implementations in the Java world are also open source. However, a point can be made that any product that is not standard (including commercial or proprietary products) is subject to losing support or no longer being available. The theory is that if standards-based products are used, one can shift to another implementation of that standard if necessary. However, a standard is only as good as its implementations, and if there is only one realistic implementation of a standard, there's not much of an advantage in transferability.

Conclusion

Although I understand Pivotal's motivation for dropping Groovy, I am still sorry to hear the news. I appreciate the effort that key Groovy contributors such as Laforge and Rocher have made, and I appreciate the companies that have sponsored that work. Through this sponsorship and work, we have a really nice language to use for scripting and other purposes. I hope that a sponsor can be found for Groovy, but I plan to continue to use it either way.

Thursday, January 15, 2015

2015 Starts Off Strong for Java 8

JDK 8 is starting 2015 with a surge in popularity in terms of blog posts and articles. This is coinciding with Java being automatically upgraded to JDK 8 this month. In this post, I list and briefly describe some of the numerous articles and posts on JDK 8 that have been published already in 2015.

JDK 8 Streams have been justifiably popular in recent posts. My first blog post of 2015 was Stream-Powered Collections Functionality in JDK 8, and it demonstrates performing some common functions against Java collections with greater ease and conciseness using Streams than was possible before Streams. The post Fail-fast validations using Java 8 streams looks at fluent fail-fast validation of state and has been improved since its original publication based on feedback. The post Java 8: No more loops talks about streams providing concise alternatives to looping on collections. What is the difference between Collections and Streams in Java 8? and Java 8 Streams API as Friendly ForkJoinPool Facade were also posted this month.
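As a small illustration of the style of collection processing these posts cover (my own example, not drawn from any of the linked posts):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamsIllustration
{
   public static void main(final String[] arguments)
   {
      final List<String> languages = Arrays.asList("Java", "Groovy", "Scala", "JRuby");
      // Filter and transform the collection without an explicit loop.
      final List<String> jLanguages = languages.stream()
         .filter(language -> language.startsWith("J"))
         .map(String::toUpperCase)
         .collect(Collectors.toList());
      System.out.println(jLanguages); // [JAVA, JRUBY]
   }
}
```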

Lambda expressions are obviously a big part of JDK 8. The post Java 8 Stream and Lambda Expressions – Parsing File Example demonstrates use of lambda expressions and streams to parse a log file. A quick overview of features new to JDK 8 is available in What Are the Most Important New Features in the Java 8 Release?. The post Java 8 Default Methods Explained in 5 minutes describes JDK 8's default methods.
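A minimal sketch of the default-method feature that last post describes (the Greeter interface is my own invention for illustration):

```java
public class DefaultMethodDemo
{
   interface Greeter
   {
      String name();

      // A default method supplies an implementation in the interface itself,
      // so implementing classes (and lambdas) inherit it for free.
      default String greet()
      {
         return "Hello, " + name() + "!";
      }
   }

   public static void main(final String[] arguments)
   {
      // Greeter has a single abstract method, so a lambda can implement it.
      final Greeter greeter = () -> "Duke";
      System.out.println(greeter.greet()); // Hello, Duke!
   }
}
```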

Daniel Shaya warns of two potential caveats of using JDK 8 functionality in the posts Java8 Sorting - Performance Pitfall and What's Stopping Me Using Java8 Lambdas - Try Debugging Them. Peter Ledbrook reexamines the use of Groovy in light of Java 8 in the post Groovy in light of Java 8.

We are only half-way through the first month of 2015 and JDK 8 continues to see increased adoption and, correspondingly, increased online coverage of its features. Most of the focus seems to be on the functional aspects that JDK 8 brings to Java.

Monday, January 12, 2015

Book Review: Learning Concurrent Programming in Scala

The subtitle of Aleksandar Prokopec's Learning Concurrent Programming in Scala (Packt Publishing, November 2014) is, "Learn the art of building intricate, modern, scalable concurrent applications using Scala." Learning Concurrent Programming in Scala consists of 9 chapters and a little over 300 substantive pages.

I was impressed that Martin Odersky, the creator and lead designer of Scala, wrote a Foreword for Learning Concurrent Programming in Scala, but was even more impressed by what Odersky wrote regarding the book:

The book could not have a more expert author. Aleksandar Prokopec contributed to some of the most popular Scala libraries for concurrent and parallel programming. He also invented some of the most intricate data structures and algorithms. With this book, he created a readable tutorial and, at the same time, an authoritative reference for the area that he has worked in. I believe that Learning Concurrent Programming in Scala will be a mandatory reading for everyone who writes concurrent and parallel programs in Scala.

Preface

I have found the Preface of Packt Publishing books to be a good source of information on what to expect in the book. The 11-page Preface of Learning Concurrent Programming in Scala is full of information about the book. After a few paragraphs on why knowledge of concurrency is important and how this book will help developers learn about concurrency with Scala, the Preface describes how the book is organized. Several paragraphs here describe how the book approaches the coverage it provides and states that the "goal of this book is not to give a comprehensive overview of every dark corner of the Scala concurrency APIs. Instead, this book will teach you the most important concepts of concurrent programming."

The "What this book covers" section of the Preface states that "the book covers the fundamental concurrent APIs that are a part of the Scala runtime, introduces more complex concurrency primitives, and gives an extensive overview of high-level concurrency abstractions." This section then provides brief descriptions of each of the book's nine chapters.

The "What you need for this book" section of the Preface states that the Java Development Kit (JDK) and Simple Build Tool (SBT) are needed for the examples. This section also states that no specific IDE or text editor is assumed. Another section of the Preface explains how to install JDK 7.

The "Installing and using SBT" section of the Preface describes the Simple Build Tool (SBT) as "a command-line build tool used for Scala projects" and explains how to download and install SBT. It then describes and demonstrates creating an SBT project and writing and running a HelloWorld.scala example.

The Preface of Learning Concurrent Programming in Scala contains step-by-step instructions on how to reference and reload external libraries in SBT. The section on using SBT also demonstrates how to ensure that "most of the examples [in the book run] in the same JVM instance as SBT itself." The section of the Preface titled "Using Eclipse, IntelliJ IDEA, or another IDE" briefly discusses the virtues of using a Java IDE but also adds a caveat about running the book's examples in an IDE: "editors such as Eclipse and IntelliJ IDEA run the program inside a separate JVM process."

The "Who this book is for" section of Packt Prefaces is often a good source of information on who the author had in mind when he or she wrote the book. This section of Learning Concurrent Programming in Scala states:

This book is primarily intended for developers who have learned how to write sequential Scala programs, and wish to learn how to write correct concurrent programs. The book assumes that you have a basic knowledge of the Scala programming language.

The Preface of Learning Concurrent Programming in Scala adds, "Even with an elementary knowledge of Scala, you should have no problem understanding various concurrency topics." It also states that "a basic understanding of object-oriented or functional programming should be a sufficient prerequisite" and that "this book is a good introduction to modern concurrent programming in the broader sense."

I spent a relatively large amount of time in this review on the longer-than-normal Preface because I believe it advertises well what potential readers would want to know about Learning Concurrent Programming in Scala.

Chapter 1: Introduction

The initial chapter of Learning Concurrent Programming in Scala "explains the basics of concurrent computing and presents some Scala preliminaries required for this book." The chapter begins with a nice introduction to concurrent computing, what it is, why it is desirable, and how it is different from distributed computing. The chapter looks at some of the issues facing low-level ("traditional") concurrency constructs before the section "Modern concurrency paradigms" blends an introduction to modern concurrency paradigms and their common characteristics with descriptions of which chapters in the book discuss each paradigm as implemented in Scala in more detail.

Chapter 1's section "The Advantages of Scala" explains three reasons that Scala's "support for concurrent programming is rich and powerful." The chapter provides a brief explanation of "how Scala programs are typically executed" before presenting "a Scala primer" in 4 1/2 pages.

Chapter 2: Concurrency on the JVM and the Java Memory Model

Chapter 2 of Learning Concurrent Programming in Scala covers the "lower-level primitives" upon which "most, if not all, higher-level Scala concurrency constructs are implemented." The chapter explains "the cornerstones of concurrency on the JVM" and discusses "how they interact with some Scala-specific features."

The second chapter introduces threads and processes, describes them, and explains how they are related. Another section of the chapter explains JVM threads, how they are related to operating system threads, and how Scala's threading is JVM threading. The section provides an introduction to starting and terminating threads in Scala. It also explains why "most multithreaded programs are nondeterministic".

Learning Concurrent Programming in Scala's second chapter discusses atomic operations, race conditions, and use of the synchronized keyword. There is also good coverage of deadlock, what causes deadlock, and how to avoid deadlock. The chapter also covers other basic low-level Java/Scala concurrency concepts such as guarded blocks, interrupted threads, and graceful shutdown. Chapter 2's coverage of volatile introduces the concept, compares Java's and Scala's use of it, and compares use of volatile to synchronized.

Chapter 2 concludes with coverage of the Java Memory Model (JMM), immutable objects, and final fields. This coverage describes differences in Java's final and Scala's final and looks at some other Scala language design features related to concurrency. The point of this final portion of Chapter 2 is to establish that "the only way to correctly reason about the semantics of a multithreaded program is in terms of happens-before relationships defined by the JMM."

Chapter 3: Traditional Building Blocks of Concurrency

Learning Concurrent Programming in Scala's third chapter begins by explaining that the "concurrency primitives" covered in Chapter 2 are typically avoided because "their low-level nature makes them delicate and prone to errors" and undesirable effects such as "data races, reordering, visibility, deadlocks, and nondeterminism." This introduction explains that the third chapter demonstrates how to use "more advanced building blocks of concurrency" that "capture common patterns in concurrent programs and are a lot safer to use."

Chapter 3 introduces the Executor as an abstraction that "allows programmers to encapsulate the decision of how to run concurrently executable work tasks." It then specifically focuses on the ForkJoinPool implementation of Executor and ExecutorService. This section on declaring concurrent executions includes discussion of Scala's specific ExecutionContext.

The section of Chapter 3 on working with data in a concurrent environment begins with discussion of "atomic variables that provide basic support for executing multiple memory reads and writes at once." The chapter defines atomic variables as "close cousins" of volatile variables that "are more expressive than volatile variables" and "are used to build complex concurrent operations without relying on the synchronized statement." This section discusses compare-and-set (AKA compare-and-swap) and calls CAS "a fundamental building block for lock-free programming." The Scala-specific @tailrec annotation is also introduced here.
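The compare-and-set idiom described there can be sketched in Java (this blog's usual language) with java.util.concurrent.atomic; the CasCounter class is my own illustration, not an example from the book:

```java
import java.util.concurrent.atomic.AtomicLong;

public class CasCounter
{
   private final AtomicLong total = new AtomicLong();

   // Lock-free add: read the current value, compute the new one, and
   // retry if another thread changed the value in between (the CAS loop).
   public long addAndGetTotal(final long amount)
   {
      while (true)
      {
         final long current = total.get();
         final long updated = current + amount;
         if (total.compareAndSet(current, updated))
         {
            return updated;
         }
      }
   }

   public static void main(final String[] arguments)
   {
      final CasCounter counter = new CasCounter();
      System.out.println(counter.addAndGetTotal(40)); // 40
      System.out.println(counter.addAndGetTotal(2));  // 42
   }
}
```

Scala's @tailrec annotation, mentioned above, is typically applied to the recursive formulation of this same retry loop.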

Chapter 3's section "Lock-free programming" discusses the potential advantages realized via lock-free programming, but also explains and demonstrates why it is not always easy to write lock-free code or even prove that code is lock free. For example, the section warns of conditions with implicit locks.

There is a section in Chapter 3 called "Implementing locks explicitly" that begins with the reminder that there are times when "we really do want locks" and points out that "atomic variables allow us to implement locks that do not have to block the caller." To illustrate these points, this portion of the chapter introduces the "concurrent filesystem API" example.

Chapter 3 features a section on the "ABA problem." The author acknowledges that there is "no general technique to avoid the ABA problem," but provides some "guidelines" for "avoiding the ABA problem in a managed runtime."

There is a section of Chapter 3 devoted to Scala's lazy values. The author explains that "Lazy values are extremely useful in practice, and you will often deal with them in Scala," but warns that "using them in concurrent programs can have some unexpected interactions." A couple of important observations are explained and highlighted here:

  1. "Cyclic dependencies between lazy values are unsupported in both sequential and concurrent Scala programs. The difference is that they potentially manifest themselves as deadlocks instead of stack overflows in concurrent programming."
  2. "Never call synchronized on publicly available objects; always use a dedicated, private dummy object for synchronization."

The "Concurrent collections" section of Chapter 3 demonstrates why "predicting how multiple threads affect the collection state in the absence of synchronization is neither recommended nor possible." This section examines a couple of approaches (immutable collections and use of synchronized) and their weaknesses before moving into discussion of concurrent collections. Regarding these concurrent collections, the author states, "Conceptually, the same operations can be achieved using atomic primitives, synchronized statements, and guarded blocks, but concurrent collections ensure far better performance and scalability."

Chapter 3 features subsections of the "Concurrent collections" section that focus on concurrent queues (BlockingQueue) and concurrent sets and maps (introduces asScala). The "Concurrent traversals" subsection introduces Scala's TrieMap for collection iteration in a concurrent environment.

Chapter 3 wraps up with coverage of "creating and handling processes" using the scala.sys.process package to work with processes as a concurrency alternative to threads. This coverage includes an introduction to the ! and !! methods for running a process and returning its exit code or its standard output, respectively.

Chapter 4: Asynchronous Programming with Futures and Promises

As Learning Concurrent Programming in Scala's Preface states, Chapter 4 "is the first chapter that deals with a Scala-specific concurrency framework," futures and promises. The chapter describes futures, describes when they are useful, distinguishes between future values and future computations, describes callbacks on futures versus functional composition on futures, and introduces flatMap as a basic example of a Scala combinator.

Chapter 4 introduces the Promise in relation to the Future: "A promise and a future represent two aspects of a single-assignment variable: the promise allows you to assign a value to the future object, whereas the future allows you to read that value." The chapter also describes how to "use promises to bridge the gap between callback-based APIs and futures" and "use promises to extend futures with additional functional combinators."
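
The single-assignment relationship the author describes has a close JDK analog (my own comparison, not an example from the book): a CompletableFuture exposes both the promise side (complete) and the future side (join) on one object:

```java
import java.util.concurrent.CompletableFuture;

public class PromiseDemo
{
   public static void main(final String[] arguments)
   {
      // The "promise" side: a single-assignment variable that one thread completes...
      final CompletableFuture<String> promise = new CompletableFuture<>();
      new Thread(() -> promise.complete("assigned exactly once")).start();

      // ...and the "future" side: other code reads the value once it is available.
      System.out.println(promise.join());

      // A second complete() has no effect: the value is single-assignment.
      promise.complete("ignored");
      System.out.println(promise.join());
   }
}
```

Both println calls print the originally assigned value, illustrating the single-assignment semantics the chapter describes for promises and futures.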

Chapter 4 includes coverage of Scala Async, which is described as "a convenient library for futures and promises that allows expressing chains of asynchronous computations more conveniently." The author adds that Scala Async is "currently not a part of the Scala standard library." The chapter concludes with very brief coverage of some alternative frameworks implementing futures and promises in Scala.

Chapter 5: Data-Parallel Collections

The subject of Chapter 5 is data parallelism. The chapter provides an overview of the Scala Collections framework, differentiates between mutable and immutable collections, and describes using the par method to get parallel collections.
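
Java developers can get a feel for the par idea through the closest JDK analog (my own comparison, not the book's): the parallel() and parallelStream() methods play a role similar to Scala's par method on collections:

```java
import java.util.stream.IntStream;

public class ParallelSum
{
   public static long sumOfSquares(final int n)
   {
      // .parallel() is the JDK analog of obtaining a parallel view of a collection;
      // the pipeline is otherwise identical to its sequential counterpart
      return IntStream.rangeClosed(1, n).parallel().mapToLong(i -> (long) i * i).sum();
   }

   public static void main(final String[] arguments)
   {
      System.out.println(sumOfSquares(10));  // 385
   }
}
```

As in Scala, the appeal is that the parallel pipeline reads the same as the sequential one; the caveats the chapter covers (non-associative operators, side effects) apply equally here.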

Chapter 5 also looks at characteristics of the JVM and of modern computer hardware that affect concurrency and performance. It discusses why these characteristics can make it difficult to accurately measure performance.

The "Caveats of parallel collections" section of Chapter 5 describes "non-parallelizable collections," "non-parallelizable operations," "Side effects in parallel operations," "nondeterministic parallel operations," and "Commutative and associative operators." There are also sections on "Using parallel and concurrent collections together" and "Implementing custom parallel collections."

The chapter ends with a section on "Alternative data-parallel frameworks" that discusses the issues associated with autoboxing when trying to use Scala collections with primitives. This section introduces Scala Macros and the ScalaBlitz Collections Framework. The author does provide a caveat: "ScalaBlitz was in the early stages of development at the time of writing this book, and macros are partly an experimental feature of Scala."

Chapter 6: Concurrent Programming with Reactive Extensions

The sixth chapter of Learning Concurrent Programming in Scala states that the "one disadvantage of futures is that they can only deal with a single result." It introduces event-driven programming, reactive programming and Reactive Extensions. The chapter covers Observables and Subscribers and some of the nuances of using Observables in a fair amount of detail. The Scheduler is also covered with extra focus on writing custom Schedulers. The chapter concludes with coverage of Subject, which it describes as "simultaneously an Observable object and an Observer object."

Chapter 7: Software Transactional Memory

Learning Concurrent Programming in Scala's seventh chapter states that "the disadvantage of using locks is that they can easily cause deadlocks" before introducing Software Transactional Memory (STM), which it describes as "a concurrency control mechanism for controlling access to shared memory, which greatly reduces the risk of deadlocks and races." The author explains that STM provides the best of atomic variables and synchronized code blocks. The particular STM implementation focused on in Chapter 7 is ScalaSTM and the chapter provides fairly detailed coverage of different issues to consider when working with transactional memory.

Chapter 8: Actors

The author opens Chapter 8 of Learning Concurrent Programming in Scala by explaining that the actor model applies both to applications using shared memory and to distributed applications, whereas the techniques covered in the preceding chapters are limited to shared-memory applications. The implementation of the actor model focused on in Chapter 8 is Akka's. The chapter covers quite a few considerations when using Akka Actors and provides references to sources of additional information.

Chapter 9: Concurrency in Practice

The stated goal of the final chapter of Learning Concurrent Programming in Scala is "to introduce the big picture of concurrent programming." This includes a summary of the "plethora of different concurrency facilities" covered in the book. This summary presents tables that compare concurrency concepts covered in the book (categorized as "data abstractions" or "concurrency frameworks") in terms of data storage, data access, concurrent computations, and concurrent execution. The author's brief analysis of these tables leads to a bullet-formatted "summary of what different concurrency libraries are good for." This is probably my favorite section of the book and I like the highlighted point made in this section: "There is no one-size-fits-all technology. Use your own best judgment when deciding which concurrency framework to use for a specific programming task."

After Chapter 9's useful summary of the concurrency constructs and frameworks covered earlier in the book, the chapter moves on to a "remote file browser" sample application to demonstrate bringing the book's concepts together. I was happy to see the author explicitly point out that although this particular example brought all of the covered concepts into play intentionally, most realistic applications should not use all of them.

After presenting the summary of topics covered in previous chapters of the book and a demonstrative example of using those topics, Chapter 9 transitions to a section on "debugging concurrent programs." This section describes "some of the typical causes of errors in concurrent programs" and discusses "different methods of dealing with them." The specific areas of focus are deadlocks (including demonstration of VisualVM with color screen snapshots), incorrect output, and performance issues.

General Observations

  • Although Learning Concurrent Programming in Scala is best suited for developers comfortable with Scala in sequential development, it contains details that may appeal to Java developers and developers of other JVM-based languages. In particular, the first three chapters provide useful details that apply generally to JVM-based programming languages with only a few Scala-specific mentions. As evidenced by the length of my review of Chapter 3, I believe this is particularly true of Chapter 3.
  • Learning Concurrent Programming in Scala contains several graphics to illustrate points being made. The focus of these graphics is definitely more on content and substance than on presentation. The graphics tend to be simple drawings in black on white (or grayscale) even in the PDF version, but there are some color screen snapshots.
  • Learning Concurrent Programming in Scala's chapters each tend to end with references to a few other resources (typically other books or Scala's or the framework's online documentation) on the subject covered in the chapter. This is useful because although the book is fairly detailed in its coverage of each framework and approach, there is more information on each framework or approach available than can fit in a single chapter.
  • Each chapter of Learning Concurrent Programming in Scala includes "Exercises" for the reader to evaluate what they've learned from the chapter.
  • Code listings in Learning Concurrent Programming in Scala are black font on white background with no color syntax highlighting and no line numbers.
  • I have found Packt Publishing books to cover a wide spectrum in terms of language clarity and finish, from very well edited, polished books (such as Java EE 7 with GlassFish 4 Application Server) to some that seem to have had barely any editing at all. Learning Concurrent Programming in Scala is one of the more polished and better edited Packt Publishing books that I've read; although it has a couple of awkward sentences and typos, they are few and far between.
  • I'll quote again from Scala creator and expert Martin Odersky's Foreword regarding Learning Concurrent Programming in Scala because he can obviously judge a book on Scala better than I and because he summarizes my less-informed opinion on this book, "With this book, [Aleksandar Prokopec] created a readable tutorial at the same time and an authoritative reference for the area that he had worked in. I believe that Learning Concurrent Programming in Scala will be a mandatory reading for everyone who writes concurrent and parallel programs in Scala."

Conclusion

Learning Concurrent Programming in Scala delivers on its advertisement in the Preface: "By reading this book, you will gain both a solid theoretical understanding of concurrent programming, and develop a set of useful practical skills that are required to write correct and efficient concurrent programs." The early chapters do provide the introductory material and background needed for a "solid theoretical understanding of concurrent programming" and the middle and later chapters introduce tips and suggestions that help readers to understand the considerations to be made when writing concurrent programs. Although Learning Concurrent Programming in Scala is obviously focused primarily on concurrent programming with the Scala language and Scala frameworks, some of the covered concepts and topics (particularly in the first part of the book) are relevant for Java and JVM developers.

Tuesday, January 6, 2015

Book Review: Mockito Essentials

The subtitle of Sujoy Acharya's Mockito Essentials (Packt Publishing, October 2014) is: "A practical guide to get you up and running with unit testing using Mockito." The Preface and seven chapters in Mockito Essentials span approximately 190 substantive pages.

Preface

In the Preface, author Sujoy Acharya writes that Mockito Essentials "is an advanced-level guide that will help software developers to get complete expertise in unit testing using Mockito as the mocking framework." This Preface contains short summaries (typically two or three sentences) of each of the book's seven chapters.

The Preface's section "What you need for this book" lists the software needed to run examples provided in Mockito Essentials and provides links to the versions used in the book (referenced as "latest" at time of writing for some of these). These products include Mockito, JDK 7 or higher, and Eclipse (Luna 4.4.1). I would add that one also needs JUnit for most/all of the examples, PowerMock for some of the examples, and Java EE servlet JARs for some of the examples.

I quote the entire "Who this book is for" section from Mockito Essentials's Preface here because it provides a good idea of what is expected of the reader and whom the book is written for:

This book is for advanced to novice level software testers/developers using Mockito in the JUnit framework, with a reasonable knowledge level and understanding of unit testing elements and applications. It is ideal for developers who have some experience in Java application development as well as some basic knowledge of JUnit testing, but it covers the basic fundamentals of JUnit testing and the Mockito framework to get you acquainted with these concepts before using them.

Mockito Essentials's Preface also states that a PDF can be downloaded with versions of the book's graphics in color. I downloaded this PDF from the provided link and verified that most of the images are in color. I was also happy to see that the PDF version of the book that I reviewed already had these graphics in color. For those with printed copies of the book, however, this separate PDF with colored graphics could be helpful.

Chapter 1: Exploring Test Doubles

Mockito Essentials's initial chapter, for the most part, does not cover Mockito specifically other than referencing when general unit testing practices and concepts are implemented by Mockito. Instead, the first chapter provides an overview of unit testing in general. The chapter begins with a look at why unit testing is valuable and identifies the characteristics commonly associated with effective unit tests. This short section is useful for those new to unit testing, but could probably be skipped by those familiar with unit testing concepts.

The next major section of the first chapter is "Understanding test doubles" and it's much lengthier than the first section on unit testing advantages and effective unit testing characteristics. This second section provides code listings and text explanations of the types of test doubles (term coined in XUnit Test Patterns) described in the chapter: dummy objects, stubs, spies, mock objects, and fake objects.

Chapter 2: Socializing with Mockito

Because the initial chapter of Mockito Essentials is about generic unit testing, Chapter 2 is the first chapter of the book to focus on Mockito. The chapter begins by providing links to both the Mockito main page and wiki on GitHub and describing Mockito and its open source (MIT) license.

Chapter 2's section on "Exploring unit test qualities" looks at "principles for readability, flexibility, and maintainability" in unit tests. Some of this content repeats ideas from the first chapter, but it's a quick section. The section "Realizing the significance of Mockito" discusses how Mockito addresses "testing-unfriendly behaviors" and interactions "with testing-unfriendly external objects" by mocking those things so the unit tests don't have to be hampered by them.

The "Working with Mockito" section of Chapter 2 starts off by displaying the Mockito logo (in color in PDF version) and then dives into specific basics of using Mockito. This section covers downloading Mockito and configuring it as a dependency in Eclipse, Maven, and Gradle. The subsection on "Stubbing method calls" provides an example of an application for testing that consists of a jQuery client that communicates with a back-end that appears to be based on Spring Web MVC. The example then demonstrates using Mockito to mock and stub classes used by the back-end class to be tested. Code demonstrates using Mockito.mock(Class) or using static imports so that it can be simply called as mock(Class). This section also introduces use of the @Mock annotation.

Chapter 2 introduces Mockito's "trigger" method when(T) along with the associated "trigger action" methods thenReturn(-), thenThrow(-), thenAnswer(-), and thenCallRealMethod(-). Chapter 2 provides an example of using a unit test method annotated with JUnit 4's @Test(expected=...) along with Mockito's thenThrow method.
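
As a minimal sketch of that stubbing style (my own example, assuming Mockito is on the classpath; the mocked List here stands in for any collaborator):

```java
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import java.util.List;

public class StubbingSketch
{
   public static void main(final String[] arguments)
   {
      @SuppressWarnings("unchecked")
      final List<String> mockedList = (List<String>) mock(List.class);
      // The "trigger" method when(T) paired with the "trigger action" thenReturn(-)
      when(mockedList.get(0)).thenReturn("stubbed value");
      System.out.println(mockedList.get(0));  // stubbed value
   }
}
```

Swapping thenReturn for thenThrow would make the stubbed call raise an exception instead, which pairs naturally with JUnit 4's expected-exception testing.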

Mockito Essentials's second chapter illustrates use of and explains Mockito's argument matchers and references org.mockito.Matchers documentation. It then introduces ArgumentMatcher as a "Hamcrest matcher with the predefined describeTo() method" that "allows us to create our own custom argument matchers." The chapter then describes and illustrates the use of JUnit 4 with some common Hamcrest matchers such as equalTo, is, not, either, both, anyOf, and allOf.

The section in Chapter 2 called "Verifying method calls" discusses use of Mockito's static method verify to "verify the invocation" of a method on a mock object and describes situations where this might be desirable.

Chapter 2's final section ("Understanding the Mockito architecture") may have been the most (pleasantly) surprising one for me. I like the author's use of a sequence diagram to illustrate how Mockito uses CGLib (Byte Code Generation Library) to "[apply] the proxy design pattern to create mock objects." I also like that the author provides explanations and code listings that demonstrate how to "create a custom mocking framework to handle external dependencies" with Java reflection and dynamic proxies. Most readers trying to learn basics of Mockito probably don't require this knowledge, but I think it's helpful to understand the tool at the deeper level that this section provides.
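
That proxy-based approach can be sketched in a few lines of plain JDK code. This is my own minimal version, not the book's listing; the TinyMock and PriceService names are hypothetical:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class TinyMock
{
   /** Creates a "mock" whose stubbed methods return canned values, keyed by method name. */
   @SuppressWarnings("unchecked")
   public static <T> T mock(final Class<T> type, final Map<String, Object> cannedReturns)
   {
      // A dynamic proxy intercepts every interface method call and consults the stub table.
      final InvocationHandler handler =
         (proxy, method, args) -> cannedReturns.get(method.getName());
      return (T) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[]{type}, handler);
   }

   /** Hypothetical collaborator interface used only for this demonstration. */
   public interface PriceService
   {
      Double priceOf(String item);
   }

   public static void main(final String[] arguments)
   {
      final Map<String, Object> stubs = new HashMap<>();
      stubs.put("priceOf", 9.99);
      final PriceService service = mock(PriceService.class, stubs);
      System.out.println(service.priceOf("widget"));  // 9.99
   }
}
```

A real mocking framework adds invocation recording, argument matching, and verification on top of this same interception idea (and uses bytecode generation so it can also mock classes, not just interfaces).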

Chapter 3: Accelerating Mockito

The third chapter of Mockito Essentials is intended to cover more advanced Mockito topics and begins by addressing the well-known issue of unit testing void methods (including throwing exceptions from void methods and void method callbacks). This part of the chapter also looks at doNothing(), doReturn(), ArgumentCaptor, and InOrder.

Chapter 3 features a section on "spying objects" that states, "A Mockito spy allows us to use real objects instead of mocks by replacing some of the methods with stubbed ones. This behavior allows us to test the legacy code." Text and code listings demonstrate use of Mockito's spy facility and there is a warning to use doReturn() instead of thenReturn() when working with Mockito Spy.

Chapter 3's section "Exploring Mockito Annotations" looks at three Mockito annotations (@Captor, @Spy, and @InjectMocks). The section "Changing the default Mockito settings" describes configuration of default values returned by "nonstubbed methods of a mock object" using the five available values of the Answers enum.

Chapter 3 introduces Mockito.reset(T...) and provides a caution regarding its use similar to that in the method's Javadoc documentation. A short section of Chapter 3 covers inline stubbing. Another short section describes use of Mockito.mockingDetails (introduced in Mockito 1.9.5) to determine if an object is a mock or spy.

Chapter 4: Behavior-driven Development with Mockito

The fourth chapter of Mockito Essentials opens with the introductory sentence: "This chapter explores Behavior-driven Development (BDD) and how BDD can help you minimize project failure risks." The chapter describes top-down and bottom-up approaches and problems with each to set the context for BDD. The chapter then introduces behavior-driven development with references to Martin Fowler's TestDrivenDevelopment and domain driven design and to Agile Methods for Software Development. The chapter then references and summarizes Dan North's Introducing BDD.

After summarizing BDD, Chapter 4 moves on to "exercising BDD with Mockito." This section introduces BDDMockito and its static given(T) method. An example of using this class and method is included and the BDD-supporting Mockito syntax is briefly described.

Mockito Essentials's Chapter 4's coverage of Mockito BDD support is a relatively small part of the chapter. For developers entirely new to BDD, the entire chapter is worth reading to get an overview of the problems BDD is designed to address. For those familiar with BDD concepts already who just want to see how Mockito can be used to implement BDD testing, the last 3 pages of the chapter should be sufficient. For developers not interested in BDD, the entire chapter could be skipped.

Chapter 5: Unit Testing the Legacy Code with Mockito

Chapter 5 of Mockito Essentials begins with an introductory description of legacy code, references and quotes from the book Working Effectively with Legacy Code, and describes why legacy code can be so difficult to work with. The chapter then describes how testing frameworks and the Java language can require developers to change otherwise good designs for testability. Given this challenge, the chapter introduces PowerMock.

Mockito Essentials's fifth chapter states, "Mockito could do the things PowerMock does, but it doesn't because those are test smells and strong indications that you are following a poor design." The author goes on to describe some of the typical ways code can be refactored to be more testable without use of PowerMock. The author then asserts, "PowerMock is a fallback for legacy code that they should aim to stop using with time." With these caveats stated, the chapter does a nice job of concisely describing what PowerMock is and how it is able to provide "its special mocking capabilities."

The fifth chapter provides links for information on PowerMock and for downloading PowerMock and then describes using PowerMockito. The chapter features several sections that describe how to apply "mocking capabilities of PowerMockito for untestable constructs" such as stubbing static methods, suppressing static blocks, suppressing a superclass constructor and class's own constructor, suppressing methods, stubbing private methods and final methods, and mocking final classes.

The fifth chapter's section on "designing for testability with Mockito" covers, in the author's words, "the design for testability, or rather, things to avoid in code." This section is not really specific to Mockito because it covers issues common to most mocking frameworks and to unit testability in general. The discussion is useful in describing code patterns and idioms that are not mockable and presenting one or more alternatives for making them mockable. Mockito is mentioned specifically, but mostly to remind the reader that the code constructs to be tested must be refactored before mocking with Mockito is possible. The section repeatedly emphasizes that placing "testing impediments" inside unmockable code constructs prevents them from being unit tested, while moving those testing impediments into mockable code allows the other parts to be unit tested with the impediments mocked rather than dealt with directly.

Chapter 6: Developing SOA with Mockito

The sixth chapter of Mockito Essentials opens by stating that the chapter "explores web services, web service styles—SOAP-based and RESTful, web service components, and building and unit testing SOAP and RESTful web services with Mockito." The chapter begins with a brief summary of Service-Oriented Architecture (SOA) and the advantages and characteristics often associated with SOA. It moves from this brief introduction to SOA to web services with the segue that "SOA can rely on web services for interoperability between heterogeneous applications and technologies."

Chapter 6's introduction to web services presents basic characteristics of web services without distinction between SOAP-based web services and REST-based web services. It then introduces JAX-WS and JAX-RS.

Mockito Essentials's sixth chapter begins its deeper dive into SOAP-based web services by listing and briefly describing characteristics of WSDL and briefly describing the two most common approaches to building SOAP-based web services with JAX-WS (top-down/contract-first and bottom-up/Java-first). The section on JAX-WS development provides thorough coverage, with text and screen snapshots, of how to use Eclipse with Apache Tomcat and Apache Axis to write and deploy a JAX-WS/SOAP-based web service and client. This section also describes and illustrates refactoring the code to make it more testable and then testing it and using Mockito for mocking. I have found that the tools are what make working with JAX-WS bearable, so it's not surprising that this is a tool-heavy section and one of the few sections of Mockito Essentials where Eclipse-specific behavior is significant to the narrative.

Chapter 6 also has an in-depth look at developing and testing a REST-based web service with JAX-RS. This section begins with a high-level overview of REST and major concepts that are fundamental to REST such as HTTP, URIs, HTTP status, HATEOAS, etc. Coverage then moves to "building a RESTful web service with Spring Framework." An early sentence in this section states, "This section describes the Spring MVC architecture and how RESTful web applications can be unit tested using Spring MVC." Like the section on JAX-WS, this section provides a thorough overview of developing and testing a JAX-RS/Spring-based RESTful web service using Mockito to mock certain aspects.

Chapter 7: Unit Testing GWT Code with Mockito

The final chapter of Mockito Essentials "provides an overview of Ajax/GWT, explains the Model View Presenter (MVP) pattern and loose coupling, and provides examples and strategies to mock GWT widgets using Mockito." The chapter begins with an introduction to Ajax and an example of JavaScript code using Ajax (XMLHttpRequest).

Chapter 7 describes how Google Web Toolkit (GWT) can be appealing because it hides some of JavaScript's quirks in terms of development and testing and lists several advantages of GWT. The section "Learning the MVP pattern" describes using GWT to implement an application with a Model-View-Presenter design pattern and provides background explanation regarding MVP.

Chapter 7's section "Developing a GWT application using MVP" demonstrates use of Eclipse to create a web application using Google Web Toolkit, compiling the Java code into JavaScript code, and building the overall application. This is a lengthy and detailed section that could be seen as a tutorial on using GWT. It's a separate section, "Unit testing the GWT code," that addresses unit testing the GWT code (and using Mockito to do so). In addition to discussing use of PowerMockito with GWT testing, this section introduces GWTMockUtilities and GWTTestCase. I've had little exposure to Google Web Toolkit and did not realize its significant support for unit testing. I also appreciated this section's reference to HtmlUnit.

The "Summary" section of Chapter 7 is really a book summary more than a chapter summary.

General Observations

  • The code listings in the PDF version of Mockito Essentials that I reviewed are black font on white background with no color syntax highlighting and no line numbers. There is bold emphasis in many of the Java listings for Java keywords, class attributes' names, variables' names, and for literal strings.
  • Although Eclipse is the IDE used and referenced by the author, a Java developer should be able to use his or her favorite IDE. Most of the references to Eclipse are easily translated to other modern Java IDEs such as NetBeans and IntelliJ IDEA. The notable exceptions to this are the demonstrations of using Eclipse to generate JAX-WS artifacts and to generate Google Web Toolkit applications.
  • Although most of Mockito Essentials is relatively easy to read (I have included several direct quotes in this review to try to establish the relatively easy-to-understand writing style), there are some typos and significantly awkward sentences that can make a few things a bit more difficult to understand and lead me to believe another edit would have been in order. Here are some examples to provide an idea of the level of the typos and awkward sentences:
    • A method name "FindalMethodDependency" is referenced (extra lowercase "d")
    • "This is the better way is to refactor the source and make more test friendly."
    • "Building an application in an unplanned way suffers many problems, such as adding new features, making a huge effort as the architecture becomes rigid, maintaining the software (activities such as bug fixing) can turn into a nightmare, white box testing or unit testing the code becomes very difficult, and conflict and integration issues when many people work with the same or similar features."
    • "@Test(execpted=)" (a juxtaposition of characters I often type myself and the reason I left the IDE's code completion handle this one)
    • "Sometimes, we cannot unit test our code, as the special Java constructs hide the testing impediments (a LAN connection or database connection in a private method, final method, static method, or initialization block), such as private methods, final methods and classes, static methods and initialization blocks, new operator, and so on." (the constructs in parentheses are the "testing impediments" and the constructs at the end of the sentence are the "special Java constructs")
  • I liked that Mockito Essentials contains links to related tutorials, blog posts, articles, and tools' web sites. This is especially handy in the electronic edition with easy copy-and-paste.
  • Despite its title of Mockito Essentials, this book covers more than Mockito essentials.
    • It provides relatively substantial introductions to SOAP-based and REST-based web services and to developing web applications with Google Web Toolkit and Spring MVC. Although these lengthy introductions ultimately lead to some discussion about unit testing and mocking those types of applications, there is a significant amount of time spent on the development before even getting to the testing. This can be seen as a positive or a negative depending on the reader's perspective. If a potential reader knows very little about one of these, he or she might appreciate the significant background. If a reader already knows these well or doesn't care to learn them, these sections are probably extraneous.
    • For the reader interested in "core Mockito," the most interesting chapters of Mockito Essentials will be Chapter 1, Chapter 2, Chapter 3, and Chapter 5. Chapter 1 is more general than simply Mockito and provides good background for those new to mocking and Mockito and could probably be skipped by someone with basic familiarity with mocking and Mockito.
    • Chapter 4 will be of most interest to those practicing behavior-driven development (BDD) or interested in learning more about it and potentially practicing it. It's a relatively short chapter and provides an interesting BDD discussion and practical examples.
    • Chapter 6 will be of primary interest to those developing and testing web services (SOAP-based or REST-based). My guess is that most developers interested in this chapter will already be familiar with JAX-RS or JAX-WS and won't need the detailed introduction to generation of web services with those APIs, but that introductory information might be useful to the reader with less familiarity who wants a taste of web service development in Java.
    • Chapter 7, like the previous chapter on web services, is going to be of most interest to developers who use Google Web Toolkit. As with the web services chapter, the extensive description of generating a GWT-based application is probably not necessary for most of those folks.
    • I liked some of the broad categories and topics covered in Mockito Essentials, but think it's important to emphasize that the book, at times, is broader than Mockito. Although this is always done with the intent of providing examples of using Mockito to mock portions of the generated examples, there are chapters where the general development discussion is longer than the unit testing discussion.
  • I am comfortable recommending Mockito Essentials for the developer who wishes to learn more about basics and uses of Mockito. The most significant caveat for me in making this recommendation is that Mockito provides outstanding online documentation and many of Mockito's most common use cases are well described on its API page (contains 22 code snippets and descriptions as of this writing).

Conclusion

Mockito Essentials covers the basics of Mockito and presents some realistic examples of how Mockito can be used to mock portions of Java-based applications that would otherwise violate fundamental tenets of unit testing and make unit tests less effective. Mockito Essentials provides detailed examples of applying Mockito with other tools and frameworks such as PowerMock, Google Web Toolkit, JAX-WS, and JAX-RS. Along the way, many of the commonly accepted practices for writing effective tests and for effective mocking are introduced and explained.

Monday, January 5, 2015

Stream-Powered Collections Functionality in JDK 8

This post presents application of JDK 8-introduced Streams with Collections to more concisely accomplish commonly desired Collections-related functionality. Along the way, several key aspects of using Java Streams will be demonstrated and briefly explained. Note that although JDK 8 Streams provide potential performance benefits via parallelization support, that is not the focus of this post.
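
As a quick taste of the conciseness this post explores, here is a filter-and-collect that replaces the traditional "new list, loop, if, add" idiom (an introductory snippet of my own using plain Strings; the StreamsTaste class name is hypothetical):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamsTaste
{
   public static List<String> titlesStartingWith(final List<String> titles, final String prefix)
   {
      // filter + collect replaces an explicit loop with a conditional add
      return titles.stream()
                   .filter(title -> title.startsWith(prefix))
                   .collect(Collectors.toList());
   }

   public static void main(final String[] arguments)
   {
      final List<String> titles = Arrays.asList("Raiders of the Lost Ark", "Rocky", "Alien");
      System.out.println(titlesStartingWith(titles, "R"));  // [Raiders of the Lost Ark, Rocky]
   }
}
```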

The Sample Collection and Collection Entries

For purposes of this post, instances of Movie will be stored in a collection. The following code snippet is for the simple Movie class used in these examples.

Movie.java
package dustin.examples.jdk8.streams;

import java.util.Objects;

/**
 * Basic characteristics of a motion picture.
 * 
 * @author Dustin
 */
public class Movie
{
   /** Title of movie. */
   private String title;

   /** Year of movie's release. */
   private int yearReleased;

   /** Movie genre. */
   private Genre genre;

   /** MPAA Rating. */
   private MpaaRating mpaaRating;

   /** imdb.com Rating. */
   private int imdbTopRating;

   public Movie(final String newTitle, final int newYearReleased,
                final Genre newGenre, final MpaaRating newMpaaRating,
                final int newImdbTopRating)
   {
      this.title = newTitle;
      this.yearReleased = newYearReleased;
      this.genre = newGenre;
      this.mpaaRating = newMpaaRating;
      this.imdbTopRating = newImdbTopRating;
   }

   public String getTitle()
   {
      return this.title;
   }

   public int getYearReleased()
   {
      return this.yearReleased;
   }

   public Genre getGenre()
   {
      return this.genre;
   }

   public MpaaRating getMpaaRating()
   {
      return this.mpaaRating;
   }

   public int getImdbTopRating()
   {
      return this.imdbTopRating;
   }

   @Override
   public boolean equals(Object other)
   {
      if (!(other instanceof Movie))
      {
         return false;
      }
      final Movie otherMovie = (Movie) other;
      return   Objects.equals(this.title, otherMovie.title)
            && Objects.equals(this.yearReleased, otherMovie.yearReleased)
            && Objects.equals(this.genre, otherMovie.genre)
            && Objects.equals(this.mpaaRating, otherMovie.mpaaRating)
            && Objects.equals(this.imdbTopRating, otherMovie.imdbTopRating);
   }

   @Override
   public int hashCode()
   {
      return Objects.hash(this.title, this.yearReleased, this.genre, this.mpaaRating, this.imdbTopRating);
   }

   @Override
   public String toString()
   {
      return "Movie: " + this.title + " (" + this.yearReleased + "), " + this.genre + ", " + this.mpaaRating + ", "
            + this.imdbTopRating;
   }
}

Multiple instances of Movie are placed in a Java Set. The code that does this is shown below because it also shows the values set in these instances. This code declares "movies" as a static field on the class and then uses a static initialization block to populate that field with five instances of Movie.

Populating movies Set with Instances of Movie Class
private static final Set<Movie> movies;

static
{
   final Set<Movie> tempMovies = new HashSet<>();
   tempMovies.add(new Movie("Raiders of the Lost Ark", 1981, Genre.ACTION, MpaaRating.PG, 31));
   tempMovies.add(new Movie("Star Wars: Episode V - The Empire Strikes Back", 1980, Genre.SCIENCE_FICTION, MpaaRating.PG, 12));
   tempMovies.add(new Movie("Inception", 2010, Genre.SCIENCE_FICTION, MpaaRating.PG13, 13));
   tempMovies.add(new Movie("Back to the Future", 1985, Genre.SCIENCE_FICTION, MpaaRating.PG, 49));
   tempMovies.add(new Movie("The Shawshank Redemption", 1994, Genre.DRAMA, MpaaRating.R, 1));
   movies = Collections.unmodifiableSet(tempMovies);
}
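
The listings above reference Genre and MpaaRating enums whose source is never shown in this post. A minimal sketch consistent with the values the listings actually use might look like the following; these definitions are my assumption, not the post's original code.

```java
// Hypothetical sketch of the enums the Movie examples depend on.
// Only the constants that appear in the listings (plus G for completeness,
// an assumption on my part) are included here.
enum Genre
{
   ACTION, SCIENCE_FICTION, DRAMA
}

enum MpaaRating
{
   G, PG, PG13, R, NA
}
```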
A First Look at JDK 8 Streams with Filtering

One type of functionality commonly performed on collections is filtering. The next code listing shows how to filter the "movies" Set for all movies that are rated PG. I'll highlight some observations that can be made from this code after the listing.

Filtering Movies with PG Rating
/**
 * Demonstrate using .filter() on Movies stream to filter by PG ratings
 * and collect() as a Set.
 */
private void demonstrateFilteringByRating()
{
   printHeader("Filter PG Movies");
   final Set<Movie> pgMovies =
      movies.stream().filter(movie -> movie.getMpaaRating() == MpaaRating.PG)
            .collect(Collectors.toSet());
   out.println(pgMovies);
}

One thing that this first example includes that all examples in this post will also have is the invocation of the method stream() on the collection. This method returns an object implementing the java.util.stream.Stream interface. Each of these returned Streams uses the collection against which stream() was invoked as its data source. All operations at this point are on the Stream rather than on the collection that is the source of the Stream's data.

In the code listing above, the filter(Predicate) method is called on the Stream based on the "movies" Set. In this case, the Predicate is given by the lambda expression movie -> movie.getMpaaRating() == MpaaRating.PG. This fairly readable representation tells us that the predicate is each movie in the underlying data that has an MPAA rating of PG.

The Stream.filter(Predicate) method is an intermediate operation, meaning that it returns an instance of Stream that can be further operated on by other operations. In this case, there is another operation, collect(Collector), that is called upon the Stream returned by Stream.filter(Predicate). The Collectors class features numerous static methods that each provide an implementation of Collector that can be passed to this collect(Collector) method. In this case, Collectors.toSet() is used to get a Collector that arranges the stream's results in a Set. The Stream.collect(Collector) method is a terminal operation, meaning that it's the end of the line: it does NOT return a Stream instance, so no further Stream operations can be executed after this collect has been executed.

When the above code is executed, it generates output like the following:

===========================================================
= Filter PG Movies
===========================================================
[Movie: Raiders of the Lost Ark (1981), ACTION, PG, 31, Movie: Back to the Future (1985), SCIENCE_FICTION, PG, 49, Movie: Star Wars: Episode V - The Empire Strikes Back (1980), SCIENCE_FICTION, PG, 12]
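
A consequence of the intermediate/terminal distinction worth seeing directly is that intermediate operations such as filter() are lazy: the predicate is not evaluated when the pipeline is built, only when a terminal operation pulls elements through it. The following standalone sketch (class name and strings are my own, not from the post) demonstrates this.

```java
import java.util.stream.Stream;

public class LazinessDemo
{
   public static void main(String[] args)
   {
      // Building the pipeline prints nothing: filter() is intermediate and lazy.
      final Stream<String> pipeline =
         Stream.of("a", "bb", "ccc")
               .filter(s -> { System.out.println("testing " + s); return s.length() > 1; });
      System.out.println("pipeline built, nothing filtered yet");
      // The terminal operation count() pulls elements through the pipeline,
      // which is when the "testing ..." lines finally print.
      final long matches = pipeline.count();
      System.out.println(matches + " elements matched");
   }
}
```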
Filtering for Single (First) Result
/**  
 * Demonstrate using .filter() on Movies stream to filter by #1 imdb.com
 * rating and using .findFirst() to get first (presumably only) match.
 */
private void demonstrateSingleResultImdbRating()
{
   printHeader("Display One and Only #1 IMDB Movie");
   final Optional<Movie> topMovie =
      movies.stream().filter(movie -> movie.getImdbTopRating() == 1).findFirst();
   out.println(topMovie.isPresent() ? topMovie.get() : "none");
}

This example shares many similarities with the previous example. Like that previous code listing, this listing shows use of Stream.filter(Predicate), but this time the predicate is the lambda expression movie -> movie.getImdbTopRating() == 1. In other words, the Stream resulting from this filter should contain only instances of Movie whose getImdbTopRating() method returns the number 1. The terminating operation Stream.findFirst() is then executed against the Stream returned by Stream.filter(Predicate). This returns the first entry encountered in the stream and, because our underlying Set of Movie instances has only one instance with an IMDb Top 250 rating of 1, that will be the first and only entry available in the stream resulting from the filter.

When this code listing is executed, its output appears as shown next:

===========================================================
= Display One and Only #1 IMDB Movie
===========================================================
Movie: The Shawshank Redemption (1994), DRAMA, R, 1
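
As an aside, the isPresent()/get() ternary used above to print the result can be written more directly with the Optional API's own orElse method. A small self-contained sketch (class name and strings are hypothetical, mine rather than the post's):

```java
import java.util.Optional;
import java.util.stream.Stream;

public class FindFirstDemo
{
   public static void main(String[] args)
   {
      // findFirst() returns an Optional; orElse supplies the same fallback
      // that the post's isPresent()/get() ternary provides by hand.
      final Optional<String> first =
         Stream.of("alpha", "beta", "gamma").filter(s -> s.startsWith("b")).findFirst();
      System.out.println(first.orElse("none"));  // prints "beta"
   }
}
```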

The next code listing illustrates use of Stream.map(Function).

/**
 * Demonstrate using .map to get only specified attribute from each
 * element of collection.
 */
private void demonstrateMapOnGetTitleFunction()
{
   printHeader("Just the Movie Titles, Please");
   final List<String> titles = movies.stream().map(Movie::getTitle).collect(Collectors.toList());
   out.println(titles.size() + " titles (in " + titles.getClass() +"): " + titles);
}

The Stream.map(Function) method acts upon the Stream against which it is called (in our case, the Stream based on the underlying Set of Movie objects) and applies the provided Function to each element of that Stream, returning a new Stream of the results. In this case, the Function is represented by Movie::getTitle, which is an example of a JDK 8-introduced method reference. I could have used the lambda expression movie -> movie.getTitle() instead of the method reference Movie::getTitle for the same results. The Method References documentation explains that this is exactly the situation a method reference is intended to address:

You use lambda expressions to create anonymous methods. Sometimes, however, a lambda expression does nothing but call an existing method. In those cases, it's often clearer to refer to the existing method by name. Method references enable you to do this; they are compact, easy-to-read lambda expressions for methods that already have a name.

As you might guess from its use in the code above, Stream.map(Function) is an intermediate operation. This code listing applies a terminating operation of Stream.collect(Collector) just as the previous two examples did, but in this case it's Collectors.toList() that is passed to it and so the resultant data structure is a List rather than a Set.

When the above code listing is run, its output looks like this:

===========================================================
= Just the Movie Titles, Please
===========================================================
5 titles (in class java.util.ArrayList): [Inception, The Shawshank Redemption, Raiders of the Lost Ark, Back to the Future, Star Wars: Episode V - The Empire Strikes Back]
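
Collectors offers terminal collectors beyond toSet() and toList(). For example, Collectors.joining concatenates the elements of a Stream of Strings with a delimiter, which pairs naturally with the map-to-titles idiom shown above. A small sketch (class name and sample titles are my own illustration):

```java
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JoiningDemo
{
   public static void main(String[] args)
   {
      // Collectors.joining concatenates the stream's strings with a delimiter.
      final String titles =
         Stream.of("Inception", "Back to the Future")
               .collect(Collectors.joining(", "));
      System.out.println(titles);  // prints "Inception, Back to the Future"
   }
}
```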
Reduction (to Single Boolean) Operations anyMatch and allMatch

The next example does not use Stream.filter(Predicate), Stream.map(Function), or even the terminating operation Stream.collect(Collector) that were used in most of the previous examples. In this example, the reduction and terminating operations Stream.allMatch(Predicate) and Stream.anyMatch(Predicate) are applied directly on the Stream based on our Set of Movie objects.

/**
 * Demonstrate .anyMatch and .allMatch on stream.
 */
private void demonstrateAnyMatchAndAllMatchReductions()
{
   printHeader("anyMatch and allMatch");
   out.println("All movies in IMDB Top 250? " + movies.stream().allMatch(movie -> movie.getImdbTopRating() < 250));
   out.println("All movies rated PG? " + movies.stream().allMatch(movie -> movie.getMpaaRating() == MpaaRating.PG));
   out.println("Any movies rated PG? " + movies.stream().anyMatch(movie -> movie.getMpaaRating() == MpaaRating.PG));
   out.println("Any movies not rated? " + movies.stream().anyMatch(movie -> movie.getMpaaRating() == MpaaRating.NA));
}

The code listing demonstrates that Stream.anyMatch(Predicate) and Stream.allMatch(Predicate) each return a boolean indicating, as their names respectively imply, whether the Stream has at least one entry matching the predicate or all of the entries matching the predicate. In this case, all movies come from the imdb.com Top 250, so that "allMatch" will return true. Not all of the movies are rated PG, however, so that "allMatch" returns false. Because at least one movie is rated PG, the "anyMatch" for PG rating predicate returns true, but the "anyMatch" for N/A rating predicate returns false because not even one movie in the underlying Set has an MpaaRating.NA rating. The output from running this code is shown next.

===========================================================
= anyMatch and allMatch
===========================================================
All movies in IMDB Top 250? true
All movies rated PG? false
Any movies rated PG? true
Any movies not rated? false
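
Stream also provides a third boolean reduction, noneMatch(Predicate), which returns true when no element satisfies the predicate, the positive phrasing of the last anyMatch check above. A minimal sketch (class name and values are my own, using the example movies' IMDb numbers):

```java
import java.util.stream.Stream;

public class NoneMatchDemo
{
   public static void main(String[] args)
   {
      // noneMatch is true exactly when the predicate holds for no element.
      final boolean noneNegative =
         Stream.of(31, 12, 13, 49, 1).noneMatch(n -> n < 0);
      System.out.println("No negative rankings? " + noneNegative);  // prints true
   }
}
```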
Easy Identification of Minimum and Maximum

The final example of applying the power of Stream to collection manipulation in this post demonstrates use of Stream.reduce(BinaryOperator) with two different instances of BinaryOperator: Integer::min and Integer::max.

private void demonstrateMinMaxReductions()
{
   printHeader("Oldest and Youngest via reduce");
   // Specifying both the Function for .map and the BinaryOperator for .reduce with lambda expressions
   final Optional<Integer> oldestMovie = movies.stream().map(movie -> movie.getYearReleased()).reduce((a,b) -> Integer.min(a,b));
   out.println("Oldest movie was released in " + (oldestMovie.isPresent() ? oldestMovie.get() : "Unknown"));
   // Specifying both the Function for .map and the BinaryOperator for .reduce with method references
   final Optional<Integer> youngestMovie = movies.stream().map(Movie::getYearReleased).reduce(Integer::max);
   out.println("Youngest movie was released in " + (youngestMovie.isPresent() ? youngestMovie.get() : "Unknown"));
}

This convoluted example illustrates using Integer.min(int,int) to find the oldest movie in the underlying Set and using Integer.max(int,int) to find the newest movie in that Set. This is accomplished by first using Stream.map to get a new Stream of Integers provided by the release year of each Movie in the original Stream. This Stream of Integers then has the Stream.reduce(BinaryOperator) operation executed with the static Integer methods used as the BinaryOperator.

For this code listing, I intentionally used lambda expressions for the Function and BinaryOperator in calculating the oldest movie (Integer.min(int,int)) and used method references instead of lambda expressions for the Function and BinaryOperator used in calculating the newest movie (Integer.max(int,int)). This demonstrates that either lambda expressions or method references can be used in many cases.

The output from running the above code is shown next:

===========================================================
= Oldest and Youngest via reduce
===========================================================
Oldest movie was released in 1980
Youngest movie was released in 2010
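
A less convoluted way to get the same minimum and maximum, a sketch of my own rather than the post's approach, is to map to an IntStream via mapToInt (for the post's data that would be movies.stream().mapToInt(Movie::getYearReleased)) and use the primitive stream's built-in min() and max(), which return OptionalInt and avoid the explicit reduce entirely:

```java
import java.util.stream.IntStream;

public class MinMaxDemo
{
   public static void main(String[] args)
   {
      // Release years from the post's five example movies.
      final int[] years = {1981, 1980, 2010, 1985, 1994};
      // IntStream.min()/max() return OptionalInt; getAsInt() is safe here
      // because the stream is known to be non-empty.
      final int oldest = IntStream.of(years).min().getAsInt();
      final int newest = IntStream.of(years).max().getAsInt();
      System.out.println("Oldest: " + oldest + ", newest: " + newest);  // prints "Oldest: 1980, newest: 2010"
   }
}
```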
Conclusion

JDK 8 Streams introduce a powerful mechanism for working with Collections. This post has focused on the readability and conciseness that working against Streams brings as compared to working against Collections directly, but Streams offer potential performance benefits as well. This post has attempted to use common collections handling idioms as examples of the conciseness that Streams bring to Java. Along the way, some key concepts associated with using JDK 8 Streams have also been discussed. The most challenging parts about using JDK 8 Streams are getting used to new concepts and new syntax (such as lambda expressions and method references), but these are quickly learned after playing with a couple of examples. A Java developer with even light experience with the concepts and syntax can explore the Stream API's methods for a much lengthier list of operations that can be executed against Streams (and hence against the collections underlying those Streams) than illustrated in this post.

Additional Resources

The purpose of this post was to provide a light first look at JDK 8 streams based on simple but fairly common collections manipulation examples. For a deeper dive into JDK 8 streams and for more ideas on how JDK 8 streams make Collections manipulation easier, see the following articles: