Saturday, April 18, 2015

The JDK 8 SummaryStatistics Classes

Three of the new classes introduced in JDK 8 are DoubleSummaryStatistics, IntSummaryStatistics, and LongSummaryStatistics of the java.util package. These classes make quick and easy work of calculating total number of elements, minimum value of elements, maximum value of elements, average value of elements, and the sum of elements in a collection of doubles, integers, or longs. Each class's class-level Javadoc documentation begins with the same single sentence that succinctly articulates this, describing each as "A state object for collecting statistics such as count, min, max, sum, and average."

The class-level Javadoc for each of these three classes also states of each class, "This class is designed to work with (though does not require) streams." The most obvious reason for the inclusion of these three types of SummaryStatistics classes is to be used with streams that were also introduced with JDK 8.

Indeed, each of the three classes' class-level Javadoc comments also provides an example of using the class in conjunction with streams of the corresponding data type. These examples demonstrate invoking the respective stream's collect(Supplier, BiConsumer, BiConsumer) method (a mutable reduction terminal stream operation) and passing each SummaryStatistics class's constructor, accept method, and combine method (as method references) to this collect method as its "supplier", "accumulator", and "combiner" arguments respectively.
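To make the three roles concrete, here is a minimal sketch of what the "supplier", "accumulator", and "combiner" do when they are invoked by hand rather than by a stream. The class name ManualCollectSketch and the sample values are my own assumptions for illustration only. A stream is free to split its work differently than this.

```java
import java.util.DoubleSummaryStatistics;

public class ManualCollectSketch
{
   /**
    * Builds two partial results by hand and merges them, mimicking
    * what a stream's collect operation might do with two chunks of data.
    */
   public static DoubleSummaryStatistics summarizeInTwoParts()
   {
      // "Supplier": each partial result starts as a new, empty instance.
      final DoubleSummaryStatistics first = new DoubleSummaryStatistics();
      final DoubleSummaryStatistics second = new DoubleSummaryStatistics();

      // "Accumulator": accept(double) folds one value into a partial result.
      first.accept(1.0);
      first.accept(2.0);
      second.accept(3.0);
      second.accept(6.0);

      // "Combiner": combine merges the second partial result into the first.
      first.combine(second);
      return first;
   }

   public static void main(final String[] arguments)
   {
      System.out.println(summarizeInTwoParts());
   }
}
```

After the combine call, the first instance reflects all four values (count of 4 and sum of 12.0) while the second instance is left unchanged.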

The rest of this post demonstrates use of IntSummaryStatistics, LongSummaryStatistics, and DoubleSummaryStatistics. Several of these examples will reference a map of The X-Files television series's seasons to the Nielsen rating for that season's premiere. This is shown in the next code listing.

Declaring and Initializing xFilesSeasonPremierRatings
/**
 * Maps the number of each X-Files season to the Nielsen rating
 * (millions of viewers) for the premiere episode of that season.
 */
private final static Map<Integer, Double> xFilesSeasonPremierRatings;

static
{
   final Map<Integer, Double> temporary = new HashMap<>();
   temporary.put(1, 12.0);
   temporary.put(2, 16.1);
   temporary.put(3, 19.94);
   temporary.put(4, 21.11);
   temporary.put(5, 27.34);
   temporary.put(6, 20.24);
   temporary.put(7, 17.82);
   temporary.put(8, 15.87);
   temporary.put(9, 10.6);
   xFilesSeasonPremierRatings = Collections.unmodifiableMap(temporary);
}

The next code listing uses the map created in the previous code listing, demonstrates applying DoubleSummaryStatistics to a stream of the "values" portion of the map, and is very similar to the examples provided in the Javadoc for the three SummaryStatistics classes. The DoubleSummaryStatistics class, the IntSummaryStatistics class, and the LongSummaryStatistics class have essentially the same fields, methods, and APIs (the only differences being the supported datatypes). Therefore, even though this and many of this post's examples specifically use DoubleSummaryStatistics (because the X-Files's Nielsen ratings are doubles), the principles apply to the other two integral types of SummaryStatistics classes.

Using DoubleSummaryStatistics with a Collection-based Stream
/**
 * Demonstrate use of DoubleSummaryStatistics collected from a
 * Collection Stream via use of DoubleSummaryStatistics method
 * references "new", "accept", and "combine".
 */
private static void demonstrateDoubleSummaryStatisticsOnCollectionStream()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      xFilesSeasonPremierRatings.values().stream().collect(
         DoubleSummaryStatistics::new,
         DoubleSummaryStatistics::accept,
         DoubleSummaryStatistics::combine);
   out.println("X-Files Season Premieres: " + doubleSummaryStatistics);
}

The output from running the above demonstration is shown next:

X-Files Season Premieres: DoubleSummaryStatistics{count=9, sum=161.020000, min=10.600000, average=17.891111, max=27.340000}

The previous example applied the SummaryStatistics class to a stream based directly on a collection (the "values" portion of a Map). The next code listing demonstrates a similar example, but uses an IntSummaryStatistics and uses a stream's intermediate map operation to specify which Function to invoke on the collection's objects for populating the SummaryStatistics object. In this case, the collection being acted upon is a Set<Movie> as returned by the Java8StreamsMoviesDemo.getMoviesSample() method and spelled out in my blog post Stream-Powered Collections Functionality in JDK 8.

Using IntSummaryStatistics with Stream's map(Function)
/**
 * Demonstrate collecting IntSummaryStatistics via mapping of
 * certain method calls on objects within a collection and using
 * lambda expressions (method references in particular).
 */
private static void demonstrateIntSummaryStatisticsWithMethodReference()
{
   final Set<Movie> movies = Java8StreamsMoviesDemo.getMoviesSample();
   IntSummaryStatistics intSummaryStatistics =
      movies.stream().map(Movie::getImdbTopRating).collect(
         IntSummaryStatistics::new, IntSummaryStatistics::accept, IntSummaryStatistics::combine);
   out.println("IntSummaryStatistics on IMDB Top Rated Movies: " + intSummaryStatistics);
}

When the demonstration above is executed, its output looks like this:

IntSummaryStatistics on IMDB Top Rated Movies: IntSummaryStatistics{count=5, sum=106, min=1, average=21.200000, max=49}
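A shorter route to the same kind of result is to map to an IntStream with mapToInt and then call summaryStatistics() on it. The following is only a sketch: the nested Movie class is a hypothetical stand-in for the Movie class used above (whose definition is not shown in this post), and the titles and ratings are made up for illustration.

```java
import java.util.Arrays;
import java.util.IntSummaryStatistics;
import java.util.List;

public class MapToIntSketch
{
   /** Hypothetical stand-in for the Movie class referenced in this post. */
   static final class Movie
   {
      private final String title;
      private final int imdbTopRating;

      Movie(final String title, final int imdbTopRating)
      {
         this.title = title;
         this.imdbTopRating = imdbTopRating;
      }

      int getImdbTopRating()
      {
         return imdbTopRating;
      }
   }

   /** Summarizes the ratings via IntStream.summaryStatistics(). */
   static IntSummaryStatistics summarize(final List<Movie> movies)
   {
      return movies.stream().mapToInt(Movie::getImdbTopRating).summaryStatistics();
   }

   /** Summarizes three made-up movies with ratings 1, 10, and 49. */
   public static IntSummaryStatistics sampleStatistics()
   {
      return summarize(Arrays.asList(
         new Movie("First", 1), new Movie("Second", 10), new Movie("Third", 49)));
   }

   public static void main(final String[] arguments)
   {
      System.out.println(sampleStatistics());
   }
}
```

Because mapToInt produces a primitive IntStream, this variation also avoids boxing each rating into an Integer along the way.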

The examples so far have demonstrated using the SummaryStatistics classes in their most common use case (in conjunction with data from streams based on existing collections). The next example demonstrates how a DoubleStream can be instantiated from scratch via use of DoubleStream.Builder and then the DoubleStream's summaryStatistics() method can be called to get an instance of DoubleSummaryStatistics.

Obtaining Instance of DoubleSummaryStatistics from DoubleStream
/**
 * Uses DoubleStream.builder to build an arbitrary DoubleStream.
 *
 * @return DoubleStream constructed with hard-coded doubles using
 *    a DoubleStream.builder.
 */
private static DoubleStream createSampleOfArbitraryDoubles()
{
   return DoubleStream.builder().add(12.4).add(13.6).add(9.7).add(24.5).add(10.2).add(3.0).build();
}

/**
 * Demonstrate use of an instance of DoubleSummaryStatistics
 * provided by DoubleStream.summaryStatistics().
 */
private static void demonstrateDoubleSummaryStatisticsOnDoubleStream()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      createSampleOfArbitraryDoubles().summaryStatistics();
   out.println("'Arbitrary' Double Statistics: " + doubleSummaryStatistics);
}

The just-listed code produces this output:

'Arbitrary' Double Statistics: DoubleSummaryStatistics{count=6, sum=73.400000, min=3.000000, average=12.233333, max=24.500000}

Of course, similarly to the example just shown, IntStream and IntStream.Builder can provide an instance of IntSummaryStatistics and LongStream and LongStream.Builder can provide an instance of LongSummaryStatistics.
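A brief sketch of those two integral counterparts follows. The class name IntAndLongStreamSketch and the hard-coded values are arbitrary choices of mine for illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.LongSummaryStatistics;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

public class IntAndLongStreamSketch
{
   /** Builds an IntStream with IntStream.builder() and summarizes it. */
   public static IntSummaryStatistics summarizeInts()
   {
      return IntStream.builder().add(3).add(7).add(11).build().summaryStatistics();
   }

   /** Builds a LongStream with LongStream.builder() and summarizes it. */
   public static LongSummaryStatistics summarizeLongs()
   {
      return LongStream.builder().add(100L).add(250L).build().summaryStatistics();
   }

   public static void main(final String[] arguments)
   {
      System.out.println("Arbitrary ints: " + summarizeInts());
      System.out.println("Arbitrary longs: " + summarizeLongs());
   }
}
```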

One doesn't need to have a collection stream or other instance of BaseStream to use the SummaryStatistics classes because they can be instantiated directly and used directly for the predefined numeric statistical operations. The next code listing demonstrates this by directly instantiating and then populating an instance of DoubleSummaryStatistics.

Directly Instantiating DoubleSummaryStatistics
/**
 * Demonstrate direct instantiation of and population of instance
 * of DoubleSummaryStatistics instance.
 */
private static void demonstrateDirectAccessToDoubleSummaryStatistics()
{
   final DoubleSummaryStatistics doubleSummaryStatistics =
      new DoubleSummaryStatistics();
   doubleSummaryStatistics.accept(5.0);
   doubleSummaryStatistics.accept(10.0);
   doubleSummaryStatistics.accept(15.0);
   doubleSummaryStatistics.accept(20.0);
   out.println("Direct DoubleSummaryStatistics Usage: " + doubleSummaryStatistics);
}

The output from running the previous code listing is shown next:

Direct DoubleSummaryStatistics Usage: DoubleSummaryStatistics{count=4, sum=50.000000, min=5.000000, average=12.500000, max=20.000000}

As done in the previous code listing for a DoubleSummaryStatistics, the next code listing instantiates a LongSummaryStatistics directly and populates it. This example also demonstrates how the SummaryStatistics classes provide individual methods for requesting individual statistics.

Directly Instantiating LongSummaryStatistics / Requesting Individual Statistics
/**
 * Demonstrate use of LongSummaryStatistics with this particular
 * example directly instantiating and populating an instance of
 * LongSummaryStatistics that represents hypothetical time
 * durations measured in milliseconds.
 */
private static void demonstrateLongSummaryStatistics()
{
   // This is a series of longs that might represent durations
   // of times such as might be calculated by subtracting the
   // value returned by System.currentTimeMillis() earlier in
   // code from the value returned by System.currentTimeMillis()
   // called later in the code.
   LongSummaryStatistics timeDurations = new LongSummaryStatistics();
   timeDurations.accept(5067054);
   timeDurations.accept(7064544);
   timeDurations.accept(5454544);
   timeDurations.accept(4455667);
   timeDurations.accept(9894450);
   timeDurations.accept(5555654);
   out.println("Test Results Analysis:");
   out.println("\tTotal Number of Tests: " + timeDurations.getCount());
   out.println("\tAverage Time Duration: " + timeDurations.getAverage());
   out.println("\tTotal Test Time: " + timeDurations.getSum());
   out.println("\tShortest Test Time: " + timeDurations.getMin());
   out.println("\tLongest Test Time: " + timeDurations.getMax());
}

The output from this example is now shown:

Test Results Analysis:
 Total Number of Tests: 6
 Average Time Duration: 6248652.166666667
 Total Test Time: 37491913
 Shortest Test Time: 4455667
 Longest Test Time: 9894450

In most examples in this post, I relied on the SummaryStatistics classes' readable toString() implementations to demonstrate the statistics available in each class. This last example, however, demonstrated that each individual type of statistic (number of values, maximum value, minimum value, sum of values, and average value) can be retrieved individually in numeric form.
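One thing to keep in mind when calling the individual "get" methods is what they return before any values have been recorded. For a LongSummaryStatistics with no values, getMin() returns Long.MAX_VALUE, getMax() returns Long.MIN_VALUE, and getAverage() returns 0. The following sketch guards against reporting those placeholder values by checking getCount() first. The class name, method name, and message text are my own assumptions for illustration.

```java
import java.util.LongSummaryStatistics;

public class EmptyStatisticsSketch
{
   /**
    * Describes the provided statistics, checking the count first so
    * that the placeholder min/max of an empty instance are not reported.
    */
   public static String describe(final LongSummaryStatistics statistics)
   {
      if (statistics.getCount() == 0)
      {
         return "No values recorded";
      }
      return "min=" + statistics.getMin()
         + ", max=" + statistics.getMax()
         + ", average=" + statistics.getAverage();
   }

   public static void main(final String[] arguments)
   {
      final LongSummaryStatistics durations = new LongSummaryStatistics();
      System.out.println(describe(durations));
      durations.accept(4L);
      durations.accept(6L);
      System.out.println(describe(durations));
   }
}
```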

Conclusion

Whether the data being analyzed is directly provided as a numeric Stream, is provided indirectly via a collection's stream, or is manually placed in the appropriate SummaryStatistics class instance, the three SummaryStatistics classes can provide useful common statistical calculations on integers, longs, and doubles.

Friday, March 20, 2015

Displaying Paths in Ant

In the blog posts Java and Ant Properties Refresher and Ant <echoproperties /> Task, I wrote about how being able to see how properties are seen by an Ant build can be helpful in understanding that build better. It is often the case that it'd also be valuable to see various paths used in the build as the build sees them, especially if the paths are composed of other paths and pieces from other build files. Fortunately, as described in the StackOverflow thread Ant: how to echo class path variable to a file, this is easily done with Ant's PathConvert task.

The following XML snippet is a very simple Ant build file that demonstrates use of <pathconvert> to display an Ant path's contents via the normal mechanisms used to display Ant properties.

build-show-paths.xml: Ant build.xml Using pathconvert
<project name="ShowPaths" default="showPaths" basedir=".">

   <path id="classpath">
      <pathelement path="C:\groovy-2.4.0\lib"/>
      <pathelement location="C:\lib\tika-1.7\tika-app-1.7.jar"/>
   </path>
   
   <target name="showPaths">
      <pathconvert property="classpath.path" refid="classpath" />
      <echo message="classpath = ${classpath.path}" />
   </target>

</project>

The simple Ant build file example shown above creates an Ant path named "classpath". It then uses the pathconvert task to create a new property ("classpath.path") that holds the value held in the "classpath" path. With this done, the property "classpath.path" can have its value displayed using Ant's echo task as demonstrated in "Java and Ant Properties Refresher."

When debugging issues with Ant builds, use of Ant's -verbose option is often handy. However, sometimes -verbose is a heavier solution than is actually required and often the simple ability to easily identify what properties and paths the Ant build "sees" can be very helpful in diagnosing build issues.

Thursday, March 19, 2015

Validating XML Against XSD(s) in Java

There are numerous tools available for validating an XML document against an XSD. These include operating system scripts and tools such as xmllint, XML editors and IDEs, and even online validators. I have found it useful to have my own easy-to-use XML validation tool because of limitations or issues of the previously mentioned approaches. Java makes it easy to write such a tool and this post demonstrates how easy it is to develop a simple XML validation tool in Java.

The Java tool developed in this post requires JDK 8. However, the simple Java application can be modified fairly easily to work with JDK 7 or even with a version of Java as old as JDK 5. In most cases, I have tried to comment the code that requires JDK 7 or JDK 8 to identify these dependencies and provide alternative approaches in earlier versions of Java. I have done this so that the tool can be adapted to work even in environments with older versions of Java.

The complete code listing for the Java-based XML validation tool discussed in this post is included at the end of the post. The most significant lines of code from that application when discussing validation of XML against one or more XSDs are shown next.

Essence of Validating XML Against XSD with Java
final Schema schema = schemaFactory.newSchema(xsdSources);
final Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File(xmlFilePathAndName)));

The previous code listing shows the straightforward approach available in the standard JDK for validating XML against XSDs. An instance of javax.xml.validation.Schema is instantiated with a call to javax.xml.validation.SchemaFactory.newSchema(Source[]) (where the array of javax.xml.transform.Source objects represents one or more XSDs). An instance of javax.xml.validation.Validator is obtained from the Schema instance via Schema's newValidator() method. The XML to be validated can be passed to that Validator's validate(Source) method to perform the validation of the XML against the XSD or XSDs originally provided to the Schema object created with SchemaFactory.newSchema(Source[]).
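The same Schema and Validator steps can be tried without any files on disk by wrapping Strings in StreamSource instances. The following is a minimal sketch under my own assumptions: the tiny "greeting" schema, the class name, and the isValid method are made up for illustration and are not part of the tool described in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class InMemoryValidationSketch
{
   /** Made-up schema allowing a single "greeting" element with text content. */
   private static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"greeting\" type=\"xs:string\"/>"
      + "</xs:schema>";

   /** Returns true only if the provided XML validates against the schema above. */
   public static boolean isValid(final String xml)
   {
      final SchemaFactory schemaFactory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      try
      {
         final Source[] xsdSources = { new StreamSource(new StringReader(XSD)) };
         final Schema schema = schemaFactory.newSchema(xsdSources);
         final Validator validator = schema.newValidator();
         // With no ErrorHandler set, a validation error is thrown as a SAXException.
         validator.validate(new StreamSource(new StringReader(xml)));
         return true;
      }
      catch (IOException | SAXException exception)
      {
         return false;
      }
   }

   public static void main(final String[] arguments)
   {
      System.out.println(isValid("<greeting>hello</greeting>"));
      System.out.println(isValid("<farewell>goodbye</farewell>"));
   }
}
```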

The next code listing includes the code just highlighted but represents the entire method in which that code resides.

validateXmlAgainstXsds(String, String[])
/**
 * Validate provided XML against the provided XSD schema files.
 *
 * @param xmlFilePathAndName Path/name of XML file to be validated;
 *    should not be null or empty.
 * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
 *    should not be null or empty.
 */
public static void validateXmlAgainstXsds(
   final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
{
   if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
   {
      out.println("ERROR: Path/name of XML to be validated cannot be null.");
      return;
   }
   if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
   {
      out.println("ERROR: At least one XSD must be provided to validate XML against.");
      return;
   }
   final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

   final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

   try
   {
      final Schema schema = schemaFactory.newSchema(xsdSources);
      final Validator validator = schema.newValidator();
      out.println(  "Validating " + xmlFilePathAndName + " against XSDs "
                  + Arrays.toString(xsdFilesPathsAndNames) + "...");
      validator.validate(new StreamSource(new File(xmlFilePathAndName)));
   }
   catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
   {
      out.println(
           "ERROR: Unable to validate " + xmlFilePathAndName
         + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
         + " - " + exception);
   }
   out.println("Validation process completed.");
}

The code listing for the validateXmlAgainstXsds(String, String[]) method shows how a SchemaFactory instance can be obtained with the specified type of schema (XMLConstants.W3C_XML_SCHEMA_NS_URI). This method also handles the various types of exceptions that might be thrown during the validation process. As the comment in the code states, the JDK 7 language change supporting catching of multiple exceptions in a single catch clause is used in this method but could be replaced with separate catch clauses or catching of a single more general exception for code bases earlier than JDK 7.
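For illustration, here is roughly what the separate catch clauses alternative might look like. This is only a sketch: it validates in-memory Strings rather than files so that it is self-contained, and the class name, method name, and returned messages are my own assumptions rather than part of the tool shown in this post.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SeparateCatchSketch
{
   /**
    * Validates the XML against the XSD (both provided as Strings) and
    * reports the outcome, using one catch clause per exception type
    * instead of the JDK 7 multi-exception catch.
    */
   public static String describeOutcome(final String xsd, final String xml)
   {
      final SchemaFactory schemaFactory =
         SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
      try
      {
         final Schema schema =
            schemaFactory.newSchema(new StreamSource(new StringReader(xsd)));
         schema.newValidator().validate(new StreamSource(new StringReader(xml)));
         return "valid";
      }
      catch (SAXException saxException)
      {
         // Malformed XML, an unusable XSD, or a validation failure.
         return "invalid: " + saxException.getMessage();
      }
      catch (IOException ioException)
      {
         // The XML source could not be read.
         return "unreadable: " + ioException.getMessage();
      }
   }

   public static void main(final String[] arguments)
   {
      final String xsd =
           "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
         + "<xs:element name=\"greeting\" type=\"xs:string\"/>"
         + "</xs:schema>";
      System.out.println(describeOutcome(xsd, "<greeting>hello</greeting>"));
      System.out.println(describeOutcome(xsd, "<farewell>goodbye</farewell>"));
   }
}
```

A side benefit of separate clauses is that each type of failure can be reported differently, as the two different message prefixes hint at here.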

The method just shown calls a method called generateStreamSourcesFromXsdPathsJdk8(String[]) and the next listing is of that invoked method.

generateStreamSourcesFromXsdPathsJdk8(String[])
/**
 * Generates array of StreamSource instances representing XSDs
 * associated with the file paths/names provided and use JDK 8
 * Stream API.
 *
 * This method can be commented out if using a version of
 * Java prior to JDK 8.
 *
 * @param xsdFilesPaths String representations of paths/names
 *    of XSD files.
 * @return StreamSource instances representing XSDs.
 */
private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
   final String[] xsdFilesPaths)
{
   return Arrays.stream(xsdFilesPaths)
                .map(StreamSource::new)
                .collect(Collectors.toList())
                .toArray(new StreamSource[xsdFilesPaths.length]);
}

The method just shown uses JDK 8 stream support to convert the array of Strings representing paths/names of XSD files to instances of StreamSource based on the contents of the XSDs pointed to by the path/name Strings. In the class's complete code listing, there is also a deprecated method generateStreamSourcesFromXsdPathsJdk7(final String[]) that could be used instead of this method for code bases on a version of Java earlier than JDK 8.
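As a possible simplification of the JDK 8 version, the stream's own toArray(IntFunction) method can produce the array directly, making the intermediate List unnecessary. This is a sketch of that variation and not the code used in the tool. The class name and method name are my own.

```java
import java.util.Arrays;
import javax.xml.transform.stream.StreamSource;

public class StreamSourceArraySketch
{
   /**
    * Converts paths/names of XSD files to StreamSource instances
    * without collecting to an intermediate List.
    */
   public static StreamSource[] toStreamSources(final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(StreamSource::new)
                   .toArray(StreamSource[]::new);
   }

   public static void main(final String[] arguments)
   {
      final StreamSource[] sources = toStreamSources(new String[] {"first.xsd", "second.xsd"});
      for (final StreamSource source : sources)
      {
         System.out.println(source.getSystemId());
      }
   }
}
```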

This single-class Java application is most useful when it's executed from the command line. To enable this, a main function is defined as shown in the next code listing.

Executable main(String[]) Function
/**
 * Validates provided XML against provided XSD.
 *
 * @param arguments XML file to be validated (first argument) and
 *    XSD against which it should be validated (second and later
 *    arguments).
 */
public static void main(final String[] arguments)
{
   if (arguments.length < 2)
   {
      out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
      out.println("\tOrder of XSDs can be significant (place XSDs that are");
      out.println("\tdependent on other XSDs after those they depend on)");
      System.exit(-1);
   }
   // Arrays.copyOfRange requires JDK 6; see
   // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
   // for additional details for versions of Java prior to JDK 6.
   final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
   validateXmlAgainstXsds(arguments[0], schemas);
}

The executable main(String[]) function prints a usage statement if fewer than two command line arguments are passed to it because it expects at least the name/path of the XML file to be validated and the name/path of an XSD to validate the XML against.

The main function takes the first command line argument and treats that as the XML file's path/name and then treats all remaining command line arguments as the paths/names of one or more XSDs.

The simple Java tool for validating XML against one or more XSDs has now been shown (complete code listing is at bottom of post). With it in place, we can run it against an example XML file and associated XSDs. For this demonstration, I'm using a very simple manifestation of a Servlet 2.5 web.xml deployment descriptor.

Sample Valid Servlet 2.5 web.xml
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5"> 

    <display-name>Sample Java Servlet 2.5 Web Application</display-name>
</web-app>

The simple web.xml file just shown is valid per the Servlet 2.5 XSDs and the output of running this simple Java-based XSD validation tool proves that by not reporting any validation errors.

An XSD-valid XML file does not lead to very interesting results with this tool. The next code listing shows an intentionally invalid web.xml file that has a "title" element not specified in the associated Servlet 2.5 XSD. The output with the most significant portions of the error message highlighted is shown after the code listing.

Sample Invalid Servlet 2.5 web.xml (web-invalid.xml)
<web-app xmlns="http://java.sun.com/xml/ns/javaee"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
         version="2.5">

    <display-name>Java Servlet 2.5 Web Application</display-name>
    <title>A handy example</title>
</web-app>

As the last output shows, things are more interesting in terms of output when the provided XML is not XSD valid.

There is one important caveat I wish to emphasize here. The XSDs provided to this Java-based tool sometimes need to be specified in a particular order. In particular, XSDs with "include" dependencies on other XSDs should be listed on the command line AFTER the XSD they include. In other words, XSDs with no "include" dependencies will generally be provided on the command line before those XSDs that include them.

The next code listing is for the complete XmlValidator class.

XmlValidator.java (Complete Class Listing)
package dustin.examples.xmlvalidation;

import org.xml.sax.SAXException;

import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

import static java.lang.System.out;

/**
 * Validate provided XML against the provided XSDs.
 */
public class XmlValidator
{
   /**
    * Validate provided XML against the provided XSD schema files.
    *
    * @param xmlFilePathAndName Path/name of XML file to be validated;
    *    should not be null or empty.
    * @param xsdFilesPathsAndNames XSDs against which to validate the XML;
    *    should not be null or empty.
    */
   public static void validateXmlAgainstXsds(
      final String xmlFilePathAndName, final String[] xsdFilesPathsAndNames)
   {
      if (xmlFilePathAndName == null || xmlFilePathAndName.isEmpty())
      {
         out.println("ERROR: Path/name of XML to be validated cannot be null.");
         return;
      }
      if (xsdFilesPathsAndNames == null || xsdFilesPathsAndNames.length < 1)
      {
         out.println("ERROR: At least one XSD must be provided to validate XML against.");
         return;
      }
      final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);

      final StreamSource[] xsdSources = generateStreamSourcesFromXsdPathsJdk8(xsdFilesPathsAndNames);

      try
      {
         final Schema schema = schemaFactory.newSchema(xsdSources);
         final Validator validator = schema.newValidator();
         out.println("Validating " + xmlFilePathAndName + " against XSDs "
            + Arrays.toString(xsdFilesPathsAndNames) + "...");
         validator.validate(new StreamSource(new File(xmlFilePathAndName)));
      }
      catch (IOException | SAXException exception)  // JDK 7 multi-exception catch
      {
         out.println(
            "ERROR: Unable to validate " + xmlFilePathAndName
            + " against XSDs " + Arrays.toString(xsdFilesPathsAndNames)
            + " - " + exception);
      }
      out.println("Validation process completed.");
   }

   /**
    * Generates array of StreamSource instances representing XSDs
    * associated with the file paths/names provided and use JDK 8
    * Stream API.
    *
    * This method can be commented out if using a version of
    * Java prior to JDK 8.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    */
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk8(
      final String[] xsdFilesPaths)
   {
      return Arrays.stream(xsdFilesPaths)
                   .map(StreamSource::new)
                   .collect(Collectors.toList())
                   .toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Generates array of StreamSource instances representing XSDs
    * associated with the file paths/names provided and uses
    * pre-JDK 8 Java APIs.
    *
    * This method can be commented out (or better yet, removed
    * altogether) if using JDK 8 or later.
    *
    * @param xsdFilesPaths String representations of paths/names
    *    of XSD files.
    * @return StreamSource instances representing XSDs.
    * @deprecated Use generateStreamSourcesFromXsdPathsJdk8 instead
    *    when JDK 8 or later is available.
    */
   @Deprecated
   private static StreamSource[] generateStreamSourcesFromXsdPathsJdk7(
      final String[] xsdFilesPaths)
   {
      // Diamond operator used here requires JDK 7; add type of
      // StreamSource to generic specification of ArrayList for
      // JDK 5 or JDK 6
      final List<StreamSource> streamSources = new ArrayList<>();
      for (final String xsdPath : xsdFilesPaths)
      {
         streamSources.add(new StreamSource(xsdPath));
      }
      return streamSources.toArray(new StreamSource[xsdFilesPaths.length]);
   }

   /**
    * Validates provided XML against provided XSD.
    *
    * @param arguments XML file to be validated (first argument) and
    *    XSD against which it should be validated (second and later
    *    arguments).
    */
   public static void main(final String[] arguments)
   {
      if (arguments.length < 2)
      {
         out.println("\nUSAGE: java XmlValidator <xmlFile> <xsdFile1> ... <xsdFileN>\n");
         out.println("\tOrder of XSDs can be significant (place XSDs that are");
         out.println("\tdependent on other XSDs after those they depend on)");
         System.exit(-1);
      }
      // Arrays.copyOfRange requires JDK 6; see
      // http://stackoverflow.com/questions/7970486/porting-arrays-copyofrange-from-java-6-to-java-5
      // for additional details for versions of Java prior to JDK 6.
      final String[] schemas = Arrays.copyOfRange(arguments, 1, arguments.length);
      validateXmlAgainstXsds(arguments[0], schemas);
   }
}

Despite what the length of this post might initially suggest, using Java to validate XML against an XSD is fairly straightforward. The sample application shown and explained here attempts to demonstrate that and is a useful tool for simple command line validation of XML documents against specified XSDs. One could easily port this to Groovy to be even more script-friendly. As mentioned earlier, this simple tool requires JDK 8 as currently written but could be easily adapted to work on JDK 5, JDK 6, or JDK 7.

UPDATE (20 March 2015): I have pushed the Java class shown in this post (XmlValidator.java) onto the GitHub repository dustinmarx/xmlutilities.

Tuesday, March 17, 2015

Will Internet Explorer Be Missed?

It is with at least a small amount of nostalgic sadness that I have read about the end of GeoCities, Dr. Dobb's, Codehaus, and Google Code. However, I cannot say that I'll miss Internet Explorer and I felt more relief than sorrow when I read today that Microsoft is killing off the Internet Explorer brand. It sounds like the browser in development under the name Project Spartan is designed to be the main web browser for Windows 10.

I have heard more than one user of Internet Explorer not-so-affectionately refer to it as "Internet Exploder" and some of this is deserved from both a user perspective and a web developer perspective. Although the developer experience and user experience associated with Internet Explorer have improved over the years, I still prefer Chrome and Firefox over Internet Explorer both as a user and as a web developer.

It used to be far worse and I still have bad memories of developing web applications and dealing with Internet Explorer's spotty support for standardization of JavaScript, HTML, and CSS. Although most browser vendors have been and are guilty of picking and choosing how well they'll support each standard, Internet Explorer consistently seemed especially slow to adopt new standards. As a user, I was frustrated with the perceived slowness of Internet Explorer when compared to Firefox and then especially when compared to Chrome. I was also frustrated when those addictive tabs seemed to take a long time to make it into Internet Explorer.

Although I won't miss Internet Explorer, I think it's polite to speak well of the dying. In that vein, I'll finish this post by outlining some of Internet Explorer's positive contributions to web development and to the Internet experience.

  • Internet Explorer brought the web to the masses - My first experience with graphical web browsers was using Mosaic and then using Mozilla and Netscape. Many of my fellow college students and contemporary young software developers also used these browsers. However, Microsoft Internet Explorer was the first browser used by many people because it came pre-installed with their Microsoft-provided operating system. For many people who transitioned from a vendor-specific online experience such as provided by America Online (now AOL) at the time to a more general browser-based web experience, Internet Explorer was the browser that allowed this to happen.
  • XMLHttpRequest (Ajax) - Jesse James Garrett's article Ajax: A New Approach to Web Applications changed the way many of us who were developing web applications thought about web development. As Garrett pointed out in a follow-up to that article, "XMLHttpRequest is only part of the Ajax equation," but it is the "technical component that makes the asynchronous server communication possible." For me, XMLHttpRequest was the biggest missing piece in making this type of asynchronous communication happen. The post A Brief History of Ajax points out that XMLHttpRequest was pioneered by Microsoft in Internet Explorer 5. It has since been adopted by all major web browsers as a standard.
  • A Starting Point - Related to the first bullet above (ubiquitous nature of Internet Explorer on Windows platforms), it has been pointed out that one advantage of Internet Explorer is that its automatically being available on Windows machines has made it easy for people to download alternative web browsers of their choosing. This is becoming less of an issue as more devices are not Windows platforms and have their own standard browser often provided.

Although I won't particularly miss Internet Explorer, there are those who will. The following are some examples of posts written by people or about people who believe that Internet Explorer has advantages over its competitors.

Goodbye, Internet Explorer!

Monday, March 16, 2015

The End of Google Code

In the 21 January 2014 post Google Code is dead, Evert Pot referenced the post A Change to Google Code Download Service and wrote that "It's been sort of obvious for a while that [Google] stopped caring about their code hosting." The title of Pot's post was borne out with the announcement this past week that Google is Bidding farewell to Google Code.

According to the post Bidding farewell to Google Code on the Google Open Source Blog, "To meet developers where they are, we ourselves migrated nearly a thousand of our own open source projects from Google Code to GitHub." That post also outlines the final days of Google Code. No new projects can be created (as of 12 March 2015) and the site will become read-only on 24 August 2015 with closure of the project hosting on 25 January 2016 (though tarballs will be available throughout 2016).

I recently posted on the fall of Codehaus and mentioned several useful projects that were (or are) hosted there. Google Code also saw its share of important and useful projects in its heyday. These include Google's Guava (now on GitHub), Mockito (now on GitHub), charts4j (now on GitHub), easyb, Google's Go programming language (now at https://golang.org/), Google's Google Web Toolkit (now at http://www.gwtproject.org/), and Google's Chromium (now at http://www.chromium.org/).

Coman Hamilton concludes his article Google Code is dead – but today is a good day for open source with the statement, "Rather than lament the loss of one significant member of the open-source hosting community, we should rejoice in the fact that there are so many other great open-source hosters, that not even Google can compete."

Friday, March 13, 2015

Excellent! Groovy Intends to Join Apache Software Foundation

In the post "Total Bummer: Pivotal Drops Groovy", I briefly wrote about Pivotal's decision to drop Groovy and hoped that Groovy would find a new home. I was pleased to read the announcement that the Groovy project intends to join the Apache Software Foundation. My experience is that some of the best maintained, best supported, and best documented open source projects are those with a corporate sponsor or those associated with the Apache Software Foundation. I have benefited tremendously from several Apache projects over the years including Ant, Struts, Apache HTTP Server, Apache Commons, Camel, Log4J, Lucene, Apache POI, Apache FOP, and Tomcat. The Apache Software Foundation also houses several other highly popular projects including Hadoop, HBase, Apache Cordova, MyFaces, and Solr.

Groovy already enjoys some associations with Apache projects. For example, Groovy bakes in Ant support and Commons CLI support (Groovy's CliBuilder). The Apache page listing projects grouped by programming languages includes a "Groovy" section that lists Apache Camel and Apache OFBiz.

According to Guillaume Laforge, there were discussions about Groovy's next home with several organizations including the Eclipse Foundation, the Software Freedom Conservancy, and the Apache Software Foundation. Matt Raible has provided follow-up to this post in a question-and-answer format with Laforge in the post Groovy Moving to Apache. Of particular interest to me is the elaboration on the "gray areas" Laforge alluded to. These "gray areas" include differences and limitations associated with the Apache Software Foundation such as process, repository control, and potential corporate funding of an individual project.

One of the several advantages of using Apache projects is the liberal Apache 2 License. Groovy was already available under this license and obviously will continue to use that license as part of the Apache Software Foundation.

Like all projects introduced to the Apache Software Foundation, Groovy will begin in the Apache Incubator. Grails is not at this time slated for the Apache Software Foundation, though that could come in the future. Cédric Champeau briefly mentions Groovy and Apache in his post Who is Groovy?

Monday, March 9, 2015

JDK 8 Streams and Grouping

I wrote about the powerful features of using JDK 8's Streams with Java collections in the post Stream-Powered Collections Functionality in JDK 8. I did not cover use of the groupingBy Collector reduction operation in that post and so address grouping in this post.

The examples in this post demonstrate how to combine Collection-backed Streams with groupingBy Collectors to reorganize the underlying Collection's data into groups prescribed by a provided classification. These examples are based on the Movie class and Set of Movie instances described in my earlier post Stream-Powered Collections Functionality in JDK 8.

The next code listing demonstrates how a simple statement can be used to group the provided Set of Movies into a Map of movie ratings (key) to movies with that rating (value). The groupingBy Collector provides this Map as a map of key type (the MpaaRating in this case) to a List of the type of objects being grouped (Movie in this case).

/**
 * Demonstrate use of JDK 8 streams and Collectors.groupingBy to
 * group movies by their MPAA ratings.
 */
private static void demonstrateGroupingByRating()
{
   final Map<MpaaRating, List<Movie>> moviesByRating =
      movies.stream().collect(groupingBy(Movie::getMpaaRating));
   out.println("Movies grouped by MPAA Rating: " + moviesByRating);
}

In the example just shown (and in the examples that follow in this post), statically importing java.util.stream.Collectors.groupingBy means that I do not need to qualify groupingBy calls with the Collectors class name. This simple code snippet groups the movies by their ratings, with the returned Map mapping each movie rating to a List of the movies with that rating. Here is an example of the output when the provided Movie set is the same as in my previously referenced post.

Movies grouped by MPAA Rating: {PG13=[Movie: Inception (2010), SCIENCE_FICTION, PG13, 13], R=[Movie: The Shawshank Redemption (1994), DRAMA, R, 1], PG=[Movie: Raiders of the Lost Ark (1981), ACTION, PG, 31, Movie: Back to the Future (1985), SCIENCE_FICTION, PG, 49, Movie: Star Wars: Episode V - The Empire Strikes Back (1980), SCIENCE_FICTION, PG, 12]}
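Because the Movie class and MpaaRating enum live in the earlier post, the listing above does not compile on its own. The following is a minimal, self-contained sketch of the same grouping. The pared-down Movie and MpaaRating types, the sampleMovies() helper, and the class name GroupingByRatingSketch are hypothetical stand-ins for illustration only and are not the classes from that earlier post.

```java
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class GroupingByRatingSketch
{
   /** Hypothetical stand-in for the MpaaRating enum of the earlier post. */
   enum MpaaRating { PG, PG13, R }

   /** Hypothetical, pared-down stand-in for the Movie class of the earlier post. */
   static final class Movie
   {
      private final String title;
      private final MpaaRating mpaaRating;

      Movie(final String title, final MpaaRating mpaaRating)
      {
         this.title = title;
         this.mpaaRating = mpaaRating;
      }

      String getTitle() { return title; }

      MpaaRating getMpaaRating() { return mpaaRating; }

      @Override
      public String toString() { return "Movie: " + title + ", " + mpaaRating; }
   }

   /** Small sample list used only for this sketch. */
   static List<Movie> sampleMovies()
   {
      return Arrays.asList(
         new Movie("Inception", MpaaRating.PG13),
         new Movie("The Shawshank Redemption", MpaaRating.R),
         new Movie("Raiders of the Lost Ark", MpaaRating.PG),
         new Movie("Back to the Future", MpaaRating.PG));
   }

   /** Groups movies by rating; the static import lets groupingBy appear unqualified. */
   static Map<MpaaRating, List<Movie>> groupByRating(final List<Movie> movies)
   {
      return movies.stream().collect(groupingBy(Movie::getMpaaRating));
   }

   public static void main(final String[] arguments)
   {
      System.out.println("Movies grouped by MPAA Rating: " + groupByRating(sampleMovies()));
   }
}
```

Note that ratings with no movies simply do not appear as keys in the returned Map; groupingBy only creates entries for classifications it actually encounters.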

A specific use of the capability just demonstrated is to generate a Map from a unique key of each object in a Collection to the object in that Collection with that key. This might be useful, for example, when objects need to be looked up repeatedly and quickly by key but are provided in a Set or List instead of a Map. Pretending for the moment that movies have unique titles (they only do for my small set), such functionality can be accomplished as shown in the next code listing.

/**
 * Demonstrate use of JDK 8 streams and Collectors.groupingBy to
 * group movies by their title.
 */
private static void demonstrateGroupingByTitle()
{
   final Map<String, List<Movie>> moviesByTitle =
      movies.stream().collect(groupingBy(Movie::getTitle));
   out.println("Movies grouped by title: " + moviesByTitle);
}

Assuming that title is unique for each movie in the original collection, the code above provides a map of movie title to single-element List containing only the movie for which that title is applicable. Any client wanting to quickly look up a movie by its title could call moviesByTitle.get(String).get(0) to get the full Movie object corresponding to that title. The output of doing this with my simple movie set is shown next.

Movies grouped by title: {The Shawshank Redemption=[Movie: The Shawshank Redemption (1994), DRAMA, R, 1], Star Wars: Episode V - The Empire Strikes Back=[Movie: Star Wars: Episode V - The Empire Strikes Back (1980), SCIENCE_FICTION, PG, 12], Back to the Future=[Movie: Back to the Future (1985), SCIENCE_FICTION, PG, 49], Raiders of the Lost Ark=[Movie: Raiders of the Lost Ark (1981), ACTION, PG, 31], Inception=[Movie: Inception (2010), SCIENCE_FICTION, PG13, 13]}
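When the key really is unique, a different collector than the one used in this post, Collectors.toMap, can map each key directly to its object instead of to a single-element List, which removes the need for the trailing get(0). The following is a minimal sketch of that alternative. The tiny Movie class (title and year only) and the class name TitleLookupSketch are hypothetical and exist only for this illustration.

```java
import static java.util.stream.Collectors.toMap;

import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class TitleLookupSketch
{
   /** Hypothetical, pared-down stand-in for the Movie class of the earlier post. */
   static final class Movie
   {
      private final String title;
      private final int year;

      Movie(final String title, final int year)
      {
         this.title = title;
         this.year = year;
      }

      String getTitle() { return title; }

      int getYear() { return year; }
   }

   /**
    * Maps each title directly to its Movie. The two-argument toMap throws
    * IllegalStateException if two movies share the same title.
    */
   static Map<String, Movie> indexByTitle(final List<Movie> movies)
   {
      return movies.stream().collect(toMap(Movie::getTitle, Function.identity()));
   }

   public static void main(final String[] arguments)
   {
      final Map<String, Movie> moviesByTitle = indexByTitle(Arrays.asList(
         new Movie("Inception", 2010),
         new Movie("Back to the Future", 1985)));
      System.out.println("Inception was released in " + moviesByTitle.get("Inception").getYear());
   }
}
```

The trade-off is that the two-argument toMap fails fast on duplicate keys, whereas groupingBy quietly collects duplicates into the same List. Which behavior is preferable depends on whether duplicate titles should be treated as an error.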

It is possible to group by two different characteristics. This allows for a Collection to be grouped by one characteristic and then have each of those groups sub-grouped by a second characteristic. For example, the following code listing groups movies by rating and then by genre.

/**
 * Demonstrate use of JDK 8 streams and cascaded groupingBy
 * to group movies by ratings and then by genres within ratings.
 */
private static void demonstrateGroupingByRatingAndGenre()
{
   final Map<MpaaRating, Map<Genre, List<Movie>>> moviesByRatingAndGenre =
      movies.stream().collect(groupingBy(Movie::getMpaaRating, groupingBy(Movie::getGenre)));
   out.println("Movies by rating and genre: " + moviesByRatingAndGenre);
}

The code listing just shown first groups the underlying movies by rating and then groups the movies within each rating group again, this time by genre. In other words, we get two-level groups of movies by rating and then by genre. Output on my simple set of movies is shown next.

Movies by rating and genre: {PG13={SCIENCE_FICTION=[Movie: Inception (2010), SCIENCE_FICTION, PG13, 13]}, R={DRAMA=[Movie: The Shawshank Redemption (1994), DRAMA, R, 1]}, PG={SCIENCE_FICTION=[Movie: Back to the Future (1985), SCIENCE_FICTION, PG, 49, Movie: Star Wars: Episode V - The Empire Strikes Back (1980), SCIENCE_FICTION, PG, 12], ACTION=[Movie: Raiders of the Lost Ark (1981), ACTION, PG, 31]}}
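The second argument to the two-argument groupingBy is a downstream Collector, so it does not have to be another plain groupingBy. As a minimal sketch, the innermost level below uses Collectors.counting() to report how many movies fall into each rating and genre combination rather than listing them. The Movie, MpaaRating, and Genre types and the class name DownstreamCollectorSketch are hypothetical stand-ins for illustration only.

```java
import static java.util.stream.Collectors.counting;
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.List;
import java.util.Map;

class DownstreamCollectorSketch
{
   /** Hypothetical stand-ins for the enums of the earlier post. */
   enum MpaaRating { PG, PG13, R }
   enum Genre { ACTION, DRAMA, SCIENCE_FICTION }

   /** Hypothetical, pared-down stand-in for the Movie class of the earlier post. */
   static final class Movie
   {
      private final String title;
      private final MpaaRating mpaaRating;
      private final Genre genre;

      Movie(final String title, final MpaaRating mpaaRating, final Genre genre)
      {
         this.title = title;
         this.mpaaRating = mpaaRating;
         this.genre = genre;
      }

      MpaaRating getMpaaRating() { return mpaaRating; }

      Genre getGenre() { return genre; }
   }

   /** Small sample list used only for this sketch. */
   static List<Movie> sampleMovies()
   {
      return Arrays.asList(
         new Movie("Inception", MpaaRating.PG13, Genre.SCIENCE_FICTION),
         new Movie("The Shawshank Redemption", MpaaRating.R, Genre.DRAMA),
         new Movie("Raiders of the Lost Ark", MpaaRating.PG, Genre.ACTION),
         new Movie("Back to the Future", MpaaRating.PG, Genre.SCIENCE_FICTION),
         new Movie("The Empire Strikes Back", MpaaRating.PG, Genre.SCIENCE_FICTION));
   }

   /** Groups by rating, then by genre, and counts the movies in each innermost group. */
   static Map<MpaaRating, Map<Genre, Long>> countByRatingAndGenre(final List<Movie> movies)
   {
      return movies.stream().collect(
         groupingBy(Movie::getMpaaRating, groupingBy(Movie::getGenre, counting())));
   }

   public static void main(final String[] arguments)
   {
      System.out.println("Movie counts by rating and genre: " + countByRatingAndGenre(sampleMovies()));
   }
}
```

Navigating the nested result is a matter of chained lookups, for example countByRatingAndGenre(movies).get(MpaaRating.PG).get(Genre.SCIENCE_FICTION), keeping in mind that either get can return null when no movie matched that rating or genre.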

The groupingBy collector makes it easy to group elements of a List or Set into a Map with the grouping characteristic as the key and the objects belonging to each group in a List associated with that key. This provides all the advantages of a Map, including use of some of the handy methods on Map that have been introduced with JDK 8.
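As a brief illustration of those JDK 8 Map methods applied to a grouped result, the following minimal sketch groups plain title Strings by their first letter and then uses Map.forEach and Map.getOrDefault on the result. The class name GroupedMapMethodsSketch, its helper methods, and the choice of grouping by first letter are hypothetical and chosen only to keep the example self-contained; it assumes no title is an empty String.

```java
import static java.util.stream.Collectors.groupingBy;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

class GroupedMapMethodsSketch
{
   /** Groups titles by first character (assumes every title is non-empty). */
   static Map<Character, List<String>> groupByFirstLetter(final List<String> titles)
   {
      return titles.stream().collect(groupingBy(title -> title.charAt(0)));
   }

   /**
    * Returns the titles for the given letter, or an empty List when no group
    * exists, using JDK 8's Map.getOrDefault to avoid a null check.
    */
   static List<String> titlesStartingWith(
      final Map<Character, List<String>> groups, final char letter)
   {
      return groups.getOrDefault(letter, Collections.emptyList());
   }

   public static void main(final String[] arguments)
   {
      final Map<Character, List<String>> groups = groupByFirstLetter(Arrays.asList(
         "Inception", "Back to the Future", "Raiders of the Lost Ark"));

      // JDK 8's Map.forEach accepts a BiConsumer of key and value.
      groups.forEach((letter, titles) -> System.out.println(letter + " -> " + titles));

      System.out.println("Titles starting with Z: " + titlesStartingWith(groups, 'Z'));
   }
}
```

getOrDefault is particularly convenient with groupingBy results because groupingBy never creates an entry for a classification that no element matched.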