Sudoku, Dancing Links, Backtracking

https://medium.com/javarevisited/building-a-sudoku-solver-in-java-with-dancing-links-180274b0b6c1

November 6, 2019November 6, 2019

JDK API evolvement

https://github.com/marchof/java-almanac

November 6, 2019November 6, 2019

Small Java Apps

http://august.nagro.us/small-java.html

October 17, 2019October 17, 2019

Azuls Medium-Term Supported JDK 13

https://www.azul.com/tall-grande-or-venti-support-for-your-jdk/#

October 17, 2019October 17, 2019

https://www.reddit.com/r/java/comments/di7plt/local_methods_coming_to_java/
public static int binarySearch(int[] a, int key) { return binarySearch0(a, 0, a.length, key); int binarySearch0(int[] a, int start, int stop, int key) { ....do the actual search calls binarySearch0(.....) recursively } }

September 23, 2019September 23, 2019

8 Essential Maven Plugins for Java Developers

https://blog.codota.com/8-essential-maven-plugins-beyond-the-core/

September 23, 2019September 23, 2019

Java 13

https://learnworthy.net/everything-you-need-to-know-about-java-13/

September 13, 2019September 13, 2019

Java Konferenzen 2020

https://marcus-biel.com/java-conferences-2020/

July 1, 2019

Java Testframeworks

https://www.theserverside.com/video/5-Java-test-frameworks-and-tools-JDK-developers-must-know

March 6, 2019

Writing polyglot Applications with GraalVM

http://www.graalvm.org/

March 6, 2019

Small Java apps with JLink

http://august.nagro.us/small-java.html

February 21, 2019February 21, 2019

Java Properties: Part 2

Load a properties file from current directory.

Properties properties = new Properties();
properties.load(new FileReader(new File(".").getCanonicalPath() + File.separator + "java.properties"));

Write a start script

#! /bin/bash
scriptdir="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
cd $scriptdir
java -jar MyExecutable.jar
cd -

Now you can start your configurable java program from any location, e.g. by linking the startscript to /usr/bin/ .

February 20, 2019

Java Properties: Part 1

Printing

printProperties(System.getProperties());

private static void printProperties(Properties properties) {
  new TreeSet<>(properties.keySet()).forEach((k) -> {
    System.out.println(k + " : " + properties.get(k));
  });
}

December 17, 2018December 17, 2018

Converting XML to JSON

You can use standard tools for it

1. Use the tool xjc from your jdk to generate Java classes from schema

Since Java 9 you must explicitly add JAXB as module with –add-modules java.se.ee See: How to resolve java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException in Java 9

Since Java 11 you have to download xjc in an extra step from https://javaee.github.io/jaxb-v2/

2. Read in as XML write out as JSON using Jackson

Example

With https://schema.datacite.org/meta/kernel-4.1/metadata.xsd

1. Use the tool xjc from your jdk

In my example I will use a fairly complex example based on datacite schemas.

/path/to/jdk/bin/xjc -d /path/to/java/project \
-p stack24174963.datacite \
 https://schema.datacite.org/meta/kernel-4.1/metadata.xsd

This will reply with

parsing a schema...
compiling a schema...
stack24174963/datacite/Box.java
stack24174963/datacite/ContributorType.java
stack24174963/datacite/DateType.java
stack24174963/datacite/DescriptionType.java
stack24174963/datacite/FunderIdentifierType.java
stack24174963/datacite/NameType.java
stack24174963/datacite/ObjectFactory.java
stack24174963/datacite/Point.java
stack24174963/datacite/RelatedIdentifierType.java
stack24174963/datacite/RelationType.java
stack24174963/datacite/Resource.java
stack24174963/datacite/ResourceType.java
stack24174963/datacite/TitleType.java
stack24174963/datacite/package-info.java

If you look into Resource.Creator and Resource.Contributor you will see that the member variables givenName and familyName are not correctly typed. Change their type from Object to String, also apply your changes to the corresponding getter and setter methods!

2. Read in as XML write out as JSON using Jackson

import com.fasterxml.jackson.databind.ObjectMapper;
  import com.fasterxml.jackson.databind.SerializationFeature;

  import stack24174963.datacite.Resource;

  public class HowToXmlToJsonWithSchema {
    @Test
    public void readXmlAndConvertToSchema() throws Exception {
        String example = "schemas/datacite/kernel-4.1/example/datacite-example-complicated-v4.1.xml";
        try (InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(example)) {
            Resource resource = JAXB.unmarshal(in, Resource.class);
            System.out.println(asJson(resource));
        }
    }

    private String asJson(Object obj) throws Exception {
        StringWriter w = new StringWriter();
        new ObjectMapper().configure(SerializationFeature.INDENT_OUTPUT, true).writeValue(w, obj);
        String result = w.toString();
        return result;
    }
  }

Prints:

{
  "identifier" : {
    "value" : "10.5072/testpub",
    "identifierType" : "DOI"
  },
  "creators" : {
    "creator" : [ {
      "creatorName" : {
        "value" : "Smith, John",
        "nameType" : "PERSONAL"
      },
      "givenName" : "John",
      "familyName" : "Smith",
      "nameIdentifier" : [ ],
      "affiliation" : [ ]
    }, {
      "creatorName" : {
        "value" : "つまらないものですが",
        "nameType" : null
      },
      "givenName" : null,
      "familyName" : null,
      "nameIdentifier" : [ {
        "value" : "0000000134596520",
        "nameIdentifierScheme" : "ISNI",
        "schemeURI" : "http://isni.org/isni/"
      } ],
      "affiliation" : [ ]
    } ]
  },
  "titles" : {
    "title" : [ {
      "value" : "Właściwości rzutowań podprzestrzeniowych",
      "titleType" : null,
      "lang" : "pl"
    }, {
      "value" : "Translation of Polish titles",
      "titleType" : "TRANSLATED_TITLE",
      "lang" : "en"
    } ]
  },
  "publisher" : "Springer",
  "publicationYear" : "2010",
  "resourceType" : {
    "value" : "Monograph",
    "resourceTypeGeneral" : "TEXT"
  },
  "subjects" : {
    "subject" : [ {
      "value" : "830 German & related literatures",
      "subjectScheme" : "DDC",
      "schemeURI" : null,
      "valueURI" : null,
      "lang" : "en"
    }, {
      "value" : "Polish Literature",
      "subjectScheme" : null,
      "schemeURI" : null,
      "valueURI" : null,
      "lang" : "en"
    } ]
  },
  "contributors" : {
    "contributor" : [ {
      "contributorName" : {
        "value" : "Doe, John",
        "nameType" : "PERSONAL"
      },
      "givenName" : "John",
      "familyName" : "Doe",
      "nameIdentifier" : [ {
        "value" : "0000-0001-5393-1421",
        "nameIdentifierScheme" : "ORCID",
        "schemeURI" : "http://orcid.org/"
      } ],
      "affiliation" : [ ],
      "contributorType" : "DATA_COLLECTOR"
    } ]
  },
  "dates" : null,
  "language" : "de",
  "alternateIdentifiers" : {
    "alternateIdentifier" : [ {
      "value" : "937-0-4523-12357-6",
      "alternateIdentifierType" : "ISBN"
    } ]
  },
  "relatedIdentifiers" : {
    "relatedIdentifier" : [ {
      "value" : "10.5272/oldertestpub",
      "resourceTypeGeneral" : null,
      "relatedIdentifierType" : "DOI",
      "relationType" : "IS_PART_OF",
      "relatedMetadataScheme" : null,
      "schemeURI" : null,
      "schemeType" : null
    } ]
  },
  "sizes" : {
    "size" : [ "256 pages" ]
  },
  "formats" : {
    "format" : [ "pdf" ]
  },
  "version" : "2",
  "rightsList" : {
    "rights" : [ {
      "value" : "Creative Commons Attribution-NoDerivs 2.0 Generic",
      "rightsURI" : "http://creativecommons.org/licenses/by-nd/2.0/",
      "lang" : null
    } ]
  },
  "descriptions" : {
    "description" : [ {
      "content" : [ "\n      Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea\n      takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores\n      et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.\n    " ],
      "descriptionType" : "ABSTRACT",
      "lang" : "la"
    } ]
  },
  "geoLocations" : null,
  "fundingReferences" : null
}

For XML input:

<?xml version="1.0" encoding="UTF-8"?>
  <resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
    <identifier identifierType="DOI">10.5072/testpub</identifier>
    <creators>
      <creator>
        <creatorName nameType="Personal">Smith, John</creatorName>
        <givenName>John</givenName>
        <familyName>Smith</familyName>
      </creator>
      <creator>
        <creatorName>つまらないものですが</creatorName>
        <nameIdentifier nameIdentifierScheme="ISNI" schemeURI="http://isni.org/isni/">0000000134596520</nameIdentifier>
      </creator>
    </creators>
    <titles>
      <title xml:lang="pl">Właściwości rzutowań podprzestrzeniowych</title>
      <title xml:lang="en" titleType="TranslatedTitle">Translation of Polish titles</title>
    </titles>
    <publisher>Springer</publisher>
    <publicationYear>2010</publicationYear>
    <subjects>
      <subject xml:lang="en" subjectScheme="DDC">830 German &amp; related literatures</subject>
      <subject xml:lang="en">Polish Literature</subject>
    </subjects>
    <contributors>
      <contributor contributorType="DataCollector">
        <contributorName nameType="Personal">Doe, John</contributorName>
        <givenName>John</givenName>
        <familyName>Doe</familyName>
        <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5393-1421</nameIdentifier>
      </contributor>
    </contributors>
    <language>de</language>
    <resourceType resourceTypeGeneral="Text">Monograph</resourceType>
    <alternateIdentifiers>
      <alternateIdentifier alternateIdentifierType="ISBN">937-0-4523-12357-6</alternateIdentifier>
    </alternateIdentifiers>
    <relatedIdentifiers>
      <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.5272/oldertestpub</relatedIdentifier>
    </relatedIdentifiers>
    <sizes>
      <size>256 pages</size>
    </sizes>
    <formats>
      <format>pdf</format>
    </formats>
    <version>2</version>
    <rightsList>
      <rights rightsURI="http://creativecommons.org/licenses/by-nd/2.0/">Creative Commons Attribution-NoDerivs 2.0 Generic</rights>
    </rightsList>
    <descriptions>
      <description xml:lang="la" descriptionType="Abstract">
        Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea
        takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
        et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
      </description>
    </descriptions>
  </resource>

Process parts of XML with XPath in Java

This shows you how to

1. Read in an XML file to a DOM
2. Filter out a set of Nodes with XPath
3. Perform a certain action on each of the extracted Nodes.

We will call the code with the following statement

processFilteredXml(xmlIn, xpathExpr,(node) -> {/*Do something...*/;});

In our case we want to print some creatorNames from a book.xml using “//book/creators/creator/creatorName” as xpath to perform a printNode action on each Node that matches the XPath.

Full code

@Test
public void printXml() {
    try (InputStream in = readFile("book.xml")) {
        processFilteredXml(in, "//book/creators/creator/creatorName", (node) -> {
            printNode(node, System.out);
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private InputStream readFile(String yourSampleFile) {
    return Thread.currentThread().getContextClassLoader().getResourceAsStream(yourSampleFile);
}

private void processFilteredXml(InputStream in, String xpath, Consumer<Node> process) {
    Document doc = readXml(in);
    NodeList list = filterNodesByXPath(doc, xpath);
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        process.accept(node);
    }
}

public Document readXml(InputStream xmlin) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(xmlin);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
    try {
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        XPathExpression expr = xpath.compile(xpathExpr);
        Object eval = expr.evaluate(doc, XPathConstants.NODESET);
        return (NodeList) eval;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private void printNode(Node node, PrintStream out) {
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamResult result = new StreamResult(new StringWriter());
        DOMSource source = new DOMSource(node);
        transformer.transform(source, result);
        String xmlString = result.getWriter().toString();
        out.println(xmlString);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Prints

<creatorName>Fosmire, Michael</creatorName>

<creatorName>Wertz, Ruth</creatorName>

<creatorName>Purzer, Senay</creatorName>

For book.xml

<book>
  <creators>
    <creator>
      <creatorName>Fosmire, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Fosmire</familyName>
    </creator>
    <creator>
      <creatorName>Wertz, Ruth</creatorName>
      <givenName>Ruth</givenName>
      <familyName>Wertz</familyName>
    </creator>
    <creator>
      <creatorName>Purzer, Senay</creatorName>
       <givenName>Senay</givenName>
       <familyName>Purzer</familyName>
    </creator>
  </creators>
  <titles>
    <title>Critical Engineering Literacy Test (CELT)</title>
  </titles>
</book>

https://github.com/jschnasse/overflow/tree/master/src/test/java/stack52720162

December 4, 2018

Managing Java SDKs

And here it is. A management tool for Java SDKs.

https://sdkman.io/

November 23, 2018November 23, 2018

Direct Accessing XML with Java

Motivation

Processing of huge XML files can become cumbersome if your hardware is limited.

“Parsing a sample 20 MB XML document^[1] containing Wikipedia document abstracts into a DOM tree using the Xerces library roughly consumes about 100 MB of RAM. Other document model implementations^[2] such as Saxon’s TinyTree are more memory efficient; parsing the same document in Saxon consumes about 50 MB of memory. These numbers will vary with document contents, but generally the required memory scales linearly with document size, and is typically a single-digit multiple of the file size on disk.”

Probst, Martin. “Processing Arbitrarily Large XML using a Persistent DOM.” 2010. https://www.balisage.net/Proceedings/vol5/html/Probst01/BalisageVol5-Probst01.html

A good way to deal with huge files is to split them into smaller ones. But sometimes you don’t have that option.

Here is where Random Access comes into play. While random access of binary files is well supported by standard Java tools, this is not true for higher-order text-based formats like XML.

The Plan

Find proper access points, by taking XML structure into account.
Translate character offsets to byte offsets (take encoding into account)

This sounds straightforward.

Existing Libraries

The StAX library offers streaming access to XML data without the need of loading a complete DOM model into memory. The library comes with an XMLStreamReader offering a method getLocation().getCharacterOffset() .

But unfortunately this will only return character offsets. In order to access the file with standard java readers we need byte offsets. UTF-8 uses variable lengths for encoding characters. This means that we have to reread the whole file from the beginning to calculate the byte offset from character offset. This seems not acceptable.

Solution

In the following I will introduce a solution, based on a generated XML parser using ANTLR4.

We will use the parser to walk through the XML file. While the parser is doing it’s work it will spit out byte offsets whenever a certain criteria is fulfilled (in the example we will search for XML-Elements with the name ‘page’).
I will use the byte offsets to access the XML file and to read portions of XML into a Java bean using JAXB.

The Following works very well on a ~17GB Wikipedia dump/20170501/dewiki-20170501-pages-articles-multistream.xml.bz2 . I still had to increase heap size using -xX6GB but compared to a DOM approach this looks much more acceptable.

1. Get XML Grammar

cd /tmp
git clone https://github.com/antlr/grammars-v4

2. Generate Parser

cd /tmp/grammars-v4/xml/
mvn clean install

3. Copy Generated Java files to your Project

cp -r target/generated-sources/antlr4 /path/to/your/project/gen

4. Hook in with a Listener to collect character offsets

package stack43366566;

import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import stack43366566.gen.XMLLexer;
import stack43366566.gen.XMLParser;
import stack43366566.gen.XMLParser.DocumentContext;
import stack43366566.gen.XMLParserBaseListener;

public class FindXmlOffset {

    List<Integer> offsets = null;
    String searchForElement = null;

    public class MyXMLListener extends XMLParserBaseListener {
        public void enterElement(XMLParser.ElementContext ctx) {
            String name = ctx.Name().get(0).getText();
            if (searchForElement.equals(name)) {
                offsets.add(ctx.start.getStartIndex());
            }
        }
    }

    public List<Integer> createOffsets(String file, String elementName) {
        searchForElement = elementName;
        offsets = new ArrayList<>();
        try {
            XMLLexer lexer = new XMLLexer(new ANTLRFileStream(file));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            XMLParser parser = new XMLParser(tokens);
            DocumentContext ctx = parser.document();
            ParseTreeWalker walker = new ParseTreeWalker();
            MyXMLListener listener = new MyXMLListener();
            walker.walk(listener, ctx);
            return offsets;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] arg) {
        System.out.println("Search for offsets.");
        List<Integer> offsets = new FindXmlOffset().createOffsets("/tmp/dewiki-20170501-pages-articles-multistream.xml",
                        "page");
        System.out.println("Offsets: " + offsets);
    }

}

5. Result

Prints:

Offsets: [2441, 10854, 30257, 51419 ….

6. Read from Offset Position

To test the code I’ve written class that reads in each wikipedia page to a java object

@JacksonXmlRootElement
class Page {
 public Page(){};
 public String title;
}

using basically this code

private Page readPage(Integer offset, String filename) {
        try (Reader in = new FileReader(filename)) {
            in.skip(offset);
            ObjectMapper mapper = new XmlMapper();
             mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
            Page object = mapper.readValue(in, Page.class);
            return object;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

Download

Find complete example on github.

November 15, 2018November 15, 2018

URL encoding in Java – Explained

Recently I introduced a solution for URL encoding in Java .

public static String encode(String url) {
    try {
      URL u = new URL(url);
      URI uri = new URI(u.getProtocol(), 
                        u.getUserInfo(), 
                        IDN.toASCII(u.getHost()), 
                        u.getPort(), 
                        u.getPath(),
                        u.getQuery(), 
                        u.getRef());
      String correctEncodedURL = uri.toASCIIString();
      return correctEncodedURL;
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
}

Now I like to introduce a set of URLs to test the code. Good test sets are provided at the ‘Web Platform Tests’ (wpt) repository. A comprehensible assembly of infos about the URL standard can be found at whatwg.org.

On basis of the ‘Web Platform Tests’ I created a file to hold test urls together with the expected outcome. The test set is provided in the following form:

{ 
  "in" : "http://你好你好.urltest.lookout.net/",
  "out" : "http://xn--6qqa088eba.urltest.lookout.net/" 
}, 
{ 
  "in" : "http://urltest.lookout.net/?q=\"asdf\"", 
  "out" : "http://urltest.lookout.net/?q=%22asdf%22" 
}

To test my URL encoding implementation I use the following code

try (InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("url-succeding-tests.json")) {
  ObjectMapper mapper = new ObjectMapper();
  JsonNode testdata = mapper.readValue(in, JsonNode.class).at("/tests");
  for (JsonNode test : testdata) {
    String url = test.at("/in").asText();
    String expected = test.at("/out").asText();
    String encodedUrl = URLUtil.encode(url);
    org.junit.Assert.assertTrue(expected.equals(encodedUrl));
  }
} catch (Exception e) {
  throw new RuntimeException(e);
}

During my tests I also found some URLs that were not encoded correctly. I collected them in another JSON file.

Here are some failing examples:

{
  "in" : "http://www.example.com/##asdf",
  "out" : "http://www.example.com/##asdf"
},{
  "in" : "http://www.example.com/#a\nb\rc\td",
  "out" : "http://www.example.com/#abcd"
}, {
  "in" : "file:c:\\\\foo\\\\bar.html",
  "out" : "file:///C:/foo/bar.html"
}, {
  "in" : "  File:c|////foo\\\\bar.html",
  "out" : "file:///C:////foo/bar.html"
},{
  "in" : "http://look󠀠out.net/",
  "out" : "http://look%F3%A0%80%A0out.net/"
}, {
  "in" : "http://look־out.net/",
  "out" : "http://look%D6%BEout.net/"
},

Here is, how my encoding routine fails:

In:	http://www.example.com/##asdf
Expect:	http://www.example.com/##asdf
Actual:	http://www.example.com/#%23asdf

In:	http://www.example.com/#a
b
c	d
Expect:	http://www.example.com/#abcd
Actual:	http://www.example.com/#a%0Ab%0Dc%09d

In:	file:c:\\foo\\bar.html
Expect:	file:///C:/foo/bar.html
Actual:	ERROR

In:	  File:c|////foo\\bar.html
Expect:	file:///C:////foo/bar.html
Actual:	ERROR

java.net.URISyntaxException: Relative path in absolute URI: file://c:%5C%5Cfoo%5C%5Cbar.html
java.net.URISyntaxException: Relative path in absolute URI: file://c%7C////foo%5C%5Cbar.html
java.lang.IllegalArgumentException: java.text.ParseException: A prohibited code point was found in the inputlook󠀠out
In:	http://look󠀠out.net/
Expect:	http://look%F3%A0%80%A0out.net/
Actual:	ERROR

java.lang.IllegalArgumentException: java.text.ParseException: The input does not conform to the rules for BiDi code points.look־out
In:	http://look־out.net/
Expect:	http://look%D6%BEout.net/
Actual:	ERROR

Fazit

My URL encoding routine needs still some refinement. Especially cases of double encoding and the handling of URL fragments must be subjects of further improvement. However I’m already very happy with this standard Java solution. A more sophisticated approach can be found here: https://github.com/smola/galimatias and will also be subject of future tests.

Since this research is based on one of my stackoverflow answers, you can find the relevant code in my overflow repository.

November 9, 2018

URL encoding in Java.

Here is, how I encode URLs in Java.

Split URL into structural parts. Use java.net.URL for it.
Encode each part properly
Use IDN.toASCII(putDomainNameHere) to Punycode encode the host name!

Use java.net.URI.toASCIIString() to percent-encode, NFC encoded unicode – (better would be NFKC!). For more info see: How to encode properly this URL

URL url= new URL("http://search.barnesandnoble.com/booksearch/first book.pdf);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), IDN.toASCII(url.getHost()), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
String correctEncodedURL=uri.toASCIIString(); 
System.out.println(correctEncodedURL);

Prints

http://search.barnesandnoble.com/booksearch/first%20book.pdf

Category: java

Java in a Container