QUIC becomes HTTP/3
https://mailarchive.ietf.org/arch/msg/quic/RLRs4nB1lwFCZ_7k0iuz0ZBa35s
You can use standard tools for this.
1. Use the tool xjc from your JDK to generate Java classes from the schema.
Since Java 9 you must explicitly add JAXB as a module with --add-modules java.se.ee
See: How to resolve java.lang.NoClassDefFoundError: javax/xml/bind/JAXBException in Java 9
Since Java 11 you have to download xjc in an extra step from https://javaee.github.io/jaxb-v2/
2. Read in as XML, write out as JSON using Jackson.
Example
With https://schema.datacite.org/meta/kernel-4.1/metadata.xsd
1. Use the tool xjc from your JDK
This example uses a fairly complex schema from DataCite.
/path/to/jdk/bin/xjc -d /path/to/java/project \
  -p stack24174963.datacite \
  https://schema.datacite.org/meta/kernel-4.1/metadata.xsd
This will reply with
parsing a schema...
compiling a schema...
stack24174963/datacite/Box.java
stack24174963/datacite/ContributorType.java
stack24174963/datacite/DateType.java
stack24174963/datacite/DescriptionType.java
stack24174963/datacite/FunderIdentifierType.java
stack24174963/datacite/NameType.java
stack24174963/datacite/ObjectFactory.java
stack24174963/datacite/Point.java
stack24174963/datacite/RelatedIdentifierType.java
stack24174963/datacite/RelationType.java
stack24174963/datacite/Resource.java
stack24174963/datacite/ResourceType.java
stack24174963/datacite/TitleType.java
stack24174963/datacite/package-info.java
If you look into Resource.Creator and Resource.Contributor you will see that the member variables givenName and familyName are not correctly typed. Change their type from Object to String, also apply your changes to the corresponding getter and setter methods!
2. Read in as XML, write out as JSON using Jackson
import java.io.InputStream;
import java.io.StringWriter;

import javax.xml.bind.JAXB;

import org.junit.Test;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;

import stack24174963.datacite.Resource;

public class HowToXmlToJsonWithSchema {

    @Test
    public void readXmlAndConvertToSchema() throws Exception {
        String example = "schemas/datacite/kernel-4.1/example/datacite-example-complicated-v4.1.xml";
        try (InputStream in = Thread.currentThread().getContextClassLoader().getResourceAsStream(example)) {
            // Unmarshal the XML into the classes generated by xjc
            Resource resource = JAXB.unmarshal(in, Resource.class);
            System.out.println(asJson(resource));
        }
    }

    // Serialize any object as indented JSON
    private String asJson(Object obj) throws Exception {
        StringWriter w = new StringWriter();
        new ObjectMapper().configure(SerializationFeature.INDENT_OUTPUT, true).writeValue(w, obj);
        return w.toString();
    }
}
Prints:
{
  "identifier" : {
    "value" : "10.5072/testpub",
    "identifierType" : "DOI"
  },
  "creators" : {
    "creator" : [ {
      "creatorName" : {
        "value" : "Smith, John",
        "nameType" : "PERSONAL"
      },
      "givenName" : "John",
      "familyName" : "Smith",
      "nameIdentifier" : [ ],
      "affiliation" : [ ]
    }, {
      "creatorName" : {
        "value" : "つまらないものですが",
        "nameType" : null
      },
      "givenName" : null,
      "familyName" : null,
      "nameIdentifier" : [ {
        "value" : "0000000134596520",
        "nameIdentifierScheme" : "ISNI",
        "schemeURI" : "http://isni.org/isni/"
      } ],
      "affiliation" : [ ]
    } ]
  },
  "titles" : {
    "title" : [ {
      "value" : "Właściwości rzutowań podprzestrzeniowych",
      "titleType" : null,
      "lang" : "pl"
    }, {
      "value" : "Translation of Polish titles",
      "titleType" : "TRANSLATED_TITLE",
      "lang" : "en"
    } ]
  },
  "publisher" : "Springer",
  "publicationYear" : "2010",
  "resourceType" : {
    "value" : "Monograph",
    "resourceTypeGeneral" : "TEXT"
  },
  "subjects" : {
    "subject" : [ {
      "value" : "830 German & related literatures",
      "subjectScheme" : "DDC",
      "schemeURI" : null,
      "valueURI" : null,
      "lang" : "en"
    }, {
      "value" : "Polish Literature",
      "subjectScheme" : null,
      "schemeURI" : null,
      "valueURI" : null,
      "lang" : "en"
    } ]
  },
  "contributors" : {
    "contributor" : [ {
      "contributorName" : {
        "value" : "Doe, John",
        "nameType" : "PERSONAL"
      },
      "givenName" : "John",
      "familyName" : "Doe",
      "nameIdentifier" : [ {
        "value" : "0000-0001-5393-1421",
        "nameIdentifierScheme" : "ORCID",
        "schemeURI" : "http://orcid.org/"
      } ],
      "affiliation" : [ ],
      "contributorType" : "DATA_COLLECTOR"
    } ]
  },
  "dates" : null,
  "language" : "de",
  "alternateIdentifiers" : {
    "alternateIdentifier" : [ {
      "value" : "937-0-4523-12357-6",
      "alternateIdentifierType" : "ISBN"
    } ]
  },
  "relatedIdentifiers" : {
    "relatedIdentifier" : [ {
      "value" : "10.5272/oldertestpub",
      "resourceTypeGeneral" : null,
      "relatedIdentifierType" : "DOI",
      "relationType" : "IS_PART_OF",
      "relatedMetadataScheme" : null,
      "schemeURI" : null,
      "schemeType" : null
    } ]
  },
  "sizes" : {
    "size" : [ "256 pages" ]
  },
  "formats" : {
    "format" : [ "pdf" ]
  },
  "version" : "2",
  "rightsList" : {
    "rights" : [ {
      "value" : "Creative Commons Attribution-NoDerivs 2.0 Generic",
      "rightsURI" : "http://creativecommons.org/licenses/by-nd/2.0/",
      "lang" : null
    } ]
  },
  "descriptions" : {
    "description" : [ {
      "content" : [ "\n Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea\n takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores\n et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.\n " ],
      "descriptionType" : "ABSTRACT",
      "lang" : "la"
    } ]
  },
  "geoLocations" : null,
  "fundingReferences" : null
}
For XML input:
<?xml version="1.0" encoding="UTF-8"?>
<resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 http://schema.datacite.org/meta/kernel-4.1/metadata.xsd">
  <identifier identifierType="DOI">10.5072/testpub</identifier>
  <creators>
    <creator>
      <creatorName nameType="Personal">Smith, John</creatorName>
      <givenName>John</givenName>
      <familyName>Smith</familyName>
    </creator>
    <creator>
      <creatorName>つまらないものですが</creatorName>
      <nameIdentifier nameIdentifierScheme="ISNI" schemeURI="http://isni.org/isni/">0000000134596520</nameIdentifier>
    </creator>
  </creators>
  <titles>
    <title xml:lang="pl">Właściwości rzutowań podprzestrzeniowych</title>
    <title xml:lang="en" titleType="TranslatedTitle">Translation of Polish titles</title>
  </titles>
  <publisher>Springer</publisher>
  <publicationYear>2010</publicationYear>
  <subjects>
    <subject xml:lang="en" subjectScheme="DDC">830 German &amp; related literatures</subject>
    <subject xml:lang="en">Polish Literature</subject>
  </subjects>
  <contributors>
    <contributor contributorType="DataCollector">
      <contributorName nameType="Personal">Doe, John</contributorName>
      <givenName>John</givenName>
      <familyName>Doe</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0001-5393-1421</nameIdentifier>
    </contributor>
  </contributors>
  <language>de</language>
  <resourceType resourceTypeGeneral="Text">Monograph</resourceType>
  <alternateIdentifiers>
    <alternateIdentifier alternateIdentifierType="ISBN">937-0-4523-12357-6</alternateIdentifier>
  </alternateIdentifiers>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="DOI" relationType="IsPartOf">10.5272/oldertestpub</relatedIdentifier>
  </relatedIdentifiers>
  <sizes>
    <size>256 pages</size>
  </sizes>
  <formats>
    <format>pdf</format>
  </formats>
  <version>2</version>
  <rightsList>
    <rights rightsURI="http://creativecommons.org/licenses/by-nd/2.0/">Creative Commons Attribution-NoDerivs 2.0 Generic</rights>
  </rightsList>
  <descriptions>
    <description xml:lang="la" descriptionType="Abstract">
      Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea
      takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores
      et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
    </description>
  </descriptions>
</resource>
See also:
https://stackoverflow.com/a/49977184/1485527
and
https://github.com/jschnasse/overflow/tree/master/src/test/java/stack24174963
This shows you how to
1. Read in an XML file to a DOM
2. Filter out a set of Nodes with XPath
3. Perform a certain action on each of the extracted Nodes.
We will call the code with the following statement
processFilteredXml(xmlIn, xpathExpr,(node) -> {/*Do something...*/;});
In our case we want to print some creatorNames from a book.xml using “//book/creators/creator/creatorName” as xpath to perform a printNode action on each Node that matches the XPath.
Full code
@Test
public void printXml() {
    try (InputStream in = readFile("book.xml")) {
        processFilteredXml(in, "//book/creators/creator/creatorName", (node) -> {
            printNode(node, System.out);
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private InputStream readFile(String yourSampleFile) {
    return Thread.currentThread().getContextClassLoader().getResourceAsStream(yourSampleFile);
}

private void processFilteredXml(InputStream in, String xpath, Consumer<Node> process) {
    Document doc = readXml(in);
    NodeList list = filterNodesByXPath(doc, xpath);
    for (int i = 0; i < list.getLength(); i++) {
        Node node = list.item(i);
        process.accept(node);
    }
}

public Document readXml(InputStream xmlin) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(xmlin);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
    try {
        XPathFactory xPathFactory = XPathFactory.newInstance();
        XPath xpath = xPathFactory.newXPath();
        XPathExpression expr = xpath.compile(xpathExpr);
        Object eval = expr.evaluate(doc, XPathConstants.NODESET);
        return (NodeList) eval;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private void printNode(Node node, PrintStream out) {
    try {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        StreamResult result = new StreamResult(new StringWriter());
        DOMSource source = new DOMSource(node);
        transformer.transform(source, result);
        String xmlString = result.getWriter().toString();
        out.println(xmlString);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Prints
<creatorName>Fosmire, Michael</creatorName>
<creatorName>Wertz, Ruth</creatorName>
<creatorName>Purzer, Senay</creatorName>
For book.xml
<book>
  <creators>
    <creator>
      <creatorName>Fosmire, Michael</creatorName>
      <givenName>Michael</givenName>
      <familyName>Fosmire</familyName>
    </creator>
    <creator>
      <creatorName>Wertz, Ruth</creatorName>
      <givenName>Ruth</givenName>
      <familyName>Wertz</familyName>
    </creator>
    <creator>
      <creatorName>Purzer, Senay</creatorName>
      <givenName>Senay</givenName>
      <familyName>Purzer</familyName>
    </creator>
  </creators>
  <titles>
    <title>Critical Engineering Literacy Test (CELT)</title>
  </titles>
</book>
See also
https://stackoverflow.com/a/52736526/1485527
https://github.com/jschnasse/overflow/tree/master/src/test/java/stack52720162
Since December 11, Friedrich Merz has been the new chairman of the supervisory board of Cologne Bonn Airport. For me, this gives the nightly aircraft noise a face. Congratulations to the casting people: top pick!
Meanwhile:
The Lärmschutzgemeinschaft Flughafen Köln/Bonn e.V. by now only demands a night flight ban (midnight to 5 a.m.) for passenger aircraft.
http://www.fluglaerm-koeln-bonn.de/index.php/was-wird-aus-der-nachtflugregelung/
Even though the page is not linked from anywhere. A quick look at the referrers in Matomo shows that the page has only ever been accessed via “Direct Entry”. Still, it can be found by searching for “jan schnasse”.
Matomo has not spotted any search engines so far either. So how does the page end up in the Google index? Perhaps via an HTTP referer that was evaluated, e.g. by Google Analytics, when someone clicked one of the outgoing links? As far as I know, I do not load any Google resources such as fonts. My guess is that the YouTube links gave me away.
And here it is. A management tool for Java SDKs.
Tilo Jung interviews State Secretary Gatzer. Do I understand this correctly: the “Schwarze Null” (balanced budget) was established as a guiding principle so that the next banking crisis can again be handled in the proven manner?
Since I am currently ploughing through the media.ccc channel a bit, here are two more tips:
Wie baut man eigentlich Raumschiffe? (How do you actually build spaceships?)
https://www.youtube.com/watch?v=5qNHtdN07FM
Wie fliegt man eigentlich Raumschiffe? (How do you actually fly spaceships?)
https://www.youtube.com/watch?v=7cpdOR4nFRU
But beware. Since all kinds of mystics naturally spread their world view below such videos, YouTube finds it logical to put the crudest titles into your suggestion list afterwards. So simply activate “private browsing” beforehand. Resetting the YouTube cookies does not help, by the way. If you arrive without cookies, you also get strange titles in the suggestion list… sigh.
Processing of huge XML files can become cumbersome if your hardware is limited.
“Parsing a sample 20 MB XML document[1] containing Wikipedia document abstracts into a DOM tree using the Xerces library roughly consumes about 100 MB of RAM. Other document model implementations[2] such as Saxon’s TinyTree are more memory efficient; parsing the same document in Saxon consumes about 50 MB of memory. These numbers will vary with document contents, but generally the required memory scales linearly with document size, and is typically a single-digit multiple of the file size on disk.”
Probst, Martin. “Processing Arbitrarily Large XML using a Persistent DOM.” 2010. https://www.balisage.net/Proceedings/vol5/html/Probst01/BalisageVol5-Probst01.html
A good way to deal with huge files is to split them into smaller ones. But sometimes you don’t have that option.
Here is where Random Access comes into play. While random access of binary files is well supported by standard Java tools, this is not true for higher-order text-based formats like XML.
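To illustrate what random access to a binary file looks like with standard Java tools, here is a minimal sketch based on java.io.RandomAccessFile. The class name RandomAccessDemo, the helpers readAt and demo, and the sample content are made up for this illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class RandomAccessDemo {

    /** Reads length bytes starting at byte position offset, without reading what comes before. */
    static byte[] readAt(Path file, long offset, int length) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);          // jump directly to the byte offset
            byte[] buffer = new byte[length];
            raf.readFully(buffer);     // throws EOFException if the file is too short
            return buffer;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes ten ASCII digits to a temporary file and reads three bytes from offset 4. */
    static String demo() {
        try {
            Path tmp = Files.createTempFile("random-access", ".bin");
            try {
                Files.write(tmp, "0123456789".getBytes(StandardCharsets.US_ASCII));
                return new String(readAt(tmp, 4, 3), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 456
    }
}
```

The important part is seek: it takes a byte position, which is why byte offsets are what we need for an XML file as well.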
Finding the position of an element in the file sounds straightforward at first.
The StAX library offers streaming access to XML data without the need of loading a complete DOM model into memory. The library comes with an XMLStreamReader offering a method getLocation().getCharacterOffset() .
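A minimal sketch of how such offsets can be collected with StAX. The class name StaxOffsetDemo, the method characterOffsets and the sample document are assumptions for illustration; which position exactly is reported for an event (for example before or after the start tag) depends on the parser implementation, so I do not show concrete numbers.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxOffsetDemo {

    /** Collects the character offset the reader reports at each start tag named elementName. */
    static List<Integer> characterOffsets(String xml, String elementName) {
        List<Integer> offsets = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // next() advances the stream and returns the event type
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        offsets.add(reader.getLocation().getCharacterOffset());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String xml = "<pages><page>a</page><page>b</page></pages>";
        System.out.println(characterOffsets(xml, "page"));
    }
}
```

Only one event is held in memory at a time, so this scales to large files in a way a DOM does not.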
Unfortunately this only returns character offsets. In order to access the file with standard Java readers we need byte offsets. UTF-8 encodes characters with a variable number of bytes, so we would have to reread the whole file from the beginning to calculate the byte offset from a character offset. This is not acceptable.
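The difference between character and byte offsets can be shown with a few lines of Java. This is only a sketch; the class OffsetConversionDemo and the method byteOffset are hypothetical names, and the offset is counted in Java chars, so it must not split a surrogate pair.

```java
import java.nio.charset.StandardCharsets;

class OffsetConversionDemo {

    /** Number of UTF-8 bytes that precede the character at charOffset. */
    static int byteOffset(String text, int charOffset) {
        // The whole prefix has to be encoded, there is no shortcut.
        return text.substring(0, charOffset).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "Właściwości" written with Unicode escapes; ł and ś take two bytes each in UTF-8.
        String xml = "<t>W\u0142a\u015Bciwo\u015Bci</t><t>x</t>";
        int charOffset = xml.indexOf("<t>x");
        System.out.println(charOffset);                  // prints 18
        System.out.println(byteOffset(xml, charOffset)); // prints 21
    }
}
```

The conversion needs everything in front of the offset, which for a file means reading it from the beginning.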
In the following I will introduce a solution, based on a generated XML parser using ANTLR4.
The following works very well on a ~17GB Wikipedia dump, /20170501/dewiki-20170501-pages-articles-multistream.xml.bz2. I still had to increase the heap size using -Xmx6g, but compared to a DOM approach this looks much more acceptable.
cd /tmp
git clone https://github.com/antlr/grammars-v4
cd /tmp/grammars-v4/xml/
mvn clean install
cp -r target/generated-sources/antlr4 /path/to/your/project/gen
package stack43366566;

import java.util.ArrayList;
import java.util.List;

import org.antlr.v4.runtime.ANTLRFileStream;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import stack43366566.gen.XMLLexer;
import stack43366566.gen.XMLParser;
import stack43366566.gen.XMLParser.DocumentContext;
import stack43366566.gen.XMLParserBaseListener;

public class FindXmlOffset {

    List<Integer> offsets = null;
    String searchForElement = null;

    public class MyXMLListener extends XMLParserBaseListener {
        public void enterElement(XMLParser.ElementContext ctx) {
            String name = ctx.Name().get(0).getText();
            if (searchForElement.equals(name)) {
                offsets.add(ctx.start.getStartIndex());
            }
        }
    }

    public List<Integer> createOffsets(String file, String elementName) {
        searchForElement = elementName;
        offsets = new ArrayList<>();
        try {
            XMLLexer lexer = new XMLLexer(new ANTLRFileStream(file));
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            XMLParser parser = new XMLParser(tokens);
            DocumentContext ctx = parser.document();
            ParseTreeWalker walker = new ParseTreeWalker();
            MyXMLListener listener = new MyXMLListener();
            walker.walk(listener, ctx);
            return offsets;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] arg) {
        System.out.println("Search for offsets.");
        List<Integer> offsets = new FindXmlOffset().createOffsets("/tmp/dewiki-20170501-pages-articles-multistream.xml", "page");
        System.out.println("Offsets: " + offsets);
    }
}
Prints:
Offsets: [2441, 10854, 30257, 51419 ….
To test the code I’ve written a class that reads each Wikipedia page into a Java object
@JacksonXmlRootElement
class Page {
    public Page() {}
    public String title;
}
using basically this code
private Page readPage(Integer offset, String filename) {
    try (Reader in = new FileReader(filename)) {
        in.skip(offset);
        ObjectMapper mapper = new XmlMapper();
        mapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
        Page object = mapper.readValue(in, Page.class);
        return object;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Find complete example on github.
Film night 18.11.2018
“Expect it to be worthless in the future, and it becomes worthless now.”
Read article by John Lewis, The seven deadly paradoxes of cryptocurrency.
Recently I introduced a solution for URL encoding in Java.
public static String encode(String url) {
    try {
        URL u = new URL(url);
        URI uri = new URI(u.getProtocol(), u.getUserInfo(), IDN.toASCII(u.getHost()), u.getPort(), u.getPath(), u.getQuery(), u.getRef());
        String correctEncodedURL = uri.toASCIIString();
        return correctEncodedURL;
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Now I would like to introduce a set of URLs to test the code. Good test sets are provided in the ‘Web Platform Tests’ (wpt) repository. A comprehensive collection of information about the URL standard can be found at whatwg.org.
Based on the ‘Web Platform Tests’ I created a file that holds test URLs together with the expected outcome. The test set is provided in the following form:
{
  "in" : "http://你好你好.urltest.lookout.net/",
  "out" : "http://xn--6qqa088eba.urltest.lookout.net/"
}, {
  "in" : "http://urltest.lookout.net/?q=\"asdf\"",
  "out" : "http://urltest.lookout.net/?q=%22asdf%22"
}
To test my URL encoding implementation I use the following code
try (InputStream in = Thread.currentThread().getContextClassLoader()
        .getResourceAsStream("url-succeding-tests.json")) {
    ObjectMapper mapper = new ObjectMapper();
    JsonNode testdata = mapper.readValue(in, JsonNode.class).at("/tests");
    for (JsonNode test : testdata) {
        String url = test.at("/in").asText();
        String expected = test.at("/out").asText();
        String encodedUrl = URLUtil.encode(url);
        org.junit.Assert.assertTrue(expected.equals(encodedUrl));
    }
} catch (Exception e) {
    throw new RuntimeException(e);
}
During my tests I also found some URLs that were not encoded correctly. I collected them in another JSON file.
Here are some failing examples:
{
  "in" : "http://www.example.com/##asdf",
  "out" : "http://www.example.com/##asdf"
}, {
  "in" : "http://www.example.com/#a\nb\rc\td",
  "out" : "http://www.example.com/#abcd"
}, {
  "in" : "file:c:\\\\foo\\\\bar.html",
  "out" : "file:///C:/foo/bar.html"
}, {
  "in" : " File:c|////foo\\\\bar.html",
  "out" : "file:///C:////foo/bar.html"
}, {
  "in" : "http://look\uDB40\uDC20out.net/",
  "out" : "http://look%F3%A0%80%A0out.net/"
}, {
  "in" : "http://look־out.net/",
  "out" : "http://look%D6%BEout.net/"
},
Here is how my encoding routine fails:
In: http://www.example.com/##asdf
Expect: http://www.example.com/##asdf
Actual: http://www.example.com/#%23asdf
In: http://www.example.com/#a b c d
Expect: http://www.example.com/#abcd
Actual: http://www.example.com/#a%0Ab%0Dc%09d
In: file:c:\\foo\\bar.html
Expect: file:///C:/foo/bar.html
Actual: ERROR
In: File:c|////foo\\bar.html
Expect: file:///C:////foo/bar.html
Actual: ERROR
java.net.URISyntaxException: Relative path in absolute URI: file://c:%5C%5Cfoo%5C%5Cbar.html
java.net.URISyntaxException: Relative path in absolute URI: file://c%7C////foo%5C%5Cbar.html
java.lang.IllegalArgumentException: java.text.ParseException: A prohibited code point was found in the inputlookout
In: http://lookout.net/
Expect: http://look%F3%A0%80%A0out.net/
Actual: ERROR
java.lang.IllegalArgumentException: java.text.ParseException: The input does not conform to the rules for BiDi code points.look־out
In: http://look־out.net/
Expect: http://look%D6%BEout.net/
Actual: ERROR
My URL encoding routine still needs some refinement. In particular, double encoding and the handling of URL fragments need further improvement. However, I’m already very happy with this standard Java solution. A more sophisticated approach can be found at https://github.com/smola/galimatias and will also be the subject of future tests.
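The double encoding problem is easy to reproduce. The multi-argument java.net.URI constructors always quote the percent character, so feeding an already encoded URL through the routine encodes it a second time. The following sketch repeats the encode method so that it is self-contained; the class name DoubleEncodingDemo and the sample URL are made up for illustration.

```java
import java.net.IDN;
import java.net.URI;
import java.net.URL;

class DoubleEncodingDemo {

    // Same approach as the encode routine of this post.
    // new URL(String) is deprecated since Java 20 but still available.
    @SuppressWarnings("deprecation")
    static String encode(String url) {
        try {
            URL u = new URL(url);
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), IDN.toASCII(u.getHost()),
                    u.getPort(), u.getPath(), u.getQuery(), u.getRef());
            return uri.toASCIIString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String once = encode("http://www.example.com/a b");
        String twice = encode(once);
        System.out.println(once);  // prints http://www.example.com/a%20b
        System.out.println(twice); // prints http://www.example.com/a%2520b
    }
}
```

So the routine is only safe for input that is known to be unencoded. One possible refinement would be to detect existing percent escapes before encoding.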
Since this research is based on one of my stackoverflow answers, you can find the relevant code in my overflow repository.
When the wind blows from the west, the planes pass over the south of Cologne at one-minute intervals, even at night. There are local initiatives too, but this association seems to gather the most members:
… has long ceased to be a trade union issue. In any case, a reduction of weekly working hours will play no role in the next round of collective bargaining either. The prospects of success are too slim.
But there are ways to organise around this topic outside the unions as well, e.g. via this Attac working group, AG ArbeitFairTeilen.
https://www.attac-netzwerk.de/ag-arbeitfairteilen/was-wir-wollen/
Albrecht von Lucke on Jung und Naiv: listen in for 10 minutes starting at minute 64, it is worth it:
Interested in the current state of the art (in the real world)? Read this:
https://news.ycombinator.com/item?id=18442637
Favorite comment so far:
“We have absolutely no idea how to write code. I always wonder if it’s like this for other branches of engineering too? I wonder if engineers who designed my elevator or airplane had “ok it’s very surprising that it’s working, let’s not touch this” moments. Or chemical engineers synthesize medicines in way nobody but a rockstar guru understands but everyone changes all the time. I wonder if my cellphone is made by machines designed in early 1990s because nobody was able to figure out what that one cog is doing.
Software is a mess. I’ve seen some freakishly smart people capable of solving very hard problems writing code that literally changes the world at this very moment. But the code itself is, well, a castle of shit. Why? Is it because our tools (programming languages, compilers etc) are still stone age technology? Is it because software is inherently a harder problem than say machines or chemical processes for the human brain? Is it because software engineers are less educated than other engineers? ”
A new Google service gives hints on how to improve your website.
It also offers learning resources under
Looks useful, though it is still beta …