11 years ago · fb60787990
--- a/doc/diagram/architecture.dot
+++ b/doc/diagram/architecture.dot
@@ -0,0 +1,50 @@
 
				+digraph {
			
 
				+	compound=true
			
 
				+	fontname="Inconsolata, Consolas"
			
 
				+	fontsize=10
			
 
				+	margin="0,0"
			
 
				+	ranksep=0.2
			
 
				+	nodesep=0.5
			
 
				+	penwidth=0.5
			
 
				+	colorscheme=spectral7
			
 
				+	
			
 
				+	node [shape=box, fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5, style=filled, fillcolor=white]
			
 
				+	edge [fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5]
			
 
				+
			
 
				+	subgraph cluster1 {
			
 
				+		margin="10,10"
			
 
				+		labeljust="left"
			
 
				+		label = "SAX"
			
 
				+		style=filled
			
 
				+		fillcolor=6
			
 
				+
			
 
				+		Reader -> Writer [style=invis]
			
 
				+	}
			
 
				+
			
 
				+	subgraph cluster2 {
			
 
				+		margin="10,10"
			
 
				+		labeljust="left"
			
 
				+		label = "DOM"
			
 
				+		style=filled
			
 
				+		fillcolor=7
			
 
				+
			
 
				+		Value
			
 
				+		Document
			
 
				+	}
			
 
				+
			
 
				+	Handler [label="<<concept>>\nHandler"]
			
 
				+
			
 
				+	{
			
 
				+		edge [arrowtail=onormal, dir=back]
			
 
				+		Value -> Document
			
 
				+		Handler -> Document
			
 
				+		Handler -> Writer
			
 
				+	}
			
 
				+
			
 
				+	{
			
 
				+		edge [arrowhead=vee, style=dashed, constraint=false]
			
 
				+		Reader -> Handler [label="calls"]
			
 
				+		Value -> Handler [label="calls"]
			
 
				+		Document -> Reader [label="uses"]
			
 
				+	}
			
 
				+}
			
--- a/doc/diagram/architecture.png
+++ b/doc/diagram/architecture.png
--- a/doc/diagram/utilityclass.dot
+++ b/doc/diagram/utilityclass.dot
@@ -0,0 +1,73 @@
 
				+digraph {
			
 
				+	rankdir=LR
			
 
				+	compound=true
			
 
				+	fontname="Inconsolata, Consolas"
			
 
				+	fontsize=10
			
 
				+	margin="0,0"
			
 
				+	ranksep=0.3
			
 
				+	nodesep=0.15
			
 
				+	penwidth=0.5
			
 
				+	colorscheme=spectral7
			
 
				+	
			
 
				+	node [shape=box, fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5, style=filled, fillcolor=white]
			
 
				+	edge [fontname="Inconsolata, Consolas", fontsize=10, penwidth=0.5]
			
 
				+
			
 
				+	subgraph cluster0 {
			
 
				+		style=filled
			
 
				+		fillcolor=4
			
 
				+
			
 
				+		Encoding [label="<<concept>>\nEncoding"]
			
 
				+
			
 
				+		edge [arrowtail=onormal, dir=back]
			
 
				+		Encoding -> { UTF8; UTF16; UTF32; ASCII; AutoUTF }
			
 
				+		UTF16 -> { UTF16LE; UTF16BE }
			
 
				+		UTF32 -> { UTF32LE; UTF32BE }
			
 
				+	}
			
 
				+
			
 
				+	subgraph cluster1 {
			
 
				+		style=filled
			
 
				+		fillcolor=5
			
 
				+
			
 
				+		Stream [label="<<concept>>\nStream"]
			
 
				+		InputByteStream [label="<<concept>>\nInputByteStream"]
			
 
				+		OutputByteStream [label="<<concept>>\nOutputByteStream"]
			
 
				+
			
 
				+		edge [arrowtail=onormal, dir=back]
			
 
				+		Stream -> { 
			
 
				+			StringStream; InsituStringStream; StringBuffer; 
			
 
				+			EncodedInputStream; EncodedOutputStream; 
			
 
				+			AutoUTFInputStream; AutoUTFOutputStream 
			
 
				+			InputByteStream; OutputByteStream
			
 
				+		}
			
 
				+
			
 
				+		InputByteStream ->	{ MemoryStream; FileReadStream }
			
 
				+		OutputByteStream -> { MemoryBuffer; FileWriteStream } 
			
 
				+	}
			
 
				+
			
 
				+	subgraph cluster2 {
			
 
				+		style=filled
			
 
				+		fillcolor=3
			
 
				+
			
 
				+		Allocator [label="<<concept>>\nAllocator"]
			
 
				+
			
 
				+		edge [arrowtail=onormal, dir=back]
			
 
				+		Allocator -> { CrtAllocator; MemoryPoolAllocator }
			
 
				+	}
			
 
				+
			
 
				+	{
			
 
				+		edge [arrowtail=odiamond, arrowhead=vee, dir=both]
			
 
				+		EncodedInputStream -> InputByteStream
			
 
				+		EncodedOutputStream -> OutputByteStream
			
 
				+		AutoUTFInputStream -> InputByteStream
			
 
				+		AutoUTFOutputStream -> OutputByteStream
			
 
				+		MemoryPoolAllocator -> Allocator [label="base", tailport=s]
			
 
				+	}
			
 
				+
			
 
				+	{
			
 
				+		edge [arrowhead=vee, style=dashed]
			
 
				+		AutoUTFInputStream -> AutoUTF
			
 
				+		AutoUTFOutputStream -> AutoUTF
			
 
				+	}
			
 
				+
			
 
				+	//UTF32LE -> Stream [style=invis]
			
 
				+}
			
--- a/doc/diagram/utilityclass.png
+++ b/doc/diagram/utilityclass.png
--- a/doc/internals.md
+++ b/doc/internals.md
@@ -4,24 +4,234 @@ This section records some design and implementation details.
 
				 
			
 
				 [TOC]
			
 
				 
			
 
				+# Architecture {#Architecture}
			
 
				+
			
 
				+## SAX and DOM
			
 
				+
			
 
				+The basic relationship between SAX and DOM is shown in the following UML diagram.
			
 
				+
			
 
				+![Architecture UML class diagram](diagram/architecture.png)
			
 
				+
			
 
				+The core of the relationship is the `Handler` concept. On the SAX side, `Reader` parses JSON from a stream and publishes events to a `Handler`. `Writer` implements the `Handler` concept to handle the same set of events. On the DOM side, `Document` implements the `Handler` concept to build a DOM according to the events. `Value` supports a `Value::Accept(Handler&)` function, which traverses the DOM to publish events.
			
 
				+
			
 
				+With this design, SAX does not depend on DOM. Even `Reader` and `Writer` have no dependencies on each other. This provides the flexibility to chain event publishers and handlers. `Value` does not depend on SAX either. So, in addition to stringifying a DOM to JSON, a user may also feed it to an XML writer, or do anything else with the events.
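
The decoupling described above can be illustrated with a small, self-contained sketch. It does not use RapidJSON: the two-event handler interface and the names `PublishEvents`, `TextWriter` and `EventCounter` are hypothetical and only mimic the roles of `Reader`, `Writer` and a user-defined handler.

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily reduced Handler concept with only two events:
//   void Int(int);  void String(const std::string&);

// Event publisher (plays the role of Reader). Parsing is faked here:
// the function simply emits a fixed sequence of events.
template <typename Handler>
void PublishEvents(Handler& handler) {
    handler.String("answer");
    handler.Int(42);
}

// Handler that serializes the events as text (plays the role of Writer).
struct TextWriter {
    std::string out;
    void Int(int i) { out += std::to_string(i) + ";"; }
    void String(const std::string& s) { out += "\"" + s + "\";"; }
};

// A different handler for the same events: it only counts them.
struct EventCounter {
    int count = 0;
    void Int(int) { ++count; }
    void String(const std::string&) { ++count; }
};
```

Because the publisher is a template over the handler type, neither handler needs to know about the publisher, and the publisher needs no knowledge of what a handler does with the events.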
			
 
				+
			
 
				+## Utility Classes
			
 
				+
			
 
				+Both the SAX and DOM APIs depend on three additional concepts: `Allocator`, `Encoding` and `Stream`. Their inheritance hierarchy is shown below.
			
 
				+
			
 
				+![Utility classes UML class diagram](diagram/utilityclass.png)
			
 
				+
			
 
				 # Value {#Value}
			
 
				 
			
 
				+`Value` (actually a typedef of `GenericValue<UTF8<>>`) is the core of the DOM API. This section describes its design.
			
 
				+
			
 
				 ## Data Layout {#DataLayout}
			
 
				 
			
 
				+`Value` is a [variant type](http://en.wikipedia.org/wiki/Variant_type). In RapidJSON's context, an instance of `Value` can contain one of the six JSON value types. This is possible by using a `union`. Each `Value` contains two members: `union Data data_` and `unsigned flags_`. The `flags_` member indicates the JSON type, along with additional information.
			
 
				+
			
 
				+The following tables show the data layout of each type. The 32-bit/64-bit columns indicate the size of each field in bytes.
			
 
				+
			
 
				+| Null              |                                  |32-bit|64-bit|
			
 
				+|-------------------|----------------------------------|:----:|:----:|
			
 
				+| (unused)          |                                  |4     |8     | 
			
 
				+| (unused)          |                                  |4     |4     |
			
 
				+| (unused)          |                                  |4     |4     |
			
 
				+| `unsigned flags_` | `kNullType | kNullFlag`          |4     |4     |
			
 
				+
			
 
				+| Bool              |                                                    |32-bit|64-bit|
			
 
				+|-------------------|----------------------------------------------------|:----:|:----:|
			
 
				+| (unused)          |                                                    |4     |8     | 
			
 
				+| (unused)          |                                                    |4     |4     |
			
 
				+| (unused)          |                                                    |4     |4     |
			
 
				+| `unsigned flags_` | `kBoolType | `(either `kTrueFlag` or `kFalseFlag`) |4     |4     |
			
 
				+
			
 
				+| String              |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `Ch* str`           | Pointer to the string (may own)     |4     |8     | 
			
 
				+| `SizeType length`   | Length of string                    |4     |4     |
			
 
				+| (unused)            |                                     |4     |4     |
			
 
				+| `unsigned flags_`   | `kStringType | kStringFlag | ...`   |4     |4     |
			
 
				+
			
 
				+| Object              |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `Member* members`   | Pointer to array of members (owned) |4     |8     | 
			
 
				+| `SizeType size`     | Number of members                   |4     |4     |
			
 
				+| `SizeType capacity` | Capacity of members                 |4     |4     |
			
 
				+| `unsigned flags_`   | `kObjectType | kObjectFlag`         |4     |4     |
			
 
				+
			
 
				+| Array               |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `Value* values`     | Pointer to array of values (owned)  |4     |8     | 
			
 
				+| `SizeType size`     | Number of values                    |4     |4     |
			
 
				+| `SizeType capacity` | Capacity of values                  |4     |4     |
			
 
				+| `unsigned flags_`   | `kArrayType | kArrayFlag`           |4     |4     |
			
 
				+
			
 
				+| Number (Int)        |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `int i`             | 32-bit signed integer               |4     |4     | 
			
 
				+| (zero padding)      | 0                                   |4     |4     |
			
 
				+| (unused)            |                                     |4     |8     |
			
 
				+| `unsigned flags_`   | `kNumberType | kNumberFlag | kIntFlag | kInt64Flag | ...` |4     |4     |
			
 
				+
			
 
				+| Number (Uint)       |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `unsigned u`        | 32-bit unsigned integer             |4     |4     | 
			
 
				+| (zero padding)      | 0                                   |4     |4     |
			
 
				+| (unused)            |                                     |4     |8     |
			
 
				+| `unsigned flags_`   | `kNumberType | kNumberFlag | kUintFlag | kUint64Flag | ...` |4     |4     |
			
 
				+
			
 
				+| Number (Int64)      |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `int64_t i64`       | 64-bit signed integer               |8     |8     | 
			
 
				+| (unused)            |                                     |4     |8     |
			
 
				+| `unsigned flags_`   | `kNumberType | kNumberFlag | kInt64Flag | ...`          |4     |4     |
			
 
				+
			
 
				+| Number (Uint64)     |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `uint64_t u64`      | 64-bit unsigned integer             |8     |8     | 
			
 
				+| (unused)            |                                     |4     |8     |
			
 
				+| `unsigned flags_`   | `kNumberType | kNumberFlag | kUint64Flag | ...`         |4     |4     |
			
 
				+
			
 
				+| Number (Double)     |                                     |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `double d`          | Double precision floating-point     |8     |8     | 
			
 
				+| (unused)            |                                     |4     |8     |
			
 
				+| `unsigned flags_`   | `kNumberType | kNumberFlag | kDoubleFlag`          |4     |4     |
			
 
				+
			
 
				+Here are some notes:
			
 
				+* To reduce memory consumption on 64-bit architectures, `SizeType` is a typedef of `unsigned` instead of `size_t`.
			
 
				+* The zero padding for a 32-bit number may be placed after or before the actual value, depending on the endianness. This makes it possible to interpret a 32-bit integer as a 64-bit integer without any conversion.
			
 
				+* An `Int` is always an `Int64`, but the converse is not always true.
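
The zero-padding note can be checked with a short sketch. It is not RapidJSON code: `IsLittleEndian` and `WidenViaPadding` are hypothetical helpers, and `memcpy` is used instead of a `union` so that the reinterpretation is well defined in standard C++. Only a non-negative (unsigned) value is shown, since that is the case where zero padding alone yields the same 64-bit value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns true on little-endian hosts (detected at run time).
inline bool IsLittleEndian() {
    const std::uint16_t probe = 1;
    unsigned char firstByte;
    std::memcpy(&firstByte, &probe, 1);
    return firstByte == 1;
}

// Stores a 32-bit unsigned value in an 8-byte slot with zero padding, then
// reads the whole slot back as a 64-bit integer without any conversion.
inline std::uint64_t WidenViaPadding(std::uint32_t u) {
    unsigned char slot[8] = {0};                        // zero padding
    // Little endian: value first, padding after. Big endian: padding first.
    const std::size_t offset = IsLittleEndian() ? 0 : 4;
    std::memcpy(slot + offset, &u, sizeof(u));          // the 32-bit field
    std::uint64_t u64;
    std::memcpy(&u64, slot, sizeof(u64));               // reinterpret all 8 bytes
    return u64;
}
```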
			
 
				+
			
 
				 ## Flags {#Flags}
			
 
				 
			
 
				+The 32-bit `flags_` contains both the JSON type and additional information. As shown in the tables above, each JSON type contains a redundant `kXXXType` and `kXXXFlag`. This design optimizes both testing bit flags (`IsNumber()`) and obtaining a sequential number for each type (`GetType()`).
			
 
				+
			
 
				+A string has two optional flags. `kCopyFlag` means that the `Value` owns a copy of the string. `kInlineStrFlag` means that the [Short-String Optimization](#ShortString) is used.
			
 
				+
			
 
				+Numbers are a bit more complicated. A normal integer value can carry `kIntFlag`, `kUintFlag`, `kInt64Flag` and/or `kUint64Flag`, according to the range of the integer. Numbers with a fraction, and integers outside the 64-bit range, are stored as `double` with `kDoubleFlag`.
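
As a rough illustration of how the type and the range flags can share one word, consider the sketch below. The flag names follow the text, but the bit values, the mask `kTypeMask` and the helper `FlagsForInt64` are assumptions made for this example and are not taken from the RapidJSON sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bit assignments, chosen only for this sketch.
enum : unsigned {
    kTypeMask   = 0x07,   // low bits hold the sequential type number
    kNumberType = 6,
    kNumberFlag = 0x10,   // one bit per testable property
    kIntFlag    = 0x20,
    kUintFlag   = 0x40,
    kInt64Flag  = 0x80,
    kUint64Flag = 0x100
};

// Computes the flags for an integer that fits in int64_t: every integer
// type that can represent the value gets its flag set.
inline unsigned FlagsForInt64(std::int64_t v) {
    unsigned flags = kNumberType | kNumberFlag | kInt64Flag;
    if (v >= 0) {
        flags |= kUint64Flag;
        if (v <= 0xFFFFFFFFLL) flags |= kUintFlag;
    }
    if (v >= INT32_MIN && v <= INT32_MAX) flags |= kIntFlag;
    return flags;
}

// Testing a property is a single bit test.
inline bool IsNumber(unsigned flags) { return (flags & kNumberFlag) != 0; }

// Obtaining the sequential type number is a single mask.
inline unsigned GetType(unsigned flags) { return flags & kTypeMask; }
```

With these assumptions, `1` carries all four integer flags, `-1` carries only `kIntFlag` and `kInt64Flag`, and `2147483648` carries every flag except `kIntFlag`.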
			
 
				+
			
 
				+## Short-String Optimization {#ShortString}
			
 
				+
			
 
				+Kosta (@Kosta-Github) provided a very neat short-string optimization. The idea is as follows. Excluding `flags_`, a `Value` has 12 or 16 bytes (32-bit or 64-bit) for storing actual data. Instead of storing a pointer to a string, short strings can be stored directly in this space. For an encoding with a 1-byte character type (e.g. `char`), a string of at most 11 or 15 characters can be stored inside the `Value`.
			
 
				+
			
 
				+| ShortString (Ch=char) |                                    |32-bit|64-bit|
			
 
				+|---------------------|-------------------------------------|:----:|:----:|
			
 
				+| `Ch str[MaxChars]`  | String buffer                       |11    |15    | 
			
 
				+| `Ch invLength`      | MaxChars - Length                   |1     |1     |
			
 
				+| `unsigned flags_`   | `kStringType | kStringFlag | ...`   |4     |4     |
			
 
				+
			
 
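				+A special technique is applied. Instead of storing the length of the string directly, it stores (MaxChars - length). When the string has the maximum length, this field is 0 and doubles as the trailing `\0`, which makes it possible to store 11 characters (on 32-bit) together with their terminator.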
			
 
				+
			
 
				+This optimization can reduce memory usage for copied strings. It can also improve cache locality and thus runtime performance.
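
The inverted-length trick can be sketched as follows. `ShortStringSketch` is a hypothetical type for illustration, not RapidJSON's implementation. It keeps the characters and the inverted length in one `char` array, so that reading through the last byte as a terminator stays within a single array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical short-string buffer with N bytes of storage (12 or 16 in the
// layout described above). The last byte holds MaxChars - length.
template <std::size_t N>
struct ShortStringSketch {
    static const std::size_t MaxChars = N - 1;
    char buf[N];  // buf[0 .. MaxChars-1]: characters, buf[MaxChars]: inverted length

    // Copies the string inline; returns false if it is too long.
    bool Set(const char* s, std::size_t length) {
        if (length > MaxChars) return false;
        std::memcpy(buf, s, length);
        if (length < MaxChars) buf[length] = '\0';  // explicit terminator
        // For length == MaxChars this byte is 0 and doubles as the terminator.
        buf[MaxChars] = static_cast<char>(MaxChars - length);
        return true;
    }

    std::size_t Length() const {
        return MaxChars - static_cast<std::size_t>(buf[MaxChars]);
    }

    const char* c_str() const { return buf; }
};
```

For example, with `N = 12` the 11-character string `"hello world"` fits exactly, and the inverted length byte (0) terminates it.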
			
 
				+
			
 
				 # Allocator {#Allocator}
			
 
				 
			
 
				+`Allocator` is a concept in RapidJSON:
			
 
				+~~~cpp
			
 
				+concept Allocator {
			
 
				+    static const bool kNeedFree;    //!< Whether this allocator needs to call Free().
			
 
				+
			
 
				+    // Allocate a memory block.
			
 
				+    // \param size of the memory block in bytes.
			
 
				+    // \returns pointer to the memory block.
			
 
				+    void* Malloc(size_t size);
			
 
				+
			
 
				+    // Resize a memory block.
			
 
				+    // \param originalPtr The pointer to current memory block. Null pointer is permitted.
			
 
				+    // \param originalSize The current size in bytes. (Design issue: since some allocators may not keep track of this, passing it explicitly can save memory.)
			
 
				+    // \param newSize the new size in bytes.
			
 
				+    void* Realloc(void* originalPtr, size_t originalSize, size_t newSize);
			
 
				+
			
 
				+    // Free a memory block.
			
 
				+    // \param pointer to the memory block. Null pointer is permitted.
			
 
				+    static void Free(void *ptr);
			
 
				+};
			
 
				+~~~
			
 
				+
			
 
				+Note that `Malloc()` and `Realloc()` are member functions, but `Free()` is a static member function.
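
A minimal type satisfying this concept might look like the sketch below, which forwards to the C runtime heap. `HeapAllocatorSketch` and `GrowBuffer` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator modeling the concept on top of the C runtime heap.
class HeapAllocatorSketch {
public:
    static const bool kNeedFree = true;  // heap blocks must be released

    void* Malloc(std::size_t size) {
        return size ? std::malloc(size) : nullptr;
    }

    // originalSize is unused here because the C runtime tracks block sizes.
    void* Realloc(void* originalPtr, std::size_t /*originalSize*/, std::size_t newSize) {
        if (newSize == 0) {
            std::free(originalPtr);
            return nullptr;
        }
        return std::realloc(originalPtr, newSize);
    }

    // Static: no allocator instance is needed to release a block.
    static void Free(void* ptr) { std::free(ptr); }  // free(nullptr) is a no-op
};

// Generic code relies only on the concept, not on a concrete class.
template <typename Allocator>
char* GrowBuffer(Allocator& a, char* buffer, std::size_t oldSize, std::size_t newSize) {
    return static_cast<char*>(a.Realloc(buffer, oldSize, newSize));
}
```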
			
 
				+
			
 
				 ## MemoryPoolAllocator {#MemoryPoolAllocator}
			
 
				 
			
 
				+`MemoryPoolAllocator` is the default allocator for DOM. It allocates memory but does not free it. This is suitable for building a DOM tree.
			
 
				+
			
 
				+Internally, it allocates chunks of memory from the base allocator (by default `CrtAllocator`) and stores the chunks as a singly linked list. When the user requests an allocation, it allocates memory in the following order:
			
 
				+
			
 
				+1. The user-supplied buffer, if it is available. (See [User Buffer section in DOM](dom.md))
			
 
				+2. If the user-supplied buffer is full, use the current memory chunk.
			
 
				+3. If the current chunk is full, allocate a new chunk of memory.
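
The allocation order above can be sketched with a small bump allocator. `PoolSketch` is a hypothetical simplification, not the actual `MemoryPoolAllocator`: the 8-byte alignment and the default chunk capacity are arbitrary assumptions, the user buffer is assumed to be suitably aligned, and the class is not meant to be copied.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class PoolSketch {
    struct Chunk { Chunk* next; std::size_t size, capacity; };  // header before payload
    Chunk* head_;            // newest chunk first (singly linked list)
    char* userBuffer_;
    std::size_t userSize_, userCapacity_, chunkCapacity_;

    static std::size_t Align(std::size_t n) {
        return (n + 7) & ~static_cast<std::size_t>(7);
    }

public:
    PoolSketch(void* userBuffer, std::size_t userCapacity, std::size_t chunkCapacity = 4096)
        : head_(nullptr), userBuffer_(static_cast<char*>(userBuffer)), userSize_(0),
          userCapacity_(userCapacity), chunkCapacity_(chunkCapacity) {}

    ~PoolSketch() {  // chunks are only released here, all at once
        while (head_) { Chunk* next = head_->next; std::free(head_); head_ = next; }
    }

    void* Malloc(std::size_t size) {
        size = Align(size);
        if (userSize_ + size <= userCapacity_) {               // 1. user buffer
            void* p = userBuffer_ + userSize_;
            userSize_ += size;
            return p;
        }
        if (!head_ || head_->size + size > head_->capacity) {  // 3. new chunk needed
            const std::size_t capacity = size > chunkCapacity_ ? size : chunkCapacity_;
            Chunk* c = static_cast<Chunk*>(std::malloc(Align(sizeof(Chunk)) + capacity));
            if (!c) return nullptr;
            c->next = head_; c->size = 0; c->capacity = capacity;
            head_ = c;
        }
        // 2. bump-allocate from the current chunk
        void* p = reinterpret_cast<char*>(head_) + Align(sizeof(Chunk)) + head_->size;
        head_->size += size;
        return p;
    }

    static void Free(void*) {}  // individual blocks are never freed
};
```

Since nothing is freed individually, allocation reduces to a bounds check and a pointer bump, which suits building a DOM tree that is discarded as a whole.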
			
 
				+
			
 
				 # Parsing Optimization {#ParsingOptimization}
			
 
				 
			
 
				-## Skip Whitespace with SIMD {#SkipwhitespaceWithSIMD}
			
 
				+## Skip Whitespaces with SIMD {#SkipwhitespaceWithSIMD}
			
 
				+
			
 
				+When parsing JSON from a stream, the parser needs to skip four whitespace characters:
			
 
				+
			
 
				+1. Space (`U+0020`)
			
 
				+2. Character Tabulation (`U+0009`)
			
 
				+3. Line Feed (`U+000A`)
			
 
				+4. Carriage Return (`U+000D`)
			
 
				+
			
 
				+A simple implementation could be:
			
 
				+~~~cpp
			
 
				+void SkipWhitespace(InputStream& s) {
			
 
				+    while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
			
 
				+        s.Take();
			
 
				+}
			
 
				+~~~
			
 
				 
			
 
				-## Pow10() {#Pow10}
			
 
				+However, this requires 4 comparisons and a few branches for each character. This was found to be a hot spot.
			
 
				+
			
 
				+To accelerate this process, SIMD is used to compare 16 characters against the 4 whitespace characters in each iteration. Currently RapidJSON only supports SSE2 and SSE4.1 instructions for this, and it is only activated for UTF-8 memory streams, including string streams and *in situ* parsing.
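
The SSE2 idea can be sketched as below. This is an illustrative version, not RapidJSON's code: `SkipWhitespaceSketch` is a hypothetical function that takes an explicit end pointer so it never reads past the buffer, and it falls back to the scalar loop when `__SSE2__` is not defined.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Returns a pointer to the first non-whitespace character in [p, end),
// or end if the whole range is whitespace.
inline const char* SkipWhitespaceSketch(const char* p, const char* end) {
#if defined(__SSE2__)
    const __m128i space = _mm_set1_epi8(' ');
    const __m128i lf    = _mm_set1_epi8('\n');
    const __m128i cr    = _mm_set1_epi8('\r');
    const __m128i tab   = _mm_set1_epi8('\t');
    while (end - p >= 16) {
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
        // Each byte of x becomes 0xFF if the character is one of the four.
        __m128i x = _mm_cmpeq_epi8(s, space);
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, lf));
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, cr));
        x = _mm_or_si128(x, _mm_cmpeq_epi8(s, tab));
        // One bit per byte, inverted so that a set bit marks non-whitespace.
        unsigned mask = ~static_cast<unsigned>(_mm_movemask_epi8(x)) & 0xFFFFu;
        if (mask != 0) {
            while ((mask & 1u) == 0) { mask >>= 1; ++p; }  // find the lowest set bit
            return p;
        }
        p += 16;  // all 16 characters were whitespace
    }
#endif
    // Scalar tail, and the fallback when SSE2 is unavailable.
    while (p != end && (*p == ' ' || *p == '\n' || *p == '\r' || *p == '\t'))
        ++p;
    return p;
}
```

The four vector comparisons replace up to 64 scalar comparisons per iteration, and the only branch in the loop is the test on the combined mask.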
			
 
				 
			
 
				 ## Local Stream Copy {#LocalStreamCopy}
			
 
				 
			
 
				+During optimization, it was found that some compilers cannot localize some member data accesses of streams into local variables or registers. Experimental results show that, for some stream types, making a copy of the stream and using it in the inner loop can improve performance. For example, the actual (non-SIMD) implementation of `SkipWhitespace()` is:
			
 
				+
			
 
				+~~~cpp
			
 
				+template<typename InputStream>
			
 
				+void SkipWhitespace(InputStream& is) {
			
 
				+    internal::StreamLocalCopy<InputStream> copy(is);
			
 
				+    InputStream& s(copy.s);
			
 
				+
			
 
				+    while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
			
 
				+        s.Take();
			
 
				+}
			
 
				+~~~
			
 
				+
			
 
				+Depending on the traits of the stream, `StreamLocalCopy` either makes a copy of the stream object, uses it locally and copies the state back to the original stream, or simply refers to the original stream.
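
One way such a trait-driven helper can be structured is sketched below. The names `CStringStream`, `CopyTrait`, `LocalCopySketch` and `SkipWhitespaceLocal` are hypothetical and only modeled on the description above; they are not the library's definitions.

```cpp
#include <cassert>

// Minimal read-only stream over a null-terminated string (hypothetical).
struct CStringStream {
    const char* cur;
    explicit CStringStream(const char* s) : cur(s) {}
    char Peek() const { return *cur; }
    char Take() { return *cur++; }
};

// Trait: by default streams are not copied; cheap-to-copy streams opt in.
template <typename Stream> struct CopyTrait { enum { value = 0 }; };
template <> struct CopyTrait<CStringStream> { enum { value = 1 }; };

template <typename Stream, int = CopyTrait<Stream>::value>
struct LocalCopySketch;

// Copying variant: work on a local copy, write the state back on destruction.
template <typename Stream>
struct LocalCopySketch<Stream, 1> {
    explicit LocalCopySketch(Stream& original) : s(original), original_(original) {}
    ~LocalCopySketch() { original_ = s; }
    Stream s;
private:
    Stream& original_;
};

// Non-copying variant: s is simply a reference to the original stream.
template <typename Stream>
struct LocalCopySketch<Stream, 0> {
    explicit LocalCopySketch(Stream& original) : s(original) {}
    Stream& s;
};

template <typename InputStream>
void SkipWhitespaceLocal(InputStream& is) {
    LocalCopySketch<InputStream> copy(is);
    InputStream& s(copy.s);
    while (s.Peek() == ' ' || s.Peek() == '\n' || s.Peek() == '\r' || s.Peek() == '\t')
        s.Take();
}
```

The calling code is identical in both cases. Only the trait decides whether the loop runs on a local object, which the compiler can more easily keep in registers, or on the original stream.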
			
 
				+
			
 
				+## Parsing to Double {#ParsingDouble}
			
 
				+
			
 
				+Parsing a string into a `double` is difficult. The standard library function `strtod()` can do the job, but it is slow. By default, the parsers use the normal precision setting. This has a maximum error of 3 [ULP](http://en.wikipedia.org/wiki/Unit_in_the_last_place) and is implemented in `internal::StrtodNormalPrecision()`.
			
 
				+
			
 
				+When using `kParseFullPrecisionFlag`, the parsers call `internal::StrtodFullPrecision()` instead. This function implements three conversion methods:
			
 
				+1. [Fast-Path](http://www.exploringbinary.com/fast-path-decimal-to-floating-point-conversion/).
			
 
				+2. Custom DIY-FP implementation as in [double-conversion](https://github.com/floitsch/double-conversion).
			
 
				+3. Big Integer Method as in (Clinger, William D. How to read floating point numbers accurately. Vol. 25. No. 6. ACM, 1990).
			
 
				+
			
 
				+If the first conversion method fails, it tries the second, and so on.
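
The first method in the list, the fast path, is simple enough to sketch. The function below is a hypothetical illustration, not `internal::StrtodFullPrecision()` itself. It assumes IEEE 754 doubles evaluated in double precision with round-to-nearest, and it returns `false` when the fast path does not apply, which is where a caller would move on to the next method.

```cpp
#include <cassert>
#include <cstdint>

// Fast path: exact when the significand fits in 53 bits and 10^|exp10| is
// exactly representable as a double (|exp10| <= 22).
inline bool FastPathSketch(std::uint64_t significand, int exp10, double* result) {
    static const double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };
    if (significand > (std::uint64_t(1) << 53) || exp10 < -22 || exp10 > 22)
        return false;  // not applicable: fall back to a slower method
    const double d = static_cast<double>(significand);  // exact conversion
    // Both operands are exact, so the single multiplication or division
    // performs the only rounding step.
    *result = exp10 >= 0 ? d * kPow10[exp10] : d / kPow10[-exp10];
    return true;
}
```

For instance, `"1.25"` can be read as significand 125 with exponent -2, and `125.0 / 100.0` is computed with a single rounding.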
			
 
				+
			
 
				+# Generation Optimization {#GenerationOptimization}
			
 
				+
			
 
				+## Integer-to-String conversion {#itoa}
			
 
				+
			
 
				+The naive algorithm for integer-to-string conversion involves one division per decimal digit. We have implemented various algorithms and evaluated them in [itoa-benchmark](https://github.com/miloyip/itoa-benchmark).
			
 
				+
			
 
				+Although the SSE2 version is the fastest, its advantage over the runner-up, `branchlut`, is minor. Since `branchlut` is a pure C++ implementation, we adopt `branchlut` in RapidJSON.
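
To give a flavor of the lookup-table approach, the sketch below converts two decimal digits per division using a table of digit pairs. It is a simplified illustration of the general technique and not the actual `branchlut` code. `U32ToStringLut` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Writes the decimal form of value into buffer (at least 11 bytes) and
// returns a pointer to the terminating '\0'.
inline char* U32ToStringLut(std::uint32_t value, char* buffer) {
    // "00", "01", ..., "99" concatenated: entry i starts at offset 2 * i.
    static const char kDigitPairs[] =
        "00010203040506070809" "10111213141516171819"
        "20212223242526272829" "30313233343536373839"
        "40414243444546474849" "50515253545556575859"
        "60616263646566676869" "70717273747576777879"
        "80818283848586878889" "90919293949596979899";
    char temp[10];                       // a 32-bit value has at most 10 digits
    char* p = temp + sizeof(temp);
    while (value >= 100) {               // one division per two digits
        const unsigned i = (value % 100) * 2;
        value /= 100;
        *--p = kDigitPairs[i + 1];
        *--p = kDigitPairs[i];
    }
    if (value >= 10) {                   // two leading digits
        const unsigned i = value * 2;
        *--p = kDigitPairs[i + 1];
        *--p = kDigitPairs[i];
    } else {                             // one leading digit
        *--p = static_cast<char>('0' + value);
    }
    const std::size_t n = static_cast<std::size_t>(temp + sizeof(temp) - p);
    std::memcpy(buffer, p, n);
    buffer[n] = '\0';
    return buffer + n;
}
```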
			
 
				+
			
 
				+## Double-to-String conversion {#dtoa}
			
 
				+
			
 
				+Originally RapidJSON used `snprintf(..., ..., "%g")` to achieve double-to-string conversion. This is not accurate, as the default precision is 6. Later we also found that it is slow and that there is an alternative.
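
The accuracy problem is easy to reproduce. The sketch below uses two hypothetical helpers: one formats with `"%g"`, the other with `"%.17g"`, which prints enough significant digits to round-trip any IEEE 754 double. The second is shown only for comparison and is not what RapidJSON adopted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Formats d with "%g" (6 significant digits by default) and reports whether
// parsing the text yields the original value again.
inline bool RoundTripsWithG(double d, char* text, std::size_t size) {
    std::snprintf(text, size, "%g", d);
    return std::strtod(text, nullptr) == d;
}

// Same check with 17 significant digits.
inline bool RoundTripsWith17Digits(double d, char* text, std::size_t size) {
    std::snprintf(text, size, "%.17g", d);
    return std::strtod(text, nullptr) == d;
}
```

For example, `0.1234567890123` is printed by `"%g"` as `0.123457`, which parses back to a different `double`.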
			
 
				+
			
 
				+Google's V8 [double-conversion](https://github.com/floitsch/double-conversion) implemented a newer, fast algorithm called Grisu3 (Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM Sigplan Notices 45.6 (2010): 233-243.).
			
 
				+
			
 
				+However, since it is not header-only, we implemented a header-only version of Grisu2. This algorithm guarantees that the result is always accurate, and in most cases it produces the shortest (optimal) string representation.
			
 
				+
			
 
				+The header-only conversion function has been evaluated in [dtoa-benchmark](https://github.com/miloyip/dtoa-benchmark).
			
 
				+
			
 
				 # Parser {#Parser}
			
 
				 
			
 
				 ## Iterative Parser {#IterativeParser}