<?xml version='1.0' encoding='utf-8'?>
<feed xmlns=""><title>Henri Sivonen’s pages</title><link href=""></link><link href="" rel="self"></link><updated>2019-04-25T13:34:44Z</updated><subtitle>Articles and blogish notes</subtitle><rights>Copyright Henri Sivonen</rights><author><name>Henri Sivonen</name><email>[email protected]</email></author><id></id><entry><title type="xhtml"><div xmlns="">It’s Time to Stop Adding New Features for Non-Unicode Execution Encodings in C++</div></title><summary type="xhtml"><div xmlns="">I think the C++ standard should adopt the approach of “Unicode-only internally” for <i>new</i> text processing facilities and should not support non-Unicode execution encodings in newly-introduced features. This allows new features to have less abstraction obfuscation for Unicode usage, avoids digging legacy applications deeper into non-Unicode commitment, and avoids the specification and implementation effort of adapting new features to make sense for non-Unicode execution encodings.</div></summary><link href=""></link><id></id><updated>2019-04-25T13:30:03Z</updated><content type="xhtml"><div xmlns="">
<p>Henri Sivonen, 2019-04-24</p>
<p>Disclosure: I work for Mozilla, and my professional activity includes being the Gecko module owner for character encodings.</p>
<p>Disclaimer: Even though this document links to code and documents written as part of my Mozilla activities, this document is written in a personal capacity.</p>
<h2 id="summary">Summary</h2>
<p>Text processing facilities in the C++ standard library have been mostly agnostic of the actual character encoding of text. The few operations that are sensitive to the actual character encoding are defined to behave according to the implementation-defined “narrow execution encoding” (for buffers of <code>char</code>) and the implementation-defined “wide execution encoding” (for buffers of <code>wchar_t</code>).</p>
<p>Meanwhile, over the last two decades, a different dominant design has arisen for text processing in other programming languages as well as in C and C++ usage <i>despite</i> what the C and C++ standard-library facilities provide: Representing text as Unicode, and <i>only</i> Unicode, <i>internally</i> in the application even if some other representation is required <i>externally</i> for backward compatibility.</p>
<p>I think the C++ standard should adopt the approach of “Unicode-only internally” for <i>new</i> text processing facilities and should not support non-Unicode execution encodings in newly-introduced features. This allows new features to have less abstraction obfuscation for Unicode usage, avoids digging legacy applications deeper into non-Unicode commitment, and avoids the specification and implementation effort of adapting new features to make sense for non-Unicode execution encodings.</p>
<p>Concretely, I suggest:</p>
<ul>
<li>In <i>new</i> features, do not support numbers other than Unicode scalar values as a numbering scheme for abstract characters, and design new APIs to be aware of Unicode scalar values as appropriate instead of allowing other numbering schemes. (I.e. make Unicode the only coded character set supported for new features.)</li>
<li>Use <code>char32_t</code> directly as the concrete type for an <i>individual</i> Unicode scalar value without allowing for parametrization of the type that conceptually represents a Unicode scalar value. (For sequences of Unicode scalar values, UTF-8 is preferred.)</li>
<li>When introducing <i>new</i> text processing facilities (other than the next item on this list), support only UTF in-memory text representations: UTF-8 and, potentially, depending on the feature, also UTF-16 or also UTF-16 and UTF-32. That is, do not seek to make <i>new</i> text processing features applicable to non-UTF execution encodings. (This document should not be taken as a request to add features for UTF-16 or UTF-32 beyond iteration over string views by scalar value. To avoid distraction from the main point, this document should also not be taken as advocating against providing any particular feature for UTF-16 or UTF-32.)</li>
<li>Non-UTF character encodings may be supported in a conversion API whose purpose is to convert from a legacy encoding into a UTF-only representation near the IO boundary or at the boundary between a legacy part (that relies on execution encoding) and a new part (that uses Unicode) of an application. Such APIs should be <code>std::span</code>-based instead of iterator-based.</li>
<li>When an operation logically requires a valid sequence of Unicode scalar values, the API must either define the operation to fail upon encountering invalid UTF-8/16/32 or must replace each error with a U+FFFD REPLACEMENT CHARACTER as follows: What constitutes a single error in UTF-8 is <a href="">defined in the WHATWG Encoding Standard</a> (which matches the “best practice” from the Unicode Standard). In UTF-16, each unpaired surrogate is an error. In UTF-32, each code unit whose numeric value isn’t a valid Unicode scalar value is an error.</li>
<li>Instead of standardizing Text_view as proposed, standardize a way to obtain a Unicode scalar value iterator from <code>std::u8string_view</code>, <code>std::u16string_view</code>, and <code>std::u32string_view</code>.</li>
</ul>
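<p>The following is a minimal C++17 sketch, not a proposed standard API, that makes the U+FFFD rule and the suggested scalar value iterator concrete. It shows the one step such an iterator would perform over UTF-8: decode one scalar value at a byte offset, and replace each maximal ill-formed subsequence with U+FFFD the way the WHATWG Encoding Standard’s UTF-8 decoder does. The names <code>DecodeResult</code>, <code>decode_next_utf8</code>, and <code>decode_utf8</code> are hypothetical. The sketch takes <code>std::string_view</code> only so that it compiles without C++20; the suggested iterator would wrap <code>std::u8string_view</code>.</p>
<pre><code><![CDATA[
```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical result of one decoding step: the scalar value (or U+FFFD)
// and the number of bytes consumed.
struct DecodeResult {
    char32_t scalar;
    std::size_t length;
};

// Decodes one scalar value starting at s[pos] (requires pos < s.size()).
// Follows the WHATWG UTF-8 decoder: each maximal ill-formed subsequence
// becomes one U+FFFD, and the byte that revealed the error is not consumed.
constexpr DecodeResult decode_next_utf8(std::string_view s, std::size_t pos) {
    unsigned char lead = static_cast<unsigned char>(s[pos]);
    if (lead < 0x80) return {lead, 1};
    std::size_t needed = 0;
    char32_t cp = 0;
    unsigned char lower = 0x80, upper = 0xBF;  // allowed range of next byte
    if (lead >= 0xC2 && lead <= 0xDF) {
        needed = 1; cp = lead & 0x1F;
    } else if (lead >= 0xE0 && lead <= 0xEF) {
        if (lead == 0xE0) lower = 0xA0;  // reject overlong forms
        if (lead == 0xED) upper = 0x9F;  // reject encoded surrogates
        needed = 2; cp = lead & 0x0F;
    } else if (lead >= 0xF0 && lead <= 0xF4) {
        if (lead == 0xF0) lower = 0x90;  // reject overlong forms
        if (lead == 0xF4) upper = 0x8F;  // reject values above U+10FFFF
        needed = 3; cp = lead & 0x07;
    } else {
        return {U'\uFFFD', 1};  // stray continuation byte or invalid lead
    }
    std::size_t i = pos + 1;
    for (std::size_t seen = 0; seen < needed; ++seen, ++i) {
        if (i >= s.size()) return {U'\uFFFD', i - pos};  // truncated at end
        unsigned char b = static_cast<unsigned char>(s[i]);
        if (b < lower || b > upper) return {U'\uFFFD', i - pos};
        lower = 0x80;
        upper = 0xBF;
        cp = (cp << 6) | (b & 0x3F);
    }
    return {cp, needed + 1};
}

// Convenience wrapper: the sequence an iterator built on the step would yield.
inline std::u32string decode_utf8(std::string_view s) {
    std::u32string out;
    for (std::size_t pos = 0; pos < s.size();) {
        DecodeResult r = decode_next_utf8(s, pos);
        out.push_back(r.scalar);
        pos += r.length;
    }
    return out;
}
```
]]></code></pre>
<p>An iterator built on this step would only need to hold the view and the current offset, advance by <code>length</code>, and yield <code>char32_t</code>. For example, the bytes E7 8C AA yield U+732A. The bytes ED A0 80, an encoded surrogate, yield three U+FFFD.</p>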
<h2 id="context">Context</h2>
<p>This write-up is in response to (and in disagreement with) the <a href="">“Character Types” section</a> in the P0244R2 Text_view paper:</p>
<blockquote>
<p>This library defines a character class template parameterized by character
set type used to represent character values.  The purpose of this class
template is to make explicit the association of a code point value and a
character set.</p>
<p>It has been suggested that <code>char32_t</code> be supported as a character
type that is implicitly associated with the Unicode character set and that
values of this type always be interpreted as Unicode code point values.  This
suggestion is intended to enable UTF-32 string literals to be directly usable
as sequences of character values (in addition to being sequences of code unit
and code point values).  This has a cost in that it prohibits use of the
<code>char32_t</code> type as a code unit or code point type for other
encodings.  Non-Unicode encodings, including the encodings used for ordinary
and wide string literals, would still require a distinct character type (such
as a specialization of the character class template) so that the correct
character set can be inferred from objects of the character type.</p>
<p>This suggestion raises concerns for the author.  To a certain degree, it can
be accommodated by removing the current members of the character class template
in favor of free functions and type trait templates.  However, it results in
ambiguities when enumerating the elements of a UTF-32 string literal; are the
elements code point or character values?  Well, the answer would be both (and
code unit values as well).  This raises the potential for inadvertently
writing (generic) code that confuses code points and characters, runs as
expected for UTF-32 encodings, but fails to compile for other encodings.  The
author would prefer to enforce correct code via the type system and is unaware
of any particular benefits that the ability to treat UTF-32 string literals
as sequences of character type would bring.</p>
<p>It has also been suggested that <code>char32_t</code> might suffice as the
only character type; that decoding of any encoded string include implicit
transcoding to Unicode code points.  The author believes that this suggestion
is not feasible for several reasons:</p>
<ol>
 <li>Some encodings use character sets that define characters such that round
     trip transcoding to Unicode and back fails to preserve the original code
     point value.  For example, Shift-JIS (Microsoft code page 932) defines
     duplicate code points for the same character for compatibility with IBM
     and NEC character set extensions.<br></br>
     <a href=""></a> [sic; dead link]</li>
 <li>Transcoding to Unicode for all non-Unicode encodings would carry
     non-negligible performance costs and would pessimize platforms such as
     IBM’s z/OS that use EBCIDC by default for the non-Unicode execution
     character sets.</li>
</ol>
</blockquote>
<p>To summarize, it raises three concerns:</p>
<ol>
<li>Ambiguity between code units and scalar values (the paper says “code points”, but I say “scalar values” to emphasize the exclusion of surrogates) in the UTF-32 case.</li>
<li>Some encodings, particularly Microsoft code page 932, can represent one Unicode scalar value in more than one way, so the information about which representation was used does not round-trip.</li>
<li>Transcoding non-Unicode execution encodings has a performance cost that particularly pessimizes IBM z/OS.</li>
</ol>
<h2 id="terminology">Terminology and Background</h2>
<p>(This section and the next section should not be taken as ’splaining to SG16 what they already know. The over-explaining is meant to make this document more coherent for a broader audience of readers who might be interested in C++ standardization without full familiarity with text processing terminology or background, or the details of Microsoft code page 932.)</p>
<p>An <i>abstract character</i> is an atomic unit of text. Depending on the writing system, the analysis of what constitutes an atomic unit may differ, but a given implementation on a computer has to identify some things as atomic units. Unicode’s opinion of what is an abstract character is the most widely applied opinion. In fact, Unicode itself has multiple opinions on this, and <a href="">Unicode Normalization Forms</a> bridge these multiple opinions.</p>
<p>A <i>character set</i> is a set of abstract characters. In principle, a set of characters can be defined without assigning numbers to them.</p>
<p>A <i>coded character set</i> assigns numbers, <i>code points</i>, to the abstract characters in a character set.</p>
<p>When the Unicode code space was extended beyond the Basic Multilingual Plane, some code points were set aside for the UTF-16 surrogate mechanism and, therefore, do not represent abstract characters. A Unicode <i>scalar value</i> is a Unicode code point that is not a surrogate code point. For consistency with Unicode, I use the term scalar value below when referring to non-Unicode coded character sets, too.</p>
<p>A <i>character encoding</i> is a way to represent a conceptual sequence of scalar values from one <i>or more</i> coded character sets as a concrete sequence of bytes. The bytes are called <i>code units</i>. Unicode defines in-memory <i>Unicode encoding forms</i> whose code unit is not a byte: UTF-16 and UTF-32. (For these Unicode encoding forms, there are corresponding <i>Unicode encoding schemes</i> that use byte code units and represent a non-byte code unit from a corresponding encoding form as multiple bytes and, therefore, could be used in byte-oriented IO even though UTF-8 is preferred for interchange. UTF-8, of course, uses byte code units as both a Unicode encoding form and as a Unicode encoding scheme.)</p>
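<p>The following sketch shows the surrogate mechanism behind the UTF-16 encoding form, together with the summary’s rule that each unpaired surrogate in UTF-16 counts as one error. It is an illustrative C++17 sketch. <code>Utf16Step</code>, <code>decode_next_utf16</code>, <code>Utf16Units</code>, and <code>encode_utf16</code> are hypothetical names, not standard library facilities.</p>
<pre><code><![CDATA[
```cpp
#include <cstddef>
#include <string_view>

// One UTF-16 decoding step: the scalar value (or U+FFFD) and code units used.
struct Utf16Step {
    char32_t scalar;
    std::size_t length;
};

constexpr bool is_high_surrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
constexpr bool is_low_surrogate(char16_t u) { return u >= 0xDC00 && u <= 0xDFFF; }

// Decodes the scalar value at s[pos] (requires pos < s.size()).
// A surrogate that is not part of a high-low pair is one error.
constexpr Utf16Step decode_next_utf16(std::u16string_view s, std::size_t pos) {
    char16_t u = s[pos];
    if (is_high_surrogate(u) && pos + 1 < s.size() && is_low_surrogate(s[pos + 1])) {
        char32_t high = u - 0xD800;          // top 10 bits of (scalar - 0x10000)
        char32_t low = s[pos + 1] - 0xDC00;  // bottom 10 bits
        return {static_cast<char32_t>(0x10000 + (high << 10) + low), 2};
    }
    if (is_high_surrogate(u) || is_low_surrogate(u)) return {U'\uFFFD', 1};
    return {u, 1};  // BMP: the code unit value is the scalar value
}

// Encodes a scalar value as one or two UTF-16 code units.
struct Utf16Units {
    char16_t units[2];
    std::size_t length;
};

constexpr Utf16Units encode_utf16(char32_t scalar) {
    if (scalar < 0x10000) return {{static_cast<char16_t>(scalar), 0}, 1};
    char32_t v = scalar - 0x10000;  // 20 bits split across the pair
    return {{static_cast<char16_t>(0xD800 + (v >> 10)),
             static_cast<char16_t>(0xDC00 + (v & 0x3FF))}, 2};
}
```
]]></code></pre>
<p>For example, U+1F600 becomes the pair D83D DE00. A lone DC00, by contrast, decodes to U+FFFD, and the code unit after it is then decoded on its own.</p>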
<p>Coded character sets that assign scalar values in the range 0...255 (decimal) can be considered to trivially imply a character encoding for themselves: You just store the scalar value as an unsigned byte value. (Often such coded character sets import US-ASCII as the lower half.)</p>
<p>However, it is possible to define less obvious encodings even for character sets that only have up to 256 characters. IBM has <a href="">several</a> EBCDIC character encodings for the set of characters defined in ISO-8859-1. That is, compared to the trivial ISO-8859-1 encoding (the original, not the Web alias for windows-1252), these EBCDIC encodings permute the byte value assignments.</p>
<p>Unicode is the universal coded character set that <i>by design</i> includes abstract characters from all notable legacy coded character sets such that character encodings for legacy coded character sets can be redefined to represent Unicode scalar values. Consider representing ż in the ISO-8859-2 encoding. When we treat the ISO-8859-2 encoding as an encoding for the Unicode coded character set (as opposed to treating it as an encoding for the ISO-8859-2 coded character set), byte 0xBF decodes to Unicode scalar value U+017C (and not to scalar value 0xBF).</p>
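<p>The sketch below contrasts two cases. In the first, a coded character set trivially implies its encoding. In the second, an encoding has been redefined to represent Unicode scalar values. The ISO-8859-2 table is a deliberately partial excerpt: six entries, including the 0xBF to U+017C case from the text. The names are hypothetical, and a real decoder needs all 95 entries for 0xA1..0xFF.</p>
<pre><code><![CDATA[
```cpp
#include <cstdint>

// ISO-8859-1 (the original, not the Web's windows-1252 alias): the byte
// value is the Unicode scalar value, so decoding is the identity.
constexpr char32_t decode_iso_8859_1(std::uint8_t byte) { return byte; }

struct ByteToScalar {
    std::uint8_t byte;
    char32_t scalar;
};

// EXCERPT ONLY: a few ISO-8859-2 upper-half entries, not the full table.
constexpr ByteToScalar kIso8859_2Excerpt[] = {
    {0xA1, U'\u0104'},  // Ą
    {0xA3, U'\u0141'},  // Ł
    {0xAF, U'\u017B'},  // Ż
    {0xB1, U'\u0105'},  // ą
    {0xB3, U'\u0142'},  // ł
    {0xBF, U'\u017C'},  // ż
};

// ISO-8859-2 treated as an encoding for Unicode: bytes 0x00..0xA0 coincide
// with Unicode, but everything above needs a table lookup.
constexpr char32_t decode_iso_8859_2(std::uint8_t byte) {
    if (byte <= 0xA0) return byte;
    for (const ByteToScalar& entry : kIso8859_2Excerpt) {
        if (entry.byte == byte) return entry.scalar;
    }
    return U'\uFFFD';  // not covered by this excerpt
}
```
]]></code></pre>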
<p>A <i>compatibility character</i> is a character that according to Unicode principles should not be a distinct abstract character but that Unicode nonetheless codes as a distinct abstract character because some legacy coded character set treated it as distinct.</p>
<h2 id="cp932">The Microsoft Code Page 932 Issue</h2>
<p>Usually in C++ a “character type” refers to a code unit type, but the Text_view paper uses the term “character type” to refer to a Unicode scalar value when the encoding is a Unicode encoding form. The paper implies that an analogous non-Unicode type exists for Microsoft code page 932 (Microsoft’s version of Shift_JIS), but does one really exist?</p>
<p>Microsoft code page 932 takes the 8-bit encoding of the JIS X 0201 coded character set, whose upper half is half-width katakana and lower half is ASCII-based, and replaces the lower half with actual US-ASCII (moving the difference between US-ASCII and the lower half of 8-bit-encoded JIS X 0201 into a font problem!). It then takes the JIS X 0208 coded character set and represents it with two-byte sequences (for the lead byte making use of the unassigned range of JIS X 0201). JIS X 0208 code points aren’t really one-dimensional scalars, but instead two-dimensional row and column numbers in a 94 by 94 grid. (See the first 94 rows of <a href="">the visualization</a> supplied with the Encoding Standard; avoid opening the link on a RAM-limited device!) Shift_JIS / Microsoft code page 932 does not put these two numbers into bytes directly, but conceptually arranges each pair of 94-column rows into one row of 188 columns and then transforms these new row and column numbers into bytes with some offsetting.</p>
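<p>The folding of two 94-cell rows into one 188-cell Shift_JIS row is plain arithmetic. The following C++17 sketch uses the commonly documented conversion formula, and the function names are hypothetical. It also shows the EUC-JP mapping, which simply offsets row and cell by 0xA0. Row 35, cell 86 is the JIS X 0208 position of 猪, the example character discussed below. Row 91, cell 3 is where NEC relocated the IBM-extension instance of that character.</p>
<pre><code><![CDATA[
```cpp
#include <cstdint>

struct BytePair {
    std::uint8_t lead;
    std::uint8_t trail;
};

// Rows and cells are 1-based (1..94) on the JIS X 0208 grid.
constexpr BytePair jis0208_to_shift_jis(int row, int cell) {
    // Two 94-cell rows share one 188-cell Shift_JIS row, i.e. one lead byte.
    int pair_index = (row - 1) / 2;  // 0..46
    // Lead bytes skip 0xA0..0xDF, which JIS X 0201 uses for half-width katakana.
    int lead = pair_index < 31 ? 0x81 + pair_index : 0xC1 + pair_index;
    // Odd rows use trail bytes 0x40..0x9E (skipping 0x7F);
    // even rows use trail bytes 0x9F..0xFC.
    int trail = (row % 2 == 1) ? cell + (cell <= 63 ? 0x3F : 0x40)
                               : cell + 0x9E;
    return {static_cast<std::uint8_t>(lead), static_cast<std::uint8_t>(trail)};
}

// EUC-JP puts row and cell directly into bytes, each offset by 0xA0.
constexpr BytePair jis0208_to_euc_jp(int row, int cell) {
    return {static_cast<std::uint8_t>(0xA0 + row),
            static_cast<std::uint8_t>(0xA0 + cell)};
}
```
]]></code></pre>
<p>Applying the same transform to row 91 yields 0xEE42. This is how the NEC-relocated instance ends up inside the JIS X 0208 part of the Shift_JIS code space. The IBM instance at 0xFB5E, on the other hand, uses a lead byte in 0xFA..0xFC, which this function never produces.</p>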
<p>While the JIS X 0208 grid is rearranged into 47 rows of a 188-column grid, the full 188-column grid has 60 rows. The last 13 rows are used for IBM extensions and for private use. The private use area maps to the (start of the) Unicode Private Use Area. (See a <a href="">visualization of the rearranged grid</a> with the private use part showing up as unassigned; again avoid opening the link on a RAM-limited device.)</p>
<p>The extension part is where the concern that the Text_view paper seeks to address comes in. NEC and IBM came up with some characters that they felt JIS X 0208 needed to be extended with. NEC’s own extensions go onto row 13 (in one-based numbering) of the 94 by 94 JIS X 0208 grid (unallocated in JIS X 0208 proper), so that extension can safely be treated as if it had always been part of JIS X 0208 itself. The IBM extension, however, goes onto the last 3 rows of the 60-row Shift_JIS grid, i.e. outside the space that the JIS X 0208 94 by 94 grid maps to. However, US-ASCII, the half-width katakana part of JIS X 0201, and JIS X 0208 are also encoded, in a different way, by EUC-JP. EUC-JP can only encode the 94 by 94 grid of JIS X 0208. To make the IBM extensions fit into the 94 by 94 grid, NEC relocated the IBM extensions within the 94 by 94 grid in space that the JIS X 0208 standard left unallocated.</p>
<p>When considering IBM Shift_JIS and NEC EUC-JP (without later JIS X 0213 extension), both encode the same set of characters, but in a different way. Furthermore, both can round-trip via Unicode. Unicode principles analyze some of the IBM extension kanji as duplicates of kanji that were already in the original JIS X 0208. However, to enable round-tripping (which was thought worthwhile to achieve at the time), Unicode treats the IBM duplicates as compatibility characters. (Round-tripping is lost, of course, if the text decoded into Unicode is normalized such that compatibility characters are replaced with their canonical equivalents before re-encoding.)</p>
<p>This brings us to the issue that the Text_view paper treats as significant: Since Shift_JIS can represent the whole 94 by 94 JIS X 0208 grid and NEC put the IBM extension there, a naïve conversion from EUC-JP to Shift_JIS can fail to relocate the IBM extension characters to the end of the Shift_JIS code space and can put them in the position where they land if the 94 by 94 grid is simply transformed as the first 47 rows of the 188-column-wide Shift_JIS grid. When <i>decoding</i> to Unicode, Microsoft code page 932 supports both locations for the IBM extensions, but when <i>encoding</i> from Unicode, it has to pick one way of doing things, and it picks the end of the Shift_JIS code space.</p>
<p>That is, Unicode does not assign another set of compatibility characters to Microsoft code page 932’s duplication of the IBM extensions, so despite NEC EUC-JP and IBM Shift_JIS being round-trippable via Unicode, Microsoft code page 932, i.e. Microsoft Shift_JIS, is not. This makes sense considering that there is no analysis that claims the IBM and NEC instances of the IBM extensions as semantically different: They clearly have provenance that indicates that the duplication isn’t an attempt to make a distinction in meaning. The Text_view paper takes the position that C++ should round-trip the NEC instance of the IBM extensions in Microsoft code page 932 as distinct from the IBM instance of the IBM extensions even though Microsoft’s own implementation does not. In fact, the whole point of the Text_view paper mentioning Microsoft code page 932 is to give an example of a legacy encoding that doesn’t round-trip via Unicode, despite Unicode generally having been designed to round-trip legacy encodings, and to opine that it ought to round-trip in C++.</p>
<p>So:</p>
<ul>
<li>The Text_view paper wants there to exist a non-transcoding-based, non-Unicode analog, for Microsoft code page 932, of what a Unicode scalar value is for UTF-8.</li>
<li>The standards that Microsoft code page 932 has been built on do not give us such a scalar.
<ul>
<li>Even if the private use space and the extensions are considered to occupy a consistent grid with the JIS X 0208 characters, the US-ASCII plus JIS X 0201 part is not placed on the same grid.</li>
<li>The canonical way of referring to JIS X 0208 independently of bytes isn’t a reference by one-dimensional scalar but a reference by two (one-based) numbers identifying a cell on the 94 by 94 grid.</li>
</ul>
</li>
<li>The Text_view paper wants the scalar to be defined such that a distinction between the IBM instance of the IBM extensions and the NEC instance of the IBM extensions is maintained even though Microsoft, the originator of the code page, does not treat these two instances as meaningfully distinct.</li>
</ul>
<h2 id="inferring">Inferring a Coded Character Set from an Encoding</h2>
<p>(This section is based on the constraints imposed by the Text_view paper instead of being based on what the <a href="">reference implementation</a> does for Microsoft code page 932. From code inspection, it appears that support for multi-byte narrow execution encodings is unimplemented, and when trying to verify this experimentally, I ran out of time trying to get it running due to an internal compiler error when trying to build with a newer GCC and a GCC compilation error when trying to build the known-good GCC revision.)</p>
<p>While the standards don’t provide a scalar value definition for Microsoft code page 932, it’s easy to make one up based on tradition: Traditionally, the two-byte characters in CJK legacy encodings have been referred to by interpreting the two bytes as a 16-bit big-endian unsigned number written in hexadecimal (and single-byte characters as an 8-bit unsigned number).</p>
<p>As an example, let’s consider 猪 (which Wiktionary translates as wild boar). Its canonical Unicode scalar value is U+732A. That’s what the JIS X 0208 instance decodes to when decoding Microsoft code page 932 into Unicode. The compatibility character for the IBM kanji purpose is U+FA16. That’s what both the IBM instance of the IBM extension and the NEC instance of the IBM extension decode to when decoding Microsoft code page 932 into Unicode. (For reasons unknown to me, Unicode couples U+FA16 with the IBM kanji compatibility purpose and assigns <i>another</i> compatibility character, U+FAA0, for compatibility with the North Korean KPS 10721-2000 standard, which is irrelevant to Microsoft code page 932. Note that not all IBM kanji have corresponding DPRK compatibility characters, so we couldn’t repurpose the DPRK compatibility characters for distinguishing the IBM and NEC instances of the IBM extensions even if we wanted to.)</p>
<p>When interpreting the Microsoft code page 932 bytes as a big-endian integer, the JIS X 0208 instance of 猪 would be 0x9296, the IBM instance would be 0xFB5E, and the NEC instance would be 0xEE42. To highlight how these “scalars” are coupled with the encoding instead of the standard character sets that the encodings originally encode, in EUC-JP the JIS X 0208 instance would be 0xC3F6 and the NEC instance would be 0xFBA3. Also, for illustration, if the same rule were applied to UTF-8, the scalar would be 0xE78CAA instead of U+732A. Clearly, we don’t want the scalars to be different between UTF-8, UTF-16, and UTF-32, so it is at least theoretically unsatisfactory for Microsoft code page 932 and EUC-JP to get different scalars for what are clearly the same characters in the underlying character sets.</p>
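<p>Treating each encoded byte sequence as a big-endian integer reproduces the numbers above. It also shows that the result depends on the encoding rather than on the character. This is a C++17 sketch in which <code>bytes_as_big_endian</code> and <code>decode_three_byte_utf8</code> are hypothetical helpers. The byte values are the ones listed in the preceding paragraph.</p>
<pre><code><![CDATA[
```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: read an encoded byte sequence as a big-endian
// unsigned integer, the traditional way of naming CJK legacy code points.
template <std::size_t N>
constexpr std::uint32_t bytes_as_big_endian(const std::uint8_t (&bytes)[N]) {
    static_assert(N >= 1 && N <= 4, "at most four bytes fit in 32 bits");
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < N; ++i) value = (value << 8) | bytes[i];
    return value;
}

// The character 猪 as bytes in various encodings (values from the text).
constexpr std::uint8_t kShiftJisJis0208[] = {0x92, 0x96};
constexpr std::uint8_t kShiftJisIbm[] = {0xFB, 0x5E};
constexpr std::uint8_t kShiftJisNec[] = {0xEE, 0x42};
constexpr std::uint8_t kEucJpJis0208[] = {0xC3, 0xF6};
constexpr std::uint8_t kEucJpNec[] = {0xFB, 0xA3};
constexpr std::uint8_t kUtf8[] = {0xE7, 0x8C, 0xAA};

// Actually decoding the three-byte UTF-8 form recovers U+732A instead.
constexpr char32_t decode_three_byte_utf8(const std::uint8_t (&b)[3]) {
    return ((b[0] & 0x0F) << 12) | ((b[1] & 0x3F) << 6) | (b[2] & 0x3F);
}
```
]]></code></pre>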
<p>It would be possible to do something else that’d give the same scalar values for Shift_JIS and EUC-JP without a lookup table. We could number the characters on the two-dimensional grid starting with 256 for the top left cell to reserve the scalars 0…255 for the JIS X 0201 part. It’s worth noting, though, that this approach wouldn’t work well for Korean and Simplified Chinese encodings that take inspiration from the 94 by 94 structure of JIS X 0208. KS X 1001 and GB2312 also define a 94 by 94 grid like JIS X 0208. However, while Microsoft code page 932 extends the grid down, so a consecutive numbering would just add greater numbers to the end, Microsoft code pages 949 and 936 extend the KS X 1001 and GB2312 grids above and to the left, which means that a consecutive numbering of the extended grid would be totally different from the consecutive numbering of the unextended grid. On the other hand, interpreting each byte pair as a big-endian 16-bit integer would yield the same values in the extended and unextended Korean and Simplified Chinese cases. (See visualizations for <a href="">949</a> and <a href="">936</a>; again avoid opening on a RAM-limited device. Search for “U+3000” to locate the top left corner of the original 94 by 94 grid.)</p>
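<p>Here is a sketch of that consecutive numbering. It assumes that 0…255 is reserved for JIS X 0201 and that grid cells are numbered row by row starting from 256. The helper names are hypothetical. Both the Shift_JIS and the EUC-JP byte pairs are first mapped back to a (row, cell) position, so the same character gets the same number in both encodings without a lookup table.</p>
<pre><code><![CDATA[
```cpp
#include <cstdint>

struct RowCell {
    int row;   // 1-based
    int cell;  // 1-based, 1..94
};

// EUC-JP two-byte JIS X 0208: both bytes are row/cell offset by 0xA0.
constexpr RowCell euc_jp_to_row_cell(std::uint8_t lead, std::uint8_t trail) {
    return {lead - 0xA0, trail - 0xA0};
}

// Shift_JIS two-byte sequence (lead 0x81..0x9F or 0xE0..0xFC): undo the
// folding of two 94-cell rows into one 188-cell row. Lead bytes above 0xEF
// land on rows beyond 94, i.e. the grid simply extends downward.
constexpr RowCell shift_jis_to_row_cell(std::uint8_t lead, std::uint8_t trail) {
    int pair_index = lead <= 0x9F ? lead - 0x81 : lead - 0xC1;
    if (trail >= 0x9F) return {2 * pair_index + 2, trail - 0x9E};  // even row
    return {2 * pair_index + 1, trail - (trail >= 0x80 ? 0x40 : 0x3F)};  // odd row
}

// Hypothetical consecutive numbering: 0..255 reserved for JIS X 0201,
// grid cells numbered row by row from 256.
constexpr int consecutive_number(RowCell rc) {
    return 256 + (rc.row - 1) * 94 + (rc.cell - 1);
}
```
]]></code></pre>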
<h2 id="ebcdic">What About EBCDIC?</h2>
<p>Text_view wants to avoid transcoding overhead on z/OS, but z/OS has multiple character encodings for the ISO-8859-1 character set. It seems conceptually bogus for all these to have different scalar values for the same character set. However, for all of them to have the same scalar values, a lookup table-based permutation would be needed. If that table permuted to the ISO-8859-1 order, it would be the same as the Unicode order, at which point the scalar values might as well be Unicode scalar values, which Text_view wanted to avoid on z/OS citing performance concerns. (Of course, z/OS also has EBCDIC encodings whose character set is not ISO-8859-1.)</p>
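<p>The lookup-table permutation can be sketched as follows. This is an excerpt-only C++17 sketch. It fills in just the space, the Latin letters, and the digits, which occupy the same byte values in common EBCDIC code pages such as 037 and 1047. Everything else maps to U+FFFD here, and the names are hypothetical. The table’s output is already the ISO-8859-1 value, which is also the Unicode scalar value.</p>
<pre><code><![CDATA[
```cpp
#include <array>
#include <cstdint>

// Builds a 256-entry EBCDIC byte -> scalar value table. EXCERPT ONLY:
// space, A-Z, a-z and 0-9 are filled in; all other bytes map to U+FFFD.
constexpr std::array<char32_t, 256> make_ebcdic_excerpt_table() {
    std::array<char32_t, 256> t{};
    for (auto& v : t) v = U'\uFFFD';
    t[0x40] = U' ';
    for (int i = 0; i < 9; ++i) {
        t[0xC1 + i] = U'A' + i;  // A..I
        t[0xD1 + i] = U'J' + i;  // J..R
        t[0x81 + i] = U'a' + i;  // a..i
        t[0x91 + i] = U'j' + i;  // j..r
    }
    for (int i = 0; i < 8; ++i) {
        t[0xE2 + i] = U'S' + i;  // S..Z
        t[0xA2 + i] = U's' + i;  // s..z
    }
    for (int i = 0; i < 10; ++i) t[0xF0 + i] = U'0' + i;  // 0..9
    return t;
}

constexpr std::array<char32_t, 256> kEbcdicExcerpt = make_ebcdic_excerpt_table();

// The permutation's result is already the Unicode scalar value.
constexpr char32_t decode_ebcdic_excerpt(std::uint8_t byte) {
    return kEbcdicExcerpt[byte];
}
```
]]></code></pre>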
<h2 id="gb18030">What About GB18030?</h2>
<p>The whole point of GB18030 is that it encodes Unicode scalar values in a way that makes the encoding byte-compatible with GBK (Microsoft code page 936) and GB2312. This operation is inherently lookup table-dependent. Inventing a scalar definition for GB18030 that achieved the Text_view goal of avoiding lookup tables would break the design goal of GB18030 that it encodes all Unicode scalar values. (In the Web Platform, due to legacy reasons, GB18030 is defined as encoding <a href="">all but one scalar value and representing one scalar value twice</a>.)</p>
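<p>Even the table-free part of GB18030 is defined in terms of Unicode. In the sketch below, four-byte sequences are enumerated linearly. From 0x90 0x30 0x81 0x30 onward, that enumeration maps directly onto U+10000..U+10FFFF. The names are hypothetical, and the sketch assumes each byte is already within its valid range. It omits the BMP portion, which needs lookup tables.</p>
<pre><code><![CDATA[
```cpp
#include <cstdint>

// Linear index of a four-byte GB18030 sequence b1 b2 b3 b4, where b1 and b3
// range over 0x81..0xFE (126 values) and b2 and b4 over 0x30..0x39 (10 values).
// Assumes every byte is within its range.
constexpr std::uint32_t gb18030_four_byte_index(std::uint8_t b1, std::uint8_t b2,
                                                std::uint8_t b3, std::uint8_t b4) {
    return ((static_cast<std::uint32_t>(b1 - 0x81) * 10 + (b2 - 0x30)) * 126 +
            (b3 - 0x81)) * 10 + (b4 - 0x30);
}

// Index of 0x90 0x30 0x81 0x30, the sequence for U+10000.
constexpr std::uint32_t kSupplementaryBase = 189000;

// Supplementary planes only: the index maps linearly onto U+10000..U+10FFFF.
// Sequences below the base (BMP) need lookup tables, which are omitted here.
constexpr char32_t decode_gb18030_supplementary(std::uint8_t b1, std::uint8_t b2,
                                                std::uint8_t b3, std::uint8_t b4) {
    std::uint32_t index = gb18030_four_byte_index(b1, b2, b3, b4);
    if (index < kSupplementaryBase || index - kSupplementaryBase > 0xFFFFF) {
        return U'\uFFFD';  // not a supplementary-plane sequence
    }
    return 0x10000 + (index - kSupplementaryBase);
}
```
]]></code></pre>
<p>For example, 0x90 0x30 0x81 0x30 decodes to U+10000, and 0xE3 0x32 0x9A 0x35 decodes to U+10FFFF.</p>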
<h2 id="wrong">What’s Wrong with This?</h2>
<p>Let’s evaluate the above in the light of P1238R0, the <a href=""><i>SG16: Unicode Direction</i></a> paper.</p>
<p>The reason why Text_view tries to fit Unicode-motivated operations onto legacy encodings is that, as noted by “1.1 Constraint: The ordinary and wide execution encodings are implementation defined”, non-UTF execution encodings <i>exist</i>. This is, obviously, true. However, I disagree with the conclusion of making new features apply to these pre-existing execution encodings. I think there is <i>no obligation</i> to adapt <i>new features</i> to make sense for non-UTF execution encodings. It should be sufficient to keep existing legacy code running, i.e. to refrain from removing existing features. On the topic of <code>wchar_t</code>, the Unicode Direction paper says “1.4. Constraint: wchar_t is a portability deadend”. I think <code>char</code> with non-UTF-8 execution encoding should also be declared a dead end, whereas the Unicode Direction paper merely notes “1.3. Constraint: There is no portable primary execution encoding”. Making new features work with a dead-end foundation lures applications deeper into dead ends, which is bad.</p>
<p>While inferring scalar values for an encoding by interpreting the encoded bytes for each character as a big-endian integer (thereby effectively inferring a, potentially non-standard, coded character set from an encoding) might be argued to be traditional enough to fit “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”, it is a bad fit for “1.6. Constraint: Implementors cannot afford to rewrite ICU”. If there is concern that implementors lack the bandwidth to implement text processing features from scratch and, therefore, should be prepared to delegate to ICU, it makes no sense to make implementations or the C++ standard come up with non-Unicode numberings for abstract characters, since such numberings aren’t supported by ICU and would necessarily require writing new code for anachronistic non-Unicode schemes.</p>
<p>Aside: Maybe analyzing the approach of using byte sequences interpreted as big-endian numbers looks like attacking a straw man, and there could be some other non-Unicode numbering instead, such as the consecutive numbering outlined above. Any alternative non-Unicode numbering would still fail “1.6. Constraint: Implementors cannot afford to rewrite ICU” and would <i>also</i> fail “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”.</p>
<p>Furthermore, I think the Text_view paper’s aspiration of distinguishing between the IBM and NEC instances of the IBM extensions in Microsoft code page 932 fails “2.1. Guideline: Avoid excessive inventiveness; look for existing practice”, because it effectively amounts to inventing additional compatibility characters that aren’t recognized as distinct by Unicode or the originator of the code page (Microsoft).</p>
<p>Moreover, iterating over a buffer of text by scalar value is a relatively simple operation when considering the range of operations that make sense to offer for Unicode text but that may not obviously fit non-UTF execution encodings. For example, in the light of “4.2. Directive: Standardize generic interfaces for Unicode algorithms” it would be reasonable and expected to provide operations for performing Unicode Normalization on strings. What does it mean to normalize a string to Unicode Normalization Form D under the ISO-8859-1 execution encoding? What does it mean to apply <i>any</i> Unicode Normalization Form under the windows-1258 execution encoding, which represents Vietnamese in a way that doesn’t match any Unicode Normalization Form? If the answer just is to make these no-ops for non-UTF encodings, would that be the right answer for GB18030? Coming up with answers other than just saying that new text processing operations shouldn’t try to fit non-UTF encodings at all would very quickly violate the guideline to “Avoid excessive inventiveness”.</p>
<p>Looking at other programming languages in the light of “2.1. Guideline: Avoid excessive inventiveness; look for existing practice” provides the way forward. Notable other languages have settled on not supporting coded character sets other than Unicode. That is, only the Unicode way of assigning scalar values to abstract characters is supported. Interoperability with legacy character <i>encodings</i> is achieved by decoding into Unicode upon input and, if non-UTF-8 output is truly required for interoperability, by encoding into a legacy encoding upon output. The Unicode Direction paper already acknowledges this dominant design in “4.4. Directive: Improve support for transcoding at program boundaries”. I think C++ should consider the boundary between non-UTF-8 <code>char</code> and non-UTF-16/32 <code>wchar_t</code> on the one hand and Unicode (preferably represented as UTF-8) on the other as a similar transcoding boundary between legacy code and new code such that new text processing features (other than the encoding conversion feature itself!) are provided on the <code>char8_t</code>/<code>char16_t</code>/<code>char32_t</code> side but not on the non-UTF execution encoding side. That is, while the Text_view paper says “Transcoding to Unicode for all non-Unicode encodings would carry non-negligible performance costs and would pessimize platforms such as IBM’s z/OS that use EBCIDC [sic] by default for the non-Unicode execution character sets.”, I think it’s more appropriate to impose such a cost at the boundary of legacy and future parts of z/OS programs than to contaminate all new text processing APIs with the question “What does this operation even mean for non-UTF encodings generally and EBCDIC encodings specifically?”. (In the case of Windows, the system already works in UTF-16 internally, so all narrow execution encodings already involve transcoding at the system interface boundary. In that context, it seems inappropriate to pretend that the legacy narrow execution encodings on Windows were somehow free of transcoding cost to begin with.)</p>
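<p>To illustrate putting the transcoding cost at the boundary, here is a minimal C++17 sketch for the cheapest possible case: true ISO-8859-1, not the Web’s windows-1252 alias. In that encoding each byte is already its own scalar value, so only UTF-8 encoding remains. The function names are hypothetical. A real boundary API along the lines suggested in the summary would be <code>std::span</code>-based and would need lookup tables for other legacy encodings.</p>
<pre><code><![CDATA[
```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Number of UTF-8 bytes needed for ISO-8859-1 input: bytes 0x80..0xFF,
// i.e. scalar values U+0080..U+00FF, take two UTF-8 bytes each.
constexpr std::size_t utf8_length_from_latin1(std::string_view in) {
    std::size_t n = 0;
    for (char c : in) n += static_cast<unsigned char>(c) < 0x80 ? 1 : 2;
    return n;
}

// Boundary conversion: legacy ISO-8859-1 bytes in, UTF-8 out. After this
// single pass, new text processing code only ever sees UTF-8.
inline std::string latin1_to_utf8(std::string_view in) {
    std::string out;
    out.reserve(utf8_length_from_latin1(in));
    for (char c : in) {
        unsigned char b = static_cast<unsigned char>(c);
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));    // 110000xx
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));  // 10xxxxxx
        }
    }
    return out;
}
```
]]></code></pre>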
  180. <p>To avoid a distraction from my main point, I’m explicitly not opining <i>in this document</i> on whether new text processing features should be available for sequences of <code>char</code> when the narrow execution encoding is UTF-8, for sequences of <code>wchar_t</code> when <code>sizeof(wchar_t)</code> is 2 and the wide execution encoding is UTF-16, or for sequences of <code>wchar_t</code> when <code>sizeof(wchar_t)</code> is 4 and the wide execution encoding is UTF-32.</p>
  182. <h2 id="type">The Type for a Unicode Scalar Value Should Be <code>char32_t</code></h2>
  184. <p>The conclusion of the previous section is that new C++ facilities should not support number assignments to abstract characters other than Unicode, i.e. should not support coded character sets (either standardized or inferred from an encoding) other than Unicode. The conclusion makes it unnecessary to abstract type-wise over Unicode scalar values and some other kinds of scalar values. It just leaves the question of what the concrete type for a Unicode scalar value should be.</p>
  186. <p>The Text_view paper says:</p>
  187. <blockquote><p>“It has been suggested that <code>char32_t</code> be supported as a character
  188. type that is implicitly associated with the Unicode character set and that
  189. values of this type always be interpreted as Unicode code point values.  This
  190. suggestion is intended to enable UTF-32 string literals to be directly usable
  191. as sequences of character values (in addition to being sequences of code unit
  192. and code point values).  This has a cost in that it prohibits use of the
  193. <code>char32_t</code> type as a code unit or code point type for other
  194. encodings.”</p></blockquote>
  196. <p>I disagree with this and am firmly in the camp that <code>char32_t</code> should be the type for a Unicode scalar value.</p>
  198. <p>The sentence “This has a cost in that it prohibits use of the <code>char32_t</code> type as a code unit or code point type for other encodings.” is particularly alarming. Seeking to use <code>char32_t</code> as a code unit type for encodings other than UTF-32 would dilute the meaning of <code>char32_t</code> into another <code>wchar_t</code> mess. (I’m happy to see that P1041R4 “Make char16_t/char32_t string literals be UTF-16/32” was voted into C++20.)</p>
  200. <p>As for the appropriateness of using the same type both for a UTF-32 code unit and a Unicode scalar value, the <i>whole point</i> of UTF-32 is that its code unit value is directly the Unicode scalar value. That is what UTF-32 is all about, and UTF-32 has nothing else to offer: The value space that UTF-32 can represent is more compactly represented by UTF-8 and UTF-16, both of which are more commonly needed for interoperation with existing interfaces. Since having the code units be directly the scalar values is UTF-32’s whole point, it would be unhelpful to distinguish type-wise between UTF-32 code units and Unicode scalar values. (Also, considering that buffers of UTF-32 are rarely useful but iterators yielding Unicode scalar values make sense, it would be sad to give those iterators a complicated value type.)</p>
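<p>To illustrate the identity relationship, here is a sketch in which UTF-32 “decoding” is a pass-through of the code units. The only work left is a validity check, since nothing in the type system stops a <code>char32_t</code> from holding a surrogate or a value above U+10FFFF; the sketch maps such values to U+FFFD. <code>is_scalar_value</code> and <code>utf32_to_scalar_values</code> are hypothetical names.</p>

```cpp
#include <string>
#include <string_view>

// A Unicode scalar value is any code point except a surrogate:
// 0x0000..0xD7FF or 0xE000..0x10FFFF.
constexpr bool is_scalar_value(char32_t c) {
    return c < 0xD800 || (c > 0xDFFF && c <= 0x10FFFF);
}

// Hypothetical helper: for UTF-32, "decoding" is the identity on valid code
// units. A code unit that is not a scalar value becomes U+FFFD.
std::u32string utf32_to_scalar_values(std::u32string_view units) {
    std::u32string out;
    out.reserve(units.size());
    for (char32_t u : units) {
        out.push_back(is_scalar_value(u) ? u : U'\uFFFD');
    }
    return out;
}
```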
  202. <p>To provide interfaces that are generic across <code>std::u8string_view</code>, <code>std::u16string_view</code>, and <code>std::u32string_view</code> (and, thereby, strings for which these views can be taken), all of these should have a way to obtain a scalar value iterator that yields <code>char32_t</code> values. To make sure such iterators really yield only Unicode scalar values in an interoperable way, the iterator should yield U+FFFD upon error. What constitutes a single error in UTF-8 is defined in the WHATWG Encoding Standard (which matches the “best practice” from the Unicode Standard). In UTF-16, each unpaired surrogate is an error. In UTF-32, each code unit whose numeric value isn’t a valid Unicode scalar value is an error. (The last sentence might be taken as an admission that UTF-32 code units and scalar values are not the same after all. It is not. It is merely an acknowledgement that C++ does not statically prevent a program from erroneously putting an invalid value into a buffer that is supposed to contain UTF-32.)</p>
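<p>The following is a sketch of such an iterator for <code>std::u16string_view</code>, using hypothetical names (<code>Utf16ScalarIterator</code>, <code>Utf16Scalars</code>) rather than anything proposed for the standard. It yields <code>char32_t</code>, combines each valid surrogate pair into one scalar value, and yields U+FFFD for each unpaired surrogate. It targets C++17 and is a plain input iterator.</p>

```cpp
#include <cstddef>
#include <iterator>
#include <string_view>
#include <utility>

// Hypothetical input iterator over UTF-16 code units that yields Unicode
// scalar values as char32_t. Each unpaired surrogate yields U+FFFD.
class Utf16ScalarIterator {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = char32_t;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = char32_t;

    Utf16ScalarIterator(std::u16string_view units, std::size_t pos)
        : units_(units), pos_(pos) {}

    char32_t operator*() const { return decode_at(pos_).first; }

    Utf16ScalarIterator& operator++() {
        pos_ += decode_at(pos_).second;  // advance by 1 or 2 code units
        return *this;
    }

    Utf16ScalarIterator operator++(int) {
        Utf16ScalarIterator old = *this;
        ++*this;
        return old;
    }

    bool operator==(const Utf16ScalarIterator& other) const { return pos_ == other.pos_; }
    bool operator!=(const Utf16ScalarIterator& other) const { return pos_ != other.pos_; }

private:
    // Returns the scalar value starting at index i and how many code units it spans.
    std::pair<char32_t, std::size_t> decode_at(std::size_t i) const {
        char32_t u = units_[i];
        if (u >= 0xD800 && u <= 0xDBFF && i + 1 < units_.size()) {
            char32_t next = units_[i + 1];
            if (next >= 0xDC00 && next <= 0xDFFF) {  // valid surrogate pair
                return {0x10000 + ((u - 0xD800) << 10) + (next - 0xDC00), 2};
            }
        }
        if (u >= 0xD800 && u <= 0xDFFF) {
            return {U'\uFFFD', 1};  // unpaired surrogate
        }
        return {u, 1};
    }

    std::u16string_view units_;
    std::size_t pos_;
};

// Hypothetical range adaptor so the iterator can drive a range-based for loop.
struct Utf16Scalars {
    std::u16string_view units;
    Utf16ScalarIterator begin() const { return {units, 0}; }
    Utf16ScalarIterator end() const { return {units, units.size()}; }
};
```

<p>For example, iterating over <code>u"a\U0001F600"</code> followed by a lone <code>0xD800</code> and <code>u'b'</code> yields U+0061, U+1F600, U+FFFD, U+0062.</p>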
  204. <p>In general, new APIs should be defined to handle invalid UTF-8/16/32 either according to the replacement behavior described in the previous paragraph or by stopping and signaling an error at the first error. In particular, the replacement behavior should not be left as implementation-defined, considering that differences in the replacement behavior between V8 and Blink led to a <a href="">bug</a>. (See <a href="">another write-up on this topic</a>.)</p>
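<p>As a sketch of what fully specified (rather than implementation-defined) behavior could look like, the decoder below follows the UTF-8 decoder algorithm of the WHATWG Encoding Standard and offers both policies through a hypothetical <code>ErrorMode</code> switch: <code>Replace</code> emits one U+FFFD per error, while <code>Fatal</code> stops at the first error and returns <code>std::nullopt</code>. The function name and the use of <code>std::string_view</code> input (for C++17; a C++20 version would take <code>std::u8string_view</code>) are assumptions.</p>

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

enum class ErrorMode { Replace, Fatal };

// Follows the WHATWG Encoding Standard's UTF-8 decoder. Replace: one U+FFFD
// per error. Fatal: std::nullopt at the first error.
std::optional<std::u32string> decode_utf8(std::string_view input, ErrorMode mode) {
    std::u32string out;
    char32_t code_point = 0;
    int bytes_needed = 0;
    int bytes_seen = 0;
    unsigned char lower = 0x80;
    unsigned char upper = 0xBF;
    std::size_t i = 0;
    while (i < input.size()) {
        unsigned char b = static_cast<unsigned char>(input[i]);
        if (bytes_needed == 0) {
            ++i;
            if (b <= 0x7F) {
                out.push_back(b);
            } else if (b >= 0xC2 && b <= 0xDF) {
                bytes_needed = 1;
                code_point = b & 0x1F;
            } else if (b >= 0xE0 && b <= 0xEF) {
                if (b == 0xE0) lower = 0xA0;  // reject overlong forms
                if (b == 0xED) upper = 0x9F;  // reject surrogates
                bytes_needed = 2;
                code_point = b & 0x0F;
            } else if (b >= 0xF0 && b <= 0xF4) {
                if (b == 0xF0) lower = 0x90;  // reject overlong forms
                if (b == 0xF4) upper = 0x8F;  // reject values above U+10FFFF
                bytes_needed = 3;
                code_point = b & 0x07;
            } else {
                // Byte that cannot start a sequence (0x80..0xC1, 0xF5..0xFF).
                if (mode == ErrorMode::Fatal) return std::nullopt;
                out.push_back(U'\uFFFD');
            }
            continue;
        }
        if (b < lower || b > upper) {
            // Sequence cut short: one error for the bytes seen so far. The
            // current byte is not consumed; it is reprocessed as a lead byte.
            code_point = 0;
            bytes_needed = 0;
            bytes_seen = 0;
            lower = 0x80;
            upper = 0xBF;
            if (mode == ErrorMode::Fatal) return std::nullopt;
            out.push_back(U'\uFFFD');
            continue;
        }
        ++i;
        lower = 0x80;
        upper = 0xBF;
        code_point = (code_point << 6) | (b & 0x3F);
        if (++bytes_seen == bytes_needed) {
            out.push_back(code_point);
            code_point = 0;
            bytes_needed = 0;
            bytes_seen = 0;
        }
    }
    if (bytes_needed != 0) {
        // Input ended in the middle of a sequence: one final error.
        if (mode == ErrorMode::Fatal) return std::nullopt;
        out.push_back(U'\uFFFD');
    }
    return out;
}
```

<p>With <code>Replace</code>, the bytes <code>61 F1 80 80 E1 80 C2 62 80 63 80 BF 64</code> decode to U+0061, three U+FFFD, U+0062, U+FFFD, U+0063, two U+FFFD, U+0064. Because the algorithm fixes the number of U+FFFD characters, two implementations that follow it produce identical output.</p>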
  206. <h2 id="transcoding">Transcoding Should Be <code>std::span</code>-Based Instead of Iterator-Based</h2>
  208. <p>Since the above contemplates a conversion facility between legacy encodings and Unicode encoding forms, it seems on-topic to briefly opine on what such an API should look like. The Text_view paper says:</p>
  210. <blockquote>
  211. <p>Transcoding between encodings that use the same character set is currently
  212. possible.  The following example transcodes a UTF-8 string to UTF-16.
  214. </p><blockquote class="code">
  215. <pre><code>
  216. std::string in = get_a_utf8_string();
  217. std::u16string out;
  218. std::back_insert_iterator&lt;std::u16string&gt; out_it{out};
  219. auto tv_in = make_text_view&lt;utf8_encoding&gt;(in);
  220. auto tv_out = make_otext_iterator&lt;utf16_encoding&gt;(out_it);
  221. std::copy(tv_in.begin(), tv_in.end(), tv_out);
  222. </code></pre>
  223. </blockquote>
  225. <p>Transcoding between encodings that use different character sets is not
  226. currently supported due to lack of interfaces to transcode a code point
  227. from one character set to the code point of a different one.
  229. </p><p>Additionally, naively transcoding between encodings using std::copy()
  230. works, but is not optimal; techniques are known to accelerate transcoding
  231. between some sets of encoding.  For example, SIMD instructions can be
  232. utilized in some cases to transcode multiple code points in parallel.
  234. </p><p>Future work is intended to enable optimized transcoding and transcoding
  235. between distinct character sets.
  237. </p></blockquote>
  239. <p>I agree with the assessment that iterator and <code>std::copy()</code>-based transcoding is not optimal due to SIMD considerations. To enable the use of SIMD, the input and output should be <code>std::span</code>s, which, unlike iterators, allow the converter to look at more than one element of the <code>std::span</code> at a time. I have designed and implemented such an <a href="">API for C++</a>, and I invite SG16 to adopt its general API design. I have written a document that covers the <a href="">API design problems</a> that I sought to address and the <a href="">design of the API</a> (in Rust but directly applicable to C++). (Please don’t be distracted by the implementation internals being Rust instead of C++. The API design is still valid for C++ even if the design constraint of the implementation internals being behind C linkage is removed. Also, please don’t be distracted by the API predating <code>char8_t</code>.)
  241. </p><h2 id="implications">Implications for Text_view</h2>
  243. <p>Above I’ve opined that only UTF-8, UTF-16, and UTF-32 (as Unicode encoding forms—not as Unicode encoding schemes!) should be supported for iteration by scalar value and that legacy encodings should be addressed by a conversion facility. Therefore, I think that Text_view should not be standardized as proposed. Instead, I think <code>std::u8string_view</code>, <code>std::u16string_view</code>, and <code>std::u32string_view</code> should gain a way to obtain a Unicode scalar value iterator (that yields values of type <code>char32_t</code>), and a <code>std::span</code>-based encoding conversion API should be provided as a distinct feature (as opposed to trying to connect Unicode scalar value iterators with <code>std::copy()</code>).</p>
  245. </div></content></entry><entry><title type="xhtml"><div xmlns="">encoding_rs</div></title><summary type="xhtml"><div xmlns="">A Web-Compatible Character Encoding Library in Rust. (Used in Firefox.)</div></summary><link href=""></link><id></id><updated>2019-04-01T17:28:19Z</updated></entry><entry><title type="xhtml"><div xmlns="">Rust 2019</div></title><summary type="xhtml"><div xmlns="">The Rust team <a href="">encouraged</a> people to write blog posts reflecting on Rust in 2018 and proposing goals and directions for 2019. Here’s mine.</div></summary><link href=""></link><id></id><updated>2018-12-15T19:36:20Z</updated></entry><entry><title type="xhtml"><div xmlns="">encoding_rs: a Web-Compatible Character Encoding Library in Rust</div></title><summary type="xhtml"><div xmlns=""><a href="">encoding_rs</a> is a high-decode-performance, low-legacy-encode-footprint and high-correctness implementation of the WHATWG <a href="">Encoding Standard</a> written in Rust.</div></summary><link href=""></link><id></id><updated>2018-12-05T11:22:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">How I Wrote a Modern C++ Library in Rust</div></title><summary type="xhtml"><div xmlns="">Patterns that I used to make <a href="">encoding_rs</a> appear as a modern C++ library to C++ code.</div></summary><link href=""></link><id></id><updated>2018-12-05T11:20:42Z</updated></entry><entry><title type="xhtml"><div xmlns="">Using cargo-fuzz to Transfer Code Review of Simple Safe Code to Complex Code that Uses <code>unsafe</code></div></title><summary type="xhtml"><div xmlns=""><a href="rust2018/">#Rust2018</a></div></summary><link href=""></link><id></id><updated>2018-12-03T09:42:34Z</updated></entry><entry><title type="xhtml"><div xmlns="">A Rust Crate that
  246. Also Quacks Like
  247. a Modern C++ Library</div></title><summary type="xhtml"><div xmlns="">My RustFest Paris 2018 talk. (Slides about pointers in zero-length slices have been edited after RustFest to avoid spreading out-of-date information.) <a href="">Video is available</a>.</div></summary><link href=""></link><id></id><updated>2018-06-04T10:28:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">#Rust2018</div></title><summary type="xhtml"><div xmlns="">The Rust team <a href="">encouraged</a> people to write blog posts reflecting on Rust in 2017 and proposing goals and directions for 2018. Here’s mine.</div></summary><link href=""></link><id></id><updated>2018-01-11T18:44:27Z</updated></entry><entry><title type="xhtml"><div xmlns="">No Namespaces in JSON, Please</div></title><summary type="xhtml"><div xmlns="">I think that experience from Namespaces in XML should lead to the conclusion not to repeat the same (or almost the same) thing with JSON. I think the developer community as a whole should not pay the cost of the use cases of the part of the developer community that believes (whether rightly or wrongly is outside the scope of this post) that identifiers in data formats should fit into a global naming scheme and, more specifically, that this naming scheme should make every identifier into a URI. 
Instead, I think that the part of the developer community that believes that it needs to be able to merge data thanks to identifiers being URIs should bear the cost of doing whatever name mangling it needs to do upon data ingest, given the information of which format a given ingested piece of JSON was in.</div></summary><link href=""></link><id></id><updated>2017-05-31T07:14:40Z</updated></entry><entry><title type="xhtml"><div xmlns="">A Lecture about HTML5</div></title><summary type="xhtml"><div xmlns="">I was invited to give a lecture about HTML5 on a course titled <a href="">WWW Applications</a> at the Department of Media Technology of Helsinki University of Technology.</div></summary><link href=""></link><id></id><updated>2016-07-26T16:32:45Z</updated></entry><entry><title type="xhtml"><div xmlns="">A Publicly Trusted TLS (SSL) Certificate for an iki Domain</div></title><summary type="xhtml"><div xmlns="">Aiemmin <a href="">ikidomainille</a>,
  248. kuten <code></code>, on ollut vaikeaa saada julkisesti
  249. luotettua TLS-varmennetta. Uusi voittoa tavoittelematon varmentaja <a href="">Let’s Encrypt</a> tarkistaa
  250. isäntänimen (hostname) hallinnan ja mahdollistaa näin julkisesti luotetun varmenteen saamisen ikidomaineille. <span>(<i>English summary:</i> Previously it was
  251. impractical to get a publicly trusted TLS certificate for an iki domain (e.g.
  252. <code></code>). Thanks to Let’s Encrypt performing validation
  253. on a per-hostname basis, it’s now practical to get a publicly trusted
  254. certificate for an iki domain.)</span></div></summary><link href=""></link><id></id><updated>2016-04-02T12:20:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">-webkit-HTML5</div></title><summary type="xhtml"><div xmlns="">Apple took some of their Safari Technology Demos from their
  255. developer site and published them at <a href=""></a>
  256. as an “HTML5 Showcase”. <a href="">Christopher
  257. Blizzard's blog post</a> about the subject says almost everything I'd
  258. have to say, so please read Blizzard's post. I'm posting just my
  259. diffs here.</div></summary><link href=""></link><id></id><updated>2015-07-14T11:26:36Z</updated></entry><entry><title type="xhtml"><div xmlns="">Activating Browser Modes with Doctype</div></title><summary type="xhtml"><div xmlns="">A document about the essentials of the layout modes of newer browsers. (<a href="">Una vecchia versione disponibile in italiano</a>.)</div></summary><link href=""></link><id></id><updated>2015-07-14T11:26:05Z</updated></entry><entry><title type="xhtml"><div xmlns="">Lists in Attribute Values</div></title><summary type="xhtml"><div xmlns="">Whitespace-separation is good.</div></summary><link href=""></link><id></id><updated>2015-07-14T11:25:29Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Sad Story of PNG Gamma “Correction”</div></title><summary type="xhtml"><div xmlns="">Why you might not want to use PNG images when you want image colors and CSS colors to match.</div></summary><link href=""></link><id></id><updated>2015-07-14T11:23:23Z</updated></entry><entry><title type="xhtml"><div xmlns="">If You Want Software Freedom on Phones, You Should Work on Firefox OS, Custom Hardware and Web App Self-Hostability</div></title><summary type="xhtml"><div xmlns="">To achieve full-stack Software Freedom on mobile phones, I think it makes sense to focus on Firefox OS, commission custom hardware and develop self-hostable Free Software Web apps and an easy deployment platform for them.</div></summary><link href=""></link><id></id><updated>2015-01-24T10:04:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Character Encoding Menu in 2014</div></title><summary type="xhtml"><div xmlns="">This post is about a UI feature that I wish no one would have to use. Happily, it is indeed <i>almost</i> unused. Still, I made it more usable in the case when it <i>is</i> used. 
(The change was more driven by code removal than usability, though.)</div></summary><link href=""></link><id></id><updated>2014-12-05T10:22:46Z</updated></entry><entry><title type="xhtml"><div xmlns="">HTML5 Parser Improvements</div></title><summary type="xhtml"><div xmlns="">As <a href="test-html5-parsing/">mentioned</a> <a href="speculative-html5-parsing/">earlier</a>, there is an ongoing project for replacing Gecko’s old HTML parser with an <a href="">HTML5</a> parser. Significant improvements have landed lately, so if you’ve previously tried the HTML5 parser and turned it off due to crashiness or Web compatibility issues, now is a good time to turn it back on.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:44:53Z</updated></entry><entry><title type="xhtml"><div xmlns="">ARIA in HTML5 Integration: Document Conformance (Draft, Take Two)</div></title><summary type="xhtml"><div xmlns="">Now a runnable suggestion.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:37:07Z</updated></entry><entry><title type="xhtml"><div xmlns=""> and Pre-Existing Communities</div></title><summary type="xhtml"><div xmlns="">I have been reading tweets and blog posts expressing various
  260. levels of disappointment and unhappiness about not using
  261. RDFa, not using Microformats or not having been developed in the open
  262. with the community. Since other people’s perspectives differ from
  263. mine, I feel <a href="">compelled</a> to write
  264. down my take.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:34:49Z</updated></entry><entry><title type="xhtml"><div xmlns="">Lowering memory requirements by replacing Schematron</div></title><summary type="xhtml"><div xmlns="">For a long time, <a href="thesis/html5-conformance-checker#opt-schematron">I’ve said</a> that the Schematron schema in the <a href="">HTML5 facet of</a> was merely a rapid prototype that should be replaced with custom Java code.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:17:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">HTML5 Parsing in Gecko: A Build</div></title><summary type="xhtml"><div xmlns="">The effort of putting an <a href="">HTML5
  265. parser</a> inside Gecko takes a step out of the vaporware land.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:16:00Z</updated></entry><entry><title type="xhtml"><div xmlns="">Introducing SAX Tree</div></title><summary type="xhtml"><div xmlns="">I chose to write yet another XML tree package.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:13:33Z</updated></entry><entry><title type="xhtml"><div xmlns="">NVDL Support in</div></title><summary type="xhtml"><div xmlns="">I enabled NVDL today.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:11:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">HOWTO Avoid Being Called a Bozo When Producing XML</div></title><summary type="xhtml"><div xmlns="">Dos and don’ts about producing XML programmatically.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:08:37Z</updated></entry><entry><title type="xhtml"><div xmlns="">An Unofficial Q&amp;A about the Discontinuation of the XHTML2 WG</div></title><summary type="xhtml"><div xmlns="">Many of the comments on Zeldman’s
  266. post indicate that there are people who are badly misinformed about
  267. the matters surrounding this announcement. To help remedy that,
  268. here’s some quick Q&amp;A for getting informed.</div></summary><link href=""></link><id></id><updated>2014-11-21T09:03:57Z</updated></entry><entry><title type="xhtml"><div xmlns="">The HTML Parser</div></title><summary type="xhtml"><div xmlns="">An implementation of the HTML5 parsing algorithm in Java. (Used in Firefox by means of automated translation to C++.)</div></summary><link href=""></link><id></id><updated>2014-11-21T08:54:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">Thoughts on HTML5 Becoming a W3C Recommendation</div></title><summary type="xhtml"><div xmlns="">Since I’ve participated in the development of HTML5 for a decade now (since before it was commonly called “HTML5”), I’ve been asked for my thoughts about HTML5 becoming a W3C Recommendation. Hence, I figured I’d post something here.</div></summary><link href=""></link><id></id><updated>2014-10-26T19:45:55Z</updated></entry><entry><title type="xhtml"><div xmlns="">Four Finnish Banks Training Users to Give Banking Credentials to Another Site</div></title><summary type="xhtml"><div xmlns="">A person who turns to me for technical advice was logging in to a government service using online bank authentication from a bank called Handelsbanken. However, the page that was asking for the Handelsbanken login credentials was not served from <code>https://*</code>! After investigating what was going on, I decided to review how other banks in Finland handle this. Here are my findings.</div></summary><link href=""></link><id></id><updated>2013-12-16T06:55:31Z</updated></entry><entry><title type="xhtml"><div xmlns="">Unimpressed by Leopard</div></title><summary type="xhtml"><div xmlns="">Sadly, Leopard is not a clear improvement over Tiger.</div></summary><link href=""></link><id></id><updated>2013-11-23T14:13:19Z</updated></entry><entry><title type="xhtml"><div xmlns="">Sergeant Semantics</div></title><summary type="xhtml"><div xmlns="">So the W3C launched a <a href="">logo for HTML5</a>. 
And not just for <a href="">HTML5-the-spec</a> but <a href="">for HTML5-the-buzzword</a>. Regardless of the logo itself or what it stands for, I find the choice of the ancillary visual elements weird.</div></summary><link href=""></link><id></id><updated>2013-11-23T14:11:28Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Content Sink Inheritance Diagram – 2006-06-30</div></title><summary type="xhtml"><div xmlns="">I have discovered that my <a href="../content-sink/">previous diagram</a> showed only a part of the inheritance graph below <code>nsIContentSink</code>. There is more.</div></summary><link href=""></link><id></id><updated>2013-11-23T14:08:21Z</updated></entry><entry><title type="xhtml"><div xmlns="">An HTML5 Conformance Checker</div></title><summary type="xhtml"><div xmlns="">My master’s thesis</div></summary><link href=""></link><id></id><updated>2013-10-20T15:09:40Z</updated></entry><entry><title type="xhtml"><div xmlns="">What is EME?</div></title><summary type="xhtml"><div xmlns="">It was suggested at the Mozilla Summit that there isn’t good information around about what <a href="">Encrypted Media Extensions</a> (EME) actually is. Since I’m on the HTML working group and have been reading the email threads about EME there, I thought that I could provide an introduction that explains things that may not be apparent from the specification itself.</div></summary><link href=""></link><id></id><updated>2013-10-16T13:47:00Z</updated></entry><entry><title type="xhtml"><div xmlns="">About the Hiragino Fonts with CSS</div></title><summary type="xhtml"><div xmlns="">A short document about a couple of observations on using the Hiragino fonts with CSS. 
(The Hiragino fonts come with Mac OS X.)</div></summary><link href=""></link><id></id><updated>2012-10-02T12:59:25Z</updated></entry><entry><title type="xhtml"><div xmlns="">About Points and Pixels as Units</div></title><summary type="xhtml"><div xmlns="">A document about points often being mistakenly thought of as pixel units. Points are not pixel units. Defining the font size in points on Web pages is considered harmful. <strong>This document needs to be updated.</strong></div></summary><link href=""></link><id></id><updated>2012-10-02T12:56:58Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Performance Cost of the HTML Tree Builder</div></title><summary type="xhtml"><div xmlns="">I’ve been thinking about the performance gap between the
  269. HTML Parser and Xerces. What can be attributed to the
  270. “extra fix-ups” that an HTML parser has to do and what can be
  271. attributed to my code being worse than the Xerces code?</div></summary><link href=""></link><id></id><updated>2012-09-17T12:24:57Z</updated></entry><entry><title type="xhtml"><div xmlns="">Social Media Impression Management</div></title><summary type="xhtml"><div xmlns="">I asked if they had researched the
  272. image formation of social media sites. They hadn’t.</div></summary><link href=""></link><id></id><updated>2012-09-17T12:24:26Z</updated></entry><entry><title type="xhtml"><div xmlns="">The <code>spacer</code> Element Is Gone</div></title><summary type="xhtml"><div xmlns="">Today, I landed a <a href="">patch</a>
  273. that made the HTML5 parser in Gecko unaware of the HTML <code>spacer</code>
  274. element.</div></summary><link href=""></link><id></id><updated>2012-09-17T12:22:30Z</updated></entry><entry><title type="xhtml"><div xmlns="">Openmind 2006</div></title><summary type="xhtml"><div xmlns="">I attended <a href="">Openmind</a> 2006
  275. last week. Here are some notes.</div></summary><link href=""></link><id></id><updated>2012-09-17T12:19:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Performance Mistake</div></title><summary type="xhtml"><div xmlns="">In the spirit of documenting one’s mistakes…</div></summary><link href=""></link><id></id><updated>2012-09-17T12:17:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">XHTML and Mobile Devices</div></title><summary type="xhtml"><div xmlns=""><a href="">Simon
  276. Pieters’ mobile XHTML test results</a> need more publicity.</div></summary><link href=""></link><id></id><updated>2012-09-17T12:16:32Z</updated></entry><entry><title type="xhtml"><div xmlns="">WebM-Enabled Browser Usage Share Exceeds H.264-Enabled Browser Usage Share on Desktop (in StatCounter Numbers)</div></title><summary type="xhtml"><div xmlns="">Looking at StatCounter stats, it occurred to me that they might not match the common narrative about H.264 market share. I decided to run some numbers using StatCounter stats.</div></summary><link href=""></link><id></id><updated>2012-09-17T12:15:57Z</updated></entry><entry><title type="xhtml"><div xmlns="">HTML5 Parser-Based View Source Syntax Highlighting</div></title><summary type="xhtml"><div xmlns="">A new implementation of the View Source HTML and XML syntax highlighting has landed in Firefox.</div></summary><link href=""></link><id></id><updated>2012-03-14T16:10:01Z</updated></entry><entry><title type="xhtml"><div xmlns="">Vendor Prefixes Are Hurting the Web</div></title><summary type="xhtml"><div xmlns="">I think vendor prefixes are hurting the Web. 
I think we (people developing browsers and Web standards) should stop hurting the Web.</div></summary><link href=""></link><id></id><updated>2012-02-10T13:15:55Z</updated></entry><entry><title type="xhtml"><div xmlns=""><code>Accept-Charset</code> Is No More</div></title><summary type="xhtml"><div xmlns="">Now that Firefox 10 has been released, <del>none of the major browsers send</del> <ins>only Chrome sends</ins> the <code>Accept-Charset</code> HTTP header.</div></summary><link href=""></link><id></id><updated>2012-02-07T06:56:01Z</updated></entry><entry><title type="xhtml"><div xmlns="">Dualroids</div></title><summary type="xhtml"><div xmlns="">A two-player asteroid shooting network game written in Java.</div></summary><link href=""></link><id></id><updated>2011-12-22T13:27:39Z</updated></entry><entry><title type="xhtml"><div xmlns="">Writing Structural Stylable Documents in Mozilla Editor</div></title><summary type="xhtml"><div xmlns="">The Mozilla Editor is designed around HTML 4 Transitional. If special steps aren’t taken, it is easy to produce presentational documents that lack stylable structure. This document describes some basic good authoring practices for the purpose of writing structural and stylable documents.</div></summary><link href=""></link><id></id><updated>2011-12-22T13:22:32Z</updated></entry><entry><title type="xhtml"><div xmlns="">ISO-8859-15 Is Harmful</div></title><summary type="xhtml"><div xmlns="">UTF-8 is the way to go. 
(In Finnish.)</div></summary><link href=""></link><id></id><updated>2011-12-22T13:18:09Z</updated></entry><entry><title type="xhtml"><div xmlns="">Hourglass</div></title><summary type="xhtml"><div xmlns="">Yet another ray tracing gallery page.</div></summary><link href=""></link><id></id><updated>2011-12-22T13:16:08Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Scientific Method According to Hixie</div></title><summary type="xhtml"><div xmlns="">Quote of the week from the topic of #developers on</div></summary><link href=""></link><id></id><updated>2011-12-22T12:57:52Z</updated></entry><entry><title type="xhtml"><div xmlns="">Maemo Source Code</div></title><summary type="xhtml"><div xmlns="">To save others the trouble of requesting the source, here are the contents of the package called “2.2006.39-14-srcs”.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:57:07Z</updated></entry><entry><title type="xhtml"><div xmlns="">Karpela’s Lock Analogy Falls Short</div></title><summary type="xhtml"><div xmlns="">Anti-circumvention legislation does not make sense, and it is fallacious to compare circumventing DRM to breaking into an apartment. (In Finnish.)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:56:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">On Digital Archiving</div></title><summary type="xhtml"><div xmlns="">Documents about archiving digital documents. (In Finnish.)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:51:35Z</updated></entry><entry><title type="xhtml"><div xmlns="">ARIA in HTML5 Integration: Document Conformance (Draft)</div></title><summary type="xhtml"><div xmlns="">This is not a spec and has not been endorsed by anyone.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:50:37Z</updated></entry><entry><title type="xhtml"><div xmlns="">XHTML—What’s the Point? 
(Draft, incomplete)</div></title><summary type="xhtml"><div xmlns="">This document is incomplete, but I put it on the Web in order to avoid retyping the same thing over and over again in newsgroup discussions.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Mac OS X Browser Comparison</div></title><summary type="xhtml"><div xmlns="">This document is a rough yes/no feature comparison of the Web browsers that run natively on Mac OS X. It does not cover browsers that run on the Classic VM or require an implementation of the X11 windowing system. <strong>Severely out of date. For historical reference only!</strong></div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">HOWTO Spot a Wannabe Web Standards Advocate</div></title><summary type="xhtml"><div xmlns="">I have seen this too often. (<a href="wannabe/fr/">Aussi disponible en français</a>;
  277. <a href="wannabe/de/">Auch vorhanden auf Deutsch</a>; <a href="wannabe/pl/">jest dostępny po polsku</a>)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">An Idea About Intermediate Language Trees and Web UI Generation</div></title><summary type="xhtml"><div xmlns="">An idea about Web UI generation I had when I was studying compiler technology.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Thoughts on Using SSL/TLS Certificates as the Solution to Phishing</div></title><summary type="xhtml"><div xmlns="">Comments on <a href=""><cite>Staying Safe From
  278. Phishing With Firefox</cite></a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Bureaucracy Meets the Web</div></title><summary type="xhtml"><div xmlns="">Three things from the past week happened to be related to bureaucracy and the Web…</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Europe Day</div></title><summary type="xhtml"><div xmlns="">Tuesday 2006-05-09 was Europe Day. I traveled to Tampere for a show debate.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">HOWTO Establish a 100% Literacy Rate</div></title><summary type="xhtml"><div xmlns="">This is one of my favorite pieces of West
  279. Wing script writing.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">What to Do with All These Photos?</div></title><summary type="xhtml"><div xmlns="">I have a lot of photos that aren’t shared properly, which makes
  280. them less useful than they could be. Considering that it has been
  281. possible to publish photos on the Web for over a decade, I find it
  282. interesting and annoying how many unsolved problems there still are.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Charmod Norm Checking</div></title><summary type="xhtml"><div xmlns=""><a href="">Charmod Norm</a> is
  283. still in the Working Draft state, but if it were to become a
  284. normative part of (X)HTML5, it would belong to the area of the
  285. conformance checking service that I am working on now, so I
  286. prototyped Charmod Norm enforcement as well.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Validator Web Service Interface Ideas</div></title><summary type="xhtml"><div xmlns="">I am just writing this down so I don’t forget it.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">DTDs Don’t Work on the Web</div></title><summary type="xhtml"><div xmlns="">Last weekend, Slashdot <a href="">linked</a>
  287. to an <a href="">article</a>
  288. that observed that Netscape had removed the <a href="">RSS
  289. 0.91 DTD</a>. I hope this episode has a silver
  290. lining and helps in making people realize that DTDs don’t belong on
  291. the Web.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">EFFI’s Day in Court</div></title><summary type="xhtml"><div xmlns=""><a href="bureaucracy-meets-the-web/">As
  292. mentioned earlier</a>, <a href="">Electronic
  293. Frontier Finland</a> (EFFI) was suspected of illegal fundraising. The
  294. case was tried today. I went to the court house to observe the
  295. proceedings.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Speaking at XTech</div></title><summary type="xhtml"><div xmlns="">I’ll be speaking at <a href="">XTech</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Security Quote of the Day</div></title><summary type="xhtml"><div xmlns="">Cluelessness and incompetence of epic proportions.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Not Part of the Technology Stack</div></title><summary type="xhtml"><div xmlns="">At XTech 2006, I got a W3C brochure entitled <cite>Leading the Web
  296. to its Full Potential</cite> that had a diagram visualizing the W3C
  297. technology stack(s).</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Out of Context</div></title><summary type="xhtml"><div xmlns="">Last week on W3C mailing lists.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">The <code>html5.parser.enable</code> Pref is Gone</div></title><summary type="xhtml"><div xmlns="">Just a quick note to Firefox nightly testers and bug triagers: I pushed
  298. a patch that makes Firefox no longer honor the <code>html5.parser.enable</code> pref.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Miscellaneous Java Code</div></title><summary type="xhtml"><div xmlns="">Utility code.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Autozoom Extension for Firefox®</div></title><summary type="xhtml"><div xmlns="">When Autozoom is activated, the current document is analyzed for the
  299. dominant font size and the view is zoomed by the factor that makes
  300. the dominant size match your font size preference.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Don’t Use Creative Commons 1.0 Licenses
  301. – Use the 2.5 Series</div></title><summary type="xhtml"><div xmlns="">The Finland version of the Creative Commons
  302. suite of licenses is still at 1.0. The 1.0 series of CC licenses has
  303. three serious known bugs (in Finnish).</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">Tag Soup: How Mac IE 5 and Safari handle &lt;x&gt; &lt;y&gt; &lt;/x&gt;
  304. &lt;/y&gt;</div></title><summary type="xhtml"><div xmlns="">What happens with the DOM in Safari and Mac IE 5 when the nesting of the markup is broken?</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:18Z</updated></entry><entry><title type="xhtml"><div xmlns="">In Black and White</div></title><summary type="xhtml"><div xmlns="">A document request to the Ministry of Education. (In Finnish)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Is Atom What We Really Need?</div></title><summary type="xhtml"><div xmlns="">Atom (formerly known as Pie, Echo and Necho) has been created as a cleaner and better-defined alternative to RSS 2.0, which is underspecified. But is a reformulated version of RSS 2.0 really what we need?</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">10 Safari 1.0 issues</div></title><summary type="xhtml"><div xmlns="">Hyatt requested lists like this.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Names of Browser Engines</div></title><summary type="xhtml"><div xmlns="">A table of browser names, engine names and script engine names.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Natural Hazards: NA</div></title><summary type="xhtml"><div xmlns="">Thoughts about nuclear power plants in stormy situations.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Who knows prefixed XHTML from a hole in the ground?</div></title><summary type="xhtml"><div xmlns="">Remember to test prefixed XHTML as well.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title 
type="xhtml"><div xmlns="">Aula 2006</div></title><summary type="xhtml"><div xmlns="">Yesterday, I went to listen to the public speeches that were part
  305. of <a href="">Aula 2006 – Movement</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Charmod Checking</div></title><summary type="xhtml"><div xmlns="">Here’s how I have addressed the requirements of Charmod that
  306. apply to content (marked as [C] in Charmod).</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Printing Web Apps 1.0</div></title><summary type="xhtml"><div xmlns="">This is a quick guide for getting a dead-tree version of the Web
  307. Applications 1.0 spec.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns=""> Gets Out of the Java Trap</div></title><summary type="xhtml"><div xmlns="">This week, I upgraded the operating system on the <a href="">Xen</a>
  308. virtual machine that powers <code><a href=""></a></code>
  309. and <code><a href=""></a></code>
  310. to <a href="">Ubuntu</a> <a href="">Hardy</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Browser Technology Stack</div></title><summary type="xhtml"><div xmlns="">I made a quick attempt at drawing a stack for Web browsing.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Thou Shalt Not Spec a Feature that Might Inadvertently Compete with RDF when Used Contrary to How It Is Designed to Be Used</div></title><summary type="xhtml"><div xmlns="">From the minutes of the TAG meeting on November 2<sup>nd</sup>, 2009.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">SVG and MathML in <code>text/html</code> in Firefox and</div></title><summary type="xhtml"><div xmlns="">I enabled SVG and MathML-related stuff recently on both
  311. mozilla-central and on</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">What Could Microsoft Do about IE6?</div></title><summary type="xhtml"><div xmlns="">Microsoft has started a <a href="">campaign
  312. to drive down the market share of IE6</a>. Getting rid of IE6 is a
  313. righteous goal. Microsoft’s proposed solution isn’t righteous, though.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">UTF-8 to Code Point Array Converter in PHP</div></title><summary type="xhtml"><div xmlns="">This package contains a PHP include file which provides two functions for converting between UTF-8 strings and arrays of ints representing Unicode code points.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">HTML Syntax Checker in PHP</div></title><summary type="xhtml"><div xmlns="">An HTML linter written in PHP.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">SaxCompiler</div></title><summary type="xhtml"><div xmlns="">SaxCompiler is a tool for recording SAX <a href=""><code>ContentHandler</code></a>
  314. events as Java code that can play back the events without parsing
  315. XML.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Photo Group Feed</div></title><summary type="xhtml"><div xmlns=""><a href="">Flickr</a> doesn’t provide feeds
  316. for private groups. It doesn’t provide feeds for comments on photos
  317. in a group, either. It is reasonable to want such feeds, so here’s
  318. a script that generates them on your HTTP server.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Photo and <em>Metadata</em> Backup for Flickr</div></title><summary type="xhtml"><div xmlns="">This is a photo and <em>metadata</em> backup utility for Flickr
  319. written as a <em>self-contained</em> Java command line tool. The
  320. metadata is written to an XML file whose format is an aggregation of
  321. the response data from the Flickr API.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Thoughts About a Print <abbr title="User Interface">UI</abbr> for Mozilla</div></title><summary type="xhtml"><div xmlns="">Some thoughts about printing from a Web browser.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:17Z</updated></entry><entry><title type="xhtml"><div xmlns="">Oops! I broke MathML – 2006-07-05</div></title><summary type="xhtml"><div xmlns="">Or, well, one could argue that it was already broken but my
  322. content sink changes and a suitably crafted test case just exposed
  323. the layout issues that were already there.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 35</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 35.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Natural Hazards Again</div></title><summary type="xhtml"><div xmlns="">Looking across the street, I can see that there’s something
  324. extra in the air between where I sit and the house on the other side
  325. of the street.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">CMS Stuff</div></title><summary type="xhtml"><div xmlns="">Papers and code related to a CMS project.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Assembling Web Pages Using Document Trees</div></title><summary type="xhtml"><div xmlns="">A paper about a template engine that operates on XML document trees. (<a href="cms/">Source code available.</a>)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:16Z</updated></entry><entry><title type="xhtml"><div xmlns="">Kesäkoodi Starting – 2006-05-23</div></title><summary type="xhtml"><div xmlns="">So what’s this <a href="">Kesäkoodi</a>
  326. thing about?</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">DOM Traversal Performance – 2006-05-26</div></title><summary type="xhtml"><div xmlns="">But there is a problem. My JavaScript implementation is slow.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 21</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 21.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 22</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 22.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Planning the XML Content Sink Incrementalization Work – 2006-06-10</div></title><summary type="xhtml"><div xmlns="">I’ve been researching the problem area of <a href="">bug
  327. 18333</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 23</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 23.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 24</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 24.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 25</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 25.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Eclipse CDT – 2006-06-27</div></title><summary type="xhtml"><div xmlns="">After working in TextWrangler (and a bit in XCode) for a couple of
  328. weeks, I really started to miss Eclipse. </div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 26</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 26.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Builds! – 2006-07-06</div></title><summary type="xhtml"><div xmlns="">Now there is something to test. I am providing builds with my
  329. preliminary patches for four target platforms.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Builds, Take Two – 2006-07-07</div></title><summary type="xhtml"><div xmlns="">The <a href="">builds</a>
  330. have been respun with fixes for interrupting Expat properly.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 27</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 27.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 30</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 30.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 31</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 31.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 32</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 32.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 33</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 33.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Speaking Gig – 2006-08-28</div></title><summary type="xhtml"><div xmlns="">I have been booked to speak at the Openbyte pre-conference of the
  331. <a href="">Openmind 2006</a> event in Tampere
  332. Hall on 2006-10-24. </div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Week 34</div></title><summary type="xhtml"><div xmlns="">The weekly report for week 34.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">On Clipboard Formats – 2006-09-15</div></title><summary type="xhtml"><div xmlns="">This stuff is so underdocumented that it isn’t even funny. This
  333. document is written so that others might find something when they
  334. search the Web.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Kesäkoodi Wrap-Up – 2006-09-19</div></title><summary type="xhtml"><div xmlns="">The last week of Kesäkoodi stretched to two sparse weeks.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:15Z</updated></entry><entry><title type="xhtml"><div xmlns="">Outlining the “Ultimate” Blogging Server</div></title><summary type="xhtml"><div xmlns="">I’ve been thinking about what a really
  335. good blogging system or a news site content management system would
  336. be like. Here’s my attempt at outlining the “ultimate”
  337. blogging server.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">How Not to Advertise an Election Candidate</div></title><summary type="xhtml"><div xmlns="">On Sunday and Monday, elections were held at the local congregation in order to select a new vicar. I didn’t like the campaigning.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">Unused Icons</div></title><summary type="xhtml"><div xmlns="">Unhelpful Microsoft wizardiness</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">Comedy is the Real News</div></title><summary type="xhtml"><div xmlns="">An observation I made last year when watching TV in the U.S.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">Need a Taxi at a Taxi Station? 
You Lose!</div></title><summary type="xhtml"><div xmlns="">A taxi station is the worst place to be in Helsinki when you need a taxi (unless there’s one already there).</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">Reality Distortion Fields</div></title><summary type="xhtml"><div xmlns="">Where Joel Spolsky’s analysis of the IE version targeting issue goes wrong.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Last of the Parsing Quirks</div></title><summary type="xhtml"><div xmlns="">I implemented a single quirk for HTML5 parsing yesterday.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Joy of <code>about:blank</code></div></title><summary type="xhtml"><div xmlns=""><code><a href="about:blank">about:blank</a></code>
  338. is probably the hardest Web page to load. In fact, it is so hard that
  339. in order to turn the HTML5 parser on by default in Firefox last year,
  340. we decided to special-case <code>about:blank</code>
  341. to use the old parser in Firefox 4.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:14Z</updated></entry><entry><title type="xhtml"><div xmlns="">XTech 2006</div></title><summary type="xhtml"><div xmlns="">I went to the <a href="">XTech 2006</a>
  342. conference last week.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">Built-in Accessibility Roles in HTML5</div></title><summary type="xhtml"><div xmlns="">A quick table of <a href="">WAI-ARIA</a> roles and what <a href="">HTML 5</a> provides natively for each role as of July 2007.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns=""> Downtime</div></title><summary type="xhtml"><div xmlns=""> was down last week.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">I Want an Affordable Snapshot-Saving Crypto-Backupping RAID NAS</div></title><summary type="xhtml"><div xmlns="">This week, I lost over one potential work day to HFS+. And it
  343. wasn’t the first time I’ve lost time to HFS+. I want to
  344. make arrangements to avoid losing time to HFS+ in the future.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">SVG Filter Effects in HTML without External References</div></title><summary type="xhtml"><div xmlns="">The project of putting an <a href="html5-gecko-build/">HTML5
  345. parser inside Gecko</a> has progressed. I merged in code from the
  346. trunk in order to experiment with cool new stuff such as <a href="">SVG
  347. filter effects for HTML</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">HTML5 Script Execution Changes in Firefox 4 Beta 7</div></title><summary type="xhtml"><div xmlns="">In Firefox 4 beta 7, script execution changed to be more
  348. HTML5-compliant than before. This means that in some cases sites that
  349. sniff for Firefox or Gecko may break. <i>If your site/app works
  350. cross-browser without browser sniffing, you don’t need to read
  351. further.</i> (However, if you triage bugs on, you might still want to read on.)</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Greens’ Copyright Policy and Authors’ Pension Security</div></title><summary type="xhtml"><div xmlns="">The Greens recently published a <a href="">copyright policy paper</a>.
  352. It is a positive sign that the party pays so much attention to the
  353. subject that it has published a separate policy paper on it. However,
  354. what bothers me in the paper is its attitude toward the pension security
  355. of the authors of works. <span>(<i>English summary:</i> I’m unhappy that the newly
  356. released copyright policy paper of the Finnish Green Party suggests
  357. that authors of copyrighted works should get royalties for the
  358. commercial use of the works they have created long after the creation
  359. of the work in order to have an income in retirement.)</span></div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">Windows 8 App Support Matrix</div></title><summary type="xhtml"><div xmlns="">Over the last few days, there’s been quite a bit of speculation about whether Windows 8 on ARM will ship the desktop environment and allow recompiled code written to the legacy Win32 APIs to run.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:13Z</updated></entry><entry><title type="xhtml"><div xmlns="">Regular Expressions, Computer Science and Practice</div></title><summary type="xhtml"><div xmlns="">Disregard of computer science can crash your app.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">Almost Precedent</div></title><summary type="xhtml"><div xmlns="">Why the Gecko Almost Standards Mode shouldn’t be used to justify IE engine version targeting.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">Access Blocked</div></title><summary type="xhtml"><div xmlns="">I followed a link from a message to a spec in the <a href="">/TR/</a>
  360. space on <a href=""></a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">Extended Uncertainty</div></title><summary type="xhtml"><div xmlns="">I use <a href="">myvidoop</a> as my OpenID
  361. delegate. They used to have an <a href="">EV
  362. certificate</a>. Yesterday, they didn’t.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">View Original Bookmarklet</div></title><summary type="xhtml"><div xmlns="">It takes way too many clicks to get from a Flickr photo page to the original JPEG file. I wrote a bookmarklet that does it with just one click.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:12Z</updated></entry><entry><title type="xhtml"><div xmlns="">Things to Take into Account When Moving to Standards-Compliant HTML and CSS Authoring</div></title><summary type="xhtml"><div xmlns="">This is a mixed collection of a few issues that are worth taking into account when writing Web pages according to the W3C Recommendations.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:11Z</updated></entry><entry><title type="xhtml"><div xmlns="">Makasiinit</div></title><summary type="xhtml"><div xmlns="">So the <span>Makasiinit</span> burned today.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:11Z</updated></entry><entry><title type="xhtml"><div xmlns="">Imitating Reflective Caustics in POV-Ray</div></title><summary type="xhtml"><div xmlns="">A tutorial on imitating reflective caustics in the official distribution of POV-Ray.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">RFC 2119 Key Words in Management Textbooks</div></title><summary type="xhtml"><div xmlns="">Just a random observation about the vocabulary of management textbooks.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Atom Feed</div></title><summary type="xhtml"><div xmlns="">I now have an Atom 1.0 feed.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div 
xmlns="">ISO Opens Up a Little</div></title><summary type="xhtml"><div xmlns="">It turns out that ISO now has some standards on the Web. That’s
  363. good, but putting all of them there in a Web-friendly format would be
  364. even better.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Table Integrity Checker</div></title><summary type="xhtml"><div xmlns="">The first non-schema checker prototype is a table integrity checker.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Three Styles</div></title><summary type="xhtml"><div xmlns="">Well, four styles if you count the original.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Thesis Defense on XForms</div></title><summary type="xhtml"><div xmlns="">On Friday 2007-01-12, I went to listen to the thesis defense of
  365. <a href="">Mikko Honkala</a>.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">IM Logs</div></title><summary type="xhtml"><div xmlns="">Quote of the week.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Browser Sniffing History in the Chrome UA String</div></title><summary type="xhtml"><div xmlns=""><a href="">Google Chrome</a> has the following cruft in the HTTP <code>User-Agent</code> header.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Speculative HTML5 Parsing Landed</div></title><summary type="xhtml"><div xmlns="">As mentioned earlier, there is an ongoing project for replacing Gecko’s old HTML parser with an HTML5 parser. Today, a significant milestone landed: off-the-main-thread speculative HTML5 parsing.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">The Old HTML Fragment Parser is Gone</div></title><summary type="xhtml"><div xmlns="">Just a quick note to Firefox nightly testers and bug triagers.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:10Z</updated></entry><entry><title type="xhtml"><div xmlns="">Big Brother EU</div></title><summary type="xhtml"><div xmlns="">On Tuesday 2005-11-22, I went to a public discussion event titled “Big Brother EU”.</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:09Z</updated></entry><entry><title type="xhtml"><div xmlns="">Can Anti-DRM Clauses in Content Licenses be Free?</div></title><summary type="xhtml"><div xmlns="">Are anti-DRM clauses a good idea? Are the current clauses merely badly drafted, so that an anti-DRM clause in general could be free? Or is any anti-DRM clause inherently non-free?</div></summary><link href=""></link><id></id><updated>2011-12-22T12:43:09Z</updated></entry><entry><title type="xhtml"><div xmlns="">An Introduction to Unicode</div></title><summary type="xhtml"><div xmlns="">PDF slides about Unicode.</div></summary><link href=""></link><id></id><updated>2005-07-27T17:07:37Z</updated></entry><entry><title type="xhtml"><div xmlns="">Introduction to the W3C DOM</div></title><summary type="xhtml"><div xmlns="">An introduction to the W3C DOM (in Finnish).</div></summary><link href=""></link><id></id><updated>2002-06-28T17:59:02Z</updated></entry><entry><title type="xhtml"><div xmlns="">Testing HTML5 Parsing</div></title><summary type="xhtml"><div xmlns="">I have been using a browser with an HTML5 parser for both my work
  366. and leisure browsing for a bit over a week now. I think in-browser
  367. HTML5 parsing is now ready to be tested by others as well.</div></summary><link href=""></link><id></id><updated>1970-01-01T00:00:00Z</updated></entry><entry><title type="xhtml"><div xmlns="">Help Test HTML5 Parsing in Gecko</div></title><summary type="xhtml"><div xmlns="">The HTML5 parsing algorithm is meant to demystify HTML parsing and
  368. make it uniform across implementations in a backwards-compatible way.
  369. The algorithm has had “in the lab” testing, but so far it hasn’t
  370. been tested inside a browser by a large number of people. <em>You</em>
  371. can help change that now!</div></summary><link href=""></link><id></id><updated>1970-01-01T00:00:00Z</updated></entry><entry><title type="xhtml"><div xmlns=""></div></title><summary type="xhtml"><div xmlns="">Validation 2.0.</div></summary><link href=""></link><id></id><updated>1970-01-01T00:00:00Z</updated></entry></feed>

Copyright © 2002-9 Sam Ruby, Mark Pilgrim, Joseph Walton, and Phil Ringnalda