HttpContent: Default charset for "text/*" should be ISO-8859-1

Topics: Web Api
Sep 7, 2011 at 7:28 AM


I started looking into WCF Web API and it seems to offer what I need, especially things that should go without saying in HTTP, like (the somewhat misnamed) "content negotiation". Awesome!

Now to the issue: according to the HTTP spec, the default encoding for text/* is "ISO-8859-1". I'd very much like the spec's default to be UTF-8, but it isn't.

If possible, please change it (in HttpContent.ReadAsString()) to ISO-8859-1, but only for "text/*", so that it doesn't touch things like "application/json" etc., which afaik are not specified in the HTTP Spec.

Here's the relevant excerpt:

3.7.1 Canonicalization and Text Defaults

[...] The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 3.4.1 for compatibility problems.
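
To make the requested behaviour concrete, here is a minimal sketch of the rule in Python rather than in the Web API itself. The names pick_charset and read_as_string are made up for illustration and are not part of HttpContent; the UTF-8 fallback for non-text types is an assumption, since the excerpt above only covers "text/*".

```python
from email.message import Message


def pick_charset(content_type: str, fallback: str = "utf-8") -> str:
    """Choose a charset for a body based on its Content-Type header value.

    Hypothetical helper: an explicit charset parameter wins, text/* falls
    back to ISO-8859-1 (RFC 2616, section 3.7.1), and everything else uses
    `fallback` (an assumption, not something the spec excerpt defines).
    """
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")
    if isinstance(explicit, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        explicit = explicit[2]
    if explicit:
        return explicit.lower()
    if msg.get_content_maintype() == "text":
        return "iso-8859-1"
    return fallback


def read_as_string(body: bytes, content_type: str) -> str:
    """Decode a response body using the charset chosen above."""
    return body.decode(pick_charset(content_type))
```

With this rule, the bytes b"caf\xe9" sent as "text/plain" without a charset decode as ISO-8859-1, while "application/json" is left on the fallback.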

By the way, any news on the "release plans" front? Like, when can we expect the next version? And are we heading towards a version 1.0 any time soon or something meant to be used in production?


Sep 7, 2011 at 6:01 PM

The spec has changed -- the latest HTTP/1.1 bis draft on content negotiation [1] says

   Remove the default character encoding for text media types; the
   default now is whatever the media type definition says.
   (Section 2.3.1)


Sep 8, 2011 at 1:30 AM

Thanks, Dan. That's very interesting!

Although it is still a draft, I very much like that they no longer use latin-1 as the default. It's interesting, though, that they dare to make a breaking change, which in RFC 2616 they seemed to avoid at all costs (for example by introducing new HTTP headers that did more or less the same as the HTTP/1.0 ones).

I guess UTF-8 is a good default then. Strictly following the spec, the "default now is whatever the media type definition says", but I guess a single default is OK for a general-purpose class like HttpContent. And it does respect an explicitly set charset, so... Fine with me.
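
For what it's worth, the stricter reading could be sketched roughly like this in Python. Everything here is hypothetical: resolve_charset is not an HttpContent method, and the single table entry is a placeholder, since real defaults would have to come from each media type's own definition.

```python
from email.message import Message

# Placeholder table (assumption): real entries would be taken from each
# media type's registration, none of which are listed in this thread.
MEDIA_TYPE_DEFAULTS = {"text/x-example": "iso-8859-1"}


def resolve_charset(content_type, defaults=MEDIA_TYPE_DEFAULTS, fallback="utf-8"):
    """Explicit charset first, then the media type's own default, then UTF-8."""
    msg = Message()
    msg["Content-Type"] = content_type
    explicit = msg.get_param("charset")
    if isinstance(explicit, tuple):
        # RFC 2231 encoded parameter: (charset, language, value)
        explicit = explicit[2]
    if explicit:
        return explicit.lower()
    # get_content_type() returns the lowercased "type/subtype" string
    return defaults.get(msg.get_content_type(), fallback)
```

An empty table gives the simple "UTF-8 unless told otherwise" behaviour, so the general-purpose default and the per-type lookup can live in the same function.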

Sep 8, 2011 at 3:45 AM


If you are interested in the four years of debate that went into this, the summary is here


Sep 8, 2011 at 8:07 AM

Awesome, thanks for the link.