Feb 23, 2016

The + in URLs

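Recently a colleague reported a problem with a tool I built. The value submitted through the form clearly contained a space, yet the parameter value that came through had turned into a + sign. I didn't remember doing any such conversion when submitting the form, so where did it come from?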

I put together a small example:

<html>
<head>
  <title>Form test</title>
</head>
<body>
  <form action="/">
    <input type="text" name="hello world" />
    <input type="submit" value="click" />
  </form>
</body>
</html>

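The form doesn't specify an enctype, and the default enctype of a form is application/x-www-form-urlencoded. Type a value containing a space into the input box, for example hello world, and click "click". The page URL then becomes: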

http://localhost:3000/?hello+world=hello+world

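There is no backend service involved here, so it really is the browser that replaced the space with +. So what if you actually want a + in the value? Enter hello+world and the resulting URL is: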

http://localhost:3000/?hello+world=hello%2Bworld

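The + is converted to %2B. The code point of + is 43, which is 2B in hexadecimal, so representing it as %2B in a URL is exactly what you would expect: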

'+'.charCodeAt(0).toString(16) //2b
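As a quick contrast, here is a small sketch comparing JavaScript's general-purpose encodeURIComponent with URLSearchParams, which serializes using the form-urlencoded rules. It assumes a runtime where URLSearchParams is a global (modern browsers, Node.js 10+). The variable names are only for illustration.

```javascript
// encodeURIComponent does generic URI-component escaping: a space becomes %20.
const generic = encodeURIComponent('hello world')

// URLSearchParams serializes as application/x-www-form-urlencoded,
// so a space becomes '+' and a literal '+' becomes %2B, just like the form above.
const formEncoded = new URLSearchParams({ 'hello world': 'hello+world' }).toString()

console.log(generic)     // hello%20world
console.log(formEncoded) // hello+world=hello%2Bworld
```

Both escape a literal + as %2B. They differ only in how a space is written.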

Is this conversion part of the specification? I looked it up, and it is: https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1. Here is the relevant excerpt:

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content type must be encoded as follows:

Control names and values are escaped. Space characters are replaced by '+', and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by '%HH', a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as "CR LF" pairs (i.e., '%0D%0A'). The control names/values are listed in the order they appear in the document. The name is separated from the value by '=' and name/value pairs are separated from each other by '&'.

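Applied to our problem, the spec says the following. Both names and values are escaped. Spaces become +. Every non-alphanumeric character becomes a % sign followed by two hexadecimal digits of its ASCII code. So that settles it: the browser did the conversion, exactly as the specification requires.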
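To make these rules concrete, here is a minimal sketch of an encoder that follows the HTML 4.01 wording above. It is not how any particular browser is implemented. encodeFormComponent and encodeForm are hypothetical names, and the sketch assumes UTF-8 bytes for non-ASCII characters: HTML 4.01 only talks about ASCII, and browsers actually use the page's character encoding. Also note that modern browsers follow the WHATWG URL Standard, which additionally leaves *, -, . and _ unescaped.

```javascript
// Hypothetical helper: encode one name or value following the HTML 4.01
// application/x-www-form-urlencoded rules (assumption: UTF-8 bytes for non-ASCII).
function encodeFormComponent(str) {
  // Line breaks are sent as CR LF pairs.
  const normalized = str.replace(/\r\n|\r|\n/g, '\r\n')
  const bytes = new TextEncoder().encode(normalized)
  let out = ''
  for (const b of bytes) {
    const ch = String.fromCharCode(b)
    if (b === 0x20) {
      out += '+'                 // space -> '+'
    } else if (/[A-Za-z0-9]/.test(ch)) {
      out += ch                  // alphanumerics are left as-is
    } else {
      // every other byte -> '%HH' (two uppercase hex digits)
      out += '%' + b.toString(16).toUpperCase().padStart(2, '0')
    }
  }
  return out
}

// Names and values are joined with '=', pairs with '&', in document order.
function encodeForm(pairs) {
  return pairs
    .map(([name, value]) => encodeFormComponent(name) + '=' + encodeFormComponent(value))
    .join('&')
}

console.log(encodeForm([['hello world', 'hello world']])) // hello+world=hello+world
console.log(encodeForm([['hello world', 'hello+world']])) // hello+world=hello%2Bworld
```

The two calls at the bottom reproduce the two URLs the browser produced for the test form.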

Frameworks typically unescape the URL when handling a request, which means turning + back into a space. Let's test that in NodeJS:

var urlModule = require('url')
var urlStr = 'http://localhost:3000/?hello+world=good+hello+%2B+hell'
console.log(urlModule.parse(urlStr, true))

// Output (the second argument, true, also parses the query string into an object):
// Url {
//   protocol: 'http:',
//   slashes: true,
//   auth: null,
//   host: 'localhost:3000',
//   port: '3000',
//   hostname: 'localhost',
//   hash: null,
//   search: '?hello+world=good+hello+%2B+hell',
//   query: { 'hello world': 'good hello + hell' },
//   pathname: '/',
//   path: '/?hello+world=good+hello+%2B+hell',
//   href: 'http://localhost:3000/?hello+world=good+hello+%2B+hell' }
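In newer Node.js versions url.parse is documented as a legacy API. Here is a sketch of the same check with the WHATWG URL class, a global in Node.js 10+ and in browsers. Its searchParams applies the form-urlencoded parsing rules, turning + into a space and %2B into a literal +:

```javascript
const urlStr = 'http://localhost:3000/?hello+world=good+hello+%2B+hell'
const params = new URL(urlStr).searchParams

// '+' decodes to a space, '%2B' decodes to a literal '+'.
const value = params.get('hello world')
console.log(value) // good hello + hell
```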

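So NodeJS handles it correctly too. Why did my program have the problem, then? Well, some of our legacy URLs are GB2312-encoded. To stay compatible with them, we had written our own URL parse method, and it didn't handle the + case.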
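A hand-written parser has to treat + as a space as part of percent-decoding, not afterwards. If you decode %2B first and then replace + with a space, a literal + silently becomes a space. Below is a minimal sketch of such a parser. parseQuery and decodeFormComponent are hypothetical names, not the code from our tool. The sketch decodes %HH sequences into raw bytes and hands them to TextDecoder, so the charset is a parameter. Whether a label such as 'gb2312' is accepted depends on the Node.js build including full ICU data, so only UTF-8 is assumed here.

```javascript
// Hypothetical helper: decode one form-urlencoded name or value.
// '+' is mapped to a space in the same pass as %HH decoding, so an
// escaped plus (%2B) survives as a literal '+'.
function decodeFormComponent(str, encoding = 'utf-8') {
  const bytes = []
  for (let i = 0; i < str.length; i++) {
    const ch = str[i]
    const hex = str.slice(i + 1, i + 3)
    if (ch === '+') {
      bytes.push(0x20)                  // '+' -> space
    } else if (ch === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16))     // '%HH' -> one raw byte
      i += 2
    } else {
      // Assumption: unescaped characters in the query string are ASCII.
      bytes.push(ch.charCodeAt(0))
    }
  }
  // Interpret the collected bytes in the given charset.
  return new TextDecoder(encoding).decode(Uint8Array.from(bytes))
}

// Hypothetical helper: split a query string (without the leading '?')
// into an object of decoded names and values.
function parseQuery(query, encoding = 'utf-8') {
  const result = {}
  for (const pair of query.split('&')) {
    if (pair === '') continue
    const eq = pair.indexOf('=')
    const name = eq === -1 ? pair : pair.slice(0, eq)
    const value = eq === -1 ? '' : pair.slice(eq + 1)
    result[decodeFormComponent(name, encoding)] = decodeFormComponent(value, encoding)
  }
  return result
}

console.log(parseQuery('hello+world=good+hello+%2B+hell'))
// { 'hello world': 'good hello + hell' }
```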

And with that, the mystery is solved, and this week's one-post goal is met. Yay~

html form url encoding javascript nodejs
