URL里的+


最近一个同学反馈用我做的某个工具的时候,明明form提交的值里是空格,但是拿到的是参数值里变成了+号, 回想一下并没有在form提交的时候做这种转换呀,那是哪儿转出来的呢?

做了一个小例子:

<html>
<head>
  <title> 测试form </title>
</head>
<body>
  <form action="/">
    <input type="text" name="hello world" />
    <input type="button" value="click" />
  </form>
</body>
</html>

没有指定form的enctype,我们都知道form的默认enctype是application/x-www-form-urlencoded, 如果在输入框里输入了空格,比如hello world,点击click之后,页面中的链接就会变成:

http://localhost:3000/?hello+world=hello+world

这里没有任何后端服务,那么确实是浏览器把空格替换成了+,那如果就是想在值里有一个+怎么办? 输入hello+world,得到的链接是:

http://localhost:3000/?hello+world=hello%2Bworld

+会转成%2B+号对应的unicode编码是43,十六进制是2B,在URL里用%2B表示也是正常

'+'.charCodeAt(0).toString(16) //2b

那这个转换行为是规范里的吗?找了一下,确实是https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1,我把它摘抄一下:

application/x-www-form-urlencoded

This is the default content type. Forms submitted with this content type must be encoded as follows:

Control names and values are escaped. Space characters are replaced by ’+’, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by ‘%HH’, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as “CR LF” pairs (i.e., ‘%0D%0A’). The control names/values are listed in the order they appear in the document. The name is separated from the value by ’=’ and name/value pairs are separated from each other by ’&‘.

对应我们这个问题就是,键与值都要被转义,空格要被转为+,非字母与数字转为一个%号与两位16进制的ASCII值,这下明确了,就是浏览器按规范进行了转换。

那一般的框架在处理请求的时候应该也会对这个URL进行反转义,也就是把+号转为空格,我们来测试一下NodeJS:

var urlModule = require('url')
var urlStr = 'http://localhost:3000/?hello+world=good+hello+%2B+hell'
console.log(urlModule.parse(urlStr, true))

//返回下面的值
// Url {
//   protocol: 'http:',
//   slashes: true,
//   auth: null,
//   host: 'localhost:3000',
//   port: '3000',
//   hostname: 'localhost',
//   hash: null,
//   search: '?hello+world=good+hello+%2B+hell',
//   query: { 'hello world': 'good hello + hell' },
//   pathname: '/',
//   path: '/?hello+world=good+hello+%2B+hell',
//   href: 'http://localhost:3000/?hello+world=good+hello+%2B+hell' }

可见NodeJS的处理也是没有问题,为什么我的程序里有问题呢?好吧,因为我们有些历史URL是GB2312的编码,为了对这些URL进行兼容,自己单独写了一个 URL parse的方法,没有处理+号的这种情况。

至此,真相大白,本周一篇目标达到,耶~


推荐文章