URL里的+
最近一个同学反馈用我做的某个工具的时候,明明form提交的值里是空格,但是拿到的是参数值里变成了+号, 回想一下并没有在form提交的时候做这种转换呀,那是哪儿转出来的呢?
做了一个小例子:
<html>
<head>
<title> 测试form </title>
</head>
<body>
<form action="/">
<input type="text" name="hello world" />
<input type="button" value="click" />
</form>
</body>
</html>
没有指定form的enctype,我们都知道form的默认enctype是application/x-www-form-urlencoded
,
如果在输入框里输入了空格,比如hello world
,点击click之后,页面中的链接就会变成:
http://localhost:3000/?hello+world=hello+world
这里没有任何后端服务,那么确实是浏览器把空格替换成了+
,那如果就是想在值里有一个+
怎么办?
输入hello+world
,得到的链接是:
http://localhost:3000/?hello+world=hello%2Bworld
+
会转成%2B
,+
号对应的unicode编码是43,十六进制是2B,在URL里用%2B表示也是正常
'+'.charCodeAt(0).toString(16) //2b
那这个转换行为是规范里的吗?找了一下,确实是https://www.w3.org/TR/html401/interact/forms.html#h-17.13.4.1,我把它摘抄一下:
application/x-www-form-urlencoded
This is the default content type. Forms submitted with this content type must be encoded as follows:
Control names and values are escaped. Space characters are replaced by ’+’, and then reserved characters are escaped as described in [RFC1738], section 2.2: Non-alphanumeric characters are replaced by ‘%HH’, a percent sign and two hexadecimal digits representing the ASCII code of the character. Line breaks are represented as “CR LF” pairs (i.e., ‘%0D%0A’). The control names/values are listed in the order they appear in the document. The name is separated from the value by ’=’ and name/value pairs are separated from each other by ’&‘.
对应我们这个问题就是,键与值都要被转义,空格要被转为+
,非字母与数字转为一个%号与两位16进制的ASCII值,这下明确了,就是浏览器按规范进行了转换。
那一般的框架在处理请求的时候应该也会对这个URL进行反转义,也就是把+
号转为空格,我们来测试一下NodeJS:
var urlModule = require('url')
var urlStr = 'http://localhost:3000/?hello+world=good+hello+%2B+hell'
console.log(urlModule.parse(urlStr, true))
//返回下面的值
// Url {
// protocol: 'http:',
// slashes: true,
// auth: null,
// host: 'localhost:3000',
// port: '3000',
// hostname: 'localhost',
// hash: null,
// search: '?hello+world=good+hello+%2B+hell',
// query: { 'hello world': 'good hello + hell' },
// pathname: '/',
// path: '/?hello+world=good+hello+%2B+hell',
// href: 'http://localhost:3000/?hello+world=good+hello+%2B+hell' }
可见NodeJS的处理也是没有问题,为什么我的程序里有问题呢?好吧,因为我们有些历史URL是GB2312的编码,为了对这些URL进行兼容,自己单独写了一个
URL parse的方法,没有处理+
号的这种情况。
至此,真相大白,本周一篇目标达到,耶~