javabyte数组转string(java byte类型相加)-昊阳知识网

string类型和[]byte类型是我们编程时最常使用到的数据结构。本文将探讨两者之间的转换方式，通过分析它们之间的内在联系来拨开迷雾。

两种转换方式

标准转换

go中string与[]byte的互换，相信每一位gopher都能立刻想到以下的转换方式，我们将之称为标准转换。

1//stringto[]byte2s1:=”hello”3b:=[]byte(s1)45//[]bytetostring6s2:=string(b)

强转换

通过unsafe和reflect包，可以实现另外一种转换方式，我们将之称为强转换（也常常被人称作黑魔法）。

1funcString2Bytes(sstring)[]byte{ 2sh:=(*reflect.StringHeader)(unsafe.Pointer(&s)) 3bh:=reflect.SliceHeader{ 4Data:sh.Data, 5Len:sh.Len, 6Cap:sh.Len, 7} 8return*(*[]byte)(unsafe.Pointer(&bh)) 9}1011funcBytes2String(b[]byte)string{12return*(*string)(unsafe.Pointer(&b))13}

性能对比

既然有两种转换方式，那么我们有必要对它们做性能对比。

1//测试强转换功能 2funcTestBytes2String(t*testing.T){ 3x:=[]byte(“HelloGopher!”) 4y:=Bytes2String(x) 5z:=string(x) 6 7ify!=z{ 8t.Fail() 9}10}1112//测试强转换功能13funcTestString2Bytes(t*testing.T){14x:=”HelloGopher!”15y:=String2Bytes(x)16z:=[]byte(x)1718if!bytes.Equal(y,z){19t.Fail()20}21}2223//测试标准转换string()性能24funcBenchmark_NormalBytes2String(b*testing.B){25x:=[]byte(“HelloGopher!HelloGopher!HelloGopher!”)26fori:=0;i<b.N;i {27_=string(x)28}29}3031//测试强转换[]byte到string性能32funcBenchmark_Byte2String(b*testing.B){33x:=[]byte(“HelloGopher!HelloGopher!HelloGopher!”)34fori:=0;i<b.N;i {35_=Bytes2String(x)36}37}3839//测试标准转换[]byte性能40funcBenchmark_NormalString2Bytes(b*testing.B){41x:=”HelloGopher!HelloGopher!HelloGopher!”42fori:=0;i<b.N;i {43_=[]byte(x)44}45}4647//测试强转换string到[]byte性能48funcBenchmark_String2Bytes(b*testing.B){49x:=”HelloGopher!HelloGopher!HelloGopher!”50fori:=0;i<b.N;i {51_=String2Bytes(x)52}53}

测试结果如下

1$gotest-bench=”.”-benchmem 2goos:darwin 3goarch:amd64 4pkg:workspace/example/stringBytes 5Benchmark_NormalBytes2String-83836341327.9ns/op48B/op1allocs/op 6Benchmark_Byte2String-810000000000.265ns/op0B/op0allocs/op 7Benchmark_NormalString2Bytes-83257708034.8ns/op48B/op1allocs/op 8Benchmark_String2Bytes-810000000000.532ns/op0B/op0allocs/op 9PASS10okworkspace/example/stringBytes3.170s

注意，-benchmem可以提供每次操作分配内存的次数，以及每次操作分配的字节数。

当x的数据均为”Hello Gopher!”时，测试结果如下

1$gotest-bench=”.”-benchmem 2goos:darwin 3goarch:amd64 4pkg:workspace/example/stringBytes 5Benchmark_NormalBytes2String-82459076744.86ns/op0B/op0allocs/op 6Benchmark_Byte2String-810000000000.266ns/op0B/op0allocs/op 7Benchmark_NormalString2Bytes-82023293865.92ns/op0B/op0allocs/op 8Benchmark_String2Bytes-810000000000.532ns/op0B/op0allocs/op 9PASS10okworkspace/example/stringBytes4.383s

强转换方式的性能会明显优于标准转换。

读者可以思考以下问题

1.为什么强转换性能会比标准转换好？

2.为什么在上述测试中，当x的数据较大时，标准转换方式会有一次分配内存的操作，从而导致其性能更差，而强转换方式却不受影响？

3.既然强转换方式性能这么好，为什么go语言提供给我们使用的是标准转换方式？

原理分析

要回答以上三个问题，首先要明白是string和[]byte在go中到底是什么。

[]byte

在go中，byte是uint8的别名，在go标准库builtin中有如下说明：

1//byteisanaliasforuint8andisequivalenttouint8inallways.Itis2//used,byconvention,todistinguishbytevaluesfrom8-bitunsigned3//integervalues.4typebyte=uint8

在go的源码中src/runtime/slice.go，slice的定义如下：

1typeslicestruct{2arrayunsafe.Pointer3lenint4capint5}

array是底层数组的指针，len表示长度，cap表示容量。对于[]byte来说，array指向的就是byte数组。

string

关于string类型，在go标准库builtin中有如下说明：

1//stringisthesetofallstringsof8-bitbytes,conventionallybutnot2//necessarilyrepresentingUTF-8-encodedtext.Astringmaybeempty,but3//notnil.Valuesofstringtypeareimmutable.4typestringstring

翻译过来就是：string是8位字节的集合，通常但不一定代表UTF-8编码的文本。string可以为空，但是不能为nil。string的值是不能改变的。

在go的源码中src/runtime/string.go，string的定义如下：

1typestringStructstruct{2strunsafe.Pointer3lenint4}

1//go:nosplit2funcgostringnocopy(str*byte)string{3ss:=stringStruct{str:unsafe.Pointer(str),len:findnull(str)}4s:=*(*string)(unsafe.Pointer(&ss))5returns6}

可以看到，入参str指针就是指向byte的指针，那么我们可以确定string的底层数据结构就是byte数组。

综上，string与[]byte在底层结构上是非常的相近（后者的底层表达仅多了一个cap属性，因此它们在内存布局上是可对齐的），这也就是为何builtin中内置函数copy会有一种特殊情况copy(dst []byte, src string) int的原因了。

1//Thecopybuilt-infunctioncopieselementsfromasourcesliceintoa2//destinationslice.(Asaspecialcase,italsowillcopybytesfroma3//stringtoasliceofbytes.)Thesourceanddestinationmayoverlap.Copy4//returnsthenumberofelementscopied,whichwillbetheminimumof5//len(src)andlen(dst).6funccopy(dst,src[]Type)int7区别

对于[]byte与string而言，两者之间最大的区别就是string的值不能改变。这该如何理解呢？下面通过两个例子来说明。

对于[]byte来说，以下操作是可行的：

1b:=[]byte(“HelloGopher!”)2b[1]=’T’

string，修改操作是被禁止的：

1s:=”HelloGopher!”2s[1]=’T’

而string能支持这样的操作：

1s:=”HelloGopher!”2s=”TelloGopher!”

那么，以下操作的含义是不同的：

1s:=”S1″//分配存储”S1″的内存空间，s结构体里的str指针指向这块内存2s=”S2″//分配存储”S2″的内存空间，s结构体里的str指针转为指向这块内存34b:=[]byte{1}//分配存储’1’数组的内存空间，b结构体的array指针指向这个数组。5b=[]byte{2}//将array的内容改为’2′

图解如下

因为string的指针指向的内容是不可以更改的，所以每更改一次字符串，就得重新分配一次内存，之前分配的空间还需要gc回收，这是导致string相较于[]byte操作低效的根本原因。

标准转换的实现细节

[]byte(string)的实现（源码在src/runtime/string.go中）

1//Theconstantisknowntothecompiler. 2//Thereisnofundamentaltheorybehindthisnumber. 3consttmpStringBufSize=32 4 5typetmpBuf[tmpStringBufSize]byte 6 7funcstringtoslicebyte(buf*tmpBuf,sstring)[]byte{ 8varb[]byte 9ifbuf!=nil&&len(s)<=len(buf){10*buf=tmpBuf{}11b=buf[:len(s)]12}else{13b=rawbyteslice(len(s))14}15copy(b,s)16returnb17}1819//rawbytesliceallocatesanewbyteslice.Thebytesliceisnotzeroed.20funcrawbyteslice(sizeint)(b[]byte){21cap:=roundupsize(uintptr(size))22p:=mallocgc(cap,nil,false)23ifcap!=uintptr(size){24memclrNoHeapPointers(add(p,uintptr(size)),cap-uintptr(size))25}2627*(*slice)(unsafe.Pointer(&b))=slice{p,size,int(cap)}28return29}

这里有两种情况：s的长度是否大于32。当大于32时，go需要调用mallocgc分配一块新的内存（大小由s决定），这也就回答了上文中的问题2：当x的数据较大时，标准转换方式会有一次分配内存的操作。

最后通过copy函数实现string到[]byte的拷贝，具体实现在src/runtime/slice.go中的slicestringcopy方法。

1funcslicestringcopy(to[]byte,fmstring)int{ 2iflen(fm)==0||len(to)==0{ 3return0 4} 5 6//copy的长度取决与string和[]byte的长度最小值 7n:=len(fm) 8iflen(to)<n{ 9n=len(to)10}1112//如果开启了竞态检测-race13ifraceenabled{14callerpc:=getcallerpc()15pc:=funcPC(slicestringcopy)16racewriterangepc(unsafe.Pointer(&to[0]),uintptr(n),callerpc,pc)17}18//如果开启了memorysanitizer-msan19ifmsanenabled{20msanwrite(unsafe.Pointer(&to[0]),uintptr(n))21}2223//该方法将string的底层数组从头部复制n个到[]byte对应的底层数组中去（这里就是copy实现的核心方法，在汇编层面实现源文件为memmove_*.s）24memmove(unsafe.Pointer(&to[0]),stringStructOf(&fm).str,uintptr(n))25returnn26}

copy实现过程图解如下

string([]byte)的实现（源码也在src/runtime/string.go中）

1//Bufisafixed-sizebufferfortheresult, 2//itisnotniliftheresultdoesnotescape. 3funcslicebytetostring(buf*tmpBuf,b[]byte)(strstring){ 4l:=len(b) 5ifl==0{ 6//Turnsouttobearelativelycommoncase. 7//Considerthatyouwanttoparseoutdatabetweenparensin”foo()bar”, 8//youfindtheindicesandconvertthesubslicetostring. 9return””10}11//如果开启了竞态检测-race12ifraceenabled{13racereadrangepc(unsafe.Pointer(&b[0]),14uintptr(l),15getcallerpc(),16funcPC(slicebytetostring))17}18//如果开启了memorysanitizer-msan19ifmsanenabled{20msanread(unsafe.Pointer(&b[0]),uintptr(l))21}22ifl==1{23stringStructOf(&str).str=unsafe.Pointer(&staticbytes[b[0]])24stringStructOf(&str).len=125return26}2728varpunsafe.Pointer29ifbuf!=nil&&len(b)<=len(buf){30p=unsafe.Pointer(buf)31}else{32p=mallocgc(uintptr(len(b)),nil,false)33}34stringStructOf(&str).str=p35stringStructOf(&str).len=len(b)36//拷贝字节数组至字符串37memmove(p,(*(*slice)(unsafe.Pointer(&b))).array,uintptr(len(b)))38return39}4041//实例stringStruct对象42funcstringStructOf(sp*string)*stringStruct{43return(*stringStruct)(unsafe.Pointer(sp))44}

可见，当数组长度超过32时，同样需要调用mallocgc分配一块新内存。最后通过memmove完成拷贝。

强转换的实现细节

1. 万能的unsafe.Pointer指针

而string和slice在reflect包中，对应的结构体是reflect.StringHeader和reflect.SliceHeader，它们是string和slice的运行时表达。

1typeStringHeaderstruct{ 2Datauintptr 3Lenint 4} 5 6typeSliceHeaderstruct{ 7Datauintptr 8Lenint 9Capint10}

2. 内存布局

从string和slice的运行时表达可以看出，除了SilceHeader多了一个int类型的Cap字段，Date和Len字段是一致的。所以，它们的内存布局是可对齐的，这说明我们就可以直接通过unsafe.Pointer进行转换。

[]byte转string图解

string转[]byte图解

Q&A

Q1.为什么强转换性能会比标准转换好？

对于标准转换，无论是从[]byte转string还是string转[]byte都会涉及底层数组的拷贝。而强转换是直接替换指针的指向，从而使得string和[]byte指向同一个底层数组。这样，当然后者的性能会更好。

Q2.为什么在上述测试中，当x的数据较大时，标准转换方式会有一次分配内存的操作，从而导致其性能更差，而强转换方式却不受影响？

标准转换时，当数据长度大于32个字节时，需要通过mallocgc申请新的内存，之后再进行数据拷贝工作。而强转换只是更改指针指向。所以，当转换数据较大时，两者性能差距会愈加明显。

Q3.既然强转换方式性能这么好，为什么go语言提供给我们使用的是标准转换方式？

首先，我们需要知道Go是一门类型安全的语言，而安全的代价就是性能的妥协。但是，性能的对比是相对的，这点性能的妥协对于现在的机器而言微乎其微。另外强转换的方式，会给我们的程序带来极大的安全隐患。

如下示例

1a:=”hello”2b:=String2Bytes(a)3b[0]=’H’

a是string类型，前面我们讲到它的值是不可修改的。通过强转换将a的底层数组赋给b，而b是一个[]byte类型，它的值是可以修改的，所以这时对底层数组的值进行修改，将会造成严重的错误（通过defer recover也不能捕获）。

1unexpectedfaultaddress0x10b61392fatalerror:fault3[signalSIGBUS:buserrorcode=0x2addr=0x10b6139pc=0x1088f2c]

Q4. 为什么string要设计为不可修改？

我认为有必要思考一下该问题。string不可修改，意味它是只读属性，这样的好处就是：在并发场景下，我们可以在不加锁的控制下，多次使用同一字符串，在保证高效共享的情况下而不用担心安全问题。

取舍场景

在你不确定安全隐患的条件下，尽量采用标准方式进行数据转换。

当程序对运行性能有高要求，同时满足对数据仅仅只有读操作的条件，且存在频繁转换（例如消息转发场景），可以使用强转换。

javabyte数组转string(java byte类型相加)

相关推荐

发表评论