
A C++ Beginner's Comeback, Elementary Level (Chapter 8, Part 2: Implementing a Mock string Class)

Author: Li_阴宅 | 2024-08-04 21:27:46


1. A brief look at character encodings
2. Implementing string step by step
3. The complete string implementation

1. A brief look at character encodings

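What we usually call the ASCII table is a mapping, drawn up early on in the United States for its own language and symbols, that pairs each character one-to-one with a binary code. Its full name is the American Standard Code for Information Interchange.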


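It defines 128 characters, the core of which are the 52 uppercase and lowercase English letters. As computers spread, though, people needed binary codes for the scripts and symbols of other countries too.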

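Chinese has tens of thousands of characters, so a single byte (8 bits) is clearly not enough. Two bytes give 2^16 = 65,536 code values, yet even that cannot cover every rare character. And China is not alone: every country needs an encoding table for its own symbols and script.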

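So Unicode (统一码, also known as 万国码) was developed, assigning every character of every language a single, unique code. Several encoding schemes were then built on top of Unicode.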

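For example, Chinese has so many characters that one byte cannot represent them, while other scripts have few enough characters to fit in a single byte and do not need the extra space.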

To address this, three main schemes emerged, known as the UTF family: UTF-8, UTF-16, and UTF-32. We will focus on UTF-8:

[Figure: table of UTF-8 byte formats]

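First, UTF-8 is compatible with ASCII: those characters take one byte, of the form 0xxxxxxx. Two-byte sequences, 110xxxxx 10xxxxxx, cover code points U+0080 to U+07FF (accented Latin letters, Greek, Cyrillic, and so on). Common Chinese characters take three bytes, 1110xxxx 10xxxxxx 10xxxxxx, and rarer ones outside the Basic Multilingual Plane take four, 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The point to remember is that a common Chinese character takes three bytes in UTF-8 (two bytes per Chinese character is the GBK figure, not the UTF-8 one).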
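The lead-byte patterns can be checked directly in code. The following is a minimal sketch, not a full validator: the function names utf8_sequence_length and utf8_count_code_points are made up for illustration, and the counting helper assumes its input is already well-formed UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns how many bytes a UTF-8 sequence occupies, judged from its first
// byte, or 0 if the byte cannot start a sequence (a 10xxxxxx continuation
// byte or an invalid lead byte).
std::size_t utf8_sequence_length(unsigned char lead)
{
    if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

// Counts characters (code points) in well-formed UTF-8 text by counting
// every byte that is not a 10xxxxxx continuation byte.
std::size_t utf8_count_code_points(const std::string& text)
{
    std::size_t count = 0;
    for (unsigned char byte : text)
    {
        if ((byte & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

For instance, the bytes E4 B8 AD form one character, so a string holding them has size() 3 but a code-point count of 1. This is why size() on a UTF-8 string counts bytes rather than characters.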

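As you can see, UTF-8 is a variable-length encoding. Sometimes a more uniform format is preferable, and text-processing work does not always need ASCII compatibility, which is where UTF-16 and UTF-32 come in. UTF-32 stores every code point in four bytes, common or rare, which is uniform but wastes space. UTF-16 is a compromise: most characters take two bytes, and the rest take four (a surrogate pair). The details are easy to look up and are not covered further here.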
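To make the UTF-16 compromise concrete, here is a small sketch of how one code point becomes one or two 16-bit units. The function name encode_utf16 is hypothetical, and the sketch assumes the input is a valid Unicode scalar value; it does no error checking.

```cpp
#include <cassert>
#include <string>

// Encodes one Unicode code point as UTF-16. Assumes cp is a valid scalar
// value (at most 0x10FFFF and not in the surrogate range D800..DFFF).
std::u16string encode_utf16(char32_t cp)
{
    std::u16string out;
    if (cp < 0x10000)
    {
        out.push_back(static_cast<char16_t>(cp));   // fits in one 16-bit unit
    }
    else
    {
        char32_t v = cp - 0x10000;                  // 20 significant bits remain
        out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));    // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));  // low surrogate
    }
    return out;
}
```

With this sketch, U+4E2D comes out as a single unit, while U+1F600 comes out as the pair D83D DE00.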

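Against this background, early C++ introduced wchar_t, a wide-character type. Its size is implementation-defined: two bytes with MSVC on Windows, typically four bytes on Linux and macOS. wstring is the string class that stores wchar_t.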

wchar_t ch;
cout << sizeof(ch) << endl;	// 2 with MSVC on Windows; typically 4 on Linux

C++11 then added char16_t and char32_t, whose sizes are better specified than wchar_t's: on typical platforms a char16_t is two bytes and a char32_t is four bytes.


UTF-8 is what we use most of the time. string, with character type char, suits UTF-8 text; u16string, with character type char16_t, suits UTF-16 text; u32string, with character type char32_t, suits UTF-32 text.
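As a rough illustration of how the three string types differ, the sketch below stores the same character, U+4E2D, in each and reports how many code units it occupies. The helper names are invented for this example, and escape sequences are used so the result does not depend on the encoding of the source file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// The same character, U+4E2D, held in three string types.
// size() counts code units (char, char16_t, char32_t), not characters.
std::size_t units_utf8()  { return std::string("\xE4\xB8\xAD").size(); }
std::size_t units_utf16() { return std::u16string(u"\u4E2D").size(); }
std::size_t units_utf32() { return std::u32string(U"\u4E2D").size(); }
```

The UTF-8 version needs three char units, while the UTF-16 and UTF-32 versions need one unit each.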

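The UTF family applies worldwide, but China also created its own encoding, GBK, tailored to Chinese characters. Windows caters to this: on Chinese-language Windows many things default to GBK, while Linux mostly uses UTF-8.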

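We rarely run into any of this in day-to-day study. International projects, however, may call for other encodings, and some Windows interfaces take wide-character (UTF-16) strings, so it can come up in Windows programming.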

2. Implementing string step by step

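For this implementation we put our class in a namespace of our own to keep it apart from the library's string. Member functions are defined inside the class, all in the header string.h, without separating declarations from definitions. The test file is named string_test.cpp.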

2.1 Constructors, destructor, and copy constructor

namespace LHY
{
	class string
	{
	public:
		string()	// handles the empty-string case
			:_str(new char[1]{'\0'})	// allocate one byte by default to hold '\0'
			,_size(0)
			,_capacity(0)
		{}

		string(const char* str)		// initialize from a C string
			:_size(strlen(str))
			,_capacity(_size)
		{
			_str = new char[_capacity + 1];	// the extra byte holds the '\0'
			strcpy(_str, str);				// strcpy copies the '\0' too
		}

		string(const string& s)
		{
			_str = new char[s.capacity() + 1];	// capacity() is covered later; it returns the capacity of s
			strcpy(_str, s._str);
			_size = s._size;
			_capacity = s._capacity;
		}
		
		~string()
		{
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}

		const char* c_str() const	// stream insertion is not overloaded yet, so print through this and the built-in <<
		{
			return _str;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

Test:

int main()
{
	// test the constructor
	LHY::string s1("hello world");
	cout << s1.c_str() << endl;
	
	// test the copy constructor
	LHY::string s2(s1);
	cout << s2.c_str() << endl;
	return 0;
}

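A few points to note:

Two constructors are needed: one with no parameters, and one taking a character pointer. Remember to account for the trailing \0.
Members are initialized in the order they are declared, not the order they appear in the initializer list, so take particular care here. For example, in the constructor that takes a parameter, _str = new char[_capacity + 1]; must not be moved into the initializer list: _str is declared first, so that expression would run first, while _capacity is still uninitialized, and the size of the allocation would be indeterminate.
str cannot be assigned to _str directly: that would convert const char* to char*, discarding const. It would also leave _str pointing at memory the object did not allocate, which the destructor must not delete[].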
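The declaration-order rule is easy to observe with a small experiment. This is only a sketch: Tracer, Demo, and construction_order are hypothetical names, and the initializer list is deliberately written in the "wrong" order (compilers typically warn about this, for example with -Wreorder).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: appends a tag to a log whenever it is constructed.
struct Tracer
{
    Tracer(std::string& log, char tag) { log += tag; }
};

struct Demo
{
    Tracer first;   // declared first, so constructed first
    Tracer second;  // declared second, so constructed second

    // The initializer list names `second` before `first`, but that order is
    // ignored: construction follows the declaration order above.
    explicit Demo(std::string& log) : second(log, '2'), first(log, '1') {}
};

// Returns the order in which the members were actually constructed.
std::string construction_order()
{
    std::string log;
    Demo demo(log);
    (void)demo;
    return log;
}
```

construction_order() yields "12" even though the initializer list reads second, first. The same mechanism is what would make _str(new char[_capacity + 1]) read an uninitialized _capacity.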

2.2 Overloading []

namespace LHY
{
	class string
	{
	public:
		size_t size() const
		{
			return _size;
		}

		size_t capacity() const
		{
			return _capacity;
		}

		char& operator[](size_t pos)	// read and write
		{
			assert(pos < _size);
			return _str[pos];
		}

		const char& operator[](size_t pos) const	// read-only, used for const objects
		{
			assert(pos < _size);
			return _str[pos];
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

Test:

int main()
{
	LHY::string s1("hello world");
	
	const LHY::string s2("hello world");
	// iterate; tests the const overload of []
	for (size_t i = 0; i < s2.size(); i++)
	{
		cout << s2[i] << " ";
	}
	cout << endl;

	// iterate and modify; tests the non-const overload of []
	for (size_t i = 0; i < s1.size(); i++)
	{
		s1[i] = '*';
		cout << s1[i] << " ";
	}
	cout << endl;

	return 0;
}

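Note:

Provide two overloads of []: a read-only one for const objects and a read-write one. Iterating by index also needs a size() function that returns the length, and capacity() is implemented alongside it.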

2.3 Iterators

namespace LHY
{
	class string
	{
	public:
		typedef char* iterator;
		typedef const char* const_iterator;

		iterator begin()
		{
			return _str;
		}

		const_iterator begin() const
		{
			return _str;
		}

		iterator end()
		{
			return _str + _size;	// points at the '\0'
		}

		const_iterator end() const
		{
			return _str + _size;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

Test:

int main()
{
	LHY::string s1("hello world");

	const LHY::string s2("hello world");

	// test the iterator
	LHY::string::iterator it = s1.begin();
	while (it != s1.end())
	{
		cout << *it << " ";
		++it;
	}
	cout << endl;
	
	// test the const iterator
	LHY::string::const_iterator cit;
	cit = s2.begin();
	while (cit != s2.end())
	{
		cout << *cit << " ";
		++cit;
	}
	cout << endl;

	for (auto ch : s1)	// range-based for is built entirely on iterators and follows strict naming rules
	{
		cout << ch << " ";
	}
	cout << endl;
	// equivalent to
	/*it = s1.begin();
	while (it != s1.end())
	{
		auto ch = *it;
		cout << ch << " ";
		++it;
	}
	cout << endl;*/

	for (auto& ch : s1)
	{
		ch = '*';
		cout << ch << " ";
	}
	cout << endl;
	// equivalent to
	/*it = s1.begin();
	while (it != s1.end())
	{
		auto& ch = *it;
		ch = '*';
		cout << ch << " ";
		++it;
	}
	cout << endl;*/

	return 0;
}

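Note:

Two iterator types are needed: char* and const char*.
A range-based for is rewritten mechanically in terms of iterators, so small deviations fail to compile. If begin() is renamed Begin() in the class, explicit iterator code written against that name still works, but the range-based for fails with an error saying begin() cannot be found.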
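As a sketch of the naming requirement, any type with lowercase begin() and end() members that return something pointer-like can be used in a range-based for. CharView and count_char below are hypothetical names introduced only for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical read-only view over a character array. The lowercase member
// names begin() and end() are what range-based for looks up.
struct CharView
{
    const char* data;
    std::size_t len;

    const char* begin() const { return data; }
    const char* end() const { return data + len; }
};

// Counts occurrences of target using a range-based for over the view.
std::size_t count_char(const CharView& view, char target)
{
    std::size_t n = 0;
    for (char ch : view)
    {
        if (ch == target) ++n;
    }
    return n;
}
```

Renaming either member (to Begin(), say) makes the loop in count_char stop compiling, which is the behavior described above.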

2.4 push_back and append, plus overloading +=

namespace LHY
{
	class string
	{
	public:
		void reserve(size_t n)
		{
			if (n > _capacity)	// when n <= _capacity, reserve neither shrinks nor grows
			{
				char* tmp = new char[n + 1];	// one extra byte for the '\0'
				strcpy(tmp, _str);
				delete[] _str;
				_str = tmp;
				_capacity = n;
			}
		}

		void push_back(char ch)
		{
			if (_size == _capacity)
			{
				reserve(_capacity == 0 ? 4 : _capacity * 2);	// the conditional handles _capacity == 0
			}

			_str[_size] = ch;
			++_size;
			_str[_size] = '\0';
		}

		void append(const char* str)
		{
			size_t len = strlen(str);
			if (_size + len > _capacity)	// do not simply double here; doubling may still be too small
			{
				reserve(_size + len);
			}

			strcpy(_str + _size, str);
			_size += len;
		}

		string& operator+=(char ch)
		{
			push_back(ch);
			return *this;
		}

		string& operator+=(const char* str)
		{
			append(str);
			return *this;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}


Test:

int main()
{
	LHY::string s1 = "hello world";
	s1.push_back('x');
	cout << s1.c_str() << endl;
	s1.append("yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy");
	cout << s1.c_str() << endl;

	LHY::string s2("hello world");
	s2 += 'x';
	cout << s2.c_str() << endl;
	s2 += "yyyyyyyyyyyyyyyyyy";
	cout << s2.c_str() << endl;
	
	LHY::string s3;
	s3 += 'x';
	cout << s3.c_str() << endl;
	s3 += "hello world";
	cout << s3.c_str() << endl;
	
	return 0;
}

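Note:

The key issue in push_back() and append() is growing the buffer when capacity runs out, which is why reserve() is implemented as well.
When reserve() grows the buffer, it allocates one extra byte for the \0.
In append(), when capacity is insufficient, do not simply double it, since that may still be too small; reserve _size + len directly.
reserve() must check whether n is greater than _capacity. The library's reserve() is not required to do anything when n <= _capacity, and ours behaves the same way.
In push_back(), a string whose _capacity is 0 (a default-constructed empty string, for example) needs special handling: give it 4 bytes directly. Otherwise _capacity * 2 is still 0 and the growth accomplishes nothing.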
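To get a feel for what doubling buys, the following sketch simulates the growth rule without allocating anything. The names next_capacity and count_reallocations are invented for this illustration; the rule itself (4 for an empty buffer, otherwise double) is the one used in push_back() above.

```cpp
#include <cassert>
#include <cstddef>

// Growth rule used by push_back above: 4 for an empty buffer, otherwise double.
std::size_t next_capacity(std::size_t capacity)
{
    return capacity == 0 ? 4 : capacity * 2;
}

// Simulates n push_back calls on an initially empty string (capacity 0) and
// returns how many times the buffer would be reallocated.
std::size_t count_reallocations(std::size_t n)
{
    std::size_t size = 0, capacity = 0, reallocations = 0;
    for (std::size_t i = 0; i < n; ++i)
    {
        if (size == capacity)
        {
            capacity = next_capacity(capacity);
            ++reallocations;
        }
        ++size;
    }
    return reallocations;
}
```

Under this rule, 100 push_back calls trigger 6 reallocations (capacities 4, 8, 16, 32, 64, 128) and 1000 calls trigger 9, so the number of reallocations grows only logarithmically with the number of characters.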

2.5 insert

First, look at a flawed insert():

namespace LHY
{
	class string
	{
	public:
		void insert(size_t pos, char ch)
		{
			assert(pos <= _size);		// pos == _size is an append
			if (_size == _capacity)
			{
				reserve(_capacity == 0 ? 4 : _capacity * 2);	// handle _capacity == 0, as in push_back
			}

			size_t end = _size;
			while (end >= pos)
			{
				_str[end + 1] = _str[end];
				--end;
			}
			_str[pos] = ch;
			_size++;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

Test:

int main()
{
	LHY::string s = "hello world";
	s.insert(s.size(), '%');
	cout << s.c_str() << endl;

	s.insert(5, '%');
	cout << s.c_str() << endl;

	s.insert(0, '%');
	cout << s.c_str() << endl;

	return 0;
}

Appending and inserting in the middle both work, but inserting at the front crashes. Why?

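The intent of the code above is to keep decrementing end until it drops below pos and the loop stops; with pos equal to 0, that means end reaching -1. But can end be -1? It cannot: end is a size_t, an unsigned integer, so decrementing it past 0 wraps around to the largest value of the type. end is therefore never less than pos, the loop never terminates on its own, and the out-of-bounds accesses at those huge indices crash the program.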

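You might think that changing end to int would fix it. It does not: pos is a size_t, so in end >= pos the usual arithmetic conversions turn end into a size_t before the comparison. That leaves two fixes. One is to confront the conversion and cast pos to int in the comparison: end >= (int)pos. The other is to start end at _size + 1, one past the \0, copy with _str[end] = _str[end - 1], and stop the loop when end equals pos.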
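Both effects can be reproduced in isolation. The sketch below uses made-up function names, and the implicit conversion is written out as an explicit cast so that the behavior is visible in the code rather than hidden in the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Decrementing an unsigned zero wraps around to the largest value.
bool decrement_wraps_to_max()
{
    std::size_t end = 0;
    --end;
    return end == SIZE_MAX;
}

// What `end >= pos` does when end is int and pos is size_t: end is converted
// to size_t first. The conversion is spelled out explicitly here.
bool compare_after_conversion(int end, std::size_t pos)
{
    return static_cast<std::size_t>(end) >= pos;
}

// The first fix from the text: cast pos to int so the comparison is signed.
// Assumes pos fits in an int.
bool compare_as_int(int end, std::size_t pos)
{
    return end >= static_cast<int>(pos);
}
```

With end equal to -1 and pos equal to 0, compare_after_conversion is true (the loop would keep running) while compare_as_int is false (the loop would stop).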

namespace LHY
{
	class string
	{
	public:
		void insert(size_t pos, char ch)
		{
			assert(pos <= _size);		// pos == _size is an append
			if (_size == _capacity)
			{
				reserve(_capacity == 0 ? 4 : _capacity * 2);	// handle _capacity == 0, as in push_back
			}

			size_t end = _size + 1;
			while (end > pos)
			{
				_str[end] = _str[end - 1];
				--end;
			}
			_str[pos] = ch;
			_size++;
		}

		// This overload is not explained in detail; try implementing it yourself as practice
		void insert(size_t pos, const char* str)
		{
			assert(pos <= _size);
			size_t len = strlen(str);
			if (_size + len > _capacity)
			{
				reserve(_size + len);
			}

			// shift the existing data
			size_t end = _size + 1;
			while (end > pos)
			{
				_str[end + len - 1] = _str[end - 1];
				--end;
			}

			// insert
			for (size_t i = 0; i < len; i++)
			{
				_str[pos++] = str[i];
			}

			_size += len;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

Test:

int main()
{
	LHY::string s = "hello world";
	s.insert(s.size(), '%');
	cout << s.c_str() << endl;

	s.insert(5, '%');
	cout << s.c_str() << endl;

	s.insert(0, '%');
	cout << s.c_str() << endl;

	s.insert(0, "xxx");
	cout << s.c_str() << endl;

	s.insert(s.size(), "xxxxxxxxxxxxxxxxxxxxxxxxxx");
	cout << s.c_str() << endl;

	s.insert(5, "xx");
	cout << s.c_str() << endl;
 	
 	return 0;
}

2.6 erase

To implement erase(), we first need npos.

namespace LHY
{
	class string
	{
	public:
	
		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

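	public:	// npos may be used explicitly by callers, so it is public
		// const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		// const static double npos = 1.1; // not supported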
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

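As we know, static data members do not go through the initializer list and normally cannot be given a value at their declaration, so they are declared inside the class and defined outside it. A const static member of integral type is the exception: it can be initialized right at its declaration inside the class. The rule looks arbitrary, since const static double npos = 1; does not compile, and it amounts to a special case for integral types.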
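For comparison, here is a minimal sketch of the in-class forms side by side. Limits and its members are hypothetical names; the constexpr line relies on C++11 or later, which is an assumption about the compiler settings rather than something the original code requires.

```cpp
#include <cassert>
#include <cstddef>

struct Limits   // hypothetical class, for illustration only
{
    // The exception described above: a const static member of integral type.
    static const std::size_t npos = static_cast<std::size_t>(-1);

    // Since C++11, constexpr allows in-class initialization for other
    // literal types as well, including double.
    static constexpr double growth_factor = 2.0;
};

// Read the members by value (no reference or address is taken).
std::size_t npos_value() { return Limits::npos; }
double growth_value() { return Limits::growth_factor; }
```

The article's approach (declare in the class, define outside) remains valid; this only shows what the in-class alternatives look like.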

namespace LHY
{
	class string
	{
	public:
		void erase(size_t pos, size_t len = npos)
		{
			assert(pos < _size);
			if (len == npos || len >= _size - pos)	// written this way so that pos + len cannot overflow
			{
				_str[pos] = '\0';
				_size = pos;
			}
			else
			{
				size_t begin = pos + len;
				while (begin <= _size)
				{
					_str[begin - len] = _str[begin];
					begin++;
				}
				_size -= len;
			}
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;
		
	public:	// npos may be used explicitly by callers, so it is public
	
		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s("hello world");
	s.erase(0, 3);
	cout << s.c_str() << endl;

	s.erase(6, 100);
	cout << s.c_str() << endl;

	s.erase(1);
	cout << s.c_str() << endl;

	return 0;
}

2.7 Overloading the comparison operators

namespace LHY
{
	class string
	{
	public:
		bool operator<(const string& s) const
		{
			return strcmp(_str, s.c_str()) < 0;
		}

		bool operator==(const string& s) const
		{
			return strcmp(_str, s.c_str()) == 0;
		}

		bool operator<=(const string& s) const
		{
			return *this < s || *this == s;
		}

		bool operator>(const string& s) const
		{
			return !(*this <= s);
		}

		bool operator>=(const string& s) const
		{
			return !(*this < s); 
		}

		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s1 = "zhangsan";
	LHY::string s2("lisi");

	cout << (s1 < s2) << endl;
	cout << (s1 <= s2) << endl;
	cout << (s1 > s2) << endl;
	cout << (s1 == s2) << endl;
	cout << (s1 >= s2) << endl;

	return 0;
}

When overloading the comparison operators, reuse code: build the later operators on the ones already written.

2.8 Overloading stream insertion and extraction

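Stream insertion and extraction cannot be member functions of string, because their left operand is the stream. They are declared and defined outside the class.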

Here is an incorrect stream extraction:

istream& operator>>(istream& in, string& s)
{
	char ch;
	in >> ch;
	while (ch != ' ' && ch != '\n')
	{
		s += ch;
		in >> ch;
	}
	return in;
}

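Many people instinctively write the extraction overload this way, then find that the console turns into a bottomless pit that keeps asking for input and never stops. Why? Debugging shows that ch never receives a space or \n, because in >> ch skips whitespace, so the loop condition never becomes false.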

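Recall how C reads a space character. scanf() with most conversion specifiers will not do, because it treats spaces and newlines as separators between fields. To read a space or a newline, you can use getchar().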

C++ works the same way: to read a space we use get(), a member function of istream. It behaves like getchar() and lets us read spaces and newlines.
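The difference between the two reading styles can be seen on an in-memory stream. This is a small sketch with invented function names; it uses std::istringstream and std::string purely as a convenient test harness.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Reads every character with operator>>, which skips whitespace.
std::string read_with_extraction(const std::string& input)
{
    std::istringstream in(input);
    std::string seen;
    char ch;
    while (in >> ch) seen += ch;
    return seen;
}

// Reads every character with get(), which does not skip anything.
std::string read_with_get(const std::string& input)
{
    std::istringstream in(input);
    std::string seen;
    char ch;
    while (in.get(ch)) seen += ch;
    return seen;
}
```

Given the input "a b\nc", the first function collects "abc" while the second collects the input unchanged, space and newline included. That is why a loop waiting for ' ' or '\n' can only terminate when it reads with get().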

namespace LHY
{
	class string
	{
	public:
		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}

		// ...

	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;

	ostream& operator<<(ostream& out, const string& s)
	{
		/*for (size_t i = 0; i < s.size(); i++)
		{
			out << s[i];
		}
		return out;*/
		for (auto ch : s)
			out << ch;
		return out;
	}

	istream& operator>>(istream& in, string& s)
	{
		s.clear();		// discard the old contents first
		char ch;
		ch = in.get();
		while (in && ch != ' ' && ch != '\n')	// also stop at end of input
		{
			s += ch;
			ch = in.get();
		}
		return in;
	}
}

Test:

int main()
{
	LHY::string s("hello world");
	cout << s << endl;

	cin >> s;
	cout << s << endl;

	return 0;
}

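Note:

For this string class, the stream insertion and extraction overloads do not need to be friends: the public interface (iterators, [], +=) gives them everything they need.
Extraction must clear the old contents before reading. It is built on +=, so without the clear it would append to the existing data.
Stream insertion can use a range-based for to simplify the code.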

Reducing reallocations in stream extraction:

With the extraction above, a long input may force s to grow many times. Can we reduce the number of reallocations? Consider the following code:

istream& operator>>(istream& in, string& s)
{
	s.clear();

	char buff[129];	// 129 is the element count
	size_t i = 0;

	char ch;
	ch = in.get();
	while (in && ch != ' ' && ch != '\n')	// also stop at end of input
	{
		buff[i++] = ch;
		if (i == 128)		// i is an index; once 128 characters are stored, buff[128] (the 129th slot) takes the '\0'
		{
			buff[i] = '\0';
			s += buff;
			i = 0;
		}

		ch = in.get();
	}
	
	if (i != 0)
	{
		buff[i] = '\0';
		s += buff;
	}

	return in;
}

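The logic: each time 128 characters have accumulated, append them to s with a single += (at most one reallocation per chunk). The final if handles whatever is left in the buffer: an input shorter than 128 characters (say 100), or the leftover part of a longer one (for 200 characters, the remaining 72).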
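The same chunking idea can be tried out against std::string, which makes it easy to count how many appends happen. This is a sketch under stated assumptions, not the article's implementation: read_token_chunked and appends_for are hypothetical names, the buffer is flushed with an explicit length instead of a terminating '\0', and end of input is treated as a delimiter.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads characters up to the next space, newline, or end of input, appending
// them to out in chunks of at most 128. Returns the number of chunk appends.
std::size_t read_token_chunked(std::istream& in, std::string& out)
{
    out.clear();
    char buff[128];
    std::size_t used = 0, appends = 0;
    char ch;
    while (in.get(ch) && ch != ' ' && ch != '\n')
    {
        buff[used++] = ch;
        if (used == sizeof(buff))          // buffer full: flush it
        {
            out.append(buff, used);
            used = 0;
            ++appends;
        }
    }
    if (used != 0)                         // flush the remainder
    {
        out.append(buff, used);
        ++appends;
    }
    return appends;
}

// Convenience wrapper for trying the function on an in-memory input.
std::size_t appends_for(const std::string& input)
{
    std::istringstream in(input);
    std::string token;
    return read_token_chunked(in, token);
}
```

A 200-character token costs two appends (128 characters, then 72) instead of 200 single-character ones, matching the example in the paragraph above.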

2.9 resize

namespace LHY
{
	class string
	{
	public:
		void resize(size_t n, char ch = '\0')
		{
			if (n <= _size)
			{
				_str[n] = '\0';
				_size = n;
			}
			else
			{
				reserve(n);
				while (_size < n)
				{
					_str[_size++] = ch;
				}

				_str[_size] = '\0';
			}
		}

		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s("hello world");
	s.resize(5);
	cout << s << endl;

	s.resize(7, 'x');
	cout << s << endl;

	return 0;
}

2.10 Assignment

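Let s1 and s2 be two LHY::string objects, and assign s1 to s2. You might worry about capacity: if s2 is too small, should it grow? If it is far too large, should it shrink? Handling those cases separately is asking for trouble, because copying into a different buffer is unavoidable anyway (unless the two capacities happen to match). It is simpler to treat every case the same way: allocate a new buffer with s1's capacity, copy the data into it, then release s2's old buffer and take over the new one.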

namespace LHY
{
	class string
	{
	public:
		string& operator=(const string& s)
		{
			if (this != &s)		// skip self-assignment
			{
				char* tmp = new char[s._capacity + 1];
				strcpy(tmp, s._str);
				delete[] _str;
				_str = tmp;
				_size = s._size;
				_capacity = s._capacity;
			}

			return *this;
		}
			
		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s1("hello world");
	LHY::string s2;
	LHY::string s3;

	s3 = s2 = s1;

	cout << s3 << endl;
	cout << s2 << endl;

	return 0;
}

Note:

Guard against self-assignment by checking this != &s.

2.11 The find family

namespace LHY
{
	class string
	{
	public:
		size_t find(char ch, size_t pos = 0)	// search for character ch starting at pos
		{
			assert(pos < _size);
			for (size_t i = pos; i < _size; i++)
			{
				if (_str[i] == ch)
				{
					return i;
				}
			}

			return npos;		// not found
		}

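		size_t find(const char* sub, size_t pos = 0)	// search for substring sub starting at pos
		{
			assert(pos <= _size);	// keep _str + pos inside the buffer
			const char* p = strstr(_str + pos, sub);	// pointer to the first occurrence, or a null pointer if there is none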
			if (p)
			{
				return p - _str;
			}
			else
			{
				return npos;
			}
		}
		
		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s("hello world");
	cout << s.find('l') << endl;
	cout << s.find('x') << endl;
	return 0;
}

2.12 substr

namespace LHY
{
	class string
	{
	public:
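		string substr(size_t pos, size_t len = npos)	// take len characters starting at pos
		{
			assert(pos <= _size);
			string s;
			if (len == npos || len >= _size - pos)	// take whatever is left; written so that pos + len cannot overflow
			{
				len = _size - pos;
			}
			size_t end = pos + len;

			s.reserve(len);		// allocate the space up front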
			for (size_t i = pos; i < end; i++)
			{
				s += _str[i];
			}

			return s;
		}
		
		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s = "https://blog.csdn.net/weixin_73870552?spm=1000.2115.3001.5343";

	LHY::string sub1, sub2, sub3;
	size_t i1 = s.find(':');
	if (i1 != LHY::string::npos)		// find returns npos when the character is not found
		sub1 = s.substr(0, i1);
	else
		cout << "':' not found" << endl;

	size_t i2 = s.find('/', i1 + 3);	// start searching at i1 + 3
	if (i2 != LHY::string::npos)
		sub2 = s.substr(i1 + 3, i2 - (i1 + 3));	// half-open range: end minus start gives the count
	else
		cout << "'/' not found" << endl;

	sub3 = s.substr(i2 + 1);

	cout << sub1 << endl;
	cout << sub2 << endl;
	cout << sub3 << endl;

	return 0;
}

2.13 swap

namespace LHY
{
	class string
	{
	public:
		void swap(string& s)
		{
			std::swap(_str, s._str);	// just swap the pointers
			std::swap(_size, s._size);
			std::swap(_capacity, s._capacity);
		}
		
		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:	// npos may be used explicitly by callers, so it is public

		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;
}

Test:

int main()
{
	LHY::string s1("hello world");
	LHY::string s2("xxx");
	s2.swap(s1);
	cout << s2 << endl;

	return 0;
}

With swap in place, we can reuse it in many places and let it take over work we previously did by hand.

1. Modern copy constructor

namespace LHY
{
	class string
	{
	public:
		// modern copy constructor
		string(const string& s)
			:_str(nullptr)
			,_size(0)
			,_capacity(0)
		{
			string tmp(s._str);		// calls the const char* constructor
			swap(tmp);				// this->swap(tmp);
		}

		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

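Let us walk through this copy constructor. It uses the constructor to build a temporary, tmp, and then calls swap to exchange tmp with the object that this points to. _str must be initialized, and to be safe _size and _capacity are initialized as well, preferably all to zero. Without the initializers, members of built-in type are left untouched, so _str would hold an indeterminate value. That value would be swapped into tmp, and when tmp goes out of scope its destructor would delete[] an arbitrary address, which will very likely crash.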

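The traditional version allocates the buffer and copies the contents itself; the modern copy constructor hands all of that work to others (swap and the constructor).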

2. Modern assignment

namespace LHY
{
	class string
	{
	public:
		// modern assignment: s2 = s1
		string& operator=(const string& s)
		{
			if (this != &s)
			{
				string tmp(s);	// calls the copy constructor
				swap(tmp);
			}

			return *this;
		}

		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

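Consider the assignment s2 = s1 for two LHY::string objects s1 and s2. In the traditional version we allocate a temporary buffer tmp and copy s1's characters into it, release the buffer that s2's _str used to point to, make _str point to tmp, and then copy s1's _size and _capacity into s2, all by hand.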

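Now we simply call the copy constructor to put all of s1's data into tmp, then swap tmp with s2. When tmp goes out of scope its destructor runs, so we do not even have to release s2's old buffer ourselves. Very satisfying.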

3. The most concise modern assignment

namespace LHY
{
	class string
	{
	public:
		string& operator=(string tmp)
		{
			swap(tmp);

			return *this;
		}

		// ...
		
	private:
		char* _str;
		size_t _size;
		size_t _capacity;
	};
}

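No self-assignment check is needed, and the pattern is general: once a class has a correct copy constructor (and a swap), it can implement assignment this way.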
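To illustrate that the pattern carries over to other classes, here is a sketch that applies it to a small dynamic array of int. IntArray is a hypothetical class written only for this example; it is not part of the string implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class used only to show the pattern on something other than string.
class IntArray
{
public:
    explicit IntArray(std::size_t n, int value = 0)
        : _data(new int[n]), _size(n)
    {
        std::fill(_data, _data + _size, value);
    }

    IntArray(const IntArray& other)
        : _data(new int[other._size]), _size(other._size)
    {
        std::copy(other._data, other._data + _size, _data);
    }

    // Pass by value: the copy constructor builds tmp, then we swap with it.
    // tmp's destructor releases the buffer this object used to own.
    IntArray& operator=(IntArray tmp)
    {
        swap(tmp);
        return *this;
    }

    ~IntArray() { delete[] _data; }

    void swap(IntArray& other)
    {
        std::swap(_data, other._data);
        std::swap(_size, other._size);
    }

    std::size_t size() const { return _size; }
    int& operator[](std::size_t i) { assert(i < _size); return _data[i]; }

private:
    int* _data;
    std::size_t _size;
};
```

Assigning an IntArray to itself is harmless here as well: the parameter is a separate copy, so swapping with it leaves the contents unchanged, at the cost of one unnecessary copy.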

3. The complete string implementation

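Here declarations and definitions are written separately, but they stay in the same namespace and in one file rather than being split across files. Because the out-of-class definitions are not inline, this header should be included from only one .cpp file.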

The .h header file:

#pragma once
#include<iostream>
#include<stdio.h>
#include<string.h>	// strlen, strcpy, strcmp, strstr
#include<assert.h>
#include<utility>	// std::swap
using namespace std;

namespace LHY
{
	class string
	{
	public:
		// iterators
		typedef char* iterator;
		typedef const char* const_iterator;

		iterator begin() { return _str; }
		iterator end() { return _str + _size; }
		const_iterator begin() const { return _str; }
		const_iterator end() const { return _str + _size; }

		// size() and capacity()
		size_t size() const { return _size; }
		size_t capacity() const { return _capacity; }

		const char* c_str() const { return _str; }

		// capacity management
		void reserve(size_t n);
		void resize(size_t n, char ch);

		// insertion and removal
		void push_back(char ch);
		void append(const char* str);
		void insert(size_t pos, char ch);
		void insert(size_t pos, const char* str);
		void erase(size_t pos, size_t len);

		// operator overloads
		char& operator[](size_t pos);
		const char& operator[](size_t pos) const;
		string& operator+=(char ch);
		string& operator+=(const char* str);
		string& operator=(const string& s);
		bool operator<(const string& s) const;
		bool operator==(const string& s) const;
		bool operator<=(const string& s) const;
		bool operator>(const string& s) const;
		bool operator>=(const string& s) const;

		// searching
		size_t find(char ch, size_t pos);
		size_t find(const char* sub, size_t pos);
		string substr(size_t pos, size_t len);

		string()	// handles the empty-string case
			:_str(new char[1] {'\0'})
			, _size(0)
			, _capacity(0)
		{}

		string(const char* str)		// initialize from a C string
			:_size(strlen(str))
			, _capacity(_size)
		{
			_str = new char[_capacity + 1];	// the extra byte holds the '\0'
			strcpy(_str, str);
		}

		string(const string& s)
			:_str(nullptr)
			, _size(0)
			, _capacity(0)
		{
			string tmp(s._str);		
			swap(tmp);
		}

		~string()
		{
			delete[] _str;
			_str = nullptr;
			_size = _capacity = 0;
		}

		void clear()
		{
			_str[0] = '\0';
			_size = 0;
		}

		void swap(string& s)
		{
			std::swap(_str, s._str);	// just swap the pointers
			std::swap(_size, s._size);
			std::swap(_capacity, s._capacity);
		}

	private:
		char* _str;
		size_t _size;
		size_t _capacity;

	public:
		//const static size_t npos = -1; // special case: only a const static member of integral type can be initialized in-class
		const static size_t npos;
	};

	const size_t string::npos = -1;

	ostream& operator<<(ostream& out, const string& s)
	{
		for (auto ch : s)
			out << ch;
		return out;
	}

	istream& operator>>(istream& in, string& s)
	{
		s.clear();

		char buff[129];	// 129 is the element count
		size_t i = 0;

		char ch;
		ch = in.get();
		while (in && ch != ' ' && ch != '\n')	// also stop at end of input
		{
			buff[i++] = ch;
			if (i == 128)		// i is an index; once 128 characters are stored, buff[128] (the 129th slot) takes the '\0'
			{
				buff[i] = '\0';
				s += buff;
				i = 0;
			}

			ch = in.get();
		}

		if (i != 0)
		{
			buff[i] = '\0';
			s += buff;
		}

		return in;
	}

	void string::reserve(size_t n)
	{
		if (n > _capacity)	// when n <= _capacity, reserve neither shrinks nor grows
		{
			char* tmp = new char[n + 1];	// one extra byte for the '\0'
			strcpy(tmp, _str);
			delete[] _str;
			_str = tmp;
			_capacity = n;
		}
	}

	void string::resize(size_t n, char ch = '\0')
	{
		if (n <= _size)
		{
			_str[n] = '\0';
			_size = n;
		}
		else
		{
			reserve(n);
			while (_size < n)
			{
				_str[_size++] = ch;
			}

			_str[_size] = '\0';
		}
	}

	void string::push_back(char ch)
	{
		if (_size == _capacity)
		{
			reserve(_capacity == 0 ? 4 : _capacity * 2);
		}

		_str[_size] = ch;
		++_size;
		_str[_size] = '\0';
	}

	void string::append(const char* str)
	{
		size_t len = strlen(str);
		if (_size + len > _capacity)
		{
			reserve(_size + len);
		}

		strcpy(_str + _size, str);
		_size += len;
	}

	void string::insert(size_t pos, char ch)
	{
		assert(pos <= _size);		// pos == _size is an append
		if (_size == _capacity)
		{
			reserve(_capacity == 0 ? 4 : _capacity * 2);	// handle _capacity == 0, as in push_back
		}

		size_t end = _size + 1;
		while (end > pos)
		{
			_str[end] = _str[end - 1];
			--end;
		}
		_str[pos] = ch;
		_size++;
	}

	void string::insert(size_t pos, const char* str)
	{
		assert(pos <= _size);
		size_t len = strlen(str);
		if (_size + len > _capacity)
		{
			reserve(_size + len);
		}

		// shift the existing data
		size_t end = _size + 1;
		while (end > pos)
		{
			_str[end + len - 1] = _str[end - 1];
			--end;
		}

		// insert
		for (size_t i = 0; i < len; i++)
		{
			_str[pos++] = str[i];
		}

		_size += len;
	}

	void string::erase(size_t pos, size_t len = npos)
	{
		assert(pos < _size);
		if (len == npos || len >= _size - pos)	// written this way so that pos + len cannot overflow
		{
			_str[pos] = '\0';
			_size = pos;
		}
		else
		{
			size_t begin = pos + len;
			while (begin <= _size)
			{
				_str[begin - len] = _str[begin];
				begin++;
			}
			_size -= len;
		}
	}

	char& string::operator[](size_t pos)
	{
		assert(pos < _size);
		return _str[pos];
	}

	const char& string::operator[](size_t pos) const
	{
		assert(pos < _size);
		return _str[pos];
	}

	string& string::operator+=(char ch)
	{
		push_back(ch);
		return *this;
	}

	string& string::operator+=(const char* str)
	{
		append(str);
		return *this;
	}

	string& string::operator=(const string& s)
	{
		if (this != &s)
		{
			char* tmp = new char[s._capacity + 1];
			strcpy(tmp, s._str);
			delete[] _str;
			_str = tmp;
			_size = s._size;
			_capacity = s._capacity;
		}

		return *this;
	}

	bool string::operator<(const string& s) const 
	{ 
		return strcmp(_str, s.c_str()) < 0;
	}

	bool string::operator==(const string& s) const
	{
		return strcmp(_str, s.c_str()) == 0;
	}

	bool string::operator<=(const string& s) const
	{
		return *this < s || *this == s;
	}

	bool string::operator>(const string& s) const
	{
		return !(*this <= s);
	}

	bool string::operator>=(const string& s) const
	{
		return !(*this < s);
	}

	size_t string::find(char ch, size_t pos = 0)
	{
		assert(pos < _size);
		for (size_t i = pos; i < _size; i++)
		{
			if (_str[i] == ch)
			{
				return i;
			}
		}

		return npos;		// not found
	}

	size_t string::find(const char* sub, size_t pos = 0)
	{
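		assert(pos <= _size);	// keep _str + pos inside the buffer
		const char* p = strstr(_str + pos, sub);	// pointer to the first occurrence, or a null pointer if there is none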
		if (p)
		{
			return p - _str;
		}
		else
		{
			return npos;
		}
	}

	string string::substr(size_t pos, size_t len = npos)	// take len characters starting at pos
	{
		assert(pos <= _size);
		string s;
		if (len == npos || len >= _size - pos)	// take whatever is left; written so that pos + len cannot overflow
		{
			len = _size - pos;
		}
		size_t end = pos + len;

		s.reserve(len);		// allocate the space up front
		for (size_t i = pos; i < end; i++)
		{
			s += _str[i];
		}

		return s;
	}
}
