Data-Structure8

Before：字符串是琴弦，想弹琴了🎸🎹🪕

Data Structure 8 字符串

字符串的定义

字符串是由若干个字符按照一定顺序组合而成，如果把单个字符看作一个元素，则可把字符串看成是一个字符类型的线性表。但区别在于，线性表中的个体大多相互独立，强调的是对表中某个元素的操作，而字符串更强调的是整体的操作，是对多个字符串的同时操作。关于字符串的基本操作有以下几种：
1.求字符串中元素个数length(s)
2.字符串输出disp(s)
3.判断两个字符串相等equal(s1,s2)、大于greater(s1,s2)大于等greaterEqual(s1, s2)、小于less(s1,s2)小于等于lessEqual(s1,s2)，按字母序判断两个字符串的大小，返回true或者false
4.字符串赋值copy(s1,s2)，将字符串s2赋值给另一个字符串s1。如t的值“SHANGHAI”,s的值是“UNIVERSITY”，执行copy(t,s)后，t的值变为“UNIVERSITY”。
5.字符串连接cat(s1,s2)，将字符串s2中的字符序列连接到字符串s1的字符序列之后。
6.取子串substr(s,start,len)，在字符串 s 中从start开始取长度为len的子串。
7.字符串插入insert(s1, start, s2)
8.删除子串remove(s, start, len)
9.查找子串search(s1,s2)

字符串的顺序实现

字符串本质上是一个线性表，因而可以采用顺序存储。我们要做的是创建一个字符类型的数组。我们常说C风格字符串和C++字符串，那么C语言和C++在字符串这一数据结构的处理上有何区别呢？

C风格字符串与C++字符串对比

而更为底层的区别在于，C语言的字符串是采用静态的顺序存储，使用一个以null(‘\0’)字符结尾的字符数组来保存字符串，而C++中则把字符串封装成了一种数据类型string，采用动态的顺序存储，并用运算符重载实现了赋值、连接、比较等操作，使字符串类型的变量能与整型、实型等内置类型的变量一样用运算符操作。

那么顺序串的存储实现采用一个动态的字符数组（一个动态数组自然需要一个指向数组首地址的指针和数组的大小两个量），但由于C++字符串必须以’\0’结尾，故不管该字符数组后面还有多少元素，一旦遇到’\0’，即终止，故字符串类的动态字符数组不需要记录数组的大小。

字符串类的顺序实现

在上代码前，让我们先明晰一下，字符串类的实现要点：

1.构造函数：接受一个字符串常量作为参数。构造函数会动态分配一个数组来存储这个字符串。
2.拷贝构造函数和析构函数：由于使用了动态内存分配，需要定义拷贝构造函数来正确处理对象的拷贝，以及析构函数来释放分配的内存。（回顾《程序设计思想与方法》，存在动态内存分配时，简单的浅拷贝（按位复制）会导致多个对象共享同一块内存。这可能导致析构时多次释放同一内存（double free），引发未定义行为。）
3.运算符的重载（字符串操作包含很多运算符运算）：输出、赋值、比较、输出。（回顾《程序设计思想与方法》，运算符重载通常定义为友元函数，出于对称性的考虑，即如果其中一个操作数不是类的对象，重载为成员函数可能会导致不对称性；也出于自然语法中输入输出运算符的左操作数通常是流对象，而不是自定义类的对象。）

下面给出字符串类的定义：

class seqString{
    friend seqString operator+(const seqString &s1,const seqString &s2);
    friend bool operator==(const seqString &s1,const seqString &s2);
    friend bool operator!=(const seqString &s1,const seqString &s2);
    friend bool operator>(const seqString &s1,const seqString &s2);
    friend bool operator>=(const seqString &s1,const seqString &s2);
    friend bool operator<(const seqString &s1,const seqString &s2);
    friend bool operator<=(const seqString &s1,const seqString &s2);
    friend ostream &operator<<(ofstream &os,const seqString &s);

    char *data;
    int len;

    public:
        seqString(const char *s = "");
        seqString(const seqString &other);
        ~seqString();
        int length() const;
        // 补充说明一下这里添加const的原因：
        //  保证函数不会修改对象；允许在常量对象上调用
        seqString &operator=(const seqString&other);
        seqString subStr(int start,int num);
        void insert(int start,const seqString &s);
        void remove(int start,int num);
};

** 有趣的是，这里字符串类居然没有使用类模板？！不会是因为elemType都是char吧🤣🤣🤣

接下来就是字符串类的具体实现了，话不多说，上代码！

seqString::seqString(const char*s){
    int i;
    for(i = 0;;i++){
        if(s[i] == '\0'){
            break;
        }
    }
    len = i;
    data = new char[len + 1];
    for(i = 0;i <= len;i ++){
        data[i] = s[i];
    }
}

seqString::seqString(const seqString &other){
    data = new char[other.len + 1];
    for(int i = 0 ; i <= other.len;i ++){
        data[i] = other.data[i];
    }
    len = other.len;
}

seqString::~seqString(){
    delete []data;
}

int seqString::length() const{
    return len;
}

seqString &seqString::operator==(const seqString&other){
    if(this == &other){
        return *this;
    }
    delete []data;
    len = other.len;
    data = new char[len + 1];
    for(int i = 0;i <= len;i ++){
        data[i] = other.data[i];
    }
    return *this;
}

seqString seqString::substr(int start,int num){
    if(start < 0 || start >= len){
        return "";
    }
    seqString s;
    if(start + num > len){
        s.len = len - start;
    }else{
        s.len = num;
    }
    delete []s.data;
    s.data = new char[s.len + 1];
    for(int i = 0; i < s.len;i ++){
        s.data[i] = data[start + i];
    }
    s.data[s.len] = '\0';
    return s;
}

void seqString::insert(int start,const seqString &s){
    char *tmp = data;
    if(start > len || start < 0){
        return;
    }
    len += s.len;
    data = new char[len + 1];
    for(int i = 0;i < start; i ++){
        data[i] = tmp[i];
    }
    for(int i = start; i < start + s.len;i ++){
        data[i] = s.data[i - start];
    }
    for(int i = start + s.len; i <= len; i ++){
        data[i] = tmp[i - s.len];
    }
    
    delete tmp;
}

void seqString::remove(int start,int num){
    if(start < 0 || start >= len){
        return;
    }
    if(start + num >= len){
        data[start] = '\0';
        len = start;
        return;
    }
    for(int i = start + num; i < len; i ++){
        data[i - num] =data[i];
    }
    len -= num;
    data[len] = '\0';
}

seqString operator+(const seqString &s1,const seqString &s2){
    seqString tmp;
    tmp.len = s1.len + s2.len;
    delete []tmp.data;
    tmp.data = new char[tmp.len + 1];
    for(int i = 0 ; i < s1.len; i ++){
        tmp.data[i] = s1.data[i];
    }
    for(int i = 0; i < s2.len; i ++){
        tmp.data[s1.len + i] = s2.data[i];
    }
    tmp.data[tmp.len] = '\0';
    return tmp;
}

bool operator==(const seqString &s1,const seqString &s2){
    if(s1.len != s2.len){
        return false;
    }
    for(int i = 0 ; i <= s1.len; i ++){
        if(s1.data[i] != s2.data[i]){
            return false;
        }
    }
    return true;
}

bool operator!=(const seqString &s1,const seqString &s2){
    if(s1 == s2){
        return false;
    }
    return true;
}

bool operator>(const seqString &s1,const seqString &s2){
    for(int i = 0 ; i < s1.len; i ++){
        if(s1.data[i] > s2.data[i]){
            return true;
        }else if(s1.data[i] < s2.data[i]){
            return false;
        }
    }
    return false;
}

bool operator>=(const seqString &s1,const seqString &s2){
    if(s1 > s2){
        return true;
    }
    if(s1 == s2){
        return true;
    }
    return false;
}

bool operator<(const seqString &s1,const seqString &s2){
    if(s1 >= s2){
        return false;
    }
    return true;
}

bool operator<=(const seqString &s1,const seqString &s2){
    if(s1 > s2){
        return false;
    }
    return true;
}

ostream &operator<<(ofstream &os,const seqString &s){
    for(int i = 0; i < s.len; i ++){
        os << s.data[i];
    }
    return os;
}

字符串类的链接实现

正常的链表理应当在一个结点中存储一个字符，这种存储方式使insert和remove操作容易实现，但太浪费空间。在每个结点中，数据只占一个字节，而指针却要占多个字节。为了提高空间利用率，可使每个结点存放多个字符，称为块状链表😅😅😅😅😅流汗黄豆贴满了，应该能看出JaneZ对块状链表的无语了吧，一个差点要了我命的数据结构，《重生之JaneZ要斩了块状链表》，即将上演。

下图是一张块状链表的示意图：

块状链表存储字符串
块状链表提高了空间的利用率，但插入和删除时会引起数据的大量移动.例如在上图中的字符串中删除”C”，所有结点的数据都要发生变化。数据插入也是如此。为了提高插入和删除的效率，块状链表通常允许结点有一定的空闲空间。

如在上图的字符串中删除”C”只需在第一个结点中删除”C”，其他结点保持不变。如要删除”EFGI”只需删除第2个结点，并在原第3个结点中删除”I”。当需要在”F”后插入字符串”UVXYZ”时，先形成两个新的结点”UVX”和”YZ”，然后将结点”EFG”分裂成两个结点”EF”和”G”，将两个新结点插入它们之间。为了保证块状链表不退化成单个字符的链表，检查新插入的最后一个结点和后面一个结点能否合并成一个结点。

块状链表执行insert操作具体过程

一些实现要点：链接串类的存储采用带头结点的块状链表。由于采用链接存储，在链接串类中定义了一个私有的内嵌类node，即链表中的结点类。每个结点由3部分组成：结点中的有效字符数、保存字符串的字符数组以及一个指向后继结点的指针。链接串类有 3 个数据成员：指向头结点的指针、字符串的长度以及每个结点的容量。

下面给出代码实现：

class linkString{
    friend linkString operator+(const linkString &s1,const linkString &s2);
    friend bool operator==(const linkString &s1,const linkString &s2);
    friend bool operator!=(const linkString &s1,const linkString &s2);
    friend bool operator>(const linkString &s1,const linkString &s2);
    friend bool operator>=(const linkString &s1,const linkString &s2);
    friend bool operator<(const linkString &s1,const linkString &s2);
    friend bool operator<=(const linkString &s1,const linkString &s2);
    friend ostream &operator<<(ofstream &os,const linkString &s);

    struct Node{
        int size;
        Node *next;
        char *data;

        Node(int s = 1,Node *n = nullptr){
            data = new char[s];
            next = n;
            size = s;
        }
    };

    Node *head;
    int len;
    int nodeSize;

    void clear();
    //释放块状链表的存储空间
    void findPos(int start,int &pos,Node *&p) const;
    //找到第start个字符所在的结点地址p以及在结点中的位置pos 
    void split(Node *p,int pos);
    //split函数将指针p指向的结点以位置pos为界分裂成两个结点。
    void merge(Node *p);
    //merge函数检查p指向的结点是否能与它的直接后继合并成一个结点。 

    public:
        linkString(const char *s = "");
        linkString(const linkString &other);
        ~linkString();
        int length() const;
        linkString &operator=(const linkString&other);
        linkString subStr(int start,int num);
        void insert(int start,const linkString &s);
        void remove(int start,int num);
};

关于块状链表的性能，研究表明（事实上好像未必），块状链表的结点容最与结点个数相同时算法的效率是最高的。所以将结点数量设为 $\sqrt{len}$

具体实现：

linkString::linkString(const char *s = ""){
    Node *p;

    int i;
    for(i = 0;;i++){
        if(s[i] == "\0"){
            break;
        }
    }
    len = i;
    nodeSize = sqrt(len);

    while(*s){
        p = p -> next = new Node(1);
        for( ; p ->size < nodeSize && *s; p -> size ++,s ++){
            p -> data[p->size] = *s;
        }
    }
}

linkString::linkString(const linkString &other){
    Node *p = head = new Node(1);
    Node *otherp = other.head -> next;
    while(otherp){
        p = p -> next = new Node(1);
        for(;p -> size < other.nodeSize && otherp -> data[p -> size];p -> size ++, otherp ++){
            p -> data[p -> size] = otherp -> data[p -> size];
        }
        otherp = otherp -> next;
    }
    len = other.len;
    nodeSize = other.nodeSize;
}

void clear(){
    Node *p = head -> next;
    Node *tmp;
    while(p){
        tmp = p;
        p = p -> next;
        delete tmp;
    }
}

~linkString(){
    clear();
    delete head;
}

int linkString::length() const{
    return len;
}

linkString &linkString::operator=(const linkString&other){
    if(other == this){
        return *this;
    }

    clear();
    Node *p = head;
    Node *otherp = other.head -> next;
    while(otherp){
        p = p -> next = new Node(1);
        for(;p->size < other.nodeSize && otherp->data[p->size];p -> size ++, otherp->data ++){
            p->data[p->size] = otherp->data[p->size];
        }
        otherp = otherp -> next;
    }
    len = other.len;
    nodeSize = other.nodeSize;
    return *this;
}

void linkString::findPos(int start,int &pos,Node *&p) const{
    if(start < 0 || start >= len){
        return;
    }
    p = head -> next;
    int count = 0;

    while(p){
        if(start - count > p.size){
            count += p.size;
            p = p -> next;
        }else{
            pos = start - count;
            return;
        }
    }
}

linkString linkString::subStr(int start,int num){
    linkString tmp;
    if(start < 0 || start >= len){
        return tmp;
    }
    Node *p;
    Node *tmpp = tmp -> head;
    int pos;

    if(start + num > len){
        num = len - start ;
    }
    findPos(start,pos,p);
    tmp.len = num;
    tmp.nodeSize = sqrt(num);
    for(int i = 0 ; i < num;){
        tmpp = tmpp -> next = new Node(nodeSize);

        for(;tmpp -> size < tmp.nodeSize && i < tmp.len;i ++,tmpp->size ++){
            if(pos == p -> size){
                p = p -> next;
                pos = 0;
            }
            tmpp -> data[tmpp -> size] = p -> data[pos ++];
        }
    }

    return tmp;
}

void linkString::split(Node *p,int pos){
    p -> next = new node(nodeSize,p -> next);

    for(int i = pos; i < p -> size; i ++){
        p -> next -> data[i - pos] = p -> next[i];
    }
    p -> next -> size = p -> size - pos ;
    p -> size = pos;
}

void linkString::merge(Node *p){
    NOde *tmp = p -> next;
    if(tmp -> size + p -> size <= nodeSize){
        for(int i = p -> size; i < tmp->size + p -> size; i ++){
            p -> data[i] = tmp -> data[i - p -> size];
        }
        p -> size = tmp -> size + p -> size;
        p -> next = tmp -> next;
        delete tmp;
    }
}

⭐下面是比较重要的insert和remove的实现：

void linkString::insert(int start,const linkString &s){
    if(start < 0||start >= len){
        return;
    }

    Node *p;
    Node *tmp;
    Node *stmp = s.head -> next;
    int pos;

    findpos(start,pos,p);
    split(p,pos);
    tmp = p -> next;

    while(stmp){
        for(pos = 0;pos < stmp -> size;pos ++){
            if(pos == p -> size){
                p = p -> next = new Node(nodeSize);
            }
            p -> data[p -> size] = stmp -> data[pos];
            p -> size ++;
        }
        stmp = stmp -> next;
    }

    len += s.len;
    merge(p);
    p -> next = tmp;
}

void linkString::remove(int start,int num){
    if(start < 0||start >= len){
        return;
    }
    Node *p;
    int pos;
    findPos(start,pos,p);
    split(p,pos);

    if(start + num > len){
        num = len - start;
        len = start;
    }else{
        len -= num;
    }

    while(true){
        Node *tmp = p -> next;
        if(num - tmp->size > 0){
            num -= tmp -> size;
            p -> next = tmp -> next;
            delete tmp;
        }else{
            split(tmp,num);
            p -> next = tmp -> next;
            delete tmp;
            break;
        }
    }
    merge(p);
}

重载运算符部分与顺序实现类似，较容易，就不写了（bushi）。又一遍块状链表真是把人快搞死了😅，真难写啊呜呜呜…字符串相关算法（kmp，字符串哈希等）上学期上机课有过介绍，就不会再出现了（偷懒😋）

Reference

CSDN:https://blog.csdn.net/yf210yf/article/details/8777131
CSDN:https://blog.csdn.net/tuolaji8/article/details/51362698

Data Structure

#Data Structure #C++ #String

Data-Structure8

http://example.com/2025/02/20/Data-Structure8/

Author

Yihan Zhu

Posted on

February 20, 2025

Updated on

February 21, 2025

Licensed under

Mathematical-Logic1 Previous

数据结构2 Next