字符串专题总结

xiaoxiao2021-02-28 3

这部分主要涉及了对AC自动机的理解，和KMP的理解

注意点：模板～！！

AC自动机模板和KMP的模板

字典树的建立

KMP next 数组也是非常重要

next数组性质：

1、根据KMP的next函数的性质，已知字符串t第K个字符的next[k],那么d=k-next[k],如果k%d==0,那么t[1……k]最多可均匀的分成k/d份。也就是可以生成一个长度为d的重复度为k/d的字串

A - Oulipo_KMP

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait Pair normal, mais tout s’affirmait faux. Tout avait Fair normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : stir son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000. Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input 3 BAPC BAPC AZA AZAZAZA VERDI AVERDXIVYERDIAN Sample Output 1 3 0

简单的KMP匹配

#include<stdio.h> #include<iostream> #include<cmath> #include<stdlib.h> #include<string> #include<cstring> #define MOD 1000000007 typedef long long ll; using namespace std; const int N = 1000005; void kmp_pre(char x[],int m,int next[]) { int i,j; j=next[0]=-1; i=0; while(i<m) { while(-1!=j && x[i]!=x[j]) j=next[j]; next[++i]=++j; } } int KMP_Count(char x[],int m,char y[],int n) { int next1[N*2]; memset(next1,0,sizeof(next1)); int i,j; int ans=0; kmp_pre(x,m,next1); i=j=0; while(i<n) { while(-1!=j && y[i]!=x[j]) j=next1[j]; i++;j++; if(j>=m) { ans++; j=next1[j]; } } return ans; } char w[10005]; char s[1000005]; int t; int main(){ // freopen("data.txt","r",stdin); scanf("%d",&t); while(~scanf("%s %s",w,s)){ printf("%d\n",KMP_Count(w,strlen(w),s,strlen(s))); } return 0; }

B - 统计难题_字典树

Ignatius最近遇到一个难题,老师交给他很多单词(只有小写字母组成,不会有重复的单词出现),现在老师要他统计出以某个字符串为前缀的单词数量(单词本身也是自己的前缀). Input 输入数据的第一部分是一张单词表,每行一个单词,单词的长度不超过10,它们代表的是老师交给Ignatius统计的单词,一个空行代表单词表的结束.第二部分是一连串的提问,每行一个提问,每个提问都是一个字符串. 注意:本题只有一组测试数据,处理到文件结束. Output 对于每个提问,给出以该字符串为前缀的单词的数量. Sample Input banana band bee absolute acm ba b band abc Sample Output 2 3 1 0

字典树入门题，先建一个字典树，然后遍历求解就好。

#include <cstdio> #include <iostream> #include <cmath> #include <cstdlib> #include <string> #include <cstring> #define MOD 1000000007 typedef long long ll; using namespace std; const int N = 1000005; struct Trie { int v;//v可以根据实际情况任意变化，在这里v是每个字母的次数; Trie *next[26]; }; Trie root; void createTrie(char *str)//建立字典树; { int len=strlen(str); Trie *p=&root,*q; for(int i=0;i<len;i++) { int id=str[i]-'a'; if(p->next[id]==NULL) { q=(Trie *)malloc(sizeof(root));//申请一块新内存; q->v=1;//v遇到新字母每一层都初始化为1; for(int j=0;j<26;j++) q->next[j]=NULL; p->next[id]=q; p=p->next[id]; } else { p->next[id]->v++;//当第一个输入的字符串和后面又相等的时候，v++; p=p->next[id]; } } } int findTrie(char *str)//在字典树里查询; { int len=strlen(str); Trie *p=&root; for(int i=0;i<len;i++) { int id=str[i]-'a'; p=p->next[id]; if(p==NULL) return 0; } return p->v;//相同的字母个数; } char str[1000000]; int t; int main(){ // freopen("data.txt","r",stdin); while(gets(str) && str[0]!='\0') { createTrie(str); } while(cin>>str) { int ans=findTrie(str); cout<<ans<<endl; } return 0; }

C - Keywords Search

In the modern time, Search engine came into the life of everybody like Google, Baidu, etc. Wiskey also wants to bring this feature to his image retrieval system. Every image have a long description, when users type some keywords to find the image, the system will match the keywords with description of image and show the image which the most keywords be matched. To simplify the problem, giving you a description of image, and some keywords, you should tell me how many keywords will be match. Input First line will contain one integer means how many cases will follow by. Each case will contain two integers N means the number of keywords and N keywords follow. (N <= 10000) Each keyword will only contains characters 'a'-'z', and the length will be not longer than 50. The last line is the description, and the length will be not longer than 1000000. Output Print how many keywords are contained in the description. Sample Input 1 5 she he say shr her yasherhs Sample Output 3

入门AC自动机

#include <iostream> #include <cstdio> #include <cstdlib> #include <cmath> #include <iomanip> #include <algorithm> #include <climits> #include <cstring> #include <string> #include <set> #include <map> #include <queue> #include <stack> #include <vector> #include <list> #define rep(i,m,n) for(int i=m;i<=n;i++) #define rsp(it,s) for(set<int>::iterator it=s.begin();it!=s.end();it++) const int inf_int = 2e9; const long long inf_ll = 2e18; #define inf_add 0x3f3f3f3f #define MOD 1000000007 #define pb push_back #define mp make_pair #define fi first #define se second #define pi acos(-1.0) #define pii pair<int,int> #define Lson L, mid, rt<<1 #define Rson mid+1, R, rt<<1|1 const int maxn=5e2+10; using namespace std; typedef vector<int> vi; typedef long long ll; typedef unsigned long long ull; inline int read(){int ra,fh;char rx;rx=getchar(),ra=0,fh=1; while((rx<'0'||rx>'9')&&rx!='-')rx=getchar();if(rx=='-') fh=-1,rx=getchar();while(rx>='0'&&rx<='9')ra*=10,ra+=rx-48, rx=getchar();return ra*fh;} //#pragma comment(linker, "/STACK:102400000,102400000") ll gcd(ll p,ll q){return q==0?p:gcd(q,p%q);} ll qpow(ll p,ll q){ll f=1;while(q){if(q&1)f=f*p;p=p*p;q>>=1;}return f;} int dir[4][2]={{-1,0},{1,0},{0,-1},{0,1}}; const int N = 1e6+5; const int M = 27; class Trie { public: //next数组存储树 //fail数组存储下一个要匹配的节点号 //end数组主要是用来标记一个模式串结尾 int next[N][M],fail[N],end[N]; int root,L; int newnode() { for(int i = 0;i < M;i++)//每一个节点对应0-128中的任意一个。 next[L][i] = -1; end[L++] = 0;//表示下面没有节点初始化，如果是记录次数，就赋0 还可以赋任意的数， return L-1; } void init() { L = 0; root = newnode(); } void insert(char s[],int id) { int len = strlen(s); int now = root; for(int i = 0;i < len;i++) { int k =s[i]-'a'; if(next[now][k] == -1) next[now][k] = newnode(); now=next[now][k]; } // end[now]=id;//记录当前匹配单词的节点 end[now]++;//也可以用匹配单词结束后来记录次数 } //BFS求fail void build() { queue<int>Q; fail[root] = root; //初始化root及其子节点 for(int i = 0;i < M;i++) if(next[root][i] == -1) next[root][i] = root; else { fail[next[root][i]] = root; Q.push(next[root][i]); } while(!Q.empty()) { int now = Q.front(); Q.pop(); //遍历节点 for(int i = 0;i < M;i++) if(next[now][i] == -1) //不需要失配函数，对所有转移一视同仁 next[now][i] = next[fail[now]][i]; else { fail[next[now][i]] = next[fail[now]][i]; Q.push(next[now][i]); } } } void query(char buf[]) { int ans=0; int len = strlen(buf); int now = root; bool flag = false; for(int i = 0;i < len;i++) { int k =buf[i]-'a'; now = next[now][k]; int temp = now; //其会匹配多个模式串 while(temp != root) { ans+=end[temp]; end[temp] = 0; temp = fail[temp]; } } printf("%d\n",ans); } }; char buf[1000005]; Trie ac; int T; int main() { // freopen("data.txt","r",stdin); int n,m; scanf("%d",&T); for(int k=0;k<T;k++) { scanf("%d",&n); ac.init(); for(int i = 1;i <= n;i++) { scanf("%s",buf); ac.insert(buf,i); } ac.build(); scanf("%s",buf); ac.query(buf); } return 0; }

D - Power Strings

Given two strings a and b we define a*b to be their concatenation. For example, if a = "abc" and b = "def" then a*b = "abcdef". If we think of concatenation as multiplication, exponentiation by a non-negative integer is defined in the normal way: a^0 = "" (the empty string) and a^(n+1) = a*(a^n). Input Each test case is a line of input representing s, a string of printable characters. The length of s will be at least 1 and will not exceed 1 million characters. A line containing a period follows the last test case. Output For each s you should print the largest n such that s = a^n for some string a. Sample Input abcd aaaa ababab . Sample Output 1 4 3 Hint This problem has huge input, use scanf instead of cin to avoid time limit exceed.

这是一个kmp比较神奇的例子

根据KMP的next函数的性质，已知字符串t第K个字符的next[k],那么d=k-next[k],如果k%d==0,那么t[1……k]最多可均匀的分成k/d份。也就是可以生成一个长度为d的重复度为k/d的字串。

#include <iostream> #include <cstdio> #include <cstdlib> #include <cmath> #include <iomanip> #include <algorithm> #include <climits> #include <cstring> #include <string> #include <set> #include <map> #include <queue> #include <stack> #include <vector> #include <list> #define rep(i,m,n) for(int i=m;i<=n;i++) #define rsp(it,s) for(set<int>::iterator it=s.begin();it!=s.end();it++) const int inf_int = 2e9; const long long inf_ll = 2e18; #define inf_add 0x3f3f3f3f #define MOD 1000000007 #define pb push_back #define mp make_pair #define fi first #define se second #define pi acos(-1.0) #define pii pair<int,int> #define Lson L, mid, rt<<1 #define Rson mid+1, R, rt<<1|1 const int maxn=5e2+10; using namespace std; typedef vector<int> vi; typedef long long ll; typedef unsigned long long ull; inline int read(){int ra,fh;char rx;rx=getchar(),ra=0,fh=1; while((rx<'0'||rx>'9')&&rx!='-')rx=getchar();if(rx=='-') fh=-1,rx=getchar();while(rx>='0'&&rx<='9')ra*=10,ra+=rx-48, rx=getchar();return ra*fh;} //#pragma comment(linker, "/STACK:102400000,102400000") ll gcd(ll p,ll q){return q==0?p:gcd(q,p%q);} ll qpow(ll p,ll q){ll f=1;while(q){if(q&1)f=f*p;p=p*p;q>>=1;}return f;} int dir[4][2]={{-1,0},{1,0},{0,-1},{0,1}}; const ll INF = 1e18+5; int next1[1000005]; void kmp_pre(char x[],int m,int next[]) { int i,j; j=next[0]=-1; i=0; while(i<m) { while(-1!=j && x[i]!=x[j]) j=next[j]; //根据当前计算下一个字符的next //abab //aba -> b next[++i]=++j; } } int KMP_Count(char x[],int m,char y[],int n) { int next1[N*2]; memset(next1,0,sizeof(next1)); int i,j; int ans=0; kmp_pre(x,m,next1); i=j=0; //i为左指针 while(i<n) { while(-1!=j && y[i]!=x[j]) j=next1[j]; i++;j++; if(j>=m) { ans++; j=next1[j]; } } return ans; } char buf[1000005]; int main() { // freopen("data.txt","r",stdin); while(~scanf("%s",buf)) { memset(next1,0,sizeof(next1)); if(buf[0]=='.'&&buf[1]=='\0') break; int len = strlen(buf); kmp_pre(buf,len,next1); int m = len-next1[len]; if(len%m==0) { // int f= 0; // for(int i=len-1;i>m;i-=m) // { // if(i-next1[i]!=m) // { // f =1; // break; // } // } // // if(f) // { // printf("%d\n",1); // } // else { printf("%d\n",len/m); } } else { printf("%d\n",1); } // for(int i=0;i<=len;i++) // { // cout << next1[i]<<" "; // } // cout <<endl; } return 0; }

F - Knight Moves

A friend of you is doing research on the Traveling Knight Problem (TKP) where you are to find the shortest closed tour of knight moves that visits each square of a given set of n squares on a chessboard exactly once. He thinks that the most difficult part of the problem is determining the smallest number of knight moves between two given squares and that, once you have accomplished this, finding the tour would be easy. Of course you know that it is vice versa. So you offer him to write a program that solves the "difficult" part. Your job is to write a program that takes two squares a and b as input and then determines the number of knight moves on a shortest route from a to b. Input The input will contain one or more test cases. Each test case consists of one line containing two squares separated by one space. A square is a string consisting of a letter (a-h) representing the column and a digit (1-8) representing the row on the chessboard. Output For each test case, print one line saying "To get from xx to yy takes n knight moves.". Sample Input e2 e4 a1 b2 b2 c3 a1 h8 a1 h7 h8 a1 b1 c3 f6 f6 Sample Output To get from e2 to e4 takes 2 knight moves. To get from a1 to b2 takes 4 knight moves. To get from b2 to c3 takes 2 knight moves. To get from a1 to h8 takes 6 knight moves. To get from a1 to h7 takes 5 knight moves. To get from h8 to a1 takes 6 knight moves. To get from b1 to c3 takes 1 knight moves. To get from f6 to f6 takes 0 knight moves.

一个简单的BFS

。。。

#include <iostream> #include <cstdio> #include <cstdlib> #include <cmath> #include <iomanip> #include <algorithm> #include <climits> #include <cstring> #include <string> #include <set> #include <map> #include <queue> #include <stack> #include <vector> #include <list> #define rep(i,m,n) for(int i=m;i<=n;i++) #define rsp(it,s) for(set<int>::iterator it=s.begin();it!=s.end();it++) const int inf_int = 2e9; const long long inf_ll = 2e18; #define inf_add 0x3f3f3f3f #define pb push_back #define mp make_pair #define fi first #define se second #define pi acos(-1.0) #define pii pair<int,int> #define Lson L, mid, rt<<1 #define Rson mid+1, R, rt<<1|1 using namespace std; typedef vector<int> vi; typedef long long ll; typedef unsigned long long ull; inline int read(){int ra,fh;char rx;rx=getchar(),ra=0,fh=1; while((rx<'0'||rx>'9')&&rx!='-')rx=getchar();if(rx=='-') fh=-1,rx=getchar();while(rx>='0'&&rx<='9')ra*=10,ra+=rx-48, rx=getchar();return ra*fh;} //#pragma comment(linker, "/STACK:102400000,102400000") ll gcd(ll p,ll q){return q==0?p:gcd(q,p%q);} ll qpow(ll p,ll q){ll f=1;while(q){if(q&1)f=f*p;p=p*p;q>>=1;}return f;} //int dir[4][2]={{-1,0},{1,0},{0,-1},{0,1}}; const int maxn=1e6+10; bool mm[10][10]; struct node { int x; int y; int step; }; int dir[8][2] = {-2,1,-1,2,1,2,2,1,2,-1,1,-2,-1,-2,-2,-1};//骑士移动的8个方位 char s1[3]; char s2[3]; int check(int x,int y) { if(x<0 || y<0 || x>=8 || y>=8 || mm[x][y]) return 1; return 0; } int bfs(int x1,int y1,int x2,int y2) { if(x1==x2&&y1==y2) return 0; node nt,cur ; nt.x = x1; nt.y = y1; nt.step = 0; queue<node> q; q.push(nt); mm[nt.x][nt.y] = 1; while(!q.empty()) { nt = q.front(); q.pop(); for(int i=0;i<8;i++) { int xx = nt.x + dir[i][0]; int yy = nt.y + dir[i][1]; if(check(xx,yy)) continue; if(xx==x2&&yy==y2) { return nt.step+1; } mm[xx][yy] = 1; cur.x = xx; cur.y = yy; cur.step = nt.step + 1; q.push(cur); } } return 0; } int main() { // freopen("data.txt","r",stdin); while(~scanf("%s%s",s1,s2)) { memset(mm,0,sizeof(mm)); int ans = 0; int x1 = s1[0] - 'a'; int y1 = s1[1] - '1'; int x2 = s2[0] - 'a'; int y2 = s2[1] - '1'; ans = bfs(x1,y1,x2,y2); printf("To get from %s to %s takes %d knight moves.\n",s1,s2,ans); } }

G - Period

For each prefix of a given string S with N characters (each character has an ASCII code between 97 and 126, inclusive), we want to know whether the prefix is a periodic string. That is, for each i (2 <= i <= N) we want to know the largest K > 1 (if there is one) such that the prefix of S with length i can be written as A ^K ,that is A concatenated K times, for some string A. Of course, we also want to know the period K. Input The input consists of several test cases. Each test case consists of two lines. The first one contains N (2 <= N <= 1 000 000) – the size of the string S.The second line contains the string S. The input file ends with a line, having the number zero on it. Output For each test case, output "Test case #" and the consecutive test case number on a single line; then, for each prefix with length i that has a period K > 1, output the prefix size i and the period K separated by a single space; the prefix sizes must be in increasing order. Print a blank line after each test case. Sample Input 3 aaa 12 aabaabaabaab 0 Sample Output Test case #1 2 2 3 3 Test case #2 2 2 6 2 9 3 12 4

与D题原理相同

转载请注明原文地址: https://www.6miu.com/read-182668.html

技术

最新回复(0)