LeetCode 1044: Longest Duplicate Substring — With “string_view”

Anshika Bhargava
4 min readJun 25, 2020

What is “string_view” ?

Oh wait, we will come to that. First, let us see the question.

Question :

Given a string S, consider all duplicated substrings: substrings of S that occur 2 or more times. (The occurrences may overlap.)

Return any duplicated substring that has the longest possible length. (If S does not have a duplicated substring, the answer is "".)

Example :

Example 1:

Input: "banana"
Output: "ana"

Example 2:

Input: "abcd"
Output: ""

Solution :

Here is my solution to it.

string longestDupSubstring(string S) {
unordered_set<string_view> seen;
string_view ans;
int start = 0, end = S.size() - 1;
while(start <= end) {
int len = (start + end) / 2;
bool found = false;
for(int i = 0; i < S.size() - len + 1; i++) {
string_view sv = string_view(S).substr(i, len);
if(seen.find(sv) != seen.end()) {
found = true;
ans = sv;
break;
} else {
seen.emplace(sv);
}
}
if(found) {
start = len + 1;
} else {
end = len - 1;
}
seen.clear();
}
return {ans.begin(), ans.end()};
}

Alright, let us get into it part by part.

Now, since our answer would be the substring of the maximum length, we can assume the binary search is going to be our go-to algorithm. So, we keep our low to be 0 (we can have no such substring, so the size of the substring would be 0), and highto be [size of string]-1(overlapping is allowed!). If we find a repeated substring of size len, it is a waste looking for the substrings smaller than len, but we may find a substring of size greater than len. So, we reduce our length search from len+1 to high. Similarly, if there is no repeated substring of size len, we reduce our search space from low to len-1.

We now have with us a particular length, say len, and all we have to do is figure out if there is any repeated substring of size len.

The simplest solution which comes to my mind is, we maintain a set, or rather an unordered set of strings. We first take a substring from our main string of size len starting from the beginning, i.e. substr(0, len) and insert it in our set. Here, 0 is the starting position of the substring and len is the length of the substring. We then increment our starting point, i.e. substr(1, len)and so on. If the new substring formed does not exists in our set, we just add it. If we encounter a substring which is already in our set, then we have got our ans for that length. After our check for length len is finished, we clear our set and look for other values of len as we had discussed above.

But, is this feasible? Our string can be a really really long one. Moreover, when we make another substring from a string, like so,

string ss = s.substr(0, len)

a new string is made which take up space. So, if we store all the substrings of a string, we may fall prey to the MemoryOverflowExecption.

So what do we do ?

Well, we have Rabin-Karp algorithm which works fantastically well. Instead of storing the strings as such, we store the hash values of the string. We would be using a rolling hash function, so the time complexity is under control. At the same time we are not storing strings, so our space complexity is managed too.

But to be honest, I liked the first solution. No overhead of calculating hash functions and no extra headache to check for collisions!!

If I tell you we can use the same logic as is by just replacing string by string_view, then…….

Oh well, seems like we can do so!

So, let us now come back to our question.

What is string_view ?

string_view is just like a view of a string. We can a have a view over a string and can make substrings from the same string_view. Unlike the strings, string_view does not creates another string. It can rather be viewed as a structure which keeps a record of the initial pointer and the length of the underlying string.

It is just like a table view in SQL but the string cannot be changed by changing the string_view.

So, the cost overhead for storing a string_view would be comparable to storing a pointer and an integer, which is, so much less as compared to that of a string. You see how much space and memory optimisation we can achieve through this. Isn’t it just awesome!

The support for string_view was added in C++17. Lots of functions like removing suffix/prefix can be done with a lot of ease. It can be very handy in competitive programming as well.

I believe with this explained, the code would be self-explanatory.

Keep Coding!! :)

--

--

Anshika Bhargava

Software Engineer at Google | I try to learn and blog