Python 3.x How to make list not repeat same character three times in row

Python 3.x How to make list not repeat same character three times in row,python-3.x,list,Python 3.x,List,I have this code for letters in itertools.product(charset, repeat=47): string = "".join(letters) print(string) and out from that is aaaaaaaaaaaa aaaaaaaaaaab aaaaaaaaaaac but im wondering how can I make it not generate same three characters in row so that out put is dddcccbbbaaa dddcccbbbaab dddcccbbbaac and so on without using something like this for letters in itertools.product(charset, repeat=47): string = "".join(letters) for i in range(1,len(string)-1): if

I have this code

for letters in itertools.product(charset, repeat=47):
    string = "".join(letters)
    print(string)

and out from that is

aaaaaaaaaaaa
aaaaaaaaaaab
aaaaaaaaaaac

but im wondering how can I make it not generate same three characters in row so that out put is

dddcccbbbaaa
dddcccbbbaab
dddcccbbbaac

and so on without using something like this

for letters in itertools.product(charset, repeat=47):
    string = "".join(letters)
    for i in range(1,len(string)-1):
        if string[i] is not string[i+1] is not string[i-1]:
            print(string)
        else:
            pass

#1

Here's a slightly modified version of your code:

import itertools

def version1(charset, N):
    result = []
    for letters in itertools.product(charset, repeat=N):
        string = "".join(letters)
        for i in range(0, N-2):
            if string[i] == string[i+1] == string[i+2]:
                break
        else: # did not find any ZZZ sequence
           result.append(string)
    return result

>>> charset = "abc"
>>> N = 5
>>> version1(charset, N)
['aabaa', 'aabab', 'aabac', 'aabba', 'aabbc', 'aabca', 'aabcb', 'aabcc', 'aacaa', 'aacab', 'aacac', 'aacba', 'aacbb', 'aacbc', 'aacca', 'aaccb', 'abaab', 'abaac', 'ababa', 'ababb', 'ababc', 'abaca', 'abacb', 'abacc', 'abbaa', 'abbab', 'abbac', 'abbca', 'abbcb', 'abbcc', 'abcaa', 'abcab', 'abcac', 'abcba', 'abcbb', 'abcbc', 'abcca', 'abccb', 'acaab', 'acaac', 'acaba', 'acabb', 'acabc', 'acaca', 'acacb', 'acacc', 'acbaa', 'acbab', 'acbac', 'acbba', 'acbbc', 'acbca', 'acbcb', 'acbcc', 'accaa', 'accab', 'accac', 'accba', 'accbb', 'accbc', 'baaba', 'baabb', 'baabc', 'baaca', 'baacb', 'baacc', 'babaa', 'babab', 'babac', 'babba', 'babbc', 'babca', 'babcb', 'babcc', 'bacaa', 'bacab', 'bacac', 'bacba', 'bacbb', 'bacbc', 'bacca', 'baccb', 'bbaab', 'bbaac', 'bbaba', 'bbabb', 'bbabc', 'bbaca', 'bbacb', 'bbacc', 'bbcaa', 'bbcab', 'bbcac', 'bbcba', 'bbcbb', 'bbcbc', 'bbcca', 'bbccb', 'bcaab', 'bcaac', 'bcaba', 'bcabb', 'bcabc', 'bcaca', 'bcacb', 'bcacc', 'bcbaa', 'bcbab', 'bcbac', 'bcbba', 'bcbbc', 'bcbca', 'bcbcb', 'bcbcc', 'bccaa', 'bccab', 'bccac', 'bccba', 'bccbb', 'bccbc', 'caaba', 'caabb', 'caabc', 'caaca', 'caacb', 'caacc', 'cabaa', 'cabab', 'cabac', 'cabba', 'cabbc', 'cabca', 'cabcb', 'cabcc', 'cacaa', 'cacab', 'cacac', 'cacba', 'cacbb', 'cacbc', 'cacca', 'caccb', 'cbaab', 'cbaac', 'cbaba', 'cbabb', 'cbabc', 'cbaca', 'cbacb', 'cbacc', 'cbbaa', 'cbbab', 'cbbac', 'cbbca', 'cbbcb', 'cbbcc', 'cbcaa', 'cbcab', 'cbcac', 'cbcba', 'cbcbb', 'cbcbc', 'cbcca', 'cbccb', 'ccaab', 'ccaac', 'ccaba', 'ccabb', 'ccabc', 'ccaca', 'ccacb', 'ccacc', 'ccbaa', 'ccbab', 'ccbac', 'ccbba', 'ccbbc', 'ccbca', 'ccbcb', 'ccbcc']

Your algorithm is not optimal. Look at the first string:

aaaaa

You know that you need len(charset) - 1 iterations (aaaab, aaaac) to arrive to:

aaaba

And then again len(charset) - 1 iterations to arrive to:

aaaca

But you can skip all those iterations, because of the aaa beginning. Actually, when you find sequence aaa, you can skip len(charset)^K - 1 where K is the number of remaining chars. This does not change the big O complexity, but will reduce the time of computation for long sequences, depending on the size of the charset and the number of characters of the strings.

Intuitively, if the charset has few chars, you will spare a lot of computations.

First, you need to find the first letter after a ZZZ sequence:

def first_after_ZZZ(string):
     for i in range(0, len(string)-2):
         if string[i] == string[i+1] == string[i+2]:
             return i+3
     return -1

>>> first_after_ZZZ("ababa")
-1
>>> first_after_ZZZ("aaaba")
3
>>> first_after_ZZZ("aaabaaabb")
3

We use this function in the previous code (intermediate step):

def version2(charset, N):
    result = []
    for letters in itertools.product(charset, repeat=N):
        string = "".join(letters)
        f = first_after_ZZZ(string)
        if f == -1:
            result.append(string)
    return result

>>> version2(charset, N) == version1(charset, N)
True

Now, we can skip some elements:

def version3(charset, N):
    result = []
    it = itertools.product(charset, repeat=N)
    for letters in it:
        string = "".join(letters)
        f = first_after_ZZZ(string)
        if f == -1:
            result.append(string)
        elif f < N:
            K = N - f # K > 1
            to_skip = len(charset)**K-1 
            next(itertools.islice(it, to_skip, to_skip), None) # this will skip to_skip tuples
    return result

>>> version3(charset, N) == version1(charset, N)
True

Benchmark:

>>> from timeit import timeit as ti
>>> ti(lambda: version1(charset, 15), number=1)
13.14919564199954
>>> ti(lambda: version3(charset, 15), number=1)
6.94705574299951

This is impressive because the charset is small, but may be insignificant with a whole alphabet.

Of course, if you write your own implementation of product, you can skip the tuples without generating them and this could be faster.


#2

I don't understand: "how to make it not repeat the same character three times in a row", and then the desired result is dddcccbbbaaa where every character is repeated three times in a row!?

#3

this is almost 2 times faster but when it comes to big charset it doesn't work because of sys.maxsize