Help me in solving GRDXOR problem

My issue

Gives TLE for all but one test case. Although the time complexity is O(N*M) it is actually 30*N*M+30*N*M which may be the cause. Is there anything else I am doing wrong here? How can I optimize?

My code

# cook your dish here
for i in range(int(input())):
    N,M=map(int,input().split())
    Matrix=[None]*N
    for j in range(N):
        Matrix[j]=list(map(int,input().split()))
    bitSetElements=[0 for l in range(30)]
    bitSetRows=[[0 for l in range(30)] for j in range(N)]
    bitSetColumns=[[0 for l in range(30)] for k in range(M)]
    for l in range(30):
        for j in range(N):
            for k in range(M):
                if Matrix[j][k]^(2**l)<Matrix[j][k]:
                    bitSetRows[j][l]+=1
                    bitSetColumns[k][l]+=1
                    bitSetElements[l]+=1
    Answer=0
    for j in range(N):
        for k in range(M):
            temp=0
            for l in range(30):
                if Matrix[j][k]^(2**l)<Matrix[j][k]:
                    temp+=(2**l)*(N*M-N-M-bitSetElements[l]+bitSetRows[j][l]+bitSetColumns[k][l])
                else:
                    temp+=(2**l)*(bitSetElements[l]-bitSetRows[j][l]-bitSetColumns[k][l])
            Answer=max(Answer,temp)
    print(Answer)

Problem Link: GRDXOR Problem - CodeChef

I am starting to think that I cannot get AC using PYTH3