Challenge for those annoyed by weak test cases

When reading trough this forum there are a lot of posts/replies of the following form

The test cases for [Problem] are weak

Or if they have done a bit more work:

The test cases for [Problem] are weak
Here is a test case where most submissions fail on

I want to make a point to users making such statements: As a setter it is really hard to find such test cases beforehand. It is almost impossible to think about every wrong approach that might be taken. I am going to challenge you by putting you in the role of setter.

For my A&D class we had a practical assignment (in the first semester of last school year, so don’t worry about me cheating). This assignment was based on an algorithm in a research paper, so I will not challenge you to find the algorithm. As far as I know none of the students, including me, found the algorithm of that paper. The algorithm I made did however pass all test cases, and I challenge you to make a test case for which my algorithm would fail.

The problem “you created”

You are given a graph where each vertex has at most 4 neighbors. Find the maximum size of a set such that there is no edge connecting two vertices of that set.
or shorter:
Find the maximum independent set of a degree-4 graph.

The solution “you have come up with”

See https://link.springer.com/article/10.1007/s10878-017-0115-3 for the paper on the algorithm to solve the problem. I do not expect you to be able to read scientific papers.

My solution

A setter would not have access to this solution.

The challenge

Make a test case for which my algorithm will fails. The requirements for test cases are:

  1. It must be (easily) verifiable by hand.
  2. You must provide the expected answer
  3. The input has the form
N M
from_1 to_1
from_2 to_2
...
from_N to_N

where N (1\leq N\leq 200) is the number of edges, M (1\leq M\leq 100) the number of vertices (yes, this is backwards compared to what is standard in competitions; but that is how it was phrased for the assignment). and vertices are numbered 1-M. Edges are bidirectional. Self-loops are not allowed.
Full list of requirements of the test case:

  • graph must be a simple graph; so no double edges between nodes
  • graph may not contain self-loops
  • each vertex of the graph must have at most 4 edges
  • the number of edges (N) is at most 200 (and at least 1)
  • the number of vertices (M) is at most 100 (and at least 1)
  • from_i in between 1 and M
  • to_i in between 1 and M

Post your test cases below, I will reply with the result of my algorithm; one of

  • :white_check_mark: AC
  • :x: WA
  • :clock10: TLE (executed for more than one second)
  • :warning: RTE (exception during execution)

And the output of the program in case of an AC or WA. I will grade test cases against the weakest version of my algorithm; although for the assignment I handed in a more robust version. Your goal is for me to receive one of the latter three verdicts on your text case.

8 Likes

it can be proven that your algo fails for some testcase or is something yet to be tested?

Yes, there are test cases for which my algorithm is likely to fail, and I am aware of at least one.

I do not consider myself one of those users who usually complains about testcases but I found really interesting your challenge:

case

1 1
1 1

answer

0

1 Like

:x: WA; output=1
Although I should have stated that the problem from the course specified that it must be as simple graph; I will modify my first post. There is a test case for which my program fails without self-loops.

ok, let’s go on with edge cases without self loops then:

Complete graph

190 20
5 16
8 13
2 12
1 13
3 9
16 19
17 20
3 15
4 6
3 5
3 12
4 12
5 17
7 9
10 14
13 18
2 20
9 20
13 17
6 10
12 19
4 8
14 20
10 20
2 7
7 12
14 15
10 18
16 20
8 15
10 12
6 9
1 10
9 17
6 12
7 17
9 13
4 10
10 16
11 19
4 11
9 11
2 8
6 7
1 14
7 8
9 12
10 15
1 12
1 20
9 10
14 18
12 15
1 16
1 7
5 9
2 15
8 16
2 5
15 19
1 11
12 13
6 15
3 10
1 8
4 15
1 9
16 18
6 19
3 7
3 18
2 13
17 18
11 15
1 3
7 18
9 15
1 5
2 14
11 20
8 10
6 13
13 20
13 14
4 18
4 7
4 5
11 14
2 4
8 17
6 11
8 12
6 20
13 15
7 10
13 16
5 20
8 14
5 19
3 11
14 16
16 17
7 15
8 9
5 6
9 16
3 17
10 17
3 4
5 8
5 11
15 17
2 16
19 20
15 20
13 19
3 16
2 9
11 16
11 12
4 13
9 14
11 17
7 13
10 13
14 17
8 18
2 11
5 15
12 20
7 14
5 13
5 12
4 20
1 18
8 19
4 9
3 13
1 19
9 18
5 14
17 19
1 6
15 18
2 6
3 6
5 18
10 11
3 14
12 18
18 19
1 15
2 18
6 8
6 14
5 10
8 20
4 14
7 19
1 2
12 16
18 20
2 3
3 19
2 19
15 16
2 17
7 20
11 18
3 20
6 18
4 16
4 17
5 7
6 16
14 19
7 11
9 19
2 10
6 17
1 17
11 13
1 4
10 19
12 17
4 19
3 8
7 16
12 14
8 11

Answer=1

Not a valid test case:

Without that restriction the problem would have been NP-hard

ups sorry…yes!!
Anyway…check please in the post if N is the number of nodes or edges…because it’s said that it’s the number of edges but then in the input format it referes to the number of nodes

@ssrivastava990 , its a must to tag him here lol

1 Like

the number of vertices and edges are indeed the wrong way around to what we would expect. From the original assignment:

The first line contains integers n, m and k, (1\leq n\leq 200, 1\leq m\leq 100, 1\leq k\leq 20) which denote the number of streets in Nijmegen, the number of intersections in Nijmegen and the number of bins which are already ordered. It is given that on every intersection a maximum of 4 streets meet.

I have removed the story behind the task; and removed the k value which was used to make it a decision problem instead of an optimization problem.

However I won’t mind if anyone accidentally swapped them.

Case

180 100
88 98
48 49
19 29
6 16
50 60
10 20
73 83
7 17
67 68
35 45
33 34
39 49
62 72
69 79
99 100
38 48
15 25
78 79
36 37
75 85
23 24
77 87
44 45
92 93
2 3
96 97
27 37
45 55
64 65
52 62
63 73
90 100
86 96
18 19
33 43
85 86
38 39
21 22
54 64
11 21
45 46
43 53
12 22
5 6
72 82
59 69
80 90
31 32
23 33
87 88
86 87
93 94
21 31
78 88
14 15
25 26
2 12
26 36
17 18
47 57
18 28
12 13
61 62
95 96
52 53
42 43
17 27
68 78
28 29
73 74
40 50
8 9
13 23
42 52
48 58
24 25
25 35
4 5
75 76
57 67
1 2
3 13
57 58
70 80
74 84
65 66
30 40
15 16
85 95
1 11
20 30
97 98
88 89
82 92
34 44
46 56
36 46
89 99
44 54
14 24
49 59
66 67
46 47
49 50
29 30
69 70
35 36
22 23
98 99
6 7
56 57
68 69
83 84
29 39
58 68
24 34
84 85
82 83
58 59
91 92
8 18
41 42
54 55
55 56
51 52
4 14
71 81
16 26
9 10
61 71
76 86
28 38
94 95
74 75
67 77
81 91
13 14
11 12
32 42
7 8
59 60
9 19
39 40
27 28
66 76
71 72
77 78
5 15
51 61
19 20
62 63
89 90
22 32
84 94
16 17
32 33
83 93
76 77
56 66
60 70
65 75
34 35
72 73
55 65
41 51
81 82
37 47
79 89
64 74
53 54
63 64
47 48
31 41
53 63
26 27
3 4
43 44
87 97
37 38
79 80

Answer=50

1 Like

codeforces has good test cases almost all the time
and what is wrong when community is annoyed???
they expect good testcases for contests
thats not wrong
not able to produce strong test cases almost in every contest is codechef testers issue.

I mean if a company releases a defective product, its their issue.
it cannot just challenge everyone saying its tough :rofl: :rofl: :rofl: :rofl: :rofl: :rofl: :joy: :joy: :joy: :joy:

PS: i am not against u…i am just against codechef

I assume that case is a 10x10 grid
And then
:x: WA ; result = 49 is given as a result.

What I haven’t mentioned is that my approach was non-deterministic.
It would still fit in the time limit to do 10x as many checks in which case it would receive AC, but I think that would be cheating on my part.

The test case I had in mind was a straight line. It was funny for me to make an algorithm that could pass a lot of the difficult test cases but fail for the simple ones

In my final submission I had an extra step before running the non-deterministic part to prevent the line from giving a WA; it seems your test case also defeats the better algorithm.

I hope my intuitive answer is correct :roll_eyes:

If that’s the case talk to you school professor to hire mi as a tester! :grinning:

1 Like

Personally I am not annoyed by weak test cases. I do however notice in this forum that there is a lot of complaining about weak test cases. The point I want to make is that while complaining and finding weak test cases afterwards is easy, thinking about them beforehand is hard.

Of course problem setters should try their best at making good test cases. After a recent contest I saw a setter explain why some of his test cases were wrong/weak, and I think he should be applauded for that. I believe we should be more open to work other people do, how difficult that is, and appreciate them from doing it.

Then about Codeforces vs Codechef: I like this platform better. I like algorithmic thinking a lot more than implementing. I believe this platform is a better place for that with nice problems, the long contest and especially the good editorials. From the few times I participated on Codeforces I started disliking it. It feels it is more about making a robust implementation in a short amount of time than developing algorithms. Especially with the extra focus on the hacking phase.

2 Likes

Do you consider weak test cases to be a frequent issue in codechef?

Yes,Weak Testcases did happen frequently.
June long, GUESSG and CONVAIR
July long, LCM Constraints.
May cookoff CHEFSHIP
@carre
I do appreciate the effort of the setter.
But,during long contest before 5th day, if weak test cases are pointed out on forum, please try to add those test cases for long challenges

2 Likes

GUESSG? I don’t remember reading about many unintended approaches that got 100% points on that problem. It may be the case that I missed the post. I remember only 1 approach discussed. Obviously, I do not consider cases to be weak if only a few unintended approaches succeed in obtaining AC. I don’t comment on the others problems you mention because I don’t know, but I’m surprised to see this one on the list.

Did you have some reasoning to the test case you constructed?

Let me add few more.
COOK116B - CHCBOX
LTIME76B - CATFEED
See, I know it is impossible to predict all wrong submissions. No one can do this before hand. So why not let the community help. Why there’s no hacking phase. Let the participants help. Codechef spends so much time and effort on plagiarism checking (As they said in one of their recent blogs). It’s useless.

“our sole intention lies in making our users learn new concepts while competing”
-Codechef’s motto.

But if you learn a concept wrongly, it’s more dreadful than cheating.

Now think like this : there are three accepted submissions.

  1. A cheater - who just blindly copied a code.
  2. A person - who submitted the correct code.
  3. A person - who submitted the incorrect code - thinking his code is correct.
    Who is at the bigger loss?
    Definitely the third person. The cheater doesn’t know the concept at all. He might learn in future but what about the third person!!

I really think codechef should implement such alternative to capture weak testcases specially in short contests(it would be idiot to implement it in long for obvious reasons). It will also reduce excessive work of testers.(hopefully!)

1 Like