The thing is, the experiment I am currently running, which is about to finish, involves asking people for grammaticality judgments. I presented participants with sentences manipulated along various factors and asked whether they thought each was a good sentence or not. Of course, I do not ask them directly; instead, I use a technique called magnitude estimation. This requires people to assign numbers to sentences, where the numbers reflect how good or bad they think the sentences are.
Now, this paradigm makes a few assumptions. It assumes that grammaticality is not binary: the goodness or badness of a sentence is not a yes-or-no matter. Instead, it takes the view that grammaticality is graded. By using numbers (and therefore a scalar variable), one can say that one sentence is good, another is slightly bad, and another is very bad. It is not always the case that sentences are bad just because they violate some prescriptive rule.
Anyway, the experimental design I made involved four different types of filler items, and each filler had a good version and a bad version. Each of the four filler types corresponds to a different syntactic construction. The bad versions were bad because they violated some syntactic constraint, for example subject-verb agreement or number agreement. These two agreement types are uncontroversial in the literature: if there is a disagreement, the sentence is simply wrong.
Now, why do I have these fillers? Well, they serve a purpose. One purpose is to hide the real items in the experiment as much as possible. The thing is, if a participant notices a pattern in the items, they can start strategizing in their responses, and then the responses are no longer automatic, no longer made on the fly. So some fillers are designed to keep the participant from figuring out which items are the real test items.
Another purpose of the fillers was to make sure people are paying attention. The thing is, they are paid a small amount of money for the experiment, so they might come in with a couldn't-care-less attitude and just plug in random numbers without even reading the directions. If they respond to the filler items the same way regardless of whether the fillers were good or bad, that means they were not paying attention to the task.
So, I had to compare the means of the good and bad fillers. Since there are four filler types, I conducted four separate t-tests for every participant. If the difference comes out significant in at least 3 of the 4 filler types, I retain the participant. If it is significant in only 2 types or fewer, I drop the participant and discard their data. That is one way of checking whether a participant is actually doing what you want them to do.
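The inclusion rule above can be sketched in a few lines. This is a hypothetical illustration, not my actual analysis script: it assumes each participant's ratings are stored as a dict mapping filler type to a pair of lists (good-version ratings, bad-version ratings), and it uses an independent-samples t-test as a stand-in for whichever test variant one actually prefers.

```python
from scipy import stats

def keep_participant(ratings_by_type, alpha=0.05, min_significant=3):
    """Retain a participant if the good-vs-bad contrast is significant
    in at least `min_significant` of the filler types."""
    n_significant = 0
    for good, bad in ratings_by_type.values():
        # One t-test per filler type, comparing good vs bad ratings.
        t_stat, p_value = stats.ttest_ind(good, bad)
        if p_value < alpha:
            n_significant += 1
    return n_significant >= min_significant
```

With attentive ratings (good fillers rated high, bad ones low), all four tests come out significant and the participant is kept; a participant who rates everything about the same fails the criterion and is dropped.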
So yeah, comparing the means involves running t-tests, but last weekend I could not get the software package to read my data. Weird. Now, with some help, I got it to work. As a result, we are dropping a few people from the study, about 4, since they showed no evidence of actually paying attention to the task.