PSY 321:  Cognitive Processes
Concept Formation

Raw Data Set:


ID Sex Age RT (ms)   Proportion Correct
Prototype Old Instance New Instance (high variability) New Instance (low variability) Prototype Old Instance New Instance (high variability) New Instance (low variability)

Suggested Data Analysis
ANOVA

Do not start the data analysis before 3/26/2008 11:59:59 PM.

  1. Open Excel and click in cell A8
  2. Copy the above table (drag the mouse downward, starting just to the right of the colon after Raw Data Set to the beginning of the decorative divider, and then use the copy command) and paste it into Excel starting in cell A8.
  3. Delete any blank columns to the left of the data.  Delete or insert blank rows above the data so that the first cell with data is in cell A10.
  4. Identify the last cell with reaction time data.  It should be in column G.
  5. In cell H10, type the following formula:

    =VAR(D10:G10)
     
  6. Copy cell H10 down the column until you reach the end of the data set.
  7. In cell D6, type the following formula:

    =AVERAGE(D10:D9)

    This row contains the condition means.
  8. Copy the formula you just entered, and paste it into the next three cells to the right.
  9. In cell H6, type the following formula:

    =AVERAGE(D6:G6)

    This is the mean of the entire set of data.
  10. In cell D7, enter the following formula:

    =(D6-$H$6)^2

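    This row contains each condition mean's squared deviation from the overall mean.  These values are combined into the between-groups sum of squares (SS) in step 17.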
  11. Copy the formula you just entered into the three cells directly to the right.
     
  12. We will now construct the ANOVA summary table.
  13. In row 1, enter "Source" in column C, "SS" in column D, "DF" in column E, "MS" in column F, "F" in column G, and "p" in column H.  Do not include the quote marks.
  14. In cell C2, enter "Between" without quotes.
  15. In cell C3, enter "Within" without quotes.
  16. In cell C4, enter "Total" without quotes.
  17. In cell D2, enter the following formula:

    =COUNT(D10:D9)*SUM(D7:G7)

    This cell is the sum of squares between-groups.
  18. In cell D4, enter the following formula:

    =VAR(D10:G9)*(COUNT(D10:G9)-1)

    This cell is the sum of squares total.
  19. In cell D3, enter the following formula:

    =SUM(H10:H9)*(COUNT(D10:G10)-1)-D2

    This cell is the sum of squares within-groups.
  20. In cell E2, enter the following formula:

    =COUNT(D10:G10)-1

    This is the degrees of freedom for the between-groups estimate of variance.
  21. In cell E3, enter the following formula:

    =(COUNT(D10:D9)-1)*(COUNT(D10:G10)-1)

    This is the degrees of freedom for the within-groups estimate of variance.
  22. In cell E4, enter the following formula:

    =COUNT(D10:G9)-1

    This is the total degrees of freedom.
  23. In cell F2, enter the following formula:

    =D2/E2

    This is the between-groups estimate of variance.
  24. Copy the formula you just entered into the cell directly below it. 
    This is the within-groups estimate of variance.
  25. In cell G2, enter the following formula:

    =F2/F3

    This is the F ratio.
  26. In cell H2, enter the final formula for the ANOVA summary table!!:

    =FDIST(G2, E2, E3)

    This is the p value: the probability of obtaining an F ratio at least this large if H0 were true.
     
  27. We now need to use Tukey multiple comparisons to determine which conditions are likely different from which other conditions.  This group of steps (27 through 39) should only be performed if the p value obtained in step 26 is less than our α level of .05.
  28. Consult a table of critical q values.
  29. In that table, click on the link that covers our within-groups degrees of freedom (in cell E3).
  30. Find the value at the intersection of the column for the number of conditions (4) and the row for the within-groups degrees of freedom (in cell E3) at our α level (.05).  This value is the critical q value.
  31. In cell I6, enter the following formula:

    =critical q value*SQRT(F3/COUNT(D10:D9))

    Where critical q value is replaced by the value from the table that you found in step 30.
    This is called the honestly significant difference.
  32. If the difference between any two condition means (in cells D6 to G6) is larger than the value in cell I6, then those two conditions are reliably different from each other.
  33. In row 1 enter "Prototype" in column K, "Old Instance" in column L, and "New Instance / High" in column M.
  34. In column J enter "New Instance / Low" in row 2, "New Instance / High" in row 3, and "Old Instance" in row 4.
  35. In cell K2 enter the following formula:

    =IF(ABS($G$6-D6)>$I$6,"DIFFERENT", "NOT SIG.")

    If the cell shows DIFFERENT, the two conditions represented by the column and row likely have different mean values.  If the cell shows NOT SIG., the two conditions represented by the column and row are not reliably different from each other.
  36. Copy the formula in cell K2 into cells L2 to M2.
  37. In cell K3 enter the following formula:

    =IF(ABS($F$6-D6)>$I$6, "DIFFERENT", "NOT SIG.")

    This cell is interpreted in the same way as above.
  38. Copy the formula in cell K3 into cell L3
  39. In cell K4 enter the following formula:

    =IF(ABS($E$6-D6)>$I$6, "DIFFERENT", "NOT SIG.")

    This cell is interpreted in the same way as above.
     
  40. In cell C6 enter the following formula to determine the mean age of the participants:

    =AVERAGE(C10:C9)

    In cell B6 enter the following formula to determine the number of female participants:

    =COUNTIF(B10:B9,"F")

    In cell B7 enter the following formula to determine the number of male participants:

    =COUNTIF(B10:B9,"M")

     
  41. We are now going to analyze the proportion correct data in the same way.  (The hard work is done -- this will only take a few steps.)
  42. Select all the cells in the spreadsheet by clicking in the gray cell above the row labels and to the left of the column labels.
  43. Copy all the cells to the clipboard (Edit | Copy or Ctrl-C).
  44. Switch to sheet two by clicking on the Sheet 2 tab in the lower left.
  45. Paste the data onto Sheet 2.
  46. Select the proportion correct data and headers by dragging across cells I10 to L9.
  47. Copy the data (Edit | Copy or Ctrl-C)
  48. Click in cell D10 to select it.
  49. Paste the data (Edit | Paste or Ctrl-V).
     
  50. ANOVA tests the null hypothesis H0: μprototype = μold instance = μnew instance (high variability) = μnew instance (low variability).  That is, the ANOVA tests the assumption that all the condition means are equal -- that the treatment had no effect.  To interpret the output, look at the value in the cell at the intersection of the row labeled "Between" and the column labeled "p" (cell H2).  If that value is less than .05 (our α level), it is unlikely that differences among the means as large as those in the data set occurred due to chance.  That is, if the value is less than .05, it is unlikely that all of the means are equal and it is likely that some of the means are different from other means -- the treatment had some effect.  The results of the ANOVA should be written as:  The analysis of variance revealed a main effect of the type of instance, F([value from the DF column and the Between row (cell E2)], [value from the DF column and the Within row (cell E3)]) = [value from the F column and the Between row (cell G2)], MSerror = [value from the MS column and the Within row (cell F3)], p = [value from the p column and the Between row (cell H2)], α = .05.

  51. The Tukey multiple comparisons test the null hypothesis H0: μ1 = μ2, where the subscripts 1 and 2 represent any two of the four conditions.  When we reject the null hypothesis (that is, when the difference between the means of the two conditions is larger than the honestly significant difference; see step 31), it is unlikely that the difference is due to chance.  That is, it is likely that the difference is due to the treatment.  The results of the multiple comparisons should be written as:  Tukey multiple comparisons revealed reliable differences between [list which pairs of means are reliably different], all p < .05.  No other differences were statistically reliable.
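
The spreadsheet arithmetic in steps 5 through 39 can also be sketched in Python as a cross-check. This is only an illustration under stated assumptions: the reaction times are invented, the function and variable names are hypothetical, and the critical q value is a placeholder that you would replace with the value you looked up in step 30. Python's standard library has no F distribution function, so the p value from step 26 is left out.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance

# Hypothetical reaction times (ms), invented for illustration only.
# One row per participant; columns follow spreadsheet columns D to G.
CONDITIONS = ["Prototype", "Old Instance", "New Instance / High", "New Instance / Low"]
RT = [
    [520, 540, 610, 580],
    [500, 530, 600, 560],
    [530, 560, 640, 590],
    [510, 525, 615, 570],
    [540, 555, 630, 600],
]


def anova_table(rows):
    """Mirror the summary-table cells built in steps 5 through 25."""
    n = len(rows)       # number of participants
    k = len(rows[0])    # number of conditions
    scores = [x for row in rows for x in row]
    cond_means = [mean(col) for col in zip(*rows)]                    # cells D6 to G6
    grand_mean = mean(cond_means)                                     # cell H6
    ss_between = n * sum((m - grand_mean) ** 2 for m in cond_means)   # cell D2
    ss_total = variance(scores) * (len(scores) - 1)                   # cell D4
    # Column H: each participant's sample variance across the conditions.
    ss_within = sum(variance(row) for row in rows) * (k - 1) - ss_between  # cell D3
    df_between, df_within = k - 1, (n - 1) * (k - 1)                  # cells E2, E3
    ms_between = ss_between / df_between                              # cell F2
    ms_within = ss_within / df_within                                 # cell F3
    return {
        "cond_means": cond_means,
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,                                  # cell G2
    }


def tukey_pairs(cond_means, ms_within, n, q_crit):
    """Steps 31 through 39: flag pairs whose means differ by more than the HSD."""
    hsd = q_crit * sqrt(ms_within / n)                                # cell I6
    flags = {
        (a, b): abs(cond_means[a] - cond_means[b]) > hsd
        for a, b in combinations(range(len(cond_means)), 2)
    }
    return hsd, flags


table = anova_table(RT)
# q_crit is a placeholder: look up the critical q for your own number of
# conditions and within-groups df, as in step 30.
hsd, flags = tukey_pairs(table["cond_means"], table["ms_within"], len(RT), q_crit=4.20)
print(f"F({table['df_between']}, {table['df_within']}) = {table['F']:.2f}")
for (a, b), different in flags.items():
    print(CONDITIONS[a], "vs", CONDITIONS[b], "DIFFERENT" if different else "NOT SIG.")
```

If the numbers printed for your own data disagree with the spreadsheet, the most likely cause is a cell range that does not cover every participant.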

Suggested Data Analysis
Graph

Do not start the data analysis before 3/26/2008 11:59:59 PM.

  1. Return to the sheet that contains the data by clicking on the Sheet1 tab at the lower left of the window
  2. Select the cell means by dragging across cells D6 to G6
  3. Click Insert | Charts | Column | Clustered Column
  4. Click Chart Tools | Design | Select Data
  5. If Series 1 is not already selected, click on it
  6. Click the Edit button in the Horizontal (Category) Axis Labels box
  7. Click the data select button to the right of the Axis label range text box
  8. Select cells D9 to G9 by dragging across them and press Enter
  9. Click OK
  10. Click OK
  11. Click Chart Tools | Layout | Axis Titles | Primary Horizontal Axis Title | Title Below Axis
  12. Type "Instance Type" and press Enter
  13. Click Chart Tools | Layout | Axis Titles | Primary Vertical Axis Title | Rotated Title
  14. Type "RT (ms)" and press Enter
  15. Click Chart Tools | Layout | Gridlines | Primary Horizontal Gridlines | None
  16. Click Chart Tools | Layout | Legend | None
  17. Repeat the previous steps for the proportion correct data (change the Value (Y) axis text appropriately).  You will need to switch to Sheet2 (click on Sheet2 in the lower left) to get to the proportion correct data.
  18. Optional: Format the graph into appropriate APA style.  For example:
    1. Remove the border and gray background from the graph and chart area
    2. Make the bars black or white or a shade of gray
    3. Increase all font sizes to 12 points and remove the bold
    4. Make the graph sufficiently large

 


Glossary

Analysis of Variance (ANOVA) -- ANOVA is an inferential statistic that tests the hypothesis that all the means from the various conditions of the experiment are equal.  It is used to tell us if the treatment had an effect.
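
In symbols, the spreadsheet formulas in steps 17 through 25 correspond roughly to the expressions below. The notation is an assumption introduced here, not taken from the handout: n is the number of participants, k is the number of conditions, \bar{X}_j is the mean of condition j, \bar{X} is the overall mean, and s_i^2 is the variance of participant i's scores across the conditions (column H).

```latex
\begin{aligned}
SS_{\text{between}} &= n \sum_{j=1}^{k} \left(\bar{X}_j - \bar{X}\right)^2 \\
SS_{\text{within}}  &= (k-1) \sum_{i=1}^{n} s_i^2 \;-\; SS_{\text{between}} \\
F &= \frac{SS_{\text{between}} / (k-1)}{SS_{\text{within}} / \big((n-1)(k-1)\big)}
\end{aligned}
```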

Degrees of freedom (df) -- the number of scores that are free to take on any value after certain constraints (such as the mean of the data set) have been imposed.
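
A small sketch may make the idea concrete. The first function reproduces the three df values from cells E2, E3, and E4; the second shows that once the mean of a set of scores is fixed, the last score is no longer free to vary. The function names and the example numbers are hypothetical.

```python
def degrees_of_freedom(n_participants, n_conditions):
    """Return (between, within, total) df as computed in cells E2, E3, and E4."""
    between = n_conditions - 1
    within = (n_participants - 1) * (n_conditions - 1)
    total = n_participants * n_conditions - 1
    return between, within, total


def last_score(mean_value, other_scores):
    """With the mean fixed, the final score is forced by the other scores."""
    n = len(other_scores) + 1
    return mean_value * n - sum(other_scores)


# Hypothetical design: 20 participants, 4 conditions.
print(degrees_of_freedom(20, 4))
# Three scores are free to vary; the fourth is then determined by the mean.
print(last_score(10, [8, 9, 12]))
```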

Null hypothesis -- In inferential statistics, the null hypothesis is typically the hypothesis that one wants to reject.  In this study, it is the hypothesis that all the means are equal -- the various treatments had no effect.  In this experiment, the null hypothesis is H0: μprototype = μold instance = μnew instance (high variability) = μnew instance (low variability).  The Greek letter μ is the mean in the population.  The null hypothesis says that if we tested everyone, we would find no difference in the reaction times or proportion correct for the four different conditions of the experiment.

p-value -- in an inferential statistic, the p-value is the probability that a difference at least as large as the one observed in the data would occur by chance if the null hypothesis were true.  If the p-value is less than .05, then most psychologists are willing to state that the null hypothesis is probably not true.
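
One way to see what "would occur by chance" means is a permutation sketch. This is a different technique from the FDIST formula used in the spreadsheet. It shuffles each participant's scores across the conditions, which is what the data would look like if the condition labels did not matter, and counts how often the shuffled F ratio is at least as large as the observed one. The data, the function names, and the number of shuffles are all assumptions made for illustration.

```python
import random
from statistics import mean

# Hypothetical reaction times (ms): one row per participant, one column per condition.
RT = [
    [520, 540, 610, 580],
    [500, 530, 600, 560],
    [530, 560, 640, 590],
    [510, 525, 615, 570],
    [540, 555, 630, 600],
]


def f_ratio(rows):
    """F ratio for a design in which every participant completes every condition."""
    n, k = len(rows), len(rows[0])
    grand = mean(x for row in rows for x in row)
    cond_means = [mean(col) for col in zip(*rows)]
    subj_means = [mean(row) for row in rows]
    ss_between = n * sum((m - grand) ** 2 for m in cond_means)
    # Error term: what is left after removing condition and participant effects.
    ss_error = sum(
        (x - cond_means[j] - subj_means[i] + grand) ** 2
        for i, row in enumerate(rows)
        for j, x in enumerate(row)
    )
    return (ss_between / (k - 1)) / (ss_error / ((n - 1) * (k - 1)))


def permutation_p(rows, n_shuffles=1000, seed=0):
    """Estimate the share of label shuffles with F at least as large as observed."""
    rng = random.Random(seed)
    observed = f_ratio(rows)
    hits = 0
    for _ in range(n_shuffles):
        # Shuffle each participant's scores across the conditions.
        shuffled = [rng.sample(row, len(row)) for row in rows]
        if f_ratio(shuffled) >= observed:
            hits += 1
    # Count the observed arrangement itself so the estimate is never exactly 0.
    return (hits + 1) / (n_shuffles + 1)


p = permutation_p(RT)
print("estimated p:", p)
```

With data this cleanly separated, almost no shuffle produces an F ratio as large as the observed one, so the estimate falls well below .05.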