The Harms of High-Stakes Accountability Tests

In an elementary school in Westport, students are taking the Connecticut Smarter Balanced Assessment. They sit silently in a gloomy room, completing their standardized tests uninterrupted. Some are very nervous; others are almost apathetic toward the tests. Meanwhile, in other schools within the same state, students take the same tests in much worse conditions and thus score worse. This is the problem with standardized testing. While standardized tests such as the SBAC are designed to give the school system objective data on how well teachers are teaching their students, students in better testing environments and better-funded schools tend to outscore students in disadvantaged schools. Thus, the tests reflect not actual learning in the classroom but how much funding a school has. In recent years, the tests have become even less meaningful because the pandemic exacerbated economic inequalities. However, instead of being used to direct more help to struggling schools, the data is being used to withhold funding from and punish schools that do not meet the standard. In present-day America, then, standardized testing is being misused. Because high-stakes accountability tests are undermined by non-standard testing environments and reflect racial and economic inequalities rather than learning, states should reconsider their use of standardized testing.

Though they were created for a different purpose, standardized tests are currently being misused and are generally ineffective. Standardized testing was developed in the 19th and 20th centuries. In his Brookings article “Standardized tests aren’t the problem, it’s how we use them,” Andre M. Perry writes that standardized tests were originally meant to measure a few parts of academic performance (Perry). Their original purpose was to give a fair and basic understanding of students’ academic strengths. However, the tests are currently being used to grade students, evaluate teachers, and evaluate schools, a use that is not aligned with that purpose. Accountability goes beyond the scope of simply providing data and instead unnecessarily punishes those who need more help. The tests emphasized reliability, meaning that the same person taking the same test twice would get the same result, and validity, meaning that the test actually captures the qualities it is designed to predict (Benjamin and Pashler 14). The effectiveness of the tests depends on how consistent their results are and how well they predict certain parts of academic performance. Without these qualities, the results do not fulfill their purpose. Therefore, conclusions drawn from results that were not obtained under fully standardized conditions are meaningless.

However, the very concept of high-stakes standardized testing is flawed. Teachers are not directly responsible for each of their students’ scores, and effective teaching and learning do not equate to higher test scores (Au, “Neither Fair Nor Accurate”). First, there are statistical error rates when measuring a teacher’s effectiveness. The error rate is 35% when one year of test data is used and 25% when three years are used. So, even with three years of high-stakes test scores, there is a one in four chance that a teacher who should receive an average rating would be incorrectly rated as below average and punished. As this statistic shows, the results are inconsistent because of random chance in a given year. In addition, test scores are unstable from year to year: the scores of students taught by the same teacher fluctuate widely. From one year to the next, a third of the teachers ranked in the bottom 20% moved to the top 40%, and a third of top-ranked teachers dropped to the bottom 40%. There is also day-to-day score instability, with 50-80% of the change in a test score caused by random one-time factors such as nutrition, mood, and who else is taking the test alongside the student. Many of these factors arise outside of school, so their effects should not be attributed to the school or teacher. Thus, it is irrational to punish schools, teachers, and students for performing worse on standardized tests instead of allowing the tests to fulfill their original purpose of simply providing data for comparison. Economic and racial factors also greatly affect standardized testing results.

High-stakes accountability test scores reflect class and racial inequalities rather than the quality of teaching or learning. High-stakes standardized testing originated from the IQ tests used in the early 1900s to test 1.75 million recruits during World War I (Au, “Hiding behind high-stakes testing…” 8). Though IQ tests were created just to assess whether young children were developmentally disabled, American cognitive psychologists co-opted them to confirm their existing assumptions about human ability. The tests were meant to help justify ranking people by race, ethnicity, gender, and class according to apparent “hereditary” intelligence. In 1940, NAACP co-founder W.E.B. Du Bois said that it was “indeed after the (first) World War that there came the hurried use of the new technique of psychological tests, which were quickly adjusted so as to put black folk absolutely beyond the possibility of civilization” (Au 9). Standardized IQ testing soon began to be used in academic tracking, with the army tests adapted into the National Intelligence Tests to classify and sort schoolchildren. The logic that standardized tests provide objective measurement was extended into school structures, reproducing socioeconomic inequalities. In modern-day America, high-stakes testing policies have not improved reading and math achievement and have not narrowed achievement gaps between white and non-white or rich and poor students. For example, when Massachusetts used a high-stakes accountability system in the 1990s, there was a 300% increase in dropouts, who were disproportionately African American and Latino (11). So, while high-stakes testing claims to be meritocratic and to benefit the disadvantaged, it has historically had the opposite effect, measuring how privileged a student is rather than their merit. Standardized testing also naturally favors the wealthy.
The aforementioned out-of-school factors, such as inadequate access to health care, food insecurity, and poverty-related stress, are all tied to economic status. Average scores on international tests, national tests, college entrance tests, and each of the individual state tests increase as family income increases (Strauss). Specifically, on the mathematics part of the 2012 Program for International Student Assessment, students from the lowest quartile of family income scored an average of 425, while those from the highest quartile scored an average of 528. In 2014, students from families earning more than $200,000 annually scored an average combined 1,714 on the SAT, while those from families earning under $20,000 annually scored an average combined 1,326 (Hess). This trend held in each of the three parts of the SAT, especially the reading section, where the less wealthy students averaged 433 and the wealthier students averaged 570. These differences are attributed mostly to funding disparities and out-of-school factors, both of which are influenced greatly by wealth. Wealthier students have access to summer school, parent education programs, parent housing vouchers, extracurriculars, AP courses, more school staff, professional development, tutors, and various other resources that other students simply do not have because of funding disparities. In addition, students in more affluent areas are more likely to get special “504 designations,” normally given to students with anxiety or ADHD, which give them extra time on tests. This inequality has also grown. A study by Sean Reardon, a Stanford education researcher, shows that the gap in scores between students from the highest- and lowest-income families grew 30 to 40 percent in the decades leading up to the 2000s (Barnum).
This shows that the problem is not only still prevalent but also getting worse under accountability-based standardized testing. However, the U.S. government has not done enough to address the issue.

Though the U.S. government has historically supported high-stakes accountability tests, it should invest in alternatives in the future. Although the National Research Council’s Committee on Appropriate Test Use warned in 1999 that a single test score should not be used to make decisions that greatly affect test-takers, Congress passed the No Child Left Behind Act (NCLB) with strong bipartisan support in 2001 (Au, “Neither Fair Nor Accurate”), and President Bush signed it the next year. An extension of the Elementary and Secondary Education Act (ESEA), it emphasized “accountability,” “flexibility,” “research-based education,” and “parent options” (“No Child Left Behind Act of 2001”). States were required to test students in math and reading annually in grades 3-8 and once in high school, and 100% of students were expected to meet or exceed state standards by 2014. The annual targets toward that goal were called adequate yearly progress (AYP) (Lee). If a school did not meet its AYP, the government could label it as “needing improvement.” If a Title I school was labeled that way, the act allowed the state to change the school’s leadership team and staff or close the school. A school that missed AYP two years in a row had to allow students to transfer to a better-performing public school in the same district (Klein, “No Child Left Behind: An Overview”). If a school missed AYP three years in a row, it had to offer free tutoring. However, students often did not use those opportunities. While some believe that it did lead to a greater focus on struggling students, NCLB focused too much on standardized testing and led teachers to “teach to the test,” concentrating on tested material because their schools’ funding was now connected to those scores. As a result, the curriculum narrowed and students were taught less useful material.
The penalties were also often too harsh and unhelpful, as staff were often unjustly deemed ineffective. The failure rate for AYP rose from 29% in 2006 to 38% in 2010, and to over 50% in several states. This high failure rate shows that the act was ineffective and did little to help students learn. It took time away from actual learning yet still did not meet its goal of improving test scores. Despite growing public criticism of aspects of NCLB and campaign rhetoric about using more measures of student learning and teacher evaluation, Barack Obama and the general public still supported the use of high-stakes standardized tests in education (Au, “Hiding behind high-stakes testing…” 11). This shows that such testing remains prevalent and is still an issue that needs to be addressed. Obama promoted the “Race to the Top” program, which included more funding for accountability tests. While his bipartisan Every Student Succeeds Act (ESSA) gave states more leeway and allowed them to pick their own accountability goals, states still intervened with the bottom 5% of performers and could take over a school, fire the principal, or turn the school into a charter (Klein, “The Every Student Succeeds Act: An ESSA Overview”). Thus, though some undesirable parts of earlier high-stakes accountability policies were removed, there was still general support for using this type of testing to decide funding. However, there are alternatives to this type of testing (“Top 7 Alternatives To Standardized Testing”). A popular alternative is to use other measures and forms of data, such as graduation rates, demographic information, and emotional skills surveys, to gauge a student’s performance. This allows students to perform in a stress-free environment and provides a more holistic, fairer form of assessment that doesn’t exacerbate class and racial inequalities.
In addition, like candidates in the job market, students can submit portfolios of their best individual and group projects. This gives them more control over their own work and lets them pursue their own interests instead of being taught testing material. Standardized tests can still be used, but not to defund schools that perform poorly, and they can be given more routinely so that they stress students less. Inspections are also a good tool for reviewing work without giving any advantage to wealthier or white students. Overall, many alternative tools can be used to

There is no doubt that high-stakes testing is futile and has had a negative impact on American education. Because of its inherently error-prone and random nature, it has not succeeded in its original purpose of accurately and objectively measuring student performance. The tests cause teachers and schools to be unfairly punished because they reflect a student’s socioeconomic status more than their actual learning or the teacher’s instruction, and they make inequalities worse. Although the flaws of the practice were exposed by the failure and repeal of the No Child Left Behind Act, support for such testing remains strong, and it remains an issue in modern-day America. High-stakes testing has played a significant role in the United States’ education system over the last century. However, there are numerous better alternatives that benefit students more and do not hinder schools’ ability to teach. Quality education has numerous vital impacts on society. In all nations, it is essential for reducing inequality and ultimately benefits the people. Education is the foundation of the skills that citizens need to function and thrive in society, whether basic job qualifications or critical thinking skills. Thus, it is imperative for the future well-being of America and its people to replace high-stakes accountability tests.
