Last month, Stanford University’s Center for Research on Education Outcomes (CREDO) released a new study on urban charter schools, which purports to show, for the first time, that charters outperform city public schools, at least on standardized-test scores. If true, the study’s findings are a potential bombshell since, thus far, studies have shown no meaningful difference between charter and public schools.
The new study, Urban Charter School Study Report on 41 Regions, claims to show that “urban charter schools in the aggregate provide significantly higher levels of annual growth in both math and reading compared to their TPS [public school] peers.”
The years I’ve spent researching schools on the ground in charter-heavy districts like New Orleans and New York City made me skeptical of such an outcome. But because I am not an expert in research methodology, I decided to hire a respected statistician, Kaiser Fung, author of Numbersense and an adjunct professor of statistics at New York University who has no connection to the education-reform movement (and thus no axe to grind), to help me analyze the CREDO study.
After combing through the study and its accompanying technical document, and after exchanging a series of emails with Macke Raymond, Director of CREDO, we found significant problems with the CREDO study. The problems go well beyond technical quibbles and suggest that any generalizations drawn from the study about the quality of traditional public schools relative to charter schools would be a big mistake. In particular, the study does a poor job of explaining the basis on which it includes or excludes charter- and public-school students; an email exchange with Raymond clarified the study’s methodology, but also revealed that it introduced, in many cases, an anti-public-school bias. And, in at least one case—the findings on New Orleans, the first all-charter district in the country—Raymond admits that CREDO violated its own methodology, a fact not disclosed in either the study or its accompanying technical documents.
Let’s begin with a brief description of the study itself. The study analyzes data from 22 states, covering just over a million charter-school students, during the 2006/2007-to-2011/2012 school years. It seeks to measure the performance of charter-school students in 41 urban areas against that of students who attend “feeder” public schools in the same urban areas. The study includes about 80 percent of the charter students in the areas under study (the other 20 percent are excluded because CREDO could not find any matching public-school students).
The study also relies on a controversial methodology that the researchers used in past CREDO studies and that has been critiqued here and here and here. What’s important to know about the methodology is that it purports to compare each charter student in the study to a “virtual twin,” a composite of as many as seven public-school kids who attend “feeder” schools and who “match” the charter student on both demographics and test scores; the virtual twin is literally an averaged kid. The demographic criteria for creating each virtual twin include grade level, ethnicity, gender, Title 1 eligibility, special-education status and English-language-learner status.
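To make the “averaged kid” idea concrete, here is a minimal sketch in Python. It is not CREDO’s code: the function name, the sample scores, and the choice to average a single prior test score are illustrative assumptions. Only the cap of seven matched students comes from the description above.

```python
from statistics import mean

MAX_MATCHES = 7  # the study combines "as many as seven" public-school students


def virtual_twin_score(matched_scores):
    """Average the scores of up to MAX_MATCHES matched public-school students.

    Returns None when no match exists, mirroring the case in which a charter
    student is dropped from the study for lack of matches.
    """
    if not matched_scores:
        return None
    return mean(matched_scores[:MAX_MATCHES])


# Hypothetical scaled scores for five matched public-school students
twin = virtual_twin_score([610, 605, 612, 608, 615])
print(twin)
```

The point of the sketch is only that the twin is a composite: no real student necessarily has the twin’s score.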
In this post, I will not revisit the problems referenced above with CREDO’s virtual-twin methodology; rather I will focus on three major problems with this study:
First, the study excludes public schools that do NOT send students to charters, thus introducing a bias against the best urban public schools, especially small public schools that may send few, if any, students to charters. The study implies that the “virtual twins” are drawn from the general population of traditional public schools; specifically, it suggests that a school is considered a feeder if even a single student transferred to a charter during the study period. This is not the case. In our email exchange, Raymond explained that to qualify as a “feeder school” a public school must send at least five students to charter schools. The study never discloses this stricter, five-student-minimum criterion that public schools must meet to be included. (Nor does the study explain why it didn’t look at all “neighboring” public schools with comparable, charter-like demographics, whether or not they send kids to charters.)
To test my theory, I contacted two of the better Title 1 middle schools in New York City whose demographics I knew would mirror those of local charter schools to see if they meet the criteria that would qualify them as charter “feeder schools.” Global Technology Preparatory, in East Harlem, where most kids are black or Latino, estimates that it has sent only three of its graduating eighth graders to charter high schools over the last three years. West Side Collaborative, a ten-year-old school with similar demographics, hasn’t sent a single transfer student or graduate to a charter school, according to Jeanne Rotunda, the recently retired founding principal. Both schools received an “A” and a “B” on the last two graded report cards from the New York City Department of Education, and are given high marks for quality from parents, students and teachers. Yet, although both GTP and West Side have charter-like demographics and are in an area rich with charter schools, they would not count as feeders and, therefore, their students wouldn’t be included among the virtual twins in the CREDO study.
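The feeder rule can be sketched in a few lines of Python. This is a hedged illustration, not CREDO’s implementation: the function name and the simple count comparison are assumptions, and the time window over which CREDO counts transfers is not disclosed. The five-student threshold and the two schools’ counts are taken from the text above.

```python
FEEDER_MINIMUM = 5  # threshold Raymond described by email


def is_feeder(students_sent_to_charters, minimum=FEEDER_MINIMUM):
    """Return True if a school sends enough students to charters to qualify."""
    return students_sent_to_charters >= minimum


schools = {
    "Global Technology Preparatory": 3,  # school's own estimate, last three years
    "West Side Collaborative": 0,
}

for name, sent in schools.items():
    # Compare the five-student rule with a one-student rule
    print(name, is_feeder(sent), is_feeder(sent, minimum=1))
```

Under the five-student rule neither school qualifies. Under a one-student rule, Global Tech would qualify and West Side still would not.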
This raises a further problem: the study’s geographical filtering mechanism for determining which schools qualify as “feeders” isn’t disclosed, except for some qualitative description in the Technical Appendix. It would have been much more straightforward to rely on simple geographic or district demarcations. New York City, for example, neatly divides its schools into clearly defined neighborhoods, such as East Harlem North and East Harlem South, as well as distinct educational districts.
Global Tech and West Side Collaborative also highlight the ways in which CREDO’s matching criteria miss critical differences between public- and charter-school demographics. Urban public-school students are often poorer than their charter-school counterparts, and more likely to attend schools with large numbers of kids with special needs and English-language learners. They are also likely to have parents who are less engaged, for a variety of reasons, than those in charter schools, which target the most engaged families via everything from lotteries to requiring that parents attend a set number of open houses before they can even enter lotteries. These distinctions are not addressed by the CREDO study.
Second, in the case of New Orleans, the study compares charter students to virtual twins who go to school, not in New Orleans, but anywhere in Louisiana—a clear violation of the study’s feeder-school criteria, and one that isn’t disclosed in the study. In 2007, the first year of the study, 56 percent of New Orleans students were enrolled in charter schools; by 2012, the last year of the study, over 80 percent were enrolled in charter schools. Since each virtual twin is a composite of an average of five public-school student test scores, it seemed logical that there were not enough public-school students in New Orleans to meet the methodological requirements of the CREDO study. (See chart below) When I asked Raymond about this, she wrote: “In Nola, we use similar schools that operate in similar communities in Louisiana to provide our matches.”
Raymond also claims that New Orleans is the only city where she bent her own rules on drawing virtual twins from “feeder schools.” Yet a similar problem is likely to affect at least two other cities: Washington, D.C., and Detroit, where 40 percent and 47 percent, respectively, of students attend charter schools. Assuming that only 80 percent of the charter students are matched in each case (the other 20 percent are dropped from the study because of insufficient public-school matches), there are at most about 1.9 and 1.4 public-school students, respectively, as potential matches for each charter student, far fewer than the five-student average needed for virtual twins. Moreover, these estimates assume that every public-school student is a match for a charter-school student, which we already know to be false. Some public-school students don’t attend feeder schools, and others drop out of the matching pool for other reasons (e.g., CREDO’s methodology deletes public-school students who transfer to a charter or who are demographically dissimilar to charter-school kids).
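The upper-bound arithmetic can be checked with a short Python sketch. The function name is hypothetical, and the calculation rests on the simplifying assumptions stated above: every public-school student is treated as an eligible match, and 80 percent of charter students are matched.

```python
def max_public_matches_per_charter_student(charter_share, matched_fraction=0.8):
    """Upper bound on public-school students available per matched charter student.

    charter_share: fraction of all students enrolled in charters (e.g. 0.40).
    matched_fraction: fraction of charter students who are matched (assumed 0.8).
    """
    public_share = 1.0 - charter_share
    matched_charter_share = charter_share * matched_fraction
    return public_share / matched_charter_share


for share in (0.40, 0.47):
    ratio = max_public_matches_per_charter_student(share)
    print(f"{share:.0%} charter share: at most {ratio:.1f} public students per charter student")
```

A 40 percent charter share works out to roughly 1.9 public-school students per matched charter student, and a 47 percent share to roughly 1.4. Both fall well short of five.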
The reverse problem—too many potential public school student matches per charter student—plagues the study’s findings in cities like Boston where a very small percentage of students attend charter schools. See below.
Problem Three: Subjective decisions on which charter schools and public schools to include or exclude introduce a number of additional anti-public-school biases.
The study includes both selective and non-selective charter schools, but eliminates an undisclosed number of demographically similar public schools as per above, again introducing the potential for anti-public school bias. Among charter schools, the study eliminates only those that operate in “secure” settings, such as detention centers, as well as “charters that are permitted to use entrance exams (these occur only in two of the regions, a total of 4 schools, I believe),” writes Raymond in an email. “Other than these exceptions we include all TESTED charter students.” Again, this detail is not available in the study or its attendant technical documents.
While relatively high-quality schools like Global Tech and West Side Collaborative do not match the study’s “feeder” criteria, selective charter schools are included in the study. Take the two-tier charter school system of New Orleans, where about a dozen of the city’s charter schools are part of the Orleans Parish School Board, which includes many of the city’s most successful schools, most of which are selective. These schools do not typically give entrance tests that would disqualify them from the CREDO study. But they do require applicants to submit, among other things: test scores, school grades, and attendance records. Even kindergarten applicants are required to submit a record of their work: In the case of the Lake Forest school, these include “1 current student artwork sample, a self-portrait drawn by said student, and one student handwriting sample.” By way of partial explanation, Raymond writes: “The object is to create controls that mirror the range of charters, not the range of TPS [public schools.] We do not presume that the peers are a complete mirror of the entire TPS [public school] population.”
Says Fung: “This is exactly the reason why observers shouldn’t interpret the finding as representative of urban public schools.”
However, when a public-school student transfers to a charter, the entire record of that student is deleted from the virtual-twin control group. At the same time, she is eligible to be included in the study as a charter student. “This is doubly bad,” says Fung.
Here’s why: The public-school student who is transferring to a charter is presumably a good student. By deleting the record of this student during the time she was in public school, the study drags down the performance of the public-school matches during that period. Simultaneously, because this student can now be considered a charter student in future periods, and can be counted as part of the charter student population in future periods, she contributes to the performance of the charter students in the study.
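A toy example in Python shows the direction of this double effect. Every number here is invented for illustration, the example uses score levels rather than the growth measure CREDO reports, and it assumes the transferring student is a strong student who keeps her score after moving, which is the presumption stated above rather than an established fact.

```python
from statistics import mean

# Hypothetical scores; student "D" later transfers to a charter school
public_scores = {"A": 55, "B": 60, "C": 65, "D": 90}
charter_scores = {"X": 70, "Y": 75}

# Public-school comparison pool with and without the transferring student
public_mean_before = mean(list(public_scores.values()))
public_after = {k: v for k, v in public_scores.items() if k != "D"}
public_mean_after = mean(list(public_after.values()))

# Charter pool before and after she is counted as a charter student
charter_mean_before = mean(list(charter_scores.values()))
charter_mean_after = mean(list(charter_scores.values()) + [public_scores["D"]])

print(public_mean_before, public_mean_after)
print(charter_mean_before, round(charter_mean_after, 1))
```

In this toy case the public-school average falls once her record is deleted, while the charter average rises once she is counted there. The size of the effect in the real data is unknown.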
Conversely, if a student drops out of a charter school, he is eliminated from the study. When I asked Raymond why this didn’t artificially improve the scores of charter schools, because dropouts are likely to be among the weakest students, she answered: “Since we have very little test data on high schools, we actually don’t see if students drop out.”
Says Fung: Her answer amounts to “because we don’t have data, we don’t know if there is bias, and because we don’t know, we assume there isn’t”. The reality is the students are dropping out whether or not CREDO sees them. This is known as “survivorship bias”. All CREDO gets to see in the data are the “survivors”, students who have not dropped out of the system.
Imagine a clinical trial comparing cancer drug A and cancer drug B. You measure the increase in survival time of patients in each arm. Now, suppose each time someone dies, you drop them from the study. The problem is that the reason for the dropout is related to the outcome being measured (i.e. their survival time is so low that they died before the end of the study).
Similarly, if the students who drop out of charter schools are those who perform worse, then by letting these students drop out of the study, CREDO introduces a bias in the selection of the charter-student population.
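Survivorship bias is easy to see in a toy calculation. The Python sketch below uses invented scores and assumes, as the argument above does, that the weakest students are the ones who leave; it says nothing about how many students actually drop out of charter schools.

```python
from statistics import mean

cohort = [40, 50, 60, 70, 80]  # hypothetical scores for five charter students
dropouts = 2                   # assume the two weakest leave before the next test

# Only the "survivors" remain visible in the data
survivors = sorted(cohort)[dropouts:]

full_mean = mean(cohort)
survivor_mean = mean(survivors)

print(full_mean, survivor_mean)
```

No student improved, yet the measured average rises simply because the lowest scores disappeared from the data.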
Thus, the CREDO study appears to include the “cream” of the charter schools—the selective schools—while excluding the best public schools even among those that serve students who are demographically similar to those of nearby charter schools. At the same time, Raymond doesn’t acknowledge that creaming takes place among even non-selective charter schools in cities like New Orleans. Writes Raymond, in response to my question about creaming: “It is not widely acknowledged that there is cream skimming in Nola. With 90 percent of the students attending charter school it seems infeasible the cream skimming would occur.”
This despite a recent study, by the Educational Research Alliance at Tulane University, in which principals of New Orleans charter schools admit that they respond to market pressures (in particular, competition for scarce funds that come to schools with the highest test scores) by engaging in “actively selecting or excluding particular types of students.” Such clandestine selectivity may, indeed, contribute to large and under-counted dropout rates in New Orleans.
In addition, while CREDO says it uses an average of five public-school students, and a maximum of seven, to form a virtual twin, it never explains what proportion of public-school students in each region are included in the study, a problem that can, again, introduce bias.
In cities with small charter-school populations, such as Boston, where only 13 percent of students attend charter schools, there are too many potential public-school student matches, requiring CREDO to make additional judgments, which again are not explained in the methodology, about which public-school kids to include. Here’s why: Since only 13 percent of Boston students attend charter schools, there are far more than five potential matches for each virtual twin (see chart); thus, CREDO needs to use an additional screening mechanism. If five matches are randomly selected from among all public-school students in Boston, then large schools will predominate, introducing a large-school bias. In the most extreme cases (i.e., cities with tiny charter-school populations, such as El Paso, where less than 5 percent of the student population attends charter schools), there are many, many eligible matches per charter student (see Kaiser’s chart).
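The large-school effect follows directly from proportional sampling, as this Python sketch shows. The school names and enrollment figures are hypothetical, and uniform random selection is only one possible screening rule, since CREDO’s actual rule is not disclosed.

```python
# Hypothetical counts of eligible matching students at three public schools
eligible_by_school = {"Large": 1500, "Medium": 300, "Small": 200}

total = sum(eligible_by_school.values())
matches_needed = 5  # average number of students per virtual twin

# With uniform random selection, each school's share of draws equals its
# share of the eligible pool
draw_share = {school: n / total for school, n in eligible_by_school.items()}
expected_matches = {school: matches_needed * p for school, p in draw_share.items()}

for school in eligible_by_school:
    print(school, draw_share[school], expected_matches[school])
```

In this toy pool, the large school supplies three quarters of the draws, so its students dominate the virtual twins regardless of how the smaller schools perform.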
Further, while the study matches the test scores of, on average, five public-school students to each charter student, not all the scores are exact matches. Yet the study never discloses either the percentage of “inexact” matches or whether these inexact matches are on the high or the low side. If the majority of “inexact” matches are below those of their charter twins, it would not be surprising that, by the end of the study, the public-school virtual twins would still be testing lower than their charter counterparts. (The converse, of course, is also true. But we don’t know, because the study doesn’t say!)
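The two undisclosed quantities, the share of inexact matches and the direction of the mismatch, are simple to compute once the matched scores are known. The Python sketch below uses invented scores for one charter student and five matched public-school students; it illustrates the kind of statistic the study could have reported and is not a reconstruction of CREDO’s data.

```python
from statistics import mean

charter_baseline = 600                        # hypothetical starting score
match_baselines = [600, 598, 597, 600, 595]   # hypothetical matched public-school scores

# Signed gap: negative means the matched student starts below the charter student
gaps = [m - charter_baseline for m in match_baselines]

inexact_share = sum(g != 0 for g in gaps) / len(gaps)
mean_gap = mean(gaps)

print(inexact_share, mean_gap)
```

Here three of five matches are inexact and the virtual twin starts two points behind, a gap that exists before any schooling effect is measured.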
Says Fung: “There is no clear statement from CREDO as to how they select matches under this scenario, and they do not control for school variability.”
Fung found numerous other problems, some of them technical, which I will not elaborate on here. However, the examples above are more than enough to cast serious doubt on the study’s conclusions. And this, even without further challenging two key assumptions behind the study: A) that standardized-test scores are an adequate measure of school quality and B) that creaming in charter schools does not exist.
Finally, readers of the CREDO report are likely to think the public-school matches are representative of public-school students in general (which Raymond herself said was not the intention), and to think that somehow a finding here, even if one were to accept CREDO’s methodology, can be used to advocate expanding charter schools.
Unfortunately, the myriad problems with this study have not stopped many charter advocates and even some respected journalists from blindly accepting the study’s findings.