The challenges regarding data production

César Guadalupe
Universidad del Pacífico





Abstract: This article identifies some major issues regarding the generation and availability of data on youth and adult education. It discusses the production of data on programmes and enrolment, as well as financing and issues related to the identification of the target populations. A special focus is set on the realm of testing skills and competences, given their complexity as well as the paramount profile and importance that they have acquired in recent years. The article stresses the fact that technical attributes of data generation endeavours are contingent upon their purposes. A non-technocratic and more politically-sensitive approach to data generation is promoted.

Systematic data on youth and adult education are scarce, and their quality is often not properly documented or known. This is somewhat typical of a field that is too often overlooked, including in educational policies.

Apart from neglect, there might be other factors preventing the development of a body of evidence that can play a role not only in the analysis and debates on youth and adult education, but also in raising its profile.

This article looks at four areas in which data are scarce or not systematic: (i) programmes and enrolment data; (ii) learning outcomes; (iii) population to be served; and (iv) financing.

Each of these areas faces particular issues. Understanding them will enable stakeholders to better align their data-related activities by having an organised frame of reference.

Educational programmes and enrolment

Youth and adult education programmes are highly diverse in the way they are organised and delivered, as well as in which agencies are involved in organising and delivering them. The purposes that they aim to serve are equally diverse. As a result, the world of youth and adult education is much more difficult to grasp than that of regular school programmes for children.

We can nevertheless sort youth and adult education programmes into three main categories: (i) programmes that are equivalent to those recognised as “official” in the country, that is leading towards official certifications that open up opportunities for further studies in the formal education system; (ii) programmes with specific purposes that serve specific needs and do not lead to formal certifications; and (iii) a combination of the first two.

The first group of programmes can be mapped into the national qualifications structure, and into the International Standard Classification of Education (UNESCO/UIS 2013). The organisation of enrolment and graduation records should facilitate that transfer. The second group of programmes represents a different challenge, since there is no need to map them into formal tracks, unless a standard compilation of data is needed for some specific purposes. In this case, a flexible classification scheme that acknowledges the nature of these programmes (as continuing education not leading to formal certificates) is called for.

In both cases (as well as in those cases that combine the two), agreement among stakeholders is needed. Some might need to accept that data generation might not have the exact form that they would like it to, but that there is a more general benefit stemming from being able to portray a comprehensive and reliable map of youth and adult education programmes in general.

As soon as the previous issue is sorted out, we need to focus on recording enrolment information. Here we must differentiate between two distinct observational units: (i) individuals and (ii) units of service (individuals served by a given programme). A typical issue stems from confounding these two things: An individual can be registered in more than one programme, and this individual should therefore be counted as one (if the focus is placed on headcounts), but there might be many units of service. Adding up participants in different programmes does not necessarily yield a total number of participants as individuals. The only exception is when enrolment in a given programme precludes enrolment in another programme in a given period of time.

Now, recording units of service entails a risk of counting programmes that are extremely diverse as if they were equivalent. For instance, a six-hour programme should not count as equivalent to a one-semester part-time programme. This is especially true if one is interested in acquiring information on financing and allocation of resources. Using some equivalent unit such as credit hours is one way to sort out this problem.

It would therefore ultimately be possible to report participants (as headcounts), as well as units of service in equivalents of participant hours/days/credits.
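The distinction between headcounts, units of service and participant hours can be illustrated with a minimal sketch. The record layout, the identifiers and the figures below are hypothetical and serve only to show why adding up programme participants does not yield a number of individuals.

```python
from collections import namedtuple

# Hypothetical enrolment records: one row per registration of a person in a programme.
Enrolment = namedtuple("Enrolment", ["person_id", "programme", "hours"])

records = [
    Enrolment("p1", "literacy course", 120),
    Enrolment("p1", "first-aid workshop", 6),
    Enrolment("p2", "literacy course", 120),
    Enrolment("p3", "first-aid workshop", 6),
]

def summarise(records):
    """Return (headcount, units of service, participant hours)."""
    headcount = len({r.person_id for r in records})    # distinct individuals
    units_of_service = len(records)                     # registrations
    participant_hours = sum(r.hours for r in records)   # common unit of volume
    return headcount, units_of_service, participant_hours

headcount, units, hours = summarise(records)
print(headcount, units, hours)  # 3 4 252
```

Here three individuals account for four units of service, and the six-hour workshop weighs far less than the 120-hour course once both are expressed in participant hours.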

Learning outcomes

Measuring learning outcomes is probably the most difficult and most debated area of data generation in education. As in any information generation endeavour, the key question is how to identify, from the very beginning, the purpose(s) that the data generation effort is going to serve. If comparison (in time, across groups) is important (to identify progress, or gaps), that purpose should be ensured in every step along the way. This includes the way in which measuring instruments are built and administered.

Programmes are usually structured to help teachers or facilitators provide a comprehensive assessment of participants’ progress and achievements. Those assessment efforts are necessarily focused on each particular setting, and therefore mobilise different criteria. This makes it difficult to generate aggregated data that is meaningful beyond a simple (and not very specific) counting of those who pass/fail/complete a programme. We are left without any certainty with regard to the actual competences that attendees have developed. It is also difficult to track progress over time when the actual criteria used for assessing might change (if one wants to measure change, it is important not to change the measure).

It follows from these last points that standardised measures of skills and competences might be needed. However, testing competences is a complex task that poses challenges pertaining to several aspects, including validity and reliability issues. In this regard, it is important to pay due attention to the complexity of testing (American Educational Research Association et al. 2014), and to be wary of proposals that offer a cheap, quick fix to a complex problem.

For instance, a test can be designed in such a way as to rank individuals (differentiating between those who perform better/worse than others, regardless of the performance level that they achieve; this is what is usually known as a norm-referenced test), or to identify how they perform as compared to an expectation explicitly stated as a sort of standard or performance level (this is known as a criterion-referenced test), or a combination of those (Glaser 1963). This has major consequences for deciding which questions (items that include a stimulus and a question or task) are included in a particular test.
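The two readings of a test result can be contrasted with a small sketch. It illustrates only how scores are interpreted, not how items are selected when a test is designed; the names, scores and cut score are hypothetical.

```python
# Hypothetical raw scores on the same test.
scores = {"Ana": 12, "Luis": 18, "Rosa": 25, "Juan": 9}
CUT_SCORE = 20  # hypothetical performance standard

def norm_referenced(scores):
    """Rank individuals relative to each other (1 = highest score)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}

def criterion_referenced(scores, cut_score):
    """Classify each individual against a fixed, explicit standard."""
    return {name: score >= cut_score for name, score in scores.items()}

ranks = norm_referenced(scores)
meets_standard = criterion_referenced(scores, CUT_SCORE)
print(ranks["Luis"], meets_standard["Luis"])  # 2 False
```

In this example Luis ranks second in the group, yet does not reach the stated standard: the two approaches answer different questions.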

A test should also be able to properly represent what it claims to measure (“construct validity”). It should be able to capture the main components of that construct (“content validity”). It should also be valid in relation to a particular observable behaviour that it is intended to describe (“concurrent validity”) or anticipate (“predictive validity”). Finally, it is important that when designing a measurement mechanism, attention is given to the consequences (“consequential validity”) that it can have on the social setting in which it operates (Zumbo & Hubley 2016).

The last element also points towards considering the overall institutional context and conditions under which a particular test is designed, administered and used. Data can be mobilised for different purposes, including some controversial political ends (Gorur 2015, 2017; Grek 2009, 2015; Guadalupe 2017; Hamilton 2012).

We also need to consider the way in which data will be processed and analysed. Current testing practices tend to rely on mathematical models grouped under the label Item Response Theory (Baker 2001; Hambleton & Jones 1993; Hambleton et al. 1991). This approach allows for a more precise way of addressing the actual attributes that individual questions (items) have when applied to a given population. It therefore makes it possible to identify differences in how different populations respond to particular questions, which might affect whether some items can be used to produce reliable, comparable data (Zumbo 1999, 2007).
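As an illustration, one widely used model in this family is the two-parameter logistic model, in which the probability of a correct answer depends on the person's ability and on the item's discrimination and difficulty. The sketch below is not a calibration procedure; the parameter values are made up, and the two "groups" are hypothetical. It only shows what it means for the same item to behave differently in two populations.

```python
import math

def prob_correct(theta, a, b):
    """Two-parameter logistic model: probability that a person with ability
    theta answers correctly an item with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Made-up parameters for one item, as if calibrated separately in two groups.
theta = 0.0                                     # same ability in both groups
p_group_a = prob_correct(theta, a=1.2, b=0.0)   # item difficulty in group A
p_group_b = prob_correct(theta, a=1.2, b=0.5)   # same item appears harder in group B

print(round(p_group_a, 3), round(p_group_b, 3))
```

When people of equal ability in different groups have different probabilities of answering an item correctly, the item is said to function differentially, which is one reason it might be unsuitable for comparisons across those groups.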

Finally, if a particular test is going to be administered to individuals from different cultural and linguistic backgrounds, some specific issues arise in relation to the translation and adaptation of tests (Hambleton 2005; Hambleton et al. 2005).

The target population

Youth and adult education programmes are increasingly important in a world that is becoming progressively aware that education and learning take place along the whole course of life. This breadth frequently makes it difficult to define the target population to be served, which in turn makes it impossible to properly estimate the coverage of these programmes beyond a simple measure of the “number of participants”.

A first way of addressing this topic requires differentiating according to the intentionality of the programmes: (i) programmes that have a remedial component in relation to failure to complete compulsory education, and (ii) programmes that go beyond remedial purposes.

It is clear that the first group of programmes should have a definite target population: those who did not complete (or even start) compulsory schooling when they were supposed to. Household survey data can be used to estimate this segment of the population (Guadalupe et al. 2016; Guadalupe & Taccari 2004; UNESCO Santiago 2004), and these estimates are of paramount importance to avoid a trend towards self-complacent practices that are too focused on what we do, and neglect what we have to do. At the same time, estimates of the number of people who did not complete compulsory schooling might underestimate the need for remedial programmes since, unfortunately, many people complete schooling without developing the competences and skills that they should. Estimating this additional need can be done by surveying the distribution of competences among adults.

For non-remedial programmes, there is no clear or precise way of identifying a target population; thus, coverage can be estimated only as a proportion of the whole youth and adult population.
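The two coverage measures discussed in this section can be sketched as follows. The survey records, the expansion weights, the participant figures and the minimum age of 15 are all hypothetical assumptions made for illustration; an actual estimate would need to respect the sample design of the household survey in question.

```python
# Hypothetical household survey microdata: each respondent carries an expansion
# weight (number of people represented) and a completion flag.
survey = [
    {"weight": 950.0, "age": 34, "completed_compulsory": False},
    {"weight": 1100.0, "age": 52, "completed_compulsory": True},
    {"weight": 1020.0, "age": 19, "completed_compulsory": False},
    {"weight": 980.0, "age": 41, "completed_compulsory": True},
]

def estimate_populations(survey, min_age=15):
    """Weighted totals: all youth and adults, and those who did not
    complete compulsory education (min_age is an assumed threshold)."""
    in_scope = [r for r in survey if r["age"] >= min_age]
    total = sum(r["weight"] for r in in_scope)
    target = sum(r["weight"] for r in in_scope if not r["completed_compulsory"])
    return total, target

def coverage(participants, population):
    """Participants (as headcount) as a share of the reference population."""
    return participants / population if population else float("nan")

total, remedial_target = estimate_populations(survey)
remedial_coverage = coverage(400, remedial_target)   # remedial programmes
non_remedial_coverage = coverage(300, total)         # all other programmes
print(total, remedial_target)  # 4050.0 1970.0
```

The denominator differs by type of programme: the estimated population lacking compulsory education for remedial programmes, and the whole youth and adult population otherwise.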

Collecting data on financing

This is probably the most problematic area, both because of the diversity of ways in which information is recorded in governmental sources and because of the huge practical challenges entailed in trying to compile organised and systematic data from non-governmental sources. Agreeing on standard definitions of major components (current expenditure as opposed to investments; salaries as distinct from other current expenses; overheads or administrative costs) is not always easy.

At the same time, information on finance should be read against some sort of benchmark that would provide indications of the level of sufficiency of the resources invested. Establishing a benchmark is difficult (UNESCO Santiago 2007) since it requires one to have a clear estimation of needs (which are diverse, so that addressing them involves diverse costs). We must also disregard the oversimplifications that have populated the world of education for decades, such as establishing a magical (impossible to sustain) fixed percentage of something (production, public expenditure, etc.) that is presented as applicable everywhere (as if diversity did not exist) and for long periods of time (as if change did not exist), in a world where diversity and change are the rule.

The next move is yours

This article is a quick summary of the major issues pertaining to the realm of data generation in youth and adult education. Data generation (not collection, since data is not a natural element that can be collected like berries, but consists of social constructions based on concepts, interests, ideas, etc.) cannot be taken as a simple issue or as something purely technical, void of political and ideological elements. On the contrary, decisions on what data should be produced, how that data should be generated, compiled, analysed and reported, are fundamentally grounded on the purposes and agendas that a particular agent wants to advance (Guadalupe 2015). Therefore, a substantive and explicit definition of purpose(s) is the cornerstone of any data generation endeavour.

At the same time, the previous point should not be used as an alibi to justify any decision regarding data: There are specific complexities and attributes that need to be properly factored into any data generation effort if sound, useful data are to be produced and reported. Cheap and quick “fixes” usually disregard the scientific properties that sound data must have and, therefore, undermine their usability. It is usually better not to have any data and be aware of this lack of evidence than to have poor data and assume that we have something on which we can rely. The first situation leads to careful action (including addressing the information gap), while the second leads to mistakes that affect people’s lives.


References

American Educational Research Association, American Psychological Association & National Council on Measurement in Education (2014): Standards for educational and psychological testing. Washington, DC: AERA.

Baker, F. B. (2001): The basics of item response theory (2nd ed.). USA: ERIC Clearinghouse on Assessment and Evaluation.

Glaser, R. (1963): Instructional technology and the measurement of learning outcomes: Some questions. In: American Psychologist, 18(8), 519–521.

Gorur, R. (2015): Assembling a Sociology of Numbers. In: Hamilton, M.; Maddox, B. and Addey, C. (eds.): Literacy as Numbers: Researching the Politics and Practices of International Literacy Assessment Regimes, 1–16. Cambridge: Cambridge University Press.

Gorur, R. (2017): Towards productive critique of large-scale comparisons in education. In: Critical Studies in Education, 58(3), 1–15.

Grek, S. (2009): Governing by Numbers: The PISA “Effect” in Europe. In: Journal of Education Policy, 24(1), 23–37.

Grek, S. (2015): Transnational education policy-making: international assessments and the formation of a new institutional order. In: Hamilton, M.; Maddox, B. and Addey, C. (eds.): Literacy as numbers: researching the politics and practices of international literacy assessment, 35–52. Cambridge: Cambridge University Press.

Guadalupe, C. and Taccari, D. (2004): Conclusión Universal de la Educación Primaria: ¿cómo evaluar el progreso hacia esta meta? Santiago de Chile: UNESCO.

Guadalupe, C. (2015): Contar para que cuente: una introducción general a los sistemas de información educativa. Lima: Universidad del Pacífico.

Guadalupe, C.; Castillo, L. E.; Castro, M. P.; Villanueva, A. and Urquizo, C. (2016): Conclusión de estudios primarios y secundarios en el Perú: progreso, cierre de brechas y poblaciones rezagadas (Documentos de Discusión No. DD1615).

Guadalupe, C. (2017): Standardisation and diversity in international assessments: barking up the wrong tree? In: Critical Studies in Education, 58(3), 326–340.

Hambleton, R. K. et al. (1991): Fundamentals of Item Response Theory. Newbury Park, London, New Delhi: Sage.

Hambleton, R. K. and Jones, R. W. (1993): An NCME Instructional Module on Comparison of Classical Test Theory and Item Response Theory and their Applications to Test Development. In: Educational Measurement: Issues and Practice, 12(3), 38–47.

Hambleton, R. K. (2005): Issues, Designs and Technical Guidelines for Adapting Tests Into Multiple Languages and Cultures. In: Hambleton, R. K.; Merenda, P. and Spielberger, C. (eds.): Adapting educational and psychological tests for cross-cultural assessment, 3–38. Mahwah, NJ: Lawrence Erlbaum Associates.

Hambleton, R. K.; Merenda, P. and Spielberger, C. (eds.) (2005): Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum Associates.

Hamilton, M. (2012): Literacy and the Politics of Representation. Oxon: Routledge.

UNESCO Santiago (2004): La conclusión universal de la educación primaria en América Latina: ¿Estamos realmente tan cerca? Santiago de Chile: UNESCO.

UNESCO Santiago (2007): Educación de calidad para todos: un asunto de Derechos Humanos. Santiago de Chile: UNESCO.

UNESCO/UIS (2013): International Standard Classification of Education. ISCED 2011. Montreal: UNESCO/UIS.

Zumbo, B. (1999): A Handbook on the Theory and Methods of Differential Item Functioning (DIF): Logistic Regression Modeling as a Unitary Framework for Binary and Likert-type (Ordinal) Item Scores. Ottawa: Directorate of Human Resources Research and Evaluation, Department of National Defense.

Zumbo, B. (2007): Three Generations of DIF Analyses: Considering Where It Has Been, Where It Is Now, and Where It Is Going. In: Language Assessment Quarterly, 4(2), 223–233.

Zumbo, B. and Hubley, A. M. (2016): Bringing consequences and side effects of testing and assessment to the foreground. In: Assessment in Education: Principles, Policy & Practice, 23(2), 299–303.

About the author

César Guadalupe holds a Doctorate in Education (Sussex) and an M.A. in Social and Political Thought (Sussex), and is a sociologist (PUCP). He is a Senior Lecturer-Researcher at the Universidad del Pacífico (Peru). Previously he served for eleven years at the UNESCO Institute for Statistics, and UNESCO/Santiago. He is a member of the Peruvian National Education Council (2014–2020) and its current Chair (2017–2020).