CUP&A Public Research Dataset Releases

The Cambridge Multiple-Choice Questions Reading Dataset

Paper: The Cambridge Multiple-Choice Questions Reading Dataset 

Description: The Cambridge Multiple-Choice Questions (MCQ) Reading Dataset is a comprehensive dataset that consists of test-taker responses to 4-option multiple-choice reading comprehension tasks, segmented by varying proficiency levels. 

The dataset consists of 120 4-option MCQ multi-item reading tasks. Almost half the tasks (58 in total) target Common European Framework of Reference for Languages (CEFR) B2 proficiency level, with the full set ranging between CEFR B1 and C2 level.   

In multiple-choice tasks, the facility and discrimination values at the option level offer valuable insights. In this dataset, 78 out of 120 tasks include option-level values, which indicate how well an item is performing overall and pinpoint specific areas within the item that may be causing under-performance. 
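This page does not state how the option-level values were calculated. As an illustration only, the sketch below uses common classical test theory definitions, which are assumptions here: facility is the proportion of test-takers who chose an option, and discrimination is the point-biserial correlation between choosing that option and the total test score. The function name `option_stats` and the sample data are hypothetical and do not reflect the dataset's actual format.

```python
from statistics import mean, pstdev

def option_stats(responses, total_scores, options="ABCD"):
    """Return {option: (facility, discrimination)} for a single item.

    responses:    option chosen by each test-taker, e.g. ["A", "C", ...]
    total_scores: each test-taker's overall test score, in the same order
    Definitions are assumed (classical test theory), not taken from the dataset.
    """
    n = len(responses)
    score_mean = mean(total_scores)
    score_sd = pstdev(total_scores)  # population standard deviation
    stats = {}
    for opt in options:
        chose = [1 if r == opt else 0 for r in responses]
        p = sum(chose) / n  # facility: share of test-takers choosing this option
        if 0 < p < 1 and score_sd > 0:
            mean_chose = mean(s for s, c in zip(total_scores, chose) if c)
            # Point-biserial correlation between choosing opt and total score
            disc = (mean_chose - score_mean) / score_sd * (p / (1 - p)) ** 0.5
        else:
            disc = None  # undefined when nobody or everybody chose the option
        stats[opt] = (p, disc)
    return stats

# Hypothetical responses to one item whose correct answer is "A"
responses = ["A", "A", "B", "C", "A", "D"]
total_scores = [9, 8, 3, 4, 7, 2]
stats = option_stats(responses, total_scores)
```

Under these assumed definitions, a well-functioning item would typically show a positive discrimination for the correct option and negative values for the distractors; a distractor with positive discrimination is attracting stronger test-takers and may point to the part of the item that is under-performing.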

Publication date: 2023

Keywords: Cambridge University Press & Assessment, Common European Framework of Reference for Languages, CEFR, Reading comprehension, multiple-choice 

Authors and Contributors: Cambridge University Press & Assessment (2023) The Cambridge Multiple-Choice Questions Reading Dataset. See dataset release paper for contributors.

Citing this paper: Mullooly, A., Andersen, Ø., Benedetto, L., Buttery, P., Caines, A., Gales, M. J. F., Karatay, Y., Knill, K., Liusie, A., Raina, V., & Taslimipoor, S. (2023). The Cambridge multiple-choice questions reading dataset. Cambridge University Press & Assessment.

You may publish the results of research using this dataset.  In any such publication you must acknowledge use of the dataset in your research by citing Cambridge University Press & Assessment and the Authors and Contributors as shown. 

We ask you to inform us of any such publications by emailing:   

Please report any issues or problems in downloading the dataset by emailing:


Licence Agreement 

  1. By downloading this dataset and licence, this licence agreement (the “Agreement”) is entered into, effective this date, between you (the “Licensee”), and the Chancellor, Masters and Scholars of the University of Cambridge acting through its department Cambridge University Press & Assessment (the “Licensor”).


  2. Copyright of the entire licensed dataset is held by the Licensor. No ownership or interest in the dataset is transferred to the Licensee, nor shall the Licensee have any rights in the dataset other than the right to use the dataset in accordance with this Agreement.


  3. The Licensor hereby grants the Licensee a non-exclusive non-transferable right to use the licensed dataset for non-commercial research and educational purposes only. The Licensee shall not sub-license or assign the benefit or burden of this Agreement in whole or in part.


  4. Non-commercial purposes exclude without limitation any use of the licensed dataset or information derived from the dataset for or as part of a product or service which is sold, offered for sale, licensed, leased or rented.


  5. The Licensee shall expressly acknowledge and reference the Licensor when making use of the licensed dataset in all publications of research based on it, in whole or in part, through citation of the paper at the top of the dataset details page.


  6. The Licensee may publish excerpts of less than 100 words from the licensed dataset pursuant to clause 3.


  7. The Licensor grants the Licensee this right to use the licensed dataset "as is". Licensor does not make, and expressly disclaims, any express or implied warranties, representations or endorsements of any kind whatsoever. The Licensor has no liability for any loss or damage whatsoever sustained by Licensee as a result of the availability or use of or reliance on the dataset.


  8. The Licensor shall not be liable for any indirect or consequential loss or damage or for any loss of or corruption of data, loss of programs, profit or goodwill (whether direct or indirect) arising out of or in connection with the access, availability, use of or reliance on the dataset.


  9. The Licensee shall indemnify and hold the Licensor harmless against any loss or damage which it may suffer or incur as a result of the Licensee’s breach of any terms of this Agreement.


  10. This Agreement constitutes the entire agreement between the parties and supersedes any previous agreement between the parties relating to its subject-matter. Each party acknowledges and agrees that, in entering into this Agreement, it does not rely on, and shall have no remedy in respect of, any statement, representation, warranty or understanding (whether negligently or innocently made) other than as expressly set out in this Agreement.


  11. This Agreement shall be governed by and construed in accordance with the laws of England and the English courts shall have exclusive jurisdiction.



You may download this dataset if you agree to the licence terms above and complete the following registration form.  Publications using this dataset must acknowledge and reference Cambridge University Press & Assessment as the source of the data.



Registration form
