Example sentences with "correct answers"

Date: 2022-08-09 10:08:40 Category: Poetry and Articles Source: 原由网

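Chapter 6, The role of large-scale assessment in learning, is the most technical chapter in the book, and most frontline teachers (myself included) will probably find it harder to read and understand. After yesterday's first reading, only two group members had shared their reading notes and reflections by 11 p.m. We do not necessarily need to master the material, but we must try as hard as we can to understand it. Zhang Haihui's approach of distilling what she sees as the key and difficult points of each section is worth borrowing and is an effective way to learn (comments and feedback are attached below).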

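My suggested discussion questions for this chapter are as follows:

1. Translate Dewey's famous sentence: “Were all instructors to realize that the quality of mental process, not the production of correct answers, is the measure of educative growth something hardly less than a revolution in teaching would be worked.” How do you understand the use of tense in this sentence? How does the quotation relate to this chapter? What does it imply for our teaching?

2. Compile the important concepts in this chapter and look them up in a specialist dictionary, such as the 《语言测试词典》 (Dictionary of Language Testing) published in China by 外研社 (FLTRP), which we recommended earlier.

3. Use the basic process of assessment (p. 78) to evaluate our own practice. Which steps are done inadequately or are missing altogether? How can they be improved?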

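4. Download and read Broadfoot & Black (2004), Redefining assessment? Discuss how you understand their statement: “Educational assessment must be understood as a social practice, an art as much as a science, a humanistic project with all the challenges this implies” (p. 79). I strongly recommend Professor Li Xiaoju's 《语言测试科学与艺术》 (The Science and Art of Language Testing; Hunan Education Press, 2001 edition).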

5. Translate Messick's (1989) definition of validity: “an overall evaluative judgment of the degree to which evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores” (p. 86), and discuss how you understand it.

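6. Reread the chapter's introduction to Weir's (2005) socio-cognitive validity framework (pp. 88-89). I recommend reading my application for a National Social Science Fund key project, "An evidence-based comparative study of the validity of the College English Test (CET-4/6), IELTS and TOEFL", and then sharing your understanding and what you gained from it.

A gift for the 2017 Lantern Festival: sharing a National Social Science Fund key project application

Sharing a National Social Science Fund project design case (key project application attached)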

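7. Do you agree with the four strengths of large-scale testing (p. 90)? Describe how your school, or you yourself, use the results and data of the relevant large-scale language tests in China.

8. Have you ever read the relevant syllabus carefully and in full (for example, the new senior high school curriculum standards, the College English curriculum standards, or the National Standards for Teaching Quality of Undergraduate English Majors)? If not, find and read it, and share what you gained from reading it.

9. Have you ever read carefully and in full the test specifications of China's large-scale English tests (for example, the Gaokao English test, CET-4/6, TEM-4/8, or the postgraduate entrance English examination)? If not, find and read them, and share what you gained from reading them.

10. Look up authoritative literature on score interpretation and score reporting for China's large-scale language tests, and authoritative literature on the history of their development. Share the literature in the group, along with what you gained from reading it.

11. Visit the official websites of large-scale language tests in China and compare them with the official websites of large-scale language tests abroad (for example, IELTS and TOEFL). What are the similarities and differences?

Rereading the chapter this morning, giving feedback on everyone's responses and drafting the questions for this chapter took four hours in total (4:30-8:30).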


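A little after 8, Fudou appeared bleary-eyed at the door of Grandma's study, said she had woken up and asked me to dress her. I took her to the toilet, laid her on the tatami and said: Grandma will be finished very soon, lie down for a little while first. A minute later I heard her snoring soundly...

The questions for each chapter are demanding, and I hope they genuinely push everyone to read and think. After a month of high-intensity training, everyone should be able to find, read and critique relevant literature independently, especially high-quality literature.

My favourite line from Dewey is “The goal of education is to enable individuals to continue their education” (Dewey 1916: 100). We should throw ourselves into reading and self-improvement as wholeheartedly as we do into raising children. It will not take five or ten years; within three years the results will be plain to see!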


3.17 DAY 17 Compilation of group members' responses to the discussion questions

(Lin Yuhong, as of 2020.3.17 23:00)

Chen Yanqing

1. On the Dewey quotation:

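In the end-of-term and final examinations for university courses, how many instructors really treat “the quality of mental process... is the measure of educative growth” as their standard? From now on I will try hard to put this into practice. (Well done)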

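2. I strongly agree with the authors that “learning is a process as much as an outcome...” (p. 78). Learning is not only about the result; the process matters more, and the process also needs to be assessed. In formative assessment, how can differences in the quality of the learning process be brought out? (See the next chapter)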

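3. (p. 88) In the second paragraph the authors say “tests can be either reliable or valid but not both” (this explains the British and American testing traditions cited just before it and is not the authors' own view; the key point is in the “But...” that follows). I do not really understand this. What is the situation in large-scale testing? Does reliability really take priority? Why would it be impossible to achieve both validity and reliability? What about our TEM-4/8 and CET-4/6?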

4. In Figure 6.2, what kind of response does the “response” shown from top to bottom refer to? (The test takers' responses) The authors do not explain.

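5. The title of Section 6.7 is “Large-scale assessment: evidence of and for learning”, but I feel the authors mostly talk about the strengths of large-scale testing (that is exactly the “for learning” part, and it is the key point, one that we have not paid enough attention to before; that large-scale testing provides evidence of learning goes without saying), such as how summative tests can promote learning, and do not discuss the evidence itself in any detail. Why not? Or is it that I have not understood?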

Zhang Haihui

Dear Professor Gu,

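1. The Dewey quotation that opens the chapter: "If only all educators could realise that the quality of the thinking process, rather than the production of correct answers, is the measure of educational progress; its value would be no less than a revolution in education." From this we can see that the focus of education lies in developing learners' thinking, or cultivating their abilities, and not merely in mastering specific knowledge. This requires raising educators' awareness and changing their thinking: only when educators recognise the issue can the idea really be put into practice and really promote learning. I agree with this emphasis on cultivating thinking ability, but the road is long and the responsibility heavy, and changing educators' thinking is the most important step of all. In some ways this resembles the guided reading activity we are doing now: on the one hand it concerns how we understand the form of learning, and on the other how we think about the content we are learning. For a young university teacher like me, all of this is a first, but its influence is sure to be far-reaching. So in my future teaching I will work to broaden my thinking, consider questions from multiple angles and dimensions, and apply what I have learned in practice, in the service of teaching. (Excellent)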

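2. Summaries of this chapter collected from the literature

Gu & Li (in Chinese): Chapter 6 introduces large-scale testing as practised by examination bodies such as Cambridge Assessment English, and discusses a range of important conditions for ensuring the reliability and validity of large-scale examinations, such as criterion reference in proficiency testing, scale construction, item response theory for estimating item difficulty, and item banking for test construction. All of these help to validate the positive impact of LOA in enhancing learning.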

Li & Gu: Chapter 6 presents large-scale assessment as practiced by examination bodies such as Cambridge Assessment English. A range of important requirements to ensure validity and reliability are discussed, such as criterion reference in proficiency testing, scale construction, item response theory for item difficulty estimation, item banking in test construction. These are helpful in validating the model of learning progression and in promoting positive impact on learning.

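Li Liang: Chapter 6 goes beyond the traditional dichotomy between formative and summative assessment, analyses the role that large-scale testing plays in learning, and examines its validity and reliability. Taking the socio-cognitive approach as its framework, the chapter discusses test validity in terms of construct validity, context validity, consequential validity and criterion-related validity. Test tasks should be designed on the basis of criterion-referenced scales and Item Response Theory (IRT), then analysed with the Rasch model and pretested repeatedly to ensure the validity of the items. Within a learning-oriented assessment system, test validity has positive social effects: exam preparation by teachers and students is a genuine learning process, and the steering effect of testing higher-order abilities pushes teachers and students to improve communicative language ability; tests provide information on the development of academic achievement, helping teachers and students reflect on the effectiveness of language teaching and learning and preparing the ground for later adjustments; tests reflect learning outcomes and stimulate students' motivation to learn. Reliability and validity are not a trade-off; the real challenge for reliability lies in how to integrate testing and classroom assessment within a broader scope.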

Linlin Cao: Chapter 6 focuses on large-scale assessment as practiced by Cambridge English. The authors emphasize the two critical concepts of an assessment, namely, validity and reliability, and they demonstrate how criterion reference, scale construction, item response theory, item banking, and performance assessment could account for the validity and usefulness of large-scale assessments. They also claim that learning-oriented large-scale assessment can achieve high levels of reliability without compromising validity, thus distinguishing LOA from other assessment approaches. Assessment practitioners might draw upon this chapter to catch a glimpse of the requirements of large-scale assessments that ensure their validity and reliability.

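(Collecting summaries like this helps in understanding the chapter better!)

3. Summary of the chapter's key points and main concepts and theories: over the past few days I have found that this exercise helps a great deal with grasping the key points and sorting out the content, and it deepens my understanding along the way. The reading done earlier has also provided a very important foundation.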

The two key qualities of an assessment, validity and reliability, are related to the specific requirement of LOA, that is, to promote better learning.

6.1 Proficiency test: The importance of criterion reference

Our focus is on language proficiency, rather than achievement. The proficiency/achievement distinction parallels the distinction between treating language as a skill or as a body of language.

Evaluating learning in relation to desired, real-world outcomes is called criterion reference.

Evaluating learning by how learners rank in relation to each other (better or worse) is called norm reference.
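The contrast between the two kinds of reference can be made concrete with a small sketch. It is only an illustration: the function names, the level labels and the cut scores below are hypothetical and do not come from the chapter.

```python
def norm_referenced_ranks(scores):
    """Rank each learner against the others in the group (1 = highest score)."""
    ordered = sorted(scores, reverse=True)
    # Learners with equal scores share the same rank.
    return [ordered.index(s) + 1 for s in scores]


def criterion_referenced_level(score, cut_scores):
    """Return the highest level whose cut score has been reached, or None.

    cut_scores maps a level label to the minimum score for that level.
    The labels and values are hypothetical, chosen only for illustration.
    """
    reached = [level for level, cut in cut_scores.items() if score >= cut]
    return max(reached, key=cut_scores.get) if reached else None


# Hypothetical cut scores on a 0-100 scale.
CUTS = {"B1": 40, "B2": 60, "C1": 80}

print(norm_referenced_ranks([50, 70, 60]))
print(criterion_referenced_level(65, CUTS))
```

A score of 65 is reported as the same level whoever else sat the test, whereas its rank changes with the group it is compared against.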

6.2 Scale construction

Valid measurement begins with the construction of a common scale to which every individual test result can subsequently be linked (punctuation mark missing)

A key aim for large-scale assessment is to standardize judgments or measures so that they remain the same across exam sessions.

The term ‘scale’ might be used for any kind of description of a progression from less to more: quantitative, qualitative, or a combination of both.

Clearly, the most useful scales are those that combine effectively the quantitative and qualitative: accurate measurement and meaningful interpretation.

Rasch Model based on Item Response Theory (IRT)

IRT conception of measurement: The metaphor of “measurement” suggests that language proficiency is something in a person’s head which can be quantified, can be measured. Language proficiency, like temperature, has unique meaning, as if a single language test could measure all learners in all contexts.

6.3 Item Response Theory

Rasch Model within IRT exploits the fact that the probability of a learner responding correctly to an item depends on the difference between the item’s difficulty and the learner’s ability on a proficiency scale.
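The relationship just described can be written as a short function. This is a minimal sketch of the standard dichotomous Rasch formula, P(correct) = exp(θ − b) / (1 + exp(θ − b)), with ability θ and difficulty b expressed on the same logit scale; the function and parameter names are my own, not the book's.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Only the difference (ability - difficulty) matters; both values are
    assumed to be expressed in logits on the same proficiency scale.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# A learner whose ability equals the item's difficulty has an even chance.
print(rasch_probability(0.0, 0.0))
# An easier item (lower difficulty) gives the same learner a better chance.
print(rasch_probability(0.0, -1.0) > rasch_probability(0.0, 1.0))
```

Because only the difference enters the formula, shifting every ability and every difficulty by the same constant leaves all probabilities unchanged, which is why the origin of the scale has to be fixed by convention.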

Finding the difficulty of test items is called calibration, and it is vital that all items in the bank are calibrated to the same scale. This is done by setting an arbitrary point on the scale at the beginning of scale construction and ensuring that every subsequent data set can be linked to it, by including some items which have already been calibrated. This is called anchoring.
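One simple way to picture anchoring is a mean-shift link. The sketch below assumes the new data set has already been calibrated on its own provisional scale, and moves it onto the bank scale using the anchor items the two have in common. This is a commonly used linking method chosen here for illustration, not necessarily the procedure the authors have in mind; the item names and values are hypothetical.

```python
from statistics import mean


def link_to_bank(new_calibration, bank):
    """Shift provisional item difficulties onto the bank's scale.

    new_calibration: item id -> difficulty on the new data set's own scale.
    bank: item id -> difficulty already calibrated on the bank scale.
    Items that appear in both are treated as anchors.
    """
    anchors = [item for item in new_calibration if item in bank]
    if not anchors:
        raise ValueError("at least one anchor item is required for linking")
    # The constant that makes the anchors' mean difficulty agree on both scales.
    shift = mean(bank[i] for i in anchors) - mean(new_calibration[i] for i in anchors)
    return {item: d + shift for item, d in new_calibration.items()}


# Hypothetical values: A1 and A2 are anchors, N1 is a newly written item.
bank = {"A1": -0.5, "A2": 0.5}
new = {"A1": -1.0, "A2": 0.0, "N1": 0.3}
print(link_to_bank(new, bank))
```

After linking, the new item N1 has a difficulty on the same scale as everything else in the bank, so it can be added to the bank and used in later test versions.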

6.4 Item banking (Figure 6.1)

Item banking is a methodology for constructing tests and interpreting test outcomes using an IRT model.

The great benefit of item banking is that in consequence it facilitates the construction of meanings which explain what it is that the scale measures. Firstly, the items in the bank provide a concrete, detailed description of progression in terms of test content. Secondly, the fact that standards can be precisely maintained from session to session and from level to level facilitates doing the research to develop stable interpretations of learner’s performance in the world beyond the test. Thirdly, standards may be described in linguistic terms, e.g. English Profile.

6.5 Performance assessment

The approach taken by large-scale assessment towards the performance skills of speaking and writing is more recognizable as a standardized version of activities that also take place in the classroom.

Standardization relates both to judgments of performance and to the nature of the performances themselves.

6.6 Validity and reliability of large-scale assessment

Validity is the key quality of an assessment system, and for LOA it must refer to both large-scale and classroom assessment, and to its fundamental purpose of producing better learning.

The notion of validity …… demonstrates that a test measures what it purports to measure. (What comes after this sentence is the real point)

The concept of reliability and validity are not independent of each other in practice. (Grammatical error)

Reliability in assessment means something rather different to its everyday use. Reliability in testing has the narrower meaning of ‘consistent’, in that a test should produce the same result on repeated use and would rank-order a group of test takers in the same way.
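The idea of rank-ordering test takers "in the same way" on repeated use can be illustrated with Spearman's rank correlation between two administrations. This is a minimal sketch rather than the reliability statistic any particular examination body reports; it uses the textbook formula ρ = 1 − 6Σd² / (n(n² − 1)), which assumes no tied scores, and the scores shown are invented.

```python
def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation between two sets of scores for the same people.

    Uses the shortcut formula, which is exact only when there are no ties.
    """
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two equally long score lists with at least 2 entries")

    def ranks(scores):
        order = sorted(range(n), key=lambda i: scores[i])
        result = [0] * n
        for rank, idx in enumerate(order, start=1):
            result[idx] = rank
        return result

    ranks_a, ranks_b = ranks(scores_a), ranks(scores_b)
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented scores for four test takers on two administrations of a test.
first_sitting = [10, 20, 30, 40]
second_sitting = [12, 25, 31, 50]
print(spearman_rho(first_sitting, second_sitting))
```

A value of 1 means the two administrations put the test takers in exactly the same order even though the raw scores differ; a value near 0 would mean the ordering is not reproduced.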

Reliability is in the sense of converging evidence: accumulating evidence from different types that support the same inference (Mislevy 1994:8). That’s why LOA will need to deal with the complex evidence from classroom as well as large-scale assessments. (Figure 6.2)

(The text describing Figure 6.2 is also important, p. 89)

6.7 Large-scale assessment: Evidence of and for learning

Large-scale assessment focused (tense error, and it changes the meaning) on proficiency, enabling criterion-referenced interpretation of what has been learned. It reports in terms which are positive for all students. It is construct based …… and can inform the construction of both classroom and test materials. It uses a strong measurement model which ensures that scores on different test versions remain comparable and interpretable in the same frame of reference.

Summative assessment can also serve to support learning in at least three ways. Besides, it can support learning by providing a motivating framework for setting, monitoring and pursuing learning objectives.

(The four strengths can serve as reference criteria when we evaluate large-scale tests in China)

Good summative assessment can thus provide evidence both of learning and for learning.

6.8 In summary

Validity and reliability

The validity of large-scale assessment begins with construct definition: an explicit model of how a language skill develops from lower to higher levels.

In the context of LOA validity must also satisfy the extended concept where validity is evidenced by positive social consequences of using the assessment.

Concerning reliability, it need not be in conflict (with validity). Large-scale assessment can achieve high levels of reliability without compromising on validity.

The challenge for LOA is to address Mislevy’s (1994) broader conception of reliability, based on combining a range of evidence including from classrooms: converging evidence that supports the same inferences.

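(Distilling what you see as the key and difficult points of each section in this way is an effective way to learn)

4. I remember being very puzzled the first time I came across Asset Languages, so I looked up the literature at the time. When I met it again in today's reading, it was no longer an obstacle. (For related articles, see the PDFs shared in the group.)

Although this was my first encounter with the term converging evidence, the authors' explanation made its meaning clear to me, and I now understand better how large-scale assessment and classroom assessment complement each other and what each is worth. Their shared goal is to provide the evidence that LOA needs in order to bring about better learning.

5. Can large-scale assessment be borrowed for English teaching in China? (It is not a matter of borrowing it, but of making good use of the evidence that large-scale assessment provides and extending its positive impact.) How should this be built? Can some international large-scale assessments be adjusted to fit the realities in China? How should they be adjusted?

6. If the purpose of assessment is to bring about better learning, what does that require of the corresponding courses, teachers and learners? (Developing teachers who are themselves learners, and raising the assessment literacy of teachers and students.)

Thank you, Professor Gu!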
