Solution Overview & Team Lead Details

What is the name of your solution?

DeeperLearning thru LLM/ML/NLP

Provide a one-line summary of your solution.

LLM AI may reify existing social biases. This project would combine an LLM with SRI's proprietary ML/NLP to drive deeper student writing in response to complex text.

What type of organization is your solution team?

Nonprofit (may include universities)

What is the name of the organization that is affiliated with your solution?

Literacy Design Collaborative, Inc.

What is your solution?

Since 2017, LDC has conducted research with SRI International’s Artificial Intelligence Center to prototype a scalable generative AI mechanism that accurately and validly measures deeper student learning and provides real-time discursive feedback. While SRI has historically served the Department of Defense and Homeland Security almost exclusively, for this passion project SRI would design an AI technology solution that will “Ensure that all children are learning in good educational environments, particularly those affected by poverty....  Use inclusive design to ensure engagement and better outcomes for learners with disabilities and neurodivergent learners, while benefiting all learners, [and] Provide the skills that people need to thrive in both their community and a complex world, including social-emotional competencies, problem-solving, and literacy….”  SRI’s current generative AI work combines its proprietary ML/NLP algorithms with ChatGPT.

In addition, partner Stanford Center for Assessment, Learning, and Equity (SCALE) – which created California’s teacher certification test, now used in 9 additional states, and whose staff served as SBAC advisors and item writers – provides the analytic student rubric and curriculum rigor analysis rubric designs and their validation. LDC’s collaboration with SRI’s AI scientists and SCALE’s measurement experts would produce a teacher- and student-facing mechanism that can assess whether assigned performance writing tasks incorporate the standards’ cognitive demand and rigor, and then measure students’ mastery of those standards (CCRS/NGSS/C3) in their extended writing in response to these validated writing performance tasks. SCALE has already validated performance task rubrics by grade (K-12) and discipline (Science/Social Studies/ELA), but the current process requires extended manual calibration of human scorers to achieve inter-rater reliability. This solution would instantly create an expertly calibrated, inter-rater-reliable mechanism to provide feedback to individual students, teachers, and administrators on extended writing performances – not short open-ended answers, not multiple choice, not even formulaic five-paragraph essays. The data would then be used to direct teachers to research-backed instructional strategies. LDC was recently awarded two highly competitive USDOE AHC grants to create culturally responsive K-12 Civics writing prompts to add to LDC’s already robust OER K-12 prompt bank.

LDC will transform SRI’s back-end AI in three ways: (1) develop APIs and a teacher interface that enable SRI’s AI to integrate with ChatGPT; (2) use the manually scored student writing LDC currently collects to improve SRI’s AI engine, in combination with ChatGPT, from its current 85% scoring accuracy to 95%+ – and, more importantly, use the learning data to integrate discursive “in-line” writing feedback at score deductions for far greater student learning impact; and (3) make the ongoing, expanding learning dataset available to researchers to define the most efficacious ways student writing develops – i.e., the data will demonstrate the measurable patterns of subskill development most effective in helping students master writing on summative state tests. No existing commercial solution scores student work against untrained prompts, much less provides inline discursive feedback at AI-determined score deductions.
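To make the intended mechanism concrete, here is a minimal sketch of how a hybrid scoring-and-feedback pipeline of this kind could be wired together, assuming a proprietary rubric scorer and a ChatGPT feedback call. Every name in it (RubricScore, score_with_nlp_engine, call_chatgpt) is an illustrative stand-in, not LDC's or SRI's actual interface:

```python
# Illustrative sketch only: every name here is a hypothetical stand-in.
from dataclasses import dataclass

@dataclass
class RubricScore:
    dimension: str      # e.g., "Controlling Idea" or "Use of Evidence"
    score: float        # analytic rubric scale, assumed 1-4 here
    evidence_span: str  # excerpt of student text the score is based on

def score_with_nlp_engine(essay: str, prompt: str) -> list[RubricScore]:
    """Stand-in for SRI's proprietary ML/NLP scorer (not public).
    Returns dummy output so the sketch runs end to end."""
    return [RubricScore("Controlling Idea", 2.5, essay[:80])]

def call_chatgpt(instructions: str) -> str:
    """Stand-in for a ChatGPT API call producing discursive feedback."""
    return "Consider stating your claim before citing the text."

def score_and_annotate(essay: str, prompt: str) -> list[dict]:
    """Score an essay, then attach coaching feedback at each score deduction."""
    annotations = []
    for rs in score_with_nlp_engine(essay, prompt):
        # The 3.0 proficiency cut is an assumed threshold: inline feedback
        # is generated only where the rubric score falls below proficiency.
        if rs.score < 3.0:
            feedback = call_chatgpt(
                f"A student scored {rs.score}/4 on '{rs.dimension}'. "
                f"Coach this passage without rewriting it:\n{rs.evidence_span}"
            )
            annotations.append({"dimension": rs.dimension, "score": rs.score,
                                "span": rs.evidence_span, "feedback": feedback})
    return annotations

if __name__ == "__main__":
    print(score_and_annotate("Student draft ...", "Task prompt ..."))
```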

How will your solution impact the lives of priority Pre-K-8 learners and their educators?

This project targets high-need urban and rural districts, schools, and student populations – all the places LDC currently operates through its competitively awarded federal grants and district contracts.  For example, NYC, LAUSD, rural Kentucky, North Carolina, and other rural districts have all provided letters of support in connection with parallel federal funding requests.  Since 2016, teachers in the 100+ NYC/LAUSD schools in LDC’s USDOE Investing in Innovation (i3) grant have clamored for an accurate mechanism to support their efforts to teach rigorous deeper learning performance writing tasks without being crushed by scoring and student feedback burdens.  Schools with the highest-need students also often have teachers with the thinnest support or experience and the least exposure to high-quality instructional materials.  In both LDC’s 2022 USDOE Education Innovation and Research grant in 140 urban (NYC/LAUSD) and rural (KY) schools and LDC’s 150-school USDOE culturally responsive curricula and PD grant, school populations exceed 82% free lunch, 80% BIPOC students, and 20% special education.  This solution would address three targeted student and teacher needs: (1) instantaneous analytic rubric scoring and discursive coaching feedback for students; (2) instantaneous data that assesses the efficacy of teacher-chosen instructional strategies and practices; and (3) assessment of the quality of curricular lesson plans.  Having an expertly calibrated, tech-enabled instant feedback mechanism would enable teachers working with high-need students to do what they do best: support deeper learning based on standards-aligned materials and high-quality feedback and guidance.  CCRS standards’ more rigorous expectations and COVID have only exacerbated school challenges and highlighted inequities in communities with high student-to-teacher ratios.  For the past year, LDC has collaborated with National Writing Project (NWP) sites in ad hoc exploration of ChatGPT’s effectiveness in student feedback. More recently (October 2023), LDC was awarded a small Cambiar Foundation grant to assess the efficacy of using ChatGPT with low-income, multilingual, and immigrant parents to support their students’ learning, particularly extended writing in response to complex text.  More than 10 NWP sites oversubscribed this year’s Cambiar-funded ChatGPT research.  LDC consistently engages teachers, students, and parents from high-need, underserved populations in user-centered design and prototyping of LDC instructional products.

How are you and your team (if you have one) well-positioned to deliver this solution?

LDC’s team and collaborating organizations have spent decades designing instructional and technological solutions for high-need students, teachers, and schools – first in urban low-income settings and, more recently, also in low-SES rural American schools.  LDC’s team lead (Chad Vignola) began as a NYCDOE senior central office leader in 1999, charged with transforming several offices to more effectively support high-need students, including special education students and English language learners.  Chad was later trained at IHI in Cambridge in continuous improvement, as well as in IDEO user-centered design and rapid prototyping of instructional resources, leveraging technology in New York City and Los Angeles schools, among others.  Our instructional design team is highly diverse, with several members having only recently left urban classrooms to join LDC in creating scalable, teacher-friendly tools with a specific focus on multilingual and special education students.

Sabrina Alicea (Latina), LDC’s Director of Curriculum Development, is a former Chicago public school teacher, curriculum designer, and Harvard graduate. She currently drives two USDOE Civics design grants awarded to LDC to create high-quality, culturally relevant lessons, tasks, and resources for students and teachers, as well as culturally sustaining and relevant professional development, and has always sought to elevate access to educational and creative opportunities for marginalized and underrepresented communities.

Jabari Sellars (Black), Curriculum Associate Director, joined LDC after more than a decade of middle and high school classroom teaching, most recently in Oakland. Jabari was awarded Harvard’s Intellectual Contribution Award for his work with comics and graphic novels in literacy education and was recently asked to judge the Walter Dean Myers Award honoring diverse books by diverse creators.  Additional diverse instructional staff include Andrea Fullington, a former New Orleans teacher and “Black is Brilliant” Fellow; Chadae MacAnuff (Black), a former NYC teacher; and Lydia Mantis (Egyptian).

Rob Kasher, Chief Program Officer, a former TFA social studies teacher in Brooklyn, served in roles at The Broad Center (TBC) and at Leadership for Educational Equity (LEE), providing leadership development to equity-minded organizations before returning to LDC to oversee design, delivery, and innovation.

In addition to internal staff, collaborating organization staff include Dr. John Niekrasz, a computer scientist in the Advanced Analytics group of SRI's Artificial Intelligence Center who focuses on automated discourse analysis technologies. Niekrasz holds a PhD in linguistics from Edinburgh and a BS in symbolic systems from Stanford.  SCALE created the California teacher certification test (TPAC) used in 9 other states, and its staff served as SBAC advisors and item writers.  Dr. Ruth Wei, SCALE’s VP, has led research studies on the efficacy of project-based learning curricula, the quality of large-scale assessment items used in state tests across the United States, and the National Board for Professional Teaching Standards portfolio assessment. Dr. Wei holds degrees from Harvard and Teachers College (Columbia) and spent six years teaching in New York City before obtaining her PhD from Stanford.

Which dimension(s) of the challenge does your solution most closely address?

  • Providing continuous feedback that is more personalized to learners and teachers, while highlighting both strengths and areas for growth based on individual learner profiles

Which types of learners (and their educators) is your solution targeted to address?

  • Grades 6-8 - ages 11-14

What is your solution’s stage of development?

Prototype

Please share details about why you selected the stage above.

Over the past two years, we tested the viability of SRI's proprietary ML/NLP to accurately assess students' deeper learning by measuring and providing feedback on student writing in response to complex text.  However, this was SRI's back-end research engine.  Pending external funding, LDC has not yet designed the user interface for teacher (and later student) access.  Thus, the user interface is arguably only conceptual, though LDC has previously built teacher user interfaces through user-centered design and rapid prototyping, including 140 sprint cycles from 2013 to the present to build LDC's teacher-facing platform, https://CoreTools.LDC.Org/, accessed by over 105,000 unique educator users.

In what city, town, or region is your solution team headquartered?

New York City, NY, USA

Is your solution currently active (i.e. being piloted, reaching learners or educators) within the US?

Yes

In which US states do you currently operate?

New York, California, Kentucky, Pennsylvania, North Carolina, Colorado, Tennessee, and Maryland are our direct service sites; via our open educational resource platform, we reach all 50 states.

Who is the Team Lead for your solution?

Chad Vignola

More About Your Solution

What makes your solution innovative?

Prior to an LLM like ChatGPT, existing commercial scoring of student writing had multiple severe limitations.  Big publisher and researcher automated essay scoring engines (e.g., Pearson, McGraw Hill, ETS) only worked with a defined publisher’s prompt and text, using roughly 200 student papers scored by manually calibrated humans to calibrate the AI engine.  Change a word in the prompt or the text, or have poor or limited teacher scorer accuracy, and the engine did not work well.  Worse, the writing assessed was typically limited to 800 words or less and of low Depth of Knowledge – e.g., low-cognitive-level student writing or formulaic five-paragraph essays.  These engines were not usable with “blind prompts” – prompts the engine hadn’t seen before – nor able to assess the deeper student learning expectations of national (or international) learning standards (e.g., Common Core; Next Generation Science Standards; College, Career, and Civic Life) necessary for participating effectively in the non-routine work of the global future.  Scoring and feedback on extended student writing in response to authentic and complex disciplinary text has also not been available in the marketplace.  Although ChatGPT has created new opportunities, the shortcomings are that (1) most emerging solutions, driven by for-profit companies, are low-quality bot responses to low-cognitive-level questions, with highly uncertain response quality or even inaccuracy and bias; and (2) ChatGPT and other LLMs are not, in and of themselves, of sufficient quality or accuracy.  Using an LLM as a component rather than standalone – combined with SRI’s ML/NLP, created and tested over decades in high-profile contexts (e.g., national research laboratories, Department of Defense and Homeland Security contracts) – has to date proven far more effective, as the sketch below illustrates.  We would also create a teacher-facing solution more user-friendly than current ChatGPT interfaces.  Finally, over time the user learning-engineering data generated will create self-reinforcing improvements in tool performance and, equally importantly, provide feedback on which pedagogical practices and instructional materials most impactfully improve students’ deeper writing development and reading comprehension – and, potentially, the most effective sequencing of those strategies and content in student learning development.
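As one illustration of using the LLM as a component rather than a standalone scorer, a simple blending step could combine the proprietary engine's estimate with the LLM's estimate. The function name and 0.7 weight below are assumptions for illustration only, not SRI's published method:

```python
# Hypothetical blending step: the LLM is one component, weighted alongside
# the extensively validated ML/NLP estimate. The 0.7 weight is an assumption.
def blended_score(nlp_score: float, llm_score: float,
                  nlp_weight: float = 0.7) -> float:
    """Blend the proprietary ML/NLP estimate with the LLM's estimate."""
    return nlp_weight * nlp_score + (1 - nlp_weight) * llm_score

# Example: the engine rates an essay 2.8 and the LLM rates it 3.4 on a 1-4
# rubric scale; the blend leans toward the validated engine.
print(blended_score(nlp_score=2.8, llm_score=3.4))  # -> ~2.98
```

Because the rubric itself, rather than a per-prompt calibration set, drives both components, this style of design is what lets the system handle blind prompts that calibration-bound engines cannot.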

Describe the core AI and other technology that powers your solution.

LDC will use both traditional innovations (analytic rubrics) and new technologies (generative AI and SaaS software development).  The SCALE analytic student rubric measures the student’s learning progress, signaling to both student and teacher the next step toward proficiency. For example, based on the explicit SCALE analytic student rubric, both the teacher and the student can easily understand the R2 learning progression and next proximal learning steps for a 3rd-grade student scoring at 2nd-grade level 2 on additional complex text.  This ensures the project builds on the established benefit of rubrics: they are statistically more likely to mitigate bias in grading by standardizing scoring criteria, reducing the space for subjective assessment judgments. This ultimately makes educators more effective in assessment and feedback, building on years of research showing students learn best when they are exposed to relevant content and provided with opportunities to expand their critical literacy skills.

Even before large language models, machine learning and natural language processing demonstrated the efficiency and accuracy to provide formative data and insights into student learning, including reading comprehension and extended writing in response to complex text.  LDC began exploring ML/NLP solutions in 2013, ultimately beginning a research partnership with SRI International’s Artificial Intelligence Center.  From late 2016 to the end of 2019, LDC invested half a million dollars of internal funds to confirm the viability of using SRI’s proprietary AI to generate accurate and valid data supporting student deeper learning goals.  Notwithstanding the demonstrated practical and conceptual benefit of the work, the absence of external funding stalled ongoing efforts.  However, with the release of ChatGPT in 2022, SRI reached out to suggest restarting our work, as LLMs had significantly improved the potential of our collaboration.  While LLMs like ChatGPT are remarkably capable “out of the box” at providing feedback to students, adapting them to the unique characteristics of student writing – and avoiding the biases and errors that come with default models – requires tuning the existing model weights and implementing additional model components (e.g., model layers). By supplementing LLMs with additional model elements and training them on student data and validated scores, SRI will significantly improve upon foundation model accuracy, mitigate bias, and, in tandem with LDC, continuously improve its efficacy and impact on student writing over time.

The last innovation is standard software engineering sprint cycles coupled with parallel instructional resource design sprints.  Throughout our SRI collaboration, LDC has continued to build CoreTools.LDC.Org to provide teacher-friendly online Open Educational Resources, including online asynchronous learning microcredentials.  The asynchronous online learning includes a course that enables educators to calibrate their scoring against the SCALE analytic rubric.  However, the quantity of student work to calibrate against is small and does not include the more powerful inline discursive feedback.  CoreTools itself – which we would build upon – is a Ruby on Rails server and an Angular client running on Heroku infrastructure. We are currently converting from AngularJS to the more modern Angular framework.
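A minimal sketch of the tuning approach described above – adding a trained rubric-score head on top of a foundation encoder and fitting it to human-validated scores – might look like the following. The base model, head shape, and hyperparameters are assumptions for illustration; SRI's actual architecture is proprietary:

```python
# Minimal sketch: supplement a foundation model with an added component (a
# rubric-score head) and tune it on essays paired with validated scores.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class RubricScorer(nn.Module):
    def __init__(self, base: str = "distilbert-base-uncased", dims: int = 4):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(base)
        # Additional model component: one predicted score per analytic
        # rubric dimension (4 dimensions assumed here).
        self.head = nn.Linear(self.encoder.config.hidden_size, dims)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # first-token ([CLS]-style) pooling
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = RubricScorer()
loss_fn = nn.MSELoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

def train_step(essays: list[str], validated_scores: torch.Tensor) -> float:
    """One tuning step on essays paired with human-validated rubric scores."""
    batch = tokenizer(essays, padding=True, truncation=True,
                      return_tensors="pt")
    preds = model(batch["input_ids"], batch["attention_mask"])
    loss = loss_fn(preds, validated_scores)  # validated_scores: (batch, 4)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```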

How do you know that this technology works?

Independent research (UCLA-CRESST) established that LDC doubled student learning in our USDOE i3 Validation grant in 100+ schools – 9.4 months of additional learning in Los Angeles and 7.1 months in NYC – controlling for prior student, teacher, and school state assessment performance, Free/Reduced Lunch, special education, EL status, ethnicity, etc.  See multiple independent research studies: https://ldc.org/impact/. For SRI efficacy research, see Niekrasz, LDC – Automated Essay Scoring Using SRI’s ATSE Software, Project Interim Report #3 (SRI International – Advanced Analytics Lab): https://ldc.org/aes-report/

After great teachers, research confirms that unbiased curriculum-embedded formative assessment drives more student improvement than any other resource (Hattie 2009; Wiliam 2011).  However, no organization has created a scalable, accurate system providing feedback on deeper learning. Standardized selected-response item banks designed by large-scale testing companies are too distant from, and lack contextualized connections to, curriculum and instruction, and to deeper student learning aligned to rigorous national standards (Shepard; Atkin). Multiple-choice banks and low-quality tasks do not respond to deeper learning needs (Elmore 2010; Santelises 2015). Even putatively IRT-“validated” multiple-choice and short constructed responses have little to do with the deep thinking skills contemplated by the rigorous CCRS, NGSS, and C3 standards necessary to remain competitive in a global intellectual and economic market. The recent clamor around Science of Reading strategies only begins the student’s reading comprehension journey. Once students master phonics, phonemic awareness, and related foundational skills, rigorous standards require developing students’ extended writing in response to complex text – a deeper learning journey that requires connecting student data to the detailed, scaffolded instructional strategies at the core of LDC’s framework, whose impact has been confirmed as statistically significant.  National standards’ rigorous expectations and COVID have exacerbated this challenge, highlighting inequities in communities with high student-to-teacher ratios. An urban ELA teacher with 130 high-need ninth-grade students, asking students to read informational text and write every month, faces an essay volume often too high to provide adequate feedback on even one round of writing. Too many papers frustrate teacher efforts to collect and compile data for real-time grouping and differentiation. By the time students finish argumentative essays, the teacher has missed multiple opportunities to address skill deficits in the absence of usable formative data revealing student learning difficulties in reading and writing. Likewise, students have lost multiple opportunities to take ownership of their learning, build on teacher formative feedback, and practice the reading/writing skills essential for 21st-century learning. (Moreover, scores alone are not as useful as the inline discursive feedback on extended writing in response to complex text that this project would provide.) More than ever, rigorous CCRS expectations for student achievement require teachers to provide frequent, direct, substantive feedback on students’ deep thinking work as manifested in frequent rigorous student writing. As Wiliam asserts, formative assessment is the “bridge between teaching and learning – only through some kind of assessment process can we decide whether instruction has had its intended effect.”  To date, LDC has tested this SRI solution to 85-87% nationally calibrated accuracy.

What is your approach to ensuring equity and combating bias in your implementation of AI?

Our approach to equity and bias builds on two elements described above.  First, the SCALE analytic student rubric standardizes scoring criteria: rubrics are statistically more likely to mitigate bias in grading practices by reducing the space for subjective assessment judgments, and SCALE’s rubrics are expert-validated by grade and discipline.  Second, while LLMs like ChatGPT are remarkably capable “out of the box” at providing feedback to students, adapting them to the unique characteristics of student writing – and avoiding the biases and errors that come with default models – requires tuning the existing model weights and implementing additional model components (e.g., model layers). By supplementing LLMs with additional model elements and training them on student data and human-validated scores, SRI will significantly improve upon foundation model accuracy, mitigate bias, and, in tandem with LDC, continuously improve the system’s efficacy and impact on student writing over time.  Finally, LDC engages teachers, students, and parents from high-need, underserved populations in user-centered design and prototyping, so equity concerns surface early in each sprint cycle.
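One concrete check implied by this approach is a routine subgroup audit comparing model scores against human-validated scores. The sketch below is a hedged illustration only; the field names and 0.25-point flag threshold are assumptions, not LDC's actual tooling:

```python
# Hedged illustration of a subgroup bias audit: compare model scores to
# human-validated scores by group and flag systematic gaps.
from collections import defaultdict
from statistics import mean

def audit_subgroup_bias(records: list[dict], threshold: float = 0.25) -> dict:
    """records: [{"group": ..., "model_score": ..., "human_score": ...}, ...]"""
    errors = defaultdict(list)
    for r in records:
        errors[r["group"]].append(r["model_score"] - r["human_score"])
    report = {}
    for group, errs in errors.items():
        gap = mean(errs)  # positive = model over-scores this group on average
        report[group] = {"mean_error": round(gap, 3),
                         "flagged": abs(gap) > threshold}
    return report

# Example audit over two hypothetical subgroups:
print(audit_subgroup_bias([
    {"group": "EL", "model_score": 2.0, "human_score": 2.5},
    {"group": "EL", "model_score": 2.5, "human_score": 2.8},
    {"group": "non-EL", "model_score": 3.0, "human_score": 3.1},
]))  # EL mean_error = -0.4 -> flagged; non-EL mean_error = -0.1 -> not flagged
```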

How many people work on your solution team?

LDC has an internal team of 14 full-time staff, but for our culturally responsive ELA, Science, and Social Studies (Civics) performance writing task development, we work with approximately 20 highly diverse (mostly not white or male) classroom educators across the country who provide consulting design services.  In addition, as noted above, we are currently collaborating with WestEd and Common Good, among others, on building both content and online culturally sustaining and relevant professional learning courses.  We also work with Preva Learning to outsource some of our back-end software engineering.  SCALE and SRI are, of course, collaborating partners.

How long have you been working on your solution?

We began exploring ML/NLP solutions in December 2013 and began collaborating with SRI in 2016.

Solution Team

 