
Neural network capable of solving university-level Mathematics problems at scale



[Submitted on 31 Dec 2021 (v1), last revised 4 Jan 2022 (this version, v2)]


Abstract: We demonstrate that a neural network pre-trained on text and fine-tuned on
code solves Mathematics problems by program synthesis. We turn questions into
programming tasks, automatically generate programs, and then execute them,
perfectly solving university-level problems from MIT’s large Mathematics
courses (Single Variable Calculus 18.01, Multivariable Calculus 18.02,
Differential Equations 18.03, Introduction to Probability and Statistics 18.05,
Linear Algebra 18.06, and Mathematics for Computer Science 6.042), Columbia
University’s COMS3251 Computational Linear Algebra course, as well as questions
from a MATH dataset (on Prealgebra, Algebra, Counting and Probability, Number
Theory, and Precalculus), the latest benchmark of advanced mathematics problems
specifically designed to assess mathematical reasoning. We explore prompt
generation methods that enable Transformers to generate question-solving
programs for these subjects, including solutions with plots. We generate
correct answers for a random sample of questions in each topic. We quantify the
gap between the original and transformed questions and perform a survey to
evaluate the quality and difficulty of generated questions. This is the first
work to automatically solve, grade, and generate university-level Mathematics
course questions at scale. This represents a milestone for higher education.
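The pipeline the abstract describes (turn a question into a programming task, generate a program, then execute it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. `make_prompt`, `generate_program`, and `solve` are hypothetical names, the prompt template is invented, and `generate_program` returns a canned program where a real system would query a code model such as Codex.

```python
# Minimal sketch of a "question -> program -> execute" loop.
# ASSUMPTION: generate_program is a hypothetical stand-in for a code model
# such as Codex; here it returns a fixed program instead of calling a model.


def make_prompt(question: str) -> str:
    """Turn a math question into a programming task (hypothetical template)."""
    return (
        f'"""\n{question}\n'
        "Write a Python program that stores the result in `answer`.\n"
        '"""\n'
    )


def generate_program(prompt: str) -> str:
    """Hypothetical stub: a real system would send `prompt` to a code model."""
    return (
        "from itertools import product\n"
        "from fractions import Fraction\n"
        "outcomes = list(product('HT', repeat=3))\n"
        "hits = [o for o in outcomes if o.count('H') == 2]\n"
        "answer = Fraction(len(hits), len(outcomes))\n"
    )


def solve(question: str):
    """Synthesize a program for the question, execute it, and return `answer`."""
    program = generate_program(make_prompt(question))
    namespace: dict = {}
    exec(program, namespace)  # in practice, run model output only in a sandbox
    return namespace["answer"]


if __name__ == "__main__":
    q = "What is the probability of exactly two heads in three fair coin tosses?"
    print(solve(q))  # 3/8
```

The answer comes from running the generated program rather than from the model's text directly, which is what the abstract means by solving problems "by program synthesis".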

Submission history

From: Iddo Drori

v1: Fri, 31 Dec 2021 18:57:31 UTC (5,846 KB)

v2: Tue, 4 Jan 2022 17:35:19 UTC (5,575 KB)



  1. Not to pour too much cold water on this, but the claim of 100% accuracy has a huge caveat. In the paper (page 4) they state:

    > Interaction. The original question may not be a prompt that synthesizes a program whose execution results in the correct answer. In addition, the answer may require multiple steps with clear plots or other modalities. We therefore may interactively prompt Codex until reaching the correct answer or visualizations, making the minimum necessary changes from the original question.

    To me, this basically sounds like they had a human in the loop (who knows how to solve these math problems) who kept changing the question until the model gave the correct answer. They do measure the distance (using a sentence embedding model) between the original question and the one that yielded the correct answer, but that feels a bit contrived to me.

    Nevertheless, it's still really cool that the correct answer is indeed inside the model.

  2. They rewrite the questions before feeding them into the AI. That makes "100% correct" significantly less impressive.

    For example, they manually rewrote the question:

    > Find the derivative of the function using the definition of a derivative. f(x) = (x^2 − 1)/(2x − 3)

    into:

    > Use Sympy to find the derivative of f(x)=(x**2-1)/(2*x-3)

    This completely changes the question, and the rewrite also ensures that the AI doesn't fail to parse any of the data.
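To make the second comment's point concrete, here is a minimal SymPy sketch of what each phrasing of the quoted derivative question asks for, with f(x) = (x^2 − 1)/(2x − 3). It is our illustration, not a program generated in the paper. The rewritten prompt reduces to a single call to `sympy.diff`, while the original asks for the limit of the difference quotient, computed here with `sympy.limit`.

```python
# Contrast the two phrasings of the derivative question quoted above.
import sympy as sp

x, h = sp.symbols("x h")
f = (x**2 - 1) / (2*x - 3)

# Rewritten prompt: "Use Sympy to find the derivative" -> one library call.
by_diff = sp.diff(f, x)

# Original prompt: "using the definition of a derivative" -> limit of the
# difference quotient (f(x + h) - f(x)) / h as h -> 0.
quotient = sp.cancel((f.subs(x, x + h) - f) / h)  # cancel removes the 1/h factor
by_definition = sp.limit(quotient, h, 0)

print(sp.factor(by_diff))
print(sp.simplify(by_definition - by_diff))  # 0
```

Both routes give the same final expression, but only the second actually carries out the limit-based reasoning the original question asks for.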
