Using a Google Search Algorithm to Make Sense of the Bible

Human beings have a difficult time making sense of large corpora such as the Bible. No one can hold in mind, all at once, every theme and point of view expressed in it. How should a reader make sense of it all? Which texts or statements are most important? What should a reader do when conflicting viewpoints arise? An algorithm used by Google may help answer these questions (and simultaneously provide thousands of undergraduate students in Bible as Literature courses a sort of CliffsNotes to this sometimes daunting corpus!).

In 1996, Google co-founders Sergey Brin and Lawrence Page developed an algorithm called PageRank to determine the importance of a web page. In essence, it scores a page by the "votes," in the form of links, that it receives from other web pages. When page A links to page B, it in effect casts a vote for page B, and a vote from an important page counts for more than a vote from an obscure one. A variation on this algorithm, called TextRank, has proved useful for summarizing texts. Instead of web pages and links, it works with sentences (or words) and the similarity between them: two sentences that share many words in effect vote for each other.
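To make the "votes" idea concrete, here is a minimal Python sketch of TextRank for sentences, assuming each sentence has already been reduced to a list of lemmas. It follows the formulation in Mihalcea and Tarau's 2004 TextRank paper (shared words divided by the sum of the log sentence lengths, fed into a weighted PageRank with damping factor 0.85). It is an illustration of the technique, not necessarily the code behind the plots in this post; the function names and the toy sentences are hypothetical.

```python
import math


def similarity(s1, s2):
    """TextRank sentence similarity (Mihalcea and Tarau, 2004):
    number of shared lemmas divided by the sum of the log sentence lengths."""
    overlap = len(set(s1) & set(s2))
    if overlap == 0:
        return 0.0
    denom = math.log(len(s1)) + math.log(len(s2))
    # Two one-word sentences give log(1) + log(1) == 0; treat them as unlinked.
    return overlap / denom if denom > 0 else 0.0


def textrank(sentences, d=0.85, tol=1e-6, max_iter=100):
    """Score each sentence (a list of lemmas) with weighted PageRank."""
    n = len(sentences)
    if n == 0:
        return []
    # w[j][i] is the weight of the "vote" sentence j casts for sentence i.
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in w]
    scores = [1.0] * n
    for _ in range(max_iter):
        # Each sentence j splits its current score among its neighbors in
        # proportion to edge weight; d is the damping factor.
        new = [(1 - d) + d * sum(w[j][i] / out_weight[j] * scores[j]
                                 for j in range(n) if w[j][i] > 0)
               for i in range(n)]
        delta = max(abs(a - b) for a, b in zip(new, scores))
        scores = new
        if delta < tol:
            break
    return scores


# Hypothetical toy "sentences", already lemmatized and filtered.
sentences = [
    ["πιστεύω", "Ἰησοῦς", "χριστός", "υἱός", "θεός", "ζωή", "ὄνομα"],
    ["πιστεύω", "ζωή", "αἰώνιος"],
    ["Ἰησοῦς", "χριστός", "θεός"],
    ["ἀδελφός", "πίστις", "ἔργον"],
]
scores = textrank(sentences)
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1, 3]
```

The first sentence wins because it shares vocabulary with two others, while the last sentence shares none and receives only the baseline score of 1 - d.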

In the images that follow, I will use the TextRank algorithm to rank the importance of each sentence in the (Greek) New Testament. (I will leave aside the Hebrew Bible for now, although I have worked with that corpus as well.) The pink labels and data points mark the most important sentences in each book. The blue lines are LOESS (locally weighted regression) curves fitted to the sentence scores; they show how important the sentences in each part of a given text are. The zenith of a blue line, then, roughly corresponds to the most important part of a book, whereas the nadir corresponds to the least important. (I use Python to calculate TextRank scores and R to plot the resulting data.)
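For readers curious about the blue lines, here is a minimal sketch of the idea behind LOESS in Python with NumPy, assuming `x` holds each sentence's position in a book and `y` its TextRank score. It is a simplified local linear version without LOESS's robustness iterations; R's built-in `loess()` fits local quadratics by default (span = 0.75), so its curves will differ somewhat. The function name and the example data are hypothetical.

```python
import numpy as np


def loess(x, y, span=0.75):
    """Simplified LOESS: a weighted local *linear* fit at each x,
    using tricube weights over the nearest span * n points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    k = min(n, max(3, int(np.ceil(span * n))))  # points in each neighborhood
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        # Neighborhood radius: distance to the k-th nearest point.
        h = max(np.sort(dist)[k - 1], 1e-12)
        u = dist / h
        weights = np.where(u < 1, (1 - u**3) ** 3, 0.0)  # tricube kernel
        # Weighted least squares for y ~ a + b * (x - x0); the fit at x0 is a.
        X = np.column_stack([np.ones(n), x - x0])
        sw = np.sqrt(weights)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0]
    return fitted


# Hypothetical scores: a hump centered on sentence 25, plus a regular wiggle.
positions = np.arange(51)
scores = -(positions - 25.0) ** 2 + 5 * np.sin(positions)
smooth = loess(positions, scores)
peak = int(np.argmax(smooth))  # the curve's "zenith"
```

The wiggle largely averages out inside each neighborhood, so the smoothed curve's zenith falls at or very near the center of the hump, which is the kind of reading the blue lines invite.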

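These plots are based upon Greek lemmas. Since Greek is a highly inflected language, I work with lemmas (dictionary forms) rather than inflected word forms, so that, for example, πιστεύ[σ]ητε and πιστεύοντες both count as instances of πιστεύω. I also exclude high-frequency stop words in order to focus on the words that are most informative. Here are the stop words that I exclude from these plots: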

['ὁ', 'καί', 'αὐτός', 'σύ', 'δέ', 'ἐν', 'ἐγώ', 'εἰμί', 'εἰς', 'οὐ', 'ὅς', 'οὗτος', 'ὅτι', 'μή', 'γάρ', 'ἐκ', 'ἐπί', 'πρός', 'γίνομαι', 'διά', 'ἵνα', 'ἀπό', 'ἀλλά', 'τίς', 'τὶς', 'ὡς', 'εἰ', 'οὖν', 'κατά', 'μετά', 'εἷς', 'ἤ', 'περί', 'ἐάν', 'ἑαυτοῦ', 'ἐκεῖνος', 'ὑπό', 'τέ', 'οὕτω', 'ἰδού', 'παρά', 'λέγω']

I also exclude words with any of the following attributes:

['preposition', 'definite article', 'conjunction', 'pronoun', 'particle']
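The two filters above can be sketched in Python as follows, assuming the text is already tokenized into (lemma, part-of-speech) pairs, as a morphologically tagged Greek New Testament would provide. The function names, the part-of-speech labels, and the hand-tagged example from John 20:31 are hypothetical illustrations, not the exact pipeline behind the plots.

```python
import unicodedata


def nfc(s):
    """Normalize to NFC so that, e.g., vowels with tonos and oxia compare equal."""
    return unicodedata.normalize("NFC", s)


STOP_LEMMAS = {nfc(w) for w in [
    "ὁ", "καί", "αὐτός", "σύ", "δέ", "ἐν", "ἐγώ", "εἰμί", "εἰς", "οὐ",
    "ὅς", "οὗτος", "ὅτι", "μή", "γάρ", "ἐκ", "ἐπί", "πρός", "γίνομαι", "διά",
    "ἵνα", "ἀπό", "ἀλλά", "τίς", "τὶς", "ὡς", "εἰ", "οὖν", "κατά", "μετά",
    "εἷς", "ἤ", "περί", "ἐάν", "ἑαυτοῦ", "ἐκεῖνος", "ὑπό", "τέ", "οὕτω", "ἰδού",
    "παρά", "λέγω",
]}

# Hypothetical part-of-speech labels; substitute your tagged text's tag set.
EXCLUDED_POS = {"preposition", "definite article", "conjunction", "pronoun", "particle"}


def content_lemmas(tokens):
    """Keep lemmas that are neither stop words nor in an excluded part of speech.

    `tokens` is a list of (lemma, part_of_speech) pairs."""
    return [nfc(lemma) for lemma, pos in tokens
            if nfc(lemma) not in STOP_LEMMAS and pos not in EXCLUDED_POS]


# John 20:31a, hand-tagged for illustration.
tokens = [
    ("οὗτος", "pronoun"), ("δέ", "conjunction"), ("γράφω", "verb"),
    ("ἵνα", "conjunction"), ("πιστεύω", "verb"), ("ὅτι", "conjunction"),
    ("Ἰησοῦς", "noun"), ("εἰμί", "verb"), ("ὁ", "definite article"),
    ("χριστός", "noun"), ("ὁ", "definite article"), ("υἱός", "noun"),
    ("ὁ", "definite article"), ("θεός", "noun"),
]
print(content_lemmas(tokens))
# ['γράφω', 'πιστεύω', 'Ἰησοῦς', 'χριστός', 'υἱός', 'θεός']
```

The NFC step matters for polytonic Greek: the same accented vowel can be encoded with a tonos or an oxia code point, and without normalization a stop word typed one way would silently fail to match a lemma encoded the other way.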

I want to limit my comments to a few observations about the plots above. First, perhaps the most explicit statement of purpose of any of the New Testament texts shows up among the most important sentences in the Gospel of John:

. . . ταῦτα δὲ γέγραπται ἵνα πιστεύ[σ]ητε ὅτι Ἰησοῦς ἐστιν ὁ χριστὸς ὁ υἱὸς τοῦ θεοῦ, καὶ ἵνα πιστεύοντες ζωὴν ἔχητε ἐν τῷ ὀνόματι αὐτοῦ. (20:30-31)

. . . Now these things have been written in order that you might believe that Jesus is the Christ, the son of God, and in order that, by believing, you might have life in his name. (Translation my own.)

That seems like a success. Although this statement of purpose is explicit enough that most readers will not need help identifying it, it would have presented a major problem for this algorithm had it shown up among the less important sentences. (That was a close one! Google may be on to something here . . . .)

Second, in many of the non-narrative texts, the sentences ranked most important are ones that scholars have long discussed as central to their authors' thought. For example, both Galatians 2:15-16 and Romans 3:21-22 show up among the most important sentences in their respective letters. For those of you who do not know, those verses have generated a tremendous amount of literature over a particular Greek construction, the genitive phrase πίστις Χριστοῦ. The question basically comes down to the following: does Paul understand a person to be justified through faith in Christ or through the faith of Christ (that is, Christ's faith)? (Thanks, Richard B. Hays!)

By way of one final example, consider the important sentences in the Epistle of James. Readers of James often have difficulty identifying a cogent structure for the epistle; it appears to be a collection of admonitions. In the face of this lack of structure, the TextRank algorithm might offer some aid. The most important sentences appear in 2:14a and 2:18.

As you can see from the blue line, the most important part of the letter (again, according to our algorithm) also corresponds to the section of the letter containing these verses. The most important sentence is this one:

Τί ὄφελος, ἀδελφοί μου, ἐὰν πίστιν λέγῃ τις ἔχειν ἔργα δὲ μὴ ἔχῃ; (2:14a)

What use is it, my brothers, if someone claims to have faith but does not have actions? (Translation my own.)

This section, of course, contrasts with Paul’s view of faith, which maintains that a person is justified by faith apart from actions (compare Romans 3:28; 4:1, 6). In fact, it prompted Martin Luther at one point to claim that James was an epistle of straw (an assertion he elucidates elsewhere). Is it possible that this dissimilarity is the very heart of the Epistle of James?

My point here is that computers help us encounter texts more fully than we can encounter them when left to our own devices. They can help us take in huge swaths of data all at once, and they can help us analyze that data in sophisticated ways. They are, in short, excellent conversation partners that come to texts with presuppositions of a kind entirely different from our own. No one has taught them to read texts from a liberal or conservative standpoint (although one theoretically could). They offer us the opportunity to read with something wholly other.