N-gram viewer for the Icelandic Gigaword Corpus (Risamálheildin), The Árni Magnússon Institute for Icelandic Studies

The n-gram viewer of The Árni Magnússon Institute for Icelandic Studies

The n-gram viewer offers a visual representation of word frequencies over time. It can be used both for individual words and for multi-word expressions. The output is computed from the Icelandic Gigaword Corpus, which contains over a billion running words of text from various sources. By default, the viewer uses data from 2000-2019 from the entire corpus, but the time period can be adjusted and specific subcorpora can be chosen if desired. The viewer makes it possible to see when certain words gain a foothold in the language and how the use of a word or expression changes over the years. For instance, the word "snjalltæki" (smart device) is practically nonexistent before 2010 but has grown steadily in use since 2013. The use of the word "kreppa" (economic recession) rises sharply during the period 2007-2009 but has declined steadily since then.

The n-gram viewer is based on the idea and design of the n-gram viewer of the National Library of Norway (Nasjonalbiblioteket).

Directions for use

To use the n-gram viewer, enter a search term on the front page. Before the query is made, the user must decide whether it should be based on lemmas or word forms. A lemma is the lexical look-up form of a word, that is to say, the word as it is presented in a dictionary. If word forms are chosen, the word or words can be entered in any inflected form, and only that exact form is counted. Lemmas or word forms are chosen by selecting the appropriate radio button on the front page.

Beneath the line chart that shows the frequencies of the word or words, there are buttons for selecting the years from which the results are calculated. The available years are 2000 to 2019 but, as previously stated, a narrower time period within those years can be selected if desired. Beneath the search window there are two options: results can be generated either from the entire corpus or from one or more specific subcorpora. If subcorpora are chosen, a list of text categories appears. Selecting the triangle symbol to the right of a category name expands the category so that specific media within it can be chosen as well.

Search queries can be of various kinds. It is possible to look up the frequency of a single word or of a collocation of up to three words. The words within a collocation should be separated by a space. It is also possible to run more than one query at the same time to compare their frequencies. In that case, the queries should be separated by a comma. Additionally, the combined frequency of two separate queries can be found by adding a plus sign between them.
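The query syntax described above can be illustrated with a small parser. This is only a sketch of how such a string could be broken down, not the viewer's actual implementation; the function name `parse_queries` is hypothetical, and the word "efnahagskreppa" is used purely as sample input.

```python
def parse_queries(raw):
    """Split a raw search string into queries, plus-joined terms, and words.

    Assumed reading of the syntax (not confirmed by the viewer's source):
    commas separate queries that are plotted side by side, plus signs join
    terms whose frequencies are combined, and spaces separate the words of
    a collocation of at most three words.
    """
    queries = []
    for query in raw.split(","):
        terms = []
        for term in query.split("+"):
            words = tuple(term.split())
            if not words:
                continue  # skip empty pieces such as a trailing comma
            if len(words) > 3:
                raise ValueError(f"at most three words per term: {term.strip()!r}")
            terms.append(words)
        if terms:
            queries.append(terms)
    return queries


parsed = parse_queries("snjalltæki, kreppa + efnahagskreppa, þokkalega gott")
for terms in parsed:
    print(terms)
```

Each top-level item corresponds to one line in the chart, and each tuple inside it is one word or collocation contributing to that line.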

The wildcard character * can be used to search for all words or collocations that match a partial query. For instance, the query "hesta*" returns all compounds in the selected corpus or corpora that start with "hesta" (horse). In the same way, the query "þokkalega *" returns all two-word collocations in which "þokkalega" (reasonably) is the first word.
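The behaviour of the wildcard can be sketched with a regular expression. The function `matches` and the sample list are hypothetical, and the sketch assumes that * never matches across a space; how the viewer itself evaluates wildcards is not documented here.

```python
import re


def matches(pattern, ngram):
    """Return True if the n-gram fits the wildcard pattern, word by word."""
    parts = []
    for token in pattern.split():
        if token == "*":
            # A lone asterisk stands for one whole word.
            parts.append(r"\S+")
        else:
            # An asterisk inside a word stands for any run of non-space characters.
            parts.append(re.escape(token).replace(r"\*", r"\S*"))
    return re.fullmatch(" ".join(parts), ngram) is not None


# Invented sample data for illustration only.
sample = [
    "hestamaður",
    "hestaferð",
    "hestur",
    "þokkalega gott",
    "þokkalega vel",
    "mjög þokkalega",
]

compounds = [s for s in sample if matches("hesta*", s)]
collocations = [s for s in sample if matches("þokkalega *", s)]
print(compounds)
print(collocations)
```

Because the pattern is compared word by word, "hesta*" only finds single words and "þokkalega *" only finds two-word collocations.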

Various other options can be found on the front page. For instance, it is possible to show either the total frequencies or the proportional frequencies of the word or words. Proportional frequency is the use of the query relative to all other words used in the same time period. A cumulative line can also be selected. Case sensitivity for the initial letter of a word can be switched on or off. Finally, the graph can be smoothed as desired using the control below the graph on the right-hand side.
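The difference between these display options can be shown with a few lines of arithmetic. The numbers below are invented, the function names are hypothetical, and the smoothing shown is a simple centred moving average chosen for illustration; the viewer may scale or smooth its data differently.

```python
from itertools import accumulate


def proportional(counts, totals):
    """Share of the query among all words counted in each year."""
    return [c / t for c, t in zip(counts, totals)]


def cumulative(counts):
    """Running total of occurrences up to and including each year."""
    return list(accumulate(counts))


def smooth(values, radius):
    """Centred moving average; the window shrinks at the edges of the series."""
    result = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        result.append(sum(window) / len(window))
    return result


# Invented sample data: occurrences of a query and total words per year.
years = [2010, 2011, 2012, 2013]
counts = [2, 4, 6, 12]
totals = [1000, 2000, 2000, 3000]

print(proportional(counts, totals))
print(cumulative(counts))
print(smooth(counts, 1))
```

In this sample the total count doubles from 2010 to 2011 while the proportional frequency stays the same, because the amount of text in the sample doubled as well. This is why proportional frequencies are usually the better basis for comparing years.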

Additionally, if the mouse pointer is placed over a certain year in the line chart, a window appears showing the precise number of occurrences during that year. Clicking the dot on the chart brings up a link to example sentences that include the selected word or words.

By selecting the download symbol on the front page, the results can be downloaded either as a line graph in .SVG format or as raw data in .CSV format. Information on how to cite results from the n-gram viewer can be found under the subpage Tilvísanir og gögn (Citations and data).