An n-gram is a sequence of n successive items in a text document; the items may be words, numbers, symbols, or punctuation. N-gram models are useful in many text analytics applications where the order of words matters, such as sentiment analysis, text classification, and text generation.
Applications
An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence using an (n − 1)-order Markov model.[2] N-gram models are now widely used in probability, communication theory, computational linguistics (for instance, statistical natural language processing), computational biology (for instance, biological sequence analysis), and data compression. Two benefits of n-gram models (and the algorithms that use them) are simplicity and scalability: with a larger n, a model can store more context, with a well-understood space-time tradeoff that lets small experiments scale up efficiently.
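To make the Markov idea concrete, here is a minimal sketch of a word-level bigram (2-gram) model in Python. The function names and the toy corpus are made up for illustration and are not taken from any particular library; the model simply estimates P(next word | previous word) by maximum likelihood from raw counts.

```python
from collections import defaultdict, Counter

def train_bigram_model(tokens):
    """Count how often each word follows each other word."""
    bigram_counts = defaultdict(Counter)
    for prev, curr in zip(tokens, tokens[1:]):
        bigram_counts[prev][curr] += 1
    return bigram_counts

def next_word_probability(bigram_counts, prev, curr):
    """Maximum-likelihood estimate: count(prev, curr) / count(prev)."""
    total = sum(bigram_counts[prev].values())
    return bigram_counts[prev][curr] / total if total else 0.0

# Toy corpus, purely for illustration.
tokens = "the cat sat on the mat the cat slept".split()
model = train_bigram_model(tokens)
print(next_word_probability(model, "the", "cat"))  # 2/3 = 0.666...
```

A real system would add smoothing (for unseen word pairs) and back-off to shorter n-grams, but the core bookkeeping is just counting windows of adjacent tokens as above.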
Examples
Here are examples of word-level 3-grams and 4-grams, with the number of times each appeared, from the Google n-gram corpus.[3]
3-grams
- ceramics collectibles collectibles (55)
- ceramics collectibles fine (130)
- ceramics collected by (52)
- ceramics collectible pottery (50)
- ceramics collectibles cooking (45)
4-grams
- serve as the incoming (92)
- serve as the incubator (99)
- serve as the independent (794)
- serve as the index (223)
- serve as the indication (72)
- serve as the indicator (120)
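To see how such counts feed a model, here is a toy calculation in Python using the figures from the list above. It treats the six listed continuations as if they were the only ones; in the real corpus "serve as the" has many more continuations, so these are illustrative relative frequencies rather than true corpus probabilities.

```python
# Counts copied from the 4-gram list above (a small, non-exhaustive sample).
counts = {
    "incoming": 92, "incubator": 99, "independent": 794,
    "index": 223, "indication": 72, "indicator": 120,
}
total = sum(counts.values())  # 1400 among these six continuations
for word, c in counts.items():
    print(f"P({word} | 'serve as the') = {c / total:.3f}")
# e.g. 'independent' gets 794 / 1400 = 0.567 within this sample
```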
FAQs
How do you make n-grams?
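The usual recipe is to tokenize the text and slide a window of length n over the token sequence, emitting each window as one n-gram. Here is a minimal sketch in Python; the function name and sample sentence are made up for illustration:

```python
def ngrams(tokens, n):
    """Slide a window of length n over the token list and return each n-gram."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "ceramics collectibles fine china".split()
print(ngrams(tokens, 3))
# [('ceramics', 'collectibles', 'fine'), ('collectibles', 'fine', 'china')]
```

In practice you rarely need to write this by hand: NLTK offers nltk.ngrams, and scikit-learn's CountVectorizer can extract word or character n-grams via its ngram_range parameter.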
Conclusion
N-gram models are a powerful tool for text analytics and should be considered when analyzing text data. They can be used to determine sentiment, classify documents, and generate text. If you’re looking for a way to get more insights from your text data, consider using an n-gram model.
Published on: 2022-03-28
Updated on: 2022-10-28