Difference between revisions of "Glossary"

Revision as of 17:14, 29 August 2021

Attribution scholarship, especially so-called "non-traditional" studies, typically relies on technical vocabulary. This page will provide glosses in plain language to demystify these unfamiliar terms and concepts.

A

algorithm

A set of logical instructions or mathematical rules used to perform calculations and problem-solving operations, typically by a computer.

C

closed-class words

A group or class of words to which new items are rarely added. Cf. open-class words.

content words

Also lexical words. A term for a group of open-class words that supply the lexical content of a sentence, as opposed to words whose purpose is primarily or exclusively grammatical (i.e., function words). Content words include adjectives and nouns, as well as most verbs and adverbs.

F

function words

Also grammatical words. A term for a group of closed-class words that carry little, no, or ambiguous lexical meaning but express grammatical relationships among other words in a sentence. Function words in English typically include auxiliary verbs, conjunctions, determiners (e.g. the definite and indefinite articles), particles, prepositions, pronouns, and some adverbs. Since they are essential to the structuring of sentences, function words are among the most frequently used words in written and spoken English. (Cf. content words.)
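
As an illustration of why function words lend themselves to counting, the following minimal Python sketch computes the relative frequency of a few function words in a passage. The word list `FUNCTION_WORDS`, the helper `function_word_rates`, and the sample sentence are all assumptions made for this example; attribution studies typically use much longer, carefully chosen lists.

```python
import re
from collections import Counter

# Hypothetical and deliberately tiny list, for illustration only.
FUNCTION_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that"}

def function_word_rates(text):
    """Return each listed function word's share of all words in the text."""
    # Lower-case the text and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w in FUNCTION_WORDS)
    total = len(words)
    return {w: counts[w] / total for w in sorted(counts)}

sample = "The cat sat on the mat and the dog lay in the sun."
rates = function_word_rates(sample)
for word, rate in rates.items():
    print(f"{word}: {rate:.3f}")
```

Rates are expressed as proportions of the total word count, so passages of different lengths can be compared.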

G

grammatical words

See function words.

L

lexical words

See content words.

M

machine learning

A class of algorithms that analyse patterns and relationships in data to make determinations and predictions, using the outcomes of these operations to learn iteratively and improve future accuracy. Machine learning procedures may be supervised, requiring human intervention to provide pre-defined examples with which to train the algorithm, or unsupervised, where no such pre-defined examples are required.

O

open-class words

A group or class of words that accepts the addition of new items. Cf. closed-class words.

S

supervised

In the context of machine learning and statistical analysis, an algorithm that requires human intervention to provide pre-defined examples on which to train is said to be supervised. In attribution study, a typical supervised procedure may train an algorithm on a corpus of texts of known authorship to identify features with which to test texts of unknown or disputed authorship.
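
The idea of training on labelled examples can be sketched with a nearest-centroid classifier in Python. This is one simple supervised technique chosen for illustration, not a method prescribed by this glossary; the author labels, the feature values, and the function names `train` and `classify` are invented for the example.

```python
import math

# Invented toy data: each text is a vector of relative frequencies for
# three hypothetical features (for instance, rates of three function words).
training = {
    "Author A": [[0.060, 0.030, 0.020], [0.058, 0.032, 0.021]],
    "Author B": [[0.040, 0.045, 0.030], [0.042, 0.043, 0.029]],
}

def centroid(vectors):
    """Average the vectors feature by feature."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(labelled):
    """Learn one centroid per label from the human-supplied examples."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def classify(model, vector):
    """Assign the label whose centroid is closest (Euclidean distance)."""
    return min(model, key=lambda label: math.dist(model[label], vector))

model = train(training)
disputed = [0.059, 0.031, 0.022]  # invented profile of a text to be tested
print(classify(model, disputed))
```

The procedure is supervised because the labels in `training` are supplied in advance by a human.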


T

text encoding

A set of explicit instructions for the computational representation of text. Whereas a human reader of the text of Othello, for example, is familiar with the conventions distinguishing the word "Othello" as it functions as a title, a running header, a speech prefix, or as a reference to the character in stage directions and dialogue, a computer generally requires these distinctions to be made explicit. Textual encoding or "markup" involves annotating or "tagging" units of text (from individual letters to entire documents) to assign them to categories. Linguistic features (e.g. grammatical part of speech, syntactic function, and so on) are commonly tagged to support natural language processing, and structural features are tagged to distinguish elements of text and paratext (e.g. title, dialogue, speech prefix, stage direction, prologue). Other kinds of analysis may require additional categories, such as tagging gender or social status.
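
The following Python sketch shows how markup lets a program tell apart the different functions of the word "Othello". The element names (`title`, `sp`, `speaker`, `l`, `stage`) are assumptions loosely modelled on common XML encoding practice, and the helper `roles_of` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the tag names are illustrative assumptions.
encoded = """
<play>
  <title>Othello</title>
  <sp>
    <speaker>Othello</speaker>
    <l>Keep up your bright swords, for the dew will rust them.</l>
  </sp>
  <stage>Exit Othello</stage>
</play>
"""

root = ET.fromstring(encoded.strip())

def roles_of(word, tree):
    """List the tags of elements whose own text contains the word."""
    return [el.tag for el in tree.iter() if el.text and word in el.text]

print(roles_of("Othello", root))
```

Without the tags, all three occurrences would be indistinguishable strings; with them, the title, the speech prefix, and the stage direction can be counted or excluded separately.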

token

A concrete instance of a type. "I came, I saw, I conquered", for example, contains six word-tokens (excluding punctuation): "came", "saw", "conquered", and three instances of "I".

type

A unique form; cf. token. "I came, I saw, I conquered", for example, contains four word-types (excluding punctuation): "I", "came", "saw", and "conquered".
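
The distinction between tokens and types in the example sentence can be checked with a short Python sketch. The tokenising rule used here (runs of letters and apostrophes, punctuation discarded, case preserved) is a simplifying assumption, and `word_tokens` is a hypothetical helper.

```python
import re
from collections import Counter

def word_tokens(text):
    """Split text into word-tokens, discarding punctuation."""
    return re.findall(r"[A-Za-z']+", text)

tokens = word_tokens("I came, I saw, I conquered")
types = Counter(tokens)  # one key per word-type, valued by its token count

print(len(tokens))   # number of word-tokens
print(len(types))    # number of word-types
print(types["I"])    # number of tokens of the type "I"
```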

U

unsupervised

In the context of machine learning and statistical analysis, an algorithm that requires no pre-defined examples supplied by a human is said to be unsupervised. The algorithm looks for structure in the data without reference to pre-assigned categories, although choices such as which texts and features to include still shape the results. Principal Components Analysis is an example of a widely used unsupervised method.
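
As a rough illustration of an unsupervised method, the Python sketch below estimates the first principal component of a small dataset by power iteration on the covariance matrix. This is only one of several ways to compute Principal Components Analysis and is a sketch under stated assumptions: the data are invented, the function name `first_principal_component` is hypothetical, and the code assumes the data are not constant.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return the dominant direction of variation and each row's score on it."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centred data.
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Asymmetric starting vector, to reduce the chance of starting
    # orthogonal to the dominant direction.
    v = [1.0 / (j + 1) for j in range(d)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centred]
    return v, scores

# Invented data: four "texts" measured on two features; no labels are given.
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.8]]
direction, scores = first_principal_component(data)
print(direction)
print(scores)
```

No authorship labels enter the calculation; any grouping of the texts is read off the scores afterwards.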

W

word-token

See token.

word-type

See type.