quanteda Cheat Sheet
Extensions
quanteda works well with these companion packages:
• quanteda.textmodels: text scaling and classification models
• readtext: an easy way to read text data
• spacyr: NLP using the spaCy library
• quanteda.corpora: additional text corpora
• stopwords: multilingual stopword lists in R
General syntax
• corpus_* manage text collections/metadata
• tokens_* create/modify tokenized texts
• dfm_* create/modify document-feature matrices
• fcm_* work with co-occurrence matrices
• textstat_* calculate text-based statistics
• textmodel_* fit (un-)supervised models
• textplot_* create text-based visualizations
Consistent grammar:
• object() constructor for the object type
• object_verb() inputs & returns the object type

Create a corpus from texts (corpus_*)
Read texts (txt, pdf, csv, doc, docx, json, xml)
my_texts <- readtext::readtext("~/link/to/path/*")
Construct a corpus from a character vector
x <- corpus(data_char_ukimmig2010, text_field = "text")
Explore a corpus
summary(data_corpus_inaugural, n = 2)
## Corpus consisting of 58 documents, showing 2 documents:
##
##            Text Types Tokens Sentences Year  President FirstName Party
## 1789-Washington   625   1537        23 1789 Washington    George  none
## 1793-Washington    96    147         4 1793 Washington    George  none
Extract or add document-level variables
party <- data_corpus_inaugural$Party
x$serial_number <- seq_len(ndoc(x))
docvars(x, "serial_number") <- seq_len(ndoc(x)) # alternative
Bind or subset corpora
corpus(x[1:5]) + corpus(x[7:9])
corpus_subset(x, Year > 1990)
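Taken together, the grammar above chains naturally. A minimal end-to-end sketch using the bundled inaugural corpus (v2-era syntax, as elsewhere on this sheet):
library(quanteda)
# subset the corpus, tokenize, and derive a document-feature matrix
corp <- corpus_subset(data_corpus_inaugural, Year > 1990)
toks <- tokens(corp, remove_punct = TRUE)
mat <- dfm(toks, tolower = TRUE)
ndoc(mat) # one document per speech since 1990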
Change units of a corpus
corpus_reshape(x, to = "sentences")
Segment texts on a pattern match
corpus_segment(x, pattern, valuetype, extract_pattern = TRUE)
Take a random sample of corpus texts
corpus_sample(x, size = 10, replace = FALSE)
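A small sketch combining the functions above, assuming only the bundled inaugural corpus:
# reshape to sentences, sample ten of them, then restore full documents
sents <- corpus_reshape(data_corpus_inaugural, to = "sentences")
corpus_sample(sents, size = 10) # ten randomly drawn sentences
corpus_reshape(sents, to = "documents") # back to one text per speech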
Utility functions
texts(corpus)                    Show texts of a corpus
ndoc(corpus / dfm / tokens)      Count documents
nfeat(corpus / dfm / tokens)     Count features
summary(corpus / dfm)            Print summary
head(corpus / dfm)               Return first part
tail(corpus / dfm)               Return last part
Extract features (dfm_*; fcm_*)
Create a document-feature matrix (dfm) from a corpus
x <- dfm(data_corpus_inaugural, tolower = TRUE, stem = FALSE,
         remove_punct = TRUE, remove = stopwords("en"))
print(x, max_ndoc = 2, max_nfeat = 4)
## Document-feature matrix of: 58 documents, 9,210 features (92.6% sparse) and 4 docvars.
##                  features
## docs              fellow-citizens senate house representatives
##   1789-Washington               1      1     2               2
##   1793-Washington               0      0     0               0
## [ reached max_ndoc ... 56 more documents, reached max_nfeat ... 9,206 more features ]
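A quick sanity check on the dfm just created (topfeatures() is described under text statistics below):
topfeatures(x, 10) # the ten most frequent features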
Create a dictionary
dictionary(list(negative = c("bad", "awful", "sad"),
                positive = c("good", "wonderful", "happy")))
Apply a dictionary
dfm_lookup(x, dictionary = data_dictionary_LSD2015)
Select features
dfm_select(x, pattern = data_dictionary_LSD2015, selection = "keep")
Randomly sample documents or features
dfm_sample(x, what = c("documents", "features"))
Weight or smooth the feature frequencies
dfm_weight(x, scheme = "prop")
dfm_smooth(x, smoothing = 0.5)
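As a sketch, the example dictionary above can be applied and the resulting counts normalised; dict is the illustrative two-key dictionary and x the inaugural dfm built earlier:
# score each document against the dictionary, then express the
# negative/positive counts as within-document proportions
dict <- dictionary(list(negative = c("bad", "awful", "sad"),
                        positive = c("good", "wonderful", "happy")))
scores <- dfm_lookup(x, dictionary = dict)
dfm_weight(scores, scheme = "prop")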
Sort or group a dfm
dfm_sort(x, margin = c("features", "documents", "both"))
dfm_group(x, groups = "President")
Combine identical dimension elements of a dfm
dfm_compress(x, margin = c("both", "documents", "features"))
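For instance, grouping the inaugural dfm by the President docvar collapses the 58 speeches into one row per speaker (a sketch; assumes x is the dfm built above):
grouped <- dfm_group(x, groups = "President")
ndoc(grouped) # number of distinct presidents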
Create a feature co-occurrence matrix (fcm)
x <- fcm(data_corpus_inaugural, context = "window", size = 5)
fcm_compress(), fcm_remove(), fcm_select(), fcm_toupper() and fcm_tolower() are also available.

Useful additional functions
Locate keywords-in-context
kwic(data_corpus_inaugural, pattern = "america*")

Tokenize a set of texts (tokens_*)
Tokenize texts from a character vector or corpus
x <- tokens("Powerful tool for text analysis.", remove_punct = TRUE)
Convert sequences into compound tokens
myseqs <- phrase(c("text analysis"))
tokens_compound(x, myseqs)
Select tokens
tokens_select(x, c("powerful", "text"), selection = "keep")
Create ngrams and skipgrams from tokens
tokens_ngrams(x, n = 1:3)
tokens_skipgrams(x, n = 2, skip = 0:1)
Convert case of tokens or features
tokens_tolower(x)
tokens_toupper(x)
dfm_tolower(x)
Stem tokens or features
tokens_wordstem(x)
dfm_wordstem(x)

Calculate text statistics (textstat_*)
Tabulate feature frequencies from a dfm
textstat_frequency(x)
topfeatures(x)
Identify and score collocations from a tokenized text
toks <- tokens(c("quanteda is a pkg for quant text analysis",
                 "quant text analysis is a growing field"))
textstat_collocations(toks, size = 3, min_count = 2)
Calculate readability of a corpus
textstat_readability(x, measure = c("Flesch", "FOG"))
Calculate lexical diversity of a dfm
textstat_lexdiv(x, measure = "TTR")
Measure distance or similarity from a dfm
textstat_simil(x, "2017-Trump", method = "cosine",
               margin = c("documents", "features"))
textstat_dist(x, "2017-Trump", margin = c("documents", "features"))
Calculate keyness statistics
textstat_keyness(x, target = "2017-Trump")

Fit text models based on a dfm (textmodel_*)
These functions require the quanteda.textmodels package.
Correspondence Analysis (CA)
textmodel_ca(x, threads = 2, sparse = TRUE, residual_floor = 0.1)
Naïve Bayes classifier for texts
textmodel_nb(x, y = training_labels, distribution = "multinomial")
SVM classifier for texts
textmodel_svm(x, y = training_labels)
Wordscores text model
refscores <- c(seq(-1.5, 1.5, .75), NA)
textmodel_wordscores(data_dfm_lbgexample, refscores)
Wordfish Poisson scaling model
textmodel_wordfish(dfm(data_corpus_irishbudget2010), dir = c(6, 5))
Textmodel methods: predict(), coef(), summary(), print()

Plot features or models (textplot_*)
Plot features as a wordcloud
data_corpus_inaugural %>%
  corpus_subset(President == "Obama") %>%
  dfm(remove = stopwords("en")) %>%
  textplot_wordcloud()
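Before the remaining plots, a hedged sketch tying the textmodel_* functions above to their methods; the train/test split and the use of the Party docvar as labels are illustrative only:
library(quanteda.textmodels)
# fit Naive Bayes on the first 40 inaugural speeches, predict the rest
# (x here is the inaugural dfm from the Extract features section)
train <- x[1:40, ]
test <- x[41:58, ]
nb <- textmodel_nb(train, y = docvars(train, "Party"))
predict(nb, newdata = test)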
Plot word keyness
data_corpus_inaugural %>%
  corpus_subset(President %in% c("Obama", "Trump")) %>%
  dfm(groups = "President", remove = stopwords("en")) %>%
  textstat_keyness(target = "Trump") %>%
  textplot_keyness()
Plot Wordfish, Wordscores or CA models (requires the quanteda.textmodels package)
scaling_model %>%
  textplot_scale1d(groups = party, margin = "documents")
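A concrete sketch of the pipeline above, fitting Wordfish and plotting one position per document (requires quanteda.textmodels, which also provides the Irish budget corpus and its party docvar):
library(quanteda.textmodels)
# fit the scaling model, then plot estimated document positions by party
wf <- textmodel_wordfish(dfm(data_corpus_irishbudget2010), dir = c(6, 5))
textplot_scale1d(wf, margin = "documents",
                 groups = docvars(data_corpus_irishbudget2010, "party"))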
[Figure: word cloud of Obama's inaugural addresses (prominent terms include "america", "freedom", "people", "nation")]
[Figure: textplot_keyness() comparison of Trump vs. Obama inaugural speeches; x-axis: chi2]
[Figure: textplot_scale1d() document positions for 2010 Irish budget speakers, grouped by party (FF, FG, Green, LAB, SF); x-axis: Document position]
Convert dfm to a non-quanteda format
convert(x, to = c("lda", "tm", "stm", "austin", "topicmodels", "lsa", "matrix", "data.frame"))
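For example, handing a dfm to the stm topic-modelling package (a sketch; assumes stm is installed and x is a dfm):
stm_input <- convert(x, to = "stm")
names(stm_input) # "documents" "vocab" "meta"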
by Stefan Müller and Kenneth Benoit • Learn more at: http://quanteda.io • updated: 05/2020
License: https://creativecommons.org/licenses/by/4.0/
