§1 Introduction to Digital Humanities

1 Slides

2 Digital Humanities Analysis with R

Let’s explore some fascinating examples of how computational methods can be applied to humanities questions.

2.1 Example 1: Text Analysis of Literary Works

Code
library(pacman)
p_load(tidyverse, tidytext, wordcloud, gutenbergr, scales, hrbrthemes)

# Function to download and process text
# gutenberg_works(title == "The Age of Innocence")
inno <- gutenberg_download(541)

# Analyze word frequencies
word_freq <- inno %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  top_n(20, n)

# Create a bar plot of word frequencies
p <- ggplot(word_freq, aes(x = reorder(word, n), y = n)) +
  geom_col(fill = "#69b3a2", width = 0.7) +
  geom_text(aes(label = n), hjust = -0.3, size = 3) +
  coord_flip() +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(x = NULL, y = "Frequency", 
       title = "Top 20 Most Frequent Words in The Age of Innocence",
       subtitle = "After removing common stop words",
       caption = "Source: Project Gutenberg") +
  theme_minimal() +
  theme(
    plot.title = element_text(size = 16, face = "bold", margin = margin(b = 10)),
    plot.subtitle = element_text(size = 12, color = "gray50", margin = margin(b = 20)),
    plot.caption = element_text(size = 10, color = "gray50", margin = margin(t = 10)),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    plot.margin = margin(20, 20, 20, 20)
  )

# Print the plot
print(p)

This word cloud visualizes the most frequent words in The Age of Innocence, giving us a quick insight into common themes and vocabulary.

  1. What words stand out to you in this visualization?
  2. How might this kind of analysis complement traditional close reading of literary works?
  3. What limitations might this approach have for understanding the author’s language?

2.2 Example 2: Sentiment Analysis of Jane Austen’s Novels

Code
library(janeaustenr)
library(tidyverse)
library(tidytext)
library(hrbrthemes)

# Combine Austen's novels
austen_books <- austen_books() %>%
  group_by(book) %>%
  mutate(
    linenumber = row_number(),
    chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]", ignore_case = TRUE)))
  ) %>%
  ungroup()

# Perform sentiment analysis
austen_sentiment <- austen_books %>%
  unnest_tokens(word, text) %>%
  inner_join(get_sentiments("bing"), multiple="all") %>%
  count(book, index = linenumber %/% 100, sentiment) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>% 
  mutate(sentiment = positive - negative)

# Plot sentiment over narrative time
ggplot(austen_sentiment, aes(index, sentiment, fill = book)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~book, ncol = 2, scales = "free_x") +
  labs(
    title = "Sentiment Analysis of Jane Austen's Novels",
    subtitle = "Emotional trajectory throughout the narrative",
    x = "Narrative Time",
    y = "Sentiment Score",
    caption = "Data: janeaustenr package | Analysis: tidytext"
  ) +
  theme_minimal() +
  theme(
    text = element_text(size = 12),
    strip.text = element_text(size = 14, face = "bold"),
    panel.grid.minor = element_blank(),
    plot.title = element_text(size = 18, face = "bold"),
    plot.subtitle = element_text(size = 14, color = "gray50"),
    plot.caption = element_text(size = 10, color = "gray50"),
    axis.title = element_text(size = 12, face = "bold"),
    plot.margin = margin(20, 20, 20, 20)
  ) +
  scale_fill_brewer(palette = "Set2") +
  scale_y_continuous(expand = expansion(mult = c(0.1, 0.1)))

Code
# Display summary statistics
austen_summary <- austen_sentiment %>%
  group_by(book) %>%
  summarize(
    mean_sentiment = mean(sentiment),
    max_sentiment = max(sentiment),
    min_sentiment = min(sentiment)
  )

knitr::kable(austen_summary, caption = "Summary Statistics of Sentiment Analysis", digits = 2)
Summary Statistics of Sentiment Analysis
book mean_sentiment max_sentiment min_sentiment
Sense & Sensibility 9.94 51 -26
Pride & Prejudice 10.69 42 -31
Mansfield Park 12.47 61 -45
Emma 14.40 48 -31
Northanger Abbey 9.19 47 -33
Persuasion 15.14 59 -13

This visualization shows the emotional trajectory of Jane Austen’s novels over their narrative time.

  1. What patterns do you notice in the emotional arcs of Austen’s novels?
  2. How might this type of analysis enhance our understanding of narrative structure?
  3. What challenges might arise in applying sentiment analysis to historical texts?

2.3 Example 3: Network Analysis of “Empresses in the Palace” Characters

Code
library(pacman)
p_load(jsonlite,igraph,ggraph,tidyverse,showtext,ggrepel,cowplot)

# Add Noto Sans CJK font
font_add_google("Noto Sans SC", "Noto Sans SC")
showtext_auto()


# Read the JSON data
relation_data <- fromJSON("../book/data/relation.json")

relation_data$nodes <- relation_data$nodes %>% rename(Bio="角色描述")

relation_data$nodes <- relation_data$nodes%>%
mutate(Alliance = ifelse(Alliance=="皇后阵容","皇后阵营",Alliance))

# Ensure that the 'source' and 'target' in edges match the 'ID' in nodes
relation_data$edges$source <- as.character(relation_data$edges$source)
relation_data$edges$target <- as.character(relation_data$edges$target)
relation_data$nodes$ID <- as.character(relation_data$nodes$ID)
edges <- relation_data$edges %>% select(-Relationship)
# Set the ID as the row names for the nodes dataframe
rownames(relation_data$nodes) <- relation_data$nodes$ID

# Create the directed graph
empresses_graph <- graph_from_data_frame(d = edges,
                                         vertices = relation_data$nodes$ID,
                                         directed = TRUE)

# Set node attributes
V(empresses_graph)$Alliance <- relation_data$nodes$Alliance
V(empresses_graph)$Label <- relation_data$nodes$Label

# Create a color palette for alliances
alliance_colors <- c(
  "皇室成员" = "#4E79A7",  # Royal Blue
  "皇后阵营" = "#F28E2B",  # Warm Orange
  "甄嬛阵营" = "#E15759",  # Soft Red
  "华妃阵营" = "#76B7B2"   # Teal
)

# Calculate node size based on total degree centrality (in + out)
V(empresses_graph)$size <- degree(empresses_graph, mode = "total") * 0.5 + 3
E(empresses_graph)$Relationship <- relation_data$edges$Relationship

# Plot the network
set.seed(123) # for reproducibility
plot <- ggraph(empresses_graph, layout = "fr") +
  geom_edge_link0(aes(edge_color = Relationship), 
                  arrow = arrow(length = unit(0.2, "inches"),
                  ends = "last", type = "closed"), 
                  show.legend = FALSE, 
                  width = 1) +
  geom_node_point(aes(color = Alliance, size = size), 
                  shape = 20, show.legend = FALSE) +
  geom_node_text(aes(label = Label), repel = TRUE, size = 6) +
  scale_color_manual(values = alliance_colors) +
  scale_edge_colour_discrete() +
  scale_size_continuous(range = c(5, 20)) +
  theme_void() +
  labs(title = "Character Network in 'Empresses in the Palace'",
       subtitle = "Directed relationships between key figures in the Chinese drama") +
  theme(
    plot.title = element_text(hjust = 0.5, size = 18, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 14),
    legend.text = element_text(size = 8),
    legend.title = element_text(size = 10, face = "bold")
  )

# Plot
plot

This visualization shows the complex network of relationships between characters in the Chinese drama “Empresses in the Palace” (甄嬛传).

  1. What insights can we gain from this network visualization? What information is presented in the plot?
  2. Which characters appear to be central to the network? How might their positions reflect their importance in the narrative?
  3. How could this type of analysis complement traditional literary analysis of historical dramas or novels?
  4. What limitations might this network analysis have in representing the complex relationships and dynamics of the story?

Let’s verify our impressions!

3 Conclusion

These demonstrations showcase just a few of the exciting possibilities that digital humanities offers for exploring and analyzing humanities data. By combining computational methods with traditional humanities scholarship, we can uncover new patterns, ask novel questions, and gain fresh insights into cultural and historical materials.

Consider the examples we’ve explored today:

  1. Which technique (word frequency analysis, sentiment analysis, or network analysis) do you find most intriguing? Why?
  2. Can you think of a humanities question or topic from your own interests that might benefit from one of these computational approaches?

Share your thoughts with a partner or the class if time allows.