Last updated: 2025-07-15
Checks: 7 passed, 0 failed
Knit directory: frascolla_chemoresistance/
This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20250522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
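As a minimal illustration of what the seed does (a toy draw, not part of the analysis itself), re-seeding the generator before a random draw reproduces exactly the same draw:

```r
# Toy illustration: re-seeding the generator reproduces the same random draw.
set.seed(20250522)
first_draw <- sample(1:100, 5)

set.seed(20250522)
second_draw <- sample(1:100, 5)

# Both draws start from the same seed, so they are identical.
identical(first_draw, second_draw)
```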
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version 57ef6f7. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Untracked files:
Untracked: .DS_Store
Untracked: data/samples_info_sh1plus2_vs_scr.tsv
Untracked: output/deg_shIFI6_Day0_vs_shSCR_Day0.tsv
Untracked: output/deg_shIFI6_Day1_vs_shSCR_Day0.tsv
Untracked: output/deg_shIFI6_Day2_vs_shSCR_Day0.tsv
Untracked: output/deg_shIFI6_Day3_vs_shSCR_Day0.tsv
Untracked: output/deg_shIFI6_Day4_vs_shSCR_Day0.tsv
Untracked: output/deg_shIFI6_Day5_vs_shSCR_Day0.tsv
Untracked: output/deg_shSCR_Day2_vs_shSCR_Day0.tsv
Untracked: output/deg_shSCR_Day3_vs_shSCR_Day0.tsv
Untracked: output/deg_shSCR_Day4_vs_shSCR_Day0.tsv
Untracked: src/a
Untracked: src/genes2filter_ensemblid.csv
Untracked: src/genes2filter_symbol.csv
Unstaged changes:
Modified: output/deg_shIFI6_1_Day1_vs_shSCR_Day1.tsv
Modified: output/deg_shIFI6_1_Day2_vs_shSCR_Day2.tsv
Modified: output/deg_shIFI6_1_Day3_vs_shSCR_Day3.tsv
Modified: output/deg_shIFI6_1_Day4_vs_shSCR_Day4.tsv
Modified: output/deg_shIFI6_1_Day5_vs_shSCR_Day5.tsv
Modified: output/deg_shIFI6_2_Day1_vs_shSCR_Day1.tsv
Modified: output/deg_shIFI6_2_Day2_vs_shSCR_Day2.tsv
Modified: output/deg_shIFI6_2_Day3_vs_shSCR_Day3.tsv
Modified: output/deg_shIFI6_2_Day4_vs_shSCR_Day4.tsv
Modified: output/deg_shIFI6_2_Day5_vs_shSCR_Day5.tsv
Modified: output/deg_shIFI6_Day1_vs_shIFI6_Day0.tsv
Modified: output/deg_shIFI6_Day1_vs_shSCR_Day1.tsv
Modified: output/deg_shIFI6_Day2_vs_shIFI6_Day0.tsv
Modified: output/deg_shIFI6_Day2_vs_shSCR_Day2.tsv
Modified: output/deg_shIFI6_Day3_vs_shIFI6_Day0.tsv
Modified: output/deg_shIFI6_Day3_vs_shSCR_Day3.tsv
Modified: output/deg_shIFI6_Day4_vs_shIFI6_Day0.tsv
Modified: output/deg_shIFI6_Day4_vs_shSCR_Day4.tsv
Modified: output/deg_shIFI6_Day5_vs_shIFI6_Day0.tsv
Modified: output/deg_shIFI6_Day5_vs_shSCR_Day5.tsv
Modified: src/__utils_rna_seq_functions.R
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/05_de_scr_day4_day0.Rmd) and HTML (docs/05_de_scr_day4_day0.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 57ef6f7 | Mariani_Gianluca_Alessio | 2025-07-15 | Redone analysis with ribosomal genes filter |
html | fea3f82 | Yinxiu Zhan | 2025-07-11 | Build site. |
Rmd | 0de2706 | Yinxiu Zhan | 2025-07-11 | fix |
Rmd | c0a0612 | Yinxiu Zhan | 2025-07-09 | :sparkles: Release first version |
knitr::opts_chunk$set(echo = FALSE,
message = FALSE,
warning = FALSE,
cache = FALSE,
autodep = TRUE,
fig.align = 'center',
fig.width = 10,
fig.height = 8)
The objective of this report is to investigate differential gene expression between the two conditions and to conduct gene ontology enrichment analysis to explore the biological functions involved.
Below is the list of parameters used in this report to define differential gene expression.
Below we show the comparison group considered for the analysis presented in this report. Each group contains all the samples associated with the condition under analysis.
The group is divided into experimental samples and control samples.
Each differential gene expression comparison will be conducted between these two groups.
Group considered:
- Experimental samples (shSCR_Day4): shSCR_Day4_rep1, shSCR_Day4_rep2
- Control samples (shSCR_Day0): shSCR_Day0_rep1
The RNAseq data for this analysis:
The sample population includes:
Below we present the PCA analysis conducted on the two specific conditions analyzed in this report.
Although the samples do not form perfectly distinct clusters, the first principal component clearly separates the experimental and control groups. This supports the validity of the samples.
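To illustrate how a separation along the first principal component can be checked, here is a minimal base-R sketch. The expression matrix is simulated, the sample names are hypothetical, and `prcomp` stands in for whatever PCA routine the report actually uses.

```r
# Simulated expression matrix: 50 genes x 3 samples (toy values, not the report's data).
set.seed(1)
sample_names <- c("exp_rep1", "exp_rep2", "ctrl_rep1")
expr <- matrix(rnorm(50 * 3, mean = 8, sd = 1), nrow = 50,
               dimnames = list(paste0("gene", 1:50), sample_names))

# Shift 20 genes in the two experimental samples to mimic a strong group effect.
expr[1:20, 1:2] <- expr[1:20, 1:2] + 6

# prcomp() expects samples in rows, so the matrix is transposed.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first principal component.
pc1_scores <- pca$x[, "PC1"]
```

If the two groups differ strongly, the experimental replicates fall on one side of PC1 and the control on the other, and PC1 carries most of the variance.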
The same analysis will be repeated after removing the outlier samples S61882_S_plus_NuMA_A and S61886_S-NuMA_B to assess whether their removal improves the quality of the differential gene expression results.
The MA plot is a widely used visualization in differential expression analysis that displays the relationship between the average expression (A) and the log fold change (M) for each gene. The x-axis represents the mean expression level across samples, while the y-axis shows the log fold change between groups.
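The two axes can be sketched in base R as follows. The counts and column names are hypothetical, and A is taken as the plain mean of the normalized counts, following the description above.

```r
# Toy normalized counts for four genes (hypothetical values).
norm_counts <- data.frame(
  gene    = c("geneA", "geneB", "geneC", "geneD"),
  control = c(100, 50, 400, 8),
  treated = c(200, 50, 100, 8)
)

# A: average expression across the two conditions (x-axis).
norm_counts$A <- (norm_counts$control + norm_counts$treated) / 2

# M: log2 fold change of treated versus control (y-axis).
norm_counts$M <- log2(norm_counts$treated / norm_counts$control)
```

Genes with M close to 0 are unchanged. In this toy table geneA doubles (M = 1) and geneC drops fourfold (M = -2). A call such as `plot(norm_counts$A, norm_counts$M, log = "x")` would draw the points.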
Total number of significant genes: 829
The Volcano plot is a graphical method to visualize differential expression results by combining statistical significance and magnitude of change for each gene. It plots the log2 fold change on the x-axis against the negative log10 of the p-value (or adjusted p-value) on the y-axis.
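A minimal sketch of the y-axis transformation, using a toy results table with hypothetical values. The column names follow the results table described in this report.

```r
# Toy results table (hypothetical values, not taken from the report).
res <- data.frame(
  symbol         = c("geneA", "geneB", "geneC"),
  log2FoldChange = c(2.5, -0.2, -3.0),
  padj           = c(1e-6, 0.8, 1e-3)
)

# y-axis of the volcano plot: smaller p-values map to larger values.
res$neg_log10_padj <- -log10(res$padj)
```

Here an adjusted p-value of 1e-6 maps to 6 and 1e-3 maps to 3, so strongly significant genes sit at the top of the plot, with up- and downregulated genes on the right and left.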
Below we present two tables: the first includes all the genes identified in the analysis, while the second includes only the differentially expressed genes (DEGs).
The columns in the table are:
baseMean: The average normalized count of a gene across all samples, reflecting its overall expression level in the dataset.
log2FoldChange: The estimated log2-transformed fold change in expression between two conditions (experimental vs control). Positive values indicate upregulation, negative values indicate downregulation with respect to control.
lfcSE: The standard error associated with the log2 fold change estimate, indicating the variability or uncertainty of the fold change measurement.
stat: The test statistic value calculated for the hypothesis test of whether the log2 fold change differs from zero.
pvalue: The raw p-value corresponding to the statistical test for differential expression; it is the probability of observing a test statistic at least as extreme as the one obtained, assuming no true difference in expression.
padj: The p-value adjusted for multiple testing (using the Benjamini-Hochberg method) to control the false discovery rate (FDR), providing a more reliable significance measure.
comparison_exp_vs_contr: A label or descriptor indicating the comparison made, specifying which condition is experimental and which is control.
gene: The unique Ensembl identifier for each gene as annotated in the reference genome.
symbol: The gene symbol or common gene name, which is easier to interpret biologically than numerical IDs.
FoldChange: The fold change in linear scale (non-logarithmic), derived from log2FoldChange (i.e., 2^(log2FoldChange)), representing how many times expression has changed.
differentially_expressed: A categorical variable indicating whether the gene is considered differentially expressed (“yes” or “no”) based on the predefined thresholds for significance and fold change described in the next section.
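The derived columns can be sketched in base R as follows. The table values are hypothetical, and the two cutoffs are placeholders chosen for illustration only; they are not the thresholds used in this report. DESeq2 also applies independent filtering before adjustment, so its padj values can differ from a plain `p.adjust` call.

```r
# Toy results table (hypothetical values).
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(3, -2, 0.5, 1),
  pvalue         = c(0.001, 0.01, 0.03, 0.5)
)

# Benjamini-Hochberg adjustment, the method named in the padj description.
res$padj <- p.adjust(res$pvalue, method = "BH")

# Linear-scale fold change derived from the log2 value.
res$FoldChange <- 2^res$log2FoldChange

# HYPOTHETICAL thresholds, for illustration only (not the report's cutoffs).
padj_cutoff <- 0.05
lfc_cutoff  <- 1

# Flag genes passing both the significance and the effect-size threshold.
res$differentially_expressed <- ifelse(
  res$padj < padj_cutoff & abs(res$log2FoldChange) >= lfc_cutoff,
  "yes", "no"
)
```

With these placeholder cutoffs, g1 and g2 are flagged "yes"; g3 fails the fold change threshold and g4 fails the significance threshold.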
In this table we can find a subset of the previous table that includes the differentially expressed genes (DEGs).
The genes defined as DEGs need to satisfy these two conditions:
Below is a visualization of all the differentially expressed (DE) genes identified above.
Meaning of Colors
This heatmap displays the expression levels of all genes detected in the RNA-seq dataset across all samples. The values are normalized and transformed (via variance stabilizing transformation) to allow comparison across genes and samples. This comprehensive visualization provides an overview of the global expression patterns, highlighting overall similarities and differences between samples, as well as potential outliers.
Version | Author | Date |
---|---|---|
fea3f82 | Yinxiu Zhan | 2025-07-11 |
Further analysis is done through gene set enrichment analysis (GSEA), which, unlike the previous step, does not exclude genes based on log2 fold change or adjusted p-value. GSEA is performed separately on each subontology: biological processes (BP), cellular components (CC) and molecular functions (MF). The dot plot below shows all the enriched GO terms. The size of each dot correlates with the count of differentially expressed genes associated with each GO term. Furthermore, the color of each dot reflects the significance of the enrichment of the respective GO term, highlighting its relative importance.
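The idea behind GSEA can be sketched with an unweighted running-sum statistic over a ranked gene list. This is a simplification with hypothetical gene names and gene set: implementations such as fgsea typically weight each step by the ranking metric and assess significance by permutation, neither of which is shown here.

```r
# Genes ranked from most upregulated to most downregulated (hypothetical ranking).
ranked_genes <- c("g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8")

# Hypothetical members of one GO term.
gene_set <- c("g1", "g2", "g4")

in_set <- ranked_genes %in% gene_set
n_hit  <- sum(in_set)
n_miss <- length(ranked_genes) - n_hit

# Walk down the ranking: step up at each member, step down at each non-member.
steps       <- ifelse(in_set, 1 / n_hit, -1 / n_miss)
running_sum <- cumsum(steps)

# Enrichment score: the running-sum value that deviates most from zero.
es <- running_sum[which.max(abs(running_sum))]
```

Because the set members cluster at the top of the ranking, the running sum peaks early (at 0.8 in this toy case), which is the pattern GSEA reports as positive enrichment.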
shSCR_Day4 vs shSCR_Day0
P value cutoff: 0.05
shSCR_Day4 vs shSCR_Day0
P value cutoff: 0.05
shSCR_Day4 vs shSCR_Day0
P value cutoff: 0.05
We performed a functional enrichment analysis based on Over-Representation Analysis (ORA) using the GO pathway database. Unlike GSEA, which considers the entire ranked list of genes, ORA focuses only on genes that meet specific differential expression thresholds (adjusted p-value and log2 fold change). The analysis was conducted separately for upregulated and downregulated genes to identify GO pathways that are significantly enriched in each group, compared to what would be expected by chance. This allows for a clearer biological interpretation of distinct transcriptional programs activated or suppressed in the dataset. The dot plots below display all significantly enriched GO pathways. Each dot’s size represents the number of differentially expressed genes associated with the pathway, while the color reflects the statistical significance of the enrichment (adjusted p-value).
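The "compared to what would be expected by chance" step is commonly a one-sided hypergeometric test. A minimal base-R sketch for a single GO term follows, with all counts hypothetical; the report's actual background and gene lists are not restated here.

```r
# Hypothetical counts for one GO term.
universe_size <- 10000  # background genes considered
term_size     <- 100    # background genes annotated to the term
deg_count     <- 200    # genes in the tested list (e.g. the upregulated DEGs)
overlap       <- 10     # tested genes annotated to the term

# Overlap expected by chance alone.
expected_overlap <- deg_count * term_size / universe_size

# P(X >= overlap) under the hypergeometric null distribution.
p_value <- phyper(overlap - 1,
                  term_size,
                  universe_size - term_size,
                  deg_count,
                  lower.tail = FALSE)
```

Here 2 genes are expected by chance and 10 are observed, so the term is over-represented. In a full analysis the p-values of all tested terms would then be adjusted for multiple testing.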
shSCR_Day4 vs shSCR_Day0
P value cutoff: 0.05
Q value cutoff: 0.05
shSCR_Day4 vs shSCR_Day0 : No enrichment found
P value cutoff: 0.05
Q value cutoff: 0.05
R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: Etc/UTC
tzcode source: system (glibc)
attached base packages:
[1] stats4 grid stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] ReactomePA_1.53.0 tibble_3.3.0
[3] limma_3.65.1 org.Hs.eg.db_3.21.0
[5] AnnotationDbi_1.71.0 git2r_0.36.2
[7] gridExtra_2.3 WGCNA_1.73
[9] fastcluster_1.3.0 dynamicTreeCut_1.63-1
[11] dplyr_1.1.4 clusterProfiler_4.17.0
[13] reshape_0.8.10 DT_0.33
[15] gplots_3.2.0 RColorBrewer_1.1-3
[17] rtracklayer_1.69.0 DESeq2_1.49.1
[19] SummarizedExperiment_1.39.0 Biobase_2.69.0
[21] MatrixGenerics_1.21.0 matrixStats_1.5.0
[23] GenomicRanges_1.61.0 GenomeInfoDb_1.45.4
[25] IRanges_2.43.0 S4Vectors_0.47.0
[27] BiocGenerics_0.55.0 generics_0.1.4
[29] ComplexHeatmap_2.25.0 plotly_4.10.4
[31] ggplot2_3.5.2
loaded via a namespace (and not attached):
[1] splines_4.5.0 later_1.4.2 BiocIO_1.19.0
[4] bitops_1.0-9 ggplotify_0.1.2 R.oo_1.27.1
[7] polyclip_1.10-7 preprocessCore_1.71.0 graph_1.87.0
[10] XML_3.99-0.18 rpart_4.1.24 lifecycle_1.0.4
[13] doParallel_1.0.17 rprojroot_2.0.4 MASS_7.3-65
[16] lattice_0.22-7 crosstalk_1.2.1 backports_1.5.0
[19] magrittr_2.0.3 Hmisc_5.2-3 sass_0.4.10
[22] rmarkdown_2.29 jquerylib_0.1.4 yaml_2.3.10
[25] httpuv_1.6.16 ggtangle_0.0.6 cowplot_1.1.3
[28] DBI_1.2.3 abind_1.4-8 purrr_1.0.4
[31] R.utils_2.13.0 ggraph_2.2.1 RCurl_1.98-1.17
[34] yulab.utils_0.2.0 nnet_7.3-20 rappdirs_0.3.3
[37] tweenr_2.0.3 circlize_0.4.16 enrichplot_1.29.1
[40] ggrepel_0.9.6 tidytree_0.4.6 reactome.db_1.92.0
[43] codetools_0.2-20 DelayedArray_0.35.1 ggforce_0.5.0
[46] DOSE_4.3.0 tidyselect_1.2.1 shape_1.4.6.1
[49] aplot_0.2.5 UCSC.utils_1.5.0 farver_2.1.2
[52] viridis_0.6.5 base64enc_0.1-3 GenomicAlignments_1.45.0
[55] jsonlite_2.0.0 GetoptLong_1.0.5 tidygraph_1.3.1
[58] Formula_1.2-5 survival_3.8-3 iterators_1.0.14
[61] foreach_1.5.2 tools_4.5.0 treeio_1.33.0
[64] Rcpp_1.0.14 glue_1.8.0 SparseArray_1.9.0
[67] xfun_0.52 qvalue_2.41.0 withr_3.0.2
[70] fastmap_1.2.0 caTools_1.18.3 digest_0.6.37
[73] R6_2.6.1 gridGraphics_0.5-1 colorspace_2.1-1
[76] GO.db_3.21.0 gtools_3.9.5 RSQLite_2.4.0
[79] R.methodsS3_1.8.2 tidyr_1.3.1 data.table_1.17.6
[82] graphlayouts_1.2.2 httr_1.4.7 htmlwidgets_1.6.4
[85] S4Arrays_1.9.1 graphite_1.55.0 whisker_0.4.1
[88] pkgconfig_2.0.3 gtable_0.3.6 blob_1.2.4
[91] impute_1.83.0 workflowr_1.7.1 XVector_0.49.0
[94] htmltools_0.5.8.1 fgsea_1.35.0 clue_0.3-66
[97] scales_1.4.0 png_0.1-8 ggfun_0.1.8
[100] knitr_1.50 rstudioapi_0.17.1 reshape2_1.4.4
[103] rjson_0.2.23 checkmate_2.3.2 nlme_3.1-168
[106] curl_6.4.0 cachem_1.1.0 GlobalOptions_0.1.2
[109] stringr_1.5.1 KernSmooth_2.23-26 parallel_4.5.0
[112] foreign_0.8-90 restfulr_0.0.15 pillar_1.10.2
[115] vctrs_0.6.5 promises_1.3.3 cluster_2.1.8.1
[118] htmlTable_2.4.3 evaluate_1.0.4 cli_3.6.5
[121] locfit_1.5-9.12 compiler_4.5.0 Rsamtools_2.25.0
[124] rlang_1.1.6 crayon_1.5.3 labeling_0.4.3
[127] plyr_1.8.9 fs_1.6.6 stringi_1.8.7
[130] viridisLite_0.4.2 BiocParallel_1.42.1 Biostrings_2.77.1
[133] lazyeval_0.2.2 GOSemSim_2.35.0 Matrix_1.7-3
[136] patchwork_1.3.0 bit64_4.6.0-1 statmod_1.5.0
[139] KEGGREST_1.49.0 igraph_2.1.4 memoise_2.0.1
[142] bslib_0.9.0 ggtree_3.17.0 fastmatch_1.1-6
[145] bit_4.6.0 ape_5.8-1 gson_0.1.0