Last updated: 2025-07-15

Checks: 7 passed, 0 failed

Knit directory: frascolla_chemoresistance/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(20250522) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
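As a minimal illustration of this check, resetting the same seed reproduces the same random draw. The seed is the one reported above; the sampled values are a made-up example and not part of the analysis.

```r
# Seed value taken from the report; the sampling itself is purely illustrative.
set.seed(20250522)
first_draw <- sample(1:100, 5)

set.seed(20250522)
second_draw <- sample(1:100, 5)

# Both draws match because the generator was reset to the same state.
identical(first_draw, second_draw)
```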

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 57ef6f7. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Untracked files:
    Untracked:  .DS_Store
    Untracked:  data/samples_info_sh1plus2_vs_scr.tsv
    Untracked:  output/deg_shIFI6_Day0_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shIFI6_Day1_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shIFI6_Day2_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shIFI6_Day3_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shIFI6_Day4_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shIFI6_Day5_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shSCR_Day2_vs_shSCR_Day0.tsv
    Untracked:  output/deg_shSCR_Day4_vs_shSCR_Day0.tsv
    Untracked:  src/a
    Untracked:  src/genes2filter_ensemblid.csv
    Untracked:  src/genes2filter_symbol.csv

Unstaged changes:
    Modified:   output/deg_shIFI6_1_Day1_vs_shSCR_Day1.tsv
    Modified:   output/deg_shIFI6_1_Day2_vs_shSCR_Day2.tsv
    Modified:   output/deg_shIFI6_1_Day3_vs_shSCR_Day3.tsv
    Modified:   output/deg_shIFI6_1_Day4_vs_shSCR_Day4.tsv
    Modified:   output/deg_shIFI6_1_Day5_vs_shSCR_Day5.tsv
    Modified:   output/deg_shIFI6_2_Day1_vs_shSCR_Day1.tsv
    Modified:   output/deg_shIFI6_2_Day2_vs_shSCR_Day2.tsv
    Modified:   output/deg_shIFI6_2_Day3_vs_shSCR_Day3.tsv
    Modified:   output/deg_shIFI6_2_Day4_vs_shSCR_Day4.tsv
    Modified:   output/deg_shIFI6_2_Day5_vs_shSCR_Day5.tsv
    Modified:   output/deg_shIFI6_Day1_vs_shIFI6_Day0.tsv
    Modified:   output/deg_shIFI6_Day1_vs_shSCR_Day1.tsv
    Modified:   output/deg_shIFI6_Day2_vs_shIFI6_Day0.tsv
    Modified:   output/deg_shIFI6_Day2_vs_shSCR_Day2.tsv
    Modified:   output/deg_shIFI6_Day3_vs_shIFI6_Day0.tsv
    Modified:   output/deg_shIFI6_Day3_vs_shSCR_Day3.tsv
    Modified:   output/deg_shIFI6_Day4_vs_shIFI6_Day0.tsv
    Modified:   output/deg_shIFI6_Day4_vs_shSCR_Day4.tsv
    Modified:   output/deg_shIFI6_Day5_vs_shIFI6_Day0.tsv
    Modified:   output/deg_shIFI6_Day5_vs_shSCR_Day5.tsv
    Modified:   src/__utils_rna_seq_functions.R

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/04_de_ifi6_day5_day0.Rmd) and HTML (docs/04_de_ifi6_day5_day0.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File  Version  Author                    Date        Message
Rmd   57ef6f7  Mariani_Gianluca_Alessio  2025-07-15  Redone analysis with ribosomal genes filter
html  fea3f82  Yinxiu Zhan               2025-07-11  Build site.
html  3839c7f  Yinxiu Zhan               2025-07-10  Build site.
Rmd   4b13018  Yinxiu Zhan               2025-07-10  Add all reports
Rmd   c0a0612  Yinxiu Zhan               2025-07-09  :sparkles: Release first version

knitr::opts_chunk$set(echo       = FALSE,
                      message    = FALSE,
                      warning    = FALSE,
                      cache      = FALSE,
                      autodep    = TRUE,
                      fig.align  = 'center',
                      fig.width  = 10,
                      fig.height = 8)

shIFI6_Day5 vs shIFI6_Day0

The objective of this report is to investigate differential gene expression between the two conditions and to conduct gene ontology enrichment analysis to explore the biological functions involved.

Parameters

Below is the list of parameters used in this report to define differential gene expression.

  • logfc = 1 , absolute log2 fold change threshold
  • qvalue = 0.05 , adjusted p-value threshold (false discovery rate)
  • Lowly expressed genes are removed to reduce noise. Here a gene is considered lowly expressed if:
    • its total number of reads is lower than half of the total samples, 16 ;
    • it is expressed in fewer samples than the total number of conditions, 12 .
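The two low-expression rules above can be sketched in base R as follows. This is only an illustration on an invented count matrix: the object names, the toy thresholds, and the reading of "expressed" as a count above zero are assumptions, and the report's own filter may be implemented differently.

```r
# Toy count matrix (genes x samples); values are invented for illustration.
counts <- matrix(
  c(0, 0, 1, 0,      # geneA: very few reads overall
    50, 60, 0, 0,    # geneB: expressed in only two samples
    30, 25, 40, 35), # geneC: well expressed everywhere
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)

# Hypothetical thresholds mirroring the two rules (the report uses 16 and 12).
min_total_reads <- ncol(counts) / 2  # half the number of samples
min_samples     <- 3                 # stand-in for the number of conditions

# Assumption: a gene counts as "expressed" in a sample if its count is > 0.
keep <- rowSums(counts) >= min_total_reads &
        rowSums(counts > 0) >= min_samples

filtered <- counts[keep, , drop = FALSE]
rownames(filtered)
```

In this toy case geneA fails the first rule, geneB fails the second, and only geneC is retained.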

Comparison Group and Differential Gene Expression Analysis Plan

Below we show the comparison group considered for the analysis presented in this report. Each group contains all the samples associated with the specific condition under analysis.

The group is divided into experimental samples and control samples.

Each differential gene expression comparison will be conducted between these two groups.

Group considered:

    - Experimental samples (shIFI6_Day5):
        shIFI6_1_Day5_rep1
        shIFI6_1_Day5_rep2
        shIFI6_2_Day5_rep1
        shIFI6_2_Day5_rep2

    - Control Samples (shIFI6_Day0):
        shIFI6_1_Day0_rep1
        shIFI6_1_Day0_rep2
        shIFI6_2_Day0_rep1
        shIFI6_2_Day0_rep2
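A small sketch of how the sample names listed above could be mapped to the two conditions. The parsing rules and the `sample_info` object are illustrative assumptions; in particular, the `_1`/`_2` index is assumed to label two shRNA constructs that are pooled within each condition.

```r
# Sample names as listed in the report.
samples <- c("shIFI6_1_Day5_rep1", "shIFI6_1_Day5_rep2",
             "shIFI6_2_Day5_rep1", "shIFI6_2_Day5_rep2",
             "shIFI6_1_Day0_rep1", "shIFI6_1_Day0_rep2",
             "shIFI6_2_Day0_rep1", "shIFI6_2_Day0_rep2")

# Strip the replicate suffix, then the construct index, to get the condition.
condition <- sub("_rep[0-9]+$", "", samples)
condition <- sub("^(shIFI6)_[0-9]+_", "\\1_", condition)

# Put the control first so that it acts as the reference level.
sample_info <- data.frame(
  sample    = samples,
  condition = factor(condition, levels = c("shIFI6_Day0", "shIFI6_Day5"))
)
table(sample_info$condition)
```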


The RNA-seq data for this analysis:

  • are used to investigate the difference in gene expression between shIFI6_Day5 and shIFI6_Day0

The sample population includes:

  • 23 samples, 2 conditions, shIFI6_Day5 and shIFI6_Day0 in 2 replicates each.

PCA

Below we present the PCA analysis conducted on the two specific conditions analyzed in this report.
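For readers unfamiliar with the computation, here is a minimal PCA sketch on simulated data using base R's prcomp. The matrix, group sizes, and effect size are invented; the report's PCA is presumably computed on transformed counts from the real samples.

```r
set.seed(1)  # toy data only
n_genes <- 200

# Simulated log-scale expression: 4 control and 4 experimental samples.
expr <- matrix(rnorm(n_genes * 8), nrow = n_genes,
               dimnames = list(paste0("g", seq_len(n_genes)),
                               c(paste0("ctrl", 1:4), paste0("exp", 1:4))))

# Shift the first 50 genes upward in the experimental group.
expr[1:50, 5:8] <- expr[1:50, 5:8] + 3

# prcomp expects samples in rows, hence the transpose.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Fraction of total variance captured by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Sample coordinates on the first principal component.
pc1 <- pca$x[, "PC1"]
round(var_explained[1:2], 2)
```

With a group effect of this size, the PC1 coordinates of the two simulated groups do not overlap, which is the pattern one looks for in the plot.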



Interpretation PCA Analysis

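Although the samples do not form perfectly distinct clusters, the first principal component clearly separates the experimental and control groups. The largest source of variation in the data therefore corresponds to the difference between the two conditions, which supports the validity of the samples.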

The same analysis will be repeated after removing the outlier samples S61882_S_plus_NuMA_A and S61886_S-NuMA_B to assess whether their removal improves the quality of the differential gene expression results.

MA plot and volcano plot

MA plot

The MA plot is a widely used visualization in differential expression analysis that displays the relationship between the average expression (A) and the log fold change (M) for each gene. The x-axis represents the mean expression level across samples, while the y-axis shows the log fold change between groups.
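The two coordinates can be illustrated with the classical definition of M and A on invented group means. This is a sketch under assumptions: the values are made up, and the report's plot may instead place the mean normalized count on the x-axis, as DESeq2-style MA plots do.

```r
# Toy mean normalized counts per group (invented values).
ctrl <- c(geneA = 100, geneB = 10, geneC = 400)
expt <- c(geneA = 400, geneB = 10, geneC = 100)

M <- log2(expt) - log2(ctrl)        # log2 fold change (y-axis)
A <- (log2(expt) + log2(ctrl)) / 2  # average log2 expression (x-axis)

ma <- data.frame(gene = names(ctrl), A = A, M = M)
ma
```

Here geneA has M = 2 (a four-fold increase), geneB has M = 0 (no change), and geneC has M = -2.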

Total number of significant genes: 2314



Volcano plot

The Volcano plot is a graphical method to visualize differential expression results by combining statistical significance and magnitude of change for each gene. It plots the log2 fold change on the x-axis against the negative log10 of the p-value (or adjusted p-value) on the y-axis.
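A minimal sketch of how the volcano coordinates are derived from a results table. The table below is invented, and the use of padj (rather than the raw p-value) on the y-axis is an assumption, since the text allows either.

```r
# Toy results table; gene names and values are invented for illustration.
res <- data.frame(
  gene           = c("geneA", "geneB", "geneC", "geneD"),
  log2FoldChange = c(2.5, -1.8, 0.3, 1.4),
  padj           = c(1e-6, 0.001, 0.5, 0.2)
)

# Volcano coordinates: effect size on x, significance on y.
volcano <- data.frame(
  gene = res$gene,
  x    = res$log2FoldChange,
  y    = -log10(res$padj)
)

# Reference lines matching the report's thresholds (logfc = 1, qvalue = 0.05).
x_cutoffs <- c(-1, 1)
y_cutoff  <- -log10(0.05)

# Points beyond both cutoffs land in the upper-left or upper-right corner.
volcano$highlight <- abs(volcano$x) > max(x_cutoffs) & volcano$y > y_cutoff
volcano
```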



Tables of genes

Below we present two tables: the first includes all the genes identified in the analysis, while the second includes only the differentially expressed genes (DEGs).

Table of all genes

The columns in the table are:

  • baseMean: The average normalized count of a gene across all samples, reflecting its overall expression level in the dataset.

  • log2FoldChange: The estimated log2-transformed fold change in expression between two conditions (experimental vs control). Positive values indicate upregulation, negative values indicate downregulation with respect to control.

  • lfcSE: The standard error associated with the log2 fold change estimate, indicating the variability or uncertainty of the fold change measurement.

  • stat: The test statistic value calculated for the hypothesis test of whether the log2 fold change differs from zero.

  • pvalue: The raw p-value corresponding to the statistical test for differential expression; it reflects the probability of observing the data assuming no true difference in expression.

  • padj: The p-value adjusted for multiple testing (using the Benjamini-Hochberg method) to control the false discovery rate (FDR), providing a more reliable significance measure.

  • comparison_exp_vs_contr: A label or descriptor indicating the comparison made, specifying which condition is experimental and which is control.

  • gene: The unique Ensembl identifier for each gene as annotated in the reference genome.

  • symbol: The gene symbol or common gene name, which is easier to interpret biologically than numerical IDs.

  • FoldChange: The fold change in linear scale (non-logarithmic), derived from log2FoldChange (i.e., 2^(log2FoldChange)), representing how many times expression has changed.

  • differentially_expressed: A categorical variable indicating whether the gene is considered differentially expressed (“yes” or “no”) based on the predefined thresholds for significance and fold change described in the next section.
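Two of the derived columns above, padj and FoldChange, can be reproduced on toy numbers with base R. The input values are invented; note also that a DESeq2 results table may apply additional filtering before adjustment, so this sketch shows only the Benjamini-Hochberg step itself.

```r
# Invented raw p-values and log2 fold changes for five genes.
pvalue         <- c(0.001, 0.01, 0.02, 0.04, 0.5)
log2FoldChange <- c(3, -1, 0, 1.5, -2)

# Benjamini-Hochberg adjustment controls the false discovery rate.
padj <- p.adjust(pvalue, method = "BH")

# Linear-scale fold change: values below 1 indicate downregulation.
FoldChange <- 2^log2FoldChange

data.frame(pvalue, padj, log2FoldChange, FoldChange)
```

For example, a log2FoldChange of 3 corresponds to an 8-fold increase, and -2 to a quarter of the control expression.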



Table of differentially expressed genes

In this table we can find a subset of the previous table that includes the differentially expressed genes (DEGs).

The genes defined as DEGs need to satisfy these two conditions:

  • The associated padj (p-value adjusted for multiple testing) must be below the qvalue threshold of 0.05 ;
  • The absolute value of the associated log2FoldChange must exceed the logfc threshold of 1 .
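The two conditions translate directly into a filter. The sketch below applies them to an invented table; treating genes with a missing padj as not differentially expressed is an assumption, not something the report states.

```r
# Toy results table (values invented); GENE4 has no adjusted p-value.
res <- data.frame(
  symbol         = c("GENE1", "GENE2", "GENE3", "GENE4"),
  log2FoldChange = c(2.1, -1.5, 0.4, 3.0),
  padj           = c(0.001, 0.03, 0.01, NA)
)

qvalue <- 0.05  # adjusted p-value threshold from the report
logfc  <- 1     # absolute log2 fold change threshold from the report

# Assumption: genes with NA padj are treated as not differentially expressed.
is_deg <- !is.na(res$padj) &
          res$padj < qvalue &
          abs(res$log2FoldChange) > logfc

res$differentially_expressed <- ifelse(is_deg, "yes", "no")
degs <- res[is_deg, ]
degs$symbol
```

GENE3 is significant but changes too little, and GENE4 changes strongly but has no padj, so only GENE1 and GENE2 are kept.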

Heatmaps

The heatmaps below visualize the expression of the differentially expressed genes identified above.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).
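The standardized scale behind these colors is typically a per-gene z-score. The sketch below shows that computation on an invented matrix; whether the report standardizes in exactly this way is an assumption.

```r
# Toy matrix of transformed expression values (genes x samples), invented.
expr <- matrix(c(5, 7, 9,
                 10, 10, 13),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("geneA", "geneB"), c("s1", "s2", "s3")))

# scale() works column-wise, so transpose to standardize each gene (row).
z <- t(scale(t(expr)))

# Positive values would be drawn in red, negative in blue, ~0 in white.
round(z, 2)
```

After this step every gene has mean 0 and standard deviation 1 across samples, so colors are comparable between highly and lowly expressed genes.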

Heatmap for all genes

This heatmap displays the expression levels of all genes detected in the RNA-seq dataset across all samples. The values are normalized and transformed (via variance stabilizing transformation) to allow comparison across genes and samples. This comprehensive visualization provides an overview of the global expression patterns, highlighting overall similarities and differences between samples, as well as potential outliers.



Version Author Date
3839c7f Yinxiu Zhan 2025-07-10

Gene set enrichment analysis

Further analysis is done through gene set enrichment analysis (GSEA). Unlike the DEG table above, GSEA does not exclude genes based on logfc or adjusted p-value; it uses the full list of genes. GSEA is performed separately on each subontology: biological processes (BP), cellular components (CC) and molecular functions (MF). The dot plot below shows all the enriched GO terms. The size of each dot correlates with the count of differentially expressed genes associated with each GO term, and the color of each dot reflects the significance of the enrichment of the respective GO term.
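GSEA takes a ranked list of all genes as input. A minimal sketch of building such a list is given below; the gene IDs and values are invented, and ranking by log2FoldChange is an assumption because the report does not state which statistic it ranks by.

```r
# Toy results for all tested genes (IDs and values invented).
res <- data.frame(
  gene           = c("ENSG_A", "ENSG_B", "ENSG_C", "ENSG_D"),
  log2FoldChange = c(-0.7, 2.3, NA, 0.1)
)

# Genes without an estimate cannot be ranked, so drop them.
res <- res[!is.na(res$log2FoldChange), ]

# Named numeric vector, sorted from most up- to most down-regulated.
gene_list <- setNames(res$log2FoldChange, res$gene)
gene_list <- sort(gene_list, decreasing = TRUE)
names(gene_list)
```

A decreasingly sorted named vector of this shape is the kind of input that functions such as clusterProfiler's gseGO accept as their geneList argument.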

GO - Biological Processes

  • shIFI6_Day5 vs shIFI6_Day0

  • P value cutoff: 0.05


GO - Cellular Components

  • shIFI6_Day5 vs shIFI6_Day0

  • P value cutoff: 0.05


GO - Molecular Functions

  • shIFI6_Day5 vs shIFI6_Day0

  • P value cutoff: 0.05


Over representation analysis

We performed a functional enrichment analysis based on Over-Representation Analysis (ORA) using the GO pathway database. Unlike GSEA, which considers the entire ranked list of genes, ORA focuses only on genes that meet specific differential expression thresholds (adjusted p-value and log2 fold change). The analysis was conducted separately for upregulated and downregulated genes to identify GO pathways that are significantly enriched in each group, compared to what would be expected by chance. This allows for a clearer biological interpretation of distinct transcriptional programs activated or suppressed in the dataset. The dot plots below display all significantly enriched GO pathways. Each dot’s size represents the number of differentially expressed genes associated with the pathway, while the color reflects the statistical significance of the enrichment (adjusted p-value).
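The "expected by chance" comparison in ORA is commonly evaluated with a hypergeometric test. The sketch below computes it for a single hypothetical GO term; all counts are invented and the variable names are illustrative only.

```r
# Invented counts for one hypothetical GO term.
universe_size <- 10000  # background genes tested
term_size     <- 200    # background genes annotated to the term
n_deg         <- 500    # genes in the up- (or down-) regulated list
overlap       <- 25     # listed genes that are annotated to the term

# Overlap expected if the gene list were a random draw from the universe.
expected <- n_deg * term_size / universe_size

# P(X >= overlap): upper tail of the hypergeometric distribution.
p_value <- phyper(overlap - 1,
                  m = term_size,
                  n = universe_size - term_size,
                  k = n_deg,
                  lower.tail = FALSE)

c(expected = expected, observed = overlap, p_value = p_value)
```

Here 10 genes would be expected by chance while 25 are observed, so the term would be reported as over-represented, subject to multiple-testing correction across all terms.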

ORA - UP

  • shIFI6_Day5 vs shIFI6_Day0

  • P value cutoff: 0.05

  • Q value cutoff: 0.05


ORA - DOWN

  • shIFI6_Day5 vs shIFI6_Day0

  • P value cutoff: 0.05

  • Q value cutoff: 0.05


R version 4.5.0 (2025-04-11)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] stats4    grid      stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] ReactomePA_1.53.0           tibble_3.3.0               
 [3] limma_3.65.1                org.Hs.eg.db_3.21.0        
 [5] AnnotationDbi_1.71.0        git2r_0.36.2               
 [7] gridExtra_2.3               WGCNA_1.73                 
 [9] fastcluster_1.3.0           dynamicTreeCut_1.63-1      
[11] dplyr_1.1.4                 clusterProfiler_4.17.0     
[13] reshape_0.8.10              DT_0.33                    
[15] gplots_3.2.0                RColorBrewer_1.1-3         
[17] rtracklayer_1.69.0          DESeq2_1.49.1              
[19] SummarizedExperiment_1.39.0 Biobase_2.69.0             
[21] MatrixGenerics_1.21.0       matrixStats_1.5.0          
[23] GenomicRanges_1.61.0        GenomeInfoDb_1.45.4        
[25] IRanges_2.43.0              S4Vectors_0.47.0           
[27] BiocGenerics_0.55.0         generics_0.1.4             
[29] ComplexHeatmap_2.25.0       plotly_4.10.4              
[31] ggplot2_3.5.2              

loaded via a namespace (and not attached):
  [1] splines_4.5.0            later_1.4.2              BiocIO_1.19.0           
  [4] bitops_1.0-9             ggplotify_0.1.2          R.oo_1.27.1             
  [7] polyclip_1.10-7          preprocessCore_1.71.0    graph_1.87.0            
 [10] XML_3.99-0.18            rpart_4.1.24             lifecycle_1.0.4         
 [13] doParallel_1.0.17        rprojroot_2.0.4          MASS_7.3-65             
 [16] lattice_0.22-7           crosstalk_1.2.1          backports_1.5.0         
 [19] magrittr_2.0.3           Hmisc_5.2-3              sass_0.4.10             
 [22] rmarkdown_2.29           jquerylib_0.1.4          yaml_2.3.10             
 [25] httpuv_1.6.16            ggtangle_0.0.6           cowplot_1.1.3           
 [28] DBI_1.2.3                abind_1.4-8              purrr_1.0.4             
 [31] R.utils_2.13.0           ggraph_2.2.1             RCurl_1.98-1.17         
 [34] yulab.utils_0.2.0        nnet_7.3-20              rappdirs_0.3.3          
 [37] tweenr_2.0.3             circlize_0.4.16          enrichplot_1.29.1       
 [40] ggrepel_0.9.6            tidytree_0.4.6           reactome.db_1.92.0      
 [43] codetools_0.2-20         DelayedArray_0.35.1      ggforce_0.5.0           
 [46] DOSE_4.3.0               tidyselect_1.2.1         shape_1.4.6.1           
 [49] aplot_0.2.5              UCSC.utils_1.5.0         farver_2.1.2            
 [52] viridis_0.6.5            base64enc_0.1-3          GenomicAlignments_1.45.0
 [55] jsonlite_2.0.0           GetoptLong_1.0.5         tidygraph_1.3.1         
 [58] Formula_1.2-5            survival_3.8-3           iterators_1.0.14        
 [61] foreach_1.5.2            tools_4.5.0              treeio_1.33.0           
 [64] Rcpp_1.0.14              glue_1.8.0               SparseArray_1.9.0       
 [67] xfun_0.52                qvalue_2.41.0            withr_3.0.2             
 [70] fastmap_1.2.0            caTools_1.18.3           digest_0.6.37           
 [73] R6_2.6.1                 gridGraphics_0.5-1       colorspace_2.1-1        
 [76] GO.db_3.21.0             gtools_3.9.5             RSQLite_2.4.0           
 [79] R.methodsS3_1.8.2        tidyr_1.3.1              data.table_1.17.6       
 [82] graphlayouts_1.2.2       httr_1.4.7               htmlwidgets_1.6.4       
 [85] S4Arrays_1.9.1           graphite_1.55.0          whisker_0.4.1           
 [88] pkgconfig_2.0.3          gtable_0.3.6             blob_1.2.4              
 [91] impute_1.83.0            workflowr_1.7.1          XVector_0.49.0          
 [94] htmltools_0.5.8.1        fgsea_1.35.0             clue_0.3-66             
 [97] scales_1.4.0             png_0.1-8                ggfun_0.1.8             
[100] knitr_1.50               rstudioapi_0.17.1        reshape2_1.4.4          
[103] rjson_0.2.23             checkmate_2.3.2          nlme_3.1-168            
[106] curl_6.4.0               cachem_1.1.0             GlobalOptions_0.1.2     
[109] stringr_1.5.1            KernSmooth_2.23-26       parallel_4.5.0          
[112] foreign_0.8-90           restfulr_0.0.15          pillar_1.10.2           
[115] vctrs_0.6.5              promises_1.3.3           cluster_2.1.8.1         
[118] htmlTable_2.4.3          evaluate_1.0.4           cli_3.6.5               
[121] locfit_1.5-9.12          compiler_4.5.0           Rsamtools_2.25.0        
[124] rlang_1.1.6              crayon_1.5.3             labeling_0.4.3          
[127] plyr_1.8.9               fs_1.6.6                 stringi_1.8.7           
[130] viridisLite_0.4.2        BiocParallel_1.42.1      Biostrings_2.77.1       
[133] lazyeval_0.2.2           GOSemSim_2.35.0          Matrix_1.7-3            
[136] patchwork_1.3.0          bit64_4.6.0-1            statmod_1.5.0           
[139] KEGGREST_1.49.0          igraph_2.1.4             memoise_2.0.1           
[142] bslib_0.9.0              ggtree_3.17.0            fastmatch_1.1-6         
[145] bit_4.6.0                ape_5.8-1                gson_0.1.0