Data Library

Are you looking to access preclinical data to advance understanding of human disease and safety?


We offer access to preclinical data sets on our early development compounds for data mining and research purposes. The aim is to improve understanding of how preclinical findings translate to human efficacy and safety.

Available data sets:

  • Preclinical safety data: in vivo data from standard models, offering insight into compounds and the relationships within the data, to better understand preclinical safety profiles and their translation to human safety.
  • Oncology combinations data: over 11,000 data points from over 100 oncology drugs tested in combination, for assessing and predicting drug combination synergies.

Interested investigators are invited to:

  • Learn more about the available data sets through the information on this site
  • Submit a brief proposal on how you intend to use the data set
  • Access our data once your application has been approved

Functional genomics cellular imaging data

The Functional Genomics group at AstraZeneca carries out druggable-genome and genome-wide arrayed CRISPR screening in cellular models of disease to better understand the link between disease phenotype and target gene. The functional genomic screens are performed in an arrayed format, in which the cells in each well of a microtitre plate are treated with a different gRNA for a gene. Treated cells are then imaged using high-content microscopy to assess the effect of the gene knockout on cellular phenotype and disease-specific biomarkers.
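To make the arrayed format concrete, the following is a minimal sketch of how a plate map linking wells to gRNA treatments could be represented. The 384-well plate size, the row-major fill order and the gRNA labels are all assumptions for illustration; the actual plate layouts and annotation format of the shared data sets are not described here.

```python
from itertools import product
from string import ascii_uppercase


def well_ids(n_rows: int = 16, n_cols: int = 24) -> list[str]:
    """Return well IDs in row-major order (A01 ... P24 for an assumed 384-well plate)."""
    rows = ascii_uppercase[:n_rows]
    return [f"{row}{col:02d}" for row, col in product(rows, range(1, n_cols + 1))]


def assign_guides(guides: list[str], n_rows: int = 16, n_cols: int = 24) -> dict[str, str]:
    """Assign one gRNA treatment per well, filling wells in row-major order."""
    wells = well_ids(n_rows, n_cols)
    if len(guides) > len(wells):
        raise ValueError("more gRNA treatments than wells on one plate")
    return dict(zip(wells, guides))


# Hypothetical gRNA labels, used only to show the shape of the mapping.
layout = assign_guides(["KRAS_g1", "TP53_g1", "EGFR_g1"])
print(layout)  # {'A01': 'KRAS_g1', 'A02': 'TP53_g1', 'A03': 'EGFR_g1'}
```

A mapping of this kind is what allows each image to be traced back to the genetic perturbation applied in the well it was taken from.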

Multiple image datasets are available from various arrayed CRISPR cellular imaging projects, comprising images and annotations of the genetic perturbation and treatment conditions.

1.   Cell Lethality

  • Whole genome arrayed CRISPR screen in an H358 cell line
  • Fluorescent images of nuclear marker, cellular Cas9-GFP expression and MAPK pathway biomarkers

2.   Lipid Nanoparticle Uptake

  • Druggable genome arrayed CRISPR screen in an H358 cell line
  • Fluorescent images of nuclear marker, cellular Cas9-GFP expression and expression of mCherry mRNA

3.   Androgen Receptor Modulation

  • Arrayed CRISPR screens profiled in various prostate cell lines (LNCaP, LNCaP95, AD1, D567) and hormone treatment conditions
  • Fluorescent and brightfield images of nuclear marker, Cas9-BFP expression and biology-specific biomarkers (androgen receptor and FKBP5)

4.   Estrogen Receptor Modulation

  • Arrayed CRISPR screens profiled in a breast cancer cell line with and without fulvestrant treatment
  • Cell line used was an inducible MCF7 Cas9
  • Cellular images with fluorescent markers for nuclei, cellular Cas9-BFP expression and Estrogen receptor expression

Let’s collaborate in image and data analytics

Are you a data, analytical or computer science research group that profiles data sets with algorithms to interrogate patterns in data and link them to biological outcomes? Our aim is to gain wider insight into these data, to leverage them by combining them with other data sets, and to develop novel methodologies for extracting information from the cellular images. We invite you to submit a proposal outlining how your analysis can unlock new insight into these data.


Preclinical safety data


Oncology combinations data

To accelerate the understanding of oncology drug combination synergies, AstraZeneca is sharing over 11,000 preclinical pharmacology data points. These data enable you to explore the fundamental traits that underlie effective combination treatments and synergistic drug behavior.

Available data:

  • Phenotypic (cell viability) data from over 11,000 experiments testing over 100 drugs paired at various dose combinations in up to 85 cancer cell lines, primarily colon, lung, and breast cancer, together with comprehensive monotherapy drug response data for each drug and cell line.
  • “Synergy score” comparing each drug combination with the respective monotherapy effects in each cell line.
  • Target and chemical properties of the drugs, including gene names of the protein targets, molecular weight, H-bond acceptors, H-bond donors, cLogP and Lipinski's rule of 5.
  • Ability to link to deep molecular profiles for respective cell panels in public resources such as GDSC/COSMIC and CCLE.
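As an illustration of how a combination can be compared with its monotherapy effects, the sketch below computes an excess over Bliss independence, one widely used reference model. This is an assumption chosen for illustration: the text above does not state which model underlies the "synergy score" in the data set, and the function names and example values are hypothetical.

```python
def bliss_expected(effect_a: float, effect_b: float) -> float:
    """Expected fractional inhibition if the two drugs act independently (Bliss model)."""
    for effect in (effect_a, effect_b):
        if not 0.0 <= effect <= 1.0:
            raise ValueError("effects must be fractional inhibitions in [0, 1]")
    return effect_a + effect_b - effect_a * effect_b


def bliss_excess(effect_a: float, effect_b: float, effect_combo: float) -> float:
    """Observed combination effect minus the Bliss expectation.

    Positive values suggest synergy, negative values suggest antagonism.
    """
    return effect_combo - bliss_expected(effect_a, effect_b)


# Hypothetical example: each drug alone inhibits viability by 50%,
# and the combination inhibits it by 87.5%.
print(bliss_expected(0.5, 0.5))       # 0.75
print(bliss_excess(0.5, 0.5, 0.875))  # 0.125
```

Other reference models, such as Loewe additivity or highest single agent, can give different answers for the same measurements, so any analysis should check how the supplied score was derived.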

Let's partner to uncover new oncology combinations

Are you a scientist striving to identify novel oncology drug combinations? Do you hope to understand the fundamental traits that underlie effective drug combinations? Do you aim to identify patients most likely to benefit from drug combinations? If so, we invite you to submit a proposal outlining how you think analysis of our data can help uncover new paths. We encourage you to include your own background knowledge and data in the analysis.


Transcriptomic profiling data

We have completed a project in which a set of 32 compounds was assessed against two cell lines for their RNA signature profiles. This is only the start of the project, as its power lies in the analytical interpretation of the data. We have therefore decided to share these data so that other groups can access them and carry out further modelling.

The available data comprise:

  • 32 compounds that have been profiled against two cell lines at two concentrations (high dose and low dose)
  • The cell lines used were A549 and MCF7
  • Compounds were assessed as monotherapies
  • RNAseq data; raw data files will be sent with corresponding identifiers

The data can be released as blinded or unblinded regarding the mechanism of action of the compounds, depending on the analysis request.
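The design described above can be sketched as a simple grid of conditions. The compound names are hypothetical placeholders, and the sketch assumes every compound was run in every cell line at both doses; replicates and controls are not described here and are therefore left out.

```python
from itertools import product

# Hypothetical placeholder names; the real compound identifiers come with the data.
compounds = [f"compound_{i:02d}" for i in range(1, 33)]
cell_lines = ["A549", "MCF7"]
doses = ["low", "high"]


def build_design(compounds: list[str], cell_lines: list[str], doses: list[str]) -> list[dict[str, str]]:
    """Enumerate every compound x cell line x dose condition as one record."""
    return [
        {"compound": compound, "cell_line": cell_line, "dose": dose}
        for compound, cell_line, dose in product(compounds, cell_lines, doses)
    ]


design = build_design(compounds, cell_lines, doses)
print(len(design))  # 128 conditions: 32 compounds x 2 cell lines x 2 doses
print(design[0])    # {'compound': 'compound_01', 'cell_line': 'A549', 'dose': 'low'}
```

A table of this shape is a convenient starting point for joining the raw RNAseq files to their conditions via the identifiers supplied with the data.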

Let’s collaborate in Data Analytics

Are you a data, analytical or computer science research group that profiles data sets with algorithms to interrogate patterns in data and link them to biological outcomes? We encourage combining these data with other data sets to broaden the biological insight. Our aim is to gain wider insight into these data, the mechanisms involved and the potential benefit to patients. We invite you to submit a proposal outlining how your analysis can unlock new insight into these data.