The Regulatory Build provides a single 'best guess' set of regulatory features. These features are based on the information contained within the Ensembl funcgen database. Output and supporting data from the Regulatory Build are available in 'Region In Detail' and the various regulation displays. Configuration is available under the 'Regulation' menu item.
The Regulatory Build is performed by overlap analysis of annotations from data sets in a two-stage, cell-type-aware manner.
In stage one, core regions are identified across all available cell types using 'focus' features, which are chosen to define a set of potential binding sites. These tend to be broad-coverage, narrowly focused marks that are likely candidates for different types of regulatory elements or motifs. Focus feature types include DNase1, which is known to mark accessible chromatin; transcription factor binding sites (TFBSs); and CTCF, which characterises 'insulator/enhancer' elements. As such, the core regions of regulatory features are likely to be positioned on or around any potential regulatory motif. Core regions are extended only in the case of direct overlap with another focus feature. To maintain resolution and to avoid chaining of regulatory features across regions of dense regulatory elements, a 2 kb cut-off is imposed. A focus feature that exceeds this cut-off is treated as an attribute feature (see below) and so does not extend the core region.
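The stage one merging can be sketched as follows. This is an illustration only, not the Ensembl pipeline code: the function name `build_core_regions`, the inclusive coordinate convention, and the reading of the cut-off (a focus feature is demoted when it, or the core region it would produce, is longer than 2 kb) are all assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive coordinates (assumed)

MAX_CORE_LENGTH = 2000  # the 2 kb cut-off described in the text


def build_core_regions(focus: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Merge directly overlapping focus features into core regions.

    Returns (core_regions, demoted). `demoted` holds focus features that
    would exceed the cut-off; in the text these are treated as attribute
    features instead of extending the core.
    """
    cores: List[Interval] = []
    demoted: List[Interval] = []
    for start, end in sorted(focus):
        # A focus feature longer than the cut-off never seeds a core region.
        if end - start + 1 > MAX_CORE_LENGTH:
            demoted.append((start, end))
            continue
        if cores and start <= cores[-1][1]:
            # Direct overlap with the current core region: try to extend it.
            merged = (cores[-1][0], max(cores[-1][1], end))
            if merged[1] - merged[0] + 1 > MAX_CORE_LENGTH:
                demoted.append((start, end))
            else:
                cores[-1] = merged
        else:
            cores.append((start, end))
    return cores, demoted


focus_features = [(100, 300), (250, 600), (500, 2400), (5000, 5200), (7000, 10000)]
core_regions, demoted_features = build_core_regions(focus_features)
```

In this toy input the first two features merge into one core region, the third would stretch that region past 2 kb and is demoted, and the last is too long to act as a focus feature at all.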
Stage two extends the structure in a cell-type-specific manner, using 'attribute' features. Attribute features do not define a binding site and are sometimes longer-ranging feature types which are useful for classification, such as histone modifications. If core data exists for a given cell type, a Regulatory Feature is seeded using the core region defined in stage one. The arms or bounds are defined by overlap of attribute features with respect to the core region. Directly overlapping attribute features are said to have one degree of separation. Attributes with two degrees of separation are only included if they are entirely contained within another longer associated attribute feature. This is done to capture information adjacent to and indirectly associated with the core region, whilst avoiding longer range and potentially anomalous associations.
For some cell lines where there is no core data available, but there is substantial other attribute data present, a projection build method is employed. This involves projecting the core region defined by the other cell lines to the 'sparse' cell line. The attribute extension detailed above is then carried out using the projected core region.
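A minimal sketch of the stage two extension, under one reading of the rules above. The names `overlaps`, `contains` and `extend_core` are hypothetical, coordinates are assumed inclusive, and an attribute with two degrees of separation is kept only when it lies entirely inside an attribute that directly overlaps the core.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive coordinates (assumed)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base pair."""
    return a[0] <= b[1] and b[0] <= a[1]


def contains(outer: Interval, inner: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def extend_core(core: Interval, attributes: List[Interval]) -> Tuple[Interval, List[Interval]]:
    """Return the feature bounds and the attribute features that support them."""
    # One degree of separation: direct overlap with the core region.
    first = [a for a in attributes if overlaps(a, core)]
    # Two degrees: kept only when fully contained in a first-degree attribute.
    second = [a for a in attributes
              if a not in first and any(contains(f, a) for f in first)]
    included = first + second
    start = min([core[0]] + [a[0] for a in included])
    end = max([core[1]] + [a[1] for a in included])
    return (start, end), included


core = (1000, 1200)
attrs = [(900, 1100), (1150, 3000), (2000, 2500), (2900, 3500), (4000, 4100)]
bounds, supporting = extend_core(core, attrs)
```

Here `(2000, 2500)` is retained because it sits inside `(1150, 3000)`, while `(2900, 3500)` only partially overlaps that attribute and is left out, so it cannot drag the bounds further along the chromosome.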
These two stages give rise to regulatory FeatureSets for the core 'MultiCell' features and for each available cell type.
Regulatory Features (regfeats) are classified by considering their position on the genome in relation to other classes of feature on the genome (eg genes, repeats, intergenic regions) together with the combination of regulatory attributes they possess as coded in their binary_string. In the binary string each position corresponds to a particular focus or attribute feature and a value of 1 indicates that the regulatory feature overlaps this particular type of focus or attribute feature. A set of randomly distributed features (mockfeats) corresponding to the regfeats in terms of length and chromosome are also generated so that we can judge if the placement of regfeats in relation to the genomic features is non-random.
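As a small illustration of the binary string encoding, the sketch below builds one from an ordered list of feature types. The function name and the particular ordering of feature types are assumptions made for the example; the real ordering is defined by the build.

```python
from typing import Sequence, Set


def binary_string(feature_types: Sequence[str], overlapped: Set[str]) -> str:
    """One character per feature type: '1' if the regfeat overlaps it, else '0'."""
    return "".join("1" if t in overlapped else "0" for t in feature_types)


# Hypothetical ordering of focus and attribute feature types.
types = ["CTCF", "DNase1", "H3K4me3", "H3K27me3"]
bits = binary_string(types, {"DNase1", "H3K4me3"})
```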
The first step in the procedure is to determine which genomic features (genfeats) each regfeat overlaps. A single common basepair is sufficient to consider two features overlapping. We do the same with the mockfeats. (Strictly speaking this is not the first step, as we know from experience that certain regulatory features are most probably artefacts and that others contain no useable information so these are filtered out before the procedure begins and the mockfeats correspond to only the filtered set of regfeats).
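One way to generate mockfeats matched for length and chromosome is sketched below. The uniform random placement, the `chrom_lengths` lookup and the function name are assumptions; the text does not say how the random positions are drawn or whether any regions are excluded.

```python
import random
from typing import Dict, List, Tuple

Feature = Tuple[str, int, int]  # (chromosome, start, end), inclusive coordinates (assumed)


def make_mockfeats(regfeats: List[Feature],
                   chrom_lengths: Dict[str, int],
                   seed: int = 0) -> List[Feature]:
    """Place one random feature per regfeat, same chromosome and same length."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    mock: List[Feature] = []
    for chrom, start, end in regfeats:
        length = end - start + 1
        # Choose a start so the mock feature fits inside the chromosome.
        new_start = rng.randint(1, chrom_lengths[chrom] - length + 1)
        mock.append((chrom, new_start, new_start + length - 1))
    return mock


regfeats = [("1", 100, 299), ("2", 50, 59)]
chrom_lengths = {"1": 1000, "2": 500}
mockfeats = make_mockfeats(regfeats, chrom_lengths)
```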
Next we create a set of patterns of attributes we wish to evaluate. Currently this is all the patterns which occur in the display labels more than once, plus all the patterns which can be created by re-setting one bit of the existing patterns from 1 to 0.
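The pattern set described above can be sketched as follows, assuming the patterns are available as plain strings of 0s and 1s. The function name `candidate_patterns` is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def candidate_patterns(binary_strings: Iterable[str]) -> Set[str]:
    """Patterns seen more than once, plus each of those with one set bit cleared."""
    counts = Counter(binary_strings)
    observed = {p for p, n in counts.items() if n > 1}
    derived = set()
    for p in observed:
        for i, bit in enumerate(p):
            if bit == "1":
                # Reset this single bit from 1 to 0.
                derived.add(p[:i] + "0" + p[i + 1:])
    return observed | derived


patterns = candidate_patterns(["1101", "1101", "0110", "1000", "1000"])
```

Note that `0110` occurs only once and so contributes nothing, while clearing the single bit of `1000` yields the all-zero pattern. Whether the actual build keeps that pattern is not stated in the text.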
For each pattern, we look at all the regulatory features which have the same or more bits set. If there are more than 100 such regfeats we count the number of times these features overlap each class of genfeat. We do the same count with the set of mockfeats which correspond to the regfeats. If more than 50% of the regfeats overlap a particular class of genfeat and the chi-squared statistic calculated using the mockfeat count as the 'expected' value is >8.0 (P < 0.005) we record that this pattern is associated with this class of genfeat.
If the pattern is associated with a genfeat we collect a second set of patterns which have this pattern's bits plus any other bits set. For each of these patterns we look at all the regulatory features which have the same or more bits set and we count the number of times these features overlap each class of genfeat. If less than 50% of the regfeats overlap we record that this second pattern is not associated with the class of genfeat involved.
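The association test can be sketched like this. The thresholds (100 features, 50%, 8.0) come from the text; the exact form of the chi-squared statistic does not, so the two-cell goodness-of-fit used here (overlap versus no overlap, one degree of freedom) is an assumption, as are the function names.

```python
def matches(pattern: str, bits: str) -> bool:
    """True if `bits` has every bit of `pattern` set (the same bits or more)."""
    return all(b == "1" for p, b in zip(pattern, bits) if p == "1")


def chi_squared(observed: int, expected: int, total: int) -> float:
    """Two-cell goodness of fit: overlapping versus non-overlapping features."""
    stat = 0.0
    for o, e in ((observed, expected), (total - observed, total - expected)):
        if e == 0:
            if o != 0:
                return float("inf")  # any observation in an impossible cell
            continue
        stat += (o - e) ** 2 / e
    return stat


def is_associated(n_regfeats: int, reg_overlaps: int, mock_overlaps: int,
                  min_feats: int = 100, min_fraction: float = 0.5,
                  min_chi2: float = 8.0) -> bool:
    """Apply the three criteria from the text to one pattern and one genfeat class."""
    if n_regfeats <= min_feats:
        return False
    if reg_overlaps / n_regfeats <= min_fraction:
        return False
    return chi_squared(reg_overlaps, mock_overlaps, n_regfeats) > min_chi2


# 200 matching regfeats, 140 overlap the genfeat class, 100 mockfeats do.
strong = is_associated(200, 140, 100)
# Same set, but only a slight excess over the mock count.
weak = is_associated(200, 110, 100)
```

With one degree of freedom the critical value for P = 0.005 is about 7.88, which is consistent with the 8.0 threshold quoted above.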
Having determined all the associated and non-associated patterns for each class of genfeat, we look at all the regfeats and use the 'associated' and then 'not-associated' patterns to set or unset a flag indicating whether the particular regfeat is associated with a particular class of genfeat. During this process it is possible for a given regfeat to be associated with more than one class of genfeat and some of these can be contradictory. This is particularly the case where all or nearly all the bits are set.
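The flag-setting pass might look like the sketch below. It assumes that 'associated' patterns set a flag and 'not-associated' patterns then clear it, with matching meaning 'the same or more bits set'; the dictionaries, class names and function names are invented for the example.

```python
from typing import Dict, List, Set


def matches(pattern: str, bits: str) -> bool:
    """True if `bits` has every bit of `pattern` set (the same bits or more)."""
    return all(b == "1" for p, b in zip(pattern, bits) if p == "1")


def genfeat_flags(bits: str,
                  associated: Dict[str, List[str]],
                  not_associated: Dict[str, List[str]]) -> Set[str]:
    """Return the genfeat classes flagged for one regfeat."""
    flags: Set[str] = set()
    # First pass: set a flag for every class with a matching associated pattern.
    for genfeat_class, patterns in associated.items():
        if any(matches(p, bits) for p in patterns):
            flags.add(genfeat_class)
    # Second pass: unset the flag where a not-associated pattern also matches.
    for genfeat_class, patterns in not_associated.items():
        if any(matches(p, bits) for p in patterns):
            flags.discard(genfeat_class)
    return flags


associated = {"promoter": ["1000"], "gene": ["0100"]}
not_associated = {"promoter": ["1001"]}

both = genfeat_flags("1100", associated, not_associated)
cleared = genfeat_flags("1001", associated, not_associated)
```

The first call returns two classes for a single regfeat, which is the situation the conflict-resolution rules then have to handle.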
Finally, for the purposes of the regulatory build, there is a set of rules which (1) resolve conflicts amongst the above flags and (2) assign a regulatory feature_type to the regfeat. The following types are currently in use:
non-gene-associated = patterns under-represented in gene regions
gene-associated = patterns over-represented in gene regions
promoter-associated = patterns over-represented within 2500 bp of the transcription start site of protein coding genes, but not in the downstream 'gene-body'.
polIII-associated = patterns over-represented in regions 2500 bp upstream of polIII transcribed regions like tRNAs.
At present only cell-type-specific regulatory features are classified, as different cell types may give conflicting signals reflecting their unique combination of regulatory and transcriptional states.
These data sets can be displayed along the chromosome in 'Region in Detail', displayed for a gene in the 'Regulation View', or mined from the functional genomics database.
For each transcription factor (TF) which has both a ChIP-seq data set in the functional genomics database and a publicly available position weight matrix (PWM) we have annotated the position of putative TF binding sites within the peaks called using the ChIP-seq reads.
Initially PWMs are mapped to the genome using the find_pssm_dna program from the MOODS software (1) with the -f flag set and a permissive threshold of 0.001. We then filter these mappings using a log odds score threshold. The threshold is derived per PWM by considering the occurrence of mappings in a sample of randomly positioned 'background' sequences matched in terms of size and chromosome to the ChIP-seq peaks for this TF. We select the threshold such that the proportion of these background peaks containing a mapping is approximately 5%.
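The per-PWM threshold selection can be illustrated with the sketch below. It assumes that each background sequence has been summarised by the best log odds score of any mapping inside it (or `None` when it contains no mapping), and that mappings scoring at or above the threshold are kept. The function name and this summarisation are assumptions, not part of MOODS or the Ensembl pipeline.

```python
import math
from typing import List, Optional


def select_threshold(background_best_scores: List[Optional[float]],
                     target_fraction: float = 0.05) -> float:
    """Pick a score cut-off so that about `target_fraction` of background
    sequences still contain a mapping scoring at or above it."""
    n = len(background_best_scores)
    if n == 0:
        raise ValueError("need at least one background sequence")
    scores = sorted((s for s in background_best_scores if s is not None),
                    reverse=True)
    allowed = round(n * target_fraction)  # background sequences allowed a hit
    if allowed == 0 or not scores:
        return math.inf  # no mapping can be admitted
    if allowed > len(scores):
        return scores[-1]  # too few background hits to reach the target
    return scores[allowed - 1]


# 100 background sequences with best scores 1.0 .. 100.0.
background = [float(i) for i in range(1, 101)]
threshold = select_threshold(background)
```

Tied scores at the cut-off would admit slightly more than the target fraction, which fits the 'approximately 5%' wording above.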
Only mappings which overlap the corresponding ChIP-seq peaks are included in the functional genomics database.
PWMs are taken from JASPAR (2).
1. Janne Korhonen, Petri Martinmaki, Cinzia Pizzi, Pasi Rastas, Esko Ukkonen. MOODS: fast search for position weight matrix matches in DNA sequences. Bioinformatics, Vol. 25, No. 23. (1 December 2009), pp. 3181-3182.
2. Bryne JC, Valen E, Tang MH, Marstrand T, Winther O, da Piedade I, Krogh A, Lenhard B, Sandelin A. JASPAR, the open access database of transcription factor-binding profiles: new content and tools in the 2008 update. Nucleic Acids Res. 2008 Jan;36(Database issue):D102-6.
Regulatory features are generated using a variety of genome-wide epigenomic data sets. The vast majority of features are derived from ChIP-seq data. In order to offer a uniform set of features, we processed the raw reads from each of the sets. Reads from pooled replicates were aligned to the current genome assembly using the bwa algorithm (Li H and Durbin R, 2010) with default parameters. All matches to mitochondria were filtered out, and the resulting alignments were passed to the SWEMBL peak caller software (S. Wilder et al, in preparation). Peaks for most datasets were obtained using a strict set of parameters (-f 150 -R 0.015) obtained using CTCF as a reference dataset. These parameters are too strict for datasets that have a broad distribution of reads. Thus, we applied a more relaxed set of parameters (-f 150 -R 0.0025) for the DNase1 datasets. The resulting peaks were then filtered to remove those falling in problematic areas identified in the ENCODE project.
Transcription Factor | Ensembl Gene | Jaspar Matrix(ces) |
---|---|---|
CTCF | ENSMUSG00000005698 | MA0139.1 |
Cmyc | ENSMUSG00000022346 | MA0147.1 |
E2F1 | ENSMUSG00000027490 | MA0024.1 |
Esrrb | ENSMUSG00000021255 | MA0141.1 |
Klf4 | ENSMUSG00000003032 | MA0039.2 |
Nanog | ENSMUSG00000012396 | |
Oct4 | ENSMUSG00000012396 | |
STAT3 | ENSMUSG00000004040 | MA0144.1 |
Smad1 | ENSMUSG00000031681 | |
Sox2 | ENSMUSG00000074637 | MA0143.1 |
Suz12 | ENSMUSG00000017548 | |
Tcfcp2l1 | ENSMUSG00000026380 | MA0145.1 |
Zfx | ENSMUSG00000079509 | MA0146.1 |
nMyc | ENSMUSG00000037169 | MA0104.2 |
p300 | ENSMUSG00000055024 | |
The current release comprises the following datasets:
CD4 | ||
---|---|---|
Focus Sets | Data type | Reference |
CTCF | ChIP-Seq | 3 |
Attribute Sets | Data type | Reference |
H2AK5ac | ChIP-Seq | 4 |
H2AK9ac | ChIP-Seq | 4 |
H2AZ | ChIP-Seq | 3 |
H2BK120ac | ChIP-Seq | 4 |
H2BK12ac | ChIP-Seq | 4 |
H2BK20ac | ChIP-Seq | 4 |
H2BK5ac | ChIP-Seq | 4 |
H2BK5me1 | ChIP-Seq | 3 |
H3K14ac | ChIP-Seq | 4 |
H3K18ac | ChIP-Seq | 4 |
H3K23ac | ChIP-Seq | 4 |
H3K27ac | ChIP-Seq | 4 |
H3K27me1 | ChIP-Seq | 3 |
H3K27me2 | ChIP-Seq | 3 |
H3K27me3 | ChIP-Seq | 3 |
H3K36ac | ChIP-Seq | 4 |
H3K36me1 | ChIP-Seq | 3 |
H3K36me3 | ChIP-Seq | 3 |
H3K4ac | ChIP-Seq | 4 |
H3K4me1 | ChIP-Seq | 3 |
H3K4me2 | ChIP-Seq | 3 |
H3K4me3 | ChIP-Seq | 3 |
H3K79me1 | ChIP-Seq | 3 |
H3K79me2 | ChIP-Seq | 3 |
H3K79me3 | ChIP-Seq | 3 |
H3K9ac | ChIP-Seq | 4 |
H3K9me1 | ChIP-Seq | 3 |
H3K9me2 | ChIP-Seq | 3 |
H3K9me3 | ChIP-Seq | 3 |
H3R2me1 | ChIP-Seq | 3 |
H3R2me2 | ChIP-Seq | 3 |
H4K12ac | ChIP-Seq | 4 |
H4K16ac | ChIP-Seq | 4 |
H4K20me1 | ChIP-Seq | 3 |
H4K20me3 | ChIP-Seq | 3 |
H4K5ac | ChIP-Seq | 4 |
H4K8ac | ChIP-Seq | 4 |
H4K91ac | ChIP-Seq | 4 |
H4R3me2 | ChIP-Seq | 3 |
PolII | ChIP-Seq | 3 |
GM06990 | ||
Focus Sets | Data type | Reference |
CTCF | ChIP-Seq | 5 |
DNase1 | Dnase-Seq | 5 |
Attribute Sets | Data type | Reference |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me3 | ChIP-Seq | 5 |
GM12878 | ||
Focus Sets | Data type | Reference |
BATF | ChIP-Seq | 10 |
BCL11A | ChIP-Seq | 10 |
BCL3 | ChIP-Seq | 10 |
CTCF | ChIP-Seq | 7 |
CTCF | ChIP-Seq | 6 |
Cfos | ChIP-Seq | 9 |
Cmyc | ChIP-Seq | 6 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
EBF | ChIP-Seq | 10 |
Egr1 | ChIP-Seq | 10 |
FAIRE | FAIRE-Seq | 6 |
Gabp | ChIP-Seq | 10 |
IRF4 | ChIP-Seq | 10 |
Jund | ChIP-Seq | 9 |
Max | ChIP-Seq | 9 |
NFKB | ChIP-Seq | 9 |
Nrsf | ChIP-Seq | 10 |
POU2F2 | ChIP-Seq | 10 |
PU1 | ChIP-Seq | 10 |
Pax5 | ChIP-Seq | 10 |
Pbx3 | ChIP-Seq | 10 |
Rad21 | ChIP-Seq | 9 |
SP1 | ChIP-Seq | 10 |
Sin3Ak20 | ChIP-Seq | 10 |
Srf | ChIP-Seq | 10 |
TAF1 | ChIP-Seq | 10 |
Tcf12 | ChIP-Seq | 10 |
Tr4 | ChIP-Seq | 9 |
USF1 | ChIP-Seq | 10 |
Yy1 | ChIP-Seq | 9 |
ZBTB33 | ChIP-Seq | 10 |
ZZZ3 | ChIP-Seq | 9 |
p300 | ChIP-Seq | 10 |
Attribute Sets | Data type | Reference |
H3K27ac | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me1 | ChIP-Seq | 7 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K9ac | ChIP-Seq | 7 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 10 |
PolII | ChIP-Seq | 6 |
PolII | ChIP-Seq | 9 |
PolIII | ChIP-Seq | 9 |
H1ESC | ||
Focus Sets | Data type | Reference |
CTCF | ChIP-Seq | 7 |
Cmyc | ChIP-Seq | 6 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
FAIRE | FAIRE-Seq | 6 |
Nrsf | ChIP-Seq | 10 |
TAF1 | ChIP-Seq | 10 |
Attribute Sets | Data type | Reference |
H3K27ac | ChIP-Seq | 11 |
H3K27me3 | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 11 |
H3K27me3 | ChIP-Seq | 11 |
H3K27me3 | ChIP-Seq | 11 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 11 |
H3K36me3 | ChIP-Seq | 11 |
H3K36me3 | ChIP-Seq | 11 |
H3K4me1 | ChIP-Seq | 7 |
H3K4me1 | ChIP-Seq | 11 |
H3K4me1 | ChIP-Seq | 11 |
H3K4me1 | ChIP-Seq | 11 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 11 |
H3K4me3 | ChIP-Seq | 11 |
H3K4me3 | ChIP-Seq | 11 |
H3K9ac | ChIP-Seq | 7 |
H3K9ac | ChIP-Seq | 11 |
H3K9ac | ChIP-Seq | 11 |
H3K9ac | ChIP-Seq | 11 |
H3K9me3 | ChIP-Seq | 11 |
H3K9me3 | ChIP-Seq | 11 |
H3K9me3 | ChIP-Seq | 11 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 10 |
PolII | ChIP-Seq | 6 |
HUVEC | ||
Focus Sets | Data type | Reference |
CTCF | ChIP-Seq | 7 |
CTCF | ChIP-Seq | 6 |
CTCF | ChIP-Seq | 5 |
Cjun | ChIP-Seq | 9 |
Cmyc | ChIP-Seq | 6 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
FAIRE | FAIRE-Seq | 6 |
Max | ChIP-Seq | 9 |
Attribute Sets | Data type | Reference |
H3K27ac | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me1 | ChIP-Seq | 7 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 5 |
H3K9ac | ChIP-Seq | 7 |
H3K9me1 | ChIP-Seq | 7 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 7 |
PolII | ChIP-Seq | 6 |
PolII | ChIP-Seq | 9 |
HeLa | ||
Focus Sets | Data type | Reference |
Ap2alpha | ChIP-Seq | 9 |
Ap2gamma | ChIP-Seq | 9 |
BAF155 | ChIP-Seq | 9 |
BAF170 | ChIP-Seq | 9 |
Bdp1 | ChIP-Seq | 9 |
Brf1 | ChIP-Seq | 9 |
Brf2 | ChIP-Seq | 9 |
Brg1 | ChIP-Seq | 9 |
CTCF | ChIP-Seq | 6 |
CTCF | ChIP-Seq | 5 |
Cfos | ChIP-Seq | 9 |
Cjun | ChIP-Seq | 9 |
Cmyc | ChIP-Seq | 6 |
Cmyc | ChIP-Seq | 9 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
E2F1 | ChIP-Seq | 9 |
E2F4 | ChIP-Seq | 9 |
E2F6 | ChIP-Seq | 9 |
FAIRE | FAIRE-Seq | 6 |
Gabp | ChIP-Seq | 10 |
Ini1 | ChIP-Seq | 9 |
Jund | ChIP-Seq | 9 |
Max | ChIP-Seq | 9 |
Nrf1 | ChIP-Seq | 9 |
RPC155 | ChIP-Seq | 9 |
TAF1 | ChIP-Seq | 10 |
TFIIIC-110 | ChIP-Seq | 9 |
Tr4 | ChIP-Seq | 9 |
Attribute Sets | Data type | Reference |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me3 | ChIP-Seq | 5 |
PolII | ChIP-Seq | 10 |
PolII | ChIP-Seq | 6 |
PolII | ChIP-Seq | 9 |
HepG2 | ||
Focus Sets | Data type | Reference |
BHLHE40 | ChIP-Seq | 10 |
CTCF | ChIP-Seq | 7 |
CTCF | ChIP-Seq | 6 |
CTCF | ChIP-Seq | 5 |
Cmyc | ChIP-Seq | 6 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
FAIRE | FAIRE-Seq | 6 |
FOSL2 | ChIP-Seq | 10 |
Gabp | ChIP-Seq | 10 |
HEY1 | ChIP-Seq | 10 |
Jund | ChIP-Seq | 10 |
RXRA | ChIP-Seq | 10 |
SRebp1 | ChIP-Seq | 9 |
SRebp2 | ChIP-Seq | 9 |
Sin3Ak20 | ChIP-Seq | 10 |
USF1 | ChIP-Seq | 10 |
ZBTB33 | ChIP-Seq | 10 |
p300 | ChIP-Seq | 10 |
Attribute Sets | Data type | Reference |
H3K27ac | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 5 |
H3K9ac | ChIP-Seq | 7 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 6 |
PolII | ChIP-Seq | 9 |
IMR90 | ||
Focus Sets | Data type | Reference |
DNase1 | Dnase-Seq | 11 |
Attribute Sets | Data type | Reference |
H2AK5ac | ChIP-Seq | 11 |
H2BK120ac | ChIP-Seq | 11 |
H2BK12ac | ChIP-Seq | 11 |
H2BK15ac | ChIP-Seq | 11 |
H2BK20ac | ChIP-Seq | 11 |
H3K14ac | ChIP-Seq | 11 |
H3K18ac | ChIP-Seq | 11 |
H3K23ac | ChIP-Seq | 11 |
H3K27ac | ChIP-Seq | 11 |
H3K27me3 | ChIP-Seq | 11 |
H3K36me3 | ChIP-Seq | 11 |
H3K4ac | ChIP-Seq | 11 |
H3K4me1 | ChIP-Seq | 11 |
H3K4me2 | ChIP-Seq | 11 |
H3K4me3 | ChIP-Seq | 11 |
H3K56ac | ChIP-Seq | 11 |
H3K79me1 | ChIP-Seq | 11 |
H3K79me2 | ChIP-Seq | 11 |
H3K9ac | ChIP-Seq | 11 |
H3K9me3 | ChIP-Seq | 11 |
H4K20me1 | ChIP-Seq | 11 |
H4K5ac | ChIP-Seq | 11 |
H4K8ac | ChIP-Seq | 11 |
H4K91ac | ChIP-Seq | 11 |
K562 | ||
Focus Sets | Data type | Reference |
ATF3 | ChIP-Seq | 9 |
Bdp1 | ChIP-Seq | 9 |
Brf1 | ChIP-Seq | 9 |
Brf2 | ChIP-Seq | 9 |
Brg1 | ChIP-Seq | 9 |
CTCF | ChIP-Seq | 7 |
CTCF | ChIP-Seq | 6 |
CTCF | ChIP-Seq | 5 |
Cfos | ChIP-Seq | 9 |
Cjun | ChIP-Seq | 9 |
Cmyc | ChIP-Seq | 6 |
Cmyc | ChIP-Seq | 9 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
Egr1 | ChIP-Seq | 10 |
FAIRE | FAIRE-Seq | 6 |
GTF2B | ChIP-Seq | 9 |
Gabp | ChIP-Seq | 10 |
HEY1 | ChIP-Seq | 10 |
Ini1 | ChIP-Seq | 9 |
Jund | ChIP-Seq | 9 |
Max | ChIP-Seq | 9 |
NELFe | ChIP-Seq | 9 |
Nfe2 | ChIP-Seq | 9 |
Nfya | ChIP-Seq | 9 |
Nfyb | ChIP-Seq | 9 |
Nrsf | ChIP-Seq | 10 |
PU1 | ChIP-Seq | 10 |
Rad21 | ChIP-Seq | 9 |
SIX5 | ChIP-Seq | 10 |
SP1 | ChIP-Seq | 10 |
Sin3Ak20 | ChIP-Seq | 10 |
Sirt6 | ChIP-Seq | 9 |
Srf | ChIP-Seq | 10 |
TAF1 | ChIP-Seq | 10 |
TFIIIC-110 | ChIP-Seq | 9 |
USF1 | ChIP-Seq | 10 |
XRCC4 | ChIP-Seq | 9 |
Attribute Sets | Data type | Reference |
Gata1 | ChIP-Seq | 9 |
H3K27ac | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me1 | ChIP-Seq | 7 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 5 |
H3K9ac | ChIP-Seq | 7 |
H3K9me1 | ChIP-Seq | 7 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 7 |
PolII | ChIP-Seq | 10 |
PolII | ChIP-Seq | 6 |
PolII | ChIP-Seq | 9 |
PolIII | ChIP-Seq | 9 |
Znf263 | ChIP-Seq | 9 |
K562b (no Regulatory Features built, but data is available). | ||
E2F4 | ChIP-Seq | 9 |
E2F6 | ChIP-Seq | 9 |
Gata1 | ChIP-Seq | 9 |
Gata2 | ChIP-Seq | 9 |
SETDB1 | ChIP-Seq | 9 |
Tr4 | ChIP-Seq | 9 |
Yy1 | ChIP-Seq | 9 |
ZNF274 | ChIP-Seq | 9 |
Znf263 | ChIP-Seq | 9 |
NHEK | ||
Focus Sets | Data type | Reference |
CTCF | ChIP-Seq | 7 |
CTCF | ChIP-Seq | 6 |
CTCF | ChIP-Seq | 5 |
DNase1 | Dnase-Seq | 6 |
DNase1 | Dnase-Seq | 5 |
FAIRE | FAIRE-Seq | 6 |
Attribute Sets | Data type | Reference |
H3K27ac | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 7 |
H3K27me3 | ChIP-Seq | 5 |
H3K36me3 | ChIP-Seq | 7 |
H3K36me3 | ChIP-Seq | 5 |
H3K4me1 | ChIP-Seq | 7 |
H3K4me2 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 7 |
H3K4me3 | ChIP-Seq | 5 |
H3K9ac | ChIP-Seq | 7 |
H3K9me1 | ChIP-Seq | 7 |
H4K20me1 | ChIP-Seq | 7 |
PolII | ChIP-Seq | 7 |
ES | ||
---|---|---|
Focus Sets | Data type | Reference |
DNase1 | Dnase-Seq | 12 |
CTCF | ChIP-Seq | 13 |
Cmyc | ChIP-Seq | 13 |
E2F1 | ChIP-Seq | 13 |
Esrrb | ChIP-Seq | 13 |
Klf4 | ChIP-Seq | 13 |
Nanog | ChIP-Seq | 13 |
Oct4 | ChIP-Seq | 13 |
STAT3 | ChIP-Seq | 13 |
Smad1 | ChIP-Seq | 13 |
Sox2 | ChIP-Seq | 13 |
Suz12 | ChIP-Seq | 13 |
Tcfcp2l1 | ChIP-Seq | 13 |
Zfx | ChIP-Seq | 13 |
nMyc | ChIP-Seq | 13 |
p300 | ChIP-Seq | 13 |
Attribute Sets | Data type | Reference |
H3 | ChIP-Seq | 14 |
H3K4me3 | ChIP-Seq | 14 |
H3K9me3 | ChIP-Seq | 14 |
H3K27me3 | ChIP-Seq | 14 |
H3K36me3 | ChIP-Seq | 14 |
H4K20me3 | ChIP-Seq | 14 |
PolII | ChIP-Seq | 14 |
ES Hybrid * | ||
Attribute Sets | Data type | Reference |
H3K36me3 | ChIP-Seq | 14 |
H3K4me3 | ChIP-Seq | 14 |
H3K9me3 | ChIP-Seq | 14 |
MEF * | ||
Attribute Sets | Data type | Reference |
H3K27me3 | ChIP-Seq | 14 |
H3K36me3 | ChIP-Seq | 14 |
H3K4me3 | ChIP-Seq | 14 |
H3K9me3 | ChIP-Seq | 14 |
NPC * | ||
Attribute Sets | Data type | Reference |
H3K27me3 | ChIP-Seq | 14 |
H3K36me3 | ChIP-Seq | 14 |
H3K4me3 | ChIP-Seq | 14 |
H3K9me3 | ChIP-Seq | 14 |
* The Mouse Regulatory Features for ESHyb, MEF and NPC were built using ES Focus features.
References for datasets
1. Genome-wide identification of DNaseI hypersensitive sites was performed by Greg Crawford and Terry Furey (Duke University) using a whole genome DNase-sequencing protocol (Crawford et al., Genome Research 2006). DNase-sequencing was performed using the Illumina (Solexa) sequencing by synthesis method from a DNase treated library generated from the GM06990 cell line (Crawford and Furey, unpublished).
2. Kim, T.H.; Abdullaev, Z.K.; Smith, A.D.; Ching, K.A.; Loukinov, D.I.; Green, R.D.; Zhang, M.Q.; Lobanenkov, V.V. & Ren, B. Analysis of the vertebrate insulator protein CTCF-binding sites in the human genome. Cell, 2007, 128, 1231-1245.
3. A. Barski, S. Cuddapah, K. Cui, T.Y. Roh, D.E. Schones, Z. Wang, G. Wei, I. Chepelev and K. Zhao, (2007). High-resolution profiling of histone methylations in the human genome, Cell 129 (2007), pp. 823-837.
4. Wang Z, Zang C, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, Cui K, Roh TY, Peng W, Zhang MQ, Zhao K. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008 Jul;40(7):897-903. Epub 2008 Jun 15. PMID: 18552846
5. These data were produced as part of the ENCODE project and are used in accordance with their data release policy. These data were generated by the UW ENCODE group. More information here and here
6. These data were produced as part of the ENCODE project and are used in accordance with their data release policy. These data and annotations were created by a collaboration of multiple institutions. More information here
7. These data were produced as part of the ENCODE project and are used in accordance with their data release policy. The ChIP-seq data were generated at the Broad Institute and in the Bradley E. Bernstein lab at the Massachusetts General Hospital/Harvard Medical School. More information here
8. Raha D, Wang Z, Moqtaderi Z, Wu L et al. Close association of RNA polymerase II and many transcription factors with Pol III genes. Proc Natl Acad Sci USA 2010 Feb 23;107(8):3639-44. PMID: 20139302
9. These data were produced as part of the ENCODE project and are used in accordance with their data release policy. These data were generated and analyzed by the labs of Michael Snyder, Mark Gerstein and Sherman Weissman at Yale University; Peggy Farnham at UC Davis; and Kevin Struhl at Harvard. More information here
10. These data were produced as part of the ENCODE project and are used in accordance with their data release policy. These data were provided by the Myers Lab at the HudsonAlpha Institute for Biotechnology. More information here
11. These data were produced as part of the Epigenomics Roadmap and are used in accordance with their data release policy. More information: http://nihroadmap.nih.gov/epigenomics/
12. DNase1 sequencing data were produced as a collaboration between Ensembl, David Adams (Wellcome Trust Sanger Institute), and Greg Crawford (Duke University).
13. Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang W, Jiang J, Loh YH, Yeo HC, Yeo ZX, Narang V, Govindarajan KR, Leong B, Shahab A, Ruan Y, Bourque G, Sung WK, Clarke ND, Wei CL, Ng HH. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008 Jun 13;133(6):1106-17. PMID: 18555785
14. Mikkelsen TS, Ku M, Jaffe DB, Issac B, Lieberman E, Giannoukos G, Alvarez P, Brockman W, Kim TK, Koche RP, Lee W, Mendenhall E, O'Donovan A, Presser A, Russ C, Xie X, Meissner A, Wernig M, Jaenisch R, Nusbaum C, Lander ES, Bernstein BE. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007 Aug 2;448(7153):548-9. PMID: 17603471