Count occurrences of given character per cell

Tags:

google-sheets

Question

For example if I wanted to count the number of Ns in a column of strings how can I do this in Google Spreadsheets at a per cell basis (i.e. a formula that points at one cell at a time that I can drag down)?

String/Count table

Background

I'm having to decide a threshold -min-overlap <integer> for a program called TOMTOM** which compares similarity between PWMs*** of small DNA motifs****, N is a regular expression for any linear combination of the letters A, C, G and T. It would be nice if I could get an idea of the distribution of non-N lengths of my DNA motifs to help inform me of a proper -min-overlap <integer> value for TOMTOM.

And here are some real examples:

enter image description here

** TOMTOM is a tool for comparing a DNA motif to a database of known motifs. See here for more info.

*** PWM stands for Position Weight Matrix:

According to Wiki: A position weight matrix (PWM), also known as a position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly used representation of motifs (patterns) in biological sequences.
According to this paper, it could be defined as:

Position weight matrix (PWM) or PWM‐like models are widely used to represent DNA‐binding preferences of proteins (Stormo, 2000). In these models, a matrix is used to represent the TF‐binding site (TFBS), with each element representing the contribution to the overall binding affinity from a nucleotide at the corresponding position. An inherent assumption of traditional PWM models is position independence; that is, the contribution of different nucleotide positions within a TFBS to the overall binding affinity is assumed to be additive. Although this approximation is broadly valid, nevertheless, it does not hold for several proteins (Man & Stormo, 2001; Bulyk et al, 2002). To improve quantitative modeling, PWM models have been extended to include additional parameters, such as k‐mer features, to account for position dependencies within TFBSs (Zhao et al, 2012; Mathelier & Wasserman, 2013; Mordelet et al, 2013; Weirauch et al, 2013; Riley et al, 2015). Interdependencies between nucleotide positions have a structural origin. For example, stacking interactions between adjacent base pairs form the local three‐dimensional DNA structure. TFs have preferences for sequence‐dependent DNA conformation, which we call DNA shape readout (Rohs et al, 2009, 2010).

OR, more contemporarily:

Based on this rationale, an alternative approach to augment traditional PWM models is the inclusion of DNA structural features. Models of TF–DNA binding specificity incorporating these DNA shape features achieved comparable performance levels to models incorporating higher‐order k‐mer features, while requiring a much smaller number of parameters (Zhou et al, 2015). We previously revealed the importance of DNA shape readout for members of the basic helix‐loop‐helix (bHLH) and homeodomain TF families (Dror et al, 2014; Yang et al, 2014; Zhou et al, 2015). We were also able, for Hox TFs, to identify which regions in the TFBSs used DNA shape readout, demonstrating the power of the approach to reveal mechanistic insights into TF–DNA recognition (Abe et al, 2015). This capability was extensively shown for only two protein families, due to the lack of large‐scale high‐quality TF–DNA binding data. With the recent abundance of high‐throughput measurements of protein–DNA binding, it is now possible to dissect the role of DNA shape readout for many TF families.

**** DNA motif: wiki: In genetics, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and has, or is conjectured to have, a biological significance. For proteins, a sequence motif is distinguished from a structural motif, a motif formed by the three-dimensional arrangement of amino acids, which may not be adjacent.

641

asked Nov 26 '14 16:11

hello_there_andy

2 Answers

An alternative for one cell at a time (formula to be copied down):

=len(A2)-len(SUBSTITUTE(A2,"N",""))

199

answered Sep 23 '22 09:09

pnuts

I don't know if this is gonna help but let's say you have those strings in range A2:A6 and you enter

=ArrayFormula(LEN(REGEXREPLACE(A2:A6, "[^N]", "")))

in B2, that should output the N count for the whole range.

answered Sep 19 '22 09:09

JPV

Related questions
                            
                                Debugging a custom function in Google Apps Script
                            
                                Sheet.getRange(1,1,1,12) what does the numbers in bracket specify?
                            
                                How to take only certain columns from a FILTER result?
                            
                                Find Cell Matching Value And Return Rownumber
                            
                                Sum of row n through last row
                            
                                How to Force New Google Spreadsheets to refresh and recalculate?
                            
                                How to edit legend labels in google spreadsheet plots?
                            
                                Google Sheets API: How to find a row by value and update it's content
                            
                                How do you do dynamic / dependent drop downs in Google Sheets?
                            
                                Google Sheets: How to replace text in column header?
                            
                                How to convert a date to a string in Google Sheet
                            
                                How can I create a hyperlink in the middle of cell text? [duplicate]
                            
                                find & replace commas with newline on Google Spreadsheet
                            
                                New Google Spreadsheets publish limitation
                            
                                Google Spreadsheet sum which always ends on the cell above
                            
                                Reference a cell using formula?
                            
                                ArrayFormula and "AND" Formula in Google Sheets
                            
                                Google Spreadsheet, copy column in sheet to column in another sheet
                            
                                Summarize grouping by year and month
                            
                                How to label scatterplot points by name?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With