﻿ R_It technology blog_ Programming technology Q & A「QuestionBank」

### How to get ranks with no gaps when there are ties among values?

When there are ties in the original data, is there a way to create a ranking without gaps in the ranks (consecutive, integer rank values)? Suppose: x <- c(10, 10, 10, 5, 5, 20, 20) rank(x) # [1] 4.0 4.0 4.0 1.5 1.5 6.5 6.5 In this case the desired result would be: my_rank(x) # [1] 2 2 2 1 1 3 3 I've tried all the options for ties.method (average, max, min, random), none of which is designed to give the desired result. Is it possible to achieve this with the rank() func
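One possible sketch of the gap-free ranking asked about here, using base R only: match each value against the sorted unique values. The name `dense_rank_base` is made up for illustration (the question calls the desired function `my_rank`).

```r
# Hypothetical helper: tied values share a rank and ranks have no gaps.
dense_rank_base <- function(x) {
  # Position of each value among the sorted distinct values.
  match(x, sort(unique(x)))
}

x <- c(10, 10, 10, 5, 5, 20, 20)
dense_rank_base(x)
# [1] 2 2 2 1 1 3 3
```

Note that `sort()` drops `NA`, so missing values would come back as `NA` ranks in this sketch.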

### Variable width bars in ggplot2 barplot in R

I'm trying to produce a barplot with bar widths determined by an integer variable (sample size). Adding "width = variableName" doesn't seem to work. Is there an established way of doing this? Here's some dummy data and code. I want the bar widths to be a function of variable d in this example. dat <- data.frame(a=c("A", "B", "C"), b=c(0.71, 0.94, 0.85), d=c(32, 99, 18)) ggplot(dat, aes(x=a, y=b, fill=a)) + geom_bar(colour="black", size=.3) + theme_bw()

### How to find Waldo with R?

Inspired by this thread How do I find Waldo with Mathematica? I have never done image processing in R but maybe other people who have want to share... thanks!

### multiple outliers in ggplot -> alpha to outlier.color

I want to make the outlier points in a boxplot semitransparent. Here they used "jitter" (similar idea, different approach). My code: ggplot() + geom_boxplot(aes(x = Sistema, y=values, linetype = Sistema), data=estacado, outlier.size=1, outlier.shape=2) + coord_flip() + labs(x="Sistema", y=expression(paste("RMSD ",(ring(A))))) + opts(legend.position="none") My data: > head(estacado) values ind Sistema 1 0.310214 r24_a R24 2 0.428232 r24_a R24 3 0.460971 r

### making a stacked bar graph with 3 bars for each category

I have the following dataset: Marker Method NAtype Ratio 1 CSF1PO GEL -9 3.623417e-01 2 CSF1PO GEL -1 4.713273e-02 3 CSF1PO GEL NA 0.000000e+00 4 CSF1PO MegaBACE -9 1.417205e-02 5 CSF1PO MegaBACE -1 8.312974e-03 6 CSF1PO MegaBACE NA 5.405026e-06 7 CSF1PO ABI -9 4.714592e-02 8 CSF1PO ABI -1 1.989925e-03 9 CSF1PO ABI NA 4.174494e-05 10 D10S1248 GEL -9 9.999201e-01 11 D10

### Plotting multiple time-series in ggplot

I have a time-series dataset consisting of 10 variables. I would like to create a time-series plot where each of the 10 variables is plotted in a different color, over time, on the same graph. The values should be on the Y axis and the dates on the X axis. Click here for the dataset CSV. This is the (probably wrong) code I have been using: c.o<-read.csv(file="co.csv",head=TRUE) ggplot(c.o, aes(Year, a, b, c, d, e,f))+geom_line() and here's what the output from the code looks like: Can anyone poin

### Is there a way to shut down the computer from inside R

Possible Duplicate: Shutdown Windows after simulation I was wondering whether there is a way to shut down the PC after some process has ended in R? somefunction() Sys.shut.down()

### Subset selection from binary matrix with dynamic column indices

A few questions for which the R language might have elegant solutions. Given a matrix m containing the binary values 1 and 0, and a vector v of column indices, how would I write a function to extract all the rows in m that have the value 1 in each of the columns indexed by the integers in v? As an extra feature, how would one return the row indices along with the corresponding rows? It is probably best if I illustrate with an example. Assuming the logic I'm asking for resides in functi
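A minimal sketch of one way to do this in base R, assuming `m` is a numeric 0/1 matrix and `v` holds valid column indices. The function name `rows_with_ones` and the example matrix are invented for illustration.

```r
# Hypothetical helper: rows of m that contain 1 in every column listed in v.
rows_with_ones <- function(m, v) {
  # Count the 1s per row within the selected columns; keep rows where all match.
  hits <- which(rowSums(m[, v, drop = FALSE] == 1) == length(v))
  list(index = hits, rows = m[hits, , drop = FALSE])
}

m <- matrix(c(1, 0, 1,
              1, 1, 1,
              0, 1, 1,
              1, 1, 0), ncol = 3, byrow = TRUE)
res <- rows_with_ones(m, c(1, 3))
res$index  # rows 1 and 2 have a 1 in both column 1 and column 3
res$rows
```

Using `drop = FALSE` keeps the result a matrix even when `v` has one element or only one row matches.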

### Size of labels for x-axis and y-axis ggplot in R

I have a ggplot code and I wanted to change the size of labels for x-axis and y-axis. the code: df.m <- melt(df, names(df)[2:3], names(df)) df.m$Results <- factor(df.m$Results) df.m$HMn25_30.h <- strptime(as.character(df.m$HMn25_30.h), format = "%Y-%m-%d %H:%M:%S") p <- ggplot(df.m, aes(x = HMn25_30.h, y = value, group = variable, color = variable)) p <- p + scale_shape_manual(values=c(20,22)) p <- p + geom_point(aes(shape = Results), cex=4, color= "blue3") p <- p + geo

### How do I understand this R function?

It has no function body, but it actually returns the Fibonacci sequence correctly.

### How to run linear model in R with certain data range?

I fit a linear model to my dataset, which has 2 columns and 100 rows. How can I fit the model to a certain range of rows, e.g. from row 30 to row 80? set.seed(123) # allow reproducible random numbers A <- data.frame(x=rnorm(100), y=runif(100)) # 2 columns with 100 rows of data fit.lm <- lm(A$x~A$y) # fit all 100 rows summary(fit.lm) # summary for all 100 rows Thanks in advance.
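One way this could be approached, sketched with the question's own data: refer to the columns by name in the formula and pass only the wanted rows as `data`. The name `fit_sub` is chosen here for illustration.

```r
set.seed(123)  # reproducible random numbers, as in the question
A <- data.frame(x = rnorm(100), y = runif(100))

# Use column names in the formula (not A$x ~ A$y) so that `data` controls
# which rows are used, then hand lm() only rows 30 to 80.
fit_sub <- lm(x ~ y, data = A[30:80, ])

nobs(fit_sub)  # 51 rows were used
summary(fit_sub)
```

Writing the formula as `A$x ~ A$y` would bypass the `data` argument entirely, which is why the formula uses bare column names.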

### How to take a subset from a netCDF file using latitude/longitude boundaries in R

I have a netCDF file that I wish to extract a subset from defined by latitude/longitude boundaries (i.e. a lat/long defined box), using the ‘ncdf’ package in R. A summary of my netCDF file is below. It has two dimensions (latitude and longitude) and 1 variable (10U_GDS4_SFC). It is essentially a lat/long grid containing wind values:  "file example.nc has 2 dimensions:"  "lat_0 Size: 1280"  "lon_1 Size: 2560"  "------------------------"  "file example.nc has 1 variables:" [1

### Does the varIdent function, used in LME work fine?

I would be glad if somebody could help me to solve this problem. I have data with repeated measurements design, where we tested a reaction of birds (time.dep) before and after the infection (exper). We have also FL (fuel loads, % of lean body mass), fat score and group (Experimental vs Control) as explanatory variables. I decided to use LME, because distribution of residuals doesn’t deviate from normality. But there is a problem with homogeneity of residuals. Variances of groups “before” and “af

### Quartz device: plot is cutoff by default

Does anyone have any insight as to why the plot region would be cut off by default using the Quartz device? % R --vanilla > plot(1,1) gives me this; The plot settings are normal > par("mar")  5.1 4.1 4.1 2.1 and running quartz.options(reset=TRUE) doesn't seem to change anything. What did I mess up? Note that this is not a problem using the X11 device. > sessionInfo() R version 3.1.1 (2014-07-10) Platform: x86_64-apple-darwin13.1.0 (64-bit) locale:  en_US.UTF-8/en_US.U

### deleteFile = FALSE in renderImage not working

I have a problem with the deleteFile = FALSE argument of renderImage. In short, it deletes the image file anyway. As a short test example I have the ui.R library(shiny) shinyUI(fluidPage( titlePanel("Testing ..."), sidebarLayout( sidebarPanel(), mainPanel( imageOutput("f1") ) ) )) and the server.R library(shiny) shinyServer(function(input, output,session) { output$f1 <- renderImage({ list(src="f1.png

### R Mapping Multiple Values

I need the help from experts like you with a problem, which is too big for my R skills. I've a vector and a data.frame: vec = c("v1;v2","v3","v4","v5;v6") vecNames = c("v1","v2","v3","v4","v5","v6") vecNames ##  "v1" "v2" "v3" "v4" "v5" "v6" vecDescription = c("descr1","descr2","descr3","descr4","descr5","descr6") vecDescription ##  "descr1" "descr2" "descr3" "descr4" "descr5" "descr6" df = data.frame(vecNames, vecDescription) df vecNames vecDescription 1 v1 descr1 2

### calculate mean for multiple columns in data.frame

Just wondering whether it is possible to calculate means for multiple columns by just using the mean function e.g. mean(iris[,1]) is possible but not mean(iris[,1:4]) tried: mean(iris[,c(1:4)]) got this error message: Warning message: In mean.default(iris[, 1:4]) : argument is not numeric or logical: returning NA I know I can just use lapply(iris[,1:4],mean) or sapply(iris[,1:4],mean)
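As a sketch of an alternative to the `lapply`/`sapply` calls mentioned in the question, base R's `colMeans()` computes the mean of every column of a numeric data frame or matrix in one call. The variable name `col_avg` is arbitrary.

```r
# Column means of the four numeric iris columns in a single call.
col_avg <- colMeans(iris[, 1:4])
col_avg

# Same values as applying mean() column by column.
all.equal(col_avg, sapply(iris[, 1:4], mean))
```

`mean()` itself expects a single numeric vector, which is why it returns `NA` with a warning when handed a data frame.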

### Load dataset from "R" package using data(), assign it directly to a variable?

How do you load a dataset from an R package using the data() function, and assign it directly to a variable without creating a duplicate copy in your environment? Put simply, can you do this without creating two identical dfs in your environment: > data("faithful") # Old Faithful Geyser Data from datasets package > x <- faithful > ls() # Now I have 2 identical dfs - x and faithful - in my environment  "faithful" "x" > remove(faithful) # Now I've removed one of the redund
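Two possible ways to avoid the duplicate, sketched for the `faithful` example: read the dataset through the package namespace with `::`, or have `data()` load it into a scratch environment instead of the workspace. The names `x`, `y` and `scratch` are illustrative only.

```r
# Option 1: take the dataset straight from the package namespace.
x <- datasets::faithful

# Option 2: let data() load into a throwaway environment, then copy out.
scratch <- new.env()
data("faithful", package = "datasets", envir = scratch)
y <- scratch$faithful

identical(x, y)
```

Option 1 relies on the package making its datasets available through `::`, which holds for `datasets`; option 2 is the more general fallback.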

### Have x-axis labels directed inwards in ggplot

I'm trying to get the x-axis labels ("Text 1" through "Text 6") to move inwards. I want "Text 1" to be aligned to the right, so that this label does not start before x = 0. Similarly, I want "Text 6" to be aligned to the left, so that this label ends before x = 6 (right now it's not even fully visible). d=data.frame(x=c(1,2,3,4,4,6), y=c(3,7,1,4,5,6)) lbl <- paste("Text",seq(1,6,1)) ggplot() + geom_point(data=d, mapping=aes(x=x, y=y)) + scale_x_continuous(expand=c(0,0),labels=lbl,breaks=s

### importing csv file in R

I'm having troubles with reading in a csv file. When I open the csv file in notepad it looks like this: `USER` `USER_TYPE` `V1` `V2` `V3` `V4` `V5` `V6` `V7` `V8` `V9` `V10` 508 `Gemandateerde zonder werk` 8 4 1 2 `` `` `` `` 1 1 510 `Gemandateerde zonder werk` 8 4 2 `` `` `` `` `` 1 1 511 `Gemandateerde met werk` 8 3 1 2 `` `` `` `` 1 1 512 `Kind` 8 4 1 2 2 2 2 1 1 1 513 `Kind` 5 4 1 1 2 3 6 2 1 1 514 `Kind` 2 3 1 2 `` `` `` `` 1 2 515 `Gemandateerde zonder werk` 8 4 1 1 2 6 2 1 1 1 516 `Geman

### R: adding a dummy variable column to xts timeseries object

I have an xts time series object made up of minute-by-minute intraday trading data for 2015. I would like to add a dummy variable denoting 1 as an event day or 0 as a non-event day. Since the dummy variable is not inherently a time series, is it possible for me to add this to my trading data? How should I construct the dummy column? How can it be added to the existing xts? New to R, so please be as specific as possible in your answer. Thank you!

### zip/unzip functions in R

I am looking for functions like zip/unzip in functional programming languages (e.g. Haskell, Scala). Examples from the Haskell reference. Zip: Input: zip [1,2,3] [9,8,7] Output: [(1,9),(2,8),(3,7)] Unzip: Input: unzip [(1,2),(2,3),(3,4)] Output: ([1,2,3],[2,3,4]) In R, the input would look something like this. For zipping: l1 <- list(1,2,3) l2 <- list(9,8,7) l <- Map(c, l1, l2) For unzipping: tuple1 <- list(1,2) tuple2 <- list(2,3) tuple3 <- list(3,4) l <- Map(c,
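A small sketch of both directions in base R, building on the `Map(c, l1, l2)` idea from the question. Unzipping is done here by passing the tuples as separate arguments to `Map` via `do.call`; the names `zipped`, `tuples` and `unzipped` are made up for illustration.

```r
# Zip: pair up elements position by position.
l1 <- list(1, 2, 3)
l2 <- list(9, 8, 7)
zipped <- Map(c, l1, l2)      # pairs (1, 9), (2, 8), (3, 7)

# Unzip: each tuple becomes one argument to Map, so c() collects the
# first elements together, then the second elements.
tuples <- list(list(1, 2), list(2, 3), list(3, 4))
unzipped <- do.call(Map, c(f = c, tuples))

unzipped[[1]]  # 1 2 3
unzipped[[2]]  # 2 3 4
```

This assumes all tuples have the same length; `Map` recycles shorter arguments rather than raising an error.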

### How to read a text label in ggraph radial graph

In ggraph, if the plot is radial, the labels can get crowded, whether using repel=T or not. Is there a way to make the labels interactive, or to rotate the graph, in order to read the labels? library(ggraph) mtcarsDen <- as.dendrogram(hclust(dist(mtcars[1:4], method='euclidean'), method='ward.D2')) ggraph(graph = mtcarsDen, layout = 'dendrogram', repel = TRUE, circular = TRUE, ratio = 0.5) + geom_edge_elbow() + geom_node_text(aes(x = x*1.05, y=y*1.05, fi

### R How to change column data type of a tibble

Pipes and tidyverse are sometimes very convenient. I want to convert one column from one type to another, like so: mtcars$qsec <-as.integer(mtcars$qsec) This requires typing what I need twice. Please do not suggest the "with" command since I find it confusing to use. What would be the tidyverse and magrittr %<>% way of doing the same with the least amount of typing? Also, if qsec is the 6th column, how can I do it just by referring to the column position? Something like (not correct code) mtc

### data.table J argument: function with 2 arguments by two groups with one fixed subset

I'm using the J argument of data.table to get the confidence interval of my variable, like so: mt=data.table(mtcars) mt_m=mt[,.(qsec=mean(qsec),CI1=t.test(qsec)$conf.int[1],CI2=t.test(qsec)$conf.int[2]),.(cyl)] mt_m cyl qsec CI1 CI2 1: 6 17.97714 16.39856 19.55573 2: 4 19.13727 18.00699 20.26755 3: 8 16.77214 16.08159 17.46270 Very useful to plot mean and errorbars. But now I would like to test each condition against my control and get the p-value, something like mt[,.(

### Show the table of values under the bar plot

I am asking this without an attempt of my own because I could not find anything similar. I apologize for this. From this bar plot: df <- structure(list(year = 2002:2005, work = c(1L, 2L, 3L, 2L), confid = c(8L, 5L, 0L, 6L), jrs = c(0L, 3L, 4L, 5L)), .Names = c("year", "work", "confid", "jrs"), class = "data.frame", row.names = c(NA, -4L )) library(ggplot2) library(reshape) md <- melt(df, id=(c("year"))) temp.plot <- ggplot(data=md, aes(x=year, y=value, fill=variable) ) + geom_bar(stat=

### Change name of element names in loop through lists

Is there a way to change the names of the elements of multiple lists in a loop: a <- list(1, 2) b <- list(3, 4) for (my.list in c("a", "b")) { names(my.list) <- c("element1", "element2") } In my own words, I would say the problem is that the variable my.list needs to be evaluated to the name of the list. Therefore, I tried assign(names(my.list) <- ... as well as names(as.name(my.list)) <- ..., but without success.
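A sketch of two options in base R. The first follows the loop in the question but fetches each list with `get()`, renames the copy, and writes it back with `assign()`; the second keeps the lists together in one container, which is often easier to work with. The names `tmp` and `renamed` are illustrative.

```r
a <- list(1, 2)
b <- list(3, 4)

# Option 1: look each list up by name, rename the copy, store it back.
for (nm in c("a", "b")) {
  tmp <- get(nm)
  names(tmp) <- c("element1", "element2")
  assign(nm, tmp)
}
names(a)  # "element1" "element2"

# Option 2: hold the lists in one list and rename them all with lapply().
renamed <- lapply(list(a = a, b = b), setNames, c("element1", "element2"))
names(renamed$b)
```

In the original loop, `my.list` is just the string `"a"` or `"b"`, so `names(my.list) <- ...` names the string rather than the list it refers to.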

### ph_with_vg and ggplot on R

I am able to create my desired chart with ggplot using the following code: ggplot(data, aes(x=as.Date(data$Date, "%d/%m/%Y"), y=items)) + geom_col(fill="#00cccc") However, when I use it with my full code, I get an error that reads "StartTag:invalid element name " my_pres<- # Load template read_pptx("C:/Users/USERNAME/Desktop/template.pptx") %>% # 02 - SLIDE add_slide(layout="Title with Subtitle and Content", master="MySlides2016") %>% # 02

### R Markdown Bullet List with Multiple Levels

https://github.com/rstudio/cheatsheets/raw/master/rmarkdown-2.0.pdf The cheat sheet above lists the following syntax to generate a bulleted list in R Markdown. * is the primary solid bullet. + is a secondary hollow bullet. And - is a tertiary solid square. * unordered list + sub-item 1 + sub-item 2 - sub-sub-item 1 After I render the output with knitr I don't get the expected output. I get what's shown below. The second and third lines are not indented. Only the very last line is

### Is there a symbolic ODE solver in R? (ODE = ordinary differential equation)

Question: Is there a symbolic ODE solver in R? (ODE = ordinary differential equation) I am afraid there is none, but let me confirm with the experts. For example, solve: > (5x-6)^2 y' = 5(5x-6) y - 2 Here: y - unknown function, y' - its derivative (It is easy to solve by hand: y = 1/(5(5x-6)) + C* (5x-6) , but I want to get that answer from R). What I know: 1) There are NUMERICAL (not symbolic) solvers: I know there are numerical ODE solvers like library(deSolve), see answer
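This does not solve the ODE symbolically, but as a small sketch, base R's `D()` can differentiate the hand-derived solution symbolically, which makes it possible to check that it satisfies the equation. The test points in `x` and the value of the constant `C` are arbitrary choices.

```r
# Hand-derived solution from the question: y = 1/(5(5x-6)) + C*(5x-6)
y_expr  <- quote(1 / (5 * (5 * x - 6)) + C * (5 * x - 6))
dy_expr <- D(y_expr, "x")   # symbolic derivative y'

# Plug both into (5x-6)^2 y' = 5(5x-6) y - 2 at a few points.
x <- c(-2, 0, 3, 10)
C <- 1.7
lhs <- (5 * x - 6)^2 * eval(dy_expr)
rhs <- 5 * (5 * x - 6) * eval(y_expr) - 2

all.equal(lhs, rhs)  # TRUE when the solution satisfies the equation
```

The check holds for any `C`, since both sides reduce to `5*C*(5x-6)^2 - 1`.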

### Check text for multiple substrings

I am trying to do some text mining, and part of it is to get texts that contain certain names. So I need to check a text for multiple substrings. But using grepl with a column as pattern returns the following error: In grepl(comp$n, text$text) : argument 'pattern' has length > 1 and only the first element will be used Which makes sense, since "MIKE" is the first element of the pattern. Am I using grepl wrong, can pattern only be a single element and do I need to approach this differentl
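Since `grepl()` takes a single pattern, one way to sketch the check is to apply it once per name and combine the results. The vectors `texts` and `patterns` below are invented stand-ins for `text$text` and `comp$n`.

```r
texts    <- c("MIKE went home", "nobody here", "ANNA met MIKE")  # made-up data
patterns <- c("MIKE", "ANNA")                                    # made-up names

# One column per pattern, one row per text; fixed = TRUE matches literally.
hits <- vapply(patterns, grepl, logical(length(texts)),
               x = texts, fixed = TRUE)
hits

# TRUE for texts that contain at least one of the names.
any_hit <- rowSums(hits) > 0
texts[any_hit]
```

If the names contain no regex metacharacters, `grepl(paste(patterns, collapse = "|"), texts)` gives the same any-match answer in a single call.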

### Rolling sum in dplyr

For each id, I want to create a column which has the sum of previous 5 x values. df %>% group_by(id) %>% mutate(roll.sum = c(x[1:4], zoo::rollapply(x, 5, sum))) # Groups: id  x id roll.sum <int> <int> <int> 3 1 3 8 1 8 5 1 5 9 1 9 10 1 10 1 1 36 6 1 39 9 1 40 6 1 41 5 1 37 10 2 10 5 2 5 7 2 7 6 2 6

### joining all points inside a grouped point plot using ggplot

I have a data frame like this: AA= States Clusters ncomp HR Cluster-1 9 HR Cluster-2 4 HR Cluster-3 9 WB Cluster-1 13 WB Cluster-2 13 WB Cluster-3 13 WB Cluster-4 13 TL Cluster-1 9 TL Cluster-2 11 TL Cluster-3 9 TL Cluster-4 10 TL Cluster-5 11 TL Cluster-6 7 OR Cluster-1 15 OR Cluster-2 15 OR Cluster-3 15 OR Cluster-4 14 OR Cluster-5 15 OR Cluster-6 15 I need a plot that joins the points within each group. I don't want to use facet_wrap.

### Find longest word with even number of characters

I have a string and I need to find the word that satisfies two constraints: the number of characters in the word should be even, and it should be the longest such word. For example: Input: I am a bad coder with good logical skills Output: skills I'm just starting off with R, so any help would be great.
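A minimal base R sketch of the two constraints, assuming words are separated by spaces. The function name `longest_even_word` is made up for illustration.

```r
# Hypothetical helper: longest word with an even number of characters.
longest_even_word <- function(s) {
  words <- strsplit(s, " +")[[1]]          # split on one or more spaces
  even  <- words[nchar(words) %% 2 == 0]   # keep even-length words only
  even[which.max(nchar(even))]             # first of the longest ones
}

longest_even_word("I am a bad coder with good logical skills")
# [1] "skills"
```

When several even-length words tie for longest, `which.max()` returns the first one; when there is none, the result is `character(0)`.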

### R Is it possible to knit just one code chunk and output the LaTeX code to the console?

I am writing my thesis in LaTeX and doing the data analysis in R. I already have my tex files setup with the formatting I want and a R markdown file for my code. I only use R markdown, because of the improved sectioning and not to generate any sort of report from it. My normal workflow for making tables was to generate a regression table in R using texreg or stargazer and copy the LaTeX code to my tex file. But now I need to make a custom regression table and I have found the kableExtra package

### rpart create a table that indicates if an observation belongs to a node or not

The following figure shows what I want to do: Grow a tree with rpart for some dataset Create a table with one row per observation in the original data set and one column per node in the tree, plus an id. The nodes columns should take the value 1 if the observation belongs to that node and zero otherwise. This is some code that I wrote: library(rpart) library(rattle) data <- kyphosis fit <- rpart(Age ~ Number + Start, data = kyphosis) fancyRpartPlot(fit) nodeNumbers <-

### dplyr / tidyr: summarise two columns into a single named list column

Imagine this data frame: df <- tibble( key = c(rep(1, 3), rep(2, 3), rep(3, 3)), date = rep(Sys.Date(), 9), hour = rep(c('00', '01', '02'), 3), value = rep(c(8, 9, 10), 3) ) I want output such that the group summary column is a named list of hour and value. Same as if I were to do this, for each group: as.list(setNames(df$value[df$key == 1], df$hour[df$key == 1])) $`00`  8 $`01`  9 $`02`  10 Something along these lines, but something that actually works: df %>%
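As a rough base R sketch of the per-group result described here (not the dplyr/tidyr pipeline the question asks for), the data can be split by `key` and each piece turned into a named list. A plain `data.frame` stands in for the tibble, the `date` column is omitted, and `by_key` is an illustrative name.

```r
df <- data.frame(
  key   = rep(1:3, each = 3),
  hour  = rep(c("00", "01", "02"), 3),
  value = rep(c(8, 9, 10), 3),
  stringsAsFactors = FALSE
)

# One named list per key: names come from hour, values from value.
by_key <- lapply(split(df, df$key), function(g) {
  as.list(setNames(g$value, g$hour))
})

by_key[["1"]]  # list with elements `00` = 8, `01` = 9, `02` = 10
```

The result is an ordinary list keyed by group; attaching it to a one-row-per-key data frame as a list column would be a separate step.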

### equivalent of melt+reshape that splits on column names

Point: if you are going to vote to close, it is poor form not to give a reason why. If it can be improved without requiring a close, take the 10 seconds it takes to write a brief comment. Question: How do I do the following "partial melt" in a way that memory can support? Details: I have a few million rows and around 1000 columns. The names of the columns have 2 pieces of information in them. Normally I would melt to a data frame (or table) comprised of a pair of columns, then I would split

### Error with labels in split violin plot in ggplot2 (R) due to missing values

I am generating split violin plots using the geom_split_violin function created here: Split violin plot with ggplot2. Then, I add labels for sample sizes (n = ...) for each split violin. However, there are some missing values, which results in mislabelling from the missing data onward. Using the code below, this is the result: In the bottom grid (B), under p2, there are no values for fill value = 1. This results in mislabelling of the split violins thereafter. Specifically, the labels n = 3

### Format an Excel cell from R if a cell is not blank

I am trying to format a cell in a particular column in Excel from R (producing a workbook using a dataframe). I am currently using openxlsx. This is the line of code that I am currently trying to get to work: conditionalFormatting(WorkBook, "Sheet1", cols=17, rows=1:11000, rule='<TODAY(),"<>"&""', style = negStyle) I have also tried this: conditionalFormatting(WorkBook, "Sheet1", cols=17, rows=1:11000, rule='AND(<TODAY(),"<>"&"")', style = negStyle) and condition

### What is this error when using the pipe in R?

This is code I wrote to learn how to use the pipe, which I'm not familiar with yet. An error occurred and I don't know how to fix it.  481376  481376 Unknown or uninitialised column: 'Neigborhood'. integer(0) The first two lines produce a result, but the third one does not. It means the neighborhood column is available, but there is some mistake in how I use it. My goal was to sort the table using d2[order(d2)]. How can I fix it?

### Change colors of part of y axis ticks in heatmap

I have a heatmap: heatmap <- ggplot(df, aes(ID, Name)) + geom_raster(aes(fill = N)) I want to change the colors of the y ticks in the heatmap if they are equal to 300, 301, 302. How could I do that? It's unclear to me how to do that for only some of the ticks. For all of them, I add theme(axis.text.x = element_text(colour="black"), axis.text.y = element_text(colour="red")) but how do I do it just for 300, 301, 302?

### Dynamic if-else "tests" or case_when "formulas" from a key-value table?

I am trying to write a function in R that uses a "key-value" data.frame of quantile breakpoints to return a weight based on the quantile that an input value falls into. Here is an example of one of these data.frames: key1 <- data.frame(Boundary = c(0.01, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.22), Weight = c(0,1, 3.5, 8, 15, 25, 37.5, 51.5, 65, 76.5, 85.5, 91.5, 95

### .rds file internal format

I have lost a .rds file due to the device (let's call it volume 1) getting filled up. Usually when that happened R would throw an error and stop. In that case I had a safe copy on a different volume (volume 2). This time however, R would write the file on volume 1 without error and copy it over to volume 2. Now the file cannot be opened with readRDS anymore with the error "error reading from connection". The file contains a data.table, is stored uncompressed and infoRDS can read the me

### How to plot lines and dots in the same plot while using different sized data

This toy data frame represents my data. Time Gene Value 1 0 A 1 2 1 A 2 3 2 A 3 4 0 B 1 5 1.2 B 2 6 1.7 B 2 7 2.1 B 2 8 3 B 2 Using the following code I can turn this into a line plot with two lines, one for A and one for B. ggplot(data=Data, aes(x=Time, y=Value, group=Gene)) + geom_line(aes(color=Gene), linetype="

### R What exactly does split do in the sink() function?

What exactly does split do in the sink() function? No website or video seems to explain it explicitly.
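A small sketch that shows the effect: with `split = TRUE`, output is written to the sink file and still shown on the console, much like the Unix `tee` program; with the default `split = FALSE`, it goes to the file only. The temporary file `tmp` is just for the demonstration.

```r
tmp <- tempfile(fileext = ".txt")

sink(tmp, split = TRUE)   # divert output to tmp AND keep printing it
cat("hello\n")            # appears on the console and in the file
sink()                    # stop diverting

readLines(tmp)
# [1] "hello"
```

Running the same lines with `split = FALSE` leaves the console silent between the two `sink()` calls, while the file content is the same.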

### SQLite and Shiny/flexdashboard: Cannot embed a reactiveValues() object in a SQL query

I am trying to filter a sqlite3 database and print/plot the resulting data. E.g. the nycflights13::weather data. But in a flexdashboard environment it complains of not being able to embed a reactive object in an SQL query. Does that mean that I can't use SQLite with flexdashboard? --- title: "Untitled" output: flexdashboard::flex_dashboard: orientation: columns vertical_layout: fill runtime: shiny --- ```{r setup, include=FALSE} library(flexdashboard) library(DBI) library(d

### Convert sf to unmarked ppp

I wish to convert an sf object to an unmarked ppp. (Conversion from sf to ppp is now supported, according to this post.) library(sf) #Initialise sf object pp <- structure(list(X = c(959207.877070254, 959660.734838225, 951483.685462513, 951527.767554883, 958310.673042469, 950492.05212104, 959660.734838225, 959207.877070254, 960500.020456073, 959660.734838225), Y = c(1944457.42827898, 1955543.76027363, 1939982.16629396, 1940216.55143212, 1954704.68186897, 1951434.68524296, 1955543.76027363

### How to count the number of matched strings in R, when the string pattern to match is a column from another dataframe?

I have got two extremely large dataframes, the first data frame consists of a column body, which is a list of comments and the second one consists of names. I want to count how many elements in body contain each element of names. Here's a small reproducible dataset (the original dataset has about 2000 names, where each name is a name of the car): df1 <- tibble(body = c("The Tesla Roadster has a range of 620 miles", "ferrari needs to make an electric car"
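A base R sketch of the counting step, assuming a case-insensitive match is wanted (the sample comments mix "Tesla" and "ferrari"). The third comment and the `car_names` vector are invented to make the example self-contained.

```r
comments <- c("The Tesla Roadster has a range of 620 miles",
              "ferrari needs to make an electric car",
              "I prefer my old tesla")                 # third entry is made up
car_names <- c("tesla", "ferrari", "audi")            # hypothetical names

# For each name, count the comments in which it appears at least once.
counts <- vapply(car_names,
                 function(n) sum(grepl(n, comments, ignore.case = TRUE)),
                 integer(1))
counts
# tesla: 2, ferrari: 1, audi: 0
```

Each name is treated as a regular expression here, so names containing characters such as `.` or `+` would need escaping; with very large inputs this loop over names may also be slow.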

### Remove characters before first appearance of specified character, column dataframe, R

For every row I would like all numbers before the first 5 to be deleted, e.g. second row: 1 5 5 5 --> 5 5 5, but the first row should stay the same as it starts with a 5. I have tried gsub but it only gives me empty strings. gsub(".*5", "",xy.list) Any help is appreciated! structure(list(data_rel1 = c("5 5 5 5", "1 5 5 5", "1 5 5 5", "1 5 5 5", "1 5 5 5", "5 5 5 5", "1 5 5 5", "1 5 5 5", "
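The `gsub(".*5", "", ...)` attempt returns empty strings because `.*` is greedy and matches up to the last 5. A possible sketch of a fix is to anchor at the start and remove only characters that are not a 5, assuming single-digit entries as in the sample. The vector `x` is a made-up stand-in for the column.

```r
x <- c("5 5 5 5", "1 5 5 5", "2 1 5 5")  # illustrative values

# Delete everything from the start of the string up to the first "5".
sub("^[^5]*", "", x)
# [1] "5 5 5 5" "5 5 5"   "5 5"
```

Strings that already start with a 5 are left unchanged, because `[^5]*` then matches the empty string.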
