No More Googling {dplyr}

The {dplyr} package

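{dplyr} is an R package for data wrangling. Googling it every time is a hassle, so I have collected the basics here.

See also the GitHub repository at https://github.com/tidyverse/dplyr (I wish I could copy and paste from the official cheat sheet right in the browser...).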

Usage examples (data frames)

We use iris, a dataset that ships with R. Let's first check what it looks like with a scatter plot.

# Scatter plot
library(ggplot2)
library(dplyr)
# Store the plot in p (not in a variable called plot, which would shadow base::plot)
p <- iris %>%
  ggplot(aes(x = Sepal.Length, y = Sepal.Width, color = Species)) +
  geom_point()
print(p)

[Figure: scatter plot of Sepal.Length against Sepal.Width, colored by Species]

Extract rows with `filter()`

To extract rows, use filter([dataframe], [condition]).

# Write the condition inside filter()
setosa <- iris %>%
  dplyr::filter(Species == 'setosa')
# The 50 rows whose Species column is 'setosa' are extracted
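Since I always end up looking up how to combine conditions, here is a minimal sketch of that as well. The variable names big_setosa and not_setosa and the threshold 5.5 are just examples I made up for illustration; conditions separated by commas are ANDed together, and %in% matches any value in a vector.

```r
library(dplyr)

# Conditions separated by commas are combined with AND
# (big_setosa and the 5.5 threshold are illustrative choices)
big_setosa <- iris %>%
  dplyr::filter(Species == "setosa", Sepal.Length >= 5.5)

# %in% keeps rows whose Species matches any element of the vector
not_setosa <- iris %>%
  dplyr::filter(Species %in% c("versicolor", "virginica"))

nrow(big_setosa)
nrow(not_setosa)
```

Writing `Species == "setosa" & Sepal.Length >= 5.5` inside a single filter() should give the same rows as the comma form.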

Aggregate with `summarise()`

To aggregate, use summarise([dataframe], [max | min | mean ...]).

# The aggregated result is returned as a one-row data frame
iris_summary <- iris %>%
  dplyr::summarise(Sepal_length_max = max(Sepal.Length)
                   , Sepal_length_min = min(Sepal.Length)
                   , Sepal_length_mean = mean(Sepal.Length)
                   , Sepal_length_sd = sd(Sepal.Length)
                   )
iris_summary

The result is as follows.

	Sepal_length_max	Sepal_length_min	Sepal_length_mean	Sepal_length_sd
1	7.9	4.3	5.843333	0.8280661
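Two more things I tend to look up with summarise(): counting rows with n(), and applying one function to several columns with across(). The following is only a sketch, assuming dplyr 1.0.0 or later (across() is not available before that); the names iris_overview and column_means are made up for this example.

```r
library(dplyr)

# n() counts the rows being summarised
# (iris_overview is an illustrative name)
iris_overview <- iris %>%
  dplyr::summarise(
    n_rows = dplyr::n(),
    Sepal_length_median = median(Sepal.Length)
  )

# across() applies the same function to each selected column
# (assumes dplyr >= 1.0.0)
column_means <- iris %>%
  dplyr::summarise(dplyr::across(c(Sepal.Length, Sepal.Width), mean))

iris_overview
column_means
```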

Group with `group_by()`

group_by([dataframe], [column name]) groups the rows by the specified column, so that a following summarise() is computed per group.

# Group by the Species column
summary_by_species <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(Sepal_length_max = max(Sepal.Length)
                   , Sepal_length_min = min(Sepal.Length)
                   , Sepal_length_mean = mean(Sepal.Length))
summary_by_species

The result is as follows.

	Species	Sepal_length_max	Sepal_length_min	Sepal_length_mean
1	setosa	5.8	4.3	5.01
2	versicolor	7	4.9	5.94
3	virginica	7.9	4.9	6.59
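As a small extension of the grouped summary, here is a sketch that also counts the rows per group with n() and drops the grouping afterwards. It assumes dplyr 1.0.0 or later for the .groups argument; the name species_counts is invented for this example.

```r
library(dplyr)

# Per-species row count and mean petal length
# (species_counts is an illustrative name)
species_counts <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(
    n = dplyr::n(),
    Petal_length_mean = mean(Petal.Length),
    .groups = "drop"  # return an ungrouped result (assumes dplyr >= 1.0.0)
  )

species_counts
```

Dropping the grouping explicitly is a matter of taste here, but it avoids surprises when the result is passed on to further verbs.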

Sort with `arrange()`

arrange([dataframe], [column name]) sorts the rows by the given column (ascending by default).

# Sort in ascending order
iris_asc <- iris %>%
  dplyr::arrange(Sepal.Length)

# The 3 rows with the smallest Sepal.Length
iris_asc %>% head(n = 3)

# Sort in descending order
iris_desc <- iris %>%
  dplyr::arrange(desc(Sepal.Length))

# The 3 rows with the largest Sepal.Length
iris_desc %>% head(n = 3)
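arrange() also accepts more than one sort key, which I keep forgetting. A minimal sketch, with iris_by_species as a made-up name: rows are ordered by Species first, and ties within a species are broken by Sepal.Length in descending order.

```r
library(dplyr)

# Sort by Species (ascending), then by Sepal.Length (descending) within each species
# (iris_by_species is an illustrative name)
iris_by_species <- iris %>%
  dplyr::arrange(Species, desc(Sepal.Length))

# The first rows are the setosa flowers with the largest Sepal.Length
iris_by_species %>% head(n = 3)
```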

一日一膳(当社比)

R, Java, and occasionally math

