<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>R Archives - Analytica Data Science Solutions</title>
	<atom:link href="https://analyticadss.com/tag/r/feed/" rel="self" type="application/rss+xml" />
	<link>https://analyticadss.com/tag/r/</link>
	<description>World&#039;s Leading Artificial Intelligence Company</description>
	<lastBuildDate>Sat, 26 Aug 2023 09:33:24 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=7.0</generator>

<image>
	<url>https://analyticadss.com/wp-content/uploads/2020/06/cropped-F.B-Cover-photo_V0.1-02-32x32.png</url>
	<title>R Archives - Analytica Data Science Solutions</title>
	<link>https://analyticadss.com/tag/r/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Unleash the Power of Functional Programming in R with the purrr Package</title>
		<link>https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Fri, 14 Apr 2023 18:10:01 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[functional programming]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Rstats]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=6138</guid>

					<description><![CDATA[<p>Introduction Welcome to our comprehensive guide on harnessing the power of the purrr package in R for functional programming. If you’re keen on elevating your R skills, you’re in for a treat. Today, we’ll be delving into the wonders of the purrr package — a lifesaver for functional programming. With the avalanche of data we encounter nowadays, having the [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/">Unleash the Power of Functional Programming in R with the purrr Package</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading has-medium-font-size" id="9372">Introduction</h2>



<p class="wp-block-paragraph" id="dd36">Welcome to this guide to functional programming in R with the <code>purrr</code> package. If you want to sharpen your R skills, <code>purrr</code> is well worth learning. Given the volume of data we handle today, efficient data-wrangling tools matter. If you have used R for a while, you may have found some base functions awkward for more complex operations. One example is <code>sapply()</code>, whose return type depends on its input.</p>



<p class="wp-block-paragraph" id="a3fa">This is where <code>purrr</code> comes in. It offers a consistent set of tools that make your iteration code clearer and easier to maintain.</p>



<p class="wp-block-paragraph" id="e548">In this article, we walk through the <code>purrr</code> package, explain its core functions, and show how they apply to real-world tasks. By the end, you should be comfortable using <code>purrr</code> in your own data work.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="e803"><strong>What is functional programming and why is it useful?</strong></h2>



<p class="wp-block-paragraph" id="fd43">Functional programming is more than a coding style. It changes how we approach computation. It treats computation as the evaluation of mathematical functions and avoids changing state and mutating data. Instead, it relies on pure functions. Given the same inputs, a pure function always returns the same outputs and has no side effects. The result is code that is more modular, predictable, and easier to test.</p>
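


<p class="wp-block-paragraph">The sketch below makes the idea of a pure function concrete. It contrasts a function with a side effect against a pure one. The names <code>counter</code>, <code>add_to_counter()</code>, and <code>add()</code> are hypothetical and exist only for illustration. <code>reduce()</code> comes from <code>purrr</code>.</p>

```r
library(purrr)

# Impure (hypothetical example): reads and modifies a global variable, a side effect
counter <- 0
add_to_counter <- function(x) {
  counter <<- counter + x
  counter
}

first_call <- add_to_counter(5)   # 5
second_call <- add_to_counter(5)  # 10: same input, different output

# Pure (hypothetical example): the output depends only on the inputs
add <- function(total, x) total + x
add(0, 5)  # always 5

# Pure functions plug cleanly into purrr helpers such as reduce()
total <- reduce(c(5, 5), add, .init = 0)  # add(add(0, 5), 5) = 10
```

<p class="wp-block-paragraph"><code>add()</code> has no hidden state, so two calls with the same arguments always give the same answer. That property is what makes it safe to pass to higher-order functions.</p>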



<p class="wp-block-paragraph" id="0e06">If you work with R, particularly for data manipulation and analysis, functional programming helps you write code that is more coherent and concise. Here’s how:</p>



<ol class="wp-block-list">
<li>Readability and maintainability: Breaking a complex procedure into small, focused functions makes your code easier to understand and easier to change.</li>



<li>Productivity: Global variables can cause unexpected behavior and hard-to-trace bugs. Avoiding pitfalls like these saves you debugging time.</li>



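<li>Fewer explicit loops: Functional style replaces hand-written <code>for</code> loops with higher-order functions, and it pairs naturally with R’s vectorized operations. Note that <code>map()</code> is not inherently faster than a well-written loop. Its main gain is clarity. Where a vectorized function exists, it is usually the fastest option.</li>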
</ol>
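


<p class="wp-block-paragraph">The point about loops is easiest to see side by side. This sketch first compares a vectorized expression with <code>map_dbl()</code>. It then shows where mapping is genuinely needed: a function built on <code>if</code>, which handles only one value at a time. The vector <code>x</code> and the helper <code>classify()</code> are made up for illustration.</p>

```r
library(purrr)

x <- c(1, 2, 3, 4)

# Vectorized arithmetic: no loop needed at all
vectorized <- x^2

# The same result via map_dbl(), which calls the function once per element
mapped <- map_dbl(x, function(v) v^2)
identical(vectorized, mapped)  # TRUE

# Hypothetical helper that is NOT vectorized: if() expects a single value
classify <- function(v) if (v > 2) "high" else "low"

# map_chr() applies it element by element without a hand-written for loop
labels <- map_chr(x, classify)  # "low" "low" "high" "high"
```

<p class="wp-block-paragraph">When a vectorized function already exists, prefer it. Use <code>map_*()</code> when the function works on one element at a time.</p>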



<p class="wp-block-paragraph" id="b77b">Read on to see how the <code>purrr</code> package helps you adopt functional programming in R and streamline your data manipulation and analysis routines.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="3fdc">Exploring the <code>purrr</code> package</h2>



<p class="wp-block-paragraph" id="283b">Part of the tidyverse collection, the <code>purrr</code> package is R’s gateway to functional programming. It provides a consistent set of functions for working with lists, vectors, and other data structures. As a result, your data transformation, summarization, and manipulation code shares a unified, predictable syntax.</p>



<p class="wp-block-paragraph" id="382c">Here is what sets <code>purrr</code> apart:</p>



<ol class="wp-block-list">
<li>Consistent function naming: <code>purrr</code>’s functions follow a regular naming scheme, which makes them easy to remember and use. For example, the suffix in <code>map_dbl()</code> or <code>map_chr()</code> names the output type.</li>



<li>Handling complex data structures: <code>purrr</code> works well with nested lists, data frames, and other layered data structures.</li>



<li>Robust error handling: Real-world data is messy. <code>purrr</code> provides functions such as <code>safely()</code> and <code>possibly()</code> that handle errors and unexpected inputs gracefully.</li>



<li>Harmony with <code>tidyverse</code> companions: <code>purrr</code> works smoothly with other <code>tidyverse</code> packages such as <code>dplyr</code>, <code>tidyr</code>, and <code>ggplot2</code>. This makes it easy to bring functional programming into your existing data workflows.</li>
</ol>



<p class="wp-block-paragraph" id="298a">Ready to start? Install <code>purrr</code> from CRAN and load it into your R session:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.708335876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="install.packages(&quot;purrr&quot;)
library(purrr)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #66D9EF">install.packages</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;purrr&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="51fd">In the next section, we will dive into the core functions provided by the <code>purrr</code> package and demonstrate their usage with practical examples.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="b5b5"><strong>Core functions in purrr</strong></h2>



<p class="wp-block-paragraph" id="3c30">In this section, we will cover some of the most important and widely used functions in the <code>purrr</code> package, along with examples to demonstrate their usage.</p>



<p class="wp-block-paragraph" id="ac1b"><strong>A. The map() family</strong></p>



<p class="wp-block-paragraph" id="0f09">The <code>map()</code> family of functions is the heart of the <code>purrr</code> package. These functions apply a function to each element of a list or vector. The function name tells you the output type:</p>



<ul class="wp-block-list">
<li><code>map()</code>: Returns a list.</li>



<li><code>map_lgl()</code>: Returns a logical vector.</li>



<li><code>map_int()</code>: Returns an integer vector.</li>



<li><code>map_dbl()</code>: Returns a double vector.</li>



<li><code>map_chr()</code>: Returns a character vector.</li>



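<li><code>map_df()</code>: Returns a data frame by row-binding the results. It is superseded as of <code>purrr</code> 1.0.0, which recommends <code>map()</code> followed by <code>list_rbind()</code> instead.</li>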
</ul>



<p class="wp-block-paragraph" id="e748">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7083282470703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list of numbers
number_list <- list(1, 2, 3, 4)

# Square each number using map()
squared_numbers <- map(number_list, ~ .x^2)
print(squared_numbers)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list of numbers</span></span>
<span class="line"><span style="color: #F8F8F2">number_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Square each number using map()</span></span>
<span class="line"><span style="color: #F8F8F2">squared_numbers </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map(number_list, </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> .x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(squared_numbers)</span></span></code></pre></div>
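


<p class="wp-block-paragraph"><code>map()</code> always returns a list. When each result is a single value, the typed variants are often more convenient. The sketch below redefines <code>number_list</code> from the example above so that it runs on its own. The other variable names are only illustrative.</p>

```r
library(purrr)

number_list <- list(1, 2, 3, 4)

# map() returns a list: list(1, 4, 9, 16)
squared_list <- map(number_list, ~ .x^2)

# map_dbl() returns a double vector: c(1, 4, 9, 16)
squared_dbl <- map_dbl(number_list, ~ .x^2)

# map_lgl() returns a logical vector: c(FALSE, TRUE, FALSE, TRUE)
is_even <- map_lgl(number_list, ~ .x %% 2 == 0)

# map_chr() returns a character vector: c("n=1", "n=2", "n=3", "n=4")
labels <- map_chr(number_list, ~ paste0("n=", .x))

# Typed variants raise an error rather than returning the wrong shape:
# here each result has length 2, so map_dbl() fails and we catch it
wrong_shape <- tryCatch(
  map_dbl(number_list, ~ c(.x, .x)),
  error = function(e) "failed"
)
```

<p class="wp-block-paragraph">This strictness is the main reason to prefer a typed variant. A surprising input fails immediately instead of producing a malformed result further down the pipeline.</p>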



<p class="wp-block-paragraph" id="89fb"><strong>B. pmap()</strong></p>



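<p class="wp-block-paragraph" id="1c29">The <code>pmap()</code> function applies a function to multiple lists in parallel. You pass the lists together as a single list, and the i-th call receives the i-th element of each one. For exactly two inputs, <code>map2()</code> does the same job with a simpler interface.</p>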



<p class="wp-block-paragraph" id="f317">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7083282470703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define two lists
list1 <- list(1, 2, 3)
list2 <- list(4, 5, 6)

# Add corresponding elements of the two lists using pmap()
sum_list <- pmap(list(list1, list2), ~ ..1 + ..2)
print(sum_list)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define two lists</span></span>
<span class="line"><span style="color: #F8F8F2">list1 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">list2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">5</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">6</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Add corresponding elements of the two lists using pmap()</span></span>
<span class="line"><span style="color: #F8F8F2">sum_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> pmap(</span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(list1, list2), </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> ..1 </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> .</span><span style="color: #AE81FF">.2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(sum_list)</span></span></code></pre></div>
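


<p class="wp-block-paragraph">For two inputs, <code>map2()</code> and its typed variants such as <code>map2_dbl()</code> are the usual choice. <code>pmap()</code> also accepts a named list and matches each element to the function argument with the same name. Because a data frame is a list of columns, <code>pmap()</code> can also iterate over its rows. This is a minimal sketch; the argument names <code>x</code>, <code>y</code>, <code>w</code>, <code>name</code>, and <code>n</code> are arbitrary.</p>

```r
library(purrr)

list1 <- list(1, 2, 3)
list2 <- list(4, 5, 6)

# map2_dbl(): element-wise sum of two lists, returned as a double vector
sums <- map2_dbl(list1, list2, function(x, y) x + y)  # c(5, 7, 9)

# pmap_dbl() with a named list: names are matched to the function's arguments
params <- list(x = c(1, 2, 3), y = c(4, 5, 6), w = c(10, 20, 30))
weighted <- pmap_dbl(params, function(x, y, w) (x + y) * w)  # c(50, 140, 270)

# A data frame is a list of columns, so pmap() walks over its rows
df <- data.frame(name = c("a", "b"), n = c(1, 2))
row_labels <- pmap_chr(df, function(name, n) paste(name, n))  # c("a 1", "b 2")
```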



<p class="wp-block-paragraph" id="0ca9"><strong>C. safely(), quietly(), and possibly()</strong></p>



<p class="wp-block-paragraph" id="ee39">These functions handle errors and other conditions gracefully. Each one takes a function and returns a modified version of it:</p>



<ul class="wp-block-list">
<li><code>safely()</code>: The wrapped function returns a list with two components, <code>result</code> and <code>error</code>. One of them is always <code>NULL</code>.</li>



<li><code>quietly()</code>: The wrapped function returns a list containing the <code>result</code> plus any captured <code>output</code>, <code>warnings</code>, and <code>messages</code>.</li>



<li><code>possibly()</code>: The wrapped function returns a default value (the <code>otherwise</code> argument) instead of throwing an error.</li>
</ul>



<p class="wp-block-paragraph" id="a799">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.708335876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list with numbers and a character
mixed_list <- list(1, 2, &quot;a&quot;, 3)

# Define a safely wrapped square function
safe_square <- safely(~ .x^2)

# Apply the safe_square function to the mixed_list
results <- map(mixed_list, safe_square)
print(results)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list with numbers and a character</span></span>
<span class="line"><span style="color: #F8F8F2">mixed_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;a&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a safely wrapped square function</span></span>
<span class="line"><span style="color: #F8F8F2">safe_square </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> safely(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> .x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the safe_square function to the mixed_list</span></span>
<span class="line"><span style="color: #F8F8F2">results </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map(mixed_list, safe_square)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(results)</span></span></code></pre></div>
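


<p class="wp-block-paragraph">The list returned by <code>safely()</code> is easiest to work with once you pull out its <code>result</code> and <code>error</code> components. The sketch below does that, then applies <code>possibly()</code> and <code>quietly()</code> to similar inputs. It uses anonymous functions instead of formulas. The default value <code>NA_real_</code> is our own choice, not something <code>purrr</code> requires.</p>

```r
library(purrr)

mixed_list <- list(1, 2, "a", 3)

# safely(): every call returns list(result = ..., error = ...)
safe_square <- safely(function(x) x^2)
results <- map(mixed_list, safe_square)

# Extract components by name; "a" has a NULL result and a non-NULL error
squares <- map(results, "result")                 # list(1, 4, NULL, 9)
failed <- map_lgl(results, ~ !is.null(.x$error))  # c(FALSE, FALSE, TRUE, FALSE)

# possibly(): substitute a default value whenever the function errors
possibly_square <- possibly(function(x) x^2, otherwise = NA_real_)
squares_or_na <- map_dbl(mixed_list, possibly_square)  # c(1, 4, NA, 9)

# quietly(): capture warnings (and messages/output) instead of printing them
quiet_log <- quietly(log)
log_out <- quiet_log(-1)
log_out$result    # NaN
log_out$warnings  # "NaNs produced"
```

<p class="wp-block-paragraph">Choose <code>safely()</code> when you need to inspect the errors, and <code>possibly()</code> when a placeholder value is good enough.</p>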



<p class="wp-block-paragraph" id="cbfb"><strong>D. compact() and compose()</strong></p>



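<p class="wp-block-paragraph" id="37ee"><code>compact()</code> removes <code>NULL</code> (and other zero-length) elements from a list. <code>compose()</code> combines multiple functions into a single function. By default, <code>compose()</code> applies its functions from right to left, so <code>compose(increment, square)</code> squares first and then increments.</p>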



<p class="wp-block-paragraph" id="a367">Example:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402778625488281px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define a list with NULL elements
null_list <- list(1, NULL, 2, NULL, 3)

# Remove NULL elements using compact()
clean_list <- compact(null_list)
print(clean_list)

# Compose two functions: square and increment
square <- function(x) x^2
increment <- function(x) x + 1
square_and_increment <- compose(increment, square)

# Apply the composed function to a number
result <- square_and_increment(3)
print(result)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define a list with NULL elements</span></span>
<span class="line"><span style="color: #F8F8F2">null_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Remove NULL elements using compact()</span></span>
<span class="line"><span style="color: #F8F8F2">clean_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> compact(null_list)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(clean_list)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Compose two functions: square and increment</span></span>
<span class="line"><span style="color: #A6E22E">square</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(x) x</span><span style="color: #F92672">^</span><span style="color: #AE81FF">2</span></span>
<span class="line"><span style="color: #A6E22E">increment</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(x) x </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span></span>
<span class="line"><span style="color: #F8F8F2">square_and_increment </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> compose(increment, square)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the composed function to a number</span></span>
<span class="line"><span style="color: #F8F8F2">result </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> square_and_increment(</span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(result)</span></span></code></pre></div>
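


<p class="wp-block-paragraph">The example above leaves two details implicit. First, <code>compact()</code> also drops zero-length elements, not just <code>NULL</code>. Second, <code>compose()</code> applies its functions left to right when you pass <code>.dir = "forward"</code>. The sketch below redefines the <code>square()</code> and <code>increment()</code> helpers so that it runs on its own.</p>

```r
library(purrr)

# compact() drops NULL and zero-length elements; "" has length 1, so it stays
mixed <- list(1, NULL, integer(0), "", 2)
kept <- compact(mixed)  # list(1, "", 2)

square <- function(x) x^2
increment <- function(x) x + 1

# Default direction is backward: compose(f, g)(x) is f(g(x))
increment_then_square <- compose(square, increment)
increment_then_square(3)  # square(increment(3)) = 16

# .dir = "forward" applies the functions in the order they are listed
square_then_increment <- compose(square, increment, .dir = "forward")
square_then_increment(3)  # increment(square(3)) = 10
```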



<p class="wp-block-paragraph">These core functions are just the beginning of what <code>purrr</code> has to offer. In the next section, we will demonstrate how to use these functions to solve real-world problems through practical examples.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="4243">Practical examples with purrr</h2>



<p class="wp-block-paragraph" id="1de5">In this section, we will explore two practical examples that demonstrate how the <code>purrr</code> package can be used to solve real-world problems efficiently.</p>



<p class="wp-block-paragraph" id="3fb2"><strong>A. Example 1: Calculating summary statistics for multiple variables</strong></p>



<p class="wp-block-paragraph" id="3260">Suppose you have a data frame with multiple numerical variables, and you want to calculate summary statistics (mean, median, and standard deviation) for each of these variables.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.40277099609375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(dplyr)
library(purrr)

# Create a sample data frame
data <- data.frame(
  var1 = rnorm(100, mean = 10, sd = 2),
  var2 = rnorm(100, mean = 20, sd = 5),
  var3 = rnorm(100, mean = 30, sd = 3),
  stringsAsFactors = FALSE
)

# Define a list of summary functions
summary_functions <- list(mean = mean, median = median, sd = sd)

# Calculate summary statistics for each variable using nested map functions
summary_stats <- map_dfr(summary_functions, ~ map_dfc(data, .x), .id = &quot;Statistic&quot;)
print(summary_stats)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a sample data frame</span></span>
<span class="line"><span style="color: #F8F8F2">data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">data.frame</span><span style="color: #F8F8F2">(</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var1</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">10</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var2</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">5</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">var3</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rnorm</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">100</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">30</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">),</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #FD971F; font-style: italic">stringsAsFactors</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span></span>
<span class="line"><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a list of summary functions</span></span>
<span class="line"><span style="color: #F8F8F2">summary_functions </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF; font-style: italic">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F; font-style: italic">mean</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> mean, </span><span style="color: #FD971F; font-style: italic">median</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> median, </span><span style="color: #FD971F; font-style: italic">sd</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> sd)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Calculate summary statistics for each variable using nested map functions</span></span>
<span class="line"><span style="color: #F8F8F2">summary_stats </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(summary_functions, </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> map_dfc(data, .x), </span><span style="color: #FD971F; font-style: italic">.id</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Statistic&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(summary_stats)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="ae64"><strong>B. Example 2: Fitting multiple linear models for different subsets of data</strong></p>



<p class="wp-block-paragraph" id="c80a">In this example, we will fit linear models for different subsets of the <code>mtcars</code> dataset based on the number of cylinders. We will use <code>purrr</code> functions to apply the linear model function to each subset and extract the model coefficients.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402786254882812px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(dplyr)
library(purrr)
library(broom)

# Split the mtcars dataset by the number of cylinders
mtcars_split <- mtcars %>% group_split(cyl)

# Define a function to fit a linear model and extract coefficients
fit_lm <- function(data) {
  model <- lm(mpg ~ wt, data = data)
  coef <- data.frame(tidy(model)) %>%
    select(term, estimate) %>%
    mutate(cyl = unique(data$cyl))
  return(coef)
}

# Apply the fit_lm function to each subset using map_dfr()
model_coefs <- map_dfr(mtcars_split, fit_lm)
print(model_coefs)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(broom)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Split the mtcars dataset by the number of cylinders</span></span>
<span class="line"><span style="color: #F8F8F2">mtcars_split </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> group_split(cyl)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to fit a linear model and extract coefficients</span></span>
<span class="line"><span style="color: #A6E22E">fit_lm</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(data) {</span></span>
<span class="line"><span style="color: #F8F8F2">  model </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt, </span><span style="color: #FD971F; font-style: italic">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> data)</span></span>
<span class="line"><span style="color: #F8F8F2">  coef </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">data.frame</span><span style="color: #F8F8F2">(tidy(model)) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">    select(term, estimate) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">    mutate(</span><span style="color: #FD971F; font-style: italic">cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">unique</span><span style="color: #F8F8F2">(data</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(coef)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Apply the fit_lm function to each subset using map_dfr()</span></span>
<span class="line"><span style="color: #F8F8F2">model_coefs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(mtcars_split, fit_lm)</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(model_coefs)</span></span></code></pre></div>
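<p class="wp-block-paragraph">Since <code>purrr</code> 1.0.0, <code>map_dfr()</code> and <code>map_dfc()</code> are superseded in favor of calling <code>map()</code> and then <code>list_rbind()</code> or <code>list_cbind()</code>. Here is a minimal sketch of the same per-cylinder regression written in that style. It assumes <code>purrr</code> 1.0.0 or later. To keep dependencies down, it also swaps broom’s <code>tidy()</code> for base R’s <code>coef()</code>, and <code>fit_lm_coefs()</code> is a hypothetical helper name:</p>

```r
# Sketch assuming purrr >= 1.0.0 (for list_rbind()); uses base R's coef() instead of broom
library(purrr)

# Split mtcars into a named list of data frames, one per cylinder count ("4", "6", "8")
mtcars_split <- split(mtcars, mtcars$cyl)

# Hypothetical helper: fit mpg ~ wt on one subset and return its coefficients
fit_lm_coefs <- function(df) {
  est <- coef(lm(mpg ~ wt, data = df))
  data.frame(term = names(est), estimate = unname(est))
}

# map() returns a list of data frames; list_rbind() stacks them and stores
# each list element's name (the cylinder count) in a new "cyl" column
model_coefs <- list_rbind(map(mtcars_split, fit_lm_coefs), names_to = "cyl")
print(model_coefs)
```

<p class="wp-block-paragraph">This version keeps only the term names and estimates. Use <code>broom::tidy()</code>, as in the original example, when you also want standard errors and p-values.</p>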



<p class="wp-block-paragraph" id="8a7a"><strong>C. Example 3: Reading multiple CSV files with purrr</strong></p>



<p class="wp-block-paragraph" id="1f79">Suppose you have multiple CSV files in a directory and you want to read them all into a single data frame using <code>purrr</code>. Here’s an example of how you can achieve this:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.402801513671875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Define the directory containing the CSV files
csv_directory <- &quot;path/to/your/csv/files&quot;

# Load required packages: readr for read_csv(), dplyr for mutate() and %>%, purrr for map_dfr()
library(readr)
library(dplyr)
library(purrr)

# List all files whose names end in .csv (pattern is a regular expression, not a wildcard)
csv_files <- list.files(csv_directory, pattern = &quot;\\.csv$&quot;, full.names = TRUE)

# Define a function to read a CSV file and add a column with the filename
read_csv_with_filename <- function(file) {
  data <- read_csv(file)
  data <- data %>% mutate(filename = basename(file))
  return(data)
}

# Read all CSV files using map_dfr() and bind the results into a single data frame
combined_data <- map_dfr(csv_files, read_csv_with_filename)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Define the directory containing the CSV files</span></span>
<span class="line"><span style="color: #F8F8F2">csv_directory </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;path/to/your/csv/files&quot;</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load required packages: readr for read_csv(), dplyr for mutate() and %>%, purrr for map_dfr()</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(readr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># List all files whose names end in .csv (pattern is a regular expression, not a wildcard)</span></span>
<span class="line"><span style="color: #F8F8F2">csv_files </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list.files</span><span style="color: #F8F8F2">(csv_directory, </span><span style="color: #FD971F; font-style: italic">pattern</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;\\.csv$&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">full.names</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to read a CSV file and add a column with the filename</span></span>
<span class="line"><span style="color: #A6E22E">read_csv_with_filename</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(file) {</span></span>
<span class="line"><span style="color: #F8F8F2">  data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> read_csv(file)</span></span>
<span class="line"><span style="color: #F8F8F2">  data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> data </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> mutate(</span><span style="color: #FD971F; font-style: italic">filename</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">basename</span><span style="color: #F8F8F2">(file))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(data)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Read all CSV files using map_dfr() and bind the results into a single data frame</span></span>
<span class="line"><span style="color: #F8F8F2">combined_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> map_dfr(csv_files, read_csv_with_filename)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="9b39">In this example, we first list all the CSV files in the specified directory. Then we define a custom function, <code>read_csv_with_filename()</code>, that reads a single file with <code>read_csv()</code> from the <code>readr</code> package and adds a <code>filename</code> column recording which file each row came from. Finally, we use <code>purrr</code>’s <code>map_dfr()</code> function to apply this function to every file in the list and row-bind the results into a single data frame.</p>
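<p class="wp-block-paragraph">To try this pattern without a folder of real data, here is a self-contained sketch that first writes two small, made-up CSV files to a temporary directory. It assumes <code>purrr</code> 1.0.0 or later and swaps in base R’s <code>read.csv()</code> for <code>readr</code>’s <code>read_csv()</code>. Instead of a helper function, it names each path by its file name with <code>set_names()</code> and lets <code>list_rbind(names_to = "filename")</code> turn those names into a column:</p>

```r
# Sketch assuming purrr >= 1.0.0; file names and contents are made up for illustration
library(purrr)

# Write two small CSV files to a temporary directory
csv_directory <- file.path(tempdir(), "csv_demo")
dir.create(csv_directory, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, value = c(10, 20)),
          file.path(csv_directory, "sales_a.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:5, value = c(30, 40, 50)),
          file.path(csv_directory, "sales_b.csv"), row.names = FALSE)

# pattern is a regular expression: match names that end in ".csv"
csv_files <- list.files(csv_directory, pattern = "\\.csv$", full.names = TRUE)

# Name each path by its file name, read every file, then stack the results;
# names_to stores the list names in a "filename" column
named_files <- set_names(csv_files, basename(csv_files))
combined_data <- list_rbind(map(named_files, read.csv), names_to = "filename")
print(combined_data)
```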



<p class="wp-block-paragraph" id="bd14"><strong>D. Example 4: Combining purrr and ggplot2</strong></p>



<p class="wp-block-paragraph" id="78fa">In this example, we’ll demonstrate how to use <code>purrr</code> to create multiple ggplots for different subsets of data within a single data frame. We’ll use the <code>mtcars</code> dataset and create separate ggplots for each unique number of cylinders.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" data-code-block-pro-font-family="Code-Pro-JetBrains-Mono" style="font-size:.875rem;font-family:Code-Pro-JetBrains-Mono,ui-monospace,SFMono-Regular,Menlo,Monaco,Consolas,monospace;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.40277099609375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load required packages
library(purrr)
library(ggplot2)
library(dplyr)
library(cowplot)

# Create a list of data frames, one for each unique number of cylinders in the mtcars dataset
data_list <- mtcars %>%
  split(.$cyl)

# Define a function to create a ggplot for a given data frame
create_ggplot <- function(data) {
  ggplot(data, aes(x = mpg, y = hp)) +
    geom_point(aes(color = factor(gear)), size = 3) +
    labs(title = paste(&quot;Number of Cylinders:&quot;, unique(data$cyl)),
         x = &quot;Miles per Gallon&quot;,
         y = &quot;Horsepower&quot;) +
    theme_minimal() +
    theme(legend.title = element_blank()) +
    scale_color_discrete(name = &quot;Gears&quot;)
}

# Create a list of ggplots using map()
ggplot_list <- data_list %>% 
  map(create_ggplot)

# Combine the ggplots into a single plot using cowplot's plot_grid()
combined_plot <- plot_grid(plotlist = ggplot_list, ncol = 1, align = &quot;v&quot;, rel_heights = c(1, 1, 1))

# Display the combined plot
print(combined_plot)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822" tabindex="0"><code><span class="line"><span style="color: #88846F"># Load required packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(purrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(cowplot)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a list of data frames, one for each unique number of cylinders in the mtcars dataset</span></span>
<span class="line"><span style="color: #F8F8F2">data_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">split</span><span style="color: #F8F8F2">(.</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Define a function to create a ggplot for a given data frame</span></span>
<span class="line"><span style="color: #A6E22E">create_ggplot</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(data) {</span></span>
<span class="line"><span style="color: #F8F8F2">  ggplot(data, aes(</span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> mpg, </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hp)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    geom_point(aes(</span><span style="color: #FD971F; font-style: italic">color</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">factor</span><span style="color: #F8F8F2">(gear)), </span><span style="color: #FD971F; font-style: italic">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    labs(</span><span style="color: #FD971F; font-style: italic">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;Number of Cylinders:&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">unique</span><span style="color: #F8F8F2">(data</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">cyl)),</span></span>
<span class="line"><span style="color: #F8F8F2">         </span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Miles per Gallon&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">         </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Horsepower&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    theme_minimal() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    theme(</span><span style="color: #FD971F; font-style: italic">legend.title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_blank()) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">    scale_color_discrete(</span><span style="color: #FD971F; font-style: italic">name</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Gears&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a list of ggplots using map()</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> data_list </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">  map(create_ggplot)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Combine the ggplots into a single plot using cowplot&#39;s plot_grid()</span></span>
<span class="line"><span style="color: #F8F8F2">combined_plot </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> plot_grid(</span><span style="color: #FD971F; font-style: italic">plotlist</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> ggplot_list, </span><span style="color: #FD971F; font-style: italic">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">align</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;v&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">rel_heights</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Display the combined plot</span></span>
<span class="line"><span style="color: #66D9EF">print</span><span style="color: #F8F8F2">(combined_plot)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="674e">In this example, we first use <code>split()</code> to create a list of data frames, one for each unique number of cylinders in the <code>mtcars</code> dataset. Then we define a custom function, <code>create_ggplot()</code>, that builds a ggplot for a given data frame: a scatterplot of horsepower (<code>hp</code>) against miles per gallon (<code>mpg</code>), with a title that shows the number of cylinders.</p>



<p class="wp-block-paragraph" id="1d4e">Next, we use <code>purrr</code>’s <code>map()</code> function to apply the custom function to each data frame in the list. This returns a list of ggplot objects, one per cylinder count.</p>



<p class="wp-block-paragraph" id="4483">The resulting combined plot is shown below:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="720" height="663" src="https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg.webp" alt="Scatterplots of horsepower against miles per gallon for each number of cylinders in mtcars, stacked in one column" class="wp-image-6139" srcset="https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg.webp 720w, https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg-500x460.webp 500w, https://analyticadss.com/wp-content/uploads/2023/04/1_0zvCrChg56Sq3kBk3neGSg-150x138.webp 150w" sizes="auto, (max-width: 720px) 100vw, 720px" /></figure>
</div>


<p class="wp-block-paragraph" id="aa70">The <code>create_ggplot()</code> function also includes a few tweaks to improve the aesthetics of the plots:</p>



<ol class="wp-block-list">
<li>We use <code>geom_point(aes(color = factor(gear)), size = 3)</code> to color the points by the number of gears and increase their size.</li>



<li>We apply <code>theme_minimal()</code> to use a minimalistic theme for the plots.</li>



<li>We remove the legend title using <code>theme(legend.title = element_blank())</code>.</li>



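<li>We set the color scale’s name to “Gears” with <code>scale_color_discrete(name = "Gears")</code>. Note, however, that this title does not appear in the output, because <code>theme(legend.title = element_blank())</code> hides all legend titles. Drop that <code>theme()</code> call if you want the “Gears” title to be shown.</li>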
</ol>



<p class="wp-block-paragraph" id="f475">Finally, we use the <code>plot_grid()</code> function from the <code>cowplot</code> package to stack the plots in <code>ggplot_list</code> into a single figure with one column (<code>ncol = 1</code>), aligned vertically (<code>align = "v"</code>), and then print the combined plot.</p>
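<p class="wp-block-paragraph">If you would rather view each plot on its own instead of combining them, <code>purrr</code>’s <code>walk()</code> is a natural fit. It calls a function purely for its side effect (here, printing) and invisibly returns its input. The following is a minimal sketch that rebuilds a simplified version of <code>create_ggplot()</code> (without the color and theme tweaks) so that the block runs on its own:</p>

```r
# Sketch assuming purrr and ggplot2 are installed; create_ggplot() is simplified here
library(purrr)
library(ggplot2)

# One data frame per cylinder count; split() names the list elements "4", "6", "8"
data_list <- split(mtcars, mtcars$cyl)

# Simplified plotting function (no color or theme tweaks)
create_ggplot <- function(data) {
  ggplot(data, aes(x = mpg, y = hp)) +
    geom_point() +
    labs(title = paste("Number of Cylinders:", unique(data$cyl)))
}

ggplot_list <- map(data_list, create_ggplot)

# walk() calls print() on each plot for its side effect and
# invisibly returns ggplot_list unchanged
walk(ggplot_list, print)
```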



<p class="wp-block-paragraph" id="2900">These examples show how the <code>purrr</code> package can replace repetitive, copy-and-paste code with a single function applied across a list. The result is data analysis code that is shorter, more readable, and easier to maintain. By incorporating <code>purrr</code> into your R projects, you can take full advantage of functional programming techniques and harness their power to solve complex problems.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h2 class="wp-block-heading has-medium-font-size" id="5467">Tips and Best Practices for Using purrr</h2>



<p class="wp-block-paragraph" id="e68e">In this section, we share some tips and best practices for using the <code>purrr</code> package in your R projects. These recommendations will help you write more efficient, readable, and maintainable code.</p>



<p class="wp-block-paragraph" id="6bbd"><strong>1. Use anonymous functions when appropriate</strong></p>



<p class="wp-block-paragraph" id="5346">When using <code>map()</code> functions, you can create anonymous functions using the <code>~</code> notation, which allows for concise and readable code. However, if the function becomes too complex or is used multiple times, consider defining it as a separate named function for better code organization and readability.</p>
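<p class="wp-block-paragraph">As an illustration, the calls below compute the same squares in three ways. <code>~ .x^2</code> is the formula shorthand described above. <code>\(x) x^2</code> is base R’s anonymous-function syntax (R 4.1 or later), which the <code>purrr</code> 1.0 documentation now recommends over formulas. <code>square_and_round()</code> is a hypothetical named function of the kind worth defining once the logic grows or is reused:</p>

```r
library(purrr)

numbers <- c(1.234, 2.5, 3.9)

# purrr formula shorthand: .x refers to the current element
squares_formula <- map_dbl(numbers, ~ .x^2)

# Base R anonymous function syntax (requires R >= 4.1)
squares_lambda <- map_dbl(numbers, \(x) x^2)

# Hypothetical named helper: clearer once the logic grows or is reused
square_and_round <- function(x, digits = 1) {
  round(x^2, digits)
}
squares_named <- map_dbl(numbers, square_and_round)

print(squares_formula)
print(squares_named)
```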



<p class="wp-block-paragraph" id="aafb"><strong>2. Leverage the power of function composition</strong></p>



<p class="wp-block-paragraph" id="d747">The <code>compose()</code> function allows you to create new functions by combining existing ones. This technique promotes code reusability and makes it easier to build complex functionality by breaking it down into simpler, more manageable parts.</p>
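<p class="wp-block-paragraph">Here is a minimal sketch, using two hypothetical helper functions. By default, <code>compose()</code> applies its functions from right to left, like mathematical composition, so <code>compose(f, g)(x)</code> equals <code>f(g(x))</code>. Pass <code>.dir = "forward"</code> to list them in the order they run:</p>

```r
library(purrr)

# Hypothetical helpers for illustration
remove_missing <- function(x) x[!is.na(x)]
rescale_0_1 <- function(x) (x - min(x)) / (max(x) - min(x))

# Default (backward) direction: remove_missing() runs first, then rescale_0_1()
clean_and_rescale <- compose(rescale_0_1, remove_missing)

# The same pipeline written in the order the functions run
clean_and_rescale_fwd <- compose(remove_missing, rescale_0_1, .dir = "forward")

x <- c(2, NA, 4, 6)
print(clean_and_rescale(x))

# A composed function can be mapped like any other function
rescaled <- map(list(a = c(1, 3, NA), b = c(10, 20, 30)), clean_and_rescale)
print(rescaled)
```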



<p class="wp-block-paragraph" id="76d8"><strong>3. Handle errors gracefully</strong></p>



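<p class="wp-block-paragraph" id="21ec">When applying a function to a list or vector, wrap it with <code>safely()</code> or <code>possibly()</code> so that an error in one element does not stop the whole computation. <code>safely()</code> returns a list with <code>result</code> and <code>error</code> components for every element, while <code>possibly()</code> substitutes a default value that you supply. The related <code>quietly()</code> captures printed output, messages, and warnings instead of letting them reach the console, but it does not catch errors. Used together, these functions keep your code robust when it meets unexpected input values.</p>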
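<p class="wp-block-paragraph">Here is a minimal sketch of all three wrappers applied to <code>log()</code>. The input list is made up so that its second element triggers an error:</p>

```r
library(purrr)

inputs <- list(10, "oops", 100)  # log("oops") raises an error

# safely(): every call returns list(result = ..., error = ...)
safe_log <- safely(log)
safe_results <- map(inputs, safe_log)
failed <- map_lgl(safe_results, function(res) !is.null(res$error))

# possibly(): returns a default value whenever an error occurs
log_or_na <- possibly(log, otherwise = NA_real_)
values <- map_dbl(inputs, log_or_na)

# quietly(): captures output, messages and warnings (errors still propagate)
quiet_log <- quietly(log)
warned <- quiet_log(-1)  # log(-1) gives NaN with a "NaNs produced" warning

print(failed)
print(values)
print(warned$warnings)
```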



<p class="wp-block-paragraph" id="97a5"><strong>4. Know when to use purrr vs. base R or dplyr</strong></p>



<p class="wp-block-paragraph" id="aac3">While <code>purrr</code> provides a powerful and flexible way to manipulate data, there are cases where base R or <code>dplyr</code> functions may be more appropriate or efficient. For example, if you need to perform simple operations on a data frame, consider using <code>dplyr</code> functions like <code>mutate()</code> or <code>summarize()</code>. Evaluate the needs of your specific task and choose the best tool for the job.</p>
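<p class="wp-block-paragraph">For example, each of the three snippets below computes the mean of every column in <code>mtcars</code>. The <code>purrr</code> and base R versions return a named numeric vector. The <code>dplyr</code> version, which uses <code>across()</code> (available since dplyr 1.0.0), returns a one-row data frame, and that is often more convenient inside a data-frame pipeline:</p>

```r
library(purrr)
library(dplyr)

# purrr: apply mean() to each column, returning a named double vector
means_purrr <- map_dbl(mtcars, mean)

# base R: a specialized, vectorized function for this exact task
means_base <- colMeans(mtcars)

# dplyr: summarise every column, returning a one-row data frame
means_dplyr <- mtcars %>% summarise(across(everything(), mean))

print(means_purrr)
print(means_dplyr)
```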



<p class="wp-block-paragraph" id="d5c9"><strong>5. Familiarize yourself with the purrr documentation</strong></p>



<p class="wp-block-paragraph" id="57fa">The <code>purrr</code> package has a wealth of functions and features that can help you streamline your code and solve complex problems. Make sure to consult the official documentation (<a href="https://purrr.tidyverse.org/" rel="noreferrer noopener" target="_blank">https://purrr.tidyverse.org/</a>) to explore its full capabilities and discover new techniques.</p>



<p class="wp-block-paragraph" id="eac3">By following these tips and best practices, you can fully leverage the power of the <code>purrr</code> package in your R projects, making your code more efficient, readable, and maintainable. Embrace the functional programming paradigm and use <code>purrr</code> to solve real-world data analysis challenges with ease.</p>



<hr class="wp-block-separator has-alpha-channel-opacity is-style-dots"/>



<h1 class="wp-block-heading" id="c363">Wrapping up</h1>



<p class="wp-block-paragraph" id="6914">Throughout this article, we’ve explored the capabilities and versatility of R’s <code>purrr</code> package for functional programming and data handling. We covered everything from foundational functional programming principles and the central role of the <code>map()</code> family of functions to more advanced topics such as working with nested data and handling errors.</p>



<p class="wp-block-paragraph" id="75dd">Using real-world scenarios, we’ve shown how <code>purrr</code> can simplify daunting tasks and streamline your scripts, making them easier to read and maintain. With <code>purrr</code> in your R toolkit, data manipulation and analysis challenges become considerably easier to tackle.</p>



<p class="wp-block-paragraph" id="78e2">As you explore the <code>purrr</code> package further, remember that mastery comes with practice. Experiment freely and look for creative ways to apply <code>purrr</code> in your own projects. With persistence, you’ll develop a deep understanding of its intricacies and become proficient at managing data in R.</p>



<p class="wp-block-paragraph" id="15ba">Happy coding!</p>



<p class="wp-block-paragraph" id="b396"><strong>Further Reading and Exploration:</strong></p>



<p class="wp-block-paragraph" id="79d8">For those eager to expand their expertise on <code>purrr</code> and R’s functional programming, consider the following treasure trove of resources:</p>



<ol class="wp-block-list">
<li><code>purrr</code>’s Official Guide: As a logical first step, the <code>purrr</code> package’s official documentation provides a thorough overview of everything the package offers. Explore the details on the <a href="https://purrr.tidyverse.org/" rel="noreferrer noopener" target="_blank">official site</a>.</li>



<li>R for Data Science: Written by Hadley Wickham and Garrett Grolemund, this freely available online book offers a comprehensive look at using R for data science. It includes a chapter on iteration that covers <code>purrr</code>’s functional programming tools. Read it <a href="https://r4ds.had.co.nz/" rel="noreferrer noopener" target="_blank">here</a>.</li>



<li>Advanced R: Hadley Wickham’s “Advanced R” goes deeper into the more intricate aspects of R, including a detailed treatment of functional programming. Read it online <a href="https://adv-r.hadley.nz/" rel="noreferrer noopener" target="_blank">here</a>.</li>



<li>RStudio’s Vibrant Community: Seeking advice, hoping to discuss new findings, or simply aiming to network? The RStudio community is a hub of enthusiasts, experts, and curious minds. Engage with like-minded individuals <a href="https://community.rstudio.com/" rel="noreferrer noopener" target="_blank">right here</a>.</li>
</ol>



<p class="wp-block-paragraph" id="238c">Working through these resources and engaging with the wider R community will sharpen your skills with both the <code>purrr</code> package and functional programming in R. Keep exploring, experimenting, and learning with others as you grow as a data scientist and R programmer.</p>
<p>The post <a href="https://analyticadss.com/unleash-the-power-of-functional-programming-in-r-with-the-purrr-package/">Unleash the Power of Functional Programming in R with the purrr Package</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Introduction to Probability and Statistics: Basic Concepts and Terminology with Visuals — Part I</title>
		<link>https://analyticadss.com/introduction-to-probability-and-statistics-basic-concepts-and-terminology-with-visuals-part-i/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Wed, 22 Mar 2023 21:20:10 +0000</pubDate>
				<category><![CDATA[Artificial Intelligence]]></category>
		<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[analytica data science solution]]></category>
		<category><![CDATA[analyticadss]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=6126</guid>

					<description><![CDATA[<p>Welcome to the first part of our series, “Demystifying Data Science: A Comprehensive Guide for Beginners.” This series is designed to help aspiring data scientists gain a solid understanding of the fundamental concepts and techniques in the field of data science. We will explore various topics, including probability, statistics, machine learning, and data visualization, with [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/introduction-to-probability-and-statistics-basic-concepts-and-terminology-with-visuals-part-i/">Introduction to Probability and Statistics: Basic Concepts and Terminology with Visuals — Part I</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph" id="01da">Welcome to the first part of our series, “Demystifying Data Science: A Comprehensive Guide for Beginners.” This series is designed to help aspiring data scientists gain a solid understanding of the fundamental concepts and techniques in the field of data science. We will explore various topics, including probability, statistics, machine learning, and data visualization, with a strong emphasis on practical examples and visual aids. In this first installment, we will delve into probability and statistics, covering essential concepts such as probability fundamentals, descriptive statistics, and inferential statistics. Stay tuned for more engaging and informative content in the upcoming parts of this series!</p>



<h2 class="wp-block-heading" id="a72b">Introduction</h2>



<p class="wp-block-paragraph" id="ecb6">Probability and statistics are essential disciplines for data scientists, analysts, and researchers. They provide a solid foundation for understanding, interpreting, and drawing meaningful conclusions from data. As the demand for data-driven insights and decision-making grows across various industries, a strong grasp of these concepts is crucial for anyone seeking a career in data science or looking to enhance their analytical skills.</p>



<p class="wp-block-paragraph" id="21de">In this blog post, we will introduce probability and statistics by exploring their basic concepts and terminology. We will explain the core principles and ideas that underpin these disciplines, using real-world examples and R code to help you visualize and understand these concepts in action. By incorporating visuals, we aim to make the material more engaging and easier to comprehend, allowing you to build a strong foundation for future learning.</p>



<p class="wp-block-paragraph" id="654c">Our journey begins with probability fundamentals, including definitions, types of probability, and essential rules. Descriptive statistics follow in the next part of this series, covering measures of central tendency, dispersion, and shape. Throughout this post, we use the tidyverse collection of R packages, in particular dplyr for data manipulation and ggplot2 for data visualization. ggplot2 offers a powerful and flexible way to create high-quality graphics, which helps with both data exploration and communication.</p>



<p class="wp-block-paragraph" id="926c">By the end of this post, you will have a solid understanding of basic probability concepts, supported by clear visualizations. This foundational knowledge will prepare you for more advanced topics and techniques in data analysis and machine learning, setting you on a path to success in the ever-evolving world of data science.</p>



<p class="wp-block-paragraph" id="eb8f">Stay tuned as we dive into the fascinating realm of probability and statistics, providing you with practical examples and insights to enhance your understanding and skills.</p>



<h2 class="wp-block-heading" id="2e47">Probability Fundamentals</h2>



<p class="wp-block-paragraph" id="8ec0">Probability is the study of randomness and uncertainty. It provides a way to quantify the likelihood of specific outcomes or events occurring in various situations. Understanding probability fundamentals is crucial for data science, as it underlies many statistical techniques and machine learning algorithms. In this section, we will elaborate on the basic concepts, principles, and rules of probability, using real-world data when possible.</p>



<p class="wp-block-paragraph" id="0f87"><strong>I. Definitions:</strong></p>



<ul class="wp-block-list">
<li>Experiment: An action or procedure that results in one of several possible outcomes. For example, rolling a die is an experiment with six possible outcomes (1, 2, 3, 4, 5, or 6).</li>



<li>Outcome: The result of an experiment. In the die-rolling example, if the die lands on 3, then the outcome is 3.</li>



<li>Event: A set of one or more outcomes. In the die-rolling example, an event could be the die showing an even number, which includes the outcomes {2, 4, 6}.</li>



<li>Sample Space: The set of all possible outcomes of an experiment. For the die-rolling example, the sample space is {1, 2, 3, 4, 5, 6}.</li>
</ul>
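<p class="wp-block-paragraph">These definitions map directly onto basic R vectors. The following minimal sketch uses base R only, and the variable names are illustrative choices of our own. It represents the die’s sample space as a vector and an event as a subset of it. It then checks whether a single outcome belongs to that event.</p>

```r
# Sample space: every possible outcome of rolling one six-sided die
sample_space <- 1:6

# Event "the die shows an even number": a subset of the sample space
even_event <- sample_space[sample_space %% 2 == 0]
even_event   # 2 4 6

# One outcome of the experiment, e.g. the die lands on 3
outcome <- 3

# The event occurs only if the observed outcome belongs to it
outcome %in% even_event   # FALSE

# Sanity check: an event can only contain outcomes from the sample space
all(even_event %in% sample_space)   # TRUE
```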



<p class="wp-block-paragraph" id="4431"><strong>II. Types of Probability:</strong></p>



<ul class="wp-block-list">
<li><strong>Classical</strong>: Based on the assumption that all outcomes are equally likely. In the die-rolling example, the classical probability of getting an even number is 1/2 (3 even numbers out of 6 possible outcomes).</li>



<li><strong>Relative Frequency</strong>: Based on the observed frequencies of outcomes in a sample. For instance, suppose we roll a die 100 times and observe 40 even numbers. The relative frequency of getting an even number is 40/100 = 0.4.</li>



<li><strong>Subjective</strong>: Based on an individual’s personal judgment or belief. A person may believe that it is more likely to rain tomorrow based on their interpretation of weather patterns and past experiences, even if objective data suggests otherwise.</li>
</ul>
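<p class="wp-block-paragraph">To see how the classical and relative-frequency views relate, the sketch below compares the classical probability of rolling an even number with the proportion of even numbers in simulated rolls of a fair die. It uses base R only, and the seed and roll counts are arbitrary choices for illustration. As the number of rolls grows, the relative frequency typically settles closer to 1/2, although any single run will fluctuate.</p>

```r
set.seed(123)  # arbitrary seed; results differ for other seeds

sample_space <- 1:6

# Classical probability: 3 even faces out of 6 equally likely faces
p_classical <- mean(sample_space %% 2 == 0)   # 0.5

# Relative frequency of an even number for increasing numbers of rolls
n_rolls <- c(10, 100, 1000, 10000)
p_relative <- sapply(n_rolls, function(n) {
  rolls <- sample(sample_space, size = n, replace = TRUE)
  mean(rolls %% 2 == 0)   # proportion of rolls that came up even
})

data.frame(
  n_rolls = n_rolls,
  relative_frequency = p_relative,
  distance_from_classical = abs(p_relative - p_classical)
)
```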



<p class="wp-block-paragraph" id="7da8">Let’s use a real-world dataset to illustrate the relative frequency approach. We will analyze the number of cylinders in vehicles from the <code>mtcars</code> dataset, which is included in R. We will calculate the relative frequency of vehicles with 4, 6, and 8 cylinders.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7083740234375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(knitr)
library(dplyr)  # provides count(), mutate() and the %>% pipe
data(mtcars)

# Count the vehicles for each number of cylinders
cylinder_counts <- mtcars %>% count(cyl)
total_cars <- sum(cylinder_counts$n)

# Relative frequency = count / total number of vehicles
relative_frequencies <- cylinder_counts %>%
  mutate(relative_frequency = n / total_cars)

kable(relative_frequencies, caption = &quot;Relative Frequencies of Cylinder Counts in Vehicles&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(knitr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(dplyr)  </span><span style="color: #88846F"># provides count(), mutate() and the %>% pipe</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Count the vehicles for each number of cylinders</span></span>
<span class="line"><span style="color: #F8F8F2">cylinder_counts </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> count(cyl)</span></span>
<span class="line"><span style="color: #F8F8F2">total_cars </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">sum</span><span style="color: #F8F8F2">(cylinder_counts</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">n)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Relative frequency = count / total number of vehicles</span></span>
<span class="line"><span style="color: #F8F8F2">relative_frequencies </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> cylinder_counts </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  mutate(</span><span style="color: #FD971F; font-style: italic">relative_frequency</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> n </span><span style="color: #F92672">/</span><span style="color: #F8F8F2"> total_cars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F8F8F2">kable(relative_frequencies, </span><span style="color: #FD971F; font-style: italic">caption</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Relative Frequencies of Cylinder Counts in Vehicles&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="1024" height="218" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw-1024x218.webp" alt="" class="wp-image-6127" srcset="https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw-1024x218.webp 1024w, https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw-500x106.webp 500w, https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw-150x32.webp 150w, https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw-768x163.webp 768w, https://analyticadss.com/wp-content/uploads/2023/03/1_440_Wk8vvm9A0kbPPg8qcw.webp 1400w" sizes="auto, (max-width: 1024px) 100vw, 1024px" /><figcaption class="wp-element-caption">Relative Frequencies of Cylinder Counts in Vehicles</figcaption></figure>
</div>


<p class="wp-block-paragraph" id="ee7d">In the table above, the “cyl” column represents the number of cylinders in a vehicle, the “n” column shows the count of vehicles with the corresponding number of cylinders, and the “relative_frequency” column displays the relative frequency of each cylinder count in the dataset.</p>
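<p class="wp-block-paragraph">If you prefer base R to dplyr, the same relative frequencies can be obtained with <code>table()</code> and <code>prop.table()</code>. This is an alternative sketch, not the approach used above. The <code>mtcars</code> data set contains 32 vehicles: 11 with 4 cylinders, 7 with 6, and 14 with 8.</p>

```r
data(mtcars)

# table() counts vehicles per cylinder value; prop.table() divides by the total
cyl_relative <- prop.table(table(mtcars$cyl))
cyl_relative   # 4: 0.34375, 6: 0.21875, 8: 0.4375 (11, 7 and 14 of 32 cars)

# Relative frequencies over the whole sample space always sum to 1
sum(cyl_relative)
```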



<p class="wp-block-paragraph" id="88cb">Now, let’s visualize the relative frequencies using ggplot2.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.6944427490234375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(tidyverse)

ggplot(relative_frequencies, aes(x = factor(cyl), y = relative_frequency)) +
  geom_col(fill = &quot;steelblue&quot;) +
  labs(title = &quot;Relative Frequency of Cylinder Counts in Vehicles&quot;,
       x = &quot;Number of Cylinders&quot;,
       y = &quot;Relative Frequency&quot;) +
  theme_minimal()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F8F8F2">ggplot(relative_frequencies, aes(</span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">factor</span><span style="color: #F8F8F2">(cyl), </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> relative_frequency)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_col(</span><span style="color: #FD971F; font-style: italic">fill</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;steelblue&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span><span style="color: #FD971F; font-style: italic">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Relative Frequency of Cylinder Counts in Vehicles&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">       </span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Number of Cylinders&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">       </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Relative Frequency&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_minimal()</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="496" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2023/03/1_I7NZ2TsdPjhhQIZz1bNO8Q.webp" alt="" class="wp-image-6128" srcset="https://analyticadss.com/wp-content/uploads/2023/03/1_I7NZ2TsdPjhhQIZz1bNO8Q.webp 828w, https://analyticadss.com/wp-content/uploads/2023/03/1_I7NZ2TsdPjhhQIZz1bNO8Q-500x300.webp 500w, https://analyticadss.com/wp-content/uploads/2023/03/1_I7NZ2TsdPjhhQIZz1bNO8Q-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2023/03/1_I7NZ2TsdPjhhQIZz1bNO8Q-768x460.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="f3cf">Another way to demonstrate the relative frequency approach is to simulate a die experiment. We will roll a fair six-sided die 1,000 times, calculate the relative frequency of each outcome (1, 2, 3, 4, 5, and 6), and visualize the results using ggplot2.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.708335876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="set.seed(42) # Set seed for reproducibility
n_rolls <- 1000
die_rolls <- sample(1:6, size = n_rolls, replace = TRUE)

die_rolls_df <- data.frame(outcome = die_rolls) %>%
  count(outcome) %>%
  mutate(relative_frequency = n / n_rolls)

die_rolls_df" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">set.seed</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">42</span><span style="color: #F8F8F2">) </span><span style="color: #88846F"># Set seed for reproducibility</span></span>
<span class="line"><span style="color: #F8F8F2">n_rolls </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1000</span></span>
<span class="line"><span style="color: #F8F8F2">die_rolls </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">sample</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #AE81FF">6</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F; font-style: italic">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> n_rolls, </span><span style="color: #FD971F; font-style: italic">replace</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F8F8F2">die_rolls_df </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">data.frame</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F; font-style: italic">outcome</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> die_rolls) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  count(outcome) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  mutate(</span><span style="color: #FD971F; font-style: italic">relative_frequency</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> n </span><span style="color: #F92672">/</span><span style="color: #F8F8F2"> n_rolls)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F8F8F2">die_rolls_df</span></span></code></pre></div>



<p class="wp-block-paragraph" id="8671">This produces a data frame with the count and relative frequency of each outcome:</p>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="301" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2023/03/1_6cqlVMLGN0pwlyuyCIZKug.webp" alt="" class="wp-image-6129" srcset="https://analyticadss.com/wp-content/uploads/2023/03/1_6cqlVMLGN0pwlyuyCIZKug.webp 828w, https://analyticadss.com/wp-content/uploads/2023/03/1_6cqlVMLGN0pwlyuyCIZKug-500x182.webp 500w, https://analyticadss.com/wp-content/uploads/2023/03/1_6cqlVMLGN0pwlyuyCIZKug-150x55.webp 150w, https://analyticadss.com/wp-content/uploads/2023/03/1_6cqlVMLGN0pwlyuyCIZKug-768x279.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="d448">Now, let’s create a bar chart to visualize the relative frequencies:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.6944427490234375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="ggplot(die_rolls_df, aes(x = factor(outcome), y = relative_frequency)) +
  geom_col(fill = &quot;steelblue&quot;) +
  labs(title = &quot;Die Roll Simulation&quot;,
       x = &quot;Outcome&quot;,
       y = &quot;Relative Frequency&quot;) +
  theme_minimal()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki monokai" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">ggplot(die_rolls_df, aes(</span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">factor</span><span style="color: #F8F8F2">(outcome), </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> relative_frequency)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_col(</span><span style="color: #FD971F; font-style: italic">fill</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;steelblue&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span><span style="color: #FD971F; font-style: italic">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Die Roll Simulation&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">       </span><span style="color: #FD971F; font-style: italic">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Outcome&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">       </span><span style="color: #FD971F; font-style: italic">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Relative Frequency&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_minimal()</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="488" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2023/03/1_qLn-3a0Vi2UpTqWf5CQOeg.webp" alt="" class="wp-image-6130" srcset="https://analyticadss.com/wp-content/uploads/2023/03/1_qLn-3a0Vi2UpTqWf5CQOeg.webp 828w, https://analyticadss.com/wp-content/uploads/2023/03/1_qLn-3a0Vi2UpTqWf5CQOeg-500x295.webp 500w, https://analyticadss.com/wp-content/uploads/2023/03/1_qLn-3a0Vi2UpTqWf5CQOeg-150x88.webp 150w, https://analyticadss.com/wp-content/uploads/2023/03/1_qLn-3a0Vi2UpTqWf5CQOeg-768x453.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="f84b">The resulting bar chart displays the relative frequencies of each outcome from the die roll simulation. The chart illustrates the concept of relative frequency by showing the proportion of each outcome observed in the experiment.</p>
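<p class="wp-block-paragraph">To connect the simulation back to the classical approach, one option is to compare each observed relative frequency with the theoretical value of 1/6 for a fair die. The sketch below is written in base R, and the column names are our own. It reuses the seed and <code>sample()</code> call from the simulation above, so it should regenerate the same rolls on the same R version.</p>

```r
set.seed(42)  # same seed and call as the simulation above
n_rolls <- 1000
die_rolls <- sample(1:6, size = n_rolls, replace = TRUE)

# factor() with explicit levels keeps every face in the table,
# even one that (unusually) never came up
observed <- prop.table(table(factor(die_rolls, levels = 1:6)))

comparison <- data.frame(
  outcome = 1:6,
  relative_frequency = as.numeric(observed),
  classical = 1 / 6                       # fair die: every face equally likely
)
comparison$difference <- comparison$relative_frequency - comparison$classical
comparison
```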



<p class="wp-block-paragraph" id="5529"><strong>III. Probability Rules:</strong></p>



<p class="wp-block-paragraph" id="e648"><strong>Addition Rule</strong>: The addition rule helps us calculate the probability of either event A or event B (or both) occurring. The rule is defined as:</p>



<p class="wp-block-paragraph" id="4952">P(A ∪ B) = P(A) + P(B) − P(A ∩ B)</p>



<p class="wp-block-paragraph" id="9525">Here, P(A ∪ B) represents the probability of event A or event B occurring, while P(A ∩ B) denotes the probability of both events A and B happening together.</p>



<p class="wp-block-paragraph" id="c9a1">Example: Suppose we have a deck of 52 playing cards. What is the probability of drawing either a red card (hearts or diamonds) or a queen?</p>



<p class="wp-block-paragraph" id="85c6">There are 26 red cards and 4 queens in the deck, but 2 of the queens are also red cards (queen of hearts and queen of diamonds). So, applying the addition rule:</p>



<p class="wp-block-paragraph" id="6ec3">P(Red ∪ Queen) = P(Red) + P(Queen) − P(Red ∩ Queen)<br>P(Red ∪ Queen) = (26/52) + (4/52) − (2/52) = 28/52 ≈ 0.5385</p>
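<p class="wp-block-paragraph">The card example can also be checked by brute force. The sketch below uses base R, and the deck representation is an illustrative choice. It builds all 52 cards and computes each probability as a proportion of the deck. It then confirms that the addition rule agrees with a direct count of the cards that are red or a queen.</p>

```r
# Build a standard 52-card deck: 13 ranks x 4 suits
ranks <- c("A", 2:10, "J", "Q", "K")
suits <- c("hearts", "diamonds", "clubs", "spades")
deck <- expand.grid(rank = ranks, suit = suits, stringsAsFactors = FALSE)

is_red   <- deck$suit %in% c("hearts", "diamonds")
is_queen <- deck$rank == "Q"

p_red       <- mean(is_red)              # 26/52
p_queen     <- mean(is_queen)            # 4/52
p_red_queen <- mean(is_red & is_queen)   # 2/52 (red queens are counted twice)

# Addition rule: subtract the overlap so it is counted only once
p_addition <- p_red + p_queen - p_red_queen

# Direct count of cards that are red OR a queen
p_direct <- mean(is_red | is_queen)

c(addition_rule = p_addition, direct_count = p_direct)  # both 28/52, about 0.5385
```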



<p class="wp-block-paragraph" id="257a"><strong>Multiplication Rule</strong>: The multiplication rule helps us determine the probability that both event A and event B occur. The events do not have to happen at the same moment; they may also occur in sequence, as in the example below. The rule is defined as:</p>



<p class="wp-block-paragraph" id="8dea">P(A ∩ B) = P(A|B) * P(B)</p>



<p class="wp-block-paragraph" id="dd18">Here, P(A|B) represents the probability of event A occurring given that event B has occurred.</p>



<p class="wp-block-paragraph" id="f14b">Example: Consider a bag containing 5 blue and 3 red balls. We draw two balls from the bag without replacement. What is the probability of drawing a blue ball first, followed by a red ball?</p>



<p class="wp-block-paragraph" id="d5d1">To apply the multiplication rule, we need the probability of the first draw and the conditional probability of the second draw given the first:</p>



<p class="wp-block-paragraph" id="80b0">P(Blue1) = 5/8<br>P(Red2|Blue1) = 3/7 (after one blue ball is removed, 7 balls remain, 3 of which are red)</p>



<p class="wp-block-paragraph" id="3d0a">Now, we can compute the probability of both events happening together:</p>



<p class="wp-block-paragraph" id="21c4">P(Blue1 ∩ Red2) = P(Blue1) * P(Red2|Blue1) = (5/8) * (3/7) ≈ 0.2679</p>
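
<p class="wp-block-paragraph">The same result can be checked by simulation. The sketch below computes the exact answer with the multiplication rule and then estimates it by repeatedly drawing two balls without replacement using <code>sample()</code>. The seed and the number of simulated draws (<code>n_sims</code>) are arbitrary choices for illustration.</p>

```r
set.seed(42)  # make the simulation reproducible

# The bag: 5 blue and 3 red balls
bag <- c(rep("blue", 5), rep("red", 3))

# Exact answer from the multiplication rule: P(Blue1) * P(Red2 | Blue1)
p_exact <- (5 / 8) * (3 / 7)

# Monte Carlo check: draw two balls without replacement many times
n_sims <- 100000
draws <- replicate(n_sims, sample(bag, size = 2, replace = FALSE))

# draws is a 2 x n_sims matrix: row 1 is the first ball, row 2 the second
p_sim <- mean(draws[1, ] == "blue" & draws[2, ] == "red")

print(c(exact = p_exact, simulated = p_sim))
```

<p class="wp-block-paragraph">The exact value is 15/56 ≈ 0.2679, and the simulated proportion should land close to it; increasing <code>n_sims</code> tightens the agreement.</p>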



<p class="wp-block-paragraph" id="8764">In the next part of this series, we will cover topics related to Descriptive Statistics.</p>



<p class="wp-block-paragraph" id="e9c0">For more practical tips and insights on AI, data science, and statistics, explore our blog at <a href="https://analyticadss.com/blog/" rel="noreferrer noopener" target="_blank">Analytica Data Science Solutions</a>. Discover engaging content to expand your knowledge and stay up-to-date with the latest developments.</p>



<p class="wp-block-paragraph" id="a9b4">In this blog post, we have covered the basics of probability and statistics. If you wish to further expand your knowledge and understanding, here are some references to help you dive deeper into these topics:</p>



<p class="wp-block-paragraph" id="f5eb">Books:</p>



<ol class="wp-block-list">
<li>DeGroot, M. H., & Schervish, M. J. (2012). Probability and Statistics (4th ed.). Pearson.</li>



<li>Wackerly, D., Mendenhall, W., & Scheaffer, R. L. (2007). Mathematical Statistics with Applications (7th ed.). Cengage Learning.</li>



<li>Casella, G., & Berger, R. L. (2001). Statistical Inference (2nd ed.). Duxbury Press.</li>



<li>Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.</li>



<li>Wickham, H., & Grolemund, G. (2016). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media.</li>
</ol>



<p class="wp-block-paragraph" id="ca60">Websites:</p>



<ol class="wp-block-list">
<li>Khan Academy — Probability and Statistics: <a href="https://www.khanacademy.org/math/statistics-probability" rel="noreferrer noopener" target="_blank">https://www.khanacademy.org/math/statistics-probability</a></li>



<li>Stat Trek — Teach yourself statistics: <a href="https://stattrek.com/" rel="noreferrer noopener" target="_blank">https://stattrek.com/</a></li>



<li>Carnegie Mellon University Probability & Statistics: <a href="https://oli.cmu.edu/courses/probability-statistics-open-free/" rel="noreferrer noopener" target="_blank">https://oli.cmu.edu/courses/probability-statistics-open-free/</a></li>
</ol>



<p class="wp-block-paragraph" id="48bc">Online Courses:</p>



<ol class="wp-block-list">
<li>Coursera — Statistics with R Specialization by Duke University: <a href="https://www.coursera.org/specializations/statistics" rel="noreferrer noopener" target="_blank">https://www.coursera.org/specializations/statistics</a></li>



<li>Coursera — Introduction to Probability and Data with R by Duke University: <a href="https://www.coursera.org/learn/probability-intro" rel="noreferrer noopener" target="_blank">https://www.coursera.org/learn/probability-intro</a></li>



<li>edX — Probability and Statistics in Data Science using Python by the University of California, San Diego: <a href="https://www.edx.org/course/probability-and-statistics-in-data-science-using-python" rel="noreferrer noopener" target="_blank">https://www.edx.org/course/probability-and-statistics-in-data-science-using-python</a></li>



<li>DataCamp — Introduction to Probability in R: <a href="https://www.datacamp.com/courses/introduction-to-probability-in-r" rel="noreferrer noopener" target="_blank">https://www.datacamp.com/courses/introduction-to-probability-in-r</a></li>



<li>DataCamp — Foundations of Probability in R: <a href="https://www.datacamp.com/courses/foundations-of-probability-in-r" rel="noreferrer noopener" target="_blank">https://www.datacamp.com/courses/foundations-of-probability-in-r</a></li>
</ol>



<p class="wp-block-paragraph">Read more posts on the AnalyticaDSS blog here: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p class="wp-block-paragraph">Read more of our posts on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>
<p>The post <a href="https://analyticadss.com/introduction-to-probability-and-statistics-basic-concepts-and-terminology-with-visuals-part-i/">Introduction to Probability and Statistics: Basic Concepts and Terminology with Visuals — Part I</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>The Tidyverse and data.table R Packages</title>
		<link>https://analyticadss.com/the-tidyverse-and-data-table-r-packages/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Sun, 14 Feb 2021 15:21:31 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Tidyverse]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4821</guid>

					<description><![CDATA[<p>“The Tidyverse and data.table R Packages” The power of R comes from the vast collection of software libraries, i.e. packages, that can be easily installed and loaded in R. Today we will cover two of the most powerful packages in R, the tidyverse and data.table packages. The tidyverse and data.table are two popular packages in R that provide functions for working with data. [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/the-tidyverse-and-data-table-r-packages/">The Tidyverse and data.table R Packages</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<p class="wp-block-paragraph">“The Tidyverse and data.table R Packages”</p>



<p class="wp-block-paragraph" id="73a3">Much of R's power comes from its vast collection of software libraries, i.e., packages, that can be easily installed and loaded. Today we will cover two of the most powerful and widely used: the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> packages.</p>



<p class="wp-block-paragraph" id="12a6">The <strong><code>tidyverse</code> </strong>and <strong><code>data.table</code> </strong>are two popular packages in R that provide functions for working with data. They both have their own strengths and are suitable for different types of tasks.</p>



<p class="wp-block-paragraph" id="df56">The <strong><code>tidyverse</code> </strong>is a collection of packages designed for data manipulation, visualization, and modeling. It is built around the principles of tidy data: each variable is a column, each observation is a row, and each value is a cell. The <strong><code>tidyverse</code> </strong>includes packages such as <code><strong>dplyr</strong></code>, <code><strong>tidyr</strong></code>, and <code><strong>ggplot2</strong></code>, which provide functions for data manipulation, cleaning, and visualization, respectively.</p>
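
<p class="wp-block-paragraph">As a small illustration of the tidy-data principle, the sketch below uses <code>tidyr::pivot_longer()</code> to reshape a wide table (one column per year) into tidy form, where each row is a single store-year observation. The <code>sales_wide</code> data is made up for this example.</p>

```r
library(tidyr)

# Hypothetical "wide" data: one sales column per year
sales_wide <- data.frame(
  store = c("A", "B", "C"),
  `2021` = c(100, 150, 90),
  `2022` = c(120, 160, 95),
  check.names = FALSE
)

# Tidy form: every column except store is gathered into year/sales pairs
sales_long <- pivot_longer(
  sales_wide,
  cols = -store,
  names_to = "year",
  values_to = "sales"
)

print(sales_long)
```

<p class="wp-block-paragraph">In the tidy version, "year" is a variable with its own column instead of being spread across column names, which is the shape that <code>dplyr</code> and <code>ggplot2</code> functions expect.</p>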



<p class="wp-block-paragraph" id="1c2b">One of the main advantages of the <strong><code>tidyverse</code> </strong>is its simplicity. Its functions are easy to learn, often require fewer lines of code, and share a consistent design: verbs such as <code>select()</code>, <code>filter()</code>, and <code>summarize()</code> take a data frame as their first argument and return a data frame, so they can be chained with the pipe operator <code>%>%</code>.</p>
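
<p class="wp-block-paragraph">To see what the pipe buys you, the sketch below writes the same two-step operation (keep only compact cars, then sort by highway mileage) once with nested calls and once with <code>%>%</code>. It assumes <code>dplyr</code> and <code>ggplot2</code> are installed; <code>ggplot2</code> is loaded only for its <code>mpg</code> dataset.</p>

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Nested calls are read from the inside out
nested <- arrange(filter(mpg, class == "compact"), desc(hwy))

# With the pipe, the same steps read top to bottom
piped <- mpg %>%
  filter(class == "compact") %>%
  arrange(desc(hwy))

# Both approaches return the same tibble
identical(nested, piped)
```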



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="c2b4">Tidyverse Examples</h2>



<p class="wp-block-paragraph" id="2da7">Here are some examples of how to use the <code><strong>tidyverse</strong></code>:</p>



<p class="wp-block-paragraph" id="2df8">To select specific columns from a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Select the &quot;manufacturer&quot; and &quot;model&quot; columns
mpg %>% select(manufacturer, model)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Select the &quot;manufacturer&quot; and &quot;model&quot; columns</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> select(manufacturer, model)</span></span></code></pre></div>
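
<p class="wp-block-paragraph"><code>select()</code> also understands tidyselect helpers, so columns can be picked by type, by position range, or by exclusion instead of being listed one by one. A short sketch, assuming dplyr 1.0.0 or later (needed for <code>where()</code>):</p>

```r
library(dplyr)
library(ggplot2)  # provides the mpg dataset

# Every numeric column
mpg_numeric <- mpg %>% select(where(is.numeric))

# A contiguous range of columns, from manufacturer through year
mpg_range <- mpg %>% select(manufacturer:year)

# Every column except trans and fl
mpg_dropped <- mpg %>% select(-c(trans, fl))

names(mpg_numeric)
names(mpg_range)
names(mpg_dropped)
```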



<p class="wp-block-paragraph" id="8924">And to group and summarize a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column
mpg %>% group_by(class) %>% summarize(mean_hwy = mean(hwy))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> group_by(class) </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> summarize(</span><span style="color: #FD971F">mean_hwy</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(hwy))</span></span></code></pre></div>



<p class="wp-block-paragraph" id="d645">To join two datasets:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Build a second table with the average number of cylinders per manufacturer
cylinders <- mpg %>%
  group_by(manufacturer) %>%
  summarize(mean_cyl = mean(cyl))

# Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column
mpg %>% left_join(cylinders, by = &quot;manufacturer&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Build a second table with the average number of cylinders per manufacturer</span></span>
<span class="line"><span style="color: #F8F8F2">cylinders </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mpg </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  group_by(manufacturer) </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  summarize(</span><span style="color: #FD971F">mean_cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(cyl))</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Join the mpg and cylinders datasets on the &quot;manufacturer&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> left_join(cylinders, </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;manufacturer&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>
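
<p class="wp-block-paragraph">It helps to know how <code>left_join()</code> treats keys that have no match. The sketch below uses two tiny, made-up tables (<code>car_models</code> and <code>origin</code> are hypothetical, not datasets shipped with any package) to contrast <code>left_join()</code>, which keeps every row of the left table and fills missing matches with <code>NA</code>, with <code>inner_join()</code>, which keeps only rows whose key appears in both tables.</p>

```r
library(dplyr)

# Two small, hypothetical tables sharing a "manufacturer" key
car_models <- tibble(
  manufacturer = c("audi", "ford", "honda"),
  model        = c("a4", "mustang", "civic")
)
origin <- tibble(
  manufacturer = c("audi", "ford"),
  country      = c("Germany", "USA")
)

# left_join(): all 3 rows of car_models; honda gets country = NA
all_cars <- left_join(car_models, origin, by = "manufacturer")

# inner_join(): only the 2 manufacturers present in both tables
matched <- inner_join(car_models, origin, by = "manufacturer")

print(all_cars)
print(matched)
```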



<p class="wp-block-paragraph" id="68fd">To perform a linear regression using the <code>lm</code> function from the <code>stats</code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse and stats packages
library(tidyverse)
library(stats)

# Load the mtcars dataset
data(mtcars)

# Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable
fit <- mtcars %>% 
  lm(mpg ~ wt, data = .)

# Summarize the model results
summary(fit)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse and stats packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stats)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mtcars dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable</span></span>
<span class="line"><span style="color: #F8F8F2">fit </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars </span><span style="color: #F92672">%>%</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt, </span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> .)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Summarize the model results</span></span>
<span class="line"><span style="color: #66D9EF">summary</span><span style="color: #F8F8F2">(fit)</span></span></code></pre></div>
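
<p class="wp-block-paragraph">The printed <code>summary()</code> output is hard to reuse in further analysis. One tidyverse-friendly option is the <code>broom</code> package, which is installed with the tidyverse but is not attached by <code>library(tidyverse)</code>. A minimal sketch, assuming <code>broom</code> is available:</p>

```r
library(broom)

# Fit the same model: mpg as a function of wt
fit <- lm(mpg ~ wt, data = mtcars)

# tidy(): one row per coefficient (term, estimate, std.error, statistic, p.value)
coefs <- tidy(fit)
print(coefs)

# glance(): a one-row tibble of model-level statistics such as r.squared
fit_stats <- glance(fit)
print(fit_stats$r.squared)
```

<p class="wp-block-paragraph">Because the results are ordinary tibbles, they can be filtered, joined, or plotted with the same <code>dplyr</code> and <code>ggplot2</code> functions used elsewhere in this post.</p>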



<p class="wp-block-paragraph" id="9d7e">To create a scatterplot matrix, use the <code>scatterplotMatrix</code> function from the <code>car</code> package (<code>car</code> is not part of the tidyverse, so it must be installed separately):</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse and car packages
library(tidyverse)
library(car)

# Load the iris dataset
data(iris)

# Scatterplot matrix of the four numeric measurement columns (Species is a factor)
scatterplotMatrix(iris[, 1:4], smooth = FALSE)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse and car packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(car)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the iris dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Scatterplot matrix of the four numeric measurement columns (Species is a factor)</span></span>
<span class="line"><span style="color: #F8F8F2">scatterplotMatrix(iris[, </span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">], </span><span style="color: #FD971F">smooth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="9cfb">To create a faceted histogram with <code><strong>ggplot2</strong></code>:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the tidyverse package
library(tidyverse)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ class + drv, nrow = 2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the tidyverse package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyverse)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(mpg, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hwy)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_histogram(</span><span style="color: #FD971F">binwidth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> class </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> drv, </span><span style="color: #FD971F">nrow</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



<h2 class="wp-block-heading" id="804c">data.table Examples</h2>



<p class="wp-block-paragraph" id="17bf">The <code><strong>data.table</strong></code> package, on the other hand, is a high-performance package for working with large datasets. It provides a concise syntax for filtering, transforming, and aggregating data efficiently. Note that <code><strong>data.table</strong></code> works in memory: it shines on large datasets that still fit in RAM (millions of rows or more) and when you need to perform complex grouped operations on them, but it does not by itself handle data that is too large to fit in memory.</p>
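
<p class="wp-block-paragraph">Reading and writing large files is one everyday place where data.table's efficiency shows up. The sketch below simulates a one-million-row table (the column names and values are arbitrary) and writes and reads it back with <code>fwrite()</code> and <code>fread()</code>, data.table's fast CSV writer and reader.</p>

```r
library(data.table)

set.seed(1)
n <- 1e6  # one million rows

# Simulated data: an id, a group label, and a numeric value
dt <- data.table(
  id    = seq_len(n),
  group = sample(c("a", "b", "c"), n, replace = TRUE),
  value = rnorm(n)
)

# Write to a temporary CSV file and read it back
path <- tempfile(fileext = ".csv")
fwrite(dt, path)
dt_read <- fread(path)

# Same shape, and fread() detects the column types automatically
print(dim(dt_read))
print(sapply(dt_read, class))

unlink(path)  # clean up the temporary file
```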



<h4 class="wp-block-heading">Speed: the main advantage of <code><strong>data.table</strong></code></h4>



<p class="wp-block-paragraph" id="9709">One of the main advantages of the <code><strong>data.table</strong></code> package is its speed. Its operations are generally faster than their <code><strong>tidyverse</strong></code> counterparts, especially on large datasets, because its core operations (grouping, joins, sorting) are implemented in optimized C code and it can modify data in place rather than copying it.</p>
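
<p class="wp-block-paragraph">In-place modification is done with the <code>:=</code> operator. In the sketch below, <code>:=</code> adds a column to the table by reference, and <code>address()</code> (a data.table helper that reports an object's memory address) shows that no copy was made. The kilometres-per-litre column is just an illustrative computation.</p>

```r
library(data.table)

dt <- as.data.table(mtcars)

# Memory address of the table before the update
addr_before <- address(dt)

# := adds a new column in place; 1 mile per US gallon is about 0.425144 km per litre
dt[, kpl := mpg * 0.425144]

# The address is unchanged: dt was modified, not copied
identical(addr_before, address(dt))
```

<p class="wp-block-paragraph">With a plain data frame, adding a column typically creates a modified copy; avoiding that copy matters most when the table has millions of rows.</p>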



<p class="wp-block-paragraph" id="d980">Here are some of the same tasks performed with the <strong><code>data.table</code></strong> package:</p>



<p class="wp-block-paragraph" id="9202">To select specific columns from a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Select the &quot;manufacturer&quot; and &quot;model&quot; columns
mpg[, .(manufacturer, model)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Select the &quot;manufacturer&quot; and &quot;model&quot; columns</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[, .(manufacturer, model)]</span></span></code></pre></div>
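
<p class="wp-block-paragraph">The general form of a data.table query is <code>DT[i, j, by]</code>: <code>i</code> chooses rows, <code>j</code> chooses or computes columns, and <code>by</code> groups. A short sketch that filters and orders rows in <code>i</code>, assuming <code>ggplot2</code> is installed for the <code>mpg</code> data:</p>

```r
library(data.table)

mpg_dt <- as.data.table(ggplot2::mpg)

# i filters rows, j selects columns: 2008 models only
mpg_2008 <- mpg_dt[year == 2008, .(manufacturer, model, hwy)]

# i can also order rows; here the five best highway mileages
top5 <- mpg_dt[order(-hwy)][1:5, .(manufacturer, model, hwy)]

print(mpg_2008)
print(top5)
```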



<p class="wp-block-paragraph" id="9768">And to group and summarize a dataset:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column
mpg[, .(mean_hwy = mean(hwy)), by = class]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Group the dataset by &quot;class&quot; and compute the mean of the &quot;hwy&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[, .(</span><span style="color: #FD971F">mean_hwy</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(hwy)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> class]</span></span></code></pre></div>
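


<p class="wp-block-paragraph">The same brackets accept all three parts of the general <code>DT[i, j, by]</code> form at once: <code>i</code> filters rows, <code>j</code> computes, and <code>by</code> groups (<code>keyby</code> additionally sorts and keys the result). A short sketch on the built-in <code>mtcars</code> dataset; the <code>hp &gt; 100</code> threshold and the name <code>res</code> are arbitrary choices for illustration.</p>



<pre class="wp-block-code"><code>library(data.table)

mt <- as.data.table(mtcars)

# i: keep cars with more than 100 hp
# j: count rows (.N) and average mpg in each group
# keyby: group by cylinders and transmission, then sort and key the result
res <- mt[hp > 100, .(n = .N, mean_mpg = mean(mpg)), keyby = .(cyl, am)]

print(res)</code></pre>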



<p class="wp-block-paragraph" id="b569">To join two datasets:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.39581298828125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table package
library(data.table)

# Load the mpg dataset from the ggplot2 package
data(mpg, package = &quot;ggplot2&quot;)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# ggplot2 ships no &quot;cylinders&quot; dataset, so build a small
# manufacturer-level lookup table from mpg to join against
cylinders <- mpg[, .(avg_cyl = mean(cyl)), by = manufacturer]

# Join the mpg and cylinders tables on the &quot;manufacturer&quot; column
mpg[cylinders, on = &quot;manufacturer&quot;]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table package</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg, </span><span style="color: #FD971F">package</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ggplot2&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># ggplot2 ships no &quot;cylinders&quot; dataset, so build a small</span></span>
<span class="line"><span style="color: #88846F"># manufacturer-level lookup table from mpg to join against</span></span>
<span class="line"><span style="color: #F8F8F2">cylinders </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mpg[, .(</span><span style="color: #FD971F">avg_cyl</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">mean</span><span style="color: #F8F8F2">(cyl)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> manufacturer]</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Join the mpg and cylinders tables on the &quot;manufacturer&quot; column</span></span>
<span class="line"><span style="color: #F8F8F2">mpg[cylinders, </span><span style="color: #FD971F">on</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;manufacturer&quot;</span><span style="color: #F8F8F2">]</span></span></code></pre></div>
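


<p class="wp-block-paragraph">Note that <code>X[Y, on = ...]</code> keeps every row of <code>Y</code>, which is a right join in SQL terms. The sketch below uses two small made-up tables (<code>prices</code> and <code>stock</code> are hypothetical) to contrast it with an inner join via <code>nomatch = NULL</code> and a left join via <code>merge(..., all.x = TRUE)</code>.</p>



<pre class="wp-block-code"><code>library(data.table)

# Two small illustrative tables sharing an "id" column
prices <- data.table(id = c(1, 2, 3), price = c(10, 20, 30))
stock  <- data.table(id = c(2, 3, 4), qty = c(5, 0, 7))

# X[Y, on = ...]: every row of stock; price is NA for id 4 (no match)
right_join <- prices[stock, on = "id"]

# nomatch = NULL drops unmatched rows, giving an inner join (ids 2 and 3)
inner_join <- prices[stock, on = "id", nomatch = NULL]

# merge() with all.x = TRUE keeps every row of prices (a left join)
left_join <- merge(prices, stock, by = "id", all.x = TRUE)

print(right_join)
print(inner_join)
print(left_join)</code></pre>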



<p class="wp-block-paragraph" id="4fbd">Fit a linear regression with the <code><strong>lm</strong></code> function from the <code>stats</code> package, called inside a <code><strong>data.table</strong></code> query:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and stats packages
library(data.table)
library(stats)

# Load the mtcars dataset
data(mtcars)

# Convert the dataset to a data.table
mtcars <- setDT(mtcars)

# Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable
fit <- mtcars[, lm(mpg ~ wt)]

# Summarize the model results
summary(fit)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and stats packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stats)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mtcars dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mtcars </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> setDT(mtcars)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Perform a linear regression to predict mpg (miles per gallon) using wt (weight) as the predictor variable</span></span>
<span class="line"><span style="color: #F8F8F2">fit </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> mtcars[, </span><span style="color: #66D9EF">lm</span><span style="color: #F8F8F2">(mpg </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> wt)]</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Summarize the model results</span></span>
<span class="line"><span style="color: #66D9EF">summary</span><span style="color: #F8F8F2">(fit)</span></span></code></pre></div>
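


<p class="wp-block-paragraph">Because <code>j</code> can hold any R expression, the same pattern extends to fitting one model per group. A sketch, assuming the goal is the intercept and slope of <code>mpg ~ wt</code> for each cylinder count; wrapping <code>coef()</code> in <code>as.list()</code> turns the coefficients into columns, and the name <code>by_cyl</code> is only illustrative.</p>



<pre class="wp-block-code"><code>library(data.table)

mt <- as.data.table(mtcars)

# Fit mpg ~ wt separately within each cyl group and keep the coefficients
by_cyl <- mt[, as.list(coef(lm(mpg ~ wt))), keyby = cyl]

print(by_cyl)</code></pre>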



<p class="wp-block-paragraph" id="1bd0">Create a scatterplot matrix using the <code><strong>scatterplotMatrix</strong></code> function from the <strong><code>car</code> </strong>package and the <code><strong>data.table</strong></code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and car packages
library(data.table)
library(car)

# Load the iris dataset
data(iris)

# Convert the dataset to a data.table
iris <- as.data.table(iris)

# Create a scatterplot matrix of the numeric iris columns (drop the Species factor)
scatterplotMatrix(iris[, !&quot;Species&quot;], smooth = FALSE)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and car packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(car)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the iris dataset</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">iris </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(iris)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a scatterplot matrix of the numeric iris columns (drop the Species factor)</span></span>
<span class="line"><span style="color: #F8F8F2">scatterplotMatrix(iris[, </span><span style="color: #F92672">!</span><span style="color: #E6DB74">&quot;Species&quot;</span><span style="color: #F8F8F2">], </span><span style="color: #FD971F">smooth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="007f">Create a faceted histogram using <strong><code>ggplot2</code> </strong>and the <code><strong>data.table</strong></code> package:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# Load the data.table and ggplot2 packages
library(data.table)
library(ggplot2)

# Load the mpg dataset from the ggplot2 package
data(mpg)

# Convert the dataset to a data.table
mpg <- as.data.table(mpg)

# Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_wrap(~ class + drv, nrow = 2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># Load the data.table and ggplot2 packages</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Load the mpg dataset from the ggplot2 package</span></span>
<span class="line"><span style="color: #66D9EF">data</span><span style="color: #F8F8F2">(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Convert the dataset to a data.table</span></span>
<span class="line"><span style="color: #F8F8F2">mpg </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(mpg)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #88846F"># Create a faceted histogram showing the distribution of hwy (highway miles per gallon) by class and drv (drive type)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(mpg, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> hwy)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_histogram(</span><span style="color: #FD971F">binwidth</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> class </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> drv, </span><span style="color: #FD971F">nrow</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span></code></pre></div>



<hr class="wp-block-separator has-alpha-channel-opacity"/>



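<p class="wp-block-paragraph" id="f013">In terms of implementation, both the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> expose R interfaces, but much of their heavy lifting happens in compiled code: the core of <code><strong>data.table</strong></code> (grouping, ordering, joins, and the <code>fread</code>/<code>fwrite</code> file readers and writers) is written in C, while several <strong><code>tidyverse</code> </strong>packages, such as <code>dplyr</code> and <code>readr</code>, rely on C++.</p>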
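


<p class="wp-block-paragraph">Compiled code is not the only source of speed: the <code>:=</code> operator in <code><strong>data.table</strong></code> adds or updates columns by reference, so the table is modified in place rather than copied. A small sketch that checks this with the package's <code>address()</code> helper; the table <code>DT</code> and its columns are illustrative.</p>



<pre class="wp-block-code"><code>library(data.table)

DT <- data.table(x = 1:5)
addr_before <- address(DT)

# := adds column y in place; no copy of DT is made
DT[, y := x * 2L]

addr_after <- address(DT)

# Same memory address before and after the update
identical(addr_before, addr_after)</code></pre>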



<h2 class="wp-block-heading">In summary</h2>



<p class="wp-block-paragraph" id="4b51">The <code><strong>tidyverse</strong></code> and <code><strong>data.table</strong></code> are two popular choices for working with data in R. The <strong><code>tidyverse</code> </strong>is a collection of packages designed for data manipulation, visualization, and modeling, and it is particularly well suited to tasks where simplicity and ease of use matter most. Its functions are easy to learn and use, and they often let you express an analysis in fewer lines of code than many alternatives.</p>



<p class="wp-block-paragraph" id="9f5e">The <code><strong>data.table</strong></code> package is built for speed and memory efficiency, which makes it particularly useful for large datasets and for complex operations on them, such as grouped aggregations and joins. Its functions are generally faster than their <strong><code>tidyverse</code> </strong>counterparts, and the difference is most noticeable on large datasets.</p>



<p class="wp-block-paragraph" id="612d">In general, it is a good idea to use the <strong><code>tidyverse</code> </strong>for most tasks, unless you are working with very large datasets or need the extra performance provided by the <code><strong>data.table</strong></code> package.</p>



<h4 class="wp-block-heading">At Analytica</h4>



<p class="wp-block-paragraph" id="4600">Since we routinely deal with large datasets, from gigabytes to terabytes of data, our preferred tool for data wrangling in R is <code><strong>data.table</strong></code>.</p>



<p class="wp-block-paragraph" id="9cc4">I hope this article helps readers understand the differences between the <strong><code>tidyverse</code> </strong>and <code><strong>data.table</strong></code> in R and choose the right package for their tasks. Let me know if you have any questions.</p>



<p class="wp-block-paragraph">Read more posts on the AnalyticaDSS blog: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p class="wp-block-paragraph">Read more posts on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>



<p class="wp-block-paragraph">Read more posts on R-bloggers: <a href="https://www.r-bloggers.com/">https://www.r-bloggers.com</a></p>
<p>The post <a href="https://analyticadss.com/the-tidyverse-and-data-table-r-packages/">The Tidyverse and data.table R Packages</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Analyzing Crypto Market using R — Part 2</title>
		<link>https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Mon, 24 Dec 2018 10:34:58 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Cryptocurrency]]></category>
		<category><![CDATA[Data Analysis]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4907</guid>

					<description><![CDATA[<p>Correlations in the Crypto World Analyzing crypto market Aous Abdo, WWW.ANALYTICADSS.COM. An interactive version of this post can be found here. In my previous post, I explored Bitcoin data from different exchanges and covered some arbitrage-related data. In part 2 of this series, I will explore altcoin data. R Libraries Below is a list [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/">Analyzing Crypto Market using R — Part 2</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="bf86">Correlations in the Crypto World</h2>



<p class="wp-block-paragraph">Analyzing crypto market</p>



<p class="wp-block-paragraph"><a href="https://medium.com/u/4f20dbfad286?source=post_page-----b1a0aa44006e--------------------------------" rel="noreferrer noopener" target="_blank">Aous Abdo</a>, <a href="http://www.analyticadss.com/" rel="noreferrer noopener" target="_blank">WWW.ANALYTICADSS.COM</a><br>An interactive version of this post can be found <a href="https://analyticadss.com/adss_blog/crypto_notebook_part2.nb.html" rel="noreferrer noopener" target="_blank">here</a>.</p>



<p class="wp-block-paragraph" id="ae2d">In my previous post, I explored Bitcoin data from different exchanges and covered some arbitrage-related data. In part 2 of this series, I will explore altcoin data.</p>



<h2 class="wp-block-heading" id="3a8a">R Libraries</h2>



<p class="wp-block-paragraph" id="46c5">Below are the R packages we will use in this analysis. Not all of them are strictly necessary, but together they make our life easier.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(PoloniexR)
library(data.table)
library(lubridate)
library(Quandl)
library(plyr)
library(stringr)
library(ggplot2)
library(plotly)
library(janitor)
library(quantmod)
library(pryr)
library(corrplot)
library(PerformanceAnalytics)
library(tidyr)
library(MLmetrics)
library(tidyquant)
library(corrr)
library(cowplot)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PoloniexR)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(lubridate)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(Quandl)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stringr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plotly)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(janitor)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(quantmod)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(pryr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrplot)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PerformanceAnalytics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(MLmetrics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyquant)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(cowplot)</span></span></code></pre></div>



<h2 class="wp-block-heading" id="7a90">Data</h2>



<p class="wp-block-paragraph" id="f5c0">The best source I know of for altcoin data is the <a href="https://cran.r-project.org/web/packages/PoloniexR/index.html" rel="noreferrer noopener" target="_blank">PoloniexR</a> package. I have written an R function to help download the data.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:23.104170322418213px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="get_alt_data <- function(tz = &quot;UTC&quot;
                         , coin = c(&quot;ETH&quot;, &quot;LTC&quot;)
                         , add_bitcoin = TRUE
                         , return_in_USDT = TRUE
                         , from = &quot;2017-01-01&quot;
                         , to = &quot;2018-04-09&quot;
                         , period = &quot;D&quot;
                         , verbose = FALSE){
  
  # We will be using the public API
  poloniex.public <- PoloniexPublicAPI()
  
  # set the session time zone (R reads the upper-case TZ environment variable)
  Sys.setenv(TZ = tz)
  
  # convert from and to into POSIXct date-times in the chosen time zone
  from  <- as.POSIXct(from, tz = tz)
  to    <- as.POSIXct(to, tz = tz)
  
  # lists to store data.tables and xts objects
  chart_list <- list()
  dt_list    <- list()
  
  # make sure the coin pair is in upper case
  coin       <- toupper(coin)
  coin_pairs <- paste0(&quot;BTC_&quot;, coin[coin != &quot;BTC&quot;])
  if(add_bitcoin | return_in_USDT) coin_pairs <- c(&quot;USDT_BTC&quot;, coin_pairs)
  
  # loop over the coins to get the data
  for(i in coin_pairs){
    if(verbose)
      invisible(cat('\tGetting data for ', i, ' pair\n'))
    
    # this is a list that will contain the chart data for each coin pair
    try(chart_list[[i]] <- ReturnChartData(theObject = poloniex.public
                                       , pair      = i
                                       , from      = from
                                       , to        = to
                                       , period    = period)
        , silent = TRUE)
    
    # list to contain data.tables 
    try(dt_list[[i]] <- as.data.table(chart_list[[i]]), silent = TRUE)
  }
  
  # convert to data.table and make sure to add a column containing the pairs
  coin_dt <- rbindlist(l = dt_list, use.names = TRUE, idcol = &quot;pair&quot;)
  
  # return data in usdt prices
  if(return_in_USDT){
    # alt coins are quoted in BTC, so their USDT price is the BTC price times the USDT_BTC price
    # get a DT of the btc_usdt pair
    btc_usd <- coin_dt[pair == &quot;USDT_BTC&quot;]
    btc_usd <- btc_usd[, .(index, pair, weightedaverage)]
    setnames(btc_usd, c(&quot;Date&quot;, &quot;USDT_BTC_pair&quot;, &quot;USDT_BTC_price&quot;))
    
    # copy all pairs; USDT_BTC is kept and handled separately in the price step below
    alt_coins <- copy(coin_dt)
    
    # now we need to add an index to the alt_coins table, but first we have to rename the index column
    alt_coins[, Date := index]
    alt_coins[, index := 1:.N]
    setkey(alt_coins, index)
    
    # now merge the data tables
    coin_dt_usdt <- merge(x = alt_coins, y = btc_usd, by = &quot;Date&quot;)
    
    # now calculate the price in usdt
    coin_dt_usdt[, price_usdt := ifelse(pair == &quot;USDT_BTC&quot;, USDT_BTC_price, weightedaverage * USDT_BTC_price)]
    
    # now get rid of the extra columns
    coin_dt_usdt[, c(&quot;USDT_BTC_price&quot;, &quot;USDT_BTC_pair&quot;) := NULL]
    
    # we need to change some column names
    col_names_to_change <- c(&quot;pair&quot;, &quot;high&quot;, &quot;low&quot;, &quot;open&quot;, &quot;close&quot;, &quot;volume&quot;, &quot;quotevolume&quot;, &quot;weightedaverage&quot;)
    col_names <- names(coin_dt_usdt)
    col_names[col_names %in% col_names_to_change] <- paste0(col_names_to_change, '_btc')
    
    setnames(coin_dt_usdt, col_names)
    
    # add a column for the usdt pair
    coin_dt_usdt[, pair_usdt := gsub(&quot;BTC_&quot;, &quot;USDT_&quot;, pair_btc)]
    
    # adjust col order
    setcolorder(coin_dt_usdt, c(1:10, 12, 11))
    
    # set key again
    setkey(coin_dt_usdt, index)
    
    # now get rid of the index column since it is not needed anymore
    coin_dt_usdt[, index := NULL]
    
    # now put together the return list  
    return_list <- list(alt_chart_list = chart_list, alt_dt = coin_dt, alt_usdt_dt = coin_dt_usdt)
  }else{
    return_list <- list(alt_chart_list = chart_list, alt_dt = coin_dt)
  }
  
  return(return_list)
}" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #A6E22E">get_alt_data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">tz</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;UTC&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">coin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;ETH&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;LTC&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">add_bitcoin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">return_in_USDT</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">from</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017-01-01&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">to</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018-04-09&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">period</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;D&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">verbose</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">){</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># We will be using the public API</span></span>
<span class="line"><span style="color: #F8F8F2">  poloniex.public </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> PoloniexPublicAPI()</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># set the session time zone (R reads the upper-case TZ variable)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">Sys.setenv</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">TZ</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> tz)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># convert from and to into time obj</span></span>
<span class="line"><span style="color: #F8F8F2">  from  </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.POSIXct</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(from, tz, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">  to    </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.POSIXct</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(to, tz, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># lists to store data.tables and xts objects</span></span>
<span class="line"><span style="color: #F8F8F2">  chart_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"><span style="color: #F8F8F2">  dt_list    </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># make sure the coin pair is in upper case</span></span>
<span class="line"><span style="color: #F8F8F2">  coin       </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(coin)</span></span>
<span class="line"><span style="color: #F8F8F2">  coin_pairs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;BTC_&quot;</span><span style="color: #F8F8F2">, coin[coin </span><span style="color: #F92672">!=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC&quot;</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(add_bitcoin </span><span style="color: #F92672">|</span><span style="color: #F8F8F2"> return_in_USDT) coin_pairs </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">, coin_pairs)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># loop over the coins to get the data</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">for</span><span style="color: #F8F8F2">(i </span><span style="color: #F92672">in</span><span style="color: #F8F8F2"> coin_pairs){</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(verbose)</span></span>
<span class="line"><span style="color: #F8F8F2">      </span><span style="color: #F92672">invisible</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">cat</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;</span><span style="color: #AE81FF">\t</span><span style="color: #E6DB74">Getting data for &#39;</span><span style="color: #F8F8F2">, i, </span><span style="color: #E6DB74">&#39; pair</span><span style="color: #AE81FF">\n</span><span style="color: #E6DB74">&#39;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># this is a list that will contain the chart data for each coin pair</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(chart_list[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ReturnChartData(</span><span style="color: #FD971F">theObject</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> poloniex.public</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , pair      = i</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , from      = from</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , to        = to</span></span>
<span class="line"><span style="color: #F8F8F2">                                       , period    = period)</span></span>
<span class="line"><span style="color: #F8F8F2">        , </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># list to contain data.tables </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(dt_list[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(chart_list[[i]]), </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># convert to data.table and make sure to add a column containing the pairs</span></span>
<span class="line"><span style="color: #F8F8F2">  coin_dt </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> rbindlist(</span><span style="color: #FD971F">l</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> dt_list, </span><span style="color: #FD971F">use.names</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">idcol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># return data in usdt prices</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(return_in_USDT){</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># converting the alt coin prices to usdt takes a few steps</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># get a DT of the btc_usdt pair</span></span>
<span class="line"><span style="color: #F8F8F2">    btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> coin_dt[pair </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_usd[, .(index, pair, weightedaverage)]</span></span>
<span class="line"><span style="color: #F8F8F2">    setnames(btc_usd, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_pair&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_price&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># copy all pairs (USDT_BTC included) so coin_dt is not modified by reference</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> copy(coin_dt)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now we need to add an index to the alt_coins table, but first we have to rename the index column</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins[, Date </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> index]</span></span>
<span class="line"><span style="color: #F8F8F2">    alt_coins[, index </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #F8F8F2">.N]</span></span>
<span class="line"><span style="color: #F8F8F2">    setkey(alt_coins, index)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now merge the data tables</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">merge</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_coins, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> btc_usd, </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now calculate the price in usdt</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, price_usdt </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(pair </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;USDT_BTC&quot;</span><span style="color: #F8F8F2">, USDT_BTC_price, weightedaverage </span><span style="color: #F92672">*</span><span style="color: #F8F8F2"> USDT_BTC_price)]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now get rid of the extra columns</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_BTC_price&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_BTC_pair&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># we need to change some column names</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names_to_change </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;pair&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;high&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;low&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;open&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;close&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;volume&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;quotevolume&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;weightedaverage&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">names</span><span style="color: #F8F8F2">(coin_dt_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">    col_names[col_names </span><span style="color: #F92672">%in%</span><span style="color: #F8F8F2"> col_names_to_change] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(col_names_to_change, </span><span style="color: #E6DB74">&#39;_btc&#39;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    setnames(coin_dt_usdt, col_names)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># add a column for the usdt pair</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, pair_usdt </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;BTC_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, pair_btc)]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># adjust col order</span></span>
<span class="line"><span style="color: #F8F8F2">    setcolorder(coin_dt_usdt, </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">1</span><span style="color: #F92672">:</span><span style="color: #AE81FF">10</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">12</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">11</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># set key again</span></span>
<span class="line"><span style="color: #F8F8F2">    setkey(coin_dt_usdt, index)</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now get rid of the index column since it is not needed anymore</span></span>
<span class="line"><span style="color: #F8F8F2">    coin_dt_usdt[, index </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NULL</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #F8F8F2">    </span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #88846F"># now put together the return list  </span></span>
<span class="line"><span style="color: #F8F8F2">    return_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">alt_chart_list</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> chart_list, </span><span style="color: #FD971F">alt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt, </span><span style="color: #FD971F">alt_usdt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span><span style="color: #F92672">else</span><span style="color: #F8F8F2">{</span></span>
<span class="line"><span style="color: #F8F8F2">    return_list </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">alt_chart_list</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> chart_list, </span><span style="color: #FD971F">alt_dt</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> coin_dt)</span></span>
<span class="line"><span style="color: #F8F8F2">  }</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(return_list)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span></code></pre></div>



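<p class="wp-block-paragraph">The function above downloads data for multiple coins in one call. It returns a list with the raw chart data for each pair (alt_chart_list), a single data.table with the data for all pairs (alt_dt), and, when return_in_USDT = TRUE (the default), a second data.table with the prices converted to USDT (alt_usdt_dt). Bitcoin (the USDT_BTC pair) is included even if you don’t add it to the list of coins. Setting add_bitcoin = FALSE drops it only when return_in_USDT is also FALSE, because the USDT conversion needs the bitcoin price. Here is an example:</p>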



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.6875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# get alt data for some coins
alt_data <- get_alt_data(return_in_USDT = T
                         , from = &quot;2015-01-01&quot;
                         , coin = c('ETH','XRP', 'BCH', 'LTC', 'NEO', 'XMR', 'DASH', 'XEM'))[['alt_usdt_dt']]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># get alt data for some coins</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> get_alt_data(</span><span style="color: #FD971F">return_in_USDT</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> T</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">from</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                         , </span><span style="color: #FD971F">coin</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;ETH&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;XRP&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;BCH&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;LTC&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;NEO&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;XMR&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;DASH&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&#39;XEM&#39;</span><span style="color: #F8F8F2">))[[</span><span style="color: #E6DB74">&#39;alt_usdt_dt&#39;</span><span style="color: #F8F8F2">]]</span></span></code></pre></div>
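<p class="wp-block-paragraph">To make the add_bitcoin behavior concrete, here is a minimal sketch that isolates the pair-building lines of get_alt_data() in a small helper. The name build_pairs() is hypothetical and the helper calls no API; it only shows that the USDT_BTC pair is added whenever add_bitcoin or return_in_USDT is TRUE.</p>

```r
# Hypothetical helper that mirrors the pair-building lines of get_alt_data();
# it only builds Poloniex pair names and does not download anything.
build_pairs <- function(coin, add_bitcoin = TRUE, return_in_USDT = TRUE) {
  coin  <- toupper(coin)
  # every non-BTC coin is quoted against bitcoin
  pairs <- paste0("BTC_", coin[coin != "BTC"])
  # the USDT_BTC pair is needed whenever prices are converted to USDT
  if (add_bitcoin || return_in_USDT) pairs <- c("USDT_BTC", pairs)
  pairs
}

# bitcoin is still included, because return_in_USDT defaults to TRUE
build_pairs(c("eth", "ltc"), add_bitcoin = FALSE)
# only when both flags are FALSE is USDT_BTC left out
build_pairs(c("eth", "ltc"), add_bitcoin = FALSE, return_in_USDT = FALSE)
```

The first call returns "USDT_BTC", "BTC_ETH", "BTC_LTC"; the second returns only "BTC_ETH" and "BTC_LTC".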



<p class="wp-block-paragraph" id="a8ab">Let’s look at the first few rows of the data we just downloaded:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704856872558594px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(alt_data)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(alt_data)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="249" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg.webp" alt="" class="wp-image-4908" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-500x150.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-150x45.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yabOeX6Rqn8e6eDO1XzGFg-768x231.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


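<p class="wp-block-paragraph" id="9a46">For each date and pair, the table shows the open, high, low, and close (OHLC) prices, the volume and quote volume, and the weighted average price, with prices quoted in BTC (the _btc suffix marks these original columns; for the USDT_BTC pair they are already in USDT). The last two columns were added by the function: pair_usdt, the corresponding USDT pair, and price_usdt, the weighted average price converted to USDT, which we treat as USD.</p>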
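<p class="wp-block-paragraph">The price_usdt column comes from a join on the date followed by a multiplication: each BTC-quoted weighted average is multiplied by the USDT_BTC price of the same day. The sketch below reproduces that step on a few rows of made-up numbers. It uses base R data frames instead of data.table so it runs without extra packages, and an anchored sub() in place of the function’s gsub() for the pair name, so it illustrates the idea rather than replacing the function.</p>

```r
# Made-up daily rows in the long format built inside get_alt_data()
# (not real market data): USDT_BTC is quoted in USDT, BTC_ETH in BTC
coin_dt <- data.frame(
  Date            = as.Date(c("2018-01-01", "2018-01-01",
                              "2018-01-02", "2018-01-02")),
  pair            = c("USDT_BTC", "BTC_ETH", "USDT_BTC", "BTC_ETH"),
  weightedaverage = c(10000, 0.05, 12000, 0.04)
)

# bitcoin price in USDT for each date
btc_usd <- coin_dt[coin_dt$pair == "USDT_BTC", c("Date", "weightedaverage")]
names(btc_usd)[2] <- "USDT_BTC_price"

# attach the same-day bitcoin price to every row
usdt <- merge(coin_dt, btc_usd, by = "Date")

# BTC-quoted prices are scaled by the bitcoin price;
# USDT_BTC rows are already in USDT
usdt$price_usdt <- ifelse(usdt$pair == "USDT_BTC",
                          usdt$USDT_BTC_price,
                          usdt$weightedaverage * usdt$USDT_BTC_price)

# matching USDT pair name, e.g. "BTC_ETH" -> "USDT_ETH"
usdt$pair_usdt <- sub("^BTC_", "USDT_", usdt$pair)

usdt[, c("Date", "pair_usdt", "price_usdt")]
```

With these toy numbers, ETH comes out at 500 USDT on the first day (0.05 * 10000) and 480 USDT on the second (0.04 * 12000).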



<h2 class="wp-block-heading" id="c5e5">Bitcoin-Altcoin Correlations</h2>



<p class="wp-block-paragraph" id="dbac">Whenever I look at the prices of the coins on my <a href="https://www.coinbase.com/" target="_blank" rel="noreferrer noopener">Coinbase</a> app, I am struck by how similar the price trends of the four coins available there are: BTC, ETH, BCH, and LTC (see the figure below). So I thought it would be a good idea to explore the correlation in price trends between altcoins and bitcoin.</p>


<div class="wp-block-image">
<figure class="aligncenter size-large"><img loading="lazy" decoding="async" width="661" height="1024" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-661x1024.webp" alt="" class="wp-image-4909" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-661x1024.webp 661w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-323x500.webp 323w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-97x150.webp 97w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA-768x1190.webp 768w, https://analyticadss.com/wp-content/uploads/2022/12/1_6a1iEbXQTe5L9tKq9xs1XA.webp 786w" sizes="auto, (max-width: 661px) 100vw, 661px" /></figure>
</div>
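<p class="wp-block-paragraph">Once the prices sit in a single long table, one straightforward way to put a number on that similarity is to reshape the table so each pair becomes a column and pass it to cor(). The sketch below assumes data shaped like the Date, pair_usdt, and price_usdt columns of alt_data, but uses a few made-up prices so it runs on its own. With the real data.table, dcast(alt_data, Date ~ pair_usdt, value.var = "price_usdt") performs the same reshape.</p>

```r
# Made-up prices in the long format of alt_data:
# one row per date and USDT pair (not real market data)
prices <- data.frame(
  Date       = rep(as.Date("2018-01-01") + 0:4, times = 3),
  pair_usdt  = rep(c("USDT_BTC", "USDT_ETH", "USDT_LTC"), each = 5),
  price_usdt = c(100, 110, 105, 120, 130,    # USDT_BTC
                 10, 11, 10.6, 12.1, 13,     # USDT_ETH
                 5, 4.8, 5.1, 4.9, 4.7)      # USDT_LTC
)

# long -> wide: one row per date, one price column per pair
wide <- reshape(prices, idvar = "Date", timevar = "pair_usdt",
                direction = "wide")
# reshape() names the new columns "price_usdt.<pair>"; keep only the pair
names(wide) <- sub("^price_usdt\\.", "", names(wide))

# pairwise correlation of the price series; coins that started trading
# later simply have NA on the earlier dates
price_cor <- cor(wide[, -1], use = "pairwise.complete.obs")

round(price_cor["USDT_BTC", ], 2)
```

The pairwise.complete.obs option matters for the real download, where coins such as BCH only have prices from their listing date onward.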


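<p class="wp-block-paragraph" id="47a1">Let’s look at the price trends of the coins we just downloaded. To make potential correlations easier to see, I will zoom in on 2018 only; since the download used the default to = "2018-04-09", that means January 1 to April 9, 2018.</p>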



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70486307144165px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2018], aes(x = Date, y =  price_usdt, col = pair_usdt)) + geom_line()
p <- p + facet_wrap(~pair_usdt, scales = &quot;free&quot;, ncol = 3) + theme_minimal() + theme(legend.position=&quot;none&quot;) + ylab(&quot;Price (USD)&quot;)
p" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2">pair_usdt, </span><span style="color: #FD971F">scales</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;free&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;none&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="499" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw.webp" alt="" class="wp-image-4910" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-500x301.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_lXmT2c5hQ6tik7sNfwvFPw-768x463.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and other altcoins in 2018<br></figcaption></figure>
</div>


<p class="wp-block-paragraph" id="3bf6">The figure above shows that some coins seem to be more correlated with Bitcoin than others. It also shows that the relationship between Bitcoin and a given coin changes over time. More on this below.</p>



<p class="wp-block-paragraph" id="73c0">Trying to find correlations between time series using the Pearson correlation coefficient, or other metrics designed for stationary data, can give misleading results, because price series like these are generally not stationary: their mean and variance drift over time. Similar trends in two time series can also be very misleading; a nice article on this topic can be found <a href="https://svds.com/avoiding-common-mistakes-with-time-series/" rel="noreferrer noopener" target="_blank">here</a>. And always remember that <strong>correlation doesn’t imply causation</strong>.</p>
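<p class="wp-block-paragraph">To see why trending series are a trap, here is a small simulation sketch in base R (the function name <code>spurious_demo</code> and all the numbers are illustrative, not part of the original analysis). It generates pairs of random walks that are independent by construction and compares the average absolute Pearson correlation of their levels with that of their step-to-step changes.</p>

```r
# Sketch: how correlated do two *independent* random walks look?
# Simulated data only; nothing here comes from the crypto data set.
set.seed(42)

spurious_demo <- function(n_steps = 500, n_reps = 200) {
  r_levels <- numeric(n_reps)
  r_diffs  <- numeric(n_reps)
  for (i in seq_len(n_reps)) {
    a <- cumsum(rnorm(n_steps))  # random walk A: a non-stationary "price"
    b <- cumsum(rnorm(n_steps))  # random walk B, generated independently of A
    r_levels[i] <- cor(a, b)              # Pearson r on the levels
    r_diffs[i]  <- cor(diff(a), diff(b))  # Pearson r on the changes
  }
  # average absolute correlation across all simulated pairs
  c(levels = mean(abs(r_levels)), diffs = mean(abs(r_diffs)))
}

res <- spurious_demo()
print(round(res, 3))
```

<p class="wp-block-paragraph">The levels typically show a sizeable average |r| even though the walks are unrelated, while the differenced series stay close to zero. This is one reason to work with changes, such as the daily percentage change, rather than with raw prices.</p>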



<p class="wp-block-paragraph" id="0f96">The bottom line: one has to be careful when cross-correlating time series. To perform a proper correlation analysis, we need to add some new variables to our table.</p>



<h2 class="wp-block-heading" id="510f">Percentage Daily Change</h2>



<p class="wp-block-paragraph" id="0ed2">The percentage daily change measures how much a coin’s price changed from one day to the next. Let’s add it to the table. Notice that we are calculating this variable from the price in USD, not from the price in Bitcoin.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add daily price change
alt_data[, pct_change := Delt(price_usdt), by = pair_usdt]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add daily price change</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data[, pct_change </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> Delt(price_usdt), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt]</span></span></code></pre></div>
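<p class="wp-block-paragraph"><code>Delt()</code> comes from the quantmod package; with its defaults it returns the one-period arithmetic change (p<sub>t</sub> - p<sub>t-1</sub>) / p<sub>t-1</sub> as a fraction (0.05 means 5%), with <code>NA</code> for the first row. Because it compares each row with the previous one, the rows of each pair should be sorted by date. Here is a base-R sketch of the same calculation on made-up prices; the helper <code>pct_change_vec</code> is hypothetical.</p>

```r
# Hypothetical helper: one-period arithmetic change, (p_t - p_{t-1}) / p_{t-1}
pct_change_vec <- function(p) {
  c(NA, diff(p) / head(p, -1))  # first day has no previous price, so NA
}

# Toy long-format data with made-up prices for two pairs
toy <- data.frame(
  Date       = as.Date("2018-01-01") + c(0:2, 0:2),
  pair_usdt  = rep(c("USDT_BTC", "USDT_LTC"), each = 3),
  price_usdt = c(100, 110, 99, 50, 55, 66)
)

# Sort by pair and date, then apply the helper within each pair
toy <- toy[order(toy$pair_usdt, toy$Date), ]
toy$pct_change <- ave(toy$price_usdt, toy$pair_usdt, FUN = pct_change_vec)
print(toy)  # pct_change: NA, 0.1, -0.1 for BTC; NA, 0.1, 0.2 for LTC
```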



<h2 class="wp-block-heading" id="a775">Normalized Price in USD</h2>



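<p class="wp-block-paragraph" id="a42d">Since prices vary a lot, both over time for the same coin and between coins, we will add a variable with the normalized price in USD: each coin’s price divided by that coin’s maximum price in the data set. The normalized price therefore lies between 0 and 1, with 1 marking the coin’s highest price, which makes it easy to plot the prices of different coins on the same figure.</p>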



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add normalized prices in usdt
alt_data[, price_usdt_norm := price_usdt/max(price_usdt), by = pair_usdt]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add normalized prices in usdt</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data[, price_usdt_norm </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> price_usdt</span><span style="color: #F92672">/</span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(price_usdt), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt]</span></span></code></pre></div>
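<p class="wp-block-paragraph">The same per-coin normalization can be sketched in base R with <code>ave()</code>, which applies a function within each group. The prices are made up, and <code>na.rm = TRUE</code> is an assumption added here: without it, a single missing price would turn that coin’s whole normalized series into <code>NA</code>.</p>

```r
# Toy long-format prices for two pairs (made-up numbers)
toy <- data.frame(
  pair_usdt  = rep(c("USDT_BTC", "USDT_LTC"), each = 3),
  price_usdt = c(10000, 20000, 15000, 100, 300, 150)
)

# Divide each price by the maximum price of the same pair
toy$price_usdt_norm <- ave(toy$price_usdt, toy$pair_usdt,
                           FUN = function(p) p / max(p, na.rm = TRUE))
print(toy)  # each pair's highest price maps to exactly 1
```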



<p class="wp-block-paragraph" id="9cec">Now that we have the normalized prices in USD, let’s plot the prices of Bitcoin (BTC) and Litecoin (LTC) on the same figure. We’ll do that for 2018 so we can better see any possible correlations.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2018 & pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="483" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg.webp" alt="" class="wp-image-4913" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-500x292.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-150x88.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_HHs59LsNRYT3sbL8I0Sdmg-768x448.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph">The price trends of BTC and LTC in 2018 are very similar. Let’s look at the price trends for 2017.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[year(Date) == 2017 & pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2017</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="480" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg.webp" alt="" class="wp-image-4911" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-500x290.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-150x87.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_6t9j_F4QaweBgp_Wl7Ntwg-768x445.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and LTC in 2017</figcaption></figure>
</div>


<p class="wp-block-paragraph" id="c49a">It seems we need to zoom in on the last quarter of 2017, so let’s do that.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704862594604492px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[Date >= &quot;2017-10-01&quot;  & Date < &quot;2018-01-01&quot; &  pair_usdt %like% &quot;BTC|LTC&quot;], aes(x = Date, y =  price_usdt_norm, col = pair_usdt)) + geom_line()
p <- p + theme_minimal() + ylab(&quot;Price (USD)&quot;)
ggplotly(p)
" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">>=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017-10-01&quot;</span><span style="color: #F8F8F2">  </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672"><</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018-01-01&quot;</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2">  pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  price_usdt_norm, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_minimal() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Price (USD)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span>
<span class="line"></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="490" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA.webp" alt="" class="wp-image-4914" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-500x296.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yEZOrUPaK-ZhizMdmeFCmA-768x454.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Prices of Bitcoin and LTC in the last quarter of 2017<br></figcaption></figure>
</div>


<p class="wp-block-paragraph" id="fcd8">It is clear from the figure above that the relationship between the prices of Bitcoin and LTC varies over time. Note how Bitcoin’s highest price in this period, on December 17, 2017, came two days before LTC’s, which occurred on December 19, 2017. This was not the case for the ATH (all-time high) on January 6, 2018, which occurred on the same day for both coins.</p>
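<p class="wp-block-paragraph">Rather than eyeballing peaks, a lead/lag like this can be checked more systematically with the sample cross-correlation of daily changes at several lags. The sketch below uses base R’s <code>ccf()</code> on simulated data (not the crypto prices) in which <code>y</code> repeats <code>x</code> with a two-day delay plus noise. By R’s convention, lag <code>k</code> of <code>ccf(x, y)</code> estimates the correlation between <code>x[t+k]</code> and <code>y[t]</code>, so a peak at a negative lag means <code>x</code> moves first.</p>

```r
# Sketch: estimating a lead/lag with the sample cross-correlation function.
# Simulated daily changes; y repeats x with a 2-day delay plus noise.
set.seed(1)
n <- 300
x <- rnorm(n)                                       # changes of the "leader"
y <- c(rnorm(2), head(x, -2)) + rnorm(n, sd = 0.3)  # y[t] is roughly x[t - 2]

# ccf(x, y) estimates cor(x[t + k], y[t]) for lags k = -5, ..., 5
cc <- ccf(x, y, lag.max = 5, plot = FALSE)
best_lag <- cc$lag[which.max(cc$acf)]
print(best_lag)  # -2: x leads y by two steps
```

<p class="wp-block-paragraph">Working with daily changes rather than price levels matters here too: cross-correlating trending levels runs into the same spurious-correlation problem discussed earlier.</p>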



<h2 class="wp-block-heading" id="6009">Static Correlations </h2>



<h3 class="wp-block-heading" id="6009">(and why you shouldn’t use them with crypto!)</h3>



<p class="wp-block-paragraph" id="9035">Up until now I haven’t calculated any correlations between the prices of different coins. You might ask why we should even care about correlations in time series. Well, in the case of financial time series, if one can show that a correlation exists between two series, one can use it to model or predict the price movement of one coin/stock from the price trends of another.</p>



<p class="wp-block-paragraph" id="9035"><strong>However</strong>, as we mentioned earlier, the correlation between time series is not static; it changes over time. Let’s show that. To do so, I am going to calculate the <a href="https://en.wikipedia.org/wiki/Pearson_correlation_coefficient" target="_blank" rel="noreferrer noopener">Pearson correlation coefficient</a>. In simple words, the Pearson correlation coefficient of two vectors of data measures how strongly they are linearly related. Its value ranges from -1, perfectly anti-correlated, to 1, perfectly correlated, so the correlation of a series with itself is 1. A value of zero means there is no linear correlation. Remember, this measure is only meaningful for stationary data.</p>
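<p class="wp-block-paragraph">For reference, the coefficient can be written out directly in base R and checked against the built-in <code>cor()</code>. The helper <code>pearson_r</code> and the small vectors are illustrative only.</p>

```r
# Pearson r written out:
#   r = sum((x - mean(x)) * (y - mean(y))) /
#       sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))
pearson_r <- function(x, y) {
  dx <- x - mean(x)  # deviations from the mean of x
  dy <- y - mean(y)  # deviations from the mean of y
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

pearson_r(x, y)   # 6 / sqrt(60), about 0.775; same as cor(x, y)
pearson_r(x, x)   # a series against itself: 1
pearson_r(x, -x)  # perfectly anti-correlated: -1
```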



<p class="wp-block-paragraph" id="62e8">To compute correlations on our data, I first need to transform it from long format (one row per date and coin) to wide format (one row per date, one column per coin):</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset data, only keep the date, the pair, and the price
alt_data_sub <- alt_data[, .(Date, pair_usdt, price_usdt)]
# convert to wide format 
alt_data_sub <- spread(data = alt_data_sub, key = &quot;pair_usdt&quot;, value = &quot;price_usdt&quot;)
# clean column names
setnames(alt_data_sub, gsub(&quot;USDT_&quot;, &quot;&quot;, colnames(alt_data_sub)))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset data, only keep the date, the pair, and the price</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data[, .(Date, pair_usdt, price_usdt)]</span></span>
<span class="line"><span style="color: #88846F"># convert to wide format </span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> spread(</span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_data_sub, </span><span style="color: #FD971F">key</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair_usdt&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">value</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;price_usdt&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># clean column names</span></span>
<span class="line"><span style="color: #F8F8F2">setnames(alt_data_sub, </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">colnames</span><span style="color: #F8F8F2">(alt_data_sub)))</span></span></code></pre></div>
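<p class="wp-block-paragraph"><code>spread()</code> comes from tidyr, where it has since been superseded by <code>pivot_wider()</code>; data.table users can also use <code>dcast()</code>. To make the long-to-wide step concrete without any extra package, here is a base-R sketch with <code>reshape()</code> on a made-up two-day, two-coin table.</p>

```r
# Toy long-format table: one row per (Date, pair), made-up prices
long <- data.frame(
  Date       = rep(as.Date("2018-01-01") + 0:1, times = 2),
  pair_usdt  = rep(c("USDT_BTC", "USDT_LTC"), each = 2),
  price_usdt = c(13000, 14000, 220, 230)
)

# Long -> wide: one row per Date, one price column per pair
wide <- reshape(long, idvar = "Date", timevar = "pair_usdt",
                direction = "wide")

# reshape() names the new columns "price_usdt.USDT_BTC", etc.; strip the prefix
names(wide) <- sub("^price_usdt\\.USDT_", "", names(wide))
print(wide)
```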



<p class="wp-block-paragraph">The new table contains the date along with one column of USDT prices for each coin in our data.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7048492431640625px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tail(alt_data_sub)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">tail</span><span style="color: #F8F8F2">(alt_data_sub)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="236" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw.webp" alt="" class="wp-image-4915" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-500x143.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-150x43.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_3onSqsAzx-A1jNaprt9BMw-768x219.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph">Again, what I am doing here is not statistically sound; I am only trying to show you why we shouldn’t compute static correlations on crypto prices. Now we’ll calculate the Pearson correlation coefficient between each pair of coins and then make a nice plot of these coefficients.</p>
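<p class="wp-block-paragraph">The correlation call that follows passes <code>use = "complete.obs"</code> to <code>cor()</code>. That option drops every row with a missing value in <em>any</em> column before computing all correlations, so a coin listed late shortens the sample for every pair. The toy sketch below (made-up numbers) contrasts it with <code>"pairwise.complete.obs"</code>, which uses, for each pair, all rows where both coins have prices.</p>

```r
# Toy wide table: coin "z" only has prices for the last three days
# (e.g. a coin listed later). All numbers are made up.
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3, 1, 2, 4, 5, 6),
  z = c(NA, NA, NA, 1, 2, 4)
)

# "complete.obs": drop every row with an NA in any column, then correlate
M_complete <- cor(df, use = "complete.obs")
# "pairwise.complete.obs": each pair uses all rows where that pair is observed
M_pairwise <- cor(df, use = "pairwise.complete.obs")

M_complete["x", "y"]  # 1: only rows 4-6 survive, where x and y are identical
M_pairwise["x", "y"]  # below 1: all six rows are used for this pair
```

<p class="wp-block-paragraph">Neither option is universally better: pairwise correlations are computed over different date ranges, so the resulting matrix mixes periods and is not guaranteed to be positive semi-definite.</p>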



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# calculate the correlation matrix
M <- cor(alt_data_sub[, -1], use = &quot;complete.obs&quot;) # notice how we are ignoring missing data with the last argument
# plot the correlation matrix
corrplot.mixed(corr = M, upper = &quot;ellipse&quot;, lower = &quot;number&quot;, order = &quot;AOE&quot;, tl.col = &quot;black&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># calculate the correlation matrix</span></span>
<span class="line"><span style="color: #F8F8F2">M </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(alt_data_sub[, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">], </span><span style="color: #FD971F">use</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;complete.obs&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #88846F"># notice how we are ignoring missing data with the last argument</span></span>
<span class="line"><span style="color: #88846F"># plot the correlation matrix</span></span>
<span class="line"><span style="color: #F8F8F2">corrplot.mixed(</span><span style="color: #FD971F">corr</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> M, </span><span style="color: #FD971F">upper</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;ellipse&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">lower</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;number&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">order</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;AOE&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">tl.col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;black&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="826" height="722" src="https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA.webp" alt="" class="wp-image-4916" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA.webp 826w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-500x437.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-150x131.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_mjAaFbtDf5GeYp6RCppeSA-768x671.webp 768w" sizes="auto, (max-width: 826px) 100vw, 826px" /></figure>
</div>


<p class="wp-block-paragraph" id="99ac">The figure above shows the correlation coefficients between the different coins. It is easy to read visually: the darker and narrower the ellipse, the stronger the correlation. Of course, you can also just read the numbers in the bottom-left half of the figure to get the coefficient between any two coins :). The figure shows how highly correlated cryptocurrency prices can be. For example, XRP and XEM have a correlation coefficient of 0.93, and the highest correlation seems to be between BCH and DASH, at 0.97.</p>
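<p class="wp-block-paragraph">Reading the largest coefficient off the plot works well for a handful of coins. With more coins, it can help to rank the pairs directly from the matrix. Below is a minimal base-R sketch on simulated prices; the AAA/BBB/CCC columns are hypothetical placeholders, not the article's data, and with the real data you would use the <code>M</code> computed above instead.</p>

```r
# Simulated prices for three hypothetical coins (placeholders, not real data)
set.seed(42)
n <- 200
base <- cumsum(rnorm(n))
prices <- data.frame(
  AAA = base + rnorm(n, sd = 0.5),
  BBB = base + rnorm(n, sd = 0.5),
  CCC = cumsum(rnorm(n))
)

M <- cor(prices, use = "complete.obs")

# Use only the upper triangle so each pair appears once and the diagonal (all 1s) is skipped
upper_idx <- which(upper.tri(M), arr.ind = TRUE)
pair_cor <- data.frame(
  coin_1 = rownames(M)[upper_idx[, "row"]],
  coin_2 = colnames(M)[upper_idx[, "col"]],
  r      = M[upper_idx]
)

# Sort from most to least correlated; the first row is the strongest pair
pair_cor <- pair_cor[order(-pair_cor$r), ]
head(pair_cor, 1)
```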



<p class="wp-block-paragraph" id="bcf4">All of the correlation coefficients in the figure above are significant. The question is: do these correlations vary over time? To answer it, I will calculate the correlation coefficient between Bitcoin and DASH on a monthly basis (you can do the same for any time period) and show that this coefficient varies greatly over time. Let’s do that:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data
btc_dash <- alt_data_sub[, .(Date, BTC, DASH)]
# add a year_month column
btc_dash[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_dash_2 <- btc_dash[, cor(BTC, DASH), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_dash_2$year_month, btc_dash_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between BTC and DASH Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_dash_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub[, .(Date, BTC, DASH)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_dash_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_dash[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, DASH), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between BTC and DASH Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_dash_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>



<p class="wp-block-paragraph" id="094a">This is interesting: the monthly correlation coefficient between Bitcoin and DASH ranges from <strong>-0.91</strong> (highly anti-correlated) to <strong>0.98</strong> (highly correlated). This is why <strong><em>you should never use static correlation metrics with crypto data!</em></strong></p>
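<p class="wp-block-paragraph">If you don't use data.table or zoo, the same month-by-month calculation can be sketched in base R. This is a minimal illustration, assuming two simulated daily price series that stand in for BTC and DASH. The numbers it produces will not match the range quoted above, because the data are made up.</p>

```r
# Simulated daily prices for two hypothetical coins that share a common shock
set.seed(7)
dates  <- seq(as.Date("2017-01-01"), as.Date("2018-12-31"), by = "day")
n      <- length(dates)
common <- rnorm(n)
btc    <- 100 + cumsum(common + rnorm(n))
dash   <- 50 + cumsum(0.5 * common + rnorm(n))

# Label each day with its year-month, e.g. "2017-01"
year_month <- format(dates, "%Y-%m")

# Split the row indices by month and correlate the two series within each month
monthly_cor <- sapply(split(seq_len(n), year_month),
                      function(idx) cor(btc[idx], dash[idx]))

# Smallest and largest monthly coefficients
range(monthly_cor)
```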



<p class="wp-block-paragraph" id="78e2">A good blog post on this same topic was written by <a href="https://twitter.com/tomeff" rel="noreferrer noopener" target="_blank">Tom Fawcett</a> from <a href="https://www.svds.com/" rel="noreferrer noopener" target="_blank">Silicon Valley Data Science</a> and can be found <a href="https://www.svds.com/avoiding-common-mistakes-with-time-series/" rel="noreferrer noopener" target="_blank">here</a>. In it, Tom uses simple simulations to show why static correlations should never be used with time series.</p>
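<p class="wp-block-paragraph">The sketch below is not Tom's code. It is a minimal simulation of the same general point, assuming two coins whose prices follow completely independent random walks. <code>cor.test()</code> assumes independent observations, and trending price levels violate that assumption. As a result, correlating the price levels flags many of these unrelated pairs as "significant", while correlating their daily changes does so only about 5% of the time, as a 5% test should. This is worth keeping in mind when reading significance off a correlation matrix of prices.</p>

```r
set.seed(123)
n_sims <- 500
n_days <- 365
p_levels  <- numeric(n_sims)
p_returns <- numeric(n_sims)

for (i in seq_len(n_sims)) {
  a <- cumsum(rnorm(n_days))  # independent random walk: "price" of coin A
  b <- cumsum(rnorm(n_days))  # independent random walk: "price" of coin B
  p_levels[i]  <- cor.test(a, b)$p.value              # correlate the price levels
  p_returns[i] <- cor.test(diff(a), diff(b))$p.value  # correlate the daily changes
}

# Share of simulations flagged as significant at the 5% level
mean(p_levels < 0.05)
mean(p_returns < 0.05)
```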



<h3 class="wp-block-heading" id="2441">Correlation Networks</h3>



<p class="wp-block-paragraph" id="a16e">There is one more plot I would like to make: a network plot of the correlations between the different coins, which shows how strongly each pair of coins is correlated. Again, these correlations are time dependent, so the figure will change depending on the period used, but I still think it is a useful figure to make. Here it is:</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# we will be using the great corrr package for this work
# build the correlation matrix, just like we did before
# the code snippets below are adapted from this Business Science post (a great blog, BTW):
# http://www.business-science.io/timeseries-analysis/2017/07/30/tidy-timeseries-analysis-pt-3.html
corr_2 <- correlate(alt_data_sub[, -1])
# make the network plot
corr_net <- corr_2 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2014 through 2018&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
corr_net" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># we will be using the great corrr package for this work</span></span>
<span class="line"><span style="color: #88846F"># build the correlation matrix, just like we did before</span></span>
<span class="line"><span style="color: #88846F"># the code snippets below are adapted from this Business Science post (a great blog, BTW):</span></span>
<span class="line"><span style="color: #88846F"># http://www.business-science.io/timeseries-analysis/2017/07/30/tidy-timeseries-analysis-pt-3.html</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #88846F"># make the network plot</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2014 through 2018&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="519" src="https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A.webp" alt="" class="wp-image-4917" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-500x313.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-150x94.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_brRhP1L95H00I6m3TcD-2A-768x481.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="9225">The figure above shows a network of how strongly the prices of the coins under study are correlated. The darker the edge (the line connecting two coins) and the closer the two coins sit in the network, the stronger their correlation.</p>
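<p class="wp-block-paragraph">corrr's <code>network_plot()</code> only draws an edge between two coins when the absolute correlation reaches its <code>min_cor</code> argument (0.3 by default). The base-R sketch below reproduces just that filtering step as an edge list. It uses simulated returns for hypothetical coins A to D, is not corrr's internal code, and does not attempt corrr's layout of the points.</p>

```r
# Simulated daily returns: A and B share a strong common factor, C a weak one, D none
set.seed(3)
n <- 300
common <- rnorm(n)
returns <- data.frame(
  A = common + rnorm(n),
  B = common + rnorm(n),
  C = 0.3 * common + rnorm(n),
  D = rnorm(n)
)

M <- cor(returns)
min_cor <- 0.3  # same threshold as network_plot()'s default

# Keep each pair once (upper triangle) and only if |r| clears the threshold
keep <- which(upper.tri(M) & abs(M) >= min_cor, arr.ind = TRUE)
edges <- data.frame(
  from = rownames(M)[keep[, "row"]],
  to   = colnames(M)[keep[, "col"]],
  r    = M[keep]
)

# Strongest edges first
edges[order(-abs(edges$r)), ]
```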



<p class="wp-block-paragraph" id="851b">From the figure, it seems that XMR, LTC, and BTC are at the heart of this network, while BCH is the least correlated with the rest of the coins. Let’s see how the network plot changes between 2017 and 2018:</p>
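<p class="wp-block-paragraph">"Heart of the network" is a visual judgment. One rough way to put a number on it is node strength: the sum of a coin's absolute correlations with every other coin. This metric is an illustrative choice and is not something <code>network_plot()</code> computes. The sketch below uses simulated returns for hypothetical coins (HUB, X1, X2, LONE); with the real data you would pass the correlation matrix computed earlier.</p>

```r
# Simulated returns: HUB tracks the common factor closely, LONE is unrelated
set.seed(11)
n <- 300
common <- rnorm(n)
returns <- data.frame(
  HUB  = common + rnorm(n, sd = 0.3),
  X1   = common + rnorm(n),
  X2   = common + rnorm(n),
  LONE = rnorm(n)
)

M <- cor(returns)
diag(M) <- NA  # ignore each coin's correlation with itself

# Node strength: sum of absolute correlations with all other coins
strength <- rowSums(abs(M), na.rm = TRUE)
sort(strength, decreasing = TRUE)
```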



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data and get correlation matrices
corr_2017 <- correlate(alt_data_sub[year(Date) == 2017][, -1])
corr_2018 <- correlate(alt_data_sub[year(Date) == 2018][, -1])
# build Network plots
corr_net_2017 <- corr_2017 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2017&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
corr_net_2018 <- corr_2018 %>%
  network_plot(colours = c(palette_light()[[2]], &quot;white&quot;, palette_light()[[4]]), legend = TRUE) +
  labs(
    title = &quot;Static Correlations of some Crypto Currencies&quot;,
    subtitle = &quot;2018&quot;
  ) +
  theme_tq() +
  theme(legend.position = &quot;bottom&quot;)
# combine network plots
cow_net_plots <- plot_grid(corr_net_2017, corr_net_2018, ncol = 2)
title <- ggdraw() + 
    draw_label(label = 'Crypto Correlation Networks',
               fontface = 'bold', size = 18)
cow_out <- plot_grid(title, cow_net_plots, ncol=1, rel_heights=c(0.1, 1))
cow_out" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data and get correlation matrices</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2017 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2017</span><span style="color: #F8F8F2">][, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #F8F8F2">corr_2018 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> correlate(alt_data_sub[year(Date) </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2018</span><span style="color: #F8F8F2">][, </span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">])</span></span>
<span class="line"><span style="color: #88846F"># build Network plots</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net_2017 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2017 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2017&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">corr_net_2018 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> corr_2018 </span><span style="color: #F92672">%>%</span></span>
<span class="line"><span style="color: #F8F8F2">  network_plot(</span><span style="color: #FD971F">colours</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(palette_light()[[</span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">]], </span><span style="color: #E6DB74">&quot;white&quot;</span><span style="color: #F8F8F2">, palette_light()[[</span><span style="color: #AE81FF">4</span><span style="color: #F8F8F2">]]), </span><span style="color: #FD971F">legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  labs(</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">title</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Static Correlations of some Crypto Currencies&quot;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">    </span><span style="color: #FD971F">subtitle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;2018&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">  ) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_tq() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;bottom&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># combine network plots</span></span>
<span class="line"><span style="color: #F8F8F2">cow_net_plots </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> plot_grid(corr_net_2017, corr_net_2018, </span><span style="color: #FD971F">ncol</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">2</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">title </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggdraw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> </span></span>
<span class="line"><span style="color: #F8F8F2">    draw_label(</span><span style="color: #FD971F">label</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;Crypto Correlation Networks&#39;</span><span style="color: #F8F8F2">,</span></span>
<span class="line"><span style="color: #F8F8F2">               </span><span style="color: #FD971F">fontface</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;bold&#39;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">18</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">cow_out </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> plot_grid(title, cow_net_plots, </span><span style="color: #FD971F">ncol</span><span style="color: #F92672">=</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">rel_heights</span><span style="color: #F92672">=</span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #AE81FF">0.1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">cow_out</span></span></code></pre></div>



<p class="wp-block-paragraph" id="058c">As can be seen, the correlation networks do change over time. This is not news, since we already saw in the previous section that the value of the correlation varies over time. (Admittedly, we only showed this for the BTC-DASH pair, but we’ll show in the next section that it holds for the rest of the coins too.)</p>
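<p class="wp-block-paragraph">Comparing two network plots by eye can be hard. A simple complement is to subtract one year's correlation matrix from the other and list which pairs changed the most. The sketch below does this in base R on simulated data for hypothetical coins A, B and C; with the real data, the inputs would be the 2017 and 2018 subsets used above.</p>

```r
# Simulated returns: in "year 1" A and B move together; in "year 2" A and C do
set.seed(5)
n <- 250
f1 <- rnorm(n)
year_1 <- data.frame(A = f1 + rnorm(n), B = f1 + rnorm(n), C = rnorm(n))
f2 <- rnorm(n)
year_2 <- data.frame(A = f2 + rnorm(n), B = rnorm(n), C = f2 + rnorm(n))

# Change in each pairwise correlation from year 1 to year 2
delta <- cor(year_2) - cor(year_1)

# One row per pair (upper triangle), largest absolute change first
upper_idx <- which(upper.tri(delta), arr.ind = TRUE)
changes <- data.frame(
  coin_1 = rownames(delta)[upper_idx[, "row"]],
  coin_2 = colnames(delta)[upper_idx[, "col"]],
  change = delta[upper_idx]
)
changes[order(-abs(changes$change)), ]
```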



<h3 class="wp-block-heading" id="eace">Daily Returns Correlations</h3>



<p class="wp-block-paragraph" id="ffb8">Let’s look at the percentage daily changes of the altcoins between 2015 and today.</p>
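<p class="wp-block-paragraph">The plotting code below relies on a <code>pct_change</code> column prepared earlier in the post, stored as a fraction (the plot multiplies it by 100). The post's own preparation step may differ, but one common way to compute it is today's close divided by yesterday's close, minus one, calculated separately for each coin pair. Here it is on a tiny made-up long-format table; the <code>close</code> column and its values are hypothetical.</p>

```r
# Hypothetical long-format prices: one row per coin pair per day
prices <- data.frame(
  pair_usdt = rep(c("BTC_USDT", "ETH_USDT"), each = 4),
  Date      = rep(as.Date("2018-01-01") + 0:3, times = 2),
  close     = c(100, 110, 99, 99, 10, 12, 9, 9.9)
)

# Make sure each pair's rows are in date order before differencing
prices <- prices[order(prices$pair_usdt, prices$Date), ]

# Daily return within each pair; the first day of each pair has no previous close
prices$pct_change <- ave(prices$close, prices$pair_usdt,
                         FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1))

prices
```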



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# plot the percent changes
p <- ggplot(alt_data[Date > ymd(&quot;2015-01-01&quot;)], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line()
p <- p + ggtitle(&quot;% Daily Returns over time&quot;) + ylab(&quot;Daily Return (%)&quot;) 
p <- p + theme_bw() + guides(col=guide_legend(title=&quot;Coin Pair&quot;))
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># plot the percent changes</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line()</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ggtitle(</span><span style="color: #E6DB74">&quot;% Daily Returns over time&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Daily Return (%)&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> guides(</span><span style="color: #FD971F">col</span><span style="color: #F92672">=</span><span style="color: #F8F8F2">guide_legend(</span><span style="color: #FD971F">title</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;Coin Pair&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="492" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg.webp" alt="" class="wp-image-4918" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-500x297.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_DF6wejykPy9QxICaxujyWg-768x456.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Percentage daily returns for some coins</figcaption></figure>
</div>


<p class="wp-block-paragraph" id="def1">Although the figure above is very cluttered, one thing is clear: percentage daily returns vary greatly for crypto. Let’s make the figure easier to read by giving each coin pair its own panel.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="p <- ggplot(alt_data[Date > ymd(&quot;2015-01-01&quot;)], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line() + facet_wrap(~ pair_usdt)
p <- p + ggtitle(&quot;Percentage Daily Returns over time&quot;) + ylab(&quot;Daily Return (%)&quot;) 
p <- p + theme_bw() + theme(legend.position=&quot;none&quot;) 
ggplotly(p)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2015-01-01&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> facet_wrap(</span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> pair_usdt)</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ggtitle(</span><span style="color: #E6DB74">&quot;Percentage Daily Returns over time&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Daily Return (%)&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> p </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">legend.position</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&quot;none&quot;</span><span style="color: #F8F8F2">) </span></span>
<span class="line"><span style="color: #F8F8F2">ggplotly(p)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="563" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA.webp" alt="" class="wp-image-4919" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-500x340.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-150x102.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_frUEGj6qn8sgyJOH87UsWA-768x522.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Percentage daily returns for some coins</figcaption></figure>
</div>


<p class="wp-block-paragraph" id="dc7b">It is somewhat surprising that Bitcoin shows the least variability in daily returns. The big spike around April 2nd, 2017 corresponds to a daily return of roughly 88% for XRP; this is the highest daily return I have seen!</p>
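<p class="wp-block-paragraph">One way to put a number on “least variability” is the standard deviation of each coin’s daily returns. The minimal base-R sketch below first spells out the daily percentage return as the day-over-day relative change of the closing price (an assumption about how <code>pct_change</code> was built, since that step is not shown here) and then ranks the pairs by volatility. The prices are simulated and the per-pair volatilities in <code>vols</code> are made-up values; on the real data the same idea would apply to <code>alt_data</code>, grouped by <code>pair_usdt</code>.</p>

```r
set.seed(42)
n_days <- 500
# Hypothetical daily volatilities per pair (illustrative values, not estimates)
vols <- c(USDT_BTC = 0.03, USDT_LTC = 0.05, USDT_XRP = 0.08)

# Simulate closing prices as geometric random walks, one pair at a time
prices <- do.call(rbind, lapply(names(vols), function(pair) {
  data.frame(
    pair_usdt = pair,
    Date      = as.Date("2016-01-01") + 0:(n_days - 1),
    close     = 100 * exp(cumsum(rnorm(n_days, sd = vols[[pair]])))
  )
}))

# Daily return r_t = (close_t - close_{t-1}) / close_{t-1}, computed within each pair;
# the first day of each pair has no previous close, so its return is NA
prices$pct_change <- ave(prices$close, prices$pair_usdt, FUN = function(p) {
  c(NA, diff(p) / head(p, -1))
})

# Standard deviation of daily returns per pair (in percent), least to most volatile
volatility <- sort(tapply(100 * prices$pct_change, prices$pair_usdt, sd, na.rm = TRUE))
round(volatility, 2)
```

<p class="wp-block-paragraph">The rows are kept in date order within each pair before <code>ave()</code> runs, because the return for a given day depends on the previous row of the same pair.</p>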



<p class="wp-block-paragraph" id="1715">Bitcoin and Litecoin seem to be highly correlated, so let’s look at their percentage daily returns more closely. I am going to zoom in on the period between 2016-02-01 and 2016-05-01.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70486307144165px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="start_date <- ymd(&quot;2016-02-01&quot;)
end_date <- ymd(&quot;2016-05-01&quot;)
p <- ggplot(alt_data[pair_usdt %like% &quot;BTC|LTC&quot; & Date > start_date & Date < end_date], aes(x = Date, y =  (100*pct_change), col = pair_usdt)) + geom_line() + theme_bw() + ylab(&quot;Daily Return (%)&quot;)
p" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">start_date </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2016-02-01&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">end_date </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2016-05-01&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> ggplot(alt_data[pair_usdt </span><span style="color: #F92672">%like%</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC|LTC&quot;</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> start_date </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> Date </span><span style="color: #F92672"><</span><span style="color: #F8F8F2"> end_date], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> Date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2">  (</span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">pct_change), </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> pair_usdt)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Daily Return (%)&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">p</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="480" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA.webp" alt="" class="wp-image-4920" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-500x290.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-150x87.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_f0yWVUQ02uKmgNS3Euz8cA-768x445.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /><figcaption class="wp-element-caption">Daily returns for Bitcoin and Litecoin, February to May 2016</figcaption></figure>
</div>


<p class="wp-block-paragraph" id="db3e">The figure shows a clear correlation between the daily returns of Bitcoin and Litecoin. It also suggests that this correlation can change over time, so let’s look at how it varies from month to month.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# these steps mirror the ones in the previous section; the only difference is that we now work with the daily percentage change in price instead of the actual price
# subset data, only keep the date, the pair, and the daily percentage change
alt_data_sub_pct <- alt_data[, .(Date, pair_usdt, pct_change)]
# convert to wide format 
alt_data_sub_pct <- spread(data = alt_data_sub_pct, key = &quot;pair_usdt&quot;, value = &quot;pct_change&quot;)
# clean column names
setnames(alt_data_sub_pct, gsub(&quot;USDT_&quot;, &quot;&quot;, colnames(alt_data_sub_pct)))
# subset the data
btc_ltc <- alt_data_sub_pct[, .(Date, BTC, LTC)]
# add a year_month column
btc_ltc[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_ltc_2 <- btc_ltc[, cor(BTC, LTC), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_ltc_2$year_month, btc_ltc_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between Daily Returns of BTC and LTC Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_ltc_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># these steps mirror the ones in the previous section; the only difference is that we now work with the daily percentage change in price instead of the actual price</span></span>
<span class="line"><span style="color: #88846F"># subset data, only keep the date, the pair, and the daily percentage change</span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub_pct </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data[, .(Date, pair_usdt, pct_change)]</span></span>
<span class="line"><span style="color: #88846F"># convert to wide format </span></span>
<span class="line"><span style="color: #F8F8F2">alt_data_sub_pct </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> spread(</span><span style="color: #FD971F">data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> alt_data_sub_pct, </span><span style="color: #FD971F">key</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pair_usdt&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">value</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;pct_change&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #88846F"># clean column names</span></span>
<span class="line"><span style="color: #F8F8F2">setnames(alt_data_sub_pct, </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USDT_&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">colnames</span><span style="color: #F8F8F2">(alt_data_sub_pct)))</span></span>
<span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub_pct[, .(Date, BTC, LTC)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_ltc[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, LTC), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between Daily Returns of BTC and LTC Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_ltc_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="519" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1.webp" alt="" class="wp-image-4922" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-500x313.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-150x94.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_YozYWrtQM_qHssvavSNHFw-1-768x481.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="62de">Interesting: the correlation between the daily percentage changes in the prices of Bitcoin and Litecoin is mostly positive, with only one month in which it is negative, and only barely. This is quite different from what we saw between Bitcoin and DASH, but that analysis used the actual prices rather than the daily returns. Let’s redo this plot, this time for the actual prices of Bitcoin and Litecoin, just as we did with DASH.</p>
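<p class="wp-block-paragraph">Why can correlations of prices and of daily returns tell different stories? Price levels trend, and two trending series can look strongly correlated (positively or negatively) even when their day-to-day moves are unrelated; daily returns strip out most of that trend. A minimal base-R simulation, assuming two independent geometric random walks as stand-ins for prices (illustrative only, not the article’s data), shows the effect:</p>

```r
set.seed(1)
n <- 1000
# Two independent geometric random walks standing in for prices (hypothetical data)
price_a <- 100 * exp(cumsum(rnorm(n, sd = 0.02)))
price_b <- 100 * exp(cumsum(rnorm(n, sd = 0.02)))

# Daily returns: relative change from one day to the next
ret_a <- diff(price_a) / head(price_a, -1)
ret_b <- diff(price_b) / head(price_b, -1)

cor_prices  <- cor(price_a, price_b)  # can be large in magnitude although the walks are independent
cor_returns <- cor(ret_a, ret_b)      # near 0, matching the independent daily moves
round(c(prices = cor_prices, returns = cor_returns), 2)
```

<p class="wp-block-paragraph">Rerunning with different seeds typically gives widely varying price correlations, while the return correlation stays close to zero.</p>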



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# subset the data
btc_ltc_price <- alt_data_sub[, .(Date, BTC, LTC)]
# add a year_month column
btc_ltc_price[, year_month := as.yearmon(Date)]
# calculate the correlation coefficient on a monthly basis
btc_ltc_price_2 <- btc_ltc_price[, cor(BTC, LTC), by = year_month]
# now plot the correlation coefficient as a function of month and year
plot(btc_ltc_price_2$year_month, btc_ltc_price_2$V1, xlab = &quot;Year-Month&quot;, main = &quot;Correlation Coeff. Between BTC and Litecoin Over time&quot;
     , ylab = &quot;Correlation Coefficient&quot;, type = &quot;b&quot;, pch = 19, col = ifelse(btc_ltc_price_2$V1 > 0, &quot;blue&quot;, &quot;red&quot;)
     , ylim = c(-1, 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># subset the data</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> alt_data_sub[, .(Date, BTC, LTC)]</span></span>
<span class="line"><span style="color: #88846F"># add a year_month column</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price[, year_month </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> as.yearmon(Date)]</span></span>
<span class="line"><span style="color: #88846F"># calculate the correlation coefficient on a monthly basis</span></span>
<span class="line"><span style="color: #F8F8F2">btc_ltc_price_2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_ltc_price[, </span><span style="color: #66D9EF">cor</span><span style="color: #F8F8F2">(BTC, LTC), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> year_month]</span></span>
<span class="line"><span style="color: #88846F"># now plot the correlation coefficient as a function of month and year</span></span>
<span class="line"><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">year_month, btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Year-Month&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">main</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coeff. Between BTC and Litecoin Over time&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Correlation Coefficient&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">type</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;b&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">pch</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">19</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(btc_ltc_price_2</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">V1 </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;blue&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;red&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">     , </span><span style="color: #FD971F">ylim</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #F92672">-</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">, </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>



<p class="wp-block-paragraph" id="494c">Trends in the monthly correlations of the daily returns of Bitcoin and Litecoin (boy, this is a mouthful) are very similar to those for the prices, as we saw in the previous figure.</p>



<p class="wp-block-paragraph" id="2a55">In the next post we’ll do something more statistically sound: rolling correlations.</p>
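<p class="wp-block-paragraph">As a preview of the idea, and not necessarily the approach the next post takes, a rolling correlation recomputes the coefficient over a sliding window of the most recent days instead of fixed calendar months. Below is a minimal base-R sketch; the helper <code>rolling_cor()</code>, the 30-day window, and the simulated returns are all hypothetical.</p>

```r
# Hypothetical helper: Pearson correlation over a trailing window of `width` days.
# Positions before the first full window, or windows with fewer than 3 complete
# pairs, are left as NA.
rolling_cor <- function(x, y, width) {
  stopifnot(length(x) == length(y), width >= 3)
  out <- rep(NA_real_, length(x))
  if (length(x) < width) return(out)
  for (i in width:length(x)) {
    idx <- (i - width + 1):i
    ok  <- complete.cases(x[idx], y[idx])
    if (sum(ok) >= 3) out[i] <- cor(x[idx][ok], y[idx][ok])
  }
  out
}

# Usage with simulated daily returns that are correlated by construction
set.seed(7)
btc <- rnorm(200, sd = 0.04)
ltc <- 0.7 * btc + rnorm(200, sd = 0.03)
rc  <- rolling_cor(btc, ltc, width = 30)
summary(rc)
```

<p class="wp-block-paragraph">Unlike the monthly approach, every window has the same number of observations, so changes in the coefficient are not driven by months of different lengths.</p>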



<p class="wp-block-paragraph">Read more posts on the AnalyticaDSS blog: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p class="wp-block-paragraph">Read more posts on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>



<p class="wp-block-paragraph">Read more posts on R-bloggers: <a href="https://www.r-bloggers.com/">https://www.r-bloggers.com</a></p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-2/">Analyzing Crypto Market using R — Part 2</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Analyzing Crypto Markets using R — Part 1</title>
		<link>https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/</link>
		
		<dc:creator><![CDATA[Aous Abdo]]></dc:creator>
		<pubDate>Sat, 01 Dec 2018 10:19:38 +0000</pubDate>
				<category><![CDATA[Data Science]]></category>
		<category><![CDATA[R Statistical Language]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Cryptocurrency]]></category>
		<category><![CDATA[R]]></category>
		<guid isPermaLink="false">https://analyticadss.com/?p=4897</guid>

					<description><![CDATA[<p>Downloading and Processing Crypto Data with R Analyzing crypto market Aous Abdo, WWW.ANALYTICADSS.COM An interactive version of this post can be found here. No doubt that crypto currencies with all the promises they bring, both financially and otherwise, are only here to stay. As a data scientist interested in data and numbers, I thought it would be nice [&#8230;]</p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/">Analyzing Crypto Markets using R — Part 1</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></description>
										<content:encoded><![CDATA[
<h2 class="wp-block-heading" id="e5da">Downloading and Processing Crypto Data with R</h2>



<p class="wp-block-paragraph">Analyzing the crypto market</p>



<p class="wp-block-paragraph"><a href="https://medium.com/u/4f20dbfad286?source=post_page-----9e0d1bff7c63--------------------------------" rel="noreferrer noopener" target="_blank">Aous Abdo</a>, <a href="http://www.analyticadss.com/" rel="noreferrer noopener" target="_blank">WWW.ANALYTICADSS.COM</a><br>An interactive version of this post can be found <a href="https://analyticadss.com/adss_blog/crypto_notebook_part1.nb.html" rel="noreferrer noopener" target="_blank">here</a>.</p>



<p class="wp-block-paragraph" id="36bb">There is little doubt that cryptocurrencies, with all the promise they bring, financially and otherwise, are here to stay. As a data scientist interested in data and numbers, I thought it would be nice to take a look at some cryptocurrencies with my favorite tool, <a href="https://cran.r-project.org/" target="_blank" rel="noreferrer noopener"><strong>R</strong></a>.</p>



<h2 class="wp-block-heading" id="65a0">R Libraries</h2>



<p class="wp-block-paragraph" id="33d5">Below is a list of the <strong>R</strong> libraries we will use in our analysis. Not all of them are strictly necessary, but each one makes our life easier.</p>
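<p class="wp-block-paragraph">Before running the <code>library()</code> calls that follow, you may want to see which of these packages are not yet installed. A minimal base-R sketch: the <code>pkgs</code> vector simply mirrors that list, and package availability on CRAN may have changed since this post was written.</p>

```r
# Packages loaded in this post (mirrors the library() calls that follow)
pkgs <- c("PoloniexR", "data.table", "lubridate", "Quandl", "plyr", "stringr",
          "ggplot2", "plotly", "janitor", "quantmod", "pryr", "corrplot",
          "PerformanceAnalytics", "tidyr", "MLmetrics", "readr")

# requireNamespace() returns TRUE if a package can be loaded, without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Uncomment to install whatever is missing:
# install.packages(missing_pkgs)
```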



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395828247070312px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="library(PoloniexR)
library(data.table)
library(lubridate)
library(Quandl)
library(plyr)
library(stringr)
library(ggplot2)
library(plotly)
library(janitor)
library(quantmod)
library(pryr)
library(corrplot)
library(PerformanceAnalytics)
library(tidyr)
library(MLmetrics)
library(readr)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PoloniexR)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(data.table)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(lubridate)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(Quandl)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(stringr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(ggplot2)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(plotly)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(janitor)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(quantmod)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(pryr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(corrplot)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(PerformanceAnalytics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(tidyr)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(MLmetrics)</span></span>
<span class="line"><span style="color: #66D9EF">library</span><span style="color: #F8F8F2">(readr)</span></span></code></pre></div>
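<p class="wp-block-paragraph">Not every reader will have all sixteen packages installed. As a small convenience that is not part of the original analysis, the sketch below reports which of them cannot be loaded in the current R session. The helper find_missing_packages() is a hypothetical name. It relies only on base R's requireNamespace(), and the install.packages() call is left commented out so nothing gets installed automatically.</p>

```r
# Packages loaded in this post
pkgs <- c("PoloniexR", "data.table", "lubridate", "Quandl", "plyr", "stringr",
          "ggplot2", "plotly", "janitor", "quantmod", "pryr", "corrplot",
          "PerformanceAnalytics", "tidyr", "MLmetrics", "readr")

# Hypothetical helper: return the package names whose namespace cannot be loaded
find_missing_packages <- function(packages) {
  # requireNamespace() returns FALSE (without an error) when a package is unavailable
  available <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  packages[!available]
}

not_installed <- find_missing_packages(pkgs)
if (length(not_installed) > 0) {
  message("Not installed: ", paste(not_installed, collapse = ", "))
  # install.packages(not_installed)  # uncomment to install from CRAN
}
```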



<h2 class="wp-block-heading" id="f526">Getting the Data</h2>



<h2 class="wp-block-heading" id="2c1e">1. PoloniexR Package</h2>



<p class="wp-block-paragraph" id="69f7">The easiest way to get current and historical data for <strong>crypto </strong>currencies is the <strong><a href="https://cran.r-project.org/web/packages/PoloniexR/index.html" target="_blank" rel="noreferrer noopener">PoloniexR</a> </strong>package developed by <em>Vermeir Jellen</em>. <em>Vermeir Jellen </em>gives a good tutorial on how to get started with his package <a href="https://github.com/VermeirJellen/PoloniexR" target="_blank" rel="noreferrer noopener"><strong>here</strong></a>. The <a href="https://poloniex.com/exchange" target="_blank" rel="noreferrer noopener"><strong>Poloniex exchange</strong></a> lists many coins, but not all of them. For coins missing from Poloniex, one can scrape the <a href="http://rstudio-pubs-static.s3.amazonaws.com/www.coinmarketcap.com" target="_blank" rel="noreferrer noopener">coinmarketcap</a> page; an example is given here.</p>



<h2 class="wp-block-heading" id="34de">2. Quandl</h2>



<p class="wp-block-paragraph" id="11e7"><strong><a href="https://www.quandl.com/" target="_blank" rel="noreferrer noopener">Quandl</a> </strong>is my go-to place for any financial data. Their free-tier API has lots of good data one can use, and <strong>Quandl </strong>offers data from multiple exchanges. Locating crypto data on <strong>Quandl </strong>is not straightforward. After spending a few hours on their site, I found that most of the crypto data can be found <a href="https://www.quandl.com/data/BITFINEX-Bitfinex" target="_blank" rel="noreferrer noopener">here</a>.</p>



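<p class="wp-block-paragraph" id="5a62">First, let’s compare Bitcoin data from different exchanges using <strong>Quandl</strong>. We will download and plot historical Bitcoin prices from the following exchanges: <strong>Kraken</strong>, <strong>Coinbase</strong>, <strong>Bitstamp</strong>, and <strong>ITBIT</strong>. These per-exchange series come from Quandl’s <strong>BCHARTS</strong> database; the code also defines a helper, get_quandl_data(), for pulling pairs from the Bitfinex database linked above.</p>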



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395835876464844px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# enable your Quandl API key
# trimws() drops any trailing newline or spaces from the key file
my_quandl_api_key <- trimws(read_file(&quot;../../quandl_api_key.txt&quot;))
Quandl.api_key(my_quandl_api_key)
# function to download quandl data
get_quandl_data <- function(data_source = &quot;BITFINEX&quot;
                            , pair = 'btcusd'
                            , ...){
  
  # make sure the user supplied the correct data_source
  if(toupper(data_source) != &quot;BITFINEX&quot;) stop(&quot;data source supplied is wrong...&quot;)
  # quandl is case sensitive, all codes have to be upper case
  pair <- toupper(pair)
  tmp <- NA
  try(tmp <- Quandl(code = toupper(paste(data_source, pair, sep = &quot;/&quot;)), ...), silent = TRUE)
  return(tmp)
}
# get btc data from different exchanges
exchange_data <- list()

exchanges <- c('KRAKENUSD','COINBASEUSD','BITSTAMPUSD','ITBITUSD')

for (i in exchanges){
  exchange_data[[i]] <- Quandl(paste0('BCHARTS/', i))
}" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># enable your Quandl API key</span></span>
<span class="line"><span style="color: #88846F"># trimws() drops any trailing newline or spaces from the key file</span></span>
<span class="line"><span style="color: #F8F8F2">my_quandl_api_key </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">trimws</span><span style="color: #F8F8F2">(read_file(</span><span style="color: #E6DB74">&quot;../../quandl_api_key.txt&quot;</span><span style="color: #F8F8F2">))</span></span>
<span class="line"><span style="color: #F8F8F2">Quandl.api_key(my_quandl_api_key)</span></span>
<span class="line"><span style="color: #88846F"># function to download quandl data</span></span>
<span class="line"><span style="color: #A6E22E">get_quandl_data</span><span style="color: #F8F8F2"> </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">function</span><span style="color: #F8F8F2">(</span><span style="color: #FD971F">data_source</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BITFINEX&quot;</span></span>
<span class="line"><span style="color: #F8F8F2">                            , </span><span style="color: #FD971F">pair</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&#39;btcusd&#39;</span></span>
<span class="line"><span style="color: #F8F8F2">                            , </span><span style="color: #F92672">...</span><span style="color: #F8F8F2">){</span></span>
<span class="line"><span style="color: #F8F8F2">  </span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># make sure the user supplied the correct data_source</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">if</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(data_source) </span><span style="color: #F92672">!=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BITFINEX&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #66D9EF">stop</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;data source supplied is wrong...&quot;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #88846F"># quandl is case sensitive, all codes have to be upper case</span></span>
<span class="line"><span style="color: #F8F8F2">  pair </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(pair)</span></span>
<span class="line"><span style="color: #F8F8F2">  tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">NA</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #66D9EF">try</span><span style="color: #F8F8F2">(tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> Quandl(</span><span style="color: #FD971F">code</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">toupper</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(data_source, pair, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;/&quot;</span><span style="color: #F8F8F2">)), </span><span style="color: #F92672">...</span><span style="color: #F8F8F2">), </span><span style="color: #FD971F">silent</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">TRUE</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">  </span><span style="color: #F92672">return</span><span style="color: #F8F8F2">(tmp)</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span>
<span class="line"><span style="color: #88846F"># get btc data from different exchanges</span></span>
<span class="line"><span style="color: #F8F8F2">exchange_data </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">list</span><span style="color: #F8F8F2">()</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F8F8F2">exchanges </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">c</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;KRAKENUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;COINBASEUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;BITSTAMPUSD&#39;</span><span style="color: #F8F8F2">,</span><span style="color: #E6DB74">&#39;ITBITUSD&#39;</span><span style="color: #F8F8F2">)</span></span>
<span class="line"></span>
<span class="line"><span style="color: #F92672">for</span><span style="color: #F8F8F2"> (i </span><span style="color: #F92672">in</span><span style="color: #F8F8F2"> exchanges){</span></span>
<span class="line"><span style="color: #F8F8F2">  exchange_data[[i]] </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> Quandl(</span><span style="color: #66D9EF">paste0</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&#39;BCHARTS/&#39;</span><span style="color: #F8F8F2">, i))</span></span>
<span class="line"><span style="color: #F8F8F2">}</span></span></code></pre></div>
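<p class="wp-block-paragraph">The loop above stops at the first exchange whose download fails, whereas get_quandl_data() already guards its call with try(). Below is a minimal sketch of the same idea applied to the loop, assuming you would rather skip a failed series with a warning than abort. download_all() is a hypothetical helper that takes the exchange codes and any download function, so it can be tried without network access or an API key.</p>

```r
# Hypothetical helper: fetch every code, skipping (with a warning) any that fail
download_all <- function(codes, fetch) {
  results <- lapply(codes, function(code) {
    tryCatch(
      fetch(code),
      error = function(e) {
        warning("Download failed for ", code, ": ", conditionMessage(e), call. = FALSE)
        NULL
      }
    )
  })
  names(results) <- codes
  # keep only the successful downloads
  Filter(Negate(is.null), results)
}

# With the post's setup (needs a Quandl API key and network access):
# exchange_data <- download_all(exchanges, function(code) Quandl(paste0("BCHARTS/", code)))
```

<p class="wp-block-paragraph">Naming the results by the bare exchange codes (KRAKENUSD rather than BCHARTS/KRAKENUSD) keeps the row-name parsing in the following steps working unchanged.</p>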



<p class="wp-block-paragraph" id="935a">Next, we need to combine this list of BTC prices from different exchanges into a single <strong>data frame</strong> so we can plot them together.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.7048797607421875px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# put them all in one dataframe to plot in ggplot2
btc_usd <- do.call(&quot;rbind&quot;, exchange_data)
btc_usd$exchange <- row.names(btc_usd)
btc_usd <- as.data.table(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># put them all in one dataframe to plot in ggplot2</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">do.call</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;rbind&quot;</span><span style="color: #F8F8F2">, exchange_data)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd</span><span style="color: #F92672">$</span><span style="color: #F8F8F2">exchange </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">row.names</span><span style="color: #F8F8F2">(btc_usd)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> as.data.table(btc_usd)</span></span></code></pre></div>
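<p class="wp-block-paragraph">To see why the exchange name can be recovered from the row names, here is a toy example with made-up prices: calling do.call("rbind", ...) on a named list of data frames prefixes each row name with the list element's name. The second half sketches an alternative that tags each data frame explicitly using base R's Map(), so nothing depends on row names; data.table's rbindlist(exchange_data, idcol = "exchange") achieves the same in one call.</p>

```r
# Toy stand-ins for two Quandl downloads (made-up prices, for illustration only)
exchange_data <- list(
  KRAKENUSD   = data.frame(date = as.Date(c("2018-01-01", "2018-01-02")),
                           weighted_price = c(13500, 14800)),
  COINBASEUSD = data.frame(date = as.Date(c("2018-01-01", "2018-01-02")),
                           weighted_price = c(13600, 14900))
)

# rbind() on a named list prefixes each row name with the element name,
# e.g. "KRAKENUSD.1", "KRAKENUSD.2", ...
stacked <- do.call("rbind", exchange_data)
print(rownames(stacked))

# Strip the ".<row number>" suffix, like str_extract(exchange, "[A-Z]+") in the post
stacked$exchange <- sub("\\.[0-9]+$", "", rownames(stacked))

# Alternative that does not rely on row names: tag each data frame explicitly
tagged <- Map(function(df, name) {
  df$exchange <- name
  df
}, exchange_data, names(exchange_data))
btc_usd <- do.call("rbind", unname(tagged))
print(btc_usd)
```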



<p class="wp-block-paragraph">We also need to do some minor cleaning: extract the exchange name from the row names, standardize the column names, and drop rows with a weighted price of 0.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# some data cleaning
btc_usd[, exchange := as.factor(str_extract(exchange, &quot;[A-Z]+&quot;))]
btc_usd <- clean_names(btc_usd)
btc_usd <- btc_usd[weighted_price > 0]
# set datatable key to be the date column
setkey(btc_usd, date)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># some data cleaning</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">as.factor</span><span style="color: #F8F8F2">(str_extract(exchange, </span><span style="color: #E6DB74">&quot;[A-Z]+&quot;</span><span style="color: #F8F8F2">))]</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> clean_names(btc_usd)</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> btc_usd[weighted_price </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">]</span></span>
<span class="line"><span style="color: #88846F"># set datatable key to be the date column</span></span>
<span class="line"><span style="color: #F8F8F2">setkey(btc_usd, date)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="4b66">Let’s take a look at the data table we just made.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(btc_usd)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="245" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg.webp" alt="" class="wp-image-4898" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-500x148.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-150x44.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_2OI9hXg9BGzha6OQ-KkIsg-768x227.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="ed3e">The data includes 10 columns: the date, the open, high, low, and close (<strong>OHLC</strong>) prices, the volumes in USD and BTC, the weighted price, and the exchange. I wish I had bought some <em>bitcoin </em>back in 2011!!!</p>



<p class="wp-block-paragraph" id="5c06">Now let’s plot the price of bitcoin, color-coded by exchange.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="ggplot(btc_usd, aes(x = date, y = weighted_price, col = exchange)) + geom_line() + theme_bw()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">ggplot(btc_usd, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> weighted_price, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> exchange)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw()</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="497" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA.webp" alt="" class="wp-image-4899" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-500x300.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_pKZF3UsW8ntNBCpd7RhgTA-768x461.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<h2 class="wp-block-heading" id="7d74">Arbitrage</h2>



<p class="wp-block-paragraph" id="0a27">It appears that the prices of <strong>btc </strong>on different exchanges are fairly consistent. However, this is an artifact of the figure: the price axis spans several orders of magnitude over the timeline we selected, so relatively small differences between exchanges are invisible. To better see any price differences we need to zoom in. Let’s zoom in on, say, the first month of <strong>2018</strong>, a period close to the <strong>ATH </strong>(all-time high) of most coins. This will let us see any differences in prices more clearly.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="ggplot(btc_usd[date >= ymd(&quot;2018-01-01&quot;) & date <= ymd(&quot;2018-01-31&quot;)], aes(x = date, y = weighted_price, col = exchange)) + geom_line() + theme_bw()" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">ggplot(btc_usd[date </span><span style="color: #F92672">>=</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2018-01-01&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">&</span><span style="color: #F8F8F2"> date </span><span style="color: #F92672"><=</span><span style="color: #F8F8F2"> ymd(</span><span style="color: #E6DB74">&quot;2018-01-31&quot;</span><span style="color: #F8F8F2">)], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> weighted_price, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> exchange)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_line() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw()</span></span></code></pre></div>



<p class="wp-block-paragraph">There are obvious differences in price between the exchanges, and these differences vary over time as well. It will be interesting to look at the maximum price difference as a function of time, so let’s compute it.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# first let's find the minimum price by date
btc_usd[, min_price := min(weighted_price), by = date]
# now find how far the price on each exchange sits above the cheapest exchange for that day
# since the price of bitcoin varies a lot over the time period under study, we normalize the price difference
# to do that we divide by the median price across exchanges for each day and express it as a percentage
btc_usd[, price_diff := 100*(weighted_price - min_price)/median(weighted_price), by = date]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># first let&#39;s find the minimum price by date</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, min_price </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">min</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># now find how far the price on each exchange sits above the cheapest exchange for that day</span></span>
<span class="line"><span style="color: #88846F"># since the price of bitcoin varies a lot over the time period under study, we normalize the price difference</span></span>
<span class="line"><span style="color: #88846F"># to do that we divide by the median price across exchanges for each day and express it as a percentage</span></span>
<span class="line"><span style="color: #F8F8F2">btc_usd[, price_diff </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">100</span><span style="color: #F92672">*</span><span style="color: #F8F8F2">(weighted_price </span><span style="color: #F92672">-</span><span style="color: #F8F8F2"> min_price)</span><span style="color: #F92672">/</span><span style="color: #66D9EF">median</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span></code></pre></div>
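<p class="wp-block-paragraph">A small worked example may help with the formula. Within each day, price_diff = 100 * (weighted_price - min_price) / median(weighted_price), where the minimum and the median are taken across exchanges. The sketch below applies it with base R's ave() and aggregate() to made-up prices for three hypothetical exchanges, then keeps the largest gap per day, which is the quantity we are after. On the first day, for instance, exchange C's premium is 100 * (105 - 100) / 102, about 4.9%. With data.table, the last step would be btc_usd[, .(max_diff = max(price_diff)), by = date].</p>

```r
# Made-up weighted prices for three hypothetical exchanges on two days
prices <- data.frame(
  date = as.Date(rep(c("2018-01-01", "2018-01-02"), each = 3)),
  exchange = rep(c("A", "B", "C"), times = 2),
  weighted_price = c(100, 102, 105, 200, 200, 210)
)

# Lowest and median price across exchanges for each day
prices$min_price <- ave(prices$weighted_price, prices$date, FUN = min)
day_median <- ave(prices$weighted_price, prices$date, FUN = median)

# Premium of each exchange over the cheapest one, as a % of the day's median price
prices$price_diff <- 100 * (prices$weighted_price - prices$min_price) / day_median

# Largest cross-exchange gap per day
max_gap <- aggregate(price_diff ~ date, data = prices, FUN = max)
print(max_gap)
```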



<p class="wp-block-paragraph">Now we have a new column, <strong>price_diff</strong>, giving each exchange’s price difference from that day’s minimum price, expressed as a percentage of the day’s median price. Let’s take a look at the new table.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tail(btc_usd)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">tail</span><span style="color: #F8F8F2">(btc_usd)</span></span></code></pre></div>



<p class="wp-block-paragraph" id="574b">I looked at the most recent dates because prior to <strong>2014 </strong>we only have data for one exchange, so all the price differences before then are <strong>0</strong>. Let’s look at the price differences as a function of time.</p>
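<p class="wp-block-paragraph">A quick way to check this is to count how many exchanges report on each date, since a day with a single exchange can only produce a difference of 0. The sketch below uses a made-up toy table; on the real data the equivalent call would presumably be <code>btc_usd[, .(n_exchanges = uniqueN(exchange)), by = date]</code>, assuming the <code>exchange</code> column that the post uses further down.</p>

```r
library(data.table)

# Made-up toy data: only one exchange reports on the first date
toy <- data.table(
  date           = as.Date(c("2013-06-01", "2014-06-01", "2014-06-01")),
  exchange       = c("A", "A", "B"),
  weighted_price = c(100, 600, 606)
)
toy[, price_diff := 100 * (weighted_price - min(weighted_price)) / median(weighted_price),
    by = date]

# Exchanges per date next to the largest difference of that date
per_day <- toy[, .(n_exchanges = uniqueN(exchange), max_diff = max(price_diff)), by = date]
per_day
```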



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# work on a copy (:= modifies in place) and set price_diff to each day's maximum
tmp <- copy(btc_usd)[, price_diff := max(price_diff), by = date]
# This will help us visualize overlapping points
MyGray <- rgb(t(col2rgb(&quot;black&quot;)), alpha=50, maxColorValue=255)
tmp[, plot(date, price_diff, pch=20, col = MyGray, xlab = &quot;Date&quot;, ylab = &quot;Maximum of Price Difference (%)&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># work on a copy (:= modifies in place) and set price_diff to each day&#39;s maximum</span></span>
<span class="line"><span style="color: #F8F8F2">tmp </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> copy(btc_usd)[, price_diff </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(price_diff), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># This will help us visualize overlapping points</span></span>
<span class="line"><span style="color: #F8F8F2">MyGray </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">rgb</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">t</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">col2rgb</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;black&quot;</span><span style="color: #F8F8F2">)), </span><span style="color: #FD971F">alpha</span><span style="color: #F92672">=</span><span style="color: #AE81FF">50</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">maxColorValue</span><span style="color: #F92672">=</span><span style="color: #AE81FF">255</span><span style="color: #F8F8F2">)</span></span>
<span class="line"><span style="color: #F8F8F2">tmp[, </span><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(date, price_diff, </span><span style="color: #FD971F">pch</span><span style="color: #F92672">=</span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> MyGray, </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="450" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw.webp" alt="" class="wp-image-4900" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-500x272.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-150x82.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_OCmdv1nAXyXQXtM-V_p4rw-768x417.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
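<p class="wp-block-paragraph">One subtlety in this step: in data.table, <code>:=</code> updates a table by reference, so assigning its result with <code>&lt;-</code> only gives a second name to the same table and overwrites the original column. The toy sketch below (made-up values) shows the aliasing, the <code>copy()</code> alternative that keeps one row per exchange, and a plain aggregation that builds a genuinely new table with one row per day.</p>

```r
library(data.table)

# Made-up per-exchange differences for two days
make_toy <- function() {
  data.table(
    date       = as.Date(c("2017-01-01", "2017-01-01", "2017-01-02", "2017-01-02")),
    exchange   = c("A", "B", "A", "B"),
    price_diff = c(0, 1.5, 0.8, 0)
  )
}

# 1) Assigning the result of := only adds a second name for the same table
toy <- make_toy()
tmp_alias <- toy[, price_diff := max(price_diff), by = date]
toy$price_diff    # 1.5 1.5 0.8 0.8: the per-exchange values are overwritten

# 2) copy() gives an independent table with the same rows
toy <- make_toy()
tmp_copy <- copy(toy)[, price_diff := max(price_diff), by = date]
toy$price_diff    # 0.0 1.5 0.8 0.0: the original is untouched

# 3) Aggregating without := builds a new table with one row per day
daily <- toy[, .(price_diff = max(price_diff)), by = date]
daily
```

<p class="wp-block-paragraph">The copy keeps the <code>exchange</code> and <code>weighted_price</code> columns that later steps rely on, while the aggregated table is the option to reach for when only one point per day should be plotted.</p>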


<p class="wp-block-paragraph" id="8ee2">Let’s redraw the plot with a log scale on the <strong>y axis</strong>, discarding dates with zero price difference, since the logarithm of zero is undefined.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704833984375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tmp[price_diff > 0, plot(date, price_diff, pch=20, col = MyGray, log = &quot;y&quot; , xlab = &quot;Date&quot;, ylab = &quot;Maximum of Price Difference (%)&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">, </span><span style="color: #66D9EF">plot</span><span style="color: #F8F8F2">(date, price_diff, 
</span><span style="color: #FD971F">pch</span><span style="color: #F92672">=</span><span style="color: #AE81FF">20</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">col</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> MyGray, </span><span style="color: #FD971F">log</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;y&quot;</span><span style="color: #F8F8F2"> , </span><span style="color: #FD971F">xlab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">ylab</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="464" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg.webp" alt="" class="wp-image-4901" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-500x280.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-150x84.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_hW8T7SyWYrpL4FWYN7Y5sg-768x430.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph">As the figure above shows, the maximum difference in bitcoin prices between exchanges mostly falls in the <strong>0.5–2.0%</strong> range. Interestingly, the differences seem to have come down between 2014 and 2016, and then to go up again starting in <strong>2017</strong>. Let’s fit a generalized additive model (GAM) smoother to see what we get.</p>
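<p class="wp-block-paragraph">With <code>method = "gam"</code>, <code>geom_smooth()</code> fits the model with <code>mgcv::gam()</code>, and because ggplot2 applies scale transformations before computing stats, the smooth in the next plot is fitted to <code>log10(price_diff)</code>. The sketch below fits a comparable model directly with mgcv so the fitted trend can be inspected on its own. The data are synthetic (a made-up slow oscillation plus noise), not the bitcoin data, and ggplot2 may pass its own fitting defaults, so treat this as an approximation of what the plot shows.</p>

```r
library(mgcv)

# Synthetic stand-in (made up) for the positive daily maxima of price_diff
set.seed(42)
dates <- seq(as.Date("2014-01-01"), as.Date("2018-01-01"), by = "day")
x <- as.numeric(dates)   # days since 1970-01-01, the numeric form of a Date axis
price_diff <- 10^(0.2 * sin(x / 300) + rnorm(length(x), sd = 0.3))

# Cubic regression spline with shrinkage (bs = "cs"), fitted on the log10 scale
# to mirror scale_y_continuous(trans = 'log10') + geom_smooth(method = "gam")
fit <- gam(log10(price_diff) ~ s(x, bs = "cs"))

# Fitted trend converted back to percentages
trend_pct <- 10^predict(fit)
summary(fit)
```

<p class="wp-block-paragraph">On the post’s data the inputs would come from <code>tmp[price_diff &gt; 0]</code>, with <code>x = as.numeric(date)</code>; <code>plot(fit)</code> then draws the estimated smooth together with its confidence band.</p>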



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# we'll use ggplot and fit a gam smooth line
ggplot(tmp[price_diff > 0 ], aes(x = date, y = price_diff)) + geom_point(alpha = 0.2, shape = 16, size = 3, show.legend = FALSE) + scale_y_continuous(trans='log10') + geom_smooth(method = &quot;gam&quot;, formula = y ~ s(x, bs = &quot;cs&quot;)) + theme_bw() + xlab(&quot;Date&quot;) + ylab(&quot;Maximum of Price Difference (%)&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># we&#39;ll use ggplot and fit a gam smooth line</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2"> ], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> price_diff)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_point(</span><span style="color: #FD971F">alpha</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0.2</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">shape</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">16</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">size</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">3</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">show.legend</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">FALSE</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> scale_y_continuous(</span><span style="color: #FD971F">trans</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&#39;log10&#39;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_smooth(</span><span style="color: #FD971F">method</span><span style="color: #F8F8F2"> </span><span style="color: 
#F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;gam&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">formula</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> y </span><span style="color: #F92672">~</span><span style="color: #F8F8F2"> s(x, </span><span style="color: #FD971F">bs</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;cs&quot;</span><span style="color: #F8F8F2">)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> xlab(</span><span style="color: #E6DB74">&quot;Date&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="492" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA.webp" alt="" class="wp-image-4902" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-500x297.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-150x89.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_6-thiHBpH1oNCkiqPmFugA-768x456.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph">The regression line hints at an overall downtrend from <strong>2014 </strong>to mid <strong>2016</strong>, apart from an uptrend lasting a few months in late <strong>2015</strong>. The trend seems to turn upward in mid to late <strong>2017</strong>, and the price differences start moving downward again in December 2017. This can be seen more clearly in the <em>box-plot</em> figure below.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.703125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="tmp[, month_year := format(as.Date(date), &quot;%Y-%m&quot;)]
ggplot(tmp[price_diff > 0], aes(x = month_year, y = price_diff)) + geom_boxplot() + scale_y_continuous(trans='log10') + xlab(&quot;Date (Year-Month)&quot;) + ylab(&quot;Maximum of Price Difference (%)&quot;) + theme_bw() + theme(axis.text.x = element_text(angle = 90, hjust = 1))" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #F8F8F2">tmp[, month_year </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">format</span><span style="color: #F8F8F2">(</span><span style="color: #66D9EF">as.Date</span><span style="color: #F8F8F2">(date), </span><span style="color: #E6DB74">&quot;%Y-%m&quot;</span><span style="color: #F8F8F2">)]</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2">], aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> month_year, </span><span style="color: #FD971F">y</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> price_diff)) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> geom_boxplot() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> scale_y_continuous(</span><span style="color: #FD971F">trans</span><span style="color: #F92672">=</span><span style="color: #E6DB74">&#39;log10&#39;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> xlab(</span><span style="color: #E6DB74">&quot;Date (Year-Month)&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> ylab(</span><span style="color: #E6DB74">&quot;Maximum of Price Difference (%)&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme_bw() </span><span style="color: #F92672">+</span><span style="color: #F8F8F2"> theme(</span><span style="color: #FD971F">axis.text.x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_text(</span><span style="color: #FD971F">angle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">90</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">hjust</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> 
</span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">))</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="499" loading="lazy" src="https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw.webp" alt="" class="wp-image-4903" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-500x301.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-150x90.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_KKc4nGtHQg6RBCxzoKkaMw-768x463.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>


<p class="wp-block-paragraph" id="60f8">The <em>box-plot</em> figure above shows how the maximum price differences vary over time. On the <strong>x-axis</strong> I grouped dates by month, since any period shorter than a month results in a congested figure.</p>
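<p class="wp-block-paragraph">The monthly grouping behind the box-plot can be checked numerically. In this sketch on made-up daily maxima, <code>format(date, "%Y-%m")</code> builds the same month label used above, and a grouped summary returns the number of observations and the median line drawn inside each box.</p>

```r
library(data.table)

# Made-up daily maxima spanning two months
toy <- data.table(
  date       = as.Date(c("2017-11-28", "2017-11-30", "2017-12-01", "2017-12-15")),
  price_diff = c(0.5, 1.5, 2.0, 4.0)
)

# Same month label as above (date is already a Date here, so as.Date() is not needed)
toy[, month_year := format(date, "%Y-%m")]

# One row per box: number of days and the median line drawn inside each box
monthly <- toy[, .(n_days = .N, median_diff = median(price_diff)), by = month_year]
monthly
```

<p class="wp-block-paragraph">Note that in the post <code>tmp</code> still holds one row per exchange on each date, so every day enters its month’s box once per reporting exchange; something like <code>unique(tmp[price_diff &gt; 0, .(date, price_diff, month_year)])</code> would count each day only once.</p>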



<p class="wp-block-paragraph" id="f970"><strong>Okay</strong>, now let’s find out which exchanges contribute the most to these price differences. That is, we are trying to determine which exchanges consistently sell bitcoin higher, or lower, than the rest of the exchanges. We first need to build a table of the daily maximum and minimum prices and the exchanges they come from, as below, and then we’ll make a bar plot showing the leading exchanges in each category.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:15.395843505859375px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# add two columns to our data table which will contain the minimum and maximum prices
tmp[, `:=`(day_max = max(weighted_price), day_min = min(weighted_price)), by = date]
# now put only the columns we care about in a new data.table
tmp2 <- tmp[price_diff > 0 , .(date, exchange, weighted_price, day_min, day_max)]
# notice how we excluded days with no price difference
# now we only want to keep the rows with the maximum and minimum daily prices
tmp2 <- tmp2[weighted_price == day_min | weighted_price == day_max]
# now we'll add a new column designating the price as being the minimum or maximum
tmp2[, max_min := ifelse(weighted_price == day_min, &quot;min&quot;, &quot;max&quot;)]
# clean the name of the exchange
tmp2[, exchange := gsub(&quot;USD&quot;, &quot;&quot;, exchange)]
# now we'll add a new column containing the exchange name and the min_max column
tmp2[, max_min_exchange := paste(max_min, exchange, sep = &quot;-&quot;)]" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># add two columns to our data table which will contain the minimum and maximum prices</span></span>
<span class="line"><span style="color: #F8F8F2">tmp[, `:=`(</span><span style="color: #FD971F">day_max</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">max</span><span style="color: #F8F8F2">(weighted_price), </span><span style="color: #FD971F">day_min</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">min</span><span style="color: #F8F8F2">(weighted_price)), </span><span style="color: #FD971F">by</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> date]</span></span>
<span class="line"><span style="color: #88846F"># now put only the columns we care about in a new data.table</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> tmp[price_diff </span><span style="color: #F92672">></span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">0</span><span style="color: #F8F8F2"> , .(date, exchange, weighted_price, day_min, day_max)]</span></span>
<span class="line"><span style="color: #88846F"># notice how we excluded days with no price difference</span></span>
<span class="line"><span style="color: #88846F"># now we only want to keep the rows with the maximum and minimum daily prices</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2 </span><span style="color: #F92672"><-</span><span style="color: #F8F8F2"> tmp2[weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_min </span><span style="color: #F92672">|</span><span style="color: #F8F8F2"> weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_max]</span></span>
<span class="line"><span style="color: #88846F"># now we&#39;ll add a new column designating the price as being the minimum or maximum</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, max_min </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">ifelse</span><span style="color: #F8F8F2">(weighted_price </span><span style="color: #F92672">==</span><span style="color: #F8F8F2"> day_min, </span><span style="color: #E6DB74">&quot;min&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;max&quot;</span><span style="color: #F8F8F2">)]</span></span>
<span class="line"><span style="color: #88846F"># clean the name of the exchange</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">gsub</span><span style="color: #F8F8F2">(</span><span style="color: #E6DB74">&quot;USD&quot;</span><span style="color: #F8F8F2">, </span><span style="color: #E6DB74">&quot;&quot;</span><span style="color: #F8F8F2">, exchange)]</span></span>
<span class="line"><span style="color: #88846F"># now we&#39;ll add a new column containing the exchange name and the min_max column</span></span>
<span class="line"><span style="color: #F8F8F2">tmp2[, max_min_exchange </span><span style="color: #F92672">:=</span><span style="color: #F8F8F2"> </span><span style="color: #66D9EF">paste</span><span style="color: #F8F8F2">(max_min, exchange, </span><span style="color: #FD971F">sep</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;-&quot;</span><span style="color: #F8F8F2">)]</span></span></code></pre></div>



<p class="wp-block-paragraph">In the above chunk of code we created a new table containing the maximum and minimum prices for each day. The table also contains categorical columns showing which exchange each <strong>max/min</strong> price belongs to, and whether the price was the daily maximum or the daily minimum. Before we plot it, let’s have a quick look at the table.</p>
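<p class="wp-block-paragraph">One consequence of the <code>==</code> filter is worth seeing on a toy example: when two exchanges tie for the daily high (or low), both rows are kept and both are labeled. The comparison is exact because <code>max()</code> and <code>min()</code> return one of the original values. The day and the exchange names below are hypothetical; the steps are the same as in the chunk above.</p>

```r
library(data.table)

# One made-up day with hypothetical exchanges; two of them tie for the highest price
day <- data.table(
  date           = as.Date("2017-06-01"),
  exchange       = c("alphaUSD", "betaUSD", "gammaUSD"),
  weighted_price = c(2400, 2450, 2450)
)

# Same steps as the chunk above
day[, `:=`(day_max = max(weighted_price), day_min = min(weighted_price)), by = date]
tagged <- day[weighted_price == day_min | weighted_price == day_max]
tagged[, max_min := ifelse(weighted_price == day_min, "min", "max")]
tagged[, exchange := gsub("USD", "", exchange)]
tagged[, max_min_exchange := paste(max_min, exchange, sep = "-")]

tagged$max_min_exchange   # "min-alpha" "max-beta" "max-gamma"
```

<p class="wp-block-paragraph">So a single day can add more than one <strong>max</strong> (or <strong>min</strong>) count to the bar plot.</p>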



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.704864501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="head(tmp2)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #66D9EF">head</span><span style="color: #F8F8F2">(tmp2)</span></span></code></pre></div>





<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="251" src="https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw.webp" alt="First rows of tmp2 as printed by head(tmp2)" class="wp-image-4904" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-500x152.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-150x45.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_yKe1-J73s4kQWcCYbVKIgw-768x233.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
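
<p class="wp-block-paragraph">The chunk that built <strong>tmp2</strong> appears earlier in the post. For readers who want a self-contained illustration, here is a minimal base R sketch of one way to build a table of the same shape. It assumes a hypothetical long-format data frame <code>prices</code> (columns <code>date</code>, <code>exchange</code>, <code>price</code>) filled with made-up prices. The <code>"max"</code>/<code>"min"</code> labels are also assumptions, so the real <strong>tmp2</strong> may differ in its details.</p>

<pre class="wp-block-code"><code># Hypothetical long-format input: one row per day and exchange.
# Column names (date, exchange, price) and prices are made up for illustration.
prices = data.frame(
  date     = rep(as.Date(c("2022-12-01", "2022-12-02")), each = 3),
  exchange = rep(c("Kraken", "Bitstamp", "Coinbase"), times = 2),
  price    = c(17150, 17100, 17120, 16990, 16950, 16970),
  stringsAsFactors = FALSE
)

# For each day, keep the exchange with the highest and the one with the lowest price.
# which.max()/which.min() return the first exchange if there is a tie.
daily_extremes = do.call(rbind, lapply(split(prices, prices$date), function(d) {
  data.frame(
    date             = d$date[1],
    max_min          = c("max", "min"),  # assumed labels for the extreme type
    max_min_exchange = c(d$exchange[which.max(d$price)],
                         d$exchange[which.min(d$price)]),
    price            = c(max(d$price), min(d$price)),
    stringsAsFactors = FALSE
  )
}))
rownames(daily_extremes) = NULL
daily_extremes</code></pre>

<p class="wp-block-paragraph">The sketch keeps the <code>max_min</code> and <code>max_min_exchange</code> column names, so <code>daily_extremes</code> can be passed to the same <code>ggplot()</code> call used below. Base R is used only to keep the example free of extra package dependencies.</p>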


<p class="wp-block-paragraph" id="f700">The <strong>max_min_exchange</strong> column contains all the data we need, so let’s make a <strong>barplot</strong> of this variable and color the bars by the <strong>max_min</strong> column, which records whether each price was a daily maximum or minimum.</p>



<div class="wp-block-kevinbatdorf-code-block-pro cbp-has-line-numbers" style="font-size:.875rem;--cbp-line-number-color:#F8F8F2;--cbp-line-number-width:7.70489501953125px;line-height:1.25rem"><span style="display:block;padding:16px 0 0 16px;margin-bottom:-1px;width:100%;text-align:left;background-color:#272822"><svg xmlns="http://www.w3.org/2000/svg" width="54" height="14" viewBox="0 0 54 14"><g fill="none" fill-rule="evenodd" transform="translate(1 1)"><circle cx="6" cy="6" r="6" fill="#FF5F56" stroke="#E0443E" stroke-width=".5"></circle><circle cx="26" cy="6" r="6" fill="#FFBD2E" stroke="#DEA123" stroke-width=".5"></circle><circle cx="46" cy="6" r="6" fill="#27C93F" stroke="#1AAB29" stroke-width=".5"></circle></g></svg></span><span role="button" tabindex="0" data-code="# now make a barplot
ggplot(tmp2, aes(x = max_min_exchange, fill = max_min)) +
  geom_bar() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  ggtitle(&quot;Exchanges with the highest and lowest price differences in Bitcoin&quot;) +
  ylab(&quot;Frequency&quot;) +
  xlab(&quot;Exchange&quot;) +
  scale_fill_discrete(name = &quot;BTC Price Diff. Type&quot;)" style="color:#F8F8F2;display:none" aria-label="Copy" class="code-block-pro-copy-button"><svg xmlns="http://www.w3.org/2000/svg" style="width:24px;height:24px" fill="none" viewBox="0 0 24 24" stroke="currentColor" stroke-width="2"><path class="with-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2m-6 9l2 2 4-4"></path><path class="without-check" stroke-linecap="round" stroke-linejoin="round" d="M9 5H7a2 2 0 00-2 2v12a2 2 0 002 2h10a2 2 0 002-2V7a2 2 0 00-2-2h-2M9 5a2 2 0 002 2h2a2 2 0 002-2M9 5a2 2 0 012-2h2a2 2 0 012 2"></path></svg></span><pre class="shiki" style="background-color: #272822"><code><span class="line"><span style="color: #88846F"># now make a barplot</span></span>
<span class="line"><span style="color: #F8F8F2">ggplot(tmp2, aes(</span><span style="color: #FD971F">x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> max_min_exchange, </span><span style="color: #FD971F">fill</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> max_min)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  geom_bar() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme_bw() </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  theme(</span><span style="color: #FD971F">axis.text.x</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> element_text(</span><span style="color: #FD971F">angle</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">90</span><span style="color: #F8F8F2">, </span><span style="color: #FD971F">hjust</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #AE81FF">1</span><span style="color: #F8F8F2">)) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  ggtitle(</span><span style="color: #E6DB74">&quot;Exchanges with the highest and lowest price differences in Bitcoin&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  ylab(</span><span style="color: #E6DB74">&quot;Frequency&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  xlab(</span><span style="color: #E6DB74">&quot;Exchange&quot;</span><span style="color: #F8F8F2">) </span><span style="color: #F92672">+</span></span>
<span class="line"><span style="color: #F8F8F2">  scale_fill_discrete(</span><span style="color: #FD971F">name</span><span style="color: #F8F8F2"> </span><span style="color: #F92672">=</span><span style="color: #F8F8F2"> </span><span style="color: #E6DB74">&quot;BTC Price Diff. Type&quot;</span><span style="color: #F8F8F2">)</span></span></code></pre></div>


<div class="wp-block-image">
<figure class="aligncenter size-full"><img loading="lazy" decoding="async" width="828" height="500" src="https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw.webp" alt="Stacked bar plot of how often each exchange had the daily highest or lowest Bitcoin price" class="wp-image-4905" srcset="https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw.webp 828w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-500x302.webp 500w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-150x91.webp 150w, https://analyticadss.com/wp-content/uploads/2022/12/1_L0E0OJPfWfCxcc1DKzVmiw-768x464.webp 768w" sizes="auto, (max-width: 828px) 100vw, 828px" /></figure>
</div>
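
<p class="wp-block-paragraph"><code>geom_bar()</code> counts the rows for each exchange and stacks those counts by <strong>max_min</strong>. If you prefer exact numbers to reading bar heights, base R’s <code>table()</code> gives the same counts. The sketch below uses a small made-up stand-in for <strong>tmp2</strong> (the name <code>toy</code> and its values are hypothetical), so its counts say nothing about the real data.</p>

<pre class="wp-block-code"><code># Small made-up stand-in for tmp2 with only the two columns the plot uses
toy = data.frame(
  max_min_exchange = c("Kraken", "Bitstamp", "Kraken", "Bitstamp", "Coinbase", "Bitstamp"),
  max_min          = c("max", "min", "max", "min", "max", "min"),
  stringsAsFactors = FALSE
)

# geom_bar() stacks one segment per (exchange, max_min) pair, with height equal
# to the number of rows in that pair; table() returns the same counts
counts = table(toy$max_min_exchange, toy$max_min)
counts

# Exchange that most often had the daily maximum, and the one that most often had the minimum
most_max = names(which.max(counts[, "max"]))
most_min = names(which.max(counts[, "min"]))
c(most_max, most_min)</code></pre>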


<p class="wp-block-paragraph" id="8183">This is interesting: <strong>Kraken</strong> seems to be the exchange that most often has the highest <strong>Bitcoin</strong> price, while <strong>Bitstamp</strong> most often has the lowest. So if you want to try <a href="https://www.investopedia.com/terms/a/arbitrage.asp" target="_blank" rel="noreferrer noopener">arbitrage</a>, your best bet based on this plot is to buy on <strong>Bitstamp</strong> and sell on <strong>Kraken</strong>. Keep in mind that the plot counts how often each exchange had the extreme price, not how large the price gap was.</p>
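
<p class="wp-block-paragraph">Whether buying on one exchange and selling on another actually pays depends on the price gap being larger than the trading fees. The hypothetical helpers below, <code>arb_profit()</code> and <code>breakeven_sell()</code>, make that arithmetic explicit. The 0.4% fee rates and the prices are illustrative assumptions, not the actual fees or prices on Bitstamp or Kraken, and the sketch ignores withdrawal fees and price moves while coins are being transferred.</p>

<pre class="wp-block-code"><code># Hypothetical helper: net profit from buying `qty` coins on one exchange and
# selling them on another. The 0.4% fee rates are illustrative assumptions,
# and withdrawal fees and price moves during the transfer are ignored.
arb_profit = function(buy_price, sell_price, qty = 1, buy_fee = 0.004, sell_fee = 0.004) {
  cost    = buy_price * qty * (1 + buy_fee)    # amount paid, including the buy fee
  revenue = sell_price * qty * (1 - sell_fee)  # amount received, net of the sell fee
  revenue - cost
}

# Smallest sell price that breaks even for a given buy price and the same fees
breakeven_sell = function(buy_price, buy_fee = 0.004, sell_fee = 0.004) {
  buy_price * (1 + buy_fee) / (1 - sell_fee)
}

arb_profit(buy_price = 17000, sell_price = 17040)  # about -96.16: a $40 gap does not cover the fees
breakeven_sell(17000)                               # about 17136.55</code></pre>

<p class="wp-block-paragraph">With these assumed fees, a buy at $17,000 only breaks even if the sell price is at least about $17,136.55, a gap of roughly 0.8%.</p>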






<p class="wp-block-paragraph">Read more blogs on AnalyticaDSS here: <a href="https://analyticadss.com/blog">BLOGS</a></p>



<p class="wp-block-paragraph">Read more blogs on Medium: <a href="https://medium.com/@aousabdo">Medium Blogs</a></p>



<p class="wp-block-paragraph">Read more blogs on R-bloggers: <a href="https://www.r-bloggers.com/">https://www.r-bloggers.com</a></p>
<p>The post <a href="https://analyticadss.com/analyzing-cryptocurrency-markets-using-r-part-1/">Analyzing Crypto Markets using R — Part 1</a> appeared first on <a href="https://analyticadss.com">Analytica Data Science Solutions</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
